CN110795991A - Mining locomotive pedestrian detection method based on multi-information fusion - Google Patents
- Publication number
- CN110795991A (application number CN201910860797.5A)
- Authority
- CN
- China
- Prior art keywords
- image
- model
- target detection
- training
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/23—Recognition of whole body movements, e.g. for sport training
- G06V40/25—Recognition of walking or running movements, e.g. gait recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention provides a mine locomotive pedestrian detection method based on multi-information fusion. A data set of visible-light and infrared pedestrian images captured under a coal mine is established. To cope with the special underground environment, the visible-light images are processed with contrast-limited adaptive histogram equalization (CLAHE) and the infrared images with an optimal notch denoising method based on a 3rd-order Butterworth function. The YOLOv3 target detection network is adjusted and improved with dense connections and a multi-scale pooling structure so that the features of the two image types are extracted and fused, and the loss function is optimized with a cross-entropy function to establish the model, improving its accuracy, real-time performance and stability in the complex underground environment. A transfer learning method further improves the training precision of the network and shortens the training time. The target detection model obtained by training then detects pedestrian targets, and the pedestrian detection result in front of the mining locomotive is output in real time, meeting the detection speed required for mine locomotive operation.
Description
Technical Field
The invention relates to the technical field of underground coal mine detection, in particular to a mine locomotive pedestrian detection method based on multi-information fusion.
Background
With the continued growth of the coal resource market, the underground transportation burden grows heavier. The data show that, owing to dust, poor illumination and other factors of the operating environment, safety accidents caused by mine locomotive transportation account for 20%-30% of all accidents. Transportation accidents may also occur through driver fatigue, improper operation, or rule violations by miners during locomotive operation, seriously endangering miners' lives and causing great losses in production efficiency for the mine.
Generally, installing an object detection device on an underground locomotive is the main means of reducing accidents. In the image acquisition stage, however, a single visible-light sensor is easily affected by lighting, penetrates fine dust particles poorly, and has difficulty adapting to the complex underground environment. An infrared sensor is less affected by dim light and dust and compensates well for these shortcomings of the visible-light sensor. Current image processing techniques, meanwhile, suffer from low processing precision and low processing speed.
TABLE 1 comparison of visible and infrared sensors
Therefore, the invention fully fuses the advantages of visible and infrared light and applies convolutional neural network technology to pedestrian target detection in front of the mine locomotive, in order to prevent locomotive collision accidents.
Disclosure of Invention
In order to solve the above problems, the invention provides a mining locomotive pedestrian detection method based on multi-information fusion, which overcomes the susceptibility of existing visible-light sensors to light and dust in the special underground environment, offers strong adaptability and anti-interference capability, and further improves detection precision and real-time performance.
In order to achieve the purpose, the invention provides a mining locomotive pedestrian detection method based on multi-information fusion, which comprises the following steps:
step 1, acquiring visible-light and infrared videos of pedestrians in front of the mining locomotive, extracting the videos into images, preprocessing the images with contrast-limited adaptive histogram equalization (CLAHE) and the optimal notch denoising method respectively, then labeling the images with the LabelImg software, and expanding the data set with image enhancement methods;
step 2, dividing the data set into a training set, a cross validation set and a test set according to the ratio of 8:1:1, wherein the training set is used for model training, the cross validation set is used for measuring the performance of the model so as to select optimal parameters, and the test set is used for final evaluation of the model; each data set is expanded into a plurality of scales through an image scaling method and is used for subsequent multi-scale training;
step 3, improving the YOLOv3 target detection network by adopting dense connection and a multi-scale pooling structure, and optimizing a loss function of the YOLOv3 target detection network;
step 4, initializing the weight parameters of the first 43 convolutional layers of the improved YOLOv3 target detection network by the transfer learning method, using the pre-trained weight parameters of the first 43 convolutional layers of the original YOLOv3 target detection network;
step 5, adjusting the training parameters, and training the improved YOLOv3 target detection network with the training set;
step 6, selecting a model with the highest detection precision as an optimal model according to the detection result of the cross validation set, and then evaluating the performance of the model by using the test set;
step 7, analyzing the evaluation result, if the performance does not meet the expected requirement, executing the step 5 again, otherwise, directly outputting the trained target detection model;
and 8, detecting the re-acquired visible light and infrared light videos by using the trained target detection model, and outputting a pedestrian detection result in front of the mining locomotive in real time.
Optionally, improving the YOLOv3 target detection network with dense connections includes adding skip connections from the 52 × 52 × 256 feature map in the network so that the adjusted feature map is superimposed on the two subsequent feature maps of 26 × 26 × 512 and 13 × 13 × 512; the 26 × 26 × 512 feature map is likewise superimposed on the two subsequent feature maps of 13 × 13 × 512 and 26 × 26 × 256, while the 13 × 13 × 512 feature map is superimposed on only one subsequent feature map, of 26 × 26 × 256.
Optionally, improving the YOLOv3 target detection network with a multi-scale pooling structure includes extracting 4 feature maps of different sizes from the 13 × 13 × 512, 26 × 26 × 256 and 52 × 52 × 128 feature maps in the network through 4 pooling layers of different scales, combining the context information of the global region and sub-regions, and then merging the 4 feature maps with the original features to form the final feature expression for convolutional output.
Optionally, optimizing the loss function includes defining the class loss with a cross-entropy loss function so that the model fits more easily, i.e. the modified loss function is:

$$
\begin{aligned}
\text{Loss} ={}& \lambda_{\mathrm{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{obj}} \left[ (x_i-\hat{x}_i)^2 + (y_i-\hat{y}_i)^2 \right] \\
&+ \lambda_{\mathrm{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{obj}} \left[ \left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{obj}} \left(C_i-\hat{C}_i\right)^2 + \lambda_{\mathrm{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{noobj}} \left(C_i-\hat{C}_i\right)^2 \\
&- \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\mathrm{obj}} \sum_{c \in \mathrm{classes}} \left[ p_i(c) \log \hat{p}_i(c) + \left(1-p_i(c)\right) \log\left(1-\hat{p}_i(c)\right) \right]
\end{aligned}
\tag{1}
$$

where S represents the grid size and is 13 × 13, 26 × 26, or 52 × 52, B is the number of candidate boxes, the variables x_i and y_i are the coordinates of the center point of the candidate box, w_i and h_i are the width and height of the bounding box, C_i is the confidence of the predicted object, p_i(c) is the class probability of the object, and hatted variables denote predicted values;

$\mathbb{1}_{ij}^{\mathrm{obj}}$ indicates the presence of an object in bounding box j of grid cell i.
Optionally, expanding each data set to multiple scales with the image scaling method means scaling each image to 10 sizes: {320, 352, 384, 416, 448, 480, 512, 544, 576, 608}.
Optionally, in step 1, the visible-light image is processed with the CLAHE contrast-limited adaptive histogram equalization method, and the infrared image is denoised with the filtering method of an optimal notch filter based on a 3rd-order Butterworth function.
Optionally, denoising the infrared image with the filtering method of the optimal notch filter based on a 3rd-order Butterworth function includes:

first, applying a Fourier transform to the infrared image g(x, y) containing periodic noise to obtain its spectrum image G(u, v);

placing 5 pairs of 3rd-order Butterworth notch band-pass filters H(u, v) at the positions of the noise peaks to extract the principal frequency components of the noise, the filter being expressed mathematically as:

$$
H(u,v) = 1 - \prod_{k=1}^{5} \frac{1}{1+\left[\dfrac{D_k(u,v)\,W}{D_k^2(u,v)-D_0^2}\right]^{2n}} \cdot \frac{1}{1+\left[\dfrac{D_{-k}(u,v)\,W}{D_{-k}^2(u,v)-D_0^2}\right]^{2n}}, \qquad n=3
\tag{2}
$$

where, for a notch with center point (u_k, v_k), D_k(u, v) is the distance from the filter center; the notch symmetric to it about the origin has center point (−u_k, −v_k) and distance D_{−k}(u, v) from the filter center; W is the width of the band, D_0 is the center radius of the frequency band, and k is a natural number;

the spectrum image of the extracted noise can then be represented as:

N(u, v) = H(u, v) G(u, v)   (3);

applying the inverse Fourier transform to this noise spectrum yields the corresponding spatial-domain noise image n(x, y);

weighting the noise with a modulation function w(x, y) and subtracting the modulated noise image from the noisy infrared image in the spatial domain gives an estimate of the denoised image:

$$\hat{f}(x,y)=g(x,y)-w(x,y)\,n(x,y) \tag{4}$$

the modulation function is then chosen to minimize the variance of $\hat{f}$ over a given neighborhood (2a + 1)(2b + 1) of each point (x, y), namely:

$$\sigma^2(x,y)=\frac{1}{(2a+1)(2b+1)}\sum_{s=-a}^{a}\sum_{t=-b}^{b}\left[\hat{f}(x+s,\,y+t)-\bar{\hat{f}}(x,y)\right]^2 \tag{5}$$

setting the partial derivative of the variance with respect to w(x, y) to zero,

$$\frac{\partial\sigma^2(x,y)}{\partial w(x,y)}=0 \tag{6}$$

yields the modulation function:

$$w(x,y)=\frac{\overline{g(x,y)\,n(x,y)}-\bar{g}(x,y)\,\bar{n}(x,y)}{\overline{n^2}(x,y)-\bar{n}^2(x,y)} \tag{7}$$

where the bar denotes the mean over the neighborhood; and finally, the denoised spatial-domain infrared image is obtained through formula (4).
Optionally, in step 5, the improved YOLOv3 target detection network is trained with different learning rates, weight decay coefficients and momentum coefficients, generating 10 different models in total.
Optionally, in step 6, the 10 models are verified on the cross-validation set to obtain their respective loss function values, and the model corresponding to the minimum value is determined as the optimal model.
In addition, the present invention also provides an electronic device including:
a memory for storing a computer program;
and the processor is used for executing the computer program stored in the memory, and when the computer program is executed, the pedestrian detection method for the mining locomotive based on multi-information fusion is realized.
The advantages and beneficial effects of the invention are as follows. Compared with existing mine locomotive pedestrian detection technology, the mine locomotive pedestrian detection method based on multi-information fusion provided by the invention has these advantages:
(1) by utilizing a multi-sensor fusion technology, the defect of a single visible light sensor is overcome, and the method can adapt to complex underground coal mine environments.
(2) For visible light, the CLAHE contrast-limited adaptive histogram equalization method is used to process the visible-light image, effectively alleviating the loss of detail in visible-light images under dim lighting.
(3) For the infrared image, the filtering method of the optimal notch filter based on the 3-order Butterworth function is adopted to carry out denoising processing on the infrared image, and the influence of periodic noise is effectively reduced.
(4) By adopting the idea of dense connection, feature maps at different levels are superimposed via skip connections, realizing feature reuse, effectively reducing the training parameters, improving backpropagation, and making the model easier to train.
(5) By introducing a multi-scale pooling model structure, the network can be combined with feature maps of different levels, the context information of the whole area and the sub-area can be combined, and the detection precision can be effectively improved.
(6) The training precision of the network is improved and the training time is shortened by using the transfer learning technology.
(7) The method lays a foundation for the application of auxiliary driving of the mining locomotive and the like in the future.
Drawings
Fig. 1 is a flow chart of a training phase of a multi-information fusion mining locomotive pedestrian detection method in an embodiment of the invention.
FIG. 2 is a flow chart of a detection phase of a multi-information fusion mining locomotive pedestrian detection method in an embodiment of the invention.
Fig. 3 is a schematic diagram of the convolutional layer of the improved YOLOv3 network structure of the present invention.
Fig. 4 is a schematic diagram of a residual block of the improved YOLOv3 network structure in the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
In one embodiment, as shown in fig. 1, the invention provides a mining locomotive pedestrian detection method based on multi-information fusion, which includes the following steps:
step 1, acquiring visible-light and infrared videos of pedestrians in front of the mining locomotive, extracting the videos into images, preprocessing the images with contrast-limited adaptive histogram equalization (CLAHE) and the optimal notch denoising method respectively, then labeling the images with the LabelImg software, and expanding the data set with image enhancement methods;
step 2, dividing the data set into a training set, a cross validation set and a test set according to the ratio of 8:1:1, wherein the training set is used for model training, the cross validation set is used for measuring the performance of the model so as to select optimal parameters, and the test set is used for final evaluation of the model; each data set is expanded into a plurality of scales through an image scaling method and is used for subsequent multi-scale training;
step 3, improving the YOLOv3 target detection network by adopting dense connection and a multi-scale pooling structure, and optimizing a loss function of the YOLOv3 target detection network;
step 4, initializing the weight parameters of the first 43 convolutional layers of the improved YOLOv3 target detection network by the transfer learning method, using the pre-trained weight parameters of the first 43 convolutional layers of the original YOLOv3 target detection network;
step 5, adjusting the training parameters, and training the improved YOLOv3 target detection network with the training set;
step 6, selecting a model with the highest detection precision as an optimal model according to the detection result of the cross validation set, and then evaluating the performance of the model by using the test set;
step 7, analyzing the evaluation result, if the performance does not meet the expected requirement, executing the step 5 again, otherwise, directly outputting the trained target detection model;
and 8, as shown in fig. 2, detecting the re-acquired visible light and infrared light videos by using the trained target detection model, and outputting a pedestrian detection result in front of the mining locomotive in real time.
Here, the real-time object detection method uses the You Only Look Once (YOLO) network structure of YOLOv3 (v3 denotes version three), which takes the whole image as network input and directly regresses, at the output layer, the positions of the bounding boxes and the categories to which they belong. The improved YOLOv3 network structure consists overall of convolutional layers of 1 × 1 (default stride 1), 3 × 3 and 3 × 3/2 (stride 2), residual blocks, multi-scale average pooling layers, upsampling (Up Sample), and feature map concatenation.
FIG. 3 is a schematic view of a convolutional layer. The convolutional layer consists of a convolution, batch normalization and a Leaky ReLU activation function (a rectified linear unit with leakage). The convolution extracts features, batch normalization improves the convergence efficiency of the algorithm and accelerates fitting, and the Leaky ReLU activation function provides the network's nonlinearity.
Fig. 4 is a schematic diagram of a residual block. The input x passes through two convolutional layers to produce F(x); a skip connection directly adds the input x to this output, forming the final output F(x) + x.
Taking a three-channel 416 × 416 color image as an example, the input image passes through four groups of convolutions (in sequence: 32 3 × 3 convolution kernels, 64 3 × 3/2 kernels, 32 1 × 1 kernels, and 64 3 × 3 kernels), and the resulting 208 × 208 × 64 feature map is calculated according to the following formula:

$$out = \left\lfloor \frac{n + 2p - f}{s} \right\rfloor + 1 \tag{8}$$

where n denotes the input size, p the number of padding pixels, f the size of the convolution or pooling kernel, s the stride, and out the output size.
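The size formula above can be checked with a short sketch (plain Python, illustrative only; the function name is our own, not from the patent):

```python
def conv_out_size(n, p, f, s):
    """Spatial output size of a convolution or pooling layer:
    out = floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

# A 3x3/2 convolution with padding 1 halves a 416x416 input to 208x208,
# matching the 208 x 208 x 64 feature map in the text:
print(conv_out_size(416, 1, 3, 2))  # 208
# A 3x3 convolution with stride 1 and padding 1 preserves the size:
print(conv_out_size(208, 1, 3, 1))  # 208
```

The same formula covers pooling: a global 13 × 13 average pool over a 13 × 13 map gives an output size of 1.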
The obtained 208 × 208 × 64 feature map is then used as input: a 52 × 52 × 256 feature map is obtained through the subsequent 22 groups of convolutional layers, a 26 × 26 × 512 feature map is obtained from it through the next 17 convolutional layers, and a 13 × 13 × 512 feature map through the next 14 convolutional layers.
The 13 × 13 × 512 feature map passes through 5 convolutional layers to give a 13 × 13 × 256 feature map, which is then upsampled to 26 × 26 with a bilinear interpolation algorithm, giving a 26 × 26 × 256 feature map. By the same method, a 52 × 52 × 128 feature map is obtained through 5 convolutional layers and upsampling.
The 52 × 52 × 256 feature map is given skip connections. The first connection reduces its dimensions to 26 × 26 × 128 with 128 3 × 3/2 convolution kernels and superimposes the result on the first subsequent feature map. The second connection reduces it to 13 × 13 × 64 with 128 3 × 3/2 kernels followed by 64 3 × 3/2 kernels and superimposes the result on the second subsequent feature map.
Similarly, the 26 × 26 × 512 feature map is given skip connections, adjusted by convolutional layers, and superimposed on the two subsequent feature maps, while the 13 × 13 × 512 feature map is superimposed on only one subsequent feature map.
The 13 × 13 × 512 feature map is reduced by the multi-scale pooling method: it is fed simultaneously through 4 pooling layers (a global average pooling layer and 7 × 7/6, 5 × 5/4 and 3 × 3/2 average pooling layers), and each result then passes through 128 1 × 1 convolution kernels, giving 4 feature maps. These are uniformly enlarged to 13 × 13 × 128 by upsampling and fused with the earlier 13 × 13 × 512 feature map by concatenation; the fused feature map finally passes through 18 1 × 1 convolution kernels to give an output of size 13 × 13 × 18.
By using the above method for the 26 × 26 × 256 and 52 × 52 × 128 feature maps, 26 × 26 × 18 and 52 × 52 × 18 outputs can be obtained, respectively.
The first two dimensions of an output give the number of grid cells into which the input image is divided. Each grid cell is responsible for detecting objects whose center point falls within it, and with the prior-box method each cell can detect 3 objects simultaneously. Each object carries its size and coordinates (4 values), a confidence and a class probability, i.e. 6 parameters per object, for a total of 3 × 6 = 18 parameters.
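The 18-channel layout per grid cell can be illustrated with a minimal sketch (plain Python; names are our own, not from the patent):

```python
# Each grid cell predicts 3 prior boxes; each box carries 6 values:
# 4 bounding-box parameters (center x, center y, width, height),
# 1 objectness confidence and 1 class probability (pedestrian only),
# so each cell outputs 3 * 6 = 18 channels.
BOXES_PER_CELL = 3
PARAMS_PER_BOX = 4 + 1 + 1

def split_cell(cell):
    """Split one grid cell's flat 18-vector into 3 per-box 6-vectors."""
    assert len(cell) == BOXES_PER_CELL * PARAMS_PER_BOX
    return [cell[i * PARAMS_PER_BOX:(i + 1) * PARAMS_PER_BOX]
            for i in range(BOXES_PER_CELL)]

print(split_cell(list(range(18))))
```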
Instead of the traditional manual selection method, a modified K-means algorithm is used to cluster the data set to obtain the sizes and number of the prior boxes: the original distance metric of the K-means algorithm is replaced with an IOU-based measure according to the following formula, i.e. the distance measure d is 1 minus the intersection-over-union of each bounding box (box) and the cluster-center bounding box (centroid).
d(box,centroid)=1-IOU(box,centroid) (9);
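Formula (9) can be sketched as follows (plain Python; representing boxes as (width, height) pairs aligned at a common corner is our assumption — the usual convention when clustering prior-box sizes — since the patent does not spell it out):

```python
def iou_wh(box, centroid):
    """IoU of two boxes given as (w, h) and aligned at a common corner."""
    w1, h1 = box
    w2, h2 = centroid
    inter = min(w1, w2) * min(h1, h2)
    union = w1 * h1 + w2 * h2 - inter
    return inter / union

def distance(box, centroid):
    """Distance measure of formula (9): d = 1 - IOU(box, centroid)."""
    return 1.0 - iou_wh(box, centroid)

print(distance((10, 20), (10, 20)))  # 0.0 - identical boxes cluster together
print(distance((10, 10), (20, 20)))  # 0.75 - IoU is 100/400
```

Similar boxes get a small distance and end up in the same cluster, regardless of their absolute position in the image.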
For each of the 3 outputs, 3 prior boxes are set, for a total of 9 clustered prior-box sizes. In assignment, the larger prior boxes are applied to the smallest, 13 × 13, feature map (largest receptive field) to detect larger objects; the medium prior boxes are applied to the medium, 26 × 26, feature map (medium receptive field) to detect medium-sized objects; and the smaller prior boxes are applied to the larger, 52 × 52, feature map (smaller receptive field) to detect smaller objects.
During gradient descent on the weight parameters, the sum-of-squares function used by the original class loss is likely not convex, i.e. it has multiple local optima and a global optimum is difficult to obtain; the class loss is therefore defined with a cross-entropy function in place of the original sum-of-squares function. The modified loss function is:

$$
\begin{aligned}
\text{Loss} ={}& \lambda_{\mathrm{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{obj}} \left[ (x_i-\hat{x}_i)^2 + (y_i-\hat{y}_i)^2 \right] \\
&+ \lambda_{\mathrm{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{obj}} \left[ \left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{obj}} \left(C_i-\hat{C}_i\right)^2 + \lambda_{\mathrm{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{noobj}} \left(C_i-\hat{C}_i\right)^2 \\
&- \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\mathrm{obj}} \sum_{c \in \mathrm{classes}} \left[ p_i(c) \log \hat{p}_i(c) + \left(1-p_i(c)\right) \log\left(1-\hat{p}_i(c)\right) \right]
\end{aligned}
\tag{1}
$$

where S represents the grid size and is 13 × 13, 26 × 26, or 52 × 52, B is the number of candidate boxes (3), the variables x_i and y_i are the coordinates of the center point of the candidate box, w_i and h_i are the width and height of the bounding box, C_i is the confidence of the predicted object, p_i(c) is the class probability of the object, and hatted variables denote predicted values;

$\mathbb{1}_{i}^{\mathrm{obj}}$ indicates the presence of an object in grid cell i, meaning that grid cells containing an object account for the error;

$\mathbb{1}_{ij}^{\mathrm{obj}}$ indicates the presence of an object in bounding box j of grid cell i, meaning that only the bounding box "responsible" for the prediction (with relatively large overlap) accounts for the error;

$\mathbb{1}_{ij}^{\mathrm{noobj}}$ indicates that no object is present in bounding box j of grid cell i.
Since some large objects, or objects near the boundary of several grid cells, can be detected by multiple cells at the same time, multiple bounding boxes are output; the repeated outputs are filtered here with the non-maximum suppression method.
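The non-maximum suppression step can be sketched as a minimal greedy implementation (plain Python; the function names and the 0.5 threshold are our own choices, not taken from the patent):

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop every remaining box
    that overlaps it above the threshold, then repeat on what is left."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

# Two near-duplicate detections of one pedestrian plus one distinct box:
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```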
The visible-light and infrared pedestrian videos recorded in front of the mining locomotive are extracted into images. The visible-light images are processed with the CLAHE contrast-limited adaptive histogram equalization method: the image is divided into an 8 × 8 grid of tiles, histogram equalization is performed on each tile separately, and finally, to remove the tile boundaries introduced by the algorithm, the tiles are stitched together with a bilinear interpolation algorithm.
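As a rough sketch of the per-tile equalization step (plain histogram equalization only; the contrast clipping and the bilinear stitching between tiles that make this CLAHE are omitted, and the function name is our own):

```python
def equalize_tile(tile, levels=256):
    """Histogram-equalize one tile, given as a flat list of gray values.
    Maps each value v to round((cdf(v) - cdf_min) / (N - cdf_min) * (L-1)),
    spreading the tile's gray levels over the full range."""
    hist = [0] * levels
    for v in tile:
        hist[v] += 1
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)
    cdf_min = min(c for c in cdf if c > 0)
    n = len(tile)
    if n == cdf_min:           # constant tile: nothing to spread
        return tile[:]
    return [round((cdf[v] - cdf_min) / (n - cdf_min) * (levels - 1))
            for v in tile]

# Four closely spaced dark values are spread across the full 0..255 range:
print(equalize_tile([52, 55, 61, 59]))  # [0, 85, 255, 170]
```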
Because the infrared image is easily disturbed by periodic noise, it is denoised with the filtering method of the optimal notch filter based on a 3rd-order Butterworth function, namely:
First, a Fourier transform is applied to the infrared image g(x, y) containing periodic noise to obtain its spectrum image G(u, v).

According to the noise characteristics of the spectrum image, 5 pairs of 3rd-order Butterworth notch band-pass filters H(u, v) are placed at the positions of the noise peaks to extract the principal frequency components of the noise. The filter is expressed mathematically as:

$$
H(u,v) = 1 - \prod_{k=1}^{5} \frac{1}{1+\left[\dfrac{D_k(u,v)\,W}{D_k^2(u,v)-D_0^2}\right]^{2n}} \cdot \frac{1}{1+\left[\dfrac{D_{-k}(u,v)\,W}{D_{-k}^2(u,v)-D_0^2}\right]^{2n}}, \qquad n=3
\tag{2}
$$

where, for a notch with center point (u_k, v_k), D_k(u, v) is the distance from the filter center; the notch symmetric to it about the origin has center point (−u_k, −v_k) and distance D_{−k}(u, v) from the filter center; W is the width of the band, D_0 is the center radius of the frequency band, and k is a natural number.

The spectrum image of the extracted noise can be represented as:

N(u, v) = H(u, v) G(u, v)   (3);

applying the inverse Fourier transform to this noise spectrum yields the corresponding spatial-domain noise image n(x, y).

Since this process usually yields only an approximation of the noise, the noise is weighted with a modulation function w(x, y), and the modulated noise image is subtracted from the noisy infrared image in the spatial domain to obtain an estimate of the denoised image:

$$\hat{f}(x,y)=g(x,y)-w(x,y)\,n(x,y) \tag{4}$$

The modulation function is then chosen to minimize the variance of $\hat{f}$ over a given neighborhood (2a + 1)(2b + 1) of each point (x, y), namely:

$$\sigma^2(x,y)=\frac{1}{(2a+1)(2b+1)}\sum_{s=-a}^{a}\sum_{t=-b}^{b}\left[\hat{f}(x+s,\,y+t)-\bar{\hat{f}}(x,y)\right]^2 \tag{5}$$

Setting the partial derivative of the variance with respect to w(x, y) to zero,

$$\frac{\partial\sigma^2(x,y)}{\partial w(x,y)}=0 \tag{6}$$

yields the modulation function:

$$w(x,y)=\frac{\overline{g(x,y)\,n(x,y)}-\bar{g}(x,y)\,\bar{n}(x,y)}{\overline{n^2}(x,y)-\bar{n}^2(x,y)} \tag{7}$$

where the bar denotes the mean over the neighborhood. Finally, the denoised spatial-domain infrared image is obtained through formula (4).
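The neighborhood computation of formula (7) can be sketched as follows (plain Python over one flattened neighborhood; illustrative only, names are our own):

```python
def local_mean(vals):
    return sum(vals) / len(vals)

def modulation_weight(g, n):
    """Optimal modulation weight over one (2a+1)(2b+1) neighborhood:
    w = (mean(g*n) - mean(g)*mean(n)) / (mean(n^2) - mean(n)^2),
    the value that minimizes the local variance of f = g - w*n."""
    gn = [gi * ni for gi, ni in zip(g, n)]
    nn = [ni * ni for ni in n]
    return ((local_mean(gn) - local_mean(g) * local_mean(n))
            / (local_mean(nn) - local_mean(n) ** 2))

# If the noisy neighborhood is exactly signal + 0.5 * noise, the optimal
# weight recovers 0.5, and f = g - w*n restores the constant signal:
signal = [3.0, 3.0, 3.0, 3.0]
noise = [1.0, -1.0, 2.0, -2.0]
g = [s + 0.5 * x for s, x in zip(signal, noise)]
w = modulation_weight(g, noise)
print(w)  # 0.5
```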
The data set is divided into a training set, a cross-validation set and a test set at 8:1:1, and each data set is expanded to 10 sizes by the image scaling method: {320, 352, 384, 416, 448, 480, 512, 544, 576, 608}, i.e. in increments of 32.
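The 8:1:1 split and the 10 training sizes can be sketched as follows (plain Python; the seed and function name are our own):

```python
import random

def split_dataset(samples, seed=0):
    """Shuffle and split into training / cross-validation / test at 8:1:1."""
    rng = random.Random(seed)
    samples = samples[:]
    rng.shuffle(samples)
    n = len(samples)
    n_val = n // 10
    n_test = n // 10
    n_train = n - n_val - n_test
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])

# The 10 training sizes from 320 to 608 in increments of 32:
SIZES = list(range(320, 609, 32))
print(SIZES)  # [320, 352, 384, 416, 448, 480, 512, 544, 576, 608]

train, val, test = split_dataset(list(range(1000)))
print(len(train), len(val), len(test))  # 800 100 100
```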
The weight parameters of the first 43 convolutional layers of the improved YOLOv3 target detection network are initialized by the transfer learning method, using the pre-trained weight parameters of the first 43 convolutional layers of YOLOv3.
The Adam optimization algorithm is adopted, with the number of iterations set to 100, 64 samples per iteration, a momentum coefficient of 0.9 and a learning-rate decay coefficient of 0.0005. Every 10 iterations a new input size is randomly selected for training (10 sizes in total). When all iterations are completed, one training run is finished and one model is obtained; the parameters are then modified and training repeated until 10 models in total have been output.
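The multi-scale schedule described above, a new input size drawn at random every 10 iterations, can be sketched as (plain Python; the seed is an arbitrary choice of ours):

```python
import random

SIZES = list(range(320, 609, 32))  # the 10 training sizes

def size_schedule(total_iters=100, switch_every=10, seed=0):
    """Pick a new random input size every `switch_every` iterations."""
    rng = random.Random(seed)
    schedule = []
    size = None
    for it in range(total_iters):
        if it % switch_every == 0:
            size = rng.choice(SIZES)
        schedule.append(size)
    return schedule

sched = size_schedule()
print(len(sched))  # 100
```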
The 10 models are verified on the cross-validation set to obtain their respective loss function values, and the model with the minimum value is determined to be the optimal model. The loss function value of this model is then calculated on the test set; if it fails to meet the expected requirement, the parameters are modified and the model retrained, otherwise the model is directly output as the target detection model.
The acquired visible-light and infrared videos are extracted into images, which are respectively processed with CLAHE contrast-limited adaptive histogram equalization and denoised with the filtering method of the optimal notch filter based on the 3rd-order Butterworth function; the images are then detected with the target detection model obtained by training, and the pedestrian detection result in front of the mining locomotive is output in real time.
In addition, the present invention also provides an electronic device including:
a memory for storing a computer program;
a processor for executing the computer program stored in the memory, and when the computer program is executed, implementing a mining locomotive pedestrian detection method based on multi-information fusion, at least comprising the following steps:
step 1, acquiring visible-light and infrared videos of pedestrians in front of the mining locomotive, extracting the videos into images, preprocessing the images with contrast-limited adaptive histogram equalization (CLAHE) and the optimal notch denoising method respectively, then labeling the images with the LabelImg software, and expanding the data set with image enhancement methods;
step 2, dividing the data set into a training set, a cross validation set and a test set according to the ratio of 8:1:1, wherein the training set is used for model training, the cross validation set is used for measuring the performance of the model so as to select optimal parameters, and the test set is used for final evaluation of the model; each data set is expanded into a plurality of scales through an image scaling method and is used for subsequent multi-scale training;
step 3, improving the YOLOv3 target detection network by adopting dense connection and a multi-scale pooling structure, and optimizing a loss function of the YOLOv3 target detection network;
step 4, initializing the first 43 convolutional layers of the improved YOLOv3 target detection network with the trained weight parameters of the first 43 convolutional layers of the original YOLOv3 network, using a transfer learning method;
step 5, adjusting training parameters, and training the improved YOLOv3 target detection network by using the training set;
step 6, selecting an optimal target detection model according to the detection result of the cross validation set, and then evaluating the performance of the model by using the test set;
step 7, analyzing the evaluation result, if the performance does not meet the expected requirement, executing the step 5 again, otherwise, directly outputting the trained target detection model;
and 8, detecting the re-acquired visible light and infrared light videos by using the trained target detection model, and outputting a pedestrian detection result in front of the mining locomotive in real time.
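The data preparation in steps 1–2 amounts to an 8:1:1 split plus a multi-scale expansion; a minimal sketch (a stand-in on sample identifiers only — labeling and image enhancement are omitted, and the seed is an illustrative choice):

```python
import random

def split_dataset(samples, ratios=(8, 1, 1), seed=42):
    """Split a list of samples into train / cross-validation / test sets
    at the 8:1:1 ratio of step 2."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

# The 10 training scales used for multi-scale training (step 2 / claim 5):
# multiples of 32 from 320 to 608, matching YOLOv3's 32-pixel stride.
SCALES = [320 + 32 * i for i in range(10)]
```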
Alternatively, the electronic device may be a server or a personal computer, or the like.
Moreover, while the operations of the method of the invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
While the spirit and principles of the invention have been described with reference to the above specific embodiments, it is to be understood that the invention is not limited to the specific embodiments disclosed, nor is the division into aspects limiting, as that division is for convenience only and the features in these aspects can be combined to advantage. The invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
Claims (10)
1. A mining locomotive pedestrian detection method based on multi-information fusion is characterized by comprising the following steps:
step 1, acquiring visible light and infrared light videos of pedestrians in front of a mining locomotive, extracting the videos into images, preprocessing the images respectively with CLAHE contrast-limited adaptive histogram equalization and an optimal notch denoising method, then labeling the images with LabelImg software, and expanding the data set with an image enhancement method;
step 2, dividing the data set into a training set, a cross validation set and a test set according to the ratio of 8:1:1, wherein the training set is used for model training, the cross validation set is used for measuring the performance of the model so as to select optimal parameters, and the test set is used for final evaluation of the model; each data set is expanded into a plurality of scales through an image scaling method and is used for subsequent multi-scale training;
step 3, improving the YOLOv3 target detection network by adopting dense connection and a multi-scale pooling structure, and optimizing a loss function of the YOLOv3 target detection network;
step 4, initializing the first 43 convolutional layers of the improved YOLOv3 target detection network with the trained weight parameters of the first 43 convolutional layers of the original YOLOv3 network, using a transfer learning method;
step 5, adjusting training parameters, and training the improved YOLOv3 target detection network by using the training set;
step 6, selecting a model with the highest detection precision as an optimal model according to the detection result of the cross validation set, and then evaluating the performance of the model by using the test set;
step 7, analyzing the evaluation result, if the performance does not meet the expected requirement, executing the step 5 again, otherwise, directly outputting the trained target detection model;
and 8, detecting the re-acquired visible light and infrared light videos by using the trained target detection model, and outputting a pedestrian detection result in front of the mining locomotive in real time.
2. The method of claim 1, wherein improving the YOLOv3 target detection network with dense connection comprises performing skip connections on the 52 × 52 × 256 feature map in the network, so that after adjustment it is superimposed with the two subsequent feature maps of 26 × 26 × 512 and 13 × 13 × 512; the 26 × 26 × 512 feature map is then superimposed with the two subsequent feature maps of 13 × 13 × 512 and 26 × 26 × 256, while the 13 × 13 × 512 feature map is superimposed with only one subsequent feature map of 26 × 26 × 256.
3. The method of claim 1, wherein improving the YOLOv3 target detection network with a multi-scale pooling structure comprises extracting 4 feature maps of different sizes from the 13 × 13 × 512, 26 × 26 × 256 and 52 × 52 × 128 feature maps in the network through 4 pooling layers of different scales, combining the context information of the global region and the sub-regions, and then merging the 4 feature maps with the original features to form the final feature expression for the convolutional output.
4. The method of claim 1, wherein optimizing the loss function comprises defining class losses using a cross-entropy loss function to make the model easier to fit, i.e., the modified loss function is as follows:
where S denotes the grid size, so that S × S is 13 × 13, 26 × 26 or 52 × 52; B is the number of candidate boxes; the variables x_i and y_i are the coordinates of the center point of the candidate box; w_i and h_i are the width and height of the bounding box, respectively; C_i is the confidence of the predicted object; p_i(c) is the class probability of the object; a hatted variable denotes the corresponding predicted value; and 1_ij^obj indicates the presence of an object in bounding box j of grid cell i;
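The loss-function expression of claim 4 appears only as an image in the original publication. The following is a hedged reconstruction, not the patent's exact formula: a standard YOLO-style loss in which the confidence and class terms use cross-entropy, consistent with the variables defined in the claim (λ_coord and λ_noobj are the usual balancing weights, assumed here):

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B}
  \mathbb{1}_{ij}^{\text{obj}}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B}
  \mathbb{1}_{ij}^{\text{obj}}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2
  +\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\
&- \sum_{i=0}^{S^2}\sum_{j=0}^{B}
  \mathbb{1}_{ij}^{\text{obj}}\left[\hat{C}_i\log C_i+(1-\hat{C}_i)\log(1-C_i)\right] \\
&- \lambda_{\text{noobj}} \sum_{i=0}^{S^2}\sum_{j=0}^{B}
  \mathbb{1}_{ij}^{\text{noobj}}\left[\hat{C}_i\log C_i+(1-\hat{C}_i)\log(1-C_i)\right] \\
&- \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
  \sum_{c\in\text{classes}}\left[\hat{p}_i(c)\log p_i(c)
  +(1-\hat{p}_i(c))\log(1-p_i(c))\right]
\end{aligned}
```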
5. The method of claim 1, wherein expanding each data set to multiple scales by an image scaling method comprises scaling each image to 10 sizes: {320, 352, 384, 416, 448, 480, 512, 544, 576, 608}.
6. The method of claim 1, wherein in step 1, the visible light images are processed with the CLAHE contrast-limited adaptive histogram equalization method, and the infrared light images are denoised with a filtering method based on an optimal notch filter built on a 3rd-order Butterworth function.
7. The method of claim 6, wherein denoising the infrared image with the filtering method based on an optimal notch filter built on a 3rd-order Butterworth function comprises:
firstly, applying a Fourier transform to the infrared light image g(x, y) containing periodic noise to obtain its frequency spectrum image G(u, v);
a5-pair 3-order Butterworth notch band-pass filter H (u, v) is placed at the position of a noise peak and used for extracting a main frequency part of noise, and the mathematical expression of the filter is as follows:
wherein, for a notch with center point coordinates (u_k, v_k), D_k(u, v) is its distance from the center of the filter; the notch symmetric to it about the origin has center point coordinates (-u_k, -v_k) and distance D_-k(u, v) from the center of the filter; W is the width of the band, D_0 is the center radius of the band, and k is a natural number;
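The filter expression itself is an image in the original publication. One plausible reconstruction consistent with the variables just defined — a hedged sketch, not the patent's exact formula — takes the notch band-pass as the complement of a product of 3rd-order Butterworth band-reject notch pairs:

```latex
H(u,v) \;=\; 1 \;-\; \prod_{k=1}^{5}
\frac{1}{1+\left[\dfrac{D_k(u,v)\,W}{D_k^2(u,v)-D_0^2}\right]^{2n}}
\cdot
\frac{1}{1+\left[\dfrac{D_{-k}(u,v)\,W}{D_{-k}^2(u,v)-D_0^2}\right]^{2n}},
\qquad n = 3,
```

with, for a spectrum centered at the origin, \(D_k(u,v)=\sqrt{(u-u_k)^2+(v-v_k)^2}\) and \(D_{-k}(u,v)=\sqrt{(u+u_k)^2+(v+v_k)^2}\).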
the spectral image of the extracted noise can be represented as:
N(u,v)=H(u,v)G(u,v) (3);
and obtaining the corresponding spatial-domain noise image n(x, y) by an inverse Fourier transform of the noise spectrum image:
weighting the noise with a modulation function w(x, y), and then subtracting the modulated noise image from the noisy infrared image in the spatial domain to obtain an estimate of the denoised image;
The modulation function w(x, y) is then chosen so as to minimize the variance of this estimate over a given neighborhood (2a + 1)(2b + 1) of each point (x, y), namely:
the second derivative is made to be zero, and then the modulation function can be obtained:
and finally, obtaining the denoised spatial-domain infrared image from the estimate above.
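The optimal-notch procedure of this claim can be sketched in NumPy. This is a minimal sketch under stated assumptions: an ideal notch mask stands in for the Butterworth band-pass filter, and the peak positions, window size, and test signal are illustrative, not values from the patent:

```python
import numpy as np

def box_mean(a, k=5):
    """Local mean over a k x k neighborhood, i.e. the (2a+1)(2b+1)
    window of the claim with a = b = k // 2, via an integral image."""
    pad = k // 2
    ap = np.pad(a, pad, mode="edge")
    c = np.cumsum(np.cumsum(ap, axis=0), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))  # zero row/column for the integral image
    s = c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]
    return s / (k * k)

def optimal_notch_denoise(g, peaks, radius=2, k=5):
    """Extract periodic noise n(x, y) through a notch band-pass mask,
    then subtract the w(x, y)-modulated noise from the noisy image g."""
    G = np.fft.fftshift(np.fft.fft2(g))
    H = np.zeros(G.shape, dtype=float)
    for (pu, pv) in peaks:  # pass-bands around each noise peak (and its mirror)
        H[pu - radius:pu + radius + 1, pv - radius:pv + radius + 1] = 1.0
    n = np.real(np.fft.ifft2(np.fft.ifftshift(H * G)))  # spatial-domain noise
    # Variance-minimizing modulation: w = (<gn> - <g><n>) / (<n^2> - <n>^2)
    w = (box_mean(g * n, k) - box_mean(g, k) * box_mean(n, k)) / \
        (box_mean(n * n, k) - box_mean(n, k) ** 2 + 1e-12)
    return g - w * n  # estimate of the denoised image
```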
8. The method of claim 1, wherein in step 5, the improved YOLOv3 target detection network is trained with different settings of the learning rate, weight attenuation coefficient and momentum coefficient, thereby generating 10 different models.
9. The method of claim 8, wherein in step 6, the 10 models are validated on the cross-validation set to obtain their respective loss function values, and the model with the minimum value is identified as the optimal model.
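Claims 8–9 describe a small hyperparameter sweep with selection by cross-validation loss. A minimal sketch with stand-in train/evaluate functions — the grid values and the toy loss criterion are illustrative, not those of the patent:

```python
import itertools

# Illustrative hyperparameter grid producing 10 candidate settings (2 x 1 x 5).
learning_rates = [1e-3, 5e-4]
weight_decays = [5e-4]
momentums = [0.8, 0.85, 0.9, 0.95, 0.99]

def train_model(lr, wd, momentum):
    """Stand-in for training the improved YOLOv3 network with one setting."""
    return {"lr": lr, "wd": wd, "momentum": momentum}

def val_loss(model):
    """Stand-in for the loss function value on the cross-validation set."""
    return abs(model["lr"] - 5e-4) + abs(model["momentum"] - 0.9)

models = [train_model(lr, wd, m)
          for lr, wd, m in itertools.product(learning_rates, weight_decays, momentums)]
best = min(models, key=val_loss)  # claim 9: minimum cross-validation loss wins
```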
10. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing a computer program stored in the memory, and when executed, implementing the method of any of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910860797.5A CN110795991B (en) | 2019-09-11 | 2019-09-11 | Mining locomotive pedestrian detection method based on multi-information fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110795991A true CN110795991A (en) | 2020-02-14 |
CN110795991B CN110795991B (en) | 2023-03-31 |
Family
ID=69427241
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910860797.5A Active CN110795991B (en) | 2019-09-11 | 2019-09-11 | Mining locomotive pedestrian detection method based on multi-information fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110795991B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190265714A1 (en) * | 2018-02-26 | 2019-08-29 | Fedex Corporate Services, Inc. | Systems and methods for enhanced collision avoidance on logistics ground support equipment using multi-sensor detection fusion |
CN109815886A (en) * | 2019-01-21 | 2019-05-28 | 南京邮电大学 | A kind of pedestrian and vehicle checking method and system based on improvement YOLOv3 |
CN109934121A (en) * | 2019-02-21 | 2019-06-25 | 江苏大学 | A kind of orchard pedestrian detection method based on YOLOv3 algorithm |
CN109919058A (en) * | 2019-02-26 | 2019-06-21 | 武汉大学 | A kind of multisource video image highest priority rapid detection method based on Yolo V3 |
Non-Patent Citations (2)
Title |
---|
Wang Dianwei et al.: "Improved YOLOv3 pedestrian detection algorithm for infrared video images", Journal of Xi'an University of Posts and Telecommunications * |
Tan Kangxia et al.: "Pedestrian detection method for infrared images based on the YOLO model", Laser & Infrared * |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111382683A (en) * | 2020-03-02 | 2020-07-07 | 东南大学 | Target detection method based on feature fusion of color camera and infrared thermal imager |
CN111553289A (en) * | 2020-04-29 | 2020-08-18 | 中国科学院空天信息创新研究院 | Remote sensing image cloud detection method and system |
CN111898427A (en) * | 2020-06-22 | 2020-11-06 | 西北工业大学 | Multispectral pedestrian detection method based on feature fusion deep neural network |
CN111832489A (en) * | 2020-07-15 | 2020-10-27 | 中国电子科技集团公司第三十八研究所 | Subway crowd density estimation method and system based on target detection |
CN112070111A (en) * | 2020-07-28 | 2020-12-11 | 浙江大学 | Multi-target detection method and system adaptive to multiband images |
CN112070111B (en) * | 2020-07-28 | 2023-11-28 | 浙江大学 | Multi-target detection method and system adapting to multi-band image |
CN111950475A (en) * | 2020-08-15 | 2020-11-17 | 哈尔滨理工大学 | Yalhe histogram enhancement type target recognition algorithm based on yoloV3 |
CN111986240A (en) * | 2020-09-01 | 2020-11-24 | 交通运输部水运科学研究所 | Drowning person detection method and system based on visible light and thermal imaging data fusion |
CN112183265A (en) * | 2020-09-17 | 2021-01-05 | 国家电网有限公司 | Electric power construction video monitoring and alarming method and system based on image recognition |
CN112287839A (en) * | 2020-10-29 | 2021-01-29 | 广西科技大学 | SSD infrared image pedestrian detection method based on transfer learning |
CN112528934A (en) * | 2020-12-22 | 2021-03-19 | 燕山大学 | Improved YOLOv3 traffic sign detection method based on multi-scale feature layer |
CN112418358A (en) * | 2021-01-14 | 2021-02-26 | 苏州博宇鑫交通科技有限公司 | Vehicle multi-attribute classification method for strengthening deep fusion network |
CN112989924A (en) * | 2021-01-26 | 2021-06-18 | 深圳市优必选科技股份有限公司 | Target detection method, target detection device and terminal equipment |
CN112989924B (en) * | 2021-01-26 | 2024-05-24 | 深圳市优必选科技股份有限公司 | Target detection method, target detection device and terminal equipment |
CN114529879A (en) * | 2022-02-08 | 2022-05-24 | 安徽理工大学 | Real-time detection method for road conditions of electric locomotive in mine based on YOLOv4-Tiny |
CN114529879B (en) * | 2022-02-08 | 2024-07-16 | 安徽理工大学 | Mine electric locomotive road condition real-time detection method based on YOLOv-Tiny |
CN115311241B (en) * | 2022-08-16 | 2024-04-23 | 天地(常州)自动化股份有限公司 | Underground coal mine pedestrian detection method based on image fusion and feature enhancement |
CN115311241A (en) * | 2022-08-16 | 2022-11-08 | 天地(常州)自动化股份有限公司 | Coal mine down-hole person detection method based on image fusion and feature enhancement |
CN116664604B (en) * | 2023-07-31 | 2023-11-03 | 苏州浪潮智能科技有限公司 | Image processing method and device, storage medium and electronic equipment |
CN116664604A (en) * | 2023-07-31 | 2023-08-29 | 苏州浪潮智能科技有限公司 | Image processing method and device, storage medium and electronic equipment |
CN117315453A (en) * | 2023-11-21 | 2023-12-29 | 南开大学 | Underwater small target detection method based on underwater sonar image |
CN117315453B (en) * | 2023-11-21 | 2024-02-20 | 南开大学 | Underwater small target detection method based on underwater sonar image |
CN117830141A (en) * | 2024-03-04 | 2024-04-05 | 奥谱天成(成都)信息科技有限公司 | Method, medium, equipment and device for removing vertical stripe noise of infrared image |
CN117830141B (en) * | 2024-03-04 | 2024-05-03 | 奥谱天成(成都)信息科技有限公司 | Method, medium, equipment and device for removing vertical stripe noise of infrared image |
Also Published As
Publication number | Publication date |
---|---|
CN110795991B (en) | 2023-03-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110795991B (en) | Mining locomotive pedestrian detection method based on multi-information fusion | |
CN110956126B (en) | Small target detection method combined with super-resolution reconstruction | |
CN103870818B (en) | Smog detection method and device | |
CN113554089A (en) | Image classification countermeasure sample defense method and system and data processing terminal | |
KR101533925B1 (en) | Method and apparatus for small target detection in IR image | |
CN103226832B (en) | Based on the multi-spectrum remote sensing image change detecting method of spectral reflectivity mutation analysis | |
CN106251344A (en) | A kind of multiple dimensioned infrared target self-adapting detecting method of view-based access control model receptive field | |
JP7327077B2 (en) | Road obstacle detection device, road obstacle detection method, and road obstacle detection program | |
CN112862845A (en) | Lane line reconstruction method and device based on confidence evaluation | |
CN112307984B (en) | Safety helmet detection method and device based on neural network | |
CN109117746A (en) | Hand detection method and machine readable storage medium | |
CN105590301A (en) | Impulse noise elimination method of self-adaption normal-inclined double cross window mean filtering | |
KR101993085B1 (en) | Semantic image segmentation method based on deep learing | |
EP3671635B1 (en) | Curvilinear object segmentation with noise priors | |
CN114266894A (en) | Image segmentation method and device, electronic equipment and storage medium | |
CN110956602B (en) | Method and device for determining change area and storage medium | |
CN117333459A (en) | Image tampering detection method and system based on double-order attention and edge supervision | |
CN115170824A (en) | Change detection method for enhancing Siamese network based on space self-adaption and characteristics | |
CN115358952A (en) | Image enhancement method, system, equipment and storage medium based on meta-learning | |
Wu et al. | Pyramid edge detection based on stack filter | |
CN101739670A (en) | Non-local mean space domain time varying image filtering method | |
CN104616034B (en) | A kind of smog detection method | |
CN109785312B (en) | Image blur detection method and system and electronic equipment | |
CN115984712A (en) | Multi-scale feature-based remote sensing image small target detection method and system | |
CN115829875A (en) | Anti-patch generation method and device for non-shielding physical attack |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |