CN110210452A - Object detection method in a mine truck environment based on an improved tiny-yolov3 - Google Patents
Object detection method in a mine truck environment based on an improved tiny-yolov3 Download PDF Info
- Publication number
- CN110210452A CN110210452A CN201910513549.3A CN201910513549A CN110210452A CN 110210452 A CN110210452 A CN 110210452A CN 201910513549 A CN201910513549 A CN 201910513549A CN 110210452 A CN110210452 A CN 110210452A
- Authority
- CN
- China
- Prior art keywords
- tiny
- yolov3
- image data
- objectmask
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
Abstract
The present invention relates to an object detection method in a mine truck environment based on an improved tiny-yolov3, comprising the following steps: S1, acquiring target-object image data; S2, preprocessing the acquired image data; S3, feeding the preprocessed image data into a tiny-yolov3 model and obtaining, through the model's processing, the pixel coordinates of the target object in the image. The tiny-yolov3 model is a model improved by incorporating a residual network structure, and the image data are acquired from an overhead viewpoint. The detection method provided by the invention can substantially improve detection accuracy without reducing running speed.
Description
Technical field
The invention belongs to the field of computer vision detection, and in particular relates to an object detection method in a mine truck environment based on an improved tiny-yolov3.
Background technique
In most open-pit mines, labor is scarce and labor costs are high, so operators seek larger mining trucks on the one hand and better ways to reduce staffing on the other. With continuing technical progress, unmanned mine trucks have emerged. While saving labor cost, unmanned mine trucks also improve safety and production efficiency, and have become a component of mine digitalization. In unmanned mine-truck operation, active anti-collision technology is a critical link. Because a mine truck is tall, wide, and long, its blind zones are large and its braking distance is long; even a human driver cannot reliably avoid accidents caused by fatigue or carelessness, so active anti-collision technology is necessary even for manned trucks. Traditional anti-collision technologies include enlarging the field of view with convex mirrors, radar or infrared ranging, and GPS positioning. A convex mirror widens the field of view but still leaves blind spots. Radar and infrared ranging are poorly suited to long, narrow objects and can only measure objects along a straight line, so a large mine truck would need multiple radar or infrared units, raising cost. Because the pit of an open-pit mine is very deep, GPS positioning may fail or give false alarms in signal blind zones. This work therefore uses computer vision to detect and locate obstacles in the mine and thereby realize active anti-collision technology.
Object detection is a prerequisite for higher-level visual tasks such as scene-content understanding, and is applied in tasks such as intelligent video surveillance, content-based image retrieval, robot navigation, and augmented reality. In recent years, deep convolutional networks have made breakthrough progress in computer vision. Deep convolutional networks mainly rely on weight sharing and ever-deeper architectures to give the network stronger analytic capability. VGG, GoogLeNet, residual networks, and others pushed convolutional networks to greater depths, greatly improving network performance and raising large-scale image classification accuracy to a very high level. The Region-Based Convolutional Neural Network (RCNN) raised object-detection accuracy to a new level, but because RCNN splits detection into three independent stages, its detection efficiency is very low. YOLO (You Only Look Once) and SSD (Single Shot Multibox Detector) were proposed to improve detection efficiency and bring object detection to real-time speed. The recently proposed tiny-yolov3, compared with SSD, substantially improves detection accuracy while guaranteeing real-time performance. Although yolov3 and SSD achieve high accuracy, their real-time performance on embedded devices and lower-end PCs is unsatisfactory. tiny-yolov3 is a simplified version of yolov3 that reduces the depth of the convolutional layers; although its detection accuracy drops, its running speed is greatly improved.
Summary of the invention
(1) Technical problems to be solved
To solve the above problems of the prior art, the present invention provides an object detection method in a mine truck environment based on an improved tiny-yolov3, which can substantially raise detection accuracy without reducing running speed.
(2) Technical solution
To achieve the above object, the main technical solution of the present invention comprises the following steps.
An object detection method in a mine truck environment based on an improved tiny-yolov3 includes the following steps:
S1, acquiring target-object image data;
S2, preprocessing the acquired image data;
S3, feeding the preprocessed image data into the tiny-yolov3 model and obtaining, through the model's processing, the pixel coordinates of the target object in the image;
wherein the tiny-yolov3 model is a model improved by incorporating a residual network structure, and the image data are acquired from an overhead viewpoint.
Preferably, the basic structure of the tiny-yolov3 model comprises seven convolutional layers;
each of the seven convolutional layers has the structure Convolution2D + BatchNormalization + LeakyReLU.
Preferably, a residual network structure is added between the fourth and the seventh convolutional layers of the basic structure of the tiny-yolov3 model.
Preferably, 1x1 and 3x3 convolutional layers are used in the residual network structure to extract features.
Preferably, the loss function of the tiny-yolov3 model comprises four factors:
the coordinates x and y of the predicted box, the size w and h of the predicted box, the predicted class, and the prediction confidence;
where n is the total number of targets.
Preferably, the loss function for the predicted box position is:
loss_xy = objectmask * (2 - w*h) * binarycrossentropy(true_xy, pred_xy).
Preferably, the loss function for the predicted box size is:
loss_wh = objectmask * (2 - w*h) * 0.5 * square(true_wh, pred_wh).
Preferably, the loss function for the predicted class is:
loss_class = objectmask * binarycrossentropy(true_class, pred_class).
Preferably, the loss function for the prediction confidence is:
loss_confidence = objectmask * binarycrossentropy(objectmask, pred_mask) + (1 - objectmask) * binarycrossentropy(objectmask, pred_mask) * ignoremask;
where objectmask marks the points that contain an object; (w, h) are the width and height of the predicted box; binarycrossentropy is the binary cross-entropy function; square is the squared-error function; true_xy is the actual target position; pred_xy is the predicted position; true_wh is the actual ground-truth box size; pred_wh is the predicted box size; true_class is the actual target class; pred_class is the predicted class; pred_mask marks the points predicted to contain an object; ignoremask is related to the IOU: when the IOU is below a set threshold, ignoremask is 0.
Preferably, the preprocessing in step S2 includes applying data augmentation to the acquired image data and using the k-means clustering algorithm to compute anchor sizes suited to the dataset of the tiny-yolov3 model in step S3.
(3) Beneficial effect
The beneficial effects of the present invention are as follows. The invention provides an object detection method in a mine truck environment based on an improved tiny-yolov3, which has the following advantages:
The invention augments the data for the mine-truck class, uses the k-means clustering algorithm to compute anchor sizes for the dataset, and adds a residual network structure on top of the tiny-yolov3 model to improve detection accuracy. Experimental results show that the proposed model outperforms the tiny-yolov3 model on object detection, with no obvious drop in detection speed.
Specifically, 1x1 and 3x3 convolutional layers extract features inside the residual network, and the feature map entering this structure is added to the feature map produced by it, so that shallow-level and deep-level information are passed together to the next convolutional layer for feature extraction. This reduces the loss of feature information between layers and thus improves detection accuracy.
Detailed description of the invention
Fig. 1 is a flow diagram of the object detection method in a mine truck environment based on an improved tiny-yolov3 in an embodiment of the present invention;
Fig. 2 is a structural diagram of tiny-yolov3 in the method;
Fig. 3 is a structural diagram of the improved tiny-yolov3 in the method;
Fig. 4 is a schematic comparison of the detection results of tiny-yolov3 and the improved tiny-yolov3 in the method;
Fig. 5 is a schematic comparison of the detection results of tiny-yolov3 and the improved tiny-yolov3 in the method;
Fig. 6 is a schematic comparison of the detection results of tiny-yolov3 and the improved tiny-yolov3 in the method;
Fig. 7 is a schematic comparison of the detection results of tiny-yolov3 and the improved tiny-yolov3 in the method.
Specific embodiment
To better explain the present invention and facilitate understanding, the invention is described in detail below through specific embodiments with reference to the accompanying drawings.
Embodiment 1
As shown in Fig. 1, an object detection method in a mine truck environment based on an improved tiny-yolov3 includes the following steps.
S1, acquiring target-object image data.
The image data here are pictures of people and vehicles taken from an overhead viewpoint by a camera mounted on the truck.
S2, preprocessing the acquired image data.
Note that the preprocessing here at least includes applying data augmentation to the acquired image data and using the k-means clustering algorithm to compute anchor sizes suited to the dataset of the tiny-yolov3 model in step S3.
S3, feeding the preprocessed image data into the tiny-yolov3 model and obtaining, through the model's processing, the pixel coordinates of the target object in the image.
The tiny-yolov3 model is a model improved by incorporating a residual network structure, and the image data are acquired from an overhead viewpoint.
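The anchor computation in step S2 can be sketched as a minimal k-means over box sizes that uses 1 − IOU as the distance, as is customary for YOLO-family anchor selection. This is an illustrative sketch, not the patent's code: the function names, the mean-based centroid update (the median is also common), and the sample boxes are all assumptions.

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IOU between (w, h) pairs, treating boxes and anchors as sharing a corner."""
    inter_w = np.minimum(boxes[:, None, 0], anchors[None, :, 0])
    inter_h = np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    inter = inter_w * inter_h
    union = (boxes[:, 0] * boxes[:, 1])[:, None] \
        + (anchors[:, 0] * anchors[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) box sizes with distance = 1 - IOU; return anchors by area."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)].astype(float)
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)  # nearest = highest IOU
        for j in range(k):
            members = boxes[assign == j]
            if len(members):
                anchors[j] = members.mean(axis=0)
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]

if __name__ == "__main__":
    # two obvious size clusters, e.g. people (small) vs. trucks (large)
    boxes = np.array([[10, 20], [12, 22], [11, 18],
                      [80, 60], [85, 65], [78, 58]], float)
    print(kmeans_anchors(boxes, k=2))
```

On a real dataset, `boxes` would hold the (width, height) of every labelled object scaled to the network input size, and `k` would match the number of anchors the model expects (typically 6 for tiny-yolov3).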
As shown in Figs. 2 and 3, the tiny-yolov3 model described in this embodiment comprises seven convolutional layers.
The tiny-yolov3 model is a simplified version of the yolov3 model. Whereas yolov3 uses the darknet53 backbone and continues to extract features afterwards with many 1x1 and 3x3 convolution kernels, tiny-yolov3 reduces the number of convolutional layers: its basic structure has only 7 convolutional layers, followed by a small number of 1x1 and 3x3 convolutional layers for feature extraction. tiny-yolov3 also uses pooling layers for downsampling instead of the stride-2 convolutional layers of yolov3.
Its convolutional-layer structure, however, remains the same as in yolov3: Convolution2D + BatchNormalization + LeakyReLU.
Therefore, each of the seven convolutional layers in this embodiment has the structure Convolution2D + BatchNormalization + LeakyReLU.
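The Convolution2D + BatchNormalization + LeakyReLU composite can be sketched numerically as follows. For brevity the sketch uses a 1x1 convolution (per-pixel channel mixing); the inference-mode batch normalization and the LeakyReLU slope of 0.1 (the darknet default) are assumptions not spelled out in the patent.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution as per-pixel channel mixing. x: (H, W, Cin), w: (Cin, Cout)."""
    return x @ w

def batchnorm(x, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch normalization with fixed statistics."""
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def leaky_relu(x, alpha=0.1):
    """LeakyReLU: identity for positive inputs, small slope for negative ones."""
    return np.where(x > 0, x, alpha * x)

def conv_bn_leaky(x, w, gamma, beta, mean, var):
    """The Convolution2D + BatchNormalization + LeakyReLU composite layer."""
    return leaky_relu(batchnorm(conv1x1(x, w), gamma, beta, mean, var))

if __name__ == "__main__":
    x = np.ones((4, 4, 3))
    w = np.full((3, 2), 1.0)  # maps 3 input channels to 2 output channels
    y = conv_bn_leaky(x, w, gamma=1.0, beta=0.0, mean=0.0, var=1.0)
    print(y.shape)
```

In the actual model these three operations would be a framework layer stack (e.g. a Keras `Conv2D` followed by `BatchNormalization` and `LeakyReLU`), with learned weights and statistics rather than the fixed values shown here.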
In this embodiment, a residual network structure is added between the fourth and the seventh convolutional layers of the basic structure of the tiny-yolov3 model.
Specifically, 1x1 and 3x3 convolutional layers are used in the residual network structure to extract features.
The improved tiny-yolov3 model provided in this embodiment raises detection accuracy at the cost of a slight decrease in detection speed; although the frame rate drops slightly at run time, the system still meets the real-time requirement. The most common way to improve the detection accuracy of a deep convolutional network is to increase its depth, i.e., the number of convolutional layers. But the deeper the network, the harder it is to converge; features extracted by shallow layers, such as those of small objects, are diluted as the network deepens, and an overly deep network also loses feature information between layers. This embodiment therefore borrows the structure of residual networks: a residual network structure is added between the fourth and seventh convolutional layers of the original network, 1x1 and 3x3 convolutional layers extract features within it, and the feature map entering this structure is added to the feature map produced after it, so that shallow-level and deep-level information are passed together to the next convolutional layer for feature extraction. This reduces the loss of feature information between layers and thus improves detection accuracy.
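The residual insertion described above — a 1x1 convolution, then a 3x3 convolution, with the input feature map added back to the result — can be sketched as follows. This is an illustrative numpy version rather than the patent's implementation; it assumes the branch preserves the channel count so the addition is shape-compatible.

```python
import numpy as np

def conv2d(x, w):
    """Naive 'same'-padded 2-D convolution. x: (H, W, Cin), w: (k, k, Cin, Cout)."""
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    H, W, _ = x.shape
    out = np.zeros((H, W, w.shape[3]))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.tensordot(xp[i:i+k, j:j+k], w,
                                     axes=([0, 1, 2], [0, 1, 2]))
    return out

def residual_block(x, w1x1, w3x3):
    """1x1 conv -> 3x3 conv, then add the input feature map (the shortcut)."""
    f = conv2d(conv2d(x, w1x1), w3x3)
    return x + f  # shallow and deep information passed on together

if __name__ == "__main__":
    x = np.ones((5, 5, 2))
    w1 = np.zeros((1, 1, 2, 2))  # zero branch weights: output equals the input
    w3 = np.zeros((3, 3, 2, 2))
    print(np.allclose(residual_block(x, w1, w3), x))
```

The shortcut addition is the point: even if the convolutional branch learns nothing useful, the input features still reach the next layer intact, which is why removing the add (Table 3 below) costs accuracy.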
Note that the loss function of the tiny-yolov3 model in this embodiment comprises four factors:
the coordinates x and y of the predicted box, the size w and h of the predicted box, the predicted class, and the prediction confidence;
where n is the total number of targets.
The loss function for the predicted box position is:
loss_xy = objectmask * (2 - w*h) * binarycrossentropy(true_xy, pred_xy).
The loss function for the predicted box size is:
loss_wh = objectmask * (2 - w*h) * 0.5 * square(true_wh, pred_wh).
The loss function for the predicted class is:
loss_class = objectmask * binarycrossentropy(true_class, pred_class).
The loss function for the prediction confidence is:
loss_confidence = objectmask * binarycrossentropy(objectmask, pred_mask) + (1 - objectmask) * binarycrossentropy(objectmask, pred_mask) * ignoremask;
where objectmask marks the points that contain an object; (w, h) are the width and height of the predicted box; binarycrossentropy is the binary cross-entropy function; square is the squared-error function; true_xy is the actual target position; pred_xy is the predicted position; true_wh is the actual ground-truth box size; pred_wh is the predicted box size; true_class is the actual target class; pred_class is the predicted class; pred_mask marks the points predicted to contain an object; ignoremask is related to the IOU: when the IOU is below a set threshold, ignoremask is 0.
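Under the definitions above, the four loss terms can be assembled in a small numpy sketch. The final reduction (summing all terms and dividing by the number of targets n) and the numerical clipping inside the cross-entropy are assumptions, since the patent omits the full overall expression.

```python
import numpy as np

def bce(y_true, y_pred, eps=1e-7):
    """Elementwise binary cross-entropy with clipping for numerical safety."""
    p = np.clip(y_pred, eps, 1 - eps)
    return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def tiny_yolov3_loss(objectmask, ignoremask, true_xy, pred_xy, true_wh, pred_wh,
                     true_class, pred_class, pred_mask, w, h):
    """Sketch of the four-factor loss; arrays are flattened over grid cells."""
    scale = 2.0 - w * h  # (2 - w*h) weights small boxes more heavily
    loss_xy = objectmask * scale * bce(true_xy, pred_xy).sum(-1)
    loss_wh = objectmask * scale * 0.5 * np.square(true_wh - pred_wh).sum(-1)
    loss_class = objectmask * bce(true_class, pred_class).sum(-1)
    loss_conf = (objectmask * bce(objectmask, pred_mask)
                 + (1 - objectmask) * bce(objectmask, pred_mask) * ignoremask)
    n = max(objectmask.sum(), 1)  # total number of targets
    return (loss_xy + loss_wh + loss_class + loss_conf).sum() / n

if __name__ == "__main__":
    objm = np.array([1.0, 0.0]); ign = np.ones(2)
    txy = np.array([[0.5, 0.5], [0.0, 0.0]])
    twh = np.array([[0.1, 0.2], [0.0, 0.0]])
    tcl = np.array([[1.0, 0.0], [0.0, 0.0]])
    w = np.array([0.1, 0.1]); h = np.array([0.2, 0.2])
    print(tiny_yolov3_loss(objm, ign, txy, txy, twh, twh, tcl, tcl, objm, w, h))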
Embodiment 2
The dataset used in this embodiment is composed as follows: the person and vehicle classes extracted from VOC2007 and VOC2012, with 50% of each randomly selected as part of the dataset; pictures of people and vehicles shot by a camera from an overhead viewpoint; and pictures of ore-hauling mine trucks, people, and vehicles taken on site at an open-pit mine. Data augmentation is applied to the pictures containing mine trucks to enlarge the amount of training data for this class.
The hardware used for the experiments in this embodiment is a computer with an Intel Core i5-7500 3.40 GHz processor, an NVIDIA GTX 1050Ti graphics card, 16 GB RAM, and a 500 GB Western Digital mechanical hard disk, running 64-bit Windows 10. The programming language is Python, and GPU acceleration is implemented with CUDA 9.0 and CUDNN 7.0.
The training method in this embodiment is as follows: 90% of the pictures in the above dataset are randomly taken as the training set and the remaining 10% as the validation set, and the pretrained tiny-yolov3 model downloaded from the official yolo website is loaded for training. Training is divided into two parts. In the first part, the training set is trained for 50 epochs with a batch size of 32, using the Adam optimizer with a learning rate of 0.001, beta_1 = 0.9, and beta_2 = 0.999. In the second part, the training set is trained for another 50 epochs with a batch size of 16, again using the Adam optimizer, with a learning rate of 0.0001, beta_1 = 0.9, and beta_2 = 0.999. After each training epoch, the AP, mAP, and loss on the validation set are computed; a model is saved every five epochs, and the models with the highest mAP and the lowest loss are also saved.
mAP (Mean Average Precision) is the mean AP value, obtained by averaging the AP values of the individual classes on the validation set, and serves as the standard index of detection accuracy in object detection. AP (Average Precision) is the area enclosed by the precision-recall curve and the coordinate axes; the P-R curve is a two-dimensional curve with precision and recall as its vertical and horizontal coordinates, drawn from the precision and recall obtained at different thresholds. Precision is computed as:
precision = TP / (TP + FP),
where TP is the number of samples correctly classified as positive and FP is the number of samples wrongly classified as positive.
Recall is computed as:
recall = TP / (TP + FN),
where FN is the number of samples wrongly classified as negative. The P-R curve can be drawn from the above formulas to compute the AP value of a single class, and averaging the AP values of all classes gives the mAP of the whole model.
Besides detection accuracy, the other important performance index of an object-detection algorithm is speed: only a fast algorithm can achieve real-time detection, which is crucial in some application scenarios. The common speed metric is the frame rate (Frames Per Second, FPS), i.e., the number of pictures that can be processed per second.
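The AP computation described above can be sketched as follows: detections sorted by confidence, cumulative TP/FP counts turned into a precision-recall curve, and AP taken as the area under it. The step-wise integration used here is one common choice; the patent does not fix an interpolation scheme (VOC-style 11-point interpolation is another).

```python
import numpy as np

def average_precision(scores, is_tp, n_gt):
    """AP for one class: area under the precision-recall curve.

    scores: detection confidences; is_tp: 1 if the detection matches a
    ground-truth box, else 0 (false positive); n_gt: number of ground truths.
    """
    order = np.argsort(-np.asarray(scores))
    tp = np.cumsum(np.asarray(is_tp, float)[order])
    fp = np.cumsum(1.0 - np.asarray(is_tp, float)[order])
    precision = tp / (tp + fp)  # TP / (TP + FP)
    recall = tp / n_gt          # TP / (TP + FN)
    prev_r, ap = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += p * (r - prev_r)  # step integration of P over R
        prev_r = r
    return ap

if __name__ == "__main__":
    # 3 ground truths; detections ranked TP, FP, TP, TP by confidence
    print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1], n_gt=3))
```

The model's mAP is then simply the mean of `average_precision` over all classes (here, person, vehicle, and mine truck).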
Experimental results
Many experiments were conducted in this embodiment, varying the position at which the residual network structure is inserted into the model and whether the add operation is used in the inserted residual structure. First, a residual network structure was added after the third through sixth convolutional layers of the tiny-yolov3 model and compared with the original model; the mAP and FPS of the models are shown in Table 1:
Table 1 Test results with and without a residual network structure in tiny-yolov3
Network structure | FPS (frame/s) | mAP (%) |
tiny-yolov3 | 25 | 62.0 |
tiny-yolov3 (conv3-6+resnet) | 24 | 65.4 |
Table 1 shows that adding the residual network structure after the third through sixth convolutional layers brings a sizable improvement in detection accuracy. The most important convolutional layers of tiny-yolov3 are the first seven, and the number of convolution kernels per layer grows with depth; for kernels of the same size, more kernels extract richer features. Therefore, this embodiment also adds the residual network structure between the fourth and seventh convolutional layers; the mAP and FPS of the resulting detection model are shown in Table 2:
Table 2 Test results with the residual network structure added at different layers
Table 2 shows that adding the residual structure between the fourth and seventh convolutional layers extracts more features than adding it between the third and sixth, further improving detection accuracy; although the FPS drops by 2 frames compared with the original model, the real-time requirement is still met.
Further, experiments were conducted on the improved model in which the add operation was removed from the inserted residual network structure, i.e., only the 1x1 and 3x3 convolution kernels were kept to further extract features, without the residual connection. The results are shown in Table 3:
Table 3 Test results with and without the add operation in tiny-yolov3
Network structure | FPS (frame/s) | mAP (%) |
tiny-yolov3 (conv4-7+resnet) | 23 | 66.5 |
tiny-yolov3 (conv4-7+conv) | 23 | 65.1 |
Table 3 shows that merely adding convolution kernels, compared with adding the residual structure, leads to a sizable drop in detection accuracy. This indicates that the add operation in the residual structure, which passes both the feature information entering the structure and the information extracted by its two convolutional layers on to the next layer, really does reduce the information loss that occurs when feature information is transferred between layers, thereby improving detection accuracy; the FPS of the two variants is identical. This embodiment therefore adopts the model with the residual structure between the fourth and seventh convolutional layers as the improved model. Detection results of the models before and after the improvement are shown in Figs. 4-7.
This embodiment proposes an object detection method in a mine truck environment based on an improved tiny-yolov3. The improved tiny-yolov3 model in this method augments the data for the mine-truck class, uses the k-means clustering algorithm to compute the anchor sizes best suited to the dataset used in this embodiment, and adds a residual network structure on top of the tiny-yolov3 model to improve detection accuracy. The experimental results show that the improved model proposed in this embodiment outperforms the tiny-yolov3 model on object detection, with no obvious drop in detection speed.
The technical principle of the invention has been described above in combination with specific embodiments. These descriptions are intended merely to explain the principle of the invention and shall not be construed in any way as limiting its scope of protection. Based on the explanations herein, those skilled in the art can conceive of other specific embodiments of the invention without creative labor, and these embodiments also fall within the scope of protection of the invention.
Claims (10)
1. a kind of based on object detection method under the mine truck environment for improving tiny-yolov3, which is characterized in that including as follows
Step:
S1, object image data is obtained;
S2, the object image data got is pre-processed;
S3, it will be inputted in tiny-yolov3 model by pretreated object image data, by the tiny-yolov3
The processing of model obtains the location of pixels coordinate of object in the picture;
Wherein, the tiny-yolov3 model is by combining the improved model of residual error network structure, the target object image
Data are the object image data obtained under the conditions of vertical view.
2. the method according to claim 1, wherein
The basic structure of the tiny-yolov3 model includes seven convolutional layers;
The structure of seven convolutional layers is Convolution2D+BatchNormalization+LeakyRelu structure.
3. according to the method described in claim 2, it is characterized in that,
4th convolutional layer of the basic structure of the tiny-yolov3 model between the 7th convolutional layer be added residual error network
Structure.
4. according to the method described in claim 3, it is characterized in that,
Feature is extracted using the convolutional layer of the convolutional layer of 1x1 and 3x3 in the residual error network structure.
5. according to the method described in claim 4, it is characterized in that,
The loss function expression formula of the tiny-yolov3 model is as follows:
The tiny-yolov3 model includes four factors;
It is respectively as follows: the coordinate of the position x and y of prediction block, the scale of prediction block size w and h, types of forecast class, prediction confidence
Spend confidence;
Wherein, n is general objective number.
6. The method according to claim 5, characterized in that the loss function for the position of the prediction box is:
lossxy = objectmask * (2 - w*h) * binarycrossentropy(truexy, predxy).
7. The method according to claim 6, characterized in that the loss function for the size of the prediction box is:
losswh = objectmask * (2 - w*h) * 0.5 * square(truewh, predwh).
8. The method according to claim 7, characterized in that the loss function for the predicted class is:
lossclass = objectmask * binarycrossentropy(trueclass, predclass).
9. The method according to claim 7, characterized in that the loss function for the prediction confidence is:
lossconfidence = objectmask * binarycrossentropy(objectmask, predmask) + (1 - objectmask) * binarycrossentropy(objectmask, predmask) * ignoremask;
where objectmask marks the points that contain an object; (w, h) are the width and height of the prediction box; binarycrossentropy is the binary cross-entropy function; square is the squared-error function; truexy is the actual target position; predxy is the predicted position; truewh is the size of the actual groundtruth box; predwh is the size of the prediction box; trueclass is the actual target class; predclass is the predicted class; predmask marks the points predicted to contain an object; ignoremask is related to the IOU: when the IOU is less than the set threshold, ignoremask is 0.
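The four loss terms of claims 5-9 can be sketched numerically as follows. This is an illustration only, not the claimed implementation: the grid values, masks, and predictions below are made-up toy data, and square(·,·) is interpreted as the squared difference of its arguments:

```python
import numpy as np

def bce(t, p, eps=1e-7):
    # Elementwise binary cross-entropy between targets t and predictions p
    p = np.clip(p, eps, 1 - eps)
    return -(t * np.log(p) + (1 - t) * np.log(1 - p))

# Toy flattened grid of 4 cells; all values are assumptions for illustration.
objectmask = np.array([1., 0., 1., 0.])   # cells that contain an object
ignoremask = np.array([1., 1., 0., 1.])   # 0 where the IOU exceeds the set threshold
w = np.array([.2, .3, .4, .5]); h = np.array([.2, .2, .3, .4])
true_xy, pred_xy = np.array([.5, .5, .4, .6]), np.array([.6, .5, .5, .5])
true_wh, pred_wh = np.array([.2, .3, .4, .5]), np.array([.25, .3, .35, .5])
true_cls, pred_cls = np.array([1., 0., 1., 0.]), np.array([.9, .1, .8, .2])
pred_mask = np.array([.8, .2, .7, .1])

scale = 2 - w * h                          # the (2 - w*h) factor weights small boxes more
loss_xy = objectmask * scale * bce(true_xy, pred_xy)
loss_wh = objectmask * scale * 0.5 * np.square(true_wh - pred_wh)
loss_cls = objectmask * bce(true_cls, pred_cls)
loss_conf = (objectmask * bce(objectmask, pred_mask)
             + (1 - objectmask) * bce(objectmask, pred_mask) * ignoremask)

n = objectmask.sum()                       # total number of targets
loss = (loss_xy + loss_wh + loss_cls + loss_conf).sum() / n
print(float(loss))
```

Note how objectmask gates the position, size, and class terms to cells that actually contain a target, while the confidence term also penalizes background cells unless ignoremask suppresses them.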
10. The detection method according to any one of claims 1-9, characterized in that
the preprocessing in step S2 includes performing data augmentation on the acquired target object image data and using the k-means clustering algorithm to calculate anchor sizes suited to the dataset of the tiny-yolov3 model in step S3.
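The k-means anchor computation of claim 10 is commonly done with 1 - IOU as the distance between box shapes. A minimal sketch, assuming that convention and using made-up ground-truth box sizes:

```python
import numpy as np

def iou_wh(boxes, anchors):
    # IOU between (w, h) pairs, both assumed anchored at the same corner
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_a = anchors[:, 0] * anchors[:, 1]
    return inter / (area_b[:, None] + area_a[None, :] - inter)

def kmeans_anchors(boxes, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        # distance = 1 - IOU, so assign each box to its max-IOU anchor
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else anchors[i] for i in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors

# Hypothetical ground-truth box sizes (w, h) in pixels
boxes = np.array([[30, 40], [35, 45], [120, 90], [110, 100], [60, 60], [55, 70]], float)
anchors = kmeans_anchors(boxes, k=3)
print(anchors.shape)  # (3, 2): three anchor (w, h) pairs
```

Using 1 - IOU rather than Euclidean distance keeps the clustering scale-aware: a 10-pixel error matters far more for small boxes than for large ones.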
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910513549.3A CN110210452A (en) | 2019-06-14 | 2019-06-14 | Object detection method for a mine truck environment based on an improved tiny-yolov3 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110210452A true CN110210452A (en) | 2019-09-06 |
Family
ID=67792579
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910513549.3A Pending CN110210452A (en) | 2019-06-14 | 2019-06-14 | Object detection method for a mine truck environment based on an improved tiny-yolov3 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110210452A (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109506628A (en) * | 2018-11-29 | 2019-03-22 | 东北大学 | Object distance measuring method under a kind of truck environment based on deep learning |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109506628A (en) * | 2018-11-29 | 2019-03-22 | 东北大学 | Object distance measuring method under a kind of truck environment based on deep learning |
Non-Patent Citations (1)
Title |
---|
Muzhan (木盏): "YOLO series: YOLO v3 [in-depth analysis]", CSDN blog * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110569971A (en) * | 2019-09-09 | 2019-12-13 | 吉林大学 | convolutional neural network single-target identification method based on LeakyRelu activation function |
CN110569971B (en) * | 2019-09-09 | 2022-02-08 | 吉林大学 | Convolutional neural network single-target identification method based on LeakyRelu activation function |
CN112699715A (en) * | 2019-10-23 | 2021-04-23 | 无锡复创机器人有限公司 | Storage battery car upstairs intelligent management and control system based on deep learning |
CN111079540A (en) * | 2019-11-19 | 2020-04-28 | 北航航空航天产业研究院丹阳有限公司 | Target characteristic-based layered reconfigurable vehicle-mounted video target detection method |
CN111079540B (en) * | 2019-11-19 | 2024-03-19 | 北航航空航天产业研究院丹阳有限公司 | Hierarchical reconfigurable vehicle-mounted video target detection method based on target characteristics |
CN111507179A (en) * | 2020-03-04 | 2020-08-07 | 杭州电子科技大学 | Live pig feeding behavior analysis method |
CN111738056A (en) * | 2020-04-27 | 2020-10-02 | 浙江万里学院 | Heavy truck blind area target detection method based on improved YOLO v3 |
CN111738056B (en) * | 2020-04-27 | 2023-11-03 | 浙江万里学院 | Heavy truck blind area target detection method based on improved YOLO v3 |
CN111612751A (en) * | 2020-05-13 | 2020-09-01 | 河北工业大学 | Lithium battery defect detection method based on Tiny-yolov3 network embedded with grouping attention module |
CN111898651A (en) * | 2020-07-10 | 2020-11-06 | 江苏科技大学 | Tree detection method based on Tiny Yolov3 algorithm |
CN111898651B (en) * | 2020-07-10 | 2023-09-26 | 江苏科技大学 | Tree detection method based on Tiny YOLOV3 algorithm |
CN112308154A (en) * | 2020-11-03 | 2021-02-02 | 湖南师范大学 | Yolov 3-tiny-based aerial photography vehicle detection method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110210452A (en) | Object detection method for a mine truck environment based on an improved tiny-yolov3 | |
CN103839065A (en) | Extraction method for dynamic crowd gathering characteristics | |
CN109460709A (en) | The method of RTG dysopia analyte detection based on the fusion of RGB and D information | |
CN112132789B (en) | Pantograph online detection device and method based on cascade neural network | |
CN110688905A (en) | Three-dimensional object detection and tracking method based on key frame | |
CN107705563B (en) | Laser radar-based continuous vehicle speed detection method | |
CN110232379A (en) | A kind of vehicle attitude detection method and system | |
CN105139388A (en) | Method and apparatus for building facade damage detection in oblique aerial image | |
Yu et al. | Railway obstacle detection algorithm using neural network | |
CN102768804A (en) | Video-based traffic information acquisition method | |
CN107292297A (en) | A kind of video car flow quantity measuring method tracked based on deep learning and Duplication | |
CN106682573B (en) | A kind of pedestrian tracting method of single camera | |
CN110119726A (en) | A kind of vehicle brand multi-angle recognition methods based on YOLOv3 model | |
CN105224937A (en) | Based on the semantic color pedestrian of the fine granularity heavily recognition methods of human part position constraint | |
CN102521646B (en) | Complex scene people counting algorithm based on depth information cluster | |
CN104166163A (en) | Method for automatically extracting fault curved surface based on three-dimensional large-data-volume seismic data cube | |
CN104574351A (en) | Parking space detection method based on video processing | |
CN104517095A (en) | Head division method based on depth image | |
CN104463869A (en) | Video flame image composite recognition method | |
CN105243354B (en) | A kind of vehicle checking method based on target feature point | |
CN102663491A (en) | Method for counting high density population based on SURF characteristic | |
CN105740814B (en) | A method of determining solid waste dangerous waste storage configuration using video analysis | |
CN109766780A (en) | A kind of ship smog emission on-line checking and method for tracing based on deep learning | |
Han et al. | A method of the coverage ratio of street trees based on deep learning | |
CN116416589A (en) | Rail point cloud extraction method based on rail data and geometric properties |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190906 |