CN114663407A - Coal gangue target detection method based on improved YOLOv5s model - Google Patents

Coal gangue target detection method based on improved YOLOv5s model Download PDF

Info

Publication number
CN114663407A
CN114663407A (application CN202210320475.3A)
Authority
CN
China
Prior art keywords
yolov5s
model
gangue
improved
coal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210320475.3A
Other languages
Chinese (zh)
Inventor
季亮
沈科
张袁浩
陈晓晶
周李兵
霍振龙
潘祥生
任书文
***
郝大彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tiandi Changzhou Automation Co Ltd
Changzhou Research Institute of China Coal Technology and Engineering Group Corp
Original Assignee
Tiandi Changzhou Automation Co Ltd
Changzhou Research Institute of China Coal Technology and Engineering Group Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tiandi Changzhou Automation Co Ltd, Changzhou Research Institute of China Coal Technology and Engineering Group Corp filed Critical Tiandi Changzhou Automation Co Ltd
Priority to CN202210320475.3A priority Critical patent/CN114663407A/en
Publication of CN114663407A publication Critical patent/CN114663407A/en
Withdrawn legal-status Critical Current

Classifications

    • G06T 7/0004 — Image analysis; inspection of images, e.g. flaw detection; industrial image inspection
    • B07C 5/02 — Sorting according to a characteristic or feature of the articles or material; measures preceding sorting, e.g. arranging articles in a stream, orientating
    • B07C 5/3422 — Sorting according to optical properties, e.g. colour, using video scanning devices, e.g. TV cameras
    • B07C 5/362 — Sorting apparatus characterised by the means used for distribution; separating or distributor mechanisms
    • G06F 18/23213 — Pattern recognition; non-hierarchical clustering using statistics or function optimisation, with a fixed number of clusters, e.g. K-means clustering
    • G06F 18/241 — Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/045 — Neural networks; combinations of networks
    • G06N 3/08 — Neural networks; learning methods
    • G06T 2207/10016 — Image acquisition modality; video; image sequence
    • G06T 2207/20081 — Special algorithmic details; training, learning
    • G06T 2207/20084 — Special algorithmic details; artificial neural networks [ANN]
    • G06T 2207/30108 — Subject of image; industrial image inspection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Quality & Reliability (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a coal gangue target detection method based on an improved YOLOv5s model, which comprises the following steps: S1, acquiring real-time images of coal and gangue; S2, performing visual recognition processing on the acquired real-time image based on the improved YOLOv5s model, thereby identifying the coal and gangue in the real-time image and determining the coordinate information of the gangue; and S3, sorting the gangue out of the coal with a mechanical arm according to the gangue coordinates. On the basis of the YOLOv5s model, a self-correcting convolution network (SCConv) is embedded into the Backbone region of the YOLOv5s model, the 19 × 19 feature map branch of the Neck and Prediction regions of the YOLOv5s model is deleted, and the anchor boxes obtained by K-means clustering are linearly scaled; the resulting improved YOLOv5s model is applied to coal and gangue target detection and effectively improves both detection speed and detection precision.

Description

Coal gangue target detection method based on improved YOLOv5s model
Technical Field
The invention belongs to the technical field of coal and gangue identification, and particularly relates to a coal and gangue target detection method based on an improved YOLOv5s model.
Background
Coal mine production is often accompanied by a large amount of gangue, which degrades coal quality if not removed in time, so sorting the gangue out of the mined coal effectively improves coal quality. With the development of machine vision, the technology has been widely applied to coal and gangue identification, and by technical principle it falls into 2 categories: image processing algorithms and deep learning algorithms. Image processing algorithms extract features of the coal and gangue such as color, gray level, edge and contour by designing specific convolution filters, and then detect coal and gangue targets through image segmentation; in practical application, however, the parameters must be tuned manually for different scenes, so such algorithms have poor robustness and practicability. Deep learning algorithms have the advantages of a high recognition rate and strong robustness, and have been rapidly adopted for coal and gangue recognition. In coal and gangue target detection, Wangzhong et al. proposed a coal and gangue image classification method based on a deep learning network with a high recognition rate, but it does not accurately detect the position and size of coal and gangue targets. Wangpo et al. used a convolutional neural network to detect coal and gangue targets; however, the data set scenes contain only a single target under favorable illumination, so detection precision cannot be guaranteed in scenes with 6 or more targets, and the detection speed is slow, with a per-frame detection time of 50 ms.
Laiwehao et al. used a multispectral system to collect 3 wave bands and build a pseudo-RGB image data set, then applied an improved YOLOv4 model to coal and gangue target detection; however, the single-frame detection time is 4.18 s, so real-time detection of coal and gangue cannot be achieved.
Disclosure of Invention
The present invention is directed to solving at least one of the problems of the prior art.
Therefore, the invention provides a coal gangue target detection method based on an improved YOLOv5s model, and the coal gangue target detection method based on the improved YOLOv5s model has the advantages of high detection speed and high detection precision.
The gangue target detection method based on the improved YOLOv5s model comprises the following steps: s1, acquiring real-time images of coal and gangue; s2, carrying out visual identification processing on the acquired real-time image based on the improved YOLOv5S model, thereby identifying coal and gangue in the real-time image and determining coordinate information of the gangue; and S3, sorting the gangue from the coal by the mechanical arm according to the coordinates of the gangue.
According to an embodiment of the present invention, S2 includes: S21, collecting a plurality of image samples of coal and gangue, dividing the image samples into a training set, a verification set and a test set, and labeling them to produce the coal and gangue data sets; S22, improving the YOLOv5s model to obtain an improved YOLOv5s model; S23, training the improved YOLOv5s model with the training set; S24, testing the improved YOLOv5s model with the verification set, so as to check whether the training of the improved YOLOv5s model is accurate; and S25, detecting the image samples in the test set with the trained improved YOLOv5s model, and evaluating the detection precision and detection efficiency of the test set results.
According to an embodiment of the present invention, in S22, two SCConv structures are embedded in the Backbone region of the YOLOv5s model, one located between the first CBL module and the CSP1_1 module and the other located between the second CBL module and the first CSP1_3 module.
According to an embodiment of the present invention, in S22, the Neck region and the Prediction region of the YOLOv5S model are reduced, and the 19 × 19 feature map branches in the Neck region and the Prediction region are deleted.
According to an embodiment of the invention, in the training process of the improved YOLOv5s model, the sizes of the 6 groups of anchor boxes obtained after clustering by the K-means algorithm are (41, 63), (47, 94), (54, 69), (54, 51), (64, 84) and (64, 120), respectively.
According to one embodiment of the invention, the sizes of 6 groups of anchor frames obtained after the clustering by the K-means algorithm are scaled, and the scaling formula is as follows:
x′1 = Ax1 (1)
x′6 = Bx6 (2)
x′i = x′1 + (x′6 − x′1)(xi − x1)/(x6 − x1), i = 2, …, 5 (3)
y′i = yi·x′i/xi, i = 1, 2, …, 6 (4)
wherein: xi is the width of the ith anchor box (ordered by width from small to large), i = 1, 2, …, 6; x′i is the scaled anchor box width; A is the reduction factor of the anchor box; B is the magnification factor of the anchor box; yi is the height of the ith anchor box; y′i is the scaled anchor box height.
According to one embodiment of the present invention, the scaled anchor frame sizes are (20, 31), (39, 79), (62, 80), (62, 59), (96, 126), and (96, 180), respectively.
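The linear scaling can be sketched in a few lines. This is one plausible reading under stated assumptions: widths are interpolated linearly between the two scaled endpoints of equations (1) and (2), and heights are scaled to preserve each anchor's aspect ratio; the function name and the exact interpolation rule are illustrative, not taken from the patent.

```python
def scale_anchors(anchors, A=0.5, B=1.5):
    """Linear rescaling of width-sorted (w, h) anchor boxes: endpoint widths
    follow w'1 = A*w1 and w'6 = B*w6, intermediate widths are linearly
    interpolated, and each height is scaled to preserve the anchor's
    aspect ratio (assumed interpolation, not the patent's exact formula)."""
    anchors = sorted(anchors, key=lambda wh: wh[0])
    w1, w6 = anchors[0][0], anchors[-1][0]
    w1s, w6s = A * w1, B * w6                 # equations (1) and (2)
    scaled = []
    for w, h in anchors:
        ws = w1s + (w6s - w1s) * (w - w1) / (w6 - w1)  # assumed interpolation
        hs = h * ws / w                                # aspect-preserving height
        scaled.append((ws, hs))
    return scaled

anchors = [(41, 63), (47, 94), (54, 69), (54, 51), (64, 84), (64, 120)]
print(scale_anchors(anchors)[0], scale_anchors(anchors)[-1])  # (20.5, 31.5) (96.0, 180.0)
```

With A = 0.5 and B = 1.5 the extreme anchors map to (20.5, 31.5) and (96, 180), consistent with the reported scaled sizes (20, 31) and (96, 180) after truncation; the intermediate values depend on the exact interpolation used.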
According to one embodiment of the invention, the training platform adopted in S23 is an NVIDIA GeForce GTX 2080Ti, and the inference platform is a mining intrinsically safe edge computing device with 14 TOPS of computing power; the input image size is 608 × 608 with 3 channels. During training, the momentum coefficient is set to 0.937, the weight attenuation coefficient to 0.0005 and the learning rate to 0.01; a warm-up method is adopted for updating the learning rate, the batch size is 16, and the number of training iterations is 300.
According to an embodiment of the present invention, in S21, 526 image samples with a native resolution of 1280 × 960 are collected; each image sample contains more than 4 pieces of coal and gangue, including stacked and occluded pieces. The training set contains 373 image samples, the verification set 77 image samples, and the test set 76 image samples.
According to an embodiment of the invention, in step S21, the coal and gangue data set is preliminarily labeled with an auxiliary labeling tool and then visualized with the open-source tool LabelImg, completing the production of the coal and gangue data set.
The beneficial effects of the improved YOLOv5s model are as follows: on the basis of the YOLOv5s model, a self-correcting convolution network (SCConv) is embedded into the Backbone region of the YOLOv5s model, the 19 × 19 feature map branch of the Neck and Prediction regions is deleted, and the anchor boxes obtained by K-means clustering are linearly scaled; the resulting improved YOLOv5s model is applied to coal and gangue target detection and effectively improves both detection speed and detection accuracy.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of a coal gangue target detection method based on an improved YOLOv5s model in the invention;
FIG. 2 is a diagram of the structure of the YOLOv5s model;
FIG. 3 is a view of the SCConv structure of the present invention;
FIG. 4 is a structural diagram of a Backbone of the improved YOLOv5s model of the present application;
FIG. 5 is a block diagram of the Neck and Prediction of the improved YOLOv5s model of the present application;
FIG. 6 is a graph of the results of the test using the YOLOv5s model;
fig. 7 is a graph of the results of the test using the improved YOLOv5s model of the present application.
FIG. 8 is a P-R plot of models on a gangue identification test set.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
In the description of the present invention, it is to be understood that the terms "central," "longitudinal," "lateral," "length," "width," "thickness," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," "clockwise," "counterclockwise," "axial," "radial," "circumferential," and the like are used in the orientations and positional relationships indicated in the drawings for convenience in describing the invention and to simplify the description, and are not intended to indicate or imply that the referenced device or element must have a particular orientation, be constructed and operated in a particular orientation, and are not to be considered limiting of the invention. Furthermore, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless otherwise specified.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
The gangue target detection method based on the improved YOLOv5s model according to the embodiment of the invention is described in detail below with reference to the accompanying drawings.
As shown in fig. 1 to 8, the coal gangue target detection method based on the improved YOLOv5s model according to the embodiment of the invention includes: S0, configuring camera parameters and starting the camera; S1, collecting real-time images of coal and gangue through the camera; S2, performing visual recognition processing on the acquired real-time image based on the improved YOLOv5s model, thereby identifying the coal and gangue in the real-time image and determining the coordinate information of the gangue; S3, sorting the gangue out of the coal with the mechanical arm according to the gangue coordinates; and repeating steps S1 to S3 to perform cyclic detection and gangue sorting.
According to an embodiment of the present invention, S2 includes: S21, collecting a plurality of image samples of coal and gangue, dividing the image samples into a training set, a verification set and a test set, and labeling them to produce the coal and gangue data sets; S22, improving the YOLOv5s model to obtain an improved YOLOv5s model; S23, training the improved YOLOv5s model with the training set; S24, testing the improved YOLOv5s model with the verification set, so as to check whether the training of the improved YOLOv5s model is accurate; and S25, detecting the image samples in the test set with the trained improved YOLOv5s model, and evaluating the detection precision and detection efficiency of the test set results.
On this basis, in S22, two SCConv structures are embedded in the Backbone region of the YOLOv5s model, one located between the first CBL module and the CSP1_1 module, and the other located between the second CBL module and the first CSP1_3 module. That is, as shown in fig. 2, the YOLOv5s model realizes flexible configuration of model size and performance on the basis of the YOLOv4 model, and introduces recent network modules and training techniques such as Mosaic data enhancement, the DropBlock mechanism, the Hardswish activation function and the GIoU bounding box regression loss. The YOLOv5s model mainly comprises the Input, Backbone, Neck and Prediction regions, each composed of modules such as CBL (Conv + BN + Leaky_ReLU), CSP (CBL + Res unit + Concat + BN + Leaky_ReLU), Focus and SPP. The Backbone region of YOLOv5s is mainly formed by stacking multiple groups of residual modules, but the residual module cannot sufficiently fuse multi-scale feature information, so an SCConv structure is introduced. SCConv is a network component that achieves the effect of enlarging the receptive field by enhancing the internal communication of the feature map without changing the model architecture. As shown in fig. 3, C × H × W is the dimension of the input feature map X; X1 and X2 are the feature maps after splitting; K1–K4 are convolution kernels; F1–F4 are the corresponding processed feature maps; r is the average-pooling down-sampling multiple; Y1 and Y2 are the feature maps output by the first and second branches, respectively; and Y is the output feature map, with dimension C × H × W.
The SCConv structure is split into 2 branches along the channel dimension: the first branch uses down-sampling to enlarge the receptive field of the feature map, while the second branch performs a conventional convolution; the channel information of the 2 branches is then combined, which increases the feature extraction and expression capability of the model. As can be seen from fig. 2 in combination with fig. 4, the Backbone structure of the improved YOLOv5s model has two additional SCConv structures compared with that of the YOLOv5s model; embedding the SCConv structures in the Backbone region of the YOLOv5s model improves the feature extraction capability of the Backbone region without significantly increasing the complexity of the YOLOv5s model.
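The two-branch idea can be illustrated with a minimal NumPy sketch of the calibration branch for a single channel. This is a schematic under stated assumptions: the K2–K4 convolutions of the real SCConv are replaced by identity maps, so only the down-sample, up-sample and sigmoid-gating data flow is shown; it is not the patent's implementation.

```python
import numpy as np

def avg_pool(x, r):
    # non-overlapping r x r average pooling on a (H, W) map
    H, W = x.shape
    return x[:H // r * r, :W // r * r].reshape(H // r, r, W // r, r).mean(axis=(1, 3))

def upsample(x, r):
    # nearest-neighbour upsampling back to the original resolution
    return np.repeat(np.repeat(x, r, axis=0), r, axis=1)

def self_calibrate(x1, r=4):
    """Calibration branch of SCConv for one channel: down-sample by r to
    enlarge the receptive field, up-sample, and apply a sigmoid gate to
    re-weight the input feature map (K2-K4 convolutions omitted)."""
    t1 = avg_pool(x1, r)
    gate = 1.0 / (1.0 + np.exp(-(x1 + upsample(t1, r))))  # weights in (0, 1)
    return x1 * gate

x1 = np.random.default_rng(0).normal(size=(76, 76))
print(self_calibrate(x1).shape)  # (76, 76)
```

The down-sampled path sees an r-times larger spatial context, which is the receptive-field enlargement the text describes; the gate then modulates the full-resolution map without changing its size.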
As can be seen from fig. 5 in conjunction with fig. 2, in S22, the Neck region and the Prediction region of the YOLOv5s model are simplified, and the 19 × 19 feature map branch in the Neck and Prediction regions is deleted. The Neck region of the YOLOv5s model aggregates features through a multi-path structure to enhance the network's feature fusion capability. Because the coal blocks and gangue are small relative to the whole image, the large-target detection performed by the 3rd branch of the Neck region becomes redundant. To improve the detection speed of the model, the Neck and Prediction regions of the YOLOv5s model are appropriately simplified: the 19 × 19 feature map branch, which has the largest receptive field and is suited to detecting larger objects, is deleted, reducing model complexity and improving real-time detection performance.
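The three prediction grids of a standard YOLOv5 head follow directly from the input resolution; assuming the usual YOLO strides of 8, 16 and 32 (a convention, not stated in the patent), this shows why the 19 × 19 grid is the large-object branch:

```python
# Grid sizes of the three YOLOv5 prediction branches for a 608 x 608 input,
# assuming the standard YOLO strides of 8, 16 and 32.
input_size = 608
strides = [8, 16, 32]
grids = [input_size // s for s in strides]
print(grids)  # [76, 38, 19]
```

The stride-32 branch produces the 19 × 19 grid with the largest receptive field per cell, which is the branch deleted in the improved model.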
According to an embodiment of the invention, in the training process of the improved YOLOv5s model, the sizes of the 6 groups of anchor boxes obtained after clustering by the K-means algorithm are (41, 63), (47, 94), (54, 69), (54, 51), (64, 84) and (64, 120), respectively. Further, the sizes of the 6 groups of anchor frames obtained after the clustering by the K-means algorithm are scaled, and the scaling formula is as follows:
x′1 = Ax1 (1)
x′6 = Bx6 (2)
x′i = x′1 + (x′6 − x′1)(xi − x1)/(x6 − x1), i = 2, …, 5 (3)
y′i = yi·x′i/xi, i = 1, 2, …, 6 (4)
wherein: xi is the width of the ith anchor box (ordered by width from small to large), i = 1, 2, …, 6; x′i is the scaled anchor box width; A is the reduction factor of the anchor box; B is the magnification factor of the anchor box (the scaling factors A and B need to be determined from the selected data set to ensure that the scaled anchor boxes cover all marker box sizes in the data set); yi is the height of the ith anchor box; y′i is the scaled anchor box height. Further, the scaled anchor box sizes are (20, 31), (39, 79), (62, 80), (62, 59), (96, 126) and (96, 180), respectively.
That is, during training, the anchor box set is generated by K-means clustering of the target bounding boxes in the data set. Since the 19 × 19 feature map branch that predicts large targets is deleted from the Neck region, the number of clustered anchor boxes is reduced from 9 groups to 6 groups. The 6 groups of anchor box sizes obtained after K-means clustering are (41, 63), (47, 94), (54, 69), (54, 51), (64, 84) and (64, 120), respectively. However, the anchor box sizes generated by K-means clustering are relatively concentrated: a considerable proportion of the real object marker boxes differ greatly in size from the clustered anchor boxes, so the clustered anchor boxes cannot adequately cover the true sizes of most marker boxes in the data set, making model convergence slow and the optimum hard to reach. Therefore, the 6 groups of anchor boxes generated by K-means clustering are linearly scaled; in the scaling formula of this embodiment, A = 0.5 and B = 1.5.
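The clustering step above can be sketched with plain k-means on (w, h) box sizes. This is a stand-in under stated assumptions: Euclidean distance is used for simplicity, whereas YOLO implementations often cluster with an IoU-based distance; the function name and synthetic data are illustrative.

```python
import numpy as np

def kmeans_anchors(boxes, k=6, iters=100, seed=0):
    """Plain k-means on (w, h) box sizes, returned sorted by width
    (Euclidean distance assumed; illustrative stand-in for the patent's
    K-means anchor clustering)."""
    rng = np.random.default_rng(seed)
    boxes = np.asarray(boxes, dtype=float)
    centers = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(boxes[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([boxes[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers[np.argsort(centers[:, 0])]  # width-sorted, as in eq. (1)

# toy usage: synthetic (w, h) samples scattered around the 6 reported centres
rng = np.random.default_rng(1)
seeds = [(41, 63), (47, 94), (54, 69), (54, 51), (64, 84), (64, 120)]
boxes = np.vstack([c + rng.normal(scale=2.0, size=(50, 2)) for c in seeds])
print(kmeans_anchors(boxes, k=6).shape)  # (6, 2)
```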
According to one embodiment of the invention, the training platform adopted in S23 is an NVIDIA GeForce GTX 2080Ti, and the inference platform is a mining intrinsically safe edge computing device with 14 TOPS of computing power; the input image size is 608 × 608 with 3 channels. During training, the momentum coefficient is set to 0.937, the weight attenuation coefficient to 0.0005 and the learning rate to 0.01; a warm-up method is adopted for updating the learning rate, the batch size is 16, and the number of training iterations is 300.
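The training settings above can be collected into a YOLOv5-style hyperparameter dictionary; the key names are common-convention assumptions, not identifiers taken from the patent.

```python
# Training hyper-parameters as reported in this embodiment (key names are
# assumed YOLOv5 conventions, not taken from the patent).
hyp = {
    "img_size": 608,         # input resolution, 3 channels
    "momentum": 0.937,       # SGD momentum coefficient
    "weight_decay": 0.0005,  # weight attenuation coefficient
    "lr0": 0.01,             # initial learning rate, updated with warm-up
    "batch_size": 16,
    "epochs": 300,           # training iterations
}
print(hyp["img_size"], hyp["epochs"])  # 608 300
```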
According to an embodiment of the present invention, in S21, 526 image samples with a native resolution of 1280 × 960 are collected; each image sample contains more than 4 pieces of coal and gangue, including stacked and occluded pieces. The training set contains 373 image samples, the verification set 77 image samples, and the test set 76 image samples.
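The 373/77/76 split of the 526 samples can be reproduced with a short helper; the random shuffle, seed and file names are illustrative assumptions.

```python
import random

def split_dataset(files, n_train=373, n_val=77, seed=42):
    """Split the sample list into training/verification/test subsets
    (the shuffle and seed are assumptions for illustration)."""
    files = list(files)
    random.Random(seed).shuffle(files)
    train = files[:n_train]
    val = files[n_train:n_train + n_val]
    test = files[n_train + n_val:]
    return train, val, test

samples = [f"img_{i:04d}.jpg" for i in range(526)]  # hypothetical file names
train, val, test = split_dataset(samples)
print(len(train), len(val), len(test))  # 373 77 76
```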
According to an embodiment of the invention, in step S21, in order to reduce manual labeling cost, an auxiliary labeling tool is used to preliminarily label the coal and gangue data set, which is then visualized with the open-source tool LabelImg, completing the production of the coal and gangue data set.
To verify the detection effect, comparison experiments were performed based on the YOLOv5s model, with the results shown in Table 1 (FPS is frames per second; mAP is mean average precision). The model size of YOLOv5s is 6.74 MB, with an mAP of 87.5% on the test set and an FPS of 30.5 frames/s. The YOLOv5s-SCC model embeds the SCConv structure in the Backbone region as the main feature extraction network; with the model size increased by only 0.26 MB and the FPS reduced by 0.9 frame/s, its mAP is 0.7% higher than that of the YOLOv5s model, indicating that the SCConv structure can improve the detection accuracy of the model. The YOLOv5s-TA model deletes the 19 × 19 feature map branch in the Neck and Prediction regions; with the model size reduced by 1.69 MB and the FPS increased by 3.2 frames/s, its mAP drops by only 0.7% compared with the YOLOv5s model, showing that this change improves the detection speed of the model. The YOLOv5s-DS model linearly scales the anchor boxes generated by K-means clustering; with the model size reduced by 1.69 MB and the FPS increased by 3.1 frames/s, its mAP drops by only 0.1% compared with the YOLOv5s model, showing that this change improves detection speed while keeping detection precision essentially stable. Compared with the YOLOv5s model, the improved YOLOv5s model is 1.57 MB smaller, its FPS is 2.1 frames/s higher, and its mAP is 1.7% higher, showing that the improved YOLOv5s model of the present application improves both detection speed and detection precision.
TABLE 1 Comparison of test results of different improved YOLOv5s models

Model            | Model size/MB | FPS/(frame·s⁻¹) | mAP/%
YOLOv5s          | 6.74          | 30.5            | 87.5
YOLOv5s-SCC      | 7.0           | 29.6            | 88.2
YOLOv5s-TA       | 5.05          | 33.7            | 86.8
YOLOv5s-DS       | 5.05          | 33.6            | 87.4
Improved YOLOv5s | 5.17          | 32.6            | 89.2
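The headline deltas quoted in the text can be checked arithmetically from the Table 1 values:

```python
# Arithmetic check of the improvements quoted from Table 1
# (baseline YOLOv5s versus the improved YOLOv5s model).
baseline = {"size_mb": 6.74, "fps": 30.5, "map": 87.5}
improved = {"size_mb": 5.17, "fps": 32.6, "map": 89.2}
d_size = round(baseline["size_mb"] - improved["size_mb"], 2)  # MB saved
d_fps = round(improved["fps"] - baseline["fps"], 1)           # frames/s gained
d_map = round(improved["map"] - baseline["map"], 1)           # mAP points gained
print(d_size, d_fps, d_map)  # 1.57 2.1 1.7
```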
With reference to fig. 8, plotting precision P on the horizontal axis and recall R on the vertical axis gives the P-R curves of YOLOv5s and the 4 improved models; the area enclosed by each P-R curve and the coordinate axes is the average detection precision. As can be seen from fig. 8, the improved YOLOv5s model has the highest detection precision and the best performance.
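The area-under-the-curve reading of average precision can be sketched with the trapezoidal rule; the sample points below are illustrative, not the patent's measured curves.

```python
def average_precision(recall, precision):
    """Area under a P-R curve by the trapezoidal rule: the 'average
    detection precision' described in the text."""
    area = 0.0
    for i in range(1, len(recall)):
        area += (recall[i] - recall[i - 1]) * (precision[i] + precision[i - 1]) / 2
    return area

r = [0.0, 0.5, 1.0]  # illustrative recall points
p = [1.0, 1.0, 0.5]  # illustrative precision points
print(average_precision(r, p))  # 0.875
```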
As can be seen from fig. 6 and 7, the YOLOv5s model in fig. 6 fails to identify the coal on the left side of the image, while the improved YOLOv5s model in fig. 7 achieves higher identification accuracy.
In conclusion, the beneficial effects of the improved YOLOv5s model are as follows: on the basis of the YOLOv5s model, a self-correcting convolution network (SCConv) is embedded into the Backbone region of the YOLOv5s model, the 19 × 19 feature map branch of the Neck and Prediction regions is deleted, and the anchor boxes obtained by K-means clustering are linearly scaled; the resulting improved YOLOv5s model is applied to coal and gangue target detection and effectively improves detection speed and detection precision. Embedding SCConv in the Backbone region of the YOLOv5s model as the feature extraction network alleviates the model's insufficient multi-scale feature extraction; deleting the 19 × 19 feature map branch of the Neck and Prediction regions effectively reduces the model size; and linearly scaling the anchor boxes obtained by K-means clustering improves the detection precision of the model. Compared with the YOLOv5s model, the improved YOLOv5s model is 1.57 MB smaller with fewer parameters, its FPS is 2.1 frames/s higher, and its mAP is 1.7% higher, showing improvements in both detection speed and detection precision.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an illustrative embodiment," "an example," "a specific example," or "some examples" or the like mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (10)

1. A coal gangue target detection method based on an improved YOLOv5s model, characterized by comprising the following steps:
s1, acquiring real-time images of coal and gangue;
s2, carrying out visual identification processing on the acquired real-time image based on the improved YOLOv5S model, thereby identifying coal and gangue in the real-time image and determining coordinate information of the gangue;
and S3, sorting the gangue from the coal by the mechanical arm according to the coordinates of the gangue.
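Steps S1-S3 amount to a capture-detect-sort loop. The following is a minimal sketch of that loop; the callables `capture_frame`, `detect`, and `sort_gangue`, and the class index convention, are hypothetical placeholders, not the patent's implementation:

```python
def run_sorting_cycle(capture_frame, detect, sort_gangue, gangue_class=1):
    """One pass of the S1-S3 loop: grab an image, run the detector,
    and hand gangue box centres to the manipulator."""
    frame = capture_frame()                       # S1: acquire real-time image
    detections = detect(frame)                    # S2: list of (cls, x, y, w, h)
    gangue_coords = [(x, y) for cls, x, y, w, h in detections
                     if cls == gangue_class]      # keep gangue detections only
    for coord in gangue_coords:
        sort_gangue(coord)                        # S3: robot arm picks at (x, y)
    return gangue_coords
```

A usage example with stubs: passing a detector that returns one coal box (class 0) and one gangue box (class 1) yields only the gangue centre.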
2. The method for detecting the gangue targets based on the improved YOLOv5S model according to claim 1, wherein S2 comprises:
S21, collecting a plurality of image samples of coal and gangue, dividing the image samples into a training set, a verification set and a test set, and labeling them, thereby completing the coal and gangue data set;
s22, improving the YOLOv5S model to obtain an improved YOLOv5S model;
s23, training the improved YOLOv5S model by utilizing a training set;
S24, validating the improved YOLOv5s model by using the verification set, so as to check whether the improved YOLOv5s model has been trained accurately;
and S25, detecting the image samples in the test set by using the trained improved YOLOv5S model, and evaluating the detection precision and the detection efficiency of the detection result of the test set.
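The S21 split can be sketched as a seeded random partition. The 373/77/76 counts come from claim 9; the seed value is an arbitrary assumption:

```python
import random

def split_dataset(samples, n_train=373, n_val=77, n_test=76, seed=42):
    """Shuffle once with a fixed seed and partition the sample list into
    train / validation / test subsets of the claimed sizes."""
    pool = list(samples)
    assert len(pool) == n_train + n_val + n_test
    random.Random(seed).shuffle(pool)
    train = pool[:n_train]
    val = pool[n_train:n_train + n_val]
    test = pool[n_train + n_val:]
    return train, val, test

# 526 samples in total, as stated in claim 9.
train, val, test = split_dataset(range(526))
```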
3. The method for detecting the gangue targets based on the improved YOLOv5s model according to claim 2, wherein in S22, two SCConv structures are embedded in the Backbone region of the YOLOv5s model: one SCConv structure is located between the first CBL module and the CSP1_1 module, and the other SCConv structure is located between the second CBL module and the first CSP1_3 module.
4. The method for detecting the gangue targets based on the improved YOLOv5s model according to claim 3, wherein in S22, the Neck region and the Prediction region of the YOLOv5s model are simplified by deleting the 19 × 19 feature map branches in the Neck region and the Prediction region.
5. The method for detecting the gangue targets based on the improved YOLOv5s model according to claim 4, wherein in the training process of the improved YOLOv5s model, the sizes of the 6 groups of anchor boxes obtained after clustering by the K-means algorithm are (41, 63), (47, 94), (54, 69), (54, 51), (64, 84) and (64, 120), respectively.
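K-means anchor clustering of the kind referenced in claim 5 can be sketched with a plain Lloyd's iteration over (width, height) pairs. This is a toy illustration: real YOLO anchor clustering commonly uses 1 − IoU rather than Euclidean distance, and YOLOv5's autoanchor additionally applies a genetic refinement.

```python
import random

def kmeans_anchors(boxes, k=6, iters=50, seed=0):
    """Cluster (w, h) box sizes into k anchor shapes with Euclidean
    Lloyd's iterations (illustrative distance metric only)."""
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for w, h in boxes:
            # assign each box to its nearest current centre
            j = min(range(k),
                    key=lambda c: (w - centers[c][0]) ** 2 + (h - centers[c][1]) ** 2)
            groups[j].append((w, h))
        # recompute each centre as the mean of its group; keep empty
        # clusters at their previous position
        centers = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g))
            if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return sorted(centers)
```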
6. The method for detecting the gangue targets based on the improved YOLOv5s model according to claim 5, wherein the 6 groups of anchor frame sizes obtained after clustering by the K-means algorithm are scaled, and the scaling formulas are as follows:

x′1 = A·x1 (1)

x′6 = B·x6 (2)

x′i = x′1 + (x′6 − x′1)·(xi − x1)/(x6 − x1), i = 2, ..., 5 (3)

y′i = yi·x′i/xi, i = 1, 2, ..., 6 (4)

wherein: xi is the width of the i-th anchor frame (the anchor frames are ordered by width from small to large), i = 1, 2, ..., 6; x′i is the scaled anchor frame width; A is the reduction factor of the anchor frame; B is the magnification factor of the anchor frame; yi is the height of the i-th anchor frame; y′i is the scaled anchor frame height.
7. The method for detecting the gangue target based on the improved YOLOv5s model as claimed in claim 6, wherein the scaled anchor frame sizes are (20, 31), (39, 79), (62, 80), (62, 59), (96, 126) and (96, 180), respectively.
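Under one reading of the scaling scheme, the smallest width multiplied by a reduction factor A, the largest by a magnification factor B, intermediate widths linearly interpolated between the scaled endpoints, and each height rescaled in the same proportion as its width, the scaled anchors can be reproduced numerically. A = 20/41 and B = 1.5 are inferred from the claimed values and are not stated in the patent:

```python
def scale_anchors(anchors, A=20 / 41, B=1.5):
    """Linearly rescale K-means anchors given as (w, h) pairs sorted by
    width ascending. A and B are assumed values inferred from claim 7."""
    ws = [w for w, _ in anchors]
    w1s, w6s = A * ws[0], B * ws[-1]          # scaled endpoint widths
    scaled = []
    for w, h in anchors:
        wi = w1s + (w6s - w1s) * (w - ws[0]) / (ws[-1] - ws[0])
        # floor with a tiny guard against floating-point noise
        scaled.append((int(wi + 1e-9), int(h * wi / w + 1e-9)))
    return scaled

anchors = [(41, 63), (47, 94), (54, 69), (54, 51), (64, 84), (64, 120)]
print(scale_anchors(anchors))
# Under these assumptions the result matches the claim-7 sizes except the
# first height (30 here vs 31 in the patent), a rounding-convention gap.
```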
8. The gangue target detection method based on the improved YOLOv5s model according to claim 2, wherein the training platform adopted in S23 is an NVIDIA GeForce GTX 2080Ti, the inference platform is a mining intrinsically safe edge computing device, and the mining intrinsically safe edge computing device has 14 TOPS of computing power; the input image size is 608 × 608 and the number of channels is 3; during training, the momentum coefficient is set to 0.937, the weight attenuation coefficient is set to 0.0005, the learning rate is set to 0.01, a warm-up method is adopted for updating the learning rate, the batch size is 16, and the number of training iterations is 300.
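The hyperparameters listed in claim 8 map naturally onto a YOLOv5-style training configuration. The sketch below records them as a plain dict; the key names are illustrative and are not YOLOv5's exact hyp.yaml keys:

```python
# Training settings from claim 8 (key names are illustrative).
train_cfg = {
    "img_size": 608,         # input resolution 608 x 608
    "channels": 3,           # RGB input
    "momentum": 0.937,       # SGD momentum coefficient
    "weight_decay": 0.0005,  # weight attenuation coefficient
    "lr0": 0.01,             # initial learning rate
    "warmup": True,          # learning rate updated with a warm-up schedule
    "batch_size": 16,
    "epochs": 300,           # number of training iterations
}

print(train_cfg)
```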
9. The method for detecting the coal gangue target based on the improved YOLOv5s model according to claim 8, wherein in S21, 526 image samples with an original resolution of 1280 × 960 are collected, each image sample comprises more than 4 pieces of coal and gangue and includes cases in which coal and gangue are stacked and occlude each other; the training set comprises 373 image samples, the verification set comprises 77 image samples, and the test set comprises 76 image samples.
10. The method for detecting the target of the coal gangue based on the improved YOLOv5s model according to claim 9, wherein in S21, the coal gangue data set is preliminarily labeled by using an auxiliary labeling tool and then visually checked and refined with the open-source tool LabelImg, thereby completing the production of the coal and gangue data set.
CN202210320475.3A 2022-03-29 2022-03-29 Coal gangue target detection method based on improved YOLOv5s model Withdrawn CN114663407A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210320475.3A CN114663407A (en) 2022-03-29 2022-03-29 Coal gangue target detection method based on improved YOLOv5s model

Publications (1)

Publication Number Publication Date
CN114663407A true CN114663407A (en) 2022-06-24

Family

ID=82033282


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113435542A (en) * 2021-07-22 2021-09-24 安徽理工大学 Coal and gangue real-time detection method based on deep learning
CN113553979A (en) * 2021-07-30 2021-10-26 国电汉川发电有限公司 Safety clothing detection method and system based on improved YOLO V5
CN114120093A (en) * 2021-12-01 2022-03-01 安徽理工大学 Coal gangue target detection method based on improved YOLOv5 algorithm


Non-Patent Citations (1)

Title
SHEN Ke et al.: "Coal gangue target detection based on improved YOLOv5s model", Industry and Mine Automation (《工矿自动化》), vol. 47, no. 11, 22 November 2021 (2021-11-22), pages 0 - 4 *

Similar Documents

Publication Publication Date Title
CN109118479B (en) Capsule network-based insulator defect identification and positioning device and method
CN113052210B (en) Rapid low-light target detection method based on convolutional neural network
Mayr et al. Weakly supervised segmentation of cracks on solar cells using normalized L p norm
CN113569667B (en) Inland ship target identification method and system based on lightweight neural network model
CN110580699A (en) Pathological image cell nucleus detection method based on improved fast RCNN algorithm
CN114445706A (en) Power transmission line target detection and identification method based on feature fusion
CN110322453A (en) 3D point cloud semantic segmentation method based on position attention and auxiliary network
CN103679187B (en) Image-recognizing method and system
CN112819748B (en) Training method and device for strip steel surface defect recognition model
CN111738114B (en) Vehicle target detection method based on anchor-free accurate sampling remote sensing image
CN115953408B (en) YOLOv 7-based lightning arrester surface defect detection method
CN114972312A (en) Improved insulator defect detection method based on YOLOv4-Tiny
CN113012153A (en) Aluminum profile flaw detection method
CN115457277A (en) Intelligent pavement disease identification and detection method and system
CN116342894A (en) GIS infrared feature recognition system and method based on improved YOLOv5
CN105354547A (en) Pedestrian detection method in combination of texture and color features
CN116597411A (en) Method and system for identifying traffic sign by unmanned vehicle in extreme weather
CN111598854A (en) Complex texture small defect segmentation method based on rich robust convolution characteristic model
CN111199255A (en) Small target detection network model and detection method based on dark net53 network
CN113591973A (en) Intelligent comparison method for appearance state changes of track slabs
CN111597939A (en) High-speed rail line nest defect detection method based on deep learning
CN110889418A (en) Gas contour identification method
CN114663407A (en) Coal gangue target detection method based on improved YOLOv5s model
CN110110665A (en) The detection method of hand region under a kind of driving environment
CN111860332B (en) Dual-channel electrokinetic diagram part detection method based on multi-threshold cascade detector

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220624