CN113920400A

CN113920400A - Metal surface defect detection method based on improved YOLOv3

Info

Publication number: CN113920400A
Application number: CN202111199291.8A
Authority: CN
Inventors: 齐向明; 司松林
Original assignee: Liaoning Technical University
Current assignee: Liaoning Technical University
Priority date: 2021-10-14
Filing date: 2021-10-14
Publication date: 2022-01-11

Abstract

The invention discloses a metal surface defect detection method based on an improved YOLOv3. First, a K-means++ clustering algorithm is used to cluster the ground-truth box parameters, yielding prior boxes with a higher average intersection over union (IOU) with the ground-truth boxes and thereby improving the convergence speed of the algorithm. Next, the feature pyramid network (FPN) structure in the YOLOv3 algorithm is improved: a path aggregation network is introduced into the FPN structure, and a ResBlock structure is proposed to replace the convolution structure in the FPN, which strengthens the feature extraction capability of the algorithm and improves the detection of small targets and of objects with indistinct features. Finally, a DIOU loss function is used to optimize bounding box regression, improving bounding box localization accuracy and further improving the detection performance of the YOLOv3 model. Experimental results on the NEU-DET data set show that the algorithm effectively improves metal surface defect detection accuracy: the mAP reaches 72.83%, which is 8.00% higher than that of the original YOLOv3 algorithm, and the method outperforms other mainstream target detection models.

Description

Metal surface defect detection method based on improved YOLOv3

Technical Field

The invention relates to the technical field of machining, and in particular to a metal surface defect detection method based on an improved YOLOv3.

Background

During machining, the machining tools or operations inevitably damage the metal surface, forming defects such as cracks and patches. Surface defects can seriously affect the quality and appearance of the related products and thereby harm the interests of enterprises. Effectively detecting metal surface defects and finding defective products in time improves product utilization and ensures product quality, and therefore plays an important role in the development of enterprises. Traditional manual visual inspection is easily influenced by subjective human judgment and suffers from unstable detection accuracy and low efficiency, so how to detect metal surface defects accurately and efficiently has become a popular research problem in recent years.

Early metal surface defect detection methods fall mainly into traditional image processing methods and machine learning methods based on hand-crafted features. Traditional image processing methods use the attributes reflected by local anomalies to detect and segment defects, for example with edge detection or template matching algorithms. Machine learning methods first extract features from the input image by means such as LBP (local binary pattern) and HOG (histogram of oriented gradients), design a feature vector describing the defect information, and then feed the feature vector into a pre-trained classifier model to determine whether the input image contains defects. Although these methods have advanced defect detection technology, they are generally applicable only to specific scenes, adapt poorly to environmental changes, and are not very robust.

With the wide application of deep learning in computer vision, deep learning methods have gradually replaced traditional machine vision for detecting metal surface defects. Deep learning detection algorithms are mainly divided into single-stage algorithms, represented by YOLO and SSD, and two-stage algorithms, represented by R-CNN. A two-stage algorithm first uses a region proposal network to extract features and generate candidate boxes, and then uses another convolutional neural network to classify and regress the candidate boxes. Two-stage algorithms have higher detection accuracy, but their detection speed is slower, which makes real-time detection difficult. A single-stage algorithm converts the detection problem from a classification problem into a regression problem and uses only one convolutional neural network for feature extraction and for the classification and regression of candidate boxes, making it an end-to-end target detection algorithm. Single-stage algorithms are faster, but their detection accuracy is generally lower than that of two-stage algorithms.

Therefore, a method for detecting defects on a metal surface with high detection speed and high detection accuracy is needed.

Disclosure of Invention

In view of this, the invention provides a metal surface defect detection method based on improved YOLOv3, which improves the YOLOv3 algorithm, and further improves the metal surface defect detection accuracy on the basis of keeping the advantage of faster detection speed of the YOLOv3 algorithm.

Therefore, the invention provides the following technical scheme:

The invention provides a metal surface defect detection method based on an improved YOLOv3, which comprises the following steps:

S1: acquiring an image of the metal surface defect to be detected;

S2: performing image preprocessing on the image of the metal surface defect to be detected;

S3: inputting the preprocessed image into a trained metal surface defect detection model to obtain a metal surface defect detection result; wherein the metal surface defect detection model is based on an improved YOLOv3 model. In the improved YOLOv3 model, the feature pyramid network first transfers and fuses deep feature information top-down by up-sampling; a path aggregation network is then introduced, adding a bottom-up feature pyramid structure after the feature pyramid network layer, so that the fused feature map, in addition to being used for prediction, feeds a down-sampling branch and the shallow and deep feature maps are fused again. The 5 consecutive CBL structures in the feature pyramid network are re-integrated using a residual structure, an Add operation and a Concat operation: the Add operation superimposes information, increasing the amount of information describing the image features without changing the dimensionality; the Concat operation stacks the feature map obtained by the Add operation with the skip-connection feature map, adding features that describe the image by stacking dimensions; and the feature map obtained after passing through one further CBL structure is taken as the output of the residual structure.

Further, the image preprocessing comprises: adjusting the image size to fit the input image size of the metal surface defect detection model.

Further, the image preprocessing further comprises: randomly applying flipping and color gamut transformation operations to the image with a probability of 0.5.

Further, the improved YOLOv3 uses the DIOU loss function to optimize the bounding box regression.

Further, the improved YOLOv3 adopts a K-means++ clustering algorithm to cluster the ground-truth box parameters, obtaining prior boxes with a higher average IOU with the ground-truth boxes.

The invention has the advantages and positive effects that:

The metal surface defect detection method based on the improved YOLOv3 improves YOLOv3 in three respects. First, a K-means++ clustering algorithm is used to cluster the ground-truth box parameters, yielding prior boxes with a higher average IOU with the ground-truth boxes and thereby improving the convergence speed of the algorithm. Next, the feature pyramid network (FPN) structure in the YOLOv3 algorithm is improved: a path aggregation network is introduced into the FPN structure, and a ResBlock structure is proposed to replace the convolution structure in the FPN, which strengthens the feature extraction capability of the algorithm and improves the detection of small targets and of objects with indistinct features. Finally, a DIOU loss function is used to optimize bounding box regression, improving bounding box localization accuracy and further improving the detection performance of the YOLOv3 model. Experimental results on the NEU-DET data set show that the method effectively improves metal surface defect detection accuracy: the mAP reaches 72.83%, which is 8.00% higher than that of the original YOLOv3, and the method outperforms other mainstream target detection models.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a flowchart of a metal surface defect detection method based on YOLOv3 according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the original YOLOv3 network structure;

FIG. 3 is a diagram of the residual structure in the original YOLOv3 network;

FIG. 4 is a schematic diagram of a modified YOLOv3 network according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of the ResBlock structure in the improved YOLOv3 network according to an embodiment of the present invention;

FIG. 6 shows predicted boxes and ground-truth boxes with the same L2 distance but different IOUs;

FIG. 7 is a schematic diagram of DIOU in an embodiment of the present invention;

FIG. 8 is a schematic diagram of the NEU-DET data set;

FIG. 9 is a graph of the loss function values of YOLOv3 and the improved YOLOv3;

FIG. 10 is a comparison of P-R curves for YOLOv3 and improved YOLOv3 for each defect class;

FIG. 11 is a comparison of the average precision values of YOLOv3 and the improved YOLOv3.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As shown in fig. 1, it shows a flowchart of a metal surface defect detection method based on improved YOLOv3 provided in an embodiment of the present invention, and the method includes:

S1: acquiring an image of the metal surface defect to be detected;

wherein the metal surface defects mainly comprise the following 6 types: crazing, inclusions, patches, pitted surfaces, rolled-in scale and scratches.

The image of the metal surface defect to be detected contains one or more metal surface defects.

S2: performing image preprocessing on the image of the metal surface defect to be detected;

wherein the image preprocessing comprises: adjusting the image size to fit the input image size of the metal surface defect detection model; for example, if the original image is 200 × 200 pixels, it is resized to 416 × 416, the input image size of the YOLOv3 model.

In order to improve the performance of the metal surface defect detection model, the image preprocessing further comprises: randomly applying flipping and color gamut transformation operations to the image with a probability of 0.5.
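The resizing and random flipping steps can be illustrated with a minimal sketch. It assumes square images stored as lists of pixel rows and boxes given as (x1, y1, x2, y2); the function names are hypothetical, and the color gamut transformation is omitted because its exact form is not specified here.

```python
import random


def resize_boxes(boxes, src_size, dst_size):
    """Scale (x1, y1, x2, y2) boxes from a src_size square image to dst_size."""
    return [tuple(v * dst_size / src_size for v in box) for box in boxes]


def random_hflip(image, boxes, width, p=0.5, rng=random):
    """With probability p, mirror the image rows and the boxes horizontally."""
    if rng.random() >= p:
        return image, boxes  # left unchanged
    flipped = [row[::-1] for row in image]
    # A box spanning [x1, x2] becomes [width - x2, width - x1] after mirroring.
    flipped_boxes = [(width - x2, y1, width - x1, y2) for x1, y1, x2, y2 in boxes]
    return flipped, flipped_boxes


# A box annotated on a 200 x 200 image, rescaled for a 416 x 416 input.
scaled = resize_boxes([(50, 50, 100, 150)], src_size=200, dst_size=416)
```

Flipping the boxes together with the pixels keeps the labels aligned with the augmented image, which is the main point of the sketch.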

S3: inputting the preprocessed image into the trained metal surface defect detection model to obtain a metal surface defect detection result.

The metal surface defect detection model in the embodiment of the invention is based on an improved YOLOv3 model.

YOLOv3 is an end-to-end real-time target detection algorithm. Its network structure is shown in FIG. 2 and mainly consists of the Darknet-53 feature extraction network, the FPN and the multi-scale prediction layer. Wherein:

Darknet-53 incorporates the residual structure of ResNet (residual network) and consists overall of one CBL structure and 5 residual structures. The CBL structure is a convolution layer, a batch normalization layer and an activation function layer connected in series. The 5 residual structures contain different numbers of residual units (1, 2, 8, 8, 4); a residual structure with N residual units consists of one CBL structure followed by N residual units connected in series, as shown in FIG. 3(a). Each residual unit applies an Add operation to its input and to the output obtained by passing that input through 2 CBL structures, as shown in FIG. 3(b). Each residual structure halves the size of its input feature map: in FIG. 2 the input image is 416 × 416 × 3, and the feature maps output by the 5 residual structures are 208 × 208 × 64, 104 × 104 × 128, 52 × 52 × 256, 26 × 26 × 512 and 13 × 13 × 1024, respectively. The output feature maps of the last three residual structures are used as the input of the FPN.
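The feature map sizes listed above follow from halving the spatial size and doubling the channel count at each residual structure. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
def darknet53_stage_shapes(input_size=416, base_channels=64, stages=5):
    """Return the (H, W, C) output shape of each of the residual structures."""
    shapes = []
    size, channels = input_size, base_channels
    for _ in range(stages):
        size //= 2  # each residual structure halves the spatial size
        shapes.append((size, size, channels))
        channels *= 2  # and the next one doubles the channel count
    return shapes


shapes = darknet53_stage_shapes()
fpn_inputs = shapes[-3:]  # the last three outputs feed the FPN
```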

The FPN comprises 3 prediction branches, which provide feature information for the detection of large, medium and small targets in sequence. Through a top-down feature flow, it uses CBL structures and up-sampling layers to fuse the strong semantic information contained in the deep feature branches into the shallow feature branches, providing stronger semantic feature information for the shallow branches.

The multi-scale prediction layer, the YOLO head structure, consists of a CBL structure and a 1 × 1 convolutional layer, and reduces the input feature map to a fixed number of channels to generate the final output.
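As an illustration of the fixed number of output channels, the following sketch computes the head output shape at each scale. It assumes the usual YOLOv3 arrangement of 3 prior boxes per grid cell at each scale and the 6 NEU-DET defect classes; the helper name is hypothetical.

```python
def head_output_shape(grid, num_anchors=3, num_classes=6):
    """Shape of one head output: each anchor predicts 4 offsets, 1 confidence, C class scores."""
    return (grid, grid, num_anchors * (4 + 1 + num_classes))


# The three prediction scales for a 416 x 416 input.
head_shapes = [head_output_shape(g) for g in (13, 26, 52)]
```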

YOLOv3 takes a 416 × 416 image as input, which is divided into S × S grid cells after passing through the feature extraction network. Prior boxes are generated by applying the K-means clustering algorithm to the data set, and 3 groups of prior boxes are preset for each grid cell at each of the 3 different scales. Each prediction feature layer needs to predict N × (S × (4 + 1 + C)) parameters, where N denotes the scale size and S denotes the number of boxes predicted for each grid cell. For each box, 4 + 1 + C parameters are predicted: 4 offset parameters of the predicted bounding box; 1 confidence score, which is 0 when the current grid cell does not contain a target and equals the IOU between the predicted box and the ground-truth box when it does; and C class scores, of which the class with the highest score is taken as the target class. A threshold is then applied to the confidence to discard low-scoring predicted boxes, and finally redundant predicted boxes are filtered out by non-maximum suppression (NMS) to obtain the final network prediction.
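The final filtering stage, confidence thresholding followed by NMS, can be sketched as below. This is a class-agnostic illustration in plain Python; the threshold values and function names are assumptions, not values given in this description.

```python
def iou(a, b):
    """IOU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(boxes, scores, conf_thresh=0.5, iou_thresh=0.45):
    """Return indices of boxes kept after confidence filtering and greedy NMS."""
    # Drop low-confidence boxes, then visit the rest from highest to lowest score.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= conf_thresh),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.3]
kept = filter_and_nms(boxes, scores)
```

In this example the second box is suppressed by the first (their IOU is about 0.68) and the fourth falls below the confidence threshold.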

In order to improve the accuracy of YOLOv3 in detecting metal surface defects, the embodiment of the invention proposes the following improvements to the YOLOv3 algorithm: first, prior boxes with a higher average IOU are obtained with a K-means++ clustering algorithm, which improves the convergence speed of the target detection algorithm; then, a PANet structure and a ResBlock structure are introduced to improve the FPN, which strengthens the feature extraction capability of the algorithm and improves the detection of small targets and of objects with indistinct features; and finally, a DIOU loss function is used to optimize bounding box regression, improving bounding box localization accuracy and further improving the detection performance of the YOLOv3 model.

For the convenience of understanding, the following describes the improvement made to the YOLOv3 algorithm in the embodiment of the present invention:

(1) Improving prior box generation: using the K-means++ clustering algorithm

The original YOLOv3 algorithm uses the K-means clustering algorithm to cluster the sizes of the ground-truth bounding boxes and determine the optimal prior box sizes, so as to improve the overlap between the prior boxes and the ground-truth boxes.

The K-means clustering algorithm is sensitive to the choice of initial cluster centers: different initial centers produce different clustering results, and these results can differ greatly, so the final prior boxes are not accurate enough. To address this, the embodiment of the invention uses a K-means++ clustering algorithm to replace the center selection of the K-means algorithm: first, the mean of the data is taken as the first cluster center; then the K-1 points in the data farthest from the first cluster center are selected as the remaining cluster centers, giving K initial cluster centers; finally, clustering is performed with the K-means algorithm.
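The center selection rule described above can be sketched as follows. The sketch follows the rule exactly as stated in this description (data mean first, then the K-1 farthest points), which differs from the randomized distance-weighted seeding of the standard k-means++ algorithm. Boxes are assumed to be (width, height) pairs, and the Euclidean default for the distance is an assumption; the function name is hypothetical.

```python
import math


def init_centers(boxes, k, dist=math.dist):
    """Pick k initial centers: the data mean, then the k-1 boxes farthest from it."""
    n = len(boxes)
    mean = (sum(w for w, _ in boxes) / n, sum(h for _, h in boxes) / n)
    # Sort by distance to the mean, largest first, and take the k-1 farthest boxes.
    farthest = sorted(boxes, key=lambda b: dist(b, mean), reverse=True)[: k - 1]
    return [mean] + farthest


centers = init_centers([(1, 1), (2, 2), (3, 3), (10, 10)], k=3)
```

Because the rule is deterministic, repeated runs on the same data give the same initial centers, which addresses the sensitivity to initialization noted above.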

The K-means algorithm partitions samples using the Euclidean distance as the similarity measure. With the Euclidean distance, large bounding boxes produce larger errors than small bounding boxes, which biases the clustering result. Because the purpose of clustering is to determine more accurate prior box parameters, so that the prior boxes have a higher average IOU with the ground-truth boxes, the IOU is taken as the new criterion. The final distance measure in the embodiment of the invention is shown in formula (1), with the IOU defined in formula (2):

$$d(\text{box}, \text{centroid}) = 1 - \mathrm{IOU}(\text{box}, \text{centroid}) \tag{1}$$

$$\mathrm{IOU} = \frac{|A \cap B|}{|A \cup B|} \tag{2}$$

wherein box denotes the ground-truth bounding box parameters, centroid denotes the cluster center, and IOU denotes the intersection over union of the predicted box A and the ground-truth box B; the larger the IOU, the smaller the distance.
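A minimal sketch of clustering with the distance 1 - IOU is given below. It assumes boxes are (width, height) pairs compared as if aligned at a common corner, which is the usual convention for anchor clustering, and it updates each center with the arithmetic mean of its cluster (some implementations use the median instead). All function names are hypothetical.

```python
def iou_wh(box, centroid):
    """IOU of two (w, h) boxes placed at a common corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union


def kmeans_iou(boxes, centers, iters=100):
    """Cluster (w, h) boxes using d = 1 - IOU, starting from the given centers."""
    centers = list(centers)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for b in boxes:
            # Assign each box to the center with the smallest 1 - IOU distance.
            j = min(range(len(centers)), key=lambda c: 1.0 - iou_wh(b, centers[c]))
            clusters[j].append(b)
        new = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new == centers:  # assignments have stabilized
            break
        centers = new
    return centers


def avg_iou(boxes, centers):
    """Average, over all boxes, of the best IOU with any center."""
    return sum(max(iou_wh(b, c) for c in centers) for b in boxes) / len(boxes)


data = [(10, 10), (12, 12), (100, 100), (110, 110)]
anchors = kmeans_iou(data, [(10, 10), (100, 100)])
```

The `avg_iou` helper corresponds to the average IOU used to compare clustering results.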

Because the anchor box sizes of the original YOLO algorithm were generated by clustering on the COCO data set, whose target classes and sizes differ greatly from those of the NEU-DET data set, prior boxes are regenerated on the NEU-DET data set at an input resolution of 416 × 416 using the K-means++ algorithm. The resulting 9 anchor box sizes are (33,349), (37,73), (50,147), (79,391), (96,81), (119,150), (162,258), (293,122) and (333,399), and the average IOU rises to 69.03%, which is 2.68% higher than the result obtained with K-means.

(2) Improved network structure

In the embodiment of the invention, the FPN structure of YOLOv3 is further optimized by combining it with a path aggregation network. Path aggregation mainly takes into account the importance of the shallow feature information of the network: shallow features are mostly features such as edges and shapes, which are very important for accurate detection. The improved YOLOv3 network structure is shown in FIG. 4. After the model transfers and fuses deep feature information top-down through the FPN by up-sampling, a path aggregation network is introduced and a bottom-up feature pyramid structure is added after the FPN layer: the fused 52 × 52 feature map, in addition to being used for prediction, is given a down-sampling branch, and the shallow and deep feature maps are fused again. With this combination, the FPN layer conveys strong semantic features top-down, the bottom-up feature pyramid conveys strong localization features, the two are combined pairwise, and parameters from different backbone layers are aggregated for the different detection layers.

In addition, the embodiment of the invention proposes a ResBlock structure to replace the 5 consecutive CBL structures in the original model. The original model uses 5 consecutive CBL structures that extract features by alternating 1 × 1 and 3 × 3 convolutions. To enhance feature extraction, the 5 consecutive CBL structures are re-integrated using a residual structure, an Add operation and a Concat operation, as shown in FIG. 5. First, the Add operation superimposes information, increasing the amount of information describing the image features without changing the dimensionality; then the Concat operation stacks the feature map obtained by the Add operation with the skip-connection feature map, adding features that describe the image by stacking dimensions; and finally, the feature map obtained after passing through one CBL structure is taken as the output of the ResBlock structure.
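The difference between the Add and Concat operations can be illustrated at the level of feature map shapes. The sketch below is not the ResBlock of FIG. 5; it only shows that Add leaves the (H, W, C) shape unchanged while Concat sums the channel counts. The channel numbers and function names are hypothetical.

```python
def add_shape(a, b):
    """Element-wise Add: both shapes must match and the output shape is unchanged."""
    if a != b:
        raise ValueError("Add requires identical shapes")
    return a


def concat_shape(a, b):
    """Channel-wise Concat of (H, W, C) maps: spatial sizes must match, channels sum."""
    if a[:2] != b[:2]:
        raise ValueError("Concat requires matching spatial size")
    return (a[0], a[1], a[2] + b[2])


x = (13, 13, 512)                  # hypothetical input feature map shape
added = add_shape(x, x)            # information is superimposed, shape is kept
stacked = concat_shape(added, x)   # skip connection is stacked along channels
```

The trailing CBL structure would then map the stacked channels back to the width expected by the following layers.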

(3) Improving the loss function

The loss function measures the difference between the predicted value and the true value and has a significant effect on the performance of a deep learning algorithm. Using a loss function better suited to the detection principle of the YOLOv3 algorithm yields better detection results.

The loss function of the original YOLOv3 model is a weighted sum of the bounding box regression loss, the confidence loss and the classification loss, namely:

$$L = L_{coord} + L_{conf} + L_{cla} \tag{3}$$

The bounding box regression loss consists of the center point coordinate loss and the width-height size loss and is calculated with a mean square error function; the confidence loss is the sum of the confidence loss of detection boxes that contain a target and the confidence loss of detection boxes that do not, and is calculated with a cross-entropy function; the class loss is calculated using the Sigmoid activation function. The formula for calculating the loss function is shown in equation (4):

wherein $\lambda_{coord}$ and $\lambda_{noobj}$ are balance coefficients; there are $S^2$ grid cells in total, each grid cell generates B candidate boxes, and S × S × B bounding boxes are finally obtained. $I_{ij}^{obj}$ indicates whether the j-th anchor box of the i-th grid cell is "responsible" for the target: it is 1 if so and 0 otherwise. "Responsible" is defined as follows: if, among the B anchor boxes of the i-th grid cell, an anchor box has the largest IOU with the ground-truth bounding box of the target, that anchor box is said to be "responsible" for the target. $X_i$, $Y_i$, $W_i$ and $H_i$ denote the horizontal and vertical coordinates of the center of the ground-truth bounding box and its width and height, and $\hat{X}_i$, $\hat{Y}_i$, $\hat{W}_i$ and $\hat{H}_i$ denote the corresponding values of the predicted bounding box; $\hat{C}_i$ and $C_i$ denote the confidences of the predicted box and the ground-truth box, respectively; and $\hat{p}_i(c)$ and $p_i(c)$ denote the class probability values of the predicted box and the ground-truth box, respectively.

Equation (4) contains 5 terms: the first is the center coordinate loss of the predicted box; the second is the width-height size loss of the predicted box; the third is the confidence loss of predicted boxes that contain a target; the fourth is the confidence loss of predicted boxes that contain no target; and the last is the class probability loss. Squared error is used in the bounding box regression loss to reduce the influence of an excessively large share of the predicted box loss. Because the regression loss of small target bounding boxes is small, the factor $(2 - W_i \times H_i)$ is used to adjust the weight, increasing the regression loss of small target bounding boxes.
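For orientation, one form of the loss that is consistent with the five terms and the symbols described above is sketched below. It is an assumption based on the commonly used YOLOv3 loss, not a reproduction of equation (4); in particular, the indicator $I_{ij}^{noobj}$ for anchor boxes that are not responsible for any target is introduced here for illustration.

```latex
\begin{aligned}
L ={}& \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{ij}^{obj} \,(2 - W_i \times H_i)
       \left[ (X_i - \hat{X}_i)^2 + (Y_i - \hat{Y}_i)^2 \right] \\
     &+ \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{ij}^{obj} \,(2 - W_i \times H_i)
       \left[ (W_i - \hat{W}_i)^2 + (H_i - \hat{H}_i)^2 \right] \\
     &- \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{ij}^{obj}
       \left[ C_i \log \hat{C}_i + (1 - C_i) \log (1 - \hat{C}_i) \right] \\
     &- \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{ij}^{noobj}
       \left[ C_i \log \hat{C}_i + (1 - C_i) \log (1 - \hat{C}_i) \right] \\
     &- \sum_{i=0}^{S^2} I_{i}^{obj} \sum_{c \in classes}
       \left[ p_i(c) \log \hat{p}_i(c) + (1 - p_i(c)) \log (1 - \hat{p}_i(c)) \right]
\end{aligned}
```

In this form the weight $(2 - W_i \times H_i)$ is close to 2 for small normalized boxes and close to 1 for boxes that fill the image, which is how the regression loss of small targets is increased.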

As equation (4) shows, the original YOLOv3 uses a mean square error loss function for predicted box regression. However, the mean square error loss function treats the center point coordinates and the width and height of the bounding box as independent variables and ignores the relationship among the four parameters as a whole, so it does not reflect the IOU between the predicted box and the ground-truth box well, which reduces localization efficiency. FIG. 6 compares IOUs for cases in which the L2 norm of the corner point coordinate distance between the predicted box and the ground-truth box is the same; the solid box is the ground-truth box and the dashed box is the predicted box. It can be seen from FIG. 6 that two pairs of boxes with the same L2 distance can have very different IOU values.

Since YOLOv3 requires predicted boxes with large IOU values as the final result, the IOU is used to measure the bounding box regression loss. However, the IOU cannot accurately reflect how two boxes overlap: for the same IOU, the way the two boxes overlap may differ, so the IOU cannot be used directly as the bounding box regression loss function. For this reason, a DIOU loss function is used as the new bounding box regression loss function. A DIOU diagram is shown in FIG. 7, where d denotes the distance between the center points of the two boxes and c denotes the diagonal length of the smallest enclosing region that contains both the predicted box and the ground-truth box.

The calculation formula of DIoU is shown in formula (5):

$$\mathrm{DIoU} = \mathrm{IOU} - \frac{\rho^2(b, b^{gt})}{c^2} \tag{5}$$

wherein $b$ and $b^{gt}$ denote the center points of the predicted box and the ground-truth box respectively, $\rho$ denotes the Euclidean distance, and $c$ is the diagonal length of the smallest enclosing region that contains both boxes. When the two bounding boxes overlap completely, $L_{DIoU} = 0$; when the two bounding boxes are far apart, $L_{DIoU}$ approaches 2. DIoU therefore reflects the degree of overlap of the two bounding boxes well. The DIoU loss function is shown in equation (6):

$$L_{DIoU} = 1 - \mathrm{DIoU} \tag{6}$$

DIoU takes into account both the overlapping area and the distance between the center points of the bounding boxes. When the predicted box and the ground-truth box do not intersect, it still provides the correct direction of movement for the predicted box, making predicted box regression more accurate. Because DIoU directly minimizes the distance between the two boxes, it also speeds up predicted box regression.
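The DIoU loss can be sketched in plain Python as follows, assuming boxes are given as (x1, y1, x2, y2) with positive area; the function names are hypothetical.

```python
def diou(pred, gt):
    """DIoU of two (x1, y1, x2, y2) boxes: IOU minus the normalized center distance."""
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter)
    # Squared distance between the two box centers.
    d2 = ((pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2) ** 2 + (
        (pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2
    ) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    cx1, cy1 = min(pred[0], gt[0]), min(pred[1], gt[1])
    cx2, cy2 = max(pred[2], gt[2]), max(pred[3], gt[3])
    c2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2
    return iou - d2 / c2


def diou_loss(pred, gt):
    """Bounding box regression loss 1 - DIoU."""
    return 1.0 - diou(pred, gt)


near = diou_loss((2, 2, 3, 3), (0, 0, 1, 1))    # disjoint, close together
far = diou_loss((9, 9, 10, 10), (0, 0, 1, 1))   # disjoint, far apart
```

Both example pairs have an IOU of 0, yet the loss is smaller for the closer pair, which is what gives a non-intersecting predicted box a direction to move in.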

In the specific implementation, a metal surface defect detection model needs to be trained in advance, and the training process is as follows:

First, a metal surface defect data set is obtained, such as the NEU-DET steel surface defect data set. Each image in the data set is resized to the size that fits the YOLOv3 model (416 × 416). To improve model training, flipping and color gamut transformation operations are applied randomly to each image with a probability of 0.5. The data set is split proportionally into a training set and a test set.

Then, the metal surface defect detection model is trained on the training set. Adam is used as the optimizer, the batch size is set to 8, and training runs for 250 epochs. The initial learning rate is set to 1e-2 and a piecewise constant decay strategy is adopted, with a step length of 2 and a decay rate of 0.96; the learning rate is finally reduced to 1e-6.
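The learning rate schedule can be sketched as a simple function of the epoch. The sketch assumes that "step length 2" means the rate is multiplied by the decay rate once every 2 epochs and that 1e-6 acts as a lower bound; both readings are assumptions, and the function name is hypothetical.

```python
def learning_rate(epoch, lr0=1e-2, step=2, gamma=0.96, lr_min=1e-6):
    """Piecewise constant decay: multiply by gamma every `step` epochs, floored at lr_min."""
    return max(lr0 * gamma ** (epoch // step), lr_min)


# Learning rate at the start of the first few epochs.
first_rates = [learning_rate(e) for e in range(6)]
```

The rate stays constant within each block of `step` epochs, which is what makes the schedule piecewise constant.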

The metal surface defect detection method based on the improved YOLOv3 in the embodiment of the invention improves YOLOv3 in three respects. First, a K-means++ clustering algorithm is used to cluster the ground-truth box parameters, yielding prior boxes with a higher average IOU with the ground-truth boxes and thereby improving the convergence speed of the algorithm. Next, the feature pyramid network (FPN) structure in the YOLOv3 algorithm is improved: a path aggregation network is introduced into the FPN structure, and a ResBlock structure is proposed to replace the convolution structure in the FPN, which strengthens the feature extraction capability of the algorithm and improves the detection of small targets and of objects with indistinct features. Finally, a DIOU loss function is used to optimize bounding box regression, improving bounding box localization accuracy and further improving the detection performance of the YOLOv3 model. The method effectively improves the detection accuracy for metal surface defects.

In order to confirm the effectiveness of the metal surface defect detection method based on the improved YOLOv3, the following experiments are carried out:

Experiment platform: the experiments were run on the Windows 10 operating system, on a workstation with an Intel(R) Core(TM) i7-10875H CPU @ 2.30GHz, 32 GB of RAM and an Nvidia RTX 3070 graphics card with 8 GB of video memory. PyTorch 1.7 was used as the deep learning framework, and GPU-accelerated model training was performed using CUDA 11.0 and cuDNN 8.0.5.

Data set: the experiments use the NEU-DET steel surface defect data set, which contains 6 types of defects, namely crazing (crazing), inclusion (inclusion), patches (patches), pitted surface (pitted_surface), rolled-in scale (rolled-in_scale) and scratches (scratches), with 300 images for each type. Since the original images are 200 × 200 pixels, the 1800 images of the data set are resized to 416 × 416 to fit the input image size of the YOLO model. To improve model training, flipping and color gamut transformation operations are applied randomly to each image with a probability of 0.5. The data set is divided into a training set and a test set at a ratio of 8:2, and the training set is then divided into training and validation sets at a ratio of 9:1, finally giving 1296 training images, 144 validation images and 360 test images. The NEU-DET data set is shown in FIG. 8.
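The split arithmetic can be checked with a short sketch. It assumes a random shuffle with a fixed seed, which is an illustrative choice since the splitting procedure itself is not specified; the function name is hypothetical.

```python
import random


def split_indices(n, test_ratio=0.2, val_ratio=0.1, seed=0):
    """Split n image indices 8:2 into train+val and test, then 9:1 into train and val."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_ratio)
    test, trainval = idx[:n_test], idx[n_test:]
    n_val = round(len(trainval) * val_ratio)
    val, train = trainval[:n_val], trainval[n_val:]
    return train, val, test


train, val, test = split_indices(1800)
```

For 1800 images this gives 1296 training, 144 validation and 360 test indices, matching the counts stated above.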

Model training: since the NEU-DET data set differs greatly from the data set used for the pre-trained weights, no pre-trained weights are loaded in the experiments. Adam is used as the optimizer, the batch size is set to 8, and training runs for 250 epochs. The initial learning rate is set to 1e-2 and a piecewise constant decay strategy is adopted, with a step length of 2 and a decay rate of 0.96; the learning rate is finally reduced to 1e-6.

FIG. 9 shows the loss curves of YOLOv3 and the improved version (the left graph of FIG. 9 is the loss curve of the YOLOv3 model, and the right graph is the loss curve of the improved YOLOv3 model), with the epoch on the abscissa and the loss value on the ordinate. Because the models are trained without pre-trained weights, the initial loss values are high: 198.56 for the YOLOv3 model and 171.38 for the improved YOLOv3 model. To show the fluctuation of the loss curves more clearly, the curves are drawn from the 2nd epoch. The loss of the YOLOv3 model is 46.39 at the 2nd epoch and finally converges to 13.65 after 250 epochs of training; the loss of the improved YOLOv3 model is 26.92 at the 2nd epoch and finally converges to 13.01 after 250 epochs of training. The loss of the improved algorithm thus converges noticeably faster, which demonstrates the effectiveness of the improvements.

Evaluation index: the experiments use mAP (mean average precision) to test the effectiveness of the improved model. mAP is the most important and most commonly used evaluation index for target detection algorithms, and is one of the best criteria for measuring the overall performance of a model. The AP (average precision) is obtained from the precision P and the recall R, and the final mAP is obtained by averaging the AP over the categories. The calculation formulas of P, R, AP and mAP are shown in formulas (7) to (10):

where TP is the number of correctly predicted targets; FN is the number of targets that were not predicted; FP is the number of incorrectly predicted targets; AP is the average precision of each category; and N is the total number of categories in the data set.
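The formula images referenced as (7) to (10) are not reproduced in this text. Assuming the standard definitions, which agree with the symbol descriptions above, they would read as follows. The integral form of AP and the category index i are assumptions; in practice the area under the P-R curve is often approximated by interpolation.

```latex
\begin{align}
P   &= \frac{TP}{TP + FP} \tag{7} \\
R   &= \frac{TP}{TP + FN} \tag{8} \\
AP  &= \int_{0}^{1} P(R)\,\mathrm{d}R \tag{9} \\
mAP &= \frac{1}{N}\sum_{i=1}^{N} AP_{i} \tag{10}
\end{align}
```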

Compared with the YOLOv3 algorithm, the improved algorithm raises the detection precision of every defect type, which demonstrates the effectiveness of the improvements.

Analysis of experimental results:

(1) Longitudinal comparison experiment and result analysis:

The test set was predicted using the original YOLOv3 and the improved YOLOv3 of the present invention, and the visualized predictions for the 6 defect classes were compared with the ground-truth labels, as shown in Fig. 10 (the first row shows the ground-truth labels, the second row the predictions of the original YOLOv3, and the third row the predictions of the improved YOLOv3 of the present invention). Compared with YOLOv3, the algorithm of the invention detects more targets, and the size and position of its prediction frames are closer to the real frames. For all 6 defect types its detection effect is better than that of YOLOv3, which shows that the improved algorithm effectively raises the recall rate and improves the regression of the prediction frame.

Fig. 11 compares the AP values of the original algorithm and the improved algorithm for each category. The model proposed by the present invention improves on the original model for all categories, and the gain is most pronounced for the 3 defect categories crazing, inclusion and scratches, whose AP increases by 17%, 7% and 8%, respectively.

To further analyze the influence of each improvement on the YOLOv3 algorithm, an ablation study was carried out. The experiments are divided into 4 groups for training and testing: the 1st group is the YOLOv3 algorithm, the 4th group is the algorithm of the invention, and the improvements proposed by the invention are added in sequence in groups 2 to 4. The experimental results are shown in Table 1.

TABLE 1

As can be seen from Table 1, the prior frames obtained in group A by K-means++ clustering have a higher average intersection-over-union, and the mAP is slightly higher than that of the original algorithm. Comparing group A with group B, the mAP rises markedly after the FPN structure is improved to strengthen the feature extraction of the model; although the added PANet structure introduces extra parameters and lowers the FPS, the improvement is worthwhile in view of the clear gain in precision. Finally, comparing group B with the algorithm of the invention, the mAP rises further to 72.83% after DIOU is adopted as the bounding box regression loss.
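The average intersection-over-union used to judge a set of prior frames can be sketched as follows. This is a minimal illustration under the common YOLO-style assumption that boxes are compared by width and height only, as if they shared a corner. The function names and the sample boxes are hypothetical and are not taken from the source.

```python
def wh_iou(box, anchor):
    """IoU of two (width, height) boxes aligned at a common corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def average_iou(boxes, anchors):
    """Mean, over real boxes, of the best IoU with any prior box."""
    return sum(max(wh_iou(b, a) for a in anchors) for b in boxes) / len(boxes)


# Hypothetical real-frame (width, height) pairs in pixels.
boxes = [(20, 40), (100, 50), (60, 60)]
loose_anchors = [(10, 10), (200, 200)]           # poorly matched priors
tight_anchors = [(20, 40), (100, 50), (60, 60)]  # priors that fit the data

print(average_iou(boxes, tight_anchors))  # 1.0
print(average_iou(boxes, loose_anchors))
```

Prior frames that fit the data give a higher average, which is the quantity the clustering step tries to raise.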

In conclusion, the ablation experiments show that the improvements to YOLOv3 proposed by the invention are effective: the detection accuracy is effectively improved while the loss of FPS is kept as small as possible.

(2) Transverse contrast experiment and result analysis

The invention also compares the improved YOLOv3 algorithm with other well-performing target detection algorithms on metal surface defect detection; the results are shown in Table 2. The mAP of the improved YOLOv3 algorithm reaches 72.83%, which is 8.00% higher than that of the original YOLOv3 algorithm, and the AP values of all 6 defects are improved. Compared with the single-stage target detection algorithms SSD and DSSD, the mAP is 11.06% and 6.64% higher, respectively, and the detection speed is far higher than that of DSSD, achieving real-time detection. Compared with the two-stage detection algorithm Fast R-CNN, the mAP is 4.38% higher. In terms of FPS, although the algorithm is slower than the original algorithm, its FPS is still high enough for real-time detection. Compared with Mobilenetv3-YOLOv4 and EfficientDet D6, mainstream target detection algorithms of the last two years, the algorithm has an FPS comparable to that of the former with an mAP 2.00% higher; the mAP of the latter is 3.49% higher, but because of the huge network structure of EfficientDet D6 its FPS is only 2.90, so it cannot be used for real-time industrial detection.

In conclusion, the algorithm of the invention offers the best overall detection performance and meets the more demanding requirements of metal surface defect detection.

TABLE 2

To address the poor detection of small targets and the missed detection of targets with inconspicuous features in metal surface defect detection, the invention provides an improved YOLOv3 metal surface defect detection algorithm. First, prior frames with a higher average intersection-over-union are obtained by K-means++ clustering; second, a path aggregation network is introduced after the FPN structure and the 5 consecutive convolution blocks are replaced by ResBlock structures to enhance feature extraction; finally, DIOU is used as the new bounding box regression loss to optimize bounding box regression. The final experimental results show that the improved algorithm outperforms the original YOLOv3 and other mainstream target detection algorithms to a certain extent and can meet the higher accuracy requirements of metal surface defect detection. Future research will further improve the detection precision of the model while balancing detection speed, so as to meet the requirement of real-time detection.
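As an illustration of the bounding box regression term, the sketch below computes the DIOU loss in its commonly published form, 1 - IoU + d²/c², where d is the distance between the two box centers and c is the diagonal of the smallest box enclosing both. The (x1, y1, x2, y2) box format and the function name `diou_loss` are assumptions; the invention may parameterize boxes differently.

```python
def diou_loss(pred, target):
    """DIoU loss for two boxes given as (x1, y1, x2, y2) corner coordinates."""
    # Intersection area (zero when the boxes do not overlap).
    iw = max(0.0, min(pred[2], target[2]) - max(pred[0], target[0]))
    ih = max(0.0, min(pred[3], target[3]) - max(pred[1], target[1]))
    inter = iw * ih
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    iou = inter / (area_p + area_t - inter)

    # Squared distance between the two box centers.
    dx = (pred[0] + pred[2]) / 2 - (target[0] + target[2]) / 2
    dy = (pred[1] + pred[3]) / 2 - (target[1] + target[3]) / 2
    d2 = dx ** 2 + dy ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(pred[2], target[2]) - min(pred[0], target[0])
    ch = max(pred[3], target[3]) - min(pred[1], target[1])
    c2 = cw ** 2 + ch ** 2

    return 1.0 - iou + d2 / c2


print(diou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # 0.0
print(diou_loss((0, 0, 2, 2), (3, 0, 5, 2)))  # disjoint boxes: loss above 1
```

For the disjoint pair the IoU is zero, yet the loss still depends on the center distance, which is why this form can guide regression even when the prediction frame and the real frame do not overlap.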

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A metal surface defect detection method based on improved YOLOv3 is characterized by comprising the following steps:

S1: acquiring an image of the metal surface defect to be detected;

S3: inputting the preprocessed image into a trained metal surface defect detection model to obtain a metal surface defect detection result; the metal surface defect detection model is based on an improved YOLOv3 model; in the improved YOLOv3 model, the feature pyramid network transfers and fuses deep feature information from top to bottom by up-sampling; a path aggregation network is then introduced, in which a bottom-up feature pyramid structure is added after the feature pyramid network layer, so that the fused feature map, in addition to being used for prediction, is given a down-sampling branch through which the shallow feature map and the deep feature map are fused again; the 5 consecutive CBL structures in the feature pyramid network are reorganized using a residual structure, an Add operation and a Concat operation: first, information is added through the Add operation, which increases the amount of information describing the image features without changing the dimension of the image; then the feature map obtained through the Add operation and the skip-connected feature map are stacked through the Concat operation, which increases the features describing the image by stacking dimensions; and finally the feature map, after passing through one CBL structure, is taken as the output of the residual structure.

2. The improved YOLOv3-based metal surface defect detection method according to claim 1, wherein the image preprocessing comprises:

3. The improved YOLOv3-based metal surface defect detection method according to claim 2, wherein the image preprocessing further comprises:

randomly subjecting the image to flipping and color gamut transformation operations with a probability of 0.5.

4. The improved YOLOv3-based metal surface defect detection method as claimed in claim 1, wherein the improved YOLOv3 uses the DIOU loss function to optimize bounding box regression.

5. The improved YOLOv3-based metal surface defect detection method as claimed in claim 2, wherein the improved YOLOv3 adopts a K-means++ clustering algorithm to cluster the real frame parameters to obtain prior frames with a higher average intersection-over-union with the real frames.