CN114663346A - Strip steel surface defect detection method based on improved YOLOv5 network

Strip steel surface defect detection method based on improved YOLOv5 network

Info

Publication number
CN114663346A
CN114663346A (application number CN202210113743.4A)
Authority
CN
China
Prior art keywords
defect
network model
module
training
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210113743.4A
Other languages
Chinese (zh)
Inventor
石肖松
刘坤
杨晓松
孟蕊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei University of Technology
Original Assignee
Hebei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hebei University of Technology filed Critical Hebei University of Technology
Priority to CN202210113743.4A priority Critical patent/CN114663346A/en
Publication of CN114663346A publication Critical patent/CN114663346A/en
Pending legal-status Critical Current

Classifications

    • G06T 7/0004 — Industrial image inspection (G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL › G06T 7/00 Image analysis › G06T 7/0002 Inspection of images, e.g. flaw detection)
    • G06F 18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting (G06F ELECTRIC DIGITAL DATA PROCESSING › G06F 18/00 Pattern recognition › G06F 18/21 Design or setup of recognition systems or techniques)
    • G06F 18/23213 — Non-hierarchical techniques with a fixed number of clusters, e.g. K-means clustering (G06F 18/23 Clustering techniques › G06F 18/2321 using statistics or function optimisation, e.g. modelling of probability density functions)
    • G06F 18/24 — Classification techniques (G06F 18/00 Pattern recognition › G06F 18/20 Analysing)
    • G06N 3/045 — Combinations of networks (G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N 3/02 Neural networks › G06N 3/04 Architecture, e.g. interconnection topology)
    • G06N 3/048 — Activation functions (G06N 3/04 Architecture, e.g. interconnection topology)
    • G06N 3/08 — Learning methods (G06N 3/02 Neural networks)
    • G06T 2207/20081 — Training; Learning (G06T 2207/00 Indexing scheme for image analysis or image enhancement › G06T 2207/20 Special algorithmic details)
    • G06T 2207/20084 — Artificial neural networks [ANN] (G06T 2207/20 Special algorithmic details)
    • G06T 2207/30136 — Metal (G06T 2207/30 Subject of image; context of image processing › G06T 2207/30108 Industrial image inspection)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a strip steel surface defect detection method based on an improved YOLOv5 network. Built on the YOLOv5 network model, the method adds a self-designed channel-spatial attention module, which improves detection precision and addresses feature extraction in complex scenes and backgrounds. The method exploits the strength of deep learning in feature extraction: without relying on manual feature engineering, it learns simple shallow features from a large dataset and then progressively learns more complex and abstract deep features. It offers better performance, higher accuracy in identifying defect types, high precision and recall for strip steel defects, and fast recognition.

Description

Strip steel surface defect detection method based on improved YOLOv5 network
Technical Field
The invention belongs to the technical field of industrial defect detection, and particularly relates to a strip steel surface defect detection method using a YOLOv5 network with a channel-spatial attention module.
Background
Strip steel is one of the important steel raw materials; it is widely used in machinery manufacturing, aerospace and transportation and plays an important role in many areas of production and daily life. During strip steel production, however, limitations of industrial technology and the influence of the production process cause various surface defects such as oil spots, almond-shaped defects, white spots and scratches. These defects greatly affect the corrosion resistance and service life of the strip. Existing defect detection relies mainly on manual visual inspection, which suffers from low efficiency, high labor intensity and high production cost, and cannot meet the requirements of strip steel surface defect detection.
Deep learning automatically extracts and learns defect features through convolutional neural networks, without hand-designed feature factors; deep neural networks therefore offer strong learning ability and high robustness, and have gradually become the mainstream method for strip steel surface defect detection. Weng Yushang et al. (Weng Yushang, Xiao Jinqiu, Xia Yuang. Strip steel surface defect detection with an improved Mask R-CNN algorithm [J/OL]. Computer Engineering and Applications: 1-12 [2021-06-24]) proposed an improved mask region convolutional neural network (Mask R-CNN) algorithm, using a k-means II clustering algorithm to improve the anchor generation of the region proposal network (RPN). Li Weigang et al. (Li Weigang, Ye Xin, Zhao Yuntao, Wang Wenbo. Strip steel surface defect detection based on an improved YOLOv3 algorithm [J]. Acta Electronica Sinica, 2020) proposed a YOLOv3 framework that fuses shallow and deep features; the improved YOLOv3 algorithm reached a mean average precision of 80% on the Northeastern University strip steel dataset. However, when facing weak and tiny strip steel defects, where the background and foreground are highly coupled and the defect area is small, deep learning models cannot extract features well, and the detection performance suffers.
Disclosure of Invention
Aiming at the deficiencies of the prior art, the invention addresses the technical problem of providing a strip steel surface defect detection method based on an improved YOLOv5 network that can detect and locate different types of strip steel surface defects in real time, improves the accuracy of identifying defects of different types and of similar structures, and meets the real-time and accuracy requirements of actual industrial strip steel production.
The technical scheme adopted by the invention for solving the technical problems is as follows: a strip steel surface defect detection method based on an improved YOLOv5 network is designed, and is characterized by comprising the following steps:
the first step is as follows: image dataset acquisition
1.1, acquiring a surface image of the strip steel by using an industrial camera, and screening out a picture containing a defect; when the defect type in the screened defect image covers the known type of the surface defect of the strip steel, a defect image set is formed;
1.2, carrying out size normalization operation on the defect picture set, and then manually labeling the pictures in the defect picture set by using Labelimg software to enable each defect picture to have a label of a defect type and a defect position coordinate;
1.3 randomly dividing not less than 60% of the marked defect picture set into a training set, and taking the rest as a verification set;
the second step is that: construction of improved YOLOv5 network model
The improved YOLOv5 network model is obtained, on the basis of the YOLOv5 network model, by connecting a CSA module in series between the three CSP23_ modules of the PAN and the three conv modules of the classification and positioning part of the network model;
the CSA module comprises a channel attention module and a space attention module, the two modules are connected in series, and the output of the channel attention module is the input of the space attention module;
The feature map F1 output by a CSP23_ module is input into the CSA module and first passes through the channel attention module. The channel attention module applies global maximum pooling and global average pooling over the width and height dimensions to the input feature F1, obtaining two C × 1 × 1 feature maps; the two C × 1 × 1 feature maps are each processed by a fast one-dimensional convolution with kernel size k, the results of the two fast one-dimensional convolutions are added, and sigmoid processing is applied to obtain the channel attention; the channel attention is multiplied by the original feature F1 to re-weight the features, giving the weighted feature F2;
The feature F2 output by the channel attention module is input to the spatial attention module, which applies global maximum pooling and global average pooling along the channel dimension to F2, obtaining two 1 × W × H feature maps; the two 1 × W × H feature maps are concatenated along the channel axis, a convolution with a 7 × 7 kernel reduces the result to a one-channel map of size 1 × W × H, and the activation function sigmoid generates the spatial attention weight; finally, the spatial attention weight is multiplied by the input feature F2 to obtain the output feature F3 of the spatial attention module; feature F3 is the output of the CSA module and the input of the conv module of the classification and positioning part of the network model;
the third step: training improved Yolov5 network model
3.1 image dataset preprocessing
Preprocessing the training set in a Mosaic data enhancement mode;
3.2 parameter settings
Initializing all weight values, bias values and batch normalization scale factor values, setting the initial learning rate and batch_size of the network, and inputting the initialized parameter data into the network; dynamically adjusting the learning rate and the number of iterations according to the change of the training loss so as to update the parameters of the whole network; the training is divided into two stages: the first stage is the first 100 epochs of training, with the initial learning rate fixed at 0.001 to accelerate convergence; the second stage covers the epochs after epoch 100, with the learning rate set to 0.0001;
3.3 network model training
Inputting the preprocessed training set into the improved YOLOv5 network model with the initialization parameters set in the second step for feature extraction; anchor boxes are automatically generated for the training-set images by K-means clustering, the anchor box sizes serve as prior boxes, and bounding boxes are obtained through box regression prediction; a logistic classifier then classifies the bounding boxes, yielding the defect-class probability for each bounding box; the defect-class probabilities of all bounding boxes are sorted by the non-maximum suppression method, and the defect class of each bounding box is determined to obtain the predicted value; the predicted value contains the defect class and defect position information, and the non-maximum suppression threshold is 0.5; the loss between the predicted value and the true value is then computed with the GIoU loss function; back-propagation is performed according to the training loss to update the parameters of the backbone network and the classification-regression network until the loss meets the preset value, completing the training of the network model parameters;
3.4 network model testing
Inputting the verification set into the network model which completes the parameter training in the step 3.3 to obtain a tensor prediction value of the verification set; comparing the predicted value of the tensor with the labeling information, and testing the reliability of the network model; evaluating the network model by using the AP, and testing the network model to be reliable when the AP is not less than 85%;
the fourth step: strip steel surface defect detection
The surface image of the strip steel to be detected is subjected to the same size normalization as in step 1.2 of the first step and is then input into the network model tested as reliable in the third step, yielding the defect tensor information of the strip steel surface image to be detected, including the defect position, defect type and confidence.
Compared with the prior art, the invention has the following beneficial effects: the detection method is based on the YOLOv5 network model with a self-designed channel-spatial attention module added. The channel-spatial attention module fuses shallow and deep features before performing the attention operation, so that the deep layers contain more high-level semantic information and less background information; with the target information reinforced by this shallow-deep fusion, the network pays more attention to target defects during the attention operation, suppresses background information, and better guides multi-scale fusion, improving detection precision and addressing feature extraction in complex scenes and backgrounds. The method exploits the strength of deep learning in feature extraction: without relying on manual feature engineering, it learns simple shallow features from a large dataset and then progressively learns more complex and abstract deep features. It offers better performance, higher accuracy in identifying defect types, high precision and recall for strip steel defects, and fast recognition.
Drawings
Fig. 1 is a schematic structural diagram of the CSA module according to an embodiment of the detection method of the present invention.
Fig. 2 is a schematic structural diagram of the improved YOLOv5 network model according to an embodiment of the detection method of the present invention.
Detailed Description
The technical solutions of the present application will be described clearly and completely with reference to the accompanying drawings, and it is to be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without making any creative effort belong to the protection scope of the present application.
The invention provides a strip steel surface defect detection method (a detection method for short, see figures 1-2) based on an improved YOLOv5 network, which comprises the following steps:
the first step is as follows: image dataset acquisition
1.1, acquiring a surface image of the strip steel by using an industrial camera, and screening out a picture containing a defect; when the defect type in the screened defect image covers the known type of the surface defect of the strip steel, a defect image set is formed;
1.2, performing size normalization operation on the defect picture set (scaling to 608 × 608 pixels in this embodiment), and then manually labeling the pictures in the defect picture set by using Labelimg software, so that each defect picture has a label of the defect type and the defect position coordinate;
1.3 randomly dividing not less than 60% of the marked defect picture set into a training set, and taking the rest as a verification set; this embodiment uses a 4:1 split, i.e. 80% for the training set and the remaining 20% for the validation set.
The second step is as follows: construction of the improved YOLOv5 network model
The improved YOLOv5 network model is based on the YOLOv5 network model, with a CSA (channel-spatial attention) module connected in series between each of the three CSP23_ modules (cross-stage partial networks, namely CSP23_5, CSP23_4 and CSP23_3) of the PAN (path aggregation network) and the three conv (convolution) modules of the classification and positioning part of the network model.
The CSA module includes a channel attention (Channel Attention) module and a spatial attention (Spatial Attention) module; the two modules are connected in series, and the output of the channel attention module is the input of the spatial attention module.
The feature map F1 (C × W × H) output by a CSP23_ module is input into the CSA module and is first processed by the channel attention module. The channel attention module applies global maximum pooling (MaxPool) and global average pooling (AvgPool) over the width and height dimensions to the input feature F1 (C × W × H), producing two C × 1 × 1 feature maps. Each of the two C × 1 × 1 feature maps is then processed by a fast one-dimensional convolution (Conv1d) with kernel size k; the results of the two fast one-dimensional convolutions are added and passed through the activation function sigmoid to obtain the channel attention. Multiplying the channel attention by the original feature F1 re-weights the features, giving the weighted feature F2.
The convolution kernel size k of the fast one-dimensional convolution represents the coverage of local cross-channel interaction, i.e. how many neighboring channels participate in the attention prediction for a given channel. The interaction coverage (i.e. kernel size k) is proportional to the channel dimension; the specific calculation formula is:
$$k = \left|\frac{\log_2 C}{\beta} + \frac{b}{\beta}\right|_{\text{odd}}$$
where C denotes the number of feature channels, β = 2 and b = 1 are two hyperparameters, and |·|_odd denotes taking the nearest odd integer.
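For illustration, a minimal Python sketch of this kernel-size rule; the function name is ours, and the nearest-odd rounding follows the ECA-style channel attention that this formula matches:

```python
import math

def eca_kernel_size(channels: int, beta: int = 2, b: int = 1) -> int:
    """Adaptive kernel size k = |log2(C)/beta + b/beta|_odd, with beta = 2
    and b = 1 as stated in the text; the result is forced to the nearest
    odd integer so the 1-D convolution has a symmetric neighborhood."""
    t = int(abs(math.log2(channels) / beta + b / beta))
    return t if t % 2 == 1 else t + 1

# e.g. eca_kernel_size(64) == 3, eca_kernel_size(256) == 5
```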
The feature F2 output by the channel attention module is input to the spatial attention module, which applies global maximum pooling (Max Pool) and global average pooling (Mean Pool) along the channel dimension to F2, obtaining two 1 × W × H feature maps. The two 1 × W × H feature maps are then concatenated (Concat) along the channel axis; a convolution with a 7 × 7 kernel reduces the concatenation result to a one-channel map of size 1 × W × H, and the activation function sigmoid generates the spatial attention weight. Finally, the spatial attention weight is multiplied by the input feature F2 to obtain the output feature F3 of the spatial attention module. Feature F3 is the output of the CSA module and the input of the conv module of the classification and positioning part of the network model.
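The following PyTorch sketch shows one plausible implementation of the CSA module as described (channel attention followed in series by spatial attention). Class and variable names are ours, and the two pooled branches sharing a single Conv1d is an assumption; the text does not say whether the two fast one-dimensional convolutions share weights:

```python
import math
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel branch: global max/avg pooling over H and W, a shared fast
    1-D convolution of adaptive kernel size k, sigmoid, then reweighting."""
    def __init__(self, channels: int, beta: int = 2, b: int = 1):
        super().__init__()
        t = int(abs(math.log2(channels) / beta + b / beta))
        k = t if t % 2 == 1 else t + 1             # nearest odd kernel size
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                          # x: (N, C, H, W) = F1
        max_p = torch.amax(x, dim=(2, 3))          # (N, C)
        avg_p = torch.mean(x, dim=(2, 3))          # (N, C)
        attn = self.conv(max_p.unsqueeze(1)) + self.conv(avg_p.unsqueeze(1))
        attn = self.sigmoid(attn).squeeze(1)       # (N, C) channel attention
        return x * attn[:, :, None, None]          # F2

class SpatialAttention(nn.Module):
    """Spatial branch: per-pixel max/mean over channels, concat, 7x7 conv
    down to one channel, sigmoid, then reweighting."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                          # x: (N, C, H, W) = F2
        max_p = torch.amax(x, dim=1, keepdim=True) # (N, 1, H, W)
        avg_p = torch.mean(x, dim=1, keepdim=True) # (N, 1, H, W)
        attn = self.sigmoid(self.conv(torch.cat([max_p, avg_p], dim=1)))
        return x * attn                            # F3

class CSA(nn.Module):
    """Channel attention followed in series by spatial attention."""
    def __init__(self, channels: int):
        super().__init__()
        self.channel = ChannelAttention(channels)
        self.spatial = SpatialAttention()

    def forward(self, x):
        return self.spatial(self.channel(x))

# e.g. CSA(256)(torch.randn(1, 256, 19, 19)).shape == (1, 256, 19, 19)
```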
The third step: training improved Yolov5 network model
3.1 image dataset preprocessing
Preprocessing the training set in a Mosaic data enhancement mode;
3.2 parameter settings
Initializing all weight values, bias values and batch normalization scale factor values, setting the initial learning rate and batch_size of the network, and inputting the initialized parameter data into the network; dynamically adjusting the learning rate and the number of iterations according to the change of the training loss so as to update the parameters of the whole network. The training is divided into two stages: the first stage is the first 100 epochs of training, with the initial learning rate fixed at 0.001 to accelerate convergence; the second stage covers the epochs after epoch 100, with the learning rate set to 0.0001.
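A minimal sketch of the two-stage schedule just described; the loss-driven dynamic adjustment is not modelled here, and the function name is ours:

```python
def learning_rate(epoch: int) -> float:
    """Two-stage schedule: 0.001 for the first 100 epochs to accelerate
    convergence, 0.0001 afterwards."""
    return 1e-3 if epoch < 100 else 1e-4

# hypothetical use with a PyTorch optimizer:
# for g in optimizer.param_groups:
#     g["lr"] = learning_rate(epoch)
```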
3.3 network model training
The preprocessed training set is input into the improved YOLOv5 network model with the initialization parameters set in the second step for feature extraction. Anchor boxes are automatically generated for the training-set images by K-means clustering, the anchor box sizes (which can be scaled proportionally with the image scaling) serve as prior boxes, and bounding boxes are obtained through box regression prediction. A logistic classifier then classifies the bounding boxes, yielding the defect-class probability for each bounding box. The defect-class probabilities of all bounding boxes are sorted by non-maximum suppression (NMS), and the defect class of each bounding box is determined to obtain the predicted value; the predicted value contains the defect class and defect position information, and the non-maximum suppression threshold is 0.5. The loss between the predicted value and the true value is then computed with the GIoU loss function; back-propagation is performed according to the training loss to update the parameters of the backbone network and the classification-regression network until the loss meets the preset value, completing the training of the network model parameters.
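A sketch of anchor generation by K-means clustering; the patent only states that K-means is used, so the 1 − IoU assignment below is a common choice for YOLO anchor generation rather than the confirmed one, and the function name is ours:

```python
import numpy as np

def kmeans_anchors(wh, k=9, iters=100, seed=0):
    """Cluster ground-truth (width, height) pairs into k anchor sizes."""
    wh = np.asarray(wh, dtype=np.float64)              # (N, 2) box sizes
    rng = np.random.default_rng(seed)
    anchors = wh[rng.choice(len(wh), size=k, replace=False)].copy()
    for _ in range(iters):
        # IoU between every box and every anchor, boxes aligned at origin
        inter = (np.minimum(wh[:, None, 0], anchors[None, :, 0])
                 * np.minimum(wh[:, None, 1], anchors[None, :, 1]))
        union = wh[:, None].prod(-1) + anchors[None, :].prod(-1) - inter
        assign = np.argmax(inter / union, axis=1)      # nearest anchor by IoU
        for j in range(k):
            if np.any(assign == j):
                anchors[j] = wh[assign == j].mean(axis=0)
    return anchors[np.argsort(anchors.prod(axis=1))]   # sorted by area
```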
3.4 network model testing
Inputting the verification set into the network model which completes parameter training in the step 3.3 to obtain a tensor prediction value of the verification set; comparing the predicted value of the tensor with the labeling information, and testing the reliability of the network model; and evaluating the network model by using the AP, wherein the network model is tested to be reliable when the AP is not less than 85%.
The fourth step: strip steel surface defect detection
The surface image of the strip steel to be detected is subjected to the same size normalization as in step 1.2 of the first step and is then input into the network model tested as reliable in the third step, yielding the defect tensor information of the image, including the defect position, the defect type and the confidence (the maximum of the defect-class probabilities).
On the input side, the YOLOv5 model enriches the image content through Mosaic data enhancement, improving detection of faint defects and small defect regions.
The Mosaic data enhancement has the following characteristics:
First, a batch of images is taken from the training sample set (Batch refers to a batch; batch_size, the batch size, is a model hyper-parameter, set to 32 in this embodiment). Next, 4 images are randomly selected from the batch, and each of the four images is randomly operated on by changing the color gamut, shrinking, flipping and/or cropping, with at least one operation applied per image. The 4 images are then arranged in the order top-left, bottom-left, top-right, bottom-right and stitched into a new image of the same size as the original, unoperated images, namely 608 × 608 × 3. The above operations are repeated, with the number of cycles set equal to the batch size.
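A simplified NumPy sketch of the Mosaic stitching described above; bounding-box label handling is omitted, and the random operations are reduced to a horizontal flip for brevity (the function name is ours):

```python
import random
import numpy as np

def mosaic(batch, size=608):
    """Pick 4 images from the batch, apply a random horizontal flip
    (standing in for the colour-gamut / shrink / flip / crop operations),
    and tile them top-left, bottom-left, top-right, bottom-right into one
    size x size x 3 image."""
    imgs = random.sample(batch, 4)              # batch: list of HxWx3 arrays
    half = size // 2
    canvas = np.zeros((size, size, 3), dtype=np.uint8)
    offsets = [(0, 0), (half, 0), (0, half), (half, half)]  # (y, x): TL, BL, TR, BR
    for img, (y, x) in zip(imgs, offsets):
        if random.random() < 0.5:
            img = img[:, ::-1]                  # random horizontal flip
        ys = np.arange(half) * img.shape[0] // half   # nearest-neighbour resize
        xs = np.arange(half) * img.shape[1] // half
        canvas[y:y + half, x:x + half] = img[ys][:, xs]
    return canvas
```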
The Focus module is followed by a structure consisting of three groups of CBL convolution blocks (Convolution - Batch Normalization - Leaky ReLU activation, CBL) and cross-stage partial network CSP1_X modules, then by a spatial pyramid pooling (SPP) module, together forming the feature extraction network. The Focus module performs a slicing operation on the image; after the Focus module, the image is fed into the CBL-plus-CSP1_X structure, and the SPP module then fuses low-level and high-level features.
The Focus module has the following characteristics:
the method comprises the steps of firstly marking an original image with the size of 608 multiplied by 3 by four numbers of 1, 2, 3 and 4, secondly combining pixels with the same number into 4 parts with the size of 304 multiplied by 3, then splicing the 4 parts into a feature map with the size of 304 multiplied by 12 in the depth direction according to the number size, and then connecting a CBL convolution structure.
The CBL convolution block contained in the Focus module has the following characteristics: the convolution (conv) uses 64 kernels of size 3 × 3 with a stride of 1.
The CSP1_ X module has the following features:
X denotes the number of residual structures; apart from the number of residual structures, the modules have the same architecture.
Taking the CSP1_3 module as an example: the input feature map first undergoes a CBL convolution operation; the result is fed into 3 residual structures, and the features passing through the residual structures are convolved; the resulting new feature map is concatenated (concat) in the depth direction with the feature map obtained by directly convolving the input feature map; finally, after batch normalization, the Leaky ReLU activation function and one CBL convolution block, the result is passed to the next module.
In the CSP1_X module, the conv layer of the CBL block that directly convolves the input feature map has the same dimensions as the conv layer in the last CBL block of the module: kernel size 1 × 1, stride 1.
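A PyTorch sketch of one plausible reading of the CSP1_X structure described above; the helper names and channel widths are assumptions, not the patent's confirmed implementation:

```python
import torch
import torch.nn as nn

def cbl(c_in, c_out, k=1, s=1):
    """CBL block: Convolution -> Batch Normalization -> Leaky ReLU."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1, inplace=True))

class Residual(nn.Module):
    """Residual unit: 1x1 CBL -> 3x3 CBL, added to the input."""
    def __init__(self, c):
        super().__init__()
        self.block = nn.Sequential(cbl(c, c, 1), cbl(c, c, 3))
    def forward(self, x):
        return x + self.block(x)

class CSP1_X(nn.Module):
    """Main branch: CBL -> X residual units -> 1x1 conv. Shortcut branch:
    1x1 conv on the raw input. Depth-wise concat, then BN, Leaky ReLU
    and a final CBL block, as in the description above."""
    def __init__(self, c_in, c_out, x=3):
        super().__init__()
        c = c_out // 2
        self.main = nn.Sequential(
            cbl(c_in, c, 1),
            *[Residual(c) for _ in range(x)],
            nn.Conv2d(c, c, 1, bias=False))
        self.shortcut = nn.Conv2d(c_in, c, 1, bias=False)
        self.post = nn.Sequential(
            nn.BatchNorm2d(2 * c),
            nn.LeakyReLU(0.1, inplace=True),
            cbl(2 * c, c_out, 1))
    def forward(self, x):
        return self.post(torch.cat([self.main(x), self.shortcut(x)], dim=1))
```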
Design of the feature re-processing network: it adopts a feature pyramid network (FPN) plus path aggregation network (PAN) structure, where the FPN is formed by two groups of CSP2_3 module, CBL convolution block and up-sampling (upsample) structures connected in series; the PAN contains two CBL convolution blocks, which down-sample the data.
The output of each up-sampling structure in the FPN is tensor-concatenated (concat) in the depth direction with the output feature maps of the CSP1_9-1 and CSP1_9-2 modules in the feature extraction network; meanwhile, the output of each CBL convolution block in the FPN is concatenated in the depth direction with the feature map of corresponding size from the CBL convolution blocks in the PAN. After each CBL convolution block of the PAN structure, a CSP2_3 module and an SPP module are added, and a CSP2_3 module and an SPP module are also added before the first CBL convolution block of the PAN. Neither the CSP2_3 module nor the SPP module changes the feature map size: the output of the CSP2_3-3 module in the Neck is 76 × 76 and is then passed through a CBL convolution block that changes the size to 38 × 38, and the output of the CSP2_3-4 module feeds the CBL convolution block that changes the size to 19 × 19.
The FPN also contains two CBL convolution blocks, but these use stride-1 convolutions and do not affect the feature map size; in the FPN it is the up-sampling (upsample) that changes the feature map size, allowing multi-scale target information to be learned.
The cross-stage partial network CSP2_3 module has the following characteristics:
the structure of the module is the same as the structure of each residual structure of the CSP1_3 module after the add fusion process is deleted. The CSP2_3 module comprises an initial CBL convolution structure, a tail CBL convolution structure and a plurality of repeating units, wherein 1 x 1 sized convolution layers are connected with 3 x 3 convolution layers through batch normalization and activation functions Leaky relu, the 3 x 3 convolution layers are connected with batch normalization and activation functions Leaky relu to form repeating units, the number of the repeating units is three, the input of the first repeating unit is connected with the output of the initial CBL convolution structure, the output of the three repeating units after being sequentially connected in series is connected with one layer of convolution layer, the output of the convolution layer and the original input of the original convolution layer are spliced after one layer of convolution layer, and the tail CBL convolution structure is connected through batch normalization and activation functions Leaky relu after splicing.
The CBL convolution blocks in the FPN have the following characteristics: the convolution kernels are all of size 1 × 1 and the strides are all 1.
The CBL convolution blocks in the PAN have the following characteristics: the convolution kernels are all of size 3 × 3 and the strides are all 2.
The SPP module consists of four parallel max-pooling layers with kernel sizes 1 × 1, 5 × 5, 9 × 9 and 13 × 13; the SPP module itself also contains two CBL convolution blocks, one at the beginning and one at the end.
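A PyTorch sketch of the SPP module as described; the channel widths are assumptions, and the beginning/end CBL blocks are simplified to plain 1 × 1 convolutions:

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Parallel max-pooling with kernels 1/5/9/13 (stride 1, padded so the
    spatial size is unchanged; the 1x1 pool is the identity branch),
    concatenated along channels between two 1x1 convolutions."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        c_mid = c_in // 2
        self.pre = nn.Conv2d(c_in, c_mid, kernel_size=1)
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in (5, 9, 13))
        self.post = nn.Conv2d(c_mid * 4, c_out, kernel_size=1)

    def forward(self, x):
        x = self.pre(x)
        return self.post(torch.cat([x] + [p(x) for p in self.pools], dim=1))

# e.g. SPP(512, 512)(torch.randn(1, 512, 19, 19)).shape == (1, 512, 19, 19)
```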
The output of the YOLOv5 model inherits the idea of YOLOv3 and performs detection on feature maps at 3 scales: 19 × 19, 38 × 38 and 76 × 76. Behind each added SPP module, a corresponding number of anchor boxes is assigned to each scale, generating anchor boxes for each cell of the feature map (9 anchor sizes in total); the optimal box is selected by weighted non-maximum suppression, and GIoU is returned to the network as the loss function for parameter training.
This embodiment is implemented on a CentOS 7.9.2 platform using Python programming. The computer used for training and testing the network model has the following configuration: Tesla V100 GPU, Intel Xeon(R) Gold 6271C CPU @ 2.6 GHz; the framework used is the PyTorch deep learning framework. The learning rate of the YOLOv5 model is set to λ = 0.01, and the number of training iterations is 500.
This embodiment uses AP (average precision) and mAP (mean average precision) for evaluation. In object detection, each category has an associated precision and recall. AP and mAP are the common evaluation indices of recent years: AP is the area under the precision-recall curve, where the P-R curve is drawn with precision on the y axis and recall on the x axis. Model quality is mainly judged by the size of the area under the curve. mAP is the average of the APs over multiple categories.
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
where TP is the number of positive cases correctly classified as positive, FP is the number of negative cases incorrectly classified as positive, and FN is the number of positive cases incorrectly classified as negative.
The AP calculation method comprises the following steps:
$$AP = \int_0^1 p(r)\,dr$$
where p denotes precision and r denotes recall.
In practice, precision and recall do not form continuous curves but finitely many discrete values. The discrete form is therefore computed:
$$AP = \sum_{k=1}^{N} p(k)\,\Delta r(k)$$
where N is the total number of images in the dataset to be detected, p(k) is the precision of the model when k images have been identified, and Δr(k) is the change in recall from k-1 to k identified images.
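A direct Python implementation of the discrete AP sum above (the function name is ours):

```python
import numpy as np

def average_precision(precision, recall):
    """Discrete AP: sum over ranked detections of p(k) * delta_r(k),
    i.e. the area under the P-R curve."""
    precision = np.asarray(precision, dtype=np.float64)
    recall = np.asarray(recall, dtype=np.float64)
    dr = np.diff(recall, prepend=0.0)   # delta r(k) = r(k) - r(k-1)
    return float(np.sum(precision * dr))

# e.g. average_precision([1.0, 1.0, 2/3], [0.5, 1.0, 1.0]) == 1.0
```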
In object detection, whether the model correctly detects a target is usually judged by the overlap between the predicted box and the ground-truth box; this overlap is called IoU (Intersection over Union). The IoU threshold is generally set to 0.5: if the IoU computed for a prediction is greater than 0.5, the target is considered correctly detected.
The calculation formula is as follows:
$$IoU = \frac{|A \cap B|}{|A \cup B|}$$
where A ∩ B denotes the overlap area of the prediction box and the target box, and A ∪ B denotes the area of their union.
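A minimal Python implementation of this IoU computation for axis-aligned boxes (the function name and corner-coordinate convention are ours):

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# iou((0, 0, 2, 2), (1, 1, 3, 3)) == 1/7, below the 0.5 threshold
```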
In this embodiment, experiments were carried out on 4 types of strip steel surface defect images: block defects, scratches, oil spots and white spots. The recognition accuracy for white spots is about 87%, the recognition rates for all other defects exceed 90%, and the recognition rates for the two structurally similar defect types, block defects and oil spots, are even higher.
Anything not described in this specification follows the prior art.

Claims (4)

1. A strip steel surface defect detection method based on an improved YOLOv5 network is characterized by comprising the following steps:
the first step is as follows: image dataset acquisition
1.1, acquiring a surface image of the strip steel by using an industrial camera, and screening out a picture containing a defect; when the defect type in the screened defect image covers the known type of the surface defect of the strip steel, a defect image set is formed;
1.2, carrying out size normalization operation on the defect picture set, and then manually labeling the pictures in the defect picture set by using Labelimg software to enable each defect picture to have a label of defect type and defect position coordinates;
1.3 randomly dividing not less than 60% of the marked defect picture set into a training set, and taking the rest as a verification set;
the second step: improved YOLOv5 network model
The improved YOLOv5 network model is characterized in that on the basis of a YOLOv5 network model, a CSA module is connected in series between three CSP23_ modules of a PAN and three conv modules of a classification and positioning part of the network model;
the CSA module comprises a channel attention module and a space attention module, wherein the two modules are connected in series, and the output of the channel attention module is the input of the space attention module;
the feature map F1 output by a CSP23_ module is input into the CSA module and is first processed by the channel attention module; the channel attention module applies global maximum pooling and global average pooling over the width and height dimensions to the input feature F1, obtaining two C × 1 × 1 feature maps; the two obtained C × 1 × 1 feature maps are each processed by a fast one-dimensional convolution with kernel size k, the results of the two fast one-dimensional convolutions are added, and sigmoid processing is applied to obtain the channel attention; the channel attention is multiplied by the original feature F1 to re-weight the features, giving the weighted feature F2;
the feature F2 output by the channel attention module is input to the spatial attention module; the spatial attention module applies global maximum pooling and global average pooling along the channel dimension to F2 to obtain two 1 × W × H feature maps; the two 1 × W × H feature maps are then concatenated along the channel axis, a convolution operation with a 7 × 7 kernel reduces the result to a one-channel map of size 1 × W × H, and the activation function sigmoid generates the spatial attention weight; finally, the spatial attention weight is multiplied by the input feature F2 to obtain the output feature F3 of the spatial attention module; feature F3 is the output of the CSA module and is also the input of the conv module of the classification and positioning part of the network model;
the third step: training improved Yolov5 network model
3.1 image dataset preprocessing
Preprocessing the training set in a Mosaic data enhancement mode;
3.2 parameter settings
initializing all weight values, bias values and batch normalization scale factor values, setting the initial learning rate and batch_size of the network, and inputting the initialized parameter data into the network; dynamically adjusting the learning rate and the number of iterations according to the change of the training loss so as to update the parameters of the whole network; the training is divided into two stages: the first stage is the first 100 epochs of training, with the initial learning rate fixed at 0.001 to accelerate convergence; the second stage covers the epochs after epoch 100, with the learning rate set to 0.0001;
3.3 network model training
inputting the preprocessed training set into the improved YOLOv5 network model with the initialization parameters set in the second step for feature extraction; anchor boxes are automatically generated for the training-set images by K-means clustering, the anchor box sizes serve as prior boxes, and bounding boxes are obtained through box regression prediction; a logistic classifier then classifies the bounding boxes, yielding the defect-class probability for each bounding box; the defect-class probabilities of all bounding boxes are sorted by the non-maximum suppression method, and the defect class of each bounding box is determined to obtain the predicted value; the predicted value contains the defect class and defect position information, and the non-maximum suppression threshold is 0.5; the loss between the predicted value and the true value is then computed with the GIoU loss function; back-propagation is performed according to the training loss to update the parameters of the backbone network and the classification-regression network until the loss meets the preset value, completing the training of the network model parameters;
3.4 network model testing
Inputting the verification set into the network model which completes parameter training in the step 3.3 to obtain a tensor prediction value of the verification set; comparing the predicted value of the tensor with the labeling information, and testing the reliability of the network model; evaluating the network model by using the AP, and testing the network model to be reliable when the AP is not less than 85%;
the fourth step: strip steel surface defect detection
the surface image of the strip steel to be detected is subjected to the same size normalization as in step 1.2 of the first step and is then input into the network model tested as reliable in the third step, yielding the defect tensor information of the strip steel surface image to be detected, including the defect position, the defect type and the confidence.
2. The strip steel surface defect detection method based on the improved YOLOv5 network as claimed in claim 1, wherein the convolution kernel size k of the fast one-dimensional convolution in the channel attention module represents the coverage of local cross-channel interaction, i.e. how many neighboring channels participate in the attention prediction for a given channel; the interaction coverage k is proportional to the channel dimension, and the specific calculation formula is:
$$k = \left|\frac{\log_2 C}{\beta} + \frac{b}{\beta}\right|_{\text{odd}}$$
where C denotes the number of feature channels, β = 2 and b = 1 are two hyperparameters, and |·|_odd denotes taking the nearest odd integer.
3. The method as claimed in claim 1, wherein in step 1.2 of the first step, the size normalization is performed to scale the image to 608 × 608 pixels.
4. The method for detecting the surface defects of the steel strip based on the improved YOLOv5 network as claimed in claim 1, wherein in step 1.3 of the first step, the training set is 80%, and the rest 20% is the verification set.
CN202210113743.4A 2022-01-30 2022-01-30 Strip steel surface defect detection method based on improved YOLOv5 network Pending CN114663346A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210113743.4A CN114663346A (en) 2022-01-30 2022-01-30 Strip steel surface defect detection method based on improved YOLOv5 network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210113743.4A CN114663346A (en) 2022-01-30 2022-01-30 Strip steel surface defect detection method based on improved YOLOv5 network

Publications (1)

Publication Number Publication Date
CN114663346A true CN114663346A (en) 2022-06-24

Family

ID=82025734

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210113743.4A Pending CN114663346A (en) 2022-01-30 2022-01-30 Strip steel surface defect detection method based on improved YOLOv5 network

Country Status (1)

Country Link
CN (1) CN114663346A (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115147375A (en) * 2022-07-04 2022-10-04 河海大学 Concrete surface defect characteristic detection method based on multi-scale attention
CN115205568A (en) * 2022-07-13 2022-10-18 昆明理工大学 Road traffic multi-factor detection method with multi-scale feature fusion
CN115205568B (en) * 2022-07-13 2024-04-19 昆明理工大学 Road traffic multi-element detection method based on multi-scale feature fusion
WO2024113541A1 (en) * 2022-11-30 2024-06-06 宁德时代新能源科技股份有限公司 Battery cell electrode sheet inspection method and apparatus, and electronic device
CN115861855A (en) * 2022-12-15 2023-03-28 福建亿山能源管理有限公司 Operation and maintenance monitoring method and system for photovoltaic power station
CN115861855B (en) * 2022-12-15 2023-10-24 福建亿山能源管理有限公司 Operation and maintenance monitoring method and system for photovoltaic power station
CN116342531B (en) * 2023-03-27 2024-01-19 中国十七冶集团有限公司 Device and method for detecting quality of welding seam of high-altitude steel structure of lightweight large-scale building
CN116342531A (en) * 2023-03-27 2023-06-27 中国十七冶集团有限公司 Light-weight large-scale building high-altitude steel structure weld defect identification model, weld quality detection device and method
CN116523902A (en) * 2023-06-21 2023-08-01 湖南盛鼎科技发展有限责任公司 Electronic powder coating uniformity detection method and device based on improved YOLOV5
CN116523902B (en) * 2023-06-21 2023-09-26 湖南盛鼎科技发展有限责任公司 Electronic powder coating uniformity detection method and device based on improved YOLOV5
CN116612124B (en) * 2023-07-21 2023-10-20 国网四川省电力公司电力科学研究院 Transmission line defect detection method based on double-branch serial mixed attention
CN116612124A (en) * 2023-07-21 2023-08-18 国网四川省电力公司电力科学研究院 Transmission line defect detection method based on double-branch serial mixed attention
CN116664558B (en) * 2023-07-28 2023-11-21 广东石油化工学院 Method, system and computer equipment for detecting surface defects of steel
CN116664558A (en) * 2023-07-28 2023-08-29 广东石油化工学院 Method, system and computer equipment for detecting surface defects of steel
CN117252899A (en) * 2023-09-26 2023-12-19 探维科技(苏州)有限公司 Target tracking method and device
CN117252899B (en) * 2023-09-26 2024-05-17 探维科技(苏州)有限公司 Target tracking method and device
CN117274263A (en) * 2023-11-22 2023-12-22 泸州通源电子科技有限公司 Display scar defect detection method
CN117274263B (en) * 2023-11-22 2024-01-26 泸州通源电子科技有限公司 Display scar defect detection method

Similar Documents

Publication Publication Date Title
CN114663346A (en) Strip steel surface defect detection method based on improved YOLOv5 network
CN111310862B (en) Image enhancement-based deep neural network license plate positioning method in complex environment
CN113160192B (en) Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background
CN107016413B (en) A kind of online stage division of tobacco leaf based on deep learning algorithm
CN112241699A (en) Object defect category identification method and device, computer equipment and storage medium
CN111160249A (en) Multi-class target detection method of optical remote sensing image based on cross-scale feature fusion
CN111815564B (en) Method and device for detecting silk ingots and silk ingot sorting system
CN112560675B (en) Bird visual target detection method combining YOLO and rotation-fusion strategy
CN112287941B (en) License plate recognition method based on automatic character region perception
CN113920107A (en) Insulator damage detection method based on improved yolov5 algorithm
CN112926652B (en) Fish fine granularity image recognition method based on deep learning
CN111242026A (en) Remote sensing image target detection method based on spatial hierarchy perception module and metric learning
CN113469950A (en) Method for diagnosing abnormal heating defect of composite insulator based on deep learning
CN114359245A (en) Method for detecting surface defects of products in industrial scene
CN116342536A (en) Aluminum strip surface defect detection method, system and equipment based on lightweight model
CN115239672A (en) Defect detection method and device, equipment and storage medium
CN116740758A (en) Bird image recognition method and system for preventing misjudgment
CN116071294A (en) Optical fiber surface defect detection method and device
CN111598854A (en) Complex texture small defect segmentation method based on rich robust convolution characteristic model
CN111540203A (en) Method for adjusting green light passing time based on fast-RCNN
CN114549489A (en) Carved lipstick quality inspection-oriented instance segmentation defect detection method
CN113516652A (en) Battery surface defect and adhesive detection method, device, medium and electronic equipment
CN116740572A (en) Marine vessel target detection method and system based on improved YOLOX
CN112348762A (en) Single image rain removing method for generating confrontation network based on multi-scale fusion
CN113887455B (en) Face mask detection system and method based on improved FCOS

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination