CN117576038A - Fabric flaw detection method and system based on YOLOv8 network - Google Patents

Fabric flaw detection method and system based on YOLOv8 network

Info

Publication number
CN117576038A
CN117576038A
Authority
CN
China
Prior art keywords: network, yolov8, feature, images, image
Prior art date
Legal status
Pending
Application number
CN202311553890.4A
Other languages
Chinese (zh)
Inventor
吕文涛
余凯
Current Assignee
Zhejiang Sci Tech University ZSTU
Original Assignee
Zhejiang Sci Tech University ZSTU
Priority date
Filing date
Publication date
Application filed by Zhejiang Sci Tech University ZSTU filed Critical Zhejiang Sci Tech University ZSTU
Priority to CN202311553890.4A priority Critical patent/CN117576038A/en
Publication of CN117576038A publication Critical patent/CN117576038A/en
Pending legal-status Critical Current

Classifications

    • G06T 7/0004 Image analysis > Inspection of images, e.g. flaw detection > Industrial image inspection
    • G06N 3/0464 Neural networks > Architecture, e.g. interconnection topology > Convolutional networks [CNN, ConvNet]
    • G06N 3/048 Neural networks > Architecture, e.g. interconnection topology > Activation functions
    • G06N 3/084 Neural networks > Learning methods > Backpropagation, e.g. using gradient descent
    • G06V 10/454 Extraction of image or video features > Local feature extraction > Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V 10/764 Recognition using pattern recognition or machine learning > using classification, e.g. of video objects
    • G06V 10/806 Recognition using pattern recognition or machine learning > Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V 10/82 Recognition using pattern recognition or machine learning > using neural networks
    • G06T 2207/20081 Indexing scheme for image analysis > Training; Learning
    • G06T 2207/20084 Indexing scheme for image analysis > Artificial neural networks [ANN]
    • G06T 2207/30124 Indexing scheme for image analysis > Industrial image inspection > Fabrics; Textile; Paper
    • Y02P 90/30 Climate change mitigation in the production or processing of goods > Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a fabric surface flaw detection method and system based on an improved YOLOv8 network, wherein the method comprises the following steps: S1, acquiring a fabric image dataset; S2, performing data enhancement; S3, dividing the dataset according to a preset proportion to obtain a training set, a verification set and a test set; S4, obtaining N prediction feature maps; S5, for the N prediction feature maps, predicting the probability that each pixel in the image belongs to an object to be detected and the boundary frame information of the object, and generating boundary frames; S6, calculating a network loss value according to the boundary frames and the GT frames of the corresponding pictures, and updating the parameters of the YOLOv8 network by using a gradient descent method; S7, repeating steps S4-S6 until all pictures in the training set have been input into the network once, and performing loop iteration until the mAP stabilizes at a certain value, thereby obtaining a trained YOLOv8 network; S8, predicting all images in the test set with the trained YOLOv8 network to obtain prediction frames on the feature maps, and mapping the prediction frames to the corresponding original images according to the proportional relation between the feature maps and the original images so as to locate the flaws.

Description

Fabric flaw detection method and system based on YOLOv8 network
Technical Field
The invention belongs to the technical field of fabric surface flaw detection, and particularly relates to a fabric surface flaw detection method and system based on an improved YOLOv8 model.
Background
Real-time automatic defect detection plays an important role in the textile industry, ensuring product quality and improving competitiveness. In the past, defect detection in fabric images was performed manually. Because fabric defects are of many types and the background textures are complex, detection work depends to a great extent on the inspector's experience and can hardly meet real-time requirements. Traditional computer-vision-based methods can improve detection to some degree, but they suffer from inherent drawbacks such as low precision, low speed, complex processing pipelines and poor generalization. In contrast, detection methods based on deep learning show great potential for solving these problems. In modern fabric production, specialized quality-inspection personnel are typically required to check the fabric surface for defects. However, this manual detection method suffers from low efficiency and high subjectivity: inspectors inevitably become fatigued, which may lead to false detections or missed detections and affects product quality and production efficiency. It is therefore highly desirable to develop an automated fabric defect detection technique.
In recent years, deep Convolutional Neural Networks (CNNs) have attracted a great deal of research in image classification, detection and segmentation tasks. In the field of target detection, much excellent work has also emerged, typified by the R-CNN and YOLO series of detection models, which are still being iterated on. The YOLO series is characterized by high detection speed and light weight, whereas existing fabric defect detection techniques still fall short in detection accuracy and speed; the invention therefore further improves the accuracy of the YOLOv8 algorithm while preserving detection efficiency.
Disclosure of Invention
Aiming at the defects existing in the prior art, the invention provides a fabric flaw detection method and system based on an improved YOLOv8 network: an E-ASPP module is introduced at specific positions of the network, and the downsampling of PANet (path aggregation network) is optimized, so as to solve problems such as complex background interference and low detection speed in the fabric flaw detection task.
The invention adopts the following technical scheme:
the fabric flaw detection method based on the YOLOv8 network comprises the following steps:
s1, acquiring a fabric image dataset;
s2, carrying out data enhancement on the fabric image data set to expand the data set so as to obtain a data set after fabric image enhancement;
s3, dividing the enhanced data set according to a preset proportion to respectively obtain a training set, a verification set and a test set;
s4, randomly selecting X images in a training set as input of a YOLOv8 network, obtaining N effective feature images with different scales after feature extraction of a backbone network, then further fusing the obtained effective feature images to obtain N fused feature images with different scales, and finally adjusting channels of the fused feature images to obtain N predicted feature images;
s5, directly predicting the probability that each pixel in the image belongs to the object to be detected and the boundary frame information of the object for N prediction feature images, and generating a boundary frame according to the information;
s6, calculating a network loss value according to the prediction boundary frame obtained in the step S5 and the GT frame of the corresponding picture, and updating parameters of the YOLOv8 network by using a gradient descent method;
s7, repeating the steps S4-S6 until all pictures in the training set are input into the network once; predicting each image of the verification set according to the YOLOv8 network with updated parameters, and outputting AP values of various categories in the verification set after statistics; performing loop iteration until the counted mAP is stabilized at a certain value, and obtaining a training-completed YOLOv8 network;
s8, predicting all images in the test set by using the trained YOLOv8 network to obtain a prediction frame on the feature map; and mapping the prediction frame to the corresponding original image according to the proportional relation between the feature image and the original image to position the flaw.
In a preferred embodiment, in step S1, each image corresponds to a labeling file in TXT format, in which the category and position of each flaw in the image are labeled.
Preferably, in step S1, the labeled category and position constitute a true box, which is called the Ground Truth (GT) box.
Preferably, the data set obtained in step S1 includes a plurality of fabric images I and the corresponding label files. The fabric image size is 1024 × 1024 pixels. Each label file is a txt-format file that records the target position information and category information in the image.
In step S2, the fabric dataset is expanded by using Mosaic data enhancement, namely randomly selecting four images from the training set, randomly cropping them, and splicing them into one image as new training data. Specifically, the Mosaic data enhancement is combined with MixUp data enhancement applied with 20% probability to augment the fabric image dataset, thereby realizing expansion of the dataset.
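For illustration only (this is not code from the patent), the following is a minimal sketch of the Mosaic-plus-MixUp augmentation described above; it assumes all images are 1024×1024 as stated in the text, omits the corresponding adjustment of the flaw box labels for brevity, and the helper names and the MixUp blending distribution are assumptions:

```python
import random
import numpy as np

def mosaic(images, out_size=1024):
    """Splice random crops of four images into one mosaic image (HWC uint8 arrays)."""
    cx = random.randint(out_size // 4, 3 * out_size // 4)   # random split point of the canvas
    cy = random.randint(out_size // 4, 3 * out_size // 4)
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    regions = [(0, 0, cx, cy), (cx, 0, out_size, cy), (0, cy, cx, out_size), (cx, cy, out_size, out_size)]
    for img, (x1, y1, x2, y2) in zip(images, regions):
        h, w = y2 - y1, x2 - x1
        top = random.randint(0, img.shape[0] - h)            # random crop inside the source image
        left = random.randint(0, img.shape[1] - w)
        canvas[y1:y2, x1:x2] = img[top:top + h, left:left + w]
    return canvas

def augment(dataset, mixup_prob=0.2):
    """Mosaic four random training images, then blend with a fifth image (MixUp) with 20% probability."""
    imgs = [dataset[random.randrange(len(dataset))] for _ in range(4)]
    out = mosaic(imgs)
    if random.random() < mixup_prob:
        other = dataset[random.randrange(len(dataset))]
        lam = np.random.beta(8.0, 8.0)                        # blending ratio; the patent does not specify it
        out = (lam * out + (1.0 - lam) * other).astype(np.uint8)
    return out
```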
In a preferred scheme, in step S3, the ratio of the training set, verification set and test set is 8:1:1, and the train.txt, val.txt and test.txt files are generated to store the corresponding image lists.
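A minimal sketch of this 8:1:1 split and list-file generation (the image directory layout, file extension and helper name are assumptions):

```python
import random
from pathlib import Path

def split_dataset(image_dir, out_dir=".", ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle the image list and write train.txt / val.txt / test.txt according to an 8:1:1 split."""
    names = sorted(str(p) for p in Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(names)
    n_train = int(len(names) * ratios[0])
    n_val = int(len(names) * ratios[1])
    splits = {
        "train.txt": names[:n_train],
        "val.txt": names[n_train:n_train + n_val],
        "test.txt": names[n_train + n_val:],
    }
    for fname, items in splits.items():
        Path(out_dir, fname).write_text("\n".join(items) + "\n")
```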
Preferably, the step S4 specifically includes the following steps:
S4.1, randomly selecting X images from the training set and inputting them into the backbone network Darknet-53 of YOLOv8 for stage-by-stage feature extraction, and taking out the three effective feature maps with different scales and channel numbers from the deepest layers, with scales of 20×20, 40×40 and 80×80 respectively, wherein the backbone network Darknet-53 comprises two Conv modules, four C2f modules and an E-ASPP module connected in sequence;
S4.2, further fusing the three effective feature maps obtained in step S4.1 (called M5, M4 and M3 from small to large according to scale) so that deep and shallow features are fully mixed through top-down and bottom-up fusion, the scales of the output feature maps P5, P4 and P3 of each layer of the feature fusion module being consistent with those of the input feature maps M5, M4 and M3 (20×20, 40×40 and 80×80 respectively), wherein the improved PANet is a path aggregation network comprising top-down and bottom-up fusion paths in which a depthwise separable convolution is introduced after E-ASPP to replace the original ordinary convolution, thereby realizing the downsampling process;
S4.3, adjusting the number of channels of the three fused feature maps output by the PANet to 3×(5+num_class) through the lightweight YOLO Head and outputting N prediction feature maps, wherein the lightweight YOLO Head is obtained by changing the 3×3 convolution in the existing YOLO Head into a 3×3 group convolution, and num_class represents the number of categories.
Preferably, the step S4.2 specifically includes the following steps:
s4.2.1, the feature map M5 is processed by an E-ASPP module to obtain a feature map K5, the feature map K5 is up-sampled and fused with the feature map M4, and the fusion result is input into a C2f module to obtain a feature map K4;
s4.2.2, up-sampling the feature map K4 and fusing with the feature map M3, and inputting the fusion result into a C2f module to obtain the shallowest layer output feature map P3;
s4.2.3, downsampling the feature map P3, fusing the feature map P3 with the feature map M4 and the feature map K4, and inputting the fusion result into a C2f module to obtain an intermediate layer output feature map P4;
s4.2.4, downsampling the feature map P4, fusing with the feature map K5, and inputting the fusion result into the C2f module to obtain the deepest output feature map P5.
As a preferable scheme, step S6 is specifically as follows: first, the corresponding prior frame is found based on the GT frame position, and the GT and label information are converted into a vector of length 5+num_class; the loss is then calculated against each prior frame vector on the prediction feature maps. The method comprises the following steps:
calculating the intersection-over-union (IoU) loss according to the prediction frames and the corresponding GT frames, calculating the classification confidence loss and the frame confidence loss according to the classification confidence and the frame confidence of each prediction frame contained in the network output feature maps, weighting and summing the IoU loss, the classification confidence loss and the frame confidence loss according to a preset proportion to obtain the overall network loss, and carrying out back propagation to optimize the network parameters.
Preferably, in step S7: the process of feeding the pictures of the whole training set through the network once for forward propagation and backward optimization of the network parameters is called an epoch. After each epoch, each picture in the verification set is predicted with the parameter-updated network, and the AP index of each category on the verification set is counted from these predictions and the GT. The network is judged to have converged when the AP values remain unchanged or show a downward trend over several consecutive rounds; otherwise the next epoch of training continues.
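Before moving on to the training schedule, here is a minimal sketch of the weighted loss combination described for step S6 above; the box format, the loss weights and the use of a plain IoU term are assumptions, since the patent only states that the three losses are weighted and summed:

```python
import torch

def iou(box_a: torch.Tensor, box_b: torch.Tensor) -> torch.Tensor:
    """IoU of matched axis-aligned boxes given as (x1, y1, x2, y2), both of shape (N, 4)."""
    lt = torch.max(box_a[:, :2], box_b[:, :2])
    rb = torch.min(box_a[:, 2:], box_b[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area_a = (box_a[:, 2:] - box_a[:, :2]).prod(dim=1)
    area_b = (box_b[:, 2:] - box_b[:, :2]).prod(dim=1)
    return inter / (area_a + area_b - inter + 1e-7)

def total_loss(pred_boxes, gt_boxes, cls_conf_loss, box_conf_loss,
               w_iou=1.0, w_cls=1.0, w_box=1.0):
    """Weighted sum of IoU loss, classification-confidence loss and frame-confidence loss."""
    iou_loss = (1.0 - iou(pred_boxes, gt_boxes)).mean()
    return w_iou * iou_loss + w_cls * cls_conf_loss + w_box * box_conf_loss
```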
In step S7, specifically, YOLO v8 network model training is performed, which specifically includes the following steps:
a: configuring a network environment, wherein the Python version is 3.8, the deep learning framework is PyTorch 1.8, and accelerating by using CUDA;
b: setting the initial learning rate to be 0.001, and setting a learning rate adjustment strategy to be cosine annealing attenuation;
c: setting the number of images of each batch of input networks to be 8;
d: the network does not use pre-trained weights; the overall network loss is calculated after each period (epoch) of the training process ends; loop iteration is performed until the mAP (mean Average Precision) index on the verification set stabilizes at a certain value, at which point training of the YOLOv8 network stops.
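Under settings a to d above, a hedged sketch of the corresponding training loop; the optimizer type, epoch budget and model/dataloader interfaces are assumptions not specified in the text:

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

def train(model, train_loader, max_epochs=300, device="cuda"):
    """Train from scratch (no pre-trained weights) with lr=0.001, cosine annealing and batch size 8."""
    model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    scheduler = CosineAnnealingLR(optimizer, T_max=max_epochs)
    for epoch in range(max_epochs):
        model.train()
        for images, targets in train_loader:          # train_loader is built with batch_size=8
            optimizer.zero_grad()
            loss = model(images.to(device), targets)  # assumed: the model returns the overall loss in training mode
            loss.backward()
            optimizer.step()
        scheduler.step()
        # after each epoch, evaluate per-class AP / mAP on the verification set and
        # stop once the mAP has stabilized at a certain value
```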
Preferably, in step S8: for each image, the network outputs the corresponding N prediction feature maps, directly predicting the probability that each pixel in the image belongs to an object to be detected and the boundary frame information of the object; boundary frames are generated from this information to obtain all prediction frames of each image, and redundant frames are removed by non-maximum suppression (NMS) to obtain the prediction frames on the feature maps. Finally, the prediction frames at the feature-map scale are mapped to the original-image scale according to the proportional relation.
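A minimal sketch of this confidence-sort-plus-NMS post-processing (the IoU threshold is an assumption; a reference implementation such as torchvision.ops.nms could be used instead):

```python
import torch

def nms(boxes: torch.Tensor, scores: torch.Tensor, iou_thresh: float = 0.5) -> torch.Tensor:
    """Greedy NMS: sort by confidence, keep a frame, drop later frames that overlap it above iou_thresh."""
    order = scores.argsort(descending=True)           # sort prediction frames by confidence score
    keep = []
    while order.numel() > 0:
        i = order[0]
        keep.append(i.item())
        if order.numel() == 1:
            break
        rest = boxes[order[1:]]
        lt = torch.max(boxes[i, :2], rest[:, :2])
        rb = torch.min(boxes[i, 2:], rest[:, 2:])
        inter = (rb - lt).clamp(min=0).prod(dim=1)
        area_i = (boxes[i, 2:] - boxes[i, :2]).prod()
        area_r = (rest[:, 2:] - rest[:, :2]).prod(dim=1)
        iou = inter / (area_i + area_r - inter + 1e-7)
        order = order[1:][iou <= iou_thresh]           # suppress redundant frames
    return torch.tensor(keep, dtype=torch.long)
```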
In a preferred embodiment, in step S8, obtaining the final prediction frames from all the adjusted test-set prediction frames specifically includes the following steps:
S8.1, sorting all the adjusted prediction frames according to confidence scores;
S8.2, removing redundant frames from all the adjusted prediction frames by using Non-Maximum Suppression (NMS) to obtain the final prediction frames.
The invention also discloses a fabric flaw detection system based on the YOLOv8 network, which comprises the following modules:
An image acquisition module: collecting a fabric image dataset;
A data enhancement module: performing data enhancement on the fabric image dataset to augment the dataset;
A data set dividing module: dividing the enhanced data set according to a preset proportion to obtain a training set, a verification set and a test set;
A prediction module: randomly selecting X images in the training set as input of the YOLOv8 network, obtaining N effective feature images with different scales after feature extraction of the backbone network, then fusing the obtained effective feature images to obtain N fused feature images with different scales, and finally adjusting channels of the fused feature images to obtain N predicted feature images;
A boundary box generation module: for the N prediction feature graphs, generating a boundary frame according to the probability that each pixel in the prediction image belongs to the object to be detected and the boundary frame information of the object;
An updating module: calculating a network loss value according to the obtained boundary frame and the GT frame of the corresponding picture, and updating parameters of the YOLOv8 network by using a gradient descent method;
An iteration module: inputting all pictures in the training set into the network once; predicting each image of the verification set according to the YOLOv8 network after parameter updating, and outputting AP values of various categories in the verification set after statistics; performing loop iteration until the counted mAP is stabilized at a certain value, and obtaining a training-completed YOLOv8 network;
A mapping module: predicting all images in the test set by using the trained YOLOv8 network to obtain a prediction frame on the feature map; and mapping the prediction frame to the corresponding original image according to the proportional relation between the feature image and the original image so as to position the flaw.
The beneficial effects of the invention are as follows:
(1) The method improves the YOLOv8 network: redundancy is removed from, and context features are extracted on, the deep feature map of the backbone network, which accelerates the network without affecting its detection accuracy.
(2) Considering that the downsampling operation of the prior-art feature fusion module PANet is inefficient, a lightweight downsampling DWConv (depthwise separable convolution) with the same function is introduced to improve the efficiency of network feature fusion. In addition, the invention screens out features that are more beneficial to the detection task by introducing a lightweight and efficient atrous spatial pyramid pooling module (Efficient Atrous Spatial Pyramid Pooling, E-ASPP) at the deepest point of the backbone network. In particular, the activation function in the deep layers of the network is replaced with the hardware-friendly Hard-Swish to further accelerate the network.
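For illustration, a minimal PyTorch sketch of such a depthwise-separable downsampling block combined with the Hard-Swish activation (channel widths in the example are assumptions):

```python
import torch
import torch.nn as nn

class DWConvDown(nn.Module):
    """Depthwise-separable stride-2 convolution used as a lightweight downsampling block."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.block = nn.Sequential(
            # depthwise 3x3 with stride 2 halves the spatial resolution at low cost
            nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=2, padding=1,
                      groups=in_channels, bias=False),
            nn.BatchNorm2d(in_channels),
            nn.Hardswish(inplace=True),
            # pointwise 1x1 mixes channels and sets the output width
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.Hardswish(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

# e.g. down = DWConvDown(256, 512); y = down(torch.randn(1, 256, 80, 80))  # -> (1, 512, 40, 40)
```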
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a fabric flaw detection method based on a YOLOv8 network in accordance with a preferred embodiment of the present invention;
FIG. 2 is a block diagram of an improved E-ASPP proposed by the present invention;
FIG. 3 is a schematic diagram of the improved YOLOv8 structure of the present invention;
FIG. 4 is a graph of loss and mAP for the training process of the present invention;
fig. 5 is a block diagram of a fabric defect detection system based on YOLOv8 network in accordance with a preferred embodiment of the present invention.
Detailed Description
The following specific examples are presented to illustrate the present invention, and those skilled in the art will readily appreciate additional advantages and capabilities of the invention disclosed herein. The invention may also be practiced or carried out in other, different embodiments, and the details of the present description may be modified or varied in various ways without departing from the spirit and scope of the present invention. It should be noted that, in the absence of conflict, the following embodiments and the features in the embodiments may be combined with each other.
This embodiment provides a fabric defect detection method based on an improved YOLOv8 network; FIG. 1 shows a flow chart of the method, and FIG. 2 shows a block diagram of the improved E-ASPP module. Images are processed according to the flow of the method of this embodiment so as to describe in detail how the method of the invention improves detection accuracy.
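The internal structure of E-ASPP is given only in FIG. 2 and is not reproduced in the text; purely as an illustration of the general atrous-spatial-pyramid-pooling idea it builds on, a generic block might look like the following sketch, where the dilation rates, channel widths and activation are assumptions:

```python
import torch
import torch.nn as nn

class ASPPSketch(nn.Module):
    """Generic ASPP-style block: parallel atrous convolutions at several dilation rates,
    concatenated and projected back to the target width."""
    def __init__(self, in_channels, out_channels, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=3,
                          padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.Hardswish(inplace=True),
            )
            for r in rates
        ])
        self.project = nn.Conv2d(out_channels * len(rates), out_channels, kernel_size=1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))
```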
The fabric flaw detection method based on the improved YOLOv8 network comprises the following steps:
s1, acquiring a fabric image data set, wherein each image corresponds to a TXT format annotation file, and each flaw in the image is annotated with category and position;
s2, carrying out data enhancement on the fabric image data set to expand the data set so as to obtain a data set after fabric image enhancement;
s3, dividing the enhanced data set according to a preset proportion to respectively obtain a training set, a verification set and a test set;
s4, randomly selecting X images in a training set as input of a YOLOv8 network, obtaining N effective feature images with different scales after feature extraction of a backbone network, then further fusing the obtained effective feature images by a feature aggregation module to obtain N fused feature images with different scales, and finally adjusting channels of the fused feature images to obtain N predicted feature images;
s5, directly predicting the probability that each pixel in the image belongs to the object to be detected and the boundary frame information of the object for N prediction feature images, and generating a boundary frame according to the information;
s6, calculating a network loss value according to the prediction frame obtained in the step S5 and the GT frame of the corresponding picture, and updating parameters of the YOLOv8 network by using a gradient descent method;
s7, repeating the steps S4-S6 until all pictures in the training set are input into the network once. Predicting each image of the verification set according to the YOLOv8 network with updated parameters, and outputting AP values of various categories in the verification set after statistics; repeatedly executing until the counted mAP is stabilized at a certain value, and obtaining a training-completed YOLOv8 network;
and S8, predicting all images in the test set by using the trained YOLOv8 network to obtain a prediction frame on the feature map. And mapping the prediction frame to the corresponding original image according to the proportional relation between the feature image and the original image to position the flaw. The steps of this embodiment are described in more detail below.
The data set obtained in step S1 specifically includes a plurality of fabric images I and corresponding tag files. The fabric image size is 1024 x 1024 pixels. The tag file is a txt format file that records target position information and category information in an image.
In step S2, the fabric dataset is expanded by using Mosaic data enhancement, namely randomly selecting four images from the training set, randomly cropping them, and splicing them into one image as new training data.
In step S3, the dividing ratio of the training set, verification set and test set is set to 8:1:1, and the train.txt, val.txt and test.txt files are generated to save the corresponding image lists.
The step S4 specifically includes the following steps:
S4.1, randomly taking X pictures from the training set and inputting them into the backbone network Darknet-53 of YOLOv8 for multi-scale feature extraction, and outputting three effective feature maps with different scales of 20×20, 40×40 and 80×80, wherein the proposed Darknet-53 consists of two Conv modules, four C2f modules and an E-ASPP module;
S4.2, the three effective feature maps with different scales extracted by the backbone network are further fused through PANet, and the output feature maps correspond in scale to the output feature maps of the backbone network (20×20, 40×40 and 80×80 respectively); the improved PANet introduces a depthwise separable convolution after E-ASPP to replace the original downsampling module, and PANet is a path aggregation network comprising top-down and bottom-up fusion paths;
S4.3, the number of channels of the fused feature maps is adjusted to 3×(5+num_class) through the lightweight YOLO Head, and N prediction feature maps are output, wherein the lightweight YOLO Head is obtained by changing the 3×3 convolution in the original YOLO Head into a 3×3 group convolution, and num_class represents the number of categories.
In the step S6, the overall network loss is calculated according to the network output feature maps, the prediction frames and the corresponding GT frames, specifically:
calculating the intersection-over-union (IoU) loss according to the prediction frames and the corresponding GT frames, calculating the classification confidence loss and the frame confidence loss according to the classification confidence and the frame confidence of each prediction frame contained in the network output feature maps, weighting and summing the IoU loss, the classification confidence loss and the frame confidence loss according to a preset proportion to obtain the overall network loss, and carrying out back propagation to optimize the network parameters.
In the step S7, YOLO v8 network model training is performed, which specifically includes the following steps:
a: configuring a network environment, wherein the Python version is 3.8, the deep learning framework is PyTorch 1.8, and accelerating by using CUDA;
b: setting the initial learning rate to be 0.001, and setting a learning rate adjustment strategy to be cosine annealing attenuation;
c: setting the number of images of each batch of input networks to be 8;
d: the network does not use pre-trained weights; the overall network loss is calculated after each period (epoch) ends; loop iteration is performed until the mAP index of the verification set stabilizes at a certain value, at which point training of the YOLOv8 network stops.
Further, in step S8, obtaining a final prediction frame from all the adjusted test set prediction frames specifically includes the following steps:
s8.1, sorting all the adjusted prediction frames according to confidence scores;
and S8.2, removing redundant frames in all the adjusted prediction frames by using Non-maximum suppression (Non-Maximum Suppression, NMS) to obtain a final prediction frame.
In order to verify the performance of the method, the improved YOLOv8 network is used to predict the images in the test set, and the mean Average Precision (mAP) as well as the Precision and Recall corresponding to each category are calculated from the prediction results and the GT. As shown by the experimental results in FIG. 4, the method of the invention can detect various types of fabric flaws and achieves high accuracy.
The preferred embodiment of the invention has the following technical effects:
(1) The method improves YOLOv8: redundancy is removed from, and context features are extracted on, the deep feature map of the backbone network, which accelerates the network without affecting its detection accuracy.
(2) Considering that the downsampling operation of the original feature fusion module PANet is not efficient, a lightweight downsampling DWConv (depthwise separable convolution) with the same effect is introduced to improve the efficiency of network feature fusion. In addition, features that are more beneficial to the detection task are screened out by introducing a lightweight and efficient atrous spatial pyramid pooling module (Efficient Atrous Spatial Pyramid Pooling, E-ASPP) at the deepest point of the backbone network. In particular, the activation function in the deep layers of the network is replaced with the hardware-friendly Hard-Swish to further accelerate the network.
As shown in FIG. 5, this embodiment discloses a fabric defect detection system based on the YOLOv8 network; based on the above method embodiment, the fabric defect detection system includes the following modules:
An image acquisition module: collecting a fabric image dataset;
A data enhancement module: performing data enhancement on the fabric image dataset to augment the dataset;
A data set dividing module: dividing the enhanced data set according to a preset proportion to obtain a training set, a verification set and a test set;
A prediction module: randomly selecting X images in the training set as input of the YOLOv8 network, obtaining N effective feature images with different scales after feature extraction of the backbone network, then fusing the obtained effective feature images to obtain N fused feature images with different scales, and finally adjusting channels of the fused feature images to obtain N predicted feature images;
A boundary box generation module: for the N prediction feature graphs, generating a boundary frame according to the probability that each pixel in the prediction image belongs to the object to be detected and the boundary frame information of the object;
An updating module: calculating a network loss value according to the obtained boundary frame and the GT frame of the corresponding picture, and updating parameters of the YOLOv8 network by using a gradient descent method;
An iteration module: inputting all pictures in the training set into the network once; predicting each image of the verification set according to the YOLOv8 network after parameter updating, and outputting AP values of various categories in the verification set after statistics; performing loop iteration until the counted mAP is stabilized at a certain value, and obtaining a training-completed YOLOv8 network;
A mapping module: predicting all images in the test set by using the trained YOLOv8 network to obtain a prediction frame on the feature map; and mapping the prediction frame to the corresponding original image according to the proportional relation between the feature image and the original image so as to position the flaw.
In summary, the invention discloses a fabric surface flaw detection method and system based on improved YOLOv8. First, a fabric surface flaw dataset is adopted, the fabric flaw dataset is enhanced by using Mosaic data enhancement, and it is divided into a training set, a verification set and a test set according to a preset proportion; the E-ASPP structure module is embedded into the backbone network of YOLOv8 to improve the feature extraction capability, the feature fusion capability of the network is further enhanced by using a bidirectional feature pyramid in the path aggregation network, and the detection efficiency of the network is improved by using depthwise separable convolution; the network model is trained through the training set, loss calculation is carried out through the verification set in each period, and the overall performance of the model is tested through the test set after training is finished. According to the invention, the E-ASPP structure is introduced into the YOLOv8 network to improve precision, and at the same time the path aggregation network is simplified for the sake of the model's inference speed, so that the flaw recognition precision on fabric images is higher while the inference speed is guaranteed.
The above examples are merely illustrative of the preferred embodiments of the present invention and are not intended to limit the scope of the present invention, and various modifications and improvements made by those skilled in the art to the technical solution of the present invention should fall within the protection scope of the present invention without departing from the design spirit of the present invention.

Claims (9)

1. The fabric flaw detection method based on the YOLOv8 network is characterized by comprising the following steps of:
s1, acquiring a fabric image dataset;
s2, carrying out data enhancement on the fabric image data set to expand the data set;
s3, dividing the enhanced data set according to a preset proportion to obtain a training set, a verification set and a test set;
s4, randomly selecting X images in a training set as input of a YOLOv8 network, obtaining N effective feature images with different scales after feature extraction of a backbone network, then fusing the obtained effective feature images to obtain N fused feature images with different scales, and finally adjusting channels of the fused feature images to obtain N predicted feature images;
s5, for N prediction feature maps, predicting probability that each pixel in the image belongs to an object to be detected and boundary frame information of the object to generate a boundary frame;
s6, calculating a network loss value according to the boundary frame obtained in the step S5 and the GT frame of the corresponding picture, and updating parameters of the YOLOv8 network by using a gradient descent method;
s7, repeating the steps S4-S6 until all pictures in the training set are input into the network once; predicting each image of the verification set according to the YOLOv8 network after parameter updating, and outputting AP values of various categories in the verification set after statistics; performing loop iteration until the counted mAP is stabilized at a certain value, and obtaining a training-completed YOLOv8 network;
s8, predicting all images in the test set by using the trained YOLOv8 network to obtain a prediction frame on the feature map; and mapping the prediction frame to the corresponding original image according to the proportional relation between the feature image and the original image so as to position the flaw.
2. The YOLOv8 network-based fabric flaw detection method of claim 1, wherein: the data set obtained in the step S1 specifically comprises a plurality of fabric images I and corresponding tag files; the tag file is a txt format file that records target position information and category information in an image.
3. The method for detecting fabric defects based on the YOLOv8 network according to claim 1, wherein step S2 is specifically as follows: the fabric dataset is augmented with Mosaic data enhancement.
4. The YOLOv8 network-based fabric flaw detection method of claim 1, wherein: in step S3, the dividing ratio of the training set, the verification set and the test set is set to 8:1:1, and the train.txt, val.txt and test.txt files are generated to save the corresponding image lists.
5. The YOLOv8 network-based fabric flaw detection method of claim 1, wherein: the step S4 specifically comprises the following steps:
S4.1, randomly taking X images from the training set and inputting them into the backbone network Darknet-53 of YOLOv8 for multi-scale feature extraction, and outputting three effective feature maps with different scales of 20×20, 40×40 and 80×80, wherein the Darknet-53 consists of two Conv modules, four C2f modules and an E-ASPP module;
S4.2, further fusing the three effective feature maps with different scales extracted by the backbone network through PANet, the output feature maps corresponding in scale to the output feature maps of the backbone network, wherein the PANet introduces an efficient attention mechanism after E-ASPP and keeps each calculation node in a lightweight CSP structure;
S4.3, adjusting the number of channels of the fused feature maps to 3×(5+num_class) through a lightweight YOLO Head, and outputting N prediction feature maps, where num_class represents the number of categories.
6. The fabric flaw detection method based on the improved YOLOv8 algorithm as claimed in claim 1, wherein the step S6 is specifically as follows:
calculating the intersection-over-union loss according to the boundary frames and the GT frames of the corresponding pictures, calculating the classification confidence loss and the frame confidence loss according to the classification confidence and the frame confidence of each prediction frame contained in the network output feature maps, weighting and summing the intersection-over-union loss, the classification confidence loss and the frame confidence loss according to a preset proportion to obtain the overall network loss, and carrying out back propagation to optimize the network parameters.
7. The YOLOv8 network-based fabric flaw detection method of claim 1, wherein: in step S7, the YOLOv8 network model is trained as follows:
S7.1, configuring the network environment, wherein the Python version is 3.8, the deep learning framework is PyTorch 1.8, and CUDA is used for acceleration;
s7.2, setting an initial learning rate to be 0.001, and setting a learning rate adjustment strategy to be cosine annealing attenuation;
s7.3, setting the number of images of each batch of input networks to be 8;
s7.4, calculating the overall loss of the network after each period is finished; and (3) carrying out loop iteration until mAP indexes of the verification set are stabilized at a certain value, and stopping training of the YOLOv8 network at the moment.
8. The YOLOv8 network-based fabric flaw detection method of claim 1, wherein: in step S8, obtaining a final prediction frame from all the adjusted test set prediction frames specifically includes the following steps:
s8.1, sorting all the adjusted prediction frames according to confidence scores;
and S8.2, removing redundant frames by using non-maximum suppression in all the adjusted prediction frames so as to obtain a final prediction frame.
9. A fabric defect detection system based on the YOLOv8 network, based on the method according to any one of claims 1-8, characterized in that it comprises the following modules:
an image acquisition module: collecting a fabric image dataset;
a data enhancement module: performing data enhancement on the fabric image dataset to augment the dataset;
a data set dividing module: dividing the enhanced data set according to a preset proportion to obtain a training set, a verification set and a test set;
a prediction module: randomly selecting X images in the training set as input of the YOLOv8 network, obtaining N effective feature images with different scales after feature extraction of the backbone network, then fusing the obtained effective feature images to obtain N fused feature images with different scales, and finally adjusting channels of the fused feature images to obtain N predicted feature images;
a boundary box generation module: for the N prediction feature graphs, generating a boundary frame according to the probability that each pixel in the prediction image belongs to the object to be detected and the boundary frame information of the object;
an updating module: calculating a network loss value according to the obtained boundary frame and the GT frame of the corresponding picture, and updating parameters of the YOLOv8 network by using a gradient descent method;
an iteration module: inputting all pictures in the training set into the network once; predicting each image of the verification set according to the YOLOv8 network after parameter updating, and outputting AP values of various categories in the verification set after statistics; performing loop iteration until the counted mAP is stabilized at a certain value, and obtaining a training-completed YOLOv8 network;
a mapping module: predicting all images in the test set by using the trained YOLOv8 network to obtain a prediction frame on the feature map; and mapping the prediction frame to the corresponding original image according to the proportional relation between the feature image and the original image so as to position the flaw.
CN202311553890.4A 2023-11-21 2023-11-21 Fabric flaw detection method and system based on YOLOv8 network Pending CN117576038A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311553890.4A CN117576038A (en) 2023-11-21 2023-11-21 Fabric flaw detection method and system based on YOLOv8 network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311553890.4A CN117576038A (en) 2023-11-21 2023-11-21 Fabric flaw detection method and system based on YOLOv8 network

Publications (1)

Publication Number Publication Date
CN117576038A true CN117576038A (en) 2024-02-20

Family

ID=89863839

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311553890.4A Pending CN117576038A (en) 2023-11-21 2023-11-21 Fabric flaw detection method and system based on YOLOv8 network

Country Status (1)

Country Link
CN (1) CN117576038A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117975294A (en) * 2024-03-29 2024-05-03 青岛哈尔滨工程大学创新发展中心 Ocean internal wave automatic identification method under high-definition image


Similar Documents

Publication Publication Date Title
CN113192040B (en) Fabric flaw detection method based on YOLO v4 improved algorithm
CN113034478A (en) Weld defect identification and positioning method and system based on deep learning network
CN108875624A (en) Method for detecting human face based on the multiple dimensioned dense Connection Neural Network of cascade
CN111401293B (en) Gesture recognition method based on Head lightweight Mask scanning R-CNN
CN113920107A (en) Insulator damage detection method based on improved yolov5 algorithm
CN113408423A (en) Aquatic product target real-time detection method suitable for TX2 embedded platform
CN111611998A (en) Adaptive feature block extraction method based on candidate region area and width and height
CN112529005B (en) Target detection method based on semantic feature consistency supervision pyramid network
CN110222604A (en) Target identification method and device based on shared convolutional neural networks
CN114549507B (en) Improved Scaled-YOLOv fabric flaw detection method
CN117576038A (en) Fabric flaw detection method and system based on YOLOv8 network
CN112613428B (en) Resnet-3D convolution cattle video target detection method based on balance loss
CN114463759A (en) Lightweight character detection method and device based on anchor-frame-free algorithm
CN112749675A (en) Potato disease identification method based on convolutional neural network
CN114332473A (en) Object detection method, object detection device, computer equipment, storage medium and program product
CN114781514A (en) Floater target detection method and system integrating attention mechanism
CN116342536A (en) Aluminum strip surface defect detection method, system and equipment based on lightweight model
CN117422695A (en) CR-deep-based anomaly detection method
CN114037684B (en) Defect detection method based on yolov and attention mechanism model
CN114882011A (en) Fabric flaw detection method based on improved Scaled-YOLOv4 model
CN116958073A (en) Small sample steel defect detection method based on attention feature pyramid mechanism
CN113409327A (en) Example segmentation improvement method based on ordering and semantic consistency constraint
CN115049639B (en) Fabric flaw detection method based on classified re-weighting YOLOv model
Mehta et al. An Analysis of Fabric Defect Detection Techniques for Textile Industry Quality Control
CN117593648B (en) Remote sensing target building extraction method based on weak supervision learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination