CN113989216A - Self-attention-based adversarial autoencoder method for detecting texture surface defects - Google Patents

Self-attention-based adversarial autoencoder method for detecting texture surface defects

Info

Publication number
CN113989216A
Authority
CN
China
Prior art keywords
self
attention
image
encoder
defect
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111243400.1A
Other languages
Chinese (zh)
Inventor
张琳娜
张芳慧
岑翼刚
张兰尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou University
Original Assignee
Guizhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou University filed Critical Guizhou University
Priority to CN202111243400.1A priority Critical patent/CN113989216A/en
Publication of CN113989216A publication Critical patent/CN113989216A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0004 Industrial image inspection
    • G06T7/0008 Industrial image inspection checking presence/absence
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/84 Systems specially adapted for particular applications
    • G01N21/88 Investigating the presence of flaws or contamination
    • G01N21/8851 Scan or image signal processing specially adapted therefor, e.g. for scan signal adjustment, for detecting different kinds of defects, for compensating for structures, markings, edges
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/136 Segmentation; Edge detection involving thresholding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/84 Systems specially adapted for particular applications
    • G01N21/88 Investigating the presence of flaws or contamination
    • G01N21/8851 Scan or image signal processing specially adapted therefor, e.g. for scan signal adjustment, for detecting different kinds of defects, for compensating for structures, markings, edges
    • G01N2021/8887 Scan or image signal processing specially adapted therefor, e.g. for scan signal adjustment, for detecting different kinds of defects, for compensating for structures, markings, edges based on image processing techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30108 Industrial image inspection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Quality & Reliability (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a self-attention-based adversarial autoencoder method for detecting texture surface defects, comprising the following steps. Step 1: from defect-free textured industrial product images, construct two defect generation methods, one based on color and one on deformation, to synthesize defect samples that simulate real defects. Step 2: build a self-attention-based adversarial autoencoder framework for texture surface defect detection. Step 3: design the metric loss between the input image and the reconstructed image. Step 4: compute pixel-level anomaly scores from the multi-scale structural similarity between the test image and the reconstructed image, and from them judge whether the test image is defective. Given a large number of defect-free texture surface samples, the method accurately localizes defects on the product surface; it runs efficiently enough to meet real-time requirements; and its low hardware cost makes it easy to deploy in industrial defect detection systems, giving it good application prospects.

Description

Self-attention-based adversarial autoencoder method for detecting texture surface defects
Technical Field
The invention relates to a method for processing surface defect images of textured industrial products, and in particular to a method for detecting industrial product surface defects based on a self-attention adversarial autoencoder algorithm.
Background
Texture surface defect detection is a key step in industrial production. Its goal is to identify and localize defects on the product surface (shown in figure 1) in order to determine whether the product is qualified, adjust the production scheme in time, improve production efficiency, and reduce production cost. Traditionally, surface defect detection has been performed by human inspectors, but visual assessment by eye is not only time-consuming and labor-intensive but also highly subjective and error-prone. With falling hardware costs and growing computing power, computer-vision-based detection of industrial product surface defects has made great progress and is widely used in industrial fault detection systems.
Traditional machine vision defect detection methods, such as Support Vector Machine (SVM) classifiers on HOG and Gabor features, distinguish defects from non-defects, but such hand-crafted features are not robust enough under complex texture types and scene changes, making them difficult to apply to industrial surface defect detection. In recent years, autoencoder networks have achieved good performance in tasks such as image reconstruction and defect detection. However, autoencoder-based defect detection algorithms still suffer from many problems, such as poor defect localization and blurred edges in the reconstructed image.
Disclosure of Invention
Addressing the shortcomings of traditional methods, the invention aims to provide a self-attention-based adversarial autoencoder algorithm for texture surface defect detection: for an acquired surface image of a textured industrial product, defect targets in the image are detected using the self-attention-based adversarial autoencoder. To achieve this, the technical scheme adopted by the invention is as follows:
Step 1: from defect-free textured industrial product images, construct two defect generation methods, based on color and on deformation, to synthesize defect samples that simulate real defects;
Step 2: build a self-attention-based adversarial autoencoder framework for texture surface defect detection;
Step 3: design the metric loss between the input image and the reconstructed image;
Step 4: compute pixel-level anomaly scores from the multi-scale structural similarity between the test image and the reconstructed image, and from them judge whether the test image is defective.
In a further technical scheme, the color-based defect data in step 1 is generated as follows:
Step 1-1: establish a closed region using a Bezier curve;
Step 1-2: fill the closed region with a random color;
Step 1-3: rescale the closed region to a random size;
Step 1-4: paste the closed region at a random position of a defect-free image, completing generation of color-based defect data.
In a further technical scheme, the deformation-based defect data in step 1 is generated as follows:
Step 1-5: randomly generate two points (p1, p2) and a deformation radius r;
Step 1-6: compute the deformation region from p1, p2 and r;
Step 1-7: compute the new pixel positions inside the deformation region according to the warping formula (reproduced only as an image in the original publication), completing generation of deformation-based defect data.
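The warping formula itself appears only as an image in the patent. As a stand-in, the sketch below uses a common local translation warp that drags content near p1 toward p2 with a falloff vanishing at radius r; the exact formula in the patent may differ.

```python
import numpy as np

def deform_defect(img, p1, p2, r):
    # Steps 1-5 to 1-7: warp the disc of radius r around p1 toward p2.
    # The falloff below is an assumed stand-in for the patent's formula.
    h, w = img.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    dx, dy = xx - p1[0], yy - p1[1]
    d2 = dx ** 2 + dy ** 2
    m2 = (p2[0] - p1[0]) ** 2 + (p2[1] - p1[1]) ** 2 + 1e-12
    inside = d2 < r ** 2
    num = np.where(inside, r ** 2 - d2, 0.0)
    ratio = (num / (num + m2)) ** 2   # 1 near p1, 0 outside the disc
    # Backward mapping: read each output pixel from a shifted source pixel.
    src_x = np.clip(np.rint(xx - ratio * (p2[0] - p1[0])).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(yy - ratio * (p2[1] - p1[1])).astype(int), 0, h - 1)
    return img[src_y, src_x]
```

Backward mapping avoids holes in the warped result, which is why the source position, not the destination, is displaced.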
In a further technical scheme, the self-attention-based adversarial autoencoder framework for texture surface defect detection in step 2 is built as follows:
Step 2-1: build the encoder network;
Step 2-2: build the decoder network;
Step 2-3: build the combined channel and position self-attention module;
Step 2-4: build the discriminator network using the discriminator of the DCGAN network;
Step 2-5: build the intermediate-layer network from four fully connected layers;
Step 2-6: build the classifier network from three fully connected layers.
In a further technical scheme, the encoder network in step 2-1 is built as follows:
Step 2-1-1: use the first five convolution modules of the VGG-16 network as the encoding layers;
Step 2-1-2: after each convolution module apply a LeakyReLU activation with a negative slope of 0.2, followed by a max pooling operation with kernel size 2 x 2 and stride 2.
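A PyTorch sketch of steps 2-1-1 and 2-1-2, assuming the standard VGG-16 channel widths (64, 128, 256, 512, 512) and placing the LeakyReLU(0.2) after every convolution; the patent specifies only the block structure.

```python
import torch
import torch.nn as nn

def vgg_block(in_ch, out_ch, n_convs):
    # One VGG-16-style module: n 3x3 convs + LeakyReLU(0.2), then 2x2 max pool.
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.LeakyReLU(0.2, inplace=True)]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

class Encoder(nn.Module):
    # First five convolution modules of VGG-16 (step 2-1-1).
    def __init__(self):
        super().__init__()
        self.blocks = nn.Sequential(
            vgg_block(3, 64, 2), vgg_block(64, 128, 2),
            vgg_block(128, 256, 3), vgg_block(256, 512, 3),
            vgg_block(512, 512, 3))

    def forward(self, x):
        return self.blocks(x)

enc = Encoder()
z = enc(torch.randn(1, 3, 256, 256))
print(z.shape)  # torch.Size([1, 512, 8, 8])
```

Five 2x2 pools reduce a 256 x 256 input to an 8 x 8 feature map with 512 channels.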
In a further technical scheme, the decoder network in step 2-2 is built as follows:
Step 2-2-1: for each convolution block in the encoder there is a corresponding decoding block in the decoder. The first four decoding blocks each contain one convolution layer and one deconvolution layer; the convolution layer uses a 3 x 3 kernel with stride 1 and padding 1, the deconvolution layer uses a 4 x 4 kernel with stride 2 and padding 1, and a LeakyReLU activation with a negative slope of 0.2 follows both the convolution and the deconvolution layers;
Step 2-2-2: the fifth decoding block contains one convolution layer with kernel size 3 x 3, stride 1 and padding 1.
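A PyTorch sketch of steps 2-2-1 and 2-2-2. The kernel/stride/padding values are taken from the text; the channel widths (mirroring the encoder) and the 3-channel output are assumptions.

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    # First four decoding blocks: 3x3 conv (stride 1, padding 1) plus
    # 4x4 deconv (stride 2, padding 1), each followed by LeakyReLU(0.2).
    # Fifth decoding block: a single 3x3 conv (step 2-2-2).
    def __init__(self):
        super().__init__()
        layers = []
        for in_ch, out_ch in [(512, 512), (512, 256), (256, 128), (128, 64)]:
            layers += [nn.Conv2d(in_ch, in_ch, 3, 1, 1),
                       nn.LeakyReLU(0.2, inplace=True),
                       nn.ConvTranspose2d(in_ch, out_ch, 4, 2, 1),
                       nn.LeakyReLU(0.2, inplace=True)]
        layers.append(nn.Conv2d(64, 3, 3, 1, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

dec = Decoder()
y = dec(torch.randn(1, 512, 8, 8))
print(y.shape)  # torch.Size([1, 3, 128, 128])
```

With four stride-2 deconvolutions the decoder recovers 16x the bottleneck resolution; any further upsampling to full input size would have to happen elsewhere in the network.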
In a further technical scheme, the channel self-attention module in step 2-3 is built as follows:
Step 2-3-1: input the feature map M_A ∈ R^(C×H×W) output by the last layer of the encoder into the channel self-attention module;
Step 2-3-2: reshape M_A to R^(C×N), denoted M_A1, where C is the number of channels and N = H × W is the number of features in M_A;
Step 2-3-3: multiply M_A1 by its transpose to obtain the channel attention map X ∈ R^(C×C);
Step 2-3-4: apply a softmax operation to obtain the final channel attention map;
Step 2-3-5: apply the channel self-attention map to the feature map using the formula M_out = [X · M_A1]_R + M_A, where [·]_R is the operation of reshaping the matrix back to C × H × W.
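The channel self-attention of steps 2-3-1 to 2-3-5 can be sketched in a few lines of NumPy:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_self_attention(M_A):
    # M_A: encoder output, shape (C, H, W)          (step 2-3-1)
    C, H, W = M_A.shape
    M_A1 = M_A.reshape(C, H * W)                    # step 2-3-2: C x N, N = H*W
    X = softmax(M_A1 @ M_A1.T, axis=-1)             # steps 2-3-3/2-3-4: C x C map
    return (X @ M_A1).reshape(C, H, W) + M_A        # step 2-3-5: [X·M_A1]_R + M_A

M_A = np.random.default_rng(0).standard_normal((512, 8, 8))
out = channel_self_attention(M_A)
print(out.shape)  # (512, 8, 8)
```

Because the attention map is C x C, the cost is independent of image resolution, which is why channel attention is cheap even on large feature maps.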
In a further technical scheme, the position self-attention module in step 2-3 is built as follows:
Step 2-3-6: feed the output M_B of the channel self-attention module into three convolution layers to generate three new feature maps M_B1, M_B2 and M_B3 ∈ R^(C×H×W), each reshaped to R^(C×N), where C is the number of channels and N = H × W is the number of features in M_B;
Step 2-3-7: compute the cosine distance between M_B2 and the transpose of M_B1 to obtain the position self-attention map;
Step 2-3-8: apply a softmax operation to obtain the final position self-attention map S;
Step 2-3-9: apply the position self-attention map to the feature map using the formula M_out = [M_B3 · S]_R + M_B.
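A NumPy sketch of steps 2-3-6 to 2-3-9. The three convolution layers are replaced here by C x C projection matrices, and cosine similarity feeding a softmax stands in for the patent's cosine-distance map; both are simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_self_attention(M_B, W1, W2, W3):
    # M_B: (C, H, W) output of the channel module (step 2-3-6).
    C, H, W = M_B.shape
    flat = M_B.reshape(C, H * W)
    B1, B2, B3 = W1 @ flat, W2 @ flat, W3 @ flat
    # Steps 2-3-7/2-3-8: cosine similarity between spatial positions,
    # then softmax, giving the N x N position self-attention map S.
    B1n = B1 / np.linalg.norm(B1, axis=0, keepdims=True)
    B2n = B2 / np.linalg.norm(B2, axis=0, keepdims=True)
    S = softmax(B2n.T @ B1n, axis=-1)
    # Step 2-3-9: M_out = [M_B3 · S]_R + M_B
    return (B3 @ S).reshape(C, H, W) + M_B

rng = np.random.default_rng(0)
M_B = rng.standard_normal((32, 8, 8))
Ws = [rng.standard_normal((32, 32)) for _ in range(3)]
out = position_self_attention(M_B, *Ws)
print(out.shape)  # (32, 8, 8)
```

Unlike the channel module, the attention map here is N x N (one weight per pair of spatial positions), which is what lets the network gather long-range context.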
In a further technical scheme, the design of the metric loss between the input image and the reconstructed image in step 3 is described as follows:
Step 3-1: jointly considering the structural similarity between the reconstructed image and the input image and the attention map F of the input image, design the reconstruction loss as L_re(i,j) = SSIM(I_in(i,j), I_re(i,j)) * (1 + C_r * F(i,j)), where L_re(i,j) is the loss at position (i,j) of the image, I_in is the input image, I_re is the reconstructed image, SSIM(·) is the structural similarity loss, C_r ∈ {0,1} is the classification result, and F(·) is the attention map of a specified convolution layer;
Step 3-2: propose an edge consistency loss L_edge in the gradient domain (its full expression is reproduced only as an image in the original publication) that makes the edges of the input image coincide with those of the reconstructed image; here λ_ec is a constant, ↓n denotes the downsampling operation, ||·||_F is the Frobenius norm, σ(·) is a leaky rectified linear unit, ∇ denotes the gradient operation, and ⊙ denotes element-wise multiplication;
Step 3-3: design the final metric function as L = θ1·L_re + θ2·L_edge + θ3·L_D + θ4·L_c, where the θ_i are constants, L_D is the adversarial loss, and L_c is the cross-entropy classification loss.
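The weightings in steps 3-1 and 3-3 can be sketched directly. The θ values below are placeholders (the patent leaves them as unspecified constants), and the SSIM map, attention map, and scalar loss terms are taken as given inputs.

```python
import numpy as np

def reconstruction_loss_map(ssim_map, attn_map, c_r):
    # Step 3-1: L_re(i,j) = SSIM(...) * (1 + C_r * F(i,j)).
    # When the classifier flags a defect (C_r = 1) the attention map F
    # up-weights suspicious regions; otherwise the plain SSIM map is used.
    return ssim_map * (1.0 + c_r * attn_map)

def total_loss(L_re, L_edge, L_D, L_c, thetas=(1.0, 1.0, 0.1, 0.1)):
    # Step 3-3: L = θ1·L_re + θ2·L_edge + θ3·L_D + θ4·L_c
    # (the default thetas here are placeholders, not from the patent).
    t1, t2, t3, t4 = thetas
    return t1 * L_re + t2 * L_edge + t3 * L_D + t4 * L_c

ssim = np.full((4, 4), 0.5)
attn = np.full((4, 4), 1.0)
print(float(reconstruction_loss_map(ssim, attn, 1).mean()))  # 1.0
print(total_loss(1.0, 1.0, 1.0, 1.0, thetas=(1, 1, 1, 1)))   # 4.0
```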
In a further technical scheme, judging whether the test image in step 4 is defective is described as follows:
Step 4-1: compute an anomaly map ASM(I_o, I_r) of the test image using multi-scale structural similarity, where I_o is the test image and I_r is the reconstructed image;
Step 4-2: from the anomaly map, apply a segmentation threshold τ1 and a classification threshold τ2 to obtain the final pixel-level defect segmentation map and the image-level defect classification.
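Step 4-2 reduces to thresholding the anomaly map twice. A sketch with toy thresholds; taking the map's maximum as the image-level score is an assumption, since the patent does not name the statistic.

```python
import numpy as np

def defect_decision(anomaly_map, tau_seg, tau_cls):
    # Pixel-level segmentation via tau_1 and image-level classification
    # via tau_2 (step 4-2).
    segmentation = anomaly_map > tau_seg
    is_defective = bool(anomaly_map.max() > tau_cls)
    return segmentation, is_defective

amap = np.zeros((8, 8))
amap[2:4, 2:4] = 0.9          # a small anomalous blob
seg, flag = defect_decision(amap, tau_seg=0.5, tau_cls=0.7)
print(seg.sum(), flag)  # 4 True
```

In practice both thresholds would be calibrated on a held-out set of defect-free images, since the anomaly map comes from MS-SSIM between the test image and its reconstruction.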
Considering the poor defect localization of conventional autoencoder networks, the invention introduces a self-attention mechanism in the encoding layers, so that long-range context information can be adaptively aggregated to improve the feature representation for texture reconstruction, as shown in fig. 2. To address the poor robustness of reconstruction in defect regions, defect samples are synthesized from two aspects, color and deformation, so that the model can accurately classify the synthesized defect samples. In addition, during reconstruction, two input-output modes (defect-free to defect-free, and defective to defect-free) make the reconstruction result more robust. To address edge blurring in the reconstruction result, the invention adds an edge consistency loss so that the input image and the reconstructed image stay consistent along edges, eliminating the blur.
Compared with the prior art, the invention has the following notable advantages: (1) defect detection on textured industrial product surface images is unsupervised, so no defect samples or labels need to be provided; (2) the accuracy of industrial product surface defect detection is high; (3) the running speed is high and meets the real-time requirements of practical applications; (4) the hardware requirements are simple, making large-scale deployment easy.
Drawings
FIG. 1 shows normal and abnormal images of textured products according to the present invention; the first row shows normal images, the second row abnormal images, and the third row partial enlargements of the abnormal images.
Fig. 2 is a network framework diagram of the present invention.
FIG. 3 is a schematic diagram of a defect manufacturing process based on a deformation method according to the present invention.
FIG. 4 is a graph of the results of the defect production method of the present invention based on both color and deformation.
Fig. 5 shows defect reconstruction and detection results of the present invention, in which the first to fourth rows show, respectively, the test image, the reconstructed image, the defect localization result, and the ground truth.
Detailed Description
As shown in FIG. 1, the steps of the self-attention-based adversarial autoencoder texture surface defect detection algorithm of the present invention are as follows:
Step 1: from defect-free textured industrial product images, construct two defect generation methods, based on color and on deformation, to synthesize defect samples that simulate real defects;
Step 2: build a self-attention-based adversarial autoencoder framework for texture surface defect detection;
Step 3: design the metric loss between the input image and the reconstructed image;
Step 4: compute pixel-level anomaly scores from the multi-scale structural similarity between the test image and the reconstructed image, and from them judge whether the test image is defective.
The invention discloses a self-attention-based adversarial autoencoder method for detecting texture surface defects and provides an unsupervised solution for surface anomaly detection on industrial products. The method combines an autoencoder with a self-attention mechanism to detect defects on textured industrial product surfaces. First, defect data are generated by both color-based and deformation-based methods to simulate real defect samples. Second, the self-attention-based adversarial autoencoder defect detection network is trained using two input-output modes (defect-free to defect-free, and defective to defect-free). Finally, pixel-level anomaly scores are obtained by computing the multi-scale structural similarity between the test image and the reconstructed image, from which it is judged whether the test image is defective. Given a large number of defect-free texture surface samples, the method accurately localizes defects on the product surface; it runs efficiently enough to meet real-time requirements; and its low hardware cost makes it easy to deploy in industrial defect detection systems, giving it good application prospects.
The specific implementation is as follows.
The first step: generate the defect data by the two methods, color and deformation.
Based on given normal texture surface samples (first row of FIG. 1), the invention generates defect data simulating real defect samples (second row of FIG. 1) using both the color-based and the deformation-based defect generation methods.
Generation of color defect samples proceeds as follows. First, a closed region is established using a Bezier curve. Second, a random color is added to the closed region. Third, the closed region is rescaled to a random size. Finally, the closed region is pasted at a random position of a defect-free image.
Generation of deformation defect samples proceeds as follows. First, two points (p1, p2) and a deformation radius r are randomly generated. Second, the deformation region is computed from p1, p2 and r. Finally, the pixel positions within the deformation region are computed according to the warping formula (reproduced only as an image in the original publication); the process is shown in fig. 3. The final defect sample results are shown in fig. 4.
The second step: build the self-attention-based adversarial autoencoder framework for texture surface defect detection.
The adversarial autoencoder network provided by the invention consists of three modules, an autoencoder, an adversarial network, and a classifier network, as shown in figure 2. The autoencoder network consists of three parts: an encoder, a decoder, and an intermediate layer. In the encoder, the convolution kernel size is 3, the stride and padding are both 1, and the activation function is a LeakyReLU with a negative slope of 0.2. In the decoder, deconvolution is used to increase the feature map size. The intermediate-layer network is built from four fully connected layers, the adversarial network is built from the discriminator of the DCGAN network, and the classifier network is built from three fully connected layers.
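Under assumed layer widths (the patent gives only the layer counts), the intermediate layer, the classifier, and a DCGAN-style discriminator can be sketched as:

```python
import torch
import torch.nn as nn

def mlp(dims):
    # Stack of fully connected layers with LeakyReLU(0.2) between them.
    layers = []
    for i in range(len(dims) - 1):
        layers.append(nn.Linear(dims[i], dims[i + 1]))
        if i < len(dims) - 2:
            layers.append(nn.LeakyReLU(0.2, inplace=True))
    return nn.Sequential(*layers)

# Intermediate layer: four fully connected layers (widths are assumptions).
middle = mlp([512, 256, 128, 256, 512])
# Classifier: three fully connected layers ending in a defect/no-defect score.
classifier = mlp([512, 128, 32, 1])

# DCGAN-style discriminator: strided convs, BatchNorm, LeakyReLU, sigmoid.
discriminator = nn.Sequential(
    nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(128, 1, 4, 1, 0), nn.Sigmoid())

x = torch.randn(2, 512)
print(classifier(middle(x)).shape)  # torch.Size([2, 1])
```

A full DCGAN discriminator would add further strided blocks for larger inputs; the sketch here handles 16 x 16 patches.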
The invention proposes a self-attention framework that combines channel and position attention, as shown in fig. 2. In the channel self-attention module, the feature map M_A ∈ R^(C×H×W) output by the last layer of the encoder is first input to the module and reshaped to R^(C×N), denoted M_A1, where C is the number of channels and N = H × W is the number of features in M_A. Multiplying M_A1 by its transpose gives the channel attention map X ∈ R^(C×C); a softmax operation yields the final channel attention map, which is then applied to the feature map. In the position self-attention module, the output M_B of the channel self-attention module is first fed into three convolution layers to generate three new feature maps M_B1, M_B2 and M_B3 ∈ R^(C×H×W), each reshaped to R^(C×N), where C is the number of channels and N = H × W is the number of features in M_B. Next, the cosine distance between M_B2 and the transpose of M_B1 is computed and a softmax operation is applied to obtain the final position self-attention map, which is finally applied to the feature map.
The third step: design the metric loss between the input image and the reconstructed image.
The invention jointly considers the structural similarity between the reconstructed image and the input image and the attention map F of the input image, and designs the reconstruction loss as L_re(i,j) = SSIM(I_in(i,j), I_re(i,j)) * (1 + C_r * F(i,j)), where L_re(i,j) is the loss at position (i,j) of the image, I_in is the input image, I_re is the reconstructed image, SSIM(·) is the structural similarity loss, C_r ∈ {0,1} is the classification result, and F(·) is the attention map of a specified convolution layer.
In addition, the invention proposes an edge consistency loss L_edge in the gradient domain (its full expression is reproduced only as an image in the original publication) that makes the edges of the input image coincide with those of the reconstructed image; here λ_ec is a constant, ↓n denotes the downsampling operation, ||·||_F is the Frobenius norm, σ(·) is a leaky rectified linear unit, ∇ denotes the gradient operation, and ⊙ denotes element-wise multiplication.
Thus, the final metric function is L = θ1·L_re + θ2·L_edge + θ3·L_D + θ4·L_c, where the θ_i are constants, L_D is the adversarial loss, and L_c is the cross-entropy classification loss.
The fourth step: compute pixel-level anomaly scores from the multi-scale structural similarity between the test image and the reconstructed image, and from them judge whether the test image is defective.
The invention uses multi-scale structural similarity to compute an anomaly map ASM(I_o, I_r) of the test image, where I_o is the test image and I_r is the reconstructed image. From the anomaly map, a segmentation threshold τ1 and a classification threshold τ2 are applied to obtain the final pixel-level defect segmentation map and the image-level defect classification. FIG. 5 shows the defect reconstruction maps and detection results of the present invention; the first to fourth rows show, respectively, the test image, the reconstructed image, the defect localization result, and the ground truth.

Claims (10)

1. A self-attention-based adversarial autoencoder method for detecting texture surface defects, characterized by comprising the following steps:
Step 1: from defect-free textured industrial product images, construct two defect generation methods, based on color and on deformation, to synthesize defect samples that simulate real defects;
Step 2: build a self-attention-based adversarial autoencoder framework for texture surface defect detection;
Step 3: design the metric loss between the input image and the reconstructed image;
Step 4: compute pixel-level anomaly scores from the multi-scale structural similarity between the test image and the reconstructed image, and from them judge whether the test image is defective.
2. The self-attention-based adversarial autoencoder method for detecting texture surface defects according to claim 1, characterized in that the color-based defect generation method in step 1 comprises the following processing steps:
Step 1-1: establish a closed region using a Bezier curve;
Step 1-2: fill the closed region with a random color;
Step 1-3: rescale the closed region to a random size;
Step 1-4: paste the closed region at a random position of a defect-free image, completing generation of color-based defect data.
3. The self-attention-based adversarial autoencoder method for detecting texture surface defects according to claim 1, characterized in that the deformation-based defect generation method in step 1 comprises the following processing steps:
Step 1-5: randomly generate two points (p1, p2) and a deformation radius r;
Step 1-6: compute the deformation region from p1, p2 and r;
Step 1-7: compute the new pixel positions inside the deformation region according to the warping formula (reproduced only as an image in the original publication), completing generation of deformation-based defect data.
4. The self-attention-based adversarial autoencoder method for detecting texture surface defects according to claim 1, characterized in that building the self-attention-based adversarial autoencoder framework for texture surface defect detection in step 2 comprises the following processing steps:
Step 2-1: build the encoder network;
Step 2-2: build the decoder network;
Step 2-3: build the combined channel and position self-attention module;
Step 2-4: build the discriminator network using the discriminator of the DCGAN network;
Step 2-5: build the intermediate-layer network from four fully connected layers;
Step 2-6: build the classifier network from three fully connected layers.
5. The self-attention-based adversarial autoencoder method for detecting texture surface defects according to claim 4, characterized in that building the encoder network in step 2-1 comprises the following processing steps:
Step 2-1-1: use the first five convolution modules of the VGG-16 network as the encoding layers;
Step 2-1-2: after each convolution module apply a LeakyReLU activation with a negative slope of 0.2, followed by a max pooling operation with kernel size 2 x 2 and stride 2.
6. The self-attention-based adversarial autoencoder method for detecting texture surface defects according to claim 4, characterized in that building the decoder network in step 2-2 comprises the following processing steps:
Step 2-2-1: for each convolution block in the encoder there is a corresponding decoding block in the decoder; the first four decoding blocks each contain one convolution layer and one deconvolution layer, the convolution layer using a 3 x 3 kernel with stride 1 and padding 1, the deconvolution layer using a 4 x 4 kernel with stride 2 and padding 1, and a LeakyReLU activation with a negative slope of 0.2 following both layers;
Step 2-2-2: the fifth decoding block contains one convolution layer with kernel size 3 x 3, stride 1 and padding 1.
7. The method for detecting texture surface defects of a self-attention-based antagonistic self-encoder according to claim 4, wherein the construction of the channel self-attention framework in step 2-3 comprises the following processing steps:
step 2-3-1: inputting the features M_A ∈ R^(C×H×W) output by the last layer of the encoder module to the channel self-attention module;
step 2-3-2: reshaping M_A to M_A1 ∈ R^(C×N), where C denotes the number of channels and N = H × W denotes the number of features in M_A;
step 2-3-3: multiplying M_A1 by its transpose to obtain a channel attention map X ∈ R^(C×C);
step 2-3-4: applying a softmax operation to obtain the final channel attention map;
step 2-3-5: applying the channel self-attention map to the feature map using the formula M_out = [X · M_A1]_R + M_A, where [·]_R is the operation of reshaping the matrix.
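A minimal numpy sketch of steps 2-3-1 through 2-3-5 (illustrative only, not part of the claims; the example feature-map size is arbitrary):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_self_attention(ma):
    """Steps 2-3-1 .. 2-3-5 as a numpy sketch; ma has shape (C, H, W)."""
    c, h, w = ma.shape
    ma1 = ma.reshape(c, h * w)             # step 2-3-2: (C, N) with N = H*W
    x = softmax(ma1 @ ma1.T, axis=-1)      # steps 2-3-3/2-3-4: (C, C) map
    return (x @ ma1).reshape(c, h, w) + ma # step 2-3-5: Mout = [X.MA1]_R + MA

rng = np.random.default_rng(0)
ma = rng.random((4, 5, 5))
print(channel_self_attention(ma).shape)  # (4, 5, 5)
```

The residual addition of M_A at the end keeps the module's output the same shape as its input.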
8. The method for detecting texture surface defects of a self-attention-based antagonistic self-encoder according to claim 4, wherein the construction of the position self-attention framework in step 2-4 comprises the following processing steps:
step 2-3-6: inputting the features M_B output by the channel self-attention module into three convolution layers to generate three new feature maps M_B1, M_B2 and M_B3 ∈ R^(C×H×W), each reshaped to R^(C×N), where C denotes the number of channels and N = H × W denotes the number of features in M_B;
step 2-3-7: calculating the cosine distance between M_B2 and the transpose of M_B1 to obtain a position self-attention map S;
step 2-3-8: applying a softmax operation to obtain the final position self-attention map;
step 2-3-9: applying the position self-attention map to the feature map using the formula M_out = [M_B3 · S]_R + M_B.
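A numpy sketch of steps 2-3-6 through 2-3-9 (illustrative only). The matrices w1..w3 stand in for the three convolution layers, modelled here as 1 × 1 channel-mixing matrices, and the cosine similarity is used inside the softmax; both are assumptions for the example:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_self_attention(mb, w1, w2, w3):
    """Steps 2-3-6 .. 2-3-9 as a numpy sketch; mb has shape (C, H, W)."""
    c, h, w = mb.shape
    flat = mb.reshape(c, h * w)                   # (C, N), N = H*W
    b1, b2, b3 = w1 @ flat, w2 @ flat, w3 @ flat  # step 2-3-6
    # step 2-3-7: cosine similarity between the N column features
    n1 = b1 / (np.linalg.norm(b1, axis=0, keepdims=True) + 1e-8)
    n2 = b2 / (np.linalg.norm(b2, axis=0, keepdims=True) + 1e-8)
    s = softmax(n2.T @ n1, axis=-1)               # step 2-3-8: (N, N) map
    return (b3 @ s).reshape(c, h, w) + mb         # step 2-3-9

rng = np.random.default_rng(0)
mb = rng.random((4, 3, 3))
w = [rng.random((4, 4)) for _ in range(3)]
print(position_self_attention(mb, *w).shape)  # (4, 3, 3)
```

Note the attention map here is N × N (one weight per pair of spatial positions), whereas the channel module's map is C × C.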
9. The method of claim 1, wherein the step 3 of designing the metric loss of the input image and the reconstructed image comprises the following steps:
step 3-1: comprehensive consideration of reconstructed imagesStructural similarity with the input image and attention map F of the input image, design reconstruction loss Lre(i,j)=SSIM(Iin(i,j),Ire(i,j))*(1+CrF (i, j)), here, Lre(I, j) represents the loss of the ith, j position of the image, IinRepresenting an input image, IreRepresenting the reconstructed image, SSIM (. cndot.) representing the loss of structural similarity, CrE {0,1} represents the result of the classification, F (·) represents the intent of specifying the convolutional layer;
step 3-2: suggesting a loss of edge coherence in the gradient domain
Figure FDA0003319993040000026
The input image is made to coincide with the edges of the reconstructed image, where,
Figure FDA0003319993040000027
λeclet a constant, ↓, n denote the downsampling operation, | | | · | | | conveyanceFRepresenting the F norm, sigma () representing a weakly modified linear unit,
Figure FDA0003319993040000028
indicates a gradient operation, an indicates a corresponding element multiplication operation;
step 3-3: the final metric function is designed to be L ═ theta1Lre2Ledge3LD4LcHere, θiIs a constant number, LDTo combat losses, LcIs a cross-entropy classification penalty.
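The loss composition in steps 3-1 and 3-3 reduces to simple elementwise and weighted-sum arithmetic; a numpy sketch (the SSIM and attention maps are taken as given inputs, an assumption, since their computation is defined elsewhere in the claims):

```python
import numpy as np

def reconstruction_loss(ssim_map, attention_map, cr):
    # step 3-1: Lre(i, j) = SSIM(Iin(i, j), Ire(i, j)) * (1 + Cr * F(i, j)),
    # with cr in {0, 1} the classification result
    return ssim_map * (1.0 + cr * attention_map)

def total_loss(l_re, l_edge, l_d, l_c, thetas=(1.0, 1.0, 1.0, 1.0)):
    # step 3-3: L = th1*Lre + th2*Ledge + th3*LD + th4*Lc
    t1, t2, t3, t4 = thetas
    return t1 * l_re + t2 * l_edge + t3 * l_d + t4 * l_c

ssim_map = np.full((2, 2), 0.5)
att = np.ones((2, 2))
print(float(reconstruction_loss(ssim_map, att, cr=1).mean()))  # 1.0
print(total_loss(1.0, 2.0, 3.0, 4.0))  # 10.0
```

When C_r = 0 (no defect predicted), the attention term drops out and the reconstruction loss reduces to the plain SSIM map.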
10. The method of claim 1, wherein the step 4 of determining whether the test image is defective comprises the following steps:
step 4-1: computing an anomaly map ASM(I_o, I_r) of the test image using multi-scale structural similarity, where I_o denotes the test image and I_r denotes the reconstructed image;
step 4-2: setting a segmentation threshold τ1 and a classification threshold τ2 on the anomaly map to obtain the final pixel-level defect segmentation map and the image-level defect classification category.
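A sketch of the two-threshold decision in step 4-2 (the image-level rule of comparing the maximum anomaly score against τ2 is an assumption; the claim does not fix the aggregation):

```python
import numpy as np

def classify_and_segment(anomaly_map, tau1, tau2):
    """Step 4-2 sketch: threshold the anomaly map ASM into a pixel-level
    defect mask and an image-level defect/normal label."""
    seg_mask = anomaly_map > tau1              # pixel-level segmentation
    is_defective = bool(anomaly_map.max() > tau2)  # image-level decision
    return seg_mask, is_defective

asm = np.array([[0.1, 0.9],
                [0.2, 0.3]])
mask, defective = classify_and_segment(asm, tau1=0.5, tau2=0.8)
print(int(mask.sum()), defective)  # 1 True
```

Raising τ1 tightens the segmentation mask, while τ2 alone controls the image-level defective/normal call.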
CN202111243400.1A 2021-10-25 2021-10-25 Self-attention-based method for detecting texture surface defects of antagonistic self-encoder Pending CN113989216A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111243400.1A CN113989216A (en) 2021-10-25 2021-10-25 Self-attention-based method for detecting texture surface defects of antagonistic self-encoder


Publications (1)

Publication Number Publication Date
CN113989216A true CN113989216A (en) 2022-01-28

Family

ID=79741198



Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114187292A (en) * 2022-02-14 2022-03-15 北京阿丘科技有限公司 Abnormality detection method, apparatus, device and storage medium for cotton spinning paper tube
CN115131307A (en) * 2022-06-23 2022-09-30 腾讯科技(深圳)有限公司 Article defect detection method and related device
CN117078670A (en) * 2023-10-13 2023-11-17 深圳市永迦电子科技有限公司 Production control system of cloud photo frame
CN117078670B (en) * 2023-10-13 2024-01-26 深圳市永迦电子科技有限公司 Production control system of cloud photo frame


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination