CN111814867A - Defect detection model training method, defect detection method and related device

Info

Publication number: CN111814867A (also published as CN111814867B)
Application number: CN202010635033.9A
Authority: CN (China)
Prior art keywords: detection, defect, training, frame, information
Legal status: Granted; Active
Other languages: Chinese (zh)
Inventors: 黄积晟, 任宇鹏, 卢维
Current and original assignee: Zhejiang Dahua Technology Co Ltd
Application filed by Zhejiang Dahua Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods


Abstract

The application discloses a training method for a defect detection model, a defect detection method, and a related device. The training method comprises the following steps: acquiring a training image, where the training image is annotated with real information of a defect and a mask region, the mask region is the region formed by the pixel points representing the defect in the training image, and the real information includes a real frame of the defect; detecting the training image with the defect detection model to obtain detection information of the defect, where the detection information includes a final detection frame of the defect, and the defect detection model classifies a plurality of initial detection frames of the training image into positive and negative samples using the mask region and determines the final detection frame of the defect based on the classification result; and adjusting network parameters of the defect detection model according to the difference between the real information and the detection information of the defect. By this method, the detection accuracy of the defect detection model can be improved.

Description

Defect detection model training method, defect detection method and related device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a training method for a defect detection model, a defect detection method, and a related apparatus.
Background
In the manufacturing industry, many products develop defects during the manufacturing process due to human or non-human factors, and some of these defects can seriously affect product quality. For example, during the manufacture of aluminum profiles, defects such as scratches, under-leakage, and dirty spots can be produced by artificial or non-artificial factors.
At present, to ensure production quality, products are mostly checked by manual inspectors performing random sampling inspection. However, manual sampling inspection involves too many unreliable factors and easily leads to omissions and errors, making product quality difficult to guarantee.
Disclosure of Invention
The technical problem mainly addressed by this application is to provide a training method for a defect detection model, a defect detection method, and a related device that can improve the detection accuracy of the defect detection model.
One technical scheme adopted by this application is to provide a training method for a defect detection model, the method comprising: acquiring a training image, where the training image is annotated with real information of a defect and a mask region, the mask region is the region formed by the pixel points representing the defect in the training image, and the real information includes a real frame of the defect; detecting the training image with a defect detection model to obtain detection information of the defect, where the detection information includes a final detection frame of the defect, and the defect detection model classifies a plurality of initial detection frames of the training image into positive and negative samples using the mask region and determines the final detection frame of the defect based on the classification result; and adjusting network parameters of the defect detection model according to the difference between the real information and the detection information of the defect.
Wherein the defect detection model comprises a feature extraction network, a region generation network, a feature aggregation layer, and a classification layer; and detecting the training image with the defect detection model to obtain the detection information of the defect comprises: inputting the training image into the feature extraction network to obtain a multi-dimensional feature map; inputting the multi-dimensional feature map into the region generation network to obtain a plurality of initial detection frames, classifying the initial detection frames into positive and negative samples using the mask region, and obtaining a number of candidate frames based on the classification result; inputting the multi-dimensional feature map and the candidate frames into the feature aggregation layer to obtain target feature maps corresponding to the candidate frames; and inputting the target feature maps corresponding to the candidate frames into the classification layer to obtain the detection information of the defect in the training image.
Wherein classifying the plurality of initial detection frames into positive and negative samples using the mask region comprises: acquiring a first pixel number occupied by the mask region in the real frame and a second pixel number occupied by the mask region in the initial detection frame; if a first ratio between the second pixel number and the first pixel number is smaller than a first reference threshold, determining the initial detection frame to be a negative sample; and if the first ratio is not smaller than the first reference threshold, determining the initial detection frame to be a positive sample.
Wherein determining the initial detection frame to be a positive sample if the first ratio is not smaller than the first reference threshold comprises: if the first ratio is not smaller than the first reference threshold, acquiring a second ratio between the first pixel number and a third pixel number of the real frame, and acquiring a third ratio between the second pixel number and a fourth pixel number of the initial detection frame; and if a fourth ratio between the third ratio and the second ratio is greater than a second reference threshold, determining the initial detection frame to be a positive sample.
Wherein the detection information of the defect further comprises the detection category of the defect and the confidence of the detection category; and before classifying the plurality of initial detection frames into positive and negative samples using the mask region, the method further comprises: acquiring the intersection-over-union between the final detection frame and the real frame in the previous training iteration; performing a weighted summation of this intersection-over-union and the confidence from the previous training iteration to obtain a first control value; and taking the product of the first control value and a preset parameter value as the first reference threshold for the current training iteration.
Wherein the feature extraction network is a feature pyramid network (FPN), the region generation network is a region proposal network (RPN), the feature aggregation layer is an ROI Align layer, and a deformable (variable) convolution is used for convolution processing in the ROI Align layer.
Wherein inputting the training image into the feature extraction network to obtain the multi-dimensional feature map comprises: sequentially downsampling the training image N times with the feature extraction network and taking the feature maps of the 2nd through Nth downsamplings to obtain an (N−1)-dimensional initial feature map, where a dilated (atrous) convolution is used for convolution processing during the Nth downsampling and N is larger than 2; and, for the (N−1)-dimensional initial feature map, performing the ith upsampling based on the (N−i)-dimensional initial feature map to obtain the ith-dimensional final feature map, where i is an integer from 1 to N−1. Obtaining the plurality of initial detection frames comprises traversing the multi-dimensional feature map to obtain a plurality of initial detection frames corresponding to each pixel point.
Wherein the detection information of the defect further comprises the detection category of the defect and the confidence of the detection category; and inputting the target feature maps corresponding to the candidate frames into the classification layer to obtain the detection information of the defect in the training image comprises: classifying the target feature map corresponding to each candidate frame to obtain the detection category of the candidate frame and the confidence of the detection category; processing the confidence with soft non-maximum suppression to obtain a processed confidence; and taking the candidate frames whose processed confidence is higher than a preset confidence as final detection frames, and outputting position information representing the final detection frames, their detection categories, and the confidences of the detection categories.
Wherein the real information of the defect further comprises the real category of the defect, and the detection information further comprises the detection category of the defect and the confidence of the detection category; and adjusting the network parameters of the defect detection model according to the difference between the real information and the detection information comprises: acquiring the intersection-over-union between the final detection frame and the real frame in the previous training iteration, and obtaining a second control value based on the intersection-over-union, where the second control value is positively correlated with the intersection-over-union; weighting the second control value and the confidence from the previous training iteration to obtain a third control value; obtaining a first loss value from the difference between the final detection frames belonging to positive samples obtained in the current training iteration and the real frame, and weighting the first loss value with the third control value to obtain a second loss value; obtaining a third loss value from the difference between the detection category and the real category of the positive samples obtained in the current training iteration, weighting the third loss value with the third control value to obtain a fourth loss value, and obtaining a fifth loss value from the difference between the detection category and the real category of the negative samples obtained in the current training iteration; and adjusting the network parameters of the defect detection model using the second loss value, the fourth loss value, and the fifth loss value.
Another technical solution adopted by the present application is to provide a defect detection method, including: acquiring an image to be processed; and detecting the image to be processed by using the defect detection model to obtain detection information corresponding to the defect in the image to be processed, wherein the defect detection model is obtained by training by using the training method of the defect detection model in the scheme provided by the application.
Another technical solution adopted by the present application is to provide an image processing apparatus, which includes a processor and a memory coupled to the processor; wherein the memory is used for storing program data and the processor is used for executing the program data to realize the method in any scheme provided by the application.
Another technical solution adopted by the present application is to provide a computer-readable storage medium for storing program data, which when executed by a processor, is used for implementing the method in any one of the aspects provided in the present application.
The beneficial effects of this application are as follows. Unlike the prior art, during training of the defect detection model the mask region is used to classify a plurality of initial detection frames of the training image into positive and negative samples, and the final detection frame of the defect is determined based on the classification result, so that the network parameters of the defect detection model are adjusted according to the difference between the real information and the detection information of the defect. Because the mask region is the region formed by the pixel points representing the defect in the training image, the quality of the positive/negative sample classification can be improved, thereby improving the detection accuracy of the defect detection model.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and other drawings can be obtained from them by those skilled in the art without creative effort. In the drawings:
FIG. 1 is a schematic flow chart diagram illustrating a first embodiment of a method for training a defect inspection model provided in the present application;
FIG. 2 is a schematic diagram of a training image provided herein;
FIG. 3 is a schematic flow chart diagram illustrating a second embodiment of a method for training a defect inspection model provided in the present application;
FIG. 4 is a schematic structural diagram of an embodiment of a feature extraction network provided herein;
FIG. 5 is a schematic flow chart diagram illustrating step 33 of FIG. 3 provided herein;
FIG. 6 is a detailed flow chart of step 35 of FIG. 3 provided herein;
FIG. 7 is a schematic flowchart of a third embodiment of a training method for a defect detection model provided in the present application;
FIG. 8 is a schematic flowchart of an embodiment of a defect detection method provided herein;
FIG. 9 is a schematic structural diagram of an embodiment of an image processing apparatus provided in the present application;
FIG. 10 is a schematic structural diagram of an embodiment of a computer-readable storage medium provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be further noted that, for the convenience of description, only some of the structures related to the present application are shown in the drawings, not all of the structures. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
Referring to fig. 1, fig. 1 is a schematic flowchart of a first embodiment of a training method of a defect detection model provided in the present application, where the method includes:
step 11: a training image is acquired.
It is understood that the training images may be acquired manually or automatically. A training image contains defects of a product, such as the scratches, under-leakage, dirty spots, bubbles, wheel marks, and abrasions produced during the manufacture of aluminum profiles. Plastic materials may suffer deformation, breakage, blisters, scratches, and the like during manufacturing. Metal welds commonly exhibit defects such as undercut, weld beading, dents, and welding deformation, and sometimes surface pores and surface cracks. The root of a single-side weld may have defects such as incomplete penetration, pores, and cracks. All of these can be presented in images for category differentiation.
In some embodiments, information corresponding to the defect in the training image, such as the defect type, is annotated manually.
In some embodiments, the training image is labeled with real information of the defect and a mask region, where the mask region is the region formed by the pixel points representing the defect in the training image and the real information includes a real frame of the defect. As shown in FIG. 2, if a defect A exists in training image A, the number of pixel points of defect A and the region they form are obtained (for example, the position information of defect A relative to the training image), defect A is framed with the real frame A1, and the defect type of defect A is annotated.
In some embodiments, after the training images are obtained they are preprocessed, for example with operations such as random rotation, mirror flipping, random blurring, random cropping, and brightness changes, to obtain a number of additional training images corresponding to these operations; the image labels are changed according to the actual operation applied. The number of training images is thereby greatly increased, and training can be completed without collecting further training images.
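For concreteness, the sketch below applies these augmentation operations with PIL and torchvision; the probability values, rotation range, and crop ratio are assumptions, and box/mask labels would have to be transformed to match each geometric operation (not shown here).

```python
import random
from PIL import Image, ImageFilter, ImageOps
from torchvision import transforms

def augment(image: Image.Image) -> Image.Image:
    if random.random() < 0.5:
        image = ImageOps.mirror(image)                  # mirror flip
    if random.random() < 0.5:
        image = image.rotate(random.uniform(-15, 15), expand=True)  # random rotation
    if random.random() < 0.3:
        image = image.filter(ImageFilter.GaussianBlur(random.uniform(0.5, 1.5)))  # random blur
    if random.random() < 0.5:
        w, h = image.size                               # random crop to 90% of each side
        image = transforms.RandomCrop((int(h * 0.9), int(w * 0.9)))(image)
    return transforms.ColorJitter(brightness=0.3)(image)  # brightness change
```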
Step 12: and detecting the training image by using the defect detection model to obtain the detection information of the defect.
In some embodiments, the detection information of the defect includes a final detection frame of the defect, and the defect detection model classifies a plurality of initial detection frames of the training image into positive and negative samples using the mask region and determines the final detection frame of the defect based on the classification result. For example, the number of mask-region pixels contained in an initial detection frame is used as the judgment basis: a frame is a positive sample when the number is higher than a first set threshold and a negative sample when it is lower than a second set threshold, where the first set threshold is greater than the second set threshold.
In some embodiments, the defect detection model includes a feature extraction network for performing feature extraction on the input training image, such as using a convolutional neural network as the feature extraction network, and obtaining a corresponding feature map through the feature extraction network. It can be understood that the feature map contains important information about the defect.
In some embodiments, an FCN (Fully Convolutional Network), SegNet, or similar network may be employed to build the model. For example, the encoder-decoder structure of SegNet is used to train on the input pictures and learn the distribution characteristics of the data. The encoder portion of SegNet uses the first 13 convolutional layers of VGG16. Each encoder layer corresponds to a decoder layer; finally, the output of each decoder is sent to the next layer for positive/negative sample classification, and the final detection frame of the defect is determined based on the classification result.
Step 13: and adjusting the network parameters of the defect detection model according to the difference between the real information and the detection information of the defect.
In some embodiments, the number of training iterations of the defect detection model can be adjusted according to the difference between the real information and the detection information of the defect, thereby adjusting the network parameters of the defect detection model. If the real information is A but the detection information is B, the number of training iterations can be increased and the network parameters adjusted further; likewise, if the detection information matches the real information A but its confidence is lower than a set threshold, the number of training iterations can also be increased and the network parameters adjusted further.
In some embodiments, the network parameters of the defect detection model may be adjusted according to the difference between the real information and the detection information of the defect; if the defect detection model contains a convolutional neural network, the number, stride, and padding of convolution kernels may be set, the activation function adjusted, the pooling-layer parameters adjusted, and so on.
In some embodiments, a loss value may be calculated from the real information and the detection information of the defect, and if the loss value differs from a preset loss threshold, the network parameters of the defect detection model are adjusted.
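A minimal parameter-update sketch of this training step, assuming a PyTorch model whose forward pass returns the combined detection loss for a labeled training image; the optimizer choice and learning rate are assumptions.

```python
import torch

def train(model, loader, epochs, lr=1e-3):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for images, targets in loader:   # targets: real frames, categories, masks
            loss = model(images, targets)  # difference between real and detected info
            optimizer.zero_grad()
            loss.backward()                # gradients w.r.t. the network parameters
            optimizer.step()               # adjust the network parameters
```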
In one application scenario, an image of an aluminum profile annotated with the defect category "under-leakage" and a mask region is used as a training image and input into the defect detection model for training. The training image is preprocessed, for example with random rotation, mirror flipping, random blurring, random cropping, and brightness changes, to obtain several corresponding images, which are added as training images for the "under-leakage" defect category. After the training images have been generated, a plurality of initial detection frames are obtained based on each pixel point in the training image; the mask region is then used to classify these initial detection frames into positive and negative samples, the final detection frame of the defect is determined based on the classification result, and the detection information is obtained. The network parameters of the defect detection model are adjusted according to the difference between the real information and the detection information of the defect.
It can be understood that the initial detection frame takes the pixel point as a reference, and an area corresponding to the size of the initial detection frame is obtained on the training image.
In this embodiment, the mask region is used to classify a plurality of initial detection frames of the training image into positive and negative samples, and the final detection frame of the defect is determined based on the classification result, so that the network parameters of the defect detection model are adjusted according to the difference between the real information and the detection information of the defect. Because the mask region is the region formed by the pixel points representing the defect in the training image, the quality of the positive/negative sample classification is improved, which improves the precision of the defect detection model and hence its detection accuracy.
Referring to fig. 3, fig. 3 is a schematic flowchart of a second embodiment of a training method of a defect detection model provided in the present application, in this embodiment, the defect detection model includes a feature extraction network, a region generation network, a feature aggregation layer, and a classification layer. The method comprises the following steps:
step 31: a training image is acquired.
Step 31 has the same or similar technical solutions as those in the above embodiments, and is not described herein.
Step 32: and inputting the training image into a feature extraction network to obtain a multi-dimensional feature map.
Referring to fig. 4, step 32 will be described:
the feature extraction network comprises a C1 layer, a C2 layer, a C3 layer, a C4 layer, a C5 layer, a P2 layer, a P3 layer, a P4 layer and a P5 layer. Wherein, the C1 layer, the C2 layer, the C3 layer, the C4 layer and the C5 layer are a down-sampling process, and the P5 layer, the P4 layer, the P3 layer and the P2 layer are an up-sampling process. The corresponding C1, C2, C3, C4, and C5 layers each include a convolutional layer, a pooling layer, and a RELU layer.
After the training image is input, the image is converted into corresponding color value channels according to the type of the image, such as a gray image and a color image, for example, the color image has three color value channels of RGB, which respectively represent red, green and blue, pixels in each channel can be represented by a two-dimensional array, and the numerical values represent pixel values between 0 and 255. Assuming a 900 x 600 color picture, the computer can be represented by an array matrix of (900 x 600 x 3). After the conversion is completed, the training images are sequentially downsampled in a C1 layer, a C2 layer, a C3 layer, a C4 layer and a C5 layer, and a plurality of feature maps are obtained in a C5 layer. Upsampling was performed in the order of P5 layer, P4 layer, P3 layer, and P2 layer. The P5 layer, the P4 layer, the P3 layer and the P2 layer each comprise an upsampling layer and a deconvolution layer, a plurality of feature images of the previous layer are amplified through the upsampling layer, only pooled data exist in the amplified feature images, so the weights of other positions are 0, and then missing contents are filled up through the deconvolution layer.
Corresponding relations exist among a C2 layer, a C3 layer, a C4 layer, a C5 layer, a P5 layer, a P4 layer, a P3 layer and a P2 layer, for example, the C2 layer corresponds to the P2 layer, the C3 layer corresponds to the P3 layer, the C4 layer corresponds to the P4 layer, and the C5 layer corresponds to the P5 layer. The pooling indexes generated at the pooling levels among the C2 level, the C3 level, the C4 level, and the C5 level are inputted to the corresponding upsampling levels among the P2 level, the P3 level, the P4 level, and the P5 level. In an actual operation process, when a feature map is generated at a pooling layer among layers C2, C3, C4 and C5, a pooling index is generated, that is, the pooling index corresponds to position information of an element in an existing feature map in a feature map at a previous layer, an upsampling layer among layers P2, P3, P4 and P5 is used for feature map enlargement, and when an enlarged feature map is obtained, the element in the feature map is placed at a corresponding position in the enlarged feature map according to the corresponding pooling index.
Corresponding feature maps are generated in the P2 layer, the P3 layer, the P4 layer and the P5 layer, and a plurality of feature maps of each of the P2 layer, the P3 layer, the P4 layer and the P5 layer are defined as a one-dimensional feature map, so that multidimensional feature maps are generated in the P2 layer, the P3 layer, the P4 layer and the P5 layer.
Thus, the outputs of the C2, C3, C4, and C5 layers are denoted {C2, C3, C4, C5}, and the outputs of the P2, P3, P4, and P5 layers are denoted {P2, P3, P4, P5}. Because of its large memory footprint, the output of the C1 layer is not included. {P2, P3, P4, P5} denotes the multi-dimensional feature map obtained from the feature extraction network.
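As a rough sketch of how {P2, P3, P4, P5} can be produced from {C2, C3, C4, C5}, the code below follows a standard FPN-style top-down pathway with 1×1 lateral convolutions and nearest-neighbor upsampling; the channel counts are assumptions, and it deliberately simplifies away the pooling-index mechanism described above.

```python
import torch.nn as nn
import torch.nn.functional as F

class FPNTopDown(nn.Module):
    """Builds {P2, P3, P4, P5} from backbone outputs {C2, C3, C4, C5}."""

    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 lateral convolutions align channel counts before merging.
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])
        # 3x3 convolutions smooth each merged map.
        self.smooth = nn.ModuleList([nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                     for _ in in_channels])

    def forward(self, c2, c3, c4, c5):
        # Top-down: start from the deepest map and add upsampled context.
        p5 = self.lateral[3](c5)
        p4 = self.lateral[2](c4) + F.interpolate(p5, scale_factor=2, mode="nearest")
        p3 = self.lateral[1](c3) + F.interpolate(p4, scale_factor=2, mode="nearest")
        p2 = self.lateral[0](c2) + F.interpolate(p3, scale_factor=2, mode="nearest")
        return [s(p) for s, p in zip(self.smooth, (p2, p3, p4, p5))]
```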
Step 33: inputting the multi-dimensional feature map into an area generation network to obtain a plurality of initial detection frames, classifying positive and negative samples of the initial detection frames by using a mask area, and obtaining a plurality of candidate frames based on a classification result.
In some embodiments, the multi-dimensional feature map is input to the region generation network, where a 3×3 convolution is applied to it and the result is passed through a ReLU layer to increase the nonlinearity of the features. The feature map output by the ReLU layer is then traversed to obtain a plurality of initial detection frames corresponding to each pixel point.
For example, if 9 initial detection frames are set for each pixel point of each feature map in the multi-dimensional feature map, there are 9 times as many initial detection frames as pixel points. It can be understood that each initial detection frame takes its pixel point as reference and delimits a region of the corresponding size on the feature map.
And then positive and negative sample classification is carried out on the plurality of initial detection frames by using the mask area. With particular reference to fig. 5:
step 331: and acquiring a first pixel number occupied by the mask area in the real frame and acquiring a second pixel number occupied by the mask area in the initial detection frame.
It can be understood that, after the initial detection frames are generated, some of the pixel points within the region of an initial detection frame may be defect pixel points. Therefore, the second pixel number belonging to the mask region in the initial detection frame and the first pixel number occupied by the mask region in the real frame are acquired. The real frame is the region where the defect mask is located in the input training image.
Step 332: and if the first ratio between the second pixel quantity and the first pixel quantity is smaller than the first reference threshold value, determining the initial detection frame as a negative sample.
It can be understood that, if the first ratio between the second pixel number and the first pixel number is smaller than the first reference threshold, the initial detection frame corresponding to the second pixel number contains few defect-mask pixels and therefore meets the requirement for a negative sample.
Step 333: and if the first ratio between the second pixel quantity and the first pixel quantity is not less than the first reference threshold, determining the initial detection frame as a positive sample.
It can be understood that, if the first ratio between the second pixel number and the first pixel number is not smaller than the first reference threshold, the initial detection frame corresponding to the second pixel number contains many defect-mask pixels and therefore meets the requirement for a positive sample.
For example, let the first pixel number occupied by the mask region in the real frame be M_g and the second pixel number occupied by the mask region in the initial detection frame be M_r, so that the first ratio between the second pixel number and the first pixel number is M_r/M_g. If M_r/M_g < t_1, the initial detection frame is determined to be a negative sample; otherwise it is determined to be a positive sample. Here t_1 is a variable threshold.
Specifically, if the first ratio is not less than the first reference threshold, a second ratio between the first number of pixels and a third number of pixels of the real frame is obtained, and a third ratio between the second number of pixels and a fourth number of pixels of the initial detection frame is obtained. And if a fourth ratio between the third ratio and the second ratio is greater than a second reference threshold, determining that the initial detection frame is a positive sample.
For example, let the third pixel number of the real frame be B_g and the first pixel number occupied by the mask region in the real frame be M_g; the second ratio between the first pixel number and the third pixel number of the real frame is then

P_g = M_g / B_g.

Let the fourth pixel number of the initial detection frame be B_r and the second pixel number occupied by the mask region in the initial detection frame be M_r; the third ratio between the second pixel number and the fourth pixel number of the initial detection frame is then

P_r = M_r / B_r.

If M_r/M_g ≥ t_1 and the fourth ratio between the third ratio and the second ratio exceeds the second reference threshold, i.e., P_r/P_g > t_2, the initial detection frame is determined to be a positive sample, where t_1 is the first reference threshold and t_2 is the second reference threshold. Detection frames that satisfy neither condition are not classified as positive or negative samples and do not participate in the subsequent process. In this way, the amount of noise introduced during positive/negative sampling can be reduced, and the accuracy of the defect detection model's positive/negative sample classification is improved.
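A sketch of the mask-based positive/negative rule just described, assuming a binary numpy mask and integer box coordinates; the function name and return labels are illustrative.

```python
import numpy as np

def classify_anchor(mask, gt_box, anchor_box, t1, t2):
    x0, y0, x1, y1 = gt_box
    ax0, ay0, ax1, ay1 = anchor_box
    m_g = mask[y0:y1, x0:x1].sum()           # first pixel number (mask inside real frame)
    m_r = mask[ay0:ay1, ax0:ax1].sum()       # second pixel number (mask inside anchor)
    if m_g == 0 or m_r / m_g < t1:           # first ratio below first reference threshold
        return "negative"
    p_g = m_g / ((x1 - x0) * (y1 - y0))      # second ratio: mask density of real frame
    p_r = m_r / ((ax1 - ax0) * (ay1 - ay0))  # third ratio: mask density of anchor
    if p_r / p_g > t2:                       # fourth ratio above second reference threshold
        return "positive"
    return "ignored"                         # excluded from the subsequent process
```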
Step 34: and inputting the multi-dimensional feature map and a plurality of candidate frames into the feature aggregation layer to obtain a target feature map corresponding to the candidate frames.
It is understood that the candidate frames here are the frames remaining after the positive/negative sample classification in the preceding steps. In some embodiments, because the number of candidate frames is large, only a portion of the positive- and negative-sample candidate frames may be selected for input to the feature aggregation layer.
In some embodiments, the feature aggregation layer is an ROI Pooling layer. It maps the position coordinates of the candidate frames into the multi-dimensional feature map to obtain the position corresponding to each candidate frame on each feature map, then pools that region and resizes it to a target feature map of fixed size for subsequent operations. First, the coordinates of a candidate frame are scaled by the ratio between the input image and each feature map of the multi-dimensional feature map, giving the corresponding coordinates of the candidate frame in the feature map and thus the region within it. The region is then divided into a grid, and each grid cell is processed by max pooling or average pooling. After this processing, feature maps of different sizes all produce outputs of a fixed size, realizing fixed-length output.
In some embodiments, the feature aggregation layer is an ROI Align layer, in which a deformable (variable) convolution is used for convolution processing. The position coordinates of the candidate frames are mapped into the multi-dimensional feature map to obtain the position corresponding to each candidate frame on each feature map; the corresponding region is then pooled and resized to a target feature map of fixed size for subsequent operations. First, the coordinates of a candidate frame are scaled by the ratio between the input image and each feature map, giving the corresponding coordinates of the candidate frame in the feature map and thus the region within it. The region is then divided into a grid, and each grid cell is processed by max pooling or average pooling, so that feature maps of different sizes all produce outputs of a fixed size. Unlike ROI Pooling, the ROI Align layer retains floating-point coordinates when resizing instead of quantizing them; this improves the detection precision of small targets and reduces precision error during regression.
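A brief usage sketch of this ROI aggregation step, assuming torchvision's roi_align; the feature-map size, box coordinates, and spatial_scale value (feature-map size divided by input-image size) are illustrative assumptions.

```python
import torch
from torchvision.ops import roi_align

features = torch.randn(1, 256, 50, 50)                  # one feature map of the pyramid
boxes = torch.tensor([[0, 40.0, 40.0, 160.0, 120.0]])   # (batch_idx, x1, y1, x2, y2) in image space
target = roi_align(features, boxes, output_size=(7, 7),
                   spatial_scale=50 / 400,              # assumes a 400-pixel input image
                   sampling_ratio=2)                    # keeps floating-point coordinates
```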
Step 35: and inputting the target characteristic diagram corresponding to the candidate frame into a classification layer to obtain the detection information of the defects in the training image.
Specifically, the description is made with reference to fig. 6:
step 351: and classifying the target characteristic graphs corresponding to the candidate frames to obtain the detection classes of the candidate frames and the confidence degrees of the detection classes.
In some embodiments, the classification layer flattens the target feature map into a 1×n vector for classification, obtaining the detection category of each candidate frame and the confidence of that detection category.
It can be understood that, since there are many detection categories of defects, there is a corresponding confidence level for each detection category corresponding to the candidate box.
Step 352: process the confidences using soft non-maximum suppression (soft-NMS) to obtain the processed confidences.
In some embodiments, this is accomplished as follows:
First, for a given detection category, the candidate frames are sorted by the confidence obtained in step 351, e.g., from largest to smallest. The candidate frame with the highest confidence is then selected, and its overlap with each of the remaining candidate frames is calculated in turn. If the intersection-over-union is smaller than the first threshold, the corresponding confidence is left unchanged; if it is larger than the first threshold, the confidence is updated with the following formula:

s = s_i · (1 − IoU(M, b))

where s denotes the updated confidence, s_i denotes the current confidence, M denotes the candidate frame with the highest confidence during non-maximum suppression, and b is the candidate frame whose overlap with M is being compared. By using soft non-maximum suppression, valid detection frames can be kept from being filtered out, which improves the detection rate.
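For illustration, a linear soft-NMS sketch consistent with the update rule above, assuming numpy arrays of boxes (x1, y1, x2, y2) and scores; the threshold values are assumptions.

```python
import numpy as np

def iou(box, boxes):
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_m = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_m + area_b - inter)

def soft_nms(boxes, scores, iou_thresh=0.5, score_thresh=0.001):
    kept_boxes, kept_scores = [], []
    while len(scores) > 0:
        best = scores.argmax()                    # M: highest-confidence candidate
        kept_boxes.append(boxes[best]); kept_scores.append(scores[best])
        boxes = np.delete(boxes, best, axis=0)
        scores = np.delete(scores, best)
        if len(scores) == 0:
            break
        overlaps = iou(kept_boxes[-1], boxes)
        # Linear decay: only confidences whose overlap with M exceeds the
        # first threshold are reduced; the rest stay unchanged.
        scores = np.where(overlaps > iou_thresh, scores * (1 - overlaps), scores)
        keep = scores > score_thresh              # discard near-zero confidences
        boxes, scores = boxes[keep], scores[keep]
    return np.array(kept_boxes), np.array(kept_scores)
```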
Step 353: and taking the candidate frame with the processed confidence coefficient higher than the preset confidence coefficient as a final detection frame, and outputting the position information representing the final detection frame, the detection category of the final detection frame and the confidence coefficient of the detection category.
In some embodiments, after the final detection frame is determined, the final detection frame is regressed into the training image by means of coordinate regression to obtain corresponding position information.
In combination with the above, the detection information of the defect includes a detection type of the defect and a confidence of the detection type.
In this way, using soft non-maximum suppression effectively avoids filtering out valid candidate frames, improving the detection rate.
Step 36: and adjusting the network parameters of the defect detection model according to the difference between the real information and the detection information of the defect.
In the above manner, using the mask region reduces the introduction of noisy anchor frames and improves the accuracy of positive/negative sample classification.
Referring to fig. 7, fig. 7 is a schematic flowchart of a third embodiment of a training method of a defect detection model provided in the present application. The defect detection model comprises a feature extraction network, a region generation network, a feature aggregation layer and a classification layer. The method comprises the following steps:
step 701: a training image is acquired.
Step 702: and inputting the training image into a feature extraction network to obtain a multi-dimensional feature map.
In some embodiments, the feature extraction network is a Feature Pyramid Network (FPN), which addresses the multi-scale problem in object detection: with only a simple change in network connections and almost no increase in the original model's computation, the performance of small-object detection is greatly improved.
For example, the feature extraction network sequentially downsamples the training image N times, and the feature maps of the 2nd through Nth downsamplings are taken to obtain an (N−1)-dimensional initial feature map, where a dilated (atrous) convolution is used for convolution processing during the Nth downsampling and N is larger than 2. For the (N−1)-dimensional initial feature map, the ith upsampling is performed based on the (N−i)-dimensional initial feature map to obtain the ith-dimensional final feature map, where i is an integer from 1 to N−1.
Step 703: inputting the multi-dimensional feature map into an area generation network to obtain a plurality of initial detection frames, classifying positive and negative samples of the initial detection frames by using a mask area, and obtaining a plurality of candidate frames based on a classification result.
In some embodiments, the multi-dimensional feature map is traversed to obtain a plurality of initial detection frames corresponding to each pixel point. It is understood that regions, including the defect regions, are divided according to the plurality of initial detection frames corresponding to each pixel point.
In some embodiments, the region generation network is a Region Proposal Network (RPN). For example, suppose one of the feature maps has size N×16×16. After it enters the region generation network, a 3×3 convolution is first applied to obtain a 256×16×16 feature map, and then two 1×1 convolutions are applied to obtain an 18×16×16 feature map and a 36×16×16 feature map, respectively. The 18×16×16 feature map corresponds to the plurality of initial detection frames; positive/negative sample classification is performed on these with the mask region, and a number of candidate frames are obtained based on the classification result. The 36×16×16 feature map is used to compute the bounding-box regression offsets of the initial detection frames so as to obtain accurate detection frame regions. Finally, the classified candidate frames and the offsets are processed together to obtain several more accurate candidate frames, while initial candidate frames that are too small or that exceed the image boundary are removed.
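A sketch of such an RPN head, assuming 9 anchors per location (so 18 = 2×9 objectness scores and 36 = 4×9 box offsets, matching the standard RPN layout): a 3×3 convolution to 256 channels followed by two 1×1 convolutions.

```python
import torch.nn as nn

class RPNHead(nn.Module):
    def __init__(self, in_channels=256, num_anchors=9):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 256, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        self.cls = nn.Conv2d(256, 2 * num_anchors, 1)  # positive/negative scores (18 ch)
        self.reg = nn.Conv2d(256, 4 * num_anchors, 1)  # bounding-box offsets (36 ch)

    def forward(self, feature):
        x = self.relu(self.conv(feature))
        return self.cls(x), self.reg(x)
```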
In some embodiments, before the positive/negative sample classification of the initial detection frames, the intersection-over-union between the final detection frame and the real frame in the previous training iteration is acquired; a weighted summation of this intersection-over-union and the confidence from the previous training iteration gives a first control value; and the product of the first control value and a preset parameter value is taken as the first reference threshold for the current training iteration.
Specifically, this is expressed with the following formulas:

t_1 = β · c
c = α · loc_a + (1 − α) · cls_c

where t_1 denotes the first reference threshold, β denotes a preset parameter, c denotes the first control value, loc_a denotes the intersection-over-union between the final detection frame and the real frame in the previous training iteration, cls_c denotes the confidence of the final detection frame in the previous training iteration, and α denotes a second preset parameter. With each training iteration of the defect detection model, the confidence of the final detection frame rises correspondingly, and the intersection-over-union between the final detection frame and the real frame rises as well; through cyclic training the first reference threshold therefore also rises, tightening the classification standard for positive and negative samples and improving the precision of the whole defect detection model.
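A minimal sketch of this variable first reference threshold, assuming scalar inputs; the default values of α and β are placeholders.

```python
def first_reference_threshold(loc_a, cls_c, alpha=0.5, beta=1.0):
    c = alpha * loc_a + (1 - alpha) * cls_c  # first control value
    return beta * c                          # t1 rises as training improves
```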
By screening positive and negative samples with this variable first reference threshold, an excess of samples of one type during training can be effectively avoided, preventing positive/negative sample imbalance.
Step 704: and inputting the multi-dimensional feature map and a plurality of candidate frames into the feature aggregation layer to obtain a target feature map corresponding to the candidate frames.
In some embodiments, the feature aggregation layer is an ROI Align layer, in which a deformable (variable) convolution is used for convolution processing. This improves the detection precision of small targets and reduces precision error during frame regression.
Step 705: and inputting the target characteristic diagram corresponding to the candidate frame into a classification layer to obtain the detection information of the defects in the training image.
Step 706: acquire the intersection-over-union between the final detection frame and the real frame in the previous training iteration, and obtain a second control value based on it, where the second control value is positively correlated with the intersection-over-union.
In some embodiments, the second control value is computed as f(x), where x denotes the intersection-over-union between the final detection frame and the real frame in the previous training iteration and f is an increasing function of x (its specific form is given in the accompanying figure).
Step 707: weight the second control value and the confidence from the previous training iteration to obtain a third control value.
In some embodiments, the following formula is used:

r = (α · f(loc_a) + (1 − α) · f(cls_c))^γ

where r denotes the third control value, loc_a denotes the intersection-over-union between the final detection frame and the real frame in the previous training iteration, γ denotes a preset coefficient, α denotes a preset parameter, and cls_c denotes the confidence of the final detection frame in the previous training iteration.
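A minimal sketch of the third control value; since the exact form of f is only given in the patent figure, a monotonically increasing placeholder is assumed here.

```python
def third_control_value(loc_a, cls_c, alpha=0.5, gamma=1.0, f=lambda x: x):
    # f is a placeholder for the increasing function of the patent figure.
    return (alpha * f(loc_a) + (1 - alpha) * f(cls_c)) ** gamma
```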
Step 708: obtain a first loss value from the difference between the final detection frames belonging to positive samples obtained in the current training iteration and the real frames, and weight the first loss value with the third control value to obtain a second loss value.
In some embodiments, the operation of step 708 is expressed with the following formula:

L_box = Σ_{i ∈ Pos} r · smooth_L1(b_i, b̂_i)

where L_box denotes the second loss value, r denotes the third control value, i indexes the positive-sample candidate frames input to the feature aggregation layer, Pos denotes the set of positive samples, b_i and b̂_i denote the final detection frame and the real frame of positive sample i, and smooth_L1 denotes the smoothed L1 loss function, i.e., the first loss value.
Step 709: obtain a third loss value from the difference between the detection category and the real category of the positive samples obtained in the current training iteration, weight the third loss value with the third control value to obtain a fourth loss value, and obtain a fifth loss value from the difference between the detection category and the real category of the negative samples obtained in the current training iteration.
In some embodiments, the operation of step 709 is expressed with the following formula:

L_cls = L_cls^pos + L_cls^neg = Σ_{i ∈ Pos} r · BCE(p_i, p̂_i) + Σ_{i ∈ Neg} BCE(p_i, p̂_i)

where L_cls denotes the sum of the fourth loss value and the fifth loss value, L_cls^pos denotes the fourth loss value, L_cls^neg denotes the fifth loss value, r denotes the third control value, i indexes the positive-sample or negative-sample candidate frames input to the feature aggregation layer, Pos denotes positive samples, Neg denotes negative samples, p_i and p̂_i denote the detection category and the real category of sample i, and BCE is the binary cross entropy.
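A loss-weighting sketch consistent with the two formulas above, assuming PyTorch tensors and one-hot category labels; the function and argument names are illustrative.

```python
import torch.nn.functional as F

def detection_loss(pos_boxes, pos_gt_boxes, pos_logits, pos_labels,
                   neg_logits, neg_labels, r):
    # Second loss value: third control value r times the smooth-L1 box loss
    # of the positive samples (the first loss value).
    box_loss = r * F.smooth_l1_loss(pos_boxes, pos_gt_boxes, reduction="sum")
    # Fourth loss value: r times the BCE class loss of the positive samples
    # (the third loss value).
    pos_cls = r * F.binary_cross_entropy_with_logits(pos_logits, pos_labels,
                                                     reduction="sum")
    # Fifth loss value: unweighted BCE class loss of the negative samples.
    neg_cls = F.binary_cross_entropy_with_logits(neg_logits, neg_labels,
                                                 reduction="sum")
    return box_loss + pos_cls + neg_cls
```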
It is understood that, when the difference between the detection category and the real category of a positive sample is computed, the BCE term represents the third loss value.
Step 710: and adjusting the network parameters of the defect detection model by using the second loss value, the fourth loss value and the fifth loss value.
In the above manner, the loss values are weighted with the third control value, which is positively correlated with the second control value and grows as the second control value grows, improving the effective utilization of the training images. Performing the corresponding calculation for different training images improves the recognition precision of the defect detection model for different detection categories.
Referring to fig. 8, fig. 8 is a schematic flowchart illustrating an embodiment of a defect detection method provided in the present application. The method comprises the following steps:
step 81: and acquiring an image to be processed.
In some implementations, the image to be processed may be a color image or a black-and-white image. The image to be processed is then handled as follows.
Step 82: and detecting the image to be processed by using the defect detection model to obtain detection information of the corresponding defect in the image to be processed.
The defect detection model is obtained by training by adopting the training method of any one of the embodiments.
It can be understood that the defect detection method provided by this embodiment realizes detection of product surface defects based on the defect detection model, improving the efficiency of product quality inspection.
In this embodiment, defect detection is performed with the defect detection model trained according to the above embodiments, so that defects on the product surface can be effectively distinguished, the production process can be improved, and production efficiency increased.
Referring to fig. 9, fig. 9 is a schematic structural diagram of an embodiment of an image processing apparatus provided in the present application, where the image processing apparatus 90 includes a processor 91 and a memory 92 connected to the processor 91; wherein the memory 92 is used for storing program data and the processor 91 is used for executing the program data, for implementing the following method:
acquiring a training image, wherein the training image is marked with defective real information and a mask area, the mask area is an area formed by pixel points representing defects in the training image, and the real information comprises a real frame of the defects; detecting the training image by using a defect detection model to obtain detection information of the defect, wherein the detection information of the defect comprises a final detection frame of the defect, and the defect detection model is used for classifying positive and negative samples of a plurality of initial detection frames of the training image by using a mask region and determining the final detection frame of the defect based on a classification result; and adjusting the network parameters of the defect detection model according to the difference between the real information and the detection information of the defect.
Or, acquiring an image to be processed; and detecting the image to be processed by using the defect detection model to obtain detection information of the corresponding defect in the image to be processed.
It can be understood that, when the processor 91 is used for executing the program data, it is also used for implementing any method of the foregoing embodiment, and specific implementation steps thereof may refer to the foregoing embodiment, which is not described herein again.
Referring to fig. 10, fig. 10 is a schematic structural diagram of an embodiment of a computer-readable storage medium 100 provided in the present application, the computer-readable storage medium 100 is used for storing program data 101, and the program data 101, when being executed by a processor, is used for implementing the following method steps:
acquiring a training image, wherein the training image is marked with defective real information and a mask area, the mask area is an area formed by pixel points representing defects in the training image, and the real information comprises a real frame of the defects; detecting the training image by using a defect detection model to obtain detection information of the defect, wherein the detection information of the defect comprises a final detection frame of the defect, and the defect detection model is used for classifying positive and negative samples of a plurality of initial detection frames of the training image by using a mask region and determining the final detection frame of the defect based on a classification result; and adjusting the network parameters of the defect detection model according to the difference between the real information and the detection information of the defect.
Or, acquiring an image to be processed; and detecting the image to be processed by using the defect detection model to obtain detection information of the corresponding defect in the image to be processed.
It is understood that the program data 101, when executed by the processor, can also implement any method of the foregoing embodiments; for the specific implementation steps, reference may be made to the foregoing embodiments, and details are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the device embodiments described above are merely illustrative: the division into modules or units is only a logical functional division, and other divisions are possible in practice; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed.
The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above description covers only embodiments of the present application and is not intended to limit the scope of the present application. Any equivalent structural or process transformation made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of protection of the present application.

Claims (12)

1. A method for training a defect detection model, the method comprising:
acquiring a training image, wherein the training image is labeled with real information and a mask region, the real information comprises a real frame of a defect, and the mask region is a region formed by pixel points representing the defect in the training image;
detecting the training image by using a defect detection model to obtain detection information of the defect, wherein the detection information of the defect comprises a final detection frame of the defect, the defect detection model is used for carrying out positive and negative sample classification on a plurality of initial detection frames of the training image by using the mask region, and determining the final detection frame of the defect based on a classification result;
and adjusting the network parameters of the defect detection model according to the difference between the real information and the detection information of the defect.
2. The training method of claim 1, wherein the defect detection model comprises a feature extraction network, a region generation network, a feature aggregation layer, and a classification layer;
the detecting the training image by using the defect detection model to obtain the detection information of the defect comprises the following steps:
inputting the training image into a feature extraction network to obtain a multi-dimensional feature map;
inputting the multi-dimensional feature map into the region generation network to obtain a plurality of initial detection frames, classifying the plurality of initial detection frames into positive and negative samples by using the mask region, and obtaining a plurality of candidate frames based on the classification result;
inputting the multi-dimensional feature map and the candidate frames into the feature aggregation layer to obtain a target feature map corresponding to each candidate frame;
and inputting the target feature map corresponding to the candidate frame into the classification layer to obtain the detection information of the defect in the training image.
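A toy sketch of this four-stage forward pass follows; the backbone, the hard-coded candidate frames, and the layer sizes are illustrative assumptions, with torchvision's roi_align standing in for the feature aggregation layer.

```python
# Sketch of the claim-2 pipeline: backbone -> candidate frames ->
# feature aggregation (roi_align) -> classification layer.
import torch
import torch.nn as nn
from torchvision.ops import roi_align

backbone = nn.Sequential(              # stands in for the feature extraction network
    nn.Conv2d(3, 16, 3, stride=4, padding=1), nn.ReLU(),
)
classifier = nn.Linear(16 * 7 * 7, 5)  # stands in for the classification layer

image = torch.rand(1, 3, 256, 256)
feature_map = backbone(image)          # (1, 16, 64, 64)

# The region generation network would emit many initial detection frames;
# after positive/negative classification with the mask region, a handful of
# candidate frames survive. Two hard-coded candidates stand in for that here
# (format: batch index, x1, y1, x2, y2, in image coordinates).
candidates = torch.tensor([[0, 32.0, 32.0, 96.0, 96.0],
                           [0, 100.0, 60.0, 180.0, 140.0]])

# Feature aggregation: pool each candidate into a fixed-size target feature map.
target_feats = roi_align(feature_map, candidates, output_size=(7, 7),
                         spatial_scale=feature_map.shape[-1] / image.shape[-1])

logits = classifier(target_feats.flatten(1))   # detection categories per candidate
print(logits.shape)                            # (2, 5)
```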
3. The training method of claim 2, wherein the classifying the plurality of initial detection frames into positive and negative samples by using the mask region comprises:
acquiring a first pixel number occupied by the mask area in the real frame and acquiring a second pixel number occupied by the mask area in the initial detection frame;
if a first ratio between the second pixel quantity and the first pixel quantity is smaller than a first reference threshold value, determining that the initial detection frame is a negative sample;
and if the first ratio is not less than the first reference threshold, determining the initial detection frame as a positive sample.
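The pixel-count test of claim 3 can be sketched directly on a boolean defect mask, as below; the 0.5 value used for the first reference threshold is an illustrative assumption (claim 5 makes this threshold adaptive). Unlike plain IoU-based assignment, this test follows the defect's actual pixels, so a frame covering most of the defect counts as positive even if the boxes overlap loosely.

```python
# Sketch of claim 3: count mask pixels inside the real frame and inside an
# initial detection frame, then compare their ratio against a threshold.
import numpy as np

def classify_initial_frame(mask, real_box, init_box, first_ref_threshold=0.5):
    """mask: HxW bool array of defect pixels; boxes are integer (x1, y1, x2, y2)."""
    def mask_pixels_in(box):
        x1, y1, x2, y2 = box
        return int(mask[y1:y2, x1:x2].sum())

    first_count = mask_pixels_in(real_box)   # first pixel quantity (real frame)
    second_count = mask_pixels_in(init_box)  # second pixel quantity (initial frame)
    if first_count == 0:
        return "negative"
    first_ratio = second_count / first_count
    return "positive" if first_ratio >= first_ref_threshold else "negative"

# Usage: a diagonal scratch-like defect.
mask = np.zeros((100, 100), dtype=bool)
for i in range(20, 80):
    mask[i, i] = True
print(classify_initial_frame(mask, (15, 15, 85, 85), (10, 10, 50, 50)))
```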
4. The training method according to claim 3, wherein the determining that the initial detection frame is a positive sample if the first ratio is not less than the first reference threshold comprises:
if the first ratio is not smaller than the first reference threshold, acquiring a second ratio between the first pixel quantity and a third pixel quantity of the real frame, and acquiring a third ratio between the second pixel quantity and a fourth pixel quantity of the initial detection frame;
and if a fourth ratio between the third ratio and the second ratio is greater than a second reference threshold, determining that the initial detection frame is a positive sample.
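The claim-4 refinement compares the mask density of the real frame with that of the initial detection frame; a sketch under the same conventions as the previous one, with an assumed second reference threshold of 0.8, follows.

```python
# Sketch of claim 4: second ratio = mask density of the real frame,
# third ratio = mask density of the initial frame, positive if their
# quotient (the fourth ratio) exceeds the second reference threshold.
import numpy as np

def refine_positive(mask, real_box, init_box, second_ref_threshold=0.8):
    def area(box):
        x1, y1, x2, y2 = box
        return max((x2 - x1) * (y2 - y1), 1)
    def mask_pixels_in(box):
        x1, y1, x2, y2 = box
        return int(mask[y1:y2, x1:x2].sum())

    second_ratio = mask_pixels_in(real_box) / area(real_box)  # real-frame density
    third_ratio = mask_pixels_in(init_box) / area(init_box)   # initial-frame density
    fourth_ratio = third_ratio / max(second_ratio, 1e-8)
    return fourth_ratio > second_ref_threshold

mask = np.zeros((100, 100), dtype=bool)
mask[40:60, 40:60] = True                  # a square defect blob
print(refine_positive(mask, (35, 35, 65, 65), (38, 38, 62, 62)))  # True
```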
5. The training method according to claim 3, wherein the detection information of the defect further includes a detection category of the defect and a confidence of the detection category;
before the classifying the positive and negative samples of the plurality of initial detection boxes by using the mask region, the method further includes:
acquiring the intersection ratio between the final detection frame and the real frame in the last training;
performing weighted summation on the intersection ratio and the confidence in the last training to obtain a first control value;
and taking the product of the first control value and a preset parameter value as the first reference threshold value in the training.
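Claim 5's adaptive threshold is therefore a preset parameter times a weighted sum of the previous training round's intersection ratio and confidence; a one-function sketch with assumed weights and preset parameter follows.

```python
# Sketch of claim 5's adaptive first reference threshold. The 0.5/0.5
# weights and the 0.6 preset parameter are illustrative assumptions.
def first_reference_threshold(prev_iou, prev_confidence,
                              w_iou=0.5, w_conf=0.5, preset_param=0.6):
    first_control = w_iou * prev_iou + w_conf * prev_confidence
    return preset_param * first_control

print(first_reference_threshold(prev_iou=0.7, prev_confidence=0.9))  # ~0.48
```

The effect is that as the model's detections improve across training rounds (higher IoU and confidence), the positive-sample criterion tightens automatically.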
6. The method of claim 3, wherein the feature extraction network is a feature pyramid network (FPN), the region generation network is a region proposal network (RPN), and the feature aggregation layer is a ROIAlign layer, wherein the ROIAlign layer performs convolution using a deformable convolution.
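Both building blocks named in claim 6 are available in torchvision; the wiring below (zero offsets, pooling straight from one feature map) is only an assumed minimal composition for illustration, not the claimed layer.

```python
# Sketch combining torchvision's roi_align with DeformConv2d.
import torch
from torchvision.ops import DeformConv2d, roi_align

feat = torch.rand(1, 16, 32, 32)
boxes = torch.tensor([[0, 4.0, 4.0, 20.0, 20.0]])   # (batch_idx, x1, y1, x2, y2)
pooled = roi_align(feat, boxes, output_size=(7, 7))  # (1, 16, 7, 7)

deform = DeformConv2d(16, 16, kernel_size=3, padding=1)
# Offsets would normally be predicted by a small conv; zeros reduce the
# layer to an ordinary convolution and keep the sketch self-contained.
offsets = torch.zeros(1, 2 * 3 * 3, 7, 7)
out = deform(pooled, offsets)
print(out.shape)                                     # (1, 16, 7, 7)
```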
7. The method of claim 3, wherein the inputting the training image into the feature extraction network to obtain the multi-dimensional feature map comprises:
sequentially performing N rounds of downsampling on the training image by using the feature extraction network, and taking the feature maps of the 2nd to Nth downsampling to obtain an (N-1)-dimensional initial feature map, wherein in the Nth downsampling, dilated (atrous) convolution is used for the convolution processing, and N is greater than 2;
for the (N-1)-dimensional initial feature map, performing the ith upsampling based on the (N-i)-dimensional initial feature map to obtain an ith-dimensional final feature map, wherein i is an integer from 1 to N-1;
and the obtaining the plurality of initial detection frames comprises:
traversing the multi-dimensional feature map to obtain a plurality of initial detection frames corresponding to each pixel point.
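The traversal in claim 7, in which every feature-map pixel spawns several initial detection frames, can be sketched as plain anchor generation; the scales, aspect ratios, and stride below are illustrative assumptions.

```python
# Sketch of per-pixel initial detection frame (anchor) generation.
import itertools

def initial_frames(feat_h, feat_w, stride, scales=(16, 32), ratios=(0.5, 1, 2)):
    frames = []
    for y, x in itertools.product(range(feat_h), range(feat_w)):
        cx, cy = (x + 0.5) * stride, (y + 0.5) * stride  # pixel centre in image
        for s, r in itertools.product(scales, ratios):
            w, h = s * r ** 0.5, s / r ** 0.5            # aspect ratio w/h == r
            frames.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return frames

print(len(initial_frames(8, 8, stride=16)))  # 8 * 8 * 6 = 384 initial frames
```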
8. The method according to claim 3, wherein the detection information of the defect further includes a detection category of the defect and a confidence of the detection category;
wherein the inputting the target feature map corresponding to the candidate frame into the classification layer to obtain the detection information of the defect in the training image comprises:
classifying the target feature map corresponding to the candidate frame to obtain a detection category of the candidate frame and a confidence of the detection category;
processing the confidence by using soft non-maximum suppression (soft-NMS) to obtain a processed confidence;
and taking the candidate frame whose processed confidence is higher than a preset confidence as the final detection frame, and outputting position information representing the final detection frame, the detection category of the final detection frame, and the confidence of the detection category.
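A sketch of the soft-NMS step in claim 8 follows, using the linear decay variant; the IoU and preset-confidence thresholds are illustrative assumptions. Unlike hard NMS, overlapping candidates have their confidence decayed rather than being discarded outright, so adjacent defects are less likely to suppress each other.

```python
# Sketch of soft-NMS (linear decay) followed by the preset-confidence cut.
def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(x2 - x1, 0) * max(y2 - y1, 0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-8)

def soft_nms(boxes, scores, iou_thresh=0.5, score_thresh=0.3):
    boxes, scores, keep = list(boxes), list(scores), []
    while boxes:
        i = max(range(len(scores)), key=scores.__getitem__)  # current best
        best, best_score = boxes.pop(i), scores.pop(i)
        if best_score < score_thresh:   # nothing left above the preset confidence
            break
        keep.append((best, best_score))
        # Decay, rather than discard, strongly overlapping neighbours.
        scores = [s * (1 - iou(best, b)) if iou(best, b) > iou_thresh else s
                  for b, s in zip(boxes, scores)]
    return keep

dets = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
print(soft_nms(dets, [0.9, 0.8, 0.7]))  # second box decayed below the cut
```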
9. The method according to claim 1, wherein the real information of the defect further comprises a real category of the defect, the detection information of the defect further comprises a detection category of the defect and a confidence of the detection category;
the adjusting the network parameters of the defect detection model according to the difference between the real information and the detection information of the defect comprises:
acquiring an intersection ratio between a final detection frame and a real frame in the last training, and obtaining a second control value based on the intersection ratio, wherein the second control value is in positive correlation with the intersection ratio;
performing weighted summation on the second control value and the confidence in the last training to obtain a third control value;
obtaining a first loss value from the difference between the final detection frame belonging to the positive sample obtained in this training and the real frame, and weighting the first loss value with the third control value to obtain a second loss value;
obtaining a third loss value from the difference between the detection category and the real category of the positive sample obtained in this training, weighting the third loss value with the third control value to obtain a fourth loss value, and obtaining a fifth loss value from the difference between the detection category and the real category of the negative sample obtained in this training;
and adjusting the network parameters of the defect detection model by using the second loss value, the fourth loss value and the fifth loss value.
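The loss weighting of claim 9 can be summarized in one function; taking the intersection ratio itself as the positively correlated second control value, and the smooth-L1/cross-entropy loss choices and 0.5 weighting, are illustrative assumptions, with only the weighting structure following the claim.

```python
# Sketch of claim 9's weighted losses: box and positive-class losses are
# scaled by a control value built from last round's IoU and confidence.
import torch
import torch.nn.functional as F

def weighted_losses(pred_boxes_pos, real_boxes_pos,
                    pred_logits_pos, real_labels_pos,
                    pred_logits_neg, neg_labels,
                    prev_iou, prev_confidence, w=0.5):
    second_control = prev_iou                            # positively correlated with IoU
    third_control = w * second_control + (1 - w) * prev_confidence

    first_loss = F.smooth_l1_loss(pred_boxes_pos, real_boxes_pos)
    second_loss = third_control * first_loss             # weighted box loss

    third_loss = F.cross_entropy(pred_logits_pos, real_labels_pos)
    fourth_loss = third_control * third_loss             # weighted positive class loss

    fifth_loss = F.cross_entropy(pred_logits_neg, neg_labels)
    return second_loss + fourth_loss + fifth_loss

# Synthetic usage: 4 positive and 6 negative samples, 3 classes
# (class index 2 used as a hypothetical background class).
loss = weighted_losses(torch.rand(4, 4), torch.rand(4, 4),
                       torch.rand(4, 3), torch.randint(0, 3, (4,)),
                       torch.rand(6, 3), torch.full((6,), 2),
                       prev_iou=0.6, prev_confidence=0.8)
print(loss.item())
```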
10. A method of defect detection, the method comprising:
acquiring an image to be processed;
detecting the image to be processed by using a defect detection model to obtain detection information of corresponding defects in the image to be processed, wherein the defect detection model is obtained by training with the training method of the defect detection model according to any one of claims 1 to 9.
11. An image processing device, comprising a processor and a memory coupled to the processor;
wherein the memory is for storing program data and the processor is for executing the program data to implement the method of any one of claims 1 to 9 or the method of claim 10.
12. A computer-readable storage medium for storing program data which, when executed by a processor, implements the training method of any one of claims 1 to 9 or the method of claim 10.
CN202010635033.9A 2020-07-03 2020-07-03 Training method of defect detection model, defect detection method and related device Active CN111814867B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010635033.9A CN111814867B (en) 2020-07-03 2020-07-03 Training method of defect detection model, defect detection method and related device


Publications (2)

Publication Number Publication Date
CN111814867A true CN111814867A (en) 2020-10-23
CN111814867B CN111814867B (en) 2024-06-18

Family

ID=72855535

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010635033.9A Active CN111814867B (en) 2020-07-03 2020-07-03 Training method of defect detection model, defect detection method and related device

Country Status (1)

Country Link
CN (1) CN111814867B (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018000731A1 (en) * 2016-06-28 2018-01-04 华南理工大学 Method for automatically detecting curved surface defect and device thereof
CN111160379A (en) * 2018-11-07 2020-05-15 北京嘀嘀无限科技发展有限公司 Training method and device of image detection model and target detection method and device
CN111199175A (en) * 2018-11-20 2020-05-26 株式会社日立制作所 Training method and device for target detection network model
CN109711474A (en) * 2018-12-24 2019-05-03 中山大学 A kind of aluminium material surface defects detection algorithm based on deep learning
CN109859171A (en) * 2019-01-07 2019-06-07 北京工业大学 A kind of flooring defect automatic testing method based on computer vision and deep learning
CN110910353A (en) * 2019-11-06 2020-03-24 成都数之联科技有限公司 Industrial false failure detection method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CAI Biao; SHEN Kuan; FU Jinlei; ZHANG Lize: "Research on defect detection in casting X-ray DR images based on Mask R-CNN", Chinese Journal of Scientific Instrument (仪器仪表学报), no. 03, 15 March 2020 (2020-03-15) *

Cited By (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113052789A (en) * 2020-11-03 2021-06-29 哈尔滨市科佳通用机电股份有限公司 Vehicle bottom plate foreign body hitting fault detection method based on deep learning
CN112634209A (en) * 2020-12-09 2021-04-09 歌尔股份有限公司 Product defect detection method and device
WO2022121531A1 (en) * 2020-12-09 2022-06-16 歌尔股份有限公司 Product defect detection method and apparatus
CN112233119A (en) * 2020-12-16 2021-01-15 常州微亿智造科技有限公司 Workpiece defect quality inspection method, device and system
CN112633496A (en) * 2020-12-18 2021-04-09 杭州海康威视数字技术股份有限公司 Detection model processing method and device
CN112633496B (en) * 2020-12-18 2023-08-08 杭州海康威视数字技术股份有限公司 Processing method and device for detection model
CN112634254A (en) * 2020-12-29 2021-04-09 北京市商汤科技开发有限公司 Insulator defect detection method and related device
CN112712088A (en) * 2020-12-31 2021-04-27 洛阳语音云创新研究院 Animal fat condition detection method and device and computer readable storage medium
CN112712088B (en) * 2020-12-31 2023-02-14 洛阳语音云创新研究院 Animal fat condition detection method and device and computer readable storage medium
CN112766110A (en) * 2021-01-08 2021-05-07 重庆创通联智物联网有限公司 Training method of object defect recognition model, object defect recognition method and device
CN112884744A (en) * 2021-02-22 2021-06-01 深圳中科飞测科技股份有限公司 Detection method and device, detection equipment and storage medium
CN112950563A (en) * 2021-02-22 2021-06-11 深圳中科飞测科技股份有限公司 Detection method and device, detection equipment and storage medium
CN113160128A (en) * 2021-03-03 2021-07-23 合肥图迅电子科技有限公司 Defect detection method for LED and storage medium
CN112884691A (en) * 2021-03-10 2021-06-01 深圳中科飞测科技股份有限公司 Data enhancement and device, data enhancement equipment and storage medium
CN113204868B (en) * 2021-04-25 2023-02-28 中车青岛四方机车车辆股份有限公司 Defect detection parameter optimization method and optimization system based on POD quantitative analysis
CN113204868A (en) * 2021-04-25 2021-08-03 中车青岛四方机车车辆股份有限公司 Defect detection parameter optimization method and optimization system based on POD quantitative analysis
CN113096130B (en) * 2021-06-09 2021-09-14 常州微亿智造科技有限公司 Method and device for detecting object defects
CN113096130A (en) * 2021-06-09 2021-07-09 常州微亿智造科技有限公司 Method and device for detecting object defects
CN113378818B (en) * 2021-06-21 2024-06-07 中国南方电网有限责任公司超高压输电公司柳州局 Electrical equipment defect determining method and device, electronic equipment and storage medium
CN113378818A (en) * 2021-06-21 2021-09-10 中国南方电网有限责任公司超高压输电公司柳州局 Electrical equipment defect determining method and device, electronic equipment and storage medium
CN113744199A (en) * 2021-08-10 2021-12-03 南方科技大学 Image damage detection method, electronic device, and storage medium
CN113744199B (en) * 2021-08-10 2023-09-26 南方科技大学 Image breakage detection method, electronic device, and storage medium
CN113695256A (en) * 2021-08-18 2021-11-26 国网江苏省电力有限公司电力科学研究院 Power grid foreign matter detection and identification method and device
CN113808104A (en) * 2021-09-16 2021-12-17 西安交通大学 Block-based metal surface defect detection method and system
CN113808104B (en) * 2021-09-16 2024-04-02 西安交通大学 Metal surface defect detection method and system based on blocking
CN114066900A (en) * 2021-11-12 2022-02-18 北京百度网讯科技有限公司 Image segmentation method and device, electronic equipment and storage medium
CN113781485A (en) * 2021-11-12 2021-12-10 成都数联云算科技有限公司 Intelligent detection method and device for PCB defect types, electronic equipment and medium
CN114611666A (en) * 2022-03-08 2022-06-10 安谋科技(中国)有限公司 NMS function quantization method, electronic device and medium
CN114611666B (en) * 2022-03-08 2024-05-31 安谋科技(中国)有限公司 Quantification method of NMS function, electronic equipment and medium
CN114782429A (en) * 2022-06-17 2022-07-22 深圳市菲尼基科技有限公司 Image-based lithium battery defect detection method, device, equipment and storage medium
CN114820621A (en) * 2022-06-29 2022-07-29 中冶建筑研究总院(深圳)有限公司 Bolt loss defect detection method, system and device
CN115272249A (en) * 2022-08-01 2022-11-01 腾讯科技(深圳)有限公司 Defect detection method and device, computer equipment and storage medium
CN115272249B (en) * 2022-08-01 2024-07-09 腾讯科技(深圳)有限公司 Defect detection method, device, computer equipment and storage medium
CN115457297A (en) * 2022-08-23 2022-12-09 中国航空油料集团有限公司 Method and device for detecting oil leakage of aviation oil depot and aviation oil safety operation and maintenance system
CN115457297B (en) * 2022-08-23 2023-09-26 中国航空油料集团有限公司 Oil leakage detection method and device for aviation oil depot and aviation oil safety operation and maintenance system
CN117036227A (en) * 2022-09-21 2023-11-10 腾讯科技(深圳)有限公司 Data processing method, device, electronic equipment, medium and program product
CN116703874A (en) * 2023-06-15 2023-09-05 小米汽车科技有限公司 Target detection method, device and storage medium

Also Published As

Publication number Publication date
CN111814867B (en) 2024-06-18

Similar Documents

Publication Publication Date Title
CN111814867A (en) Defect detection model training method, defect detection method and related device
CN111179229B (en) Industrial CT defect detection method based on deep learning
CN109978839B (en) Method for detecting wafer low-texture defects
CN109613002B (en) Glass defect detection method and device and storage medium
CN107507173B (en) No-reference definition evaluation method and system for full-slice image
CN111402226A (en) Surface defect detection method based on cascade convolution neural network
KR20230124713A (en) Fault detection methods, devices and systems
CN111680690B (en) Character recognition method and device
CN114529459B (en) Method, system and medium for enhancing image edge
CN114299066B (en) Defect detection method and device based on salient feature pre-extraction and image segmentation
CN111369523B (en) Method, system, equipment and medium for detecting cell stack in microscopic image
CN113240626A (en) Neural network-based method for detecting and classifying concave-convex flaws of glass cover plate
WO2021238420A1 (en) Image defogging method, terminal, and computer storage medium
CN112800964A (en) Remote sensing image target detection method and system based on multi-module fusion
CN112819748B (en) Training method and device for strip steel surface defect recognition model
CN111986170A (en) Defect detection algorithm based on Mask R-CNN (deep neural network)
CN114627052A (en) Infrared image air leakage and liquid leakage detection method and system based on deep learning
CN112800955A (en) Remote sensing image rotating target detection method and system based on weighted bidirectional feature pyramid
CN111489322A (en) Method and device for adding sky filter to static picture
CN113781468A (en) Tongue image segmentation method based on lightweight convolutional neural network
CN113221731B (en) Multi-scale remote sensing image target detection method and system
CN117809123A (en) Anomaly detection and reconstruction method and system for double-stage image
CN116934761B (en) Self-adaptive detection method for defects of latex gloves
CN111738272A (en) Target feature extraction method and device and electronic equipment
CN110298347B (en) Method for identifying automobile exhaust analyzer screen based on GrayWorld and PCA-CNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant