CN113469977A - Flaw detection device and method based on distillation learning mechanism and storage medium - Google Patents

Flaw detection device and method based on distillation learning mechanism and storage medium Download PDF

Info

Publication number
CN113469977A
Authority
CN
China
Prior art keywords
model
characteristic information
student model
student
teacher
Prior art date
Legal status
Granted
Application number
CN202110765481.5A
Other languages
Chinese (zh)
Other versions
CN113469977B (en)
Inventor
张晓武 (Zhang Xiaowu)
陈斌 (Chen Bin)
Current Assignee
Zhejiang Linyan Precision Technology Co ltd
Original Assignee
Zhejiang Linyan Precision Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Linyan Precision Technology Co ltd filed Critical Zhejiang Linyan Precision Technology Co ltd
Priority to CN202110765481.5A priority Critical patent/CN113469977B/en
Publication of CN113469977A publication Critical patent/CN113469977A/en
Application granted granted Critical
Publication of CN113469977B publication Critical patent/CN113469977B/en
Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0004 Industrial image inspection
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30108 Industrial image inspection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a flaw detection device, a flaw detection method, and a storage medium based on a distillation learning mechanism. A characteristic information grafting module calculates the characteristic information difference value between the teacher model and the student model and uses it to decide whether to graft the fused characteristic information into the student model, improving the student model's ability to imitate the teacher model. The invention builds a teacher model and a student model under a distillation learning mechanism and introduces the characteristic information grafting module into the intermediate layer of the two-branch network, thereby strengthening the expression capability of the student model, accelerating its learning process, and improving its detection precision.

Description

Flaw detection device and method based on distillation learning mechanism and storage medium
Technical Field
The invention belongs to the technical field of anomaly detection, and particularly relates to a flaw detection device and method based on a distillation learning mechanism, and a storage medium.
Background
The flourishing of the national economy has driven rapid growth in manufacturing, and automated production of industrial goods has become the trend. Industrial products inevitably acquire surface defects during production, such as dead spots on printed circuit boards, appearance defects in textiles, and dirt on electronic screens. Detecting and removing unqualified products in time both guarantees product quality and greatly improves production efficiency.
Industrial product flaw detection methods fall into two broad categories: traditional methods and artificial intelligence methods. Traditional methods are of two kinds. The first relies entirely on human visual inspection; its results are unstable, the inspector's subjective judgment weighs heavily, and as product volume grows inspectors suffer visual fatigue, causing many false detections. The second extracts hand-crafted features from product images for classification, a technique derived from traditional image processing, but the shortcomings of hand-crafted features lead to poor generalization in flaw detection. Artificial intelligence methods perform flaw detection with deep learning: a deep neural network model analyzes collected product image data and locates flaws. When the data are large and complex, deepening and widening the network improves the model's capacity for feature expression, so that surface defects are detected accurately and a satisfactory detection effect is obtained.
At present, most deep-learning-based defect detection for industrial products requires a large amount of product image data containing both defective and non-defective samples in order to learn a high-precision model. In practice, however, non-defective image data are easy to obtain, while defective image data appear at random and are difficult to collect, so a supervised learning scheme does not fit this scenario. An industrial flaw detection scheme is therefore urgently needed that is simple to deploy, fits the actual scene, and can learn from flaw-free industrial products alone, improving both the generalization and the accuracy of the model.
Disclosure of Invention
The present invention aims to provide a flaw detection device, method, and storage medium based on a distillation learning mechanism that solve the above problems.
The invention is mainly realized by the following technical scheme:
a flaw detection device based on a distillation learning mechanism comprises an acquisition module, a training module and a detection module; the acquisition module is used for acquiring flaw-free industrial product image samples and forming a training set; the training module is used for training a network model by adopting a training set, the network model comprises a teacher model and a student model with consistent backbone network structures, and the teacher model is used for guiding the student model to finally obtain an optimized network model; the detection module is used for inputting the image to be detected into the optimized network model and outputting a detection result; the network layer complexity of the teacher model is higher than that of the student model, a characteristic information grafting module is added in an intermediate layer between the teacher model and the student model and used for calculating difference values of characteristic information of residual combination blocks of the teacher model and the student model at the same level, whether the expression capacity of the student model characteristic information is similar to that of the teacher model or not is judged, and if the difference is large, the teacher model characteristic information and the student model characteristic information are fused and original student model characteristic information is replaced.
In order to better implement the present invention, further, the characteristic information grafting module comprises a characteristic information difference value calculation part and a grafting characteristic information calculation part. The difference value calculation part calculates the difference value between the characteristic information of same-level residual combination blocks of the teacher model and the student model. When the difference value is greater than a threshold, the grafting characteristic information calculation part is entered: the teacher-model and student-model characteristic information are fused together and then replace the original student-model characteristic information; when the difference value is less than or equal to the threshold, the student-model characteristic information is not replaced.
In order to better realize the invention, further, after the characteristic information is input into the difference value calculation part, it is processed sequentially by a flattening layer and a fully connected layer to reduce the extra computation brought by high dimensionality, and the difference value is then calculated; the calculation formula of the difference value calculation part is as follows:
$$d = \frac{1}{M}\sum_{m=1}^{M}\left(s_m - t_m\right)^2$$

wherein: $d$ is the characteristic information difference value;
$M$ is the dimension of the characteristic information (although the teacher model and the student model use modules of different complexity, the output dimension of the last output layer of each module is the same);
$m$ is the dimension index parameter;
$s$ is the student-model characteristic information;
and $t$ is the teacher-model characteristic information.
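By way of illustration only, the difference value calculation described above might be sketched in PyTorch as follows; the flattening layer and fully connected layer come from the text, while the projection width and the mean-squared reduction to a scalar are assumptions of this sketch, not part of the disclosure.

```python
import torch
import torch.nn as nn

class DifferenceValue(nn.Module):
    """Sketch of the characteristic information difference value calculation part."""

    def __init__(self, in_features: int, proj_dim: int = 128):
        super().__init__()
        self.flatten = nn.Flatten()                  # flattening (leveling) layer
        self.fc = nn.Linear(in_features, proj_dim)   # fully connected layer cuts dimension

    def forward(self, s_feat: torch.Tensor, t_feat: torch.Tensor) -> torch.Tensor:
        # Project student and teacher characteristic information to M dimensions.
        s = self.fc(self.flatten(s_feat))            # (N, M) student
        t = self.fc(self.flatten(t_feat))            # (N, M) teacher
        # Mean squared difference over the M dimensions (assumed reduction).
        return ((s - t) ** 2).mean(dim=1)            # difference value d per sample
```

A difference value above the threshold (0.2 in example 6) would then trigger the grafting computation sketched below.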
In order to better realize the invention, in the grafting characteristic information calculation part, a softmax ("flexible maximum") computation is applied to the teacher-model and student-model characteristic information in the channel direction to obtain a probability value for the characteristic information on each channel; when the probability value is greater than a threshold, that channel's student-model characteristic information is not replaced, otherwise the teacher-model and student-model characteristic information are fused and that channel's student-model characteristic information is replaced.
In order to better implement the invention, further, the characteristic information fusion mapping relation of the grafting characteristic information calculation part is as follows:
$$F = F_s + \alpha F_t$$

wherein: $F$ represents the processed characteristic information,
$F_s$ represents the student-model characteristic information,
$F_t$ represents the teacher-model characteristic information,
and $\alpha$ is a learnable tuning parameter.
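A minimal sketch of the grafting step, assuming the softmax is taken over channel-pooled student features and that a scalar learnable α is broadcast; the probability threshold of 0.3 is taken from example 6, and the helper name is hypothetical.

```python
import torch

def graft(f_s: torch.Tensor, f_t: torch.Tensor,
          alpha: torch.Tensor, prob_threshold: float = 0.3) -> torch.Tensor:
    """Replace low-probability student channels with fused characteristic information.

    f_s, f_t: (N, C, H, W) student / teacher feature maps from same-level
    residual combination blocks; alpha: learnable tuning parameter.
    """
    # Softmax in the channel direction yields a probability value per channel.
    probs = torch.softmax(f_s.mean(dim=(2, 3)), dim=1)          # (N, C)
    keep = (probs > prob_threshold).float()[:, :, None, None]   # 1 keeps the channel
    fused = f_s + alpha * f_t                                   # F = F_s + alpha * F_t
    return keep * f_s + (1.0 - keep) * fused
```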
The characteristic information grafting module mainly comprises the characteristic information difference value calculation part and the grafting characteristic information calculation part. The difference value calculation part computes the difference value between the characteristic information of same-level residual combination blocks of the teacher model and the student model. When the difference value is greater than the threshold, the grafting characteristic information calculation part fuses the teacher-model and student-model characteristic information together and replaces the original student-model characteristic information; if the difference value is smaller than the threshold, the student-model characteristic information is not replaced. The grafting pattern of the characteristic information is finally learned along with iterative model training.
In order to better implement the method, the backbone networks of the teacher model and the student model each comprise, arranged sequentially from front to back, a convolution layer, a batch normalization layer, an activation function layer, and a plurality of residual modules.
In order to better implement the present invention, the teacher model adopts a ResNet101 structure and the student model a ResNet20 structure; the teacher model and the student model each comprise 4 residual combination blocks, the 4 residual combination blocks of the teacher model containing 6, 12, 24, and 6 residual modules in sequence and those of the student model containing 1, 2, 4, and 1 residual modules in sequence.
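For concreteness, the backbone layout could be expressed as below; the residual-module counts come from the text, while the channel widths, the fixed spatial resolution, and the basic residual block design are assumptions of this sketch (the PReLU activation follows example 3).

```python
import torch.nn as nn

TEACHER_BLOCKS = [6, 12, 24, 6]   # ResNet101-style residual combination blocks
STUDENT_BLOCKS = [1, 2, 4, 1]     # ResNet20-style residual combination blocks

class ResidualModule(nn.Module):
    """A basic residual module; the exact block design is not fixed by the text."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.PReLU(),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.PReLU()

    def forward(self, x):
        return self.act(x + self.body(x))

def make_backbone(block_counts, channels: int = 64) -> nn.Sequential:
    # Convolution, batch normalization, and activation layers, front to back,
    # followed by the residual combination blocks.
    stem = nn.Sequential(
        nn.Conv2d(3, channels, 7, stride=2, padding=3, bias=False),
        nn.BatchNorm2d(channels),
        nn.PReLU(),
    )
    stages = [nn.Sequential(*[ResidualModule(channels) for _ in range(n)])
              for n in block_counts]
    return nn.Sequential(stem, *stages)
```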
The invention is mainly realized by the following technical scheme:
a flaw detection method based on a distillation learning mechanism is carried out by adopting the flaw detection device, and comprises the following steps:
step S100: collecting flaw-free and flawed industrial product image samples and forming a training set and a test set, wherein the training set consists of flaw-free industrial product image samples and the test set consists of both flaw-free and flawed industrial product image samples;
step S200: inputting a training set, training a teacher model and a student model simultaneously, adding a characteristic information grafting module between the teacher model and the student model, and finally performing loss calculation on characteristic information output by the teacher model and the student model to obtain a trained network model;
step S300: testing the precision of the trained network model by adopting a test set to obtain an optimized network model;
step S400: and inputting the image to be detected into the optimized network model and outputting a detection result.
In order to better implement the present invention, further, in step S200 the loss calculation uses a Euclidean distance metric loss function and a cosine similarity loss function; the Euclidean distance metric loss function takes the distance similarity between the student-model and teacher-model characteristic information as the loss value, and the cosine similarity loss function is computed between the teacher-model and student-model characteristic information along the characteristic information channel direction; the Euclidean distance metric loss function is calculated as:
$$L_{dis} = \frac{1}{N}\sum_{i=1}^{N}\left\lVert s_i - t_i \right\rVert_2^2$$

wherein: $N$ is the batch size, $i$ is the sample index parameter, $s_i$ is the student-model characteristic information, and $t_i$ is the teacher-model characteristic information;
the formula for calculating the cosine similarity loss function is as follows:
$$L_{c} = 1 - \frac{\sum_{j} \mathrm{Flatten}(s)_j \,\mathrm{Flatten}(t)_j}{\left\lVert \mathrm{Flatten}(s) \right\rVert_2 \left\lVert \mathrm{Flatten}(t) \right\rVert_2}$$

wherein: $j$ is the characteristic information dimension index,
and $\mathrm{Flatten}(\cdot)$ is a feature-vector transformation function used to convert high-dimensional characteristic information into one-dimensional characteristic information;
finally, the overall loss function is calculated as:
$$L = L_{dis} + \lambda\left(L_{c1} + L_{c2} + L_{c3} + L_{c4}\right)$$

wherein: $L_{c1}$, $L_{c2}$, $L_{c3}$, and $L_{c4}$ represent the cosine similarity losses calculated by the first, second, third, and fourth-level residual combination blocks respectively,
and $\lambda$ is an adjusting parameter, set to 0.8.
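Under the interpretation of the formulas given above (L_dis as a batch mean of squared Euclidean distances, the cosine term as one minus cosine similarity on flattened features), the loss calculation might be sketched as follows; this is an illustration, not a verbatim transcription of the patent's formulas.

```python
import torch
import torch.nn.functional as F

def euclidean_distance_loss(s: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """L_dis: mean over the batch of the squared Euclidean feature distance."""
    n = s.shape[0]
    return ((s - t) ** 2).reshape(n, -1).sum(dim=1).mean()

def cosine_similarity_loss(s: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """L_c: one minus cosine similarity on Flatten()-ed characteristic information."""
    s_flat = torch.flatten(s, start_dim=1)   # high-dimensional -> one-dimensional
    t_flat = torch.flatten(t, start_dim=1)
    return (1.0 - F.cosine_similarity(s_flat, t_flat, dim=1)).mean()

def overall_loss(s_out, t_out, stage_pairs, lam: float = 0.8) -> torch.Tensor:
    """L = L_dis + lambda * (L_c1 + L_c2 + L_c3 + L_c4)."""
    l_cos = sum(cosine_similarity_loss(s, t) for s, t in stage_pairs)
    return euclidean_distance_loss(s_out, t_out) + lam * l_cos
```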
The cosine similarity loss function computes loss values between teacher and student characteristic information along the channel direction, and this loss is calculated in parallel at each position where a characteristic information grafting module is placed. The Euclidean distance metric loss function takes the distance similarity between the student-model and teacher-model characteristic information as the loss value. By using the characteristic information grafting module together with the cosine similarity loss in the intermediate layers, the invention directly accelerates the student's feature learning at the feature level and improves precision, while indirectly shortening the feature distance between the two models in space and improving the robustness of the student model's features.
A computer readable storage medium storing computer program instructions which, when executed by a processor, implement the flaw detection method described above.
According to the method, a teacher model and a student model are constructed with consistent structures, the network-layer complexity of the teacher model being higher than that of the student model; a characteristic information grafting module added in the intermediate layer accelerates the learning progress and detection precision of the student model, and loss functions are finally used to calculate loss values for optimizing the model parameters. The grafting module calculates the characteristic information difference value between the teacher model and the student model and uses it to decide whether to graft the fused characteristic information into the student model, improving the student model's ability to imitate the teacher model. The invention builds the teacher and student models under a distillation learning mechanism and introduces the grafting module into the intermediate layer of the two-branch network, thereby strengthening the expression capability of the student model, accelerating its learning process, and improving its detection precision.
The method initializes the teacher model with a pre-trained model while the student model uses random initialization; an optimizer minimizes the loss value and the model parameters are updated iteratively until the number of training iterations equals the total iteration count, yielding the final model, whose precision is then tested on the test set.
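A schematic training loop consistent with this description, reusing overall_loss from the sketch above; the optimizer choice, the learning rate, and the assumption that each model returns its stage features together with its final output are not specified by the text.

```python
import torch

def train_student(teacher, student, train_loader, total_iters: int, device: str = "cuda"):
    teacher.to(device).eval()                       # pre-trained teacher, frozen
    for p in teacher.parameters():
        p.requires_grad_(False)
    student.to(device).train()                      # randomly initialized student
    opt = torch.optim.Adam(student.parameters(), lr=1e-4)  # optimizer is an assumption

    it = 0
    while it < total_iters:
        for images in train_loader:                 # flaw-free samples only
            images = images.to(device)
            with torch.no_grad():
                t_feats, t_out = teacher(images)    # stage features + final output
            s_feats, s_out = student(images)        # grafting runs inside the forward pass
            loss = overall_loss(s_out, t_out, list(zip(s_feats, t_feats)))
            opt.zero_grad()
            loss.backward()
            opt.step()
            it += 1
            if it >= total_iters:
                break
    return student
```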
The characteristic information grafting module mainly comprises a characteristic information difference value calculation part and a grafting characteristic information calculation part. It is introduced because the large amount of characteristic information obtained after convolution-kernel processing covers many kinds of features, such as textures and edges, but also contains many useless features. Common methods remove part of the useless features through fine-tuning and iterative model training, which limits the detection potential of the model to a certain extent. The invention designs the grafting module around the model's purpose: the teacher model guides the student model, and ideally the student model's expression capability on flawless image data approaches the teacher's. If the characteristic information learned during training is left unprocessed and the influence of useless features is ignored, the student model cannot reach the teacher's expression capability. The main purpose of the module is therefore to remove invalid features from the student-model characteristic information and graft in new features to reactivate the network, thereby improving precision.
The module first calculates the difference value between the characteristic information output by same-level residual combination blocks of the student and teacher models and then compares it with a threshold; a difference value greater than the threshold indicates that the characteristic information learned by the student model deviates strongly from the teacher's, so grafting is needed and the grafting characteristic information calculation part is entered. There, a softmax is computed in the channel direction, the resulting probability value measures each feature's contribution to model learning, and a threshold decides which features to graft; if all of the teacher model's features were grafted into the student model, the student's capacity for autonomous learning would be affected and its generalization limited.
The invention has the beneficial effects that:
(1) the characteristic information grafting module performs difference value calculation on the characteristic information of same-level residual combination blocks of the teacher and student models and judges whether the expression capability of the student-model characteristic information approaches the teacher's; if the difference is too large, the teacher-model and student-model characteristic information are fused to replace the original student-model characteristic information, optimizing the expression capability of the student model and accelerating its learning process;
(2) a teacher model with strong learning capability guides the student model on flawless samples, so the student model reconstructs flawless-sample characteristic information better and the gap between reconstructed flawed and flawless characteristic information widens, achieving flaw detection while fitting the actual scene and remaining simple to deploy;
(3) the model is trained and generated with flawless data only, which better matches actual conditions; moreover, the characteristic information grafting module introduced into the intermediate layer of the teacher/student two-branch network strengthens the expression capability of the student model, accelerates its learning process, and improves its detection precision.
Drawings
FIG. 1 is a schematic diagram of the overall network structure of the present invention;
FIG. 2 is a schematic structural diagram of a characteristic information grafting module;
FIG. 3 is a flow chart of characteristic information grafting;
FIG. 4 is a flow chart of model training.
Detailed Description
Example 1:
a flaw detection device based on a distillation learning mechanism comprises an acquisition module, a training module, and a detection module. The acquisition module collects flaw-free industrial product image samples and forms a training set. The training module trains a network model on the training set; the network model comprises a teacher model and a student model with consistent backbone network structures, and the teacher model guides the student model to finally obtain an optimized network model. The detection module inputs the image to be detected into the optimized network model and outputs a detection result.
As shown in fig. 1, the network-layer complexity of the teacher model is higher than that of the student model. A characteristic information grafting module is added in the intermediate layer between the teacher model and the student model; it calculates the difference value between the characteristic information of same-level residual combination blocks of the two models and judges whether the expression capability of the student-model characteristic information approaches the teacher's. If the difference is large, the teacher-model and student-model characteristic information are fused and the original student-model characteristic information is replaced.
The grafting module calculates the characteristic information difference value between the teacher model and the student model and uses it to decide whether to graft the fused characteristic information into the student model, improving the student model's ability to imitate the teacher model. The invention builds the teacher and student models under a distillation learning mechanism and introduces the grafting module into the intermediate layer of the two-branch network, thereby strengthening the expression capability of the student model, accelerating its learning process, and improving its detection precision.
The model is trained and generated with flawless data only, which better matches actual conditions; moreover, the characteristic information grafting module introduced into the intermediate layer of the teacher/student two-branch network strengthens the expression capability of the student model, accelerates its learning process, and improves its detection precision.
Example 2:
this embodiment is optimized on the basis of embodiment 1. As shown in fig. 2, the characteristic information grafting module comprises a characteristic information difference value calculation part and a grafting characteristic information calculation part. The difference value calculation part calculates the difference value between the characteristic information of same-level residual combination blocks of the teacher and student models. As shown in fig. 3, when the difference value is greater than a threshold, the grafting characteristic information calculation part is entered: the teacher-model and student-model characteristic information are fused together and then replace the original student-model characteristic information; when the difference value is less than or equal to the threshold, the student-model characteristic information is not replaced.
Further, as shown in fig. 3, after the characteristic information is input into the difference value calculation part, it is processed sequentially by a flattening layer and a fully connected layer to reduce the extra computation brought by high dimensionality, and the difference value is then calculated; the calculation formula of the difference value calculation part is as follows:
$$d = \frac{1}{M}\sum_{m=1}^{M}\left(s_m - t_m\right)^2$$

wherein: $d$ is the characteristic information difference value;
$M$ is the dimension of the characteristic information (although the teacher model and the student model use modules of different complexity, the output dimension of the last output layer of each module is the same);
$m$ is the dimension index parameter;
$s$ is the student-model characteristic information;
and $t$ is the teacher-model characteristic information.
Further, as shown in fig. 3, in the grafting characteristic information calculation part, a softmax ("flexible maximum") computation is applied to the teacher-model and student-model characteristic information in the channel direction to obtain a probability value for the characteristic information on each channel; when the probability value is greater than a threshold, that channel's student-model characteristic information is not replaced, otherwise the teacher-model and student-model characteristic information are fused and that channel's student-model characteristic information is replaced.
Further, the feature information fusion mapping relation of the grafting feature information calculation part is as follows:
$$F = F_s + \alpha F_t$$

wherein: $F$ represents the processed characteristic information,
$F_s$ represents the student-model characteristic information,
$F_t$ represents the teacher-model characteristic information,
and $\alpha$ is a learnable tuning parameter.
The invention uses a teacher model with strong learning capability to guide the student model on flawless samples, so the student model reconstructs flawless-sample characteristic information better and the gap between reconstructed flawed and flawless characteristic information widens, achieving flaw detection while fitting the actual scene and remaining simple to deploy.
Other parts of this embodiment are the same as embodiment 1, and thus are not described again.
Example 3:
this embodiment is optimized on the basis of embodiment 1 or 2. As shown in fig. 1, the backbone networks of the teacher and student models each comprise, arranged sequentially from front to back, a convolution layer, a batch normalization layer, an activation function layer, and a plurality of residual modules. The activation function layer adopts a parametric rectified linear unit (PReLU) layer.
Further, the teacher model and the student model each comprise 4 residual combination blocks; those of the teacher model contain 6, 12, 24, and 6 residual modules in sequence, and those of the student model contain 1, 2, 4, and 1 residual modules in sequence.
The rest of this embodiment is the same as embodiment 1 or 2, and therefore, the description thereof is omitted.
Example 4:
a flaw detection method based on a distillation learning mechanism is carried out by adopting the flaw detection device, and comprises the following steps:
step S100: collecting flaw-free and flawed industrial product image samples and forming a training set and a test set, wherein the training set consists of flaw-free industrial product image samples and the test set consists of both flaw-free and flawed industrial product image samples;
step S200: inputting a training set, training a teacher model and a student model simultaneously, adding a characteristic information grafting module between the teacher model and the student model, and finally performing loss calculation on characteristic information output by the teacher model and the student model to obtain a trained network model;
step S300: testing the precision of the trained network model by adopting a test set to obtain an optimized network model;
step S400: and inputting the image to be detected into the optimized network model and outputting a detection result.
Further, in step S200 the loss calculation uses a Euclidean distance metric loss function and a cosine similarity loss function; the Euclidean distance metric loss function takes the distance similarity between the student-model and teacher-model characteristic information as the loss value, and the cosine similarity loss function is computed between the teacher-model and student-model characteristic information along the characteristic information channel direction.
The calculation formula of the Euclidean distance measurement loss function is as follows:
$$L_{dis} = \frac{1}{N}\sum_{i=1}^{N}\left\lVert s_i - t_i \right\rVert_2^2$$

wherein: $N$ is the batch size, $i$ is the sample index parameter, $s_i$ is the student-model characteristic information, and $t_i$ is the teacher-model characteristic information;
the formula for calculating the cosine similarity loss function is as follows:
$$L_{c} = 1 - \frac{\sum_{j} \mathrm{Flatten}(s)_j \,\mathrm{Flatten}(t)_j}{\left\lVert \mathrm{Flatten}(s) \right\rVert_2 \left\lVert \mathrm{Flatten}(t) \right\rVert_2}$$

wherein: $j$ is the characteristic information dimension index,
and $\mathrm{Flatten}(\cdot)$ is a feature-vector transformation function used to convert high-dimensional characteristic information into one-dimensional characteristic information;
finally, the overall loss function is calculated as:
$$L = L_{dis} + \lambda\left(L_{c1} + L_{c2} + L_{c3} + L_{c4}\right)$$

wherein: $L_{c1}$, $L_{c2}$, $L_{c3}$, and $L_{c4}$ represent the cosine similarity losses calculated by the first, second, third, and fourth-level residual combination blocks respectively,
and $\lambda$ is an adjusting parameter, set to 0.8.
The invention uses a teacher model with strong learning capability to guide the student model on flawless samples, so the student model reconstructs flawless-sample characteristic information better and the gap between reconstructed flawed and flawless characteristic information widens, achieving flaw detection while fitting the actual scene and remaining simple to deploy.
Example 5:
a flaw detection method based on a distillation learning mechanism comprises the following steps:
collecting image samples of non-defective and defective industrial products, taking most of the non-defective samples as the training set, and taking the remaining non-defective samples and all defective samples as the test set;
constructing a teacher model and a student model according to the design, the two structures being consistent but the network-layer complexity of the teacher model greater than that of the student model; accelerating the learning progress and detection precision of the student model by adding a characteristic information grafting module in the intermediate layer; and finally performing loss calculation on the characteristic information output by the teacher model and the student model;
calculating the loss value between the pieces of characteristic information using the Euclidean distance metric loss function and the cosine similarity loss function;
as shown in fig. 4, the teacher model is initialized with a pre-trained model and the student model with random initialization; an optimizer is selected to minimize the loss value, and the model parameters are updated iteratively until the number of training iterations equals the total iteration count, yielding the final model, whose precision is finally tested on the test set.
Further, as shown in fig. 1, the teacher and student models adopt a common backbone network structure: both keep the same arrangement of convolution layers, batch normalization layers, activation function layers, and residual modules stacked according to a fixed combination rule, but the complexity of the network layers differs. In this embodiment the teacher model adopts a ResNet101 structure and the student model a ResNet20 structure, each with 4 residual combination blocks; the residual combination blocks of the teacher model contain 6, 12, 24, and 6 residual modules in sequence, and those of the student model contain 1, 2, 4, and 1 in sequence.
Further, as shown in fig. 2, the characteristic information grafting module mainly consists of a characteristic information difference value calculation part and a grafting characteristic information calculation part. The difference value calculation part calculates the difference value between the characteristic information of same-level residual combination blocks of the teacher and student models. As shown in fig. 3, when the difference value is greater than the threshold, the grafting characteristic information calculation part is entered: the teacher-model and student-model characteristic information are fused together and then replace the original student-model characteristic information; if the difference value is smaller than the threshold, the student-model characteristic information is not replaced. The grafting pattern is finally learned along with iterative model training.
Furthermore, the cosine similarity loss function is used for calculating loss values between teacher and student characteristic information in the channel direction, and this loss calculation is inserted in parallel at each position where a characteristic information grafting module is placed.
Further, the Euclidean distance measurement loss function is used for calculating the distance similarity between the feature information in the student model and the feature information in the teacher model to serve as a loss value.
The model is trained and generated with flawless data only, which better matches actual conditions; moreover, the characteristic information grafting module introduced into the intermediate layer of the teacher/student two-branch network strengthens the expression capability of the student model, accelerates its learning process, and improves its detection precision.
The invention uses a teacher model with strong learning capability to guide the student model on flawless samples, so the student model reconstructs flawless-sample characteristic information better and the gap between reconstructed flawed and flawless characteristic information widens, achieving flaw detection while fitting the actual scene and remaining simple to deploy.
Example 6:
a flaw detection method based on a distillation learning mechanism comprises the following steps:
collecting image samples of non-defective and defective industrial products, taking most of the non-defective samples as the training set, and taking the remaining non-defective samples and all defective samples as the test set;
building a teacher model and a student model according to the design; as shown in fig. 1, the overall network model is a two-branch structure, the structures of the teacher and student models are consistent, but the network-layer complexity of the teacher model is higher than that of the student model, and the learning progress and detection precision of the student model are accelerated by adding a characteristic information grafting module in the intermediate layer;
and finally, performing loss calculation on the characteristic information output by the teacher model and the characteristic information output by the student model.
Further, as shown in fig. 1, the teacher and student models adopt a common backbone network structure: both keep the same arrangement of convolution layers, batch normalization layers, activation function layers, and residual modules stacked according to a fixed combination rule, but the complexity of the network layers differs. In this embodiment the teacher model adopts a ResNet101 structure and the student model a ResNet20 structure, each with 4 residual combination blocks. The residual combination blocks of the teacher model contain 6, 12, 24, and 6 residual modules in sequence and are each called a complex residual combination block; those of the student model contain 1, 2, 4, and 1 in sequence and are each called a simple residual combination block.
Further, as shown in fig. 2, the characteristic information grafting module mainly includes a characteristic information difference value calculation part and a grafting characteristic information calculation part; the difference value calculation part calculates the difference value between the characteristic information of same-level residual combination blocks of the teacher and student models, with the calculation formula:
$$d = \frac{1}{M}\sum_{m=1}^{M}\left(s_m - t_m\right)^2$$

In the formula, $d$ is the characteristic information difference value and $M$ is the dimension of the characteristic information; although the teacher and student models use modules of different complexity, the output dimension of the last output layer of each module is the same. $m$ is the dimension index parameter, $s$ is the student-model characteristic information, and $t$ is the teacher-model characteristic information.
Further, as shown in fig. 3, after the characteristic information is input to the difference value calculation part, it must be processed by a flattening layer and a fully connected layer to reduce the extra computation brought by high dimensionality.
Further, when the difference value is greater than a threshold, the grafting characteristic information calculation part is entered; if the difference value is smaller than the threshold, the student-model characteristic information is not replaced. The threshold is set to 0.2.
Further, replacement of characteristic information is performed along the channel direction. As shown in fig. 2, the characteristic information is a feature block in which each channel represents a feature with a different meaning. As shown in fig. 3, a softmax is computed on the characteristic information in the channel direction to obtain a probability value on each channel; when the probability value is greater than a threshold, no processing is performed, otherwise the teacher-model and student-model characteristic information are fused. This threshold is set to 0.3, and the combined characteristic information is finally output. The characteristic information fusion mapping relation in the grafting characteristic information calculation part is:
$$F = F_s + \alpha F_t$$

In the formula, $F$ represents the processed characteristic information, $F_s$ the student-model characteristic information, and $F_t$ the teacher-model characteristic information; $\alpha$ is a learnable tuning parameter.
Further, the loss functions used in the present invention are a Euclidean distance metric loss function and a cosine similarity loss function; the Euclidean distance metric loss function is calculated as:
$$L_{dis} = \frac{1}{N}\sum_{i=1}^{N}\left\lVert s_i - t_i \right\rVert_2^2$$

In the formula, $N$ is the batch size, $i$ is the sample index parameter, $s_i$ is the student-model characteristic information, and $t_i$ is the teacher-model characteristic information.
Further, the cosine similarity loss function calculation formula is as follows:
$$L_{c} = 1 - \frac{\sum_{j} \mathrm{Flatten}(s)_j \,\mathrm{Flatten}(t)_j}{\left\lVert \mathrm{Flatten}(s) \right\rVert_2 \left\lVert \mathrm{Flatten}(t) \right\rVert_2}$$

In the formula, $j$ is the characteristic information dimension index and $\mathrm{Flatten}(\cdot)$ is a feature-vector transformation function that converts high-dimensional characteristic information into one-dimensional characteristic information.
Finally, the overall loss function is calculated as:
$$L = L_{dis} + \lambda\left(L_{c1} + L_{c2} + L_{c3} + L_{c4}\right)$$

In the formula, $L_{c1}$ represents the cosine similarity loss calculated by the first-level residual combination block, and the other terms are analogous; as shown in fig. 1, a cosine similarity loss value is calculated at each level of residual combination block. $\lambda$ is an adjusting parameter, set to 0.8.
Further, as shown in fig. 4, the teacher model is initialized with a pre-trained model and the student model with random initialization; an optimizer is selected to minimize the loss value, and the model parameters are updated iteratively until the number of training iterations equals the total iteration count, yielding the final model, whose precision is finally tested on the test set.
During training, each image is input into the student model and the teacher model simultaneously and the difference value of the characteristic information output by the two models is calculated. Because the teacher model carries pre-trained parameters, it can distinguish flawless product images from flawed ones, whereas the student model has learned flawless product images only. For a flawed product image, the teacher model therefore reconstructs the characteristic information well while the student model does not; the difference value of the reconstructed characteristic information then indicates whether the product image sample contains a flaw: a large difference value means a flaw is present, otherwise the sample is normal. If the position of the flaw must be located, the model is propagated backwards and the position is found through the gradient change.
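At test time the same reconstruction difference serves as an anomaly score; a sketch, with the decision threshold left as a free parameter (locating the flaw via back-propagated gradients would additionally require gradients to be enabled, which this scoring-only helper disables):

```python
import torch

@torch.no_grad()
def detect_flaw(teacher, student, image: torch.Tensor, threshold: float):
    """Return (is_flawed, score) from the reconstruction difference of the two models."""
    _, t_out = teacher(image)
    _, s_out = student(image)
    # Difference value of the reconstructed characteristic information.
    score = ((s_out - t_out) ** 2).mean().item()
    return score > threshold, score
```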
In conclusion, the flaw detection method provided by the invention only needs flaw-free product image samples for learning, which reduces the difficulty of sample collection, fits the actual scene better, is easy to operate in building the network structure, and is highly feasible to implement.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and all simple modifications and equivalent variations of the above embodiments according to the technical spirit of the present invention are included in the scope of the present invention.

Claims (10)

1. A flaw detection device based on a distillation learning mechanism, characterized by comprising an acquisition module, a training module, and a detection module; the acquisition module is used for collecting flaw-free industrial product image samples and forming a training set; the training module is used for training a network model on the training set, the network model comprising a teacher model and a student model with consistent backbone network structures, the teacher model guiding the student model to finally obtain an optimized network model; and the detection module is used for inputting the image to be detected into the optimized network model and outputting a detection result;
wherein the network-layer complexity of the teacher model is higher than that of the student model, and a characteristic information grafting module is added in the intermediate layer between the teacher model and the student model for calculating the difference value between the characteristic information of same-level residual combination blocks of the teacher model and the student model and judging whether the expression capability of the student-model characteristic information approaches that of the teacher model; if the difference is large, the teacher-model and student-model characteristic information are fused and the original student-model characteristic information is replaced.
2. The flaw detection device based on the distillation learning mechanism as claimed in claim 1, characterized in that the characteristic information grafting module comprises a characteristic information difference value calculation part and a grafting characteristic information calculation part; the difference value calculation part is used for calculating the difference value between the characteristic information of same-level residual combination blocks of the teacher model and the student model; when the difference value is greater than a threshold, the grafting characteristic information calculation part is entered, the teacher-model and student-model characteristic information are fused together and then replace the original student-model characteristic information, and when the difference value is less than or equal to the threshold, the student-model characteristic information is not replaced.
3. The flaw detection device based on the distillation learning mechanism as claimed in claim 2, characterized in that after the characteristic information is input into the difference value calculation part, it is processed sequentially by a flattening layer and a fully connected layer to reduce the extra computation brought by high dimensionality, and the difference value is then calculated; the calculation formula of the difference value calculation part is as follows:
$$d = \frac{1}{M}\sum_{m=1}^{M}\left(s_m - t_m\right)^2$$

wherein: $d$ is the characteristic information difference value;
$M$ is the dimension of the characteristic information (although the teacher model and the student model use modules of different complexity, the output dimension of the last output layer of each module is the same);
$m$ is the dimension index parameter;
$s$ is the student-model characteristic information;
and $t$ is the teacher-model characteristic information.
4. The flaw detection device based on the distillation learning mechanism as claimed in claim 2, characterized in that in the grafting characteristic information calculation part, a softmax ("flexible maximum") computation is applied to the teacher-model and student-model characteristic information in the channel direction to obtain a probability value for the characteristic information on each channel; when the probability value is greater than a threshold, that channel's student-model characteristic information is not replaced, otherwise the teacher-model and student-model characteristic information are fused and that channel's student-model characteristic information is replaced.
5. The flaw detection device based on the distillation learning mechanism, characterized in that the characteristic information fusion mapping relation of the grafting characteristic information calculation part is as follows:
$$F = F_s + \alpha F_t$$

wherein: $F$ represents the processed characteristic information,
$F_s$ represents the student-model characteristic information,
$F_t$ represents the teacher-model characteristic information,
and $\alpha$ is a learnable tuning parameter.
6. The flaw detection device based on the distillation learning mechanism, characterized in that the backbone networks of the teacher model and the student model each comprise, arranged sequentially from front to back, a convolution layer, a batch normalization layer, an activation function layer, and a plurality of residual modules.
7. The flaw detection device based on the distillation learning mechanism as claimed in claim 6, characterized in that the teacher model adopts a ResNet101 structure and the student model a ResNet20 structure; the teacher model and the student model each comprise 4 residual combination blocks, the 4 residual combination blocks of the teacher model containing 6, 12, 24, and 6 residual modules in sequence and those of the student model containing 1, 2, 4, and 1 residual modules in sequence.
8. A flaw detection method based on a distillation learning mechanism, performed by using the flaw detection device according to any one of claims 1 to 7 and comprising the following steps:
step S100: collecting flaw-free and flawed industrial product image samples and forming a training set and a test set, wherein the training set consists of flaw-free industrial product image samples and the test set consists of both flaw-free and flawed industrial product image samples;
step S200: inputting a training set, training a teacher model and a student model simultaneously, adding a characteristic information grafting module between the teacher model and the student model, and finally performing loss calculation on characteristic information output by the teacher model and the student model to obtain a trained network model;
step S300: testing the precision of the trained network model by adopting a test set to obtain an optimized network model;
step S400: and inputting the image to be detected into the optimized network model and outputting a detection result.
9. The flaw detection method based on the distillation learning mechanism according to claim 8, characterized in that in step S200 the loss calculation uses a Euclidean distance metric loss function and a cosine similarity loss function; the Euclidean distance metric loss function takes the distance similarity between the student-model and teacher-model characteristic information as the loss value, and the cosine similarity loss function is computed between the teacher-model and student-model characteristic information along the characteristic information channel direction; the Euclidean distance metric loss function is calculated as:
L_e = (1/N) Σ_{i=1}^{N} ||s_i − t_i||²
wherein: N is the batch size, i is the sample index, s_i is the student model characteristic information, and t_i is the teacher model characteristic information;
the formula for calculating the cosine similarity loss function is as follows:
L_c = 1 − (Σ_j Flatten(s)_j · Flatten(t)_j) / (||Flatten(s)|| · ||Flatten(t)||)
wherein: j is the dimension index of the characteristic information, and Flatten() is a feature vector transformation function that converts high-dimensional characteristic information into one-dimensional characteristic information;
finally, the overall loss function is calculated as:
L = L_e + λ·(L_c1 + L_c2 + L_c3 + L_c4)
wherein: L_c1, L_c2, L_c3 and L_c4 are the cosine similarity loss functions calculated at the first, second, third and fourth residual combination blocks respectively, and λ is a tuning parameter set to 0.8.
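The claim-9 losses map directly onto PyTorch primitives. In the sketch below, the batch-mean reductions, the squared distance in the Euclidean term, and the stage_feats argument (a list of four student/teacher feature pairs, one per residual combination block) are assumptions recovered from the claim wording.

```python
import torch
import torch.nn.functional as F

def euclidean_loss(s: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    # Mean over the batch of the squared L2 distance between feature maps
    n = s.shape[0]
    return ((s - t) ** 2).reshape(n, -1).sum(dim=1).mean()

def cosine_loss(s: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    # Flatten() of claim 9: high-dimensional features to one vector per sample
    s_flat = torch.flatten(s, start_dim=1)
    t_flat = torch.flatten(t, start_dim=1)
    return (1.0 - F.cosine_similarity(s_flat, t_flat, dim=1)).mean()

def total_loss(out_s, out_t, stage_feats, lam: float = 0.8):
    # L = L_e + lambda * (L_c1 + L_c2 + L_c3 + L_c4), with lambda = 0.8 per claim 9
    l_e = euclidean_loss(out_s, out_t)
    l_c = sum(cosine_loss(fs, ft) for fs, ft in stage_feats)
    return l_e + lam * l_c
```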
10. A computer readable storage medium storing computer program instructions which, when executed by a processor, implement the flaw detection method of claim 8 or 9.
CN202110765481.5A 2021-07-06 2021-07-06 Flaw detection device, method and storage medium based on distillation learning mechanism Active CN113469977B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110765481.5A CN113469977B (en) 2021-07-06 2021-07-06 Flaw detection device, method and storage medium based on distillation learning mechanism

Publications (2)

Publication Number Publication Date
CN113469977A true CN113469977A (en) 2021-10-01
CN113469977B CN113469977B (en) 2024-01-12

Family

ID=77878753

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110765481.5A Active CN113469977B (en) 2021-07-06 2021-07-06 Flaw detection device, method and storage medium based on distillation learning mechanism

Country Status (1)

Country Link
CN (1) CN113469977B (en)

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019228122A1 (en) * 2018-05-29 2019-12-05 腾讯科技(深圳)有限公司 Training method for model, storage medium and computer device
CN108830813A (en) * 2018-06-12 2018-11-16 福建帝视信息科技有限公司 A kind of image super-resolution Enhancement Method of knowledge based distillation
CN109711544A (en) * 2018-12-04 2019-05-03 北京市商汤科技开发有限公司 Method, apparatus, electronic equipment and the computer storage medium of model compression
WO2020228446A1 (en) * 2019-05-13 2020-11-19 腾讯科技(深圳)有限公司 Model training method and apparatus, and terminal and storage medium
WO2020253127A1 (en) * 2019-06-21 2020-12-24 深圳壹账通智能科技有限公司 Facial feature extraction model training method and apparatus, facial feature extraction method and apparatus, device, and storage medium
WO2021033791A1 (en) * 2019-08-19 2021-02-25 엘지전자 주식회사 Ai-based new learning model generation system for vision inspection on product production line
CN112487182A (en) * 2019-09-12 2021-03-12 华为技术有限公司 Training method of text processing model, and text processing method and device
WO2021056043A1 (en) * 2019-09-23 2021-04-01 Presagen Pty Ltd Decentralised artificial intelligence (ai)/machine learning training system
CN110674880A (en) * 2019-09-27 2020-01-10 北京迈格威科技有限公司 Network training method, device, medium and electronic equipment for knowledge distillation
CN110852426A (en) * 2019-11-19 2020-02-28 成都晓多科技有限公司 Pre-training model integration acceleration method and device based on knowledge distillation
CN111160409A (en) * 2019-12-11 2020-05-15 浙江大学 Heterogeneous neural network knowledge reorganization method based on common feature learning
CN111062951A (en) * 2019-12-11 2020-04-24 华中科技大学 Knowledge distillation method based on semantic segmentation intra-class feature difference
CN111126599A (en) * 2019-12-20 2020-05-08 复旦大学 Neural network weight initialization method based on transfer learning
SE1930421A1 (en) * 2019-12-30 2021-07-01 Unibap Ab Method and means for detection of imperfections in products
CN112257453A (en) * 2020-09-23 2021-01-22 昆明理工大学 Chinese-Yue text similarity calculation method fusing keywords and semantic features

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GE SHIMING; ZHAO SHENGWEI; ***; LI CHENYU: "Face Recognition Based on Deep Feature Distillation", Journal of Beijing Jiaotong University, no. 06 *
LEI JIE; GAO XIN; SONG JIE; WANG XINGLU; SONG MINGLI: "A Survey of Deep Network Model Compression", Journal of Software, no. 02 *
GAO XUAN; RAO PENG; LIU GAORUI: "Real-Time Human Action Recognition Based on Feature Distillation", Industrial Control Computer, no. 08 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114693995A (en) * 2022-04-14 2022-07-01 北京百度网讯科技有限公司 Model training method applied to image processing, image processing method and device

Also Published As

Publication number Publication date
CN113469977B (en) 2024-01-12

Similar Documents

Publication Publication Date Title
CN111598881B (en) Image anomaly detection method based on variational self-encoder
Zhao et al. A visual long-short-term memory based integrated CNN model for fabric defect image classification
CN108492286B (en) Medical image segmentation method based on dual-channel U-shaped convolutional neural network
CN105975573B (en) A kind of file classification method based on KNN
CN109389171B (en) Medical image classification method based on multi-granularity convolution noise reduction automatic encoder technology
CN111680706A (en) Double-channel output contour detection method based on coding and decoding structure
CN112102229A (en) Intelligent industrial CT detection defect identification method based on deep learning
CN112185423B (en) Voice emotion recognition method based on multi-head attention mechanism
KR20220050083A (en) AI-based new learning model creation system for vision inspection on product production lines
CN116843650A (en) SMT welding defect detection method and system integrating AOI detection and deep learning
CN115587543A (en) Federal learning and LSTM-based tool residual life prediction method and system
CN115935277A (en) Supercharged boiler fault diagnosis method based on small sample learning and training method and testing method of fault diagnosis model
CN113469977B (en) Flaw detection device, method and storage medium based on distillation learning mechanism
CN115358337A (en) Small sample fault diagnosis method and device and storage medium
CN111783616A (en) Data-driven self-learning-based nondestructive testing method
CN116109621B (en) Defect detection method and system based on depth template
CN116721071A (en) Industrial product surface defect detection method and device based on weak supervision
CN116152194A (en) Object defect detection method, system, equipment and medium
CN113889274B (en) Method and device for constructing risk prediction model of autism spectrum disorder
CN113642662B (en) Classification detection method and device based on lightweight classification model
CN115661498A (en) Self-optimization single cell clustering method
Lim et al. Analyzing deep neural networks with noisy labels
CN109409424B (en) Appearance defect detection model modeling method and device
Li et al. Industrial anomaly detection via teacher student network
CN111626409B (en) Data generation method for image quality detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant