CN113469977A - Flaw detection device and method based on distillation learning mechanism and storage medium - Google Patents

Flaw detection device and method based on distillation learning mechanism and storage medium Download PDF

Info

Publication number
CN113469977A
Authority
CN
China
Prior art keywords
model
characteristic information
student model
student
teacher
Prior art date
Legal status
Granted
Application number
CN202110765481.5A
Other languages
Chinese (zh)
Other versions
CN113469977B (en)
Inventor
张晓武 (Zhang Xiaowu)
陈斌 (Chen Bin)
Current Assignee
Zhejiang Linyan Precision Technology Co ltd
Original Assignee
Zhejiang Linyan Precision Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Linyan Precision Technology Co ltd filed Critical Zhejiang Linyan Precision Technology Co ltd
Priority to CN202110765481.5A priority Critical patent/CN113469977B/en
Publication of CN113469977A publication Critical patent/CN113469977A/en
Application granted granted Critical
Publication of CN113469977B publication Critical patent/CN113469977B/en
Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0004 Industrial image inspection
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30108 Industrial image inspection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a flaw detection device, a flaw detection method, and a storage medium based on a distillation learning mechanism. A characteristic information grafting module calculates the characteristic information difference value between the teacher model and the student model and uses it to decide whether to graft the fused characteristic information into the student model, improving the student model's ability to imitate the teacher model. The invention builds a teacher model and a student model under a distillation learning mechanism and introduces the characteristic information grafting module into the intermediate layer of the two-branch network, thereby strengthening the expression capability of the student model, accelerating its learning process, and improving its detection precision.

Description

Flaw detection device and method based on distillation learning mechanism and storage medium
Technical Field
The invention belongs to the technical field of anomaly detection, and particularly relates to a flaw detection device and method based on a distillation learning mechanism, and a storage medium.
Background
The flourishing of the national economy has driven rapid growth in manufacturing, and automated production of industrial goods has become the trend. Industrial products inevitably acquire surface defects during production, such as dead spots on printed circuit boards, appearance defects in textiles, and dirt on electronic screens. Detecting and removing unqualified products in time both guarantees product quality and greatly improves production efficiency.
Industrial product flaw detection methods fall into two broad categories: traditional methods and artificial intelligence methods. Traditional methods are of two kinds. The first relies entirely on human visual inspection; its results are unstable, the inspector's subjective judgment weighs heavily, and as product volume grows inspectors suffer visual fatigue, causing many false detections. The second extracts hand-crafted features from product images for classification, a technique derived from traditional image processing, but the shortcomings of hand-crafted features lead to poor generalization in flaw detection. Artificial intelligence methods perform flaw detection with deep learning: a deep neural network model analyzes collected product image data and locates flaws. When the data are large and complex, deepening and widening the network improves the model's capacity for feature expression, so that surface defects are detected accurately and a satisfactory detection effect is obtained.
At present, most deep-learning-based defect detection for industrial products requires a large amount of product image data containing both defective and non-defective samples in order to learn a high-precision model. In practice, however, non-defective image data are easy to obtain, while defective image data appear at random and are difficult to collect, so a supervised learning scheme does not fit this scenario. An industrial flaw detection scheme is therefore urgently needed that is simple to deploy, fits the actual scene, and can learn from flaw-free industrial products alone, improving both the generalization and the accuracy of the model.
Disclosure of Invention
The present invention aims to provide a flaw detection device, method, and storage medium based on a distillation learning mechanism that solve the above problems.
The invention is mainly realized by the following technical scheme:
a flaw detection device based on a distillation learning mechanism comprises an acquisition module, a training module and a detection module; the acquisition module is used for acquiring flaw-free industrial product image samples and forming a training set; the training module is used for training a network model by adopting a training set, the network model comprises a teacher model and a student model with consistent backbone network structures, and the teacher model is used for guiding the student model to finally obtain an optimized network model; the detection module is used for inputting the image to be detected into the optimized network model and outputting a detection result; the network layer complexity of the teacher model is higher than that of the student model, a characteristic information grafting module is added in an intermediate layer between the teacher model and the student model and used for calculating difference values of characteristic information of residual combination blocks of the teacher model and the student model at the same level, whether the expression capacity of the student model characteristic information is similar to that of the teacher model or not is judged, and if the difference is large, the teacher model characteristic information and the student model characteristic information are fused and original student model characteristic information is replaced.
In order to better implement the present invention, further, the characteristic information grafting module comprises a characteristic information difference value calculation part and a grafting characteristic information calculation part. The difference value calculation part calculates the difference value between the characteristic information of same-level residual combination blocks of the teacher model and the student model. When the difference value is greater than a threshold, the grafting characteristic information calculation part is entered: the teacher-model and student-model characteristic information are fused together and then replace the original student-model characteristic information; when the difference value is less than or equal to the threshold, the student-model characteristic information is not replaced.
In order to better realize the invention, further, after the characteristic information is input into the difference value calculation part, it is processed sequentially by a flattening layer and a fully connected layer to reduce the extra computation brought by high dimensionality, and the difference value is then calculated; the calculation formula of the difference value calculation part is as follows:
$$d = \frac{1}{M}\sum_{m=1}^{M}\left(s_m - t_m\right)^2$$

wherein: $d$ is the characteristic information difference value;
$M$ is the dimension of the characteristic information (although the teacher model and the student model use modules of different complexity, the output dimension of the last output layer of each module is the same);
$m$ is the dimension index parameter;
$s$ is the student-model characteristic information;
and $t$ is the teacher-model characteristic information.
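By way of illustration only, the difference value calculation described above might be sketched in PyTorch as follows; the flattening layer and fully connected layer come from the text, while the projection width and the mean-squared reduction to a scalar are assumptions of this sketch, not part of the disclosure.

```python
import torch
import torch.nn as nn

class DifferenceValue(nn.Module):
    """Sketch of the characteristic information difference value calculation part."""

    def __init__(self, in_features: int, proj_dim: int = 128):
        super().__init__()
        self.flatten = nn.Flatten()                  # flattening (leveling) layer
        self.fc = nn.Linear(in_features, proj_dim)   # fully connected layer cuts dimension

    def forward(self, s_feat: torch.Tensor, t_feat: torch.Tensor) -> torch.Tensor:
        # Project student and teacher characteristic information to M dimensions.
        s = self.fc(self.flatten(s_feat))            # (N, M) student
        t = self.fc(self.flatten(t_feat))            # (N, M) teacher
        # Mean squared difference over the M dimensions (assumed reduction).
        return ((s - t) ** 2).mean(dim=1)            # difference value d per sample
```

A difference value above the threshold (0.2 in example 6) would then trigger the grafting computation sketched below.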
In order to better realize the invention, in the grafting characteristic information calculation part, a softmax ("flexible maximum") computation is applied to the teacher-model and student-model characteristic information in the channel direction to obtain a probability value for the characteristic information on each channel; when the probability value is greater than a threshold, that channel's student-model characteristic information is not replaced, otherwise the teacher-model and student-model characteristic information are fused and that channel's student-model characteristic information is replaced.
In order to better implement the invention, further, the characteristic information fusion mapping relation of the grafting characteristic information calculation part is as follows:
$$F = F_s + \alpha F_t$$

wherein: $F$ represents the processed characteristic information,
$F_s$ represents the student-model characteristic information,
$F_t$ represents the teacher-model characteristic information,
and $\alpha$ is a learnable tuning parameter.
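A minimal sketch of the grafting step, assuming the softmax is taken over channel-pooled student features and that a scalar learnable α is broadcast; the probability threshold of 0.3 is taken from example 6, and the helper name is hypothetical.

```python
import torch

def graft(f_s: torch.Tensor, f_t: torch.Tensor,
          alpha: torch.Tensor, prob_threshold: float = 0.3) -> torch.Tensor:
    """Replace low-probability student channels with fused characteristic information.

    f_s, f_t: (N, C, H, W) student / teacher feature maps from same-level
    residual combination blocks; alpha: learnable tuning parameter.
    """
    # Softmax in the channel direction yields a probability value per channel.
    probs = torch.softmax(f_s.mean(dim=(2, 3)), dim=1)          # (N, C)
    keep = (probs > prob_threshold).float()[:, :, None, None]   # 1 keeps the channel
    fused = f_s + alpha * f_t                                   # F = F_s + alpha * F_t
    return keep * f_s + (1.0 - keep) * fused
```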
The characteristic information grafting module mainly comprises the characteristic information difference value calculation part and the grafting characteristic information calculation part. The difference value calculation part computes the difference value between the characteristic information of same-level residual combination blocks of the teacher model and the student model. When the difference value is greater than the threshold, the grafting characteristic information calculation part fuses the teacher-model and student-model characteristic information together and replaces the original student-model characteristic information; if the difference value is smaller than the threshold, the student-model characteristic information is not replaced. The grafting pattern of the characteristic information is finally learned along with iterative model training.
In order to better implement the method, the backbone networks of the teacher model and the student model each comprise, arranged sequentially from front to back, a convolution layer, a batch normalization layer, an activation function layer, and a plurality of residual modules.
In order to better implement the present invention, the teacher model adopts a ResNet101 structure and the student model a ResNet20 structure; the teacher model and the student model each comprise 4 residual combination blocks, the 4 residual combination blocks of the teacher model containing 6, 12, 24, and 6 residual modules in sequence and those of the student model containing 1, 2, 4, and 1 residual modules in sequence.
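For concreteness, the backbone layout could be expressed as below; the residual-module counts come from the text, while the channel widths, the fixed spatial resolution, and the basic residual block design are assumptions of this sketch (the PReLU activation follows example 3).

```python
import torch.nn as nn

TEACHER_BLOCKS = [6, 12, 24, 6]   # ResNet101-style residual combination blocks
STUDENT_BLOCKS = [1, 2, 4, 1]     # ResNet20-style residual combination blocks

class ResidualModule(nn.Module):
    """A basic residual module; the exact block design is not fixed by the text."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.PReLU(),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.PReLU()

    def forward(self, x):
        return self.act(x + self.body(x))

def make_backbone(block_counts, channels: int = 64) -> nn.Sequential:
    # Convolution, batch normalization, and activation layers, front to back,
    # followed by the residual combination blocks.
    stem = nn.Sequential(
        nn.Conv2d(3, channels, 7, stride=2, padding=3, bias=False),
        nn.BatchNorm2d(channels),
        nn.PReLU(),
    )
    stages = [nn.Sequential(*[ResidualModule(channels) for _ in range(n)])
              for n in block_counts]
    return nn.Sequential(stem, *stages)
```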
The invention is mainly realized by the following technical scheme:
a flaw detection method based on a distillation learning mechanism is carried out by adopting the flaw detection device, and comprises the following steps:
step S100: collecting flaw-free and flawed industrial product image samples and forming a training set and a test set, wherein the training set consists of flaw-free industrial product image samples and the test set consists of both flaw-free and flawed industrial product image samples;
step S200: inputting a training set, training a teacher model and a student model simultaneously, adding a characteristic information grafting module between the teacher model and the student model, and finally performing loss calculation on characteristic information output by the teacher model and the student model to obtain a trained network model;
step S300: testing the precision of the trained network model by adopting a test set to obtain an optimized network model;
step S400: and inputting the image to be detected into the optimized network model and outputting a detection result.
In order to better implement the present invention, further, in step S200 the loss calculation uses a Euclidean distance metric loss function and a cosine similarity loss function; the Euclidean distance metric loss function takes the distance similarity between the student-model and teacher-model characteristic information as the loss value, and the cosine similarity loss function is computed between the teacher-model and student-model characteristic information along the characteristic information channel direction; the Euclidean distance metric loss function is calculated as:
$$L_{dis} = \frac{1}{N}\sum_{i=1}^{N}\left\lVert s_i - t_i \right\rVert_2^2$$

wherein: $N$ is the batch size, $i$ is the sample index parameter, $s_i$ is the student-model characteristic information, and $t_i$ is the teacher-model characteristic information;
the formula for calculating the cosine similarity loss function is as follows:
$$L_{c} = 1 - \frac{\sum_{j} \mathrm{Flatten}(s)_j \,\mathrm{Flatten}(t)_j}{\left\lVert \mathrm{Flatten}(s) \right\rVert_2 \left\lVert \mathrm{Flatten}(t) \right\rVert_2}$$

wherein: $j$ is the characteristic information dimension index,
and $\mathrm{Flatten}(\cdot)$ is a feature-vector transformation function used to convert high-dimensional characteristic information into one-dimensional characteristic information;
finally, the overall loss function is calculated as:
$$L = L_{dis} + \lambda\left(L_{c1} + L_{c2} + L_{c3} + L_{c4}\right)$$

wherein: $L_{c1}$, $L_{c2}$, $L_{c3}$, and $L_{c4}$ represent the cosine similarity losses calculated by the first, second, third, and fourth-level residual combination blocks respectively,
and $\lambda$ is an adjusting parameter, set to 0.8.
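Under the interpretation of the formulas given above (L_dis as a batch mean of squared Euclidean distances, the cosine term as one minus cosine similarity on flattened features), the loss calculation might be sketched as follows; this is an illustration, not a verbatim transcription of the patent's formulas.

```python
import torch
import torch.nn.functional as F

def euclidean_distance_loss(s: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """L_dis: mean over the batch of the squared Euclidean feature distance."""
    n = s.shape[0]
    return ((s - t) ** 2).reshape(n, -1).sum(dim=1).mean()

def cosine_similarity_loss(s: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """L_c: one minus cosine similarity on Flatten()-ed characteristic information."""
    s_flat = torch.flatten(s, start_dim=1)   # high-dimensional -> one-dimensional
    t_flat = torch.flatten(t, start_dim=1)
    return (1.0 - F.cosine_similarity(s_flat, t_flat, dim=1)).mean()

def overall_loss(s_out, t_out, stage_pairs, lam: float = 0.8) -> torch.Tensor:
    """L = L_dis + lambda * (L_c1 + L_c2 + L_c3 + L_c4)."""
    l_cos = sum(cosine_similarity_loss(s, t) for s, t in stage_pairs)
    return euclidean_distance_loss(s_out, t_out) + lam * l_cos
```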
The cosine similarity loss function computes loss values between teacher and student characteristic information along the channel direction, and this loss is calculated in parallel at each position where a characteristic information grafting module is placed. The Euclidean distance metric loss function takes the distance similarity between the student-model and teacher-model characteristic information as the loss value. By using the characteristic information grafting module together with the cosine similarity loss in the intermediate layers, the invention directly accelerates the student's feature learning at the feature level and improves precision, while indirectly shortening the feature distance between the two models in space and improving the robustness of the student model's features.
A computer readable storage medium storing computer program instructions which, when executed by a processor, implement the flaw detection method described above.
According to the method, a teacher model and a student model are constructed with consistent structures, the network-layer complexity of the teacher model being higher than that of the student model; a characteristic information grafting module added in the intermediate layer accelerates the learning progress and detection precision of the student model, and loss functions are finally used to calculate loss values for optimizing the model parameters. The grafting module calculates the characteristic information difference value between the teacher model and the student model and uses it to decide whether to graft the fused characteristic information into the student model, improving the student model's ability to imitate the teacher model. The invention builds the teacher and student models under a distillation learning mechanism and introduces the grafting module into the intermediate layer of the two-branch network, thereby strengthening the expression capability of the student model, accelerating its learning process, and improving its detection precision.
The method initializes the teacher model with a pre-trained model while the student model uses random initialization; an optimizer minimizes the loss value and the model parameters are updated iteratively until the number of training iterations equals the total iteration count, yielding the final model, whose precision is then tested on the test set.
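A schematic training loop consistent with this description, reusing overall_loss from the sketch above; the optimizer choice, the learning rate, and the assumption that each model returns its stage features together with its final output are not specified by the text.

```python
import torch

def train_student(teacher, student, train_loader, total_iters: int, device: str = "cuda"):
    teacher.to(device).eval()                       # pre-trained teacher, frozen
    for p in teacher.parameters():
        p.requires_grad_(False)
    student.to(device).train()                      # randomly initialized student
    opt = torch.optim.Adam(student.parameters(), lr=1e-4)  # optimizer is an assumption

    it = 0
    while it < total_iters:
        for images in train_loader:                 # flaw-free samples only
            images = images.to(device)
            with torch.no_grad():
                t_feats, t_out = teacher(images)    # stage features + final output
            s_feats, s_out = student(images)        # grafting runs inside the forward pass
            loss = overall_loss(s_out, t_out, list(zip(s_feats, t_feats)))
            opt.zero_grad()
            loss.backward()
            opt.step()
            it += 1
            if it >= total_iters:
                break
    return student
```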
The characteristic information grafting module mainly comprises a characteristic information difference value calculation part and a grafting characteristic information calculation part. It is introduced because the large amount of characteristic information obtained after convolution-kernel processing covers many kinds of features, such as textures and edges, but also contains many useless features. Common methods remove part of the useless features through fine-tuning and iterative model training, which limits the detection potential of the model to a certain extent. The invention designs the grafting module around the model's purpose: the teacher model guides the student model, and ideally the student model's expression capability on flawless image data approaches the teacher's. If the characteristic information learned during training is left unprocessed and the influence of useless features is ignored, the student model cannot reach the teacher's expression capability. The main purpose of the module is therefore to remove invalid features from the student-model characteristic information and graft in new features to reactivate the network, thereby improving precision.
The module first calculates the difference value between the characteristic information output by same-level residual combination blocks of the student and teacher models and then compares it with a threshold; a difference value greater than the threshold indicates that the characteristic information learned by the student model deviates strongly from the teacher's, so grafting is needed and the grafting characteristic information calculation part is entered. There, a softmax is computed in the channel direction, the resulting probability value measures each feature's contribution to model learning, and a threshold decides which features to graft; if all of the teacher model's features were grafted into the student model, the student's capacity for autonomous learning would be affected and its generalization limited.
The invention has the beneficial effects that:
(1) the characteristic information grafting module performs difference value calculation on the characteristic information of same-level residual combination blocks of the teacher and student models and judges whether the expression capability of the student-model characteristic information approaches the teacher's; if the difference is too large, the teacher-model and student-model characteristic information are fused to replace the original student-model characteristic information, optimizing the expression capability of the student model and accelerating its learning process;
(2) a teacher model with strong learning capability guides the student model on flawless samples, so the student model reconstructs flawless-sample characteristic information better and the gap between reconstructed flawed and flawless characteristic information widens, achieving flaw detection while fitting the actual scene and remaining simple to deploy;
(3) the model is trained and generated with flawless data only, which better matches actual conditions; moreover, the characteristic information grafting module introduced into the intermediate layer of the teacher/student two-branch network strengthens the expression capability of the student model, accelerates its learning process, and improves its detection precision.
Drawings
FIG. 1 is a schematic diagram of the overall network structure of the present invention;
FIG. 2 is a schematic structural diagram of a characteristic information grafting module;
FIG. 3 is a flow chart of characteristic information grafting;
FIG. 4 is a flow chart of model training.
Detailed Description
Example 1:
a flaw detection device based on a distillation learning mechanism comprises an acquisition module, a training module, and a detection module. The acquisition module collects flaw-free industrial product image samples and forms a training set. The training module trains a network model on the training set; the network model comprises a teacher model and a student model with consistent backbone network structures, and the teacher model guides the student model to finally obtain an optimized network model. The detection module inputs the image to be detected into the optimized network model and outputs a detection result.
As shown in fig. 1, the network-layer complexity of the teacher model is higher than that of the student model. A characteristic information grafting module is added in the intermediate layer between the teacher model and the student model; it calculates the difference value between the characteristic information of same-level residual combination blocks of the two models and judges whether the expression capability of the student-model characteristic information approaches the teacher's. If the difference is large, the teacher-model and student-model characteristic information are fused and the original student-model characteristic information is replaced.
The grafting module calculates the characteristic information difference value between the teacher model and the student model and uses it to decide whether to graft the fused characteristic information into the student model, improving the student model's ability to imitate the teacher model. The invention builds the teacher and student models under a distillation learning mechanism and introduces the grafting module into the intermediate layer of the two-branch network, thereby strengthening the expression capability of the student model, accelerating its learning process, and improving its detection precision.
The model is trained and generated with flawless data only, which better matches actual conditions; moreover, the characteristic information grafting module introduced into the intermediate layer of the teacher/student two-branch network strengthens the expression capability of the student model, accelerates its learning process, and improves its detection precision.
Example 2:
this embodiment is optimized on the basis of embodiment 1. As shown in fig. 2, the characteristic information grafting module comprises a characteristic information difference value calculation part and a grafting characteristic information calculation part. The difference value calculation part calculates the difference value between the characteristic information of same-level residual combination blocks of the teacher and student models. As shown in fig. 3, when the difference value is greater than a threshold, the grafting characteristic information calculation part is entered: the teacher-model and student-model characteristic information are fused together and then replace the original student-model characteristic information; when the difference value is less than or equal to the threshold, the student-model characteristic information is not replaced.
Further, as shown in fig. 3, after the characteristic information is input into the difference value calculation part, it is processed sequentially by a flattening layer and a fully connected layer to reduce the extra computation brought by high dimensionality, and the difference value is then calculated; the calculation formula of the difference value calculation part is as follows:
$$d = \frac{1}{M}\sum_{m=1}^{M}\left(s_m - t_m\right)^2$$

wherein: $d$ is the characteristic information difference value;
$M$ is the dimension of the characteristic information (although the teacher model and the student model use modules of different complexity, the output dimension of the last output layer of each module is the same);
$m$ is the dimension index parameter;
$s$ is the student-model characteristic information;
and $t$ is the teacher-model characteristic information.
Further, as shown in fig. 3, in the grafting characteristic information calculation part, a softmax ("flexible maximum") computation is applied to the teacher-model and student-model characteristic information in the channel direction to obtain a probability value for the characteristic information on each channel; when the probability value is greater than a threshold, that channel's student-model characteristic information is not replaced, otherwise the teacher-model and student-model characteristic information are fused and that channel's student-model characteristic information is replaced.
Further, the feature information fusion mapping relation of the grafting feature information calculation part is as follows:
$$F = F_s + \alpha F_t$$

wherein: $F$ represents the processed characteristic information,
$F_s$ represents the student-model characteristic information,
$F_t$ represents the teacher-model characteristic information,
and $\alpha$ is a learnable tuning parameter.
The invention uses a teacher model with strong learning capability to guide the student model on flawless samples, so the student model reconstructs flawless-sample characteristic information better and the gap between reconstructed flawed and flawless characteristic information widens, achieving flaw detection while fitting the actual scene and remaining simple to deploy.
Other parts of this embodiment are the same as embodiment 1, and thus are not described again.
Example 3:
this embodiment is optimized on the basis of embodiment 1 or 2. As shown in fig. 1, the backbone networks of the teacher and student models each comprise, arranged sequentially from front to back, a convolution layer, a batch normalization layer, an activation function layer, and a plurality of residual modules. The activation function layer adopts a parametric rectified linear unit (PReLU) layer.
Further, the teacher model and the student model each comprise 4 residual combination blocks; those of the teacher model contain 6, 12, 24, and 6 residual modules in sequence, and those of the student model contain 1, 2, 4, and 1 residual modules in sequence.
The rest of this embodiment is the same as embodiment 1 or 2, and therefore, the description thereof is omitted.
Example 4:
a flaw detection method based on a distillation learning mechanism is carried out by adopting the flaw detection device, and comprises the following steps:
step S100: collecting flaw-free and flawed industrial product image samples and forming a training set and a test set, wherein the training set consists of flaw-free industrial product image samples and the test set consists of both flaw-free and flawed industrial product image samples;
step S200: inputting a training set, training a teacher model and a student model simultaneously, adding a characteristic information grafting module between the teacher model and the student model, and finally performing loss calculation on characteristic information output by the teacher model and the student model to obtain a trained network model;
step S300: testing the precision of the trained network model by adopting a test set to obtain an optimized network model;
step S400: and inputting the image to be detected into the optimized network model and outputting a detection result.
Further, in step S200 the loss calculation uses a Euclidean distance metric loss function and a cosine similarity loss function; the Euclidean distance metric loss function takes the distance similarity between the student-model and teacher-model characteristic information as the loss value, and the cosine similarity loss function is computed between the teacher-model and student-model characteristic information along the characteristic information channel direction.
The calculation formula of the Euclidean distance measurement loss function is as follows:
$$L_{dis} = \frac{1}{N}\sum_{i=1}^{N}\left\lVert s_i - t_i \right\rVert_2^2$$

wherein: $N$ is the batch size, $i$ is the sample index parameter, $s_i$ is the student-model characteristic information, and $t_i$ is the teacher-model characteristic information;
the formula for calculating the cosine similarity loss function is as follows:
$$L_{c} = 1 - \frac{\sum_{j} \mathrm{Flatten}(s)_j \,\mathrm{Flatten}(t)_j}{\left\lVert \mathrm{Flatten}(s) \right\rVert_2 \left\lVert \mathrm{Flatten}(t) \right\rVert_2}$$

wherein: $j$ is the characteristic information dimension index,
and $\mathrm{Flatten}(\cdot)$ is a feature-vector transformation function used to convert high-dimensional characteristic information into one-dimensional characteristic information;
finally, the overall loss function is calculated as:
$$L = L_{dis} + \lambda\left(L_{c1} + L_{c2} + L_{c3} + L_{c4}\right)$$

wherein: $L_{c1}$, $L_{c2}$, $L_{c3}$, and $L_{c4}$ represent the cosine similarity losses calculated by the first, second, third, and fourth-level residual combination blocks respectively,
and $\lambda$ is an adjusting parameter, set to 0.8.
The invention uses a teacher model with strong learning capability to guide the student model on flawless samples, so the student model reconstructs flawless-sample characteristic information better and the gap between reconstructed flawed and flawless characteristic information widens, achieving flaw detection while fitting the actual scene and remaining simple to deploy.
Example 5:
a flaw detection method based on a distillation learning mechanism comprises the following steps:
collecting image samples of non-defective and defective industrial products, taking most of the non-defective samples as the training set, and taking the remaining non-defective samples and all defective samples as the test set;
constructing a teacher model and a student model according to the design, the two structures being consistent but the network-layer complexity of the teacher model greater than that of the student model; accelerating the learning progress and detection precision of the student model by adding a characteristic information grafting module in the intermediate layer; and finally performing loss calculation on the characteristic information output by the teacher model and the student model;
calculating the loss value between the pieces of characteristic information using the Euclidean distance metric loss function and the cosine similarity loss function;
as shown in fig. 4, the teacher model is initialized with a pre-trained model and the student model with random initialization; an optimizer is selected to minimize the loss value, and the model parameters are updated iteratively until the number of training iterations equals the total iteration count, yielding the final model, whose precision is finally tested on the test set.
Further, as shown in fig. 1, the teacher and student models adopt a common backbone network structure: both keep the same arrangement of convolution layers, batch normalization layers, activation function layers, and residual modules stacked according to a fixed combination rule, but the complexity of the network layers differs. In this embodiment the teacher model adopts a ResNet101 structure and the student model a ResNet20 structure, each with 4 residual combination blocks; the residual combination blocks of the teacher model contain 6, 12, 24, and 6 residual modules in sequence, and those of the student model contain 1, 2, 4, and 1 in sequence.
Further, as shown in fig. 2, the characteristic information grafting module mainly consists of a characteristic information difference value calculation part and a grafting characteristic information calculation part. The difference value calculation part calculates the difference value between the characteristic information of same-level residual combination blocks of the teacher and student models. As shown in fig. 3, when the difference value is greater than the threshold, the grafting characteristic information calculation part is entered: the teacher-model and student-model characteristic information are fused together and then replace the original student-model characteristic information; if the difference value is smaller than the threshold, the student-model characteristic information is not replaced. The grafting pattern is finally learned along with iterative model training.
Furthermore, the cosine similarity loss function is used for calculating loss values between teacher and student characteristic information in the channel direction, and this loss calculation is inserted in parallel at each position where a characteristic information grafting module is placed.
Further, the Euclidean distance measurement loss function is used for calculating the distance similarity between the feature information in the student model and the feature information in the teacher model to serve as a loss value.
The model is trained and generated with flawless data only, which better matches actual conditions; moreover, the characteristic information grafting module introduced into the intermediate layer of the teacher/student two-branch network strengthens the expression capability of the student model, accelerates its learning process, and improves its detection precision.
The invention uses a teacher model with strong learning capability to guide the student model on flawless samples, so the student model reconstructs flawless-sample characteristic information better and the gap between reconstructed flawed and flawless characteristic information widens, achieving flaw detection while fitting the actual scene and remaining simple to deploy.
Example 6:
a flaw detection method based on a distillation learning mechanism comprises the following steps:
collecting image samples of non-defective and defective industrial products, taking most of the non-defective samples as the training set, and taking the remaining non-defective samples and all defective samples as the test set;
building a teacher model and a student model according to the design; as shown in fig. 1, the overall network model is a two-branch structure, the structures of the teacher and student models are consistent, but the network-layer complexity of the teacher model is higher than that of the student model, and the learning progress and detection precision of the student model are accelerated by adding a characteristic information grafting module in the intermediate layer;
and finally, performing loss calculation on the characteristic information output by the teacher model and the characteristic information output by the student model.
Further, as shown in fig. 1, the teacher and student models adopt a common backbone network structure: both keep the same arrangement of convolution layers, batch normalization layers, activation function layers, and residual modules stacked according to a fixed combination rule, but the complexity of the network layers differs. In this embodiment the teacher model adopts a ResNet101 structure and the student model a ResNet20 structure, each with 4 residual combination blocks. The residual combination blocks of the teacher model contain 6, 12, 24, and 6 residual modules in sequence and are each called a complex residual combination block; those of the student model contain 1, 2, 4, and 1 in sequence and are each called a simple residual combination block.
Further, as shown in fig. 2, the characteristic information grafting module mainly includes a characteristic information difference value calculation part and a grafting characteristic information calculation part; the difference value calculation part calculates the difference value between the characteristic information of same-level residual combination blocks of the teacher and student models, with the calculation formula:
$$d = \frac{1}{M}\sum_{m=1}^{M}\left(s_m - t_m\right)^2$$

In the formula, $d$ is the characteristic information difference value and $M$ is the dimension of the characteristic information; although the teacher and student models use modules of different complexity, the output dimension of the last output layer of each module is the same. $m$ is the dimension index parameter, $s$ is the student-model characteristic information, and $t$ is the teacher-model characteristic information.
Further, as shown in fig. 3, after the characteristic information is input to the difference value calculation part, it must be processed by a flattening layer and a fully connected layer to reduce the extra computation brought by high dimensionality.
Further, when the difference value is greater than a threshold, the grafting characteristic information calculation part is entered; if the difference value is smaller than the threshold, the student-model characteristic information is not replaced. The threshold is set to 0.2.
Further, replacement of characteristic information is performed along the channel direction. As shown in fig. 2, the characteristic information is a feature block in which each channel represents a feature with a different meaning. As shown in fig. 3, a softmax is computed on the characteristic information in the channel direction to obtain a probability value on each channel; when the probability value is greater than a threshold, no processing is performed, otherwise the teacher-model and student-model characteristic information are fused. This threshold is set to 0.3, and the combined characteristic information is finally output. The characteristic information fusion mapping relation in the grafting characteristic information calculation part is:
$$F = F_s + \alpha F_t$$

In the formula, $F$ represents the processed characteristic information, $F_s$ the student-model characteristic information, and $F_t$ the teacher-model characteristic information; $\alpha$ is a learnable tuning parameter.
Further, the loss functions used in the present invention are a Euclidean distance metric loss function and a cosine similarity loss function; the Euclidean distance metric loss function is calculated as:
$$L_{dis} = \frac{1}{N}\sum_{i=1}^{N}\left\lVert s_i - t_i \right\rVert_2^2$$

In the formula, $N$ is the batch size, $i$ is the sample index parameter, $s_i$ is the student-model characteristic information, and $t_i$ is the teacher-model characteristic information.
Further, the cosine similarity loss function calculation formula is as follows:
$$L_{c} = 1 - \frac{\sum_{j} \mathrm{Flatten}(s)_j \,\mathrm{Flatten}(t)_j}{\left\lVert \mathrm{Flatten}(s) \right\rVert_2 \left\lVert \mathrm{Flatten}(t) \right\rVert_2}$$

In the formula, $j$ is the characteristic information dimension index and $\mathrm{Flatten}(\cdot)$ is a feature-vector transformation function that converts high-dimensional characteristic information into one-dimensional characteristic information.
Finally, the overall loss function is calculated as:
$$L = L_{dis} + \lambda\left(L_{c1} + L_{c2} + L_{c3} + L_{c4}\right)$$

In the formula, $L_{c1}$ represents the cosine similarity loss calculated by the first-level residual combination block, and the other terms are analogous; as shown in fig. 1, a cosine similarity loss value is calculated at each level of residual combination block. $\lambda$ is an adjusting parameter, set to 0.8.
Further, as shown in fig. 4, the teacher model is initialized with a pre-trained model and the student model with random initialization; an optimizer is selected to minimize the loss value, and the model parameters are updated iteratively until the number of training iterations equals the total iteration count, yielding the final model, whose precision is finally tested on the test set.
During training, each image is input into the student model and the teacher model simultaneously and the difference value of the characteristic information output by the two models is calculated. Because the teacher model carries pre-trained parameters, it can distinguish flawless product images from flawed ones, whereas the student model has learned flawless product images only. For a flawed product image, the teacher model therefore reconstructs the characteristic information well while the student model does not; the difference value of the reconstructed characteristic information then indicates whether the product image sample contains a flaw: a large difference value means a flaw is present, otherwise the sample is normal. If the position of the flaw must be located, the model is propagated backwards and the position is found through the gradient change.
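At test time the same reconstruction difference serves as an anomaly score; a sketch, with the decision threshold left as a free parameter (locating the flaw via back-propagated gradients would additionally require gradients to be enabled, which this scoring-only helper disables):

```python
import torch

@torch.no_grad()
def detect_flaw(teacher, student, image: torch.Tensor, threshold: float):
    """Return (is_flawed, score) from the reconstruction difference of the two models."""
    _, t_out = teacher(image)
    _, s_out = student(image)
    # Difference value of the reconstructed characteristic information.
    score = ((s_out - t_out) ** 2).mean().item()
    return score > threshold, score
```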
In conclusion, the flaw detection method provided by the invention only needs flaw-free product image samples for learning, which reduces the difficulty of sample collection, fits the actual scene better, is easy to operate in building the network structure, and is highly feasible to implement.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and all simple modifications and equivalent variations of the above embodiments according to the technical spirit of the present invention are included in the scope of the present invention.

Claims (10)

1. A flaw detection device based on a distillation learning mechanism, characterized by comprising an acquisition module, a training module, and a detection module; the acquisition module is used for collecting flaw-free industrial product image samples and forming a training set; the training module is used for training a network model on the training set, the network model comprising a teacher model and a student model with consistent backbone network structures, the teacher model guiding the student model to finally obtain an optimized network model; and the detection module is used for inputting the image to be detected into the optimized network model and outputting a detection result;
wherein the network-layer complexity of the teacher model is higher than that of the student model, and a characteristic information grafting module is added in the intermediate layer between the teacher model and the student model for calculating the difference value between the characteristic information of same-level residual combination blocks of the teacher model and the student model and judging whether the expression capability of the student-model characteristic information approaches that of the teacher model; if the difference is large, the teacher-model and student-model characteristic information are fused and the original student-model characteristic information is replaced.
2. The flaw detection device based on the distillation learning mechanism as claimed in claim 1, characterized in that the characteristic information grafting module comprises a characteristic information difference value calculation part and a grafting characteristic information calculation part; the difference value calculation part is used for calculating the difference value between the characteristic information of same-level residual combination blocks of the teacher model and the student model; when the difference value is greater than a threshold, the grafting characteristic information calculation part is entered, the teacher-model and student-model characteristic information are fused together and then replace the original student-model characteristic information, and when the difference value is less than or equal to the threshold, the student-model characteristic information is not replaced.
3. The flaw detection device based on the distillation learning mechanism as claimed in claim 2, characterized in that after the characteristic information is input into the difference value calculation part, it is processed sequentially by a flattening layer and a fully connected layer to reduce the extra computation brought by high dimensionality, and the difference value is then calculated; the calculation formula of the difference value calculation part is as follows:
$$d = \frac{1}{M}\sum_{m=1}^{M}\left(s_m - t_m\right)^2$$

wherein: $d$ is the characteristic information difference value;
$M$ is the dimension of the characteristic information (although the teacher model and the student model use modules of different complexity, the output dimension of the last output layer of each module is the same);
$m$ is the dimension index parameter;
$s$ is the student-model characteristic information;
and $t$ is the teacher-model characteristic information.
4. The flaw detection device based on the distillation learning mechanism as claimed in claim 2, characterized in that in the grafting characteristic information calculation part, a softmax ("flexible maximum") computation is applied to the teacher-model and student-model characteristic information in the channel direction to obtain a probability value for the characteristic information on each channel; when the probability value is greater than a threshold, that channel's student-model characteristic information is not replaced, otherwise the teacher-model and student-model characteristic information are fused and that channel's student-model characteristic information is replaced.
5. The flaw detection device based on the distillation learning mechanism, characterized in that the characteristic information fusion mapping relation of the grafting characteristic information calculation part is as follows:
$$F = F_s + \alpha F_t$$

wherein: $F$ represents the processed characteristic information,
$F_s$ represents the student-model characteristic information,
$F_t$ represents the teacher-model characteristic information,
and $\alpha$ is a learnable tuning parameter.
6. The flaw detection device based on the distillation learning mechanism, characterized in that the backbone networks of the teacher model and the student model each comprise, arranged sequentially from front to back, a convolution layer, a batch normalization layer, an activation function layer, and a plurality of residual modules.
7. The flaw detection device based on the distillation learning mechanism as claimed in claim 6, characterized in that the teacher model adopts a ResNet101 structure and the student model a ResNet20 structure; the teacher model and the student model each comprise 4 residual combination blocks, the 4 residual combination blocks of the teacher model containing 6, 12, 24, and 6 residual modules in sequence and those of the student model containing 1, 2, 4, and 1 residual modules in sequence.
8. A flaw detection method based on a distillation learning mechanism, performed by using the flaw detection device according to any one of claims 1 to 7 and comprising the following steps:
step S100: collecting flaw-free and flawed industrial product image samples and forming a training set and a test set, wherein the training set consists of flaw-free industrial product image samples and the test set consists of both flaw-free and flawed industrial product image samples;
step S200: inputting a training set, training a teacher model and a student model simultaneously, adding a characteristic information grafting module between the teacher model and the student model, and finally performing loss calculation on characteristic information output by the teacher model and the student model to obtain a trained network model;
step S300: testing the precision of the trained network model by adopting a test set to obtain an optimized network model;
step S400: and inputting the image to be detected into the optimized network model and outputting a detection result.
9. The flaw detection method based on the distillation learning mechanism according to claim 8, characterized in that in step S200 the loss calculation uses a Euclidean distance metric loss function and a cosine similarity loss function; the Euclidean distance metric loss function takes the distance similarity between the student-model and teacher-model characteristic information as the loss value, and the cosine similarity loss function is computed between the teacher-model and student-model characteristic information along the characteristic information channel direction; the Euclidean distance metric loss function is calculated as:
L_e = (1/N) Σ_{i=1}^{N} ||s_i − t_i||²
wherein: N is the batch size, i is the sample index, s_i is the student model characteristic information, and t_i is the teacher model characteristic information;
the formula for calculating the cosine similarity loss function is as follows:
L_c = 1 − (Σ_j Flatten(s)_j · Flatten(t)_j) / (||Flatten(s)|| · ||Flatten(t)||)
wherein: j is the dimension index of the characteristic information, and Flatten() is a feature vector transformation function that converts high-dimensional characteristic information into one-dimensional characteristic information;
finally, the overall loss function is calculated as:
L = L_e + λ·(L_c1 + L_c2 + L_c3 + L_c4)
wherein: L_c1, L_c2, L_c3 and L_c4 are the cosine similarity loss functions calculated at the first, second, third and fourth residual combination blocks respectively, and λ is a tuning parameter set to 0.8.
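The claim-9 losses map directly onto PyTorch primitives. In the sketch below, the batch-mean reductions, the squared distance in the Euclidean term, and the stage_feats argument (a list of four student/teacher feature pairs, one per residual combination block) are assumptions recovered from the claim wording.

```python
import torch
import torch.nn.functional as F

def euclidean_loss(s: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    # Mean over the batch of the squared L2 distance between feature maps
    n = s.shape[0]
    return ((s - t) ** 2).reshape(n, -1).sum(dim=1).mean()

def cosine_loss(s: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    # Flatten() of claim 9: high-dimensional features to one vector per sample
    s_flat = torch.flatten(s, start_dim=1)
    t_flat = torch.flatten(t, start_dim=1)
    return (1.0 - F.cosine_similarity(s_flat, t_flat, dim=1)).mean()

def total_loss(out_s, out_t, stage_feats, lam: float = 0.8):
    # L = L_e + lambda * (L_c1 + L_c2 + L_c3 + L_c4), with lambda = 0.8 per claim 9
    l_e = euclidean_loss(out_s, out_t)
    l_c = sum(cosine_loss(fs, ft) for fs, ft in stage_feats)
    return l_e + lam * l_c
```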
10. A computer readable storage medium storing computer program instructions which, when executed by a processor, implement the flaw detection method of claim 8 or 9.
CN202110765481.5A 2021-07-06 2021-07-06 Flaw detection device, method and storage medium based on distillation learning mechanism Active CN113469977B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110765481.5A CN113469977B (en) 2021-07-06 2021-07-06 Flaw detection device, method and storage medium based on distillation learning mechanism

Publications (2)

Publication Number Publication Date
CN113469977A true CN113469977A (en) 2021-10-01
CN113469977B CN113469977B (en) 2024-01-12

Family

ID=77878753

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110765481.5A Active CN113469977B (en) 2021-07-06 2021-07-06 Flaw detection device, method and storage medium based on distillation learning mechanism

Country Status (1)

Country Link
CN (1) CN113469977B (en)

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019228122A1 (en) * 2018-05-29 2019-12-05 腾讯科技(深圳)有限公司 Training method for model, storage medium and computer device
CN108830813A (en) * 2018-06-12 2018-11-16 福建帝视信息科技有限公司 A kind of image super-resolution Enhancement Method of knowledge based distillation
CN109711544A (en) * 2018-12-04 2019-05-03 北京市商汤科技开发有限公司 Method, apparatus, electronic equipment and the computer storage medium of model compression
WO2020228446A1 (en) * 2019-05-13 2020-11-19 腾讯科技(深圳)有限公司 Model training method and apparatus, and terminal and storage medium
WO2020253127A1 (en) * 2019-06-21 2020-12-24 深圳壹账通智能科技有限公司 Facial feature extraction model training method and apparatus, facial feature extraction method and apparatus, device, and storage medium
WO2021033791A1 (en) * 2019-08-19 2021-02-25 엘지전자 주식회사 Ai-based new learning model generation system for vision inspection on product production line
CN112487182A (en) * 2019-09-12 2021-03-12 华为技术有限公司 Training method of text processing model, and text processing method and device
WO2021056043A1 (en) * 2019-09-23 2021-04-01 Presagen Pty Ltd Decentralised artificial intelligence (ai)/machine learning training system
CN110674880A (en) * 2019-09-27 2020-01-10 北京迈格威科技有限公司 Network training method, device, medium and electronic equipment for knowledge distillation
CN110852426A (en) * 2019-11-19 2020-02-28 成都晓多科技有限公司 Pre-training model integration acceleration method and device based on knowledge distillation
CN111160409A (en) * 2019-12-11 2020-05-15 浙江大学 Heterogeneous neural network knowledge reorganization method based on common feature learning
CN111062951A (en) * 2019-12-11 2020-04-24 华中科技大学 Knowledge distillation method based on semantic segmentation intra-class feature difference
CN111126599A (en) * 2019-12-20 2020-05-08 复旦大学 Neural network weight initialization method based on transfer learning
SE1930421A1 (en) * 2019-12-30 2021-07-01 Unibap Ab Method and means for detection of imperfections in products
CN112257453A (en) * 2020-09-23 2021-01-22 昆明理工大学 Chinese-Yue text similarity calculation method fusing keywords and semantic features

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GE SHIMING; ZHAO SHENGWEI; ***; LI CHENYU: "Face Recognition Based on Deep Feature Distillation", Journal of Beijing Jiaotong University, no. 06 *
LEI JIE; GAO XIN; SONG JIE; WANG XINGLU; SONG MINGLI: "A Survey of Deep Network Model Compression", Journal of Software, no. 02 *
GAO XUAN; RAO PENG; LIU GAORUI: "Real-Time Human Action Recognition Based on Feature Distillation", Industrial Control Computer, no. 08 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114693995A (en) * 2022-04-14 2022-07-01 北京百度网讯科技有限公司 Model training method applied to image processing, image processing method and device

Also Published As

Publication number Publication date
CN113469977B (en) 2024-01-12

Similar Documents

Publication Publication Date Title
CN111598881B (en) Image anomaly detection method based on variational self-encoder
Zhao et al. A visual long-short-term memory based integrated CNN model for fabric defect image classification
CN108492286B (en) Medical image segmentation method based on dual-channel U-shaped convolutional neural network
CN105975573B (en) A kind of file classification method based on KNN
CN109389171B (en) Medical image classification method based on multi-granularity convolution noise reduction automatic encoder technology
CN111680706A (en) Double-channel output contour detection method based on coding and decoding structure
CN112102229A (en) Intelligent industrial CT detection defect identification method based on deep learning
CN112185423B (en) Voice emotion recognition method based on multi-head attention mechanism
KR20220050083A (en) AI-based new learning model creation system for vision inspection on product production lines
CN116843650A (en) SMT welding defect detection method and system integrating AOI detection and deep learning
CN115587543A (en) Federal learning and LSTM-based tool residual life prediction method and system
CN115935277A (en) Supercharged boiler fault diagnosis method based on small sample learning and training method and testing method of fault diagnosis model
CN113469977B (en) Flaw detection device, method and storage medium based on distillation learning mechanism
CN115358337A (en) Small sample fault diagnosis method and device and storage medium
CN111783616A (en) Data-driven self-learning-based nondestructive testing method
CN116109621B (en) Defect detection method and system based on depth template
CN116721071A (en) Industrial product surface defect detection method and device based on weak supervision
CN116152194A (en) Object defect detection method, system, equipment and medium
CN113889274B (en) Method and device for constructing risk prediction model of autism spectrum disorder
CN113642662B (en) Classification detection method and device based on lightweight classification model
CN115661498A (en) Self-optimization single cell clustering method
Lim et al. Analyzing deep neural networks with noisy labels
CN109409424B (en) Appearance defect detection model modeling method and device
Li et al. Industrial anomaly detection via teacher student network
CN111626409B (en) Data generation method for image quality detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant