CN116152240B - Industrial defect detection model compression method based on knowledge distillation


Info

Publication number
CN116152240B
Authority
CN
China
Prior art keywords
model
distillation
layer
student
teacher
Prior art date
Legal status
Active
Application number
CN202310412539.7A
Other languages
Chinese (zh)
Other versions
CN116152240A (en)
Inventor
Chen Yu
Chen Zhen
Current Assignee
Xiamen Weitu Software Technology Co., Ltd.
Original Assignee
Xiamen Weitu Software Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Xiamen Weitu Software Technology Co., Ltd.
Priority to CN202310412539.7A
Publication of CN116152240A
Application granted
Publication of CN116152240B
Legal status: Active
Anticipated expiration

Classifications

    • G06T 7/0004 Industrial image inspection (G Physics › G06 Computing; Calculating or Counting › G06T Image data processing or generation, in general › G06T 7/00 Image analysis › G06T 7/0002 Inspection of images, e.g. flaw detection)
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections (G06N Computing arrangements based on specific computational models › G06N 3/00 Computing arrangements based on biological models › G06N 3/02 Neural networks › G06N 3/08 Learning methods)
    • G06T 2207/20081 Training; Learning (G06T 2207/00 Indexing scheme for image analysis or image enhancement › G06T 2207/20 Special algorithmic details)
    • G06T 2207/20084 Artificial neural networks [ANN] (same indexing scheme)
    • Y02P 90/30 Computing systems specially adapted for manufacturing (Y02P Climate change mitigation technologies in the production or processing of goods › Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation)


Abstract

The invention discloses a knowledge-distillation-based method for compressing an industrial defect detection model, comprising the following steps: constructing and training a teacher model; and distilling the teacher model into a student model by knowledge distillation, which comprises: intermediate network layer feature distillation, in which the student model learns the feature information of the teacher model's intermediate network layers by feature distillation, so that at intermediate layers of different depths the student model iteratively approximates the teacher model's feature information at the corresponding layer, after which the student model network is trained using the teacher model's soft labels; and softmax layer target distillation, in which the student model learns the softmax-layer feature information of the teacher model by combining a soft target with a hard target. Using knowledge distillation, the invention transfers the capability of a pre-trained, structurally complex, high-precision industrial defect detection model onto a lightweight model with a simple structure, simplifying the model parameters and improving detection efficiency while guaranteeing a high-precision detection rate.

Description

Industrial defect detection model compression method based on knowledge distillation
Technical Field
The invention relates to the technical field of industrial product defect detection, in particular to a method for compressing an industrial defect detection model based on knowledge distillation.
Background
With the continuous development of deep learning technology and the growth of hardware computing power, model detection capability has become increasingly strong and model network parameters increasingly complex; in practical application scenarios, however, properties such as light weight and high efficiency receive more attention. In industrial defect detection, cycle time and memory occupation must always be considered, so the field favors reducing model parameters as far as possible without significantly reducing detection accuracy; that is, models are required to be lightweight and real-time.
The network of a current mainstream deep-learning-based industrial defect detection model generally consists of a convolutional network layer (input layer), intermediate network layers, and a final softmax layer (output layer). Taking the deep residual network ResNet-50 as an example, the intermediate network layers generally comprise a pooling layer, several residual layers, and the like, and intermediate-layer feature information is generated during model training.
The output layer's values are probability values for each image category. The softmax output contains a large number of negative labels: in multi-class classification, the core softmax function maps the outputs, including those of the many negative labels, into the interval (0, 1) so that they can be interpreted as probabilities, and classification proceeds on that basis. The softmax function is expressed as:

$$p_i = \frac{\exp(z_i)}{\sum_{j=1}^{N} \exp(z_j)}$$

where $z_i$ is the final output of the network, representing the likelihood that the input image belongs to category $i$ (the larger the value, the greater the likelihood), $p_i$ is the probability after softmax that the input image belongs to category $i$, and $N$ is the number of categories.
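For illustration, a minimal sketch of this mapping (PyTorch; the logit values are invented for the example and are not from the patent):

```python
import torch
import torch.nn.functional as F

# Raw network outputs (logits) z_i for one image over N = 4 categories.
logits = torch.tensor([2.0, 0.5, -1.0, 0.1])

# softmax maps the logits into (0, 1) so they can be read as class probabilities.
probs = F.softmax(logits, dim=0)

print(probs)        # ~tensor([0.7030, 0.1569, 0.0350, 0.1051])
print(probs.sum())  # tensor(1.) - the N probabilities sum to one
```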
Deeper and wider deep learning models often require a large amount of parameter computation and place high demands on hardware computing resources; deep-learning-based defect detection models therefore achieve high accuracy but are limited in practical application. To solve this problem, the classification model's parameters must go from complex to simple and lightweight: the parameter count must be reduced so the model can adapt to devices with limited computing power. Moreover, as industrial defect detection application scenarios multiply, targeted compression of the model becomes particularly important.
Aiming at the complex parameters and large computation of existing industrial detection algorithms, the invention adopts a knowledge distillation method to lighten the model, reducing the algorithm model's parameter count, accelerating detection, and lowering memory occupation without losing detection accuracy.
Disclosure of Invention
The invention aims to provide a knowledge-distillation-based industrial defect detection model compression method which, through teacher-student knowledge distillation, transfers the capability of a pre-trained, structurally complex, high-precision industrial defect detection model onto a lightweight model with a simple structure, simplifying model parameters and improving detection efficiency while meeting the required high-precision detection rate.
In order to achieve the technical purpose, the invention adopts the following technical scheme:
an industrial defect detection model compression method based on knowledge distillation comprises the following steps:
constructing and training a teacher model;
distilling the teacher model into the student model by means of knowledge distillation, comprising:
intermediate network layer feature distillation: the student model learns the feature information of the teacher model's intermediate network layers by feature distillation, so that at intermediate layers of different depths the student model iteratively approximates the teacher model's feature information at the corresponding intermediate layer; the student model network is then trained using the teacher model's soft labels;
softmax layer target distillation: the student model learns the softmax-layer feature information of the teacher model by combining a soft target with a hard target; the soft target means guiding the student model's learning with the probability values output by the teacher model's softmax layer, and the hard target means guiding it with the teacher model's final label value.
Preferably, the intermediate network layer feature distillation specifically comprises:
selecting the intermediate network layers to be distilled, and connecting a convolution layer behind the student model's intermediate network layer so that the student model's output feature size is consistent with the teacher model's output feature size;
training the parameters of the student model's intermediate network layers in the feature distillation mode;
training the student model's intermediate network layers using the soft labels of the teacher model's intermediate network layers.
Further, in the intermediate network layer feature distillation process, the sizes of the student model's and the teacher model's intermediate-layer output features are kept consistent by computing the difference loss $L$ between them, calculated as:

$$L = \sum_{i} \lambda_i \left\| F_i^{t} - F_i^{s} \right\|^2$$

where $\lambda_i$ is the distillation coefficient of each stage, $i$ denotes a stage layer shared by the student and teacher models in the intermediate network, $F_i^{t}$ is the output feature information of the teacher model at stage $i$, and $F_i^{s}$ is the output feature information of the student model at stage $i$.
Preferably, the softmax layer target distillation specifically comprises:
setting the probability values output by the teacher model's softmax layer as the initial soft label, and generating a deformed soft label from the initial soft label according to a set distillation temperature $T$, the label value finally obtained by the teacher model being the hard label;
performing high-temperature distillation on the student model using the deformed soft label and the hard label simultaneously;
dynamically adjusting the contribution of the deformed soft label and the hard label to the whole high-temperature distillation process according to the feedback result of the high-temperature distillation.
Further, the softmax function of the deformed soft label, generated from the initial soft label according to the set distillation temperature $T$, is expressed as:

$$p_i^{T} = \frac{\exp(z_i / T)}{\sum_{j=1}^{N} \exp(z_j / T)}$$

where $z_i$ represents the final output of the model, i.e. the likelihood that the input image belongs to category $i$, $p_i^{T}$ represents the probability value obtained for category $i$ after the softmax layer, and $T$ is the distillation temperature.
Further, in the high-temperature distillation process, the objective function is a weighted sum of the distillation loss $L_{soft}$ and the student model loss $L_{hard}$:

$$L = \alpha L_{soft} + \beta L_{hard}$$

where $\alpha$ and $\beta$ respectively represent the weights of the distillation loss $L_{soft}$ and the student model loss $L_{hard}$ in the objective function. The distillation loss $L_{soft}$ is given by:

$$L_{soft} = -\sum_{i=1}^{N} p_i^{T} \log q_i^{T}$$

where $p_i^{T}$ is the value the teacher model outputs for category $i$ through the softmax layer at temperature $T$, and $q_i^{T}$ is the value the student model outputs for category $i$ through the softmax layer at temperature $T$, with:

$$p_i^{T} = \frac{\exp(y_i / T)}{\sum_{j=1}^{N} \exp(y_j / T)}, \qquad q_i^{T} = \frac{\exp(z_i / T)}{\sum_{j=1}^{N} \exp(z_j / T)}$$

where $y_i$ is the teacher model's softmax-layer output, $z_i$ is the student model's softmax-layer output, and $N$ is the number of categories.

The student model loss $L_{hard}$ is given by:

$$L_{hard} = -\sum_{i=1}^{N} c_i \log q_i^{1}$$

where $c_i$ is the true label value of category $i$ and $q_i^{1}$ is the value the student model outputs for category $i$ through the softmax layer at $T = 1$.
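For concreteness, a sketch of this combined objective in code (PyTorch; the function name, the weight and temperature values, and the batch shapes are our illustrative assumptions; the $T^2$ factor on $L_{soft}$, revisited in the detailed description, keeps the gradient contributions of the two losses comparable):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T=4.0, alpha=0.9, beta=0.1):
    """L = alpha * L_soft + beta * L_hard (T**2 rescales L_soft, whose
    gradients shrink as 1/T**2 when the logits are divided by T)."""
    # L_soft: cross-entropy between softened teacher and student distributions.
    p_T = F.softmax(teacher_logits.detach() / T, dim=1)   # p_i^T (teacher frozen)
    log_q_T = F.log_softmax(student_logits / T, dim=1)    # log q_i^T
    l_soft = -(p_T * log_q_T).sum(dim=1).mean()

    # L_hard: ordinary cross-entropy against the true labels at T = 1.
    l_hard = F.cross_entropy(student_logits, labels)

    return alpha * (T ** 2) * l_soft + beta * l_hard

# Illustrative call: a batch of 8 images over N = 5 defect categories.
s = torch.randn(8, 5, requires_grad=True)   # student logits
t = torch.randn(8, 5)                       # teacher logits
y = torch.randint(0, 5, (8,))               # true labels
distillation_loss(s, t, y).backward()
```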
Further, dynamically adjusting the contribution of the deformed soft label and the hard label to the whole distillation according to the feedback result of the high-temperature distillation specifically comprises: dynamically adjusting the weight values $\alpha$ and $\beta$ in the objective function according to the feedback result of the high-temperature distillation, so as to balance the contributions of the soft and hard labels to the whole high-temperature distillation process.
Further, the student model adopts a depthwise separable convolution DenseNet as its input layer.
By adopting the above scheme, the invention achieves the following beneficial effects:
aiming at a complex industrial defect detection Model (teacher Model), the invention learns the characteristic information of an intermediate network layer of the teacher Model in a characteristic distillation mode, learns the characteristic information of a softmax layer of the teacher Model in a target distillation mode, and obtains a lightweight teacher Model Lightweight Model (student Model), which can ensure that the running rate of the Model is greatly improved on the premise that the precision of an original Model is basically not lost, and can meet the requirement of high detection rate under the condition of limited industrial quality inspection computing resources.
The industrial defect detection model compression method of the invention can be applied to the industrial defect detection modules of equipment or systems such as a top-cover welding visual inspection system, an automatic loading and unloading machine (universal), a Mylar-wrapping machine CCD inspection device, a sealing-nail welding visual inspection system, an EPD burn-in and lighting AOI inspection device, a battery cell appearance inspection machine, a blade battery six-sided inspection system, a two-dimensional barcode reader (VCR), a bending machine, a PSA small-part attaching machine (single channel), and the like.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings needed in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention, and a person skilled in the art could derive other variants from them without inventive effort.
FIG. 1 is a flow chart of the steps of a method for compressing an industrial defect detection model based on knowledge distillation in accordance with the present invention;
FIG. 2 is a block diagram of the overall scheme design of the knowledge-distillation-based industrial defect detection model compression method of the present invention;
FIG. 3 is a schematic diagram of a characteristic distillation structure;
FIG. 4 is a schematic diagram of a target distillation structure;
FIG. 5 is a parameter analysis diagram of an ordinary convolution layer;
FIG. 6 is a parameter analysis diagram of a DenseNet layer.
Detailed Description
The technical solutions of the embodiments of the present invention will be clearly described below with reference to the drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment of the invention provides an industrial defect detection model compression method based on knowledge distillation. Model compression methods such as parameter pruning, precision conversion, and neural architecture search can damage the structure of the original model, and with it the knowledge the original model has learned from its data. Knowledge distillation instead protects the knowledge learned by the original model and transfers it into a compressed model, so that the compressed model, though much smaller in volume than the original, can learn the same knowledge. In knowledge distillation, the model to be compressed is called the teacher neural network, or the "teacher model" below, and the compressed model is called the student neural network, or the "student model" below.
As shown in fig. 1-2, the above knowledge-distillation-based industrial defect detection model compression method specifically comprises the following two stages:
S1, an original model training stage: constructing a data set and using it to train a teacher model;
S2, distilling the teacher model into the student model by knowledge distillation, which specifically comprises the following steps:
S21, intermediate network layer feature distillation: the student model learns the feature information of the teacher model's intermediate network layers by feature distillation, so that at intermediate layers of different depths the student model iteratively approximates the teacher model's feature information at the corresponding layer; the student model network is then trained using the teacher model's soft labels, a soft label being a probability value output by the teacher model's softmax layer. The student model thus fits not only the teacher model's soft labels but also the outputs of several hidden layers in the intermediate network (the feature information extracted by the teacher model). Referring to fig. 3, the intermediate network layer feature distillation process specifically comprises:
S211, selecting the intermediate network layers to be distilled. The output feature sizes of the teacher's and student's intermediate layers may differ, so a convolution layer must be connected behind the student model's intermediate layer to make its output feature size consistent with the teacher's. Specifically, the difference loss $L$ between the student's and teacher's intermediate-layer output features is computed so that the two output feature sizes are kept consistent; the output feature difference loss $L$ is calculated as follows (a brief code sketch follows the formula):

$$L = \sum_{i} \lambda_i \left\| F_i^{t} - F_i^{s} \right\|^2$$

where $\lambda_i$ is the distillation coefficient of each stage, $i$ denotes a stage layer shared by the student and teacher models in the intermediate network, $F_i^{t}$ is the output feature information of the teacher model at stage $i$, and $F_i^{s}$ is the output feature information of the student model at stage $i$.
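As a sketch of one stage of this loss (PyTorch; the channel counts, the 1×1 adapter convolution, and the coefficient value are illustrative assumptions, not prescribed by the patent):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Suppose stage i of the teacher outputs 256 channels but the student only 64.
# A convolution behind the student's intermediate layer aligns the feature
# sizes, as required above, before the difference loss is computed.
adapter = nn.Conv2d(64, 256, kernel_size=1)   # illustrative adapter choice

teacher_feat = torch.randn(8, 256, 28, 28)    # F_i^t (teacher is frozen)
student_feat = torch.randn(8, 64, 28, 28)     # F_i^s before alignment

lambda_i = 0.5                                # stage distillation coefficient
loss_i = lambda_i * F.mse_loss(adapter(student_feat), teacher_feat.detach())

# The total intermediate-layer loss L sums loss_i over all selected stages i.
```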
S212, training parameters of an intermediate network layer of the student model according to a characteristic distillation mode, wherein the student model learns output characteristic information of the teacher model;
s213, training the middle network layer of the student model by using the soft label of the middle network layer of the teacher model.
S22, softmax layer target distillation: the student model learns the softmax-layer feature information of the teacher model by combining a soft target with a hard target. In a classification model, the network's final output is typically a softmax layer whose values are probabilities for each class. When the probability values output by the teacher model's softmax layer guide the student model's learning, the approach is called a soft target; when the teacher model's final label value guides it, the approach is called a hard target. The output of the network's last softmax layer contains a large amount of negative-label reasoning information, which helps the model extract image feature information more fully and comprehensively, while the hard target aids learning of the current sample's feature information; this embodiment therefore exploits the advantages of both modes, teaching the student model according to its aptitude. Referring to fig. 4, the softmax layer target distillation specifically comprises:
s221, setting a probability value output by a soft max layer of a teacher model as an initial soft label (soft-target), and generating a deformed soft label according to a set distillation temperature T based on the initial soft label, wherein a label value finally obtained by the teacher model is a Hard label (Hard-target); the softmax function of the deformed soft label is expressed as:
$$p_i^{T} = \frac{\exp(z_i / T)}{\sum_{j=1}^{N} \exp(z_j / T)}$$

where $z_i$ represents the final output of the model, i.e. the likelihood that the input image belongs to category $i$ (the larger the value, the greater the likelihood), $p_i^{T}$ represents the probability value obtained for category $i$ after the softmax layer, and $T$ is the distillation temperature. When $T = 1$, the formula reduces to the softmax function of the initial soft label; the higher $T$ is, the smoother the output probability distribution and the larger its entropy, so the feature information carried by the negative labels is correspondingly amplified and model learning pays more attention to them, as the sketch below illustrates;
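A small sketch makes the smoothing effect of $T$ concrete (PyTorch; the logit and temperature values are illustrative):

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([6.0, 2.0, -2.0])   # one positive and two negative labels

for T in (1.0, 4.0, 10.0):
    # Dividing the logits by T before softmax flattens the distribution,
    # amplifying the relative weight of the negative labels.
    print(T, F.softmax(logits / T, dim=0))

# T=1.0  -> ~[0.9817, 0.0180, 0.0003]  (negative labels nearly invisible)
# T=4.0  -> ~[0.6652, 0.2447, 0.0900]
# T=10.0 -> ~[0.4718, 0.3162, 0.2120]  (smoother, higher-entropy distribution)
```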
s222, performing high-temperature distillation on the student model by using the deformed soft tag and the hard tag simultaneously; in the high temperature distillation process, the objective function is determined by distillation loss valueL soft (distill loss) and student model loss valuesL hard (student) weighting, the objective function is expressed as:
$$L = \alpha L_{soft} + \beta L_{hard}$$

where $\alpha$ and $\beta$ respectively represent the weights of the distillation loss $L_{soft}$ and the student model loss $L_{hard}$ in the objective function; the distillation loss $L_{soft}$ is given by:

$$L_{soft} = -\sum_{i=1}^{N} p_i^{T} \log q_i^{T}$$

where $p_i^{T}$ is the value the teacher model outputs for class $i$ through the softmax layer at temperature $T$, and $q_i^{T}$ is the value the student model outputs for class $i$ through the softmax layer at temperature $T$, with:

$$p_i^{T} = \frac{\exp(y_i / T)}{\sum_{j=1}^{N} \exp(y_j / T)}, \qquad q_i^{T} = \frac{\exp(z_i / T)}{\sum_{j=1}^{N} \exp(z_j / T)}$$

where $y_i$ is the teacher model's softmax-layer output, $z_i$ is the student model's softmax-layer output, and $N$ is the number of categories.

The student model loss $L_{hard}$ is given by:

$$L_{hard} = -\sum_{i=1}^{N} c_i \log q_i^{1}$$

where $c_i$ is the true label value of class $i$ ($c_i$ is 1 for the positive label and 0 for a negative label), and $q_i^{1}$ is the value the student model outputs for class $i$ through the softmax layer when $T$ is 1.
S223, dynamically adjusting the contribution of the deformed soft label and the hard label to the whole high-temperature distillation process according to its feedback result; specifically, the weight values $\alpha$ and $\beta$ in the objective function are dynamically adjusted to balance the contributions of the soft and hard labels. The cross-entropy between the student model's softmax-layer output at distillation temperature $T = 1$ and the true labels is the student model loss $L_{hard}$; introducing $L_{hard}$ can be understood as using the true label values to effectively correct the student model's learning and prevent it from being led astray by occasional teacher model errors. Experiments show that a smaller weight on $L_{hard}$ produces better results; and since the contribution of $L_{soft}$ is $1/T^2$ that of $L_{hard}$, the weight of $L_{soft}$ must be multiplied by a $T^2$ coefficient if the contributions of the two losses are to be kept consistent.
In this embodiment, the student model preferably adopts a depthwise separable convolution DenseNet as its input layer, replacing the ordinary convolution layer used in the teacher model. Under the same input and output conditions, the depthwise separable convolution DenseNet (the "DenseNet layer" below) has a smaller parameter count and less computation; the DenseNet layer works in single-channel form, convolving each channel of the image data separately, so the number of feature information channels obtained after the DenseNet layer stays consistent with the number of input channels.
In the ordinary convolution layer shown in fig. 5, the convolution uses $n$ kernels of size $k \times k \times m$, so its parameter count is $k \times k \times m \times n$; this convolution produces a $w \times h \times n$ feature map, so its computation is $k \times k \times m \times n \times w \times h$. The DenseNet layer consists of a depthwise convolution layer and a pointwise convolution layer, as shown in fig. 6: the depthwise convolution applies a $k \times k$ kernel per channel over the $m$ input channels, and the pointwise convolution uses $n$ kernels of size $1 \times 1 \times m$, so its parameter count is $k \times k \times m + m \times n$; passing through the DenseNet layer likewise produces a $w \times h \times n$ feature map, so the DenseNet layer's computation is $k \times k \times m \times w \times h + m \times n \times w \times h$.
Comparing the parameter counts and computation, the DenseNet layer requires $1/n + 1/k^2$ of the ordinary convolution layer, so using DenseNet layers instead of ordinary convolution layers gives the model a lighter structure.
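These counts are easy to verify numerically; a sketch (PyTorch; the values of k, m, and n are arbitrary examples):

```python
import torch.nn as nn

k, m, n = 3, 64, 128   # kernel size, input channels, output channels

standard  = nn.Conv2d(m, n, kernel_size=k, padding=1, bias=False)
depthwise = nn.Conv2d(m, m, kernel_size=k, padding=1, groups=m, bias=False)
pointwise = nn.Conv2d(m, n, kernel_size=1, bias=False)

p_std = sum(p.numel() for p in standard.parameters())    # k*k*m*n   = 73728
p_sep = sum(p.numel() for p in depthwise.parameters()) \
      + sum(p.numel() for p in pointwise.parameters())   # k*k*m+m*n = 8768

print(p_sep / p_std)   # 0.1189... == 1/n + 1/k**2 = 1/128 + 1/9
```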
The following is an application example of the industrial defect detection model compression method based on knowledge distillation in this embodiment:
according to the invention, knowledge distillation is applied to compression of an industrial laser welding defect detection model, test pictures are 1000 welded defect sample pictures, the pictures are randomly divided into 10 groups, 100 pictures are respectively input into the model for testing, the results of all groups are taken to obtain an average value, and the test results are shown in the following table 1:
table 1: comparison of test results
The knowledge-distillation-based industrial defect detection model compression method described above can greatly improve the model's running speed with essentially no loss of the original model's accuracy, and can meet the requirement of a high detection rate under the limited computing resources of industrial quality inspection.
The knowledge-distillation-based industrial defect detection model compression method of the invention can be applied to defect detection for various industrial products, such as electronic products, mechanical equipment, precision instruments, and parts, and to the industrial product defect detection modules of equipment or systems such as a top-cover welding visual inspection system, an automatic loading and unloading machine (universal), a Mylar-wrapping machine CCD inspection device, a sealing-nail welding visual inspection system, an EPD burn-in and lighting AOI inspection device, a battery cell appearance inspection machine, a blade battery six-sided inspection system, a two-dimensional barcode reader (VCR), a bending machine, a PSA small-part attaching machine (single channel), and the like.
In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "examples," "particular examples," or "an alternative embodiment," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above-described embodiments do not limit the scope of the present invention. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the above embodiments should be included in the scope of the present invention.

Claims (2)

1. The industrial defect detection model compression method based on knowledge distillation is characterized by comprising the following steps of:
constructing and training a teacher model;
distilling the teacher model into the student model by means of knowledge distillation, comprising:
intermediate network layer feature distillation: the student model learns the feature information of the teacher model's intermediate network layers by feature distillation, so that at intermediate layers of different depths the student model iteratively approximates the teacher model's feature information at the corresponding intermediate layer, after which the student model network is trained using the teacher model's soft labels; this specifically comprises the following steps:
selecting an intermediate network layer to be distilled, and connecting a convolution layer behind the intermediate network layer of the student model to ensure that the output characteristic information size of the student model is consistent with the output characteristic information size of the teacher model;
training parameters of an intermediate network layer of the student model according to the characteristic distillation mode;
training an intermediate network layer of the student model using the soft labels of the teacher model intermediate network layer;
softmax layer target distillation: the student model learns the softmax-layer feature information of the teacher model by combining a soft target with a hard target; the soft target means using the probability values output by the teacher model's softmax layer to guide the student model in learning the teacher model's knowledge, and the hard target means using the teacher model's final label value to guide that learning; this specifically comprises:
setting a probability value output by a teacher model softmax layer as an initial soft label, and generating a deformed soft label according to a set distillation temperature T based on the initial soft label, wherein a label value finally obtained by the teacher model is a hard label;
performing high-temperature distillation on the student model using the deformed soft label and the hard label simultaneously;
dynamically adjusting the contribution of the deformed soft label and the hard label to the whole high-temperature distillation process according to the feedback result of the high-temperature distillation;
the softmax function of the deformed soft label, generated from the initial soft label according to the set distillation temperature $T$, is expressed as:

$$p_i^{T} = \frac{\exp(z_i / T)}{\sum_{j=1}^{N} \exp(z_j / T)}$$

where $z_i$ represents the final output of the model, i.e. the likelihood that the input image belongs to category $i$, $p_i^{T}$ represents the probability value obtained for category $i$ after the softmax layer, and $T$ is the distillation temperature;
in the high-temperature distillation process, the objective function is a weighted sum of the distillation loss $L_{soft}$ and the student model loss $L_{hard}$:

$$L = \alpha L_{soft} + \beta L_{hard}$$

where $\alpha$ and $\beta$ respectively represent the weights of the distillation loss $L_{soft}$ and the student model loss $L_{hard}$ in the objective function; the distillation loss $L_{soft}$ is given by:

$$L_{soft} = -\sum_{i=1}^{N} p_i^{T} \log q_i^{T}$$

where $p_i^{T}$ is the value the teacher model outputs for class $i$ through the softmax layer at temperature $T$, and $q_i^{T}$ is the value the student model outputs for class $i$ through the softmax layer at temperature $T$, with:

$$p_i^{T} = \frac{\exp(y_i / T)}{\sum_{j=1}^{N} \exp(y_j / T)}, \qquad q_i^{T} = \frac{\exp(z_i / T)}{\sum_{j=1}^{N} \exp(z_j / T)}$$

where $y_i$ is the teacher model's softmax-layer output, $z_i$ is the student model's softmax-layer output, and $N$ is the number of categories;

the student model loss $L_{hard}$ is given by:

$$L_{hard} = -\sum_{i=1}^{N} c_i \log q_i^{1}$$

where $c_i$ is the true label value of category $i$ and $q_i^{1}$ is the value the student model's softmax layer outputs for category $i$ when $T$ is 1;
dynamically adjusting the contribution of the deformed soft label and the hard label to the whole distillation according to the feedback result of the high-temperature distillation specifically comprises: dynamically adjusting the weight values $\alpha$ and $\beta$ in the objective function according to the feedback result of the high-temperature distillation, so as to balance the contributions of the soft and hard labels to the whole high-temperature distillation process;
the student model adopts a depthwise separable convolution DenseNet as its input layer.
2. The knowledge-distillation-based industrial defect detection model compression method of claim 1, wherein: in the intermediate network layer feature distillation process, the difference loss $L$ between the intermediate-layer output feature information of the student model and that of the teacher model is computed so that the two output feature sizes are kept consistent, the output feature difference loss $L$ being calculated as:

$$L = \sum_{i} \lambda_i \left\| F_i^{t} - F_i^{s} \right\|^2$$

where $\lambda_i$ is the distillation coefficient of each stage, $i$ denotes a stage layer shared by the student and teacher models in the intermediate network, $F_i^{t}$ is the output feature information of the teacher model at stage $i$, and $F_i^{s}$ is the output feature information of the student model at stage $i$.
CN202310412539.7A 2023-04-18 2023-04-18 Industrial defect detection model compression method based on knowledge distillation Active CN116152240B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310412539.7A CN116152240B (en) 2023-04-18 2023-04-18 Industrial defect detection model compression method based on knowledge distillation


Publications (2)

Publication Number Publication Date
CN116152240A CN116152240A (en) 2023-05-23
CN116152240B true CN116152240B (en) 2023-07-25

Family

ID=86360365

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310412539.7A Active CN116152240B (en) 2023-04-18 2023-04-18 Industrial defect detection model compression method based on knowledge distillation

Country Status (1)

Country Link
CN (1) CN116152240B (en)

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2584727B (en) * 2019-06-14 2024-02-28 Vision Semantics Ltd Optimised machine learning
JP7283835B2 * 2020-12-17 2023-05-30 Zhijiang Lab Automatic Compression Method and Platform for Pre-trained Language Models Based on Multilevel Knowledge Distillation
CN113361589A * 2021-06-01 2021-09-07 Yang Jingjing Rare or endangered plant leaf identification method based on transfer learning and knowledge distillation
CN113554716A * 2021-07-28 2021-10-26 Guangdong University of Technology Knowledge distillation-based tile color difference detection method and device
CN113887610B * 2021-09-29 2024-02-02 Inner Mongolia University of Technology Pollen image classification method based on cross-attention distillation Transformer
CN115393671A * 2022-08-25 2022-11-25 Hohai University Rock class prediction method based on multi-teacher knowledge distillation and normalized attention
CN115631393A * 2022-09-28 2023-01-20 Southwest University of Science and Technology Image processing method based on characteristic pyramid and knowledge guided knowledge distillation
CN115965964B * 2023-01-29 2024-01-23 China Agricultural University Egg freshness identification method, system and equipment

Also Published As

Publication number Publication date
CN116152240A (en) 2023-05-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant