CN117152154A - Class-incremental flexible circuit board defect detection method based on dual-teacher architecture - Google Patents


Info

Publication number
CN117152154A
CN117152154A (application CN202311425206.4A)
Authority
CN
China
Prior art keywords
class
new
old
model
teacher
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311425206.4A
Other languages
Chinese (zh)
Other versions
CN117152154B (en)
Inventor
廖晓鹃
熊文杰
陈光柱
陈润吉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu University of Technology
Original Assignee
Chengdu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu University of Technology
Priority to CN202311425206.4A priority Critical patent/CN117152154B/en
Publication of CN117152154A publication Critical patent/CN117152154A/en
Application granted granted Critical
Publication of CN117152154B publication Critical patent/CN117152154B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T7/0004: Industrial image inspection
    • G06N3/042: Knowledge-based neural networks; logical representations of neural networks
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods
    • G06V10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/454: Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V10/764: Recognition using classification, e.g. of video objects
    • G06V10/806: Fusion of extracted features
    • G06V10/82: Recognition using neural networks
    • G06T2207/20081: Training; learning
    • G06T2207/20084: Artificial neural networks [ANN]
    • G06T2207/30108: Industrial image inspection
    • G06T2207/30148: Semiconductor; IC; wafer
    • Y02P90/30: Computing systems specially adapted for manufacturing


Abstract

The invention belongs to the technical field of image processing and discloses a class-incremental flexible circuit board defect detection method based on a dual-teacher architecture, aimed at the problem of class-incremental defect detection in real flexible circuit board production scenarios. The method adds a new/old-class adaptive module and a decoupled feature distillation module to a dual-teacher architecture. The new/old-class adaptive module adjusts the weights of new-class and old-class knowledge during training, addressing the bias of class-incremental learning toward new or old classes. The decoupled feature distillation module uses the relative position of new-class defects in the image to locate them in the feature map and then removes those features by decoupling, achieving more accurate feature distillation. The method effectively balances the importance of new and old classes, improves the distillation effect, and thereby enables more accurate defect detection on flexible circuit boards.

Description

Class-incremental flexible circuit board defect detection method based on dual-teacher architecture
Technical Field
The invention belongs to the technical field of image processing and particularly relates to a class-incremental flexible circuit board defect detection method based on a dual-teacher architecture.
Background
A flexible printed circuit (FPC) is a light, thin and bendable circuit board widely used in fields such as smartphones, notebook computers and aerospace. However, because FPC materials are fragile and the process flow is complex, defects of varying severity easily arise during production and can cause electronic products to be scrapped. Defect detection is therefore an important part of FPC quality control. In actual FPC production, flexible manufacturing has gradually become the mainstream mode. Its core is to adapt to changes in the production environment through flexible equipment and process flows; new defect classes continually appear as products and processes change, and production-cycle requirements demand real-time detection, so the system must learn new classes with little time and space overhead. Class-incremental learning offers a solution for flexible circuit board defect detection: its goal is to let a model continually adapt to new-class data while retaining the ability to recognize previous tasks. Through class-incremental learning, an FPC defect detection model can keep learning from new defect data without access to old-class defect data, greatly reducing the time and space cost of learning.
However, most current class-incremental methods face two challenges in flexible circuit board defect detection:
(1) In flexible FPC production, target defects are variable: the classes and number of defects appearing each time are uncertain. When the incremental classes vary, the fixed new/old-class weights used in current class-incremental learning models may fail to balance new-class and old-class defects well.
(2) Conventional feature knowledge distillation ignores the influence of incorrect feature knowledge: because the old-class teacher model has never learned the new classes, it may associate new-class defects with old-class defects or with the background, so the new-class defect features it distills are inaccurate and mislead the student model's learning.
To address these problems, a class-incremental flexible circuit board defect detection method based on a dual-teacher architecture is provided.
Disclosure of Invention
In view of the above problems, the present invention aims to provide a class-incremental flexible circuit board defect detection method based on a dual-teacher architecture.
A class-incremental flexible circuit board defect detection method based on a dual-teacher architecture comprises the following steps:
S1, establishing an adaptive dual-teacher architecture: the adaptive dual-teacher architecture comprises a dual-teacher architecture and a new/old-class adaptive module;
the dual-teacher architecture builds a student model, an old-class teacher model and a new-class teacher model from the flexible circuit board defect-class incremental data set; this data set is divided into an old-class data set and a new-class data set that are mutually disjoint; the old-class teacher model is trained only on the old-class data set, the new-class teacher model only on the new-class data set, and the student model only on the new-class data set, while the distilled knowledge of the old-class teacher model for recognizing old classes and of the new-class teacher model for recognizing new classes is obtained by knowledge distillation;
the new/old-class adaptive module computes the old-class and new-class knowledge weights from the features extracted by the student model, the old-class teacher model and the new-class teacher model; the old-class knowledge is the distilled knowledge of the old-class teacher model, and the new-class knowledge comprises the distilled knowledge of the new-class teacher model and the knowledge the student model learns from the new-class data set;
S2, decoupled feature distillation: the position information of new-class defects is obtained from the data set and, during feature distillation, the new-class features are removed from the features extracted by the old-class teacher model and the student model by decoupling;
S3, incremental training: the distilled knowledge of the new-class teacher model, the distilled knowledge of the old-class teacher model and the new-class data set are input into the student model, which is trained end to end in combination with the new/old-class knowledge weights to obtain a trained student model;
S4, defect recognition: a defect image is input into the trained student model, which generates the final defect prediction result.
The student model, the old-class teacher model and the new-class teacher model are all Faster R-CNN object detection models. The Faster R-CNN network consists of a feature extractor FE, a region-of-interest extractor RPN and a classifier RCN: FE extracts features from the defect image, the RPN generates candidate regions from the features, and the features mapped from the candidate regions are fed into the RCN for defect classification to produce the final defect prediction result.
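The three-stage data flow just described (FE, then RPN, then RCN) can be illustrated with a rough NumPy sketch. Everything below is a simplified placeholder, not the actual Faster R-CNN implementation: random features stand in for a ResNet backbone, proposals are fixed, and RoI pooling is a plain mean; the stride and channel counts are assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the three Faster R-CNN stages named in the text:
# FE (feature extractor), RPN (region-of-interest extractor), RCN (classifier).
STRIDE, C, NUM_CLASSES = 16, 256, 11  # assumed backbone stride / channels

def feature_extractor(image):
    """FE: map an H*W*3 image to an (H/STRIDE)*(W/STRIDE)*C feature map."""
    h, w = image.shape[0] // STRIDE, image.shape[1] // STRIDE
    return rng.standard_normal((h, w, C))  # placeholder for backbone features

def rpn(features):
    """RPN: emit candidate boxes (x1, y1, x2, y2) in feature-map coordinates."""
    return [(2, 3, 10, 12), (20, 5, 30, 18)]  # placeholder proposals

def rcn(features, boxes):
    """RCN: pool the features inside each candidate box and classify them."""
    w_cls = rng.standard_normal((C, NUM_CLASSES))  # placeholder classifier head
    preds = []
    for x1, y1, x2, y2 in boxes:
        roi = features[y1:y2, x1:x2].mean(axis=(0, 1))  # crude RoI pooling
        preds.append(int(np.argmax(roi @ w_cls)))
    return preds

image = rng.standard_normal((768, 768, 3))
feats = feature_extractor(image)
print(feats.shape)             # (48, 48, 256)
print(rcn(feats, rpn(feats)))  # one class index per proposal
```

In the method described here, the student and both teacher models all share this detector structure; only their training data and distillation roles differ.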
The new/old-class adaptive module performs, in order, a feature fusion operation, a global average pooling operation, a channel weight calculation operation, a feature re-weighting operation and a knowledge weight calculation operation:
Feature fusion operation: the features extracted by the student model are concatenated along the channel dimension with the features extracted by the old-class teacher model and the new-class teacher model, respectively:

Z_O = concat(F_S, F_OT, dim=C), Z_N = concat(F_S, F_NT, dim=C)

where F_OT denotes the features extracted by the old-class teacher model, F_S the features extracted by the student model and F_NT the features extracted by the new-class teacher model, with F_OT, F_S, F_NT ∈ R^(H×W×C), H, W, C being the height, width and number of channels of the features; concat(·) denotes concatenation of features and dim=C concatenation along the channel dimension; Z_O, Z_N ∈ R^(H×W×2C) are the features obtained by fusing F_S with F_OT and F_NT, respectively.
Global average pooling operation: two parallel global average pooling layers apply a global average pooling operation to the fused features Z_O, Z_N:

Y_O = Pool_avg(Z_O), Y_N = Pool_avg(Z_N)

where Y_O, Y_N ∈ R^(1×1×2C) are the features output by the global average pooling operation and Pool_avg(·) denotes global average pooling.
Channel weight calculation operation: a fully connected layer compresses the number of channels of the pooled features Y_O, Y_N to C/8, and a further fully connected operation restores it to 2C, yielding the weight of each channel:

Q_O = Fc_2C(Fc_C/8(Y_O)), Q_N = Fc_2C(Fc_C/8(Y_N))

where Fc_C/8(·) and Fc_2C(·) denote fully connected operations with output sizes C/8 and 2C, respectively, and Q_O, Q_N ∈ R^(1×1×2C) are the per-channel weights of old-class and new-class knowledge.
Feature re-weighting operation: the obtained channel weights Q_O, Q_N re-weight each channel of the features Z_O, Z_N, yielding the re-weighted features:

Z'_O = Q_O ⊙ Z_O, Z'_N = Q_N ⊙ Z_N

where ⊙ denotes element-wise multiplication and Z'_O, Z'_N ∈ R^(H×W×2C) are the re-weighted old-class and new-class features.
Knowledge weight calculation operation: fully connected layers compute the old-class and new-class knowledge weights from the re-weighted features Z'_O, Z'_N, followed by a softmax operation:

(λ1, λ2) = Softmax(Fc(Z'_O), Fc(Z'_N))

where Softmax(·,·) denotes normalization by the softmax operation and λ1, λ2 are the normalized old-class and new-class knowledge weights.
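A minimal NumPy sketch of the five operations above follows. The fully connected layers use random placeholder weights, and the spatial pooling applied before the final fully connected layer is an assumption of this sketch rather than something stated in the text:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def adaptive_weights(f_s, f_ot, f_nt, rng):
    """Sketch of the new/old-class adaptive module: fuse, pool, compute
    channel weights, re-weight, then output normalized knowledge weights."""
    C = f_s.shape[-1]
    # 1) feature fusion: concatenate along the channel dimension
    z_o = np.concatenate([f_s, f_ot], axis=-1)  # (H, W, 2C)
    z_n = np.concatenate([f_s, f_nt], axis=-1)
    # 2) global average pooling over the spatial dimensions
    y_o, y_n = z_o.mean(axis=(0, 1)), z_n.mean(axis=(0, 1))  # (2C,)
    # 3) channel weights: FC 2C -> C/8 -> 2C (random placeholder weights)
    w1 = rng.standard_normal((2 * C, C // 8))
    w2 = rng.standard_normal((C // 8, 2 * C))
    q_o, q_n = (y_o @ w1) @ w2, (y_n @ w1) @ w2
    # 4) re-weight every channel of the fused features
    z_o2, z_n2 = z_o * q_o, z_n * q_n
    # 5) knowledge weights: FC to one scalar per branch, then softmax
    w3 = rng.standard_normal(2 * C)
    scores = np.array([z_o2.mean(axis=(0, 1)) @ w3,
                       z_n2.mean(axis=(0, 1)) @ w3])
    lam1, lam2 = softmax(scores)
    return lam1, lam2

rng = np.random.default_rng(1)
f_s, f_ot, f_nt = (rng.standard_normal((8, 8, 64)) for _ in range(3))
lam1, lam2 = adaptive_weights(f_s, f_ot, f_nt, rng)
print(lam1 + lam2)  # normalized: the two weights always sum to 1
```

The softmax at the end guarantees λ1 + λ2 = 1, so the two weights act as a learned trade-off between old-class and new-class knowledge rather than independent scale factors.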
The decoupled feature distillation comprises obtaining the position information of new-class defects and removing, during feature distillation, the new-class features from the features extracted by the old-class teacher model and the student model:
First, the position information of the new-class defect is obtained from the data set, and the defect image P is decomposed into a new-class defect region and a non-new-class defect region:

p(m, n) ∈ NC if x1 ≤ n ≤ x2 and y1 ≤ m ≤ y2; otherwise p(m, n) ∈ NC^c, for 1 ≤ m ≤ H, 1 ≤ n ≤ W

where p(m, n) is the value of the pixel in row m and column n, (m, n) are the pixel coordinates in the defect image, H and W are the height and width of the defect image, NC is the region of the new-class defect in the defect image, NC^c is the region of non-new-class defects, and (x1, y1), (x2, y2) are the horizontal and vertical coordinates of the upper-left and lower-right corners of the new-class defect region.
Then, for the features F_OT, F_S extracted from the training image by the old-class teacher model and the student model, the feature region belonging to the new-class defect is located from the relative position of the new-class defect in the defect image; the features are likewise decomposed into a new-class defect feature region NCF and a non-new-class defect feature region NCF^c:

f(i, j) ∈ NCF if (x1·w)/W ≤ j ≤ (x2·w)/W and (y1·h)/H ≤ i ≤ (y2·h)/H; otherwise f(i, j) ∈ NCF^c

where f_OT(i, j) and f_S(i, j) denote the elements in row i and column j of F_OT and F_S, (i, j) are the feature coordinates, h and w are the height and width of the features, NCF is the feature region corresponding to the new-class defect in the defect image, and NCF^c is the feature region corresponding to non-new-class defects.
Finally, the features are decoupled by removing the features belonging to the new-class defect, yielding the decoupled features:

F'_OT = { f_OT(i, j) | (i, j) ∈ NCF^c }, F'_S = { f_S(i, j) | (i, j) ∈ NCF^c }

where F'_OT, F'_S denote the decoupled features.
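The decoupling step can be sketched as a mask over the feature map. The box-to-feature-map coordinate mapping and the L2 form of the loss below are assumptions consistent with the surrounding text, and the function is illustrative rather than the patented implementation:

```python
import numpy as np

def decoupled_feature_loss(f_ot, f_s, box, image_hw):
    """Sketch of decoupled feature distillation: zero out the feature region
    corresponding to a new-class defect box (given in image coordinates),
    then take an L2 loss over the remaining (non-new-class) features."""
    h, w, _ = f_ot.shape
    H, W = image_hw
    x1, y1, x2, y2 = box
    # map the box from image coordinates to feature-map coordinates
    # using the relative position of the defect in the image
    fx1, fx2 = int(x1 * w / W), int(np.ceil(x2 * w / W))
    fy1, fy2 = int(y1 * h / H), int(np.ceil(y2 * h / H))
    mask = np.ones((h, w, 1))
    mask[fy1:fy2, fx1:fx2] = 0.0  # remove the new-class feature region (NCF)
    diff = (f_ot - f_s) * mask
    return np.sum(diff ** 2) / max(mask.sum(), 1)

rng = np.random.default_rng(2)
f_ot = rng.standard_normal((48, 48, 16))
f_s = f_ot.copy()
f_s[:10, :10] += 5.0  # pretend teacher and student disagree only
                      # inside the new-class defect region
loss = decoupled_feature_loss(f_ot, f_s, box=(0, 0, 160, 160), image_hw=(768, 768))
print(loss)  # 0.0: the disagreeing region was decoupled away
```

Because the masked region is exactly where the old-class teacher's knowledge of the new class is unreliable, the distillation signal that remains comes only from regions where the teacher is trustworthy.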
The incremental training further comprises:
Calculating the decoupled feature distillation loss of the old-class teacher model:

L_Feat^OT = || F'_OT − F'_S ||_2^2

where L_Feat^OT denotes the decoupled feature distillation loss of the old-class teacher model.
The decoupled feature distillation loss of the old-class teacher model is then combined with the RPN-layer and RCN-layer distillation losses to form the old-class knowledge distillation loss; the feature distillation loss of the new-class teacher model is combined with its RPN-layer and RCN-layer distillation losses to obtain the new-class knowledge distillation loss. Finally, the old-class knowledge weight λ1, the new-class knowledge weight λ2, the old-class and new-class knowledge distillation losses, the loss of training the student model on the new-class data set and the regularization loss are combined into the overall model loss function:

Loss = λ1·L_old + λ2·L_new + L_RCN + L_reg(λ1, λ2)
L_old = L_Feat^OT + L_RPN^OT + L_RCN^OT
L_new = L_Feat^NT + L_RPN^NT + L_RCN^NT

where L_old denotes the old-class knowledge distillation loss of the old-class teacher model and L_new the new-class knowledge distillation loss of the new-class teacher model; L_RPN^OT, L_RCN^OT, L_Feat^NT, L_RPN^NT, L_RCN^NT denote, respectively, the RPN distillation loss of the old-class teacher model (L2 loss), the RCN distillation loss of the old-class teacher model (cross-entropy loss), the feature distillation loss of the new-class teacher model (L2 loss), the RPN distillation loss of the new-class teacher model (L2 loss) and the RCN distillation loss of the new-class teacher model (cross-entropy loss); Loss is the overall model loss, L_RCN is the loss of training the student model on the new-class data set, and L_reg(·,·) is the regularization loss.
Finally, the student model is trained end to end with the overall model loss function to obtain the trained student model.
Compared with the prior art, the invention has the following beneficial effects:
1. the weights of new-class and old-class defects are captured more effectively during training, improving the generalization ability of the model;
2. the student model is effectively prevented from learning inaccurate new-class defect feature knowledge from the old-class teacher model, improving the feature distillation effect and thus model performance;
3. the method offers high real-time performance and a small model scale, and can be applied in flexible circuit board defect detection scenarios with strict real-time requirements.
Drawings
Fig. 1 shows defect images from the flexible circuit board defect-class incremental data set.
Fig. 2 shows defect target label images from the flexible circuit board defect-class incremental data set.
Fig. 3 is a flow chart of class-incremental defect detection.
Fig. 4 is a schematic diagram of the Faster R-CNN object detection model.
Fig. 5 is a framework diagram of the dual-teacher architecture.
Fig. 6 is a framework diagram of the new/old-class adaptive module.
Fig. 7 is a framework diagram of decoupled feature distillation.
Fig. 8 shows the detection results of the class-incremental flexible circuit board defect detection method based on the dual-teacher architecture.
Detailed Description
The technical scheme of the invention is described in detail below with reference to the accompanying drawings.
This embodiment provides a class-incremental flexible circuit board defect detection method based on a dual-teacher architecture, comprising the following steps:
S1, establishing an adaptive dual-teacher architecture: this comprises establishing a dual-teacher architecture and establishing a new/old-class adaptive module.
Establishing the dual-teacher architecture: a student model, an old-class teacher model and a new-class teacher model are built from the flexible circuit board defect incremental data set.
Since no flexible circuit board defect-class incremental data set previously existed, one (FPCSD2023) had to be established. The defect classes of the FPCSD2023 data set comprise 11 types: crush, crease, contamination, scratch, CVL foreign matter, oxidation under film, ink foreign matter, ink drop, ink contamination, rough gold surface and skip plating.
Then, additional flexible circuit board defect samples were collected to greatly enlarge the data set. Because the number of flexible circuit board samples is small, the data set was further expanded by data enhancement: samples were photographed after rotation, scaling, translation, illumination transformation and similar augmentations, yielding the expanded FPCSD2023 data set with an image resolution of 768 pixels. The FPCSD2023 data set contains 2211 images, and the defect images were annotated with the labeling tool LabelImg. Defect images from the data set are shown in Fig. 1 and their defect target label images in Fig. 2.
Finally, the old-class and new-class data sets are partitioned. Because of the flexible production mode of flexible circuit boards, new defect classes continually appear as products and processes change, and the defect detector must learn to recognize new-class defects while retaining the ability to recognize old-class defects, as shown in Fig. 3. To simulate the continual appearance of new defect classes, the FPCSD2023 data set was divided by defect class into an old-class data set comprising the 8 classes crush, crease, contamination, scratch, CVL foreign matter, oxidation under film, ink foreign matter and ink drop, and a new-class data set comprising the 3 classes ink contamination, rough gold surface and skip plating.
A Faster R-CNN object detection model pre-trained on ImageNet is used as the flexible circuit board defect detection model. Its network structure, shown in Fig. 4, consists of a feature extractor FE, a region-of-interest extractor RPN and a classifier RCN: FE uses a ResNet-50 network to extract features from the defect image, the RPN generates candidate regions from the features, and the features mapped from the candidate regions are fed into the RCN for defect classification to produce the final defect prediction. The dual-teacher architecture is shown in Fig. 5, in which three models play different roles: an old-class teacher model trained only on the old-class data set; a new-class teacher model trained only on the new-class data set; and a student model trained on the new-class data set, which additionally obtains, through knowledge distillation (see Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network [J]. arXiv preprint arXiv:1503.02531, 2015), the distilled knowledge of the old-class teacher model for recognizing old classes and of the new-class teacher model for recognizing new classes, improving the student model's recognition of old-class defects and its adaptability to new-class defects. The student model and both teacher models all use the Faster R-CNN object detection model.
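The knowledge distillation cited above (Hinton et al., 2015) transfers softened class probabilities from a teacher to a student. A generic NumPy sketch of that idea, not specific to this patent's detectors, with made-up logits over three defect classes:

```python
import numpy as np

def soften(logits, T):
    """Temperature-scaled softmax used for soft targets (Hinton et al., 2015)."""
    z = logits / T
    e = np.exp(z - z.max())
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between teacher soft targets and student soft predictions,
    scaled by T^2 as in the original distillation paper."""
    p_t = soften(teacher_logits, T)
    p_s = soften(student_logits, T)
    return -np.sum(p_t * np.log(p_s + 1e-12)) * T * T

teacher = np.array([4.0, 1.0, 0.5])  # confident teacher over 3 defect classes
student = np.array([3.5, 1.2, 0.4])  # student close to the teacher
far     = np.array([0.1, 3.0, 2.0])  # student far from the teacher
print(kd_loss(student, teacher) < kd_loss(far, teacher))  # True
```

Raising the temperature T spreads probability mass onto the non-target classes, exposing the inter-class similarity structure the teacher has learned, which is exactly the "dark knowledge" the student models here receive from both teachers.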
Designing the new/old-class adaptive module: as shown in Fig. 6, the old-class and new-class knowledge weights are computed from the features extracted by the student model, the old-class teacher model and the new-class teacher model, and are weighted into the loss function to adjust the importance of new and old knowledge during training. Its operations are, in order, a feature fusion operation, a global average pooling operation, a channel weight calculation operation, a feature re-weighting operation and a knowledge weight calculation operation;
Feature fusion operation: the features extracted by the student model are concatenated along the channel dimension with the features extracted by the old-class teacher model and the new-class teacher model, respectively:
Z_O = concat(F_S, F_OT, dim=C), Z_N = concat(F_S, F_NT, dim=C)

where F_OT denotes the features extracted by the old-class teacher model, F_S the features extracted by the student model and F_NT the features extracted by the new-class teacher model, with F_OT, F_S, F_NT ∈ R^(H×W×C), H, W, C being the height, width and number of channels of the features; concat(·) denotes concatenation of features and dim=C concatenation along the channel dimension; Z_O, Z_N ∈ R^(H×W×2C) are the features obtained by fusing F_S with F_OT and F_NT, respectively.
Global average pooling operation: two parallel global average pooling layers are designed and a global average pooling operation is applied to the fused features, with the following formula:
Y_O = Pool_avg(Z_O), Y_N = Pool_avg(Z_N)

where Y_O, Y_N ∈ R^(1×1×2C) are the features output by the global average pooling operation and Pool_avg(·) denotes global average pooling;
Channel weight calculation operation: a fully connected layer compresses the number of channels of the pooled features Y_O, Y_N to C/8, and a further fully connected operation restores it to 2C, yielding the weight of each channel:
Q_O = Fc_2C(Fc_C/8(Y_O)), Q_N = Fc_2C(Fc_C/8(Y_N))

where Fc_C/8(·) and Fc_2C(·) denote fully connected operations with output sizes C/8 and 2C, respectively, and Q_O, Q_N ∈ R^(1×1×2C) are the per-channel weights of old-class and new-class knowledge;
Feature re-weighting operation: the obtained channel weights Q_O, Q_N re-weight each channel of the features Z_O, Z_N, emphasizing the features of important channels while suppressing those of unimportant channels, to obtain the re-weighted features:
Z'_O = Q_O ⊙ Z_O, Z'_N = Q_N ⊙ Z_N

where ⊙ denotes element-wise multiplication and Z'_O, Z'_N ∈ R^(H×W×2C) are the re-weighted old-class and new-class features;
Knowledge weight calculation operation: fully connected layers compute the old-class and new-class knowledge weights from the re-weighted features Z'_O, Z'_N, followed by a softmax operation:
(λ1, λ2) = Softmax(Fc(Z'_O), Fc(Z'_N))

where Softmax(·,·) denotes normalization by the softmax operation and λ1, λ2 are the normalized old-class and new-class knowledge weights;
the previous λ1, λ2 are then updated with the newly normalized weights, so that λ1, λ2 always reflect the latest state of the new/old-class weights.
S2, decoupled feature distillation: as shown in Fig. 7, according to the position information of the new-class defects obtained from the data set, the decoupling idea is used to remove the new-class features from the features extracted by the old-class teacher model and the student model during feature distillation, thereby improving the feature distillation effect;
firstly, the position information of the new-class defect is obtained from the data set, and the defect image P is decomposed into a new-class defect region and a non-new-class defect region:

NC = {p(m, n) | x_1 ≤ n ≤ x_2, y_1 ≤ m ≤ y_2},  NC̄ = P \ NC,  1 ≤ m ≤ H, 1 ≤ n ≤ W

wherein p(m, n) denotes the value of the pixel in the m-th row and n-th column, (m, n) denotes the coordinates of the pixel in the defect image, H and W denote the height and width of the defect image respectively, NC denotes the region of the new-class defect in the defect image, NC̄ denotes the region of non-new-class defects in the defect image, and (x_1, y_1), (x_2, y_2) denote the horizontal and vertical coordinates of the upper-left and lower-right corners of the new-class defect region respectively;
then, the old-class teacher model and the student model extract features F_OT, F_S from the training image, and the feature region belonging to the new-class defect is located in the features according to the relative position of the new-class defect in the defect image; similarly, each feature is decomposed into a new-class defect feature region NCF and a non-new-class defect feature region NCF̄:

NCF = {f(i, j) | (i, j) corresponds to a pixel (m, n) ∈ NC},  NCF̄ = F \ NCF

wherein f_OT(i, j), f_S(i, j) denote the elements in the i-th row and j-th column of F_OT, F_S respectively, (i, j) denotes the coordinates of the feature, h and w denote the height and width of the features respectively, NCF denotes the feature region corresponding to the new-class defect in the defect image, and NCF̄ denotes the feature region corresponding to non-new-class defects in the defect image;
finally, the features are decoupled: the feature regions belonging to the new-class defects are removed from the features to obtain the decoupled features:

F'_OT = F_OT \ NCF,  F'_S = F_S \ NCF

wherein F'_OT, F'_S denote the decoupled features.
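A minimal sketch of this decoupling step: the new-class defect box (x_1, y_1, x_2, y_2) is mapped from image coordinates to feature-map coordinates, and the corresponding region of the feature map is masked out before feature distillation. The function name, the linear coordinate scaling, and the choice of zero-masking for "removing" the region are assumptions for illustration.

```python
import numpy as np

def decouple_features(feat: np.ndarray, box, img_h: int, img_w: int) -> np.ndarray:
    """feat: h x w x C feature map; box: (x1, y1, x2, y2) in image pixels."""
    h, w = feat.shape[:2]
    x1, y1, x2, y2 = box
    # Map the new-class defect box from image coordinates to feature coordinates
    i1, i2 = int(y1 * h / img_h), int(np.ceil(y2 * h / img_h))
    j1, j2 = int(x1 * w / img_w), int(np.ceil(x2 * w / img_w))
    out = feat.copy()
    out[i1:i2, j1:j2, :] = 0.0   # remove the new-class feature region (NCF)
    return out
```

Applied to both F_OT and F_S, this yields the decoupled features on which the old-class feature distillation loss is computed.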
S3, incremental training is carried out: the defect images, the corresponding labels, the new-class and old-class knowledge weights, and the decoupled features are input into the student model, and end-to-end training is performed to obtain a trained defect detection network.
Firstly, the decoupled feature distillation loss of the old-class teacher model is calculated:

L_feat^OT = ||F'_S − F'_OT||_2^2

wherein L_feat^OT denotes the decoupled feature distillation loss of the old-class teacher model;
then, the decoupled feature distillation loss, the RPN-layer distillation loss and the RCN-layer distillation loss of the old-class teacher model are combined into the old-class knowledge distillation loss, and the feature distillation loss, the RPN-layer distillation loss and the RCN-layer distillation loss of the new-class teacher model are combined into the new-class knowledge distillation loss. Finally, the old-class knowledge weight λ_1, the new-class knowledge weight λ_2, the old-class knowledge distillation loss, the new-class knowledge distillation loss, the loss of the student model trained on the new-class data set and the regularization loss are combined into the overall model loss function:

L_old = L_feat^OT + L_RPN^OT + L_RCN^OT,  L_new = L_feat^NT + L_RPN^NT + L_RCN^NT
Loss = λ_1·L_old + λ_2·L_new + L_RCN + L_reg(·,·)

wherein L_old denotes the distillation loss of the old-class teacher model and L_new the distillation loss of the new-class teacher model; L_feat^OT denotes the decoupled feature distillation loss of the old-class teacher model; L_RPN^OT, L_RCN^OT, L_feat^NT, L_RPN^NT, L_RCN^NT denote, respectively, the RPN distillation loss of the old-class teacher model (using the L2 loss function), the RCN distillation loss of the old-class teacher model (using the cross-entropy loss function), the feature distillation loss of the new-class teacher model (using the L2 loss function), the RPN distillation loss of the new-class teacher model (using the L2 loss function) and the RCN distillation loss of the new-class teacher model (using the cross-entropy loss function); Loss is the overall model loss; L_RCN denotes the loss of the student model trained on the new-class data set (i.e., the model loss of Faster R-CNN; for its specific form see Ren S, He K, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks [J]. Advances in Neural Information Processing Systems, 2015, 28); and L_reg(·,·) is a regularization loss, specified as follows:
finally, end-to-end training is performed using the overall model loss function to obtain a trained student model.
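The loss combination above can be sketched as follows, using the loss types named in the text (L2 for feature and RPN distillation, cross-entropy for RCN distillation). The function name and tensor shapes are illustrative assumptions, and since the specific regularization term is not reproduced here, it is passed in precomputed.

```python
import torch
import torch.nn.functional as F

def overall_loss(lam1, lam2,
                 feat_ot_dec, feat_s_dec,       # decoupled features: old teacher / student
                 rpn_ot, rpn_s, rcn_ot, rcn_s,  # old-teacher RPN/RCN outputs vs student
                 feat_nt, feat_s,               # new-teacher vs student features
                 rpn_nt, rcn_nt,                # new-teacher RPN/RCN outputs
                 loss_rcn_new, loss_reg):       # Faster R-CNN loss on new data, regularizer
    # Old-class knowledge distillation:
    # decoupled feature L2 + RPN L2 + RCN cross-entropy against teacher probabilities
    l_old = (F.mse_loss(feat_s_dec, feat_ot_dec)
             + F.mse_loss(rpn_s, rpn_ot)
             + F.cross_entropy(rcn_s, rcn_ot.softmax(dim=1)))
    # New-class knowledge distillation: feature L2 + RPN L2 + RCN cross-entropy
    l_new = (F.mse_loss(feat_s, feat_nt)
             + F.mse_loss(rpn_s, rpn_nt)
             + F.cross_entropy(rcn_s, rcn_nt.softmax(dim=1)))
    # Total: lambda-weighted distillation terms + new-class training loss + regularization
    return lam1 * l_old + lam2 * l_new + loss_rcn_new + loss_reg
```

The λ_1/λ_2 weights produced by the adaptive module thus directly trade off how strongly the student imitates the old-class versus new-class teacher at each step.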
S4, defect identification is carried out: the defect images are input into the trained student model, and the student model generates the final defect prediction results. The mean average precision (mAP) is used as the evaluation index for defect detection; the AP of each defect class and the mAP over all defects achieved by the trained student model on the flexible circuit board defect class-incremental data set are shown in Table 1, the visualized prediction results of flexible circuit board defect detection are shown in Fig. 8, and the various defects of the flexible circuit board can be effectively identified.
Table 1 Test results of the trained student model on the flexible circuit board defect class-incremental data set
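As a small illustration of the evaluation index mentioned above: mAP is simply the mean of the per-class average precision (AP) values. The class names and AP values below are placeholders, not results from Table 1.

```python
def mean_average_precision(ap_per_class: dict) -> float:
    """mAP = arithmetic mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)

# Placeholder per-class AP values for illustration only
example_aps = {"open_circuit": 0.8, "short_circuit": 0.6}
```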

Claims (5)

1. A class-incremental flexible circuit board defect detection method based on a dual-teacher architecture, characterized by comprising the following steps:
s1, establishing a self-adaptive double-teacher architecture: the self-adaptive double-teacher architecture comprises a double-teacher architecture and a new and old type self-adaptive module;
the dual-teacher architecture establishes a student model, an old-class teacher model and a new-class teacher model from the flexible circuit board defect class-incremental data set; the flexible circuit board defect class-incremental data set is divided into a new-class data set and an old-class data set that are mutually disjoint; the old-class teacher model is trained only on the old-class data set, the new-class teacher model is trained only on the new-class data set, and the student model is trained only on the new-class data set while obtaining, by means of knowledge distillation, the old-class teacher model's distilled knowledge of recognizing old classes and the new-class teacher model's distilled knowledge of recognizing new classes;
the new-old type self-adaptation module obtains old type knowledge weights and new type knowledge weights according to the characteristics extracted by the student model, the old type teacher model and the new type teacher model; the old-class knowledge is distillation knowledge of an old-class teacher model, and the new-class knowledge comprises distillation knowledge of a new-class teacher model and knowledge obtained by training a student model according to a new-class data set;
s2, decoupled feature distillation: position information of the new-class defects is obtained from the data set, and the decoupling idea is used to remove the new-class features from the features extracted by the old-class teacher model and the student model during feature distillation;
s3, incremental training is carried out: inputting the distillation knowledge of the new class teacher model, the distillation knowledge of the old class teacher model and the new class data set into the student model, and performing end-to-end training by combining the old class knowledge weight and the new class knowledge weight to obtain a trained student model;
s4, defect identification is carried out: and inputting the defect image into a trained student model, and generating a final defect prediction result by the student model.
2. The dual-teacher architecture-based class-incremental flexible circuit board defect detection method of claim 1, wherein the student model, the old-class teacher model and the new-class teacher model are all Faster R-CNN object detection models; the Faster R-CNN detection network consists of a feature extractor FE, a region proposal network RPN and a classifier RCN, wherein the feature extractor FE extracts features from the defect image, the region proposal network RPN generates candidate regions from the features, and the features mapped from the candidate regions are fed into the classifier RCN for defect classification to generate the final defect prediction result.
3. The dual-teacher architecture-based class-incremental flexible circuit board defect detection method according to claim 1, wherein the specific operation steps of the new/old-class adaptive module are, in order: a feature fusion operation, a global average pooling operation, a channel weight calculation operation, a feature re-weighting operation and a knowledge weight calculation operation:
feature fusion operation: the features extracted by the student model are respectively spliced and fused with the features extracted by the old-class teacher model and the new-class teacher model in the channel dimension:

Z_O = Concat(F_S, F_OT, dim=C),  Z_N = Concat(F_S, F_NT, dim=C)

wherein F_OT denotes the features extracted by the old-class teacher model, F_S the features extracted by the student model, and F_NT the features extracted by the new-class teacher model, F_OT, F_S, F_NT ∈ R^(H×W×C), where H, W, C denote the height, width and number of channels of the features respectively; Concat(·) denotes stitching fusion of features, and dim=C indicates that the stitching fusion is performed along the channel dimension; Z_O, Z_N denote F_S fused with F_OT and F_NT respectively, Z_O, Z_N ∈ R^(H×W×2C);
global average pooling operation: two parallel global average pooling layers are designed, and a global average pooling operation is performed on the fused features Z_O, Z_N:

Y_O = Pool_avg(Z_O),  Y_N = Pool_avg(Z_N)

wherein Y_O, Y_N denote the features output by the global average pooling operation, Y_O, Y_N ∈ R^(1×1×2C), and Pool_avg(·) denotes the global average pooling operation;
calculating channel weights: using a fully connected layer, the number of channels of the globally average-pooled features Y_O, Y_N is compressed to C/8 and then restored to 2C by another fully connected operation, obtaining the weight of each channel:

Q_O = Fc_2C(Fc_C/8(Y_O)),  Q_N = Fc_2C(Fc_C/8(Y_N))

wherein Fc_C/8(·) and Fc_2C(·) denote fully connected operations with output sizes C/8 and 2C respectively, and Q_O, Q_N denote the per-channel weights of the old-class knowledge and the new-class knowledge respectively, Q_O, Q_N ∈ R^(1×1×2C);
feature re-weighting operation: the obtained channel weights Q_O, Q_N re-weight each channel of the features Z_O, Z_N to obtain the re-weighted features:

Z'_O = Q_O ⊙ Z_O,  Z'_N = Q_N ⊙ Z_N

wherein ⊙ denotes multiplication of corresponding elements, and Z'_O, Z'_N denote the re-weighted old-class and new-class features respectively, Z'_O, Z'_N ∈ R^(H×W×2C);
calculating knowledge weights: using fully connected layers, the old-class knowledge weight and the new-class knowledge weight are computed from the re-weighted features Z'_O, Z'_N respectively, and are then normalized with a softmax operation:

(λ_1, λ_2) = Softmax(Fc(Z'_O), Fc(Z'_N))

wherein Softmax(·,·) denotes normalization using the softmax operation, Fc(·) denotes the fully connected operation, and λ_1, λ_2 denote the normalized old-class knowledge weight and new-class knowledge weight.
4. The dual-teacher architecture-based class-incremental flexible circuit board defect detection method according to claim 1, wherein the decoupled feature distillation comprises obtaining position information of the new-class defects and removing the new-class features from the features extracted by the old-class teacher model and the student model during feature distillation:
firstly, the position information of the new-class defect is obtained from the data set, and the defect image P is decomposed into a new-class defect region and a non-new-class defect region:

NC = {p(m, n) | x_1 ≤ n ≤ x_2, y_1 ≤ m ≤ y_2},  NC̄ = P \ NC,  1 ≤ m ≤ H, 1 ≤ n ≤ W

wherein p(m, n) denotes the value of the pixel in the m-th row and n-th column, (m, n) denotes the coordinates of the pixel in the defect image, H and W denote the height and width of the defect image respectively, NC denotes the region of the new-class defect in the defect image, NC̄ denotes the region of non-new-class defects in the defect image, and (x_1, y_1), (x_2, y_2) denote the horizontal and vertical coordinates of the upper-left and lower-right corners of the new-class defect region respectively;
then, the old-class teacher model and the student model extract features F_OT, F_S from the training image, the feature region belonging to the new-class defect is located in the features according to the relative position of the new-class defect in the defect image, and each feature is decomposed into a new-class defect feature region NCF and a non-new-class defect feature region NCF̄:

NCF = {f(i, j) | (i, j) corresponds to a pixel (m, n) ∈ NC},  NCF̄ = F \ NCF

wherein f_OT(i, j), f_S(i, j) denote the elements in the i-th row and j-th column of F_OT, F_S respectively, (i, j) denotes the coordinates of the feature, h and w denote the height and width of the features respectively, NCF denotes the feature region corresponding to the new-class defect in the defect image, and NCF̄ denotes the feature region corresponding to non-new-class defects in the defect image;
finally, the features are decoupled: the feature regions belonging to the new-class defects are removed from the features to obtain the decoupled features:

F'_OT = F_OT \ NCF,  F'_S = F_S \ NCF

wherein F'_OT, F'_S denote the decoupled features.
5. The dual-teacher architecture-based class-incremental flexible circuit board defect detection method of claim 1, wherein the incremental training further comprises:
calculating the decoupled feature distillation loss of the old-class teacher model:

L_feat^OT = ||F'_S − F'_OT||_2^2

wherein L_feat^OT denotes the decoupled feature distillation loss of the old-class teacher model;
then, combining the decoupled feature distillation loss, the RPN-layer distillation loss and the RCN-layer distillation loss of the old-class teacher model into the old-class knowledge distillation loss, and combining the feature distillation loss, the RPN-layer distillation loss and the RCN-layer distillation loss of the new-class teacher model into the new-class knowledge distillation loss; finally, combining the old-class knowledge weight λ_1, the new-class knowledge weight λ_2, the old-class knowledge distillation loss, the new-class knowledge distillation loss, the loss of the student model trained on the new-class data set and the regularization loss into the overall model loss function:

L_old = L_feat^OT + L_RPN^OT + L_RCN^OT,  L_new = L_feat^NT + L_RPN^NT + L_RCN^NT
Loss = λ_1·L_old + λ_2·L_new + L_RCN + L_reg(·,·)

wherein L_old denotes the old-class knowledge distillation loss of the old-class teacher model, L_new denotes the new-class knowledge distillation loss of the new-class teacher model, L_feat^OT denotes the decoupled feature distillation loss of the old-class teacher model, L_RPN^OT, L_RCN^OT, L_feat^NT, L_RPN^NT, L_RCN^NT denote, respectively, the RPN distillation loss of the old-class teacher model, the RCN distillation loss of the old-class teacher model, the feature distillation loss of the new-class teacher model, the RPN distillation loss of the new-class teacher model and the RCN distillation loss of the new-class teacher model, Loss denotes the overall model loss, L_RCN denotes the loss of the student model trained on the new-class data set, and L_reg(·,·) is a regularization loss, specified as follows:
and finally, performing end-to-end training on the student model by using the model total loss function to obtain a trained student model.
CN202311425206.4A 2023-10-31 2023-10-31 Similar increment flexible circuit board defect detection method based on double-teacher architecture Active CN117152154B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311425206.4A CN117152154B (en) 2023-10-31 2023-10-31 Similar increment flexible circuit board defect detection method based on double-teacher architecture

Publications (2)

Publication Number Publication Date
CN117152154A true CN117152154A (en) 2023-12-01
CN117152154B CN117152154B (en) 2024-01-26

Family

ID=88903143

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200175384A1 (en) * 2018-11-30 2020-06-04 Samsung Electronics Co., Ltd. System and method for incremental learning
CN111967534A (en) * 2020-09-03 2020-11-20 福州大学 Incremental learning method based on generation of confrontation network knowledge distillation
CN114298179A (en) * 2021-12-17 2022-04-08 上海高德威智能交通***有限公司 Data processing method, device and equipment
US20220138633A1 (en) * 2020-11-05 2022-05-05 Samsung Electronics Co., Ltd. Method and apparatus for incremental learning
CN114492745A (en) * 2022-01-18 2022-05-13 天津大学 Knowledge distillation mechanism-based incremental radiation source individual identification method
CN114722892A (en) * 2022-02-22 2022-07-08 中国科学院自动化研究所 Continuous learning method and device based on machine learning
CN114863248A (en) * 2022-03-02 2022-08-05 武汉大学 Image target detection method based on deep supervision self-distillation
CN115170872A (en) * 2022-06-23 2022-10-11 江苏科技大学 Class increment learning method based on knowledge distillation
CN115546581A (en) * 2022-09-28 2022-12-30 云南大学 Decoupled incremental target detection method
CN116630285A (en) * 2023-05-31 2023-08-22 河北工业大学 Photovoltaic cell type incremental defect detection method based on significance characteristic hierarchical distillation

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
LONGHUI YU et al.: "Multi-Teacher Knowledge Distillation for Incremental Implicitly-Refined Classification", arXiv, pages 1-9 *
USMAN TAHIR et al.: "Class Incremental Learning for Visual Task using Knowledge Distillation", 2022 24th International Multitopic Conference (INMIC), pages 1-7 *
YOOJIN CHOI et al.: "Dual-Teacher Class-Incremental Learning With Data-Free Generative Replay", arXiv, pages 1-13 *
YUN XIANG et al.: "Efficient Incremental Learning Using Dynamic Correction Vector", IEEE Access, vol. 8, pages 23090-23099, XP011771154, DOI: 10.1109/ACCESS.2019.2963461 *
TANG Jinhong: "Steel surface defect diagnosis based on a multi-teacher knowledge distillation network" (in Chinese), Information & Computer (Theoretical Edition), vol. 35, no. 11, pages 217-219 *
XU An et al.: "Class-incremental learning method with adaptive feature integration and parameter optimization" (in Chinese), Computer Engineering and Applications, pages 1-11 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant