CN117152154A - Class-incremental flexible circuit board defect detection method based on dual-teacher architecture - Google Patents


Info

Publication number
CN117152154A
CN117152154A (application CN202311425206.4A)
Authority
CN
China
Prior art keywords
class
new
old
model
teacher
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311425206.4A
Other languages
Chinese (zh)
Other versions
CN117152154B (en)
Inventor
廖晓鹃
熊文杰
陈光柱
陈润吉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu University of Technology
Original Assignee
Chengdu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu University of Technology
Priority to CN202311425206.4A priority Critical patent/CN117152154B/en
Publication of CN117152154A publication Critical patent/CN117152154A/en
Application granted granted Critical
Publication of CN117152154B publication Critical patent/CN117152154B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T7/0004: Industrial image inspection
    • G06N3/042: Knowledge-based neural networks; logical representations of neural networks
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods
    • G06V10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/454: Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V10/764: Recognition using classification, e.g. of video objects
    • G06V10/806: Fusion of extracted features
    • G06V10/82: Recognition using neural networks
    • G06T2207/20081: Training; learning
    • G06T2207/20084: Artificial neural networks [ANN]
    • G06T2207/30108: Industrial image inspection
    • G06T2207/30148: Semiconductor; IC; wafer
    • Y02P90/30: Computing systems specially adapted for manufacturing


Abstract

The invention belongs to the technical field of image processing and discloses a class-incremental flexible circuit board defect detection method based on a dual-teacher architecture, aimed at the problem of class-incremental defect detection in real flexible circuit board production scenarios. The method adds a new/old-class adaptive module and a decoupled feature distillation module to a dual-teacher architecture. The new/old-class adaptive module adjusts the weights of new-class and old-class knowledge during training, addressing the bias of class-incremental learning toward new or old classes. The decoupled feature distillation module uses the relative position of new-class defects in the image to locate them in the feature map and then removes those features by decoupling, achieving more accurate feature distillation. The method effectively balances the importance of new and old classes, improves the distillation effect, and thereby enables more accurate defect detection on flexible circuit boards.

Description

Class-incremental flexible circuit board defect detection method based on dual-teacher architecture
Technical Field
The invention belongs to the technical field of image processing and particularly relates to a class-incremental flexible circuit board defect detection method based on a dual-teacher architecture.
Background
A flexible printed circuit (FPC) is a light, thin and bendable circuit board widely used in fields such as smartphones, notebook computers and aerospace. However, because FPC materials are fragile and the process flow is complex, defects of varying severity easily arise during production and can cause electronic products to be scrapped. Defect detection is therefore an important part of FPC quality control. In actual FPC production, flexible manufacturing has gradually become the mainstream mode. Its core is to adapt to changes in the production environment through flexible equipment and process flows; new defect classes continually appear as products and processes change, and production-cycle requirements demand real-time detection, so the system must learn new classes with little time and space overhead. Class-incremental learning offers a solution for flexible circuit board defect detection: its goal is to let a model continually adapt to new-class data while retaining the ability to recognize previous tasks. Through class-incremental learning, an FPC defect detection model can keep learning from new defect data without access to old-class defect data, greatly reducing the time and space cost of learning.
However, most current class-incremental methods face two challenges in flexible circuit board defect detection:
(1) In flexible FPC production, target defects are variable: the classes and number of defects appearing each time are uncertain. When the incremental classes vary, the fixed new/old-class weights used in current class-incremental learning models may fail to balance new-class and old-class defects well.
(2) Conventional feature knowledge distillation ignores the influence of incorrect feature knowledge: because the old-class teacher model has never learned the new classes, it may associate new-class defects with old-class defects or with the background, so the new-class defect features it distills are inaccurate and mislead the student model's learning.
To address these problems, a class-incremental flexible circuit board defect detection method based on a dual-teacher architecture is provided.
Disclosure of Invention
In view of the above problems, the present invention aims to provide a class-incremental flexible circuit board defect detection method based on a dual-teacher architecture.
A class-incremental flexible circuit board defect detection method based on a dual-teacher architecture comprises the following steps:
S1, establishing an adaptive dual-teacher architecture: the adaptive dual-teacher architecture comprises a dual-teacher architecture and a new/old-class adaptive module;
the dual-teacher architecture builds a student model, an old-class teacher model and a new-class teacher model from the flexible circuit board defect-class incremental data set; this data set is divided into an old-class data set and a new-class data set that are mutually disjoint; the old-class teacher model is trained only on the old-class data set, the new-class teacher model only on the new-class data set, and the student model only on the new-class data set, while the distilled knowledge of the old-class teacher model for recognizing old classes and of the new-class teacher model for recognizing new classes is obtained by knowledge distillation;
the new/old-class adaptive module computes the old-class and new-class knowledge weights from the features extracted by the student model, the old-class teacher model and the new-class teacher model; the old-class knowledge is the distilled knowledge of the old-class teacher model, and the new-class knowledge comprises the distilled knowledge of the new-class teacher model and the knowledge the student model learns from the new-class data set;
S2, decoupled feature distillation: the position information of new-class defects is obtained from the data set and, during feature distillation, the new-class features are removed from the features extracted by the old-class teacher model and the student model by decoupling;
S3, incremental training: the distilled knowledge of the new-class teacher model, the distilled knowledge of the old-class teacher model and the new-class data set are input into the student model, which is trained end to end in combination with the new/old-class knowledge weights to obtain a trained student model;
S4, defect recognition: a defect image is input into the trained student model, which generates the final defect prediction result.
The student model, the old-class teacher model and the new-class teacher model are all Faster R-CNN object detection models. The Faster R-CNN network consists of a feature extractor FE, a region-of-interest extractor RPN and a classifier RCN: FE extracts features from the defect image, the RPN generates candidate regions from the features, and the features mapped from the candidate regions are fed into the RCN for defect classification to produce the final defect prediction result.
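The three-stage data flow just described (FE, then RPN, then RCN) can be illustrated with a rough NumPy sketch. Everything below is a simplified placeholder, not the actual Faster R-CNN implementation: random features stand in for a ResNet backbone, proposals are fixed, and RoI pooling is a plain mean; the stride and channel counts are assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the three Faster R-CNN stages named in the text:
# FE (feature extractor), RPN (region-of-interest extractor), RCN (classifier).
STRIDE, C, NUM_CLASSES = 16, 256, 11  # assumed backbone stride / channels

def feature_extractor(image):
    """FE: map an H*W*3 image to an (H/STRIDE)*(W/STRIDE)*C feature map."""
    h, w = image.shape[0] // STRIDE, image.shape[1] // STRIDE
    return rng.standard_normal((h, w, C))  # placeholder for backbone features

def rpn(features):
    """RPN: emit candidate boxes (x1, y1, x2, y2) in feature-map coordinates."""
    return [(2, 3, 10, 12), (20, 5, 30, 18)]  # placeholder proposals

def rcn(features, boxes):
    """RCN: pool the features inside each candidate box and classify them."""
    w_cls = rng.standard_normal((C, NUM_CLASSES))  # placeholder classifier head
    preds = []
    for x1, y1, x2, y2 in boxes:
        roi = features[y1:y2, x1:x2].mean(axis=(0, 1))  # crude RoI pooling
        preds.append(int(np.argmax(roi @ w_cls)))
    return preds

image = rng.standard_normal((768, 768, 3))
feats = feature_extractor(image)
print(feats.shape)             # (48, 48, 256)
print(rcn(feats, rpn(feats)))  # one class index per proposal
```

In the method described here, the student and both teacher models all share this detector structure; only their training data and distillation roles differ.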
The new/old-class adaptive module performs, in order, a feature fusion operation, a global average pooling operation, a channel weight calculation operation, a feature re-weighting operation and a knowledge weight calculation operation:
Feature fusion operation: the features extracted by the student model are concatenated along the channel dimension with the features extracted by the old-class teacher model and the new-class teacher model, respectively:

Z_O = concat(F_S, F_OT, dim=C), Z_N = concat(F_S, F_NT, dim=C)

where F_OT denotes the features extracted by the old-class teacher model, F_S the features extracted by the student model and F_NT the features extracted by the new-class teacher model, with F_OT, F_S, F_NT ∈ R^(H×W×C), H, W, C being the height, width and number of channels of the features; concat(·) denotes concatenation of features and dim=C concatenation along the channel dimension; Z_O, Z_N ∈ R^(H×W×2C) are the features obtained by fusing F_S with F_OT and F_NT, respectively.
Global average pooling operation: two parallel global average pooling layers apply a global average pooling operation to the fused features Z_O, Z_N:

Y_O = Pool_avg(Z_O), Y_N = Pool_avg(Z_N)

where Y_O, Y_N ∈ R^(1×1×2C) are the features output by the global average pooling operation and Pool_avg(·) denotes global average pooling.
Channel weight calculation operation: a fully connected layer compresses the number of channels of the pooled features Y_O, Y_N to C/8, and a further fully connected operation restores it to 2C, yielding the weight of each channel:

Q_O = Fc_2C(Fc_C/8(Y_O)), Q_N = Fc_2C(Fc_C/8(Y_N))

where Fc_C/8(·) and Fc_2C(·) denote fully connected operations with output sizes C/8 and 2C, respectively, and Q_O, Q_N ∈ R^(1×1×2C) are the per-channel weights of old-class and new-class knowledge.
Feature re-weighting operation: the obtained channel weights Q_O, Q_N re-weight each channel of the features Z_O, Z_N, yielding the re-weighted features:

Z'_O = Q_O ⊙ Z_O, Z'_N = Q_N ⊙ Z_N

where ⊙ denotes element-wise multiplication and Z'_O, Z'_N ∈ R^(H×W×2C) are the re-weighted old-class and new-class features.
Knowledge weight calculation operation: fully connected layers compute the old-class and new-class knowledge weights from the re-weighted features Z'_O, Z'_N, followed by a softmax operation:

(λ1, λ2) = Softmax(Fc(Z'_O), Fc(Z'_N))

where Softmax(·,·) denotes normalization by the softmax operation and λ1, λ2 are the normalized old-class and new-class knowledge weights.
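A minimal NumPy sketch of the five operations above follows. The fully connected layers use random placeholder weights, and the spatial pooling applied before the final fully connected layer is an assumption of this sketch rather than something stated in the text:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def adaptive_weights(f_s, f_ot, f_nt, rng):
    """Sketch of the new/old-class adaptive module: fuse, pool, compute
    channel weights, re-weight, then output normalized knowledge weights."""
    C = f_s.shape[-1]
    # 1) feature fusion: concatenate along the channel dimension
    z_o = np.concatenate([f_s, f_ot], axis=-1)  # (H, W, 2C)
    z_n = np.concatenate([f_s, f_nt], axis=-1)
    # 2) global average pooling over the spatial dimensions
    y_o, y_n = z_o.mean(axis=(0, 1)), z_n.mean(axis=(0, 1))  # (2C,)
    # 3) channel weights: FC 2C -> C/8 -> 2C (random placeholder weights)
    w1 = rng.standard_normal((2 * C, C // 8))
    w2 = rng.standard_normal((C // 8, 2 * C))
    q_o, q_n = (y_o @ w1) @ w2, (y_n @ w1) @ w2
    # 4) re-weight every channel of the fused features
    z_o2, z_n2 = z_o * q_o, z_n * q_n
    # 5) knowledge weights: FC to one scalar per branch, then softmax
    w3 = rng.standard_normal(2 * C)
    scores = np.array([z_o2.mean(axis=(0, 1)) @ w3,
                       z_n2.mean(axis=(0, 1)) @ w3])
    lam1, lam2 = softmax(scores)
    return lam1, lam2

rng = np.random.default_rng(1)
f_s, f_ot, f_nt = (rng.standard_normal((8, 8, 64)) for _ in range(3))
lam1, lam2 = adaptive_weights(f_s, f_ot, f_nt, rng)
print(lam1 + lam2)  # normalized: the two weights always sum to 1
```

The softmax at the end guarantees λ1 + λ2 = 1, so the two weights act as a learned trade-off between old-class and new-class knowledge rather than independent scale factors.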
The decoupled feature distillation comprises obtaining the position information of new-class defects and removing, during feature distillation, the new-class features from the features extracted by the old-class teacher model and the student model:
First, the position information of the new-class defect is obtained from the data set, and the defect image P is decomposed into a new-class defect region and a non-new-class defect region:

p(m, n) ∈ NC if x1 ≤ n ≤ x2 and y1 ≤ m ≤ y2; otherwise p(m, n) ∈ NC^c, for 1 ≤ m ≤ H, 1 ≤ n ≤ W

where p(m, n) is the value of the pixel in row m and column n, (m, n) are the pixel coordinates in the defect image, H and W are the height and width of the defect image, NC is the region of the new-class defect in the defect image, NC^c is the region of non-new-class defects, and (x1, y1), (x2, y2) are the horizontal and vertical coordinates of the upper-left and lower-right corners of the new-class defect region.
Then, for the features F_OT, F_S extracted from the training image by the old-class teacher model and the student model, the feature region belonging to the new-class defect is located from the relative position of the new-class defect in the defect image; the features are likewise decomposed into a new-class defect feature region NCF and a non-new-class defect feature region NCF^c:

f(i, j) ∈ NCF if (x1·w)/W ≤ j ≤ (x2·w)/W and (y1·h)/H ≤ i ≤ (y2·h)/H; otherwise f(i, j) ∈ NCF^c

where f_OT(i, j) and f_S(i, j) denote the elements in row i and column j of F_OT and F_S, (i, j) are the feature coordinates, h and w are the height and width of the features, NCF is the feature region corresponding to the new-class defect in the defect image, and NCF^c is the feature region corresponding to non-new-class defects.
Finally, the features are decoupled by removing the features belonging to the new-class defect, yielding the decoupled features:

F'_OT = { f_OT(i, j) | (i, j) ∈ NCF^c }, F'_S = { f_S(i, j) | (i, j) ∈ NCF^c }

where F'_OT, F'_S denote the decoupled features.
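The decoupling step can be sketched as a mask over the feature map. The box-to-feature-map coordinate mapping and the L2 form of the loss below are assumptions consistent with the surrounding text, and the function is illustrative rather than the patented implementation:

```python
import numpy as np

def decoupled_feature_loss(f_ot, f_s, box, image_hw):
    """Sketch of decoupled feature distillation: zero out the feature region
    corresponding to a new-class defect box (given in image coordinates),
    then take an L2 loss over the remaining (non-new-class) features."""
    h, w, _ = f_ot.shape
    H, W = image_hw
    x1, y1, x2, y2 = box
    # map the box from image coordinates to feature-map coordinates
    # using the relative position of the defect in the image
    fx1, fx2 = int(x1 * w / W), int(np.ceil(x2 * w / W))
    fy1, fy2 = int(y1 * h / H), int(np.ceil(y2 * h / H))
    mask = np.ones((h, w, 1))
    mask[fy1:fy2, fx1:fx2] = 0.0  # remove the new-class feature region (NCF)
    diff = (f_ot - f_s) * mask
    return np.sum(diff ** 2) / max(mask.sum(), 1)

rng = np.random.default_rng(2)
f_ot = rng.standard_normal((48, 48, 16))
f_s = f_ot.copy()
f_s[:10, :10] += 5.0  # pretend teacher and student disagree only
                      # inside the new-class defect region
loss = decoupled_feature_loss(f_ot, f_s, box=(0, 0, 160, 160), image_hw=(768, 768))
print(loss)  # 0.0: the disagreeing region was decoupled away
```

Because the masked region is exactly where the old-class teacher's knowledge of the new class is unreliable, the distillation signal that remains comes only from regions where the teacher is trustworthy.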
The incremental training further comprises:
Calculating the decoupled feature distillation loss of the old-class teacher model:

L_Feat^OT = || F'_OT − F'_S ||_2^2

where L_Feat^OT denotes the decoupled feature distillation loss of the old-class teacher model.
The decoupled feature distillation loss of the old-class teacher model is then combined with the RPN-layer and RCN-layer distillation losses to form the old-class knowledge distillation loss; the feature distillation loss of the new-class teacher model is combined with its RPN-layer and RCN-layer distillation losses to obtain the new-class knowledge distillation loss. Finally, the old-class knowledge weight λ1, the new-class knowledge weight λ2, the old-class and new-class knowledge distillation losses, the loss of training the student model on the new-class data set and the regularization loss are combined into the overall model loss function:

Loss = λ1·L_old + λ2·L_new + L_RCN + L_reg(λ1, λ2)
L_old = L_Feat^OT + L_RPN^OT + L_RCN^OT
L_new = L_Feat^NT + L_RPN^NT + L_RCN^NT

where L_old denotes the old-class knowledge distillation loss of the old-class teacher model and L_new the new-class knowledge distillation loss of the new-class teacher model; L_RPN^OT, L_RCN^OT, L_Feat^NT, L_RPN^NT, L_RCN^NT denote, respectively, the RPN distillation loss of the old-class teacher model (L2 loss), the RCN distillation loss of the old-class teacher model (cross-entropy loss), the feature distillation loss of the new-class teacher model (L2 loss), the RPN distillation loss of the new-class teacher model (L2 loss) and the RCN distillation loss of the new-class teacher model (cross-entropy loss); Loss is the overall model loss, L_RCN is the loss of training the student model on the new-class data set, and L_reg(·,·) is the regularization loss.
Finally, the student model is trained end to end with the overall model loss function to obtain the trained student model.
Compared with the prior art, the invention has the following beneficial effects:
1. the weights of new-class and old-class defects are captured more effectively during training, improving the generalization ability of the model;
2. the student model is effectively prevented from learning inaccurate new-class defect feature knowledge from the old-class teacher model, improving the feature distillation effect and thus model performance;
3. the method offers high real-time performance and a small model scale, and can be applied in flexible circuit board defect detection scenarios with strict real-time requirements.
Drawings
Fig. 1 shows defect images from the flexible circuit board defect-class incremental data set.
Fig. 2 shows defect target label images from the flexible circuit board defect-class incremental data set.
Fig. 3 is a flow chart of class-incremental defect detection.
Fig. 4 is a schematic diagram of the Faster R-CNN object detection model.
Fig. 5 is a framework diagram of the dual-teacher architecture.
Fig. 6 is a framework diagram of the new/old-class adaptive module.
Fig. 7 is a framework diagram of decoupled feature distillation.
Fig. 8 shows the detection results of the class-incremental flexible circuit board defect detection method based on the dual-teacher architecture.
Detailed Description
The technical scheme of the invention is described in detail below with reference to the accompanying drawings.
This embodiment provides a class-incremental flexible circuit board defect detection method based on a dual-teacher architecture, comprising the following steps:
S1, establishing an adaptive dual-teacher architecture: this comprises establishing a dual-teacher architecture and establishing a new/old-class adaptive module.
Establishing the dual-teacher architecture: a student model, an old-class teacher model and a new-class teacher model are built from the flexible circuit board defect incremental data set.
Since no flexible circuit board defect-class incremental data set previously existed, one (FPCSD2023) had to be established. The defect classes of the FPCSD2023 data set comprise 11 types: crush, crease, contamination, scratch, CVL foreign matter, oxidation under film, ink foreign matter, ink drop, ink contamination, rough gold surface and skip plating.
Then, additional flexible circuit board defect samples were collected to greatly enlarge the data set. Because the number of flexible circuit board samples is small, the data set was further expanded by data enhancement: samples were photographed after rotation, scaling, translation, illumination transformation and similar augmentations, yielding the expanded FPCSD2023 data set with an image resolution of 768 pixels. The FPCSD2023 data set contains 2211 images, and the defect images were annotated with the labeling tool LabelImg. Defect images from the data set are shown in Fig. 1 and their defect target label images in Fig. 2.
Finally, the old-class and new-class data sets are partitioned. Because of the flexible production mode of flexible circuit boards, new defect classes continually appear as products and processes change, and the defect detector must learn to recognize new-class defects while retaining the ability to recognize old-class defects, as shown in Fig. 3. To simulate the continual appearance of new defect classes, the FPCSD2023 data set was divided by defect class into an old-class data set comprising the 8 classes crush, crease, contamination, scratch, CVL foreign matter, oxidation under film, ink foreign matter and ink drop, and a new-class data set comprising the 3 classes ink contamination, rough gold surface and skip plating.
A Faster R-CNN object detection model pre-trained on ImageNet is used as the flexible circuit board defect detection model. Its network structure, shown in Fig. 4, consists of a feature extractor FE, a region-of-interest extractor RPN and a classifier RCN: FE uses a ResNet-50 network to extract features from the defect image, the RPN generates candidate regions from the features, and the features mapped from the candidate regions are fed into the RCN for defect classification to produce the final defect prediction. The dual-teacher architecture is shown in Fig. 5, in which three models play different roles: an old-class teacher model trained only on the old-class data set; a new-class teacher model trained only on the new-class data set; and a student model trained on the new-class data set, which additionally obtains, through knowledge distillation (see Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network [J]. arXiv preprint arXiv:1503.02531, 2015), the distilled knowledge of the old-class teacher model for recognizing old classes and of the new-class teacher model for recognizing new classes, improving the student model's recognition of old-class defects and its adaptability to new-class defects. The student model and both teacher models all use the Faster R-CNN object detection model.
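The knowledge distillation cited above (Hinton et al., 2015) transfers softened class probabilities from a teacher to a student. A generic NumPy sketch of that idea, not specific to this patent's detectors, with made-up logits over three defect classes:

```python
import numpy as np

def soften(logits, T):
    """Temperature-scaled softmax used for soft targets (Hinton et al., 2015)."""
    z = logits / T
    e = np.exp(z - z.max())
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between teacher soft targets and student soft predictions,
    scaled by T^2 as in the original distillation paper."""
    p_t = soften(teacher_logits, T)
    p_s = soften(student_logits, T)
    return -np.sum(p_t * np.log(p_s + 1e-12)) * T * T

teacher = np.array([4.0, 1.0, 0.5])  # confident teacher over 3 defect classes
student = np.array([3.5, 1.2, 0.4])  # student close to the teacher
far     = np.array([0.1, 3.0, 2.0])  # student far from the teacher
print(kd_loss(student, teacher) < kd_loss(far, teacher))  # True
```

Raising the temperature T spreads probability mass onto the non-target classes, exposing the inter-class similarity structure the teacher has learned, which is exactly the "dark knowledge" the student models here receive from both teachers.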
Designing the new/old-class adaptive module: as shown in Fig. 6, the old-class and new-class knowledge weights are computed from the features extracted by the student model, the old-class teacher model and the new-class teacher model, and are weighted into the loss function to adjust the importance of new and old knowledge during training. Its operations are, in order, a feature fusion operation, a global average pooling operation, a channel weight calculation operation, a feature re-weighting operation and a knowledge weight calculation operation;
Feature fusion operation: the features extracted by the student model are concatenated along the channel dimension with the features extracted by the old-class teacher model and the new-class teacher model, respectively:
Z_O = concat(F_S, F_OT, dim=C), Z_N = concat(F_S, F_NT, dim=C)

where F_OT denotes the features extracted by the old-class teacher model, F_S the features extracted by the student model and F_NT the features extracted by the new-class teacher model, with F_OT, F_S, F_NT ∈ R^(H×W×C), H, W, C being the height, width and number of channels of the features; concat(·) denotes concatenation of features and dim=C concatenation along the channel dimension; Z_O, Z_N ∈ R^(H×W×2C) are the features obtained by fusing F_S with F_OT and F_NT, respectively.
Global average pooling operation: two parallel global average pooling layers are designed and a global average pooling operation is applied to the fused features, with the following formula:
Y_O = Pool_avg(Z_O), Y_N = Pool_avg(Z_N)

where Y_O, Y_N ∈ R^(1×1×2C) are the features output by the global average pooling operation and Pool_avg(·) denotes global average pooling;
Channel weight calculation operation: a fully connected layer compresses the number of channels of the pooled features Y_O, Y_N to C/8, and a further fully connected operation restores it to 2C, yielding the weight of each channel:
Q_O = Fc_2C(Fc_C/8(Y_O)), Q_N = Fc_2C(Fc_C/8(Y_N))

where Fc_C/8(·) and Fc_2C(·) denote fully connected operations with output sizes C/8 and 2C, respectively, and Q_O, Q_N ∈ R^(1×1×2C) are the per-channel weights of old-class and new-class knowledge;
Feature re-weighting operation: the obtained channel weights Q_O, Q_N re-weight each channel of the features Z_O, Z_N, emphasizing the features of important channels while suppressing those of unimportant channels, to obtain the re-weighted features:
Z'_O = Q_O ⊙ Z_O, Z'_N = Q_N ⊙ Z_N

where ⊙ denotes element-wise multiplication and Z'_O, Z'_N ∈ R^(H×W×2C) are the re-weighted old-class and new-class features;
Knowledge weight calculation operation: fully connected layers compute the old-class and new-class knowledge weights from the re-weighted features Z'_O, Z'_N, followed by a softmax operation:
(λ1, λ2) = Softmax(Fc(Z'_O), Fc(Z'_N))

where Softmax(·,·) denotes normalization by the softmax operation and λ1, λ2 are the normalized old-class and new-class knowledge weights;
the previous λ1, λ2 are then updated with the newly normalized weights, so that λ1, λ2 always reflect the latest state of the new/old-class weights.
S2, decoupled feature distillation: as shown in Fig. 7, according to the position information of the new-class defects obtained from the data set, the decoupling idea is used to remove the new-class features from the features extracted by the old-class teacher model and the student model during feature distillation, thereby improving the feature distillation effect;
firstly, the position information of the new-class defect is obtained from the data set, and the defect image P is decomposed into a new-class defect region and a non-new-class defect region:

NC = {p(m, n) | x_1 ≤ n ≤ x_2, y_1 ≤ m ≤ y_2},  NC̄ = P \ NC,  1 ≤ m ≤ H, 1 ≤ n ≤ W

wherein p(m, n) denotes the value of the pixel in the m-th row and n-th column, (m, n) denotes the coordinates of the pixel in the defect image, H and W denote the height and width of the defect image respectively, NC denotes the region of the new-class defect in the defect image, NC̄ denotes the region of non-new-class defects in the defect image, and (x_1, y_1), (x_2, y_2) denote the horizontal and vertical coordinates of the upper-left and lower-right corners of the new-class defect region respectively;
then, the old-class teacher model and the student model extract features F_OT, F_S from the training image, and the feature region belonging to the new-class defect is located in the features according to the relative position of the new-class defect in the defect image; similarly, each feature is decomposed into a new-class defect feature region NCF and a non-new-class defect feature region NCF̄:

NCF = {f(i, j) | (i, j) corresponds to a pixel (m, n) ∈ NC},  NCF̄ = F \ NCF

wherein f_OT(i, j), f_S(i, j) denote the elements in the i-th row and j-th column of F_OT, F_S respectively, (i, j) denotes the coordinates of the feature, h and w denote the height and width of the features respectively, NCF denotes the feature region corresponding to the new-class defect in the defect image, and NCF̄ denotes the feature region corresponding to non-new-class defects in the defect image;
finally, the features are decoupled: the feature regions belonging to the new-class defects are removed from the features to obtain the decoupled features:

F'_OT = F_OT \ NCF,  F'_S = F_S \ NCF

wherein F'_OT, F'_S denote the decoupled features.
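A minimal sketch of this decoupling step: the new-class defect box (x_1, y_1, x_2, y_2) is mapped from image coordinates to feature-map coordinates, and the corresponding region of the feature map is masked out before feature distillation. The function name, the linear coordinate scaling, and the choice of zero-masking for "removing" the region are assumptions for illustration.

```python
import numpy as np

def decouple_features(feat: np.ndarray, box, img_h: int, img_w: int) -> np.ndarray:
    """feat: h x w x C feature map; box: (x1, y1, x2, y2) in image pixels."""
    h, w = feat.shape[:2]
    x1, y1, x2, y2 = box
    # Map the new-class defect box from image coordinates to feature coordinates
    i1, i2 = int(y1 * h / img_h), int(np.ceil(y2 * h / img_h))
    j1, j2 = int(x1 * w / img_w), int(np.ceil(x2 * w / img_w))
    out = feat.copy()
    out[i1:i2, j1:j2, :] = 0.0   # remove the new-class feature region (NCF)
    return out
```

Applied to both F_OT and F_S, this yields the decoupled features on which the old-class feature distillation loss is computed.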
S3, incremental training is carried out: the defect images, the corresponding labels, the new-class and old-class knowledge weights, and the decoupled features are input into the student model, and end-to-end training is performed to obtain a trained defect detection network.
Firstly, the decoupled feature distillation loss of the old-class teacher model is calculated:

L_feat^OT = ||F'_S − F'_OT||_2^2

wherein L_feat^OT denotes the decoupled feature distillation loss of the old-class teacher model;
then, the decoupled feature distillation loss, the RPN-layer distillation loss and the RCN-layer distillation loss of the old-class teacher model are combined into the old-class knowledge distillation loss, and the feature distillation loss, the RPN-layer distillation loss and the RCN-layer distillation loss of the new-class teacher model are combined into the new-class knowledge distillation loss. Finally, the old-class knowledge weight λ_1, the new-class knowledge weight λ_2, the old-class knowledge distillation loss, the new-class knowledge distillation loss, the loss of the student model trained on the new-class data set and the regularization loss are combined into the overall model loss function:

L_old = L_feat^OT + L_RPN^OT + L_RCN^OT,  L_new = L_feat^NT + L_RPN^NT + L_RCN^NT
Loss = λ_1·L_old + λ_2·L_new + L_RCN + L_reg(·,·)

wherein L_old denotes the distillation loss of the old-class teacher model and L_new the distillation loss of the new-class teacher model; L_feat^OT denotes the decoupled feature distillation loss of the old-class teacher model; L_RPN^OT, L_RCN^OT, L_feat^NT, L_RPN^NT, L_RCN^NT denote, respectively, the RPN distillation loss of the old-class teacher model (using the L2 loss function), the RCN distillation loss of the old-class teacher model (using the cross-entropy loss function), the feature distillation loss of the new-class teacher model (using the L2 loss function), the RPN distillation loss of the new-class teacher model (using the L2 loss function) and the RCN distillation loss of the new-class teacher model (using the cross-entropy loss function); Loss is the overall model loss; L_RCN denotes the loss of the student model trained on the new-class data set (i.e., the model loss of Faster R-CNN; for its specific form see Ren S, He K, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks [J]. Advances in Neural Information Processing Systems, 2015, 28); and L_reg(·,·) is a regularization loss, specified as follows:
finally, end-to-end training is performed using the overall model loss function to obtain a trained student model.
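The loss combination above can be sketched as follows, using the loss types named in the text (L2 for feature and RPN distillation, cross-entropy for RCN distillation). The function name and tensor shapes are illustrative assumptions, and since the specific regularization term is not reproduced here, it is passed in precomputed.

```python
import torch
import torch.nn.functional as F

def overall_loss(lam1, lam2,
                 feat_ot_dec, feat_s_dec,       # decoupled features: old teacher / student
                 rpn_ot, rpn_s, rcn_ot, rcn_s,  # old-teacher RPN/RCN outputs vs student
                 feat_nt, feat_s,               # new-teacher vs student features
                 rpn_nt, rcn_nt,                # new-teacher RPN/RCN outputs
                 loss_rcn_new, loss_reg):       # Faster R-CNN loss on new data, regularizer
    # Old-class knowledge distillation:
    # decoupled feature L2 + RPN L2 + RCN cross-entropy against teacher probabilities
    l_old = (F.mse_loss(feat_s_dec, feat_ot_dec)
             + F.mse_loss(rpn_s, rpn_ot)
             + F.cross_entropy(rcn_s, rcn_ot.softmax(dim=1)))
    # New-class knowledge distillation: feature L2 + RPN L2 + RCN cross-entropy
    l_new = (F.mse_loss(feat_s, feat_nt)
             + F.mse_loss(rpn_s, rpn_nt)
             + F.cross_entropy(rcn_s, rcn_nt.softmax(dim=1)))
    # Total: lambda-weighted distillation terms + new-class training loss + regularization
    return lam1 * l_old + lam2 * l_new + loss_rcn_new + loss_reg
```

The λ_1/λ_2 weights produced by the adaptive module thus directly trade off how strongly the student imitates the old-class versus new-class teacher at each step.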
S4, defect identification is carried out: the defect images are input into the trained student model, and the student model generates the final defect prediction results. The mean average precision (mAP) is used as the evaluation index for defect detection; the AP of each defect class and the mAP over all defects achieved by the trained student model on the flexible circuit board defect class-incremental data set are shown in Table 1, the visualized prediction results of flexible circuit board defect detection are shown in Fig. 8, and the various defects of the flexible circuit board can be effectively identified.
Table 1 Test results of the trained student model on the flexible circuit board defect class-incremental data set
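As a small illustration of the evaluation index mentioned above: mAP is simply the mean of the per-class average precision (AP) values. The class names and AP values below are placeholders, not results from Table 1.

```python
def mean_average_precision(ap_per_class: dict) -> float:
    """mAP = arithmetic mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)

# Placeholder per-class AP values for illustration only
example_aps = {"open_circuit": 0.8, "short_circuit": 0.6}
```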

Claims (5)

1. A class-incremental flexible circuit board defect detection method based on a dual-teacher architecture, characterized by comprising the following steps:
s1, establishing a self-adaptive double-teacher architecture: the self-adaptive double-teacher architecture comprises a double-teacher architecture and a new and old type self-adaptive module;
the dual-teacher architecture establishes a student model, an old-class teacher model and a new-class teacher model from the flexible circuit board defect class-incremental data set; the flexible circuit board defect class-incremental data set is divided into a new-class data set and an old-class data set that are mutually disjoint; the old-class teacher model is trained only on the old-class data set, the new-class teacher model is trained only on the new-class data set, and the student model is trained only on the new-class data set while obtaining, by means of knowledge distillation, the old-class teacher model's distilled knowledge of recognizing old classes and the new-class teacher model's distilled knowledge of recognizing new classes;
the new-old type self-adaptation module obtains old type knowledge weights and new type knowledge weights according to the characteristics extracted by the student model, the old type teacher model and the new type teacher model; the old-class knowledge is distillation knowledge of an old-class teacher model, and the new-class knowledge comprises distillation knowledge of a new-class teacher model and knowledge obtained by training a student model according to a new-class data set;
s2, decoupled feature distillation: position information of the new-class defects is obtained from the data set, and the decoupling idea is used to remove the new-class features from the features extracted by the old-class teacher model and the student model during feature distillation;
s3, incremental training is carried out: inputting the distillation knowledge of the new class teacher model, the distillation knowledge of the old class teacher model and the new class data set into the student model, and performing end-to-end training by combining the old class knowledge weight and the new class knowledge weight to obtain a trained student model;
s4, defect identification is carried out: and inputting the defect image into a trained student model, and generating a final defect prediction result by the student model.
2. The dual-teacher architecture-based class-incremental flexible circuit board defect detection method of claim 1, wherein the student model, the old-class teacher model and the new-class teacher model are all Faster R-CNN object detection models; the Faster R-CNN detection network consists of a feature extractor FE, a region proposal network RPN and a classifier RCN, wherein the feature extractor FE extracts features from the defect image, the region proposal network RPN generates candidate regions from the features, and the features mapped from the candidate regions are fed into the classifier RCN for defect classification to generate the final defect prediction result.
3. The dual-teacher architecture-based class-incremental flexible circuit board defect detection method according to claim 1, wherein the specific operation steps of the new/old-class adaptive module are, in order: a feature fusion operation, a global average pooling operation, a channel weight calculation operation, a feature re-weighting operation and a knowledge weight calculation operation:
feature fusion operation: the features extracted by the student model are respectively spliced and fused with the features extracted by the old-class teacher model and the new-class teacher model in the channel dimension:

Z_O = Concat(F_S, F_OT, dim=C),  Z_N = Concat(F_S, F_NT, dim=C)

wherein F_OT denotes the features extracted by the old-class teacher model, F_S the features extracted by the student model, and F_NT the features extracted by the new-class teacher model, F_OT, F_S, F_NT ∈ R^(H×W×C), where H, W, C denote the height, width and number of channels of the features respectively; Concat(·) denotes stitching fusion of features, and dim=C indicates that the stitching fusion is performed along the channel dimension; Z_O, Z_N denote F_S fused with F_OT and F_NT respectively, Z_O, Z_N ∈ R^(H×W×2C);
global average pooling operation: two parallel global average pooling layers are designed, and a global average pooling operation is performed on the fused features Z_O, Z_N:

Y_O = Pool_avg(Z_O),  Y_N = Pool_avg(Z_N)

wherein Y_O, Y_N denote the features output by the global average pooling operation, Y_O, Y_N ∈ R^(1×1×2C), and Pool_avg(·) denotes the global average pooling operation;
calculating channel weights: using a fully connected layer, the number of channels of the globally average-pooled features Y_O, Y_N is compressed to C/8 and then restored to 2C by another fully connected operation, obtaining the weight of each channel:

Q_O = Fc_2C(Fc_C/8(Y_O)),  Q_N = Fc_2C(Fc_C/8(Y_N))

wherein Fc_C/8(·) and Fc_2C(·) denote fully connected operations with output sizes C/8 and 2C respectively, and Q_O, Q_N denote the per-channel weights of the old-class knowledge and the new-class knowledge respectively, Q_O, Q_N ∈ R^(1×1×2C);
feature re-weighting operation: the obtained channel weights Q_O, Q_N re-weight each channel of the features Z_O, Z_N to obtain the re-weighted features:

Z'_O = Q_O ⊙ Z_O,  Z'_N = Q_N ⊙ Z_N

wherein ⊙ denotes multiplication of corresponding elements, and Z'_O, Z'_N denote the re-weighted old-class and new-class features respectively, Z'_O, Z'_N ∈ R^(H×W×2C);
calculating knowledge weights: using fully connected layers, the old-class knowledge weight and the new-class knowledge weight are computed from the re-weighted features Z'_O, Z'_N respectively, and are then normalized with a softmax operation:

(λ_1, λ_2) = Softmax(Fc(Z'_O), Fc(Z'_N))

wherein Softmax(·,·) denotes normalization using the softmax operation, Fc(·) denotes the fully connected operation, and λ_1, λ_2 denote the normalized old-class knowledge weight and new-class knowledge weight.
4. The dual-teacher architecture-based class-incremental flexible circuit board defect detection method according to claim 1, wherein the decoupled feature distillation comprises obtaining position information of the new-class defects and removing the new-class features from the features extracted by the old-class teacher model and the student model during feature distillation:
firstly, the position information of the new-class defect is obtained from the data set, and the defect image P is decomposed into a new-class defect region and a non-new-class defect region:

NC = {p(m, n) | x_1 ≤ n ≤ x_2, y_1 ≤ m ≤ y_2},  NC̄ = P \ NC,  1 ≤ m ≤ H, 1 ≤ n ≤ W

wherein p(m, n) denotes the value of the pixel in the m-th row and n-th column, (m, n) denotes the coordinates of the pixel in the defect image, H and W denote the height and width of the defect image respectively, NC denotes the region of the new-class defect in the defect image, NC̄ denotes the region of non-new-class defects in the defect image, and (x_1, y_1), (x_2, y_2) denote the horizontal and vertical coordinates of the upper-left and lower-right corners of the new-class defect region respectively;
then, the old-class teacher model and the student model extract features F_OT, F_S from the training image, the feature region belonging to the new-class defect is located in the features according to the relative position of the new-class defect in the defect image, and each feature is decomposed into a new-class defect feature region NCF and a non-new-class defect feature region NCF̄:

NCF = {f(i, j) | (i, j) corresponds to a pixel (m, n) ∈ NC},  NCF̄ = F \ NCF

wherein f_OT(i, j), f_S(i, j) denote the elements in the i-th row and j-th column of F_OT, F_S respectively, (i, j) denotes the coordinates of the feature, h and w denote the height and width of the features respectively, NCF denotes the feature region corresponding to the new-class defect in the defect image, and NCF̄ denotes the feature region corresponding to non-new-class defects in the defect image;
finally, the features are decoupled: the feature regions belonging to the new-class defects are removed from the features to obtain the decoupled features:

F'_OT = F_OT \ NCF,  F'_S = F_S \ NCF

wherein F'_OT, F'_S denote the decoupled features.
5. The dual-teacher architecture-based class-incremental flexible circuit board defect detection method of claim 1, wherein the incremental training further comprises:
calculating the decoupled feature distillation loss of the old-class teacher model:

L_feat^OT = ||F'_S − F'_OT||_2^2

wherein L_feat^OT denotes the decoupled feature distillation loss of the old-class teacher model;
then, combining the decoupled feature distillation loss, the RPN-layer distillation loss and the RCN-layer distillation loss of the old-class teacher model into the old-class knowledge distillation loss, and combining the feature distillation loss, the RPN-layer distillation loss and the RCN-layer distillation loss of the new-class teacher model into the new-class knowledge distillation loss; finally, combining the old-class knowledge weight λ_1, the new-class knowledge weight λ_2, the old-class knowledge distillation loss, the new-class knowledge distillation loss, the loss of the student model trained on the new-class data set and the regularization loss into the overall model loss function:

L_old = L_feat^OT + L_RPN^OT + L_RCN^OT,  L_new = L_feat^NT + L_RPN^NT + L_RCN^NT
Loss = λ_1·L_old + λ_2·L_new + L_RCN + L_reg(·,·)

wherein L_old denotes the old-class knowledge distillation loss of the old-class teacher model, L_new denotes the new-class knowledge distillation loss of the new-class teacher model, L_feat^OT denotes the decoupled feature distillation loss of the old-class teacher model, L_RPN^OT, L_RCN^OT, L_feat^NT, L_RPN^NT, L_RCN^NT denote, respectively, the RPN distillation loss of the old-class teacher model, the RCN distillation loss of the old-class teacher model, the feature distillation loss of the new-class teacher model, the RPN distillation loss of the new-class teacher model and the RCN distillation loss of the new-class teacher model, Loss denotes the overall model loss, L_RCN denotes the loss of the student model trained on the new-class data set, and L_reg(·,·) is a regularization loss, specified as follows:
and finally, performing end-to-end training on the student model by using the model total loss function to obtain a trained student model.
CN202311425206.4A 2023-10-31 2023-10-31 Similar increment flexible circuit board defect detection method based on double-teacher architecture Active CN117152154B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311425206.4A CN117152154B (en) 2023-10-31 2023-10-31 Similar increment flexible circuit board defect detection method based on double-teacher architecture

Publications (2)

Publication Number Publication Date
CN117152154A true CN117152154A (en) 2023-12-01
CN117152154B CN117152154B (en) 2024-01-26

Family

ID=88903143

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200175384A1 (en) * 2018-11-30 2020-06-04 Samsung Electronics Co., Ltd. System and method for incremental learning
CN111967534A (en) * 2020-09-03 2020-11-20 福州大学 Incremental learning method based on generation of confrontation network knowledge distillation
CN114298179A (en) * 2021-12-17 2022-04-08 上海高德威智能交通***有限公司 Data processing method, device and equipment
US20220138633A1 (en) * 2020-11-05 2022-05-05 Samsung Electronics Co., Ltd. Method and apparatus for incremental learning
CN114492745A (en) * 2022-01-18 2022-05-13 天津大学 Knowledge distillation mechanism-based incremental radiation source individual identification method
CN114722892A (en) * 2022-02-22 2022-07-08 中国科学院自动化研究所 Continuous learning method and device based on machine learning
CN114863248A (en) * 2022-03-02 2022-08-05 武汉大学 Image target detection method based on deep supervision self-distillation
CN115170872A (en) * 2022-06-23 2022-10-11 江苏科技大学 Class increment learning method based on knowledge distillation
CN115546581A (en) * 2022-09-28 2022-12-30 云南大学 Decoupled incremental target detection method
CN116630285A (en) * 2023-05-31 2023-08-22 河北工业大学 Photovoltaic cell type incremental defect detection method based on significance characteristic hierarchical distillation

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
LONGHUI YU et al.: "Multi-Teacher Knowledge Distillation for Incremental Implicitly-Refined Classification", arXiv, pages 1-9 *
USMAN TAHIR et al.: "Class Incremental Learning for Visual Task using Knowledge Distillation", 2022 24th International Multitopic Conference (INMIC), pages 1-7 *
YOOJIN CHOI et al.: "Dual-Teacher Class-Incremental Learning With Data-Free Generative Replay", arXiv, pages 1-13 *
YUN XIANG et al.: "Efficient Incremental Learning Using Dynamic Correction Vector", IEEE Access, vol. 8, pages 23090-23099, XP011771154, DOI: 10.1109/ACCESS.2019.2963461 *
TANG Jinhong: "Steel surface defect diagnosis based on a multi-teacher knowledge distillation network" (in Chinese), Information & Computer (Theoretical Edition), vol. 35, no. 11, pages 217-219 *
XU An et al.: "Class-incremental learning method with adaptive feature integration and parameter optimization" (in Chinese), Computer Engineering and Applications, pages 1-11 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant