CN115965571B - Multi-source information fusion detection and model training method and medium for incremental autonomous learning

Multi-source information fusion detection and model training method and medium for incremental autonomous learning

Info

Publication number
CN115965571B
CN115965571B
Authority
CN
China
Prior art keywords
feature map
feature
network
channel
fusion
Prior art date
Legal status
Active
Application number
CN202210461803.1A
Other languages
Chinese (zh)
Other versions
CN115965571A (en)
Inventor
何良雨
崔健
刘彤
Current Assignee
Fengrui Lingchuang Zhuhai Technology Co ltd
Original Assignee
Fengrui Lingchuang Zhuhai Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Fengrui Lingchuang Zhuhai Technology Co ltd
Priority to CN202210461803.1A
Publication of CN115965571A
Application granted
Publication of CN115965571B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Image Analysis (AREA)

Abstract

The application relates to the field of deep learning and precision detection in intelligent manufacturing, and in particular to a multi-source information fusion detection and model training method and medium based on incremental autonomous learning, which can be used for semiconductor inspection. The method comprises: inputting a detected image into a trained target detection model for detection. The target detection model comprises a deep convolutional neural network model that contains a multi-source information fusion full convolution network feature extraction channel, an edge perception feature enhancement channel and an adaptive feature interaction fusion module. The full convolution network feature extraction channel extracts depth texture features of the detected image, the edge perception feature enhancement channel extracts edge texture features from the depth texture features, and the adaptive feature interaction fusion module fuses the depth texture features and the edge texture features. Training is performed by adding a network for new detection target classes to the already trained network, and this incremental autonomous learning design preserves the incremental network's detection capability for defects of both new and old classes.

Description

Multi-source information fusion detection and model training method and medium for incremental autonomous learning
Technical Field
The application relates to the field of deep learning and precision detection in intelligent manufacturing, and can be applied to semiconductor defect detection. It relates in particular to a multi-source information fusion detection method, a model training method and a storage medium based on incremental autonomous learning.
Background
Various defects are easily generated in the production process of articles or products, such as device defects in the field of electronic manufacturing and various defects in the ultra-precise production process of semiconductors. These defects can affect the yield, life and reliability of the product. Therefore, surface defect detection of such articles or products is a key element of quality control. The surface defect detection method based on machine vision has the advantages of high efficiency, high accuracy, high real-time performance and the like, and is widely researched and applied in the defect detection field.
However, because defect types are numerous, their characteristics are difficult to define, and many defects appear only during production, so computer-vision-based methods for detecting the surface defects of articles or products run into difficulty. In addition, in the field of industrial inspection, data come from many sources and carry different structural information, so how to effectively extract the features of various defect targets for defect detection is an important and urgent technical problem to be solved.
Disclosure of Invention
The embodiments of the application provide a multi-source information fusion detection method, a model training method and a storage medium based on incremental autonomous learning, which effectively extract the features of various defect targets and improve defect detection accuracy. By using a distillation loss function, the method solves the catastrophic forgetting of original targets that occurs when the target detection model is updated with new data.
A multi-source information fusion detection method based on incremental autonomous learning includes:
obtaining a detected image, and inputting the detected image into a trained target detection model to obtain a defect detection result;
the target detection model comprises a deep convolution neural network model, wherein the deep convolution neural network model comprises a multi-source information deep fusion neural network and a defect detection network, the multi-source information deep fusion neural network comprises a parallel feature extraction network and a self-adaptive feature interaction fusion module, the parallel feature extraction network comprises a full convolution network feature extraction channel and an edge perception feature enhancement channel, the full convolution network feature extraction channel is used for extracting a deep texture feature map corresponding to a detected image, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map; the self-adaptive feature interaction fusion module is used for fusing the depth texture feature map and the edge texture feature map to obtain a fusion feature map; the defect detection network is used for detecting defect targets based on the fusion characteristic diagram.
In one embodiment, the full convolution network feature extraction channel comprises a plurality of grouped convolution blocks, each grouped convolution block comprising a successive 3 x 3 convolution kernel and 1 x 1 convolution kernel.
In one embodiment, the edge-aware feature enhancement channel is specifically configured to: compressing an image channel and reducing the feature dimension of the depth texture feature map, and extracting features of the feature map after dimension reduction by using an edge feature extraction convolution kernel to obtain an edge texture feature map;
the calculation formula of the edge feature extraction convolution kernel is as follows:L i,o representing a target convolution kernel, representing a dot product operation, G (c, v) representing a linear filter for edge extraction, c representing a direction of the linear filter, v representing a scale of the linear filter, a dimension of the target convolution kernel being consistent with a number of directions of the linear filter>Representing the convolution kernel after filtering by the linear filter.
In one embodiment, the linear filter comprises a Gabor filter.
In one embodiment, compressing the image channel and reducing the feature dimension of the depth texture feature map includes:

standardizing the depth texture feature map U_1 as follows:

Z_ij = (u_ij − ū) / σ

where the depth texture feature map U_1 is an N-dimensional vector of size H_1 × W_1 and N represents the dimension of the depth texture feature map U_1; each sample in the depth texture feature map U_1 is denoted u_ij, the number of samples is s = H_1 × W_1, ū represents the mean, σ represents the standard deviation, and Z_ij is the matrix obtained after standardizing the depth texture feature map U_1;

obtaining the correlation coefficient matrix R_1 of the depth texture feature map U_1;

calculating the eigenvalues λ_i (i = 1, 2, …, N) of the correlation coefficient matrix R_1 that satisfy |λI − R_1| = 0, where I is the identity matrix;

sorting the eigenvalues λ_i by magnitude, calculating the eigenvector e_i corresponding to each eigenvalue λ_i according to the principal component analysis method, and then obtaining the reduced-dimension feature map U_1′ from the eigenvalues λ_i and their corresponding eigenvectors e_i.
In an embodiment, the adaptive feature interaction fusion module is further configured to: calibrating the fusion feature map to obtain a calibrated feature map; the defect detection network is used for detecting a defect target based on the calibrated feature map.
In an embodiment, calibrating the fused feature map to obtain a calibrated feature map includes:
A global average pooling operation is performed on the feature map of each channel of the fused feature map, where W and H are respectively the width and the height of the feature map on each channel;

the W × H dimensions of the feature map on each channel are compressed to generate the channel statistic E ∈ R^K, calculated as:

E_k = (1 / (W × H)) Σ_{i=1}^{W} Σ_{j=1}^{H} D_k(i, j)

where E_k is the pooling value of the feature map on the k-th channel and D_k(i, j) denotes the feature map on the k-th channel; 1 ≤ k ≤ K, K is the number of channel feature maps, 1 ≤ i ≤ W and 1 ≤ j ≤ H;

an activation function is used to perform an excitation operation on E_k of each channel of the fused feature map, assigning a weight value β_k to the pooling value E_k of the feature map on each channel;

the excited weight value β_k is multiplied by the corresponding feature map D_(i,j) to obtain the calibrated feature map U_(i,j) = D_(i,j) · β_k.
In one embodiment, using an activation function to perform the excitation operation on E_k of each channel of the fused feature map, so as to assign the weight value β_k to the pooling value E_k of the feature map on each channel, includes:

performing the excitation operation on E_k of each channel of the fused feature map using the following formula:

β_k = σ(w · (γ · E_k))

where β_k is the weight value assigned to the pooling value E_k of the feature map on the k-th channel of the fused feature map, σ represents the sigmoid activation function, γ is the ReLU activation function, w is the fully connected layer, and w · (γ · E_k) represents a nonlinear fully connected operation performed on E_k.
A multi-source information fusion model training method based on incremental autonomous learning includes the following steps:
acquiring a training image set, wherein the training image set comprises training images;
training the target detection network through the training image to obtain a trained target detection model;
the target detection model comprises a deep convolution neural network model, wherein the deep convolution neural network model comprises a multi-source information deep fusion neural network and a defect detection network, the multi-source information deep fusion neural network comprises a parallel feature extraction network and a self-adaptive feature interaction fusion module, the parallel feature extraction network comprises a full convolution network feature extraction channel and an edge perception feature enhancement channel, the full convolution network feature extraction channel is used for extracting a deep texture feature map corresponding to a detected image, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map; the self-adaptive feature interaction fusion module is used for fusing the depth texture feature map and the edge texture feature map to obtain a fusion feature map; the defect detection network is used for detecting defect targets based on the fusion characteristic diagram.
In an embodiment, training the target detection network through the training images to obtain a trained target detection model includes: migrating the network parameters of a trained detection model used for old-class detection into the target detection network, and then training the target detection network through the training images to obtain the trained target detection model; the network structure of the detection model used for old-class detection is the same as that of the target detection network.
A multi-source information fusion model training device based on incremental autonomous learning, comprising:
the acquisition module is used for acquiring a training image set, wherein the training image set comprises training images;
the training module is used for training the target detection network through the training image so as to obtain a trained target detection model; the target detection model comprises a deep convolution neural network model, wherein the deep convolution neural network model comprises a multi-source information deep fusion neural network and a defect detection network, the multi-source information deep fusion neural network comprises a parallel feature extraction network and a self-adaptive feature interaction fusion module, the parallel feature extraction network comprises a full convolution network feature extraction channel and an edge perception feature enhancement channel, the full convolution network feature extraction channel is used for extracting a deep texture feature map corresponding to a detected image, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map; the self-adaptive feature interaction fusion module is used for fusing the depth texture feature map and the edge texture feature map to obtain a fusion feature map; the defect detection network is used for detecting defect targets based on the fusion characteristic diagram.
A multi-source information fusion detection device based on incremental autonomous learning, comprising:
the acquisition module is used for acquiring the detected image;
the input module is used for inputting the detected image into the trained target detection model to obtain a defect detection result; the target detection model comprises a deep convolution neural network model, wherein the deep convolution neural network model comprises a multi-source information deep fusion neural network and a defect detection network, the multi-source information deep fusion neural network comprises a parallel feature extraction network and a self-adaptive feature interaction fusion module, the parallel feature extraction network comprises a full convolution network feature extraction channel and an edge perception feature enhancement channel, the full convolution network feature extraction channel is used for extracting a deep texture feature map corresponding to a detected image, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map; the self-adaptive feature interaction fusion module is used for fusing the depth texture feature map and the edge texture feature map to obtain a fusion feature map; the defect detection network is used for detecting defect targets based on the fusion characteristic diagram.
A computer device comprises a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the above multi-source information fusion detection method based on incremental autonomous learning or the above multi-source information fusion model training method based on incremental autonomous learning.
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the above multi-source information fusion detection method based on incremental autonomous learning or of the above multi-source information fusion model training method based on incremental autonomous learning.
In the solutions implemented by the multi-source information fusion detection method, model training method, apparatus, computer device and storage medium based on incremental autonomous learning, different texture features of a detected product or object, including depth texture features and edge texture features, can be extracted through two parallel feature extraction channels. The features extracted by the two parallel channels are then fused by the adaptive feature interaction fusion module and, together with the subsequent defect detection network, a deep convolutional neural network model for defect detection, such as semiconductor defect detection, is built, so that effective defect features can be extracted and the defect detection capability is improved.
In addition, while improving feature extraction capability through the deep convolutional neural network model, the target detection model of the embodiments of the application also has the capability of incremental autonomous learning, so as to solve the catastrophic forgetting of original targets that occurs when the network is updated with new data. The target detection model of the embodiments of the application may include two deep convolutional neural network models and adopts a distillation learning scheme: the existing deep convolutional neural network model already trained on the old classes guides the training of the new deep convolutional neural network model, and a distillation loss function is added to improve the incremental learning capability of the target detection model. New detection target classes can thus be added while maintaining detection capability, without retraining on the whole data set, which greatly improves training efficiency and solves the catastrophic forgetting of original targets caused by updating the network with new data.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments of the present application will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a full convolution network feature extraction channel according to one embodiment of the present application;
FIG. 2 is a schematic diagram of an edge-aware feature enhancement channel according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a fusion process of the adaptive feature interaction fusion module according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a training process of a multisource information fusion model based on incremental autonomous learning according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a multi-source information fusion model training device based on incremental autonomous learning according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a multi-source information fusion detection device based on incremental autonomous learning according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a computer device according to an embodiment of the application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The multi-source information fusion detection method based on incremental autonomous learning and the corresponding model training method provided by the embodiments of the application can be applied to terminal devices or servers. The model training method is generally applied to a server. The multi-source information fusion detection method based on incremental autonomous learning, which directly uses the trained target detection model, can be applied to a terminal device or a server and is generally applied to a terminal device; the terminal device may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers and portable wearable devices. The server may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers.
The embodiments of the application include a multi-source information fusion detection method based on incremental autonomous learning, a multi-source information fusion model training method based on incremental autonomous learning, as well as an apparatus, a device and a storage medium; the embodiments are fully described through the following parts.
Model training part
The model training part comprises a first training process and a second training process. The first training process is a general multi-source information fusion model training process, and the second training process is a multi-source information fusion model training process with incremental autonomous learning added, i.e. an incremental autonomous learning training process. The two are described separately below.
The embodiment of the application provides a multisource information fusion training method, which comprises the following steps: acquiring a training image set, wherein the training image set comprises training images; training the target detection network through the training image to obtain a trained target detection model; the target detection model comprises a deep convolution neural network model DMSNet, wherein the deep convolution neural network model DMSNet comprises a multi-source information deep fusion neural network and a defect detection network, the multi-source information deep fusion neural network comprises a parallel feature extraction network and a self-adaptive feature interaction fusion module, the parallel feature extraction network comprises a full convolution network feature extraction channel and an edge perception feature enhancement channel, the full convolution network feature extraction channel is used for extracting a deep texture feature map corresponding to a detected image, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map; the self-adaptive feature interaction fusion module is used for fusing the depth texture feature map and the edge texture feature map to obtain a fusion feature map; the defect detection network is used for detecting defect targets based on the fusion characteristic diagram.
It can be seen that the target detection model trained by the training method provided by the embodiment of the application comprises a deep convolutional neural network model DMSNet, the deep convolutional neural network model DMSNet can respectively extract different texture features of a detected product or object, including deep texture features and edge texture features, the features extracted by the two parallel channels are fused through an adaptive feature interaction fusion module to form a multi-source information deep fusion module (Deep fusion of multi-source information), and the multi-source information deep fusion module and a subsequent defect detection network are utilized to build a deep convolutional neural network model DMSNet for detecting defects, such as semiconductor defects, so that effective defect features can be extracted, and defect detection capability is improved.
The defect detection network in the deep convolutional neural network model DMSNet may be a conventional defect detection network, which is not limited in particular, for example, a prediction and regression neural network for a certain target class (such as a semiconductor defect), and the multi-source information deep fusion neural network in the deep convolutional neural network model DMSNet is a feature extraction network specifically proposed by the present application, specifically, the multi-source information deep fusion neural network includes a parallel feature extraction network and an adaptive feature interaction fusion module, and each part of the multi-source information deep fusion neural network provided by the present application is described in detail below.
The parallel feature extraction network provided by the embodiment of the application comprises two parallel feature extraction channels, namely a full convolution network feature extraction channel and an edge perception feature enhancement channel, which are respectively used for extracting depth texture features and corresponding edge texture features of a defect target. The working processes of the full convolution network feature extraction channel, the edge perception feature enhancement channel and the self-adaptive feature interaction fusion module are respectively described below.
1. Full convolution network feature extraction channel
In one embodiment, the full convolution network feature extraction channel comprises a plurality of grouped convolution blocks, each grouped convolution block comprising a successive 3 x 3 convolution kernel and a 1 x 1 convolution kernel. It should be noted that, the number of the grouped convolution blocks can be flexibly set according to specific application scenarios and task requirements, and the embodiment of the application is not limited.
As shown in fig. 1, fig. 1 is a schematic structural diagram of a full convolution network feature extraction channel according to an embodiment of the present application, where the full convolution network feature extraction channel includes a plurality of grouped convolution blocks, and each grouped convolution block includes consecutive 3×3 and 1×1 convolution layers. That is, the full convolution network feature extraction channel builds a fully convolutional neural network framework from grouped convolution blocks each formed by consecutive 3×3 and 1×1 convolution layers.
In the forward propagation of the network, the full convolution network feature extraction channel realizes the dimension transformation of the feature map by adjusting the stride of the convolution kernel instead of using a traditional pooling layer, which reduces the loss of feature information, keeps the convolution output at high resolution and improves detection precision. It should also be noted that stacked 3×3 convolution kernels have stronger nonlinear expression capability than a traditional 5×5 convolution kernel or other kernel sizes, and can achieve the same receptive field with fewer parameters. The 1×1 convolution layer that follows the 3×3 convolution kernel serves as a dimension reduction module during forward propagation, reducing model complexity, and fuses the feature maps on different channels at the same depth, thereby enhancing the feature extraction capability for the detection target.
Illustratively, continuing with fig. 1, assume the input image X is an M-dimensional vector of size H′ × W′ and the 3×3 convolution kernel has the same dimension as the input image X. A filter of one convolution kernel can then be applied to each input channel separately, and the filter outputs are fused by an N-dimensional 1×1 convolution kernel to obtain the complete convolution output Y, an N-dimensional vector of size H × W. It can be seen that the grouped convolution blocks greatly reduce the amount of computation.
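By way of illustration, a minimal sketch of such a grouped convolution block could look as follows; the channel counts, stride and activation are assumptions for illustration and are not specified by the application:

import torch
import torch.nn as nn

class GroupedConvBlock(nn.Module):
    """One grouped convolution block: a 3x3 convolution applied per input
    channel (stride replaces a pooling layer for downsampling) followed by a
    1x1 convolution that fuses the channels and sets the output dimension N."""
    def __init__(self, in_channels: int, out_channels: int, stride: int = 2):
        super().__init__()
        # One 3x3 filter per input channel (grouped / depthwise convolution).
        self.conv3x3 = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                 stride=stride, padding=1, groups=in_channels)
        # N-dimensional 1x1 convolution fuses the per-channel outputs.
        self.conv1x1 = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.conv1x1(self.act(self.conv3x3(x))))

# Example: an M = 32 channel input of size H' = W' = 128 mapped to N = 64 channels.
x = torch.randn(1, 32, 128, 128)
y = GroupedConvBlock(32, 64)(x)   # -> (1, 64, 64, 64)

Stacking several such blocks, with strided 3×3 convolutions taking the place of pooling layers, gives a sketch of the full convolution network feature extraction channel described above.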
2. Edge-aware feature enhancement channels
In order to further improve the feature extraction capability of the deep convolutional neural network model DMSNet, the embodiment of the application further adds a feature enhancement channel focusing on edge feature extraction, namely an edge perception feature enhancement channel, to the deep convolutional neural network model DMSNet, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map.
As shown in fig. 2, fig. 2 is a schematic structural diagram of an edge-aware feature enhancement channel according to an embodiment of the present application, where the edge-aware feature enhancement channel is specifically configured to: compress the image channels and reduce the feature dimension of the depth texture feature map, and extract features from the reduced-dimension feature map using an edge feature extraction convolution kernel (EFC convolution kernel for short in the application) to obtain an edge texture feature map. The calculation formula of the edge feature extraction convolution kernel is as follows:

L̂_{i,o} = L_{i,o} · G(c, v)

where L_{i,o} represents the target convolution kernel, · represents a dot product operation, G(c, v) represents a linear filter for edge extraction, which may be a set of linear filters, c represents the direction of the linear filter, v represents the scale of the linear filter, the dimension of the target convolution kernel is consistent with the number of directions of the linear filter, and L̂_{i,o} represents the convolution kernel after filtering by the linear filter.
As shown in fig. 2, D_k × D_k × M in fig. 2 is the target convolution kernel, which may be a conventional convolution kernel; D_k × D_k represents the size of the target convolution kernel, and M represents the number of convolution kernels of the target convolution kernel, which is not particularly limited.
In an embodiment, the edge-aware feature enhancement channel is a PCA edge-aware feature enhancement channel, configured to perform PCA principal component analysis on a feature map U output from a full convolution network feature extraction channel, so as to implement compression and feature dimension reduction of an image channel, and then perform edge feature extraction on the feature map U' after dimension reduction through a novel edge feature extraction convolution kernel, and output an edge texture feature map X corresponding to the feature map U.
In one embodiment, the linear filter comprises a Gabor filter. In this embodiment, the edge feature extraction convolution kernel (EFC convolution kernel) is designed by using the characteristic that the linear filter such as the Gabor filter has strong edge feature extraction capability, and combining the Gabor filter with the conventional target convolution kernel, so as to design a novel convolution kernel and improve the capability of the model for extracting the edge feature. Specifically:
in one embodiment, compressing the image channel and feature dimension reduction are performed on the depth texture feature map, including the following processes:
The depth texture feature map U 1 The normalization calculation was performed as follows:
n represents a depth texture feature map U 1 Is a dimension of (2);
Z ij represented as the depth texture feature map U 1 Matrix after standardized calculation is carried out, and the depth texture feature map U 1 Is H 1 ×W 1 N-dimensional vector of size, the depth texture feature map U 1 Each sample in (a) is denoted as u ij Wherein s=h 1 ×W 1Mean value>Representing standard deviation;
obtaining depth texture feature map U 1 Is a correlation coefficient matrix R of (2) 1
Calculating a correlation coefficient matrix R 1 Satisfies |λI-R 1 When |=0, the obtained eigenvalue λ i (i=1, 2, …, N), I being a eigenvector of the eigenvalue λ;
for characteristic value lambda i Arranged in order of size and then according to the mainComponent analysis and calculation method for calculating characteristic value lambda i Corresponding feature vector e i Then according to the characteristic value lambda i And a characteristic value lambda i Corresponding feature vector e i Obtaining a feature map U after dimension reduction 1 '。
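A minimal sketch of the standardization and principal component reduction described in the steps above, treating each spatial position of U_1 as a sample with N channel features; the number of retained components and the tensor sizes are illustrative assumptions:

import torch

def pca_reduce(u1: torch.Tensor, keep: int) -> torch.Tensor:
    """u1: (N, H1, W1) depth texture feature map; each of the s = H1*W1
    positions is treated as a sample with N channel features; `keep` is an
    assumed number of retained principal components."""
    n, h, w = u1.shape
    samples = u1.reshape(n, h * w).T                             # (s, N)
    z = (samples - samples.mean(0)) / (samples.std(0) + 1e-8)    # standardize -> Z
    r = (z.T @ z) / (z.shape[0] - 1)                             # correlation matrix R1
    eigvals, eigvecs = torch.linalg.eigh(r)                      # solve |lambda*I - R1| = 0
    order = torch.argsort(eigvals, descending=True)              # sort eigenvalues
    e = eigvecs[:, order[:keep]]                                 # leading eigenvectors e_i
    reduced = z @ e                                              # project the samples
    return reduced.T.reshape(keep, h, w)                         # reduced-dimension map U1'

u1 = torch.randn(64, 32, 32)
u1_reduced = pca_reduce(u1, keep=16)                             # -> (16, 32, 32)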
In this embodiment, after the reduced-dimension feature map U_1′ is obtained, the EFC convolution kernel performs a dot product operation between the Gabor filter and the target convolution kernel, i.e. the reduced-dimension features are extracted with the edge feature extraction convolution kernel to obtain the edge texture feature map. By reducing the dimension of the feature map U_1 and extracting features with the edge feature extraction convolution kernel, the ability of the deep convolutional neural network model DMSNet to extract the edge texture features of defect targets is effectively improved.
As shown in FIG. 2, the edge-aware feature enhancement channel provided by the embodiment of the present application performs PCA principal component analysis on the output U_1 of the first grouped convolution block of the full convolution network feature extraction channel to compress the channels and reduce the feature dimension, then performs edge feature extraction on the reduced-dimension feature map U_1′ through the edge feature extraction convolution kernel (EFC convolution kernel) to obtain the output feature X_1, and applies the same processing to the output feature maps of the other grouped convolution blocks, thereby obtaining the edge texture feature map corresponding to each grouped convolution block.
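A minimal sketch of an edge feature extraction (EFC) kernel built as the dot product of a target convolution kernel with a bank of Gabor filters, in the spirit of the formula above; the Gabor parameters, kernel sizes, channel counts and number of directions are assumptions for illustration only:

import math
import torch
import torch.nn.functional as F

def gabor_kernel(size: int, theta: float, sigma: float = 2.0,
                 lam: float = 4.0, gamma: float = 0.5) -> torch.Tensor:
    """Real part of a Gabor filter with orientation theta (one direction c)
    and scale controlled by sigma and lam (the scale v)."""
    half = size // 2
    ys, xs = torch.meshgrid(torch.arange(-half, half + 1, dtype=torch.float32),
                            torch.arange(-half, half + 1, dtype=torch.float32),
                            indexing="ij")
    x_t = xs * math.cos(theta) + ys * math.sin(theta)
    y_t = -xs * math.sin(theta) + ys * math.cos(theta)
    return torch.exp(-(x_t ** 2 + (gamma * y_t) ** 2) / (2 * sigma ** 2)) \
           * torch.cos(2 * math.pi * x_t / lam)

def efc_filter(target_kernel: torch.Tensor, directions: int = 4) -> torch.Tensor:
    """Element-wise (dot) product of the target convolution kernel L with a bank
    of Gabor filters G(c, v); the result holds one modulated copy of the kernel
    per direction, matching the number of directions of the linear filter."""
    size = target_kernel.shape[-1]
    gabors = torch.stack([gabor_kernel(size, c * math.pi / directions)
                          for c in range(directions)])            # (directions, k, k)
    return target_kernel.unsqueeze(0) * gabors[:, None, None, :, :]  # (dir, out, in, k, k)

# Example: modulate an assumed 3x3 kernel acting on the reduced map U1' (16 channels).
L = torch.randn(16, 16, 3, 3)          # assumed learnable target kernel
L_hat = efc_filter(L, directions=4)    # (4, 16, 16, 3, 3)
x1 = F.conv2d(torch.randn(1, 16, 32, 32), L_hat[0], padding=1)  # one direction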
3. Self-adaptive feature interaction fusion module
In the embodiment of the application, in order to fully utilize the defect texture features of different layers of convolutional neural networks, as shown in fig. 3, fig. 3 is a schematic diagram of a fusion process of an adaptive feature interaction fusion module provided by the embodiment of the application, and the adaptive feature interaction fusion module is used for fusing a depth texture feature map and an edge texture feature map to obtain a fusion feature map.
In an embodiment, the adaptive feature interaction fusion module is further configured to: calibrating the fusion feature map to obtain a calibrated feature map; the defect detection network is used for detecting a defect target based on the calibrated feature map. In this embodiment, the calibrated feature map is used to facilitate improved defect target detection capability and accuracy.
In an embodiment, calibrating the fused feature map to obtain a calibrated feature map includes:
A global average pooling operation is performed on the feature map of each channel of the fused feature map, where W and H are respectively the width and the height of the feature map on each channel;

the W × H dimensions of the feature map on each channel are compressed to generate the channel statistic E ∈ R^K, calculated as:

E_k = (1 / (W × H)) Σ_{i=1}^{W} Σ_{j=1}^{H} D_k(i, j)

where E_k is the pooling value of the feature map on the k-th channel and D_k(i, j) denotes the feature map on the k-th channel; 1 ≤ k ≤ K, K is the number of channel feature maps, 1 ≤ i ≤ W and 1 ≤ j ≤ H;

an activation function is used to perform an excitation operation on E_k of each channel of the fused feature map, assigning a weight value β_k to the pooling value E_k of the feature map on each channel;

the excited weight value β_k is multiplied by the corresponding feature map D_(i,j) to obtain the calibrated feature map U_(i,j) = D_(i,j) · β_k.
In one embodiment, using the activation function to perform the excitation operation on E_k of each channel of the fused feature map, so as to assign the weight value β_k to the pooling value E_k of the feature map on each channel, includes: performing the excitation operation on E_k of each channel of the fused feature map using the following formula:

β_k = σ(w · (γ · E_k))

where β_k is the weight value assigned to the pooling value E_k of the feature map on the k-th channel of the fused feature map, σ represents the sigmoid activation function, γ is the ReLU activation function, w is the fully connected layer, and w · (γ · E_k) represents a nonlinear fully connected operation performed on E_k. In this embodiment, feature fusion among the channels of the fused feature map can be realized: a weight value β_k is assigned to each channel, the weight value of each channel of the fused feature map is computed through the sigmoid excitation so that the weights of effective features are increased and the weights of ineffective features are suppressed, and the excited weight value β_k is then multiplied by its corresponding feature map D_(i,j) to obtain the recalibrated feature map U_(i,j) = D_(i,j) · β_k. This enhances the features of the detected object's defect target in the feature map and suppresses interfering features; that is, by activating the weight values of the feature maps on different channels in each layer of the network, the effective feature weights are increased, the weights of ineffective features are reduced, and the feature characterization capability of the model is improved.
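A minimal sketch of this calibration (global average pooling, excitation through a fully connected layer with ReLU and sigmoid, and channel-wise rescaling); the layer sizes are illustrative assumptions:

import torch
import torch.nn as nn

class ChannelRecalibration(nn.Module):
    """Global average pooling gives E_k, a fully connected layer with ReLU then
    sigmoid gives beta_k, and the fused feature map D is rescaled channel-wise
    to U = D * beta_k."""
    def __init__(self, channels: int):
        super().__init__()
        self.fc = nn.Linear(channels, channels)   # the fully connected layer w
        self.relu = nn.ReLU(inplace=True)         # gamma
        self.sigmoid = nn.Sigmoid()               # sigma

    def forward(self, d: torch.Tensor) -> torch.Tensor:
        b, k, h, w = d.shape
        e = d.mean(dim=(2, 3))                      # E_k: global average pooling
        beta = self.sigmoid(self.fc(self.relu(e)))  # beta_k = sigma(w . (gamma . E_k))
        return d * beta.view(b, k, 1, 1)            # U = D * beta_k

fused = torch.randn(2, 64, 32, 32)                  # assumed fused feature map
calibrated = ChannelRecalibration(64)(fused)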
In an embodiment, the deep convolutional neural network model DMSNet provided by the present application may include at least one adaptive feature interaction fusion module, which is not limited in detail.
In the embodiment of the application, the target detection network can be trained with the training images to obtain the trained target detection model. During training with the training image set, the derivatives of the total error with respect to each parameter in the network are calculated by the back-propagation algorithm, the parameters are then updated by gradient descent, and the network parameters are updated over multiple iterations to complete the training of the target detection model; the specific parameters involved in the training process and the conditions for completing model training are not described in detail here. This training process belongs to the first training process.
In addition to the first training process, in the embodiment of the present application, the training may be performed on the target detection model based on the incremental learning, that is, the second training process, which is described below.
4. Incremental autonomous learning network model training process
In an embodiment, the training the target detection network through the training image to obtain a trained target detection model includes: migrating the trained network parameters of the detection model for old class detection into the target detection network, and training the target detection network through the training image to obtain a trained target detection model; the network structure of the detection model for old class detection is the same as that of the target detection network.
The present application provides two of the above-described deep convolutional neural network models DMSNet, denoted network A and network B. As shown in fig. 4, the network structure during training includes two networks. One is network A, an existing deep convolutional neural network model DMSNet already trained on the old classes; for example, network A is used for defect detection of a certain 5 classes. Since new defect classes need to be added to the model classifier, the other is network B, the network being trained to add detection of new defect classes; network B also adopts the deep convolutional neural network model DMSNet. Together they form the target detection incremental network of this embodiment, with network B responsible for the newly added defect classes.
In this network structure, the target detection model provided by the embodiment of the application adds prediction and regression neurons for the new detection target classes, i.e. network B (the currently trained deep convolutional neural network model DMSNet), at the last fully connected output layer of the trained network A; this incremental design ensures the detection capability of the incremental network B for both new and old classes of defect targets. In the incremental learning training process, the weights of network A are first frozen, the weights of the incremental network B are initialized by duplicating the weights of network A, which already has detection capability for the old-class targets, and a distillation loss is introduced to ensure that network A's detection capability for the old-class targets is migrated into network B. Only training data of the new classes are used when training network B, which improves training efficiency.
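A minimal sketch of this incremental setup, freezing network A and initializing network B from A's weights before extending the output layer for the new classes; DMSNet stands in for the model, and the `head` attribute is a hypothetical name for its final fully connected output layer, not a name taken from the application:

import copy
import torch

def build_incremental_pair(net_a: torch.nn.Module, num_new_classes: int):
    """Freeze the old network A, duplicate its weights into the incremental
    network B, and extend B's final fully connected output layer with
    prediction neurons for the new classes."""
    for p in net_a.parameters():          # freeze network A
        p.requires_grad = False
    net_a.eval()
    net_b = copy.deepcopy(net_a)          # initialize B from A's weights
    for p in net_b.parameters():
        p.requires_grad = True
    old_head = net_b.head                 # hypothetical final output layer
    net_b.head = torch.nn.Linear(old_head.in_features,
                                 old_head.out_features + num_new_classes)
    with torch.no_grad():                 # keep the old-class output weights
        net_b.head.weight[:old_head.out_features] = old_head.weight
        net_b.head.bias[:old_head.out_features] = old_head.bias
    return net_a, net_b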
As shown in fig. 4, the target detection model provided in the embodiment of the present application adopts an end-to-end network model structure and a single-stage approach at inference time: for each feature point on the final feature map of the target detection model, it must be predicted whether a target is present, together with the target's class and position coordinates. Compared with a traditional two-stage target detection algorithm, which must first find regions of interest and then identify and locate the targets within them, the single-stage detection algorithm achieves target detection with a single pass of feature extraction, improving detection speed. The loss function of the single-stage target detection model therefore must simultaneously include the judgment of whether a target exists at each feature point and the losses of the target's class and position coordinates. To realize incremental autonomous learning of the model, a knowledge distillation loss is also added during training. In summary, the loss functions of the target detection model include the position-coordinate loss, the confidence (IOU) loss, the classification loss and the knowledge distillation loss. That is, the loss function of the target detection model can be expressed as:

Loss = l_coord + l_cls + l_iou + λ·l_d
where l_coord is the bounding box loss function, l_cls is the two-class cross-entropy loss function, l_iou is the confidence loss function, l_d is the knowledge distillation loss function, and λ is a weight value that can be flexibly set according to the characteristics of the detection target.
In one embodiment, the bounding box loss function l_coord consists of the error of the center coordinates and the error of the width and height of the bounding box, and is calculated as:

l_coord = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} I_{ij}^{obj} [ (x_i − x̂_i)² + (y_i − ŷ_i)² ] + λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} I_{ij}^{obj} [ (√w_i − √ŵ_i)² + (√h_i − √ĥ_i)² ]

In the above formula, λ_coord is the weight of the coordinate error; i = 0, 1, …, S² indexes the i-th grid cell; B is the number of bounding boxes in each grid cell; j = 0, 1, …, B indexes the j-th bounding box in each grid cell; I_{ij}^{obj} is 1 if a defect target exists in the j-th prediction box of the i-th cell and 0 otherwise; (x, y) are the center coordinates of each grid cell and (x̂, ŷ) the corresponding predicted values, and (w, h) and (ŵ, ĥ) are the width and height of the bounding box and their predicted values. It should be noted that, in the embodiment of the present application, each feature point in the feature map output by the last layer of the target detection model serves as a grid cell; that is, the last-layer feature map has as many grid cells as feature points. For example, when the size of the last-layer feature map of the target detection model is 13×13, a 13×13 grid is drawn on the feature map, and the loss function is used to determine whether a target exists in each grid cell, together with the target's class and position coordinates.
The two-class cross-entropy loss l_cls is calculated as:

l_cls = − Σ_{i=0}^{S²} [ δ_i · log(p_i(c)) + η_i · log(p̄_i(c)) ]

where δ_i and η_i indicate whether a target defect exists in the i-th grid cell: if it exists, δ_i = 1 and η_i = 0; if not, δ_i = 0 and η_i = 1; p_i(c) is the score for the predicted true target class, and p̄_i(c) is the score predicted as non-target.
The confidence loss function l_iou is:

l_iou = Σ_{i=0}^{S²} Σ_{j=0}^{B} I_{ij}^{obj} (C_i − Ĉ_i)² + λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} I_{ij}^{noobj} (C_i − Ĉ_i)²

where λ_noobj is the weight of the confidence loss when no target is detected; I_{ij}^{noobj} takes the value 0 or 1 and indicates whether a defect target is absent from the j-th bounding box of the i-th grid cell (0 if a target exists, 1 if not); C_i is the score value of the target class, and Ĉ_i is the overlap ratio (IOU) between the predicted box and the ground-truth box.
The knowledge distillation loss function is defined as:

l_d = − (1 / L_total) Σ_{i} Σ_{j} h′_{i,j} · log(k′_{i,j})

where L_total and N respectively represent the number of samples and the number of target classes selected from the old data set; h′_{i,j} represents the class probability distribution, through the softmax layer, of the old deep convolutional neural network model's (network A's) prediction for the j-th object of the i-th class in the data set, and k′_{i,j} represents the class probability distribution, through the softmax layer, of the output features (logits) of the new deep convolutional neural network model (network B) for the same object.
In the training process, the derivatives of the total error with respect to each parameter in the network are calculated by the back-propagation algorithm, the parameters are then updated by gradient descent, and the network parameters are updated over multiple iterations to complete the training of the model.
It should be noted that, because defect types are numerous, their characteristics are difficult to define, and many defects, for example in the semiconductor field, appear only during production, so computer-vision-based methods for detecting the surface defects of articles or products run into difficulty. In addition, in the field of industrial inspection, data come from many sources and carry different structural information, and in practical applications new detection targets (new detection classes) may need to be added to the model from time to time. If the model is retrained whenever new-source data are added, time and labor are wasted and production efficiency suffers; moreover, as the data keep growing, the training data set becomes huge and occupies a large amount of computer storage. Therefore, it is important to effectively extract the features of various defect targets and to give the model the capability of incremental autonomous learning.
In the embodiment of the application, in order to make the target detection algorithm suitable for product defect detection scenarios in which the data set changes continuously, the target detection model has the capability of incremental autonomous learning while its feature extraction capability is improved through the deep convolutional neural network model, so as to solve the catastrophic forgetting of original targets that occurs when the target detection model is updated with new data. The target detection model of the embodiment of the application may include two deep convolutional neural network models and adopts a distillation learning scheme: the existing deep convolutional neural network model already trained on the old classes guides the training of the new deep convolutional neural network model, and a distillation loss function is added to improve the incremental learning capability of the target detection model. New detection target classes can thus be added while maintaining detection capability, without retraining on the whole data set, which greatly improves training efficiency and solves the catastrophic forgetting of original targets caused by updating the network with new data.
Detection method part
Having introduced the training of the target detection model provided by the embodiments of the application, a multi-source information fusion detection method based on incremental autonomous learning is provided on the basis of that target detection model. The method can be used in various fields, for example in electronic manufacturing for the surface defects of components. In the field of semiconductor manufacturing, for example, it is used to detect semiconductor defects such as various defects on semiconductor surfaces; it can remarkably improve the ability of the target detection model to extract defect texture features, particularly for small defects such as gaps and scratches, and new target classes can be added for detection at any time. It can be used for surface defect detection of products such as semiconductors and 3C electronics, and thus has wide application scenarios and high value.
In an embodiment, a multi-source information fusion detection method based on incremental autonomous learning is provided, including: obtaining a detected image, and inputting the detected image into a trained target detection model to obtain a defect detection result; the target detection model comprises a deep convolution neural network model DMSNet, the deep convolution neural network model DMSNet comprises a multi-source information deep fusion neural network and a defect detection network, the multi-source information deep fusion neural network comprises a parallel feature extraction network and a self-adaptive feature interaction fusion module, the parallel feature extraction network comprises a full convolution network feature extraction channel and an edge perception feature enhancement channel, the full convolution network feature extraction channel is used for extracting a deep texture feature map corresponding to a detected image, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map; the self-adaptive feature interaction fusion module is used for fusing the depth texture feature map and the edge texture feature map to obtain a fusion feature map; the defect detection network is used for detecting defect targets based on the fusion characteristic diagram.
It can be seen that, according to the multi-source information fusion detection method based on incremental autonomous learning provided by the embodiment of the application, different texture features of a detected product or object, including depth texture features and edge texture features, can be respectively extracted through two parallel feature extraction channels, and then the extracted features of the two parallel channels are fused through a self-adaptive feature interaction fusion module to form a multi-source information depth fusion module (Deep fusion of multi-source information), and a depth convolutional neural network model DMSNet for defect detection, such as semiconductor defect detection, is built by using the multi-source information depth fusion module and a subsequent defect detection network, so that effective defect features can be extracted, and defect detection capability is improved.
The defect detection network in the deep convolutional neural network model DMSNet may be a conventional defect detection network, which is not limited in particular, for example, a prediction and regression neural network for a certain target class (such as a semiconductor defect), and the multi-source information deep fusion neural network in the deep convolutional neural network model DMSNet is a feature extraction network specifically proposed by the present application, specifically, the multi-source information deep fusion neural network includes a parallel feature extraction network and an adaptive feature interaction fusion module, and each part of the multi-source information deep fusion neural network provided by the present application is described in detail below.
The parallel feature extraction network provided by the embodiment of the application comprises two parallel feature extraction channels, namely a full convolution network feature extraction channel and an edge perception feature enhancement channel, which are respectively used for extracting depth texture features and corresponding edge texture features of a defect target. The working processes of the full convolution network feature extraction channel, the edge perception feature enhancement channel and the self-adaptive feature interaction fusion module are respectively described below.
1. Full convolution network feature extraction channel
In one embodiment, the full convolution network feature extraction channel comprises a plurality of grouped convolution blocks, each grouped convolution block comprising a successive 3 x 3 convolution kernel and a 1 x 1 convolution kernel. It should be noted that, the number of the grouped convolution blocks can be flexibly set according to specific application scenarios and task requirements, and the embodiment of the application is not limited.
As shown in fig. 1, fig. 1 is a schematic structural diagram of a full convolution network feature extraction channel according to an embodiment of the present application, where the full convolution network feature extraction channel includes a plurality of grouped convolution blocks, and each grouped convolution block includes consecutive 3×3 and 1×1 convolution layers. That is, the full convolution network feature extraction channel builds a fully convolutional neural network framework from grouped convolution blocks each formed by consecutive 3×3 and 1×1 convolution layers.
The feature extraction channel of the full convolution network can be used for realizing the dimension transformation of the feature map by adjusting the step length of the convolution kernel instead of the traditional pooling layer in the forward propagation process of the network, so that the feature information loss can be reduced, the high resolution of convolution output is ensured, and the detection precision is improved. It should be further noted that the multi-layer 3×3 convolution kernel has a stronger nonlinear expression capability than the conventional 5×5 convolution kernel or other size convolution kernel, and can use fewer parameters to achieve the same receptive field. And a convolution layer of 1 multiplied by 1, which is continuous with the convolution kernel of 3 multiplied by 3, is configured, so that the model complexity can be reduced as a dimension reduction module in the forward propagation process of the network, and feature fusion is carried out on feature graphs on different channels with the same depth in the forward propagation process, so that the feature extraction capability of a detection target is enhanced.
Illustratively, continuing with fig. 1, assume that the input image X is an M-channel feature map of size H′ × W′. The 3×3 convolution kernels match the channel dimension of the input X, so one filter can be applied to each input channel separately; the filter outputs are then fused by N-dimensional 1×1 convolution kernels to obtain the complete convolution output Y, an N-channel feature map of size H × W. It can be seen that the computational cost is greatly reduced by the grouped convolution blocks; a minimal sketch of one such block follows.
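Purely as an illustration of this structure, the following is a minimal sketch of one grouped convolution block (a sketch under assumptions: the class name, channel counts and the depthwise grouping are illustrative choices, not values fixed by the application):

```python
import torch
import torch.nn as nn

class GroupedConvBlock(nn.Module):
    """One grouped convolution block: a 3x3 convolution followed by a 1x1 convolution.

    A stride > 1 on the 3x3 convolution performs the feature-map dimension
    transformation instead of a pooling layer; the 1x1 convolution fuses the
    per-channel outputs and sets the output channel dimension.
    """

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        # 3x3 convolution applied per input channel (groups=in_ch): one filter per channel.
        self.conv3 = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                               padding=1, groups=in_ch, bias=False)
        # 1x1 convolution fuses features across channels and reduces/expands the dimension.
        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.conv1(self.act(self.conv3(x))))

# Example: an H' x W' x M input mapped to an H x W x N output with stride 2, no pooling layer.
# block = GroupedConvBlock(in_ch=64, out_ch=128, stride=2)
```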
2. Edge-aware feature enhancement channel
In order to further improve the feature extraction capability of the deep convolutional neural network model DMSNet, the embodiment of the application further adds a feature enhancement channel focusing on edge feature extraction, namely an edge perception feature enhancement channel, to the deep convolutional neural network model DMSNet, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map.
As shown in fig. 2, fig. 2 is a schematic structural diagram of an edge-aware feature enhancement channel according to an embodiment of the present application, where the edge-aware feature enhancement channel is further specifically configured to: compressing an image channel and reducing the feature dimension of the depth texture feature map, and extracting features of the feature map subjected to dimension reduction by using an edge feature extraction convolution kernel (EFC convolution kernel for short in the application) to obtain an edge texture feature map; the calculation formula of the edge feature extraction convolution kernel is as follows:
L′_{i,o} = L_{i,o} ∘ G(c, v), where L_{i,o} represents the target convolution kernel, ∘ represents a dot product operation, G(c, v) represents a linear filter used for edge extraction (which may be a set of linear filters), c represents the direction of the linear filter, v represents the scale of the linear filter, the dimension of the target convolution kernel is consistent with the number of directions of the linear filter, and L′_{i,o} represents the convolution kernel after being filtered by the linear filter.
As shown in FIG. 2, D_k × D_k × M in FIG. 2 is the target convolution kernel, which may be a conventional convolution kernel; D_k × D_k represents the spatial size of the target convolution kernel, and M represents the number of convolution kernels of the target convolution kernel, which is not particularly limited.
In an embodiment, the edge-aware feature enhancement channel is a PCA edge-aware feature enhancement channel. It performs PCA principal component analysis on the feature map U output by the full convolution network feature extraction channel to achieve image channel compression and feature dimension reduction, then performs edge feature extraction on the dimension-reduced feature map U' through the novel edge feature extraction convolution kernel, and outputs the edge texture feature map X corresponding to the feature map U; illustratively, the linear filter includes a Gabor filter. In this embodiment, the edge feature extraction convolution kernel (EFC) is designed by exploiting the strong edge-extraction capability of linear filters such as the Gabor filter and combining the Gabor filter with the conventional target convolution kernel, so as to obtain a novel convolution kernel and improve the edge feature extraction capability of the model. Specifically:
In one embodiment, compressing the image channels and reducing the feature dimension of the depth texture feature map includes:

performing a normalization calculation on the depth texture feature map U_1 as follows:

Z_ij = (u_ij − ū_j) / s_j,

where N represents the dimension of the depth texture feature map U_1, Z_ij denotes the matrix obtained after the normalization calculation of the depth texture feature map U_1, the depth texture feature map U_1 is an N-dimensional vector of size H_1 × W_1, each sample in the depth texture feature map U_1 is denoted u_ij, s = H_1 × W_1, ū_j denotes the mean value, and s_j denotes the standard deviation;

obtaining the correlation coefficient matrix R_1 of the depth texture feature map U_1;

calculating the eigenvalues λ_i (i = 1, 2, …, N) of the correlation coefficient matrix R_1 that satisfy the characteristic equation |λI − R_1| = 0;

sorting the eigenvalues λ_i in order of magnitude, calculating, according to the principal component analysis method, the eigenvector e_i corresponding to each eigenvalue λ_i, and then obtaining the dimension-reduced feature map U_1' from the eigenvalues λ_i and their corresponding eigenvectors e_i. A minimal code sketch of this PCA-based reduction is given after this list.
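Purely for illustration, the following is a minimal sketch of this PCA-based channel compression (assuming a C × H × W tensor layout; the function name pca_reduce and the `keep` parameter are hypothetical and not taken from the application):

```python
import torch

def pca_reduce(u1: torch.Tensor, keep: int) -> torch.Tensor:
    """PCA-style channel compression of a feature map.

    u1:   depth texture feature map of shape (N, H1, W1) -- N channels.
    keep: number of principal components to retain (hypothetical parameter).
    Returns a reduced feature map of shape (keep, H1, W1).
    """
    n, h, w = u1.shape
    s = h * w
    # Flatten spatial positions: rows are the s samples, columns the N channels.
    samples = u1.reshape(n, s).t()                      # (s, N)
    # Normalize each channel: Z_ij = (u_ij - mean_j) / std_j.
    mean = samples.mean(dim=0, keepdim=True)
    std = samples.std(dim=0, keepdim=True) + 1e-8
    z = (samples - mean) / std
    # Correlation coefficient matrix R1 of the normalized channels.
    r1 = (z.t() @ z) / (s - 1)                          # (N, N)
    # Eigen-decomposition of R1; sort eigenvalues in descending order of magnitude.
    eigvals, eigvecs = torch.linalg.eigh(r1)
    order = torch.argsort(eigvals, descending=True)
    components = eigvecs[:, order[:keep]]               # (N, keep)
    # Project the normalized samples onto the leading eigenvectors.
    reduced = z @ components                            # (s, keep)
    return reduced.t().reshape(keep, h, w)
```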
In this embodiment, after the dimension-reduced feature map U_1' is obtained, the EFC convolution kernel, i.e. the dot product of the Gabor filter and the target convolution kernel, is used to perform feature extraction on the dimension-reduced features to obtain the edge texture feature map. By reducing the dimension of the feature map U_1 and extracting features with the edge feature extraction convolution kernel, the ability of the deep convolutional neural network model DMSNet to extract the edge texture features of defect targets is effectively improved.
As shown in FIG. 2, the edge-aware feature enhancement channel provided by the embodiment of the application can perform PCA principal component analysis on the output U_1 of the first grouped convolution block of the full convolution network feature extraction channel to achieve channel compression and feature dimension reduction, and then perform edge feature extraction on the dimension-reduced feature U_1' with the edge feature extraction convolution kernel (EFC) to obtain the output feature X_1. The same processing is applied to the output feature maps of the other grouped convolution blocks, thereby obtaining the edge texture feature map corresponding to each grouped convolution block; an illustrative sketch of such a Gabor-modulated convolution is given below.
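For illustration only, the following sketch shows one way a Gabor-modulated convolution kernel could be formed and applied (a sketch under assumptions: the gabor_bank construction, the kernel size and the one-direction-per-output-kernel pairing are illustrative choices, not details fixed by the application):

```python
import math
import torch
import torch.nn.functional as F

def gabor_bank(k: int, directions: int, scale: float = 1.0) -> torch.Tensor:
    """Build `directions` real Gabor-like filters of size k x k (assumed parameters)."""
    half = (k - 1) / 2.0
    ys, xs = torch.meshgrid(torch.arange(k) - half, torch.arange(k) - half, indexing="ij")
    filters = []
    for d in range(directions):
        theta = d * math.pi / directions
        x_t = xs * math.cos(theta) + ys * math.sin(theta)
        y_t = -xs * math.sin(theta) + ys * math.cos(theta)
        g = torch.exp(-(x_t ** 2 + y_t ** 2) / (2 * scale ** 2)) * torch.cos(2 * math.pi * x_t / k)
        filters.append(g)
    return torch.stack(filters)              # (directions, k, k)

def efc_conv(x: torch.Tensor, weight: torch.Tensor, gabor: torch.Tensor) -> torch.Tensor:
    """Edge feature extraction convolution: modulate the target kernel with Gabor filters.

    x:      input feature map (B, C, H, W), e.g. the dimension-reduced map U1'.
    weight: target convolution kernel L of shape (M, C, k, k).
    gabor:  linear filters G(c, v) of shape (M, k, k) -- one direction per output kernel,
            matching the statement that the kernel dimension equals the number of directions.
    """
    # L'_{i,o} = L_{i,o} . G(c, v): element-wise (dot product) modulation of each kernel.
    modulated = weight * gabor.unsqueeze(1)  # (M, C, k, k)
    return F.conv2d(x, modulated, padding=weight.shape[-1] // 2)

# Example usage with hypothetical sizes:
# w = torch.randn(8, 16, 3, 3); bank = gabor_bank(3, 8); y = efc_conv(x, w, bank)
```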
3. Self-adaptive feature interaction fusion module
In the embodiment of the application, in order to fully utilize the defect texture features of different layers of the convolutional neural network, an adaptive feature interaction fusion module is provided. As shown in fig. 3, which is a schematic diagram of the fusion process of the adaptive feature interaction fusion module provided by the embodiment of the application, the adaptive feature interaction fusion module is used to fuse the depth texture feature map and the edge texture feature map to obtain a fusion feature map.
In an embodiment, the adaptive feature interaction fusion module is further configured to: calibrating the fusion feature map to obtain a calibrated feature map; the defect detection network is used for detecting a defect target based on the calibrated feature map. In this embodiment, the calibrated feature map is used to facilitate improved defect target detection capability and accuracy.
In an embodiment, calibrating the fused feature map to obtain a calibrated feature map includes:

performing a global average pooling operation on the feature map on each channel of the fused feature map, wherein W and H are respectively the width and the height of the feature map on each channel;

compressing the W × H dimension of the feature map on each channel to generate a channel statistic E ∈ R^n, calculated as

E_k = (1 / (W × H)) · Σ_{i=1}^{W} Σ_{j=1}^{H} D(i, j),

where E_k is the pooling value of the feature map on the k-th channel of the fused feature map and D(i, j) represents the feature map on each channel; 1 ≤ k ≤ K, where K represents the number of feature maps over all channels, 1 ≤ i ≤ W, and 1 ≤ j ≤ H;

performing an excitation operation on the E_k of each channel of the fused feature map using an activation function, to obtain the weight value β_k assigned to the pooling value of the feature map on each channel;

multiplying the excited weight value β_k by the corresponding feature map D(i, j) to obtain the calibrated feature map U(i, j) = D(i, j) · β_k.
In one embodiment, performing the excitation operation on the E_k of each channel of the fused feature map using an activation function, to obtain the weight value β_k assigned to the pooling value of the feature map on each channel, includes: performing the excitation operation on the E_k of each channel of the fused feature map using the following formula:

β_k = σ(w · (γ · E_k)),

where β_k is the weight value assigned to the pooling value E_k of the feature map on the k-th channel of the fused feature map, σ represents the sigmoid activation function, γ is the relu activation function, w is the fully-connected layer, and w · (γ · E_k) represents a nonlinear fully-connected operation performed on E_k. In this embodiment, feature fusion among the channels of the fused feature map can be realized: each channel is assigned a weight value β_k, the weight value of each channel of the fused feature map is computed through the sigmoid excitation, which increases the weight of effective features and suppresses the weight of ineffective features, and the excited weight value β_k is then multiplied by the corresponding feature map D(i, j) to obtain the recalibrated feature map U(i, j) = D(i, j) · β_k. This enhances the features in the feature map that favour the defect target of the detected object and suppresses interference features; that is, by activating the weight values of the feature maps on different channels in each layer of the network, effective feature weights are increased, ineffective feature weights are reduced, and the feature characterization capability of the model is improved. A minimal sketch of this recalibration is given below.
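As an illustration, the following is a minimal sketch of this squeeze-and-excitation-style recalibration applied to a fused feature map (the module name and the concatenation-based fusion are assumptions made for the example, not details fixed by the application):

```python
import torch
import torch.nn as nn

class FeatureFusionRecalibration(nn.Module):
    """Fuse two feature maps and recalibrate channels: beta_k = sigmoid(w . relu(E_k))."""

    def __init__(self, fused_channels: int):
        super().__init__()
        # fused_channels must equal the channel count of the concatenated feature maps.
        self.fc = nn.Linear(fused_channels, fused_channels)    # fully-connected layer w

    def forward(self, depth_feat: torch.Tensor, edge_feat: torch.Tensor) -> torch.Tensor:
        # Fuse the depth texture and edge texture feature maps (assumed: channel concat).
        d = torch.cat([depth_feat, edge_feat], dim=1)           # (B, K, H, W)
        b, k, h, w = d.shape
        # Squeeze: global average pooling gives the channel statistic E_k.
        e = d.mean(dim=(2, 3))                                   # (B, K)
        # Excite: beta_k = sigmoid( w . relu(E_k) ).
        beta = torch.sigmoid(self.fc(torch.relu(e)))             # (B, K)
        # Recalibrate: U(i, j) = D(i, j) * beta_k for every channel k.
        return d * beta.view(b, k, 1, 1)

# e.g. fusing a 64-channel depth texture map with a 64-channel edge texture map:
# module = FeatureFusionRecalibration(128)
```

Concatenation is only one possible fusion choice here; element-wise addition would work with the same recalibration step, provided the two feature maps share the same channel count.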
In an embodiment, the deep convolutional neural network model DMSNet provided by the present application may include at least one adaptive feature interaction fusion module, which is not limited in detail.
In one embodiment, the target detection model provided by the application is obtained by using a deep convolutional neural network model already trained on old classes to guide the training of a new deep convolutional neural network model. For the training process, reference may be made to the foregoing training embodiment, which is not repeated here.
The object detection model of the embodiment of the application improves the feature extraction capability through the deep convolutional neural network model and, at the same time, has the capability of incremental autonomous learning, so as to solve the problem of catastrophic forgetting of original targets when new data update the network. The target detection model of the embodiment of the application may include two deep convolutional neural network models and adopts a distillation learning scheme: the existing deep convolutional neural network model that has already been trained on the old classes guides the training of the new deep convolutional neural network model, and a distillation loss function is added to improve the incremental learning capability of the target detection model. In the process of training the target detection model, the detection capability can be further improved by adding new detection target classes without retraining on the whole data set, which greatly improves the training efficiency of the model while maintaining the detection capability and solves the problem of catastrophic forgetting of original targets caused by updating the network with new data. A hedged sketch of such a distillation term is given below.
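Purely as an illustration of the distillation idea described above (the exact loss used by the application is not reproduced here; the temperature, the weighting and the function names are assumptions), a response-distillation term could look like:

```python
import torch
import torch.nn.functional as F

def distillation_loss(new_logits_old_classes: torch.Tensor,
                      old_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL-based distillation between the old-class outputs of the new network
    and the outputs of the frozen network trained on the old classes.
    Parameter names and the temperature value are illustrative assumptions."""
    soft_targets = F.softmax(old_logits / temperature, dim=-1)
    log_probs = F.log_softmax(new_logits_old_classes / temperature, dim=-1)
    return F.kl_div(log_probs, soft_targets, reduction="batchmean") * temperature ** 2

# Total loss = detection loss on the new-class labels + lambda * distillation loss
# (lambda is a hypothetical balancing weight).
```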
It should be noted that, the training process of the object detection model including the two deep convolutional neural network models DMSNet may refer to the foregoing embodiment, and the description is not repeated here.
It should be understood that the order of execution of the processes should be determined by their functions and inherent logic, and should not be construed as limiting the implementation of the embodiments of the present application in any way.
Apparatus, medium and device part
In an embodiment, a multi-source information fusion model training device based on incremental autonomous learning is provided, where the multi-source information fusion model training device based on incremental autonomous learning corresponds to the multi-source information fusion model training method based on incremental autonomous learning in the above embodiment one by one. As shown in fig. 4, the multi-source information fusion model training device 10 based on incremental autonomous learning includes an acquisition module 101 and a training module 102. The functional modules are described in detail as follows:
an acquisition module 101, configured to acquire a training image set, where the training image set includes training images;
the training module 102 is configured to train the target detection network through the training image to obtain a trained target detection model;
the target detection model comprises a deep convolution neural network model, wherein the deep convolution neural network model comprises a multi-source information deep fusion neural network and a defect detection network, the multi-source information deep fusion neural network comprises a parallel feature extraction network and a self-adaptive feature interaction fusion module, the parallel feature extraction network comprises a full convolution network feature extraction channel and an edge perception feature enhancement channel, the full convolution network feature extraction channel is used for extracting a deep texture feature map corresponding to a detected image, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map; the self-adaptive feature interaction fusion module is used for fusing the depth texture feature map and the edge texture feature map to obtain a fusion feature map; the defect detection network is used for detecting defect targets based on the fusion characteristic diagram.
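To make the data flow of the architecture just described concrete, here is a hedged, high-level sketch of the forward pass (all module names are placeholders; the actual layer configuration of DMSNet is not reproduced here):

```python
import torch
import torch.nn as nn

class DualChannelBackbone(nn.Module):
    """High-level sketch: parallel feature extraction plus adaptive fusion.

    full_conv_channel : extracts the depth texture feature map.
    edge_channel      : turns the depth texture feature map into an edge texture feature map.
    fusion            : adaptive feature interaction fusion of the two maps.
    detector          : defect detection head operating on the fused feature map.
    All four sub-modules are placeholders to be supplied by the user.
    """

    def __init__(self, full_conv_channel: nn.Module, edge_channel: nn.Module,
                 fusion: nn.Module, detector: nn.Module):
        super().__init__()
        self.full_conv_channel = full_conv_channel
        self.edge_channel = edge_channel
        self.fusion = fusion
        self.detector = detector

    def forward(self, image: torch.Tensor):
        depth_feat = self.full_conv_channel(image)   # depth texture feature map
        edge_feat = self.edge_channel(depth_feat)    # edge texture feature map
        fused = self.fusion(depth_feat, edge_feat)   # fusion feature map
        return self.detector(fused)                  # defect detection result
```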
In an embodiment, a multi-source information fusion detection device based on incremental autonomous learning is provided, where the multi-source information fusion detection device based on incremental autonomous learning corresponds to the multi-source information fusion detection method based on incremental autonomous learning in the above embodiment one by one. As shown in fig. 5, the multi-source information fusion detection device 20 based on incremental autonomous learning includes an acquisition module 201 and an input module 202. The functional modules are described in detail as follows:
an acquisition module 201 for acquiring a detected image;
the input module 202 is configured to input the detected image into a trained target detection model to obtain a defect detection result;
the target detection model comprises a deep convolution neural network model, wherein the deep convolution neural network model comprises a multi-source information deep fusion neural network and a defect detection network, the multi-source information deep fusion neural network comprises a parallel feature extraction network and a self-adaptive feature interaction fusion module, the parallel feature extraction network comprises a full convolution network feature extraction channel and an edge perception feature enhancement channel, the full convolution network feature extraction channel is used for extracting a deep texture feature map corresponding to a detected image, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map; the self-adaptive feature interaction fusion module is used for fusing the depth texture feature map and the edge texture feature map to obtain a fusion feature map; the defect detection network is used for detecting defect targets based on the fusion characteristic diagram.
For specific limitations of the incremental autonomous learning-based multi-source information fusion model training device or the incremental autonomous learning-based multi-source information fusion detection device, reference may be made to the above description of the incremental autonomous learning-based multi-source information fusion model training method or the incremental autonomous learning-based multi-source information fusion detection method, which are not described herein. The above-mentioned multi-source information fusion model training device based on the incremental autonomous learning, or each module in the multi-source information fusion detection device based on the incremental autonomous learning can be realized wholly or partly by software, hardware and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server or a terminal device, and the internal structure thereof may be as shown in fig. 6. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The computer program is executed by a processor to realize a multi-source information fusion model training method based on incremental autonomous learning or a multi-source information fusion detection method based on incremental autonomous learning.
In one embodiment, a computer device is provided comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of when executing the computer program:
obtaining a detected image, and inputting the detected image into a trained target detection model to obtain a defect detection result;
the target detection model comprises a deep convolution neural network model, wherein the deep convolution neural network model comprises a multi-source information deep fusion neural network and a defect detection network, the multi-source information deep fusion neural network comprises a parallel feature extraction network and a self-adaptive feature interaction fusion module, the parallel feature extraction network comprises a full convolution network feature extraction channel and an edge perception feature enhancement channel, the full convolution network feature extraction channel is used for extracting a deep texture feature map corresponding to a detected image, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map; the self-adaptive feature interaction fusion module is used for fusing the depth texture feature map and the edge texture feature map to obtain a fusion feature map; the defect detection network is used for detecting defect targets based on the fusion characteristic diagram.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
obtaining a detected image, and inputting the detected image into a trained target detection model to obtain a defect detection result;
the target detection model comprises a deep convolution neural network model, wherein the deep convolution neural network model comprises a multi-source information deep fusion neural network and a defect detection network, the multi-source information deep fusion neural network comprises a parallel feature extraction network and a self-adaptive feature interaction fusion module, the parallel feature extraction network comprises a full convolution network feature extraction channel and an edge perception feature enhancement channel, the full convolution network feature extraction channel is used for extracting a deep texture feature map corresponding to a detected image, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map; the self-adaptive feature interaction fusion module is used for fusing the depth texture feature map and the edge texture feature map to obtain a fusion feature map; the defect detection network is used for detecting defect targets based on the fusion characteristic diagram.
In one embodiment, a computer device is provided comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of when executing the computer program:
acquiring a training image set, wherein the training image set comprises training images;
training the target detection network through the training image to obtain a trained target detection model;
the target detection model comprises a deep convolution neural network model, wherein the deep convolution neural network model comprises a multi-source information deep fusion neural network and a defect detection network, the multi-source information deep fusion neural network comprises a parallel feature extraction network and a self-adaptive feature interaction fusion module, the parallel feature extraction network comprises a full convolution network feature extraction channel and an edge perception feature enhancement channel, the full convolution network feature extraction channel is used for extracting a deep texture feature map corresponding to a detected image, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map; the self-adaptive feature interaction fusion module is used for fusing the depth texture feature map and the edge texture feature map to obtain a fusion feature map; the defect detection network is used for detecting defect targets based on the fusion characteristic diagram.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring a training image set, wherein the training image set comprises training images;
training the target detection network through the training image to obtain a trained target detection model;
the target detection model comprises a deep convolution neural network model, wherein the deep convolution neural network model comprises a multi-source information deep fusion neural network and a defect detection network, the multi-source information deep fusion neural network comprises a parallel feature extraction network and a self-adaptive feature interaction fusion module, the parallel feature extraction network comprises a full convolution network feature extraction channel and an edge perception feature enhancement channel, the full convolution network feature extraction channel is used for extracting a deep texture feature map corresponding to a detected image, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map; the self-adaptive feature interaction fusion module is used for fusing the depth texture feature map and the edge texture feature map to obtain a fusion feature map; the defect detection network is used for detecting defect targets based on the fusion characteristic diagram.
For specific limitations of the computer device and the computer storage medium, reference may be made to the above description of the method for training the multi-source information fusion model based on incremental autonomous learning, or the related limitations of the method for detecting multi-source information fusion based on incremental autonomous learning, which are not described herein.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include non-volatile and/or volatile memory. The nonvolatile memory can include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), memory bus direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), among others.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (9)

1. The multi-source information fusion detection method based on incremental autonomous learning is characterized by comprising the following steps of:
obtaining a detected image, and inputting the detected image into a trained target detection model to obtain a defect detection result;
The target detection model comprises a deep convolution neural network model, wherein the deep convolution neural network model comprises a multi-source information deep fusion neural network and a defect detection network, the multi-source information deep fusion neural network comprises a parallel feature extraction network and an adaptive feature interaction fusion module, the parallel feature extraction network comprises a full convolution network feature extraction channel and an edge perception feature enhancement channel, the full convolution network feature extraction channel is used for extracting a deep texture feature map corresponding to the detected image, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map; the self-adaptive feature interaction fusion module is used for fusing the depth texture feature map and the edge texture feature map to obtain a fusion feature map; the defect detection network is used for detecting a defect target based on the fusion feature map;
the edge perception feature enhancement channel is specifically configured to: compressing an image channel and reducing the feature dimension of the depth texture feature map, and extracting features of the feature map subjected to dimension reduction by using an edge feature extraction convolution kernel to obtain the edge texture feature map;
The calculation formula of the edge feature extraction convolution kernel is: L′_{i,o} = L_{i,o} ∘ G(c, v), where L_{i,o} represents the target convolution kernel, ∘ represents a dot product operation, G(c, v) represents a linear filter used for edge extraction, c represents the direction of the linear filter, v represents the scale of the linear filter, the dimension of the target convolution kernel is consistent with the number of directions of the linear filter, and L′_{i,o} represents the convolution kernel after being filtered by the linear filter; the linear filter comprises a Gabor filter.
2. The incremental autonomous learning based multi-source information fusion detection method of claim 1 wherein the full convolutional network feature extraction channel comprises a plurality of grouped convolutional blocks, each of the grouped convolutional blocks comprising a continuous 3 x 3 convolutional kernel and a 1 x 1 convolutional kernel.
3. The method for detecting multi-source information fusion based on incremental autonomous learning according to claim 1, wherein compressing the image channels and reducing the feature dimension of the depth texture feature map comprises:

performing a normalization calculation on the depth texture feature map U_1 as follows: Z_ij = (u_ij − ū_j) / s_j, where N represents the dimension of the depth texture feature map U_1, Z_ij denotes the matrix obtained after the normalization calculation of the depth texture feature map U_1, the depth texture feature map U_1 is an N-dimensional vector of size H_1 × W_1, each sample in the depth texture feature map U_1 is denoted u_ij, s = H_1 × W_1, ū_j denotes the mean value, and s_j denotes the standard deviation;

acquiring the correlation coefficient matrix R_1 of the depth texture feature map U_1;

calculating the eigenvalues λ_i (i = 1, 2, …, N) of the correlation coefficient matrix R_1 that satisfy the characteristic equation |λI − R_1| = 0;

sorting the eigenvalues λ_i in order of magnitude, calculating, according to the principal component analysis method, the eigenvector e_i corresponding to each eigenvalue λ_i, and obtaining the dimension-reduced feature map U_1' from the eigenvalues λ_i and their corresponding eigenvectors e_i.
4. The incremental autonomous learning-based multi-source information fusion detection method of claim 1, wherein the adaptive feature interaction fusion module is further configured to: calibrating the fusion feature map to obtain a calibrated feature map; the defect detection network is used for detecting a defect target based on the calibrated characteristic diagram.
5. The incremental autonomous learning-based multi-source information fusion detection method of claim 4, wherein calibrating the fusion feature map to obtain a calibrated feature map comprises:

performing a global average pooling operation on the feature map on each channel of the fused feature map, wherein W and H are respectively the width and the height of the feature map on each channel;

compressing the W × H dimension of the feature map on each channel to generate a channel statistic E ∈ R^n, calculated as E_k = (1 / (W × H)) · Σ_{i=1}^{W} Σ_{j=1}^{H} D(i, j), where E_k is the pooling value of the feature map on the k-th channel of the fused feature map and D(i, j) represents the feature map on each channel; 1 ≤ k ≤ K, where K represents the number of feature maps over all channels, 1 ≤ i ≤ W, and 1 ≤ j ≤ H;

performing an excitation operation on the E_k of each channel of the fused feature map using an activation function, to obtain the weight value β_k assigned to the pooling value of the feature map on each channel;

multiplying the excited weight value β_k by the corresponding feature map D(i, j) to obtain the calibrated feature map U(i, j) = D(i, j) · β_k.
6. The incremental autonomous learning-based multi-source information fusion detection method of claim 5, wherein performing the excitation operation on the E_k of each channel of the fused feature map using an activation function, to obtain the weight value β_k assigned to the pooling value of the feature map on each channel, comprises:

performing the excitation operation on the E_k of each channel of the fused feature map using the following formula:

β_k = σ(w · (γ · E_k)),

where β_k is the weight value assigned to the pooling value E_k of the feature map on the k-th channel of the fused feature map, σ represents the sigmoid activation function, γ is the relu activation function, w is the fully-connected layer, and w · (γ · E_k) represents a nonlinear fully-connected operation performed on E_k.
7. A multi-source information fusion model training method based on incremental autonomous learning is characterized by comprising the following steps:
acquiring a training image set, wherein the training image set comprises training images;
training the target detection network through the training image to obtain a trained target detection model;
the target detection model comprises a deep convolution neural network model, wherein the deep convolution neural network model comprises a multi-source information deep fusion neural network and a defect detection network, the multi-source information deep fusion neural network comprises a parallel feature extraction network and an adaptive feature interaction fusion module, the parallel feature extraction network comprises a full convolution network feature extraction channel and an edge perception feature enhancement channel, the full convolution network feature extraction channel is used for extracting a deep texture feature map corresponding to a detected image, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map; the self-adaptive feature interaction fusion module is used for fusing the depth texture feature map and the edge texture feature map to obtain a fusion feature map; the defect detection network is used for detecting a defect target based on the fusion feature map;
The edge perception feature enhancement channel is specifically configured to: compressing an image channel and reducing the feature dimension of the depth texture feature map, and extracting features of the feature map subjected to dimension reduction by using an edge feature extraction convolution kernel to obtain the edge texture feature map;
the calculation formula of the edge feature extraction convolution kernel is: L′_{i,o} = L_{i,o} ∘ G(c, v), where L_{i,o} represents the target convolution kernel, ∘ represents a dot product operation, G(c, v) represents a linear filter used for edge extraction, c represents the direction of the linear filter, v represents the scale of the linear filter, the dimension of the target convolution kernel is consistent with the number of directions of the linear filter, and L′_{i,o} represents the convolution kernel after being filtered by the linear filter; the linear filter comprises a Gabor filter.
8. The method for training a multi-source information fusion model based on incremental autonomous learning according to claim 7, wherein the training the target detection network through the training image to obtain a trained target detection model comprises:
migrating the trained network parameters of the detection model for old class detection into the target detection network, and training the target detection network through the training image to obtain a trained target detection model;
The network structure of the detection model for old class detection is the same as that of the target detection network.
9. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 8.
CN202210461803.1A 2022-04-28 2022-04-28 Multi-source information fusion detection and model training method and medium for incremental autonomous learning Active CN115965571B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210461803.1A CN115965571B (en) 2022-04-28 2022-04-28 Multi-source information fusion detection and model training method and medium for incremental autonomous learning

Publications (2)

Publication Number Publication Date
CN115965571A CN115965571A (en) 2023-04-14
CN115965571B true CN115965571B (en) 2023-08-22

Family

ID=87353701

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210461803.1A Active CN115965571B (en) 2022-04-28 2022-04-28 Multi-source information fusion detection and model training method and medium for incremental autonomous learning

Country Status (1)

Country Link
CN (1) CN115965571B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110298387A (en) * 2019-06-10 2019-10-01 天津大学 Incorporate the deep neural network object detection method of Pixel-level attention mechanism
CN110827256A (en) * 2019-10-31 2020-02-21 广东华中科技大学工业技术研究院 Optical and thermal infrared multi-stage imaging detection method and device for defects of transparent component
CN113065590A (en) * 2021-03-26 2021-07-02 清华大学 Vision and laser radar multi-mode data fusion method based on attention mechanism
CN113269775A (en) * 2021-06-11 2021-08-17 河南理工大学 Defect detection method and device based on multi-scale feature fusion SSD
US11222217B1 (en) * 2020-08-14 2022-01-11 Tsinghua University Detection method using fusion network based on attention mechanism, and terminal device
CN114358165A (en) * 2021-12-28 2022-04-15 湖南安华源电力科技有限公司 Detection method for preventing ground fault of photovoltaic module based on multi-source data fusion

Also Published As

Publication number Publication date
CN115965571A (en) 2023-04-14


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant