CN115965571A - Multi-source information fusion detection and model training method and medium for incremental autonomous learning - Google Patents

Multi-source information fusion detection and model training method and medium for incremental autonomous learning

Info

Publication number
CN115965571A
Authority
CN
China
Prior art keywords: feature, feature map, network, channel, fusion
Legal status: Granted
Application number: CN202210461803.1A
Other languages: Chinese (zh)
Other versions: CN115965571B (en)
Inventor
何良雨
崔健
刘彤
Current Assignee: Fengrui Lingchuang Zhuhai Technology Co., Ltd.
Original Assignee: Fengrui Lingchuang Zhuhai Technology Co., Ltd.
Application filed by Fengrui Lingchuang Zhuhai Technology Co., Ltd.
Priority to CN202210461803.1A
Publication of CN115965571A
Application granted
Publication of CN115965571B
Legal status: Active

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Image Analysis (AREA)

Abstract

The application relates to the field of deep learning and precision detection for intelligent manufacturing, in particular to semiconductor inspection, and discloses a multi-source information fusion detection method, a model training method, and a medium based on incremental autonomous learning. The method comprises the following steps: inputting the detected image into a trained target detection model for detection. The target detection model comprises a deep convolutional neural network model that includes a multi-source information fusion full convolution network feature extraction channel, an edge perception feature enhancement channel, and an adaptive feature interaction fusion module. The full convolution network feature extraction channel extracts the depth texture features of the detected image, the edge perception feature enhancement channel extracts the edge texture features within the depth texture features, and the adaptive feature interaction fusion module fuses the depth texture features and the edge texture features. During training, a network for detecting new target classes is added to the trained network for training; this incremental self-learning design preserves the incremental network's ability to detect defects of both new and old classes.

Description

Multi-source information fusion detection and model training method and medium for incremental autonomous learning
Technical Field
The application relates to the field of deep learning and precision detection for intelligent manufacturing, can be applied to semiconductor defect detection, and particularly relates to a multi-source information fusion detection method based on incremental autonomous learning, a model training method, and a storage medium.
Background
Articles and products are prone to various defects during production, such as device defects in electronic manufacturing and the many kinds of defects arising in ultra-precise semiconductor production processes. These defects affect production yield, product lifetime, and reliability. Surface defect detection for such articles or products is therefore a key link in quality control. Machine-vision-based surface defect detection offers high efficiency, high accuracy, and strong real-time performance, and has been widely researched and applied in the field of defect detection.
However, because defect types are numerous, their features are hard to define, and they appear only during production, computer-vision-based detection of surface defects on articles or products is difficult. In addition, in the field of industrial inspection there are many data sources carrying different structural information, so how to effectively extract the features of various defect targets for defect detection is an urgent technical problem to be solved.
Disclosure of Invention
The embodiments of the application provide a multi-source information fusion detection method, a model training method, and a storage medium based on incremental autonomous learning, so that the features of various defect targets can be extracted effectively and defect detection accuracy is improved. A distillation loss function is used to solve the catastrophic forgetting of original targets that occurs when the target detection model is updated with new data.
A multi-source information fusion detection method based on incremental autonomous learning comprises the following steps:
acquiring a detected image, and inputting the detected image into a trained target detection model to obtain a defect detection result;
the target detection model comprises a deep convolution neural network model, the deep convolution neural network model comprises a multi-source information deep fusion neural network and a defect detection network, the multi-source information deep fusion neural network comprises a parallel feature extraction network and a self-adaptive feature interaction fusion module, the parallel feature extraction network comprises a full convolution network feature extraction channel and an edge perception feature enhancement channel, the full convolution network feature extraction channel is used for extracting a deep texture feature map corresponding to a detected image, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map; the self-adaptive feature interaction fusion module is used for fusing the depth texture feature map and the edge texture feature map to obtain a fusion feature map; and the defect detection network is used for detecting the defect target based on the fusion characteristic diagram.
In one embodiment, the full convolution network feature extraction channel includes a plurality of grouped convolution blocks, each grouped convolution block including successive 3 × 3 and 1 × 1 convolution kernels.
In one embodiment, the edge perceptual feature enhancement channel is specifically configured to: compressing an image channel and reducing the dimension of the feature of the depth texture feature map, and extracting the feature of the feature map subjected to dimension reduction by using an edge feature extraction convolution kernel to obtain an edge texture feature map;
the calculation formula of the edge feature extraction convolution kernel is as follows:
Figure BDA0003622398240000021
L i,o denotes a target convolution kernel, denotes a dot product operation, G (c, v) denotes a linear filter for edge extraction, c denotes a direction of the linear filter, v denotes a scale of the linear filter, the dimension of the target convolution kernel matches the number of directions of the linear filter, and>
Figure BDA0003622398240000022
representing the convolution kernel after filtering by a linear filter.
In an embodiment, the linear filter comprises a Gabor filter.
In one embodiment, compressing the image channels of the depth texture feature map and reducing its feature dimension comprises the following steps:

The depth texture feature map U_1 is normalized as follows:

Z_{ij} = (u_{ij} − ū_j) / s_j

where N denotes the dimension of the depth texture feature map U_1; Z_{ij} denotes the matrix obtained by normalizing the depth texture feature map U_1; the depth texture feature map U_1 is an N-dimensional vector of size H_1 × W_1, each sample in the depth texture feature map U_1 is denoted u_{ij}, and s = H_1 × W_1; ū_j denotes the mean and s_j denotes the standard deviation.

The correlation coefficient matrix R_1 of the depth texture feature map U_1 is then obtained:

R_1 = Z^T Z / (s − 1)

The eigenvalues λ_i (i = 1, 2, …, N) satisfying |λI − R_1| = 0 are computed, where I is the identity matrix.

After the eigenvalues λ_i are sorted by magnitude, the eigenvector e_i corresponding to each eigenvalue λ_i is calculated according to the principal component analysis method, and the dimension-reduced feature map U_1′ is then obtained from the eigenvalues λ_i and their corresponding eigenvectors e_i.
In one embodiment, the adaptive feature interaction fusion module is further configured to: calibrating the fusion characteristic diagram to obtain a calibrated characteristic diagram; and the defect detection network is used for detecting the defect target based on the calibrated characteristic diagram.
In one embodiment, calibrating the fused feature map to obtain a calibrated feature map includes:

performing global average pooling on the feature map of each channel of the fused feature map, where W and H are respectively the width and height of the feature map on each channel;

compressing the W × H dimensions of the feature map on each channel to generate the channel statistic E ∈ R^n, calculated as:

E_k = (1 / (W × H)) Σ_{i=1}^{W} Σ_{j=1}^{H} D_k(i, j)

where E_k denotes the pooled value of the feature map on the k-th channel and D_k(i, j) denotes the feature map on each channel, with 1 ≤ k ≤ K, K denoting the number of channel feature maps, 1 ≤ i ≤ W, and 1 ≤ j ≤ H;

using an activation function to perform an excitation operation on E_k for each channel of the fused feature map to obtain the weight value β_k assigned to the pooled value E_k of the feature map on each channel;

multiplying the excited weight value β_k by the corresponding feature map D_(i,j) to obtain the calibrated feature map U_(i,j) = D_(i,j) · β_k.

In one embodiment, the excitation operation is performed on E_k for each channel of the fused feature map using the following formula:

β_k = σ(w · γ(E_k))

where β_k denotes the weight value assigned to the pooled value E_k of the feature map on the k-th channel of the fused feature map, σ denotes the sigmoid activation function, γ denotes the ReLU activation function, w denotes the fully connected layer, and w · γ(E_k) denotes a nonlinear fully connected operation on E_k.
A multi-source information fusion model training method based on incremental autonomous learning comprises the following steps:
acquiring a training image set, wherein the training image set comprises training images;
training the target detection network through a training image to obtain a trained target detection model;
the target detection model comprises a deep convolution neural network model, the deep convolution neural network model comprises a multi-source information deep fusion neural network and a defect detection network, the multi-source information deep fusion neural network comprises a parallel feature extraction network and a self-adaptive feature interaction fusion module, the parallel feature extraction network comprises a full convolution network feature extraction channel and an edge perception feature enhancement channel, the full convolution network feature extraction channel is used for extracting a deep texture feature map corresponding to a detected image, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map; the self-adaptive feature interaction fusion module is used for fusing the depth texture feature map and the edge texture feature map to obtain a fusion feature map; and the defect detection network is used for detecting the defect target based on the fusion characteristic diagram.
In one embodiment, training the target detection network through the training images to obtain the trained target detection model includes: migrating the network parameters of a trained detection model used for old-class detection into the target detection network, and then training the target detection network through the training images to obtain the trained target detection model; wherein the network structure of the detection model used for old-class detection is the same as the network structure of the target detection network.
A multi-source information fusion model training device based on incremental autonomous learning comprises:
the acquisition module is used for acquiring a training image set, and the training image set comprises training images;
the training module is used for training the target detection network through a training image to obtain a trained target detection model; the target detection model comprises a deep convolution neural network model, the deep convolution neural network model comprises a multi-source information deep fusion neural network and a defect detection network, the multi-source information deep fusion neural network comprises a parallel feature extraction network and a self-adaptive feature interaction fusion module, the parallel feature extraction network comprises a full convolution network feature extraction channel and an edge perception feature enhancement channel, the full convolution network feature extraction channel is used for extracting a deep texture feature map corresponding to a detected image, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map; the self-adaptive feature interactive fusion module is used for fusing the depth texture feature map and the edge texture feature map to obtain a fusion feature map; and the defect detection network is used for detecting the defect target based on the fusion characteristic diagram.
A multi-source information fusion detection device based on incremental autonomous learning comprises:
the acquisition module is used for acquiring an image to be detected;
the input module is used for inputting the detected image into the trained target detection model to obtain a defect detection result; the target detection model comprises a deep convolution neural network model, the deep convolution neural network model comprises a multi-source information deep fusion neural network and a defect detection network, the multi-source information deep fusion neural network comprises a parallel feature extraction network and a self-adaptive feature interaction fusion module, the parallel feature extraction network comprises a full convolution network feature extraction channel and an edge perception feature enhancement channel, the full convolution network feature extraction channel is used for extracting a deep texture feature map corresponding to a detected image, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map; the self-adaptive feature interactive fusion module is used for fusing the depth texture feature map and the edge texture feature map to obtain a fusion feature map; and the defect detection network is used for detecting the defect target based on the fusion characteristic diagram.
A computer device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the above multi-source information fusion detection method based on incremental autonomous learning or the above multi-source information fusion model training method based on incremental autonomous learning.
A computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the above-mentioned multisource information fusion detection method based on incremental autonomous learning, or the multisource information fusion model training method based on incremental autonomous learning.
In the solutions realized by the multi-source information fusion detection method based on incremental autonomous learning, the model training method, the apparatus, the computer device, and the storage medium, different texture features of a detected product or object, including depth texture features and edge texture features, can be extracted through two parallel feature extraction channels. The features extracted by the two parallel channels are fused by the adaptive feature interaction fusion module, and a deep convolutional neural network model for defect detection, such as semiconductor defect detection, is built together with the subsequent defect detection network, so that effective defect features can be extracted and the defect detection capability is improved.
In addition, while improving the feature extraction capability of the model through the deep convolutional neural network model, the target detection model of the embodiments of the application has incremental autonomous learning capability to solve the catastrophic forgetting of original targets caused by updating the network with new data. The target detection model of the embodiments of the application may comprise two deep convolutional neural network models and adopts a distillation learning scheme: the existing deep convolutional neural network model trained on old classes guides the training of the new deep convolutional neural network model, and a distillation loss function is added, improving the incremental learning capability of the target detection model. Adding a new detection target class therefore does not require retraining on the entire data set, which greatly improves training efficiency while maintaining detection capability, and solves the catastrophic forgetting of original targets caused when the network is updated with new data.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required to be used in the description of the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the description below are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings may be obtained according to these drawings without inventive labor.
Fig. 1 is a schematic structural diagram of a full convolution network feature extraction channel in an embodiment of the present application;
FIG. 2 is a schematic diagram of an edge-aware feature enhancement channel according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a fusion process of an adaptive feature interaction fusion module according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a multi-source information fusion model training process based on incremental autonomous learning according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a multi-source information fusion model training apparatus based on incremental autonomous learning according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a multi-source information fusion detection apparatus based on incremental autonomous learning according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a computer device in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The multi-source information fusion detection method based on incremental autonomous learning and the corresponding model training method provided by the application can be applied to a terminal device or a server. The model training method is generally applied to a server. The multi-source information fusion detection method based on incremental autonomous learning directly uses the trained target detection model and is generally applied to a terminal device, which may be, but is not limited to, a personal computer, a notebook computer, a smartphone, a tablet computer, or a portable wearable device. The server may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
The embodiment of the application comprises a multi-source information fusion detection method based on incremental autonomous learning, a multi-source information fusion model training method based on incremental autonomous learning, a device, equipment and a storage medium, and the embodiment of the application is completely described through the parts.
Model training part
The model training part includes a first training process and a second training process. The first training process is the general training process based on the multi-source information fusion model, and the second training process is the multi-source information fusion model training process with incremental autonomous learning added, i.e., the incremental autonomous learning training process. Both are described below.
An embodiment of the application provides a training method based on multi-source information fusion, which includes: acquiring a training image set, wherein the training image set comprises training images; training a target detection network through a training image to obtain a trained target detection model; the target detection model comprises a deep convolution neural network model DMSNet, wherein the deep convolution neural network model DMSNet comprises a multi-source information deep fusion neural network and a defect detection network, the multi-source information deep fusion neural network comprises a parallel feature extraction network and an adaptive feature interaction fusion module, the parallel feature extraction network comprises a full convolution network feature extraction channel and an edge perception feature enhancement channel, the full convolution network feature extraction channel is used for extracting a deep texture feature map corresponding to a detected image, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map; the self-adaptive feature interactive fusion module is used for fusing the depth texture feature map and the edge texture feature map to obtain a fusion feature map; and the defect detection network is used for detecting the defect target based on the fusion characteristic diagram.
It can be seen that the target detection model trained by the training method provided in the embodiment of the present application includes a Deep convolutional neural network model DMSNet, the Deep convolutional neural network model DMSNet can respectively extract different texture features of a detected product or object, including a Deep texture feature and an edge texture feature, through two parallel feature extraction channels, and then fuse the features extracted by the two parallel channels through an adaptive feature interaction fusion module to form a multi-source information Deep fusion module (Deep fusion of multi-source information), and the multi-source information Deep fusion module and a subsequent defect detection network are used to build a Deep convolutional neural network model DMSNet for defect detection, for example, for semiconductor defect detection, etc., so that effective defect features can be extracted, and the defect detection capability can be improved.
The defect detection network in the deep convolutional neural network model DMSNet may adopt a conventional defect detection network, and is not limited specifically, for example, the conventional defect detection network is used for prediction and regression neural network of a certain target category (for example, a semiconductor defect), and the multi-source information deep fusion neural network in the deep convolutional neural network model DMSNet is a feature extraction network specifically proposed in the present application.
The parallel feature extraction network provided by the embodiment of the application comprises two parallel feature extraction channels, namely a full convolution network feature extraction channel and an edge perception feature enhancement channel, which are respectively used for extracting the depth texture features and the corresponding edge texture features of a defect target. The working processes of the full convolution network feature extraction channel, the edge perception feature enhancement channel and the adaptive feature interaction fusion module are described below.
1. Full convolution network feature extraction channel
In one embodiment, the full convolution network feature extraction channel includes a plurality of grouped convolution blocks, each grouped convolution block including successive 3 × 3 and 1 × 1 convolution kernels. It should be noted that the number of grouped convolution blocks may be flexibly set according to the specific application scenario and task requirements, which is not limited in the embodiments of the application.
As shown in fig. 1, fig. 1 is a schematic structural diagram of the full convolution network feature extraction channel in an embodiment of the application; the full convolution network feature extraction channel includes a plurality of grouped convolution blocks, and each grouped convolution block includes consecutive 3 × 3 and 1 × 1 convolution layers. That is, the full convolution network feature extraction channel builds a fully convolutional neural network framework from grouped convolution blocks composed of multiple consecutive 3 × 3 and 1 × 1 convolution layers.
It should be noted that, through the full convolution network feature extraction channel, the dimension transformation of the feature map during forward propagation can be realized by adjusting the stride of the convolution kernel instead of using a traditional pooling layer, which reduces the loss of feature information, preserves the high resolution of the convolution output, and improves detection precision. It is also worth noting that stacked 3 × 3 convolution kernels have stronger nonlinear expression capability than a traditional 5 × 5 convolution kernel or other larger kernels, and can achieve the same receptive field with fewer parameters. A 1 × 1 convolution layer following the 3 × 3 convolution kernel serves as a dimension-reduction module during forward propagation, reducing model complexity, and fuses the feature maps on different channels at the same depth, enhancing the feature extraction capability for the detection target.
Illustratively, continuing with fig. 1, assuming that the input image X is an M-dimensional vector of H '× W' size, the dimension of the 3 × 3 convolution kernel is the same as that of the input image X, a filter of one convolution kernel may be used for each input channel separately, and then feature fusion is performed on the output of the filter by using an N-dimensional 1 × 1 convolution kernel to obtain a complete convolution output Y, where Y is an N-dimensional vector of H × W size. It can be seen that the amount of computation can be greatly reduced by the convolution approach of grouping the volume blocks.
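For illustration, a minimal PyTorch sketch of one such grouped convolution block is given below: a per-channel 3 × 3 convolution followed by a channel-fusing 1 × 1 convolution, with stride used in place of pooling for the dimension transformation. The channel counts, batch normalization, and activation function are illustrative assumptions and are not specified by the application.

```python
# Illustrative sketch (not the patent's reference code) of one grouped convolution
# block: a 3x3 convolution applied per channel, followed by a 1x1 convolution that
# fuses features across channels, with stride replacing a pooling layer.
import torch
import torch.nn as nn

class GroupedConvBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        # 3x3 convolution applied per input channel (grouped, depthwise-style)
        self.conv3x3 = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                 padding=1, groups=in_ch, bias=False)
        # 1x1 convolution fuses features across channels and changes dimensionality
        self.conv1x1 = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.conv1x1(self.conv3x3(x))))

# Example: an M=3 channel input of size H'xW' mapped to an N=64 channel output,
# with stride 2 performing the dimension change instead of a pooling layer.
x = torch.randn(1, 3, 224, 224)
block = GroupedConvBlock(in_ch=3, out_ch=64, stride=2)
y = block(x)          # shape: (1, 64, 112, 112)
print(y.shape)
```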
2. Edge-aware feature enhancement channel
In order to further improve the feature extraction capability of the deep convolutional neural network model DMSNet, in the embodiment of the application, a feature enhancement channel which is concentrated in edge feature extraction, that is, an edge perception feature enhancement channel, is further added to the deep convolutional neural network model DMSNet, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain the edge texture feature map.
As shown in fig. 2, fig. 2 is a schematic structural diagram of an edge-aware feature enhancement channel in an embodiment of the present application, where the edge-aware feature enhancement channel is further specifically configured to: compressing an image channel and reducing the dimension of the feature of the depth texture feature map, and then extracting the feature of the feature map subjected to dimension reduction by using an edge feature extraction convolution kernel (EFC convolution kernel for short in the application) to obtain an edge texture feature map; the calculation formula of the edge feature extraction convolution kernel is as follows:
EFC_{i,o} = L_{i,o} ∘ G(c, v)

where L_{i,o} denotes the target convolution kernel, ∘ denotes the dot product operation, G(c, v) denotes the linear filter used for edge extraction (which may be a set of linear filters), c denotes the direction of the linear filter, v denotes the scale of the linear filter, the dimension of the target convolution kernel is equal to the number of directions of the linear filter, and EFC_{i,o} denotes the convolution kernel after filtering by the linear filter, i.e., the edge feature extraction convolution kernel.
As shown in FIG. 2, D_k × D_k × M in FIG. 2 is the target convolution kernel, which may be a conventional convolution kernel; D_k × D_k denotes the kernel size of the target convolution kernel, and M denotes the number of target convolution kernels, which is not specifically limited.
In an embodiment, the edge-aware feature enhancement channel is a PCA edge-aware feature enhancement channel, and is configured to perform PCA principal component analysis on a feature map U output from the full convolution network feature extraction channel to implement compression and feature dimension reduction of an image channel, and then perform edge feature extraction on a feature map U' after the dimension reduction through a novel edge feature extraction convolution kernel, and output an edge texture feature map X corresponding to the feature map U.
In an embodiment, the linear filter comprises a Gabor filter. In this embodiment, the edge feature extraction convolution kernel (EFC convolution kernel) is a novel convolution kernel that combines a Gabor filter with a conventional target convolution kernel, exploiting the strong edge-feature extraction capability of linear filters such as the Gabor filter, thereby improving the model's ability to extract edge features. Specifically:
in one embodiment, the compression of the image channel and the feature dimension reduction are performed on the depth texture feature map, and the method comprises the following processes:
the depth texture feature map U 1 The normalization calculation was performed as follows:
Figure BDA0003622398240000091
n represents a depth texture feature map U 1 The dimension of (a);
Z ij is represented as the depth texture feature map U 1 The matrix after the standardization calculation is carried out, and the depth texture feature map U 1 Is H 1 ×W 1 A vector of size N, the depth texture feature map U 1 Each sample in (1) is denoted as u ij Wherein s = H 1 ×W 1
Figure BDA0003622398240000092
Represents a mean value, <' > is>
Figure BDA0003622398240000093
Represents the standard deviation;
obtaining a depth texture feature map U 1 Is calculated by the correlation coefficient matrix R 1
Figure BDA0003622398240000094
Calculating a correlation coefficient matrix R 1 Satisfies | λ I-R 1 When | =0, the eigenvalue λ obtained i (I =1,2, \8230;, N), I being the eigenvector of the eigenvalue λ;
for the characteristic value lambda i After being arranged according to the size sequence, the characteristic values lambda are respectively calculated according to a principal component analysis calculation method i Corresponding feature vector e i And then according to the characteristic value lambda i And a characteristic value lambda i Corresponding feature vector e i Obtaining a feature map U after dimension reduction 1 '。
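A minimal NumPy sketch of this PCA-style channel compression is given below, under the assumption that each spatial position of U_1 is treated as one N-dimensional sample; the number of retained principal components is an illustrative parameter not fixed by the application.

```python
# Hedged sketch of the PCA-based channel compression described above, assuming the
# depth texture feature map U1 has shape (N, H1, W1) and is flattened to s = H1*W1
# samples of dimension N.
import numpy as np

def pca_channel_reduce(u1: np.ndarray, n_components: int) -> np.ndarray:
    n, h1, w1 = u1.shape
    x = u1.reshape(n, h1 * w1).T                        # s x N, one sample per spatial position
    z = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)   # standardize each channel
    r = (z.T @ z) / (z.shape[0] - 1)                    # correlation coefficient matrix R1 (N x N)
    eigvals, eigvecs = np.linalg.eigh(r)                # eigenvalues/eigenvectors of R1
    order = np.argsort(eigvals)[::-1]                   # sort eigenvalues by magnitude
    e = eigvecs[:, order[:n_components]]                # principal eigenvectors e_i
    u1_reduced = (z @ e).T.reshape(n_components, h1, w1)  # projected, dimension-reduced map U1'
    return u1_reduced

# Example: compress a 64-channel feature map to 16 channels.
u1 = np.random.rand(64, 32, 32).astype(np.float32)
u1_prime = pca_channel_reduce(u1, n_components=16)
print(u1_prime.shape)   # (16, 32, 32)
```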
In this embodiment, after the dimension-reduced feature map U_1′ is obtained, the EFC convolution kernel, i.e., the dot product of the Gabor filter and the target convolution kernel, is used to extract features from the dimension-reduced feature map to obtain the edge texture feature map. Extracting features from the dimension-reduced feature map U_1′ with the edge feature extraction convolution kernel further improves the capability of the deep convolutional neural network model DMSNet to extract the edge texture features of defect targets.
As shown in fig. 2, with the edge perception feature enhancement channel provided in this embodiment of the application, PCA principal component analysis may be performed on the output U_1 of the first grouped convolution block of the full convolution network feature extraction channel to realize channel compression and feature dimension reduction; edge features are then extracted from the dimension-reduced feature map U_1′ using the edge feature extraction convolution kernel (EFC convolution kernel) to obtain the output feature map X_1. The same processing is applied to the output feature maps of the other grouped convolution blocks, so as to obtain the edge texture feature map corresponding to each grouped convolution block.
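The following sketch illustrates one way such an EFC convolution could be realized: a small bank of Gabor filters G(c, v) is multiplied element-wise (dot product) with a learnable target convolution kernel before convolving the dimension-reduced feature map. The Gabor parameterization, kernel size, and channel counts are illustrative assumptions, not details taken from the application.

```python
# Illustrative sketch of an edge feature extraction (EFC) convolution: Gabor filters
# modulate a learnable target kernel, and the modulated kernels convolve the input.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def gabor_kernel(theta: float, sigma: float = 2.0, lam: float = 4.0, k: int = 3) -> torch.Tensor:
    ys, xs = torch.meshgrid(torch.arange(k) - k // 2, torch.arange(k) - k // 2, indexing="ij")
    xr = xs * math.cos(theta) + ys * math.sin(theta)
    yr = -xs * math.sin(theta) + ys * math.cos(theta)
    return torch.exp(-(xr**2 + yr**2) / (2 * sigma**2)) * torch.cos(2 * math.pi * xr / lam)

class EFCConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, directions: int = 4, k: int = 3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.1)  # target kernel L
        g = torch.stack([gabor_kernel(c * math.pi / directions, k=k) for c in range(directions)])
        self.register_buffer("gabor", g)   # G(c, v): one filter per direction c
        self.directions = directions

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outs = []
        for c in range(self.directions):
            # modulated kernel: element-wise product of target kernel and Gabor filter
            w = self.weight * self.gabor[c].view(1, 1, *self.gabor[c].shape)
            outs.append(F.conv2d(x, w, padding=w.shape[-1] // 2))
        return torch.cat(outs, dim=1)       # edge texture feature map

x = torch.randn(1, 16, 32, 32)
efc = EFCConv(in_ch=16, out_ch=8, directions=4)
print(efc(x).shape)   # (1, 32, 32, 32): 8 kernels x 4 directions
```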
3. Adaptive feature interaction fusion module
In the embodiment of the application, in order to make full use of the defect texture features from convolutional neural network layers of different levels, an adaptive feature interaction fusion scheme is provided. As shown in fig. 3, fig. 3 is a schematic diagram of the fusion process of the adaptive feature interaction fusion module provided in the embodiment of the application. The adaptive feature interaction fusion module fuses the depth texture feature map and the edge texture feature map to obtain the fused feature map: the features extracted by the full convolution network feature extraction channel and the edge perception feature enhancement channel are fused, and the features learned from the same layer of the detected image are adaptively combined using selective feature aggregation, so that the network has the feature discrimination capability to enhance valuable features and suppress worthless features.
In an embodiment, the adaptive feature interaction fusion module is further configured to: calibrating the fusion characteristic diagram to obtain a calibrated characteristic diagram; and the defect detection network is used for detecting the defect target based on the calibrated characteristic diagram. In the embodiment, the defect target detection capability and accuracy are improved based on the calibrated characteristic diagram.
In one embodiment, calibrating the fused feature map to obtain a calibrated feature map includes:
performing global average pooling on the feature map of each channel of the fused feature map, where W and H are respectively the width and height of the feature map on each channel;

compressing the W × H dimensions of the feature map on each channel to generate the channel statistic E ∈ R^n, calculated as:

E_k = (1 / (W × H)) Σ_{i=1}^{W} Σ_{j=1}^{H} D_k(i, j)

where E_k denotes the pooled value of the feature map on the k-th channel and D_k(i, j) denotes the feature map on each channel, with 1 ≤ k ≤ K, K denoting the number of channel feature maps, 1 ≤ i ≤ W, and 1 ≤ j ≤ H;

using an activation function to perform an excitation operation on E_k for each channel of the fused feature map to obtain the weight value β_k assigned to the pooled value E_k of the feature map on each channel;

multiplying the excited weight value β_k by the corresponding feature map D_(i,j) to obtain the calibrated feature map U_(i,j) = D_(i,j) · β_k.

In one embodiment, the excitation operation is performed on E_k for each channel of the fused feature map using the following formula:

β_k = σ(w · γ(E_k))

where β_k denotes the weight value assigned to the pooled value E_k of the feature map on the k-th channel of the fused feature map, σ denotes the sigmoid activation function, γ denotes the ReLU activation function, w denotes the fully connected layer, and w · γ(E_k) denotes a nonlinear fully connected operation on E_k. Through this embodiment, feature fusion among the channels of the fused feature map can be realized. Each channel is assigned a weight value β_k; the sigmoid function then performs an excitation calculation on the weight value of each channel of the fused feature map, increasing the weights of effective features and suppressing the weights of ineffective features. The excited weight value β_k is multiplied by its corresponding feature map D_(i,j) to obtain the recalibrated feature map U_(i,j) = D_(i,j) · β_k. This process enhances the features in the feature map that are useful for the defect target of the detected object and suppresses interfering features; that is, the feature maps on different channels in each layer of the network are activated by the weight values, increasing the effective feature weights and decreasing the ineffective feature weights, thereby improving the feature characterization capability of the model.
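The channel recalibration described above resembles a squeeze-and-excitation operation; the sketch below shows one possible reading of it, with global average pooling producing E_k, a fully connected layer with ReLU and sigmoid producing β_k, and each channel rescaled by its weight. The single fully connected layer and its size are assumptions for illustration.

```python
# Sketch of the channel recalibration step: E_k from global average pooling,
# beta_k = sigmoid(w · relu(E_k)), then per-channel rescaling of the fused map.
import torch
import torch.nn as nn

class ChannelRecalibration(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.fc = nn.Linear(channels, channels)    # fully connected layer w
        self.relu = nn.ReLU(inplace=True)          # gamma
        self.sigmoid = nn.Sigmoid()                # sigma

    def forward(self, d: torch.Tensor) -> torch.Tensor:
        b, k, h, w = d.shape
        e = d.mean(dim=(2, 3))                      # E_k: global average pooling over W x H
        beta = self.sigmoid(self.fc(self.relu(e)))  # beta_k = sigma(w · gamma(E_k))
        return d * beta.view(b, k, 1, 1)            # U = D · beta_k, calibrated feature map

fused = torch.randn(2, 64, 32, 32)
calibrated = ChannelRecalibration(64)(fused)
print(calibrated.shape)   # (2, 64, 32, 32)
```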
In an embodiment, the DMSNet may include at least one adaptive feature interaction fusion module, which is not limited specifically.
In the embodiment of the application, the target detection network may be trained through training images to obtain the trained target detection model. In the training process using the training image set, a back-propagation algorithm may be used to calculate the derivatives of the total error with respect to each parameter in the network; the parameters are then updated by gradient descent, and the network parameters are updated over multiple iterations to complete the training of the target detection model.
In addition to the first training process, in the embodiment of the present application, the target detection model may also be trained based on an incremental learning manner, that is, a second training process, which is described below.
4. Incremental autonomous learning network model training process
In an embodiment, the training a target detection network through the training image to obtain a trained target detection model includes: transferring the trained network parameters of the detection model for old type detection to the target detection network, and training the target detection network through the training image to obtain a trained target detection model; wherein a network structure of the detection model for old class detection is the same as a network structure of the target detection network.
The application provides two of the above-described deep convolutional neural network models DMSNet, denoted network A and network B. As shown in fig. 4, the network structure during training contains two networks. One is network A, an existing deep convolutional neural network model DMSNet already trained on old classes; for example, network A is used for defect detection on some five classes. Because the model classifier needs to add defect detection for new classes, the other is network B, which is trained for the newly added defect classes and also adopts the deep convolutional neural network model DMSNet. In this embodiment the two together form a target detection incremental network, and network B is used for the newly added defect detection classes.
It can be seen that, in terms of the network structure, the target detection model provided in the embodiment of the application adds prediction and regression neurons for the newly detected target classes, i.e., network B (the currently trained deep convolutional neural network model DMSNet), to the last fully connected output layer of the trained network A; this incremental design preserves the detection capability of the incremental network B for defect targets of both new and old classes. During incremental learning training, the weights of network A are frozen, the weights of the incremental network B are initialized by copying the weights of network A, which already has detection capability for old-class targets, and a distillation loss is introduced to ensure that the detection capability of network A for old-class targets is transferred to network B. Only the training data of the new classes are used when training network B, which improves training efficiency.
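A hedged sketch of this incremental setup is given below: network B is initialized from the frozen, previously trained network A, and the last fully connected output layer is extended with neurons for the new classes. The module and attribute names (a dummy model with a `head` layer) are placeholders assumed for illustration; DMSNet itself is not reproduced.

```python
# Hedged sketch: freeze network A (teacher), copy it into network B (student), and
# extend B's output head with prediction neurons for the newly added classes.
import copy
import torch.nn as nn

def build_incremental_network(network_a: nn.Module, num_new_classes: int) -> nn.Module:
    # Freeze network A so it can serve as the distillation teacher.
    for p in network_a.parameters():
        p.requires_grad = False
    network_a.eval()

    # Network B starts as a copy of A, carrying over the old-class detection ability.
    network_b = copy.deepcopy(network_a)
    for p in network_b.parameters():
        p.requires_grad = True

    # Assumed: the last fully connected output layer is named `head`; extend it with
    # neurons for the new target classes while keeping the old-class weights.
    old_head = network_b.head
    new_head = nn.Linear(old_head.in_features, old_head.out_features + num_new_classes)
    new_head.weight.data[: old_head.out_features] = old_head.weight.data
    new_head.bias.data[: old_head.out_features] = old_head.bias.data
    network_b.head = new_head
    return network_b

# Toy demonstration with a dummy model (stands in for a trained DMSNet network A).
class Dummy(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 8, 3)
        self.head = nn.Linear(8, 5)   # 5 old classes

network_b = build_incremental_network(Dummy(), num_new_classes=2)
print(network_b.head.out_features)   # 7
```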
As shown in fig. 4, the target detection model provided in the embodiment of the application adopts an end-to-end network model structure, and a single-stage mode is used during model inference: for each feature point on the last-layer feature map of the target detection model, the model simultaneously predicts whether a target is present, together with the target's category and position coordinates. Traditional two-stage target detection algorithms must first find regions of interest and then identify and locate the targets within them; by comparison, the single-stage detection algorithm only needs to extract features once to perform target detection, which improves detection speed. Accordingly, the loss function of the single-stage target detection model must include the judgment of whether a target exists at a feature point and the loss calculation for the target's category and position coordinates. Moreover, in order to realize incremental autonomous learning of the model, a knowledge distillation loss must be added during training. In summary, the loss function of the target detection model includes the position coordinate loss, the confidence (IoU) loss, the classification loss, and the knowledge distillation loss, and can be expressed as:

Loss = l_coord + l_cls + l_iou + λ·l_d

where l_coord is the bounding box loss function, l_cls is the binary cross-entropy classification loss function, l_iou is the confidence loss function, l_d is the knowledge distillation loss function, and λ is a weight that can be flexibly set according to the characteristics of the detection target.
In one embodiment, the bounding box loss function l_coord includes the error of the center coordinates and the error of the width and height of the bounding box, and is calculated as:

l_coord = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} [ (x_i − x̂_i)² + (y_i − ŷ_i)² ] + λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} [ (√w_i − √ŵ_i)² + (√h_i − √ĥ_i)² ]

In the above formula, λ_coord is the weight of the coordinate error; i = 0, 1, ..., S² indexes the i-th grid cell; B is the number of bounding boxes in each grid cell, and j = 0, 1, ..., B indexes the j-th bounding box in each grid cell; 1_{ij}^{obj} indicates whether a defect target exists in the j-th prediction box of the i-th cell, taking 1 if it does and 0 otherwise; (x, y) is the center coordinate of each grid cell and (x̂, ŷ) is the corresponding predicted value. It should be noted that, in the embodiment of the application, each feature point in the feature map output by the last layer of the target detection model is treated as a grid cell, i.e., there are as many grid cells as there are feature points in the last-layer feature map. For example, when the size of the last-layer feature map of the target detection model is 13 × 13, a 13 × 13 grid is drawn over the feature map, and the loss function determines whether a target exists in each cell, the category of the target, and the position coordinates of the target.
The binary cross-entropy classification loss l_cls is calculated as:

l_cls = − Σ_i [ δ_i · log(p_i(c)) + η_i · log(p̂_i(c)) ]

where δ_i and η_i indicate whether the target defect exists in the i-th grid cell: if it exists, δ_i = 1 and η_i = 0; if not, δ_i = 0 and η_i = 1; p_i(c) is the predicted score for the true target category, and p̂_i(c) is the predicted score for the absence of a target.
The confidence loss function l_iou is:

l_iou = Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} (C_i − Ĉ_i)² + λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{noobj} (C_i − Ĉ_i)²

where λ_noobj is the weight of the confidence loss when no target is detected; 1_{ij}^{obj} takes 0 or 1 to indicate whether a defect target exists in the j-th bounding box of the i-th grid cell (1 if it exists, 0 otherwise); C_i is the score of the target category, and Ĉ_i is the overlap ratio (intersection over union) between the prediction box and the ground-truth box.
The knowledge distillation loss function is defined as:

l_d = − (1 / (N · l_total)) Σ_{i=1}^{N} Σ_{j=1}^{l_total} h′_{i,j} · log(k′_{i,j})

where l_total and N respectively denote the number of samples and the number of target classes selected from the old dataset; h′_{i,j} denotes the class probability distribution, after the softmax layer, of the old deep convolutional neural network model's (network A's) prediction for the j-th target of the i-th class in the dataset, and k′_{i,j} denotes the class probability distribution, after the softmax layer, of the new deep convolutional neural network model's (network B's) output feature (logit) for that target.
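The sketch below shows one possible implementation of this combined loss, with the distillation term written as a cross-entropy between network A's softened predictions and network B's outputs on the old classes; the exact normalization is an assumption, and the detection losses are passed in as precomputed values rather than derived here.

```python
# Hedged sketch of Loss = l_coord + l_cls + l_iou + lambda * l_d, showing only the
# distillation term and the weighted sum; detection losses are given as inputs.
import torch
import torch.nn.functional as F

def distillation_loss(logits_b_old: torch.Tensor, logits_a_old: torch.Tensor) -> torch.Tensor:
    """Cross-entropy between teacher (network A) and student (network B) distributions
    over the old classes, averaged over samples (an assumed normalization)."""
    h = F.softmax(logits_a_old, dim=-1)          # h'_{i,j}: teacher class probabilities
    log_k = F.log_softmax(logits_b_old, dim=-1)  # log k'_{i,j}: student log-probabilities
    return -(h * log_k).sum(dim=-1).mean()

def total_loss(l_coord, l_cls, l_iou, logits_b_old, logits_a_old, lam: float = 1.0):
    return l_coord + l_cls + l_iou + lam * distillation_loss(logits_b_old, logits_a_old)

# Example with dummy tensors: 4 old-class predictions over 5 classes.
la = torch.randn(4, 5)   # network A outputs (frozen teacher)
lb = torch.randn(4, 5)   # network B outputs (student)
print(total_loss(torch.tensor(0.3), torch.tensor(0.2), torch.tensor(0.1), lb, la, lam=0.5))
```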
In the training process, a back-propagation algorithm is used to calculate the derivatives of the total error with respect to each parameter in the network; the parameters are then updated by gradient descent, and the network parameters are updated over multiple iterations to complete the training of the model.
It should be noted that computer-vision-based detection of defects on the surface of an article or product, such as semiconductor defect detection in the semiconductor field, is difficult because defect types are numerous, their features are hard to define, and they appear only during production. In addition, in the field of industrial inspection there are many data sources carrying different structural information. In practical applications, new detection targets (new detection classes) may need to be added to the model from time to time; if the model is retrained every time new source data are added, time and labor are wasted and production efficiency suffers, and as the data keep growing the training data set becomes huge and seriously occupies computer storage space. Therefore, it is very important to effectively extract the features of various defect targets and to give the model the capability of incremental autonomous learning.
In the embodiment of the application, in order to make the target detection algorithm applicable to product defect detection scenarios with constantly changing data sets, the target detection model has incremental autonomous learning capability to solve the catastrophic forgetting of original targets caused when the target detection model is updated with new data, while the feature extraction capability of the model is improved through the deep convolutional neural network model. The target detection model of the embodiment of the application may comprise two deep convolutional neural network models and adopts a distillation learning scheme: the existing deep convolutional neural network model trained on old classes guides the training of the new deep convolutional neural network model, and a distillation loss function is added, improving the incremental learning capability of the target detection model. Adding a new detection target class therefore does not require retraining on the entire data set, which greatly improves training efficiency while maintaining detection capability, and solves the catastrophic forgetting of original targets caused when the network is updated with new data.
Part of the detection method
The training of the target detection model provided by the embodiment of the application has been introduced above. Based on this target detection model, a multi-source information fusion detection method based on incremental autonomous learning is provided, which can be used in various fields, for example in electronic manufacturing for the surface defects of components, or in semiconductor manufacturing for detecting various defects such as semiconductor defects. The method can significantly improve the target detection model's ability to extract defect texture features, especially for small defects such as gaps and scratches among semiconductor defects, and new target classes can be added for detection at any time. It can be used for surface defect detection of products such as semiconductors and 3C electronics, and has broad application scenarios and high value.
In one embodiment, a multi-source information fusion detection method based on incremental autonomous learning is provided, which includes: acquiring a detected image, and inputting the detected image into the trained target detection model to obtain a defect detection result; the target detection model comprises a deep convolution neural network model DMSNet which comprises a multi-source information deep fusion neural network and a defect detection network, wherein the multi-source information deep fusion neural network comprises a parallel feature extraction network and an adaptive feature interaction fusion module, the parallel feature extraction network comprises a full convolution network feature extraction channel and an edge perception feature enhancement channel, the full convolution network feature extraction channel is used for extracting a deep texture feature map corresponding to a detected image, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map; the self-adaptive feature interaction fusion module is used for fusing the depth texture feature map and the edge texture feature map to obtain a fusion feature map; and the defect detection network is used for detecting the defect target based on the fusion characteristic diagram.
It can be seen that the multisource information fusion detection method based on incremental autonomous learning provided by the embodiment of the application can respectively extract different texture features of a detected product or object through two parallel feature extraction channels, including a depth texture feature and an edge texture feature, and then fuse the features extracted by the two parallel channels through an adaptive feature interaction fusion module to form a multisource information depth fusion module (Deep fusion of multi-source information), and build a Deep convolution neural network model DMSNet for defect detection, for example, for semiconductor defect detection, by using the multisource information depth fusion module and a subsequent defect detection network, the method can effectively extract defect features and improve defect detection capability.
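For illustration, a minimal usage sketch of the detection step is given below: load the trained target detection model, run the detected image through it, and read out the predicted defect classes and boxes. The file names, preprocessing, and output format are assumptions for illustration only; the DMSNet structure itself is not reproduced here.

```python
# Hedged usage sketch of inference with a trained target detection model.
import torch
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((416, 416)),   # assumed input resolution
    transforms.ToTensor(),
])

model = torch.load("trained_dmsnet.pt", map_location="cpu")   # assumed checkpoint name
model.eval()

image = Image.open("inspected_sample.png").convert("RGB")     # assumed image path
with torch.no_grad():
    detections = model(preprocess(image).unsqueeze(0))
# `detections` is assumed to hold, per target: class, confidence, and box coordinates.
for det in detections:
    print(det)
```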
The defect detection network in the deep convolutional neural network model DMSNet may adopt a conventional defect detection network, and is not limited specifically, for example, the conventional defect detection network is used for prediction and regression neural network of a certain target category (for example, a semiconductor defect), and the multi-source information deep fusion neural network in the deep convolutional neural network model DMSNet is a feature extraction network specifically proposed in the present application.
The parallel feature extraction network provided by the embodiment of the application comprises two parallel feature extraction channels, namely a full convolution network feature extraction channel and an edge perception feature enhancement channel, which are respectively used for extracting the depth texture features and the corresponding edge texture features of a defect target. The working processes of the full convolution network feature extraction channel, the edge perception feature enhancement channel and the adaptive feature interaction fusion module are described below.
1. Full convolution network feature extraction channel
In one embodiment, the full convolution network feature extraction channel includes a plurality of grouped convolution blocks, each grouped convolution block including consecutive 3 × 3 and 1 × 1 convolution kernels. It should be noted that the number of grouped convolution blocks may be flexibly set according to the specific application scenario and task requirement, which is not limited in the embodiments of the present application.
As shown in fig. 1, which is a schematic structural diagram of the full convolution network feature extraction channel in an embodiment of the present application, the channel includes a plurality of grouped convolution blocks, and each grouped convolution block includes consecutive 3 × 3 and 1 × 1 convolution layers. That is, the full convolution network feature extraction channel builds a fully convolutional neural network framework from grouped convolution blocks composed of consecutive 3 × 3 and 1 × 1 convolution layers.
It should be noted that, in the forward propagation of the network, the full convolution network feature extraction channel realizes dimension transformation of the feature map by adjusting the stride of the convolution kernels instead of using a traditional pooling layer, which reduces the loss of feature information, preserves the high resolution of the convolution output, and improves detection accuracy. Stacked 3 × 3 convolution kernels also have stronger non-linear expression capability than a single conventional 5 × 5 convolution kernel or other kernel sizes, and can achieve the same receptive field with fewer parameters. The 1 × 1 convolution layer following each 3 × 3 convolution kernel serves as a dimension reduction module in the forward propagation process, reducing model complexity, and fuses the feature maps on different channels at the same depth, thereby enhancing the feature extraction capability for the detection target.
Illustratively, continuing with fig. 1, assume the input image X is an M-channel tensor of size H' × W'. The 3 × 3 convolution kernels match the channel dimension of X, one filter can be applied to each input channel separately, and the filter outputs are then fused by N 1 × 1 convolution kernels to obtain the complete convolution output Y, an N-channel tensor of size H × W. It can be seen that this grouped convolution block scheme greatly reduces the amount of computation. A minimal sketch of such a grouped convolution block is given after this paragraph.
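The following is a minimal PyTorch-style sketch of the grouped convolution block described above; the class name, channel counts, normalization/activation layers and stride handling are assumptions made here for illustration and are not fixed by this application. It stacks a 3 × 3 grouped convolution with a 1 × 1 pointwise convolution and uses stride instead of pooling for downsampling.

```python
import torch
import torch.nn as nn

class GroupedConvBlock(nn.Module):
    """Sketch of one grouped convolution block: a 3x3 grouped conv followed by a
    1x1 conv for cross-channel fusion; stride replaces pooling for downsampling."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1, groups: int = None):
        super().__init__()
        groups = groups or in_ch  # one filter per input channel (assumption)
        self.conv3x3 = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                 padding=1, groups=groups, bias=False)
        self.conv1x1 = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.conv3x3(x)   # per-channel 3x3 filtering
        x = self.conv1x1(x)   # 1x1 fusion across channels / dimension change
        return self.act(self.bn(x))

# Example: an H' x W' input with M=32 channels -> H x W output with N=64 channels
x = torch.randn(1, 32, 128, 128)
block = GroupedConvBlock(in_ch=32, out_ch=64, stride=2)
y = block(x)   # shape (1, 64, 64, 64)
```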
2. Edge-aware feature enhancement channel
In order to further improve the feature extraction capability of the deep convolutional neural network model DMSNet, in the embodiments of the present application a feature enhancement channel focused on edge feature extraction, namely the edge-aware feature enhancement channel, is further added to the deep convolutional neural network model DMSNet; this channel processes the depth texture feature map to obtain the edge texture feature map.
As shown in fig. 2, which is a schematic structural diagram of the edge-aware feature enhancement channel in an embodiment of the present application, the edge-aware feature enhancement channel is specifically configured to: compress the image channels and reduce the feature dimension of the depth texture feature map, and then extract features from the dimension-reduced feature map with an edge feature extraction convolution kernel (EFC convolution kernel for short in this application) to obtain the edge texture feature map. The edge feature extraction convolution kernel is calculated as follows:

\hat{L}_{i,o} = L_{i,o} \circ G(c, v)

where L_{i,o} represents the target convolution kernel, \circ represents the dot product operation, G(c, v) represents a linear filter (which may be a set of linear filters) used for edge extraction, c represents the direction of the linear filter, and v represents the scale of the linear filter; the dimension of the target convolution kernel is equal to the number of directions of the linear filter, and \hat{L}_{i,o} represents the convolution kernel after filtering by the linear filter.
As shown in fig. 2, D_k × D_k × M in fig. 2 is the target convolution kernel, which may be a conventional convolution kernel; D_k × D_k represents the size of the target convolution kernel, and M represents the number of target convolution kernels, which is not specifically limited.
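As a hedged illustration of how a Gabor-type linear filter can modulate a conventional convolution kernel in the manner of the EFC formula above, the following sketch builds a small Gabor filter bank and takes the element-wise (dot) product with a learned 3 × 3 kernel. The filter parameters, kernel size and function names are assumptions for illustration, not values fixed by this application.

```python
import numpy as np

def gabor_kernel(size: int, theta: float, sigma: float = 1.0,
                 lambd: float = 2.0, gamma: float = 0.5) -> np.ndarray:
    """Real part of a Gabor filter of given size and orientation theta (radians)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_t = x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(x_t**2 + gamma**2 * y_t**2) / (2 * sigma**2)) \
        * np.cos(2 * np.pi * x_t / lambd)

# Target convolution kernel L (e.g. a learned 3x3 kernel) and a bank of linear
# filters G(c, v) with several directions c at one scale v (assumed values).
L = np.random.randn(3, 3)
directions = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
G = [gabor_kernel(3, theta) for theta in directions]

# EFC-style modulation: one filtered kernel per direction, obtained via the
# dot (element-wise) product of the target kernel with each linear filter.
efc_kernels = np.stack([L * g for g in G])   # shape (4, 3, 3)
print(efc_kernels.shape)
```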
In an embodiment, the edge-aware feature enhancement channel is a PCA edge-aware feature enhancement channel. It performs principal component analysis (PCA) on a feature map U output by the full convolution network feature extraction channel to compress the image channels and reduce the feature dimension, then performs edge feature extraction on the dimension-reduced feature map U' with the edge feature extraction convolution kernel, and outputs the edge texture feature map X corresponding to the feature map U. In this embodiment, the edge feature extraction convolution kernel (EFC) is a novel convolution kernel designed by combining a Gabor filter with a conventional target convolution kernel, exploiting the strong edge extraction capability of linear filters such as the Gabor filter, thereby improving the model's ability to extract edge features. Specifically, the method comprises the following steps:
in one embodiment, the compression and feature dimension reduction of the image channel are performed on the depth texture feature map, and the method comprises the following steps:
the depth texture feature map U_1 is normalized as follows:

Z_{ij} = \frac{u_{ij} - \bar{u}_j}{\sigma_j}, \quad i = 1, 2, \ldots, s, \; j = 1, 2, \ldots, N

where N represents the dimension (number of channels) of the depth texture feature map U_1; Z_{ij} denotes the matrix obtained by normalizing the depth texture feature map U_1, which is an N-dimensional vector of size H_1 × W_1, each sample of which is denoted u_{ij}, with s = H_1 × W_1; \bar{u}_j = \frac{1}{s}\sum_{i=1}^{s} u_{ij} represents the mean, and \sigma_j = \sqrt{\frac{1}{s-1}\sum_{i=1}^{s}\left(u_{ij} - \bar{u}_j\right)^2} represents the standard deviation;

the correlation coefficient matrix R_1 of the depth texture feature map U_1 is then obtained:

R_1 = \frac{Z^{\mathrm{T}} Z}{s - 1}

the eigenvalues λ_i (i = 1, 2, …, N) of the correlation coefficient matrix R_1 are obtained by solving the characteristic equation |λI − R_1| = 0, where I is the identity matrix;

the eigenvalues λ_i are sorted by magnitude, the eigenvector e_i corresponding to each eigenvalue λ_i is computed according to the principal component analysis procedure, and the dimension-reduced feature map U_1' is obtained from the eigenvalues λ_i and their corresponding eigenvectors e_i.
In this embodiment, after the dimension-reduced feature map U_1' is obtained, the EFC convolution kernel, i.e. the dot product of the Gabor filter with the target convolution kernel, is applied to the dimension-reduced features to extract the edge texture feature map. By first reducing the dimension of the feature map U_1 and then extracting features with the edge feature extraction convolution kernel, the ability of the deep convolutional neural network model DMSNet to extract the edge texture features of defect targets is effectively improved.
As shown in fig. 2, with the edge-aware feature enhancement channel provided in this embodiment of the present application, the output U_1 of the first grouped convolution block of the full convolution network feature extraction channel is subjected to PCA to compress the channels and reduce the feature dimension, and edge features are then extracted from the dimension-reduced feature map U_1' with the edge feature extraction convolution kernel (EFC) to obtain the output feature X_1. The same processing is applied to the output feature maps of the other grouped convolution blocks, so that an edge texture feature map corresponding to each grouped convolution block is obtained. A numpy sketch of this channel-wise PCA reduction follows.
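The channel-wise PCA reduction described above can be sketched as follows; this is a minimal numpy illustration under stated assumptions (the number of retained components and the reshaping of the H_1 × W_1 × N feature map into an s × N sample matrix are choices made here for illustration, not fixed by this application).

```python
import numpy as np

def pca_channel_reduction(U1: np.ndarray, n_components: int) -> np.ndarray:
    """Compress the channel dimension of a feature map U1 of shape (H1, W1, N)
    by standardizing, building the correlation matrix, and projecting onto the
    leading eigenvectors."""
    H1, W1, N = U1.shape
    s = H1 * W1
    samples = U1.reshape(s, N)                  # u_ij: s samples, N channels

    mean = samples.mean(axis=0)
    std = samples.std(axis=0, ddof=1) + 1e-8    # avoid division by zero
    Z = (samples - mean) / std                  # standardized matrix Z_ij

    R1 = Z.T @ Z / (s - 1)                      # correlation coefficient matrix
    eigvals, eigvecs = np.linalg.eigh(R1)       # solve |lambda*I - R1| = 0
    order = np.argsort(eigvals)[::-1]           # sort eigenvalues by magnitude
    E = eigvecs[:, order[:n_components]]        # leading eigenvectors e_i

    U1_reduced = Z @ E                          # project onto principal components
    return U1_reduced.reshape(H1, W1, n_components)

# Example: reduce a 64-channel feature map to 16 channels
U1 = np.random.rand(32, 32, 64)
U1_prime = pca_channel_reduction(U1, n_components=16)
print(U1_prime.shape)   # (32, 32, 16)
```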
3. Adaptive feature interaction fusion module
In the embodiment of the present application, in order to fully utilize the defect texture features of convolutional neural network layers at different levels, an adaptive feature interaction fusion mode is provided. As shown in fig. 3, which is a schematic diagram of the fusion process of the adaptive feature interaction fusion module provided in the embodiment of the present application, the module fuses the depth texture feature map and the edge texture feature map to obtain a fusion feature map: the features extracted by the full convolution network feature extraction channel and the edge perception feature enhancement channel are fused, and a selective feature aggregation mode adaptively combines the features learned from the same layer of the detected image, so that the network gains the ability to discriminate features, enhancing valuable features and suppressing worthless ones. A sketch of one possible fusion step is given below.
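The following is a minimal PyTorch-style sketch of one way such an adaptive interaction fusion could be realized at a single network level: the depth texture feature map and the edge texture feature map are concatenated and mixed by a 1 × 1 convolution before channel recalibration. The concatenation-plus-1 × 1-convolution scheme and the layer names are assumptions for illustration; the application only fixes that the two feature maps of the same level are adaptively combined.

```python
import torch
import torch.nn as nn

class FeatureInteractionFusion(nn.Module):
    """Sketch: fuse the depth texture map and edge texture map of one level."""
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolution mixes the concatenated depth and edge features (assumption)
        self.mix = nn.Conv2d(2 * channels, channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, depth_feat: torch.Tensor, edge_feat: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([depth_feat, edge_feat], dim=1)   # aggregate both sources
        return self.act(self.bn(self.mix(fused)))           # fusion feature map

# Example: fuse two 64-channel feature maps of the same spatial size
d = torch.randn(1, 64, 32, 32)
x = torch.randn(1, 64, 32, 32)
fusion = FeatureInteractionFusion(64)
f = fusion(d, x)    # shape (1, 64, 32, 32)
```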
In one embodiment, the adaptive feature interaction fusion module is further configured to: calibrating the fusion characteristic diagram to obtain a calibrated characteristic diagram; and the defect detection network is used for detecting the defect target based on the calibrated characteristic diagram. In the embodiment, the defect target detection capability and accuracy are improved based on the calibrated characteristic diagram.
In one embodiment, calibrating the fused feature map to obtain a calibrated feature map includes:
performing global average pooling on the feature map on each channel of the fused feature map, wherein W and H are respectively the width and height of the feature map on each channel;
compressing the W × H dimension of the feature map on each channel to generate the channel statistic E ∈ R^K, calculated as:

E_k = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} D_k(i, j)

where E_k is the pooled value of the feature map on the k-th channel, D(i, j) represents the feature map on each channel, 1 ≤ k ≤ K, K is the total number of channel feature maps, 1 ≤ i ≤ W, and 1 ≤ j ≤ H;

applying an activation function to the statistic E_k on each channel of the fused feature map to perform an excitation operation and obtain the weight value β_k assigned to the pooled value E_k of the feature map on each channel;

multiplying the excited weight value β_k by the corresponding feature map D_(i,j) to obtain the calibrated feature map U_(i,j) = D_(i,j) · β_k.
In one embodiment, applying an activation function to the statistic E_k on each channel of the fused feature map to perform an excitation operation and obtain the weight value β_k assigned to the pooled value E_k of the feature map on each channel comprises: performing the excitation operation on E_k on each channel of the fused feature map using the following formula to obtain the weight value β_k assigned to the pooled value E_k of the feature map on each channel:

\beta_k = \sigma\left(w \cdot \gamma(E_k)\right)

where β_k is the weight value assigned to the pooled value E_k of the feature map on the k-th channel of the fused feature map, σ is the sigmoid activation function, γ is the ReLU activation function, w is the fully connected layer, and w · γ(E_k) denotes the non-linear fully connected operation performed on E_k. Through this embodiment, feature fusion among the channels of the fused feature map is realized and each channel is assigned a weight value β_k; the sigmoid function then performs the excitation calculation on the weight value of each channel of the fused feature map, increasing the weights of effective features and suppressing those of ineffective features. The excited weight value β_k is multiplied by the corresponding feature map D_(i,j) to obtain the recalibrated feature map U_(i,j) = D_(i,j) · β_k. In this way, the features in the feature map that are helpful for the defect target of the detected object are enhanced and interfering features are suppressed; that is, the feature maps on different channels of each network layer are activated by the weight values, increasing the weight of effective features and decreasing the weight of ineffective features, which improves the feature characterization capability of the model. A minimal sketch of this calibration is given below.
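The squeeze-and-excitation style calibration described by the formulas above can be sketched as follows; this is a minimal illustration, and the single fully connected layer, layer names and example sizes are choices made here rather than details fixed by this application.

```python
import torch
import torch.nn as nn

class ChannelCalibration(nn.Module):
    """Sketch of channel recalibration: global average pooling (E_k),
    excitation beta_k = sigmoid(w . relu(E_k)), then U = D * beta_k."""
    def __init__(self, channels: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)      # E_k for each channel
        self.fc = nn.Linear(channels, channels)  # fully connected layer w
        self.relu = nn.ReLU(inplace=True)        # gamma
        self.sigmoid = nn.Sigmoid()              # sigma

    def forward(self, D: torch.Tensor) -> torch.Tensor:
        b, k, _, _ = D.shape
        E = self.pool(D).view(b, k)                   # channel statistics E in R^K
        beta = self.sigmoid(self.fc(self.relu(E)))    # weight beta_k per channel
        return D * beta.view(b, k, 1, 1)              # calibrated map U = D * beta_k

# Example usage on a fused feature map with 64 channels
fused = torch.randn(1, 64, 32, 32)
calib = ChannelCalibration(64)
U = calib(fused)   # same shape, channel-wise recalibrated
```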
In an embodiment, the DMSNet may include at least one adaptive feature interaction fusion module, which is not limited specifically.
In an embodiment, the target detection model provided by the present application is a target detection model obtained by using a deep convolutional neural network model trained on old classes to guide the training of a new deep convolutional neural network model. The training process can refer to the description of the foregoing training embodiments and is not repeated here.
The target detection model provided by the embodiments of the present application improves the feature extraction capability of the model through the deep convolutional neural network model while also possessing incremental autonomous learning capability, which addresses the catastrophic forgetting of original targets that occurs when the network is updated with new data. The target detection model of the embodiments of the present application may include two deep convolutional neural network models and adopt a dedicated distillation learning scheme: the existing deep convolutional neural network model trained on old classes guides the training of the new deep convolutional neural network model, and a distillation loss function is added. This improves the incremental learning capability of the target detection model; when a new detection target class is added, the whole data set does not need to be retrained, the detection capability for old classes is preserved, and the training efficiency of the model is greatly improved. The problem of catastrophic forgetting of original targets caused by updating the network with new data is thereby solved, and the target detection capability is further improved. A hedged sketch of such a distillation loss is given below.
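As an illustration of how a distillation term can be added when the old-class model guides the new model, the following is a minimal sketch; the application does not specify the exact form of its distillation loss, so the soft-target KL-divergence term, the temperature, and the weighting factor used here are assumptions.

```python
import torch
import torch.nn.functional as F

def incremental_loss(new_logits: torch.Tensor,
                     old_model_logits: torch.Tensor,
                     targets: torch.Tensor,
                     num_old_classes: int,
                     temperature: float = 2.0,
                     distill_weight: float = 1.0) -> torch.Tensor:
    """Sketch: task loss on all classes + distillation loss that keeps the new
    model's predictions on old classes close to the frozen old model's outputs."""
    # Standard classification loss on the new data (placeholder: cross-entropy)
    task_loss = F.cross_entropy(new_logits, targets)

    # Distillation: match softened old-class outputs of the new and old models
    new_old_part = new_logits[:, :num_old_classes] / temperature
    old_part = old_model_logits[:, :num_old_classes] / temperature
    distill_loss = F.kl_div(F.log_softmax(new_old_part, dim=1),
                            F.softmax(old_part, dim=1),
                            reduction="batchmean") * (temperature ** 2)

    return task_loss + distill_weight * distill_loss

# Example: 5 old classes, 3 new classes (8 total), batch of 4
new_logits = torch.randn(4, 8, requires_grad=True)
old_logits = torch.randn(4, 5)          # from the frozen old-class model
targets = torch.randint(0, 8, (4,))
loss = incremental_loss(new_logits, old_logits, targets, num_old_classes=5)
loss.backward()
```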
It should be noted that the training process of the target detection model including the two deep convolutional neural network models DMSNet described above may refer to the foregoing embodiments, and the description is not repeated here.
It should be understood that the execution order of each process should be determined by its function and its inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Apparatus, medium, and device parts
In an embodiment, a multi-source information fusion model training device based on incremental autonomous learning is provided, which corresponds one-to-one to the multi-source information fusion model training method based on incremental autonomous learning in the foregoing embodiments. As shown in fig. 4, the multi-source information fusion model training apparatus 10 based on incremental autonomous learning includes an acquisition module 101 and a training module 102. The functional modules are explained in detail as follows:
an obtaining module 101, configured to obtain a training image set, where the training image set includes training images;
the training module 102 is configured to train a target detection network through a training image to obtain a trained target detection model;
the target detection model comprises a deep convolution neural network model, the deep convolution neural network model comprises a multi-source information deep fusion neural network and a defect detection network, the multi-source information deep fusion neural network comprises a parallel feature extraction network and a self-adaptive feature interaction fusion module, the parallel feature extraction network comprises a full convolution network feature extraction channel and an edge perception feature enhancement channel, the full convolution network feature extraction channel is used for extracting a deep texture feature map corresponding to a detected image, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map; the self-adaptive feature interactive fusion module is used for fusing the depth texture feature map and the edge texture feature map to obtain a fusion feature map; and the defect detection network is used for detecting the defect target based on the fusion characteristic diagram.
In an embodiment, a multi-source information fusion detection device based on incremental autonomous learning is provided, which corresponds one-to-one to the multi-source information fusion detection method based on incremental autonomous learning in the foregoing embodiments. As shown in fig. 5, the multi-source information fusion detection apparatus 20 based on incremental autonomous learning includes an acquisition module 201 and an input module 202. The functional modules are explained in detail as follows:
an acquiring module 201, configured to acquire a detected image;
an input module 202, configured to input the detected image into the trained target detection model to obtain a defect detection result;
the target detection model comprises a deep convolution neural network model, the deep convolution neural network model comprises a multi-source information deep fusion neural network and a defect detection network, the multi-source information deep fusion neural network comprises a parallel feature extraction network and a self-adaptive feature interaction fusion module, the parallel feature extraction network comprises a full convolution network feature extraction channel and an edge perception feature enhancement channel, the full convolution network feature extraction channel is used for extracting a deep texture feature map corresponding to a detected image, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map; the self-adaptive feature interaction fusion module is used for fusing the depth texture feature map and the edge texture feature map to obtain a fusion feature map; and the defect detection network is used for detecting the defect target based on the fusion characteristic diagram.
For specific limitations of the multi-source information fusion model training device based on incremental autonomous learning, or the multi-source information fusion detection device based on incremental autonomous learning, reference may be made to the limitation of the multi-source information fusion model training method based on incremental autonomous learning, or the multi-source information fusion detection method based on incremental autonomous learning, which is not described herein again. All modules in the multi-source information fusion model training device based on incremental autonomous learning or the multi-source information fusion detection device based on incremental autonomous learning can be completely or partially realized through software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent of a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server or a terminal device, and its internal structure diagram may be as shown in fig. 6. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The computer program is executed by a processor to realize a multi-source information fusion model training method based on incremental autonomous learning or a multi-source information fusion detection method based on incremental autonomous learning.
In one embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
acquiring a detected image, and inputting the detected image into a trained target detection model to obtain a defect detection result;
the target detection model comprises a deep convolution neural network model, the deep convolution neural network model comprises a multi-source information deep fusion neural network and a defect detection network, the multi-source information deep fusion neural network comprises a parallel feature extraction network and a self-adaptive feature interaction fusion module, the parallel feature extraction network comprises a full convolution network feature extraction channel and an edge perception feature enhancement channel, the full convolution network feature extraction channel is used for extracting a deep texture feature map corresponding to a detected image, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map; the self-adaptive feature interaction fusion module is used for fusing the depth texture feature map and the edge texture feature map to obtain a fusion feature map; and the defect detection network is used for detecting the defect target based on the fusion characteristic diagram.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring a detected image, and inputting the detected image into a trained target detection model to obtain a defect detection result;
the target detection model comprises a deep convolution neural network model, the deep convolution neural network model comprises a multi-source information deep fusion neural network and a defect detection network, the multi-source information deep fusion neural network comprises a parallel feature extraction network and a self-adaptive feature interaction fusion module, the parallel feature extraction network comprises a full convolution network feature extraction channel and an edge perception feature enhancement channel, the full convolution network feature extraction channel is used for extracting a deep texture feature map corresponding to a detected image, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map; the self-adaptive feature interactive fusion module is used for fusing the depth texture feature map and the edge texture feature map to obtain a fusion feature map; and the defect detection network is used for detecting the defect target based on the fusion characteristic diagram.
In one embodiment, there is provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
acquiring a training image set, wherein the training image set comprises training images;
training a target detection network through a training image to obtain a trained target detection model;
the target detection model comprises a deep convolution neural network model, the deep convolution neural network model comprises a multi-source information deep fusion neural network and a defect detection network, the multi-source information deep fusion neural network comprises a parallel feature extraction network and a self-adaptive feature interaction fusion module, the parallel feature extraction network comprises a full convolution network feature extraction channel and an edge perception feature enhancement channel, the full convolution network feature extraction channel is used for extracting a deep texture feature map corresponding to a detected image, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map; the self-adaptive feature interaction fusion module is used for fusing the depth texture feature map and the edge texture feature map to obtain a fusion feature map; and the defect detection network is used for detecting the defect target based on the fusion characteristic diagram.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, performs the steps of:
acquiring a training image set, wherein the training image set comprises training images;
training a target detection network through a training image to obtain a trained target detection model;
the target detection model comprises a deep convolution neural network model, the deep convolution neural network model comprises a multi-source information deep fusion neural network and a defect detection network, the multi-source information deep fusion neural network comprises a parallel feature extraction network and a self-adaptive feature interaction fusion module, the parallel feature extraction network comprises a full convolution network feature extraction channel and an edge perception feature enhancement channel, the full convolution network feature extraction channel is used for extracting a deep texture feature map corresponding to a detected image, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map; the self-adaptive feature interactive fusion module is used for fusing the depth texture feature map and the edge texture feature map to obtain a fusion feature map; and the defect detection network is used for detecting the defect target based on the fusion characteristic diagram.
For specific limitations of the computer device and the computer storage medium, reference may be made to the above-mentioned limitations of the multi-source information fusion model training method based on incremental autonomous learning, or the multi-source information fusion detection method based on incremental autonomous learning, which is not described herein again.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware instructions of a computer program, which can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory, among others. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), rambus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
It should be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional units and modules is only used for illustration, and in practical applications, the above function distribution may be performed by different functional units and modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the above described functions.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A multi-source information fusion detection method based on incremental autonomous learning is characterized by comprising the following steps:
acquiring a detected image, and inputting the detected image into a trained target detection model to obtain a defect detection result;
the target detection model comprises a deep convolution neural network model, the deep convolution neural network model comprises a multi-source information deep fusion neural network and a defect detection network, the multi-source information deep fusion neural network comprises a parallel feature extraction network and a self-adaptive feature interaction fusion module, the parallel feature extraction network comprises a full convolution network feature extraction channel and an edge perception feature enhancement channel, the full convolution network feature extraction channel is used for extracting a deep texture feature map corresponding to the detected image, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map; the self-adaptive feature interaction fusion module is used for fusing the depth texture feature map and the edge texture feature map to obtain a fusion feature map; the defect detection network is used for detecting defect targets based on the fusion characteristic diagram.
2. The multi-source information fusion detection method based on incremental autonomous learning of claim 1, wherein the full convolution network feature extraction channel comprises a plurality of grouped convolution blocks, each of the grouped convolution blocks comprising consecutive 3 × 3 and 1 × 1 convolution kernels.
3. The multi-source information fusion detection method based on incremental autonomous learning of claim 1, wherein the edge-aware feature enhancement channel is specifically configured to: compressing an image channel and reducing the dimension of the feature of the depth texture feature map, and extracting the feature of the feature map subjected to dimension reduction by using an edge feature extraction convolution kernel to obtain the edge texture feature map;
the calculation formula of the edge feature extraction convolution kernel is as follows:
\hat{L}_{i,o} = L_{i,o} \circ G(c, v)

where L_{i,o} represents the target convolution kernel, \circ represents the dot product operation, G(c, v) represents a linear filter for edge extraction, c represents the direction of the linear filter, and v represents the scale of the linear filter; the dimension of the target convolution kernel coincides with the number of directions of the linear filter, and \hat{L}_{i,o} represents the convolution kernel after filtering by the linear filter, the linear filter comprising a Gabor filter.
4. The multi-source information fusion detection method based on incremental autonomous learning of claim 3, wherein the compressing and feature dimension reduction of the image channel on the depth texture feature map comprises:
normalizing the depth texture feature map U_1 as follows:

Z_{ij} = \frac{u_{ij} - \bar{u}_j}{\sigma_j}, \quad i = 1, 2, \ldots, s, \; j = 1, 2, \ldots, N

where N represents the dimension of the depth texture feature map U_1; Z_{ij} is the matrix obtained by normalizing the depth texture feature map U_1, the depth texture feature map U_1 being an N-dimensional vector of size H_1 × W_1 whose samples are denoted u_{ij}, with s = H_1 × W_1; \bar{u}_j = \frac{1}{s}\sum_{i=1}^{s} u_{ij} represents the mean, and \sigma_j = \sqrt{\frac{1}{s-1}\sum_{i=1}^{s}\left(u_{ij} - \bar{u}_j\right)^2} represents the standard deviation;

obtaining the correlation coefficient matrix R_1 of the depth texture feature map U_1:

R_1 = \frac{Z^{\mathrm{T}} Z}{s - 1}

obtaining the eigenvalues λ_i (i = 1, 2, …, N) of the correlation coefficient matrix R_1 by solving |λI − R_1| = 0, where I is the identity matrix;

sorting the eigenvalues λ_i by magnitude, computing the eigenvector e_i corresponding to each eigenvalue λ_i according to the principal component analysis procedure, and obtaining the dimension-reduced feature map U_1' from the eigenvalues λ_i and the corresponding eigenvectors e_i.
5. The multi-source information fusion detection method based on incremental autonomous learning of claim 1, wherein the adaptive feature interaction fusion module is further configured to: calibrating the fusion characteristic diagram to obtain a calibrated characteristic diagram; and the defect detection network is used for detecting a defect target based on the calibrated characteristic diagram.
6. The multi-source information fusion detection method based on incremental autonomous learning of claim 5, wherein calibrating the fusion feature map to obtain a calibrated feature map comprises:
performing global average pooling on the feature map on each channel of the fused feature map, wherein W and H are respectively the width and height of the feature map on each channel;

compressing the W × H dimension of the feature map on each channel to generate the channel statistic E ∈ R^K, calculated as:

E_k = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} D_k(i, j)

where E_k is the pooled value of the feature map on the k-th channel, D(i, j) represents the feature map on each channel, 1 ≤ k ≤ K, K is the total number of channel feature maps, 1 ≤ i ≤ W, and 1 ≤ j ≤ H;

applying an activation function to the E_k on each channel of the fused feature map to perform an excitation operation, obtaining the weight value β_k assigned to the pooled value E_k of the feature map on each channel;

multiplying the excited weight value β_k by the corresponding feature map D_(i,j) to obtain the calibrated feature map U_(i,j) = D_(i,j) · β_k.
7. The multi-source information fusion detection method based on incremental autonomous learning of claim 6, wherein applying an activation function to the E_k on each channel of the fused feature map to perform an excitation operation, obtaining the weight value β_k assigned to the pooled value E_k of the feature map on each channel, comprises:

performing the excitation operation on the E_k on each channel of the fused feature map using the following formula to obtain the weight value β_k assigned to the pooled value E_k of the feature map on each channel:

\beta_k = \sigma\left(w \cdot \gamma(E_k)\right)

where β_k is the weight value assigned to the pooled value E_k of the feature map on the k-th channel of the fused feature map, σ is the sigmoid activation function, γ is the ReLU activation function, w is the fully connected layer, and w · γ(E_k) denotes the non-linear fully connected operation performed on E_k.
8. A multi-source information fusion model training method based on incremental autonomous learning is characterized by comprising the following steps:
acquiring a training image set, wherein the training image set comprises training images;
training a target detection network through the training image to obtain a trained target detection model;
the target detection model comprises a deep convolution neural network model, the deep convolution neural network model comprises a multi-source information deep fusion neural network and a defect detection network, the multi-source information deep fusion neural network comprises a parallel feature extraction network and a self-adaptive feature interaction fusion module, the parallel feature extraction network comprises a full convolution network feature extraction channel and an edge perception feature enhancement channel, the full convolution network feature extraction channel is used for extracting a deep texture feature map corresponding to the detected image, and the edge perception feature enhancement channel is used for processing the deep texture feature map to obtain an edge texture feature map; the self-adaptive feature interaction fusion module is used for fusing the depth texture feature map and the edge texture feature map to obtain a fusion feature map; the defect detection network is used for detecting defect targets based on the fusion characteristic diagram.
9. The multi-source information fusion model training method based on incremental autonomous learning according to claim 8, wherein the training of the target detection network through the training image to obtain a trained target detection model comprises:
transferring the trained network parameters of the detection model for old class detection to the target detection network, and training the target detection network through the training image to obtain the trained target detection model;
wherein a network structure of the detection model for old class detection is the same as a network structure of the target detection network.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 9.
CN202210461803.1A 2022-04-28 2022-04-28 Multi-source information fusion detection and model training method and medium for incremental autonomous learning Active CN115965571B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210461803.1A CN115965571B (en) 2022-04-28 2022-04-28 Multi-source information fusion detection and model training method and medium for incremental autonomous learning

Publications (2)

Publication Number Publication Date
CN115965571A true CN115965571A (en) 2023-04-14
CN115965571B CN115965571B (en) 2023-08-22

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110298387A (en) * 2019-06-10 2019-10-01 天津大学 Incorporate the deep neural network object detection method of Pixel-level attention mechanism
CN110827256A (en) * 2019-10-31 2020-02-21 广东华中科技大学工业技术研究院 Optical and thermal infrared multi-stage imaging detection method and device for defects of transparent component
CN113065590A (en) * 2021-03-26 2021-07-02 清华大学 Vision and laser radar multi-mode data fusion method based on attention mechanism
CN113269775A (en) * 2021-06-11 2021-08-17 河南理工大学 Defect detection method and device based on multi-scale feature fusion SSD
US11222217B1 (en) * 2020-08-14 2022-01-11 Tsinghua University Detection method using fusion network based on attention mechanism, and terminal device
CN114358165A (en) * 2021-12-28 2022-04-15 湖南安华源电力科技有限公司 Detection method for preventing ground fault of photovoltaic module based on multi-source data fusion

Also Published As

Publication number Publication date
CN115965571B (en) 2023-08-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant