CN116824334A - Model back door attack countermeasure method based on frequency domain feature fusion reconstruction - Google Patents

Model back door attack countermeasure method based on frequency domain feature fusion reconstruction Download PDF

Info

Publication number
CN116824334A
CN116824334A (application number CN202310754608.2A)
Authority
CN
China
Prior art keywords
stu
model
fourier
feats
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310754608.2A
Other languages
Chinese (zh)
Inventor
王承杰
赵琛
武延军
吴敬征
郑森文
罗天悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Software of CAS
Original Assignee
Institute of Software of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Software of CAS filed Critical Institute of Software of CAS
Priority to CN202310754608.2A priority Critical patent/CN116824334A/en
Publication of CN116824334A publication Critical patent/CN116824334A/en
Pending legal-status Critical Current

Classifications

    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06N 3/094 Adversarial learning
    • G06N 3/096 Transfer learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using classification, e.g. of video objects
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a model back door attack countermeasure method based on frequency domain feature fusion reconstruction. The method uses Fourier convolution to filter the student model's feature map set in the frequency domain, removing back door attack patterns injected in the time domain. It cascades and fuses semantic information from the deep feature maps down to the shallow ones, integrating the output of the whole student model, so that learnable semantic information is added while matching the teacher model's feature maps and possible attack back doors based on local information are weakened. An attention operation on the fused feature maps uses deep high-order semantic information to enhance the shallow semantic information density between adjacent output feature maps, improving the student model's learning ability and yielding higher training precision. Starting from a pre-trained model from an untrusted source, the invention can learn a student model that is highly accurate and free of both time-domain attack back doors and attack back doors based on local information.

Description

Model back door attack countermeasure method based on frequency domain feature fusion reconstruction
Technical Field
The invention belongs to the technical field of information security, and relates to a model back door attack countermeasure method based on frequency domain feature fusion reconstruction.
Background
With the rapid development of deep learning and neural network models, artificial intelligence systems have begun to play an increasingly important role in daily life. End users with relatively scarce computing resources and data sets need to download pre-trained models from the Internet and then adapt them to specific tasks by fine-tuning (He T, Zhang Z, Zhang H, et al. Bag of tricks for image classification with convolutional neural networks [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2019: 558-567.) or knowledge distillation (Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network [J]. arXiv preprint arXiv:1503.02531, 2015.). Models obtained this way not only achieve good generalization performance but also match the model size the user requires, reduce computing cost, and perform well on the specific task.
However, pre-trained models published on the Internet run the risk of having malicious back door patterns implanted. Through a specific training procedure, an attacker can cause a pre-trained model to exhibit predefined behavior, such as performance degradation, purposeful corruption, or erroneous decisions, when it encounters certain input features. Back door attacks on artificial intelligence models are highly targeted, and unexpected errors can occur whenever a back door activating factor appears in the application scenario (Liu Y, Wen R, He X, et al. ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models [C]// 31st USENIX Security Symposium (USENIX Security 22). 2022: 4525-4542.). Besides attackers directly publishing models containing back doors, download links of pre-trained models published through regular channels can be hijacked to propagate malicious models to end users. Therefore, an artificial intelligence model downloaded from the Internet requires the necessary processing to ensure security before use.
Currently, applying optimized knowledge distillation methods to treat victim models is one of the effective means of countering back door attacks on artificial intelligence models (Kim J, Lee B K, Ro Y M. Distilling robust and non-robust features in adversarial examples by information bottleneck [J]. Advances in Neural Information Processing Systems (NeurIPS), 2021, 34: 17148-17159.). Existing schemes mainly perform matching learning on each feature pixel of the output and take measures during learning to reduce the possibility of back door transmission. The advantage of these methods is that resistance learning over all pixels can be performed on an untrusted model, helping to remove trigger patterns that are active over the whole input range. However, because the selected resistance mode operates over the full range of input features, these methods offer poor protection against back door attacks triggered by small-range semantic information.
The attention mechanism is a globally weighted information interaction mechanism: information processed by it yields the association relations within the information and assigns association weights to its elements (Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need [J]. Advances in Neural Information Processing Systems (NIPS), 2017, 30.). The mechanism can collect and enhance the semantic information of a deep learning model on a global scope.
Common back door attack countermeasures can also handle back door attack patterns injected in the time domain (Li Y, Lyu X, Koren N, et al. Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks [C]// International Conference on Learning Representations. 2021.), but they tend to reduce the original performance of the model.
Disclosure of Invention
Aiming at the problems existing in the prior art, the invention aims to provide a model back door attack countermeasure method based on frequency domain feature fusion reconstruction.
Against back door attacks injected in the time domain into a neural network pre-trained model, the invention proposes filtering the feature maps in the frequency domain with Fourier convolution, removing time-domain attack trigger patterns so that the pre-trained model can be learned from safely. The Fourier transform is a common signal processing technique that converts a time-domain signal into a frequency-domain signal, so that filtering and directional processing can be performed on the original signal in the frequency domain; certain frequency components can be selectively removed or retained, after which an inverse Fourier transform restores the filtered time-domain signal. This achieves operations that are difficult to realize in the time domain. Some time-domain signal patterns are clearly observable in the frequency domain, so potential back door attack trigger patterns can be processed there.
Aiming at the neural network pre-training model backdoor attack based on local information, the invention provides an interlayer cascading semantic information fusion method for fusing the feature map information of different layers output by a training model to be learned, so that the overall semantic capability is formed, and the neural network pre-training model backdoor attack based on the local information can be resisted.
Against the loss of neural network model precision caused by training schemes that resist back door attacks, the invention proposes cascading the outputs of different levels of the training model to be learned with an inter-layer attention mechanism, using high-order semantic information from the adjacent deeper feature map to weight and fuse the semantic information of the shallower feature map. This raises the aggregation degree of semantic information, so stronger model precision can be learned during training.
In order to achieve the above purpose, the invention is realized by the following technical scheme: a model back door attack countermeasure method based on frequency domain feature fusion reconstruction, comprising the following steps:
(1) Selecting and initializing a large model corresponding to the superior application task to which the target application task belongs as the teacher model M_tea; downloading the incompletely trusted pre-training weights of M_tea stored on a public platform and overwriting the parameter weights of M_tea with them, fixing M_tea once the overwrite is complete; and selecting and initializing, as the student model M_stu, a compact model whose number of output levels is consistent with that of M_tea, or a model consistent with M_tea. For example, when the target application task is a computer vision downstream task such as dense small-target detection or multi-instance small-target segmentation, the superior application task is generally image classification, and the large model can be an image classification neural network such as BASIC-L or ViT-e, trained on a large-scale data set and having a large number of parameters;
(2) Inputting the target application task data set into the teacher model M_tea and the student model M_stu respectively, collecting feature maps at the intermediate level of each corresponding stage to generate the teacher feature map set Feats_tea and the student feature map set Feats_stu, together with the finally output probability distributions Logits_tea and Logits_stu;
(3) Size-aligning the deeper student feature map Feats_stu[l+1] to match the shallower feature map Feats_stu[l], where l denotes a feature stage; after the two time-domain feature maps are size-aligned, they are jointly input into a Fourier convolution layer;
(4) In the Fourier convolution layer, the deeper feature map Feats_stu[l+1] and the shallower feature map Feats_stu[l] are each processed into frequency-domain form, denoted Fourier_g[l] and Fourier_l[l] respectively, where the subscript g marks the higher-order features that are more global relative to the current layer-l features, and the subscript l marks the local features. Cross-weighted calculation on the higher-layer and lower-layer frequency-domain information injects and fuses the higher-order semantic information carried by Feats_stu[l+1] into Feats_stu[l], while the detail semantic information contained in Feats_stu[l] supplements Feats_stu[l+1], so that the information between the two levels of feature maps is fused in the frequency domain. During the cross-weighted calculation, convolution filtering is performed synchronously in the frequency domain; the convolution filter has learnable parameters and filters the frequency-domain features Fourier_g[l] and Fourier_l[l], removing the frequency-domain anomalies of time-domain back door trigger patterns and preventing M_tea from transmitting a time-domain back door attack trigger pattern during subsequent learning. The processed frequency-domain features Fourier_g[l] and Fourier_l[l] are restored to the time domain by the inverse discrete Fourier transform, outputting the two time-domain feature maps Fourier_global[l] and Fourier_local[l], fused and convolution-filtered in the frequency domain;
(5) Using the self-attention mechanism, attention weights over global semantic information are computed for the fused feature maps Fourier_global[l] and Fourier_local[l]: Fourier_global[l], which carries high-order global information, serves as the query vector Q and the key vector K, while Fourier_local[l], which carries low-order detail information, serves as the value vector V; the self-attention operator is evaluated and the self-attention feature map Feats_attn[l] is output;
(6) Starting from the deepest M_stu feature map Feats_stu[L], steps (3), (4), (5) are executed in cascade toward the shallow layers; at the start of the stage processing layer l-1, the deeper feature map in the Fourier convolution layer input, Feats_stu[l], is replaced by the Fourier_local[l] features produced by Fourier convolution fusion during the stage processing layer l. That is, the input to the Fourier convolution layer in stage l-1 and subsequent processing is the current layer Feats_stu[l-1] together with the local part Fourier_local[l] of the deeper layer's Fourier convolution output;
(7) For each level l, the finally obtained Feats_tea[l] and Feats_attn[l] are channel- and size-aligned using the same technique as step (3), where l ∈ [1, L-1] and L is the total number of M_tea output levels; that is, a convolution layer and an interpolation function align the channel counts and sizes of the two feature maps. The Kullback-Leibler divergence loss function then computes and sums the losses of the two corresponding feature maps at each level in turn, and a loss for Logits_stu is computed according to the objective function specific to the target application task;
(8) The different losses are weighted and summed to obtain the total loss, which is back-propagated to train M_stu to convergence, yielding a high-precision, high-security student model from which potential back door attack trigger patterns have been removed.
During training, the method converts the student model's outputs to the frequency domain in the knowledge distillation migration process, and filters, in the frequency domain, the anomalous values of time-domain back door attack trigger patterns hidden in the feature map outputs produced by the openly published pre-training weights of M_tea, thereby removing the time-domain back door. Against neural network pre-trained model back door attacks based on local information, the method uses inter-layer cascaded semantic information fusion to fuse the feature map information of adjacent levels output by the training model to be learned: shallow detail features refine and perfect the deep global features, while deep high-order semantic information is injected into the shallow features to improve their expressive power, forming holistic semantic capability. Cascading the feature levels in sequence links the model outputs into a whole, so that after cascaded fusion, M_tea is learned from as a whole rather than by one-to-one matching of feature information, which resists back door attacks based on local information. In summary, the method can safely migrate a teacher model from an unsafe domain, obtaining a high-precision student model network with the back door removed.
Further, the pre-trained model download platform in step (1) is an open-source or commercial platform providing pre-trained model weights; because the training data source, training method and training parameters are unknown, the model weights are regarded as coming from an untrusted source and are subjected to the subsequent processing. The student model M_stu is equal to or weaker in model capacity than the teacher model M_tea.
Further, the target application task data set in step (2) is the data set to which the student model M_stu is to be applied; it is organized by the user and therefore regarded as a trusted data set, although it is weaker in scale and annotation accuracy than the data set used by the pre-trained model.
The intermediate level of each stage means that the neural network model processing flow contains clearly reusable paradigms; the final output of each reusable paradigm is extracted as the output of one level, so that the complete behavior of the paradigm is captured while the total number of feature maps is reduced to speed up computation.
The finally output probability distributions Logits_stu and Logits_tea are the neural network models' estimates of the probability of each class before the final result is output.
Further, the feature map alignment operation in step (3) comprises two operations: channel alignment and feature map scale alignment. The channel alignment operation applies a convolution layer with a 1x1 kernel and stride 1 to the deeper feature map Feats_stu[l+1]; this fully connected processing of the in-layer information raises its semantic content while aligning its channel count to the shallower feature map.
The feature map scale alignment operation scales the deeper feature map Feats_stu[l+1] with an interpolation function, the function being determined by the selected interpolation mode.
Further, the Fourier convolution layer in step (4) is a deep learning operator that splits its input into local and global abstractions and processes each accordingly; it is adapted here so that, within the pyramidal feature map outputs, the deeper output Feats_stu[l+1] is treated as the global abstraction and the shallower output Feats_stu[l] as the local abstraction.
Processing a feature map into frequency-domain form means converting the time-domain features into the frequency domain with a two-dimensional real-valued discrete Fourier transform applied to the last two dimensions of the feature map.
The cross-weighted calculation is an improved mode of operation of the Fourier convolution operator, with the following formulas:

Y_l = Y_{l→l} + Y_{g→l} = f_l(X_l) + f_{g→l}(X_g),
Y_g = Y_{g→g} + Y_{l→g} = f_g(X_g) + f_{l→g}(X_l)

wherein Y denotes output, X denotes input, the subscripts l and g denote local and global information, the subscripted f functions are the processing functions for local and global information respectively, and the left and right sides of the '→' symbol denote the information level before and after processing.
Each f processing function is a frequency-domain convolution filter, specifically a convolution layer with 1x1 kernels whose number equals the channel count of the processed features. It reacts strongly in the frequency domain to any time-domain back door trigger pattern present. The function consists of learnable parameters: anomalies in the teacher model are discovered during the training of the student model, and the frequency-domain disturbance caused by such local features is weakened through the global matching term of the final loss function, achieving the goal of countering back door attacks.
The information fusion between the two levels is completed inside the optimized Fourier convolution: the filtered global, local-to-global and global-to-local frequency-domain features are fused according to the calculation formulas of Y_l and Y_g above, yielding the frequency-domain fused feature maps Fourier_g[l] and Fourier_l[l].
The frequency-domain features are restored to the time domain by the inverse of the two-dimensional real-valued discrete Fourier transform; the operation acts on the last two dimensions of the frequency-domain output and converts it back to the time domain for subsequent operations.
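By way of non-limiting illustration, one way to realize the cross-weighted frequency-domain fusion and filtering described above is sketched below in PyTorch. The use of torch.fft.rfft2 over the last two dimensions, the stacking of real and imaginary parts on the channel axis, and all module and variable names are assumptions of this sketch rather than the patent's reference implementation:

import torch
import torch.nn as nn

def _to_freq(x):
    """2-D real FFT over the last two dims; real/imag stacked on channels."""
    f = torch.fft.rfft2(x, norm="ortho")          # (B, C, H, W//2+1) complex
    return torch.cat([f.real, f.imag], dim=1)     # (B, 2C, H, W//2+1) real

def _to_time(z, size):
    """Inverse of _to_freq: split real/imag, irfft2 back to the time domain."""
    real, imag = z.chunk(2, dim=1)
    return torch.fft.irfft2(torch.complex(real, imag), s=size, norm="ortho")

class FourierCrossFusion(nn.Module):
    """Cross-weighted fusion Y_l = f_l(X_l) + f_{g→l}(X_g) and
    Y_g = f_g(X_g) + f_{l→g}(X_l), with learnable 1x1 frequency filters."""
    def __init__(self, channels: int):
        super().__init__()
        c2 = 2 * channels  # real and imaginary parts stacked on channels
        self.f_ll = nn.Conv2d(c2, c2, 1)  # local  -> local
        self.f_gl = nn.Conv2d(c2, c2, 1)  # global -> local
        self.f_gg = nn.Conv2d(c2, c2, 1)  # global -> global
        self.f_lg = nn.Conv2d(c2, c2, 1)  # local  -> global

    def forward(self, x_local, x_global):
        size = x_local.shape[-2:]
        zl, zg = _to_freq(x_local), _to_freq(x_global)   # Fourier_l[l], Fourier_g[l]
        y_local = self.f_ll(zl) + self.f_gl(zg)          # inject global semantics
        y_global = self.f_gg(zg) + self.f_lg(zl)         # supplement with detail
        # Restore both fused, filtered maps to the time domain.
        return _to_time(y_local, size), _to_time(y_global, size)

Stacking real and imaginary parts lets ordinary real-valued 1x1 convolutions act as the learnable frequency-domain filters f; an equivalent complex-valued formulation would also be possible.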
Further, the attention mechanism in step (5) weights multiple targets to obtain attention weights, with the formula:

Attention(Q, K, V) = softmax(Q·K^T / sqrt(d_k))·V

wherein Q and K are initialized from Fourier_global[l], V is initialized from Fourier_local[l], and d_k is the channel count, used to control the model scale.
To increase operation speed, a multi-head attention mechanism accelerates the computation; multi-head attention can exploit the parallel computing capability of the graphics computing unit, with the formulas:

MultiHeadAttention(Q, K, V) = Concat(head_1, …, head_h)·W^O, head_i = Attention(Q·W_i^Q, K·W_i^K, V·W_i^V)

wherein the projection weights are the parameter matrices W_i^Q ∈ R^{d_model×d_k}, W_i^K ∈ R^{d_model×d_k}, W_i^V ∈ R^{d_model×d_v} and W^O ∈ R^{h·d_v×d_model}.
Here 'global' and 'local' are not only the global and local information abstractions defined by the Fourier convolution layer but also correspond to deep and shallow levels of the feature map hierarchy: in terms of information condensation, the same number of pixels in a deep feature map summarizes more of the input, so the deep feature map can be regarded as global features.
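A non-limiting PyTorch sketch of this step follows, using the built-in multi-head attention over the spatial grid flattened into a token sequence; the head count is an assumed hyper-parameter, and the channel count is assumed divisible by it:

import torch
import torch.nn as nn

class GlobalLocalAttention(nn.Module):
    """Feats_attn[l] = MultiHeadAttention(Q = K = global map, V = local map)."""
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # embed_dim is the channel count d_k that controls the model scale.
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, g: torch.Tensor, loc: torch.Tensor) -> torch.Tensor:
        b, c, h, w = loc.shape
        # Flatten the H x W positions into a sequence of C-dimensional tokens.
        q = k = g.flatten(2).transpose(1, 2)    # (B, H*W, C) from Fourier_global[l]
        v = loc.flatten(2).transpose(1, 2)      # (B, H*W, C) from Fourier_local[l]
        out, _ = self.attn(q, k, v)
        return out.transpose(1, 2).reshape(b, c, h, w)  # back to (B, C, H, W)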
further, the executing step in the step (6) outputs feature graphs Feats on the L-order of the student model stu Executing from deep layer to shallow layer in sequence in the set, and inputting feature graphs Feats of the deep layer after scale alignment in the first-stage fusion stu [L]Features with shallow layer stu [L-1]Initializing, and performing the subsequent execution round [ l-1 ]]The deeper feature map in the input is obtained from the previous roundFourier of arrival local [i]Substitution is performed.
Further, the Kullback-Leibler divergence objective function in step (7) is a common objective function for fitting one distribution to another, with the formula:

Distill_loss = Σ_{l=1}^{L-1} λ_l · KL(p_l ‖ q_l), where KL(p ‖ q) = Σ_i p(i) · log(p(i) / q(i))

wherein p and q are the distributions being matched, namely the Feats_tea[l] and Feats_attn[l] feature maps in the scenario of this method; L is the total number of Feats_tea levels; and λ_l, an optional parameter, sets the share of each level's feature loss in the total loss.
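One plausible PyTorch rendering of this per-level loss is sketched below; normalizing each aligned feature map into a distribution with a softmax over spatial positions is an assumption of the sketch, since the normalization is not fixed above:

import torch
import torch.nn.functional as F

def distill_loss(feats_tea, feats_attn, lambdas):
    """Sum_l lambda_l * KL(p_l || q_l) over aligned teacher/student maps."""
    total = 0.0
    for t, s, lam in zip(feats_tea, feats_attn, lambdas):
        # Flatten each map and normalize per channel into a distribution
        # (softmax over spatial positions: an assumption, see above).
        p = F.softmax(t.flatten(2), dim=-1)
        log_q = F.log_softmax(s.flatten(2), dim=-1)
        total = total + lam * F.kl_div(log_q, p, reduction="batchmean")
    return total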
The loss computed for Logits_stu uses the original objective function of the user's target application task, i.e. Custom_loss, which is defined and calculated by the user.
Further, the weighting parameters in step (8) include the per-level weighting parameters λ_l defined and used in step (7) and the coefficient α controlling the ratio of the distillation loss to the original target loss, with the formula:

Total_loss = α × Distill_loss + Custom_loss

Training the student model by feedback refers to optimizing the model with an error back-propagation optimization algorithm; the optimization function selected may be defined by the user.
The invention has the following advantages:
1. Using Fourier convolution to convert the model output to the frequency domain for processing can eliminate back door trigger response patterns that are inconspicuous in time-domain features but directly cause frequency-domain changes;
2. Processing the feature maps from the Fourier convolution with a self-attention mechanism performs self-attention fusion on the frequency-domain semantic information, giving stronger boundary information and improving the model's expressiveness;
3. Cascading the information of different levels forms an inverse pyramid structure that supplements the forward pyramid information output by the original model; propagating information in the reverse direction also helps the lower-level model acquire high-order semantic information, improving learning capacity and model accuracy;
4. On the basis of a pre-trained model from an untrusted source, a high-precision student model can be learned from which both time-domain attack back doors and attack back doors based on local information have been removed.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention.
Fig. 2 is a schematic diagram of the improved fourier convolution information flow according to the present invention.
Detailed Description
The invention will now be described in further detail with reference to the accompanying drawings, which are given by way of illustration only and are not intended to limit the scope of the invention.
The method for resisting the model back door attack based on the frequency domain feature fusion reconstruction is shown in fig. 1.
The teacher model used in this embodiment is ResNet101, the student model is ResNet18, and the user data set is a high-resolution image classification data set in the ImageNet format. The embodiment comprises the following steps:
(1) Initialize the teacher model and student model, and run inference with a batch of the user data set to obtain the teacher feature map set Feats_tea, the student feature map set Feats_stu, and the finally output probability distributions Logits_tea and Logits_stu. Specifically:
(1a) Initialize a ResNet101 model as the teacher model M_tea, overwriting its parameters with the untrusted ResNet101 weights downloaded from an open-source platform; initialize ResNet18 as the student model M_stu; load the ImageNet data set and initialize a stochastic gradient descent optimizer according to the training parameters;
(1b) Input the data set into M_tea and M_stu in turn and compute the outputs at the different levels, obtaining the feature map sets Feats_stu and Feats_tea and the final output probability distributions Logits_stu and Logits_tea.
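By way of non-limiting illustration, the initialization of (1a) might read as follows in PyTorch/torchvision; the weight file path and the SGD hyper-parameters are assumptions of the sketch:

import torch
import torchvision.models as models

# (1a) Teacher: ResNet101 with untrusted downloaded weights, then frozen.
teacher = models.resnet101()
state = torch.load("resnet101_untrusted.pth", map_location="cpu")  # hypothetical path
teacher.load_state_dict(state)
teacher.eval()
for p in teacher.parameters():
    p.requires_grad_(False)   # fix M_tea once the weights are loaded

# Student: randomly initialized ResNet18, trained with SGD
# (learning rate, momentum and weight decay are assumed values).
student = models.resnet18(num_classes=1000)
optimizer = torch.optim.SGD(student.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)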
(2) Align the dimensions and channels of the deeper student feature map Feats_stu[l+1] to match the shallower feature map Feats_stu[l], where l denotes a feature stage; after the two time-domain feature maps are size-aligned they are jointly input into the Fourier convolution layer. Specifically:
(2a) Channel-align the deeper feature map using a convolution layer with 1x1 kernels and stride 1, whose input channel count is that of the deeper output Feats_stu[l+1] and whose output channel count is that of the shallower output Feats_stu[l];
(2b) Scale-align the channel-aligned deeper output Feats_stu[l+1] using a bilinear interpolation function, obtaining the same feature size as Feats_stu[l].
(3) Apply the Fourier convolution layer processing to remove the time-domain back door in the frequency domain, as follows:
(3a) In the Fourier convolution layer, the deeper feature map Feats_stu[l+1] and the shallower feature map Feats_stu[l] are each processed into frequency-domain form, denoted Fourier_g[l] and Fourier_l[l] respectively; relative to the current layer-l features, the global higher-order features are marked with g as global features, while the current layer, carrying more detail information, is marked with l as local features;
(3b) Cross-weighted calculation on the higher-layer and lower-layer frequency-domain information injects and fuses the higher-order semantic information carried by Feats_stu[l+1] into Feats_stu[l], while the detail semantic information contained in Feats_stu[l] supplements Feats_stu[l+1], fusing the information between the two levels of feature maps in the frequency domain. The rule of the cross-weighted calculation is shown in Fig. 2, with the formulas:

Y_l = Y_{l→l} + Y_{g→l} = f_l(X_l) + f_{g→l}(X_g),
Y_g = Y_{g→g} + Y_{l→g} = f_g(X_g) + f_{l→g}(X_l)

wherein Y denotes output, X denotes input, l and g denote local and global information, the subscripted f functions are the processing functions for local and global information respectively, and the left and right sides of the '→' symbol denote the information level before and after processing;
(3c) During the cross-weighted calculation, convolution filtering is performed synchronously in the frequency domain; the convolution filter has learnable parameters and filters the frequency-domain features Fourier_g[l] and Fourier_l[l], removing the frequency-domain anomalies of time-domain back door trigger patterns and preventing M_tea from transmitting a time-domain back door attack trigger pattern during subsequent learning;
(3d) The processed frequency-domain features Fourier_g[l] and Fourier_l[l] are restored to the time domain by the inverse discrete Fourier transform over the last two dimensions of the feature maps, outputting the two time-domain feature maps Fourier_global[l] and Fourier_local[l], fused and convolution-filtered in the frequency domain.
(4) Using the self-attention mechanism, compute global semantic information attention weights for the fused feature maps Fourier_global[l] and Fourier_local[l]: Fourier_global[l], which carries high-order global information, serves as the query vector Q and the key vector K, while Fourier_local[l], which carries low-order detail information, serves as the value vector V; the self-attention operator is evaluated and the attention feature map Feats_attn[l] is output. Specifically:
(4a) The attention mechanism processes Fourier_global[l] and Fourier_local[l] with the following formula to obtain the interaction weights between local and global information:

Attention(Q, K, V) = softmax(Q·K^T / sqrt(d_k))·V

wherein Q and K are initialized from Fourier_global[l], V is initialized from Fourier_local[l], and d_k is the channel count, used to control the model scale;
(4b) To increase operation speed, a multi-head attention mechanism accelerates the computation, exploiting the parallel computing capability of the graphics computing unit:

MultiHeadAttention(Q, K, V) = Concat(head_1, …, head_h)·W^O, head_i = Attention(Q·W_i^Q, K·W_i^K, V·W_i^V)

wherein the projection weights are the parameter matrices W_i^Q, W_i^K, W_i^V and W^O.
(5) Starting from the deepest M_stu feature map Feats_stu[L], steps (2), (3) and (4) are executed in cascade toward the shallow layers; at the start of the stage processing layer l-1, the deeper feature map in the Fourier convolution input is replaced by the Fourier_local[l] generated by Fourier convolution fusion during the stage processing layer l. That is, the input to the Fourier convolution layer in stage l-1 and subsequent processing is the current layer Feats_stu[l-1] together with the local part Fourier_local[l] of the deeper layer's Fourier convolution output.
(6) For each level l, channel- and size-align the finally obtained Feats_tea[l] and Feats_attn[l] using the same technique as step (2), where l ∈ [1, L-1]; that is, a convolution layer and an interpolation function align the sizes and channel counts of the two feature maps. The Kullback-Leibler divergence loss function then computes and sums the losses of the two corresponding feature maps at each level l in turn, and a loss for Logits_stu is computed according to the objective function specific to the target application task. Specifically:
(6a) The Kullback-Leibler divergence objective function fits one distribution to another; its formula is as follows, with all λ_l set to 1:

Distill_loss = Σ_{l=1}^{L-1} λ_l · KL(p_l ‖ q_l), where KL(p ‖ q) = Σ_i p(i) · log(p(i) / q(i))

wherein p and q are the distributions being matched, namely the Feats_tea[l] and Feats_attn[l] feature maps in the scenario of this method; L is the total number of Feats_tea levels; and λ_l, an optional parameter, sets the share of each level's feature loss in the total loss;
(6b) The loss function used for Logits_stu is the multi-class cross entropy loss:

Custom_loss = -(1/N) Σ_{i=1}^{N} Σ_{c=1}^{M} y_ic · log(p_ic)

where M is the number of categories, N is the total sample count, y_ic is an indicator (0 or 1) that takes 1 if the true class of sample i equals c and 0 otherwise, and p_ic is the predicted probability that observation sample i belongs to category c.
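In PyTorch this loss is available directly; a short sketch:

import torch
import torch.nn.functional as F

# Custom_loss: multi-class cross entropy over N samples and M classes.
# logits_stu: (N, M) raw scores; targets: (N,) integer class indices.
def custom_loss_fn(logits_stu: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    return F.cross_entropy(logits_stu, targets)  # averages -log p_ic over samples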
(7) Weight and sum the different losses to obtain the total loss, and train with it until the student model converges; the total loss is computed as:

Total_loss = 0.5 × Distill_loss + Custom_loss

After the total loss is computed, M_stu is optimized by gradient updates with the stochastic gradient descent algorithm until the model loss converges, yielding a high-precision ResNet18 model with the back door safely removed.
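Putting the illustrative pieces together, a condensed training loop for this embodiment might read as follows; every helper name comes from the earlier sketches, the epoch count and data loader are assumptions, and the teacher-to-student alignment of step (6) is elided for brevity:

import torch

# Assumed ResNet stage names; the auxiliary aligns/fusions/attns module lists
# come from the earlier sketches, and their parameters would also need to be
# registered with the optimizer alongside the student's.
stage_names = ["layer1", "layer2", "layer3", "layer4"]
lambdas = [1.0] * (len(stage_names) - 1)    # all lambda_l set to 1, as above

for epoch in range(90):                     # epoch count assumed
    for images, targets in loader:          # ImageNet-format DataLoader assumed
        with torch.no_grad():               # the teacher M_tea is fixed
            feats_tea, _ = collect_stage_outputs(teacher, stage_names, images)
        feats_stu, logits_stu = collect_stage_outputs(student, stage_names, images)
        # Steps (2)-(5): deep-to-shallow fusion of the student maps.
        attn_maps = cascade_fuse([feats_stu[n] for n in stage_names],
                                 aligns, fusions, attns)
        # Step (6): teacher maps are assumed pre-aligned to the student maps'
        # channels and sizes; that alignment step is elided here.
        tea_maps = [feats_tea[n] for n in stage_names[:-1]]
        train_step(optimizer, tea_maps, attn_maps, logits_stu, targets,
                   custom_loss_fn, lambdas, alpha=0.5)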
Although specific embodiments of the invention have been disclosed for illustrative purposes, it will be appreciated by those skilled in the art that various alternatives, variations and modifications are possible without departing from the spirit and scope of the invention and the appended claims. Therefore, the invention is not limited to the particular embodiments disclosed, but has the scope indicated by the appended claims.

Claims (9)

1. A model back door attack countermeasure method based on frequency domain feature fusion reconstruction, comprising the steps of:
1) selecting and initializing a model corresponding to the superior application task to which the target application task belongs as the teacher model M_tea; obtaining pre-training weights of the teacher model M_tea and overwriting the parameter weights of the teacher model M_tea with them; and selecting and initializing a model as the student model M_stu, the number of output levels of the student model M_stu being consistent with that of the teacher model M_tea;
2) inputting samples of the target application task data set into the teacher model M_tea and the student model M_stu respectively; obtaining the feature map set Feats_tea composed of the feature maps output by each intermediate level of the teacher model M_tea and the probability distribution Logits_tea output by the last layer of the teacher model M_tea; and obtaining the feature map set Feats_stu composed of the feature maps output by each intermediate level of the student model M_stu and the probability distribution Logits_stu output by the last layer of the student model M_stu;
3) starting from the level above the deepest intermediate level, size-aligning the feature map Feats_stu[l+1] output by the (l+1)-th intermediate level of the student model M_stu with the feature map Feats_stu[l] output by the l-th intermediate level, and inputting both into a Fourier convolution layer; the Fourier convolution layer converting the feature maps Feats_stu[l+1] and Feats_stu[l] to the frequency domain and fusing their information, injecting and fusing the higher-order semantic information carried by Feats_stu[l+1] into Feats_stu[l] to obtain the global frequency-domain feature map Fourier_g[l] of the l-th intermediate level, and supplementing Feats_stu[l+1] with the detail semantic information contained in Feats_stu[l] to obtain the local frequency-domain feature map Fourier_l[l] of the l-th intermediate level; inverse-transforming Fourier_g[l] and Fourier_l[l] back to the time domain to obtain the global time-domain feature map Fourier_global[l] and the local time-domain feature map Fourier_local[l] of the l-th intermediate level; computing global semantic information attention weights over Fourier_global[l] and Fourier_local[l] with the self-attention mechanism to obtain the self-attention feature map Feats_attn[l]; and, in the next stage, which processes level l-1, using the output local time-domain feature map Fourier_local[l] as the higher-order semantic information in the Fourier convolution layer input, in place of the deeper feature map taken from the feature map set Feats_stu; wherein l ∈ [1, L-1] and L is the number of intermediate levels;
4) for each feature map in the feature map set Feats_tea, aligning the feature map Feats_tea[l] output by the l-th intermediate level with the corresponding self-attention feature map Feats_attn[l], then computing the loss of the l-th intermediate level with the Kullback-Leibler divergence loss function and summing the losses of the intermediate levels to obtain the loss value Distill_loss; and computing the classification loss value Custom_loss from the Logits_stu corresponding to each sample in the target application task data set and the true label corresponding to that sample;
5) computing the total loss Total_loss from the loss value Distill_loss and the loss value Custom_loss, and optimizing the student model M_stu;
6) iteratively repeating steps 2)-5) until the student model M_stu converges, obtaining a security model with the potential back door removed.
2. The method of claim 1, wherein Total_loss = α × Distill_loss + Custom_loss, and α is a proportional coefficient.
3. The method of claim 1 or 2, wherein the loss value Custom_loss is calculated using the loss function of the target application task.
4. The method of claim 1, wherein, using the self-attention mechanism, Fourier_global[l] serves as the query vector Q and the key vector K, Fourier_local[l] serves as the value vector V, the self-attention calculation is performed, and the self-attention feature map Feats_attn[l] is output.
5. The method of claim 1, wherein the target application task data set is a trusted data set that is weaker in scale and annotation accuracy than the data set used by the pre-trained model.
6. The method of claim 1, wherein the student model M_stu is equal to or weaker in model capacity than the teacher model M_tea.
7. The method of claim 1, wherein the student model M_stu is optimized on the basis of the total loss Total_loss using an error back-propagation optimization algorithm.
8. A server comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the steps of the method of any of claims 1 to 7.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
CN202310754608.2A 2023-06-25 2023-06-25 Model back door attack countermeasure method based on frequency domain feature fusion reconstruction Pending CN116824334A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310754608.2A CN116824334A (en) 2023-06-25 2023-06-25 Model back door attack countermeasure method based on frequency domain feature fusion reconstruction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310754608.2A CN116824334A (en) 2023-06-25 2023-06-25 Model back door attack countermeasure method based on frequency domain feature fusion reconstruction

Publications (1)

Publication Number Publication Date
CN116824334A true CN116824334A (en) 2023-09-29

Family

ID=88113962

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310754608.2A Pending CN116824334A (en) 2023-06-25 2023-06-25 Model back door attack countermeasure method based on frequency domain feature fusion reconstruction

Country Status (1)

Country Link
CN (1) CN116824334A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117421678A (en) * 2023-12-19 2024-01-19 西南石油大学 Single-lead atrial fibrillation recognition system based on knowledge distillation
CN117421678B (en) * 2023-12-19 2024-03-22 西南石油大学 Single-lead atrial fibrillation recognition system based on knowledge distillation

Similar Documents

Publication Publication Date Title
US9619749B2 (en) Neural network and method of neural network training
Papernot et al. The limitations of deep learning in adversarial settings
US9390373B2 (en) Neural network and method of neural network training
CN110941794B (en) Challenge attack defense method based on general inverse disturbance defense matrix
Wang et al. Neural architecture search for robust networks in 6G-enabled massive IoT domain
CN111310814A (en) Method and device for training business prediction model by utilizing unbalanced positive and negative samples
CN112580728B (en) Dynamic link prediction model robustness enhancement method based on reinforcement learning
Veness et al. Online learning with gated linear networks
CN116824334A (en) Model back door attack countermeasure method based on frequency domain feature fusion reconstruction
Kim et al. Exploring temporal information dynamics in spiking neural networks
Chivukula et al. Adversarial learning games with deep learning models
Xiao et al. Noise optimization in artificial neural networks
Khan et al. A hybrid defense method against adversarial attacks on traffic sign classifiers in autonomous vehicles
Hu et al. RL-VAEGAN: Adversarial defense for reinforcement learning agents via style transfer
CN109697511B (en) Data reasoning method and device and computer equipment
CN115758337A (en) Back door real-time monitoring method based on timing diagram convolutional network, electronic equipment and medium
CN115019102A (en) Construction method and application of confrontation sample generation model
Khaled et al. Careful what you wish for: on the extraction of adversarially trained models
Katzir et al. Why blocking targeted adversarial perturbations impairs the ability to learn
Xiu et al. FreMix: Frequency‐Based Mixup for Data Augmentation
Gangloff et al. A general parametrization framework for pairwise Markov models: An application to unsupervised image segmentation
CN116702832A (en) Back door attack countermeasure method and system for artificial intelligent model migration security
CN113283520B (en) Feature enhancement-based depth model privacy protection method and device for membership inference attack
Tajwar On the Robustness of Prunnig Algorithms to Adversarial Attacks
Peck Improving the robustness of deep neural networks to adversarial perturbations

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination