CN116824334A - Model back door attack countermeasure method based on frequency domain feature fusion reconstruction - Google Patents

Model back door attack countermeasure method based on frequency domain feature fusion reconstruction Download PDF

Info

Publication number
CN116824334A
CN116824334A (application number CN202310754608.2A)
Authority
CN
China
Prior art keywords
stu
model
fourier
feats
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310754608.2A
Other languages
Chinese (zh)
Inventor
王承杰
赵琛
武延军
吴敬征
郑森文
罗天悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Software of CAS
Original Assignee
Institute of Software of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Software of CAS filed Critical Institute of Software of CAS
Priority to CN202310754608.2A priority Critical patent/CN116824334A/en
Publication of CN116824334A publication Critical patent/CN116824334A/en
Pending legal-status Critical Current

Classifications

    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06N 3/094 Adversarial learning
    • G06N 3/096 Transfer learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using classification, e.g. of video objects
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a model back door attack countermeasure method based on frequency domain feature fusion reconstruction. The method uses Fourier convolution to filter the student model's feature map set in the frequency domain, removing back door attack patterns injected in the time domain. It cascades and fuses semantic information from the deep feature maps down to the shallow ones, integrating the output of the whole student model, so that learnable semantic information is added while matching the teacher model's feature maps and possible attack back doors based on local information are weakened. An attention operation on the fused feature maps uses deep high-order semantic information to enhance the shallow semantic information density between adjacent output feature maps, improving the student model's learning ability and yielding higher training precision. Starting from a pre-trained model from an untrusted source, the invention can learn a student model that is highly accurate and free of both time-domain attack back doors and attack back doors based on local information.

Description

Model back door attack countermeasure method based on frequency domain feature fusion reconstruction
Technical Field
The invention belongs to the technical field of information security, and relates to a model back door attack countermeasure method based on frequency domain feature fusion reconstruction.
Background
With the rapid development of deep learning and neural network models, artificial intelligence systems have begun to play an increasingly important role in daily life. End users with relatively scarce computing resources and data sets need to download pre-trained models from the Internet and then adapt them to specific tasks by fine-tuning (He T, Zhang Z, Zhang H, et al. Bag of tricks for image classification with convolutional neural networks [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2019: 558-567.) or knowledge distillation (Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network [J]. arXiv preprint arXiv:1503.02531, 2015.). Models obtained this way not only achieve good generalization performance but also match the model size the user requires, reduce computing cost, and perform well on the specific task.
However, pre-trained models published on the Internet run the risk of having malicious back door patterns implanted. Through a specific training procedure, an attacker can cause a pre-trained model to exhibit predefined behavior, such as performance degradation, purposeful corruption, or erroneous decisions, when it encounters certain input features. Back door attacks on artificial intelligence models are highly targeted, and unexpected errors can occur whenever a back door activating factor appears in the application scenario (Liu Y, Wen R, He X, et al. ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models [C]// 31st USENIX Security Symposium (USENIX Security 22). 2022: 4525-4542.). Besides attackers directly publishing models containing back doors, download links of pre-trained models published through regular channels can be hijacked to propagate malicious models to end users. Therefore, an artificial intelligence model downloaded from the Internet requires the necessary processing to ensure security before use.
Currently, applying optimized knowledge distillation methods to treat victim models is one of the effective means of countering back door attacks on artificial intelligence models (Kim J, Lee B K, Ro Y M. Distilling robust and non-robust features in adversarial examples by information bottleneck [J]. Advances in Neural Information Processing Systems (NeurIPS), 2021, 34: 17148-17159.). Existing schemes mainly perform matching learning on each feature pixel of the output and take measures during learning to reduce the possibility of back door transmission. The advantage of these methods is that resistance learning over all pixels can be performed on an untrusted model, helping to remove trigger patterns that are active over the whole input range. However, because the selected resistance mode operates over the full range of input features, these methods offer poor protection against back door attacks triggered by small-range semantic information.
The attention mechanism is a globally weighted information interaction mechanism: information processed by it yields the association relations within the information and assigns association weights to its elements (Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need [J]. Advances in Neural Information Processing Systems (NIPS), 2017, 30.). The mechanism can collect and enhance the semantic information of a deep learning model on a global scope.
Common back door attack countermeasures can also handle back door attack patterns injected in the time domain (Li Y, Lyu X, Koren N, et al. Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks [C]// International Conference on Learning Representations. 2021.), but they tend to reduce the original performance of the model.
Disclosure of Invention
Aiming at the problems existing in the prior art, the invention aims to provide a model back door attack countermeasure method based on frequency domain feature fusion reconstruction.
Against back door attacks injected in the time domain into a neural network pre-trained model, the invention proposes filtering the feature maps in the frequency domain with Fourier convolution, removing time-domain attack trigger patterns so that the pre-trained model can be learned from safely. The Fourier transform is a common signal processing technique that converts a time-domain signal into a frequency-domain signal, so that filtering and directional processing can be performed on the original signal in the frequency domain; certain frequency components can be selectively removed or retained, after which an inverse Fourier transform restores the filtered time-domain signal. This achieves operations that are difficult to realize in the time domain. Some time-domain signal patterns are clearly observable in the frequency domain, so potential back door attack trigger patterns can be processed there.
Aiming at the neural network pre-training model backdoor attack based on local information, the invention provides an interlayer cascading semantic information fusion method for fusing the feature map information of different layers output by a training model to be learned, so that the overall semantic capability is formed, and the neural network pre-training model backdoor attack based on the local information can be resisted.
Against the loss of neural network model precision caused by training schemes that resist back door attacks, the invention proposes cascading the outputs of different levels of the training model to be learned with an inter-layer attention mechanism, using high-order semantic information from the adjacent deeper feature map to weight and fuse the semantic information of the shallower feature map. This raises the aggregation degree of semantic information, so stronger model precision can be learned during training.
In order to achieve the above purpose, the invention is realized by the following technical scheme: a model back door attack countermeasure method based on frequency domain feature fusion reconstruction, comprising the following steps:
(1) Selecting and initializing a large model corresponding to the superior application task to which the target application task belongs as the teacher model M_tea; downloading the incompletely trusted pre-training weights of M_tea stored on a public platform and overwriting the parameter weights of M_tea with them, fixing M_tea once the overwrite is complete; and selecting and initializing, as the student model M_stu, a compact model whose number of output levels is consistent with that of M_tea, or a model consistent with M_tea. For example, when the target application task is a computer vision downstream task such as dense small-target detection or multi-instance small-target segmentation, the superior application task is generally image classification, and the large model can be an image classification neural network such as BASIC-L or ViT-e, trained on a large-scale data set and having a large number of parameters;
(2) Inputting the target application task data set into the teacher model M_tea and the student model M_stu respectively, collecting feature maps at the intermediate level of each corresponding stage to generate the teacher feature map set Feats_tea and the student feature map set Feats_stu, together with the finally output probability distributions Logits_tea and Logits_stu;
(3) Size-aligning the deeper student feature map Feats_stu[l+1] to match the shallower feature map Feats_stu[l], where l denotes a feature stage; after the two time-domain feature maps are size-aligned, they are jointly input into a Fourier convolution layer;
(4) In the Fourier convolution layer, the deeper feature map Feats_stu[l+1] and the shallower feature map Feats_stu[l] are each processed into frequency-domain form, denoted Fourier_g[l] and Fourier_l[l] respectively, where the subscript g marks the higher-order features that are more global relative to the current layer-l features, and the subscript l marks the local features. Cross-weighted calculation on the higher-layer and lower-layer frequency-domain information injects and fuses the higher-order semantic information carried by Feats_stu[l+1] into Feats_stu[l], while the detail semantic information contained in Feats_stu[l] supplements Feats_stu[l+1], so that the information between the two levels of feature maps is fused in the frequency domain. During the cross-weighted calculation, convolution filtering is performed synchronously in the frequency domain; the convolution filter has learnable parameters and filters the frequency-domain features Fourier_g[l] and Fourier_l[l], removing the frequency-domain anomalies of time-domain back door trigger patterns and preventing M_tea from transmitting a time-domain back door attack trigger pattern during subsequent learning. The processed frequency-domain features Fourier_g[l] and Fourier_l[l] are restored to the time domain by the inverse discrete Fourier transform, outputting the two time-domain feature maps Fourier_global[l] and Fourier_local[l], fused and convolution-filtered in the frequency domain;
(5) Using the self-attention mechanism, attention weights over global semantic information are computed for the fused feature maps Fourier_global[l] and Fourier_local[l]: Fourier_global[l], which carries high-order global information, serves as the query vector Q and the key vector K, while Fourier_local[l], which carries low-order detail information, serves as the value vector V; the self-attention operator is evaluated and the self-attention feature map Feats_attn[l] is output;
(6) Starting from the deepest M_stu feature map Feats_stu[L], steps (3), (4), (5) are executed in cascade toward the shallow layers; at the start of the stage processing layer l-1, the deeper feature map in the Fourier convolution layer input, Feats_stu[l], is replaced by the Fourier_local[l] features produced by Fourier convolution fusion during the stage processing layer l. That is, the input to the Fourier convolution layer in stage l-1 and subsequent processing is the current layer Feats_stu[l-1] together with the local part Fourier_local[l] of the deeper layer's Fourier convolution output;
(7) For each level l, the finally obtained Feats_tea[l] and Feats_attn[l] are channel- and size-aligned using the same technique as step (3), where l ∈ [1, L-1] and L is the total number of M_tea output levels; that is, a convolution layer and an interpolation function align the channel counts and sizes of the two feature maps. The Kullback-Leibler divergence loss function then computes and sums the losses of the two corresponding feature maps at each level in turn, and a loss for Logits_stu is computed according to the objective function specific to the target application task;
(8) The different losses are weighted and summed to obtain the total loss, which is back-propagated to train M_stu to convergence, yielding a high-precision, high-security student model from which potential back door attack trigger patterns have been removed.
During training, the method converts the student model's outputs to the frequency domain in the knowledge distillation migration process, and filters, in the frequency domain, the anomalous values of time-domain back door attack trigger patterns hidden in the feature map outputs produced by the openly published pre-training weights of M_tea, thereby removing the time-domain back door. Against neural network pre-trained model back door attacks based on local information, the method uses inter-layer cascaded semantic information fusion to fuse the feature map information of adjacent levels output by the training model to be learned: shallow detail features refine and perfect the deep global features, while deep high-order semantic information is injected into the shallow features to improve their expressive power, forming holistic semantic capability. Cascading the feature levels in sequence links the model outputs into a whole, so that after cascaded fusion, M_tea is learned from as a whole rather than by one-to-one matching of feature information, which resists back door attacks based on local information. In summary, the method can safely migrate a teacher model from an unsafe domain, obtaining a high-precision student model network with the back door removed.
Further, the pre-trained model download platform in step (1) is an open-source or commercial platform providing pre-trained model weights; because the training data source, training method and training parameters are unknown, the model weights are regarded as coming from an untrusted source and are subjected to the subsequent processing. The student model M_stu is equal to or weaker in model capacity than the teacher model M_tea.
Further, the target application task data set in step (2) is the data set to which the student model M_stu is to be applied; it is organized by the user and therefore regarded as a trusted data set, although it is weaker in scale and annotation accuracy than the data set used by the pre-trained model.
The intermediate level of each stage means that the neural network model processing flow contains clearly reusable paradigms; the final output of each reusable paradigm is extracted as the output of one level, so that the complete behavior of the paradigm is captured while the total number of feature maps is reduced to speed up computation.
The finally output probability distributions Logits_stu and Logits_tea are the neural network models' estimates of the probability of each class before the final result is output.
Further, the feature map alignment operation in step (3) comprises two operations: channel alignment and feature map scale alignment. The channel alignment operation applies a convolution layer with a 1x1 kernel and stride 1 to the deeper feature map Feats_stu[l+1]; this fully connected processing of the in-layer information raises its semantic content while aligning its channel count to the shallower feature map.
The feature map scale alignment operation scales the deeper feature map Feats_stu[l+1] with an interpolation function, the function being determined by the selected interpolation mode.
Further, the Fourier convolution layer in step (4) is a deep learning operator that splits its input into local and global abstractions and processes each accordingly; it is adapted here so that, within the pyramidal feature map outputs, the deeper output Feats_stu[l+1] is treated as the global abstraction and the shallower output Feats_stu[l] as the local abstraction.
Processing a feature map into frequency-domain form means converting the time-domain features into the frequency domain with a two-dimensional real-valued discrete Fourier transform applied to the last two dimensions of the feature map.
The cross-weighted calculation is an improved mode of operation of the Fourier convolution operator, with the following formulas:

Y_l = Y_{l→l} + Y_{g→l} = f_l(X_l) + f_{g→l}(X_g),
Y_g = Y_{g→g} + Y_{l→g} = f_g(X_g) + f_{l→g}(X_l)

wherein Y denotes output, X denotes input, the subscripts l and g denote local and global information, the subscripted f functions are the processing functions for local and global information respectively, and the left and right sides of the '→' symbol denote the information level before and after processing.
Each f processing function is a frequency-domain convolution filter, specifically a convolution layer with 1x1 kernels whose number equals the channel count of the processed features. It reacts strongly in the frequency domain to any time-domain back door trigger pattern present. The function consists of learnable parameters: anomalies in the teacher model are discovered during the training of the student model, and the frequency-domain disturbance caused by such local features is weakened through the global matching term of the final loss function, achieving the goal of countering back door attacks.
The information fusion between the two levels is completed inside the optimized Fourier convolution: the filtered global, local-to-global and global-to-local frequency-domain features are fused according to the calculation formulas of Y_l and Y_g above, yielding the frequency-domain fused feature maps Fourier_g[l] and Fourier_l[l].
The frequency-domain features are restored to the time domain by the inverse of the two-dimensional real-valued discrete Fourier transform; the operation acts on the last two dimensions of the frequency-domain output and converts it back to the time domain for subsequent operations.
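By way of non-limiting illustration, one way to realize the cross-weighted frequency-domain fusion and filtering described above is sketched below in PyTorch. The use of torch.fft.rfft2 over the last two dimensions, the stacking of real and imaginary parts on the channel axis, and all module and variable names are assumptions of this sketch rather than the patent's reference implementation:

import torch
import torch.nn as nn

def _to_freq(x):
    """2-D real FFT over the last two dims; real/imag stacked on channels."""
    f = torch.fft.rfft2(x, norm="ortho")          # (B, C, H, W//2+1) complex
    return torch.cat([f.real, f.imag], dim=1)     # (B, 2C, H, W//2+1) real

def _to_time(z, size):
    """Inverse of _to_freq: split real/imag, irfft2 back to the time domain."""
    real, imag = z.chunk(2, dim=1)
    return torch.fft.irfft2(torch.complex(real, imag), s=size, norm="ortho")

class FourierCrossFusion(nn.Module):
    """Cross-weighted fusion Y_l = f_l(X_l) + f_{g→l}(X_g) and
    Y_g = f_g(X_g) + f_{l→g}(X_l), with learnable 1x1 frequency filters."""
    def __init__(self, channels: int):
        super().__init__()
        c2 = 2 * channels  # real and imaginary parts stacked on channels
        self.f_ll = nn.Conv2d(c2, c2, 1)  # local  -> local
        self.f_gl = nn.Conv2d(c2, c2, 1)  # global -> local
        self.f_gg = nn.Conv2d(c2, c2, 1)  # global -> global
        self.f_lg = nn.Conv2d(c2, c2, 1)  # local  -> global

    def forward(self, x_local, x_global):
        size = x_local.shape[-2:]
        zl, zg = _to_freq(x_local), _to_freq(x_global)   # Fourier_l[l], Fourier_g[l]
        y_local = self.f_ll(zl) + self.f_gl(zg)          # inject global semantics
        y_global = self.f_gg(zg) + self.f_lg(zl)         # supplement with detail
        # Restore both fused, filtered maps to the time domain.
        return _to_time(y_local, size), _to_time(y_global, size)

Stacking real and imaginary parts lets ordinary real-valued 1x1 convolutions act as the learnable frequency-domain filters f; an equivalent complex-valued formulation would also be possible.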
Further, the attention mechanism in step (5) weights multiple targets to obtain attention weights, with the formula:

Attention(Q, K, V) = softmax(Q·K^T / sqrt(d_k))·V

wherein Q and K are initialized from Fourier_global[l], V is initialized from Fourier_local[l], and d_k is the channel count, used to control the model scale.
To increase operation speed, a multi-head attention mechanism accelerates the computation; multi-head attention can exploit the parallel computing capability of the graphics computing unit, with the formulas:

MultiHeadAttention(Q, K, V) = Concat(head_1, …, head_h)·W^O, head_i = Attention(Q·W_i^Q, K·W_i^K, V·W_i^V)

wherein the projection weights are the parameter matrices W_i^Q ∈ R^{d_model×d_k}, W_i^K ∈ R^{d_model×d_k}, W_i^V ∈ R^{d_model×d_v} and W^O ∈ R^{h·d_v×d_model}.
Here 'global' and 'local' are not only the global and local information abstractions defined by the Fourier convolution layer but also correspond to deep and shallow levels of the feature map hierarchy: in terms of information condensation, the same number of pixels in a deep feature map summarizes more of the input, so the deep feature map can be regarded as global features.
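A non-limiting PyTorch sketch of this step follows, using the built-in multi-head attention over the spatial grid flattened into a token sequence; the head count is an assumed hyper-parameter, and the channel count is assumed divisible by it:

import torch
import torch.nn as nn

class GlobalLocalAttention(nn.Module):
    """Feats_attn[l] = MultiHeadAttention(Q = K = global map, V = local map)."""
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # embed_dim is the channel count d_k that controls the model scale.
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, g: torch.Tensor, loc: torch.Tensor) -> torch.Tensor:
        b, c, h, w = loc.shape
        # Flatten the H x W positions into a sequence of C-dimensional tokens.
        q = k = g.flatten(2).transpose(1, 2)    # (B, H*W, C) from Fourier_global[l]
        v = loc.flatten(2).transpose(1, 2)      # (B, H*W, C) from Fourier_local[l]
        out, _ = self.attn(q, k, v)
        return out.transpose(1, 2).reshape(b, c, h, w)  # back to (B, C, H, W)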
further, the executing step in the step (6) outputs feature graphs Feats on the L-order of the student model stu Executing from deep layer to shallow layer in sequence in the set, and inputting feature graphs Feats of the deep layer after scale alignment in the first-stage fusion stu [L]Features with shallow layer stu [L-1]Initializing, and performing the subsequent execution round [ l-1 ]]The deeper feature map in the input is obtained from the previous roundFourier of arrival local [i]Substitution is performed.
Further, the Kullback-Leibler divergence objective function in step (7) is a common objective function for fitting one distribution to another, with the formula:

Distill_loss = Σ_{l=1}^{L-1} λ_l · KL(p_l ‖ q_l), where KL(p ‖ q) = Σ_i p(i) · log(p(i) / q(i))

wherein p and q are the distributions being matched, namely the Feats_tea[l] and Feats_attn[l] feature maps in the scenario of this method; L is the total number of Feats_tea levels; and λ_l, an optional parameter, sets the share of each level's feature loss in the total loss.
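One plausible PyTorch rendering of this per-level loss is sketched below; normalizing each aligned feature map into a distribution with a softmax over spatial positions is an assumption of the sketch, since the normalization is not fixed above:

import torch
import torch.nn.functional as F

def distill_loss(feats_tea, feats_attn, lambdas):
    """Sum_l lambda_l * KL(p_l || q_l) over aligned teacher/student maps."""
    total = 0.0
    for t, s, lam in zip(feats_tea, feats_attn, lambdas):
        # Flatten each map and normalize per channel into a distribution
        # (softmax over spatial positions: an assumption, see above).
        p = F.softmax(t.flatten(2), dim=-1)
        log_q = F.log_softmax(s.flatten(2), dim=-1)
        total = total + lam * F.kl_div(log_q, p, reduction="batchmean")
    return total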
The loss computed for Logits_stu uses the original objective function of the user's target application task, i.e. Custom_loss, which is defined and calculated by the user.
Further, the weighting parameters in step (8) include the per-level weighting parameters λ_l defined and used in step (7) and the coefficient α controlling the ratio of the distillation loss to the original target loss, with the formula:

Total_loss = α × Distill_loss + Custom_loss

Training the student model by feedback refers to optimizing the model with an error back-propagation optimization algorithm; the optimization function selected may be defined by the user.
The invention has the following advantages:
1. Using Fourier convolution to convert the model output to the frequency domain for processing can eliminate back door trigger response patterns that are inconspicuous in time-domain features but directly cause frequency-domain changes;
2. Processing the feature maps from the Fourier convolution with a self-attention mechanism performs self-attention fusion on the frequency-domain semantic information, giving stronger boundary information and improving the model's expressiveness;
3. Cascading the information of different levels forms an inverse pyramid structure that supplements the forward pyramid information output by the original model; propagating information in the reverse direction also helps the lower-level model acquire high-order semantic information, improving learning capacity and model accuracy;
4. On the basis of a pre-trained model from an untrusted source, a high-precision student model can be learned from which both time-domain attack back doors and attack back doors based on local information have been removed.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention.
Fig. 2 is a schematic diagram of the improved fourier convolution information flow according to the present invention.
Detailed Description
The invention will now be described in further detail with reference to the accompanying drawings, which are given by way of illustration only and are not intended to limit the scope of the invention.
The method for resisting the model back door attack based on the frequency domain feature fusion reconstruction is shown in fig. 1.
The teacher model used in this embodiment is ResNet101, the student model is ResNet18, and the user data set is a high-resolution image classification data set in the ImageNet format. The embodiment comprises the following steps:
(1) Initialize the teacher model and student model, and run inference with a batch of the user data set to obtain the teacher feature map set Feats_tea, the student feature map set Feats_stu, and the finally output probability distributions Logits_tea and Logits_stu. Specifically:
(1a) Initialize a ResNet101 model as the teacher model M_tea, overwriting its parameters with the untrusted ResNet101 weights downloaded from an open-source platform; initialize ResNet18 as the student model M_stu; load the ImageNet data set and initialize a stochastic gradient descent optimizer according to the training parameters;
(1b) Input the data set into M_tea and M_stu in turn and compute the outputs at the different levels, obtaining the feature map sets Feats_stu and Feats_tea and the final output probability distributions Logits_stu and Logits_tea.
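By way of non-limiting illustration, the initialization of (1a) might read as follows in PyTorch/torchvision; the weight file path and the SGD hyper-parameters are assumptions of the sketch:

import torch
import torchvision.models as models

# (1a) Teacher: ResNet101 with untrusted downloaded weights, then frozen.
teacher = models.resnet101()
state = torch.load("resnet101_untrusted.pth", map_location="cpu")  # hypothetical path
teacher.load_state_dict(state)
teacher.eval()
for p in teacher.parameters():
    p.requires_grad_(False)   # fix M_tea once the weights are loaded

# Student: randomly initialized ResNet18, trained with SGD
# (learning rate, momentum and weight decay are assumed values).
student = models.resnet18(num_classes=1000)
optimizer = torch.optim.SGD(student.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)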
(2) Align the dimensions and channels of the deeper student feature map Feats_stu[l+1] to match the shallower feature map Feats_stu[l], where l denotes a feature stage; after the two time-domain feature maps are size-aligned they are jointly input into the Fourier convolution layer. Specifically:
(2a) Channel-align the deeper feature map using a convolution layer with 1x1 kernels and stride 1, whose input channel count is that of the deeper output Feats_stu[l+1] and whose output channel count is that of the shallower output Feats_stu[l];
(2b) Scale-align the channel-aligned deeper output Feats_stu[l+1] using a bilinear interpolation function, obtaining the same feature size as Feats_stu[l].
(3) Apply the Fourier convolution layer processing to remove the time-domain back door in the frequency domain, as follows:
(3a) In the Fourier convolution layer, the deeper feature map Feats_stu[l+1] and the shallower feature map Feats_stu[l] are each processed into frequency-domain form, denoted Fourier_g[l] and Fourier_l[l] respectively; relative to the current layer-l features, the global higher-order features are marked with g as global features, while the current layer, carrying more detail information, is marked with l as local features;
(3b) Cross-weighted calculation on the higher-layer and lower-layer frequency-domain information injects and fuses the higher-order semantic information carried by Feats_stu[l+1] into Feats_stu[l], while the detail semantic information contained in Feats_stu[l] supplements Feats_stu[l+1], fusing the information between the two levels of feature maps in the frequency domain. The rule of the cross-weighted calculation is shown in Fig. 2, with the formulas:

Y_l = Y_{l→l} + Y_{g→l} = f_l(X_l) + f_{g→l}(X_g),
Y_g = Y_{g→g} + Y_{l→g} = f_g(X_g) + f_{l→g}(X_l)

wherein Y denotes output, X denotes input, l and g denote local and global information, the subscripted f functions are the processing functions for local and global information respectively, and the left and right sides of the '→' symbol denote the information level before and after processing;
(3c) During the cross-weighted calculation, convolution filtering is performed synchronously in the frequency domain; the convolution filter has learnable parameters and filters the frequency-domain features Fourier_g[l] and Fourier_l[l], removing the frequency-domain anomalies of time-domain back door trigger patterns and preventing M_tea from transmitting a time-domain back door attack trigger pattern during subsequent learning;
(3d) The processed frequency-domain features Fourier_g[l] and Fourier_l[l] are restored to the time domain by the inverse discrete Fourier transform over the last two dimensions of the feature maps, outputting the two time-domain feature maps Fourier_global[l] and Fourier_local[l], fused and convolution-filtered in the frequency domain.
(4) Using the self-attention mechanism, compute global semantic information attention weights for the fused feature maps Fourier_global[l] and Fourier_local[l]: Fourier_global[l], which carries high-order global information, serves as the query vector Q and the key vector K, while Fourier_local[l], which carries low-order detail information, serves as the value vector V; the self-attention operator is evaluated and the attention feature map Feats_attn[l] is output. Specifically:
(4a) The attention mechanism processes Fourier_global[l] and Fourier_local[l] with the following formula to obtain the interaction weights between local and global information:

Attention(Q, K, V) = softmax(Q·K^T / sqrt(d_k))·V

wherein Q and K are initialized from Fourier_global[l], V is initialized from Fourier_local[l], and d_k is the channel count, used to control the model scale;
(4b) To increase operation speed, a multi-head attention mechanism accelerates the computation, exploiting the parallel computing capability of the graphics computing unit:

MultiHeadAttention(Q, K, V) = Concat(head_1, …, head_h)·W^O, head_i = Attention(Q·W_i^Q, K·W_i^K, V·W_i^V)

wherein the projection weights are the parameter matrices W_i^Q, W_i^K, W_i^V and W^O.
(5) Starting from the deepest M_stu feature map Feats_stu[L], steps (2), (3) and (4) are executed in cascade toward the shallow layers; at the start of the stage processing layer l-1, the deeper feature map in the Fourier convolution input is replaced by the Fourier_local[l] generated by Fourier convolution fusion during the stage processing layer l. That is, the input to the Fourier convolution layer in stage l-1 and subsequent processing is the current layer Feats_stu[l-1] together with the local part Fourier_local[l] of the deeper layer's Fourier convolution output.
(6) For each level l, channel- and size-align the finally obtained Feats_tea[l] and Feats_attn[l] using the same technique as step (2), where l ∈ [1, L-1]; that is, a convolution layer and an interpolation function align the sizes and channel counts of the two feature maps. The Kullback-Leibler divergence loss function then computes and sums the losses of the two corresponding feature maps at each level l in turn, and a loss for Logits_stu is computed according to the objective function specific to the target application task. Specifically:
(6a) The Kullback-Leibler divergence objective function fits one distribution to another; its formula is as follows, with all λ_l set to 1:

Distill_loss = Σ_{l=1}^{L-1} λ_l · KL(p_l ‖ q_l), where KL(p ‖ q) = Σ_i p(i) · log(p(i) / q(i))

wherein p and q are the distributions being matched, namely the Feats_tea[l] and Feats_attn[l] feature maps in the scenario of this method; L is the total number of Feats_tea levels; and λ_l, an optional parameter, sets the share of each level's feature loss in the total loss;
(6b) The loss function used for Logits_stu is the multi-class cross entropy loss:

Custom_loss = -(1/N) Σ_{i=1}^{N} Σ_{c=1}^{M} y_ic · log(p_ic)

where M is the number of categories, N is the total sample count, y_ic is an indicator (0 or 1) that takes 1 if the true class of sample i equals c and 0 otherwise, and p_ic is the predicted probability that observation sample i belongs to category c.
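In PyTorch this loss is available directly; a short sketch:

import torch
import torch.nn.functional as F

# Custom_loss: multi-class cross entropy over N samples and M classes.
# logits_stu: (N, M) raw scores; targets: (N,) integer class indices.
def custom_loss_fn(logits_stu: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    return F.cross_entropy(logits_stu, targets)  # averages -log p_ic over samples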
(7) Weight and sum the different losses to obtain the total loss, and train with it until the student model converges; the total loss is computed as:

Total_loss = 0.5 × Distill_loss + Custom_loss

After the total loss is computed, M_stu is optimized by gradient updates with the stochastic gradient descent algorithm until the model loss converges, yielding a high-precision ResNet18 model with the back door safely removed.
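Putting the illustrative pieces together, a condensed training loop for this embodiment might read as follows; every helper name comes from the earlier sketches, the epoch count and data loader are assumptions, and the teacher-to-student alignment of step (6) is elided for brevity:

import torch

# Assumed ResNet stage names; the auxiliary aligns/fusions/attns module lists
# come from the earlier sketches, and their parameters would also need to be
# registered with the optimizer alongside the student's.
stage_names = ["layer1", "layer2", "layer3", "layer4"]
lambdas = [1.0] * (len(stage_names) - 1)    # all lambda_l set to 1, as above

for epoch in range(90):                     # epoch count assumed
    for images, targets in loader:          # ImageNet-format DataLoader assumed
        with torch.no_grad():               # the teacher M_tea is fixed
            feats_tea, _ = collect_stage_outputs(teacher, stage_names, images)
        feats_stu, logits_stu = collect_stage_outputs(student, stage_names, images)
        # Steps (2)-(5): deep-to-shallow fusion of the student maps.
        attn_maps = cascade_fuse([feats_stu[n] for n in stage_names],
                                 aligns, fusions, attns)
        # Step (6): teacher maps are assumed pre-aligned to the student maps'
        # channels and sizes; that alignment step is elided here.
        tea_maps = [feats_tea[n] for n in stage_names[:-1]]
        train_step(optimizer, tea_maps, attn_maps, logits_stu, targets,
                   custom_loss_fn, lambdas, alpha=0.5)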
Although specific embodiments of the invention have been disclosed for illustrative purposes, it will be appreciated by those skilled in the art that various alternatives, variations and modifications are possible without departing from the spirit and scope of the invention and the appended claims. Therefore, the invention is not limited to the particular embodiments disclosed, but has the scope indicated by the appended claims.

Claims (9)

1. A model back door attack countermeasure method based on frequency domain feature fusion reconstruction, comprising the steps of:
1) selecting and initializing a model corresponding to the superior application task to which the target application task belongs as the teacher model M_tea; obtaining pre-training weights of the teacher model M_tea and overwriting the parameter weights of the teacher model M_tea with them; and selecting and initializing a model as the student model M_stu, the number of output levels of the student model M_stu being consistent with that of the teacher model M_tea;
2) inputting samples of the target application task data set into the teacher model M_tea and the student model M_stu respectively; obtaining the feature map set Feats_tea composed of the feature maps output by each intermediate level of the teacher model M_tea and the probability distribution Logits_tea output by the last layer of the teacher model M_tea; and obtaining the feature map set Feats_stu composed of the feature maps output by each intermediate level of the student model M_stu and the probability distribution Logits_stu output by the last layer of the student model M_stu;
3) starting from the level above the deepest intermediate level, size-aligning the feature map Feats_stu[l+1] output by the (l+1)-th intermediate level of the student model M_stu with the feature map Feats_stu[l] output by the l-th intermediate level, and inputting both into a Fourier convolution layer; the Fourier convolution layer converting the feature maps Feats_stu[l+1] and Feats_stu[l] to the frequency domain and fusing their information, injecting and fusing the higher-order semantic information carried by Feats_stu[l+1] into Feats_stu[l] to obtain the global frequency-domain feature map Fourier_g[l] of the l-th intermediate level, and supplementing Feats_stu[l+1] with the detail semantic information contained in Feats_stu[l] to obtain the local frequency-domain feature map Fourier_l[l] of the l-th intermediate level; inverse-transforming Fourier_g[l] and Fourier_l[l] back to the time domain to obtain the global time-domain feature map Fourier_global[l] and the local time-domain feature map Fourier_local[l] of the l-th intermediate level; computing global semantic information attention weights over Fourier_global[l] and Fourier_local[l] with the self-attention mechanism to obtain the self-attention feature map Feats_attn[l]; and, in the next stage, which processes level l-1, using the output local time-domain feature map Fourier_local[l] as the higher-order semantic information in the Fourier convolution layer input, in place of the deeper feature map taken from the feature map set Feats_stu; wherein l ∈ [1, L-1] and L is the number of intermediate levels;
4) for each feature map in the feature map set Feats_tea, aligning the feature map Feats_tea[l] output by the l-th intermediate level with the corresponding self-attention feature map Feats_attn[l], then computing the loss of the l-th intermediate level with the Kullback-Leibler divergence loss function and summing the losses of the intermediate levels to obtain the loss value Distill_loss; and computing the classification loss value Custom_loss from the Logits_stu corresponding to each sample in the target application task data set and the true label corresponding to that sample;
5) computing the total loss Total_loss from the loss value Distill_loss and the loss value Custom_loss, and optimizing the student model M_stu;
6) iteratively repeating steps 2)-5) until the student model M_stu converges, obtaining a security model with the potential back door removed.
2. The method of claim 1, wherein Total_loss = α × Distill_loss + Custom_loss, and α is a proportional coefficient.
3. The method of claim 1 or 2, wherein the loss value Custom_loss is calculated using the loss function of the target application task.
4. The method of claim 1, wherein, using the self-attention mechanism, Fourier_global[l] serves as the query vector Q and the key vector K, Fourier_local[l] serves as the value vector V, the self-attention calculation is performed, and the self-attention feature map Feats_attn[l] is output.
5. The method of claim 1, wherein the target application task data set is a trusted data set that is weaker in scale and annotation accuracy than the data set used by the pre-trained model.
6. The method of claim 1, wherein the student model M_stu is equal to or weaker in model capacity than the teacher model M_tea.
7. The method of claim 1, wherein the student model M_stu is optimized on the basis of the total loss Total_loss using an error back-propagation optimization algorithm.
8. A server comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the steps of the method of any of claims 1 to 7.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
CN202310754608.2A 2023-06-25 2023-06-25 Model back door attack countermeasure method based on frequency domain feature fusion reconstruction Pending CN116824334A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310754608.2A CN116824334A (en) 2023-06-25 2023-06-25 Model back door attack countermeasure method based on frequency domain feature fusion reconstruction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310754608.2A CN116824334A (en) 2023-06-25 2023-06-25 Model back door attack countermeasure method based on frequency domain feature fusion reconstruction

Publications (1)

Publication Number Publication Date
CN116824334A true CN116824334A (en) 2023-09-29

Family

ID=88113962

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310754608.2A Pending CN116824334A (en) 2023-06-25 2023-06-25 Model back door attack countermeasure method based on frequency domain feature fusion reconstruction

Country Status (1)

Country Link
CN (1) CN116824334A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117421678A (en) * 2023-12-19 2024-01-19 西南石油大学 Single-lead atrial fibrillation recognition system based on knowledge distillation
CN117421678B (en) * 2023-12-19 2024-03-22 西南石油大学 Single-lead atrial fibrillation recognition system based on knowledge distillation

Similar Documents

Publication Publication Date Title
US9619749B2 (en) Neural network and method of neural network training
Papernot et al. The limitations of deep learning in adversarial settings
US9390373B2 (en) Neural network and method of neural network training
CN110941794B (en) Challenge attack defense method based on general inverse disturbance defense matrix
Wang et al. Neural architecture search for robust networks in 6G-enabled massive IoT domain
CN111310814A (en) Method and device for training business prediction model by utilizing unbalanced positive and negative samples
CN112580728B (en) Dynamic link prediction model robustness enhancement method based on reinforcement learning
Veness et al. Online learning with gated linear networks
CN116824334A (en) Model back door attack countermeasure method based on frequency domain feature fusion reconstruction
Kim et al. Exploring temporal information dynamics in spiking neural networks
Chivukula et al. Adversarial learning games with deep learning models
Xiao et al. Noise optimization in artificial neural networks
Khan et al. A hybrid defense method against adversarial attacks on traffic sign classifiers in autonomous vehicles
Hu et al. RL-VAEGAN: Adversarial defense for reinforcement learning agents via style transfer
CN109697511B (en) Data reasoning method and device and computer equipment
CN115758337A (en) Back door real-time monitoring method based on timing diagram convolutional network, electronic equipment and medium
CN115019102A (en) Construction method and application of confrontation sample generation model
Khaled et al. Careful what you wish for: on the extraction of adversarially trained models
Katzir et al. Why blocking targeted adversarial perturbations impairs the ability to learn
Xiu et al. FreMix: Frequency‐Based Mixup for Data Augmentation
Gangloff et al. A general parametrization framework for pairwise Markov models: An application to unsupervised image segmentation
CN116702832A (en) Back door attack countermeasure method and system for artificial intelligent model migration security
CN113283520B (en) Feature enhancement-based depth model privacy protection method and device for membership inference attack
Tajwar On the Robustness of Prunnig Algorithms to Adversarial Attacks
Peck Improving the robustness of deep neural networks to adversarial perturbations

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination