CN117972700A - Poisoning attack detection and punishment method and system based on deep reinforcement learning - Google Patents


Info

Publication number
CN117972700A
CN117972700A
Authority
CN
China
Prior art keywords
model parameter
client
evaluation threshold
network
server
Prior art date
Legal status: Pending
Application number
CN202410377957.1A
Other languages
Chinese (zh)
Inventor
万涛
邓仙庆
廖维川
肖勇才
刘遵雄
周洁
江娜
虞莹豪
Current Assignee
East China Jiaotong University
Original Assignee
East China Jiaotong University
Priority date
Filing date
Publication date
Application filed by East China Jiaotong University
Priority to CN202410377957.1A
Publication of CN117972700A
Legal status: Pending

Landscapes

  • Computer And Data Communications (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention belongs to the technical field of federal learning and discloses a poisoning attack detection and punishment method and system based on deep reinforcement learning. In the federal learning process, the server acts as the agent of deep reinforcement learning and dynamically updates an evaluation threshold through interaction with the environment using the deep deterministic policy gradient (DDPG) algorithm. The cosine similarity between each client's local model parameter update amount and the global model parameter update amount is calculated and compared with the evaluation threshold to identify abnormal clients, which are then prohibited from participating in the next several rounds of federal learning. The invention uses cosine similarity to measure how closely a client's local model parameter update amount agrees with the global model parameter update amount, thereby discovering possible poisoning attacks, and dynamically updates the evaluation threshold with the DDPG algorithm, enhancing the security of federal learning.

Description

Poisoning attack detection and punishment method and system based on deep reinforcement learning
Technical Field
The invention belongs to the technical field of federal learning, and relates to a poisoning attack detection and punishment method and system based on deep reinforcement learning.
Background
Federal learning is a distributed machine learning method that allows multiple clients to train their own models locally and then send the model parameters or updates to a central server, which aggregates them and synchronizes the shared knowledge across clients. This approach protects client data privacy, reduces communication overhead, and improves the generalization ability of the model. However, federal learning also faces challenges, one of which is the poisoning attack.
Poisoning attacks are malicious attacks that undermine the effectiveness of federal learning by injecting incorrect or biased information into a client's data or model. Their harm is plain: they can degrade the quality of federal learning models and even lead to erroneous or unreliable predictions, undermining the application value of and confidence in federal learning.
The patent with publication number CN117216779A provides a federal learning security aggregation method based on cosine similarity and homomorphic encryption: 1) each client uploads a gradient ciphertext, that is, it trains the model obtained in the previous round on its own private data to obtain gradient update data, encrypts the gradient plaintext with a shared public key using homomorphic encryption, and transmits the resulting ciphertext to the parameter Server1 over a secure communication channel; 2) the parameter Server1 performs gradient aggregation and computes contribution values; 3) the gradient update is released.
However, this solution has the following drawbacks: 1) the threshold is fixed and cannot change in real time with the environment; 2) attackers are not punished; instead, contribution degree is used to select participating clients, which can remove honest clients from federal learning.
Thus, the problems to be solved include:
(1) when detecting a poisoning attack, how to detect attackers more intelligently and reasonably as the environment changes;
(2) how to punish malicious users and prevent attackers from damaging federal learning training.
Disclosure of Invention
In order to solve the problems, the invention provides a poisoning attack detection and punishment method and system based on deep reinforcement learning.
The invention is realized by the following technical scheme. The poisoning attack detection and punishment method based on deep reinforcement learning comprises the following steps:
Step S1: the server initializes the global model;
Step S2: the server filters for clients that are not currently punished and selects a certain proportion of them to participate in federal learning; the server sends the global model to the selected clients;
Step S3: each selected client trains the received global model with its own data to obtain local model parameters, calculates the local model parameter update amount, and uploads it to the server;
Step S4: the server receives the local model parameter update amounts sent by the clients, aggregates them to obtain a new global model, and calculates the global model parameter update amount;
Step S5: the server, acting as the agent of deep reinforcement learning, dynamically updates the evaluation threshold through interaction with the environment using the deep deterministic policy gradient (DDPG) algorithm;
Step S6: the server calculates the cosine similarity between each client's local model parameter update amount and the global model parameter update amount, compares it with the evaluation threshold, and marks a client as abnormal if its cosine similarity is below the evaluation threshold;
Step S7: the server penalizes each abnormal client and prohibits it from participating in the next several rounds of federal learning;
Step S8: the server judges whether a preset target or condition is met; if so, training ends, otherwise steps S2 to S7 are repeated until a convergence or stop condition is reached.
Further preferably, the update of the deep deterministic policy gradient (DDPG) algorithm comprises two parts, one being the update of the Critic network and the other being the update of the Actor network. The Actor network is responsible for generating the action that adjusts the evaluation threshold: it receives the current state s_t and outputs an action a_t, which represents an adjustment to the evaluation threshold; the goal of the Actor network is to maximize the expected return given by the Critic network. The Critic network evaluates the value of the action generated by the Actor network: it receives the current state s_t and the action a_t proposed by the Actor network, and outputs the value Q(s_t, a_t) of this state-action pair.
Further preferably, the policy function of the Actor network is expressed as:

a_t = μ(s_t | θ^μ)

wherein μ represents the policy function of the Actor network and θ^μ is the parameter of the Actor network.
Further preferably, the cost function of the Critic network is expressed as:

y_t = r_t + γ · Q′(s_{t+1}, μ′(s_{t+1} | θ^{μ′}) | θ^{Q′})

wherein r_t is the immediate reward, γ is the discount factor, Q′ and μ′ are the cost function of the target Critic network and the policy function of the target Actor network respectively, s_{t+1} is the next state, θ^Q is the parameter of the Critic network, and θ^{Q′} and θ^{μ′} are the parameters of the target Critic network and the target Actor network respectively.
Further preferably, the process of dynamically updating the evaluation threshold with the deep deterministic policy gradient algorithm is as follows:
randomly initialize the parameters and the experience pool of the Actor-Critic networks in the deep deterministic policy gradient (DDPG) algorithm;
the Actor network selects an action according to the current state, namely an adjustment of the evaluation threshold;
update the evaluation threshold by applying the action selected by the Actor network;
store the state, action, and reward into the experience pool;
draw samples from the experience pool, train the Actor network and the Critic network, and update their parameters.
Further preferably, the cosine similarity is calculated as:

cos(A, B) = (A · B) / (‖A‖ · ‖B‖)

wherein cos(A, B) represents the cosine similarity between the local model parameter update amount A and the global model parameter update amount B, ‖A‖ is the Euclidean norm of A, and ‖B‖ is the Euclidean norm of B.
The invention provides a system for realizing a poisoning attack detection and punishment method based on deep reinforcement learning, which comprises a server and clients, wherein a global model, an evaluation threshold updating module for dynamically updating an evaluation threshold and a diagnosis module are built in the server, the diagnosis module calculates cosine similarity of local model parameter updating quantity and global model parameter updating quantity of each client, compares the obtained cosine similarity with the evaluation threshold, and marks the corresponding client as an abnormal client if the cosine similarity of the local model parameter updating quantity and the global model parameter updating quantity of the client is lower than the evaluation threshold.
The invention provides a nonvolatile computer storage medium, wherein the computer storage medium stores computer executable instructions, and the computer executable instructions execute the poisoning attack detection and punishment method based on deep reinforcement learning.
The invention has the following beneficial effects. First, cosine similarity is used to measure how closely a client's local model parameter update amount agrees with the global model parameter update amount, so as to discover possible poisoning attacks. Second, a dynamic evaluation threshold is set to judge whether a client is an attacker; the threshold is adaptively updated with deep reinforcement learning, and the DDPG algorithm adjusts it dynamically as the environment changes, which is more effective than a static threshold. Third, malicious users are automatically marked and temporarily prevented from participating in federal learning, enhancing its security. The method can effectively exclude attackers and ensure the normal operation of federal learning.
Drawings
Fig. 1 is a schematic diagram of the present invention.
FIG. 2 is a flow chart of a poisoning attack detection and punishment method based on deep reinforcement learning according to the present invention.
Detailed Description
The invention is illustrated in further detail below in connection with examples.
Referring to fig. 1 and 2, the poisoning attack detection and punishment method based on deep reinforcement learning includes the following steps:
Step S1: the server initializes the global model;
Step S2: the server filters for clients that are not currently punished and selects a certain proportion of them to participate in federal learning; the server sends the global model to the selected clients;
Step S3: each selected client trains the received global model with its own data to obtain local model parameters, calculates the local model parameter update amount, and uploads it to the server;
Step S4: the server receives the local model parameter update amounts sent by the clients, aggregates them to obtain a new global model, and calculates the global model parameter update amount;
Step S5: the server, acting as the agent of deep reinforcement learning, dynamically updates the evaluation threshold through interaction with the environment using the deep deterministic policy gradient (DDPG) algorithm;
Step S6: the server calculates the cosine similarity between each client's local model parameter update amount and the global model parameter update amount, compares it with the evaluation threshold, and marks a client as abnormal if its cosine similarity is below the evaluation threshold;
Step S7: the server penalizes each abnormal client and prohibits it from participating in the next several rounds of federal learning;
Step S8: the server judges whether a preset target or condition is met; if so, training ends, otherwise steps S2 to S7 are repeated until a convergence or stop condition is reached.
Federal learning is a distributed machine learning approach that allows multiple clients to co-train a shared global model while maintaining the privacy of the respective data. In the process, each client trains the global model based on own data set, obtains the updating quantity of the local model parameters and sends the updating quantity to the server. The server is responsible for aggregating these local model parameter updates to refine the global model.
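The aggregation step described above can be sketched as a weighted average of the clients' parameter-update vectors. This is a minimal illustration under assumed conventions, not the patent's exact aggregation rule (which is not specified); the function name and the uniform default weighting are assumptions.

```python
import numpy as np

def aggregate_updates(client_updates, weights=None):
    """Federated averaging of client parameter-update vectors.

    client_updates: list of 1-D numpy arrays (local model parameter
    update amounts). weights: optional per-client weights (e.g. local
    data set sizes); defaults to a uniform average.
    """
    updates = np.stack(client_updates)        # shape: (n_clients, n_params)
    if weights is None:
        weights = np.ones(len(client_updates))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()         # normalize to a convex combination
    # Global model parameter update amount = weighted mean of local updates
    return weights @ updates

# Example: three clients, two honest and one whose update points the other way
global_update = aggregate_updates([
    np.array([1.0, 0.0]),
    np.array([0.9, 0.1]),
    np.array([-1.0, 0.0]),
])
```

Weighting by local data set size recovers the classical FedAvg rule; the uniform default keeps the sketch simple.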
An attacker compromises a client, and a poisoned client does great harm to federal learning. To detect poisoning attacks, the local model parameter update amount and the global model parameter update amount are regarded as vectors, and cosine similarity is used to measure the similarity between them. Cosine similarity is the cosine of the angle between the two vectors and can be expressed as:

cos(A, B) = (A · B) / (‖A‖ · ‖B‖)

wherein cos(A, B) represents the cosine similarity between the local model parameter update amount A and the global model parameter update amount B, ‖A‖ is the Euclidean norm of A, and ‖B‖ is the Euclidean norm of B. If the cosine similarity approaches 1, the two update directions are similar; if it approaches -1, the update directions are opposite.
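The similarity test above can be sketched as follows. The helper name `flag_abnormal` is a hypothetical label of mine, and the evaluation threshold is passed in from outside (in the invention it is maintained by the DDPG agent):

```python
import numpy as np

def cosine_similarity(a, b):
    """cos(A, B) = A.B / (||A|| * ||B||)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def flag_abnormal(client_updates, global_update, threshold):
    """Return indices of clients whose local update direction deviates
    from the global update: cosine similarity below the evaluation
    threshold marks the client as abnormal."""
    return [i for i, u in enumerate(client_updates)
            if cosine_similarity(u, global_update) < threshold]
```

A client pushing the model in the opposite direction scores near -1 and is flagged at any reasonable threshold, while honest clients with similar update directions score near 1.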
By comparing the cosine similarity with the evaluation threshold, the clients with the cosine similarity lower than the evaluation threshold are marked as abnormal clients, and the abnormal behavior is required to be effectively punished to prevent the identified malicious clients from further damaging the global model in the future, so that the health and stability of the federal learning environment are maintained. In the method proposed by the invention, once a client is marked as abnormal, it is prohibited from participating in the next n rounds of federal learning. This n is a hyper-parameter that can be adjusted according to the specific requirements of federal learning and security policies. In these n rounds of federal learning, prohibited clients cannot upload client model update vectors, nor can they benefit from the global model. In this way, potentially malicious participants can be automatically isolated, thereby protecting the global model from poisoning attacks. The abnormal client is not permanently forbidden, so that the misjudged client is prevented from being subjected to excessively severe punishment.
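The n-round ban described above amounts to simple bookkeeping on the server. The sketch below is illustrative; the class and method names are assumptions of mine, with n kept as the adjustable hyper-parameter, and bans expire so a misjudged client can return:

```python
class PenaltyLedger:
    """Tracks which clients are banned and for how many rounds."""

    def __init__(self, ban_rounds=3):
        self.ban_rounds = ban_rounds   # the hyper-parameter n from the text
        self.banned_until = {}         # client_id -> first round it may rejoin

    def punish(self, client_id, current_round):
        """Ban a flagged client for the next ban_rounds rounds."""
        self.banned_until[client_id] = current_round + self.ban_rounds

    def eligible(self, client_id, current_round):
        """A client is eligible once its ban (if any) has expired."""
        return current_round >= self.banned_until.get(client_id, 0)

    def eligible_clients(self, client_ids, current_round):
        """Filter for clients that are not currently punished (step S2)."""
        return [c for c in client_ids if self.eligible(c, current_round)]
```

During the banned rounds such a client is simply excluded from selection, so it neither uploads updates nor receives the global model, matching the temporary (not permanent) punishment described above.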
By introducing a detection and punishment mechanism based on cosine similarity, the safety of the federal learning environment can be improved. There is still a key problem, namely updating of the evaluation threshold. The setting of the evaluation threshold directly influences the sensitivity and the specificity of attack detection, and a reasonable evaluation threshold can effectively distinguish the local model parameter updating quantity of a normal client and the client of an attacker, so that false alarm and missing report are reduced. To accommodate dynamically changing environments and attack strategies, deep reinforcement learning is used to dynamically adjust the evaluation threshold. The method can automatically adjust the evaluation threshold according to the historical data and the performance of the current model so as to achieve a better detection effect.
The evaluation threshold is dynamically updated using the deep deterministic policy gradient (DDPG) algorithm in order to identify and defend against poisoning attacks. DDPG is a deep reinforcement learning algorithm that combines policy gradients and Q-learning; it is mainly used for continuous action spaces and fits a policy function and a Q-value function with neural networks. In federal learning, the update of the evaluation threshold is treated as a reinforcement learning problem. The specific steps are as follows:
(1) randomly initialize the parameters and the experience pool of the Actor-Critic networks in the DDPG algorithm;
(2) the Actor network selects an action according to the current state, namely an adjustment of the evaluation threshold;
(3) perform the action: update the evaluation threshold by applying the action selected by the Actor network;
(4) store the state, action, and reward into the experience pool;
(5) draw samples from the experience pool, train the Actor network and the Critic network, and update their parameters.
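Steps (1) to (5) can be sketched with a deliberately tiny agent. The sketch below is my simplification, not the patent's implementation: it collapses the Actor and Critic into single linear functions over a scalar state and a scalar threshold adjustment, and omits the target networks and exploration noise a full DDPG implementation would use; all names and the reward signal are illustrative.

```python
import random
import numpy as np

class TinyDDPG:
    """Minimal linear Actor-Critic sketch of the threshold-update loop."""

    def __init__(self, lr=0.01, gamma=0.9, buffer_size=1000):
        self.w = np.zeros(1)        # Actor parameters  (theta_mu)
        self.v = np.zeros(2)        # Critic parameters (theta_Q) over [s, a]
        self.gamma, self.lr = gamma, lr
        self.buffer = []            # (1) experience pool
        self.buffer_size = buffer_size

    def act(self, s):               # (2) Actor proposes a threshold adjustment
        return float(self.w[0] * s)

    def store(self, s, a, r, s_next):   # (4) store the transition
        self.buffer.append((s, a, r, s_next))
        self.buffer = self.buffer[-self.buffer_size:]

    def train(self, batch_size=8):      # (5) sample and update both networks
        if len(self.buffer) < batch_size:
            return
        for s, a, r, s_next in random.sample(self.buffer, batch_size):
            # Critic: move Q(s, a) toward the TD target r + gamma*Q(s', mu(s'))
            a_next = self.act(s_next)
            target = r + self.gamma * self.v @ np.array([s_next, a_next])
            q = self.v @ np.array([s, a])
            self.v -= self.lr * (q - target) * np.array([s, a])
            # Actor: deterministic policy gradient, dQ/da * da/dw
            self.w += self.lr * self.v[1] * s

# Illustrative usage of one interaction step:
agent = TinyDDPG()
threshold = 0.5
state = 0.2                      # e.g. fraction of flagged clients last round
action = agent.act(state)        # (2) select the adjustment
threshold += action              # (3) apply it to the evaluation threshold
reward = 1.0                     # assumed reward, e.g. global accuracy gain
agent.store(state, action, reward, state)   # (4)
agent.train()                    # (5)
```

A real deployment would use small multi-layer networks, soft-updated target copies of both networks, and exploration noise on the action; the loop structure stays the same.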
DDPG is an Actor-Critic framework based algorithm that combines the advantages of deep learning and reinforcement learning. In this framework, the Actor network directly maps states to an action space for generating actions in the current state, and the Critic network evaluates the value of the actions generated by the Actor network, i.e., gives the Q value of the current state and action pair. In this way, the DDPG algorithm can learn how to make the optimal decision without explicit indication.
The update formula of the DDPG algorithm mainly comprises two parts: the update of the Critic network and the update of the Actor network. The Actor network is responsible for generating the action that adjusts the evaluation threshold. It receives the current state s_t and outputs an action a_t, which represents the adjustment amount of the evaluation threshold. The goal of the Actor network is to maximize the expected return given by the Critic network. The policy function of the Actor network can be expressed as:

a_t = μ(s_t | θ^μ)

wherein μ represents the policy function of the Actor network and θ^μ is the parameter of the Actor network.
The Critic network evaluates the value of the action generated by the Actor network. It receives the current state s_t and the action a_t proposed by the Actor network, and outputs the value Q(s_t, a_t) of this state-action pair. The cost function of the Critic network can be expressed as:

y_t = r_t + γ · Q′(s_{t+1}, μ′(s_{t+1} | θ^{μ′}) | θ^{Q′})

wherein r_t is the immediate reward, γ is the discount factor, Q′ and μ′ are the cost function of the target Critic network and the policy function of the target Actor network respectively, s_{t+1} is the next state, θ^Q is the parameter of the Critic network, and θ^{Q′} and θ^{μ′} are the parameters of the target Critic network and the target Actor network respectively. By iterating this process continuously, the DDPG algorithm learns a strategy that proposes the optimal threshold adjustment action in each state, thereby effectively combating poisoning attacks.
In another embodiment of the present invention, a system for implementing a poisoning attack detection and punishment method based on deep reinforcement learning is provided, including a server and a client, where the server is built with a global model, an evaluation threshold updating module for dynamically updating an evaluation threshold, and a diagnostic module, where the diagnostic module calculates cosine similarity between a local model parameter update amount and a global model parameter update amount of each client, and compares the obtained cosine similarity with the evaluation threshold, and if the cosine similarity between the local model parameter update amount and the global model parameter update amount of the client is lower than the evaluation threshold, marks the corresponding client as an abnormal client.
In another embodiment, a non-volatile computer storage medium is provided, the computer storage medium storing computer-executable instructions that perform the poisoning attack detection and punishment method based on deep reinforcement learning of any of the above embodiments.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (8)

1. The poisoning attack detection and punishment method based on deep reinforcement learning is characterized by comprising the following steps of:
Step S1: the server initializes the global model;
Step S2: the server filters for clients that are not currently punished and selects a certain proportion of them to participate in federal learning; the server sends the global model to the selected clients;
Step S3: each selected client trains the received global model with its own data to obtain local model parameters, calculates the local model parameter update amount, and uploads it to the server;
Step S4: the server receives the local model parameter update amounts sent by the clients, aggregates them to obtain a new global model, and calculates the global model parameter update amount;
Step S5: the server, acting as the agent of deep reinforcement learning, dynamically updates the evaluation threshold through interaction with the environment using the deep deterministic policy gradient (DDPG) algorithm;
Step S6: the server calculates the cosine similarity between each client's local model parameter update amount and the global model parameter update amount, compares it with the evaluation threshold, and marks a client as abnormal if its cosine similarity is below the evaluation threshold;
Step S7: the server penalizes each abnormal client and prohibits it from participating in the next several rounds of federal learning;
Step S8: the server judges whether a preset target or condition is met; if so, training ends, otherwise steps S2 to S7 are repeated until a convergence or stop condition is reached.
2. The method for detecting and punishing poisoning attack based on deep reinforcement learning according to claim 1, wherein the update of the deep deterministic policy gradient algorithm comprises two parts, one being the update of the Critic network and the other being the update of the Actor network; the Actor network is responsible for generating the action that adjusts the evaluation threshold: it receives the current state s_t and outputs an action a_t, which represents an adjustment to the evaluation threshold, and the goal of the Actor network is to maximize the expected return given by the Critic network; the Critic network evaluates the value of the action generated by the Actor network: it receives the current state s_t and the action a_t proposed by the Actor network and outputs the value Q(s_t, a_t) of this state-action pair.
3. The poisoning attack detection and punishment method based on deep reinforcement learning according to claim 2, wherein the policy function of the Actor network is expressed as:

a_t = μ(s_t | θ^μ)

wherein μ represents the policy function of the Actor network and θ^μ is the parameter of the Actor network.
4. The poisoning attack detection and punishment method based on deep reinforcement learning according to claim 3, wherein the cost function of the Critic network is expressed as:

y_t = r_t + γ · Q′(s_{t+1}, μ′(s_{t+1} | θ^{μ′}) | θ^{Q′})

wherein r_t is the immediate reward, γ is the discount factor, Q′ and μ′ are the cost function of the target Critic network and the policy function of the target Actor network respectively, s_{t+1} is the next state, θ^Q is the parameter of the Critic network, and θ^{Q′} and θ^{μ′} are the parameters of the target Critic network and the target Actor network respectively.
5. The poisoning attack detection and punishment method based on deep reinforcement learning according to claim 2, wherein the process of dynamically updating the evaluation threshold using a deep deterministic strategy gradient algorithm is as follows:
randomly initialize the parameters and the experience pool of the Actor-Critic networks in the deep deterministic policy gradient (DDPG) algorithm;
the Actor network selects an action according to the current state, namely an adjustment of the evaluation threshold;
update the evaluation threshold by applying the action selected by the Actor network;
store the state, action, and reward into the experience pool;
draw samples from the experience pool, train the Actor network and the Critic network, and update their parameters.
6. The poisoning attack detection and punishment method based on deep reinforcement learning according to claim 1, wherein the cosine similarity is calculated as:

cos(A, B) = (A · B) / (‖A‖ · ‖B‖)

wherein cos(A, B) represents the cosine similarity between the local model parameter update amount A and the global model parameter update amount B, ‖A‖ is the Euclidean norm of A, and ‖B‖ is the Euclidean norm of B.
7. A system for implementing the poisoning attack detection and punishment method based on deep reinforcement learning according to any one of claims 1 to 6, comprising a server and clients, wherein the server is internally provided with a global model, an evaluation threshold updating module for dynamically updating an evaluation threshold, and a diagnosis module, wherein the diagnosis module calculates cosine similarity of local model parameter updating amounts and global model parameter updating amounts of the clients, compares the obtained cosine similarity with the evaluation threshold, and marks the corresponding client as an abnormal client if the cosine similarity of the local model parameter updating amounts and the global model parameter updating amounts of the clients is lower than the evaluation threshold.
8. A non-transitory computer storage medium having stored thereon computer executable instructions for performing the poisoning attack detection and punishment method based on deep reinforcement learning according to any one of claims 1 to 6.
CN202410377957.1A 2024-03-29 2024-03-29 Poisoning attack detection and punishment method and system based on deep reinforcement learning Pending CN117972700A (en)

Priority Applications (1)

Application Number: CN202410377957.1A
Priority Date / Filing Date: 2024-03-29
Title: Poisoning attack detection and punishment method and system based on deep reinforcement learning

Publications (1)

Publication Number: CN117972700A
Publication Date: 2024-05-03

Family

ID=90846360

Country Status (1)

Country: CN (CN117972700A, en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111598143A (en) * 2020-04-27 2020-08-28 浙江工业大学 Credit evaluation-based defense method for federal learning poisoning attack
CN114186237A (en) * 2021-10-26 2022-03-15 北京理工大学 Truth-value discovery-based robust federated learning model aggregation method
CN116192424A (en) * 2022-12-01 2023-05-30 山东大学 Method for attacking global data distribution in federation learning scene
CN116861239A (en) * 2023-07-10 2023-10-10 西安交通大学 Federal learning method and system
CN117216779A (en) * 2023-09-22 2023-12-12 哈尔滨工业大学 Federal learning security aggregation method based on cosine similarity and homomorphic encryption



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination