CN113268730B

CN113268730B - Smart power grid false data injection attack detection method based on reinforcement learning

Info

Publication number: CN113268730B
Application number: CN202110486653.5A
Authority: CN
Inventors: 吴争光; 张阔
Original assignee: Qunzhi Future Artificial Intelligence Technology Research Institute Wuxi Co ltd
Current assignee: Qunzhi Future Artificial Intelligence Technology Research Institute Wuxi Co ltd
Priority date: 2021-05-01
Filing date: 2021-05-01
Publication date: 2023-07-25
Anticipated expiration: 2041-05-01
Also published as: CN113268730A

Abstract

The invention discloses a reinforcement-learning-based method for detecting false data injection attacks in a smart grid. To construct the observations, detection of direct false data injection attacks combines a residual method with threshold segmentation, while detection of hidden false data injection attacks combines the norm of the difference between successive measurements with threshold segmentation. A Q-table is trained from each of the two observations, and the Q-tables are then used to detect attacks. The design achieves rapid detection of hidden attacks, increases detection speed and success rate, is simple to implement, and markedly improves detection efficiency.

Description

Smart power grid false data injection attack detection method based on reinforcement learning

Technical Field

The invention relates to the field of smart grids, in particular to a smart grid false data injection attack detection method based on reinforcement learning.

Background

The smart grid is a new power grid technology that separates the information transmission channel from the power transmission channel, giving the grid more efficient allocation of power resources and stronger resistance to interference. As information technology develops further, the information security of the smart grid has become a major concern. False data injection attacks play an important role in research on information attacks against the smart grid. Their core idea is to exploit loopholes in traditional detection methods and use a constructed attack vector to influence the state estimation of the power grid system, thereby undermining its safe and stable operation. The traditional method for detecting false data injection attacks is bad data detection. It can detect only direct false data injection attacks, not hidden ones, and because it uses a single threshold its detection success rate is mediocre. Current machine-learning detection methods can detect hidden false data injection attacks, but they require the attack vector to be far larger than the noise and slightly larger than the process noise.

Disclosure of Invention

In view of the shortcomings of the prior art, the invention aims to provide a reinforcement-learning-based method for detecting false data injection attacks in a smart grid.

The aim of the invention is realized by the following technical scheme: a false data injection attack detection method based on reinforcement learning comprises the following steps:

step one: establishing a general linear model of the power grid:

$$x_t = A x_{t-1} + v_t \tag{1}$$

$$y_t = H x_t + w_t \tag{2}$$

where $x_t = [x_{1,t}, \ldots, x_{n,t}, \ldots, x_{N,t}]$ is the system state at time $t$, $x_{n,t}$ is the phase angle at the $n$-th node at time $t$, and $N$ is the total number of system states; the measurement at time $t$ is $y_t = [y_{1,t}, \ldots, y_{m,t}, \ldots, y_{M,t}]$, where $y_{m,t}$ is the reading of the $m$-th measuring instrument at time $t$ and $M$ is the total number of measuring instruments; $A \in \mathbb{R}^{N \times N}$ is the state transition matrix, $H \in \mathbb{R}^{M \times N}$ is the Jacobian matrix determined by the network topology, and $\mathbb{R}$ denotes the set of real numbers; $v_t \sim \mathcal{N}(0, \sigma_v^2 I_N)$ is the system (process) noise at time $t$, $\sigma_v^2$ is the variance of the process noise, whose value is determined by the system, and $I_N$ is the $N$-dimensional identity matrix; $w_t \sim \mathcal{N}(0, \sigma_w^2 I_M)$ is the measurement noise at time $t$, $\sigma_w^2$ is the variance of the measurement noise, whose value is determined by the measuring device, and $I_M$ is the $M$-dimensional identity matrix;

step two: obtaining samples by virtual attack: the attacked measurement is obtained using equation (3) for direct attacks and equation (4) for hidden attacks,

$$y_t = H x_t + w_t + a_t \, \mathbb{1}\{t \ge \tau\} \tag{3}$$

$$y_t = H x_t + w_t + H c_t \, \mathbb{1}\{t \ge \tau\} \tag{4}$$

where $a_t$ is the attack vector of a direct attack at time $t$ and $H c_t$ is the attack vector of a hidden attack; since $H$ does not change over time, $c_t$ is used to denote the hidden attack vector; $a_t$ and $c_t$ are known during sample training and unknown in actual detection; $\tau$ is the time at which the system comes under attack; and $\mathbb{1}\{t \ge \tau\}$ is the step function, equal to 1 when $t \ge \tau$ and 0 otherwise;

step three: obtaining the observations: the norm of the residual between the measurement $y_t$ and its estimate $\hat{y}_t$ is calculated for detecting direct false data injection attacks, and the norm of the difference between the current measurement $y_t$ and the previous measurement $y_{t-1}$ is calculated for detecting hidden false data injection attacks; threshold segmentation is applied to the two norms to obtain the instantaneous observation for direct false data injection attacks and the instantaneous observation for hidden false data injection attacks, respectively; a sliding window method then merges each instantaneous observation into the corresponding observation, giving the direct-attack observation and the hidden-attack observation for the current time;

step four: obtaining the detector action $a_t$ at time $t$ using an ε-greedy strategy: the system has two states, $s_n$ (the system is not attacked) and $s_a$ (the system is attacked), and the detector likewise has two actions, $a_s$ (the algorithm considers the system to be attacked and raises an alarm) and $a_c$ (the algorithm considers the system not to be attacked and raises no alarm); at time $t$ the direct-attack detection observation and the hidden-attack detection observation are obtained, and the ε-greedy strategy selects the detector actions from the Q-table $Q^n$ for direct false data injection attack detection and the Q-table $Q^s$ for hidden false data injection attack detection; under the ε-greedy strategy the detector selects the optimal action with probability $1 - \varepsilon$ and a random action with probability $\varepsilon$; $\varepsilon$ is updated once every $d$ steps according to formula (5)

$$\varepsilon = \max(\varepsilon - e^{-1}, \varepsilon_{\min}) \tag{5}$$

where $e$ is the number of samples used so far, including the current one, and $\varepsilon_{\min}$ is a manually set minimum value of $\varepsilon$;

step five: training with the Sarsa algorithm: the Q-table is updated using equation (6),

in which a parameter carrying the superscript $i$ is a parameter for detecting $i$-type attacks, with $i = n$ or $s$: when $i = n$ the parameter is used to detect direct attacks, and when $i = s$ it is used to detect hidden attacks; $Q^i$ is the Q-table required to detect $i$-type attacks; the equation further involves the observation for detecting $i$-type attacks at time $t$, the action that can be taken for $i$-type attacks once that observation has been obtained, the learning rate $\alpha$, the discount factor $\gamma^i$ used when training for $i$-type attacks, and the return received at time $t$ for the given $i$-type detection state and action, which is defined by equation (7),

where $r_0$ and $b$ are preset coefficients for the return of a leading (early) alarm and of a lagging (delayed) alarm, respectively, and the state in equation (7) is the system state for $i$-type detection at time $t$;

step six: repeating steps one to five until the maximum detection time $T$ of the current round is reached, or until $a_s$ appears in either the direct-attack action or the hidden-attack action;

step seven: repeating steps one to six until all $E$ samples have been used, which yields the complete $Q^n$ table and $Q^s$ table;

step eight: detection: the direct-attack and hidden-attack observations are obtained using steps one to three, and the corresponding actions are obtained from the $Q^n$ table and the $Q^s$ table respectively using equation (8); while both actions are $a_c$, this step is repeated; as soon as either action is $a_s$, detection stops and an alarm is raised; if the action obtained from the $Q^n$ table is $a_s$, the system is considered to be under a direct false data injection attack, and if the action obtained from the $Q^s$ table is $a_s$, the system is considered to be under a hidden false data injection attack.

Further, the third step is realized by the following substeps:

(3.1) setting the thresholds: a threshold for direct false data injection attacks and a threshold for hidden false data injection attacks are set according to the power grid structure;

(3.2) obtaining the measurement: the measurement $y_t$ at time $t$ is obtained from the measuring instruments, and the measurement $y_{t-1}$ at time $t-1$ is retrieved;

(3.3) estimating the measurement using Kalman filtering: the state estimate at time $t$ is obtained using the least squares algorithm given by equations (9) and (10), and the measurement estimate $\hat{y}_t$ at time $t$ is then calculated,

where the matrix appearing in these equations is the variance matrix of the measurement deviations;

(3.4) calculating the deviation norms: equations (12) and (13) are used to calculate, respectively, the norm of the deviation between the measurement and its estimate at time $t$, and the norm of the change in the measurement between time $t-1$ and time $t$;

(3.5) obtaining the instantaneous observations by threshold segmentation: from these two norms, the instantaneous observations for direct and hidden false data injection attacks are obtained according to equation (14);

because the threshold segmentation is the same in both cases, the superscript $i$ again stands for the superscripts $n$ and $s$; that is, $i$ in equation (14) may be either $n$ or $s$;

(3.6) obtaining the observations by the sliding window method: starting from the direct-attack observation and the hidden-attack observation at time $t-1$, the sliding window method appends the corresponding instantaneous observation at time $t$ to the observation at time $t-1$ and removes the oldest instantaneous observation, which gives the direct-attack observation and the hidden-attack observation at time $t$.

The beneficial effects of the method are as follows: detection of false data injection attacks is realized using the Sarsa algorithm; the accuracy and speed of detection are improved; the method performs well on hidden false data injection attacks; and detection of both direct and hidden false data injection attacks is conveniently achieved.

Drawings

Figure 1 is the IEEE 14-node diagram, from which the H matrix can be obtained,

figure 2 is a flow chart of Q-table training,

figure 3 is a flow chart of the detection process,

figure 4 shows the leading alarm rate,

figure 5 shows the lagging alarm rate,

figure 6 shows the total alarm failure rate,

figure 7 shows the leading alarm rate,

figure 8 shows the lagging alarm rate,

figure 9 shows the total alarm failure rate,

figure 10 shows the alarm class error rate,

figure 11 shows the immediate detection success rate for hidden false data injection attacks,

figure 12 shows the immediate detection success rate for direct false data injection attacks.

Detailed Description

In order to enhance the understanding and appreciation for the invention, the invention will be described in detail below with reference to the drawings and embodiments.

Example 1: referring to fig. 1-4, a smart grid false data injection attack detection method based on reinforcement learning includes the following steps:

step one: establishing a general linear model of the power grid:

$$x_t = A x_{t-1} + v_t \tag{1}$$

$$y_t = H x_t + w_t \tag{2}$$

where $x_t = [x_{1,t}, \ldots, x_{n,t}, \ldots, x_{N,t}]$ is the system state at time $t$, $x_{n,t}$ is the phase angle at the $n$-th node at time $t$, and $N$ is the total number of system states, here taken as 14; the measurement at time $t$ is $y_t = [y_{1,t}, \ldots, y_{m,t}, \ldots, y_{M,t}]$, where $y_{m,t}$ is the reading of the $m$-th measuring instrument at time $t$ and $M$ is the total number of measuring instruments, here taken as 23; $A \in \mathbb{R}^{N \times N}$ is the state transition matrix, here set to the identity matrix, $H \in \mathbb{R}^{M \times N}$ is the Jacobian matrix determined by the network topology, and $\mathbb{R}$ denotes the set of real numbers; $v_t \sim \mathcal{N}(0, \sigma_v^2 I_N)$ is the system (process) noise at time $t$, $\sigma_v^2$ is the variance of the process noise, here taken as $10^{-4}$, and $I_N$ is the $N$-dimensional identity matrix; $w_t \sim \mathcal{N}(0, \sigma_w^2 I_M)$ is the measurement noise at time $t$, $\sigma_w^2$ is the variance of the measurement noise, here taken as $2 \times 10^{-4}$, and $I_M$ is the $M$-dimensional identity matrix;
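The model of equations (1) and (2) can be simulated in a few lines. The following is a minimal sketch using the values of this example (N = 14, M = 23, A equal to the identity, noise variances 10^-4 and 2×10^-4). It assumes zero-mean Gaussian noise, and `make_placeholder_H` is a hypothetical stand-in: the real H comes from the IEEE 14-node topology of Figure 1, which is not reproduced here.

```python
import random

N, M = 14, 23                # number of states and of meters in this example
VAR_V, VAR_W = 1e-4, 2e-4    # process and measurement noise variances


def make_placeholder_H(rng):
    """Hypothetical stand-in for the M x N Jacobian matrix H.

    The real H is derived from the grid topology; random entries are
    used here purely so that the sketch can run.
    """
    return [[rng.uniform(-1.0, 1.0) for _ in range(N)] for _ in range(M)]


def step_state(x_prev, rng):
    """Equation (1) with A = I: x_t = x_{t-1} + v_t."""
    return [xi + rng.gauss(0.0, VAR_V ** 0.5) for xi in x_prev]


def measure(H, x, rng):
    """Equation (2): y_t = H x_t + w_t."""
    return [
        sum(h * xi for h, xi in zip(row, x)) + rng.gauss(0.0, VAR_W ** 0.5)
        for row in H
    ]


rng = random.Random(0)
H = make_placeholder_H(rng)
x = [0.0] * N
trajectory = []              # measurements y_1 ... y_5
for t in range(5):
    x = step_state(x, rng)
    trajectory.append(measure(H, x, rng))
```

Each entry of `trajectory` is one 23-dimensional measurement vector of an attack-free system.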

step two: obtaining samples by virtual attack: in equations (3) and (4), $a_t$ is the attack vector of a direct attack at time $t$ and $H c_t$ is the attack vector of a hidden attack; since $H$ does not change over time, $c_t$ is used to denote the hidden attack vector; $a_t$ and $c_t$ are known during sample training and unknown in actual detection; $\tau$ is the time at which the system comes under attack, with $10 < \tau < 200$; and the step function equals 1 when $t \ge \tau$ and 0 otherwise;
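As an illustration of the virtual attack, the sketch below adds an attack term to a clean measurement from time τ onward. It assumes, from the definitions above, that a direct attack adds $a_t$ and a hidden attack adds $H c_t$ to the measurement. All function names are hypothetical, and `draw_attack_vector` assumes that every component is drawn uniformly from an interval [lm, um], as in the experiments described later.

```python
import random


def step_indicator(t, tau):
    """Step function: 1 once the attack has started (t >= tau), else 0."""
    return 1 if t >= tau else 0


def direct_attack(y, a, t, tau):
    """Direct attack: add the attack vector a_t to the measurement."""
    k = step_indicator(t, tau)
    return [yi + k * ai for yi, ai in zip(y, a)]


def hidden_attack(y, H, c, t, tau):
    """Hidden attack: add H c_t to the measurement.

    H c_t lies in the column space of H, so it looks like a shift of the
    state by c_t and leaves a residual-based check unchanged.
    """
    k = step_indicator(t, tau)
    Hc = [sum(h * cj for h, cj in zip(row, c)) for row in H]
    return [yi + k * v for yi, v in zip(y, Hc)]


def draw_attack_vector(size, lm, um, rng):
    """Draw each component uniformly from [lm, um] (assumed sampling)."""
    return [rng.uniform(lm, um) for _ in range(size)]


# Tiny usage example with a 3-meter, 2-state toy system.
H_toy = [[1, 0], [0, 1], [1, 1]]
y_clean = [0.0, 0.0, 0.0]
before = direct_attack(y_clean, [0.5, 0.5, 0.5], t=3, tau=5)   # not yet attacked
after = hidden_attack(y_clean, H_toy, [0.1, 0.2], t=6, tau=5)  # attacked
```

Before τ the measurement is returned unchanged; from τ onward the attack term is added at every time step.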

step three: obtaining the observations: the norm of the residual between the measurement $y_t$ and its estimate $\hat{y}_t$ is calculated for detecting direct false data injection attacks, and the norm of the difference between the current measurement $y_t$ and the previous measurement $y_{t-1}$ is calculated for detecting hidden false data injection attacks; threshold segmentation is applied to the two norms to obtain the instantaneous observation for direct false data injection attacks and the instantaneous observation for hidden false data injection attacks, respectively; a sliding window method then merges each instantaneous observation into the corresponding observation, giving the direct-attack observation and the hidden-attack observation for the current time;

this step is the core of the present invention and is divided into the following sub-steps.

3.1 Setting the thresholds.

The threshold for direct false data injection attacks and the threshold for hidden false data injection attacks are set according to the power grid structure.

3.2 Obtaining the measurement.

The measurement $y_t$ at time $t$ is obtained from the measuring instruments, and the measurement $y_{t-1}$ at time $t-1$ is retrieved.

3.3 Estimating the measurement using Kalman filtering.

The state estimate at time $t$ is obtained using the least squares algorithm given by equations (9) and (10), and the measurement estimate $\hat{y}_t$ at time $t$ is then calculated. The matrix appearing in these equations is the variance matrix of the measurement deviations.

3.4 Calculating the deviation norms.

Equations (12) and (13) are used to calculate, respectively, the norm of the deviation between the measurement and its estimate at time $t$, and the norm of the change in the measurement between time $t-1$ and time $t$.

3.5 Obtaining the instantaneous observations by threshold segmentation.

From these two norms, the instantaneous observations for direct and hidden false data injection attacks are obtained according to equation (14).

3.6 Obtaining the observations by the sliding window method.

Starting from the direct-attack observation and the hidden-attack observation at time $t-1$, the sliding window method appends the corresponding instantaneous observation at time $t$ to the observation at time $t-1$ and removes the oldest instantaneous observation, which gives the direct-attack observation and the hidden-attack observation at time $t$.
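Sub-steps 3.4 to 3.6 can be sketched as follows. This is an illustration under stated assumptions rather than the patented formulas: equations (12) to (14) are not reproduced in this text, so the Euclidean norm, the interval-index form of the threshold segmentation, the window length and the threshold values are all hypothetical choices. The estimate `y_hat` from sub-step 3.3 is taken as given.

```python
import math
from collections import deque


def norm(v):
    """Euclidean norm (assumed; the norm used in (12)-(13) is not shown)."""
    return math.sqrt(sum(x * x for x in v))


def residual_norm(y, y_hat):
    """Deviation between measurement and estimate (direct attacks)."""
    return norm([a - b for a, b in zip(y, y_hat)])


def difference_norm(y, y_prev):
    """Change between successive measurements (hidden attacks)."""
    return norm([a - b for a, b in zip(y, y_prev)])


def segment(value, thresholds):
    """Threshold segmentation: index of the interval containing `value`.

    With one threshold this is a 0/1 flag; with several it is a level.
    """
    return sum(1 for th in thresholds if value >= th)


class SlidingObservation:
    """Fixed-length window of the most recent instantaneous observations."""

    def __init__(self, window, fill=0):
        self.buf = deque([fill] * window, maxlen=window)

    def push(self, instant):
        # Appending to a full deque drops the oldest entry automatically.
        self.buf.append(instant)
        return tuple(self.buf)   # hashable, so usable as a Q-table key


# Usage with hypothetical thresholds and a window of length 3.
obs_direct = SlidingObservation(window=3)
level = segment(residual_norm([3.0, 4.0], [0.0, 0.0]), thresholds=[1.0, 10.0])
current = obs_direct.push(level)
```

Returning the window as a tuple keeps the observation discrete and hashable, which is what a tabular method such as a Q-table needs.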

Step four: obtaining t-moment detector action a using epsilon greedy strategy _t : dividing the system into two states, namely that the sn system is not attacked and s _a The system is attacked and the detector action is also divided into two states a _s Alerting the algorithm to consider the system to be attacked, a _c The algorithm is shown to consider that the system is not attacked and does not give an alarm, and a direct attack detection observed value is obtained at the time tObservation value for detection of hidden attack +.>Q-table Q based on direct spurious data injection attack detection using greedy strategy ⁿ Q-table Q for attack detection with hidden false data injection ^s Selecting detector action, epsilon greedy strategy, i.e. the detector selects optimal action with probability 1-epsilon, randomly selects action with probability epsilon, and updates epsilon once every d steps, let d=40, and the update formula is shown in formula (5)

ε＝max(ε-e ^-1 ，ε _min ) (5)

Where e is the sample value, ε, that is the current and has been used _min =0.01 is a minimum epsilon value set by human, and epsilon initial value is set to 0.2;
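The ε-greedy selection and the decay of formula (5) can be sketched as below. Two points are assumptions rather than statements of the source: the return of equation (7) is treated as a penalty, so the "optimal" action is the one with the smallest Q-value (use `max` instead if the return is a reward), and the Q-table is represented as a plain dictionary keyed by observation. The function names are hypothetical.

```python
import random

ACTIONS = ("a_c", "a_s")   # continue without alarm / stop and raise an alarm


def decay_epsilon(eps, e, eps_min=0.01):
    """Formula (5): eps = max(eps - 1/e, eps_min)."""
    return max(eps - 1.0 / e, eps_min)


def choose_action(q_table, obs, eps, rng):
    """Epsilon-greedy choice between the two detector actions."""
    if rng.random() < eps:
        return rng.choice(ACTIONS)            # explore with probability eps
    values = q_table.get(obs, {a: 0.0 for a in ACTIONS})
    # Exploit: smallest expected penalty (assumption, see lead-in).
    return min(ACTIONS, key=lambda a: values[a])


# Usage with the values of this example: initial eps 0.2, eps_min 0.01.
rng = random.Random(0)
q_n = {(0, 1): {"a_s": 0.5, "a_c": 0.1}}      # hypothetical table entry
eps = decay_epsilon(0.2, e=40)
action = choose_action(q_n, (0, 1), eps=0.0, rng=rng)
```

With `eps=0.0` the choice is purely greedy, which is also how a trained table would be used at detection time.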

step five: training with the Sarsa algorithm: the Q-table is updated using equation (6), in which a parameter carrying the superscript $i$ is a parameter for detecting $i$-type attacks, with $i = n$ or $s$: when $i = n$ the parameter is used to detect direct attacks, and when $i = s$ it is used to detect hidden attacks; $Q^i$ is the Q-table required to detect $i$-type attacks; the equation further involves the observation for detecting $i$-type attacks at time $t$, the action that can be taken for $i$-type attacks once that observation has been obtained, the learning rate $\alpha$, here set to 0.1, the discount factor $\gamma^i$ used when training for $i$-type attacks, here set to 1 for both direct and hidden false data injection attacks, and the return received at time $t$ for the given $i$-type detection state and action, which is defined by equation (7),

where $r_0$ and $b$ are preset coefficients for the return of a leading (early) alarm and of a lagging (delayed) alarm, respectively, here set to $r_0 = 1$ and $b = 0.01$, and the state in equation (7) is the system state for $i$-type detection at time $t$;
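For orientation, the sketch below shows the standard Sarsa update with the parameter values of this example (α = 0.1, γ = 1, r₀ = 1, b = 0.01). Equations (6) and (7) are not reproduced in this text, so two things are assumptions: that equation (6) is the usual Sarsa rule Q(o, a) ← Q(o, a) + α[r + γQ(o′, a′) − Q(o, a)], and that the return charges r₀ for an alarm raised before the attack and b for each step without an alarm after it. `penalty` and the other names are hypothetical.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 1.0    # learning rate and discount factor of this example
R0, B = 1.0, 0.01          # leading-alarm and lagging-alarm coefficients


def make_q_table():
    """Q-table keyed by observation, one value per detector action."""
    return defaultdict(lambda: {"a_s": 0.0, "a_c": 0.0})


def penalty(state, action):
    """Assumed form of the return: charge early alarms and delays."""
    if state == "s_n" and action == "a_s":
        return R0          # alarm although the system is not attacked
    if state == "s_a" and action == "a_c":
        return B           # no alarm although the system is attacked
    return 0.0


def sarsa_update(q, obs, act, r, next_obs, next_act, terminal=False):
    """Standard Sarsa update; returns the new Q-value."""
    # After an alarm the round ends, so there is no successor term.
    target = r if terminal else r + GAMMA * q[next_obs][next_act]
    q[obs][act] += ALPHA * (target - q[obs][act])
    return q[obs][act]


# Usage: one delayed step, then an early alarm in another round.
q_s = make_q_table()
v1 = sarsa_update(q_s, (0, 0), "a_c", penalty("s_a", "a_c"), (0, 1), "a_c")
v2 = sarsa_update(q_s, (0, 1), "a_s", penalty("s_n", "a_s"), None, None,
                  terminal=True)
```

One such table would be trained for each attack type, giving the two tables used in the detection step.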

step six: repeating steps one to five until the maximum detection time $T = 300$ of the current round is reached, or until $a_s$ appears in either the direct-attack action or the hidden-attack action;

step seven: repeating steps one to six until all $E = 40000$ samples have been used, which yields the complete $Q^n$ table and $Q^s$ table;

step eight: detection: the direct-attack and hidden-attack observations are obtained using steps one to three, and the corresponding actions are obtained from the $Q^n$ table and the $Q^s$ table respectively using equation (8); while both actions are $a_c$, this step is repeated; as soon as either action is $a_s$, detection stops and an alarm is raised; if the action obtained from the $Q^n$ table is $a_s$, the system is considered to be under a direct false data injection attack, and if the action obtained from the $Q^s$ table is $a_s$, the system is considered to be under a hidden false data injection attack.
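The detection loop of step eight can be sketched as follows. Equation (8) is not reproduced in this text, so the greedy rule used here is an assumption: the action with the smallest Q-value is chosen, which treats the return as a penalty, and an unseen observation defaults to no alarm. The names `greedy_action` and `detect` are hypothetical.

```python
def greedy_action(q_table, obs):
    """Pick the action with the smallest Q-value (assumed rule)."""
    values = q_table.get(obs, {"a_s": 0.0, "a_c": 0.0})
    # On a tie, min() returns the first candidate, i.e. no alarm.
    return min(("a_c", "a_s"), key=lambda a: values[a])


def detect(observations, q_n, q_s):
    """Run both detectors over a stream of (direct, hidden) observations.

    Returns (time index, attack types) at the first alarm, or None if
    neither table ever selects the alarm action.
    """
    for t, (obs_n, obs_s) in enumerate(observations):
        kinds = []
        if greedy_action(q_n, obs_n) == "a_s":
            kinds.append("direct")
        if greedy_action(q_s, obs_s) == "a_s":
            kinds.append("hidden")
        if kinds:
            return t, tuple(kinds)
    return None


# Usage with hypothetical table entries.
q_n = {(1, 1): {"a_s": 0.0, "a_c": 0.5}}
q_s = {}
stream = [((0, 0), (0, 0)), ((0, 1), (0, 0)), ((1, 1), (0, 0))]
result = detect(stream, q_n, q_s)
```

The source does not say what happens if both tables raise an alarm at the same time step; this sketch simply reports both types.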

The results are shown in the accompanying drawings. FIG. 4 shows the leading alarm rate, where $a_t$ and $c_t$ are drawn in turn from uniform distributions on [0, 0.075], [0.075, 0.15], [0.1, 0.175], [0.15, 0.225] and [0.175, 0.25]. FAR denotes the leading alarm rate, i.e. the frequency with which an alarm is raised when no attack has occurred, calculated as the number of leading alarms divided by the total number of detections. "direct attack test" denotes the result of detecting direct false data injection attacks with the method, "stealth attack test" denotes the result of detecting hidden false data injection attacks with the method, and BDD denotes the result of the traditional bad data detection method, which can detect only direct false data injection attacks (its detection threshold is set to 0.006). lm is the lower limit and um the upper limit of the interval on which $a_t$ and $c_t$ are distributed.

FIG. 5 shows the lagging alarm rate for the same five intervals as FIG. 4. DAR denotes the lagging alarm rate, i.e. the frequency with which no alarm has been raised after more than 10 detections following the attack, calculated as the number of lagging alarms divided by the total number of detections. FIG. 6 shows the total alarm failure rate for the same five intervals as FIG. 4. TFR denotes the total alarm failure rate, i.e. the overall frequency of detection failures, covering FAR, DAR and CER, calculated as the total number of failures divided by the total number of detections.

FIG. 7 shows the leading alarm rate, FIG. 8 the lagging alarm rate and FIG. 9 the total alarm failure rate, in each case with $a_t$ and $c_t$ drawn in turn from uniform distributions on [0, 0.15], [0.01, 0.25], [0.15, 0.3], [0.2, 0.35] and [0.25, 0.4].

FIG. 10 shows the alarm class error rate, where um − lm = 0.075 denotes $a_t$ and $c_t$ distributed as in FIG. 4, um − lm = 0.15 denotes $a_t$ and $c_t$ distributed as in FIG. 7, and um − lm = 0.05 denotes $a_t$ and $c_t$ drawn in turn from uniform distributions on [0.03, 0.08], [0.05, 0.1], [0.1, 0.15], [0.15, 0.2] and [0.2, 0.25]. CER denotes the alarm class error rate, i.e. the frequency with which a direct false data injection attack is reported as a hidden one or vice versa, calculated as the number of class detection errors divided by the total number of detections.

FIG. 11 shows the immediate detection success rate for hidden false data injection attacks, with the same three distributions as FIG. 10. SDR denotes the immediate detection success rate, i.e. the frequency with which an alarm is raised immediately after the attack, calculated as the number of immediate alarms divided by the total number of detections. FIG. 12 shows the immediate detection success rate for direct false data injection attacks, with the same three distributions as FIG. 10.
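The rates plotted in FIGS. 4 to 12 can be computed from per-round outcomes along the following lines. This is a sketch of one possible reading of the definitions above, not the evaluation code behind the figures: each round is assumed to be summarized by the attack time τ, the alarm time (or `None` if no alarm was raised), the true attack type and the reported type; "immediate" is assumed to mean an alarm at time τ itself, counted regardless of the reported type; and every name is hypothetical.

```python
from collections import Counter

DELAY_LIMIT = 10   # an alarm later than this after the attack counts as lagging


def classify(tau, alarm_time, true_kind, reported_kind):
    """Label one round as 'early', 'delayed', 'class_error' or 'ok'."""
    if alarm_time is not None and alarm_time < tau:
        return "early"                          # alarm before any attack
    if alarm_time is None or alarm_time - tau > DELAY_LIMIT:
        return "delayed"                        # too late, or never
    if reported_kind != true_kind:
        return "class_error"                    # direct/hidden mixed up
    return "ok"


def rates(rounds):
    """Return FAR, DAR, CER, TFR and SDR for a non-empty list of rounds."""
    counts = Counter(classify(*r) for r in rounds)
    total = len(rounds)
    failures = counts["early"] + counts["delayed"] + counts["class_error"]
    immediate = sum(1 for tau, alarm, _, _ in rounds if alarm == tau)
    return {
        "FAR": counts["early"] / total,
        "DAR": counts["delayed"] / total,
        "CER": counts["class_error"] / total,
        "TFR": failures / total,
        "SDR": immediate / total,
    }


# Usage with four made-up rounds (tau, alarm time, true type, reported type).
rounds = [
    (50, 40, "direct", "direct"),   # early alarm
    (50, 50, "direct", "direct"),   # immediate, correct
    (50, 70, "hidden", "hidden"),   # 20 steps late
    (50, 52, "hidden", "direct"),   # in time, wrong type
]
summary = rates(rounds)
```

Here TFR is the sum of the three failure counts divided by the number of rounds, which matches the description of TFR as covering FAR, DAR and CER.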

It should be noted that the above-mentioned embodiments are not intended to limit the scope of the present invention, and equivalent changes or substitutions made on the basis of the above-mentioned technical solutions fall within the scope of the present invention as defined in the claims.

Claims

1. The smart grid false data injection attack detection method based on reinforcement learning is characterized by comprising the following steps of:

step one: establishing a general linear model of the power grid,

step two: obtaining samples by virtual attack,

step three: obtaining the observations,

step four: obtaining the detector action $a_t$ at time $t$ using an ε-greedy strategy,

step five: training with the Sarsa algorithm,

step eight: detection, i.e. determining whether the system is under a direct false data injection attack;

step one: establishing a general linear model of the power grid:

$$x_t = A x_{t-1} + v_t \tag{1}$$

$$y_t = H x_t + w_t \tag{2}$$

where $x_t = [x_{1,t}, \ldots, x_{n,t}, \ldots, x_{N,t}]$ is the system state at time $t$, $x_{n,t}$ is the phase angle at the $n$-th node at time $t$, and $N$ is the total number of system states; the measurement at time $t$ is $y_t = [y_{1,t}, \ldots, y_{m,t}, \ldots, y_{M,t}]$, where $y_{m,t}$ is the reading of the $m$-th measuring instrument at time $t$ and $M$ is the total number of measuring instruments; $A \in \mathbb{R}^{N \times N}$ is the state transition matrix, $H \in \mathbb{R}^{M \times N}$ is the Jacobian matrix determined by the network topology, and $\mathbb{R}$ denotes the set of real numbers; $v_t \sim \mathcal{N}(0, \sigma_v^2 I_N)$ is the system (process) noise at time $t$, $\sigma_v^2$ is the variance of the process noise, whose value is determined by the system, and $I_N$ is the $N$-dimensional identity matrix; $w_t \sim \mathcal{N}(0, \sigma_w^2 I_M)$ is the measurement noise at time $t$, $\sigma_w^2$ is the variance of the measurement noise, whose value is determined by the measuring device, and $I_M$ is the $M$-dimensional identity matrix;

wherein $a_t$ is the attack vector of a direct attack at time $t$ and $H c_t$ is the attack vector of a hidden attack; since $H$ does not change over time, $c_t$ is used to denote the hidden attack vector; $a_t$ and $c_t$ are known during sample training and unknown in actual detection; $\tau$ is the time at which the system comes under attack; and the step function equals 1 when $t \ge \tau$ and 0 otherwise;

$$\varepsilon = \max(\varepsilon - e^{-1}, \varepsilon_{\min}) \tag{5}$$

step five: training with the Sarsa algorithm, the Q-table being updated using equation (6),

in which a parameter carrying the superscript $i$ is a parameter for detecting $i$-type attacks, with $i = n$ or $s$: when $i = n$ the parameter is used to detect direct attacks, and when $i = s$ it is used to detect hidden attacks; $Q^i$ is the Q-table required to detect $i$-type attacks; the equation further involves the observation for detecting $i$-type attacks at time $t$, the action that can be taken for $i$-type attacks once that observation has been obtained, the learning rate $\alpha$, the discount factor $\gamma^i$ used when training for $i$-type attacks, and the return received at time $t$ for the given $i$-type detection state and action, which is defined by equation (7),

where $r_0$ and $b$ are preset coefficients for the return of a leading (early) alarm and of a lagging (delayed) alarm, respectively, and the state in equation (7) is the system state for $i$-type detection at time $t$.

2. The smart grid false data injection attack detection method based on reinforcement learning according to claim 1, wherein in step six, steps one to five are repeated until the maximum detection time $T$ of the current round is reached, or until $a_s$ appears in either the direct-attack action or the hidden-attack action.

3. The smart grid false data injection attack detection method based on reinforcement learning according to claim 1, wherein in step seven, steps one to six are repeated until all $E$ samples have been used, which yields the complete $Q^n$ table and $Q^s$ table.

4. The smart grid false data injection attack detection method based on reinforcement learning according to claim 1, wherein in step eight, during detection, the direct-attack and hidden-attack observations are obtained using steps one to three, and the corresponding actions are obtained from the $Q^n$ table and the $Q^s$ table respectively using equation (8); while both actions are $a_c$, this step is repeated; as soon as either action is $a_s$, detection stops and an alarm is raised; if the action obtained from the $Q^n$ table is $a_s$, the system is considered to be under a direct false data injection attack, and if the action obtained from the $Q^s$ table is $a_s$, the system is considered to be under a hidden false data injection attack.

5. The smart grid false data injection attack detection method based on reinforcement learning according to claim 1, wherein step three is implemented by the following substeps:

(3.1) setting the thresholds: a threshold for direct false data injection attacks and a threshold for hidden false data injection attacks are set according to the power grid structure;

in which the variance matrix of the measurement deviations is used;

(3.6) obtaining the observations by the sliding window method: starting from the direct-attack observation and the hidden-attack observation at time $t-1$, the sliding window method appends the corresponding instantaneous observation at time $t$ to the observation at time $t-1$ and removes the oldest instantaneous observation, which gives the direct-attack observation and the hidden-attack observation at time $t$.