CN113268730A

CN113268730A - Smart grid false data injection attack detection method based on reinforcement learning

Info

Publication number: CN113268730A
Application number: CN202110486653.5A
Authority: CN
Inventors: 吴争光; 张阔
Original assignee: Qunzhi Future Artificial Intelligence Technology Research Institute Wuxi Co ltd
Current assignee: Qunzhi Future Artificial Intelligence Technology Research Institute Wuxi Co ltd
Priority date: 2021-05-01
Filing date: 2021-05-01
Publication date: 2021-08-17
Anticipated expiration: 2041-05-01
Also published as: CN113268730B

Abstract

The invention discloses a smart grid false data injection attack detection method based on reinforcement learning. The method detects false data injection attacks using the Sarsa algorithm and divides the attacks into direct false data injection attacks and concealed false data injection attacks, which are detected separately. To construct the observations, direct false data injection attack detection combines the residual method with threshold segmentation, and concealed false data injection attack detection combines the norm of the difference between successive measurements with threshold segmentation. A Q table is trained from each set of observations, and the Q tables are then used to detect attacks. The design detects concealed attacks rapidly, increases detection speed and success rate, is simple to implement, and can markedly improve detection efficiency.

Description

Smart grid false data injection attack detection method based on reinforcement learning

Technical Field

The invention relates to the field of smart power grids, in particular to a smart power grid false data injection attack detection method based on reinforcement learning.

Background

The smart grid is a new power grid technology that separates the information transmission channel from the power transmission channel, giving the grid more efficient allocation of power resources and stronger resistance to interference. As information technology develops, the information security of the smart grid has become a major concern. False data injection attacks occupy an important place in research on smart grid information attacks. Their core idea is to exploit loopholes in traditional detection methods and use constructed attack vectors to corrupt the state estimation of the grid, thereby undermining the safe and stable operation of the power system. The traditional method for detecting false data injection attacks is bad data detection. It can detect only direct false data injection attacks, not concealed ones, and because it uses a single threshold its detection success rate is mediocre. Current machine-learning detection methods can detect concealed false data injection attacks, but a method that handles both attack vectors far larger than the noise and attack vectors only slightly larger than the process noise is lacking.

Disclosure of Invention

The invention aims to provide a smart grid false data injection attack detection method based on reinforcement learning aiming at the defects of the prior art.

The purpose of the invention is realized by the following technical scheme: a false data injection attack detection method based on reinforcement learning comprises the following steps:

step one: establishing a general linear model of the power grid:

$x_t = A x_{t-1} + v_t$ (1)

$y_t = H x_t + w_t$ (2)

where $x_t = [x_{1,t}, \ldots, x_{n,t}, \ldots, x_{N,t}]$ is the system state at time t, $x_{n,t}$ is the phase angle of the n-th node at time t, and N is the total number of system states; the measurement at time t is denoted $y_t = [y_{1,t}, \ldots, y_{m,t}, \ldots, y_{M,t}]$, where $y_{m,t}$ is the value measured by the m-th measuring instrument at time t and M is the total number of measuring instruments;

$A \in \mathbb{R}^{N \times N}$ is the state transition matrix, $H \in \mathbb{R}^{M \times N}$ is the Jacobian matrix determined by the grid topology, and $\mathbb{R}$ denotes the set of real numbers;

$v_t \sim \mathcal{N}(0, \sigma_v^2 I_N)$ is the system noise at time t, where $\sigma_v^2$ is the variance of the process noise, whose value is determined by the system, and $I_N$ is the N-dimensional identity matrix;

$w_t \sim \mathcal{N}(0, \sigma_w^2 I_M)$ is the measurement noise at time t, where $\sigma_w^2$ is the variance of the measurement noise, whose value is determined by the measuring device, and $I_M$ is the M-dimensional identity matrix;
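The linear model of step one can be illustrated with a minimal Python sketch. It assumes zero-mean Gaussian noise and uses a toy system with two states and three meters; the helper names `mat_vec` and `simulate`, the toy matrices, and the seed are hypothetical and are not taken from the patent.

```python
import random

def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def simulate(A, H, x0, steps, var_v, var_w, seed=0):
    """Simulate x_t = A x_{t-1} + v_t and y_t = H x_t + w_t (Gaussian noise assumed)."""
    rng = random.Random(seed)
    x, states, measurements = list(x0), [], []
    for _ in range(steps):
        # State transition plus process noise with variance var_v.
        x = [xi + rng.gauss(0.0, var_v ** 0.5) for xi in mat_vec(A, x)]
        # Measurement plus measurement noise with variance var_w.
        y = [yi + rng.gauss(0.0, var_w ** 0.5) for yi in mat_vec(H, x)]
        states.append(x)
        measurements.append(y)
    return states, measurements

# Toy system (hypothetical): N = 2 states, M = 3 meters.
A = [[1.0, 0.0], [0.0, 1.0]]
H = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
states, measurements = simulate(A, H, x0=[0.1, 0.2], steps=5, var_v=1e-4, var_w=2e-4)
```

Each entry of `measurements` is one measurement vector with M components, which is the input expected by the later steps.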

step two: obtaining samples by simulated attacks: for a direct attack the attacked measurement is obtained using equation (3), and for a concealed attack it is obtained using equation (4),

$y_t = H x_t + a_t \mathbb{1}\{t \ge \tau\} + w_t$ (3)

$y_t = H x_t + H c_t \mathbb{1}\{t \ge \tau\} + w_t$ (4)

where $a_t$ is the attack vector of a direct attack at time t and $H c_t$ is the attack vector of a concealed attack; because H does not change over time, $c_t$ is used to denote the concealed attack vector; $a_t$ and $c_t$ are known during sample training and unknown during actual detection; $\tau$ is the time at which the system is attacked; and $\mathbb{1}\{t \ge \tau\}$ is the step function, equal to 1 when $t \ge \tau$ and 0 otherwise;
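The two attack types of step two can be sketched as follows, assuming the attack term is simply added to the measurement from time τ onward: the vector a_t for a direct attack and H c_t for a concealed attack. The function names and the numeric values are hypothetical.

```python
def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def inject_direct(y, a, t, tau):
    """Direct attack: add a to the measurement once t >= tau."""
    if t < tau:
        return list(y)
    return [yi + ai for yi, ai in zip(y, a)]

def inject_concealed(y, H, c, t, tau):
    """Concealed attack: add H c to the measurement once t >= tau."""
    if t < tau:
        return list(y)
    return [yi + di for yi, di in zip(y, mat_vec(H, c))]

# Hypothetical toy values: M = 3 meters, N = 2 states, attack starts at tau = 5.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
y = [0.1, 0.2, -0.1]
before = inject_direct(y, a=[0.05, 0.05, 0.05], t=3, tau=5)
after = inject_concealed(y, H, c=[0.05, 0.0], t=5, tau=5)
```

Because H c lies in the column space of H, a concealed attack shifts the state estimate together with the measurement, which is why the residual alone is not used to detect it in step three.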

step three: obtaining the observations: the norm of the residual between the measurement $y_t$ and its estimate $\hat{y}_t$ is used to detect direct false data injection attacks, and the norm of the difference between the current measurement $y_t$ and the previous measurement $y_{t-1}$ is used to detect concealed false data injection attacks; a threshold segmentation method grades the two norm values to give the instantaneous observation for direct false data injection attacks and the instantaneous observation for concealed false data injection attacks; a sliding window method then updates the observations with the two instantaneous observations, giving the direct false data injection attack observation and the concealed false data injection attack observation for the corresponding time;
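The two quantities used in step three can be computed as in the short sketch below. It uses the squared norm, as in sub-step (3.4); the function names are hypothetical.

```python
def squared_norm(v):
    """Return the squared Euclidean norm of a vector."""
    return sum(x * x for x in v)

def residual_norm_sq(y, y_hat):
    """Squared norm of measurement minus estimate (direct attack indicator)."""
    return squared_norm([a - b for a, b in zip(y, y_hat)])

def difference_norm_sq(y, y_prev):
    """Squared norm of the change between consecutive measurements (concealed attack indicator)."""
    return squared_norm([a - b for a, b in zip(y, y_prev)])
```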

step four: deriving the detector action $a_t$ at time t using an ε-greedy strategy: the system has two states, $s_n$ (the system is not under attack) and $s_a$ (the system is under attack), and the detector likewise has two actions: $a_s$, in which the algorithm considers the system to be under attack and issues an alarm, and $a_c$, in which the algorithm considers the system not to be under attack and issues no alarm; at time t the direct attack detection observation and the concealed attack detection observation are obtained, and the ε-greedy strategy selects the detector actions based on the Q table $Q^n$ for direct false data injection attack detection and the Q table $Q^s$ for concealed false data injection attack detection; under the ε-greedy strategy the detector selects the optimal action with probability 1-ε and a random action with probability ε; ε is updated once every d steps according to equation (5),

$\varepsilon = \max(\varepsilon - e^{-1}, \varepsilon_{\min})$ (5)

where e is the number of samples used so far and $\varepsilon_{\min}$ is a manually set minimum value of ε;
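The ε-greedy selection and the update of equation (5) can be sketched as below. The Q table is assumed to be a dictionary mapping an observation to a dictionary of action values, and $e^{-1}$ is read as 1/e with e the number of samples used so far; both are assumptions, as are the function names.

```python
import random

ACTIONS = ("a_c", "a_s")  # a_c: continue without alarm, a_s: stop and raise an alarm

def epsilon_greedy(q_table, observation, epsilon, rng):
    """Pick the best-valued action with probability 1 - epsilon, else a random one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    values = q_table.get(observation, {})
    # Unseen entries default to 0.0; ties resolve to the first action, a_c.
    return max(ACTIONS, key=lambda a: values.get(a, 0.0))

def decay_epsilon(epsilon, e, epsilon_min):
    """Equation (5), reading e^-1 as 1/e with e the number of samples used so far."""
    return max(epsilon - 1.0 / e, epsilon_min)

rng = random.Random(0)
q_direct = {(0, 0): {"a_c": 1.0, "a_s": 0.0}}  # hypothetical table entry
action = epsilon_greedy(q_direct, (0, 0), epsilon=0.0, rng=rng)
```

With ε = 0 the choice is purely greedy, which is how the trained tables are used during detection.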

step five: training using the Sarsa algorithm, with the Q table updated using equation (6). In equation (6), a parameter carrying the superscript i is a parameter for detecting a type-i attack, where i = n or s: i = n indicates that the parameter is for detecting a direct attack, and i = s indicates that it is for detecting a concealed attack. $Q^i$ is the Q table required to detect the type-i attack. The other quantities in equation (6) are the observation at time t used to detect type-i attacks, the action taken for the type-i attack after that observation is obtained at time t, the learning rate α, the discount factor $\gamma^i$ used in training against type-i attacks, and the reward obtained at time t for a given state and action in type-i detection, whose value is given by equation (7). In equation (7), $r_0$ and b are the preset early alarm reward coefficient and late alarm reward coefficient, and the state is the system state for type-i detection at time t;
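For reference, the textbook Sarsa update is $Q(o, a) \leftarrow Q(o, a) + \alpha\,[r + \gamma\,Q(o', a') - Q(o, a)]$. The sketch below applies that standard form to a dictionary-based Q table; it is assumed, not confirmed, that equation (6) has this form, and the reward of equation (7) is taken as an input because its exact definition is not reproduced here. The function name is hypothetical.

```python
def sarsa_update(q_table, obs, action, reward, next_obs, next_action, alpha, gamma):
    """Apply Q(o,a) <- Q(o,a) + alpha * (r + gamma * Q(o',a') - Q(o,a)) in place."""
    current = q_table.setdefault(obs, {}).get(action, 0.0)
    following = q_table.get(next_obs, {}).get(next_action, 0.0)
    updated = current + alpha * (reward + gamma * following - current)
    q_table[obs][action] = updated
    return updated

# Hypothetical example: one update with alpha = 0.1 and gamma = 1.
q = {"o1": {"a_c": 0.5}}
new_value = sarsa_update(q, "o0", "a_c", 1.0, "o1", "a_c", alpha=0.1, gamma=1.0)
```

Sarsa is on-policy: the next action used in the update is the one actually chosen by the ε-greedy strategy of step four, not the maximising action.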

step six: repeating steps one to five until the maximum detection time T of the episode is reached, or until the alarm action $a_s$ occurs in either of the two detector actions;

step seven: repeating steps one to six until all E samples have been used, giving the complete $Q^d$ table and $Q^s$ table;

step eight: during detection, the direct and concealed false data injection attack observations are obtained using steps one to three, and equation (8) is used to obtain the corresponding actions from the $Q^d$ table and the $Q^s$ table respectively; while both actions are $a_c$, these steps are repeated until one of the two actions is $a_s$, at which point detection stops and an alarm is raised; when the action obtained from the $Q^d$ table is $a_s$, the system is considered to be under a direct false data injection attack, and when the action obtained from the $Q^s$ table is $a_s$, the system is considered to be under a concealed false data injection attack.
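The detection procedure of step eight can be sketched as a loop over observation pairs. It is assumed here that equation (8) selects the action with the largest Q value; the table layout, function names and example values are hypothetical.

```python
def greedy_action(q_table, observation):
    """Return the action with the largest Q value (assumed form of equation (8))."""
    values = q_table.get(observation, {})
    return max(("a_c", "a_s"), key=lambda a: values.get(a, 0.0))

def detect(observations, q_direct, q_concealed):
    """Scan (direct_obs, concealed_obs) pairs; return (alarm_time, attack_types)."""
    for t, (obs_direct, obs_concealed) in enumerate(observations):
        kinds = []
        if greedy_action(q_direct, obs_direct) == "a_s":
            kinds.append("direct")
        if greedy_action(q_concealed, obs_concealed) == "a_s":
            kinds.append("concealed")
        if kinds:
            return t, kinds  # stop at the first alarm
    return None, []  # no alarm within the available observations

# Hypothetical trained tables: observation 1 triggers an alarm in either table.
q_direct = {1: {"a_s": 1.0}}
q_concealed = {1: {"a_s": 1.0}}
alarm_time, attack_types = detect([(0, 0), (0, 1)], q_direct, q_concealed)
```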

Further, step three is realized by the following sub-steps:

(3.1) setting the thresholds: the direct false data injection attack threshold and the concealed false data injection attack threshold are set according to the grid structure;

(3.2) obtaining the measurements: the measurement $y_t$ at time t is obtained from each measuring instrument, and the measurement $y_{t-1}$ from time t-1 is retrieved;

(3.3) estimating the measurement using Kalman filtering: the state estimate at time t is obtained using the least squares algorithm given by equations (9) and (10), and the estimate of the measurement at time t is then calculated using equation (11); the matrix appearing in these equations is the covariance matrix of the measurement deviation;

(3.4) calculating the squared deviation norms: equations (12) and (13) are used, respectively, to calculate the squared norm of the deviation between the measurement and its estimate at time t, and the squared norm of the change in the measurement between time t-1 and time t;

(3.5) obtaining the instantaneous observations using threshold segmentation: from these two squared norms, the instantaneous observations for the direct false data injection attack and the concealed false data injection attack are obtained according to equation (14); because the threshold segmentation is the same for both, the superscript i again stands for either superscript n or superscript s in equation (14);

(3.6) obtaining the observations using a sliding window: taking the direct and concealed false data injection attack observations at time t-1, the corresponding instantaneous observation at time t is appended to each and the oldest instantaneous observation is removed, which gives the direct false data injection attack observation and the concealed false data injection attack observation at time t.
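Sub-steps (3.5) and (3.6) can be sketched as below. The number of threshold levels, the threshold values and the window length are hypothetical, since equation (14) and the chosen thresholds are not reproduced here; the sketch only shows the general technique of grading a value against ascending thresholds and keeping a fixed-length window.

```python
from bisect import bisect_right
from collections import deque

def instant_observation(norm_sq, thresholds):
    """Return the index of the threshold band that norm_sq falls into (0 = smallest)."""
    return bisect_right(thresholds, norm_sq)

def slide(window, instant):
    """Append the newest instantaneous observation; deque(maxlen=...) drops the oldest."""
    window.append(instant)
    return tuple(window)

# Hypothetical settings: two thresholds (three levels) and a window of length 3.
thresholds = [0.01, 0.05]
window = deque([0, 0, 0], maxlen=3)
observation = slide(window, instant_observation(0.03, thresholds))
```

Returning the window as a tuple makes the observation hashable, so it can be used directly as a key of a dictionary-based Q table.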

The advantages of the invention are that false data injection attacks are detected using the Sarsa algorithm, which improves detection accuracy and detection speed; concealed false data injection attacks are also detected effectively; and direct and concealed false data injection attacks can both be detected conveniently.

Drawings

Figure 1 is a diagram of the IEEE 14-node system, from which the H matrix can be obtained,

figure 2 is a flow chart of Q table training,

figure 3 is a flow chart of the detection,

figure 4 shows the early alarm rate (FAR),

figure 5 shows the late alarm rate (DAR),

figure 6 shows the total alarm failure rate (TFR),

figure 7 shows the early alarm rate (FAR),

figure 8 shows the late alarm rate (DAR),

figure 9 shows the total alarm failure rate (TFR),

figure 10 shows the alarm category error rate (CER),

figure 11 shows the instantaneous detection success rate for concealed false data injection attacks,

figure 12 shows the instantaneous detection success rate for direct false data injection attacks.

Detailed Description

For the purposes of promoting an understanding and appreciation of the invention, reference will now be made in detail to the present embodiments of the invention illustrated in the accompanying drawings.

Example 1: referring to fig. 1-4, a smart grid false data injection attack detection method based on reinforcement learning includes the following steps:

Step one: establishing a general linear model of the power grid:

$x_t = A x_{t-1} + v_t$ (1)

$y_t = H x_t + w_t$ (2)

where $x_t = [x_{1,t}, \ldots, x_{n,t}, \ldots, x_{N,t}]$ is the system state at time t, $x_{n,t}$ is the phase angle of the n-th node at time t, and N is the total number of system states, taken as 14; the measurement at time t is denoted $y_t = [y_{1,t}, \ldots, y_{m,t}, \ldots, y_{M,t}]$, where $y_{m,t}$ is the value measured by the m-th measuring instrument at time t and M is the total number of measuring instruments, taken as 23;

$A \in \mathbb{R}^{N \times N}$ is the state transition matrix, set to the identity matrix, $H \in \mathbb{R}^{M \times N}$ is the Jacobian matrix determined by the grid topology, and $\mathbb{R}$ denotes the set of real numbers;

$v_t \sim \mathcal{N}(0, \sigma_v^2 I_N)$ is the system noise at time t, where the variance of the process noise is $\sigma_v^2 = 10^{-4}$ and $I_N$ is the N-dimensional identity matrix;

$w_t \sim \mathcal{N}(0, \sigma_w^2 I_M)$ is the measurement noise at time t, where the variance of the measurement noise is $\sigma_w^2 = 2 \times 10^{-4}$ and $I_M$ is the M-dimensional identity matrix;

Step two: obtaining samples by simulated attacks using equations (3) and (4), where $a_t$ is the attack vector of a direct attack at time t and $H c_t$ is the attack vector of a concealed attack; because H does not change over time, $c_t$ is used to denote the concealed attack vector; $a_t$ and $c_t$ are known during sample training and unknown during actual detection; $\tau$ is the time at which the system is attacked, set so that 10 < τ < 200; and the step function equals 1 when $t \ge \tau$ and 0 otherwise.

Step three: obtaining the observations. This step is the core of the present invention and is divided into the following sub-steps.

3.1) Setting the thresholds.

The direct false data injection attack threshold and the concealed false data injection attack threshold are set according to the grid structure; each is assigned a fixed value in this embodiment.

3.2) Obtaining the measurements.

The measurement $y_t$ at time t is obtained from each measuring instrument, and the measurement $y_{t-1}$ from time t-1 is retrieved;

3.3) Estimating the measurement using Kalman filtering.

The state estimate at time t is obtained using the least squares algorithm given by equations (9) and (10); the matrix appearing in these equations is the covariance matrix of the measurement deviation;

3.4) Calculating the squared deviation norms.

Equations (12) and (13) are used, respectively, to calculate the squared norm of the deviation between the measurement and its estimate at time t, and the squared norm of the change in the measurement between time t-1 and time t.

3.5) Obtaining the instantaneous observations using threshold segmentation.

From these two squared norms, the instantaneous observations for the direct false data injection attack and the concealed false data injection attack are obtained.

3.6) Obtaining the observations using a sliding window.

Taking the direct and concealed false data injection attack observations at time t-1, the corresponding instantaneous observation at time t is appended to each and the oldest instantaneous observation is removed, which gives the direct false data injection attack observation and the concealed false data injection attack observation at time t.
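The estimation in sub-step 3.3 can be illustrated with a scalar Kalman filter, a one-dimensional reduction of the model of equations (1) and (2). The sketch uses the standard predict and update recursion and returns the measurement estimate computed from the updated state; whether equations (9) to (11) take exactly this form is an assumption. The default noise variances are the values of this embodiment, and the function name is hypothetical.

```python
def kalman_step(x_est, p_est, y, a=1.0, h=1.0, var_v=1e-4, var_w=2e-4):
    """One predict/update cycle of a scalar Kalman filter; returns (x, p, y_hat)."""
    # Predict with the state model x_t = a * x_{t-1} + v_t.
    x_pred = a * x_est
    p_pred = a * p_est * a + var_v
    # Update with the measurement model y_t = h * x_t + w_t.
    gain = p_pred * h / (h * p_pred * h + var_w)
    x_new = x_pred + gain * (y - h * x_pred)
    p_new = (1.0 - gain * h) * p_pred
    return x_new, p_new, h * x_new

# Hypothetical run: constant measurement 1.0, initial estimate 0.0 with variance 1.0.
x, p = 0.0, 1.0
for _ in range(50):
    x, p, y_hat = kalman_step(x, p, 1.0)
```

The residual between the measurement and `y_hat` is the quantity whose squared norm is graded in sub-steps 3.4 and 3.5.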

Step four: deriving the detector action $a_t$ at time t using an ε-greedy strategy: the system has two states, $s_n$ (the system is not under attack) and $s_a$ (the system is under attack), and the detector likewise has two actions: $a_s$, in which the algorithm considers the system to be under attack and issues an alarm, and $a_c$, in which the algorithm considers the system not to be under attack and issues no alarm; at time t the direct attack detection observation and the concealed attack detection observation are obtained, and the ε-greedy strategy selects the detector actions based on the Q table $Q^n$ for direct false data injection attack detection and the Q table $Q^s$ for concealed false data injection attack detection; under the ε-greedy strategy the detector selects the optimal action with probability 1-ε and a random action with probability ε; ε is updated once every d steps, with d = 40, according to equation (5),

$\varepsilon = \max(\varepsilon - e^{-1}, \varepsilon_{\min})$ (5)

where e is the number of samples used so far, $\varepsilon_{\min}$ is a manually set minimum value of ε, taken as 0.01, and the initial value of ε is set to 0.2;

Step five: training using the Sarsa algorithm, with the Q table updated using equation (6), in which the quantities are the observation at time t used to detect type-i attacks; the action taken for the type-i attack after that observation is obtained at time t; the learning rate α, set to 0.1; the discount factor $\gamma^i$ used in training against type-i attacks, set to 1 for both the direct false data injection attack and the concealed false data injection attack; and the reward obtained at time t for a given state and action in type-i detection, whose value is given by equation (7). In equation (7), $r_0$ and b are the preset early alarm and late alarm reward coefficients, set to $r_0 = 1$ and $b = 0.01$ respectively, and the state is the system state for type-i detection at time t;

step six: repeating steps one to five until the maximum detection time of the episode, T = 300, is reached, or until the alarm action $a_s$ occurs in either of the two detector actions;

step seven: repeating steps one to six until all samples have been used, with the total number of samples E = 40000, giving the complete $Q^d$ table and $Q^s$ table;
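The parameter values stated in this embodiment can be gathered in one place, for example as a frozen dataclass. The numbers are those given above; the class name and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingConfig:
    n_states: int = 14                    # N, number of system states
    n_meters: int = 23                    # M, number of measuring instruments
    process_noise_var: float = 1e-4       # variance of the process noise
    measurement_noise_var: float = 2e-4   # variance of the measurement noise
    attack_time_range: tuple = (10, 200)  # 10 < tau < 200
    epsilon_init: float = 0.2             # initial epsilon
    epsilon_min: float = 0.01             # minimum epsilon
    epsilon_update_steps: int = 40        # d, steps between epsilon updates
    learning_rate: float = 0.1            # alpha
    discount: float = 1.0                 # gamma^i for both attack types
    r0: float = 1.0                       # early alarm reward coefficient
    b: float = 0.01                       # late alarm reward coefficient
    max_detection_time: int = 300         # T
    total_samples: int = 40000            # E

config = TrainingConfig()
```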

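Step eight: during detection, the direct and concealed false data injection attack observations are obtained using steps one to three, and equation (8) is used to obtain the corresponding actions from the $Q^d$ table and the $Q^s$ table; while both actions are $a_c$, these steps are repeated until one of the two actions is $a_s$, at which point detection stops and an alarm is raised.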

As can be seen from the drawings, FIG. 4 shows the early alarm rate, where $a_t$ and $c_t$ follow uniform distributions on [0, 0.075], [0.075, 0.15], [0.1, 0.175], [0.15, 0.225] and [0.175, 0.25] respectively. FAR denotes the early alarm rate, i.e. the frequency with which an alarm is raised when no attack has occurred, calculated as the number of early alarms divided by the total number of detections. "direct attack test" denotes the results of detecting direct false data injection attacks with the present method, "stealth attack test" denotes the results of detecting concealed false data injection attacks with the present method, and BDD denotes the results of the traditional bad data detection method, which can only detect direct false data injection attacks (its detection threshold is set to 0.006). lm is the lower limit and um the upper limit of the interval of the distribution followed by $a_t$ and $c_t$.

FIG. 5 shows the late alarm rate for the same distributions as FIG. 4. DAR denotes the late alarm rate, i.e. the frequency with which no alarm has been raised more than 10 detections after an attack, calculated as the number of late alarms divided by the total number of detections.

FIG. 6 shows the total alarm failure rate for the same distributions as FIG. 4. TFR denotes the total alarm failure rate, i.e. the total frequency of detection failures, comprising FAR, DAR and CER, calculated as the total number of failures divided by the total number of detections.

FIG. 7 shows the early alarm rate, where $a_t$ and $c_t$ follow uniform distributions on [0, 0.15], [0.01, 0.25], [0.15, 0.3], [0.2, 0.35] and [0.25, 0.4] respectively. FIG. 8 shows the late alarm rate and FIG. 9 the total alarm failure rate for the same distributions as FIG. 7.

FIG. 10 shows the alarm category error rate for three sets of distributions: um - lm = 0.075 denotes $a_t$ and $c_t$ following the distributions of FIG. 4; um - lm = 0.15 denotes $a_t$ and $c_t$ following the distributions of FIG. 7; and in the third set $a_t$ and $c_t$ follow uniform distributions on [0.03, 0.08], [0.05, 0.1], [0.1, 0.15], [0.15, 0.2] and [0.2, 0.25] respectively, each interval having width 0.05. CER denotes the alarm category error rate, i.e. the frequency with which the category of the attack (direct or concealed false data injection) is detected incorrectly, calculated as the number of category detection errors divided by the total number of detections.

FIG. 11 shows the instantaneous detection success rate for concealed false data injection attacks, with the same three sets of distributions as FIG. 10. SDR denotes the instantaneous detection success rate, i.e. the frequency with which an alarm is raised immediately after the attack, calculated as the number of immediate alarms divided by the total number of detections. FIG. 12 shows the instantaneous detection success rate for direct false data injection attacks, with the same three sets of distributions as FIG. 10.
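The rates defined above can be computed from per-run outcomes as in the following sketch. Each run is assumed to record the attack time τ, the alarm time (or None if no alarm was raised), the true attack type and the detected type. Treating an alarm at exactly t = τ as "immediate", and counting each run in at most one category, are assumptions; the function name is hypothetical.

```python
def alarm_rates(runs, late_after=10):
    """runs: list of (tau, alarm_time or None, true_type, predicted_type) tuples."""
    total = len(runs)
    early = late = wrong_class = instant = 0
    for tau, alarm_time, true_type, predicted_type in runs:
        if alarm_time is not None and alarm_time < tau:
            early += 1          # alarm raised before any attack
        elif alarm_time is None or alarm_time - tau > late_after:
            late += 1           # no alarm within late_after detections of the attack
        elif predicted_type != true_type:
            wrong_class += 1    # timely alarm, wrong attack category
        elif alarm_time == tau:
            instant += 1        # correct alarm raised immediately
    return {
        "FAR": early / total,
        "DAR": late / total,
        "CER": wrong_class / total,
        "TFR": (early + late + wrong_class) / total,
        "SDR": instant / total,
    }

# Hypothetical outcomes of six detection runs with tau = 50.
runs = [
    (50, 40, "direct", "direct"),
    (50, None, "direct", "direct"),
    (50, 70, "concealed", "concealed"),
    (50, 52, "direct", "concealed"),
    (50, 50, "direct", "direct"),
    (50, 53, "concealed", "concealed"),
]
rates = alarm_rates(runs)
```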

It should be noted that the above-mentioned embodiments are not intended to limit the scope of the present invention, and all equivalent modifications and substitutions based on the above-mentioned technical solutions are within the scope of the present invention as defined in the claims.

Claims

1. A smart grid false data injection attack detection method based on reinforcement learning is characterized by comprising the following steps:

step one: establishing a general linear model of the power grid,

step two: obtaining samples by simulated attacks,

step three: obtaining the observations,

step four: deriving the detector action $a_t$ at time t using an ε-greedy strategy,

step five: training using the Sarsa algorithm,

step six: repeating steps one to five until the maximum detection time T of the episode is reached or the alarm action $a_s$ occurs,

step seven: repeating steps one to six until all E samples have been used,

step eight: performing detection and judging whether the system is under a direct false data injection attack.

2. The reinforcement learning-based smart grid false data injection attack detection method according to claim 1,

$x_t = A x_{t-1} + v_t$ (1)

$y_t = H x_t + w_t$ (2)

where $x_t = [x_{1,t}, \ldots, x_{n,t}, \ldots, x_{N,t}]$ is the system state at time t, $x_{n,t}$ is the phase angle of the n-th node at time t, and N is the total number of system states; the measurement at time t is denoted $y_t = [y_{1,t}, \ldots, y_{m,t}, \ldots, y_{M,t}]$, where $y_{m,t}$ is the value measured by the m-th measuring instrument at time t and M is the total number of measuring instruments;

$A \in \mathbb{R}^{N \times N}$ is the state transition matrix, $H \in \mathbb{R}^{M \times N}$ is the Jacobian matrix determined by the grid topology, and $\mathbb{R}$ denotes the set of real numbers;

$v_t$ is the system noise at time t;

$w_t \sim \mathcal{N}(0, \sigma_w^2 I_M)$ is the measurement noise at time t, where $\sigma_w^2$ is the variance of the measurement noise, whose value is determined by the measuring device, and $I_M$ is the M-dimensional identity matrix.

3. The reinforcement learning-based smart grid false data injection attack detection method according to claim 1,

representing step function, i.e. when t ≧ τ

4. The reinforcement learning-based smart grid false data injection attack detection method according to claim 1,

step three: the norm of the residual between the measurement $y_t$ and its estimate $\hat{y}_t$ is used to detect direct false data injection attacks, and the norm of the difference between the current measurement $y_t$ and the previous measurement $y_{t-1}$ is used to detect concealed false data injection attacks; a threshold segmentation method grades the two norm values to give the instantaneous observation for direct false data injection attacks and the instantaneous observation for concealed false data injection attacks; a sliding window method then updates the observations with the two instantaneous observations, giving the direct false data injection attack observation and the concealed false data injection attack observation for the corresponding time.

5. The reinforcement learning-based smart grid false data injection attack detection method according to claim 1,

And detection of observations by covert attacks

$\varepsilon = \max(\varepsilon - e^{-1}, \varepsilon_{\min})$ (5)

where e is the number of samples used so far and $\varepsilon_{\min}$ is a manually set minimum value of ε.

6. The reinforcement learning-based smart grid false data injection attack detection method according to claim 1,

step five: training using the Sarsa algorithm, with the Q table updated using equation (6). In equation (6), a parameter carrying the superscript i is a parameter for detecting a type-i attack, where i = n or s: i = n indicates that the parameter is for detecting a direct attack, and i = s indicates that it is for detecting a concealed attack. $Q^i$ is the Q table required to detect the type-i attack. The other quantities are the observation at time t used to detect type-i attacks, the action taken after that observation is obtained at time t, and the reward obtained at time t for a given state and action in type-i detection, whose value is given by equation (7), in which the state is the system state for type-i detection at time t.

7. The reinforcement learning-based smart grid false data injection attack detection method according to claim 1, characterized in that step six comprises: repeating steps one to five until the maximum detection time T of the episode is reached, or until the alarm action $a_s$ occurs in either of the two detector actions.

8. The reinforcement learning-based smart grid false data injection attack detection method according to claim 1, characterized in that step seven comprises: repeating steps one to six until all E samples have been used, giving the complete $Q^d$ table and $Q^s$ table.

9. The reinforcement learning-based smart grid false data injection attack detection method according to claim 1, characterized in that step eight comprises: during detection, the direct and concealed false data injection attack observations are obtained using steps one to three, and equation (8) is used to obtain the corresponding actions from the $Q^d$ table and the $Q^s$ table; while both actions are $a_c$, these steps are repeated until one of the two actions is $a_s$, at which point detection stops and an alarm is raised; when the action obtained from the $Q^d$ table is $a_s$, the system is considered to be under a direct false data injection attack.

10. the reinforcement learning-based smart grid false data injection attack detection method according to claim 1, wherein the third step is realized by the following sub-steps: