CN109375514A - Design method of an optimal tracking controller in the presence of false data injection attacks - Google Patents
Design method of an optimal tracking controller in the presence of false data injection attacks
- Publication number
- CN109375514A (application CN201811453386.6A)
- Authority
- CN
- China
- Prior art keywords
- policy
- algorithm
- false data
- optimal
- following
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G05B13/042—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Complex Calculations (AREA)
- Feedback Control In General (AREA)
Abstract
The present invention relates to an intelligent tracking controller that, in the presence of false data injection attacks, computes the optimal tracking control law in real time so that the system output tracks the reference input. The controller may comprise different control-algorithm processors and uses adaptive dynamic programming based on game theory and Q-learning, so it applies when the system dynamics are unknown, and even when only input-output data are available. The invention suits systems whose plant and controller are connected over a wireless network, or which transmit data over wireless communication networks, and has significant application value in UAV formation flight and intelligent vehicles.
Description
Technical field
The present invention relates to methods that use game theory, adaptive dynamic programming, and reinforcement learning to determine an optimal tracking controller for linear discrete-time systems subject to false data injection attacks.
Background technique
Optimal tracking control is an important subject in the control field with a wide range of applications, for example trajectory tracking of intelligent vehicles and unmanned aerial vehicles, and tracking control of robots. Its purpose is to make the system output track a reference input (or reference trajectory) in an optimal sense, which can be achieved by minimizing a previously given quadratic performance index. It should be pointed out that, with the development and application of network technology, wireless transmission is increasingly used for the remote transmission of data. However, the presence of wireless networks makes the transmitted data vulnerable to adversarial attacks, mainly including denial-of-service attacks, replay attacks, and false data injection attacks. Studying optimal tracking control under network attacks therefore has important practical significance. The present invention mainly studies false data injection attacks.
Traditional optimal tracking control designs the corresponding tracking controller by dynamic programming. However, dynamic programming is a backward-in-time recursive method, so it cannot be computed online, and it suffers from the curse of dimensionality. Adaptive dynamic programming belongs to the field of artificial intelligence; it is fundamentally based on reinforcement learning theory, imitating the way humans learn from feedback in complex environments, and solves for the control strategy recursively forward in time, so it can be executed online.
Computing the optimal control law by Q-learning may not require the system matrices of the original system or of the reference trajectory generator, and is therefore suitable when some dynamic matrices are unknown. Moreover, this method can iteratively solve for the optimal tracking control strategy using only input-output data, without current state information.
Summary of the invention
The present invention aims to propose a design method for an optimal tracking controller of discrete-time systems under false data injection attacks, solving the tracking problem that previously could not be handled in the presence of such attacks. The system structure of the invention is shown in Fig. 1. The technical solution of the invention is implemented as follows:
1) Establish the false data injection attack model and the augmented system model;
2) Using game theory, establish the game model of the attacker and defender; the defender is the controller, and the attacker is the false-data injector;
3) Establish the Bellman equation and, by optimal control theory, solve for the optimal control strategy and attack strategy; solve the game algebraic Riccati equation by policy iteration and value iteration;
4) Using the Q-function-based reinforcement learning method, solve for the optimal policies of both players, including policy iteration and value iteration;
5) Based only on input-output data, iteratively solve for the optimal policies by Q-learning.
Brief description of the drawings
Fig. 1 is the system structure diagram in the presence of false data injection attacks.
Specific embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it.
With reference to Fig. 1, the invention proposes a method using game theory, adaptive dynamic programming, and Q-learning to solve the optimal tracking control problem of discrete-time systems. The specific embodiment is as follows:
1) Establishing the false data injection attack model and the augmented model
Consider the following system model:
x_{k+1} = A x_k + B u_k (1)
where A and B are the system matrices. Assume that the control input u_k is attacked during transmission; the system model after the false data injection attack then becomes
x_{k+1} = A x_k + B (u_k + a_k) (2)
where a_k aggregates the injected false data: q is the number of attackers, an indicator denotes whether the i-th transmission channel is attacked by the j-th attacker (otherwise that channel is not under attack), and a_k^j is the false data injected into the j-th channel at time k.
Assume the reference (tracking) model has the following form:
r_{k+1} = T r_k (3)
where the matrix T is the reference model matrix; note that T is not required to be Hurwitz. Combining (2) and (3), with the augmented state X_k = [x_k; r_k], yields the augmented system equation
X_{k+1} = Ā X_k + B̄ (u_k + a_k), Ā = blkdiag(A, T), B̄ = [B; 0]. (4)
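A minimal numerical sketch of models (1)-(4) may help fix ideas. All matrix values below are illustrative assumptions, not taken from the patent; the attacked input is modeled as u_k plus an aggregate injected signal a_k:

```python
import numpy as np

# Illustrative matrices (assumptions, not from the patent): a 2-state plant
# x_{k+1} = A x_k + B u_k and a scalar reference generator r_{k+1} = T r_k.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
T = np.array([[1.0]])  # reference model; T need not be stable

n, m = A.shape[0], B.shape[1]
p = T.shape[0]

# Augmented state X_k = [x_k; r_k]; the attacked input is u_k + a_k, where
# a_k collects the false data injected on the compromised channels.
A_aug = np.block([[A, np.zeros((n, p))],
                  [np.zeros((p, n)), T]])
B_aug = np.vstack([B, np.zeros((p, m))])

def step(X, u, a):
    """One step of the augmented dynamics under a false data injection a."""
    return A_aug @ X + B_aug @ (u + a)

X0 = np.zeros((n + p, 1))
X0[-1, 0] = 1.0                               # reference starts at r_0 = 1
X1 = step(X0, u=np.zeros((m, 1)), a=np.array([[0.2]]))
```

Here the injected false data enters through the same channel matrix B as the control, matching the attacked-input model (2).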
2) Using game theory, establishing the game model of the attacker and defender
In general, controllers take many forms, for example state feedback, output feedback, and dynamic output feedback; likewise, the injected false data can vary. The present invention assumes that the tracking controller and the false data are linear functions of the augmented state X_k, i.e.
u_k = -K X_k, a_k = -L X_k
where K = [K_1, K_2] and L are the feedback gains of the defender and the attacker, respectively. The two players choose a discounted quadratic payoff function with weights Q_e ≥ 0 and R > 0 and discount factor γ ∈ (0, 1). The optimal policies of the defender and the attacker are then given by (9) and (10), and solving (9) and (10) is equivalent to solving the corresponding min-max game problem.
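The payoff-function equations here did not survive extraction. A standard zero-sum reconstruction, consistent with the stated weights Q_e ≥ 0, R > 0 and discount factor γ ∈ (0, 1), is sketched below; the attack-penalty weight S and the tracking-error definition e_k are assumptions, since the original symbols are lost:

```latex
J(u,a)\;=\;\sum_{k=0}^{\infty}\gamma^{k}\,\big(e_k^{\top}Q_e\,e_k \;+\; u_k^{\top}R\,u_k \;-\; a_k^{\top}S\,a_k\big),
\qquad e_k \;=\; y_k - r_k .
```

The defender minimizes J while the attacker maximizes it, so the optimal policies (9)-(10) form a saddle point with game value min_K max_L J = max_L min_K J.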
3) Establishing the Bellman equation and solving for the optimal control strategy and attack strategy by optimal control theory
First, define the utility function. Then, by calculation, the optimal-control Bellman equation is obtained. From optimal control theory, the value function is quadratic in the augmented state with a kernel matrix P > 0. Solving the optimality equations yields the optimal policies of the two players, where
Θ = [(Θ^1)^T (Θ^2)^T … (Θ^q)^T]^T
L(P) = [(L^1(P))^T (L^2(P))^T … (L^q(P))^T]^T
and the matrix P > 0 satisfies the game algebraic Riccati equation.
The result above is derived by dynamic programming and can only be computed offline. We now use reinforcement learning to compute the optimal policies of both players online. Algorithms 1 and 2 below give the policy iteration and value iteration procedures, respectively.
Algorithm 1: online policy iteration
1. Initialization: set j = 0 and choose stabilizing initial policies K^0 and L^0.
2. Policy evaluation: solve the following equation (17) for P^{j+1}.
3. Policy improvement: update the gains K^{j+1} and L^{j+1}.
4. Stopping condition: ||K^{j+1} - K^j|| < ε and ||L^{j+1} - L^j|| < ε.
Algorithm 2: value iteration
1. Initialization: set j = 0 and choose arbitrary initial policies K^0 and L^0.
2. Policy evaluation: solve the following equation for P^{j+1}.
3. Policy improvement: update the gains K^{j+1} and L^{j+1}.
4. Stopping condition: ||K^{j+1} - K^j|| < ε and ||L^{j+1} - L^j|| < ε.
As Algorithm 1 shows, solving equation (17) requires the known augmented-system data, and the initial policies must be stabilizing, otherwise the equation has no solution. Algorithm 2 improves on this correspondingly: its initial policies are no longer required to be stabilizing.
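The value iteration of Algorithm 2 can be sketched numerically on the underlying discounted zero-sum LQ game. Everything below is an illustrative assumption (the matrices, the attack-penalty weight S, and the convention u_k = -K X_k, a_k = -L X_k), not the patent's actual equations:

```python
import numpy as np

# Assumed zero-sum LQ setup: minimizing controller u, maximizing attacker a,
# both entering through the input channel B.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])        # control weight, R > 0
S = np.array([[5.0]])        # assumed attack-penalty weight
gamma = 0.9                  # discount factor

M = np.hstack([B, B])                      # joint input matrix for [u; a]
Lam = np.block([[R, np.zeros((1, 1))],
                [np.zeros((1, 1)), -S]])   # indefinite weight: min over u, max over a

# Value iteration on the game algebraic Riccati equation (Algorithm 2 style);
# no stabilizing initial policy is needed, P starts at zero.
P = np.zeros_like(Q)
for _ in range(500):
    G = Lam + gamma * M.T @ P @ M
    P_next = (Q + gamma * A.T @ P @ A
              - gamma**2 * A.T @ P @ M @ np.linalg.solve(G, M.T @ P @ A))
    if np.linalg.norm(P_next - P) < 1e-12:
        P = P_next
        break
    P = P_next

# Saddle-point gains recovered from the converged kernel P:
KL = np.linalg.solve(Lam + gamma * M.T @ P @ M, gamma * M.T @ P @ A)
K, L = KL[:1, :], KL[1:, :]   # defender gain K and attacker gain L
```

The saddle-point condition shows up as the attacker block of Λ + γM^T P M staying negative definite; if S is chosen too small, the game has no finite value and the iteration diverges.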
4) Using the Q-function-based reinforcement learning method to solve for the optimal policies of both players
Define the Q-function, which for convenience is rewritten in the compact quadratic form z_k^T H z_k with z_k = [X_k^T, u_k^T, a_k^T]^T. By solving ∂Q/∂u_k = 0 and ∂Q/∂a_k = 0, the optimal policies of the two players are obtained. Substituting (20) into (19) gives the Bellman equation based on the Q-function, which is a key equation in the iterative process. The policy iteration and value iteration methods based on the Q-function are given in Algorithms 3 and 4, respectively.
Algorithm 3: policy iteration algorithm based on the Q-function
1. Initialization: set j = 0 and choose H^0 = (H^0)^T.
2. Policy evaluation: solve the following equation for H^{j+1}.
3. Policy improvement: update the gains.
4. Stopping condition: ||H^{j+1} - H^j|| < ε.
Algorithm 4: value iteration algorithm based on the Q-function
1. Initialization: set j = 0 and choose H^0 = (H^0)^T.
2. Policy evaluation: solve the following equation for H^{j+1}.
3. Policy improvement: update the gains.
4. Stopping condition: ||H^{j+1} - H^j|| < ε.
It is worth noting that the Q-function-based iterative Algorithms 3 and 4 do not require prior knowledge of the augmented system matrices.
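In Algorithms 3 and 4 the policies are read directly off the blocks of the Q-function matrix H, with no need for the augmented system matrices. As a hedged illustration, the sketch below assembles H from a known model (precisely what the data-driven algorithms avoid doing) and extracts the gains from its blocks; all matrix values, and the symbols M and Λ, are assumptions:

```python
import numpy as np

# Q(X, u, a) = [X; u; a]^T H [X; u; a] for the zero-sum LQ game. Here H is
# assembled from assumed model matrices purely for illustration; Algorithms 3
# and 4 instead identify H from measured data.
def build_H(A, M, Q, Lam, P, gamma):
    Hxx = Q + gamma * A.T @ P @ A
    Hxw = gamma * A.T @ P @ M
    Hww = Lam + gamma * M.T @ P @ M
    return np.block([[Hxx, Hxw],
                     [Hxw.T, Hww]])

def gains_from_H(H, n):
    """Saddle-point policies [u; a] = -Hww^{-1} Hwx X, read off H alone."""
    Hwx, Hww = H[n:, :n], H[n:, n:]
    return np.linalg.solve(Hww, Hwx)

# Illustrative numbers (assumptions):
A = np.array([[0.9, 0.1], [0.0, 0.8]])
M = np.array([[0.0, 0.0], [1.0, 1.0]])    # both players act through channel 2
Q = np.eye(2)
Lam = np.diag([1.0, -5.0])                # min over u, max over a
P = np.array([[2.0, 0.3], [0.3, 1.5]])    # any symmetric kernel for the demo
H = build_H(A, M, Q, Lam, P, gamma=0.9)
KL = gains_from_H(H, n=2)                 # row 0: defender gain; row 1: attacker gain
```

Once H is known, no model is needed: the gain-update step of Algorithms 3-4 is exactly this block read-off.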
5) Iteratively solving for the optimal policies by Q-learning based on input-output data
Assuming the system is observable, the system state X_k can be represented by an input-output sequence through the matrix V_N. It follows that there exists a constant κ > 0 such that rank(V_N) < n + p for N < κ and rank(V_N) = n + p for N ≥ κ, where n is the state dimension of the original system and p is the output dimension. Therefore, choosing N ≥ κ makes the matrix V_N have full column rank. With this definition, the Q-function can be written in terms of input-output data, and the optimal policies of the two players follow accordingly. The Bellman equation based on the Q-function and input-output data can then be written down, and linearly parameterizing the Q-function yields a set of linear equations in the unknown parameters. In the above formula, the unknown matrix H̄ is symmetric, which reduces the number of unknown elements to be estimated. Based on the above analysis, Algorithms 5 and 6 give the policy iteration and value iteration methods using Q-learning, respectively; these methods use only input-output data.
Algorithm 5: policy iteration algorithm using Q-learning
1. Initialization: set j = 0 and choose stabilizing initial policies.
2. Policy evaluation: solve the following equation for h^{j+1}.
3. Policy improvement: update the gains.
4. Stopping condition: ||H^{j+1} - H^j|| < ε.
Algorithm 6: value iteration algorithm using Q-learning
1. Initialization: set j = 0 and choose arbitrary initial policies.
2. Policy evaluation: solve the following equation for h^{j+1}.
3. Policy improvement: update the gains.
4. Stopping condition: ||H^{j+1} - H^j|| < ε.
As Algorithm 6 shows, the initial policies of the two players need not be stabilizing. In addition, the number of samples used in the recursive computation must satisfy a corresponding rank condition.
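The policy-evaluation step of Algorithms 5 and 6 reduces to a least-squares problem in the parameters of the quadratic Q-function. The sketch below identifies the Q-function of a fixed policy pair from simulated data with no use of the system matrices; for brevity it regresses on the state itself, whereas the patent's algorithms replace the state by a vector of past inputs and outputs. The scalar plant, gains, weights, and noise levels are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed scalar example: plant x_{k+1} = A x + B (u + a), zero-sum utility
# r_k = Q x^2 + R u^2 - S a^2, fixed policies u = -K x, a = -L x.
A, B = 0.8, 1.0
Q, R, S, gamma = 1.0, 1.0, 5.0, 0.9
K, L = 0.3, 0.0              # fixed policy pair being evaluated
dim = 3                      # z = [x, u, a]

def feat(z):
    # Quadratic basis z_i z_j (i <= j): the symmetric parameterization of H.
    return np.array([z[i] * z[j] for i in range(dim) for j in range(i, dim)])

# Collect data with exploration noise on both players (for excitation).
Phi, y = [], []
x = 1.0
for _ in range(60):
    u = -K * x + 0.1 * rng.standard_normal()
    a = -L * x + 0.1 * rng.standard_normal()
    x1 = A * x + B * (u + a)
    r = Q * x**2 + R * u**2 - S * a**2
    z = np.array([x, u, a])
    z1 = np.array([x1, -K * x1, -L * x1])   # successor follows the policies
    Phi.append(feat(z) - gamma * feat(z1))  # Bellman: Q(z) - gamma Q(z1) = r
    y.append(r)
    x = x1

Phi, y = np.array(Phi), np.array(y)
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # parameters of Q^{K,L}
```

Here theta parameterizes the Q-function of the current policy pair; alternating this evaluation with the gain update read off the identified H gives the full iteration of Algorithm 5 (or, with arbitrary initial policies, Algorithm 6).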
Claims (5)
1. A design method for an optimal tracking controller in the presence of false data injection attacks, characterized by comprising the following steps:
Step 1: establishing the false data injection attack model and the augmented system model;
Step 2: using game theory, establishing the game model of the attacker and defender;
Step 3: using the Q-function-based reinforcement learning method, solving for the optimal policies of both players, including policy iteration and value iteration;
Step 4: based on input-output data, iteratively solving for the optimal policies by Q-learning.
2. The design method for an optimal tracking controller in the presence of false data injection attacks according to claim 1, characterized in that step 1 specifically comprises:
considering the following system model:
x_{k+1} = A x_k + B u_k
where A and B are the system matrices; if the control input u_k is attacked during transmission, the system model after the false data injection attack becomes
x_{k+1} = A x_k + B (u_k + a_k)
where q is the number of attackers, an indicator denotes whether the i-th transmission channel is attacked by the j-th attacker (otherwise that channel is not under attack), and a_k^j is the false data injected into the j-th channel at time k;
assuming the reference model has the form r_{k+1} = T r_k, where the matrix T is the reference model matrix, the augmented system can then be stated as X_{k+1} = Ā X_k + B̄ (u_k + a_k) with X_k = [x_k; r_k].
3. The design method for an optimal tracking controller in the presence of false data injection attacks according to claim 1, characterized in that step 2 specifically comprises:
assuming the tracking controller and the false data are linear functions of the state X_k, i.e. u_k = -K X_k and a_k = -L X_k, where K = [K_1, K_2] and L are the feedback gains of the defender and the attacker, respectively;
the two players choose a discounted quadratic payoff function, where γ ∈ (0, 1) is the discount factor and Q_e and R are given positive semidefinite and positive definite weighting matrices, respectively; the optimal policies of the defender and the attacker are designed accordingly.
4. The design method for an optimal tracking controller in the presence of false data injection attacks according to claim 1, characterized in that step 3 specifically comprises:
defining the Q-function; by solving ∂Q/∂u_k = 0 and ∂Q/∂a_k = 0, the optimal action strategies of the two players are obtained; the policy iteration and value iteration methods based on the Q-function are given in Algorithms 1 and 2, respectively;
Algorithm 1: the policy iteration algorithm based on the Q-function comprises the following steps:
1) initialization: set j = 0 and choose H^0 = (H^0)^T;
2) policy evaluation: solve the following equation for H^{j+1};
3) policy improvement: update the gains;
4) stopping condition: ||H^{j+1} - H^j|| < ε;
Algorithm 2: the value iteration algorithm based on the Q-function comprises the following steps:
1) initialization: set j = 0 and choose H^0 = (H^0)^T;
2) policy evaluation: solve the following equation for H^{j+1};
3) policy improvement: update the gains;
4) stopping condition: ||H^{j+1} - H^j|| < ε.
5. The design method for an optimal tracking controller in the presence of false data injection attacks according to claim 1, characterized in that step 4 specifically comprises:
representing the system state X_k by an input-output sequence, so that the Q-function can be written in terms of input-output data and the optimal policies of the two players follow accordingly;
the policy iteration and value iteration methods using Q-learning are given in Algorithms 3 and 4, respectively:
Algorithm 3: the policy iteration algorithm using Q-learning comprises the following steps:
1) initialization: set j = 0 and choose stabilizing initial policies;
2) policy evaluation: solve the following equation for h^{j+1};
3) policy improvement: update the gains;
4) stopping condition: ||H^{j+1} - H^j|| < ε;
Algorithm 4: the value iteration algorithm using Q-learning comprises the following steps:
1) initialization: set j = 0 and choose arbitrary initial policies;
2) policy evaluation: solve the following equation for h^{j+1};
3) policy improvement: update the gains;
4) stopping condition: ||H^{j+1} - H^j|| < ε.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811453386.6A CN109375514B (en) | 2018-11-30 | 2018-11-30 | Design method of optimal tracking controller in presence of false data injection attack |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109375514A true CN109375514A (en) | 2019-02-22 |
CN109375514B CN109375514B (en) | 2021-11-05 |
Family
ID=65376219
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811453386.6A Active CN109375514B (en) | 2018-11-30 | 2018-11-30 | Design method of optimal tracking controller in presence of false data injection attack |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109375514B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2140650B1 (en) * | 2007-03-30 | 2011-05-25 | International Business Machines Corporation | Method and system for resilient packet traceback in wireless mesh and sensor networks |
CN104994569A (en) * | 2015-06-25 | 2015-10-21 | 厦门大学 | Multi-user reinforcement learning-based cognitive wireless network anti-hostile interference method |
CN106937295A (en) * | 2017-02-22 | 2017-07-07 | 沈阳航空航天大学 | Heterogeneous network high energy efficiency power distribution method based on game theory |
CN107038477A (en) * | 2016-08-10 | 2017-08-11 | 哈尔滨工业大学深圳研究生院 | A kind of neutral net under non-complete information learns the estimation method of combination with Q |
CN107819785A (en) * | 2017-11-28 | 2018-03-20 | 东南大学 | A kind of double-deck defence method towards power system false data injection attacks |
CN108181816A (en) * | 2018-01-05 | 2018-06-19 | 南京航空航天大学 | A kind of synchronization policy update method for optimally controlling based on online data |
CN108196448A (en) * | 2017-12-25 | 2018-06-22 | 北京理工大学 | False data injection attacks method based on inaccurate mathematical model |
CN108512837A (en) * | 2018-03-16 | 2018-09-07 | 西安电子科技大学 | A kind of method and system of the networks security situation assessment based on attacking and defending evolutionary Game |
Non-Patent Citations (5)
Title |
---|
HAO LIU et al.: "Optimal Tracking Control of Linear Discrete-Time Systems Under Cyber Attacks", IFAC 2020 *
YING CHEN et al.: "Evaluation of Reinforcement Learning Based False Data Injection Attack to Automatic Voltage Control", IEEE *
YUZHE LI et al.: "SINR-based DoS Attack on Remote State Estimation: A Game-theoretic Approach", IEEE *
LIU Hao: "Attack and Defense of Cyber-Physical Systems", Journal of Shenyang Aerospace University *
TIAN Jiwei et al.: "Optimal Defense Strategy Against Load Redistribution Attacks Based on Game Theory", Computer Simulation *
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109932905A (en) * | 2019-03-08 | 2019-06-25 | 辽宁石油化工大学 | A kind of optimal control method of the Observer State Feedback based on non-strategy |
CN109932905B (en) * | 2019-03-08 | 2021-11-09 | 辽宁石油化工大学 | Optimization control method based on non-strategy observer state feedback |
CN110083064B (en) * | 2019-04-29 | 2022-02-15 | 辽宁石油化工大学 | Network optimal tracking control method based on non-strategy Q-learning |
CN110083064A (en) * | 2019-04-29 | 2019-08-02 | 辽宁石油化工大学 | A kind of network optimal track control method based on non-strategy Q- study |
CN111273543A (en) * | 2020-02-15 | 2020-06-12 | 西北工业大学 | PID optimization control method based on strategy iteration |
CN111273543B (en) * | 2020-02-15 | 2022-10-04 | 西北工业大学 | PID optimization control method based on strategy iteration |
CN111673750A (en) * | 2020-06-12 | 2020-09-18 | 南京邮电大学 | Speed synchronization control scheme of master-slave type multi-mechanical arm system under deception attack |
CN111673750B (en) * | 2020-06-12 | 2022-03-04 | 南京邮电大学 | Speed synchronization control scheme of master-slave type multi-mechanical arm system under deception attack |
CN112149361A (en) * | 2020-10-10 | 2020-12-29 | 中国科学技术大学 | Adaptive optimal control method and device for linear system |
CN112149361B (en) * | 2020-10-10 | 2024-05-17 | 中国科学技术大学 | Self-adaptive optimal control method and device for linear system |
CN112650057B (en) * | 2020-11-13 | 2022-05-20 | 西北工业大学深圳研究院 | Unmanned aerial vehicle model prediction control method based on anti-spoofing attack security domain |
CN112650057A (en) * | 2020-11-13 | 2021-04-13 | 西北工业大学深圳研究院 | Unmanned aerial vehicle model prediction control method based on anti-spoofing attack security domain |
CN113885330A (en) * | 2021-10-26 | 2022-01-04 | 哈尔滨工业大学 | Information physical system safety control method based on deep reinforcement learning |
CN113885330B (en) * | 2021-10-26 | 2022-06-17 | 哈尔滨工业大学 | Information physical system safety control method based on deep reinforcement learning |
CN114415633A (en) * | 2022-01-10 | 2022-04-29 | 云境商务智能研究院南京有限公司 | Security tracking control method based on dynamic event trigger mechanism under multi-network attack |
CN114415633B (en) * | 2022-01-10 | 2024-02-02 | 云境商务智能研究院南京有限公司 | Security tracking control method based on dynamic event triggering mechanism under multi-network attack |
CN115877871A (en) * | 2023-03-03 | 2023-03-31 | 北京航空航天大学 | Non-zero and game unmanned aerial vehicle formation control method based on reinforcement learning |
Also Published As
Publication number | Publication date |
---|---|
CN109375514B (en) | 2021-11-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109375514A (en) | Design method of an optimal tracking controller in the presence of false data injection attacks | |
Yan et al. | A path planning algorithm for UAV based on improved Q-learning | |
CN108803349B (en) | Optimal consistency control method and system for nonlinear multi-agent system | |
Duan et al. | Imperialist competitive algorithm optimized artificial neural networks for UCAV global path planning | |
Givigi et al. | A reinforcement learning adaptive fuzzy controller for differential games | |
Yu et al. | Distributed multi‐agent deep reinforcement learning for cooperative multi‐robot pursuit | |
Fang et al. | Target‐driven visual navigation in indoor scenes using reinforcement learning and imitation learning | |
Schultz et al. | Improving tactical plans with genetic algorithms | |
Wei et al. | Recurrent MADDPG for object detection and assignment in combat tasks | |
Yue et al. | Deep reinforcement learning for UAV intelligent mission planning | |
Liu et al. | Task assignment in ground-to-air confrontation based on multiagent deep reinforcement learning | |
CN111811532B (en) | Path planning method and device based on impulse neural network | |
CN115047907B (en) | Air isomorphic formation command method based on multi-agent PPO algorithm | |
Xiao et al. | Graph attention mechanism based reinforcement learning for multi-agent flocking control in communication-restricted environment | |
Cao et al. | Autonomous maneuver decision of UCAV air combat based on double deep Q network algorithm and stochastic game theory | |
Xu et al. | Pursuit and evasion game between UVAs based on multi-agent reinforcement learning | |
Esrafilian et al. | Model-aided deep reinforcement learning for sample-efficient UAV trajectory design in IoT networks | |
Yang et al. | Learning graph-enhanced commander-executor for multi-agent navigation | |
Zhao et al. | Deep Reinforcement Learning‐Based Air Defense Decision‐Making Using Potential Games | |
CN116165886A (en) | Multi-sensor intelligent cooperative control method, device, equipment and medium | |
Tuba et al. | Water cycle algorithm for robot path planning | |
Lin et al. | Choice of discount rate in reinforcement learning with long-delay rewards | |
Liu et al. | A distributed driving decision scheme based on reinforcement learning for autonomous driving vehicles | |
Bromo | Reinforcement Learning Based Strategic Exploration Algorithm for UAVs Fleets | |
Yang et al. | An interrelated imitation learning method for heterogeneous drone swarm coordination |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 2022-07-18 Address after: 452370 Building 2, Xingfu Industrial New Town, Micun Town, Xinmi City, Zhengzhou City, Henan Province Patentee after: Shensu Intelligent Agricultural Machinery Equipment (Henan) Co., Ltd. Address before: No. 37, Daoyi South Avenue, Shenbei New Area, Shenyang, Liaoning, 110136 Patentee before: SHENYANG AEROSPACE University