CN109375514A - Design method of an optimal tracking controller in the presence of false data injection attacks - Google Patents
Design method of an optimal tracking controller in the presence of false data injection attacks
- Publication number
- CN109375514A (application CN201811453386.6A)
- Authority
- CN
- China
- Prior art keywords
- policy
- algorithm
- false data
- optimal
- following
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G05B13/042—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Complex Calculations (AREA)
- Feedback Control In General (AREA)
Abstract
The present invention relates to an intelligent tracking controller that, in the presence of false data injection attacks, computes the optimal tracking control law in real time so that the system output tracks the reference input. The controller may comprise different control-algorithm processors and uses adaptive dynamic programming based on game theory and Q-learning, so it applies when the system dynamics are unknown, and even when only input-output data are available. The invention suits systems whose plant and controller are connected over a wireless network, or which transmit data over wireless communication networks, and has significant application value in UAV formation flight and intelligent vehicles.
Description
Technical field
The present invention relates to methods that use game theory, adaptive dynamic programming, and reinforcement learning to determine an optimal tracking controller for linear discrete-time systems subject to false data injection attacks.
Background technique
Optimal tracking control is an important subject in the control field with a wide range of applications, for example trajectory tracking of intelligent vehicles and unmanned aerial vehicles, and tracking control of robots. Its purpose is to make the system output track a reference input (or reference trajectory) in an optimal sense, which can be achieved by minimizing a previously given quadratic performance index. It should be pointed out that, with the development and application of network technology, wireless transmission is increasingly used for the remote transmission of data. However, the presence of wireless networks makes the transmitted data vulnerable to adversarial attacks, mainly including denial-of-service attacks, replay attacks, and false data injection attacks. Studying optimal tracking control under network attacks therefore has important practical significance. The present invention mainly studies false data injection attacks.
Traditional optimal tracking control designs the corresponding tracking controller by dynamic programming. However, dynamic programming is a backward-in-time recursive method, so it cannot be computed online, and it suffers from the curse of dimensionality. Adaptive dynamic programming belongs to the field of artificial intelligence; it is fundamentally based on reinforcement learning theory, imitating the way humans learn from feedback in complex environments, and solves for the control strategy recursively forward in time, so it can be executed online.
Computing the optimal control law by Q-learning may not require the system matrices of the original system or of the reference trajectory generator, and is therefore suitable when some dynamic matrices are unknown. Moreover, this method can iteratively solve for the optimal tracking control strategy using only input-output data, without current state information.
Summary of the invention
The present invention aims to propose a design method for an optimal tracking controller of discrete-time systems under false data injection attacks, solving the tracking problem that previously could not be handled in the presence of such attacks. The system structure of the invention is shown in Fig. 1. The technical solution of the invention is implemented as follows:
1) Establish the false data injection attack model and the augmented system model;
2) Using game theory, establish the game model of the attacker and defender; the defender is the controller, and the attacker is the false-data injector;
3) Establish the Bellman equation and, by optimal control theory, solve for the optimal control strategy and attack strategy; solve the game algebraic Riccati equation by policy iteration and value iteration;
4) Using the Q-function-based reinforcement learning method, solve for the optimal policies of both players, including policy iteration and value iteration;
5) Based only on input-output data, iteratively solve for the optimal policies by Q-learning.
Brief description of the drawings
Fig. 1 is the system structure diagram in the presence of false data injection attacks.
Specific embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it.
With reference to Fig. 1, the invention proposes a method using game theory, adaptive dynamic programming, and Q-learning to solve the optimal tracking control problem of discrete-time systems. The specific embodiment is as follows:
1) Establishing the false data injection attack model and the augmented model
Consider the following system model:
x_{k+1} = A x_k + B u_k (1)
where A and B are the system matrices. Assume that the control input u_k is attacked during transmission; the system model after the false data injection attack then becomes
x_{k+1} = A x_k + B (u_k + a_k) (2)
where a_k aggregates the injected false data: q is the number of attackers, an indicator denotes whether the i-th transmission channel is attacked by the j-th attacker (otherwise that channel is not under attack), and a_k^j is the false data injected into the j-th channel at time k.
Assume the reference (tracking) model has the following form:
r_{k+1} = T r_k (3)
where the matrix T is the reference model matrix; note that T is not required to be Hurwitz. Combining (2) and (3), with the augmented state X_k = [x_k; r_k], yields the augmented system equation
X_{k+1} = Ā X_k + B̄ (u_k + a_k), Ā = blkdiag(A, T), B̄ = [B; 0]. (4)
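A minimal numerical sketch of models (1)-(4) may help fix ideas. All matrix values below are illustrative assumptions, not taken from the patent; the attacked input is modeled as u_k plus an aggregate injected signal a_k:

```python
import numpy as np

# Illustrative matrices (assumptions, not from the patent): a 2-state plant
# x_{k+1} = A x_k + B u_k and a scalar reference generator r_{k+1} = T r_k.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
T = np.array([[1.0]])  # reference model; T need not be stable

n, m = A.shape[0], B.shape[1]
p = T.shape[0]

# Augmented state X_k = [x_k; r_k]; the attacked input is u_k + a_k, where
# a_k collects the false data injected on the compromised channels.
A_aug = np.block([[A, np.zeros((n, p))],
                  [np.zeros((p, n)), T]])
B_aug = np.vstack([B, np.zeros((p, m))])

def step(X, u, a):
    """One step of the augmented dynamics under a false data injection a."""
    return A_aug @ X + B_aug @ (u + a)

X0 = np.zeros((n + p, 1))
X0[-1, 0] = 1.0                               # reference starts at r_0 = 1
X1 = step(X0, u=np.zeros((m, 1)), a=np.array([[0.2]]))
```

Here the injected false data enters through the same channel matrix B as the control, matching the attacked-input model (2).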
2) Using game theory, establishing the game model of the attacker and defender
In general, controllers take many forms, for example state feedback, output feedback, and dynamic output feedback; likewise, the injected false data can vary. The present invention assumes that the tracking controller and the false data are linear functions of the augmented state X_k, i.e.
u_k = -K X_k, a_k = -L X_k
where K = [K_1, K_2] and L are the feedback gains of the defender and the attacker, respectively. The two players choose a discounted quadratic payoff function with weights Q_e ≥ 0 and R > 0 and discount factor γ ∈ (0, 1). The optimal policies of the defender and the attacker are then given by (9) and (10), and solving (9) and (10) is equivalent to solving the corresponding min-max game problem.
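The payoff-function equations here did not survive extraction. A standard zero-sum reconstruction, consistent with the stated weights Q_e ≥ 0, R > 0 and discount factor γ ∈ (0, 1), is sketched below; the attack-penalty weight S and the tracking-error definition e_k are assumptions, since the original symbols are lost:

```latex
J(u,a)\;=\;\sum_{k=0}^{\infty}\gamma^{k}\,\big(e_k^{\top}Q_e\,e_k \;+\; u_k^{\top}R\,u_k \;-\; a_k^{\top}S\,a_k\big),
\qquad e_k \;=\; y_k - r_k .
```

The defender minimizes J while the attacker maximizes it, so the optimal policies (9)-(10) form a saddle point with game value min_K max_L J = max_L min_K J.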
3) Establishing the Bellman equation and solving for the optimal control strategy and attack strategy by optimal control theory
First, define the utility function. Then, by calculation, the optimal-control Bellman equation is obtained. From optimal control theory, the value function is quadratic in the augmented state with a kernel matrix P > 0. Solving the optimality equations yields the optimal policies of the two players, where
Θ = [(Θ^1)^T (Θ^2)^T … (Θ^q)^T]^T
L(P) = [(L^1(P))^T (L^2(P))^T … (L^q(P))^T]^T
and the matrix P > 0 satisfies the game algebraic Riccati equation.
The result above is derived by dynamic programming and can only be computed offline. We now use reinforcement learning to compute the optimal policies of both players online. Algorithms 1 and 2 below give the policy iteration and value iteration procedures, respectively.
Algorithm 1: online policy iteration
1. Initialization: set j = 0 and choose stabilizing initial policies K^0 and L^0.
2. Policy evaluation: solve the following equation (17) for P^{j+1}.
3. Policy improvement: update the gains K^{j+1} and L^{j+1}.
4. Stopping condition: ||K^{j+1} - K^j|| < ε and ||L^{j+1} - L^j|| < ε.
Algorithm 2: value iteration
1. Initialization: set j = 0 and choose arbitrary initial policies K^0 and L^0.
2. Policy evaluation: solve the following equation for P^{j+1}.
3. Policy improvement: update the gains K^{j+1} and L^{j+1}.
4. Stopping condition: ||K^{j+1} - K^j|| < ε and ||L^{j+1} - L^j|| < ε.
As Algorithm 1 shows, solving equation (17) requires the known augmented-system data, and the initial policies must be stabilizing, otherwise the equation has no solution. Algorithm 2 improves on this correspondingly: its initial policies are no longer required to be stabilizing.
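The value iteration of Algorithm 2 can be sketched numerically on the underlying discounted zero-sum LQ game. Everything below is an illustrative assumption (the matrices, the attack-penalty weight S, and the convention u_k = -K X_k, a_k = -L X_k), not the patent's actual equations:

```python
import numpy as np

# Assumed zero-sum LQ setup: minimizing controller u, maximizing attacker a,
# both entering through the input channel B.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])        # control weight, R > 0
S = np.array([[5.0]])        # assumed attack-penalty weight
gamma = 0.9                  # discount factor

M = np.hstack([B, B])                      # joint input matrix for [u; a]
Lam = np.block([[R, np.zeros((1, 1))],
                [np.zeros((1, 1)), -S]])   # indefinite weight: min over u, max over a

# Value iteration on the game algebraic Riccati equation (Algorithm 2 style);
# no stabilizing initial policy is needed, P starts at zero.
P = np.zeros_like(Q)
for _ in range(500):
    G = Lam + gamma * M.T @ P @ M
    P_next = (Q + gamma * A.T @ P @ A
              - gamma**2 * A.T @ P @ M @ np.linalg.solve(G, M.T @ P @ A))
    if np.linalg.norm(P_next - P) < 1e-12:
        P = P_next
        break
    P = P_next

# Saddle-point gains recovered from the converged kernel P:
KL = np.linalg.solve(Lam + gamma * M.T @ P @ M, gamma * M.T @ P @ A)
K, L = KL[:1, :], KL[1:, :]   # defender gain K and attacker gain L
```

The saddle-point condition shows up as the attacker block of Λ + γM^T P M staying negative definite; if S is chosen too small, the game has no finite value and the iteration diverges.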
4) Using the Q-function-based reinforcement learning method to solve for the optimal policies of both players
Define the Q-function, which for convenience is rewritten in the compact quadratic form z_k^T H z_k with z_k = [X_k^T, u_k^T, a_k^T]^T. By solving ∂Q/∂u_k = 0 and ∂Q/∂a_k = 0, the optimal policies of the two players are obtained. Substituting (20) into (19) gives the Bellman equation based on the Q-function, which is a key equation in the iterative process. The policy iteration and value iteration methods based on the Q-function are given in Algorithms 3 and 4, respectively.
Algorithm 3: policy iteration algorithm based on the Q-function
1. Initialization: set j = 0 and choose H^0 = (H^0)^T.
2. Policy evaluation: solve the following equation for H^{j+1}.
3. Policy improvement: update the gains.
4. Stopping condition: ||H^{j+1} - H^j|| < ε.
Algorithm 4: value iteration algorithm based on the Q-function
1. Initialization: set j = 0 and choose H^0 = (H^0)^T.
2. Policy evaluation: solve the following equation for H^{j+1}.
3. Policy improvement: update the gains.
4. Stopping condition: ||H^{j+1} - H^j|| < ε.
It is worth noting that the Q-function-based iterative Algorithms 3 and 4 do not require prior knowledge of the augmented system matrices.
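In Algorithms 3 and 4 the policies are read directly off the blocks of the Q-function matrix H, with no need for the augmented system matrices. As a hedged illustration, the sketch below assembles H from a known model (precisely what the data-driven algorithms avoid doing) and extracts the gains from its blocks; all matrix values, and the symbols M and Λ, are assumptions:

```python
import numpy as np

# Q(X, u, a) = [X; u; a]^T H [X; u; a] for the zero-sum LQ game. Here H is
# assembled from assumed model matrices purely for illustration; Algorithms 3
# and 4 instead identify H from measured data.
def build_H(A, M, Q, Lam, P, gamma):
    Hxx = Q + gamma * A.T @ P @ A
    Hxw = gamma * A.T @ P @ M
    Hww = Lam + gamma * M.T @ P @ M
    return np.block([[Hxx, Hxw],
                     [Hxw.T, Hww]])

def gains_from_H(H, n):
    """Saddle-point policies [u; a] = -Hww^{-1} Hwx X, read off H alone."""
    Hwx, Hww = H[n:, :n], H[n:, n:]
    return np.linalg.solve(Hww, Hwx)

# Illustrative numbers (assumptions):
A = np.array([[0.9, 0.1], [0.0, 0.8]])
M = np.array([[0.0, 0.0], [1.0, 1.0]])    # both players act through channel 2
Q = np.eye(2)
Lam = np.diag([1.0, -5.0])                # min over u, max over a
P = np.array([[2.0, 0.3], [0.3, 1.5]])    # any symmetric kernel for the demo
H = build_H(A, M, Q, Lam, P, gamma=0.9)
KL = gains_from_H(H, n=2)                 # row 0: defender gain; row 1: attacker gain
```

Once H is known, no model is needed: the gain-update step of Algorithms 3-4 is exactly this block read-off.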
5) Iteratively solving for the optimal policies by Q-learning based on input-output data
Assuming the system is observable, the system state X_k can be represented by an input-output sequence through the matrix V_N. It follows that there exists a constant κ > 0 such that rank(V_N) < n + p for N < κ and rank(V_N) = n + p for N ≥ κ, where n is the state dimension of the original system and p is the output dimension. Therefore, choosing N ≥ κ makes the matrix V_N have full column rank. With this definition, the Q-function can be written in terms of input-output data, and the optimal policies of the two players follow accordingly. The Bellman equation based on the Q-function and input-output data can then be written down, and linearly parameterizing the Q-function yields a set of linear equations in the unknown parameters. In the above formula, the unknown matrix H̄ is symmetric, which reduces the number of unknown elements to be estimated. Based on the above analysis, Algorithms 5 and 6 give the policy iteration and value iteration methods using Q-learning, respectively; these methods use only input-output data.
Algorithm 5: policy iteration algorithm using Q-learning
1. Initialization: set j = 0 and choose stabilizing initial policies.
2. Policy evaluation: solve the following equation for h^{j+1}.
3. Policy improvement: update the gains.
4. Stopping condition: ||H^{j+1} - H^j|| < ε.
Algorithm 6: value iteration algorithm using Q-learning
1. Initialization: set j = 0 and choose arbitrary initial policies.
2. Policy evaluation: solve the following equation for h^{j+1}.
3. Policy improvement: update the gains.
4. Stopping condition: ||H^{j+1} - H^j|| < ε.
As Algorithm 6 shows, the initial policies of the two players need not be stabilizing. In addition, the number of samples used in the recursive computation must satisfy a corresponding rank condition.
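The policy-evaluation step of Algorithms 5 and 6 reduces to a least-squares problem in the parameters of the quadratic Q-function. The sketch below identifies the Q-function of a fixed policy pair from simulated data with no use of the system matrices; for brevity it regresses on the state itself, whereas the patent's algorithms replace the state by a vector of past inputs and outputs. The scalar plant, gains, weights, and noise levels are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed scalar example: plant x_{k+1} = A x + B (u + a), zero-sum utility
# r_k = Q x^2 + R u^2 - S a^2, fixed policies u = -K x, a = -L x.
A, B = 0.8, 1.0
Q, R, S, gamma = 1.0, 1.0, 5.0, 0.9
K, L = 0.3, 0.0              # fixed policy pair being evaluated
dim = 3                      # z = [x, u, a]

def feat(z):
    # Quadratic basis z_i z_j (i <= j): the symmetric parameterization of H.
    return np.array([z[i] * z[j] for i in range(dim) for j in range(i, dim)])

# Collect data with exploration noise on both players (for excitation).
Phi, y = [], []
x = 1.0
for _ in range(60):
    u = -K * x + 0.1 * rng.standard_normal()
    a = -L * x + 0.1 * rng.standard_normal()
    x1 = A * x + B * (u + a)
    r = Q * x**2 + R * u**2 - S * a**2
    z = np.array([x, u, a])
    z1 = np.array([x1, -K * x1, -L * x1])   # successor follows the policies
    Phi.append(feat(z) - gamma * feat(z1))  # Bellman: Q(z) - gamma Q(z1) = r
    y.append(r)
    x = x1

Phi, y = np.array(Phi), np.array(y)
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # parameters of Q^{K,L}
```

Here theta parameterizes the Q-function of the current policy pair; alternating this evaluation with the gain update read off the identified H gives the full iteration of Algorithm 5 (or, with arbitrary initial policies, Algorithm 6).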
Claims (5)
1. A design method for an optimal tracking controller in the presence of false data injection attacks, characterized by comprising the following steps:
Step 1: establishing the false data injection attack model and the augmented system model;
Step 2: using game theory, establishing the game model of the attacker and defender;
Step 3: using the Q-function-based reinforcement learning method, solving for the optimal policies of both players, including policy iteration and value iteration;
Step 4: based on input-output data, iteratively solving for the optimal policies by Q-learning.
2. The design method for an optimal tracking controller in the presence of false data injection attacks according to claim 1, characterized in that step 1 specifically comprises:
considering the following system model:
x_{k+1} = A x_k + B u_k
where A and B are the system matrices; if the control input u_k is attacked during transmission, the system model after the false data injection attack becomes
x_{k+1} = A x_k + B (u_k + a_k)
where q is the number of attackers, an indicator denotes whether the i-th transmission channel is attacked by the j-th attacker (otherwise that channel is not under attack), and a_k^j is the false data injected into the j-th channel at time k;
assuming the reference model has the form r_{k+1} = T r_k, where the matrix T is the reference model matrix, the augmented system can then be stated as X_{k+1} = Ā X_k + B̄ (u_k + a_k) with X_k = [x_k; r_k].
3. The design method for an optimal tracking controller in the presence of false data injection attacks according to claim 1, characterized in that step 2 specifically comprises:
assuming the tracking controller and the false data are linear functions of the state X_k, i.e. u_k = -K X_k and a_k = -L X_k, where K = [K_1, K_2] and L are the feedback gains of the defender and the attacker, respectively;
the two players choose a discounted quadratic payoff function, where γ ∈ (0, 1) is the discount factor and Q_e and R are given positive semidefinite and positive definite weighting matrices, respectively; the optimal policies of the defender and the attacker are designed accordingly.
4. The design method for an optimal tracking controller in the presence of false data injection attacks according to claim 1, characterized in that step 3 specifically comprises:
defining the Q-function; by solving ∂Q/∂u_k = 0 and ∂Q/∂a_k = 0, the optimal action strategies of the two players are obtained; the policy iteration and value iteration methods based on the Q-function are given in Algorithms 1 and 2, respectively;
Algorithm 1: the policy iteration algorithm based on the Q-function comprises the following steps:
1) initialization: set j = 0 and choose H^0 = (H^0)^T;
2) policy evaluation: solve the following equation for H^{j+1};
3) policy improvement: update the gains;
4) stopping condition: ||H^{j+1} - H^j|| < ε;
Algorithm 2: the value iteration algorithm based on the Q-function comprises the following steps:
1) initialization: set j = 0 and choose H^0 = (H^0)^T;
2) policy evaluation: solve the following equation for H^{j+1};
3) policy improvement: update the gains;
4) stopping condition: ||H^{j+1} - H^j|| < ε.
5. The design method for an optimal tracking controller in the presence of false data injection attacks according to claim 1, characterized in that step 4 specifically comprises:
representing the system state X_k by an input-output sequence, so that the Q-function can be written in terms of input-output data and the optimal policies of the two players follow accordingly;
the policy iteration and value iteration methods using Q-learning are given in Algorithms 3 and 4, respectively:
Algorithm 3: the policy iteration algorithm using Q-learning comprises the following steps:
1) initialization: set j = 0 and choose stabilizing initial policies;
2) policy evaluation: solve the following equation for h^{j+1};
3) policy improvement: update the gains;
4) stopping condition: ||H^{j+1} - H^j|| < ε;
Algorithm 4: the value iteration algorithm using Q-learning comprises the following steps:
1) initialization: set j = 0 and choose arbitrary initial policies;
2) policy evaluation: solve the following equation for h^{j+1};
3) policy improvement: update the gains;
4) stopping condition: ||H^{j+1} - H^j|| < ε.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811453386.6A CN109375514B (en) | 2018-11-30 | 2018-11-30 | Design method of optimal tracking controller in presence of false data injection attack |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109375514A true CN109375514A (en) | 2019-02-22 |
CN109375514B CN109375514B (en) | 2021-11-05 |
Family
ID=65376219
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811453386.6A Active CN109375514B (en) | 2018-11-30 | 2018-11-30 | Design method of optimal tracking controller in presence of false data injection attack |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109375514B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2140650B1 (en) * | 2007-03-30 | 2011-05-25 | International Business Machines Corporation | Method and system for resilient packet traceback in wireless mesh and sensor networks |
CN104994569A (en) * | 2015-06-25 | 2015-10-21 | 厦门大学 | Multi-user reinforcement learning-based cognitive wireless network anti-hostile interference method |
CN106937295A (en) * | 2017-02-22 | 2017-07-07 | 沈阳航空航天大学 | Heterogeneous network high energy efficiency power distribution method based on game theory |
CN107038477A (en) * | 2016-08-10 | 2017-08-11 | 哈尔滨工业大学深圳研究生院 | A kind of neutral net under non-complete information learns the estimation method of combination with Q |
CN107819785A (en) * | 2017-11-28 | 2018-03-20 | 东南大学 | A kind of double-deck defence method towards power system false data injection attacks |
CN108181816A (en) * | 2018-01-05 | 2018-06-19 | 南京航空航天大学 | A kind of synchronization policy update method for optimally controlling based on online data |
CN108196448A (en) * | 2017-12-25 | 2018-06-22 | 北京理工大学 | False data injection attacks method based on inaccurate mathematical model |
CN108512837A (en) * | 2018-03-16 | 2018-09-07 | 西安电子科技大学 | A kind of method and system of the networks security situation assessment based on attacking and defending evolutionary Game |
Non-Patent Citations (5)
Title |
---|
HAO LIU et al.: "Optimal Tracking Control of Linear Discrete-Time Systems Under Cyber Attacks", IFAC 2020 *
YING CHEN et al.: "Evaluation of Reinforcement Learning Based False Data Injection Attack to Automatic Voltage Control", IEEE *
YUZHE LI et al.: "SINR-based DoS Attack on Remote State Estimation: A Game-theoretic Approach", IEEE *
LIU Hao: "Attack and Defense of Cyber-Physical Systems", Journal of Shenyang Aerospace University *
TIAN Jiwei et al.: "Optimal Defense Strategy Against Load Redistribution Attacks Based on Game Theory", Computer Simulation *
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109932905A (en) * | 2019-03-08 | 2019-06-25 | 辽宁石油化工大学 | A kind of optimal control method of the Observer State Feedback based on non-strategy |
CN109932905B (en) * | 2019-03-08 | 2021-11-09 | 辽宁石油化工大学 | Optimization control method based on non-strategy observer state feedback |
CN110083064B (en) * | 2019-04-29 | 2022-02-15 | 辽宁石油化工大学 | Network optimal tracking control method based on non-strategy Q-learning |
CN110083064A (en) * | 2019-04-29 | 2019-08-02 | 辽宁石油化工大学 | A kind of network optimal track control method based on non-strategy Q- study |
CN111273543A (en) * | 2020-02-15 | 2020-06-12 | 西北工业大学 | PID optimization control method based on strategy iteration |
CN111273543B (en) * | 2020-02-15 | 2022-10-04 | 西北工业大学 | PID optimization control method based on strategy iteration |
CN111673750A (en) * | 2020-06-12 | 2020-09-18 | 南京邮电大学 | Speed synchronization control scheme of master-slave type multi-mechanical arm system under deception attack |
CN111673750B (en) * | 2020-06-12 | 2022-03-04 | 南京邮电大学 | Speed synchronization control scheme of master-slave type multi-mechanical arm system under deception attack |
CN112149361A (en) * | 2020-10-10 | 2020-12-29 | 中国科学技术大学 | Adaptive optimal control method and device for linear system |
CN112149361B (en) * | 2020-10-10 | 2024-05-17 | 中国科学技术大学 | Self-adaptive optimal control method and device for linear system |
CN112650057B (en) * | 2020-11-13 | 2022-05-20 | 西北工业大学深圳研究院 | Unmanned aerial vehicle model prediction control method based on anti-spoofing attack security domain |
CN112650057A (en) * | 2020-11-13 | 2021-04-13 | 西北工业大学深圳研究院 | Unmanned aerial vehicle model prediction control method based on anti-spoofing attack security domain |
CN113885330A (en) * | 2021-10-26 | 2022-01-04 | 哈尔滨工业大学 | Information physical system safety control method based on deep reinforcement learning |
CN113885330B (en) * | 2021-10-26 | 2022-06-17 | 哈尔滨工业大学 | Information physical system safety control method based on deep reinforcement learning |
CN114415633A (en) * | 2022-01-10 | 2022-04-29 | 云境商务智能研究院南京有限公司 | Security tracking control method based on dynamic event trigger mechanism under multi-network attack |
CN114415633B (en) * | 2022-01-10 | 2024-02-02 | 云境商务智能研究院南京有限公司 | Security tracking control method based on dynamic event triggering mechanism under multi-network attack |
CN115877871A (en) * | 2023-03-03 | 2023-03-31 | 北京航空航天大学 | Non-zero and game unmanned aerial vehicle formation control method based on reinforcement learning |
Also Published As
Publication number | Publication date |
---|---|
CN109375514B (en) | 2021-11-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109375514A (en) | Design method of an optimal tracking controller in the presence of false data injection attacks | |
Yan et al. | A path planning algorithm for UAV based on improved Q-learning | |
CN108803349B (en) | Optimal consistency control method and system for nonlinear multi-agent system | |
Duan et al. | Imperialist competitive algorithm optimized artificial neural networks for UCAV global path planning | |
Givigi et al. | A reinforcement learning adaptive fuzzy controller for differential games | |
Yu et al. | Distributed multi‐agent deep reinforcement learning for cooperative multi‐robot pursuit | |
Fang et al. | Target‐driven visual navigation in indoor scenes using reinforcement learning and imitation learning | |
Schultz et al. | Improving tactical plans with genetic algorithms | |
Wei et al. | Recurrent MADDPG for object detection and assignment in combat tasks | |
Yue et al. | Deep reinforcement learning for UAV intelligent mission planning | |
Liu et al. | Task assignment in ground-to-air confrontation based on multiagent deep reinforcement learning | |
CN111811532B (en) | Path planning method and device based on impulse neural network | |
CN115047907B (en) | Air isomorphic formation command method based on multi-agent PPO algorithm | |
Xiao et al. | Graph attention mechanism based reinforcement learning for multi-agent flocking control in communication-restricted environment | |
Cao et al. | Autonomous maneuver decision of UCAV air combat based on double deep Q network algorithm and stochastic game theory | |
Xu et al. | Pursuit and evasion game between UVAs based on multi-agent reinforcement learning | |
Esrafilian et al. | Model-aided deep reinforcement learning for sample-efficient UAV trajectory design in IoT networks | |
Yang et al. | Learning graph-enhanced commander-executor for multi-agent navigation | |
Zhao et al. | Deep Reinforcement Learning‐Based Air Defense Decision‐Making Using Potential Games | |
CN116165886A (en) | Multi-sensor intelligent cooperative control method, device, equipment and medium | |
Tuba et al. | Water cycle algorithm for robot path planning | |
Lin et al. | Choice of discount rate in reinforcement learning with long-delay rewards | |
Liu et al. | A distributed driving decision scheme based on reinforcement learning for autonomous driving vehicles | |
Bromo | Reinforcement Learning Based Strategic Exploration Algorithm for UAVs Fleets | |
Yang et al. | An interrelated imitation learning method for heterogeneous drone swarm coordination |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 2022-07-18 Address after: 452370 Building 2, Xingfu Industrial New Town, Micun Town, Xinmi City, Zhengzhou City, Henan Province Patentee after: Shensu Intelligent Agricultural Machinery Equipment (Henan) Co., Ltd. Address before: No. 37, Daoyi South Avenue, Shenbei New Area, Shenyang, Liaoning, 110136 Patentee before: SHENYANG AEROSPACE University