CN112034888B - Autonomous control cooperation strategy training method for fixed wing unmanned aerial vehicle - Google Patents

Autonomous control cooperation strategy training method for fixed wing unmanned aerial vehicle

Info

Publication number
CN112034888B
CN112034888B (application CN202010944803.8A)
Authority
CN
China
Prior art keywords
unmanned aerial
strategy
aerial vehicle
learning
environment
Prior art date
Legal status
Active
Application number
CN202010944803.8A
Other languages
Chinese (zh)
Other versions
CN112034888A (en)
Inventor
俞扬
詹德川
周志华
王超
袁雷
陈立坤
黄宇洋
庞竟成
Current Assignee
Nanjing University
Original Assignee
Nanjing University
Priority date
Filing date
Publication date
Application filed by Nanjing University
Priority to CN202010944803.8A
Publication of CN112034888A
Application granted
Publication of CN112034888B
Active (current legal status)
Anticipated expiration

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10Simultaneous control of position or course in three dimensions
    • G05D1/101Simultaneous control of position or course in three dimensions specially adapted for aircraft
    • G05D1/104Simultaneous control of position or course in three dimensions specially adapted for aircraft involving a plurality of aircrafts, e.g. formation flying

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention discloses a method for training an autonomous control cooperation strategy of a fixed-wing unmanned aerial vehicle, which comprises the following steps: (1) in a dynamics-based fixed-wing unmanned aerial vehicle control simulation environment E_s, acquiring real trajectory data of a pilot controlling the unmanned aerial vehicle and learning a flight control strategy for the unmanned aerial vehicle by supervised learning; (2) constructing a simplified abstract environment E_a with flight control stripped away, creating two opposing groups of unmanned aerial vehicles, and learning a cooperation strategy with the APEX_QMIX algorithm; (3) combining the flight control strategy and the cooperation strategy through hierarchical reinforcement learning, and further training the fused strategy in the simulation environment E_s; (4) migrating the result to the real environment. The method is practical in real scenarios and is characterized by good generalization, low cost, and strong robustness.

Description

Autonomous control cooperation strategy training method for fixed wing unmanned aerial vehicle
Technical Field
The invention relates to a fixed wing unmanned aerial vehicle autonomous control cooperation strategy training method based on hierarchical reinforcement learning and multi-agent reinforcement learning, and belongs to the technical field of unmanned aerial vehicle autonomous control cooperation strategies.
Background
Traditional autonomous control cooperation strategies for fixed-wing unmanned aerial vehicles mainly rely on automatic control methods: the system is modeled manually and the strategy is formulated by hand. The flight rules are established by experts in the relevant field, which is costly, and because the environment is complex and changes frequently, a large number of situations are never covered by the rules. Flight rules therefore generally cannot handle complex, changing environments, and their capability is limited.
Recently, with the rapid development of machine learning, reinforcement learning has brought a new solution for the autonomous control strategy of the unmanned aerial vehicle. Reinforcement learning is a branch of machine learning; compared with the classic supervised and unsupervised learning problems, its main characteristic is learning from interaction. The agent continuously learns from the rewards or punishments obtained while interacting with the environment and thereby adapts better to the environment. This learning paradigm is very similar to the way humans acquire knowledge, and for this reason reinforcement learning is considered an important approach toward general AI. With a reinforcement learning method, a dynamics simulation environment is constructed, a reasonable reward function is designed, and an autonomous control strategy of the unmanned aerial vehicle is trained in the simulator, which is efficient and low cost. Because training samples are abundant, a flight control strategy learned with reinforcement learning can cope with various complex and changing conditions and, compared with rule-based control of the unmanned aerial vehicle, is more robust and flexible. However, plain reinforcement learning also has limitations: the exploration space is too large, the result depends heavily on parameter-tuning tricks, and training is difficult.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the problems and defects in the prior art, the invention provides a fixed-wing unmanned aerial vehicle autonomous control cooperation strategy training method based on hierarchical reinforcement learning and multi-agent reinforcement learning, which abstractly divides the unmanned aerial vehicle autonomous control cooperation strategy into two layers: the high-level strategy is responsible for cooperation, and the bottom-level strategy is responsible for flight control. The strategy is thus decoupled, the exploration space is reduced, and the learning difficulty is lowered. By constructing a simplified cooperation environment with flight control stripped away, the APEX_QMIX algorithm is used under a centralized-learning, distributed-execution framework, and self-play is conducted to explore various possible cooperation strategies from scratch. Meanwhile, flight control is learned in a dynamics simulation environment. Finally, the two strategies are combined to obtain the final unmanned aerial vehicle autonomous control strategy, which is migrated to the real environment. The method is characterized by good generalization, low cost, and strong robustness.
The technical scheme is as follows: a method for training an autonomous control cooperation strategy of a fixed-wing unmanned aerial vehicle, in which the unmanned aerial vehicle cooperation strategy is divided into a high-level strategy and a bottom-level strategy by a hierarchical reinforcement learning method; the high-level strategy is used for cooperation; the bottom-level strategy is used for flight control; a dynamics-based fixed-wing unmanned aerial vehicle control simulation environment E_s is used for training the unmanned aerial vehicle to achieve flight control and cooperation targets; in addition, to decouple the strategies, reduce the exploration space, and lower the learning difficulty, a simplified abstract environment E_a with flight control stripped away is constructed, which is used to pre-train the cooperation strategy and accelerate its learning; the bottom-level strategy is obtained through supervised learning; the high-level strategy and the bottom-level strategy are fused, and the trained autonomous control cooperation strategy is finally applied to the real environment. The APEX_QMIX algorithm is used to pre-train the cooperation strategy on the observation information provided by the abstract environment E_a, and the fused strategy is trained on the observation information provided by the fixed-wing unmanned aerial vehicle control simulation environment E_s.
The high-level strategy receives observations, outputs a flight target point, and controls the cooperation of the unmanned aerial vehicles; the bottom-level strategy receives the target point from the high-level strategy, selects the optimal flight mode, and flies to the target point in the fastest and best manner.
The dynamics-based fixed-wing unmanned aerial vehicle control simulation environment E_s is the simulator Em_s; the simplified abstract environment E_a with flight control stripped away is the simulator Em_a. The simulator Em_s approximately simulates the Markov process <S, A, P, R, γ>, provides observation information consistent with that of the unmanned aerial vehicle in the real scene, and provides control instructions consistent with those of the real unmanned aerial vehicle. The control instructions comprise three basic angle-change instructions for controlling flight and a throttle control instruction for controlling the flight speed of the unmanned aerial vehicle; the three basic angle-change instructions are the pitch instruction, the roll instruction, and the yaw instruction. The control instruction takes the form A = <Δv, Δα, Δβ, Δγ>, with action-space dimension R^4.
The simulator Em_a does not involve the unmanned aerial vehicle control information of the real scene; instead, the autonomous control process of the unmanned aerial vehicle is abstracted and simplified into a particle game in a three-dimensional environment. The unmanned aerial vehicle is regarded as a particle, and a fixed-step-length segment of its flight is abstracted as moving to a reachable target point. The simulator generates red and blue unmanned aerial vehicle groups and carries out adversarial cooperation training.
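For concreteness, a minimal sketch of how these state, action, and target-point spaces could be declared with the gym library (used later for Em_a) is given below; the numeric bounds here are illustrative assumptions, not values fixed by the patent.

```python
import numpy as np
from gym import spaces

# State of the simulator Em_s: S = <V, alpha, beta, gamma>, i.e. the current speed
# plus three attitude angles relative to the north-east coordinate system.
state_space = spaces.Box(low=-np.inf, high=np.inf, shape=(4,), dtype=np.float32)

# Control instruction of Em_s: A = <dV, d_alpha, d_beta, d_gamma> (throttle change
# plus pitch/roll/yaw angle changes), so the action space has dimension R^4.
# The unit increment bounds below are placeholders.
action_space = spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)

# In the abstract simulator Em_a each drone is a particle; a high-level decision is
# simply the next reachable target point in three-dimensional space.
target_point_space = spaces.Box(low=-np.inf, high=np.inf, shape=(3,), dtype=np.float32)
```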
The bottom-level strategy is obtained through supervised learning; single flight-action tasks such as constant-speed level flight, level-flight acceleration and deceleration, steepest climb, and half-roll reversal are constructed. Real trajectory data of the unmanned aerial vehicle's actions when the pilot controls it are collected; all the "state-action" pairs corresponding to the trajectories are extracted to construct a new set D = {(s_1, a_1), (s_2, a_2), ...}. With the state as the feature and the unmanned aerial vehicle stick command as the label, the optimal strategy model is learned with the proximal policy optimization reinforcement learning method; the supervised learning objective function of the bottom-level strategy is as follows:
J^{\theta'}(\theta) = \mathbb{E}_{(s_t, a_t) \sim \pi_{\theta'}} \left[ \frac{p_\theta(a_t \mid s_t)}{p_{\theta'}(a_t \mid s_t)} \, A^{\theta'}(s_t, a_t) \right]
where a_t is the action of the unmanned aerial vehicle agent at time t, s_t is the unmanned aerial vehicle state information at time t, θ' is the parameter of the strategy model that interacts with the environment for sampling, θ is the parameter of the strategy model being updated, p_θ' and p_θ are the probability functions given by θ' and θ respectively, i.e., the probability of selecting action a_t in state s_t, A^{θ'} is the advantage function under θ', and E denotes the expectation.
The steering column instruction specifically comprises a rolling instruction, a pitching instruction, a yawing instruction and a power instruction of the unmanned aerial vehicle.
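A minimal PyTorch-style sketch of this surrogate objective follows. The `policy.log_prob(states, actions)` interface, the clipping constant, and the tensor shapes are illustrative assumptions rather than details fixed by the patent.

```python
import torch

def ppo_surrogate_loss(policy, old_policy, states, actions, advantages):
    """J^{theta'}(theta): importance-weighted advantage, as in the objective above.

    states:     (batch, state_dim)  -- s_t, e.g. <V, alpha, beta, gamma>
    actions:    (batch, action_dim) -- a_t, stick commands <dv, d_alpha, d_beta, d_gamma>
    advantages: (batch,)            -- A^{theta'}(s_t, a_t), estimated under the sampling policy
    """
    # log pi_theta(a_t | s_t) and log pi_theta'(a_t | s_t); log_prob is a hypothetical interface
    logp_new = policy.log_prob(states, actions)
    with torch.no_grad():
        logp_old = old_policy.log_prob(states, actions)

    ratio = torch.exp(logp_new - logp_old)        # p_theta / p_theta'
    # PPO additionally clips the ratio; epsilon here is an illustrative choice.
    eps = 0.2
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    surrogate = torch.minimum(ratio * advantages, clipped * advantages)
    return -surrogate.mean()                      # maximize J  <=>  minimize -J
```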
In the simulator Em_a, the simplified abstract environment E_a with flight control stripped away, adversarial cooperative training of the red and blue unmanned aerial vehicle groups is carried out. For each unmanned aerial vehicle group, the APEX_QMIX algorithm is used under a centralized-learning, distributed-execution framework: the decentralized strategy of each unmanned aerial vehicle is obtained by learning from centralized information, global state information is leveraged to improve the algorithm's performance, and a neural network integrates the local value functions of the individual agents into a joint action-value function for evaluating the actions of each drone.
Advantageous effects: compared with the prior art, the autonomous control cooperation strategy training method for the fixed-wing unmanned aerial vehicle has the following advantages:
(1) A simulator is built and the autonomous control cooperation strategy of the fixed-wing unmanned aerial vehicle is trained with a reinforcement learning algorithm. Flight rules no longer have to be established by domain experts; the method is efficient and its trial-and-error cost is essentially zero.
(2) Hierarchical reinforcement learning is adopted to decouple the strategy, reduce the exploration space, and lower the learning difficulty.
(3) To accelerate the learning of the cooperation strategy and further reduce the search space, the simplified abstract environment E_a with flight control stripped away is constructed; the cooperation strategy model is pre-trained without considering specific flight actions, considering only simplified cooperation behaviors, which greatly reduces the overall training time.
Drawings
FIG. 1 is the training framework diagram of the fused strategy of the present invention in the fixed-wing unmanned aerial vehicle control simulation environment E_s;
FIG. 2 is the training framework diagram for pre-training the cooperation strategy in the abstract environment E_a of the present invention, where, after taking action_1, ..., action_n, agent_1, ..., agent_n each move directly to the single-step farthest point along the line connecting their current position and the target point output by the corresponding strategy;
FIG. 3 is a schematic diagram of the Ape-X structure of the multi-agent reinforcement learning algorithm APEX_QMIX used in the present invention;
FIG. 4 is a schematic diagram of the mixing network structure of the multi-agent reinforcement learning algorithm APEX_QMIX used in the present invention;
FIG. 5 is the training flowchart of the fused strategy in the fixed-wing unmanned aerial vehicle control simulation environment E_s.
Detailed Description
The present invention is further illustrated by the following examples, which are intended to be purely exemplary and are not intended to limit the scope of the invention, as various equivalent modifications of the invention will occur to those skilled in the art upon reading the present disclosure and fall within the scope of the appended claims.
The method for training the autonomous control cooperation strategy of the fixed wing unmanned aerial vehicle comprises the following steps:
step 1: a simulator Em _ s controlled by a fixed-wing unmanned aerial vehicle is constructed based on dynamics, and the visualization part of the simulator Em _ s is realized based on a unity3D engine. Unmanned aerial vehicle simulation environment EsThe training process in (1) is defined as a tuple form of a Markov Decision Process (MDP)<S,A,P,R>And S is unmanned aerial vehicle state information, A is unmanned aerial vehicle action, P is an environment state transfer function, and R is environment reward. Wherein S ═<V,α,β,γ>The three angles respectively correspond to the current speed of the unmanned aerial vehicle and are relative to a north-east coordinate system. Control command (motion space) a ═<Δv,Δα,Δβ,Δγ>In order to simulate the control operation of the flying stick of the real flight, three basic angle change commands for controlling the flight are included: pitch commands, roll commands, yaw commands, and throttle control commands that control the aircraft's flight speed. The specific detailed steps for realizing the simulator Em _ s based on unity3D are as follows:
step 11, creating an environment for containing the agent.
Step 12, implement an Academy subclass and add this subclass to the game objects (GameObject) in the Unity scene containing the environment. This game object will serve as a parent to any Brain object in the scene. And implements an optional method of Academy class to update the scene independently of any agent, e.g., adding, moving, or deleting agents and other entities in the environment.
And step 13, adding one or more Brain objects into the scene as the sublevels of Academy.
And step 14, realizing Agent subclasses. The Agent subclass defines the necessary code for the Agent to observe its environment, perform specified actions, and compute rewards for intensive training. An optional method is implemented to reset the agent when it completes a task or when a task fails.
And step 15, adding the Agent subclasses to corresponding unmanned aerial vehicle objects, and distributing a Brain object for each Agent object.
And step 16, implementing the aerodynamically based simulated unmanned aerial vehicle flying state transition code.
And step 17, realizing the code for the state transition of the unmanned aerial vehicle group fight of the red and blue parties, and adding a win and loss judgment code.
And step 18, customizing the environment rule, so that the environment with different scenes, different time lengths and different difficulties can be generated.
Step 2: and carrying out strategy layering based on the idea of layered reinforcement learning. The autonomous control cooperation strategy of the unmanned aerial vehicle is abstractly divided into two layers: a high-level policy and a bottom-level policy; the high-level strategy is responsible for the cooperation strategy, and the bottom-level strategy is responsible for the flight control. And obtaining a bottom flight control model in the simulator Em _ s through supervised learning. The detailed implementation steps of the process are as follows:
step 21, extracting all the 'state-action' pairs corresponding to the acquired real track data of various tactical actions of the pilot controlling the unmanned aerial vehicle to construct a training set D {(s)1,a1),(s2,a2),(s2,a2)...}。
Step 22: design a suitable neural network structure, select suitable hyper-parameters, and build the network, for example a 5-layer fully connected neural network in which each layer uses the relu function as its activation.
Step 23: with the aircraft state s_i as the feature, perform regression learning on the joystick command action a_i as the label. Using the BP algorithm, minimize the cumulative error on the training set:
E = \frac{1}{m} \sum_{k=1}^{m} E_k, \qquad E_k = \frac{1}{2} \sum_{j} \left( \hat{y}_j^{(k)} - y_j^{(k)} \right)^2

where y_j^{(k)} is the true label of the k-th state (the recorded stick command), \hat{y}_j^{(k)} is the predicted label, E_k is the error on the k-th training sample, and E is the cumulative error over the training set.
And step 3: constructing a simplified abstract environment E stripped of flight control by the gym open source libraryaEm _ a. The detailed steps are as follows:
step 31, the state transition function code of Em _ a is realized through the unified environment interface of the gym, and the environment interface of the gym is as follows:
(1) reset (): resetting the state of the environment and returning to observation;
(2) step (action): and the physical engine advances a time step forward and returns updated, done and info. The action is the action of the intelligent agent, the assertion is the information observed by the intelligent agent from the environment, the reward is the reward received by the intelligent agent from the environment, done is a termination signal, and info is related information;
(3) render (): an image engine redraws a frame of an environment.
Em_a is then wrapped with RLlib's MultiAgentEnv so that it can be trained in a distributed fashion with Ray. This supports self-play (self-gaming, i.e., adversarial cooperative training of the red and blue drone swarms) under the centralized-learning, distributed-execution framework of the APEX_QMIX algorithm (the QMIX algorithm with the Ape-X structure). Ape-X learns with a single GPU learner while many CPU workers collect experience, and the trajectories stored in the replay buffer are sampled with prioritized replay, so experience collection can scale to hundreds of CPU workers in parallel, greatly accelerating training. The Ape-X structure is shown in FIG. 3.
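A hypothetical sketch of Em_a wrapped as an RLlib MultiAgentEnv is shown below; the class name, observation contents, placeholder reward, step length, and drone counts are assumptions for illustration, and the win/loss judgment and scenario rules of the actual simulator are omitted.

```python
import numpy as np
from gym import spaces
from ray.rllib.env.multi_agent_env import MultiAgentEnv

class AbstractDroneEnv(MultiAgentEnv):
    """Sketch of Em_a: each drone is a particle in 3-D space.

    An action is the next target point; one step moves the drone to the single-step
    farthest reachable point toward that target. All names and bounds are illustrative.
    """
    def __init__(self, n_drones_per_side=3, step_length=1.0):
        super().__init__()
        self.agents = [f"red_{i}" for i in range(n_drones_per_side)] + \
                      [f"blue_{i}" for i in range(n_drones_per_side)]
        self.step_length = step_length
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(6,), dtype=np.float32)
        self.action_space = spaces.Box(-100.0, 100.0, shape=(3,), dtype=np.float32)
        self.pos = {}

    def reset(self):
        self.pos = {a: np.random.uniform(-50, 50, size=3).astype(np.float32)
                    for a in self.agents}
        return {a: self._obs(a) for a in self.agents}

    def step(self, action_dict):
        for a, target in action_dict.items():
            direction = target - self.pos[a]
            norm = np.linalg.norm(direction) + 1e-8
            # move to the single-step farthest point along the line to the target
            self.pos[a] += direction / norm * min(self.step_length, norm)
        obs = {a: self._obs(a) for a in self.agents}
        rew = {a: 0.0 for a in self.agents}           # placeholder reward
        done = {"__all__": False}                     # win/loss judgment would go here
        return obs, rew, done, {}

    def _obs(self, agent):
        # illustrative observation: own position plus the centroid of the opposing group
        other = "blue" if agent.startswith("red") else "red"
        enemy = np.mean([self.pos[a] for a in self.agents if a.startswith(other)], axis=0)
        return np.concatenate([self.pos[agent], enemy]).astype(np.float32)
```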
Step 4: in Em_a, use the APEX_QMIX algorithm to carry out adversarial cooperative training of the red and blue unmanned aerial vehicles. The detailed steps are as follows:
and step 41, selecting input parameters (initialization position, win-lose judgment condition, duration and the like), initializing the environment of the simulator Em _ a, and generating the unmanned aerial vehicle clusters of the red and blue parties.
Step 42: taking the red side as an example, a neural network model (the basic structure is a 3-layer fully connected network, each layer using the relu activation function) is used to initialize the strategy model of each unmanned aerial vehicle (the local action-value function of a single agent). For all agents on the red side, a mixing network combines the local value functions of the individual agents, and global state information is added during training to assist learning and improve the algorithm's performance. The mixing network structure is shown in FIG. 4. The loss function ultimately used is:
L(\theta) = \sum_{i=1}^{b} \left[ \left( y_i^{tot} - Q_{tot}(\tau, a, s; \theta) \right)^2 \right]

The update follows the conventional idea of DQN, where b denotes the number of samples drawn from the replay memory and the target value is

y^{tot} = r + \gamma \max_{a'} Q_{tot}(\tau', a', s'; \theta^-),

in which Q_{tot}(\tau', a', s'; \theta^-) is computed by the target network; \tau', a', s' are the sampled trajectory, the agents' actions, and the environment state drawn from the replay memory, r is the reward given by the environment, and \theta^- is the target network parameter.
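A minimal PyTorch-style sketch of the mixing network and this TD loss is shown below; the hypernetwork layout follows the standard QMIX construction, while the embedding size and the omission of per-agent recurrent networks are simplifying assumptions.

```python
import torch
import torch.nn as nn

class MixingNetwork(nn.Module):
    """Combines per-agent local Q values into Q_tot, conditioned on the global state.

    Hypernetworks produce non-negative mixing weights so that Q_tot is monotonic
    in each agent's local Q value (the QMIX constraint).
    """
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim),
                                      nn.ReLU(), nn.Linear(embed_dim, 1))

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        b = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(b, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(b, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(b, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(b, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(b)   # Q_tot, shape (batch,)

def qmix_td_loss(mixer, target_mixer, agent_qs, target_max_qs, state, next_state,
                 reward, done, gamma=0.99):
    """Squared TD error sum_i (y_i^tot - Q_tot)^2 with a target network."""
    q_tot = mixer(agent_qs, state)
    with torch.no_grad():
        target_q_tot = target_mixer(target_max_qs, next_state)
        y_tot = reward + gamma * (1.0 - done) * target_q_tot
    return ((y_tot - q_tot) ** 2).sum()
```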
Step 43: the red and blue sides carry out adversarial training in self-play mode without any artificial prior knowledge, exploring from scratch to enrich the diversity and robustness of the strategy.
And 5: and (4) fusing the two layers of strategy models obtained in the step (2) and the step (4) based on Hierarchical Reinforcement Learning (Hierarchical Deep Learning). The detailed steps are as follows:
step 51, establishing a double-layer network structure, wherein the first layer is called meta-controller and is responsible for determining a small target which can be reached, the second layer is a bottom layer controller, an action is given according to the target given by meta, and the new target is repeatedly determined after the small target reaches or reaches the set time. The meta-controller accepts the external prize while giving the underlying controller the internal prize. The underlying controller builds a Q function to estimate the reward scenario based on action at the current goal scenario. The Q function is as follows:
Q_1^*(s, a; g) = \mathbb{E} \left[ r_t + \gamma \max_{a_{t+1}} Q_1^*(s_{t+1}, a_{t+1}; g) \;\middle|\; s_t = s, a_t = a, g_t = g, \pi_{ag} \right]
where E is the expectation, γ is the reward discount coefficient, a is the action, s is the state, g is the goal, π_{ag} is the policy that selects an action given goal g, and the subscript t is the time step;
the meta-controller establishes a Q function to estimate the return of each goal in different situations:
Q_2^*(s, g) = \mathbb{E} \left[ \sum_{t'=t}^{t+N} f_{t'} + \gamma \max_{g'} Q_2^*(s_{t+N}, g') \;\middle|\; s_t = s, g_t = g, \pi_{ag} \right]
where E is the expectation, γ is the reward discount coefficient, a is the action, s is the state, g is the goal, π_{ag} is the policy that selects action a given goal g, t and N are time steps, and f is the accumulated external reward.
Both controllers use a similar update method: a single-step error is formed and the parameters are updated by gradient descent.
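Both updates can be sketched with one shared routine, as below; a discrete action/goal set and the tensor shapes are assumptions for illustration, and for the bottom-level controller the input state is assumed to already include the current goal.

```python
import torch

def one_step_td_update(q_net, target_q_net, optimizer,
                       s, a_or_g, r, s_next, done, gamma=0.99):
    """One-step TD update shared by both controllers (illustrative sketch).

    For the bottom-level controller, a_or_g indexes an action and r is the internal
    reward under the current goal; for the meta-controller, a_or_g indexes a goal and
    r is the external reward f accumulated over N steps.
    """
    q = q_net(s).gather(1, a_or_g.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * target_q_net(s_next).max(dim=1).values
    loss = ((target - q) ** 2).mean()      # single-step squared error
    optimizer.zero_grad()
    loss.backward()                        # gradient-descent update
    optimizer.step()
    return loss.item()
```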
Step 52: initialize the meta-controller and the low-level controller with the two layers of strategy models obtained in step 2 and step 4 respectively, and train in Em_s. The training process is shown in FIG. 5; the resulting hierarchical rollout loop is sketched below.
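The following sketch of the fused hierarchical rollout in Em_s uses hypothetical `env`, `meta_controller`, and `flight_controller` interfaces; the goal horizon and the internal-reward bookkeeping via `info` are illustrative only.

```python
def hierarchical_rollout(env, meta_controller, flight_controller,
                         max_steps=1000, goal_horizon=20):
    """High level picks a flight target point; low level flies toward it with stick commands."""
    obs = env.reset()
    external_return, t = 0.0, 0
    while t < max_steps:
        goal = meta_controller.select_goal(obs)              # flight target point
        internal_return, start_obs = 0.0, obs
        for _ in range(goal_horizon):                        # until goal reached or time is up
            action = flight_controller.select_action(obs, goal)   # <dv, d_alpha, d_beta, d_gamma>
            obs, reward, done, info = env.step(action)
            external_return += reward                        # external reward -> meta-controller
            internal_return += info.get("goal_progress", 0.0)  # internal reward -> low level
            t += 1
            if done or info.get("goal_reached", False):
                break
        meta_controller.store(start_obs, goal, external_return, obs, done)
        flight_controller.store_goal_episode(goal, internal_return)
        if done:
            break
    return external_return
```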
Step 6: and finally migrate to the real environment.

Claims (7)

1. A fixed-wing unmanned aerial vehicle autonomous control cooperation strategy training method, characterized in that: the unmanned aerial vehicle cooperation strategy is divided into a high-level strategy and a bottom-level strategy by a hierarchical reinforcement learning method; the high-level strategy is used for cooperation; the bottom-level strategy is used for flight control; a dynamics-based fixed-wing unmanned aerial vehicle control simulation environment E_s is used for training the unmanned aerial vehicle to achieve flight control and cooperation targets; a simplified abstract environment E_a with flight control stripped away is constructed for pre-training the cooperation strategy; the bottom-level strategy is obtained through supervised learning; the high-level strategy and the bottom-level strategy are fused, and the trained autonomous control cooperation strategy is finally applied to the real environment;
the APEX_QMIX algorithm is used to pre-train the cooperation strategy on the observation information provided by the abstract environment E_a, and the fused strategy is trained on the observation information provided by the fixed-wing unmanned aerial vehicle control simulation environment E_s; the APEX_QMIX algorithm is the QMIX algorithm employing the Ape-X structure.
2. The fixed-wing drone autonomous control collaborative strategy training method of claim 1, characterized in that: the high-level strategy receives observation information, gives a flying target point and controls the cooperation of the unmanned aerial vehicle; and the bottom layer strategy receives the target point of the high layer strategy, selects the optimal flight mode and flies to the target point in the fastest optimal mode.
3. The fixed-wing drone autonomous control collaborative strategy training method of claim 1, characterized in that: the dynamics-based fixed-wing unmanned aerial vehicle control simulation environment E_s is the simulator Em_s; the simplified abstract environment E_a with flight control stripped away is the simulator Em_a; the simulator Em_s approximately simulates the Markov process <S, A, P, R>, provides observation information consistent with that of the unmanned aerial vehicle in the real scene, and provides control instructions consistent with those of the real unmanned aerial vehicle, the control instructions comprising three basic angle-change instructions for controlling flight and a throttle control instruction for controlling the flight speed of the unmanned aerial vehicle; the three basic angle-change instructions comprise a pitch instruction, a roll instruction, and a yaw instruction; the control instruction takes the form A = <Δv, Δα, Δβ, Δγ>, with action-space dimension R^4; S is the unmanned aerial vehicle state information, A is the unmanned aerial vehicle action, P is the environment state transition function, and R is the environment reward; S = <V, α, β, γ>, where V is the current speed of the unmanned aerial vehicle and the three angles are relative to the north-east coordinate system.
4. The fixed-wing drone autonomous control collaborative strategy training method of claim 3, characterized in that: the simulator Em_a does not involve the unmanned aerial vehicle control information of the real scene; instead, the autonomous control process of the unmanned aerial vehicle is abstracted and simplified into a particle game in a three-dimensional environment; the unmanned aerial vehicle is regarded as a particle, and a fixed-step-length segment of its flight is abstracted as moving to a reachable target point; the simulator generates red and blue unmanned aerial vehicle groups and carries out adversarial cooperation training.
5. The fixed-wing drone autonomous control collaborative strategy training method of claim 1, characterized in that: the bottom-level strategy is obtained through supervised learning, and single flight-action tasks are constructed; real trajectory data of the unmanned aerial vehicle's actions when the pilot controls it are collected; all the "state-action" pairs corresponding to the trajectories are extracted to construct a new set D = {(s_1, a_1), (s_2, a_2), ...}; with the state as the feature and the unmanned aerial vehicle stick command as the label, the optimal strategy model is learned with the proximal policy optimization reinforcement learning method; the supervised learning objective function of the bottom-level strategy is as follows:
J^{\theta'}(\theta) = \mathbb{E}_{(s_t, a_t) \sim \pi_{\theta'}} \left[ \frac{p_\theta(a_t \mid s_t)}{p_{\theta'}(a_t \mid s_t)} \, A^{\theta'}(s_t, a_t) \right]
where a_t is the action of the unmanned aerial vehicle agent at time t, s_t is the unmanned aerial vehicle state information at time t, θ' is the parameter of the strategy model that interacts with the environment for sampling, θ is the parameter of the strategy model being updated, p_θ' and p_θ are the probability functions given by θ' and θ respectively, i.e., the probability of selecting action a_t in state s_t, A^{θ'} is the advantage function under θ', and E denotes the expectation.
6. The fixed-wing drone autonomous control collaborative strategy training method of claim 5, characterized in that: the steering column instruction specifically comprises a rolling instruction, a pitching instruction, a yawing instruction and a power instruction of the unmanned aerial vehicle.
7. The fixed-wing drone autonomous control collaborative strategy training method of claim 3, characterized in that: in the simulator Em_a, the simplified abstract environment E_a with flight control stripped away, adversarial cooperative training of the red and blue unmanned aerial vehicles is carried out; for each unmanned aerial vehicle group, the APEX_QMIX algorithm is used under a centralized-learning, distributed-execution framework; the decentralized strategy of each unmanned aerial vehicle is obtained by learning from centralized information; global state information is leveraged to improve the algorithm's performance; a neural network is used to integrate the local value functions of the individual agents into a joint action-value function for evaluating the actions of each drone.
CN202010944803.8A 2020-09-10 2020-09-10 Autonomous control cooperation strategy training method for fixed wing unmanned aerial vehicle Active CN112034888B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010944803.8A CN112034888B (en) 2020-09-10 2020-09-10 Autonomous control cooperation strategy training method for fixed wing unmanned aerial vehicle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010944803.8A CN112034888B (en) 2020-09-10 2020-09-10 Autonomous control cooperation strategy training method for fixed wing unmanned aerial vehicle

Publications (2)

Publication Number Publication Date
CN112034888A CN112034888A (en) 2020-12-04
CN112034888B true CN112034888B (en) 2021-07-30

Family

ID=73584525

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010944803.8A Active CN112034888B (en) 2020-09-10 2020-09-10 Autonomous control cooperation strategy training method for fixed wing unmanned aerial vehicle

Country Status (1)

Country Link
CN (1) CN112034888B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112906888B (en) * 2021-03-02 2023-05-09 中国人民解放军军事科学院国防科技创新研究院 Task execution method and device, electronic equipment and storage medium
CN113110546B (en) * 2021-04-20 2022-09-23 南京大学 Unmanned aerial vehicle autonomous flight control method based on offline reinforcement learning
CN113435598B (en) * 2021-07-08 2022-06-21 中国人民解放军国防科技大学 Knowledge-driven intelligent strategy deduction decision method
CN113721645A (en) * 2021-08-07 2021-11-30 中国航空工业集团公司沈阳飞机设计研究所 Unmanned aerial vehicle continuous maneuvering control method based on distributed reinforcement learning
CN113886953B (en) * 2021-09-27 2022-07-19 中国人民解放军军事科学院国防科技创新研究院 Unmanned aerial vehicle intelligent simulation training method and device based on distributed reinforcement learning
CN113867178B (en) * 2021-10-26 2022-05-31 哈尔滨工业大学 Virtual and real migration training system for multi-robot confrontation
CN114141028B (en) * 2021-11-19 2023-05-12 哈尔滨工业大学(深圳) Intelligent traffic light traffic flow regulating and controlling system
CN114167756B (en) * 2021-12-08 2023-06-02 北京航空航天大学 Multi-unmanned aerial vehicle collaborative air combat decision autonomous learning and semi-physical simulation verification method
CN114444716A (en) * 2022-01-06 2022-05-06 中国电子科技集团公司电子科学研究院 Multi-agent game training method and system in virtual environment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101964019A (en) * 2010-09-10 2011-02-02 北京航空航天大学 Against behavior modeling simulation platform and method based on Agent technology
US9622133B1 (en) * 2015-10-23 2017-04-11 The Florida International University Board Of Trustees Interference and mobility management in UAV-assisted wireless networks
CN108255059A (en) * 2018-01-19 2018-07-06 南京大学 A kind of robot control method based on simulator training
CN109116854A (en) * 2018-09-16 2019-01-01 南京大学 A kind of robot cooperated control method of multiple groups based on intensified learning and control system
CN109992000A (en) * 2019-04-04 2019-07-09 北京航空航天大学 A kind of multiple no-manned plane path collaborative planning method and device based on Hierarchical reinforcement learning
CN110231814A (en) * 2019-07-03 2019-09-13 中国人民解放军国防科技大学 Layered distributed control system and control method for fixed-wing unmanned aerial vehicle cluster

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11093829B2 (en) * 2017-10-12 2021-08-17 Honda Motor Co., Ltd. Interaction-aware decision making
CN110502033B (en) * 2019-09-04 2022-08-09 中国人民解放军国防科技大学 Fixed-wing unmanned aerial vehicle cluster control method based on reinforcement learning
CN110991545B (en) * 2019-12-10 2021-02-02 中国人民解放军军事科学院国防科技创新研究院 Multi-agent confrontation oriented reinforcement learning training optimization method and device
CN111144580B (en) * 2019-12-31 2024-04-12 中国电子科技集团公司信息科学研究院 Hierarchical reinforcement learning training method and device based on imitation learning
CN111552301B (en) * 2020-06-21 2022-05-20 南开大学 Hierarchical control method for salamander robot path tracking based on reinforcement learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Automatic Successive Reinforcement Learning with Multiple Auxiliary Rewards; Fu Zhaoyang, et al.; Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence; 2019; entire document *
Towards Sample Efficient Reinforcement Learning; Yu Yang; Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence; 2018; entire document *
分层式强化学习研究进展 (Research Progress in Hierarchical Reinforcement Learning); 陈春林; 《***仿真技术及其应用》; 2008; vol. 10; entire document *
分层强化学习综述 (A Survey of Hierarchical Reinforcement Learning); *** et al.; 《周志华》; Oct. 2017; vol. 12, no. 5; entire document *

Also Published As

Publication number Publication date
CN112034888A (en) 2020-12-04

Similar Documents

Publication Publication Date Title
CN112034888B (en) Autonomous control cooperation strategy training method for fixed wing unmanned aerial vehicle
CN112162564B (en) Unmanned aerial vehicle flight control method based on simulation learning and reinforcement learning algorithm
CN110488859B (en) Unmanned aerial vehicle route planning method based on improved Q-learning algorithm
CN110531786B (en) Unmanned aerial vehicle maneuvering strategy autonomous generation method based on DQN
CN112215350B (en) Method and device for controlling agent based on reinforcement learning
Li et al. Oil: Observational imitation learning
CN105700526A (en) On-line sequence limit learning machine method possessing autonomous learning capability
CN114089776B (en) Unmanned aerial vehicle obstacle avoidance method based on deep reinforcement learning
CN114815882B (en) Unmanned aerial vehicle autonomous formation intelligent control method based on reinforcement learning
CN113741533A (en) Unmanned aerial vehicle intelligent decision-making system based on simulation learning and reinforcement learning
CN114741886A (en) Unmanned aerial vehicle cluster multi-task training method and system based on contribution degree evaluation
CN109740741A (en) A kind of intensified learning method and its application of combination Knowledge Conversion are in the learning method of the autonomous technical ability of unmanned vehicle
CN115509251A (en) Multi-unmanned aerial vehicle multi-target cooperative tracking control method based on MAPPO algorithm
CN113821045A (en) Leg and foot robot reinforcement learning action generation system
CN116225055A (en) Unmanned aerial vehicle autonomous flight path planning algorithm based on state decomposition in complex environment
Zijian et al. Imaginary filtered hindsight experience replay for UAV tracking dynamic targets in large-scale unknown environments
CN114355897B (en) Vehicle path tracking control method based on model and reinforcement learning hybrid switching
Jiang et al. A deep reinforcement learning strategy for UAV autonomous landing on a platform
CN116796843A (en) Unmanned aerial vehicle many-to-many chase game method based on PSO-M3DDPG
CN116796844A (en) M2 GPI-based unmanned aerial vehicle one-to-one chase game method
Nguyen et al. Apprenticeship bootstrapping
CN116227622A (en) Multi-agent landmark coverage method and system based on deep reinforcement learning
CN115933712A (en) Bionic fish leader-follower formation control method based on deep reinforcement learning
CN114371634B (en) Unmanned aerial vehicle combat analog simulation method based on multi-stage after-the-fact experience playback
Wang et al. Autonomous obstacle avoidance algorithm of UAVs for automatic terrain following application

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant