CN113741533A - Unmanned aerial vehicle intelligent decision-making system based on imitation learning and reinforcement learning - Google Patents

Unmanned aerial vehicle intelligent decision-making system based on imitation learning and reinforcement learning

Info

Publication number: CN113741533A (application CN202111089240.XA)
Authority: CN (China)
Prior art keywords: learning, decision, module, aerial vehicle, unmanned aerial
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202111089240.XA
Other languages: Chinese (zh)
Inventors: 柴兴华, 耿虎军, 柯良军, 刘子锋, 陈彦桥, 高峰, 张小龙, 田苗, 关俊志, 王小强, 王雅涵, 轩书哲, 张格玮
Current Assignee: CETC 54 Research Institute (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: CETC 54 Research Institute
Priority date: 2021-09-16 (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Filing date: 2021-09-16
Publication date: 2021-12-03
Application filed by CETC 54 Research Institute


Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05D: SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00: Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10: Simultaneous control of position or course in three dimensions
    • G05D1/101: Simultaneous control of position or course in three dimensions specially adapted for aircraft

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Traffic Control Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention discloses an unmanned aerial vehicle (UAV) intelligent decision-making system based on imitation learning and reinforcement learning, comprising an environment perception module, an expert behavior demonstration module, a decision learning module and a control execution module. The system makes accurate decisions in complex real-time scenes so that the UAV can fly from a starting point to a set end point: it decides autonomously throughout the flight, selects UAV control strategies according to environmental information, real-time events and the like, avoids obstacles, and reaches the target point safely and efficiently. Compared with a traditional decision system built on hand-designed expert rules, the system first clones expert behavior through imitation learning to obtain initial values for the decision control network, and then applies deep reinforcement learning, setting a reward function over the events and states encountered while executing flight missions, to learn a more accurate decision control Q network with better generalization. This plays an important role in improving the autonomous control performance of the UAV.

Description

Unmanned aerial vehicle intelligent decision-making system based on imitation learning and reinforcement learning
Technical Field
The invention belongs to the intersection of UAV intelligent control decision-making and computer technology, and particularly relates to a UAV intelligent decision-making system based on imitation learning and reinforcement learning.
Background
Unmanned aerial vehicles are increasingly widely applied in various fields owing to their relatively low manufacturing cost, good maneuverability and high safety factor. However, traditional drone control technology still requires substantial human participation, and autonomous decision generation remains the bottleneck limiting drone autonomy. The main challenges facing autonomous UAV decision-making are: 1) decision models are computationally expensive to solve, while decision results are required in real time; 2) an accurate UAV mathematical model is difficult to establish in practical applications; 3) in actual flight the environment is complex and contains a large amount of missing and erroneous information, which increases the difficulty of decision-making.
Current intelligent decision-making methods include expert systems, differential games, dynamic programming and the like. An expert system matches situation information against expert rules to output decision information, but it depends excessively on those rules, so the decision process lacks flexibility and adapts poorly to complex environments. The differential game method treats the decision problem as a numerical optimization, but it relies on an accurate UAV mathematical model, which is difficult to obtain in practice. Approximate-value-function dynamic programming can solve the decision problem, but a unified standard decision model is hard to establish, and numerical solution easily runs into the curse of dimensionality. Unlike these traditional methods, reinforcement learning lets the agent learn by trial and error, guiding its actions with rewards obtained through interaction with the environment so that the agent accumulates the greatest reward. A reinforcement learning system learns mainly from its own experience: reinforcement signals provided by the environment evaluate the quality of generated actions, and the best model is obtained by executing, within finite time, the actions that maximize reward. However, reinforcement learning is essentially a data-driven optimization algorithm that depends strongly on information provided by the external environment, and obtaining high-quality control decision data has become the major bottleneck limiting its application in the field of UAV measurement and control.
Disclosure of Invention
In order to solve the problems in autonomous UAV decision control that traditional decision methods adapt poorly and that reinforcement learning methods are limited by the difficulty of acquiring data, the invention provides a UAV intelligent decision-making system based on imitation learning and reinforcement learning. It combines imitation learning of human control knowledge with adaptive reinforcement learning in complex environments, realizes UAV decision model training step by step, and handles complex dynamic scenes with strong generalization capability and high robustness.
The technical solution adopted by the invention is as follows:
An unmanned aerial vehicle intelligent decision-making system based on imitation learning and reinforcement learning comprises an environment perception module, an expert behavior demonstration module, a decision learning module and a control execution module;
the environment perception module: extracts and fuses the environmental information acquired by various sensors during UAV flight to form a state space vector, which it outputs to the expert behavior demonstration module and the decision learning module; the environment information includes images, orientations and distances;
the expert behavior demonstration module: collects the UAV control instructions given by experts, based on domain knowledge and operating experience, under various environments and events; these instructions, together with the data provided by the environment perception module, form the expert demonstration data set, which is output to the decision learning module;
the decision learning module: takes the state space vector as the input of its network structure, pre-learns the network on the expert demonstration data set to obtain a pre-learning model, then trains on the basis of the pre-learning model, learning the decision control actions to take under different scenes and different events, to obtain the final decision learning model; it outputs the action vector and the probability of each action to the control execution module;
the control execution module: after obtaining the actions and their probabilities output by the decision learning module, selects the action instruction with the largest probability for the UAV to execute, and obtains the UAV's new environment state after the action is executed.
The specific processing procedure of the environment perception module is as follows: the input is the environmental data acquired by various sensors during UAV flight; a ResNet18 network extracts high-dimensional features from the image data, which are then fused with the UAV's orientation and distance information to form the output state space vector s_t = (l_t, d_t, x_{t,1}, x_{t,2}, ..., x_{t,m}), where s_t is the UAV's state space vector at time t, l_t is its own orientation, d_t is the distance from the current position to the target position, and (x_{t,1}, x_{t,2}, ..., x_{t,m}) is the high-dimensional information of the image in the current field of view.
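A minimal PyTorch sketch of this fusion step is given below for illustration (the patent itself contains no code). Truncating ResNet18 at its 512-dimensional pooled features and the helper name build_state are assumptions; the text does not fix the feature width m.

    import torch
    import torchvision.models as models

    # Backbone for the image branch: ResNet18 with its classification head
    # removed, so the forward pass returns the pooled image features.
    backbone = models.resnet18(weights=None)
    backbone.fc = torch.nn.Identity()

    def build_state(image, orientation, distance):
        """Fuse image features with orientation l_t and distance d_t into s_t."""
        with torch.no_grad():
            x = backbone(image.unsqueeze(0)).squeeze(0)  # (512,) image features
        l = torch.tensor([orientation], dtype=torch.float32)
        d = torch.tensor([distance], dtype=torch.float32)
        return torch.cat([l, d, x])  # s_t = (l_t, d_t, x_{t,1}, ..., x_{t,m})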
The specific processing procedure of the expert behavior demonstration module is as follows: the input is the state space vector output by the environment perception module; using expert knowledge, the UAV is operated in a virtual reality simulation platform under different scenes and at different times, and the output is the control instruction sequence; the input state space vectors and the output control instruction sequences together constitute the expert demonstration data set.
The decision learning module comprises an imitation learning module and a reinforcement learning module;
the simulation learning module and the reinforcement learning module have the same network structure and are realized by a network structure Q, the input of the network structure is a state space vector obtained by the environment sensing module, the input layer is fully connected with the hidden layer, the hidden layer is fully connected with the output layer, and finally the action space and the probability thereof are obtained by output, wherein the action space is an action vector a taken at t timet=(at,f,at,b,at,w,at,e,at,u,at,d);
The imitation learning module pre-learns from the expert demonstration data set, training on the environmental images, orientation and distance information as input and the experts' UAV control instructions as output, to obtain the pre-learning model of the decision learning module; the reinforcement learning module performs reinforcement training starting from the pre-learning model's initialization parameters, learning the decision control actions to take under different scenes and different events, to obtain the final decision learning model.
The specific processing procedure of the control execution module is as follows:
From the action vector a_t = (a_{t,f}, a_{t,b}, a_{t,w}, a_{t,e}, a_{t,u}, a_{t,d}) output by the decision learning module, the action instruction with the largest probability is selected for execution, yielding the UAV's new environment state after the action is executed; the whole process from environment perception to control execution is then iterated in a loop to realize autonomous UAV control decisions.
The beneficial effects of the invention are as follows:
The invention makes accurate decisions in complex real-time scenes so that the UAV flies from the starting point to the set end point: 1) the whole flight process is decided fully autonomously, with UAV control strategies selected according to environmental information, real-time events and the like, avoiding obstacles such as buildings and pedestrians; 2) no expert rules need to be designed; instead, behavior cloning of expert behavior through imitation learning yields a rough initial value of the decision control network, completing the initialization of the decision network; 3) by adopting deep reinforcement learning and setting a reward function over the events and states of the flight process, decision generation becomes more accurate and generalizes better.
Drawings
FIG. 1 is a diagram of a decision network architecture design of the system of the present invention.
FIG. 2 is a schematic diagram of the system components and information interaction of the present invention.
FIG. 3 is a flow chart of decision network training for the system of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
An unmanned aerial vehicle intelligent decision-making system based on imitation learning and reinforcement learning comprises an environment perception module, an expert behavior demonstration module, a decision learning module (comprising an imitation learning module and a reinforcement learning module) and a control execution module. It should be noted that the system generally needs a virtual reality simulation platform to provide typical scenes for simulation environment data acquisition and operation of the UAV virtual model. The construction and realization of each module of the system are as follows:
1. The environment perception module: extracts and fuses features from the environmental information collected by various sensors during UAV flight, where the environmental information includes images, orientations, distances and the like, forming state space vectors that provide input data for the subsequent expert behavior demonstration module and decision learning module.
Extracting the environment information obtained by the multiple sensors mainly means extracting image information with a ResNet18 network to obtain high-dimensional features; the network extracts the high-dimensional information in the environment image from the input layer through convolutional layers, pooling layers and a fully connected layer. The image, orientation and distance features are then fused through a fully connected layer into the environment state vector s_t = (l_t, d_t, x_{t,1}, x_{t,2}, ..., x_{t,m}), where s_t is the UAV state at time t, containing its own orientation l_t, the distance d_t from the current position to the target position, and the high-dimensional information (x_{t,1}, x_{t,2}, ..., x_{t,m}) of the image in the current field of view.
Referring to fig. 1, the input layer of the network adopts a ResNet18 network, which includes 256 neurons, and the input image is in 64 × 64 pixel, 3-channel format; there are two hidden layers, with 128 and 64 neurons respectively, and the output layer has 6 neurons. The input layer and the hidden layer are fully connected, as are the hidden layer and the output layer; all neurons use the ReLU activation function, and the learning rate α is set to 0.01.
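Under these stated settings, the decision network Q can be sketched as follows. The input width of 514 (orientation and distance plus 512 ResNet18 features, matching the fusion sketch above) is an assumption; since the text mentions 256 input neurons, the dimension should be adjusted to the actual fused vector length.

    import torch
    import torch.nn as nn

    # Sketch of the decision network of FIG. 1: fused state vector in, two
    # fully connected hidden layers (128 and 64 units, ReLU), 6 action outputs.
    class QNetwork(nn.Module):
        def __init__(self, state_dim=514, n_actions=6):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, 128), nn.ReLU(),
                nn.Linear(128, 64), nn.ReLU(),
                nn.Linear(64, n_actions),
            )

        def forward(self, s):
            return self.net(s)  # one score (Q value) per action

    q_net = QNetwork()
    optimizer = torch.optim.SGD(q_net.parameters(), lr=0.01)  # alpha = 0.01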
2. The expert behavior demonstration module: collects the UAV control instructions given by experts, based on domain knowledge and operating experience, under various environments and events; these instructions, together with the data provided by the environment perception module, form the expert demonstration data set, providing instruction learning samples for the imitation learning part of the subsequent decision learning module.
The module input is the state space vector s_t = (l_t, d_t, x_{t,1}, x_{t,2}, ..., x_{t,m}) output by the environment perception module. A UAV operator (i.e., a flight control expert), using expert knowledge and a handheld controller (an Xbox controller in this scheme), flies the UAV through a large number of different scenes and times in a virtual reality simulation platform (AirSim in this scheme) and outputs the control instruction sequences. The input state space vectors and the output control instruction sequences form the expert demonstration data set, which consists of all the experts' <state, action> matching pairs for subsequent imitation learning.
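A hedged sketch of how such <state, action> pairs might be recorded is shown below; env.get_state(), env.apply_action() and read_gamepad() are hypothetical placeholders for the AirSim and Xbox-controller hookups, which the patent does not detail.

    demonstrations = []  # accumulates <state, action> matching pairs

    def record_episode(env, n_steps):
        for _ in range(n_steps):
            s_t = env.get_state()    # fused state from the perception module
            a_t = read_gamepad()     # expert's control command (index 0..5)
            demonstrations.append((s_t, a_t))
            env.apply_action(a_t)    # the expert's command flies the UAV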
3. The decision learning module: comprises an imitation learning module and a reinforcement learning module. The two modules share the same network structure, realized by a network Q: the input is the state space vector obtained by the environment perception module, the input layer is fully connected to the hidden layer, the hidden layer is fully connected to the output layer, and the output yields the action space. The imitation learning module pre-learns from the expert demonstration data set to obtain the pre-learning model of the decision learning module; reinforcement learning then trains on the basis of the pre-learning model, learning the decision control actions to take under different scenes and events to obtain the final decision learning model.
3.1 The imitation learning module mainly pre-learns from the expert demonstration data set, i.e., performs imitation learning through behavior cloning, to obtain the initial learning model of the decision learning module as the initialization parameters of the Q network. The imitation learning training process is as follows: domain experts provide decision data {τ_1, τ_2, ..., τ_m}, where each piece of decision data is a sequence of states and actions, i.e.
τ_i = (s_1, a_1, s_2, a_2, ..., s_n, a_n)
All the <state, action> pairs are then extracted to form a new data set:
D = {(s_1, a_1), (s_2, a_2), (s_3, a_3), ...}
Classification (or regression) is then performed with the state as the feature and the action as the label, yielding an optimal decision strategy model. A large amount of human expert flight operation data is collected, where the state s is the environmental scene during flight and the action a is the action taken in that scene. These data are fed into the Q network so that the network's output is as close as possible to the action actually taken by the human, completing the task. That is, the decision control strategy is learned from the state-action pairs provided by human experts; this process is behavior cloning. Such imitation learning narrows the search space that reinforcement learning must explore freely and provides a pre-trained model for deep reinforcement learning.
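The behavior-cloning pre-training can be sketched as a standard supervised loop, with the state as feature and the expert's action as label; the batch size, epoch count, and the encoding of the six control commands as integer class indices are illustrative assumptions.

    import torch
    import torch.nn.functional as F
    from torch.utils.data import DataLoader, TensorDataset

    def pretrain(q_net, states, actions, epochs=10):
        """Fit Q so its output is as close as possible to the expert's action."""
        loader = DataLoader(TensorDataset(states, actions),
                            batch_size=64, shuffle=True)
        opt = torch.optim.SGD(q_net.parameters(), lr=0.01)
        for _ in range(epochs):
            for s, a in loader:
                logits = q_net(s)                  # network scores per action
                loss = F.cross_entropy(logits, a)  # match the expert's choice
                opt.zero_grad()
                loss.backward()
                opt.step()
        return q_net  # serves as the Q network initialization for RL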
3.2 The reinforcement learning module mainly performs reinforcement training on the basis of the Q network initialization parameters obtained by pre-learning, thereby reducing the search space and accelerating convergence. The implementation is to learn further through deep reinforcement learning after the imitation learning module has initialized the Q network. Specifically, the DQN algorithm is adopted. DQN is a value-function-based algorithm; it updates the Q value Q(s_t, A) of taking action A in the current state s_t as follows: first, action A is executed and the state s_{t+1} reached after one step is observed; then s_{t+1} is fed into the Q network to compute the Q values of all actions in s_{t+1}, and the maximum of these Q values plus the reward R is taken as the update target; finally, the loss between Q(s_t, A) and max_A Q(s_{t+1}, A) + R is computed and used to update the Q network. That is, the DQN update follows:
Q(s_t, A) ← Q(s_t, A) + α [R + γ max_{A'} Q(s_{t+1}, A') − Q(s_t, A)]
where Q(s_t, A) denotes the expectation of the cumulative reward obtained by the agent, up to the final state, after selecting action A in state s_t. The reward function is set as the sum of a reward R_d related to the distance to the destination and a reward R_t related to flight time:
R = R_d + R_t
The destination-related reward R_d is as follows: reaching the destination yields a large reward; the smaller the distance d to the destination, the greater the reward; and encountering an obstacle yields a negative reward. Namely:
[Formula: piecewise definition of R_d, with a large positive reward when the destination is reached, a reward that grows as the distance d shrinks, and a negative reward when an obstacle is encountered.]
The time-related negative reward (penalty) R_t increases with flight time, so that the flight mission is completed in a shorter time. The termination conditions for reaching a final state are set as reaching the destination, flying to the boundary of the map, or executing the set number of exploration steps; meeting any one of these conditions is the final state. Through continuous iterative training, an accurate Q network is obtained and used as the core network of the decision learning module of the UAV intelligent decision-making system.
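The reward shaping and the DQN update described above can be sketched as follows; the reward constants and the discount factor GAMMA are illustrative assumptions, since the text states the qualitative design but not the numbers.

    import torch
    import torch.nn.functional as F

    GAMMA = 0.99  # discount factor, a conventional choice

    def reward(reached, collided, distance, flight_time):
        """R = R_d + R_t: destination-related reward plus time penalty."""
        if reached:
            r_d = 10.0                     # large reward at the destination
        elif collided:
            r_d = -10.0                    # negative reward for an obstacle
        else:
            r_d = 1.0 / (distance + 1e-6)  # grows as the distance d shrinks
        r_t = -0.01 * flight_time          # penalty grows with flight time
        return r_d + r_t

    def dqn_update(q_net, opt, s, a, r, s_next, done):
        """One TD step: target = R + GAMMA * max_A' Q(s_{t+1}, A')."""
        with torch.no_grad():
            target = torch.tensor(r, dtype=torch.float32)
            if not done:
                target = target + GAMMA * q_net(s_next).max()
        loss = F.mse_loss(q_net(s)[a], target)  # loss between Q(s_t,A) and target
        opt.zero_grad()
        loss.backward()
        opt.step()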
3.3 The Q network model training flow of the decision learning module is shown in FIG. 3. At the start of training, the weights of the decision learning network are initialized with the parameters obtained by imitation learning. The steps are:
Step 3-1: set the parameters, including the initial number N of training loops, the flight target point, and the map boundary.
Step 3-2: set the maximum number of exploration steps n within one training run.
Step 3-3: acquire the current environment information and input it to the environment perception module.
Step 3-4: extract features from the environment information to obtain the environment's state space vector.
Step 3-5: input the state space vector into the Q network to obtain the probability of each action in the current state.
Step 3-6: compute the error δ between the network's actual value and the update target.
Step 3-7: update the network weights with the error δ via the backpropagation algorithm.
Step 3-8: according to the output action probabilities, execute the action with the largest probability to obtain the next state.
Step 3-9: judge whether the current state is a final state, i.e., whether the maximum number n of exploration steps has been reached, the target point has been reached, or the map boundary has been exceeded; if not, return to step 3-3 to continue exploring; if so, go to step 3-10.
Step 3-10: increase the training count by 1.
Step 3-11: judge whether the training count has reached the set number N of training loops; if not, start a new training run; if so, end the training process.
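Steps 3-1 to 3-11 then assemble into a training loop of the following shape, reusing dqn_update from the sketch above; env and its reset/step methods are hypothetical stand-ins for the simulation platform.

    def train(q_net, opt, env, N=1000, n=200):
        for episode in range(N):            # 3-1: N training loops
            s = env.reset()                 # 3-3/3-4: sense and encode state
            for step in range(n):           # 3-2: at most n exploration steps
                probs = q_net(s).softmax(dim=0)  # 3-5: action probabilities
                a = int(probs.argmax())          # 3-8: most probable action
                s_next, r, done = env.step(a)    # execute, observe new state
                dqn_update(q_net, opt, s, a, r, s_next, done)  # 3-6/3-7
                s = s_next
                if done:                    # 3-9: destination/boundary/limit
                    break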
4. The control execution module: is responsible for concretely executing the decisions of the decision learning module; after obtaining the probability of each action in the action space output by the decision learning module, it selects the action instruction with the largest probability for the UAV to execute, obtaining the UAV's new environment state after the action is executed.
From the action vector a_t = (a_{t,f}, a_{t,b}, a_{t,w}, a_{t,e}, a_{t,u}, a_{t,d}) and the probability of each action output by the decision learning module, the action instruction with the largest probability is selected for execution, yielding the UAV's new environment state after the action is executed; the whole process from environment perception to control execution is then iterated in a loop to realize autonomous UAV control decisions.

Claims (5)

1. An unmanned aerial vehicle intelligent decision-making system based on imitation learning and reinforcement learning, characterized by comprising an environment perception module, an expert behavior demonstration module, a decision learning module and a control execution module;
the environment perception module: extracts and fuses the environmental information acquired by various sensors during UAV flight to form a state space vector, which it outputs to the expert behavior demonstration module and the decision learning module; the environment information includes images, orientations and distances;
the expert behavior demonstration module: collects the UAV control instructions given by experts, based on domain knowledge and operating experience, under various environments and events; these instructions, together with the data provided by the environment perception module, form the expert demonstration data set, which is output to the decision learning module;
the decision learning module: takes the state space vector as the input of its network structure, pre-learns the network on the expert demonstration data set to obtain a pre-learning model, then trains on the basis of the pre-learning model, learning the decision control actions to take under different scenes and different events, to obtain the final decision learning model; it outputs the action vector and the probability of each action to the control execution module;
the control execution module: after obtaining the actions and their probabilities output by the decision learning module, selects the action instruction with the largest probability for the UAV to execute, and obtains the UAV's new environment state after the action is executed.
2. The unmanned aerial vehicle intelligent decision-making system based on imitation learning and reinforcement learning of claim 1, wherein the specific processing procedure of the environment perception module is as follows: the input is the environmental data acquired by various sensors during UAV flight; a ResNet18 network extracts high-dimensional features from the image data, which are then fused with the UAV's orientation and distance information to form the output state space vector s_t = (l_t, d_t, x_{t,1}, x_{t,2}, ..., x_{t,m}), where s_t is the UAV's state space vector at time t, l_t is its own orientation, d_t is the distance from the current position to the target position, and (x_{t,1}, x_{t,2}, ..., x_{t,m}) is the high-dimensional information of the image in the current field of view.
3. The unmanned aerial vehicle intelligent decision-making system based on imitation learning and reinforcement learning of claim 1, wherein the specific processing procedure of the expert behavior demonstration module is as follows: the input is the state space vector output by the environment perception module; using expert knowledge, the UAV is operated in a virtual reality simulation platform under different scenes and at different times, and the output is the control instruction sequence; the input state space vectors and the output control instruction sequences together constitute the expert demonstration data set.
4. The unmanned aerial vehicle intelligent decision-making system based on imitation learning and reinforcement learning of claim 1, wherein the decision learning module comprises an imitation learning module and a reinforcement learning module;
the imitation learning module and the reinforcement learning module share the same network structure, realized by a network Q: its input is the state space vector obtained by the environment perception module, the input layer is fully connected to the hidden layer, the hidden layer is fully connected to the output layer, and the output is the action space and the probability of each action, where the action space is the action vector taken at time t, a_t = (a_{t,f}, a_{t,b}, a_{t,w}, a_{t,e}, a_{t,u}, a_{t,d});
the imitation learning module pre-learns from the expert demonstration data set, training on the environmental images, orientation and distance information as input and the experts' UAV control instructions as output, to obtain the pre-learning model of the decision learning module; the reinforcement learning module performs reinforcement training starting from the pre-learning model's initialization parameters, learning the decision control actions to take under different scenes and different events, to obtain the final decision learning model.
5. The unmanned aerial vehicle intelligent decision-making system based on imitation learning and reinforcement learning of claim 4, wherein the specific processing procedure of the control execution module is as follows:
from the action vector a_t = (a_{t,f}, a_{t,b}, a_{t,w}, a_{t,e}, a_{t,u}, a_{t,d}) output by the decision learning module, the action instruction with the largest probability is selected for execution, yielding the UAV's new environment state after the action is executed; the whole process from environment perception to control execution is then iterated in a loop to realize autonomous UAV control decisions.
CN202111089240.XA 2021-09-16 2021-09-16 Unmanned aerial vehicle intelligent decision-making system based on imitation learning and reinforcement learning Pending CN113741533A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111089240.XA CN113741533A (en) 2021-09-16 2021-09-16 Unmanned aerial vehicle intelligent decision-making system based on imitation learning and reinforcement learning


Publications (1)

Publication Number Publication Date
CN113741533A (en) 2021-12-03

Family

ID=78739471

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111089240.XA Pending CN113741533A (en) Unmanned aerial vehicle intelligent decision-making system based on imitation learning and reinforcement learning

Country Status (1)

Country Link
CN (1) CN113741533A (en)



Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107656545A (en) * 2017-09-12 2018-02-02 武汉大学 A kind of automatic obstacle avoiding searched and rescued towards unmanned plane field and air navigation aid
CN110488859A (en) * 2019-07-15 2019-11-22 北京航空航天大学 A kind of Path Planning for UAV based on improvement Q-learning algorithm
JP2021034050A (en) * 2019-08-21 2021-03-01 哈爾浜工程大学 Auv action plan and operation control method based on reinforcement learning
CN111123963A (en) * 2019-12-19 2020-05-08 南京航空航天大学 Unknown environment autonomous navigation system and method based on reinforcement learning
CN111580560A (en) * 2020-05-29 2020-08-25 中国科学技术大学 Unmanned helicopter autonomous stunt flight method based on deep simulation learning
CN111618862A (en) * 2020-06-12 2020-09-04 山东大学 Robot operation skill learning system and method under guidance of priori knowledge
CN112162564A (en) * 2020-09-25 2021-01-01 南京大学 Unmanned aerial vehicle flight control method based on simulation learning and reinforcement learning algorithm
CN112232490A (en) * 2020-10-26 2021-01-15 大连大学 Deep simulation reinforcement learning driving strategy training method based on vision
CN112884131A (en) * 2021-03-16 2021-06-01 浙江工业大学 Deep reinforcement learning strategy optimization defense method and device based on simulation learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
张堃 et al.: "Autonomous guidance maneuver control decision algorithm for UAV routes based on deep reinforcement learning", Systems Engineering and Electronics (系统工程与电子技术), no. 07, 31 July 2020 (2020-07-31), pages 1567-1574 *
李湛 et al.: "Indoor monocular UAV obstacle avoidance based on cross-sensor asynchronous transfer learning", Journal of Astronautics (宇航学报), vol. 41, no. 6, 30 June 2020 (2020-06-30), pages 811-819 *
王港 et al.: "Research on integrated application and decision-making of aerospace information based on deep reinforcement learning", Radio Engineering (无线电工程), vol. 49, no. 7, 31 July 2019 (2019-07-31), pages 564-570 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114326438A (en) * 2021-12-30 2022-04-12 北京理工大学 Safety reinforcement learning four-rotor control system and method based on control barrier function
CN114326438B (en) * 2021-12-30 2023-12-19 北京理工大学 Safety reinforcement learning four-rotor control system and method based on control obstacle function
CN114626277A (en) * 2022-04-02 2022-06-14 浙江大学 Active flow control method based on reinforcement learning
CN114626277B (en) * 2022-04-02 2023-08-25 浙江大学 Active flow control method based on reinforcement learning
CN115906673A (en) * 2023-01-10 2023-04-04 中国人民解放军陆军工程大学 Integrated modeling method and system for combat entity behavior model
CN115906673B (en) * 2023-01-10 2023-11-03 中国人民解放军陆军工程大学 Combat entity behavior model integrated modeling method and system
CN116679615A (en) * 2023-08-03 2023-09-01 中科航迈数控软件(深圳)有限公司 Optimization method and device of numerical control machining process, terminal equipment and storage medium
CN116679615B (en) * 2023-08-03 2023-10-20 中科航迈数控软件(深圳)有限公司 Optimization method and device of numerical control machining process, terminal equipment and storage medium

Similar Documents

Publication Publication Date Title
Ruan et al. Mobile robot navigation based on deep reinforcement learning
CN113741533A (en) Unmanned aerial vehicle intelligent decision-making system based on imitation learning and reinforcement learning
CN112162564B (en) Unmanned aerial vehicle flight control method based on imitation learning and reinforcement learning algorithms
CN110806756B (en) Unmanned aerial vehicle autonomous guidance control method based on DDPG
CN113110592A (en) Unmanned aerial vehicle obstacle avoidance and path planning method
CN112034888B (en) Autonomous control cooperation strategy training method for fixed wing unmanned aerial vehicle
CN110745136A (en) Driving self-adaptive control method
CN110531786B (en) Unmanned aerial vehicle maneuvering strategy autonomous generation method based on DQN
CN110928189A (en) Robust control method based on reinforcement learning and Lyapunov function
CN114089776B (en) Unmanned aerial vehicle obstacle avoidance method based on deep reinforcement learning
CN111240356A (en) Unmanned aerial vehicle cluster convergence method based on deep reinforcement learning
CN113156419B (en) Specific language navigation method based on radar and visual multi-mode fusion
Ciou et al. Composite reinforcement learning for social robot navigation
CN115016534A (en) Unmanned aerial vehicle autonomous obstacle avoidance navigation method based on memory reinforcement learning
CN114967721B (en) Unmanned aerial vehicle autonomous path planning and obstacle avoidance strategy method based on DQ-CapsNet
CN116242364A (en) Multi-unmanned aerial vehicle intelligent navigation method based on deep reinforcement learning
CN116225055A (en) Unmanned aerial vehicle autonomous flight path planning algorithm based on state decomposition in complex environment
Wang et al. Autonomous target tracking of multi-UAV: A two-stage deep reinforcement learning approach with expert experience
CN116562332B (en) Robot social movement planning method in man-machine co-fusion environment
Wu et al. An adaptive conversion speed Q-learning algorithm for search and rescue UAV path planning in unknown environments
CN117406762A (en) Unmanned aerial vehicle remote control algorithm based on sectional reinforcement learning
Li et al. Autopilot controller of fixed-wing planes based on curriculum reinforcement learning scheduled by adaptive learning curve
CN116385909A (en) Unmanned aerial vehicle target tracking method based on deep reinforcement learning
CN114326826B (en) Multi-unmanned aerial vehicle formation transformation method and system
CN115373415A (en) Unmanned aerial vehicle intelligent navigation method based on deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination