CN115268494B - Unmanned aerial vehicle path planning method based on layered reinforcement learning - Google Patents

Unmanned aerial vehicle path planning method based on layered reinforcement learning Download PDF

Info

Publication number
CN115268494B
CN115268494B CN202210883240.5A
Authority
CN
China
Prior art keywords
algorithm
aerial vehicle
unmanned aerial
path
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210883240.5A
Other languages
Chinese (zh)
Other versions
CN115268494A (en)
Inventor
王琦
潘德民
王栋
高尚
于化龙
崔弘杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University of Science and Technology
Original Assignee
Jiangsu University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University of Science and Technology filed Critical Jiangsu University of Science and Technology
Priority to CN202210883240.5A priority Critical patent/CN115268494B/en
Publication of CN115268494A publication Critical patent/CN115268494A/en
Application granted granted Critical
Publication of CN115268494B publication Critical patent/CN115268494B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10Simultaneous control of position or course in three dimensions
    • G05D1/101Simultaneous control of position or course in three dimensions specially adapted for aircraft

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention discloses an unmanned aerial vehicle path planning method based on hierarchical reinforcement learning, which comprises the following steps. Step 1: initializing a deep Q network algorithm and a Q learning algorithm. Step 2: driving the unmanned aerial vehicle to move from a starting point to a target point, and training the deep Q network algorithm and the Q learning algorithm; when the unmanned aerial vehicle does not detect a dynamic obstacle during movement, the deep Q network algorithm plans the path; when the unmanned aerial vehicle detects a dynamic obstacle during movement, the Q learning algorithm plans the path. Step 3: repeating step 2 until training of the deep Q network algorithm and the Q learning algorithm is completed, then setting the actual coordinates, starting point coordinates and target point coordinates of the unmanned aerial vehicle, and planning a path with the trained deep Q network algorithm and Q learning algorithm. The method solves the problem that network fitting is easily disturbed by dynamic obstacles when a single algorithm is applied to a dynamic environment, and improves path planning performance.

Description

Unmanned aerial vehicle path planning method based on layered reinforcement learning
Technical Field
The invention relates to the technical field of unmanned aerial vehicle path planning, in particular to an unmanned aerial vehicle path planning method based on hierarchical reinforcement learning.
Background
In recent years, unmanned aerial vehicles have been widely applied in many military and civil fields, so the demand for autonomy has grown, and autonomous path planning for unmanned aerial vehicles has become a key research topic. At present, most research on unmanned aerial vehicle path planning focuses on static environments, and research on dynamic environments is relatively scarce. Among current path planning methods, reinforcement learning is a hot topic because of its reward-and-punishment mechanism and its ability to learn an optimal strategy autonomously through interaction with the environment. Q learning (Q-learning), the most classical reinforcement learning algorithm, is widely applied to the unmanned aerial vehicle path planning problem. However, because it learns from a table, Q learning cannot be applied to scenes with a complex environment or a large state space. Deep reinforcement learning, which combines reinforcement learning with deep learning, has therefore been proposed and applied to various complex unmanned aerial vehicle path planning problems; the most widely used method is the deep Q network (DQN) algorithm.
However, the inventor finds that, in the unmanned aerial vehicle dynamic path planning problem based on the deep Q network algorithm, the reinforcement learning algorithm adopts an exploration strategy that selects actions at random, so efficiency in the early stage of training is low, the number of iterations is excessive, and the planned path is not optimal. This situation is more severe in complex environments where dynamic and static obstacles coexist. In addition, when a single deep Q network algorithm faces a dynamic environment, the position of a dynamic obstacle is not fixed, so the network fits poorly during training and the finally trained network performs poorly.
It can be seen that the prior art has the technical problems of low training efficiency and network fitting that is easily disturbed.
Disclosure of Invention
The invention provides an unmanned aerial vehicle path planning method based on hierarchical reinforcement learning, which aims to solve the prior-art problems of low training efficiency and network fitting that is easily disturbed.
The invention provides an unmanned aerial vehicle path planning method based on hierarchical reinforcement learning, which comprises the following steps:
Step 1: initializing a deep Q network algorithm and a Q learning algorithm;
Step 2: driving the unmanned aerial vehicle to move from a starting point to a target point, and training a deep Q network algorithm and a Q learning algorithm;
When the unmanned aerial vehicle does not detect a dynamic obstacle in the moving process, planning a path by using the deep Q network algorithm;
When the unmanned aerial vehicle detects a dynamic obstacle in the moving process, planning a path by using the Q learning algorithm;
Step 3: repeating step 2 until training of the deep Q network algorithm and the Q learning algorithm is completed, setting the actual coordinates, the starting point coordinates and the target point coordinates of the unmanned aerial vehicle, and planning a path through the trained deep Q network algorithm and Q learning algorithm.
Further, when the unmanned aerial vehicle does not detect a dynamic obstacle and the deep Q network algorithm plans the path, the method further comprises updating the Q learning algorithm with the experience tuple generated by the deep Q network algorithm for the currently planned path; in this case, the reward function used to update the deep Q network algorithm remains the same as in its normal update;
when the unmanned aerial vehicle detects a dynamic obstacle and the Q learning algorithm plans the path, the method further comprises updating the deep Q network algorithm with the experience tuple generated by the Q learning algorithm for the currently planned path.
Further, when the Q learning algorithm is updated with the experience tuple generated by the deep Q network algorithm for the currently planned path, the reward function used by the Q learning algorithm is:
reward = η(d_{s-1} − d_s)
where η is a constant, d_{s-1} is the distance from the unmanned aerial vehicle to the target point at the previous moment, and d_s is the distance from the unmanned aerial vehicle to the target point at the current moment.
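As an illustration only, a minimal Python sketch of this cross-update reward is given below, assuming 2-D coordinates and an illustrative value of η; the function and variable names are not part of the invention:

```python
import math

# Minimal sketch, assuming 2-D positions and an illustrative eta; not the patent's code.
def cross_update_reward(prev_pos, curr_pos, target, eta=1.0):
    """reward = eta * (d_{s-1} - d_s): positive when the UAV has moved closer to the target."""
    d_prev = math.dist(prev_pos, target)   # d_{s-1}: distance to the target at the previous moment
    d_curr = math.dist(curr_pos, target)   # d_s: distance to the target at the current moment
    return eta * (d_prev - d_curr)
```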
Further, in step 2, before the path is planned by the deep Q network algorithm and the Q learning algorithm, the method further includes using a heuristic fish algorithm as action guidance for the deep Q network algorithm and the Q learning algorithm in path planning. The heuristic fish algorithm comprises a traveling behavior process and a foraging behavior process: the traveling behavior process obtains the directions in which the unmanned aerial vehicle would collide with surrounding obstacles; the foraging behavior process obtains several high-priority directions in which the unmanned aerial vehicle advances toward the target point; the heuristic fish algorithm removes the collision directions from the high-priority directions and uses the result as action guidance.
Further, when the directions in which the unmanned aerial vehicle would collide with surrounding obstacles are obtained and an obstacle is dynamic, whether the unmanned aerial vehicle will collide with the obstacle is judged from the obstacle's movement direction and movement speed.
The invention has the beneficial effects that:
the invention adds the action guidance strategy of the heuristic fish algorithm to the action selection strategies of the basic deep Q network algorithm and Q learning algorithm. The guidance acts on two aspects, reaching the target point quickly and avoiding dynamic and static obstacles, and it greatly reduces unnecessary exploration in the early stage of training, reducing the blindness of the original algorithms' exploration.
The invention uses hierarchical reinforcement learning: when facing a dynamic, complex environment, two algorithms are used to handle static and dynamic obstacles, respectively. This design overcomes the problem that network fitting is easily disturbed by dynamic obstacles when a single algorithm is applied to a dynamic environment, and improves path planning performance.
These two effects respectively address the prior-art problems of low algorithm training efficiency and planned paths that lack safety consideration.
Drawings
The features and advantages of the present invention will be more clearly understood by reference to the accompanying drawings, which are illustrative and should not be construed as limiting the invention in any way, in which:
FIG. 1 is a schematic flow chart of an embodiment of the present invention;
FIG. 2 is a schematic view of detection of a drone sensor in an environment according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a heuristic fish algorithm according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a foraging behavior of a heuristic fish algorithm according to an embodiment of the present invention;
fig. 5 is a schematic diagram illustrating a traveling behavior of a heuristic fish algorithm according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
The embodiment of the invention provides an unmanned aerial vehicle path planning method based on hierarchical reinforcement learning, which has a flow structure shown in figure 1 and comprises the following steps:
Step 1: initialize the network parameters θ of the deep Q network algorithm, the experience replay buffer, and the Q table of the Q learning algorithm; initialize the number of training rounds N_episode, and set the starting point P_O and target point P_T of the unmanned aerial vehicle flight task;
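A minimal initialization sketch corresponding to this step is given below; the network architecture, state encoding and hyper-parameters are assumptions for illustration (Example 1 later uses a replay buffer of 1,000,000 and 500 training rounds), not values fixed by the invention:

```python
from collections import deque, defaultdict

import numpy as np
import torch.nn as nn

# Illustrative sketch of Step 1; architecture and state encoding are assumptions.
ACTIONS = 8                          # eight grid-direction actions
STATE_DIM = 4                        # assumed encoding: UAV (x, y) plus target (x, y)

q_network = nn.Sequential(           # deep Q network with parameters theta
    nn.Linear(STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, ACTIONS),
)
replay_buffer = deque(maxlen=1_000_000)            # experience replay buffer
q_table = defaultdict(lambda: np.zeros(ACTIONS))   # Q table of the Q learning layer

N_episode = 500                      # maximum number of training rounds
P_O, P_T = (0, 0), (29, 29)          # start and target points of the flight task
```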
Step 2: while the number of completed training rounds is smaller than the set maximum number of rounds, reset the state and the environment and start training for this round. The sensor detects the environment and judges whether a dynamic obstacle exists within its detection range; the detection range of the sensor is shown in figure 2.
When the unmanned aerial vehicle does not detect a dynamic obstacle in the moving process, the deep Q network algorithm plans the path.
Using the heuristic fish algorithm as action guidance, the deep Q network algorithm selects and executes an action according to the current position of the unmanned aerial vehicle and the position information of the static obstacles, and then reaches the next state. The reward for the current action is obtained from a reward function; the embodiment of the invention defines the reward function of the static path planning part as a weighted combination of a goal-approach term and a static-obstacle term.
In this reward function, α and β are constants that determine the weights of the two reward terms; based on experimental tuning, this example sets α and β to 1.1 and 2, respectively. d_{s-1} denotes the distance between the unmanned aerial vehicle and the target point in the previous state, d_s the distance in the current state, and the remaining quantity is the distance from the unmanned aerial vehicle to each static obstacle.
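As an illustration only, one possible shape of such a reward, consistent with the description above but explicitly not the patent's exact formula, is sketched below; the penalty form for obstacle proximity is an assumption:

```python
import math

# Assumed illustrative shape of the static-planning reward; NOT the patent's exact formula.
def static_reward(prev_pos, curr_pos, target, static_obstacles,
                  alpha=1.1, beta=2.0, eps=1e-6):
    progress = math.dist(prev_pos, target) - math.dist(curr_pos, target)  # alpha-weighted goal-approach term
    nearest = min(math.dist(curr_pos, o) for o in static_obstacles)       # distance to the closest static obstacle
    return alpha * progress - beta / (nearest + eps)                      # assumed obstacle-penalty shape
```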
The experience tuple [S, A, R, S'] consisting of the current state, action, reward, and next state obtained in this interaction is stored in the experience replay buffer. The algorithm then samples data from the experience replay buffer according to the set batch size m to update the Q network of the deep Q network algorithm.
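A sketch of this store-and-sample update is shown below, reusing the `q_network` and `replay_buffer` names from the earlier initialization sketch; the optimiser, discount factor, and the absence of a separate target network are simplifying assumptions, not details stated in the text:

```python
import random

import torch
import torch.nn.functional as F

optimizer = torch.optim.Adam(q_network.parameters(), lr=1e-3)   # assumed optimiser
GAMMA = 0.9                                                      # assumed discount factor

def dqn_update(batch_size=16):
    """Sample a batch of [S, A, R, S'] tuples and take one gradient step on the Q network."""
    if len(replay_buffer) < batch_size:
        return
    idx = random.sample(range(len(replay_buffer)), batch_size)
    s, a, r, s_next = zip(*(replay_buffer[i] for i in idx))
    s = torch.tensor(s, dtype=torch.float32)
    a = torch.tensor(a, dtype=torch.int64)
    r = torch.tensor(r, dtype=torch.float32)
    s_next = torch.tensor(s_next, dtype=torch.float32)

    q_sa = q_network(s).gather(1, a.unsqueeze(1)).squeeze(1)      # Q(S, A)
    with torch.no_grad():
        target = r + GAMMA * q_network(s_next).max(dim=1).values  # bootstrapped target
    loss = F.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```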
Meanwhile, because the deep Q network algorithm and the Q learning algorithm are used in alternation, if each one simply stopped working while the other was active, Q values for some state-action pairs would be missing after training of the two algorithms is completed. To avoid this problem, while the deep Q network algorithm is working, the Q table of the Q learning algorithm is also updated with the experience tuples generated in the interaction of the previous step; since no dynamic obstacle is within the range of the unmanned aerial vehicle's sensor while the Q learning algorithm is not working, its reward function is defined as:
reward = η(d_{s-1} − d_s)
Finally, if the action taken by the unmanned aerial vehicle causes a collision, the current round ends and a new training round begins; otherwise, the current round of training continues.
When the unmanned aerial vehicle detects a dynamic obstacle in the moving process, the Q learning algorithm plans the path.
Using the heuristic fish algorithm as action guidance, the Q learning algorithm selects and executes an action according to the current position of the unmanned aerial vehicle and the information of the detected dynamic obstacle, and then reaches the next state. The embodiment of the invention defines the reward function of the dynamic path planning part as a weighted combination of a goal-approach term and a dynamic-obstacle-avoidance term.
In this reward function, γ and δ are weight constants, set to 1.1 and 1, respectively, based on experimental tuning; d'_{u→t} and d_{u→t} denote the distance between the unmanned aerial vehicle and the target point at the previous moment and the current moment, respectively; d'_{u→o} and d_{u→o} denote the distance between the unmanned aerial vehicle and the dynamic obstacle being avoided at the previous moment and the current moment, respectively.
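As with the static reward, only an assumed illustrative shape is sketched below, consistent with the description of the γ- and δ-weighted terms but not the patent's exact formula:

```python
# Assumed illustrative shape of the dynamic-planning reward; NOT the patent's exact formula.
def dynamic_reward(d_ut_prev, d_ut, d_uo_prev, d_uo, gamma=1.1, delta=1.0):
    approach_target = d_ut_prev - d_ut    # d'_{u->t} - d_{u->t}: progress toward the target
    escape_obstacle = d_uo - d_uo_prev    # d_{u->o} - d'_{u->o}: gain in distance from the obstacle
    return gamma * approach_target + delta * escape_obstacle
```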
The Q table of the Q learning algorithm is then updated with the experience tuple [S, A, R, S'] obtained from this interaction.
The network of the deep Q network algorithm is also updated with the experience tuple obtained in the interaction of the previous step; the reward function used here is the same as the one the deep Q network algorithm uses when it actually performs static path planning.
Finally, if the action taken by the unmanned aerial vehicle causes a collision, the current round ends and a new training round begins; otherwise, the current round of training continues.
Step 3: repeat step 2; the current round ends when the unmanned aerial vehicle reaches the target point. When the number of completed training rounds reaches the set maximum N_episode, training of the deep Q network algorithm and the Q learning algorithm is complete. The actual coordinates, starting point coordinates and target point coordinates of the unmanned aerial vehicle are then set, and a path is planned with the trained deep Q network algorithm and Q learning algorithm.
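The overall switching logic of steps 2 and 3 can be summarised in the following sketch; the callables passed in (environment, action selectors, update routines, heuristic fish guidance) are assumed interfaces standing in for the components described above, and only the detect-switch-update structure follows the text:

```python
def train(env, dqn_select, q_learning_select, heuristic_fish,
          dqn_update, q_table_update, n_rounds=500):
    """Hierarchical training loop sketch; the callables are assumed interfaces."""
    for episode in range(n_rounds):
        state, done = env.reset(), False                  # reset state and environment
        while not done:
            candidates = heuristic_fish(state, env)       # action guidance (Steps 21-24)
            if env.dynamic_obstacle_in_range(state):      # sensor check (assumed env API)
                action = q_learning_select(state, candidates)   # Q learning handles dynamic obstacles
            else:
                action = dqn_select(state, candidates)          # deep Q network handles static planning
            next_state, reward, done = env.step(action)
            experience = (state, action, reward, next_state)
            dqn_update(experience)       # both layers are always updated so that neither
            q_table_update(experience)   # loses Q values for states it did not act in
            state = next_state
```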
In step 2, before the path is planned by the deep Q network algorithm and the Q learning algorithm, the method further includes using the heuristic fish algorithm as action guidance for both algorithms in path planning. The heuristic fish algorithm is inspired by the way fish in nature can forage in dark environments using their lateral-line organs, and it comprises a traveling behavior process and a foraging behavior process: the traveling behavior process obtains the directions in which the unmanned aerial vehicle would collide with surrounding obstacles; the foraging behavior process obtains several high-priority directions in which the unmanned aerial vehicle advances toward the target point; the heuristic fish algorithm removes the collision directions from the high-priority directions and uses the result as action guidance. The algorithm flow is shown in fig. 3 and comprises the following steps:
Step 21: when the deep Q network algorithm or the Q learning algorithm calls the heuristic fish algorithm to select an action, the current state, the target point position, and the information on the dynamic and static obstacles are input into the heuristic fish algorithm. The experimental environment adopted by the invention is a grid environment in which the unmanned aerial vehicle can take actions in eight directions, and the heuristic fish algorithm is responsible for selecting the best actions for the current state.
Step 22: the foraging behavior calculates the set of selectable actions based on the current state and the target point location, as shown in fig. 4. Let L_{u→t} be the direction vector from the unmanned aerial vehicle's current position to the target point, and L_{horizontal} a unit vector along the unmanned aerial vehicle's forward direction; the angle θ_t between these two vectors is computed first.
Next, for each action ∈ A, let L_{action} be the unit direction vector of that action in the action space; the angle θ_action between each action direction and L_{horizontal} is computed in the same way.
The difference between θ_t and each θ_action is then calculated.
Finally, each action is given a priority, from high to low, in order of increasing difference, and the set of the five highest-priority actions is returned.
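A small sketch of this foraging behavior is given below, assuming a grid world with eight unit direction vectors and a fixed horizontal reference axis; it measures each direction's angle to the horizontal axis rather than writing out an explicit dot-product formula, and the action names are illustrative:

```python
import math

# Sketch of the foraging behaviour (Step 22); grid directions and names are assumptions.
ACTION_VECTORS = {
    "right": (1, 0), "right-front": (1, 1), "front": (0, 1), "left-front": (-1, 1),
    "left": (-1, 0), "left-rear": (-1, -1), "rear": (0, -1), "right-rear": (1, -1),
}

def angle_to_horizontal(vec):
    """Angle between vec and the horizontal unit vector L_horizontal = (1, 0)."""
    return math.atan2(vec[1], vec[0]) % (2 * math.pi)

def foraging_priorities(uav_pos, target_pos, top_k=5):
    """Rank actions by how closely their direction matches the UAV-to-target direction."""
    l_u_t = (target_pos[0] - uav_pos[0], target_pos[1] - uav_pos[1])
    theta_t = angle_to_horizontal(l_u_t)                 # angle of L_{u->t}
    diffs = []
    for name, vec in ACTION_VECTORS.items():
        theta_action = angle_to_horizontal(vec)
        diff = abs(theta_t - theta_action)
        diff = min(diff, 2 * math.pi - diff)             # wrap-around angular difference
        diffs.append((diff, name))
    diffs.sort()                                         # smaller difference => higher priority
    return [name for _, name in diffs[:top_k]]
```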
Step 23: the traveling behavior calculates the set of selectable actions that do not cause a collision according to the current state and the dynamic and static obstacle information, as shown in fig. 5, where gray squares represent static obstacles and hatched squares represent dynamic obstacles.
For static obstacle avoidance, the position information of the static obstacles is used: when executing an action would take the unmanned aerial vehicle into the area of a static obstacle, that action is marked as forbidden for the current state, and the available actions are returned.
For dynamic obstacle avoidance, the threat area of a dynamic obstacle at the next moment is predicted from the dynamic obstacle information set (speed, direction, position) detected by the sensor; when executing an action would take the unmanned aerial vehicle into that threat area, the action is marked as forbidden for the current state, and the available actions are returned.
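A sketch of this traveling behavior follows, assuming one-cell-per-step grid motion and a (speed, direction, position) encoding of each dynamic obstacle; the conservative threat prediction (all cells along the obstacle's motion up to its speed) and the reduced direction set are assumptions rather than details fixed by the text:

```python
# Sketch of the travelling behaviour (Step 23); motion model and encodings are assumptions.
DIRECTION_STEPS = {"left": (-1, 0), "right": (1, 0), "up": (0, 1), "down": (0, -1)}

def predict_threat_cells(dyn_obstacle):
    """Predict the cells a dynamic obstacle may occupy at the next moment."""
    speed, direction, (ox, oy) = dyn_obstacle
    dx, dy = DIRECTION_STEPS[direction]
    # conservatively occupy every cell along its motion up to `speed` cells away
    return {(ox + dx * k, oy + dy * k) for k in range(speed + 1)}

def traveling_allowed(uav_pos, static_obstacles, dyn_obstacles, action_vectors):
    """Return the actions that do not lead into a static cell or a predicted threat cell."""
    threat = set(map(tuple, static_obstacles))
    for obs in dyn_obstacles:
        threat |= predict_threat_cells(obs)
    allowed = []
    for name, (dx, dy) in action_vectors.items():
        nxt = (uav_pos[0] + dx, uav_pos[1] + dy)
        if nxt not in threat:
            allowed.append(name)
    return allowed
```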
Step 24: combining the actions returned in step 22 and step 23, several high-priority, collision-free actions are returned to the deep Q network algorithm or the Q learning algorithm, and the call ends.
The process of the specific embodiments is illustrated below by simulation:
example 1: hierarchical reinforcement learning
Step 1: initialize the network parameters of the deep Q network algorithm and an experience replay buffer of size 1,000,000; initialize the Q table of the Q learning algorithm. Set the total number of training rounds to 500, with the starting point P_O = [0, 0] and target point P_T = [29, 29] of the unmanned aerial vehicle flight task.
Step 2: the sensor detection range is set to 3 as shown in fig. 2.
If no dynamic obstacle exists within the unmanned aerial vehicle's current detection range, the deep Q network algorithm is called for static path planning, and the heuristic fish algorithm is then called for action selection. The unmanned aerial vehicle performs the selected action, enters the next state, and obtains a reward for the action. The algorithm stores the experience tuple in the experience replay buffer; meanwhile, a batch of m = 16 experiences is sampled from the replay buffer to update the network parameters, and the Q table of the Q learning algorithm is updated with the experience tuple.
If a dynamic obstacle exists within the detection range, as in the case of fig. 2, the Q learning algorithm is called for dynamic path planning. The heuristic fish algorithm is again called to select an action; the unmanned aerial vehicle then executes the selected action, enters the next state, and obtains the reward for the action. Finally, the Q table is updated with the experience tuple, and the network of the deep Q network algorithm is updated with the same tuple.
Step 3: while interacting with the environment, the unmanned aerial vehicle continuously cycles through detecting dynamic obstacles, switching algorithms, selecting an action, executing the action, calculating the reward, and updating the Q network or Q table, until it collides with an obstacle or reaches the target point, at which point the current round ends. When the total number of training rounds reaches the set N_episode, the whole training process ends.
Example 2: heuristic fish algorithm
Step 1: the heuristic fish algorithm is invoked by the deep Q network algorithm or the Q learning algorithm and receives as input the current state, the target point position, and the dynamic and static obstacle information. The heuristic fish algorithm then performs the foraging behavior and the traveling behavior to select the set of available actions.
Step 2: the foraging behavior calculates θ_t and each θ_action from the current state and the target point position, computes the difference between θ_t and each θ_action, assigns priorities to the eight actions according to these differences, and returns the five highest-priority actions. Referring to FIG. 4, the set of priority actions returned in this case is [front left, front right, rear left].
Step 3: the traveling behavior returns the actions that do not lead to a collision according to the static and dynamic obstacle information. For a static obstacle, whose position is fixed, any action that would enter its area is forbidden; for a dynamic obstacle, its position at the next moment is predicted from the set [speed, direction, position], and any action that would enter that area is forbidden. As shown in fig. 5, the gray box is a static obstacle and the hatched box is a dynamic obstacle; the dynamic obstacle's information is [1, left, current position], so its threat area at the next moment is the marked area in the figure. Finally, the collision-causing actions [left, rear right] are removed, and the remaining 6 actions are the selectable actions.
Step 4: combining the actions returned in Step 2 and Step 3, the returned set of selectable actions is [front left, front right, rear left], and the call ends.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations are within the scope of the invention as defined by the appended claims.

Claims (4)

1. The unmanned aerial vehicle path planning method based on hierarchical reinforcement learning is characterized by comprising the following steps of:
Step 1: initializing a deep Q network algorithm and a Q learning algorithm;
Step 2: driving the unmanned aerial vehicle to move from a starting point to a target point, and training a deep Q network algorithm and a Q learning algorithm;
when the unmanned aerial vehicle does not detect a dynamic obstacle in the moving process, planning a path by using the deep Q network algorithm, and updating the Q learning algorithm with the experience tuple generated by the deep Q network algorithm for the currently planned path;
when the unmanned aerial vehicle detects a dynamic obstacle in the moving process, planning a path by using the Q learning algorithm, and updating the deep Q network algorithm with the experience tuple generated by the Q learning algorithm for the currently planned path;
Step 3: repeating step 2 until training of the deep Q network algorithm and the Q learning algorithm is completed, setting the actual coordinates, the starting point coordinates and the target point coordinates of the unmanned aerial vehicle, and planning a path through the trained deep Q network algorithm and Q learning algorithm.
2. The hierarchical reinforcement learning-based unmanned aerial vehicle path planning method of claim 1, wherein, when the Q learning algorithm is updated with the experience tuple generated by the deep Q network algorithm for the currently planned path, the reward function used by the Q learning algorithm is:
reward = η(d_{s-1} − d_s)
wherein η is a constant; d_{s-1} is the distance from the unmanned aerial vehicle to the target point at the previous moment; and d_s is the distance from the unmanned aerial vehicle to the target point at the current moment.
3. The unmanned aerial vehicle path planning method based on hierarchical reinforcement learning according to claim 1, wherein in the step 2, before the path is planned by the deep Q network algorithm and the Q learning algorithm, the method further comprises: using a heuristic fish algorithm as action guidance for the deep Q network algorithm and the Q learning algorithm in path planning; wherein the heuristic fish algorithm comprises a traveling behavior process and a foraging behavior process, the traveling behavior process obtaining the directions in which the unmanned aerial vehicle would collide with surrounding obstacles, the foraging behavior process obtaining a plurality of high-priority directions in which the unmanned aerial vehicle advances toward the target point, and the heuristic fish algorithm removing the collision directions from the plurality of high-priority directions and using the result as action guidance.
4. The unmanned aerial vehicle path planning method based on hierarchical reinforcement learning according to claim 3, wherein, when the directions in which the unmanned aerial vehicle would collide with surrounding obstacles are obtained and an obstacle is dynamic, whether the unmanned aerial vehicle will collide with the obstacle is judged from the movement direction and movement speed of the obstacle.
CN202210883240.5A 2022-07-26 2022-07-26 Unmanned aerial vehicle path planning method based on layered reinforcement learning Active CN115268494B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210883240.5A CN115268494B (en) 2022-07-26 2022-07-26 Unmanned aerial vehicle path planning method based on layered reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210883240.5A CN115268494B (en) 2022-07-26 2022-07-26 Unmanned aerial vehicle path planning method based on layered reinforcement learning

Publications (2)

Publication Number Publication Date
CN115268494A CN115268494A (en) 2022-11-01
CN115268494B true CN115268494B (en) 2024-05-28

Family

ID=83769868

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210883240.5A Active CN115268494B (en) 2022-07-26 2022-07-26 Unmanned aerial vehicle path planning method based on layered reinforcement learning

Country Status (1)

Country Link
CN (1) CN115268494B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019147235A1 (en) * 2018-01-24 2019-08-01 Ford Global Technologies, Llc Path planning for autonomous moving devices
CN109992000A (en) * 2019-04-04 2019-07-09 北京航空航天大学 A kind of multiple no-manned plane path collaborative planning method and device based on Hierarchical reinforcement learning
CN113821041A (en) * 2021-10-09 2021-12-21 中山大学 Multi-robot collaborative navigation and obstacle avoidance method
CN114003059A (en) * 2021-11-01 2022-02-01 河海大学常州校区 UAV path planning method based on deep reinforcement learning under kinematic constraint condition
CN114529061A (en) * 2022-01-26 2022-05-24 江苏科技大学 Method for automatically predicting garbage output distribution and planning optimal transportation route
CN114527759A (en) * 2022-02-25 2022-05-24 重庆大学 End-to-end driving method based on layered reinforcement learning
CN114518770A (en) * 2022-03-01 2022-05-20 西安交通大学 Unmanned aerial vehicle path planning method integrating potential field and deep reinforcement learning
CN114625151A (en) * 2022-03-10 2022-06-14 大连理工大学 Underwater robot obstacle avoidance path planning method based on reinforcement learning

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
D3QHF: A Hybrid Double-deck Heuristic Reinforcement Learning Approach for UAV Path Planning; Demin Pan, et al.; IEEE; 2022-12-31; 1221-1226 *
Study on interface temperature control of laser direct joining of CFRTP and aluminum alloy based on staged laser path planning; Qi Wang, et al.; Optics and Laser Technology; 2022-06-09; Vol. 154; 1-13 *
Research on manned/unmanned aerial vehicle cooperative path planning based on MAXQ hierarchical reinforcement learning (基于MAXQ分层强化学习的有人机/无人机协同路径规划研究); 程先峰, 严勇杰; Informatization Research (信息化研究); 2020-02-29; Vol. 46, No. 1; 13-19 *
Research on event-driven reinforcement learning obstacle avoidance for unmanned aerial vehicles (基于事件驱动的无人机强化学习避障研究); 唐博文, et al.; Journal of Guangxi University of Science and Technology (广西科技大学学报); 2019-03-31; No. 1; 96-102 *
Quadrotor attitude control based on fractional-order MRAC (基于分数阶MRAC的四旋翼姿态控制); 陈开元, et al.; Electronics Optics & Control (电光与控制); 2021-12-31; Vol. 28, No. 12; 1-5 *
Path planning method for industrial robotic arms based on deep Q-learning (基于深度Q学习的工业机械臂路径规划方法); 王曌, 胡立生; Control and Instruments in Chemical Industry (化工自动化及仪表); No. 2; 141-145 *

Also Published As

Publication number Publication date
CN115268494A (en) 2022-11-01

Similar Documents

Publication Publication Date Title
CN108776483B (en) AGV path planning method and system based on ant colony algorithm and multi-agent Q learning
CN109765893B (en) Mobile robot path planning method based on whale optimization algorithm
Kurzer et al. Decentralized cooperative planning for automated vehicles with hierarchical monte carlo tree search
CN111260027B (en) Intelligent agent automatic decision-making method based on reinforcement learning
CN107229287A (en) A kind of unmanned plane global path planning method based on Genetic Ant algorithm
CN113741525B (en) Policy set-based MADDPG multi-unmanned aerial vehicle cooperative attack and defense countermeasure method
CN112269382B (en) Robot multi-target path planning method
CN113561986A (en) Decision-making method and device for automatically driving automobile
CN114089776B (en) Unmanned aerial vehicle obstacle avoidance method based on deep reinforcement learning
CN111723931B (en) Multi-agent confrontation action prediction method and device
CN114967721B (en) Unmanned aerial vehicle self-service path planning and obstacle avoidance strategy method based on DQ-CapsNet
CN112469050A (en) WSN three-dimensional coverage enhancement method based on improved wolf optimizer
CN115268494B (en) Unmanned aerial vehicle path planning method based on layered reinforcement learning
CN113467481B (en) Path planning method based on improved Sarsa algorithm
CN117705113A (en) Unmanned aerial vehicle vision obstacle avoidance and autonomous navigation method for improving PPO
Han et al. Multi-uav automatic dynamic obstacle avoidance with experience-shared a2c
CN117109574A (en) Agricultural transportation machinery coverage path planning method
CN111562740B (en) Automatic control method based on multi-target reinforcement learning algorithm utilizing gradient
Xiao et al. Design of reward functions based on The DDQN Algorithm
CN112947421B (en) AUV autonomous obstacle avoidance method based on reinforcement learning
CN118051063B (en) Training method for obstacle avoidance flight of low-altitude unmanned aerial vehicle
CN110955239B (en) Unmanned ship multi-target trajectory planning method and system based on inverse reinforcement learning
Miyashita et al. Flexible Exploration Strategies in Multi-Agent Reinforcement Learning for Instability by Mutual Learning
CN117782106A (en) Improved depth path planning method and system based on impulse neural network
US20230126696A1 (en) Lane change method and system, storage medium, and vehicle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant