CN113159432A - Multi-agent path planning method based on deep reinforcement learning - Google Patents
- Publication number
- CN113159432A (application CN202110468095.XA)
- Authority
- CN
- China
- Prior art keywords
- agent
- map
- reinforcement learning
- point
- strategy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
- G06Q10/047—Optimisation of routes or paths, e.g. travelling salesman problem
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a multi-agent path planning method based on deep reinforcement learning. The method is a distributed path planning method in which the local observation of each agent is fed into a neural network, information is exchanged among agents through the neural network, and the network is trained to approximate the policy function, from which a movement policy is output. The network parameters are trained by a method combining deep reinforcement learning with imitation learning, so that the reward function converges faster. After training, the method achieves a high success rate of group path planning in four-neighbourhood 2D grid maps at the scale of thousands of agents, i.e., a collision-free route from start point to goal point is successfully planned for each agent within the time limit, and it adapts well to changes in map size and obstacle density.
Description
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to multi-agent path planning based on deep reinforcement learning.
Background
Multi-agent path planning is a class of problems in which conflict-free paths are found for multiple agents from their start locations to their target locations while optimizing criteria such as minimizing the sum of path lengths or action costs over all agents, or maximizing throughput. Research on this problem has numerous applications in logistics, unmanned vehicles, military, security, games, and other fields.
Many traditional algorithms for single-agent path planning exist at home and abroad, such as the A* algorithm, particle swarm optimization, genetic algorithms, ant colony algorithms, and simulated annealing. With rising industrial requirements and living standards, the work of a single agent can no longer meet the needs of practical applications, and multi-agent path planning techniques capable of group coordination have emerged. Traditional algorithms include M*, CBS, WHCA* and their variants, which can plan paths for fewer than 300 agents. Deep reinforcement learning methods such as DQN, Q-learning, and MADDPG have also achieved certain results.
However, multi-agent path planning based on deep reinforcement learning still faces several specific problems: adaptability is poor across varied map sizes and under high-density obstacles; the lack of communication among agents leads to blocked planning information and congestion; as the number of agents grows, the state-action space of most path planning methods suffers a dimensional explosion, requiring a great amount of computation and limiting the planning success rate (successfully planning a collision-free route from start to goal for each agent within the time limit); and training is inefficient and slow.
Disclosure of Invention
In view of the above-mentioned shortcomings, the technical problems to be solved by the present invention are: the lack of communication between agents in the prior art; poor adaptability to changeable maps; the dimensional explosion that easily arises as the number of agents grows; and the slow reward convergence and training caused by the design of the reinforcement learning framework. The present application therefore provides a multi-agent path planning method based on deep reinforcement learning. The method is a distributed path planning method in which the local observation of each agent is fed into a convolutional neural network for processing, information is exchanged among agents through a graph neural network, and the network is trained to approximate the policy function, from which a movement policy is output. The network parameters are trained by a method combining deep reinforcement learning with imitation learning, so that the reward function converges faster. After training, the method achieves a high success rate of group path planning in four-neighbourhood 2D grid maps at the scale of thousands of agents, i.e., a collision-free route from start point to goal point is successfully planned for each agent within the time limit, and it adapts well to changes in map size and obstacle density.
In order to achieve the purpose, the technical scheme of the invention is as follows: a multi-agent path planning method based on deep reinforcement learning comprises the following steps:
S1: generating a complex data set, in which the start point and target point of each agent, different 2D square grid map sizes, obstacle densities, and agent counts are randomly generated and combined.
S2: inputting the local map information tensor into a convolutional neural network for preprocessing, where the local map information is the map information within a square of side length r_local grids centred on a single agent.
S3: transferring the local information preprocessed in S2 between agents using a graph neural network.
S4: training the network parameters of the algorithm by a method combining imitation learning and reinforcement learning. Each agent holds a copy of the network, outputs a policy, and at each time step selects one of the actions up, down, left, right, or no movement.
Further, in the step S1:
S1: a global grid map, obstacles, and binary maps of the agents' start points and goal points are generated with python or designed manually. The grid map is a square with side length 10, 50 or 100; the obstacle density, the percentage of obstacle grids among all map grids, can be 10%, 30% or 50%; the number of agents can be 4, 8, 32, 512 or 1024, and each agent must be able to reach its target point, i.e., the start and goal must be connected. The generated maps traverse all combinations of these parameters.
Further, in the step S2:
The local map information tensor includes:
(1) obstacles, with the map boundary treated as an obstacle;
(2) the position coordinates of other agents;
(3) the agent's own target point; if its coordinates lie outside the local range, the agent is connected to its target point by a line, and the projection of that line onto the window boundary is used as the target coordinate point;
(4) the target point coordinates of other agents.
Further, in the step S3:
S31: within a time step t, a graph is constructed as follows: each agent is abstracted as a node whose feature X_t is the local information observed by that agent; agents within r_local are neighbours, and an edge connects the agent to each of its neighbours.
S32: an adjacency matrix S_t is constructed to record the neighbour information of all agents. In the adjacency matrix S_t, the first row is the index of the current node, and the other rows are the neighbours of the current node.
S33: calculating a graph convolutionWhereinIndicating the fusion of information with the k-th neighbor, ka trained convolution filter is required for this fusion. The graph convolution shows the information fusion of the point and the K-hop neighbor, wherein 1 hop refers to the point, 2 hops refers to the neighbor, 3 hops refers to the neighbor of the neighbor, and the like. Performing relu activation operation on the graph volumeAnd (4) forming a neural network.
Further, in the step S4:
The specific training process is as follows: when a training episode starts, the local map information processed by the graph neural network is input probabilistically into either the imitation learning module or the reinforcement learning module; imitation learning supplies expert strategies to accelerate the trial-and-error exploration of reinforcement learning and helps it converge to an optimal policy. Both modules optimize the same policy network parameters.
The reinforcement learning part performs exploration training with an asynchronous advantage actor-critic algorithm: the actor network computes the movement policy π, the critic network computes the value V of the movement action, and the policy network is optimized by gradient descent through the loss function of V.
The imitation learning part imitates observation-action trajectories generated by an expert algorithm, the multi-agent path planning algorithm Greedy Conflict-Based Search (GCBS); the cross entropy between the current policy π and the expert policy is calculated and gradient descent is performed to update the policy network, bringing the policy closer to the expert algorithm.
The beneficial effects of the invention are as follows: the exchange of local information and the design of a per-agent neural network realize a distributed path planning algorithm, i.e., each agent plans autonomously online using only the local environment that a real robot can perceive; compared with centralized planning, this effectively reduces the computational cost caused by dimensional explosion, and planned paths can be computed quickly. Transmitting local map information among agents through a graph neural network makes the action intentions of other agents known and effectively raises the planning success rate. The training method combining reinforcement learning and imitation learning improves the efficiency of reinforcement learning's trial-and-error exploration, speeds up training and convergence, and imitates an expert algorithm to reduce collisions, embodying group coordination.
Drawings
FIG. 1 is a flow chart of a method for multi-agent path planning based on deep reinforcement learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a specific neural network structure of a deep reinforcement learning-based multi-agent path planning method according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating the map information in step S1 of the multi-agent path planning method based on deep reinforcement learning according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the 4-layer local observation tensors in step S2 of the deep reinforcement learning-based multi-agent path planning method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating the conversion between the neighbourhood graph and the adjacency matrix in step S3 of the multi-agent path planning method based on deep reinforcement learning according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the combined reinforcement learning and imitation learning method in step S4 of the deep reinforcement learning-based multi-agent path planning method according to an embodiment of the present invention.
Detailed Description
Fig. 1 and fig. 2 show the method flow and the specific algorithm network structure of the present invention. The multi-agent path planning method based on deep reinforcement learning proposed by the present invention includes the following steps:
S1: generating a complex data set, in which the start point and target point of each agent, different 2D square grid map sizes, obstacle densities, and agent counts are randomly generated and combined.
A global grid map, obstacles, and binary maps of the agents' start points and goal points are generated with python or designed manually. The grid map is a square with side length 10, 50 or 100; the obstacle density, the percentage of obstacle grids among all map grids, is 10%, 30% or 50%; the number of agents is 4, 8, 32, 512 or 1024, and each agent's start point must be connected to its goal point. The generated maps traverse all combinations of these parameters, each item represented by a binary matrix. As shown in fig. 3, a map with side length 10, obstacle density 10%, and 4 agents is generated.
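The data-generation step above can be sketched in Python roughly as follows. The function name, the use of NumPy, and the rejection-free sampling of start and goal cells are illustrative assumptions; the patent specifies only the parameter ranges and the connectivity requirement, so in practice a BFS connectivity check with resampling would be added.

```python
import numpy as np

def generate_instance(side=10, obstacle_density=0.1, n_agents=4, seed=0):
    """Sketch of the S1 data-set generator: a square binary obstacle map
    plus random, non-overlapping start and goal cells on free squares."""
    rng = np.random.default_rng(seed)
    n_cells = side * side
    n_obstacles = int(round(obstacle_density * n_cells))
    # Binary obstacle map: 1 = obstacle, 0 = free.
    flat = np.zeros(n_cells, dtype=np.int8)
    flat[rng.choice(n_cells, size=n_obstacles, replace=False)] = 1
    grid = flat.reshape(side, side)
    # Sample 2*n_agents distinct free cells: starts first, then goals.
    free = np.flatnonzero(flat == 0)
    picks = rng.choice(free, size=2 * n_agents, replace=False)
    starts = np.stack(np.unravel_index(picks[:n_agents], (side, side)), axis=1)
    goals = np.stack(np.unravel_index(picks[n_agents:], (side, side)), axis=1)
    # A BFS check that each start is connected to its goal would go here.
    return grid, starts, goals

grid, starts, goals = generate_instance(side=10, obstacle_density=0.1, n_agents=4)
```

Traversing the combinations of the patent's parameter ranges then amounts to calling this generator over the cross product of side lengths, densities, and agent counts.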
S2: the local map tensor is input into the convolutional neural network for preprocessing, where the local information o_t is the map information within a square of side length r_local grids centred on a single agent. The neural network architecture is: 3 convolutional layers, 1 max pooling layer, and 2 fully connected layers. As shown in FIG. 4, when r_local = 7 the local map tensor comprises:
(1) a': obstacles, with cells outside the global map treated as obstacles when the local view extends beyond it;
(2) b': the agent's own target point; if its coordinates lie outside the local range, the agent is connected to its target point by a line, and the projection of that line onto the window boundary is used as the target coordinate point;
(3) c': the position coordinates of other agents (agents 2 and 3);
(4) d': the target point coordinates of other agents.
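Under the assumptions below (one-hot channel encoding and per-axis clipping as an approximation of the line-to-boundary projection, neither of which is fully specified by the patent), the 4-channel local observation of step S2 might be built as:

```python
import numpy as np

def local_observation(grid, positions, goals, i, r_local=7):
    """Sketch of the S2 observation: a 4-channel r_local x r_local tensor
    centred on agent i, with channels laid out as a'-d' above."""
    half = r_local // 2
    cx, cy = positions[i]
    obs = np.zeros((4, r_local, r_local), dtype=np.float32)
    n = grid.shape[0]
    for dx in range(-half, half + 1):
        for dy in range(-half, half + 1):
            x, y = cx + dx, cy + dy
            out = not (0 <= x < n and 0 <= y < n)
            # a': obstacles; cells beyond the map boundary count as obstacles.
            if out or grid[x, y] == 1:
                obs[0, dx + half, dy + half] = 1.0
    # b': own goal, clipped to the window boundary if it lies outside
    # (per-axis clipping approximates the line-to-boundary projection).
    gx, gy = goals[i]
    px = int(np.clip(gx - cx, -half, half))
    py = int(np.clip(gy - cy, -half, half))
    obs[1, px + half, py + half] = 1.0
    # c'/d': other agents' positions and goals inside the window.
    for j in range(len(positions)):
        if j == i:
            continue
        for ch, (qx, qy) in ((2, positions[j]), (3, goals[j])):
            dx, dy = qx - cx, qy - cy
            if abs(dx) <= half and abs(dy) <= half:
                obs[ch, dx + half, dy + half] = 1.0
    return obs

grid = np.zeros((10, 10), dtype=np.int8)
grid[0, 0] = 1
obs = local_observation(grid, [(1, 1), (2, 2)], [(9, 9), (0, 1)], i=0)
```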
S3: as shown in fig. 5, the local information between agents preprocessed in S2 is transferred using a graph neural network.
S31: during one time step t, as in a' in fig. 5: at rlocalFor 7 agents in the grid as neighbors, b': constructing a graph, abstracting each agent into points, and obtaining local information observed by the agent after the preprocessing of S2, namely the point characteristic XtAt rlocalThe agent in the system is a neighbor, and an edge is arranged between the agent and the neighbor.
S32: as in c' of fig. 5, an adjacency matrix S_t is constructed to record the neighbour information of all agents. The first row is the index of the current node, and the other rows are the neighbours of the current node.
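A minimal sketch of the neighbour graph and adjacency matrix of S31-S32, under the assumption that S_t is stored as a symmetric, row-normalised agent-by-agent matrix (the patent describes a row-per-node layout but does not fully specify the encoding or normalisation):

```python
import numpy as np

def build_adjacency(positions, r_local=7):
    """Agents within r_local (Chebyshev distance) of each other are
    neighbours; returns the row-normalised adjacency matrix S_t."""
    pos = np.asarray(positions, dtype=float)
    n = len(pos)
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and np.abs(pos[i] - pos[j]).max() <= r_local:
                S[i, j] = 1.0
    # Row-normalise so each agent averages over its neighbours;
    # isolated agents keep an all-zero row.
    deg = S.sum(axis=1, keepdims=True)
    return np.divide(S, deg, out=np.zeros_like(S), where=deg > 0)

S = build_adjacency([(0, 0), (1, 1), (20, 20)])
```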
S33: calculating a graph convolutionWhereinIndicating the fusion of information with the k-th neighbor,Aka trained convolution filter is required for this fusion. The neighbors also perform graph convolution operation, so the graph convolution represents the information fusion between the point and the K-hop neighbors, wherein 1 hop refers to the neighbor, 2 hop refers to the neighbor, 3 hop refers to the neighbor of the neighbor, and so on. Information fusion of 2-hop neighbors is performed as in fig. 5. And carrying out relu activation operation on the graph convolution to form a graph neural network.
S4: the network parameters of the algorithm are trained by a method combining imitation learning and reinforcement learning. Each agent holds a copy of the network, outputs a policy vector, and at each time step selects the action with the maximum value in the policy vector: one of up, down, left, right, or no movement.
As shown in fig. 6, the specific method of combining imitation learning and reinforcement learning is as follows: at the beginning of a training episode, the local map information processed by the graph neural network in S3 is input probabilistically into either the imitation learning module or the reinforcement learning module; imitation learning supplies expert strategies to accelerate the trial-and-error exploration of reinforcement learning and helps it converge to an optimal policy. Both modules optimize the same policy network parameters.
The reinforcement learning part performs exploration training with an asynchronous advantage actor-critic algorithm. The actor network computes the movement policy π and optimizes the policy network by gradient descent through the advantage function A(t) = sum_{k=0}^{T-t-1} γ^k r_{t+k} − V(o_t; θ'), with policy loss L_π = −log P(a_t | π, o; θ) · A(t), where T is the number of steps within the given time limit or until the target is reached, θ is the policy network parameter, γ is the discount factor, r_t is the reward function, k is the step index, and P(a_t | π, o; θ) is the probability of selecting action a_t. The critic network computes the value V of the movement action and performs gradient descent through the value loss L_V = (R_t − V(o_t; θ'))², where θ' is the value network parameter and R_t is the cumulative reward computed from the reward function.
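The advantage and the two losses of the actor-critic update can be sketched as below. Bootstrapping from a final state value and any entropy bonus are omitted for brevity, and the exact loss weighting is not specified by the patent.

```python
import numpy as np

def advantage_and_losses(rewards, values, log_prob_actions, gamma=0.99):
    """Discounted returns R_t, advantage A_t = R_t - V(o_t),
    actor loss -log pi(a_t|o_t) * A_t, and critic loss (R_t - V)^2,
    mirroring the A3C-style update of step S4."""
    T = len(rewards)
    returns = np.zeros(T)
    R = 0.0
    for t in reversed(range(T)):
        R = rewards[t] + gamma * R        # R_t = r_t + gamma * R_{t+1}
        returns[t] = R
    adv = returns - values                # A_t = R_t - V(o_t)
    actor_loss = -(log_prob_actions * adv).mean()
    critic_loss = ((returns - values) ** 2).mean()
    return adv, float(actor_loss), float(critic_loss)

adv, actor_loss, critic_loss = advantage_and_losses(
    [1.0, 0.0], np.array([0.0, 0.0]), np.array([0.0, 0.0]), gamma=0.5)
```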
The imitation learning part imitates the observation-action trajectories generated by an expert algorithm, here the traditional multi-agent path planning algorithm Greedy Conflict-Based Search (GCBS). The cross entropy between the current policy π and the expert action, L_IL = −sum_a π_expert(a) log π(a), is calculated and gradient descent is performed to update the policy network, bringing the policy closer to the expert algorithm.
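The imitation-learning loss can be sketched as a standard cross entropy between the policy's action distribution and the expert's chosen actions. The softmax over the five discrete moves is an assumption; the patent does not state the policy head's output form.

```python
import numpy as np

def imitation_loss(policy_logits, expert_actions):
    """Mean cross entropy between the current policy (softmax over the
    5 discrete moves) and the expert (GCBS) action labels."""
    # Numerically stable softmax over the action dimension.
    z = policy_logits - policy_logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    # Probability the policy assigns to each expert action.
    picked = probs[np.arange(len(expert_actions)), expert_actions]
    return float(-np.log(picked).mean())

# Uniform logits over 5 actions give loss log(5).
loss = imitation_loss(np.zeros((2, 5)), np.array([0, 1]))
```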
Claims (5)
1. A multi-agent path planning method based on deep reinforcement learning is characterized by comprising the following steps:
S1: generating a complex data set, in which the start point and target point of each agent, different 2D square grid map sizes, obstacle densities, and agent counts are randomly generated and combined.
S2: inputting the local map information tensor into a convolutional neural network for preprocessing, where the local map information is the map information within a square of side length r_local grids centred on a single agent.
S3: transferring the local information preprocessed in S2 between agents using a graph neural network.
S4: training the network parameters of the algorithm by a method combining imitation learning and reinforcement learning. Each agent holds a copy of the network, outputs a policy, and at each time step selects one of the actions up, down, left, right, or no movement.
2. The method for multi-agent path planning based on deep reinforcement learning as claimed in claim 1, wherein in step S1:
S1: a global grid map, obstacles, and binary maps of the agents' start points and goal points are generated with python or designed manually. The grid map is a square with side length 10, 50 or 100; the obstacle density, the percentage of obstacle grids among all map grids, can be 10%, 30% or 50%; the number of agents can be 4, 8, 32, 512 or 1024, and each agent must be able to reach its target point, i.e., the start and goal must be connected. The generated maps traverse all combinations of these parameters.
3. The method for multi-agent path planning based on deep reinforcement learning as claimed in claim 1, wherein in step S2:
The local map information tensor includes:
(1) obstacles, with the map boundary treated as an obstacle;
(2) the position coordinates of other agents;
(3) the agent's own target point; if its coordinates lie outside the local range, the agent is connected to its target point by a line, and the projection of that line onto the window boundary is used as the target coordinate point;
(4) the target point coordinates of other agents.
4. The method for multi-agent path planning based on deep reinforcement learning as claimed in claim 1, wherein in step S3:
S31: within a time step t, a graph is constructed as follows: each agent is abstracted as a node whose feature X_t is the local information observed by that agent; agents within r_local are neighbours, and an edge connects the agent to each of its neighbours.
S32: an adjacency matrix S_t is constructed to record the neighbour information of all agents. In the adjacency matrix S_t, the first row is the index of the current node, and the other rows are the neighbours of the current node.
S33: the graph convolution Z_t = sum_{k=1}^{K} S_t^{k-1} X_t A_k is calculated, where S_t^{k-1} X_t denotes the fusion of information with the k-hop neighbours and A_k is a convolution filter that must be trained for this fusion. The graph convolution fuses the node's information with its K-hop neighbours, where 1 hop refers to the node itself, 2 hops to its neighbours, 3 hops to the neighbours of its neighbours, and so on. A ReLU activation applied to the graph convolution forms the graph neural network.
5. The method for multi-agent path planning based on deep reinforcement learning as claimed in claim 1, wherein in step S4:
The specific training process is as follows: when a training episode starts, the local map information processed by the graph neural network is input probabilistically into either the imitation learning module or the reinforcement learning module; imitation learning supplies expert strategies to accelerate the trial-and-error exploration of reinforcement learning and helps it converge to an optimal policy. Both modules optimize the same policy network parameters.
The reinforcement learning part performs exploration training with an asynchronous advantage actor-critic algorithm: the actor network computes the movement policy π, the critic network computes the value V of the movement action, and the policy network is optimized by gradient descent through the loss function of V.
The imitation learning part imitates observation-action trajectories generated by an expert algorithm, the multi-agent path planning algorithm Greedy Conflict-Based Search (GCBS); the cross entropy between the current policy π and the expert policy is calculated and gradient descent is performed to update the policy network, bringing the policy closer to the expert algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202110468095.XA (CN113159432A) | 2021-04-28 | 2021-04-28 | Multi-agent path planning method based on deep reinforcement learning
Publications (1)
Publication Number | Publication Date
---|---
CN113159432A | 2021-07-23
Family
ID=76872031
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202110468095.XA (CN113159432A, pending) | Multi-agent path planning method based on deep reinforcement learning | 2021-04-28 | 2021-04-28
Country Status (1)
Country | Link
---|---
CN | CN113159432A
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112286203A (en) * | 2020-11-11 | 2021-01-29 | 大连理工大学 | Multi-agent reinforcement learning path planning method based on ant colony algorithm |
CN112297005A (en) * | 2020-10-10 | 2021-02-02 | 杭州电子科技大学 | Robot autonomous control method based on graph neural network reinforcement learning |
CN112362066A (en) * | 2020-11-20 | 2021-02-12 | 西北工业大学 | Path planning method based on improved deep reinforcement learning |
- 2021-04-28: application CN202110468095.XA filed in CN; published as CN113159432A; status pending
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113612692A (en) * | 2021-08-11 | 2021-11-05 | Xidian University | Centralized optical network-on-chip adaptive route planning method based on DQN algorithm |
CN113850414B (en) * | 2021-08-20 | 2023-08-04 | Tianjin University | Logistics scheduling planning method based on graph neural network and reinforcement learning |
CN113850414A (en) * | 2021-08-20 | 2021-12-28 | Tianjin University | Logistics scheduling planning method based on graph neural network and reinforcement learning |
CN114415663A (en) * | 2021-12-15 | 2022-04-29 | Beijing University of Technology | Path planning method and system based on deep reinforcement learning |
CN114489065A (en) * | 2022-01-20 | 2022-05-13 | Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology | Multi-robot collaborative path planning method for operating-room medical material distribution and application thereof |
CN114489065B (en) * | 2022-01-20 | 2023-08-25 | Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology | Multi-robot collaborative path planning method for operating-room medical material distribution and application thereof |
CN114629798A (en) * | 2022-01-27 | 2022-06-14 | Tsinghua University | Multi-agent collaborative planning method and device, electronic equipment and storage medium |
CN114629798B (en) * | 2022-01-27 | 2023-08-18 | Tsinghua University | Multi-agent collaborative planning method and device, electronic equipment and storage medium |
CN114676909A (en) * | 2022-03-25 | 2022-06-28 | Southeast University | Unmanned vehicle charging path planning method based on deep reinforcement learning |
CN114676909B (en) * | 2022-03-25 | 2024-04-09 | Southeast University | Unmanned vehicle charging path planning method based on deep reinforcement learning |
CN115493595A (en) * | 2022-09-28 | 2022-12-20 | Tianjin University | AUV path planning method based on local perception and proximal policy optimization |
CN115907248A (en) * | 2022-10-26 | 2023-04-04 | Shandong University | Multi-robot unknown-environment path planning method based on geometric neural network |
CN115993831B (en) * | 2023-03-23 | 2023-06-09 | Anhui University | Target-network-free robot path planning method based on deep reinforcement learning |
CN115993831A (en) * | 2023-03-23 | 2023-04-21 | Anhui University | Target-network-free robot path planning method based on deep reinforcement learning |
CN116187611A (en) * | 2023-04-25 | 2023-05-30 | Southern University of Science and Technology | Multi-agent path planning method and terminal |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113159432A (en) | Multi-agent path planning method based on deep reinforcement learning | |
CN110488859B (en) | Unmanned aerial vehicle route planning method based on improved Q-learning algorithm | |
Liu et al. | Multi-UAV path planning based on fusion of sparrow search algorithm and improved bioinspired neural network | |
Tang et al. | A novel hierarchical soft actor-critic algorithm for multi-logistics robots task allocation | |
CN113495578A (en) | Digital twin training-based cluster track planning reinforcement learning method | |
CN113900445A (en) | Unmanned aerial vehicle cooperative control training method and system based on multi-agent reinforcement learning | |
CN112327923A (en) | Multi-unmanned aerial vehicle collaborative path planning method | |
CN112947562A (en) | Multi-unmanned aerial vehicle motion planning method based on artificial potential field method and MADDPG | |
CN111609864A (en) | Multi-policeman cooperative trapping task allocation and path planning method under road network constraint | |
Ding et al. | Hierarchical reinforcement learning framework towards multi-agent navigation | |
CN108413963A (en) | Rod-type robot path planning method based on self-learning ant colony algorithm | |
CN110181508A (en) | Underwater robot three-dimensional route planning method and system | |
Zhang et al. | A self-heuristic ant-based method for path planning of unmanned aerial vehicle in complex 3-D space with dense U-type obstacles | |
Jin et al. | Inverse reinforcement learning via deep gaussian process | |
Chen et al. | Transformer-based imitative reinforcement learning for multi-robot path planning | |
Sui et al. | Path planning of multiagent constrained formation through deep reinforcement learning | |
CN116841317A (en) | Unmanned aerial vehicle cluster collaborative countermeasure method based on graph attention reinforcement learning | |
Tian et al. | The application of path planning algorithm based on deep reinforcement learning for mobile robots | |
Jin et al. | WOA-AGA algorithm design for robot path planning | |
Li et al. | Improving fast adaptation for newcomers in multi-robot reinforcement learning system | |
Li et al. | Improved genetic algorithm for multi-agent task allocation with time windows | |
Kermani et al. | Flight path planning using GA and fuzzy logic considering communication constraints | |
Chai et al. | Mobile robot path planning in 2d space: A survey | |
CN110598835B (en) | Automatic path-finding method for trolley based on neural network optimized by Gaussian-mutation genetic algorithm | |
Araújo et al. | Cooperative observation of malicious targets in a 3d urban traffic environment using uavs |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||