CN113159432A - Multi-agent path planning method based on deep reinforcement learning - Google Patents

Multi-agent path planning method based on deep reinforcement learning

Info

Publication number
CN113159432A
Authority
CN
China
Prior art keywords
agent
map
reinforcement learning
point
strategy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110468095.XA
Other languages
Chinese (zh)
Inventor
范钰捷
林志赟
王博
程自帅
韩志敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202110468095.XA priority Critical patent/CN113159432A/en
Publication of CN113159432A publication Critical patent/CN113159432A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q 10/047 Optimisation of routes or paths, e.g. travelling salesman problem
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a multi-agent path planning method based on deep reinforcement learning. The method is a distributed path planning method: the local observation information of a single agent is input into a neural network, information is exchanged between agents through the neural network, and a neural network approximating the policy function is trained so as to output a movement policy. The network parameters are trained with a method combining deep reinforcement learning and imitation learning, so that the return function converges faster. After training, a high success rate of group path planning in a four-neighborhood 2D grid map can be achieved at the scale of thousands of agents, i.e., a collision-free route from start point to end point is successfully planned for each agent within the time limit, with strong adaptability to changes in map size and obstacle density.

Description

Multi-agent path planning method based on deep reinforcement learning
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to multi-agent path planning based on deep reinforcement learning.
Background
Multi-agent path planning is the class of problems that finds a set of conflict-free paths for multiple agents from their start locations to their target locations while optimizing objectives such as minimizing the sum of path lengths or total action cost of all agents, or maximizing throughput. Research on this problem has numerous application scenarios in logistics, unmanned vehicles, military, security, games, and other fields.
Many traditional algorithms exist for single-agent path planning, such as the A* algorithm, particle swarm optimization, genetic algorithms, ant colony algorithms, and simulated annealing. As the demands of industry and living standards rise, a single agent can no longer meet the needs of practical applications, and multi-agent path planning techniques capable of group coordination have emerged. Traditional algorithms include M*, CBS, WHCA* and their variants, which can plan paths for fewer than about 300 agents. Deep reinforcement learning methods such as DQN, Q-learning and MADDPG have also achieved some results.
However, multi-agent path planning based on deep reinforcement learning still faces several specific problems: adaptability is poor across varied map sizes and under high-density obstacles; the lack of communication among agents blocks planning information and causes congestion; as the number of agents increases, the state-action space of most path planning methods suffers dimension explosion, requiring a large amount of computation and limiting the planning success rate (a run is successful when a collision-free route from start point to end point is planned for each agent within the time limit); and training efficiency is low, with long training times.
Disclosure of Invention
In view of the above shortcomings, the technical problems to be solved by the present invention are: the lack of communication between agents in the prior art; poor adaptability to variable maps; the dimension explosion that easily arises as the amount of agent information grows; and the slow return convergence and training caused by the design of the reinforcement learning framework. Therefore, the present application provides a multi-agent path planning method based on deep reinforcement learning. The method is a distributed path planning method: the local observation information of a single agent is input into a convolutional neural network for processing, information is exchanged between agents through a graph neural network, and a neural network approximating the policy function is trained so as to output a movement policy. The network parameters are trained with a method combining deep reinforcement learning and imitation learning, so that the return function converges faster. After training, a high success rate of group path planning in a four-neighborhood 2D grid map can be achieved at the scale of thousands of agents, i.e., a collision-free route from start point to end point is successfully planned for each agent within the time limit, with strong adaptability to changes in map size and obstacle density.
In order to achieve the above purpose, the technical scheme of the invention is as follows: a multi-agent path planning method based on deep reinforcement learning, comprising the following steps:
S1: generating a diverse data set in which the start point and target point of each agent, different 2D square grid map sizes, obstacle densities, and numbers of agents are randomly generated and combined.
S2: inputting the local map information tensor into a convolutional neural network for preprocessing, where the local map information is the map information within a square of side length r_local grid cells centered on the individual agent.
S3: exchanging the local information preprocessed in S2 between agents using a graph neural network.
S4: training the network parameters of the algorithm by a method combining imitation learning and reinforcement learning. Each agent holds a copy of the algorithm network, which outputs a policy, and over the time sequence each agent selects one of five actions: up, down, left, right, or no movement.
Further, in the step S1:
A global grid map, obstacles, and binary maps of the agent start points and target points are generated with python or designed manually. The grid map is a square with side length 10, 50 or 100; the obstacle density, i.e., the percentage of obstacle cells among all map cells, can be 10%, 30% or 50%; the number of agents can be 4, 8, 32, 512 or 1024, and each agent must be able to reach its target point, i.e., the start and target points are connected. All combinations of these settings are traversed when generating maps.
Further, in the step S2:
The local map information tensor includes:
(1) obstacles, with the map boundary treated as an obstacle;
(2) the position coordinates of the other agents;
(3) the agent's own target point coordinates; if the target coordinate lies outside the local range, the agent is connected to its target point by a straight line and the projection of this line on the boundary is taken as the target coordinate point;
(4) the target point coordinates of the other agents.
Further, in the step S3:
S31: within one time step t, a graph is constructed, specifically: each agent is abstracted as a node, the local information observed by the agent is the node feature X_t, the agents within r_local are its neighbors, and an edge is placed between the agent and each neighbor.
S32: an adjacency matrix S_t is constructed to record the neighbor information of all agents. In the adjacency matrix S_t, the first row is the index of the current node and the other rows are the neighbors of the current node.
S33: the graph convolution is calculated as

$$\mathcal{H}(X_t) = \sigma\Big(\sum_{k=0}^{K-1} S_t^{k} X_t A_k\Big)$$

where $S_t^{k} X_t$ denotes the fusion of information with the k-hop neighbors and $A_k$ is a convolution filter that must be trained for this fusion. The graph convolution thus represents the information fusion of a node with its K-hop neighbors, where 1 hop refers to the node itself, 2 hops to its neighbors, 3 hops to the neighbors of its neighbors, and so on. A ReLU activation is applied to the graph convolution to form the graph neural network.
Further, in the step S4:
The specific training process is as follows: when a training episode starts, the local map information processed by the graph neural network is fed, at random with a given probability, into either the imitation learning module or the reinforcement learning module; imitation learning provides an expert policy to accelerate the trial-and-error exploration of reinforcement learning and help it converge to the optimal policy. Both modules optimize the same policy network parameters.
The reinforcement learning part uses the asynchronous advantage actor-critic (A3C) algorithm for exploration training: the actor network computes a movement action policy π, the critic network computes the value V of the movement action, and the network is optimized by gradient descent through the loss function of V.
Imitation learning imitates the observation-action trajectories generated by an expert algorithm, where the expert algorithm is the multi-agent path planning algorithm Greedy Conflict-Based Search (GCBS); the cross entropy between the current policy π and the expert policy is computed and gradient descent is applied to update the policy network, bringing the policy closer to the expert algorithm.
The invention has the following beneficial effects: the distributed path planning algorithm is realized through the exchange of local information and a per-agent neural network design, i.e., each agent plans its own path autonomously and online using only the local environment that a real robot can perceive; compared with centralized planning, this effectively reduces the computational cost caused by dimension explosion and allows planned paths to be computed quickly. Local map information is passed between agents by the graph neural network, so that each agent knows the intentions of the other agents, which effectively improves the planning success rate. The training method combining reinforcement learning and imitation learning improves the efficiency of reinforcement learning's trial-and-error exploration, speeds up training and convergence, and imitates the expert algorithm to reduce collisions, thereby embodying group coordination.
Drawings
FIG. 1 is a flow chart of a method for multi-agent path planning based on deep reinforcement learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a specific neural network structure of a deep reinforcement learning-based multi-agent path planning method according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating the map information in step S1 of the multi-agent path planning method based on deep reinforcement learning according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the 4-layer local observation tensors in step S2 of the deep reinforcement learning-based multi-agent path planning method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating the conversion among the neighborhood map, the neighbor graph, and the adjacency matrix in step S3 of the multi-agent path planning method based on deep reinforcement learning according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the combination of reinforcement learning and simulation learning method in step S4 of the deep reinforcement learning-based multi-agent path planning method according to an embodiment of the present invention;
Detailed Description
As shown in FIG. 1 and FIG. 2, which give the method flow and the specific network structure of the algorithm, the multi-agent path planning method based on deep reinforcement learning proposed by the present invention includes the following steps:
S1: generating a diverse data set in which the start point and target point of each agent, different 2D square grid map sizes, obstacle densities, and numbers of agents are randomly generated and combined.
A global grid map, obstacles, and binary maps of the agent start points and target points are generated with python or designed manually. The grid map is a square with side length 10, 50 or 100; the obstacle density, i.e., the percentage of obstacle cells among all map cells, is chosen from 10%, 30% and 50%; the number of agents is 4, 8, 32, 512 or 1024, and each agent must be able to reach its target point, i.e., the start and target points are connected. All combinations of these settings are traversed when generating maps, and each item is represented by a binary matrix. As shown in FIG. 3, a map with side length 10, obstacle density 10%, and 4 agents is generated.
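To make the embodiment concrete, a minimal python sketch of such a map generator is given below. The function name, the rejection sampling of start and goal cells, and the flood-fill connectivity check are illustrative assumptions; the patent only fixes the map sizes, obstacle densities, agent counts, and the requirement that start and target be reachable.

```python
import numpy as np
from collections import deque

def generate_instance(size=10, obstacle_density=0.10, n_agents=4, seed=None):
    """Illustrative generator: binary obstacle map plus connected start/goal pairs per agent."""
    rng = np.random.default_rng(seed)
    grid = (rng.random((size, size)) < obstacle_density).astype(np.int8)  # 1 = obstacle cell
    free = [tuple(c) for c in np.argwhere(grid == 0)]

    def connected(a, b):
        # 4-neighborhood flood fill: is there an obstacle-free path from a to b?
        seen, queue = {a}, deque([a])
        while queue:
            x, y = queue.popleft()
            if (x, y) == b:
                return True
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < size and 0 <= ny < size and grid[nx, ny] == 0 and (nx, ny) not in seen:
                    seen.add((nx, ny))
                    queue.append((nx, ny))
        return False

    starts, goals = [], []
    while len(starts) < n_agents:
        i, j = rng.choice(len(free), size=2, replace=False)
        s, g = free[i], free[j]
        if s not in starts and g not in goals and connected(s, g):  # reject unreachable pairs
            starts.append(s)
            goals.append(g)
    return grid, starts, goals

grid, starts, goals = generate_instance(size=10, obstacle_density=0.10, n_agents=4)
```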
S2: the local map tensor is input into the convolutional neural network for preprocessing, where the local information o_t is the map information within a square of side length r_local grid cells centered on the individual agent. The network architecture is: 3 convolutional layers, 1 max-pooling layer, 2 fully-connected layers (a sketch follows the list below). As shown in FIG. 4, when r_local = 7 the local map tensor comprises:
(1) a': obstacles; the boundary is treated as an obstacle when the local view extends beyond the global map;
(2) b': the agent's own target point coordinates; if the target coordinate lies outside the local range, the agent is connected to its target point by a straight line and the projection of this line on the boundary is taken as the target coordinate point;
(3) c': the position coordinates of the other agents (agents 2 and 3);
(4) d': the target point coordinates of the other agents.
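Below is a minimal PyTorch sketch of the convolutional preprocessing stage applied to the 4-channel r_local × r_local observation tensor. The patent fixes only the layer counts (3 convolutional layers, 1 max-pooling layer, 2 fully-connected layers); the kernel sizes, channel widths, and output feature dimension here are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class LocalObsEncoder(nn.Module):
    """Maps a (4, r_local, r_local) local observation to a node feature vector X_t."""
    def __init__(self, r_local=7, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(            # 3 convolutional layers + 1 max-pooling layer (assumed widths)
            nn.Conv2d(4, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        flat = 64 * (r_local // 2) ** 2       # spatial size after 2x2 pooling of the r_local x r_local view
        self.fc = nn.Sequential(              # 2 fully-connected layers
            nn.Linear(flat, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )

    def forward(self, obs):                   # obs: (n_agents, 4, r_local, r_local)
        return self.fc(self.conv(obs).flatten(1))   # -> (n_agents, feat_dim)

encoder = LocalObsEncoder(r_local=7)
x_t = encoder(torch.zeros(4, 4, 7, 7))        # 4 agents, 4 channels, 7x7 local view
```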
S3: as shown in fig. 5, the local information between agents preprocessed in S2 is transferred using a graph neural network.
S31: during one time step t, as in a' of FIG. 5, the agents lying within the square of side length r_local = 7 grid cells are neighbors; as in b', a graph is constructed: each agent is abstracted as a node, the local information observed by the agent after the preprocessing of S2 becomes the node feature X_t, the agents within r_local are its neighbors, and an edge is placed between the agent and each neighbor.
S32: as in c' of FIG. 5, an adjacency matrix S_t is constructed to record the neighbor information of all agents. The first row is the index of the current node and the other rows are the neighbors of the current node; a small sketch of this construction is given below.
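The sketch below builds S_t from the agent positions at time step t. Treating two agents as neighbors when their Chebyshev distance keeps them inside each other's r_local × r_local view, and row-normalizing S_t, are illustrative assumptions; the patent describes S_t row-wise (index of the current node followed by its neighbors) without fixing a numeric layout.

```python
import numpy as np

def build_adjacency(positions, r_local=7):
    """positions: (n_agents, 2) integer grid coordinates at time step t."""
    pos = np.asarray(positions)
    # Chebyshev distance <= r_local // 2 means the other agent lies inside the local square view
    dist = np.abs(pos[:, None, :] - pos[None, :, :]).max(axis=-1)
    S = (dist <= r_local // 2).astype(np.float32)
    np.fill_diagonal(S, 0.0)                  # no self-loop; the k = 0 term of the graph convolution covers the node itself
    deg = S.sum(axis=1, keepdims=True)
    return np.divide(S, deg, out=np.zeros_like(S), where=deg > 0)   # row-normalize for stable matrix powers

S_t = build_adjacency([(0, 0), (1, 2), (9, 9), (2, 1)], r_local=7)
```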
S33: the graph convolution is calculated as

$$\mathcal{H}(X_t) = \sigma\Big(\sum_{k=0}^{K-1} S_t^{k} X_t A_k\Big)$$

where $S_t^{k} X_t$ denotes the fusion of information with the k-hop neighbors and $A_k$ is a convolution filter that must be trained for this fusion. Since the neighbors also perform the graph convolution operation, the graph convolution represents information fusion between a node and its K-hop neighbors, where 1 hop refers to the node itself, 2 hops to its neighbors, 3 hops to the neighbors of its neighbors, and so on. In FIG. 5 the information of 2-hop neighbors is fused. A ReLU activation is applied to the graph convolution to form the graph neural network.
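The graph convolution above can be sketched in PyTorch as follows. Representing S_t as a dense, row-normalized adjacency matrix and the filters A_k as bias-free linear layers is an implementation assumption; only the K-hop aggregation form and the ReLU activation are stated here.

```python
import torch
import torch.nn as nn

class GraphConvK(nn.Module):
    """K-hop graph convolution: H(X) = relu( sum_{k=0}^{K-1} S^k X A_k )."""
    def __init__(self, feat_dim=128, K=3):
        super().__init__()
        self.K = K
        self.filters = nn.ModuleList([nn.Linear(feat_dim, feat_dim, bias=False) for _ in range(K)])

    def forward(self, X, S):
        # X: (n_agents, feat_dim) node features; S: (n_agents, n_agents) adjacency matrix
        out, hop = torch.zeros_like(X), X          # k = 0 term: the node itself (S^0 = I)
        for k in range(self.K):
            out = out + self.filters[k](hop)       # fuse information from the k-hop neighborhood
            hop = S @ hop                          # propagate one hop further: S^{k+1} X
        return torch.relu(out)

gnn = GraphConvK(feat_dim=128, K=3)
H = gnn(torch.zeros(4, 128), torch.eye(4))         # 4 agents with identity adjacency, for shape checking
```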
S4: the network parameters of the algorithm are trained by a method combining imitation learning and reinforcement learning. Each agent holds a copy of the algorithm network, which outputs a policy vector, and at each time step the action corresponding to the maximum value in the policy vector is selected: up, down, left, right, or no movement.
As shown in FIG. 6, the specific method of combining imitation learning and reinforcement learning is as follows: at the beginning of a training episode, the local map information processed by the graph neural network in S3 is fed, with a given probability, into either the imitation learning module or the reinforcement learning module; imitation learning provides an expert policy to accelerate the trial-and-error exploration of reinforcement learning and help it converge to the optimal policy. Both modules optimize the same policy network parameters.
The reinforcement learning part uses the asynchronous advantage actor-critic (A3C) algorithm for exploration training. The actor network computes the movement action policy π and optimizes the policy network by gradient descent through the policy loss

$$L_{\pi} = -\sum_{t=0}^{T} \log P(a_t \mid \pi, o_t; \theta)\, A(o_t, a_t)$$

with the advantage function

$$A(o_t, a_t) = \sum_{i=0}^{k-1} \gamma^{i} r_{t+i} + \gamma^{k} V(o_{t+k}; \theta') - V(o_t; \theta')$$

where T is the number of steps within the given time limit or until the target is reached, θ is the policy network parameter, γ is the discount factor, r_t is the reward function, k is the number of look-ahead steps, and P(a_t | π, o; θ) is the probability of selecting action a_t. The critic network computes the value V of the movement action, and the value network is optimized by gradient descent through the loss function of V,

$$L_{V} = \big(R_t - V(o_t; \theta')\big)^{2}$$

where θ' is the value network parameter and R_t is the cumulative reward computed from the reward function.
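A minimal sketch of the actor and critic losses described above, for a single agent over a k-step rollout, is given below. The backwards computation of R_t bootstrapped from V(o_{t+k}) and the absence of an entropy bonus are assumptions; only the advantage form, the selection probability P(a_t | π, o; θ), and the squared value loss are stated here.

```python
import torch

def a3c_losses(log_probs, values, rewards, bootstrap_value, gamma=0.99):
    """log_probs: log P(a_t | pi, o_t; theta); values: V(o_t; theta'); rewards: r_t -- all of length k."""
    returns, R = [], bootstrap_value                 # R starts from V(o_{t+k}; theta')
    for r in reversed(rewards):
        R = r + gamma * R                            # discounted cumulative reward R_t
        returns.append(R)
    returns = torch.stack(returns[::-1])

    advantages = returns - values                    # A(o_t, a_t) = R_t - V(o_t; theta')
    policy_loss = -(log_probs * advantages.detach()).sum()   # actor: descend on -log P * A
    value_loss = advantages.pow(2).sum()             # critic: (R_t - V(o_t; theta'))^2
    return policy_loss, value_loss

p_loss, v_loss = a3c_losses(
    log_probs=torch.log(torch.tensor([0.4, 0.3, 0.5])),
    values=torch.tensor([0.1, 0.2, 0.3]),
    rewards=[torch.tensor(-0.1), torch.tensor(-0.1), torch.tensor(1.0)],
    bootstrap_value=torch.tensor(0.0),
)
```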
The imitation learning part imitates the observation-action trajectories generated by an expert algorithm; the expert algorithm is the traditional multi-agent path planning algorithm Greedy Conflict-Based Search (GCBS). The cross entropy between the current policy π and the expert actions is computed,

$$L_{imitation} = -\sum_{t} \log P\big(a_t^{expert} \mid \pi, o_t; \theta\big),$$

and gradient descent is applied to update the policy network, so that the policy moves closer to the expert algorithm.
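In practice this reduces to a behavior-cloning cross entropy between the policy output and the actions chosen by the GCBS expert on the same observations; using `F.cross_entropy` over the five discrete actions, as sketched below, is an assumed implementation detail.

```python
import torch
import torch.nn.functional as F

def imitation_loss(policy_logits, expert_actions):
    """policy_logits: (T, 5) unnormalized scores over {up, down, left, right, stay};
    expert_actions: (T,) integer actions from the GCBS expert trajectories."""
    return F.cross_entropy(policy_logits, expert_actions)   # mean of -log P(a_t^expert | o_t)

loss = imitation_loss(torch.randn(8, 5), torch.randint(0, 5, (8,)))
```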

Claims (5)

1. A multi-agent path planning method based on deep reinforcement learning, characterized by comprising the following steps:
S1: generating a diverse data set in which the start point and target point of each agent, different 2D square grid map sizes, obstacle densities, and numbers of agents are randomly generated and combined.
S2: inputting the local map information tensor into a convolutional neural network for preprocessing, where the local map information is the map information within a square of side length r_local grid cells centered on the individual agent.
S3: exchanging the local information preprocessed in S2 between agents using a graph neural network.
S4: training the network parameters of the algorithm by a method combining imitation learning and reinforcement learning. Each agent holds a copy of the algorithm network, which outputs a policy, and over the time sequence each agent selects one of five actions: up, down, left, right, or no movement.
2. The method for multi-agent path planning based on deep reinforcement learning as claimed in claim 1, wherein in step S1:
A global grid map, obstacles, and binary maps of the agent start points and target points are generated with python or designed manually. The grid map is a square with side length 10, 50 or 100; the obstacle density, i.e., the percentage of obstacle cells among all map cells, can be 10%, 30% or 50%; the number of agents can be 4, 8, 32, 512 or 1024, and each agent must be able to reach its target point, i.e., the start and target points are connected. All combinations of these settings are traversed when generating maps.
3. The method for multi-agent path planning based on deep reinforcement learning as claimed in claim 1, wherein in step S2:
The local map information tensor includes:
(1) obstacles, with the map boundary treated as an obstacle;
(2) the position coordinates of the other agents;
(3) the agent's own target point coordinates; if the target coordinate lies outside the local range, the agent is connected to its target point by a straight line and the projection of this line on the boundary is taken as the target coordinate point;
(4) the target point coordinates of the other agents.
4. The method for multi-agent path planning based on deep reinforcement learning as claimed in claim 1, wherein in step S3:
S31: within one time step t, a graph is constructed, specifically: each agent is abstracted as a node, the local information observed by the agent is the node feature X_t, the agents within r_local are its neighbors, and an edge is placed between the agent and each neighbor.
S32: an adjacency matrix S_t is constructed to record the neighbor information of all agents. In the adjacency matrix S_t, the first row is the index of the current node and the other rows are the neighbors of the current node.
S33: the graph convolution is calculated as

$$\mathcal{H}(X_t) = \sigma\Big(\sum_{k=0}^{K-1} S_t^{k} X_t A_k\Big)$$

where $S_t^{k} X_t$ denotes the fusion of information with the k-hop neighbors and $A_k$ is a convolution filter that must be trained for this fusion. The graph convolution thus represents the information fusion of a node with its K-hop neighbors, where 1 hop refers to the node itself, 2 hops to its neighbors, 3 hops to the neighbors of its neighbors, and so on. A ReLU activation is applied to the graph convolution to form the graph neural network.
5. The method for multi-agent path planning based on deep reinforcement learning as claimed in claim 1, wherein in step S4:
The specific training process is as follows: when a training episode starts, the local map information processed by the graph neural network is fed, at random with a given probability, into either the imitation learning module or the reinforcement learning module; imitation learning provides an expert policy to accelerate the trial-and-error exploration of reinforcement learning and help it converge to the optimal policy. Both modules optimize the same policy network parameters.
The reinforcement learning part uses the asynchronous advantage actor-critic (A3C) algorithm for exploration training: the actor network computes a movement action policy π, the critic network computes the value V of the movement action, and the network is optimized by gradient descent through the loss function of V.
Imitation learning imitates the observation-action trajectories generated by an expert algorithm, where the expert algorithm is the multi-agent path planning algorithm Greedy Conflict-Based Search (GCBS); the cross entropy between the current policy π and the expert policy is computed and gradient descent is applied to update the policy network, bringing the policy closer to the expert algorithm.
CN202110468095.XA 2021-04-28 2021-04-28 Multi-agent path planning method based on deep reinforcement learning Pending CN113159432A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110468095.XA CN113159432A (en) 2021-04-28 2021-04-28 Multi-agent path planning method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110468095.XA CN113159432A (en) 2021-04-28 2021-04-28 Multi-agent path planning method based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN113159432A true CN113159432A (en) 2021-07-23

Family

ID=76872031

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110468095.XA Pending CN113159432A (en) 2021-04-28 2021-04-28 Multi-agent path planning method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113159432A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113612692A (en) * 2021-08-11 2021-11-05 西安电子科技大学 Centralized optical on-chip network self-adaptive route planning method based on DQN algorithm
CN113850414A (en) * 2021-08-20 2021-12-28 天津大学 Logistics scheduling planning method based on graph neural network and reinforcement learning
CN114415663A (en) * 2021-12-15 2022-04-29 北京工业大学 Path planning method and system based on deep reinforcement learning
CN114489065A (en) * 2022-01-20 2022-05-13 华中科技大学同济医学院附属同济医院 Operating room medical material distribution multi-robot collaborative path planning method and application thereof
CN114629798A (en) * 2022-01-27 2022-06-14 清华大学 Multi-agent collaborative planning method and device, electronic equipment and storage medium
CN114676909A (en) * 2022-03-25 2022-06-28 东南大学 Unmanned vehicle charging path planning method based on deep reinforcement learning
CN115493595A (en) * 2022-09-28 2022-12-20 天津大学 AUV path planning method based on local perception and near-end optimization strategy
CN115907248A (en) * 2022-10-26 2023-04-04 山东大学 Multi-robot unknown environment path planning method based on geometric neural network
CN115993831A (en) * 2023-03-23 2023-04-21 安徽大学 Method for planning path of robot non-target network based on deep reinforcement learning
CN116187611A (en) * 2023-04-25 2023-05-30 南方科技大学 Multi-agent path planning method and terminal

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112286203A (en) * 2020-11-11 2021-01-29 大连理工大学 Multi-agent reinforcement learning path planning method based on ant colony algorithm
CN112297005A (en) * 2020-10-10 2021-02-02 杭州电子科技大学 Robot autonomous control method based on graph neural network reinforcement learning
CN112362066A (en) * 2020-11-20 2021-02-12 西北工业大学 Path planning method based on improved deep reinforcement learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112297005A (en) * 2020-10-10 2021-02-02 杭州电子科技大学 Robot autonomous control method based on graph neural network reinforcement learning
CN112286203A (en) * 2020-11-11 2021-01-29 大连理工大学 Multi-agent reinforcement learning path planning method based on ant colony algorithm
CN112362066A (en) * 2020-11-20 2021-02-12 西北工业大学 Path planning method based on improved deep reinforcement learning

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113612692A (en) * 2021-08-11 2021-11-05 西安电子科技大学 Centralized optical on-chip network self-adaptive route planning method based on DQN algorithm
CN113850414B (en) * 2021-08-20 2023-08-04 天津大学 Logistics scheduling planning method based on graph neural network and reinforcement learning
CN113850414A (en) * 2021-08-20 2021-12-28 天津大学 Logistics scheduling planning method based on graph neural network and reinforcement learning
CN114415663A (en) * 2021-12-15 2022-04-29 北京工业大学 Path planning method and system based on deep reinforcement learning
CN114489065A (en) * 2022-01-20 2022-05-13 华中科技大学同济医学院附属同济医院 Operating room medical material distribution multi-robot collaborative path planning method and application thereof
CN114489065B (en) * 2022-01-20 2023-08-25 华中科技大学同济医学院附属同济医院 Operating room medical material distribution multi-robot collaborative path planning method and application thereof
CN114629798A (en) * 2022-01-27 2022-06-14 清华大学 Multi-agent collaborative planning method and device, electronic equipment and storage medium
CN114629798B (en) * 2022-01-27 2023-08-18 清华大学 Multi-agent collaborative planning method and device, electronic equipment and storage medium
CN114676909A (en) * 2022-03-25 2022-06-28 东南大学 Unmanned vehicle charging path planning method based on deep reinforcement learning
CN114676909B (en) * 2022-03-25 2024-04-09 东南大学 Unmanned vehicle charging path planning method based on deep reinforcement learning
CN115493595A (en) * 2022-09-28 2022-12-20 天津大学 AUV path planning method based on local perception and near-end optimization strategy
CN115907248A (en) * 2022-10-26 2023-04-04 山东大学 Multi-robot unknown environment path planning method based on geometric neural network
CN115993831B (en) * 2023-03-23 2023-06-09 安徽大学 Method for planning path of robot non-target network based on deep reinforcement learning
CN115993831A (en) * 2023-03-23 2023-04-21 安徽大学 Method for planning path of robot non-target network based on deep reinforcement learning
CN116187611A (en) * 2023-04-25 2023-05-30 南方科技大学 Multi-agent path planning method and terminal

Similar Documents

Publication Publication Date Title
CN113159432A (en) Multi-agent path planning method based on deep reinforcement learning
CN110488859B (en) Unmanned aerial vehicle route planning method based on improved Q-learning algorithm
Liu et al. Multi-UAV path planning based on fusion of sparrow search algorithm and improved bioinspired neural network
Tang et al. A novel hierarchical soft actor-critic algorithm for multi-logistics robots task allocation
CN113495578A (en) Digital twin training-based cluster track planning reinforcement learning method
CN113900445A (en) Unmanned aerial vehicle cooperative control training method and system based on multi-agent reinforcement learning
CN112327923A (en) Multi-unmanned aerial vehicle collaborative path planning method
CN112947562A (en) Multi-unmanned aerial vehicle motion planning method based on artificial potential field method and MADDPG
CN111609864A (en) Multi-policeman cooperative trapping task allocation and path planning method under road network constraint
Ding et al. Hierarchical reinforcement learning framework towards multi-agent navigation
CN108413963A (en) Bar-type machine people's paths planning method based on self study ant group algorithm
CN110181508A (en) Underwater robot three-dimensional Route planner and system
Zhang et al. A self-heuristic ant-based method for path planning of unmanned aerial vehicle in complex 3-D space with dense U-type obstacles
Jin et al. Inverse reinforcement learning via deep gaussian process
Chen et al. Transformer-based imitative reinforcement learning for multi-robot path planning
Sui et al. Path planning of multiagent constrained formation through deep reinforcement learning
CN116841317A (en) Unmanned aerial vehicle cluster collaborative countermeasure method based on graph attention reinforcement learning
Tian et al. The application of path planning algorithm based on deep reinforcement learning for mobile robots
Jin et al. WOA-AGA algorithm design for robot path planning
Li et al. Improving fast adaptation for newcomers in multi-robot reinforcement learning system
Li et al. Improved genetic algorithm for multi-agent task allocation with time windows
Kermani et al. Flight path planning using GA and fuzzy logic considering communication constraints
Chai et al. Mobile robot path planning in 2d space: A survey
CN110598835B (en) Automatic path-finding method for trolley based on Gaussian variation genetic algorithm optimization neural network
Araújo et al. Cooperative observation of malicious targets in a 3d urban traffic environment using uavs

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination