CN115047878A - DM-DQN-based mobile robot path planning method - Google Patents

DM-DQN-based mobile robot path planning method Download PDF

Info

Publication number
CN115047878A
CN115047878A (application CN202210673628.2A)
Authority
CN
China
Prior art keywords
dqn
function
reward function
path planning
mobile robot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210673628.2A
Other languages
Chinese (zh)
Inventor
顾玉宛
朱智涛
吕继东
石林
徐守坤
刘铭雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou University
Original Assignee
Changzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changzhou University filed Critical Changzhou University
Priority to CN202210673628.2A priority Critical patent/CN115047878A/en
Publication of CN115047878A publication Critical patent/CN115047878A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0231Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means
    • G05D1/0238Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using obstacle or wall sensors
    • G05D1/024Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using obstacle or wall sensors in combination with a laser
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0223Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving speed control of the vehicle
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0257Control of position or course in two dimensions specially adapted to land vehicles using a radar
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0276Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Optics & Photonics (AREA)
  • Electromagnetism (AREA)
  • Feedback Control In General (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention relates to the technical field of DQN algorithms, and in particular to a DM-DQN-based mobile robot path planning method, which comprises the steps of: establishing a DM-DQN-based mobile robot path planning model; designing the state space, action space, DM-DQN network model and reward function of the DM-DQN algorithm; and training the DM-DQN algorithm, obtaining empirical reward values, and completing collision-free path planning for the robot. The invention introduces a competitive (dueling) network structure that decomposes the network into a value function and an advantage function and decouples action selection from action evaluation, so that judging a state no longer depends entirely on the values of its actions and an independent value prediction can be made, which solves the problem of slow convergence; and by designing a reward function based on the artificial potential field, the problem of the robot moving too close to the edges of obstacles is solved.

Description

DM-DQN-based mobile robot path planning method
Technical Field
The invention relates to the technical field of DQN (Deep Q-Network) algorithms, and in particular to a DM-DQN-based mobile robot path planning method.
Background
With the development of artificial intelligence, the robot industry is moving toward autonomous learning and autonomous exploration. Path planning is a core problem in mobile robot motion: its goal is to find an optimal or near-optimal collision-free path from a start point to an end point. As technology advances, the environments faced by robots become increasingly complex, and in an unknown environment the complete environment information cannot be obtained, so traditional path planning algorithms, such as the artificial potential field algorithm, ant colony algorithm, genetic algorithm and particle swarm algorithm, can no longer meet requirements. Deep reinforcement learning addresses this situation by combining deep learning with reinforcement learning: deep learning extracts features from the unknown input environment state through a neural network and fits the mapping from the environment state to an action-value function, while reinforcement learning makes decisions according to the output of the deep neural network and an exploration strategy, realizing the mapping from states to actions. This combination overcomes the curse of dimensionality in the state-to-action mapping and better meets the motion requirements of robots in complex environments.
Disclosure of Invention
Aiming at the shortcomings of existing algorithms, the invention introduces a competitive (dueling) network structure that decomposes the network into a value function and an advantage function, thereby decoupling action selection from action evaluation; judging a state no longer depends entirely on the values of its actions, an independent value prediction can be made, and the problem of slow convergence is solved. In addition, by designing a reward function based on the artificial potential field, the problem of the robot moving too close to the edges of obstacles is solved.
The technical scheme adopted by the invention is as follows: a DM-DQN-based mobile robot path planning method comprises the following steps:
step one, establishing a mobile robot path planning model based on DM-DQN;
step two, designing a state space, an action space, a DM-DQN network model and a reward function of the DM-DQN algorithm;
further, the structure of the DM-DQN network model is divided into a cost function V (s, ω, α) and a merit function a (s, a, ω, β), and the output of the DM-DQN network model is represented as:
Q(s,a,ω,α,β)=V(s,ω,α)+A(s,a,ω,β) (4)
where s represents the state, a represents the motion, ω is a parameter common to V and a, α and β are parameters of V and a, respectively, the value of V can be regarded as the average of the Q values in the state of s, the value of a is a limit with the average being 0, and the sum of the value of V and the value of a is the original Q value.
Further, the merit function is centralized, and the output of the DM-DQN network model is represented as:
Figure BDA0003690531640000021
where s denotes the state, a denotes the action, a' denotes the next action, a is an alternative action, ω is a common parameter for V and A, and α and β are parameters for V and A, respectively.
Further, the reward function is divided into a position reward function and a direction reward function, and a total reward function is calculated according to the position reward function and the direction reward function.
Further, in the position reward function, a target-guided reward function is first constructed using the attractive (gravitational) potential field function:
[Formula: target-guided reward, a function of ζ and d_goal]
where ζ denotes the attractive reward function constant and d_goal denotes the distance between the current position and the target point;
secondly, an obstacle avoidance reward function is constructed using the repulsive potential field function; this reward is negative and becomes more negative as the distance between the robot and the obstacle decreases:
[Formula: obstacle avoidance reward, a function of η, d_obs and d_max]
where η denotes the repulsive reward function constant, d_obs denotes the distance between the current position and the obstacle, and d_max denotes the maximum influence distance of the obstacle.
Further, the direction reward function is expressed in terms of the angle difference between the robot's expected direction and actual direction, the angle difference being given by:
θ = arccos((F_q · F_a)/(|F_q|·|F_a|))
where F_q denotes the expected direction, F_a denotes the actual direction, and θ denotes the angle between the expected direction and the actual direction;
the direction reward function may then be expressed as:
[Formula: direction reward, a function of the angle θ]
further, the overall reward function of the mobile robot is expressed as:
[Formula: total reward function, combining the position and direction rewards, with arrival and collision cases defined by r_goal and r_obs]
where r_goal denotes the radius of the target area centered on the target point and r_obs denotes the radius of the collision area centered on the obstacle;
and step three, training the DM-DQN algorithm, obtaining an experience reward value, and completing the collision-free path planning of the robot.
The invention has the beneficial effects that:
1. By introducing a competitive network structure, the network is decomposed into a value function and an advantage function, so that action selection and action evaluation are decoupled; judging a state no longer depends entirely on the values of its actions, independent value prediction can be performed, the problem of slow convergence is solved, and the network has better generalization performance.
2. By designing the reward function based on the artificial potential field, the problem of the robot moving too close to the edges of obstacles is solved; learning efficiency in a dynamic unknown environment is higher, convergence is faster, and a collision-free path that keeps away from obstacles can be planned.
Drawings
Fig. 1 is a diagram of a DM-DQN network architecture of the present invention;
FIGS. 2(a) and (b) are a static environment diagram and a dynamic and static environment diagram, respectively, according to the present invention;
FIGS. 3(a), (b) are plots of reward values for the static and dynamic environments of the DM-DQN algorithm of the present invention;
fig. 4(a) and (b) are a static environment generation path diagram and a dynamic and static environment generation path diagram according to the present invention.
Detailed Description
The invention will be further described below with reference to the accompanying drawings and embodiments; the drawings are simplified schematic diagrams that illustrate only the basic structure of the invention, so only the structures related to the invention are shown.
Aiming at the problem of the slow convergence speed of M-DQN, the method is improved by introducing a competitive network structure that decomposes the network into a value function and an advantage function; and aiming at the problem that the motion trajectory of the robot passes too close to the edges of obstacles, a reward function based on the artificial potential field method is designed so that the motion trajectory of the robot keeps away from the surroundings of obstacles.
As shown in fig. 1, a DM-DQN-based mobile robot path planning method includes the following steps:
step one, establishing a DM-DQN-based mobile robot path planning model, and describing a mobile robot path planning problem as a Markov decision process;
first, the Q values are estimated by an online Q-network with weights θ, and every C steps the weights θ are copied to the target network with weights θ⁻;
secondly, the robot interacts with the environment using an ε-greedy strategy and obtains a reward and the next state according to the designed reward function based on the artificial potential field; each transition (s_t, a_t, r_t, s_{t+1}) is stored in a fixed-size first-in-first-out replay buffer D, and every F steps DM-DQN randomly samples a mini-batch B_t from the replay buffer D and regresses toward the following target, minimizing the loss:
q̂ = r_t + ατ·ln π_θ⁻(a_t|s_t) + γ·∑_{a′∈𝒜} π_θ⁻(a′|s_{t+1})·[Q_θ⁻(s_{t+1},a′) − τ·ln π_θ⁻(a′|s_{t+1})]
where s denotes the state, a denotes the action, r denotes the reward value, and γ denotes the discount factor; the policy π_θ⁻ satisfies π_θ⁻(·|s) = softmax(Q_θ⁻(s,·)/τ), τ is a hyperparameter controlling the weight of the entropy term, a′ denotes an alternative action at time t+1, and α is a hyperparameter set to 1.
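A minimal sketch of computing this regression target is shown below, assuming the standard Munchausen-DQN form with a softmax policy derived from the target network; the clipping bound l0, the default value of τ and the tensor layout are illustrative assumptions, not values from the patent.

import torch
import torch.nn.functional as F

def munchausen_target(q_target_net, batch, gamma=0.99, tau=0.03, alpha=1.0, l0=-1.0):
    """Compute the regression target q_hat for a mini-batch of transitions."""
    s, a, r, s_next, done = batch  # states, action indices, rewards, next states, done flags
    with torch.no_grad():
        # softmax policy pi(.|s) = softmax(Q(s,.) / tau) from the target network
        q_s = q_target_net(s)
        log_pi_s = F.log_softmax(q_s / tau, dim=1)
        # Munchausen term: alpha * tau * ln pi(a_t | s_t), clipped from below at l0
        m_term = alpha * torch.clamp(
            tau * log_pi_s.gather(1, a.unsqueeze(1)).squeeze(1), min=l0)

        q_next = q_target_net(s_next)
        pi_next = F.softmax(q_next / tau, dim=1)
        log_pi_next = F.log_softmax(q_next / tau, dim=1)
        # soft value of the next state: sum_a' pi(a'|s') * [Q(s',a') - tau * ln pi(a'|s')]
        soft_v_next = (pi_next * (q_next - tau * log_pi_next)).sum(dim=1)

        return r + m_term + gamma * (1.0 - done) * soft_v_next

The online network is then trained to minimize the difference between Q_θ(s_t, a_t) and this target, for example with a mean-squared or Huber loss.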
step two, designing the state space, action space, DM-DQN network model and reward function of the DM-DQN algorithm;
the state space includes: the laser radar data, the current control command of the mobile robot, the control command of the mobile robot at the previous moment, and the direction and distance of the target point;
the action space includes: the angular velocity and linear velocity of the mobile robot;
the action space of the robot is discretized into 5 actions with a fixed linear velocity v of 0.15 m/s, and each action is assigned an angular velocity value; outputting an angular velocity as the control quantity, rather than directly giving a steering angle, better matches the kinematic characteristics of the mobile robot. The angular velocity is given according to the following formula:
[Formula (2): angular velocity as a function of the action index and the maximum angular velocity]
where action_size indicates that the action space is discretized into 5 actions, action takes the values 0 to 4, and max_angular_vel, the maximum steering angular velocity of the robot, is 1.5 rad/s; the 5 actions calculated according to formula (2) are listed in formula (3), where the linear velocity v is in m/s and the angular velocity ω is in rad/s; a sketch of this action mapping follows formula (3).
[Formula (3): the table of the 5 discrete (v, ω) actions]
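A minimal sketch of the action-index-to-velocity mapping described above; since formulas (2) and (3) are reproduced as images in the original, the uniform symmetric spacing of the angular velocities used here is an assumption consistent with the stated 5 actions, the fixed linear velocity of 0.15 m/s and the maximum angular velocity of 1.5 rad/s.

ACTION_SIZE = 5            # the action space is discretized into 5 actions
LINEAR_VEL = 0.15          # fixed linear velocity v in m/s
MAX_ANGULAR_VEL = 1.5      # maximum steering angular velocity in rad/s

def action_to_velocity(action: int):
    """Map an action index in {0, ..., 4} to a (v, omega) command."""
    assert 0 <= action < ACTION_SIZE
    # spread the angular velocities symmetrically over [-max, +max] (assumed spacing)
    half = (ACTION_SIZE - 1) / 2
    omega = MAX_ANGULAR_VEL * (action - half) / half
    return LINEAR_VEL, omega

# Under this assumption, actions 0..4 map to omega = -1.5, -0.75, 0.0, 0.75, 1.5 rad/s.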
Further, the DM-DQN network model divides the network structure into two parts, as shown in fig. 1: the first part is related only to the state s and is called the value function, denoted V(s, ω, α); the other part is related to both the state s and the action a and is called the advantage function, denoted A(s, a, ω, β). The output of the network can therefore be expressed as:
Q(s,a,ω,α,β)=V(s,ω,α)+A(s,a,ω,β) (4)
wherein s denotes the state, a denotes the action, ω is the parameter shared by V and A, and α and β are the parameters of V and A respectively; the value of V can be regarded as the average of the Q values in state s, the value of A is constrained to have zero mean, and the sum of the value of V and the value of A is the original Q value;
since the values of A are constrained to sum to 0, the network preferentially updates the value of V; because V is the average of the Q values, adjusting it is equivalent to updating all the Q values in that state at once, so the network does not merely update the Q value of a single action but adjusts the Q values of all actions in the state at one time.
Further, in robot path planning, the value function mainly learns the situation in which the robot detects no obstacle, while the advantage function mainly learns the situation in which the robot detects an obstacle; to solve the identifiability problem, the advantage function is centralized:
Q(s,a,ω,α,β)=V(s,ω,α)+(A(s,a,ω,β)−(1/|𝒜|)·∑_{a′∈𝒜}A(s,a′,ω,β))
where s denotes the state, a denotes the action, a′ denotes an alternative action in the action set 𝒜, ω is the parameter shared by V and A, and α and β are the parameters of V and A respectively.
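A minimal sketch of this dueling (competition) network structure, assuming a PyTorch implementation; the hidden-layer sizes and the state dimension are illustrative assumptions, while the zero-mean (centralized) combination of V and A follows the formula above.

import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    """Q-network split into a value stream V(s) and an advantage stream A(s, a)."""

    def __init__(self, state_dim: int = 28, action_dim: int = 5, hidden: int = 64):
        super().__init__()
        # shared feature layers (the parameters omega common to V and A)
        self.feature = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.value = nn.Linear(hidden, 1)               # V(s; omega, alpha)
        self.advantage = nn.Linear(hidden, action_dim)  # A(s, a; omega, beta)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.feature(state)
        v = self.value(h)                               # shape (batch, 1)
        a = self.advantage(h)                           # shape (batch, action_dim)
        # centralized combination: Q = V + (A - mean over actions of A)
        return v + a - a.mean(dim=1, keepdim=True)

Subtracting the mean of the advantage stream enforces the zero-mean constraint on A, so the value stream carries the shared part of the Q values as described above.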
Further, the reward function is designed according to the artificial potential field method and is decomposed into two parts: the first part is the position reward function, which comprises a target-guided reward function and an obstacle avoidance reward function; the target-guided reward function guides the robot to reach the target point quickly, and the obstacle avoidance reward function keeps the robot at a certain distance from obstacles;
the second part is the direction reward function: the current heading of the robot plays a key role in reasonable navigation, and since the direction of the resultant force acting on the robot in the artificial potential field fits the robot's direction of motion well, a direction reward function is designed to guide the robot to move toward the target point.
Further, in the position reward function, a target-guided reward function is first constructed using the attractive (gravitational) potential field function:
[Formula: target-guided reward, a function of ζ and d_goal]
where ζ denotes the attractive reward function constant and d_goal denotes the distance between the current position and the target point;
secondly, an obstacle avoidance reward function is constructed using the repulsive potential field function; this reward is negative and becomes more negative as the distance between the robot and the obstacle decreases:
[Formula: obstacle avoidance reward, a function of η, d_obs and d_max]
where η denotes the repulsive reward function constant, d_obs denotes the distance between the current position and the obstacle, and d_max denotes the maximum influence distance of the obstacle.
Further, in the direction reward function, the angle difference between the expected direction and the actual direction of the robot is expressed as:
θ = arccos((F_q · F_a)/(|F_q|·|F_a|))
where F_q denotes the expected direction, F_a denotes the actual direction, and θ denotes the angle between the expected direction and the actual direction;
thus, the direction reward function may be expressed as:
[Formula: direction reward, a function of the angle θ]
further, the overall reward function may be expressed as:
Figure BDA0003690531640000084
the overall reward function for a mobile robot is expressed as:
Figure BDA0003690531640000085
wherein r is goal Representing the radius of the target area, r, centered on the target point obs Indicating the radius of the impact zone centered on the obstacle.
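A hedged sketch of an artificial-potential-field-style reward of this kind; because the formulas above are reproduced as images in the original, the attractive, repulsive and direction terms below use standard potential-field forms, and all constants (ζ, η, d_max, the radii and the arrival/collision rewards) are illustrative assumptions rather than values from the patent.

import math

ZETA, ETA = 1.0, 1.0          # attractive / repulsive reward constants (assumed)
D_MAX = 1.0                   # maximum influence distance of an obstacle in m (assumed)
R_GOAL, R_OBS = 0.2, 0.2      # target-area and collision-area radii in m (assumed)

def total_reward(d_goal, d_obs, f_expected, f_actual):
    """Combine position (attraction + repulsion) and direction rewards."""
    if d_goal < R_GOAL:
        return 100.0                      # reached the target area (assumed bonus)
    if d_obs < R_OBS:
        return -100.0                     # entered the collision area (assumed penalty)
    # target-guided term: more negative the farther the robot is from the goal
    r_att = -ZETA * d_goal
    # obstacle-avoidance term: negative, growing in magnitude as the obstacle gets closer
    r_rep = -0.5 * ETA * (1.0 / d_obs - 1.0 / D_MAX) ** 2 if d_obs <= D_MAX else 0.0
    # direction term: penalize the angle between the expected and actual headings
    dot = f_expected[0] * f_actual[0] + f_expected[1] * f_actual[1]
    norm = math.hypot(*f_expected) * math.hypot(*f_actual)
    theta = math.acos(max(-1.0, min(1.0, dot / norm)))
    return r_att + r_rep - theta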
Designing a simulation environment, letting the mobile robot interact with the environment to acquire training data, and sampling the training data to perform simulation training of the mobile robot and complete collision-free path planning;
and step three, training the DM-DQN algorithm to obtain an experience reward value, and completing the collision-free path planning of the robot.
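A minimal sketch of the training loop in step three, assuming a Gym-style interface (reset / step / sample_action) wrapping the simulation environment and reusing the network and target sketches above; the interface names, buffer size and hyperparameters are assumptions for illustration.

import random
from collections import deque

import torch
import torch.nn.functional as F

def train(env, online_net, target_net, optimizer, compute_target,
          episodes=320, buffer_size=100_000, batch_size=64,
          copy_every=2000, epsilon=0.1):
    replay = deque(maxlen=buffer_size)        # fixed-size FIFO replay buffer
    step = 0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection from the online network
            if random.random() < epsilon:
                action = env.sample_action()
            else:
                with torch.no_grad():
                    q = online_net(torch.as_tensor(state, dtype=torch.float32).unsqueeze(0))
                    action = int(q.argmax(dim=1))
            next_state, reward, done = env.step(action)   # reward from the APF-based reward function
            replay.append((state, action, reward, next_state, float(done)))
            state = next_state
            step += 1
            if len(replay) >= batch_size:
                s, a, r, s_next, d = map(
                    lambda x: torch.as_tensor(x, dtype=torch.float32),
                    zip(*random.sample(replay, batch_size)))
                target = compute_target(target_net, (s, a.long(), r, s_next, d))
                q_sa = online_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
                loss = F.smooth_l1_loss(q_sa, target)     # regress toward the target
                optimizer.zero_grad(); loss.backward(); optimizer.step()
            if step % copy_every == 0:
                target_net.load_state_dict(online_net.state_dict())

Here compute_target could be the munchausen_target sketch given earlier, online_net and target_net instances of the dueling network sketch, and ε can be annealed over training.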
The specific experimental steps are as follows:
a virtual simulation environment is created through a Gazebo simulator, and a robot model is established through the Gazebo to realize a path planning task, wherein the simulation environment comprises a static environment and a dynamic and static environment as shown in FIG. 2, FIG. 2(a) is the static environment, and FIG. 2(b) is the dynamic environment;
and implementing a path planning algorithm by adopting a python language and calling a built-in Gazebo simulator to control the motion of the robot and acquire the perception information of the robot.
The DM-DQN algorithm obtains empirical reward values through 320 rounds of simulation training, as shown in FIG. 3: FIGS. 3(a) and (b) record, for the static environment and the dynamic-static environment respectively, the cumulative reward of each round and the average reward of the agent, where each point represents one round and the black curve represents the average reward. The results indicate that DM-DQN, by adopting a competitive network structure that decouples action selection from action evaluation, has a faster learning rate, so the experience gathered during early exploration of the environment can be used more fully and a larger reward is obtained.
Seven waypoints are designated for the robot to navigate: starting from position No. 1 in the unknown environment, the robot autonomously reaches positions No. 2 to No. 7 in sequence without collision and then returns to position No. 1, realizing collision-free path planning, as shown in FIG. 4.
As shown in Table 1, the DM-DQN algorithm of the present invention is compared with existing algorithms under the same training conditions in terms of the average number of movement steps to the target point and the number of successful arrivals at the target point over 300 rounds. The table shows that DM-DQN requires the fewest movement steps on average, and its number of successes is 50% higher than that of the DQN algorithm, 23.6% higher than that of Dueling DQN, and 19.3% higher than that of M-DQN.
TABLE 1
[Table 1: average number of movement steps and number of successful arrivals over 300 rounds for DQN, Dueling DQN, M-DQN and DM-DQN]
In light of the foregoing description of the preferred embodiment of the present invention, many modifications and variations will be apparent to those skilled in the art without departing from the spirit and scope of the invention. The technical scope of the present invention is not limited to the content of the specification, and must be determined according to the scope of the claims.

Claims (7)

1. A DM-DQN-based mobile robot path planning method is characterized by comprising the following steps:
step one, establishing a mobile robot path planning model based on DM-DQN;
designing a state space, an action space, a DM-DQN network model and a reward function of the DM-DQN algorithm;
and step three, training the DM-DQN algorithm to obtain an experience reward value, and completing the collision-free path planning of the robot.
2. The DM-DQN-based mobile robot path planning method according to claim 1, wherein the structure of the DM-DQN network model is divided into a value function V(s, ω, α) and an advantage function A(s, a, ω, β), and the output of the DM-DQN network model is expressed as:
Q(s,a,ω,α,β)=V(s,ω,α)+A(s,a,ω,β) (4)
where s denotes the state, a denotes the action, ω is the parameter shared by V and A, α and β are the parameters of V and A respectively, and the value of V is the average of the Q values in state s.
3. The DM-DQN-based mobile robot path planning method according to claim 2, wherein the advantage function is centralized and the output of the DM-DQN network model is expressed as:
Q(s,a,ω,α,β)=V(s,ω,α)+(A(s,a,ω,β)−(1/|𝒜|)·∑_{a′∈𝒜}A(s,a′,ω,β))
where s denotes the state, a denotes the action, a′ denotes an alternative action in the action set 𝒜, ω is the parameter shared by V and A, and α and β are the parameters of V and A respectively.
4. The DM-DQN based mobile robot path planning method of claim 1, wherein: the reward function is divided into a position reward function and a direction reward function, and a total reward function is calculated according to the position reward function and the direction reward function.
5. The DM-DQN-based mobile robot path planning method according to claim 4, wherein in the position reward function, a target-guided reward function is first constructed using the attractive potential field function:
[Formula: target-guided reward, a function of ζ and d_goal]
where ζ denotes the attractive reward function constant and d_goal denotes the distance between the current position and the target point;
secondly, an obstacle avoidance reward function is constructed using the repulsive potential field function:
[Formula: obstacle avoidance reward, a function of η, d_obs and d_max]
where η denotes the repulsive reward function constant, d_obs denotes the distance between the current position and the obstacle, and d_max denotes the maximum influence distance of the obstacle.
6. The DM-DQN-based mobile robot path planning method according to claim 4, wherein the direction reward function is expressed in terms of the angle difference between the robot's expected direction and actual direction, the angle difference being given by:
θ = arccos((F_q · F_a)/(|F_q|·|F_a|))
where F_q denotes the expected direction, F_a denotes the actual direction, and θ denotes the angle between the expected direction and the actual direction;
the direction reward function is expressed as:
[Formula: direction reward, a function of the angle θ]
7. The DM-DQN-based mobile robot path planning method according to claim 4, wherein the total reward function is expressed as:
[Formula: total reward function, with arrival and collision cases defined by r_goal and r_obs]
where r_goal denotes the radius of the target area centered on the target point and r_obs denotes the radius of the collision area centered on the obstacle.
CN202210673628.2A 2022-06-13 2022-06-13 DM-DQN-based mobile robot path planning method Pending CN115047878A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210673628.2A CN115047878A (en) 2022-06-13 2022-06-13 DM-DQN-based mobile robot path planning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210673628.2A CN115047878A (en) 2022-06-13 2022-06-13 DM-DQN-based mobile robot path planning method

Publications (1)

Publication Number Publication Date
CN115047878A true CN115047878A (en) 2022-09-13

Family

ID=83161444

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210673628.2A Pending CN115047878A (en) 2022-06-13 2022-06-13 DM-DQN-based mobile robot path planning method

Country Status (1)

Country Link
CN (1) CN115047878A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116382304A (en) * 2023-05-26 2023-07-04 国网江苏省电力有限公司南京供电分公司 DQN model-based multi-inspection robot collaborative path planning method and system
CN116382304B (en) * 2023-05-26 2023-09-15 国网江苏省电力有限公司南京供电分公司 DQN model-based multi-inspection robot collaborative path planning method and system
CN116527567A (en) * 2023-06-30 2023-08-01 南京信息工程大学 Intelligent network path optimization method and system based on deep reinforcement learning
CN116527567B (en) * 2023-06-30 2023-09-12 南京信息工程大学 Intelligent network path optimization method and system based on deep reinforcement learning
CN117474295A (en) * 2023-12-26 2024-01-30 长春工业大学 Multi-AGV load balancing and task scheduling method based on Dueling DQN algorithm
CN117474295B (en) * 2023-12-26 2024-04-26 长春工业大学 Dueling DQN algorithm-based multi-AGV load balancing and task scheduling method

Similar Documents

Publication Publication Date Title
CN115047878A (en) DM-DQN-based mobile robot path planning method
Jiang et al. Path planning for intelligent robots based on deep Q-learning with experience replay and heuristic knowledge
Qiu et al. A multi-objective pigeon-inspired optimization approach to UAV distributed flocking among obstacles
CN112677995B (en) Vehicle track planning method and device, storage medium and equipment
CN109144102B (en) Unmanned aerial vehicle route planning method based on improved bat algorithm
US7765029B2 (en) Hybrid control device
CN111930121B (en) Mixed path planning method for indoor mobile robot
CN112433525A (en) Mobile robot navigation method based on simulation learning and deep reinforcement learning
CN112731916A (en) Global dynamic path planning method integrating skip point search method and dynamic window method
CN114489059A (en) Mobile robot path planning method based on D3QN-PER
CN111506063B (en) Mobile robot map-free navigation method based on layered reinforcement learning framework
Cai et al. A PSO-based approach with fuzzy obstacle avoidance for cooperative multi-robots in unknown environments
Wan et al. ME‐MADDPG: An efficient learning‐based motion planning method for multiple agents in complex environments
CN113759901A (en) Mobile robot autonomous obstacle avoidance method based on deep reinforcement learning
Chang et al. Interpretable fuzzy logic control for multirobot coordination in a cluttered environment
CN116360457A (en) Path planning method based on self-adaptive grid and improved A-DWA fusion algorithm
Sundarraj et al. Route planning for an autonomous robotic vehicle employing a weight-controlled particle swarm-optimized Dijkstra algorithm
CN117434950A (en) Mobile robot dynamic path planning method based on Harris eagle heuristic hybrid algorithm
Raiesdana A hybrid method for industrial robot navigation
Smit et al. Informed sampling-based trajectory planner for automated driving in dynamic urban environments
Feng et al. A hybrid motion planning algorithm for multi-robot formation in a dynamic environment
CN115542921A (en) Autonomous path planning method for multiple robots
Yung et al. Avoidance of moving obstacles through behavior fusion and motion prediction
CN114740873A (en) Path planning method of autonomous underwater robot based on multi-target improved particle swarm algorithm
CN114545971A (en) Multi-agent distributed flyable path planning method, system, computer equipment and medium under communication constraint

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination