CN113759929B - Multi-agent path planning method based on reinforcement learning and model predictive control
- Publication number: CN113759929B
- Application number: CN202111107563.7A
- Authority: CN (China)
- Legal status: Active
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0212—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
- G05D1/0221—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/02—Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]
Abstract
The invention discloses a multi-agent path planning method based on reinforcement learning and model predictive control. For the multi-agent path planning problem, it uses a path planning and tracking method that combines the ESB-MADDPG and MPC algorithms, with the following basic steps: first, the multi-agent system model is simplified into a particle model; then the ESB-MADDPG algorithm is used for path planning; finally, all paths are followed through model predictive control. The method can quickly realize path planning for a multi-agent system and lays a foundation for large-scale multi-agent systems to execute tasks.
Description
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a multi-agent path planning method based on reinforcement learning and model predictive control.
Background
With the development and maturation of artificial intelligence theory and related research technologies, multi-agent systems are being studied and applied ever more widely. A multi-agent system is an autonomous intelligent system that uses interaction behaviors such as information exchange and feedback, excitation and response to coordinate behavior, adapt to a dynamic environment, and finally complete specific tasks together.
Multi-agent systems are an important applied research field of swarm intelligence and one of the important directions for the future development of intelligent systems. Path planning is a research focus for multi-agent systems and concentrates on the globally optimal path of the whole system, such as the shortest total path length or the smallest total energy consumption. Only by planning the most effective path for the whole system can the efficiency and success rate of the multi-agent system in executing tasks be improved.
At present, path planning tasks for multi-agent systems optimize only a single objective; for multi-agent path planning with multi-objective optimization requirements, existing path planning methods can hardly realize path optimization between multiple agents and multiple targets.
Disclosure of Invention
Aiming at the problem that existing path planning methods cannot achieve path optimization between multiple agents and multiple targets, the invention provides a multi-agent path planning method based on reinforcement learning and model predictive control.
The basic design idea of the invention is as follows:
On the basis of the multi-agent deep deterministic policy gradient (MADDPG) algorithm, the idea of an expert system (ESB) and model predictive control (MPC) are added. First, the agents in the multi-agent system are simplified into a particle model; then an expert system is introduced to smooth the path produced by the MADDPG algorithm and to shorten the convergence time; finally, all paths obtained by the ESB-MADDPG algorithm are collected and followed through model predictive control, so that the multi-agent system realizes path planning that meets the multi-objective optimization requirement.
The specific technical scheme of the invention is as follows:
A multi-agent path planning method based on reinforcement learning and model predictive control comprises the following steps:
Step 1: establish a multi-agent system model and acquire its initial state information. The initial state information includes: the number of agents in the multi-agent system model, n; the number of target points, n; the current position coordinate p_i of any agent i under global coordinates; and the position coordinate p_j of each target point j. The position coordinates of the target points are given manually according to the requirements of the multi-agent path planning task; (i, j) ∈ n;
Step 2: convert the multi-agent system model into a particle model;
the particle model includes n particles corresponding to the n agents; the start position coordinates of each particle are the current position coordinates of its corresponding agent, and the termination position coordinates of each particle are the position coordinates of its corresponding target point;
each particle's start coordinate is assigned an observation range, and each particle's termination coordinate is assigned an observable range;
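For illustration, the particle model of steps 1-2 can be held in a small data structure; the field names and the default radii below are assumptions made for this sketch, not values fixed by the method.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Particle:
    start: Tuple[float, float]   # start position: current position p_i of agent i
    goal: Tuple[float, float]    # termination position: coordinate p_j of the target point
    obs_radius: float = 1.0      # observation range around the start coordinate (assumed)
    goal_radius: float = 1.0     # observable range around the termination coordinate (assumed)

def build_particle_model(agent_positions: List[Tuple[float, float]],
                         target_positions: List[Tuple[float, float]]) -> List[Particle]:
    """Step 2: convert the n-agent system model into n particles."""
    return [Particle(start=p_i, goal=p_j)
            for p_i, p_j in zip(agent_positions, target_positions)]
```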
Step 3: carry out path planning using the ESB-MADDPG algorithm;
Step 3.1: solve the reward value r at each moment according to the reward formula;
Step 3.2: from the start and termination position coordinates of any particle i obtained in step 2, obtain particle i's current-time state o_i through the ESB-MADDPG algorithm;
the current-time state o_i consists of particle i's current-time coordinates and the relative positions between particle i's current-time coordinates and the current-time coordinates of the other particles;
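A minimal sketch of how the state o_i of step 3.2 can be assembled, assuming a flat vector of particle i's own coordinates followed by its positions relative to the other particles (the ordering and the sign convention are assumptions):

```python
import numpy as np

def make_observation(i: int, positions) -> np.ndarray:
    """State o_i: particle i's coordinates plus relative positions to all others."""
    p = [np.asarray(q, dtype=float) for q in positions]
    rel = [p_k - p[i] for k, p_k in enumerate(p) if k != i]
    return np.concatenate([p[i]] + rel)

# Usage: three particles in the plane give a state of length 2 + 2*(3-1) = 6.
print(make_observation(0, [(0, 0), (1, 2), (3, 1)]))
```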
Step 3.3: from the action estimation network, obtain particle i's current-time action in the current-time state o_i;
the action consists of particle i's velocities on the x and y axes;
Step 3.4: after particle i selects and executes the current-time action, particle i reaches a new state;
Step 3.5: repeat steps 3.3-3.4 a total of m times, m ≤ 50, to obtain the state results of particle i's path planning at all times; connect the positions of particle i in all the time-state results obtained by this training to obtain particle i's path set;
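The rollout of steps 3.3-3.5 can be sketched as follows; `actor` stands in for the action estimation network, and the time step `dt` and the toy goal-seeking actor in the usage example are assumptions:

```python
import numpy as np

def rollout(positions, i, actor, m=30, dt=0.1):
    """Repeat steps 3.3-3.4 m times (m <= 50) and connect the visited positions."""
    pos = [np.asarray(p, dtype=float) for p in positions]
    path = [pos[i].copy()]
    for _ in range(m):
        rel = [p_k - pos[i] for k, p_k in enumerate(pos) if k != i]
        o_i = np.concatenate([pos[i]] + rel)         # state o_i (step 3.2)
        v = np.asarray(actor(o_i), dtype=float)      # action: velocities on x and y axes (step 3.3)
        pos[i] = pos[i] + v * dt                     # particle i reaches a new state (step 3.4)
        path.append(pos[i].copy())
    return np.stack(path)                            # connected positions form the path (step 3.5)

# Usage with a hypothetical goal-seeking stand-in for the trained network:
goal = np.array([5.0, 5.0])
toy_actor = lambda o: np.clip(goal - o[:2], -1.0, 1.0)
print(rollout([(0.0, 0.0), (2.0, 1.0)], i=0, actor=toy_actor, m=30).shape)  # (31, 2)
```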
Step 3.7: judge the path set obtained by training; the criterion is that, in the final-time state, every particle's observation range is in contact with the corresponding observable range of its target point; if so, the initial path planning is considered complete at this moment, and step 3.9 is executed;
if not, repeat steps 3.1-3.6 a total of M times, M ≥ 100, so as to fill the experience pool D, then execute step 3.8;
Step 3.8: randomly sample a small batch of samples from the experience pool D and calculate the Q value through the state estimation network; the Q value is used to evaluate the quality of the actions output by the action estimation network;
at the same time, input the samples into the action estimation network and update the action estimation network parameters through the policy gradient formula; the updated parameters are loaded into the action estimation network, and the process returns to step 3.1;
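The update of step 3.8 can be sketched in the standard MADDPG form; this is an assumed reconstruction (the patent does not publish its network code, and the ESB modification is omitted here). The critic plays the role of the state estimation network that yields the Q value, and the actor plays the role of the action estimation network:

```python
import torch
import torch.nn.functional as F

def maddpg_style_update(batch, actor, critic, target_actor, target_critic,
                        actor_opt, critic_opt, gamma=0.95):
    """One update on a small batch sampled at random from experience pool D."""
    obs, act, rew, next_obs = batch

    # State estimation network: regress Q toward the bootstrapped target
    # r' + gamma * Q'(o', a'), with a' supplied by the target actor.
    with torch.no_grad():
        target_q = rew + gamma * target_critic(next_obs, target_actor(next_obs))
    critic_loss = F.mse_loss(critic(obs, act), target_q)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Action estimation network: policy-gradient step that raises the Q value
    # the critic assigns to the actor's own actions, averaged over the batch.
    actor_loss = -critic(obs, actor(obs)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    return critic_loss.item(), actor_loss.item()
```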
Step 3.9: smooth the initial path that meets the requirements and output it;
Step 4: track the path using a model predictive control algorithm;
Step 4.1: establish an agent tracking model;
Set particle i as the virtual leader, whose initial-time position is particle i's start position and whose reference trajectory is particle i's smoothed path; set the agent corresponding to particle i as the follower, whose initial-time position is agent i's position in step 1; set the ideal control relationship between the follower and the virtual leader;
l_1 represents the distance between the virtual leader and the follower;
φ_1 represents the orientation deviation between the virtual leader and the follower, and the initial values of both are 0;
Step 4.2: from the respective velocities and angular velocities of the virtual leader and the follower and the distance between them in the global coordinate system, obtain the expression of the control relationship between the virtual leader and the follower, and combine it with the agent kinematics formula to establish the tracking control model;
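A sketch of the ingredients of the tracking model in steps 4.1-4.2, assuming the common unicycle kinematics and the usual l-φ leader-follower relation (the patent's exact expressions are given as images, so the sign conventions here are assumptions). Poses are (x, y, θ) in the global frame:

```python
import numpy as np

def control_relation(leader_pose, follower_pose):
    """Return (l, phi): distance and orientation deviation between leader and follower."""
    xl, yl, _ = leader_pose
    xf, yf, thf = follower_pose
    l = np.hypot(xl - xf, yl - yf)               # distance l_1
    phi = np.arctan2(yl - yf, xl - xf) - thf     # orientation deviation (convention assumed)
    return l, phi

def step_unicycle(pose, v, omega, dt=0.05):
    """Unicycle kinematics: x' = v*cos(theta), y' = v*sin(theta), theta' = omega."""
    x, y, th = pose
    return np.array([x + v * np.cos(th) * dt,
                     y + v * np.sin(th) * dt,
                     th + omega * dt])

# Usage: a follower at the origin facing along x, a leader ahead and to the left.
print(control_relation((1.0, 1.0, 0.0), (0.0, 0.0, 0.0)))
```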
Step 4.3: from the follower's position and initial velocity at the initial time and the tracking control model, predict the control relationship between the follower and the virtual leader at time t;
Step 4.4: compare the relationship predicted in step 4.3 with the ideal relationship set in step 4.1, calculate the error e_t between the two, and correct it;
Step 4.5: optimize and correct the error e_t with the particle swarm algorithm, and calculate the control relationship between the follower and the virtual leader at time t+1 from the speed input at time t+1 (see the sketch after step 4.7);
Step 4.6: judge whether the control termination time has been reached; if so, output the tracking path, otherwise return to step 4.4; the control termination time is the duration of the reference trajectory in step 4.1;
Step 4.7: track the initial paths of the remaining particles according to steps 4.1-4.6, finally completing the path planning of all agents.
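The inner optimization of step 4.5 can be sketched with a small hand-rolled particle swarm over candidate speed inputs (v, ω); the swarm size, bounds, and PSO coefficients below are illustrative assumptions, and `err_fn` stands for the tracking error e_t between the predicted and ideal control relations:

```python
import numpy as np

def pso_speed_input(err_fn, n_swarm=20, iters=15, v_max=1.0, w_max=1.0, seed=0):
    """Search the speed input (v, omega) for time t+1 that minimizes the error."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array([0.0, -w_max]), np.array([v_max, w_max])
    x = rng.uniform(lo, hi, size=(n_swarm, 2))       # candidate (v, omega) inputs
    vel = np.zeros_like(x)
    pbest, pcost = x.copy(), np.array([err_fn(u) for u in x])
    gbest = pbest[np.argmin(pcost)]
    for _ in range(iters):
        r1, r2 = rng.random((2, n_swarm, 1))
        vel = 0.6 * vel + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = np.clip(x + vel, lo, hi)
        cost = np.array([err_fn(u) for u in x])
        better = cost < pcost
        pbest[better], pcost[better] = x[better], cost[better]
        gbest = pbest[np.argmin(pcost)]
    return gbest

# Usage: a toy quadratic error standing in for e_t; the optimum is near (0.5, 0.0).
ideal = np.array([0.5, 0.0])
print(pso_speed_input(lambda u: float(np.sum((u - ideal) ** 2))))
```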
Further, the tracking control model in step 4.2 combines the expression of the relationship between the virtual leader and the follower with the agent kinematics formula;
in these expressions, the quantities involved are the follower's velocity and angular velocity, the virtual leader's velocity and angular velocity, and the follower's speed input.
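For reference, the standard forms consistent with the quantities named above read as follows; this is an assumed reconstruction in the usual l-φ formulation, not the patent's verbatim expressions:

```latex
% Unicycle kinematics of each agent (assumed standard form):
\dot{x} = v\cos\theta, \qquad \dot{y} = v\sin\theta, \qquad \dot{\theta} = \omega

% Leader (L) / follower (F) relation in the l-phi formulation (signs assumed):
l_1 = \sqrt{(x_L - x_F)^2 + (y_L - y_F)^2}, \qquad
\varphi_1 = \operatorname{atan2}(y_L - y_F,\ x_L - x_F) - \theta_F
```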
Further, the state o_i in step 3.2 above is expressed in terms of particle i's current-time coordinates and the relative positions p_ij, where p_ij is solved from the current-time coordinates of particles i and j.
further, in the above step 3.8QThe expression of (a) is:
wherein:in order to be able to use the attenuation factor,a value is awarded for a new time instant obtained at a new time instant.
Further, in step 3.8, the action estimation network parameters are updated through the policy gradient formula, in which S represents the number of sampled samples and the policy gradient method is applied to the parameters it updates.
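For reference, the standard MADDPG forms consistent with the quantities named above (the attenuation factor, the new-time reward, and the S samples) are as follows; this is an assumed reconstruction, not the patent's verbatim formulas:

```latex
% Q-value target with attenuation factor \gamma and new-time reward r' (assumed):
y = r' + \gamma\, Q'\!\left(o', a_1', \ldots, a_n'\right)\Big|_{a_k' = \mu_k'(o_k')}

% Policy gradient averaged over the S sampled transitions (assumed):
\nabla_{\theta_i} J \approx \frac{1}{S} \sum_{s=1}^{S}
  \nabla_{\theta_i} \mu_i\!\left(o_i^{s}\right)\,
  \nabla_{a_i} Q\!\left(o^{s}, a_1^{s}, \ldots, a_n^{s}\right)\Big|_{a_i = \mu_i(o_i^{s})}
```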
The beneficial effects of the invention are:
1. For the multi-agent path planning problem, the invention uses a path planning and tracking algorithm that combines the ESB-MADDPG and MPC algorithms, quickly realizing path planning for the multi-agent system and laying a foundation for large-scale multi-agent systems to execute tasks.
2. By designing the reward value in the ESB-MADDPG algorithm and the neural networks within it, the invention avoids mutual interference between the particles' paths and makes the distance of each path to its target point shortest; by introducing the agent kinematics model through the MPC algorithm, the velocity along the agent's tracked path can be optimized, yielding optimized multi-agent paths.
Drawings
FIG. 1 is a flow chart of a basic implementation of the present invention;
FIG. 2 is a flow chart of path planning using ESB-MADDPG;
FIG. 3 is a flow chart of path tracking based on model predictive control with PSO;
FIG. 4 is a schematic diagram of a particle model;
FIG. 5 is a smoothed multi-agent trajectory graph;
FIG. 6 is a schematic diagram of an agent tracking model;
FIG. 7 shows the agent path tracking errors, where (a)-(f) represent the path tracking error of the agent acting as follower for particles A, B, C, D, E, and F, respectively.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings.
This embodiment provides a multi-agent path planning method based on reinforcement learning and model predictive control. In this embodiment the agents are robots, and the implementation flow, shown in FIG. 1, specifically includes the following steps:
Step 1: establish a multi-agent system model and acquire its initial state information. The initial state information includes: the number of agents in the multi-agent system model, n; the number of target points, n; the current position coordinate p_i of any agent i under global coordinates; and the position coordinate p_j of each target point j. The position coordinates of the target points are given manually according to the requirements of the multi-agent path planning task; (i, j) ∈ n.
Step 2: convert the multi-agent system model into a particle model; the particle model includes n particles corresponding to the n agents; the start position coordinates of each particle are the current position coordinates of its corresponding agent, and the termination position coordinates of each particle are the position coordinates of its corresponding target point;
each particle's start coordinate is assigned an observation range, and each particle's termination coordinate is assigned an observable range. As shown in FIG. 4, in this embodiment there are six agents, i.e. six particles (A, B, C, D, E, F); the white areas are the observation ranges of the six particles' start coordinates, and the black areas are the observable ranges of the six particles' termination coordinates;
Step 3: carry out path planning using the ESB-MADDPG algorithm; the basic flow is shown in FIG. 2;
Step 3.1: solve the reward value r at each moment according to the reward formula;
Step 3.2: from the start and termination position coordinates of particle i obtained in step 2, obtain particle i's current-time state o_i through the ESB-MADDPG algorithm;
the current-time state o_i consists of particle i's current-time coordinates and the relative positions between particle i's current-time coordinates and the current-time coordinates of the other particles;
Step 3.3: from the action estimation network, obtain particle i's current-time action in the current-time state o_i; the action consists of particle i's velocities on the x and y axes;
Step 3.4: after particle i selects and executes the current-time action, particle i reaches a new state;
Step 3.5: repeat steps 3.3-3.4 a total of m times (30 times in this embodiment) to obtain the state results of particle i's path planning at all times; connect the positions of particle i in all the time states obtained by this training to obtain particle i's path set;
Step 3.7: judge the path set obtained by training; the criterion is that, in the final-time state, every particle's observation range is in contact with the corresponding observable range; if so, the initial path planning is considered complete at this moment, and step 3.9 is executed;
if not, repeat steps 3.1-3.6 a total of M times (100 times in this embodiment) so as to fill the experience pool D, then execute step 3.8;
Step 3.8: randomly sample a small batch of samples from the experience pool D and calculate the Q value through the state estimation network; the Q value, whose expression involves the attenuation factor and the reward value obtained at the new time instant, is used to evaluate the quality of the actions output by the action estimation network;
at the same time, input the above samples into the action estimation network and update the network parameters through the policy gradient formula, in which S represents the number of sampled samples and the policy gradient method is applied to the parameters it updates;
Step 3.9: smooth the initial path that meets the requirements and output it. Since the obtained initial path is a non-smooth trajectory and the agents must track it in real time, the trajectory formed by connecting a series of straight-line segments needs to be smoothed; a B-spline curve is adopted for smoothing, and FIG. 5 shows the smoothed multi-agent paths;
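The B-spline smoothing of step 3.9 can be sketched with SciPy's parametric spline fitting; the smoothing factor `s` and the sample count are illustrative assumptions:

```python
import numpy as np
from scipy.interpolate import splprep, splev

def smooth_path(path, n_points=200, s=0.5):
    """Refit a piecewise-linear path as a smooth B-spline trajectory."""
    x, y = np.asarray(path, dtype=float).T
    tck, _ = splprep([x, y], s=s)            # fit a parametric B-spline to the waypoints
    u = np.linspace(0.0, 1.0, n_points)
    xs, ys = splev(u, tck)                   # sample the smoothed trajectory
    return np.column_stack([xs, ys])

# Usage on a short zig-zag path:
print(smooth_path([(0, 0), (1, 0.2), (2, 1.1), (3, 1.0), (4, 2.0)], n_points=5))
```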
Step 4: track the path using a model predictive control algorithm; the flow of the algorithm is shown in FIG. 3;
Step 4.1: establish an agent tracking model, as shown in FIG. 6. Set particle i as the virtual leader, whose initial-time position is particle i's start position and whose reference trajectory is particle i's smoothed path; set the agent corresponding to particle i as the follower, whose initial-time position is agent i's position in step 1; set the ideal control relationship between the follower and the virtual leader;
l_1 represents the distance between the virtual leader and the follower;
φ_1 represents the orientation deviation between the virtual leader and the follower, and the initial values of both are 0;
Step 4.2: from the respective velocities and angular velocities of the virtual leader and the follower and the distance between them in the global coordinate system, obtain the expression of the control relationship between the virtual leader and the follower, and combine it with the agent kinematics formula to establish the tracking control model;
in these expressions, the quantities involved are the follower's velocity and angular velocity, the leader's velocity and angular velocity, and the follower's speed input;
Step 4.3: from the follower's position and initial velocity at the initial time and the tracking control model, predict the follower's output at the next time;
Step 4.4: compare the relationship predicted in step 4.3 with that set in step 4.1, calculate the error e_t between the two, and correct it; FIG. 7(a) shows the path tracking error of the agent corresponding to particle A as follower, and FIGS. 7(b)-(f) show the path tracking errors of the agents corresponding to the remaining five particles (B, C, D, E, F) as followers;
Step 4.5: optimize and correct the error e_t with the particle swarm optimization (PSO) algorithm, calculate the speed input at time t+1, and then calculate the control relationship between the follower and the virtual leader at time t+1;
Step 4.6: judge whether the control termination time has been reached; the control termination time is the duration of the reference trajectory in step 4.1; if so, output the tracking path, otherwise execute step 4.4 again;
Step 4.7: track the initial paths of the other five particles according to steps 4.1-4.6, finally completing the path planning of all six agents.
Claims (5)
1. A multi-agent path planning method based on reinforcement learning and model predictive control, characterized by comprising the following steps:
Step 1: establish a multi-agent system model and acquire its initial state information. The initial state information includes: the number of agents in the multi-agent system model, n; the number of target points, n; the current position coordinate p_i of any agent i under global coordinates; and the position coordinate p_j of each target point j. The position coordinates of the target points are given manually according to the requirements of the multi-agent path planning task; (i, j) ∈ n;
Step 2: convert the multi-agent system model into a particle model;
the particle model includes n particles corresponding to the n agents; the start position coordinates of each particle are the current position coordinates of its corresponding agent, and the termination position coordinates of each particle are the position coordinates of its corresponding target point;
each particle's start coordinate is assigned an observation range, and each particle's termination coordinate is assigned an observable range;
Step 3: carry out path planning using the ESB-MADDPG algorithm;
Step 3.1: solve the reward value r at each moment according to the reward formula;
Step 3.2: from the start and termination position coordinates of any particle i obtained in step 2, obtain particle i's current-time state o_i through the ESB-MADDPG algorithm;
the current-time state o_i consists of particle i's current-time coordinates and the relative positions between particle i's current-time coordinates and the current-time coordinates of the other particles;
Step 3.3: from the action estimation network, obtain particle i's current-time action in the current-time state o_i; the action consists of particle i's velocities on the x and y axes;
Step 3.4: after particle i selects and executes the current-time action, particle i reaches a new state;
Step 3.5: repeat steps 3.3-3.4 a total of m times, m ≤ 50, to obtain the state results of particle i's path planning at all times; connect the positions of particle i in all the time-state results obtained by this training to obtain particle i's path set;
Step 3.7: judge the path set obtained by training; the criterion is that, in the final-time state, every particle's observation range is in contact with the corresponding observable range; if so, the initial path planning is considered complete at this moment, and step 3.9 is executed;
if not, repeat steps 3.1-3.6 a total of M times, M ≥ 100, so as to fill the experience pool D, then execute step 3.8;
Step 3.8: randomly sample a small batch of samples from the experience pool D and calculate the Q value through the state estimation network; the Q value is used to evaluate the quality of the actions output by the action estimation network;
at the same time, input the samples into the action estimation network and update the action estimation network parameters through the policy gradient formula; the updated parameters are loaded into the action estimation network, and the process returns to step 3.1;
Step 3.9: smooth the initial path that meets the requirements and output it;
Step 4: track the path using a model predictive control algorithm;
Step 4.1: establish an agent tracking model;
set particle i as the virtual leader, whose initial-time position is particle i's start position and whose reference trajectory is particle i's smoothed path; set the agent corresponding to particle i as the follower, whose initial-time position is agent i's position in step 1; set the ideal control relationship between the follower and the virtual leader;
l_1 represents the distance between the virtual leader and the follower;
φ_1 represents the orientation deviation between the virtual leader and the follower, and the initial values of both are 0;
Step 4.2: from the respective velocities and angular velocities of the virtual leader and the follower and the distance between them in the global coordinate system, obtain the expression of the control relationship between the virtual leader and the follower, and combine it with the agent kinematics formula to establish the tracking control model;
Step 4.3: from the follower's position and initial velocity at the initial time and the tracking control model, predict the control relationship between the follower and the virtual leader at time t;
Step 4.4: compare the relationship predicted in step 4.3 with the ideal relationship set in step 4.1, calculate the error e_t between the two, and correct it;
Step 4.5: optimize and correct the error e_t with the particle swarm algorithm, and calculate the control relationship between the follower and the virtual leader at time t+1 from the speed input at time t+1;
Step 4.6: judge whether the control termination time has been reached; if so, output the tracking path, otherwise return to step 4.4; the control termination time is the duration of the reference trajectory in step 4.1;
Step 4.7: track the initial paths of the remaining particles according to steps 4.1-4.6, finally completing the path planning of all agents.
2. The reinforcement learning and model predictive control-based multi-agent path planning method of claim 1, characterized in that: the tracking control model in step 4.2 combines the expression of the relationship between the virtual leader and the follower with the agent kinematics formula, in which the quantities involved are the follower's velocity and angular velocity, the virtual leader's velocity and angular velocity, and the follower's speed input.
5. The reinforcement learning and model predictive control-based multi-agent path planning method of claim 1, characterized in that: in step 3.8, the action estimation network parameters are updated through the policy gradient formula.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111107563.7A CN113759929B (en) | 2021-09-22 | 2021-09-22 | Multi-agent path planning method based on reinforcement learning and model predictive control |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113759929A (en) | 2021-12-07
CN113759929B (en) | 2022-08-23
Family
ID=78796675
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111107563.7A Active CN113759929B (en) | 2021-09-22 | 2021-09-22 | Multi-agent path planning method based on reinforcement learning and model predictive control |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113759929B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114857991B * | 2022-05-26 | 2023-06-13 | Xi'an Aerospace Propulsion Institute | Control method and system for automatically tracking shooting direction of target plane in tactical training |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109992000A * | 2019-04-04 | 2019-07-09 | Beihang University | Multi-UAV collaborative path planning method and device based on hierarchical reinforcement learning |
CN110852448A * | 2019-11-15 | 2020-02-28 | Sun Yat-sen University | Cooperative agent learning method based on multi-agent reinforcement learning |
CN112488310A * | 2020-11-11 | 2021-03-12 | Xiamen Yuanting Information Technology Co., Ltd. | Automatic generation method for multi-agent group cooperation strategies |
CN112488359A * | 2020-11-02 | 2021-03-12 | Hangzhou Dianzi University | Multi-agent static multi-target enclosure method based on RRT and OSPA distances |
CN113110509A * | 2021-05-17 | 2021-07-13 | Harbin Institute of Technology (Shenzhen) | Warehousing system multi-robot path planning method based on deep reinforcement learning |
CN113341958A * | 2021-05-21 | 2021-09-03 | Northwestern Polytechnical University | Multi-agent reinforcement learning movement planning method with mixed experience |
CN113392935A * | 2021-07-09 | 2021-09-14 | Zhejiang University of Technology | Multi-agent deep reinforcement learning strategy optimization method based on attention mechanism |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA3060900A1 (en) * | 2018-11-05 | 2020-05-05 | Royal Bank Of Canada | System and method for deep reinforcement learning |
- 2021-09-22: CN application CN202111107563.7A, patent CN113759929B (en), status Active
Non-Patent Citations (1)
Title |
---|
Prediction model for multi-satellite attitude cooperative control based on neural networks; Ning Yu et al.; Aerospace Control and Application (空间控制技术与应用); 2020-04-15 (No. 02); full text *
Also Published As
Publication number | Publication date |
---|---|
CN113759929A (en) | 2021-12-07 |
Legal Events
Date | Code | Title | Description
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |