CN115797394B - Multi-agent coverage method based on reinforcement learning - Google Patents

Multi-agent coverage method based on reinforcement learning Download PDF

Info

Publication number
CN115797394B
CN115797394B (application CN202211432494.1A)
Authority
CN
China
Prior art keywords
agent
coverage
mobile
area
agents
Prior art date
Legal status
Active
Application number
CN202211432494.1A
Other languages
Chinese (zh)
Other versions
CN115797394A (en)
Inventor
孙新苗
任明里
丁大伟
任莹莹
王恒
Current Assignee
University of Science and Technology Beijing USTB
Original Assignee
University of Science and Technology Beijing USTB
Priority date
Filing date
Publication date
Application filed by University of Science and Technology Beijing USTB filed Critical University of Science and Technology Beijing USTB
Priority to CN202211432494.1A priority Critical patent/CN115797394B/en
Publication of CN115797394A publication Critical patent/CN115797394A/en
Application granted granted Critical
Publication of CN115797394B publication Critical patent/CN115797394B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 — Reducing energy consumption in communication networks
    • Y02D30/70 — Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Feedback Control In General (AREA)

Abstract

The invention discloses a multi-agent coverage method based on reinforcement learning, which comprises the following steps: determining the positions of a plurality of stationary agents in an area with the goal of maximizing coverage performance, and dividing the area into an effective coverage area and an ineffective coverage area according to the positions of the stationary agents; calculating the maximum coverage performance obtainable by the mobile agents; setting the observations and actions of the mobile agents, and setting the rewards of the mobile agents based on the maximum coverage performance they can obtain; each mobile agent aims to maximize its own reward function and, based on a reinforcement learning algorithm, the mobile agents interact with the environment simultaneously and are trained in a distributed manner, yielding a motion plan for each mobile agent and achieving coverage of the ineffective coverage area. The technical scheme of the invention enables multiple agents to cooperatively achieve effective coverage of an area and improves the coverage performance of the area.

Description

Multi-agent coverage method based on reinforcement learning
Technical Field
The invention relates to the technical field of multi-agent system coverage optimization, in particular to a multi-agent coverage method based on reinforcement learning.
Background
With the rapid development of computing, MEMS, robotics and communication technologies, multi-agent systems are receiving increasing attention and are being applied to many fields, including area coverage. Multi-agent area coverage means that a team of agents effectively covers an entire area through a cooperative strategy. Having multiple agents cooperatively perform the coverage task completes the target task more efficiently, overcomes the limits on the number and angle of sensors carried by a single agent, and provides redundancy. At present, although existing schemes can solve the problem of full coverage of an area by multiple agents, they cannot improve coverage performance while achieving effective coverage.
Disclosure of Invention
The invention provides a multi-agent coverage method based on reinforcement learning, which is used to rapidly achieve effective coverage of an area and to improve the coverage performance of the area.
In order to solve the technical problems, the invention provides the following technical scheme:
In one aspect, the present invention provides a reinforcement-learning-based multi-agent coverage method, wherein the multi-agent system comprises a plurality of stationary agents and a plurality of mobile agents, the multi-agent coverage method comprising:
determining the positions of a plurality of stationary agents in the area with the goal of maximizing coverage performance, and dividing the area into an effective coverage area and an ineffective coverage area according to the positions of the stationary agents;
calculating the maximum coverage performance obtainable by the mobile agents;
setting the observations and actions of each mobile agent with respect to the environment, and setting the rewards of the mobile agents based on the maximum coverage performance they can obtain; each mobile agent aims to maximize its own reward function, and, based on a reinforcement learning algorithm, the mobile agents interact with the environment simultaneously and are trained in a distributed manner, yielding a motion plan for each mobile agent and achieving coverage of the ineffective coverage area.
Further, determining the locations of the plurality of stationary agents in the area with the goal of maximizing coverage performance includes:
adjusting the positions of the plurality of stationary agents in the area so that the coverage performance is as large as possible.
Further, the calculation function H (S) of the coverage performance is as follows:
H(S)=∫R(x)P(x,S)dx
wherein P(x, S) is the joint detection probability of the multi-agent system at point x in the area, p_i(x, s_i) is the detection probability of the i-th agent, N is the number of agents, and R(x) is the event density function.
Further, when the area is divided into an effective coverage area and an ineffective coverage area, whether a point x in the area is effectively covered is judged by whether the joint detection probability P(x, S) of the multi-agent system at x exceeds a preset threshold: when P(x, S) is larger than the preset threshold, x is effectively covered; otherwise, x is not effectively covered.
Further, the mobile agent's observation of the environment consists of three binary images, wherein:
the first binary image shows the area which is not effectively covered at present;
the second binary image shows the position of the current mobile intelligent agent;
the third binary image shows the location of other mobile agents than the current mobile agent.
Further, the action set of the mobile agent is {0, 1, 2, 3, 4}, indicating respectively that the mobile agent stays still, moves up, moves down, moves left, or moves right.
Further, the reward given by the environment to the mobile agent, Reward, is:
Reward = (H_current - H_max)/10 + incres*30
wherein H_current is the coverage performance with the mobile agent at its current position; H_max is the maximum coverage performance obtainable by the mobile agent; incres is the area of effective coverage newly added relative to the previous time step. The first part of the reward represents the gap between the coverage performance of the mobile agent at the current position and the maximum value, and the second part is the effective coverage area newly added since the previous time step.
Further, when the mobile agents interact with the environment simultaneously and are trained in a distributed manner based on the reinforcement learning algorithm, the actor network and critic network of each mobile agent are set to two convolutional layers followed by three fully connected layers: the first convolutional layer has 16 convolution kernels of size 20 x 20, the second convolutional layer has 8 convolution kernels of size 10 x 10, and the three fully connected layers have 256, 128 and 64 channels, respectively.
In yet another aspect, the present invention also provides an electronic device including a processor and a memory; wherein the memory stores at least one instruction that is loaded and executed by the processor to implement the above-described method.
In yet another aspect, the present invention also provides a computer readable storage medium having at least one instruction stored therein, the instruction being loaded and executed by a processor to implement the above method.
The technical scheme provided by the invention has at least the following beneficial effects:
1. The invention enables multiple agents to cooperatively achieve effective coverage of an area.
2. The invention exploits the decision-optimization capability of reinforcement learning and improves the coverage performance of the area while achieving effective coverage. The method is efficient and robust.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments are briefly introduced below. The drawings in the following description are only some embodiments of the present invention; other drawings may be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a flow chart of a reinforcement learning-based multi-agent coverage method provided by an embodiment of the present invention;
FIG. 2 is a schematic illustration of stationary agent location deployment provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of mobile agent and environment interactions provided by an embodiment of the present invention;
FIG. 4 is a graph of the effective coverage ratio as a function of the time step, provided by an embodiment of the present invention;
FIG. 5 is a graph of the coverage performance as a function of the time step, provided by an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
First embodiment
This embodiment provides a multi-agent coverage method based on reinforcement learning. It should first be noted that the multi-agent system in this embodiment includes two types of agents, stationary agents and mobile agents; by controlling the motion of the mobile agents, effective coverage of the area is achieved and the coverage performance of the area is improved.
Based on the above, the execution flow of the method of this embodiment is shown in fig. 1, and includes the following steps:
s1, determining the positions of a plurality of static intelligent agents in an area with the aim of maximizing coverage performance, and dividing the area into an effective coverage area and an ineffective coverage area according to the positions of the static intelligent agents;
It should be noted that, when determining the positions of the stationary agents, the objective is to maximize coverage performance, i.e., to adjust the multi-agent positions S = (s_1, …, s_N) so that the coverage performance function H(S) is as large as possible. The coverage performance of the multi-agent system over the area is the integral over the area of the product of the event density and the detection probability, namely H(S) = ∫R(x)P(x, S)dx, where P(x, S) is the joint detection probability of the multi-agent system S at point x, p_i(x, s_i) is the detection probability of the i-th agent, which is typically a monotonically decreasing function of the distance between x and s_i, N is the number of agents, and R(x) is the event density function.
Specifically, in this embodiment, the positional deployment of the stationary agents is as shown in fig. 2, where the gray region is the area that has already been effectively covered. The criterion for judging whether a point x in the area is effectively covered is whether the joint detection probability P(x, S) at x is larger than a threshold ρ: when P(x, S) > ρ, point x is effectively covered; otherwise, it is not. After the ineffective coverage area is obtained, the goal of the mobile agents is to cover it, i.e., to reach P(x, S) ≥ ρ at some time for every such point, while improving the coverage performance H(S) as much as possible during the motion.
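To make the quantities above concrete, the following Python sketch discretizes the area into grid points and evaluates the coverage performance H(S) and the effective-coverage mask. The Gaussian-decay detection model, the complementary-product form assumed for the joint detection probability P(x, S), and all function names and constants are illustrative assumptions, not the patent's exact definitions.

```python
import numpy as np

def detection_prob(points, agent_pos, decay=0.05):
    # Illustrative p_i(x, s_i): decays monotonically with squared distance (assumed form).
    d2 = np.sum((points - np.asarray(agent_pos)) ** 2, axis=-1)
    return np.exp(-decay * d2)

def joint_detection_prob(points, agent_positions):
    # Assumed joint model: P(x, S) = 1 - prod_i (1 - p_i(x, s_i)).
    miss = np.ones(points.shape[0])
    for s in agent_positions:
        miss *= 1.0 - detection_prob(points, s)
    return 1.0 - miss

def coverage_performance(points, density, agent_positions, cell_area=1.0):
    # H(S) = integral of R(x) * P(x, S) dx, approximated as a sum over grid cells.
    return float(np.sum(density * joint_detection_prob(points, agent_positions)) * cell_area)

def effective_mask(points, agent_positions, rho=0.9):
    # A grid point is effectively covered when P(x, S) > rho.
    return joint_detection_prob(points, agent_positions) > rho
```

For example, `points` can be the centers of the grid cells in fig. 2 and `density` the event density R(x) sampled at those centers.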
S2, calculating the maximum coverage performance obtainable by the mobile agents;
In this embodiment, the maximum coverage performance H_max obtainable by the mobile agents is calculated, i.e., the maximum of the coverage performance H(S) = ∫R(x)P(x, S)dx over the positions of the mobile agents. This value is used in the calculation of the mobile agents' reward function in a later step. When the number of mobile agents is small, H_max can usually be computed with a greedy algorithm: mobile agents are added one at a time, on top of the stationary agents, each at the position that increases coverage performance the most.
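A minimal sketch of that greedy computation, reusing the helper functions from the previous sketch; the finite candidate set and the rule of placing each new agent at the grid point with the largest marginal gain are illustrative choices rather than the patent's exact procedure.

```python
import numpy as np

def greedy_h_max(points, density, static_positions, candidates, n_mobile):
    # Add mobile agents one at a time; each placement maximizes the marginal
    # increase of H(S) over the stationary agents and the agents placed so far.
    placed = []
    for _ in range(n_mobile):
        base = coverage_performance(points, density, static_positions + placed)
        best_gain, best_pos = -np.inf, None
        for c in candidates:
            gain = coverage_performance(points, density, static_positions + placed + [c]) - base
            if gain > best_gain:
                best_gain, best_pos = gain, c
        placed.append(best_pos)
    h_max = coverage_performance(points, density, static_positions + placed)
    return h_max, placed
```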
S3, setting the observations and actions of each mobile agent with respect to the environment, and setting the rewards of the mobile agents based on the maximum coverage performance they can obtain; each mobile agent aims to maximize its own reward function, and, based on a reinforcement learning algorithm, the mobile agents interact with the environment simultaneously and are trained in a distributed manner, yielding a motion plan for each mobile agent and achieving coverage of the ineffective coverage area.
It should be noted that these settings are the preparation for training the mobile agents by reinforcement learning; fig. 3 illustrates, as an example, the interaction between three mobile agents and the environment. Before training, the action set of the mobile agents, the agents' observation of the environment, and the reward function given by the environment to the agents must be defined. The environment is a grid world in which the stationary agents are already deployed. In it, a mobile agent can choose among 5 actions: stay still, move up, move down, move left, or move right; the action set is therefore {0, 1, 2, 3, 4}, corresponding to these actions, and each move spans one grid cell. To cooperatively achieve effective coverage of the area, each agent's observation of the environment is set to three binary images: the first binary image encodes the current effective-coverage state of the area, with effectively covered cells marked 1 and not-yet-covered cells marked 0, so the agent can see which part of the area remains uncovered; the second binary image shows the current position of the observing mobile agent, with its cell marked 1; the third binary image shows the current positions of the other mobile agents, with their cells marked 1. A sketch of this observation encoding is given below.
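A minimal sketch of the three-channel observation and the action set, under the same grid assumptions as above; the marking convention, the index-to-offset action encoding, and the helper names are illustrative.

```python
import numpy as np

# Assumed action encoding: index -> (row offset, column offset), one grid cell per move.
ACTIONS = {0: (0, 0),   # stay still
           1: (-1, 0),  # move up
           2: (1, 0),   # move down
           3: (0, -1),  # move left
           4: (0, 1)}   # move right

def build_observation(grid_shape, effective, own_pos, other_positions):
    # Channel 0: effectively covered cells marked 1, not-yet-covered cells 0.
    # Channel 1: the observing agent's own cell marked 1.
    # Channel 2: cells occupied by the other mobile agents marked 1.
    obs = np.zeros((3,) + grid_shape, dtype=np.float32)
    obs[0] = effective.reshape(grid_shape).astype(np.float32)
    obs[1][own_pos] = 1.0
    for p in other_positions:
        obs[2][p] = 1.0
    return obs
```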
The reward given by the environment to the agent is set as two parts, serving respectively the goals of quickly achieving effective coverage and improving coverage performance. The reward of the environment to the mobile agent is:
Reward = (H_current - H_max)/10 + incres*30
wherein H_current is the coverage performance with the mobile agent at its current position; H_max is the maximum coverage performance obtainable by the mobile agent; incres is the area of effective coverage newly added relative to the previous time step. The first part of the reward represents the gap between the coverage performance at the current position and the maximum value; the second part is the effective coverage area newly added since the previous time step. Using this function as the reward improves the coverage performance of the area while quickly achieving effective coverage.
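A small sketch of that reward computation, again with illustrative names; `cell_area` and the way `incres` is derived from successive effective-coverage masks are assumptions consistent with the grid discretization above.

```python
import numpy as np

def newly_covered_area(prev_mask, cur_mask, cell_area=1.0):
    # incres: area of cells that became effectively covered since the previous step.
    return float(np.sum(cur_mask & ~prev_mask)) * cell_area

def step_reward(h_current, h_max, prev_mask, cur_mask, cell_area=1.0):
    # Reward = (H_current - H_max)/10 + incres*30
    incres = newly_covered_area(prev_mask, cur_mask, cell_area)
    return (h_current - h_max) / 10.0 + incres * 30.0
```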
Further, when the mobile agents interact with the environment simultaneously and are trained in a distributed manner based on the reinforcement learning algorithm, the actor network and critic network of each mobile agent are set to two convolutional layers followed by three fully connected layers: the first convolutional layer has 16 convolution kernels of size 20 x 20, the second convolutional layer has 8 convolution kernels of size 10 x 10, and the three fully connected layers have 256, 128 and 64 channels, respectively.
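The following PyTorch sketch shows one way to realize that architecture. Strides, padding, activation functions, the output heads (5 action logits for the actor, a scalar value for the critic), and the three-channel input are not specified in the text above and are assumptions.

```python
import torch
import torch.nn as nn

class CoverageNet(nn.Module):
    # Two convolutional layers (16 kernels of 20x20, then 8 kernels of 10x10)
    # followed by fully connected layers of 256, 128 and 64 channels, plus an
    # assumed output head.
    def __init__(self, out_dim):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=20, padding=10), nn.ReLU(),
            nn.Conv2d(16, 8, kernel_size=10, padding=5), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.LazyLinear(256), nn.ReLU(),   # flattened size depends on the grid resolution
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, out_dim),
        )

    def forward(self, x):                    # x: (batch, 3, H, W) binary observation
        return self.head(self.features(x))

actor = CoverageNet(out_dim=5)               # logits over {stay, up, down, left, right}
critic = CoverageNet(out_dim=1)              # state value V(S; phi)
```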
Further, when training the mobile agents simultaneously, this embodiment uses the proximal policy optimization (PPO) algorithm, a model-free, online, on-policy policy-gradient reinforcement learning method. The specific procedure is as follows, and a simplified code sketch is given after step e:
a. Initialize the actor π(A|S; θ) with random parameters θ and the critic V(S; φ) with random parameters φ.
b. Following the current policy, generate N steps of experience; the experience sequence is:
S_ts, A_ts, R_(ts+1), …, S_(ts+N-1), A_(ts+N-1), R_(ts+N), S_(ts+N)
where A_t is the action taken in state S_t, S_(t+1) is the next state, and R_(t+1) is the reward obtained for the transition from S_t to S_(t+1). At S_t, the agent uses π(A|S; θ) to compute the probability of taking each action and randomly selects action A_t according to that probability distribution.
c. For each step t = ts+1, ts+2, …, ts+N, compute the return G_t and the advantage function D_t with the generalized advantage estimator:
δ_k = R_k + b·γ·V(S_k; φ) - V(S_(k-1); φ),
D_t = Σ_(k=t)^(ts+N) (γλ)^(k-t)·δ_k,
G_t = D_t + V(S_(t-1); φ),
where b is 0 when S_(ts+N) is a terminal state and 1 otherwise, λ is the smoothing coefficient, and γ is the discount coefficient.
d. Randomly draw a mini-batch of size M from the current experience set and learn from it: update the critic parameters φ by minimizing the critic loss L_critic(φ) = (1/M)·Σ_(i=1)^M (G_i - V(S_i; φ))², and update the actor parameters θ by minimizing the clipped surrogate actor loss L_actor(θ) = -(1/M)·Σ_(i=1)^M min(r_i(θ)·D_i, c_i(θ)·D_i), where r_i(θ) = π(A_i|S_i; θ)/π(A_i|S_i; θ_old) and c_i(θ) = max(min(r_i(θ), 1+ε), 1-ε). An entropy loss term is added to the actor loss to encourage exploration by the agent.
e. Repeat steps b through d until the training termination condition is reached.
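A compact sketch of steps b–d under the assumptions already noted; the discount γ, smoothing λ, clip ε and entropy weight are illustrative hyperparameters, and `values` is assumed to hold V(S; φ) for the N+1 states of one experience segment.

```python
import numpy as np
import torch

def gae_returns(rewards, values, terminal, gamma=0.99, lam=0.95):
    # rewards: R_(ts+1)..R_(ts+N); values: V(S_ts)..V(S_(ts+N)); terminal: whether S_(ts+N) ends the episode.
    N = len(rewards)
    adv = np.zeros(N, dtype=np.float32)
    gae = 0.0
    for t in reversed(range(N)):
        b = 0.0 if (terminal and t == N - 1) else 1.0
        delta = rewards[t] + b * gamma * values[t + 1] - values[t]   # TD error
        gae = delta + gamma * lam * b * gae                          # discounted sum of TD errors
        adv[t] = gae
    returns = adv + np.asarray(values[:N], dtype=np.float32)         # G = D + V
    return returns, adv

def ppo_losses(new_logp, old_logp, adv, returns, value_pred, entropy, eps=0.2, w_ent=0.01):
    # Clipped surrogate actor loss with entropy bonus, and mean-squared critic loss.
    ratio = torch.exp(new_logp - old_logp)                 # r_i(theta)
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)     # c_i(theta)
    actor_loss = -(torch.min(ratio * adv, clipped * adv)).mean() - w_ent * entropy.mean()
    critic_loss = ((returns - value_pred) ** 2).mean()
    return actor_loss, critic_loss
```

In the distributed setting of this embodiment, each mobile agent would run its own copy of this update on the experience it collects while all agents act in the shared environment.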
By executing the above steps, the variation of the effective coverage ratio with the time step after training is shown in fig. 4; this embodiment reaches a coverage rate of 97%. The variation of the coverage performance with the time step is shown in fig. 5; the coverage performance is improved both during the process of achieving effective coverage and after effective coverage has been achieved.
In summary, this embodiment provides a multi-agent coverage method based on reinforcement learning that enables multiple agents to cooperatively achieve effective coverage of an area. The method exploits the decision-optimization capability of reinforcement learning and can improve the coverage performance of the area while achieving effective coverage. It is efficient and robust.
Second embodiment
The embodiment provides an electronic device, which comprises a processor and a memory; wherein the memory stores at least one instruction that is loaded and executed by the processor to implement the method of the first embodiment.
The electronic device may vary considerably in configuration or performance and may include one or more processors (central processing units, CPU) and one or more memories having at least one instruction stored therein that is loaded by the processors and performs the methods described above.
Third embodiment
The present embodiment provides a computer-readable storage medium having stored therein at least one instruction that is loaded and executed by a processor to implement the method of the first embodiment described above. The computer readable storage medium may be, among other things, ROM, random access memory, CD-ROM, magnetic tape, floppy disk, optical data storage device, etc. The instructions stored therein may be loaded by a processor in the terminal and perform the methods described above.
Furthermore, it should be noted that the present invention can be provided as a method, an apparatus, or a computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should also be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or terminal device comprising the element.
Finally, the above describes preferred embodiments of the invention. Although preferred embodiments have been described, once the basic inventive concepts are known, those skilled in the art can make several modifications and adaptations without departing from the principles of the invention, and such modifications and adaptations are intended to fall within the scope of the invention. It is therefore intended that the appended claims be interpreted as covering the preferred embodiments and all alterations and modifications that fall within the scope of the embodiments of the invention.

Claims (1)

1. A reinforcement-learning-based multi-agent coverage method, wherein the multi-agent system comprises a plurality of stationary agents and a plurality of mobile agents, the multi-agent coverage method comprising:
determining the positions of a plurality of stationary agents in an area with the goal of maximizing coverage performance, and dividing the area into an effective coverage area and an ineffective coverage area according to the positions of the stationary agents;
calculating the maximum coverage performance obtainable by the mobile agents;
setting the observations and actions of each mobile agent with respect to the environment, and setting the rewards of the mobile agents based on the maximum coverage performance they can obtain; each mobile agent aims to maximize its own reward function, and, based on a reinforcement learning algorithm, the mobile agents interact with the environment simultaneously, are trained in a distributed manner, obtain a motion plan for each mobile agent, and achieve coverage of the ineffective coverage area;
the determining the location of the plurality of stationary agents in the area with the goal of maximizing coverage performance includes:
adjusting the positions of the plurality of stationary agents in the area so that the coverage performance is as large as possible;
the calculation function H (S) of the coverage performance is as follows:
H(S)=∫R(x)P(x,S)dx
wherein P(x, S) is the joint detection probability of the multiple agents at point x in the area, p_i(x, s_i) is the detection probability of the i-th agent, N is the number of agents, and R(x) is the event density function;
when the area is divided into an effective coverage area and an ineffective coverage area, whether a point x in the area is effectively covered is judged by whether the joint detection probability P(x, S) of the multiple agents at x is larger than a preset threshold; when P(x, S) is larger than the preset threshold, x is effectively covered, otherwise x is not effectively covered;
the mobile agent's observation of the environment consists of three binary images, wherein:
the first binary image shows the area which is not effectively covered at present;
the second binary image shows the position of the current mobile intelligent agent;
the third binary image shows the positions of other mobile intelligent agents except the current mobile intelligent agent;
the action set of the mobile agent is {0, 1, 2, 3, 4}, indicating respectively that the mobile agent stays still, moves up, moves down, moves left, or moves right;
the reward given by the environment to the mobile agent is:
Reward = (H_current - H_max)/10 + incres*30
wherein H_current is the coverage performance with the mobile agent at its current position; H_max is the maximum coverage performance obtainable by the mobile agent; incres is the area of effective coverage newly added relative to the previous time step; the first part of the reward represents the gap between the coverage performance of the mobile agent at the current position and the maximum value, and the second part of the reward is the effective coverage area newly added since the previous time step;
when the mobile agents interact with the environment simultaneously and are trained in a distributed manner based on the reinforcement learning algorithm, the actor network and the critic network of each mobile agent are set to two convolutional layers followed by three fully connected layers; the first convolutional layer has 16 convolution kernels of size 20 x 20, the second convolutional layer has 8 convolution kernels of size 10 x 10, and the three fully connected layers have 256, 128 and 64 channels, respectively.
CN202211432494.1A 2022-11-15 2022-11-15 Multi-agent coverage method based on reinforcement learning Active CN115797394B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211432494.1A CN115797394B (en) 2022-11-15 2022-11-15 Multi-agent coverage method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211432494.1A CN115797394B (en) 2022-11-15 2022-11-15 Multi-agent coverage method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN115797394A CN115797394A (en) 2023-03-14
CN115797394B true CN115797394B (en) 2023-09-05

Family

ID=85438088

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211432494.1A Active CN115797394B (en) 2022-11-15 2022-11-15 Multi-agent coverage method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN115797394B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022083029A1 (en) * 2020-10-19 2022-04-28 深圳大学 Decision-making method based on deep reinforcement learning
CN113392935A (en) * 2021-07-09 2021-09-14 浙江工业大学 Multi-agent deep reinforcement learning strategy optimization method based on attention mechanism
CN114879742A (en) * 2022-06-17 2022-08-09 电子科技大学 Unmanned aerial vehicle cluster dynamic coverage method based on multi-agent deep reinforcement learning
CN115327926A (en) * 2022-09-15 2022-11-11 中国科学技术大学 Multi-agent dynamic coverage control method and system based on deep reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Rapid Coverage Control with Multi-agent Systems Based on K-Means Algorithm; YuZe Feng et al.; 2020 7th International Conference on Information, Cybernetics, and Computational Social Systems (ICCSS); pp. 870-873 *

Also Published As

Publication number Publication date
CN115797394A (en) 2023-03-14

Similar Documents

Publication Publication Date Title
CN111563188B (en) Mobile multi-agent cooperative target searching method
US10175662B2 (en) Method of constructing navigation map by robot using mouse hippocampal place cell model
CN113110509B (en) Warehousing system multi-robot path planning method based on deep reinforcement learning
Oftadeh et al. A novel meta-heuristic optimization algorithm inspired by group hunting of animals: Hunting search
CN110327624B (en) Game following method and system based on curriculum reinforcement learning
CN111260027B (en) Intelligent agent automatic decision-making method based on reinforcement learning
CN104462856B (en) Ship Conflict Early Warning Method
CN111226234B (en) Method, apparatus and computer program for creating deep neural networks
JP4028384B2 (en) Agent learning apparatus, method, and program
CN111914878B (en) Feature point tracking training method and device, electronic equipment and storage medium
CN113344972B (en) Fish track prediction method based on intensive culture
CN115797394B (en) Multi-agent coverage method based on reinforcement learning
CN110930429A (en) Target tracking processing method, device and equipment and readable medium
CN115239760B (en) Target tracking method, system, equipment and storage medium
Das et al. Chemo-inspired genetic algorithm for function optimization
CN116341605A (en) Grey wolf algorithm hybrid optimization method based on reverse learning strategy
CN112651486A (en) Method for improving convergence rate of MADDPG algorithm and application thereof
CN104504935B (en) Navigation traffic control method
CN110009611B (en) Visual target dynamic counting method and system for image sequence
WO2003075221A1 (en) Mechanism for unsupervised clustering
CN106803361A (en) Navigation control method based on rolling planning strategy
CN113821270A (en) Task unloading sequence prediction method, decision-making method, electronic device and storage medium
CN114330933B (en) Execution method of meta-heuristic algorithm based on GPU parallel computation and electronic equipment
Chen et al. The research and application of improved ant colony algorithm with multi-thresholds in edge detection
US20220198225A1 (en) Method and system for determining action of device for given state using model trained based on risk-measure parameter

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant