CN116647459A - Multi-agent co-evolution topological robustness optimization method for Internet of things - Google Patents


Info

Publication number
CN116647459A
Authority
CN
China
Prior art keywords
network
network topology
agent
actor
intelligent agent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310614147.9A
Other languages
Chinese (zh)
Inventor
杨欣微
邱铁
陈宁
张松伟
徐天一
周晓波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202310614147.9A priority Critical patent/CN116647459A/en
Publication of CN116647459A publication Critical patent/CN116647459A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14Network analysis or design
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12Discovery or management of network topologies
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/12Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application discloses a multi-agent co-evolution topology robustness optimization method for the Internet of Things, comprising the following steps: S1, initializing a reinforcement learning neural network, a multi-agent population, and a scale-free network topology environment, where the scale-free network topology environment comprises a network topology, an experience pool, and an action set; S2, designing a distributed interaction method between the actor network agents and the scale-free network topology environment; S3, designing a multi-agent population evolution algorithm; S4, designing a reinforcement learning optimization algorithm; S5, periodically repeating S2, S3, and S4 within one independent experiment. As the multi-agent population evolves and the reinforcement learning neural network is updated, the loop terminates when the robustness of the output scale-free network topology stays within a set range over several iterations, or when the set maximum number of iterations is reached; the final structure of the scale-free network topology is taken as the optimization result of the method.

Description

Multi-agent co-evolution topological robustness optimization method for Internet of things
Technical Field
The application relates to the technical field of the Internet of Things, and in particular to a multi-agent co-evolution topology robustness optimization method for the Internet of Things.
Background
With the spread of 5G communication services, the Internet of Things has been integrated into the production and daily life of modern society, for example in smart healthcare, military command, electronic transportation, and intelligent education. The intelligent services of the Internet of Things industry all depend on stable network connections to maintain data interaction. For example, an autonomous vehicle monitors and responds to road conditions in real time through interconnected sensors, and the manufacturing industry relies on sensor communication to exchange data and achieve fast production workflows. Nowadays, malicious attacks against Internet of Things devices are constantly emerging. If sensor devices are attacked, fail, and exit the network, the connectivity of the Internet of Things network decreases directly, and the intelligent services built on it are interrupted. Therefore, improving the ability of the Internet of Things network to resist malicious attacks is crucial to maintaining the normal operation of its intelligent services.
An Internet of Things network realizes communication through node connections, and the pattern of connections is called the network topology. The stability of the network topology, i.e., the degree of network connectivity retained after an attack, is defined as the robustness of the network topology. The robustness of an Internet of Things network is therefore strongly affected by its topology. Currently, mainstream Internet of Things topologies include scale-free networks, small-world networks, motif networks, and others. A scale-free network is a complex network whose degree distribution follows a power law: most nodes have few edges while a small number of nodes have many edges, which matches the connection characteristics of real-world Internet of Things devices, so most Internet of Things networks are modeled as scale-free networks. The power-law degree distribution makes a scale-free network highly robust to random attacks but weak against malicious attacks, so many researchers aim to improve the robustness of scale-free network topologies against malicious attacks and thereby maintain the stability of the Internet of Things.
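By way of illustration only, the following Python sketch computes one commonly used robustness measure of this kind, the Schneider R metric, which averages the relative size of the largest connected component while nodes are removed in descending degree order (simulating a malicious attack). The metric choice, the networkx usage, and the function name attack_robustness are assumptions for illustration; the background above only states that robustness measures the connectivity retained after an attack.

```python
import networkx as nx

def attack_robustness(graph: nx.Graph) -> float:
    """Assumed robustness measure (Schneider R): mean fraction of nodes in the
    largest connected component while nodes are removed in descending degree
    order, i.e., under a malicious attack on the highest-degree nodes."""
    g = graph.copy()
    n = graph.number_of_nodes()
    total = 0.0
    for _ in range(n - 1):
        # malicious attack: remove the currently highest-degree node
        target = max(g.degree, key=lambda kv: kv[1])[0]
        g.remove_node(target)
        largest = max((len(c) for c in nx.connected_components(g)), default=0)
        total += largest / n
    return total / n

if __name__ == "__main__":
    ba = nx.barabasi_albert_graph(100, 2, seed=1)
    print(f"R = {attack_robustness(ba):.4f}")
```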
For the problem of optimizing the topological robustness of Internet of Things networks, some mainstream approaches currently use evolutionary algorithms, such as genetic algorithms and ant colony algorithms, to improve the robustness of scale-free network topologies. However, as the network scale grows, the computational overhead of evolutionary algorithms increases significantly. Researchers have therefore turned to machine learning methods that train the network topology to obtain highly robust scale-free structures at lower computational cost. For example, the journal paper "Deep Actor-Critic Learning-Based Robustness Enhancement of Internet of Things" [1] (DDLP) trains a scale-free network topology using deep reinforcement learning. However, because the state space and action space of the topology environment are large, reinforcement learning generates many experiences whose reward value is close to 0, which causes a sparse-reward problem and reduces the learning ability and efficiency of the agent. In addition, using a single agent to explore the topology environment leads to a single learning direction and a tendency to fall into local optima. Some researchers believe that combining evolutionary algorithms with reinforcement learning can solve the sparse-reward and local-optimum problems of reinforcement learning.
Disclosure of Invention
The purpose of the application is to overcome the above shortcomings of the prior art and to provide an efficient multi-agent co-evolution topology robustness optimization method for the Internet of Things, which improves the ability of a scale-free network topology to resist malicious attacks, improves optimization efficiency, obtains a more robust scale-free network topology structure with fewer iterations, and enhances the quality of service of the Internet of Things network.
The purpose of the application is achieved by the following technical solution:
A multi-agent co-evolution topology robustness optimization method for the Internet of Things comprises the following steps:
S1, initializing a reinforcement learning neural network, a multi-agent population, and a scale-free network topology environment, where the scale-free network topology environment comprises a network topology, an experience pool, and an action set;
the reinforcement learning neural network comprises an actor network agent, a critic network, a Q network, and a target critic network, whose weight values are set randomly during initialization;
the multi-agent population comprises n individuals; each individual is an actor network agent with the same structure as the actor network in the reinforcement learning neural network, and the weight values of the n individuals are generated randomly;
when initializing the network topology, 4 initial nodes are first generated and fully connected according to the communication range of each node; each newly added node preferentially connects to the node with the largest degree in the network topology, which guarantees the scale-free property of the topology; the positions of all nodes are fixed after generation, and all nodes form a node set; meanwhile, an adjacency matrix corresponding to the network topology is obtained according to whether nodes are connected; the upper triangular part of the adjacency matrix is concatenated row by row into a one-dimensional vector, which represents the network topology connection state and serves as the input of an actor network agent; when initializing the experience pool, an empty storage space is reserved for each actor network agent for subsequent steps; the action set is a hash table storing the operable edges that exist in the current network topology;
S2, designing a distributed interaction method between the actor network agents and the scale-free network topology environment;
the actor network agents in the multi-agent population and the actor network agent in the reinforcement learning neural network interact with the scale-free network topology environment in a distributed manner; during interaction, an actor network agent outputs an action value according to the input network topology connection state, the action value is mapped to a pair of edges in the network topology, and the two edges are swapped according to the edge-swap strategy to obtain a new network topology connection state, which serves as the next input of the corresponding actor network agent, until the maximum number of interactions is reached; after each execution of the edge-swap strategy, the reward value of the action value output by the actor network agent under the input network topology connection state is obtained, and the tuple <input network topology connection state, action value, reward value, new network topology connection state> is stored in the experience pool as one experience for the subsequent training of the actor network agent in the reinforcement learning neural network; the interaction is distributed and asynchronous: the n+1 actor network agents run in parallel with their processes isolated from one another, and because their weight values differ, they output different action values for the same input, thereby exploring the scale-free network topology environment in multiple directions;
S3, designing a multi-agent population evolution algorithm;
during the interaction between the actor network agents and the scale-free network topology environment, individual fitness values are calculated; two actor network agents are selected as parents according to the individual fitness values, and a crossover operation is performed on the two parents; in addition, one actor network agent is selected randomly and a mutation operation is performed on it; the new actor network agents produced by the crossover and mutation operations serve as offspring and replace the individuals with the lowest individual fitness values in the multi-agent population, completing the evolution of the multi-agent population;
S4, designing a reinforcement learning optimization algorithm;
the soft actor-critic algorithm is combined with the multi-agent population evolution algorithm of S3; experiences are drawn from the experience pool by the soft actor-critic algorithm, and the actor network agent is updated with the goal of maximizing the exploration entropy and the reward value; after its parameters are updated, the actor network agent is inserted into the multi-agent population with a probability of 0.5, replacing the individual with the lowest individual fitness value, thereby realizing synchronous updating of the reinforcement learning neural network and the multi-agent population;
S5, periodically repeating S2, S3, and S4 within one independent experiment; as the multi-agent population evolves and the reinforcement learning neural network is updated, the actor network agents continuously output action values that obtain higher rewards, and through their interaction with the scale-free network topology environment an increasingly robust scale-free topology is obtained; when the robustness of the output scale-free network topology stays within a set range over several iterations, or the set maximum number of iterations is reached, the loop terminates, and the final structure of the scale-free network topology is used as the optimization result of the method.
The application also provides an electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the above multi-agent co-evolution Internet of Things topology robustness optimization method when executing the program.
The application also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the above multi-agent co-evolution Internet of Things topology robustness optimization method.
Compared with the prior art, the technical solution of the application has the following beneficial effects:
1. The application explores the large-scale topology environment from different directions in a distributed manner using a multi-agent population, avoiding the problem that traditional reinforcement learning with a single exploring agent easily falls into local optima, and improving the optimization effect on the robustness of the Internet of Things network topology.
2. To address the low learning efficiency caused by sparse rewards in traditional reinforcement learning, the application replaces the reward with the individual fitness within the multi-agent population. The individual fitness reflects the reward accumulated over a period of time rather than the reward of a single action. Agents in the population are selected, crossed, and mutated according to individual fitness, so the population evolves continuously and generates better experiences for reinforcement learning to learn from, improving learning efficiency. Meanwhile, multiple agents interact with the topology environment in a distributed manner, so that different agents explore the environment from different directions and generate experience, further improving the efficiency of exploring a large-scale topology environment. As a result, the algorithm efficiently obtains a highly robust scale-free network topology structure.
3. The method adopts multi-agent co-evolution for the first time to optimize the topological robustness of the Internet of Things, and solves the problem that a reinforcement learning algorithm with a single agent easily falls into local optima when optimizing the Internet of Things topology. The multiple agents explore the scale-free network environment from multiple directions, which enlarges the range of the solution space searched by the algorithm, increases the probability of escaping local optima, and finally yields an Internet of Things topology with higher robustness. Experiments show that the method increases the robustness of the optimized topology by more than 80% relative to the initial topology and, compared with DDLP, improves the robustness optimization rate by 7%.
4. The application adopts a suitable population evolution scheme that avoids the sparse-reward problem arising when the topology is optimized with reinforcement learning. Before the crossover strategy is performed, the parents are selected according to the individual fitness values, so that the generated agent inherits the policy advantages of the parent with high individual fitness; this improves the learning efficiency of the algorithm and learns a more robust Internet of Things topology more efficiently. Experiments show that, compared with DDLP, the optimization efficiency of the application is improved by 13%.
5. The application designs a distributed interaction strategy between the agents and the topology environment, realizing asynchronous interaction of multiple agents with the topology environment. During interaction, the agents run in parallel with their processes isolated from one another, which increases the running speed of the algorithm.
Drawings
Fig. 1 is a flow chart of the Internet of Things topology robustness optimization method.
Fig. 2 is a schematic diagram of the interaction between an agent and the topology environment.
Fig. 3 is a schematic flow chart of the agent crossover strategy.
Fig. 4 is a schematic flow chart of the agent mutation strategy.
Detailed Description
The application is described in further detail below with reference to the drawings and the specific examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
This embodiment provides a multi-agent co-evolution topology robustness optimization method for the Internet of Things, comprising the following steps:
Step 1: Initializing a reinforcement learning neural network, a multi-agent population, and a scale-free network topology environment, where the scale-free network topology environment comprises an initial network topology, an experience pool, and an action set.
(1) Initializing the reinforcement learning neural network. The reinforcement learning neural network comprises an actor network agent π, a critic network V, a Q network Q, and a target critic network V'. The weight values of these four networks are set randomly during initialization and are denoted φ, ψ, θ, and ψ̄ (actor, critic, Q, and target critic, respectively); ψ̄ is identical to ψ at initialization.
(2) Initializing the multi-agent population P. P comprises n individuals; each individual is an actor network agent π_i with the same network structure as π, and the weights of the n individuals are generated randomly. Hereinafter, when π and π_i need not be distinguished, π is used uniformly.
(3) Initializing the scale-free network topology environment. First, 4 initial nodes are generated and fully connected according to the communication range of each node. Each newly added node then connects preferentially to the node with the largest degree in the network topology, which guarantees the scale-free property of the original topology. Meanwhile, the adjacency matrix of the network topology is obtained according to whether nodes are connected. The upper triangular part of the adjacency matrix is concatenated row by row into a one-dimensional vector, which represents the network topology connection state s and serves as the input of π. When initializing the experience pool RB, an empty storage space is reserved for each π_i for subsequent steps. The action set VA is a hash table; the value of each key-value pair is a pair of edges in the network topology on which the edge-swap strategy can be performed, i.e., the distance between the nodes to be connected is within the communication range and the two edges are not connected end to end (they do not share an endpoint).
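The following sketch illustrates this initialization under stated assumptions: the node positions, the communication radius, and the helper names init_scale_free_topology, topology_state, and action_set are introduced for illustration and are not fixed by the embodiment; the interpretation of an operable edge pair (re-wired nodes in range, edges sharing no endpoint) follows the reading given above.

```python
import itertools
import random
import numpy as np

def _dist(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def init_scale_free_topology(n_nodes: int, radius: float, seed: int = 0):
    """4 fully connected seed nodes (subject to communication range), then
    each new node attaches to the highest-degree existing node."""
    rng = random.Random(seed)
    pos = {i: (rng.random(), rng.random()) for i in range(n_nodes)}
    adj = np.zeros((n_nodes, n_nodes), dtype=np.int8)
    for i, j in itertools.combinations(range(4), 2):
        if _dist(pos[i], pos[j]) <= radius:
            adj[i, j] = adj[j, i] = 1
    for new in range(4, n_nodes):
        degrees = adj.sum(axis=1)[:new]
        target = int(np.argmax(degrees))      # highest-degree existing node
        adj[new, target] = adj[target, new] = 1
    return pos, adj

def topology_state(adj: np.ndarray) -> np.ndarray:
    """Flatten the upper triangle of the adjacency matrix row by row; this
    one-dimensional vector is the connection state s fed to the actor."""
    iu = np.triu_indices(adj.shape[0], k=1)
    return adj[iu].astype(np.float32)

def action_set(adj: np.ndarray, pos, radius: float) -> dict:
    """Hash table VA: key -> a pair of existing edges that can be swapped
    (assumed reading: the re-wired nodes stay in communication range and the
    two edges share no endpoint)."""
    edges = [tuple(e) for e in zip(*np.nonzero(np.triu(adj, k=1)))]
    va = {}
    for k, ((a, b), (c, d)) in enumerate(itertools.combinations(edges, 2)):
        if len({a, b, c, d}) == 4 and _dist(pos[a], pos[c]) <= radius \
                and _dist(pos[b], pos[d]) <= radius:
            va[k] = ((a, b), (c, d))
    return va
```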
Step 2: a distributed interaction method of multiple agent populations and a scaleless network topology environment is designed. Pi in P i And pi in the reinforcement learning neural network interact with the scaleless network topology environment in a distributed manner, as shown in fig. 2. The interaction adopts a distributed asynchronous interaction mode, n+1 n are operated in parallel, and the operation processes are isolated from each other, and as the weight values of the n+1 n are different, different action values a can be respectively output for the same input s, so that the multi-direction exploration of the scale-free network topology environment is realized.
(1) Processing the action value. When interacting with the scale-free network topology environment, π outputs an action value a in the range [-1, 1] according to the input s, and a is mapped through VA to a pair of edges in the network topology.
(2) Executing the edge-swap strategy. The edge-swap operation is performed on this pair of edges, and the robustness values of the network topology before and after the swap are compared. Because the scale-free network topology environment is large, a check is made before accepting the operation so as to reduce the influence of useless operations on robustness: if the post-swap robustness value R' falls below the pre-swap robustness value R by more than a set value t, or falls below the robustness value initR of the initial network topology by more than t, the topology connection is rolled back to the pre-swap state s, and s is fed to π again for computation. Otherwise the new state is accepted; the pre-swap connection state s, the post-swap connection state s', the action value a, and the reward value r are stored in RB as one experience <s, a, r, s'>, and s' is fed to π_i as the next network topology connection state. RB consists of a total experience pool RB_total and individual experience pools RB_i: RB_total contains all the experience generated by π, while each individual in P maintains its own RB_i, which stores only the experience generated by that π_i. The reward value r is computed from the robustness values before and after the swap, as defined by equation 1.
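The sketch below illustrates one possible realization of the edge-swap step. The rollback condition (a robustness drop of more than t relative to R or initR) follows the reading given above, and the reward r = R' - R is an assumed stand-in for the patent's equation 1, which is not reproduced in this text; the robustness function is passed in as a parameter.

```python
from typing import Callable, Tuple
import networkx as nx

Edge = Tuple[int, int]

def edge_swap_step(g: nx.Graph,
                   pair: Tuple[Edge, Edge],
                   robustness_fn: Callable[[nx.Graph], float],
                   init_r: float,
                   t: float = 0.0005) -> Tuple[float, bool]:
    """Swap one pair of edges (a,b),(c,d) -> (a,c),(b,d); roll back if the
    post-swap robustness R' drops below R or below initR by more than t.
    The rollback condition and the reward r = R' - R are assumptions; the
    patent defines the exact reward by its equation 1."""
    (a, b), (c, d) = pair
    r_before = robustness_fn(g)
    g.remove_edges_from([(a, b), (c, d)])
    g.add_edges_from([(a, c), (b, d)])
    r_after = robustness_fn(g)
    if r_after < r_before - t or r_after < init_r - t:
        # undo the swap: the environment stays in the pre-swap state s
        g.remove_edges_from([(a, c), (b, d)])
        g.add_edges_from([(a, b), (c, d)])
        return 0.0, False
    return r_after - r_before, True   # reward for experience <s, a, r, s'>
```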
(3) Calculating the individual fitness. If the actor network agent performing the interaction is a π_i, its individual fitness value f_i is calculated after each interaction step according to equation 2.
Steps (1), (2), and (3) are executed cyclically for multiple steps to complete the interaction between π_i and the scale-free network topology environment; in the process, a large amount of experience is obtained together with the fitness value f_i of each π_i. As equation 2 shows, P focuses on the reward accumulated over one complete interaction rather than the reward of a single action, which avoids the sparse-reward problem of traditional reinforcement learning.
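Equation 2 is not reproduced in this text; consistent with the statement that the fitness reflects the reward of one complete interaction rather than of a single action, an assumed form is the cumulative reward of the episode:

```python
def individual_fitness(episode_rewards: list) -> float:
    """Assumed form of equation 2: the fitness f_i of pi_i is the reward
    accumulated over its complete interaction with the topology environment,
    not the reward of any single action."""
    return float(sum(episode_rewards))
```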
Step 3: and designing a multi-agent population evolution algorithm.
(1) And screening the father generation. At pi i After interaction with the scaleless network topology environment, the individual fitness fi can be obtained i . Selecting father according to individual fitness value, selecting father by greedy method, pairing individuals in P pairwise, and selecting the father with larger sum of individual fitness as father pi xy
(2) Actor network agent cross policy. And selecting a parent, and then performing an actor network agent crossing strategy, as shown in figure 3. Firstly, initializing a sub-actor network intelligent agent pi o And an empty offspring experience pool RB o . Pi at the beginning o Network weights of pi x Or pi y Identical and from pi x And pi y 1/2 of the experience of each random extraction in the experience pool is stored in RB o In (a) for pi x And pi y Is a cross training of (a).
In order for the generated child to inherit the policy advantage of the parent, the child agent is updated using equation 3. Wherein pi is x (s)、π y (s) are respectively the father pi xy Action value s generated in state s i Sum s j Representing RB o Q(s) ix (s i ) Pi is shown x In experience s i The following. Since the Q network in SAC represents future jackpots under specific conditions and actions, the Q network can be used to evaluate the performance of two parents under different experiences. Then, selecting samples with larger Q values in the two parents for calculating a loss value L (C), and finally updating the weight values of the children in a mode of minimizing the loss value. The offspring obtained by the method inherits the strategy advantages of the father under different s, so that the method is easier to output action values which enable future cumulative rewards to be larger for the following scale-free network topology environment, and the efficiency of optimizing the topology of the multi-agent population evolution algorithm is improved.
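A sketch of this Q-guided crossover is given below. Equation 3 is not reproduced in this text, so the loss L(C) is rendered here as a mean-squared error that regresses the child's action toward whichever parent the Q network rates higher on each sampled state; the PyTorch module interfaces are assumptions for illustration.

```python
import torch
import torch.nn as nn

def crossover_update(child: nn.Module, parent_x: nn.Module, parent_y: nn.Module,
                     q_net: nn.Module, states: torch.Tensor,
                     lr: float = 1e-3, epochs: int = 10) -> None:
    """For every state drawn from RB_o, pick the parent whose action the Q
    network scores higher, and regress the child's action toward it. The MSE
    form of L(C) is an assumption consistent with the description of
    equation 3."""
    opt = torch.optim.Adam(child.parameters(), lr=lr)
    with torch.no_grad():
        a_x, a_y = parent_x(states), parent_y(states)
        q_x = q_net(torch.cat([states, a_x], dim=-1))
        q_y = q_net(torch.cat([states, a_y], dim=-1))
        target = torch.where(q_x >= q_y, a_x, a_y)   # better parent per state
    for _ in range(epochs):
        loss = nn.functional.mse_loss(child(states), target)  # L(C)
        opt.zero_grad()
        loss.backward()
        opt.step()
```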
(3) Actor network agent mutation strategy. After the crossover is completed, the multi-agent population evolution algorithm performs the actor network agent mutation strategy, as shown in Fig. 4. Mutation has three modes, normal mutation, super mutation, and reset, which are selected with probabilities of 85%, 5%, and 10%, respectively. During mutation, a π_i is drawn at random from P as the parent, its layer weights are obtained, and the mutation operation is applied to 10% of the weight values. For normal mutation and super mutation, a perturbation value generated from a Gaussian distribution is added to the original weight value w_ij; the variance of the perturbation is 10×w_ij for super mutation and 0.1×w_ij for normal mutation, so super mutation adjusts π_i more strongly. For reset, w_ij is replaced by a random value drawn from a Gaussian distribution. Finally, the individual whose layer weights have undergone the mutation operation serves as the offspring. Because part of the offspring's weights have changed to different degrees, it generates an action value a different from the parent's for the current s, i.e., a different policy direction, and this multi-directional exploration maximally improves the algorithm's ability to explore for the optimal solution.
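A sketch of the mutation strategy under stated assumptions: the mode probabilities (85%, 5%, 10%), the 10% weight fraction, and the 0.1×w and 10×w scales come from the description above, while treating those scales as standard deviations and the PyTorch interface are illustrative assumptions.

```python
import torch
import torch.nn as nn

def mutate(agent: nn.Module, frac: float = 0.10) -> None:
    """Mutate roughly `frac` of the agent's weights in place. Modes and
    probabilities follow the description: normal mutation (85%), super
    mutation (5%), reset (10%)."""
    mode = torch.multinomial(torch.tensor([0.85, 0.05, 0.10]), 1).item()
    with torch.no_grad():
        for param in agent.parameters():
            mask = torch.rand_like(param) < frac        # ~10% of the weights
            if mode == 2:                               # reset: w_ij <- Gaussian sample
                param[mask] = torch.randn(int(mask.sum()),
                                          device=param.device, dtype=param.dtype)
                continue
            scale = 10.0 if mode == 1 else 0.1          # super vs normal mutation
            # perturbation scaled by the magnitude of the original weight
            noise = torch.randn_like(param) * scale * param.abs()
            param += torch.where(mask, noise, torch.zeros_like(param))
```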
Step 4: and designing a reinforcement learning optimization algorithm. This embodiment uses the soft actor-critic strategy [2 ]]Combined with an evolutionary algorithm. SAC acquires RB total Pi is updated in a manner that maximizes the search entropy and prize values. The definition of the exploration entropy H is as follows:
H(π)=E x~π [-logπ(x)] (4)
the critic network in SAC is used to evaluate the future jackpot under s and the Q network is used to evaluate the future jackpot under s and a. First utilize RB total The Q network and the critic network are updated first. According to the reinforcement learning optimization mode, the loss value L (θ) of the Q network is defined as follows:
wherein γ is the set discount factor. Loss value of Critic networkThe definition is as follows:
in minimizing L (theta) sumThen, the weight of pi is updated by gradient descent:
finally, the target critic network is subjected to soft update according to the weight value of the critic network [3]. After the weight of pi is updated, pi is inserted into P with the probability of 0.5, and the individual with the lowest individual fitness value in P is replaced, so that the synchronous updating of reinforcement learning and multi-agent population is realized.
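Because the loss formulas themselves are not reproduced in this text, the sketch below follows the standard soft actor-critic losses of [2], which the description above matches: a soft Bellman residual for the Q network with discount γ and the target critic V', an entropy-regularized value target for the critic V, gradient descent on the policy loss for π, and a soft update of V'. Network interfaces, tensor shapes, and hyperparameters are illustrative.

```python
import torch
import torch.nn.functional as F

def sac_update(batch, actor, critic_v, critic_v_target, q_net,
               opt_actor, opt_v, opt_q, gamma=0.99, tau=0.005):
    """One SAC update in the spirit of [2]; s, a, r, s_next are tensors sampled
    from RB_total, and actor(s) is assumed to return (action, log_prob)."""
    s, a, r, s_next = batch

    # Q network: soft Bellman residual with the target critic V'
    with torch.no_grad():
        q_target = r + gamma * critic_v_target(s_next)
    q_loss = F.mse_loss(q_net(torch.cat([s, a], dim=-1)), q_target)
    opt_q.zero_grad(); q_loss.backward(); opt_q.step()

    # critic V: regress toward Q(s, a~pi) minus log pi (entropy term)
    a_pi, log_pi = actor(s)
    with torch.no_grad():
        v_target = q_net(torch.cat([s, a_pi], dim=-1)) - log_pi
    v_loss = F.mse_loss(critic_v(s), v_target)
    opt_v.zero_grad(); v_loss.backward(); opt_v.step()

    # actor: maximize the entropy-regularized Q value, i.e. minimize log pi - Q
    a_pi, log_pi = actor(s)
    actor_loss = (log_pi - q_net(torch.cat([s, a_pi], dim=-1))).mean()
    opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()

    # soft update of the target critic [3]
    with torch.no_grad():
        for p, p_t in zip(critic_v.parameters(), critic_v_target.parameters()):
            p_t.mul_(1.0 - tau).add_(tau * p)
```

After this update, the actor π would be inserted into the population P with probability 0.5, replacing the least-fit individual, as stated above.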
Step 5: the complete algorithm flow is shown in fig. 1. Steps 2, 3 and 4 were repeated periodically in one independent repeat experiment. With the evolution of multi-agent population and the update of reinforcement learning neural network, a more robust network topology structure is continuously obtained in the interaction of an actor network agent and a topology environment. When the output network topology robustness does not exceed 0.0005 in the floating range within 5 iteration times or reaches the set maximum iteration times, the algorithm is terminated, and the final topology structure is the optimization result of the algorithm.
Preferably, the embodiment of the application further provides a specific implementation of an electronic device capable of implementing all the steps of the multi-agent co-evolution Internet of Things topology robustness optimization method in the above embodiment. The electronic device specifically comprises the following:
a processor, a memory, a communication interface, and a bus;
the processor, the memory, and the communication interface communicate with one another through the bus; the communication interface is used for information transmission among related devices such as server-side devices, metering devices, and user-side devices.
The processor is used to call a computer program in the memory; when the processor executes the computer program, it implements all the steps of the multi-agent co-evolution Internet of Things topology robustness optimization method in the above embodiment.
The embodiment of the application also provides a computer-readable storage medium capable of implementing all the steps of the multi-agent co-evolution Internet of Things topology robustness optimization method in the above embodiment; a computer program is stored on the computer-readable storage medium, and when executed by a processor, the program implements all the steps of the method in the above embodiment.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for a hardware+program class embodiment, the description is relatively simple, as it is substantially similar to the method embodiment, as relevant see the partial description of the method embodiment.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Although the application provides method operational steps as an example or a flowchart, more or fewer operational steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one way of performing the order of steps and does not represent a unique order of execution. When implemented by an actual device or client product, the instructions may be executed sequentially or in parallel (e.g., in a parallel processor or multi-threaded processing environment) as shown in the embodiments or figures.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Reference to the literature
[1] Chen N, Qiu T, Mu C, et al. Deep actor-critic learning-based robustness enhancement of Internet of Things [J]. IEEE Internet of Things Journal, 2020, 7(7): 6191-6200.
[2] Haarnoja T, Zhou A, Abbeel P, et al. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor [C]// International Conference on Machine Learning. PMLR, 2018: 1861-1870.
[3] Xiao Y, Liu J, Wu J, et al. Leveraging deep reinforcement learning for traffic engineering: A survey [J]. IEEE Communications Surveys & Tutorials, 2021, 23(4): 2064-2097.
The application is not limited to the embodiments described above. The above description of specific embodiments is intended to describe and illustrate the technical solution of the application and is illustrative only, not limiting. Those skilled in the art can make numerous specific modifications without departing from the spirit of the application and the scope of the claims, all of which fall within the protection scope of the application.

Claims (3)

1. A multi-agent co-evolution topology robustness optimization method for the Internet of Things, characterized by comprising the following steps:
S1, initializing a reinforcement learning neural network, a multi-agent population, and a scale-free network topology environment, where the scale-free network topology environment comprises a network topology, an experience pool, and an action set;
the reinforcement learning neural network comprises an actor network agent, a critic network, a Q network, and a target critic network, whose weight values are set randomly during initialization;
the multi-agent population comprises n individuals; each individual is an actor network agent with the same structure as the actor network in the reinforcement learning neural network, and the weight values of the n individuals are generated randomly;
when initializing the network topology, 4 initial nodes are first generated and fully connected according to the communication range of each node; each newly added node preferentially connects to the node with the largest degree in the network topology, which guarantees the scale-free property of the topology; the positions of all nodes are fixed after generation, and all nodes form a node set; meanwhile, an adjacency matrix corresponding to the network topology is obtained according to whether nodes are connected; the upper triangular part of the adjacency matrix is concatenated row by row into a one-dimensional vector, which represents the network topology connection state and serves as the input of an actor network agent; when initializing the experience pool, an empty storage space is reserved for each actor network agent for subsequent steps; the action set is a hash table storing the operable edges that exist in the current network topology;
S2, designing a distributed interaction method between the actor network agents and the scale-free network topology environment;
the actor network agents in the multi-agent population and the actor network agent in the reinforcement learning neural network interact with the scale-free network topology environment in a distributed manner; during interaction, an actor network agent outputs an action value according to the input network topology connection state, the action value is mapped to a pair of edges in the network topology, and the two edges are swapped according to the edge-swap strategy to obtain a new network topology connection state, which serves as the next input of the corresponding actor network agent, until the maximum number of interactions is reached; after each execution of the edge-swap strategy, the reward value of the action value output by the actor network agent under the input network topology connection state is obtained, and the tuple <input network topology connection state, action value, reward value, new network topology connection state> is stored in the experience pool as one experience for the subsequent training of the actor network agent in the reinforcement learning neural network; the interaction is distributed and asynchronous: the n+1 actor network agents run in parallel with their processes isolated from one another, and because their weight values differ, they output different action values for the same input, thereby exploring the scale-free network topology environment in multiple directions;
S3, designing a multi-agent population evolution algorithm;
during the interaction between the actor network agents and the scale-free network topology environment, individual fitness values are calculated; two actor network agents are selected as parents according to the individual fitness values, and a crossover operation is performed on the two parents; in addition, one actor network agent is selected randomly and a mutation operation is performed on it; the new actor network agents produced by the crossover and mutation operations serve as offspring and replace the individuals with the lowest individual fitness values in the multi-agent population, completing the evolution of the multi-agent population;
S4, designing a reinforcement learning optimization algorithm;
the soft actor-critic algorithm is combined with the multi-agent population evolution algorithm of S3; experiences are drawn from the experience pool by the soft actor-critic algorithm, and the actor network agent is updated with the goal of maximizing the exploration entropy and the reward value; after its parameters are updated, the actor network agent is inserted into the multi-agent population with a probability of 0.5, replacing the individual with the lowest individual fitness value, thereby realizing synchronous updating of the reinforcement learning neural network and the multi-agent population;
S5, periodically repeating S2, S3, and S4 within one independent experiment; as the multi-agent population evolves and the reinforcement learning neural network is updated, the actor network agents continuously output action values that obtain higher rewards, and through their interaction with the scale-free network topology environment an increasingly robust scale-free topology is obtained; when the robustness of the output scale-free network topology stays within a set range over several iterations, or the set maximum number of iterations is reached, the loop terminates, and the final structure of the scale-free network topology is used as the final optimization result.
2. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the multi-agent co-evolution Internet of Things topology robustness optimization method of claim 1 when executing the program.
3. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the multi-agent co-evolution Internet of Things topology robustness optimization method of claim 1.
CN202310614147.9A 2023-05-29 2023-05-29 Multi-agent co-evolution topological robustness optimization method for Internet of things Pending CN116647459A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310614147.9A CN116647459A (en) 2023-05-29 2023-05-29 Multi-agent co-evolution topological robustness optimization method for Internet of things

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310614147.9A CN116647459A (en) 2023-05-29 2023-05-29 Multi-agent co-evolution topological robustness optimization method for Internet of things

Publications (1)

Publication Number Publication Date
CN116647459A true CN116647459A (en) 2023-08-25

Family

ID=87624163

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310614147.9A Pending CN116647459A (en) 2023-05-29 2023-05-29 Multi-agent co-evolution topological robustness optimization method for Internet of things

Country Status (1)

Country Link
CN (1) CN116647459A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117097624A (en) * 2023-10-18 2023-11-21 浪潮(北京)电子信息产业有限公司 Network topology structure enhancement method and device, electronic equipment and storage medium
CN117097624B (en) * 2023-10-18 2024-02-09 浪潮(北京)电子信息产业有限公司 Network topology structure enhancement method and device, electronic equipment and storage medium
CN117424824A (en) * 2023-12-19 2024-01-19 天津斯巴克斯机电有限公司 Network topology optimization method and system for intelligent production line of electric roller

Similar Documents

Publication Publication Date Title
CN116647459A (en) Multi-agent co-evolution topological robustness optimization method for Internet of things
CN109039942B (en) Network load balancing system and balancing method based on deep reinforcement learning
Du et al. The networked evolutionary algorithm: A network science perspective
CN110852448A (en) Cooperative intelligent agent learning method based on multi-intelligent agent reinforcement learning
CN111935724B (en) Wireless sensor network topology optimization method based on asynchronous deep reinforcement learning
Chen et al. Edge intelligent networking optimization for internet of things in smart city
Ali et al. A novel hybrid Cultural Algorithms framework with trajectory-based search for global numerical optimization
CN113098714A (en) Low-delay network slicing method based on deep reinforcement learning
Zelinka et al. Evolutionary dynamics as the structure of complex networks
CN106789320A (en) A kind of multi-species cooperative method for optimizing wireless sensor network topology
CN113276852A (en) Unmanned lane keeping method based on maximum entropy reinforcement learning framework
Lamiable et al. An algorithmic game-theory approach for coarse-grain prediction of RNA 3D structure
CN104657901A (en) Community discovery method based on label propagation in random walk
CN112131089B (en) Software defect prediction method, classifier, computer device and storage medium
Przewozniczek et al. On turning black-into dark gray-optimization with the direct empirical linkage discovery and partition crossover
CN116611527B (en) Quantum circuit processing method and device and electronic equipment
CN117216071A (en) Transaction scheduling optimization method based on graph embedding
CN115759199A (en) Multi-robot environment exploration method and system based on hierarchical graph neural network
He et al. A membrane-inspired algorithm with a memory mechanism for knapsack problems
Wang et al. Decision tree models induced by membrane systems
CN114723005B (en) Multi-layer network collapse strategy deducing method based on depth map representation learning
Zhang et al. A Robust Networking Model With Quantum Evolution for Internet of Things
Kim et al. Stochastic multiscale approaches to consensus problems
Watkins Generating heuristics for graph-based problems using reinforcement learning
Sun et al. Global and Cluster Structural Balance via a Priority Strategy Based Memetic Algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination