CN109947567B - Multi-agent reinforcement learning scheduling method and system and electronic equipment - Google Patents


Info

Publication number
CN109947567B
Authority
CN
China
Prior art keywords
scheduling
agent
virtual machine
service node
reinforcement learning
Prior art date
Legal status
Active
Application number
CN201910193429.XA
Other languages
Chinese (zh)
Other versions
CN109947567A (en)
Inventor
任宏帅
王洋
须成忠
Current Assignee
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN201910193429.XA priority Critical patent/CN109947567B/en
Publication of CN109947567A publication Critical patent/CN109947567A/en
Priority to PCT/CN2019/130582 priority patent/WO2020181896A1/en
Application granted granted Critical
Publication of CN109947567B publication Critical patent/CN109947567B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/48: Program initiating; program switching, e.g. by interrupt
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/08: Learning methods


Abstract

The application relates to a multi-agent reinforcement learning scheduling method and system and to an electronic device. The method comprises the following steps: step a: collecting server parameters of a network data center and the load information of the virtual machines running on each server; step b: establishing a virtual simulation environment from the server parameters and the virtual machine load information, and establishing a multi-agent deep reinforcement learning model; step c: performing offline training and learning with the multi-agent deep reinforcement learning model and the simulation environment, training one agent model for each server; step d: deploying the agent models to real service nodes and scheduling according to the load condition of each service node. By virtualizing the services running on the servers through virtualization technology and balancing load through virtual machine scheduling, resource allocation becomes more macroscopic, and the multiple agents can learn cooperative strategies in a complex dynamic environment.

Description

Multi-agent reinforcement learning scheduling method and system and electronic equipment
Technical Field
The present application relates to the field of multi-agent systems, and in particular, to a method, a system, and an electronic device for multi-agent reinforcement learning scheduling.
Background
In a cloud computing environment, the traditional service deployment mode struggles to cope with variable access patterns. Although fixed resource allocation can provide services stably, it wastes a large amount of resources: under the same network topology, some servers may often run at full load, while others host only a few services and still have plenty of unused storage space and computing capacity. Traditional service deployment therefore struggles with this waste of resources and makes efficient scheduling difficult, so resources cannot be utilized efficiently. There is thus a need for a scheduling algorithm that can adapt to dynamic environments and balance the load of the individual servers in the network.
With the development of virtualization technology, the appearance of virtual machines, containers, and related techniques has promoted resource scheduling from static allocation to dynamic allocation. In recent years, adaptive resource scheduling schemes have emerged in abundance. Most adopt a heuristic algorithm and schedule dynamically by adjusting parameters: whether the resources available in the running environment are abundant or insufficient is judged against a threshold, and the heuristic iteratively computes a suitable threshold. However, such scheduling methods only seek an optimal solution over a massive combination of data; the optimal decision obtained holds only for the current specific time node, and timing information is not fully exploited, so the resource allocation problem in a large-scale, complex dynamic environment is difficult to solve.
With the rise of artificial intelligence, the development of deep reinforcement learning has made agent decision-making over large state spaces possible. In the field of multi-agent reinforcement learning, distributed learning with a traditional reinforcement learning algorithm such as Q-learning or PG (Policy Gradient) still fails to achieve the expected effect: each agent tries at every step to learn and predict the actions of the other agents, but the other agents keep changing in a dynamic environment, so the environment becomes non-stationary, knowledge is difficult to learn, and optimal resource allocation cannot be realized. In addition, from the perspective of reinforcement learning methods, most current scheduling approaches use single-agent reinforcement learning or distributed reinforcement learning. If only one agent is used for centralized training, the huge action space produced by complex state changes and permutation combinations under the network topology makes the algorithm hard to train and hard to converge. Distributed reinforcement learning faces another problem: it typically trains multiple agents together to accelerate convergence, but the agents' scheduling policies are in fact identical, and the multiple entities merely speed up training, so the resulting homogeneous agents have no cooperative ability. In traditional multi-agent methods, each agent predicts the decisions of the other agents at every decision step, but because those decisions are unstable in a dynamic environment, training is very difficult and every agent ends up doing almost the same thing, without a cooperative strategy.
Disclosure of Invention
The application provides a multi-agent reinforcement learning scheduling method, a multi-agent reinforcement learning scheduling system and electronic equipment, and aims to solve at least one of the technical problems in the prior art to a certain extent.
In order to solve the above problems, the present application provides the following technical solutions:
a multi-agent reinforcement learning scheduling method comprises the following steps:
step a: collecting server parameters of a network data center and virtual machine load information running on each server;
step b: establishing a virtual simulation environment by using the server parameters and the virtual machine load information, and establishing a deep reinforcement learning model of the multi-agent;
step c: off-line training and learning are carried out by utilizing the deep reinforcement learning model and the simulation environment of the multi-agent, and an agent model is trained for each server respectively;
step d: and deploying the intelligent agent model to real service nodes, and scheduling according to the load condition of each service node.
The technical scheme adopted by the embodiment of the application further comprises: the step a further comprises: performing a standardized preprocessing operation on the collected server parameters and virtual machine load information; the standardized preprocessing operation comprises: defining the virtual machine information of each service node as a tuple comprising the number of virtual machines and their respective configurations, wherein each virtual machine has two scheduling states, a to-be-scheduled state and a running state, each service node has two states, a saturated state and a hungry state, and the sum of the resource ratios occupied by the virtual machines is less than the configuration upper limit of the server where they are located.
The technical scheme adopted by the embodiment of the application further comprises the following steps: in the step b, the deep reinforcement learning model of the multi-agent specifically comprises a prediction module and a scheduling module, wherein the prediction module predicts resources needing to be scheduled out in the current state through information input by each service node, and maps an action space into the total capacity of the current service node according to configuration information of the current service node; the scheduling module carries out rescheduling and distribution to generate a scheduling strategy according to the marked virtual machine in the state to be scheduled, and an agent on each service node calculates a return function according to the generated scheduling action; the prediction module measures the quality of the scheduling strategy, so that the load of each service node in the whole network is balanced.
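A minimal sketch of this two-module split, expressed as Python interfaces; all names and signatures here are illustrative assumptions, not fixed by the patent:

```python
from typing import Protocol, Sequence

class PredictionModule(Protocol):
    def predict_release(self, node_obs: Sequence[float]) -> float:
        """Resources the current node should schedule out, predicted
        from the information input by the service node."""
        ...

    def action_capacity(self, node_config: dict) -> float:
        """Map the action space onto the node's total capacity,
        based on the node's configuration information."""
        ...

class SchedulingModule(Protocol):
    def build_schedule(self, marked_vms: Sequence[object]) -> list:
        """Re-assign the marked to-be-scheduled VMs and generate
        the scheduling strategy."""
        ...
```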
The technical scheme adopted by the embodiment of the application further comprises the following steps: in step c, the off-line training and learning by using the deep reinforcement learning model of the multi-agent and the simulation environment, and the training of an agent model for each server specifically includes: the intelligent agent on each service node adjusts the size of the resource to be scheduled through the prediction module, marks the virtual machine to be scheduled out, generates a scheduling strategy according to the virtual machine in the state to be scheduled, calculates the return value of each service node, summarizes and sums the return values to obtain a total return value, and adjusts the parameters of each prediction module according to the total return value.
The technical scheme adopted by the embodiment of the application further comprises: in the step d, deploying the agent model to real service nodes and scheduling according to the load condition of each service node specifically comprises: deploying each trained agent model to the corresponding service node in the real environment, sensing the state information of the server where it is located over a period of time as input, predicting the resources the current server needs to release, and using a knapsack algorithm to select the virtual machines closest to that target and mark them as the to-be-scheduled state; then collecting, through a scheduling module, the prediction results on all servers and the virtual machines marked as to-be-scheduled, assigning the to-be-scheduled virtual machines to suitable servers as required to generate a scheduling strategy, and distributing the scheduling commands to the corresponding service nodes to execute the scheduling operation; before the scheduling strategy is executed, checking whether each scheduling command is legal; if not, feeding back a penalty reward to update the parameters and regenerating the scheduling strategy; and if legal, executing the scheduling operation and obtaining the fed-back reward value to update the agent parameters.
Another technical scheme adopted by the embodiment of the application is as follows: a multi-agent reinforcement learning scheduling system comprising:
an information collection module: used for collecting server parameters of a network data center and the load information of the virtual machines running on each server;
a reinforcement learning model construction module: used for establishing a virtual simulation environment from the server parameters and the virtual machine load information, and for establishing a multi-agent deep reinforcement learning model;
an agent model training module: used for performing offline training and learning with the multi-agent deep reinforcement learning model and the simulation environment, training one agent model for each server;
an agent deployment module: used for deploying the agent models to real service nodes and scheduling according to the load condition of each service node.
The technical scheme adopted by the embodiment of the application further comprises a preprocessing module, used for performing a standardized preprocessing operation on the collected server parameters and virtual machine load information; the standardized preprocessing operation comprises: defining the virtual machine information of each service node as a tuple comprising the number of virtual machines and their respective configurations, wherein each virtual machine has two scheduling states, a to-be-scheduled state and a running state, each service node has two states, a saturated state and a hungry state, and the sum of the resource ratios occupied by the virtual machines is less than the configuration upper limit of the server where they are located.
The technical scheme adopted by the embodiment of the application further comprises the following steps: the reinforcement learning model building module comprises a prediction module and a scheduling module, wherein the prediction module comprises:
a state sensing unit: used for predicting the resources that need to be scheduled out in the current state, from the information input by each service node;
an action space unit: used for mapping the action space onto the total capacity of the current service node according to the node's configuration information;
the scheduling module carries out rescheduling and distribution to generate a scheduling strategy according to the marked virtual machine in the state to be scheduled, and an agent on each service node calculates a return function according to the generated scheduling action;
the prediction module further comprises:
a reward function unit: used for measuring the quality of the scheduling strategy, so that the load of each service node in the whole network is balanced.
The technical scheme adopted by the embodiment of the application further comprises the following steps: the intelligent agent model training module utilizes a deep reinforcement learning model and a simulation environment of a plurality of intelligent agents to carry out off-line training and learning, and the training of an intelligent agent model for each server specifically comprises the following steps: the intelligent agent on each service node adjusts the size of the resource to be scheduled through the prediction module, marks the virtual machine to be scheduled out, generates a scheduling strategy according to the virtual machine in the state to be scheduled, calculates the return value of each service node, summarizes and sums the return values to obtain a total return value, and adjusts the parameters of each prediction module according to the total return value.
The technical scheme adopted by the embodiment of the application further comprises: the agent deployment module deploys the agent model to the real service nodes, and scheduling according to the load condition of each service node specifically comprises: deploying each trained agent model to the corresponding service node in the real environment, sensing the state information of the server where it is located over a period of time as input, predicting the resources the current server needs to release, and using a knapsack algorithm to select the virtual machines closest to that target and mark them as the to-be-scheduled state; then collecting, through the scheduling module, the prediction results on all servers and the virtual machines marked as to-be-scheduled, assigning the to-be-scheduled virtual machines to suitable servers as required to generate a scheduling strategy, and distributing the scheduling commands to the corresponding service nodes to execute the scheduling operation; before the scheduling strategy is executed, checking whether each scheduling command is legal; if not, feeding back a penalty reward to update the parameters and regenerating the scheduling strategy; and if legal, executing the scheduling operation and obtaining the fed-back reward value to update the agent parameters.
The embodiment of the application adopts another technical scheme that: an electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the following operations of the multi-agent reinforcement learning scheduling method described above:
step a: collecting server parameters of a network data center and virtual machine load information running on each server;
step b: establishing a virtual simulation environment by using the server parameters and the virtual machine load information, and establishing a deep reinforcement learning model of the multi-agent;
step c: off-line training and learning are carried out by utilizing the deep reinforcement learning model and the simulation environment of the multi-agent, and an agent model is trained for each server respectively;
step d: and deploying the intelligent agent model to real service nodes, and scheduling according to the load condition of each service node.
Compared with the prior art, the embodiments of the present application have the following beneficial effects: the multi-agent reinforcement learning scheduling method, system, and electronic device virtualize the services running on the servers through virtualization technology and balance load through virtual machine scheduling. Because the scheduling range is not limited to a single server, when one server is in a high-load state its virtual machines can be scheduled to other low-load servers to run, which is more macroscopic than per-server resource allocation schemes. Meanwhile, the MADDPG framework extends the actor-critic (AC) framework: each critic is given extra information about the decisions of the other agents, while each actor can train using only local information. With this framework, multiple agents can learn cooperative strategies in a complex dynamic environment.
Drawings
FIG. 1 is a flow chart of a multi-agent reinforcement learning scheduling method according to an embodiment of the present application;
FIG. 2 is a diagram of a MADDPG scheduling framework according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a scheduling overall framework according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a multi-agent reinforcement learning scheduling system according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a hardware device of a multi-agent reinforcement learning scheduling method according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
To overcome the defects in the prior art, the multi-agent reinforcement learning scheduling method of the embodiments of the present application applies multi-agent reinforcement learning: a model is built from the load information on each service node in the cloud service environment, decisions are made by a recurrent neural network that learns timing information, one agent is trained for each server, and the multiple agents with different tasks compete or cooperate to maintain load balance under the whole network topology. After initial training is completed, each agent is placed on a real service node and schedules according to the load condition of each node; while deciding and scheduling, each agent continues to learn and improve from the decision memory of its current independent environment and of the other nodes, so that each agent can cooperate with the agents of other nodes to generate a scheduling strategy and realize load balance across the service nodes.
Specifically, please refer to fig. 1, which is a flowchart illustrating a multi-agent reinforcement learning scheduling method according to an embodiment of the present application. The multi-agent reinforcement learning scheduling method comprises the following steps:
step 100: collecting server parameters of a network data center and virtual machine load information running on each server;
in step 100, the collected server parameters specifically include: collecting configuration information, memory, hard disk storage space and the like of each server in a real scene for a period of time; the collected virtual machine load information specifically includes: and collecting parameters of resources occupied by the virtual machine running on each server, such as CPU occupancy rate, memory and hard disk occupancy rate and the like.
Step 200: performing preprocessing operations such as normalization on the collected server parameters and virtual machine load information;
In step 200, the preprocessing operation specifically includes: defining the virtual machine information of each service node as a tuple comprising the number of virtual machines and their respective configurations, including CPU, memory, hard disk, and current state; each virtual machine has two scheduling states, a to-be-scheduled state and a running state; each service node has two states, a saturated state and a hungry state; and the sum of the resource ratios occupied by the virtual machines must not exceed the configuration upper limit of the server.
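A minimal sketch of the data structures this tuple definition implies, assuming per-resource occupancy expressed as ratios in [0, 1]; the field names are assumptions, not fixed by the patent:

```python
from dataclasses import dataclass, field
from enum import Enum

class VMState(Enum):
    RUNNING = "running"
    TO_BE_SCHEDULED = "to_be_scheduled"   # marked for migration

class NodeState(Enum):
    SATURATED = "saturated"   # over-loaded
    HUNGRY = "hungry"         # under-loaded

@dataclass
class VirtualMachine:
    cpu: float        # fraction of host CPU occupied, in [0, 1]
    mem: float        # fraction of host memory occupied
    disk: float       # fraction of host disk occupied
    state: VMState = VMState.RUNNING

@dataclass
class ServiceNode:
    capacity: dict                      # host configuration upper limits
    vms: list = field(default_factory=list)
    state: NodeState = NodeState.HUNGRY

    def check(self) -> bool:
        """Sum of VM resource ratios must stay below the host's limit."""
        return (sum(v.cpu for v in self.vms) < 1.0
                and sum(v.mem for v in self.vms) < 1.0
                and sum(v.disk for v in self.vms) < 1.0)
```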
Step 300: establishing a virtual simulation environment by using the preprocessed data, and establishing a deep reinforcement learning model of the multi-agent;
in step 300, establishing a deep reinforcement learning model of a multi-agent specifically includes: modeling the collected time sequence dynamic information (server parameters and virtual machine load information) to create a simulation environment for off-line training, wherein the model adopts a multi-agent deep reinforcement learning model, and in order to fully utilize the influence of time sequence data, an LSTM model is adopted in a deep network part in the model to extract the time sequence information, so that the influence of abnormal data fluctuation in an instantaneous state on decision making is avoided. The model adopts a MADDPG (Multi-Agent Deep Deterministic Policy Gradient, namely, a Multi-Agent activator-critical for Mixed Cooperative-comprehensive environment from OpenAI) framework, the MADDPG framework is the expansion of a DDPG (continuous control with Deep learning article published by Google Deep Mind) algorithm in the Multi-Agent field, and the DDPG algorithm applies Deep reinforcement learning to a continuous action space. And the action space obtained by the deep learning part is set as the resource occupation ratio of the virtual machine in the state to be scheduled, namely the load balance of the current service node can be maintained only by scheduling the occupied space. Marking the virtual machine with proper size as a to-be-scheduled state according to the obtained to-be-scheduled space, then calculating the return rewards of the virtual machine in the to-be-scheduled state and each service node on each service node in the whole network, generating a scheduling strategy by using reward values obtained by the virtual machine by being distributed to the service nodes as distance measurement, finally checking whether the scheduling strategy is executable, if the scheduling strategy is executable, scheduling the virtual machine in the to-be-scheduled state to other proper service nodes, if the scheduling strategy is not executable, returning a negative feedback punishment, and generating the scheduling strategy again by the intelligent agent. The detailed scheduling framework is shown in fig. 2.
In the embodiment of the application, to counter the influence of instantaneous abnormal load fluctuations in a dynamic environment, a recurrent neural network, the LSTM (long short-term memory network), replaces the fully connected neural network in deep reinforcement learning, so that the agent can learn the hidden information among timing data and thereby achieve adaptive scheduling based on spatio-temporal perception.
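As a hedged illustration of this replacement, a minimal LSTM-based actor is sketched below in PyTorch; the layer sizes and the single scalar action output (the resource ratio to schedule out, per the action-space definition above) are assumptions:

```python
import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    """Maps a window of per-node load observations to a scalar action:
    the fraction of the node's capacity that should be scheduled out."""
    def __init__(self, obs_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, time, obs_dim) window of load history
        out, _ = self.lstm(obs_seq)
        last = out[:, -1, :]              # hidden state after the whole window
        # sigmoid keeps the action in [0, 1]: ratio of capacity to release
        return torch.sigmoid(self.head(last))
```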
In the above, the agent on each service node marks virtual machines as to-be-scheduled using a knapsack-problem solution: the predicted to-be-scheduled space is taken as the knapsack capacity, the resources occupied by each virtual machine serve as both the weight and the value of an item, the maximum value that can be loaded into the knapsack is calculated, and the loaded virtual machines are marked as the to-be-scheduled state. The to-be-scheduled space predicted on each service node is then aggregated (negative values indicate how many resources a node could still take on so that resources are fully utilized); the objective is to minimize, over the service nodes, the sum of occupied space and to-be-scheduled space, and the scheduling strategy is obtained by this calculation.
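A minimal sketch of the marking step as a classic 0/1 knapsack, assuming a single resource dimension discretized into integer units (the real system would combine CPU, memory, and disk):

```python
def mark_for_scheduling(vm_costs: list[int], budget: int) -> list[int]:
    """0/1 knapsack where each VM's occupied resource is both weight and
    value: pick the subset of VMs whose total occupancy best fills the
    predicted to-be-scheduled space. Returns indices of the chosen VMs."""
    n = len(vm_costs)
    dp = [[0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w = vm_costs[i - 1]
        for b in range(budget + 1):
            dp[i][b] = dp[i - 1][b]
            if w <= b:
                dp[i][b] = max(dp[i][b], dp[i - 1][b - w] + w)
    # backtrack to recover the chosen set of VMs
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if dp[i][b] != dp[i - 1][b]:
            chosen.append(i - 1)
            b -= vm_costs[i - 1]
    return chosen
```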
In the embodiment of the application, the MADDPG framework extends deep reinforcement learning to the multi-agent field; the algorithm is suited to centralized learning (centralized training) and distributed (decentralized) execution in multi-agent environments, and with this framework multiple agents can learn to cooperate and compete.
Specifically, the MADDPG algorithm considers a game among $n$ agents whose policies are parameterized by $\theta = \{\theta_1, \theta_2, \theta_3, \dots, \theta_n\}$, the policies of all agents being defined as $\pi = \{\pi_1, \pi_2, \pi_3, \dots, \pi_n\}$. The expected return of the $i$-th agent is $J(\theta_i) = \mathbb{E}[R_i]$. Considering deterministic policies $\mu_{\theta_i}$ (abbreviated $\mu_i$), the gradient can be expressed as:

$$\nabla_{\theta_i} J(\mu_i) = \mathbb{E}_{\mathbf{x},\, a \sim \mathcal{D}} \left[ \nabla_{\theta_i} \mu_i(a_i \mid o_i)\, \nabla_{a_i} Q_i^{\mu}(\mathbf{x}, a_1, \dots, a_n) \big|_{a_i = \mu_i(o_i)} \right]$$

where $\mathbf{x} = (o_1, \dots, o_n)$ collects the observations of all agents.
Specifically, the deep reinforcement learning model comprises a prediction module and a scheduling module, the prediction module comprises a state sensing unit, an action space unit and a reward function unit, and the specific functions are as follows:
a state sensing unit: predicting resources needing to be scheduled out in the current state through information input by each node, wherein the input state is defined through load information of each node and resources occupied by running virtual machines;
an action space unit: mapping the action space to the total capacity of the current service node according to the configuration information of the current node;
a scheduling module: according to the marked virtual machine in the state to be scheduled, rescheduling and distributing are carried out to generate a scheduling strategy, and an agent on each service node calculates a return function according to the generated scheduling action;
a reward function unit: measures the quality of the scheduling strategy; the objective is load balance across the service nodes of the whole network, and the return function on each service node is calculated independently. The per-node return function is given by the following formula (published as an image in the original and not recoverable here):

[Equation: per-node reward $r_i$ as a function of CPU occupancy $c$ with penalty coefficients $\alpha$ and $\beta$]

In the above formula, $r_i$ is the reward return on each service node, $c$ represents the CPU occupancy on the $i$-th machine, and $\alpha$ and $\beta$ are penalty coefficients. $\alpha$ can be set as the case requires and indicates the threshold around which the server CPU occupancy load is expected to remain steady.

[Equation: overall reward $R$ aggregated over all service nodes]

In the above formula, $R$ is the overall reward function, and the final optimization objective is to maximize the $R$ obtained by the scheduling strategy the agents generate cooperatively.
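Because both formula images are unrecoverable, the following is only a plausible reconstruction consistent with the surrounding text; the deviation-from-threshold shape and the plain summation over nodes are assumptions, not the patent's exact formulas:

```python
def node_reward(cpu: float, alpha: float = 0.5, beta: float = 2.0) -> float:
    """Hypothetical per-node reward r_i: penalize deviation of CPU
    occupancy c from the target threshold alpha, scaled by beta.
    This shape is an assumption; the patent's exact formula is an image."""
    return -beta * abs(cpu - alpha)

def global_reward(cpu_per_node: list[float]) -> float:
    """Overall reward R aggregated over all service nodes; the agents'
    cooperative objective is to maximize R."""
    return sum(node_reward(c) for c in cpu_per_node)
```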
Step 400: off-line training and learning are carried out by utilizing a deep reinforcement learning model of a plurality of intelligent agents and a simulation environment, and an intelligent agent model is trained for each server respectively;
In step 400, offline training is performed in the simulation environment established from real data, and one agent is created for each service node. The agent on each service node adjusts the amount of resources to be scheduled through its prediction module and marks the virtual machines to be scheduled out; a scheduling strategy is generated from the virtual machines in the to-be-scheduled state; the return values of the service nodes are calculated separately and then aggregated and summed into a total return value; finally the parameters of each prediction module are adjusted according to the total return value.
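A high-level sketch of one offline training iteration as just described; every function name on the hypothetical `env` and agent objects is a placeholder, not the patent's API:

```python
def train_step(env, agents):
    """One cooperative training iteration: each agent predicts how much
    to schedule out, VMs are marked and re-assigned, and per-node rewards
    are summed into a global return that updates every predictor."""
    obs = env.observe()                                       # load history per node
    actions = [ag.predict(o) for ag, o in zip(agents, obs)]   # resources to release
    marked = [env.mark_vms(node, a) for node, a in enumerate(actions)]
    plan = env.build_schedule(marked)                         # assign marked VMs to nodes
    rewards = env.execute(plan)                               # per-node return values
    total = sum(rewards)                                      # summed total return
    for ag in agents:
        ag.update(total)                                      # adjust predictor parameters
    return total
```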
Step 500: and deploying the trained intelligent agent model to the real service nodes, and scheduling according to the load condition of each service node.
In step 500, each trained agent model is placed down to the corresponding service node in the real environment. The agent senses the state information of the server where it is located over a period of time as input, its prediction module predicts the resources the current server wants to release, and a knapsack algorithm selects the virtual machines closest to that target and marks them as the to-be-scheduled state. Then a scheduling module collects the prediction results on all servers and the virtual machines marked as to-be-scheduled, assigns the to-be-scheduled virtual machines to suitable servers as required to generate a scheduling strategy, and distributes the scheduling commands to the corresponding nodes to execute the scheduling operation. Before the scheduling strategy is executed, each scheduling command must be checked for legality; if a command is illegal, a penalty reward is fed back to update the parameters and the scheduling strategy is regenerated, iterating until every scheduling strategy can be executed. If legal, the commands are executed and the fed-back reward values update the agent parameters. The overall scheduling framework is shown in fig. 3.
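A sketch of this legality check and dispatch loop; `is_legal`, the penalty constant, and the retry bound are assumptions introduced for illustration:

```python
PENALTY_REWARD = -1.0  # assumed penalty magnitude for an illegal plan

def dispatch(scheduler, agents, max_retries: int = 10):
    """Regenerate the scheduling strategy until every command is legal,
    feeding back a penalty reward for illegal plans, then execute."""
    for _ in range(max_retries):
        plan = scheduler.collect_and_assign()        # predictions + marked VMs
        illegal = [cmd for cmd in plan if not cmd.is_legal()]
        if not illegal:
            rewards = scheduler.execute(plan)        # run on the service nodes
            for ag in agents:
                ag.update(sum(rewards))              # positive feedback path
            return plan
        for ag in agents:
            ag.update(PENALTY_REWARD)                # negative feedback path
    raise RuntimeError("no executable scheduling strategy found")
```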
In general, multi-agent reinforcement learning derives a scheduling action directly from the environment input, but under a complex network topology the action space of a virtual machine scheduling strategy is too large, and an overly large action space makes the algorithm hard to converge. Moreover, that approach requires every virtual machine running in the topology to be configured with a global id that designates the scheduling target. Although an id can index a virtual machine, the resources the virtual machine occupies are likely to change while it runs, so a strategy learned this way is unreliable. Even if the occupied resources never changed, an agent trained with such an algorithm would not consider a newly added virtual machine in its decisions. The present method therefore improves on the algorithm by replacing the model's action space with the resources the current server wants to release, i.e. how many resources should be scheduled away to keep the load balanced under the overall network topology. This avoids marking every virtual machine with a global id, and operation can continue even if new virtual machines are added midway, making the scheduling algorithm more flexible and adaptable to a wider range of scenarios.
Please refer to fig. 4, which is a schematic structural diagram of a multi-agent reinforcement learning scheduling system according to an embodiment of the present application. The multi-agent reinforcement learning scheduling system comprises an information collection module, a preprocessing module, a reinforcement learning model construction module, an agent model training module and an agent deployment module.
An information collection module: used for collecting server parameters of the network data center and the load information of the virtual machines running on each server. The collected server parameters specifically include configuration information of each server in the real scene over a period of time, such as memory and hard disk storage space; the collected virtual machine load information specifically includes parameters of the resources occupied by the virtual machines running on each server, such as CPU occupancy, memory occupancy, and hard disk occupancy.
A preprocessing module: used for performing preprocessing operations such as normalization on the collected server parameters and virtual machine load information. The preprocessing operation specifically includes: defining the virtual machine information of each service node as a tuple comprising the number of virtual machines and their respective configurations, including CPU, memory, hard disk, and current state; each virtual machine has two scheduling states, a to-be-scheduled state and a running state; each service node has two states, a saturated state and a hungry state; and the sum of the resource ratios occupied by the virtual machines must not exceed the configuration upper limit of the server.
A reinforcement learning model construction module: used for establishing a virtual simulation environment from the preprocessed data and establishing a multi-agent deep reinforcement learning model. Establishing the multi-agent deep reinforcement learning model specifically includes: modeling the collected time-series dynamic information (server parameters and virtual machine load information) to create a simulation environment for offline training. The model is a multi-agent deep reinforcement learning model; to make full use of the timing data, the deep network part of the model uses an LSTM to extract timing information, which prevents decisions from being disturbed by abnormal data fluctuations in an instantaneous state. The model adopts the MADDPG framework, which extends the DDPG algorithm to the multi-agent field; DDPG applies deep reinforcement learning to continuous action spaces. The action space produced by the deep learning part is set to the resource ratio of the virtual machines in the to-be-scheduled state, i.e. how much occupied space must be scheduled away to maintain the load balance of the current service node. According to the obtained to-be-scheduled space, virtual machines of suitable size are marked as to-be-scheduled; the return rewards of the to-be-scheduled virtual machines on each service node of the whole network are then calculated, and a scheduling strategy is generated using the reward value a virtual machine would obtain when assigned to each service node as a distance measure. Finally, the strategy is checked for executability: if executable, the to-be-scheduled virtual machines are scheduled to other suitable service nodes; if not, a negative feedback penalty is returned and the agents generate the scheduling strategy again.
In the embodiment of the application, to counter the influence of instantaneous abnormal load fluctuations in a dynamic environment, a recurrent neural network, the LSTM (long short-term memory network), replaces the fully connected neural network in deep reinforcement learning, so that the agent can learn the hidden information among timing data and thereby achieve adaptive scheduling based on spatio-temporal perception.
In the above, the agent on each service node marks virtual machines as to-be-scheduled using a knapsack-problem solution: the predicted to-be-scheduled space is taken as the knapsack capacity, the resources occupied by each virtual machine serve as both the weight and the value of an item, the maximum value that can be loaded into the knapsack is calculated, and the loaded virtual machines are marked as the to-be-scheduled state. The to-be-scheduled space predicted on each service node is then aggregated (negative values indicate how many resources a node could still take on so that resources are fully utilized); the objective is to minimize, over the service nodes, the sum of occupied space and to-be-scheduled space, and the scheduling strategy is obtained by this calculation.
In the embodiment of the application, the MADDPG framework extends deep reinforcement learning to the multi-agent field; the algorithm is suited to centralized learning (centralized training) and distributed (decentralized) execution in multi-agent environments, and with this framework multiple agents can learn to cooperate and compete.
Specifically, the MADDPG algorithm considers a game among $n$ agents whose policies are parameterized by $\theta = \{\theta_1, \theta_2, \theta_3, \dots, \theta_n\}$, the policies of all agents being defined as $\pi = \{\pi_1, \pi_2, \pi_3, \dots, \pi_n\}$. The expected return of the $i$-th agent is $J(\theta_i) = \mathbb{E}[R_i]$. Considering deterministic policies $\mu_{\theta_i}$ (abbreviated $\mu_i$), the gradient can be expressed as:

$$\nabla_{\theta_i} J(\mu_i) = \mathbb{E}_{\mathbf{x},\, a \sim \mathcal{D}} \left[ \nabla_{\theta_i} \mu_i(a_i \mid o_i)\, \nabla_{a_i} Q_i^{\mu}(\mathbf{x}, a_1, \dots, a_n) \big|_{a_i = \mu_i(o_i)} \right]$$

where $\mathbf{x} = (o_1, \dots, o_n)$ collects the observations of all agents.
Further, the reinforcement learning model building module comprises a prediction module and a scheduling module, the prediction module comprises a state sensing unit, an action space unit and a reward function unit, and the specific functions are as follows:
a state sensing unit: predicting resources needing to be scheduled out in the current state through information input by each node, wherein the input state is defined through load information of each node and resources occupied by running virtual machines;
an action space unit: mapping the action space to the total capacity of the current service node according to the configuration information of the current node;
a scheduling module: according to the marked virtual machine in the state to be scheduled, rescheduling and distributing are carried out to generate a scheduling strategy, and an agent on each service node calculates a return function according to the generated scheduling action;
a reward function unit: measures the quality of the scheduling strategy; the objective is load balance across the service nodes of the whole network, and the return function on each service node is calculated independently. The per-node return function is given by the following formula (published as an image in the original and not recoverable here):

[Equation: per-node reward $r_i$ as a function of CPU occupancy $c$ with penalty coefficients $\alpha$ and $\beta$]

In the above formula, $r_i$ is the reward return on each service node, $c$ represents the CPU occupancy on the $i$-th machine, and $\alpha$ and $\beta$ are penalty coefficients. $\alpha$ can be set as the case requires and indicates the threshold around which the server CPU occupancy load is expected to remain steady.

[Equation: overall reward $R$ aggregated over all service nodes]

In the above formula, $R$ is the overall reward function, and the final optimization objective is to maximize the $R$ obtained by the scheduling strategy the agents generate cooperatively.
An agent model training module: used for performing offline training and learning with the multi-agent deep reinforcement learning model and the simulation environment, training one agent model for each server. Offline training is performed in the simulation environment established from real data, one agent being created for each service node; the agent on each service node adjusts the amount of resources to be scheduled through its prediction module and marks the virtual machines to be scheduled out; a scheduling strategy is generated from the virtual machines in the to-be-scheduled state; the return values of the service nodes are calculated separately, then aggregated and summed into a total return value; finally the parameters of each prediction module are adjusted according to the total return value.
An agent deployment module: used for deploying the trained agent models to the real service nodes and scheduling according to the load condition of each service node. Each trained agent model is placed down to the corresponding service node in the real environment; the agent's prediction module then predicts and marks the to-be-scheduled state, the scheduling module performs unified assignment to generate a scheduling strategy, and the scheduling commands are distributed to the corresponding nodes to execute the scheduling operation. Before a scheduling action is executed, whether it can be executed is checked; if it cannot be executed or fails, a penalty reward is fed back to update the parameters and the scheduling strategy is regenerated, iterating until all scheduling strategies can be executed.
In general, multi-agent reinforcement learning derives a scheduling action directly from the environment input, but under a complex network topology the action space of a virtual machine scheduling strategy is too large, and an overly large action space makes the algorithm hard to converge. Moreover, that approach requires every virtual machine running in the topology to be configured with a global id that designates the scheduling target. Although an id can index a virtual machine, the resources the virtual machine occupies are likely to change while it runs, so a strategy learned this way is unreliable. Even if the occupied resources never changed, an agent trained with such an algorithm would not consider a newly added virtual machine in its decisions. The present method therefore improves on the algorithm by replacing the model's action space with the resources the current server wants to release, i.e. how many resources should be scheduled away to keep the load balanced under the overall network topology. This avoids marking every virtual machine with a global id, and operation can continue even if new virtual machines are added midway, making the scheduling algorithm more flexible and adaptable to a wider range of scenarios.
Fig. 5 is a schematic structural diagram of a hardware device of a multi-agent reinforcement learning scheduling method according to an embodiment of the present application. As shown in fig. 5, the device includes one or more processors and memory. Taking a processor as an example, the apparatus may further include: an input system and an output system.
The processor, memory, input system, and output system may be connected by a bus or other means, as exemplified by the bus connection in fig. 5.
The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules. The processor executes various functional applications and data processing of the electronic device, i.e., implements the processing method of the above-described method embodiment, by executing the non-transitory software program, instructions and modules stored in the memory.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data and the like. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the processing system over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input system may receive input numeric or character information and generate a signal input. The output system may include a display device such as a display screen.
The one or more modules are stored in the memory and, when executed by the one or more processors, perform the following for any of the above method embodiments:
step a: collecting server parameters of a network data center and virtual machine load information running on each server;
step b: establishing a virtual simulation environment by using the server parameters and the virtual machine load information, and establishing a deep reinforcement learning model of the multi-agent;
step c: off-line training and learning are carried out by utilizing the deep reinforcement learning model and the simulation environment of the multi-agent, and an agent model is trained for each server respectively;
step d: and deploying the intelligent agent model to real service nodes, and scheduling according to the load condition of each service node.
This product can execute the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to the executed method. For technical details not described in detail in this embodiment, reference may be made to the methods provided in the embodiments of the present application.
Embodiments of the present application provide a non-transitory (non-volatile) computer storage medium having stored thereon computer-executable instructions that may perform the following operations:
step a: collecting server parameters of a network data center and virtual machine load information running on each server;
step b: establishing a virtual simulation environment by using the server parameters and the virtual machine load information, and establishing a deep reinforcement learning model of the multi-agent;
step c: off-line training and learning are carried out by utilizing the deep reinforcement learning model and the simulation environment of the multi-agent, and an agent model is trained for each server respectively;
step d: and deploying the intelligent agent model to real service nodes, and scheduling according to the load condition of each service node.
Embodiments of the present application provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the following:
step a: collecting server parameters of a network data center and virtual machine load information running on each server;
step b: establishing a virtual simulation environment by using the server parameters and the virtual machine load information, and establishing a deep reinforcement learning model of the multi-agent;
step c: off-line training and learning are carried out by utilizing the deep reinforcement learning model and the simulation environment of the multi-agent, and an agent model is trained for each server respectively;
step d: and deploying the intelligent agent model to real service nodes, and scheduling according to the load condition of each service node.
The multi-agent reinforcement learning scheduling method, system, and electronic device virtualize the services running on the servers through virtualization technology and balance load through virtual machine scheduling. Because the scheduling range is not limited to a single server, when one server is in a high-load state its virtual machines can be scheduled to other low-load servers to run, which is more macroscopic than per-server resource allocation schemes. Meanwhile, the MADDPG framework extends the actor-critic (AC) framework: each critic is given extra information about the decisions of the other agents, while each actor can train using only local information. With this framework, multiple agents can learn cooperative strategies in a complex dynamic environment.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A multi-agent reinforcement learning scheduling method is characterized by comprising the following steps:
step a: collecting server parameters of a network data center and virtual machine load information running on each server;
step b: establishing a virtual simulation environment by using the server parameters and the virtual machine load information, and establishing a deep reinforcement learning model of the multi-agent;
step c: off-line training and learning are carried out by utilizing the deep reinforcement learning model of the multi-agent and the virtual simulation environment, and an agent model is trained for each server;
step d: deploying the intelligent agent model to a real service node, and scheduling according to the load condition of each service node;
in the step d, deploying the agent model to real service nodes and scheduling according to the load condition of each service node specifically comprises: deploying each trained agent model to the corresponding service node in the real environment, sensing the state information of the server where it is located over a period of time as input, predicting the resources the current server needs to release, and using a knapsack algorithm to select the virtual machines closest to that target and mark them as the to-be-scheduled state; then collecting, through a scheduling module, the prediction results on all servers and the virtual machines marked as to-be-scheduled, assigning the to-be-scheduled virtual machines to suitable servers as required to generate a scheduling strategy, and distributing the scheduling commands to the corresponding service nodes to execute the scheduling operation; before the scheduling strategy is executed, checking whether each scheduling command is legal; if not, feeding back a penalty reward to update the parameters and regenerating the scheduling strategy; and if legal, executing the scheduling operation and obtaining the fed-back reward value to update the agent parameters.
2. The multi-agent reinforcement learning scheduling method of claim 1, wherein the step a further comprises: performing a standardized preprocessing operation on the collected server parameters and virtual machine load information; the standardized preprocessing operation comprises: defining the virtual machine information of each service node as a tuple comprising the number of virtual machines and their respective configurations, wherein each virtual machine has two scheduling states, a to-be-scheduled state and a running state, each service node has two states, a saturated state and a hungry state, and the sum of the resource ratios occupied by the virtual machines is less than the configuration upper limit of the server where they are located.
3. The multi-agent reinforcement learning scheduling method of claim 1 or 2, wherein in the step b the multi-agent deep reinforcement learning model specifically comprises a prediction module and a scheduling module; the prediction module predicts, from the information input by each service node, the resources to be scheduled out in the current state, and maps its action space onto the total capacity of the current service node according to that node's configuration information; the scheduling module reschedules and distributes the virtual machines marked as to-be-scheduled to generate a scheduling policy, and the agent on each service node computes a return function from the generated scheduling action; the prediction module uses this return to measure the quality of the scheduling policy, so that the load of the service nodes across the whole network is balanced.
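The action-space mapping in claim 3, which rescales a node-local action onto that node's total capacity, can be read as follows. This sketch assumes a normalized policy output in [0, 1]; the claim does not specify the raw action range.

```python
def map_action_to_capacity(raw_action: float, node_capacity: float) -> float:
    """Rescale a normalized policy output in [0, 1] to an absolute amount
    of resource (0 .. node_capacity) that the node should schedule out."""
    clipped = max(0.0, min(1.0, raw_action))
    return clipped * node_capacity

# A node with 64 GB of memory and a policy output of 0.25 would be asked
# to release 16 GB worth of virtual machines.
print(map_action_to_capacity(0.25, 64.0))  # 16.0
```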
4. The multi-agent reinforcement learning scheduling method of claim 3, wherein in the step c, performing offline training and learning with the multi-agent deep reinforcement learning model and the virtual simulation environment, training one agent model for each server, specifically comprises: the agent on each service node adjusts, through the prediction module, the amount of resources to be scheduled out and marks the virtual machines to be scheduled out; a scheduling policy is generated from the virtual machines in the to-be-scheduled state; the return value of each service node is computed and the return values are summed into a total return value; and the parameters of each prediction module are adjusted according to the total return value.
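Claim 4's training loop couples per-node predictions with a single global return. A sketch of one offline training step follows; the `agents` and `env` interfaces are assumptions made for illustration, since the claim fixes only the data flow, not an API.

```python
def training_step(agents, env):
    """One offline training step in the simulated data center.

    `agents` holds one predictor per simulated server; `env` is the
    virtual simulation environment. All names here are hypothetical.
    """
    states = env.observe()  # per-node load history over a time window
    # 1. Each agent predicts how much its own node should schedule out,
    #    and the matching VMs are marked as to-be-scheduled.
    targets = [agent.predict(s) for agent, s in zip(agents, states)]
    marked = env.mark_vms(targets)
    # 2. The scheduling module reassigns the marked VMs across nodes.
    policy = env.schedule(marked)
    # 3. Per-node return values are summed into one total return ...
    returns = env.execute(policy)
    total_return = sum(returns)
    # 4. ... which adjusts the parameters of every prediction module.
    for agent in agents:
        agent.update(total_return)
    return total_return
```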
5. A multi-agent reinforcement learning scheduling system, comprising:
an information collection module, configured to collect server parameters of a network data center and load information of the virtual machines running on each server;
a reinforcement learning model construction module, configured to establish a virtual simulation environment from the server parameters and the virtual machine load information, and to build a multi-agent deep reinforcement learning model;
an agent model training module, configured to perform offline training and learning with the multi-agent deep reinforcement learning model and the virtual simulation environment, training one agent model for each server;
an agent deployment module, configured to deploy the agent models to real service nodes and schedule according to the load condition of each service node;
wherein the agent deployment module deploying the agent models to real service nodes and scheduling according to the load condition of each service node specifically comprises: deploying each trained agent model to its corresponding service node in the real environment, where it takes the state information of its server over a period of time as input, predicts the amount of resources the current server needs to release, and uses a knapsack algorithm to select the virtual machines whose combined footprint is closest to that target, marking them as to-be-scheduled; a scheduling module then collects the predictions from all servers together with the virtual machines marked as to-be-scheduled, assigns the to-be-scheduled virtual machines to suitable servers as required to generate a scheduling policy, and distributes the scheduling commands to the corresponding service nodes for execution; before the scheduling policy is executed, each scheduling command is checked for legality: if a command is illegal, a penalty reward is fed back to update the parameters and the scheduling policy is regenerated; if legal, the scheduling operation is executed and the resulting feedback reward value is used to update the agent parameters.
6. The multi-agent reinforcement learning scheduling system of claim 5, further comprising a preprocessing module configured to perform a normalization preprocessing operation on the collected server parameters and virtual machine load information; the normalization preprocessing comprises: defining the virtual machine information of each service node as a tuple containing the number of virtual machines and their respective configurations, where each virtual machine is in one of two scheduling states, to-be-scheduled or running, each service node is in one of two states, saturated or starved, and the sum of the resource shares occupied by the virtual machines on a server is less than that server's configured capacity.
7. The multi-agent reinforcement learning scheduling system of claim 5 or 6, wherein the reinforcement learning model construction module comprises a prediction module and a scheduling module, the prediction module comprising:
a state sensing unit, configured to predict the resources to be scheduled out in the current state from the information input by each service node;
an action space unit, configured to map the action space onto the total capacity of the current service node according to that node's configuration information;
wherein the scheduling module reschedules and distributes the virtual machines marked as to-be-scheduled to generate a scheduling policy, and the agent on each service node computes a return function from the generated scheduling action;
the prediction module further comprises:
a return function unit, configured to measure the quality of the scheduling policy so that the load of the service nodes across the whole network is balanced.
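The claims leave the concrete return function open. One common way to "measure the quality of the scheduling policy" for load balancing is to penalize the spread of per-node utilizations, for example with their variance; the sketch below is one plausible instantiation under that assumption, not the return function fixed by the patent.

```python
from statistics import pvariance
from typing import List

def balance_return(node_loads: List[float]) -> float:
    """Higher when per-node utilizations are closer together: the
    negative population variance of the loads, so a perfectly balanced
    cluster scores 0 and any imbalance scores below 0."""
    return -pvariance(node_loads)

print(balance_return([0.5, 0.5, 0.5]))  # 0.0 (perfectly balanced)
print(balance_return([0.9, 0.1, 0.5]))  # about -0.107 (imbalanced)
```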
8. The multi-agent reinforcement learning scheduling system of claim 7, wherein the agent model training module performing offline training and learning with the multi-agent deep reinforcement learning model and the virtual simulation environment, training one agent model for each server, specifically comprises: the agent on each service node adjusts, through the prediction module, the amount of resources to be scheduled out and marks the virtual machines to be scheduled out; a scheduling policy is generated from the virtual machines in the to-be-scheduled state; the return value of each service node is computed and the return values are summed into a total return value; and the parameters of each prediction module are adjusted according to the total return value.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the multi-agent reinforcement learning scheduling method of any one of claims 1 to 4.
CN201910193429.XA 2019-03-14 2019-03-14 Multi-agent reinforcement learning scheduling method and system and electronic equipment Active CN109947567B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910193429.XA CN109947567B (en) 2019-03-14 2019-03-14 Multi-agent reinforcement learning scheduling method and system and electronic equipment
PCT/CN2019/130582 WO2020181896A1 (en) 2019-03-14 2019-12-31 Multi-agent reinforcement learning scheduling method and system and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910193429.XA CN109947567B (en) 2019-03-14 2019-03-14 Multi-agent reinforcement learning scheduling method and system and electronic equipment

Publications (2)

Publication Number Publication Date
CN109947567A (en) 2019-06-28
CN109947567B true (en) 2021-07-20

Family

ID=67009966

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910193429.XA Active CN109947567B (en) 2019-03-14 2019-03-14 Multi-agent reinforcement learning scheduling method and system and electronic equipment

Country Status (2)

Country Link
CN (1) CN109947567B (en)
WO (1) WO2020181896A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2791840C2 (en) * 2021-12-21 2023-03-13 Владимир Германович Крюков Decision-making system in a multi-agent environment

Families Citing this family (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109947567B (en) * 2019-03-14 2021-07-20 深圳先进技术研究院 Multi-agent reinforcement learning scheduling method and system and electronic equipment
CN110362411B (en) * 2019-07-25 2022-08-02 哈尔滨工业大学 CPU resource scheduling method based on Xen system
CN110442129B (en) * 2019-07-26 2021-10-22 中南大学 Control method and system for multi-agent formation
CN110471297B (en) * 2019-07-30 2020-08-11 清华大学 Multi-agent cooperative control method, system and equipment
CN110427006A (en) * 2019-08-22 2019-11-08 齐鲁工业大学 A kind of multi-agent cooperative control system and method for process industry
CN110516795B (en) * 2019-08-28 2022-05-10 北京达佳互联信息技术有限公司 Method and device for allocating processors to model variables and electronic equipment
CN110728368B (en) * 2019-10-25 2022-03-15 中国人民解放军国防科技大学 Acceleration method for deep reinforcement learning of simulation robot
CN111031387B (en) * 2019-11-21 2020-12-04 南京大学 Method for controlling video coding flow rate of monitoring video sending end
CN110882544B (en) * 2019-11-28 2023-09-15 网易(杭州)网络有限公司 Multi-agent training method and device and electronic equipment
CN111026549B (en) * 2019-11-28 2022-06-10 国网甘肃省电力公司电力科学研究院 Automatic test resource scheduling method for power information communication equipment
CN111047014B (en) * 2019-12-11 2023-06-23 中国航空工业集团公司沈阳飞机设计研究所 Multi-agent air countermeasure distributed sampling training method and equipment
CN111178545B (en) * 2019-12-31 2023-02-24 中国电子科技集团公司信息科学研究院 Dynamic reinforcement learning decision training system
CN113067714B (en) * 2020-01-02 2022-12-13 ***通信有限公司研究院 Content distribution network scheduling processing method, device and equipment
CN111310915B (en) * 2020-01-21 2023-09-01 浙江工业大学 Data anomaly detection defense method oriented to reinforcement learning
CN111324358B (en) * 2020-02-14 2020-10-16 南栖仙策(南京)科技有限公司 Training method for automatic operation and maintenance strategy of information system
CN111343095B (en) * 2020-02-15 2021-11-05 北京理工大学 Method for realizing controller load balance in software defined network
CN111461338A (en) * 2020-03-06 2020-07-28 北京仿真中心 Intelligent system updating method and device based on digital twin
CN111339675B (en) * 2020-03-10 2020-12-01 南栖仙策(南京)科技有限公司 Training method for intelligent marketing strategy based on machine learning simulation environment
CN111538668B (en) * 2020-04-28 2023-08-15 山东浪潮科学研究院有限公司 Mobile terminal application testing method, device, equipment and medium based on reinforcement learning
CN111585811B (en) * 2020-05-06 2022-09-02 郑州大学 Virtual optical network mapping method based on multi-agent deep reinforcement learning
CN111722910B (en) * 2020-06-19 2023-07-21 广东石油化工学院 Cloud job scheduling and resource allocation method
CN111724001B (en) * 2020-06-29 2023-08-29 重庆大学 Aircraft detection sensor resource scheduling method based on deep reinforcement learning
CN111860777B (en) * 2020-07-06 2021-07-02 中国人民解放军军事科学院战争研究院 Distributed reinforcement learning training method and device for super real-time simulation environment
CN112001585B (en) * 2020-07-14 2023-09-22 北京百度网讯科技有限公司 Multi-agent decision method, device, electronic equipment and storage medium
CN111967645B (en) * 2020-07-15 2022-04-29 清华大学 Social network information propagation range prediction method and system
CN112422651A (en) * 2020-11-06 2021-02-26 电子科技大学 Cloud resource scheduling performance bottleneck prediction method based on reinforcement learning
CN112838946B (en) * 2020-12-17 2023-04-28 国网江苏省电力有限公司信息通信分公司 Method for constructing intelligent sensing and early warning model based on communication network faults
CN112766705A (en) * 2021-01-13 2021-05-07 北京洛塔信息技术有限公司 Distributed work order processing method, system, device and storage medium
CN112966431B (en) * 2021-02-04 2023-04-28 西安交通大学 Data center energy consumption joint optimization method, system, medium and equipment
CN112801303A (en) * 2021-02-07 2021-05-14 中兴通讯股份有限公司 Intelligent pipeline processing method and device, storage medium and electronic device
CN113115451A (en) * 2021-02-23 2021-07-13 北京邮电大学 Interference management and resource allocation scheme based on multi-agent deep reinforcement learning
CN113094171A (en) * 2021-03-31 2021-07-09 北京达佳互联信息技术有限公司 Data processing method and device, electronic equipment and storage medium
US20220321605A1 (en) * 2021-04-01 2022-10-06 Cisco Technology, Inc. Verifying trust postures of heterogeneous confidential computing clusters
CN113325721B (en) * 2021-08-02 2021-11-05 北京中超伟业信息安全技术股份有限公司 Model-free adaptive control method and system for industrial system
CN113672372B (en) * 2021-08-30 2023-08-08 福州大学 Multi-edge collaborative load balancing task scheduling method based on reinforcement learning
CN114003121B (en) * 2021-09-30 2023-10-31 中国科学院计算技术研究所 Data center server energy efficiency optimization method and device, electronic equipment and storage medium
CN113641462B (en) * 2021-10-14 2021-12-21 西南民族大学 Virtual network hierarchical distributed deployment method and system based on reinforcement learning
WO2023121514A1 (en) * 2021-12-21 2023-06-29 Владимир Германович КРЮКОВ System for making decisions in a multi-agent environment
CN114116183B (en) * 2022-01-28 2022-04-29 华北电力大学 Data center service load scheduling method and system based on deep reinforcement learning
CN114518948A (en) * 2022-02-21 2022-05-20 南京航空航天大学 Large-scale microservice application-oriented dynamic perception rescheduling method and application
CN114924684A (en) * 2022-04-24 2022-08-19 南栖仙策(南京)科技有限公司 Environmental modeling method and device based on decision flow graph and electronic equipment
CN114860416B (en) * 2022-06-06 2024-04-09 清华大学 Distributed multi-agent detection task allocation method and device in countermeasure scene
CN114781072A (en) * 2022-06-17 2022-07-22 北京理工大学前沿技术研究院 Decision-making method and system for unmanned vehicle
CN115293451B (en) * 2022-08-24 2023-06-16 中国西安卫星测控中心 Resource dynamic scheduling method based on deep reinforcement learning
CN116151137B (en) * 2023-04-24 2023-07-28 之江实验室 Simulation system, method and device

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103873569B (en) * 2014-03-05 2017-04-19 兰雨晴 Resource optimized deployment method based on IaaS (infrastructure as a service) cloud platform
CN105607952B (en) * 2015-12-18 2021-04-20 航天恒星科技有限公司 Method and device for scheduling virtualized resources
CN108009016B (en) * 2016-10-31 2021-10-22 华为技术有限公司 Resource load balancing control method and cluster scheduler
US10649966B2 (en) * 2017-06-09 2020-05-12 Microsoft Technology Licensing, Llc Filter suggestion for selective data import
CN108021451B (en) * 2017-12-07 2021-08-13 上海交通大学 Self-adaptive container migration method in fog computing environment
CN108829494B (en) * 2018-06-25 2020-09-29 杭州谐云科技有限公司 Container cloud platform intelligent resource optimization method based on load prediction
CN109165081B (en) * 2018-08-15 2021-09-28 福州大学 Web application self-adaptive resource allocation method based on machine learning
CN109068350B (en) * 2018-08-15 2021-09-28 西安电子科技大学 Terminal autonomous network selection system and method for wireless heterogeneous network
CN109947567B (en) * 2019-03-14 2021-07-20 深圳先进技术研究院 Multi-agent reinforcement learning scheduling method and system and electronic equipment

Also Published As

Publication number Publication date
CN109947567A (en) 2019-06-28
WO2020181896A1 (en) 2020-09-17

Similar Documents

Publication Publication Date Title
CN109947567B (en) Multi-agent reinforcement learning scheduling method and system and electronic equipment
Liu et al. Adaptive asynchronous federated learning in resource-constrained edge computing
CN104317658B (en) A kind of loaded self-adaptive method for scheduling task based on MapReduce
CN111274036B (en) Scheduling method of deep learning task based on speed prediction
CN104408518B (en) Based on the neural network learning optimization method of particle swarm optimization algorithm
Shi et al. Energy-aware container consolidation based on PSO in cloud data centers
CN105975342A (en) Improved cuckoo search algorithm based cloud computing task scheduling method and system
Mechalikh et al. PureEdgeSim: A simulation framework for performance evaluation of cloud, edge and mist computing environments
US20230206132A1 (en) Method and Apparatus for Training AI Model, Computing Device, and Storage Medium
CN114237869B (en) Ray double-layer scheduling method and device based on reinforcement learning and electronic equipment
CN115085202A (en) Power grid multi-region intelligent power collaborative optimization method, device, equipment and medium
CN112732444A (en) Distributed machine learning-oriented data partitioning method
CN115168027A (en) Calculation power resource measurement method based on deep reinforcement learning
CN115543626A (en) Power defect image simulation method adopting heterogeneous computing resource load balancing scheduling
CN115934344A (en) Heterogeneous distributed reinforcement learning calculation method, system and storage medium
Gand et al. A fuzzy controller for self-adaptive lightweight edge container orchestration
Moazeni et al. Dynamic resource allocation using an adaptive multi-objective teaching-learning based optimization algorithm in cloud
CN114567560A (en) Edge node dynamic resource allocation method based on generation confrontation simulation learning
Tuli et al. Optimizing the Performance of Fog Computing Environments Using AI and Co-Simulation
Yang et al. Energy saving strategy of cloud data computing based on convolutional neural network and policy gradient algorithm
Tang et al. Edge computing energy-efficient resource scheduling based on deep reinforcement learning and imitation learning
Su et al. A power-aware virtual machine mapper using firefly optimization
CN114492052A (en) Global stream level network simulation method, system and device
CN114090239A (en) Model-based reinforcement learning edge resource scheduling method and device
Whiteside et al. Pann: Power allocation via neural networks dynamic bounded-power allocation in high performance computing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant