CN117240636A - Data center network energy saving method and system based on reinforcement learning - Google Patents


Info

Publication number: CN117240636A
Application number: CN202211423259.8A
Authority: CN (China)
Prior art keywords: network, data center, data, center network, energy
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 潘恬, 高明岚, 周夏欣, 宋恩格, 黄韬, 刘韵洁
Current Assignee: Beijing University of Posts and Telecommunications
Original Assignee: Beijing University of Posts and Telecommunications
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202211423259.8A
Publication of CN117240636A

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application provides a data center network energy saving method and system based on reinforcement learning. The method comprises the following steps: sending the network state data of the data center network at the current moment to an agent, so that the agent performs complexity reduction processing on the network state data and, based on a deep reinforcement learning algorithm, applies the complexity-reduced network state data and a deep neural network to generate energy-saving action decision data for a target link at the next moment of the data center network; and receiving the energy-saving action decision data, opening or closing the target link, and updating the network topology so that the controller correspondingly updates the data center network according to the updated network topology. The application can effectively reduce the power consumption of the data center while ensuring the stability of the overall topology of the data center, realize energy-saving control of the data center network, effectively reduce the algorithm complexity of the energy-saving process, and thereby effectively improve the efficiency of generating energy-saving decisions.

Description

Data center network energy saving method and system based on reinforcement learning
Technical Field
The application relates to the technical field of data processing, in particular to a data center network energy saving method and system based on reinforcement learning.
Background
In recent years, the global data center market has matured and continued to expand, and the number and capacity of global data centers have doubled over the past five years. Investigations have shown that power is one of the largest operating costs of data centers, and power consumption is also an indicator of great concern to operators. However, in data centers many devices often operate at very low efficiency, because the number of devices and the link capacity of the data center network often far exceed the actual traffic demand. In some cases, more than 20% of the devices are powered on but carry no actual traffic. It is therefore necessary to explore energy saving schemes for data center networks. Considering the fully connected topology and the low link utilization of a data center network, one obvious energy saving idea is to aggregate traffic onto a few critical links for transmission and shut down the remaining idle switches. However, how to select the critical links, and accordingly which switches to shut down, is an important issue that needs to be studied.
However, existing data center network energy saving methods, such as ElasticTree or GreenTE.ai, cannot simultaneously guarantee the stability of the data center network and reduce the power consumption of the data center, and they also suffer from high algorithm complexity, which leads to low efficiency in generating energy-saving decisions.
Disclosure of Invention
In view of the foregoing, embodiments of the present application provide a reinforcement learning-based data center network energy saving method and system that obviate or mitigate one or more disadvantages in the prior art.
A first aspect of the present application provides a data center network energy saving method based on reinforcement learning, comprising: a data center network energy conservation decision step, the data center network energy conservation decision step comprising:
the network state data of the current moment of the data center network is sent to an intelligent agent, so that the intelligent agent performs complexity reduction processing on the network state data, and based on a deep reinforcement learning algorithm, energy-saving action decision data of a target link at the next moment of the data center network is generated by applying the network state data subjected to the complexity reduction processing and a deep neural network;
receiving the energy-saving action decision data, opening or closing the target link according to the energy-saving action decision data, and correspondingly updating the network topology corresponding to the data center network;
and sending the updated network topology to a controller of the data center network so that the controller correspondingly updates the switch state in the data center network according to the updated network topology.
In some embodiments of the application, further comprising:
the method comprises the steps of carrying out real-time network state detection on a data center network to acquire network state data of the data center network at the current moment, and repeatedly executing the data center network energy-saving decision step, so that the intelligent agent combines the corresponding relation among action decision data, previous network state data, updated network state data and rewards acquired when the data center network energy-saving decision step is executed each time into an experience, and stores the experience in a replay buffer; and then in the process of repeatedly executing the data center network energy-saving decision step, the intelligent agent continuously extracts the experience from the replay buffer and learns and trains the deep neural network until the deep neural network converges.
In some embodiments of the present application, the state space corresponding to the deep reinforcement learning algorithm stores the network topology structure of the data center network in the form of a one-dimensional array;
the reward function corresponding to the deep reinforcement learning algorithm is set according to a power consumption simplification rule, the power consumption simplification rule comprising: the power consumption of the data center network is taken to be the sum of the power consumption of the switches in the data center network;
and decision making by the deep neural network is implemented based on a DQN algorithm whose performance has been improved through a performance improvement mode, wherein the performance improvement mode comprises at least one of: a preset learning efficiency improvement mode, a preset stability improvement mode, and a preset mode separating the state value from the action reward.
In some embodiments of the present application, after the agent generates the energy-saving action decision data for a target link at the next time of the data center network, action preprocessing is further performed on the energy-saving action decision data to determine whether to filter the energy-saving action decision data, and if not, the agent sends the energy-saving action decision data.
In some embodiments of the application, the complexity reduction process comprises: connectivity assurance processing for a data center network topology and/or topology segmentation processing based on a data center network topology rule.
A second aspect of the present application provides a data center network energy saving method based on reinforcement learning, comprising:
receiving network state data of the data center network at the current moment, which is sent by a monitoring device;
performing complexity reduction processing on the network state data, and based on a deep reinforcement learning algorithm, applying the network state data subjected to the complexity reduction processing and a deep neural network to generate energy-saving action decision data aiming at a target link at the next moment of the data center network;
The energy-saving action decision data is sent to the monitoring device, so that the monitoring device receives the energy-saving action decision data, the target link is opened or closed according to the energy-saving action decision data, the network topology corresponding to the data center network is correspondingly updated, and the monitoring device sends the updated network topology to the controller of the data center network, so that the controller correspondingly updates the switch state in the data center network according to the updated network topology.
In some embodiments of the application, further comprising:
after receiving the network state data of the data center network at the current moment sent by the monitoring device each time, merging the corresponding relation among the current action decision data, the previous network state data, the updated network state data and rewards into an experience, and storing the experience in a replay buffer; the experience is then continually extracted from the replay buffer and used to learn and train the deep neural network until the deep neural network converges.
In some embodiments of the present application, the state space corresponding to the deep reinforcement learning algorithm stores the network topology structure of the data center network in the form of a one-dimensional array;
the reward function corresponding to the deep reinforcement learning algorithm is set according to a power consumption simplification rule, the power consumption simplification rule comprising: the power consumption of the data center network is taken to be the sum of the power consumption of the switches in the data center network;
and decision making by the deep neural network is implemented based on a DQN algorithm whose performance has been improved through a performance improvement mode, wherein the performance improvement mode comprises at least one of: a preset learning efficiency improvement mode, a preset stability improvement mode, and a preset mode separating the state value from the action reward.
In some embodiments of the present application, after generating the energy-saving action decision data for a target link at the next time of the data center network, the energy-saving action decision data is further subjected to action preprocessing to determine whether to filter the energy-saving action decision data, and if not, the energy-saving action decision data is sent to the monitoring device.
The third aspect of the present application also provides a monitoring device, comprising: a communication module and a network management module for performing a data center network power saving decision step, wherein,
the communication module is used for sending network state data of the current moment of the data center network to an intelligent agent so that the intelligent agent can carry out complexity reduction processing on the network state data, and based on a deep reinforcement learning algorithm, energy-saving action decision data of the next moment of the data center network aiming at a target link is generated by applying the network state data subjected to the complexity reduction processing and a deep neural network;
The network management module is used for receiving the energy-saving action decision data through the communication module, opening or closing the target link according to the energy-saving action decision data, and correspondingly updating the network topology corresponding to the data center network;
the network management module is further configured to send the updated network topology to a controller of the data center network, so that the controller correspondingly updates the switch state in the data center network according to the updated network topology.
In some embodiments of the application, further comprising:
the network monitoring module is used for carrying out real-time network state detection on the data center network to acquire network state data of the data center network at the current moment, repeatedly executing the data center network energy-saving decision step, enabling the intelligent agent to combine the corresponding relation among action decision data acquired when the data center network energy-saving decision step is executed each time, the previous network state data, updated network state data and rewards into an experience, and storing the experience in a replay buffer; and then in the process of repeatedly executing the data center network energy-saving decision step, the intelligent agent continuously extracts the experience from the replay buffer and learns and trains the deep neural network until the deep neural network converges.
The fourth aspect of the present application also provides an agent for performing the steps of:
receiving network state data of the data center network at the current moment, which is sent by a monitoring device;
performing complexity reduction processing on the network state data, and based on a deep reinforcement learning algorithm, applying the network state data subjected to the complexity reduction processing and a deep neural network to generate energy-saving action decision data aiming at a target link at the next moment of the data center network;
the energy-saving action decision data is sent to the monitoring device, so that the monitoring device receives the energy-saving action decision data, the target link is opened or closed according to the energy-saving action decision data, the network topology corresponding to the data center network is correspondingly updated, and the monitoring device sends the updated network topology to the controller of the data center network, so that the controller correspondingly updates the switch state in the data center network according to the updated network topology.
The fifth aspect of the present application also provides a data center network energy saving system based on reinforcement learning, comprising:
monitoring means for performing the reinforcement learning-based data center network energy saving method provided in the foregoing first aspect;
And the intelligent agent is used for executing the data center network energy saving method based on reinforcement learning provided by the second aspect.
A sixth aspect of the present application provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the reinforcement learning based data center network power saving method provided in the foregoing first aspect or the reinforcement learning based data center network power saving method provided in the second aspect when the computer program is executed.
A seventh aspect of the present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the reinforcement learning-based data center network power saving method provided in the foregoing first aspect or the reinforcement learning-based data center network power saving method provided in the second aspect.
According to the data center network energy saving method based on reinforcement learning, network state data of the data center network at the current moment is sent to an intelligent agent, so that the intelligent agent can carry out complexity reduction processing on the network state data, and based on a deep reinforcement learning algorithm, energy saving action decision data of the data center network for a target link at the next moment is generated by applying the network state data subjected to the complexity reduction processing and a deep neural network; the energy-saving action decision data is received through the communication module, the target link is opened or closed according to the energy-saving action decision data, and the network topology corresponding to the data center network is correspondingly updated; the updated network topology is sent to the controller of the data center network, so that the controller correspondingly updates the switch state in the data center network according to the updated network topology, the power consumption of the data center can be effectively reduced on the basis of guaranteeing the stability of the overall structure of the data center topology, the energy-saving control of the data center network is realized, the algorithm complexity in the energy-saving process can be effectively reduced, and the generation efficiency of energy-saving decisions can be effectively improved.
Additional advantages, objects, and features of the application will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and drawings.
It will be appreciated by those skilled in the art that the objects and advantages that can be achieved with the present application are not limited to the above-described specific ones, and that the above and other objects that can be achieved with the present application will be more clearly understood from the following detailed description.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate and together with the description serve to explain the application. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the application. Corresponding parts in the drawings may be exaggerated, i.e. made larger relative to other parts in an exemplary device actually manufactured according to the present application, for convenience in showing and describing some parts of the present application. In the drawings:
fig. 1 is a schematic diagram of an architecture of a reinforcement learning-based data center network energy saving system according to an embodiment of the application.
Fig. 2 is a flowchart of a data center network energy saving method based on reinforcement learning performed by a monitoring device according to an embodiment of the application.
Fig. 3 is a flow chart of a data center network energy saving method based on reinforcement learning performed by an agent according to another embodiment of the present application.
Fig. 4 is a schematic structural diagram of a monitoring device according to another embodiment of the present application.
Fig. 5 is a schematic structural diagram of an agent according to another embodiment of the present application.
Fig. 6 is a schematic diagram showing the change of the number of steps per round in the training process of the agent provided in the application example of the present application.
FIG. 7 is a graph showing the change in the percentage of the effective motion per round of the training process of the agent provided in the application example of the present application.
Fig. 8 is a schematic diagram of the change of the number of open links and energy consumption along with the flow in the data center network after the reinforcement learning-based data center network energy saving system is deployed in the application example of the present application.
Fig. 9 is a schematic diagram showing the change of energy consumption of a data center network with flow after deployment of the reinforcement learning-based data center network energy saving system and the elastic tree provided in the application example of the present application.
FIG. 10 is a diagram showing the comparison of the convergence rounds and energy saving efficiency of the reduced complexity algorithm and the unreduced complexity algorithm provided in the application example of the present application.
Detailed Description
The present application will be described in further detail with reference to the following embodiments and the accompanying drawings, in order to make the objects, technical solutions and advantages of the present application more apparent. The exemplary embodiments of the present application and the descriptions thereof are used herein to explain the present application, but are not intended to limit the application.
It should be noted here that, in order to avoid obscuring the present application due to unnecessary details, only structures and/or processing steps closely related to the solution according to the present application are shown in the drawings, while other details not greatly related to the present application are omitted.
It should be emphasized that the term "comprises/comprising" when used herein is taken to specify the presence of stated features, elements, steps or components, but does not preclude the presence or addition of one or more other features, elements, steps or components.
It is also noted herein that the term "coupled" may refer to not only a direct connection, but also an indirect connection in which an intermediate is present, unless otherwise specified.
Hereinafter, embodiments of the present application will be described with reference to the accompanying drawings. In the drawings, the same reference numerals represent the same or similar components, or the same or similar steps.
ElasticTree is a network-wide power optimizer that continuously monitors traffic conditions in a data center network, selects a subset of network devices that can meet the data center network performance requirements, and then shuts down as many unnecessary links and switches as possible. ElasticTree proposes several methods to find this subset of network devices, for example: an optimal model, a greedy bin-packing algorithm, a topology-aware heuristic algorithm, and the like. The optimal model is a linear programming method whose constraints include link capacity, flow conservation and satisfaction of traffic forwarding demands, whose variables are the traffic on each link, and whose inputs are the topology, a switch power consumption model and the traffic matrix. After the data are input, the algorithm represents the on/off state of each link and each switch with binary variables, so the problem is modeled as a 0-1 programming problem whose objective is to minimize the total network power while satisfying all constraints. The greedy bin-packing algorithm works by evaluating, for each flow, all possible paths that could carry that flow according to link capacity and traffic demand, and selecting the "leftmost" path among them, meaning that for each layer of the structured topology, paths are preferentially chosen in order from left to right rather than in random order. The topology-aware heuristic assumes that traffic is fully splittable; it first calculates the number of aggregation-layer switches needed to support the most active traffic, then turns on the relevant devices and turns off the remaining ones.
However, for the optimal model approach proposed in ElasticTree, analysis shows that its solution time scales roughly with the number of hosts raised to the power of 3.5, so that even in a very small data center network it takes hundreds of seconds to complete the computation; over such a long time the traffic matrix of the data center network has long since changed, so the solution is lagging or even invalid and of no practical significance. The greedy bin-packing algorithm, as a heuristic, reduces the solution time to some extent but cannot guarantee an optimal solution; the rules of such heuristics depend heavily on manual design and often lack generality, and when the network topology or the demands of the data center change, the original heuristic is difficult to adapt quickly, and redesigning it usually consumes substantial manpower and material resources. The topology-aware heuristic is designed on the premise that traffic is fully splittable, but this is unrealistic: such idealized splitting may scatter packets belonging to the same flow over different paths, which is very likely to cause packet reordering and trigger a chain of TCP-related reactions, so this approach, which ignores the actual performance of the switches and the actual requirements of data transmission, is likewise not deployable.
In addition, GreenTE.ai also focuses on how to aggregate traffic load onto fewer links and shut down the resulting idle devices to save power. More specifically, because routers take a long time to restart, GreenTE.ai does not completely power off network devices, but puts them into sleep mode, which saves part of the power while avoiding excessively long switch wake-up times. GreenTE.ai uses real-time, network-wide switch state measurements together with a central controller to obtain a global view of the traffic load distribution over the whole network: once the controller detects that the traffic load of certain links exceeds a predefined threshold, it wakes up some dormant routers to carry the overload traffic; and when the load on the network links drops, the controller puts as many routers as possible into sleep mode to achieve maximum energy efficiency. In such a design, the key issue is: under a given traffic load, how to decide which switches to turn on or off so as to meet the demand. GreenTE.ai implements this functionality with reinforcement learning, mapping the energy saving problem into a deep reinforcement learning instance by formulating its state space, action space and reward. The state space is defined as the traffic load on the links, acquired using in-band network telemetry. The action space is defined as arbitrarily changing the state of a certain router. The reward is defined as a function of a value computed as a weighted combination of the number of working routers and the number of congested links.
However, although GreenTE.ai's deep reinforcement learning-based design can make decisions in a very short time and adjust flexibly as the network load changes, GreenTE.ai is not fine-grained enough: it uses a whole switch as the smallest unit to be turned on or off, and the condition under which an entire switch can be shut down is harder to satisfy and unfavourable to guaranteeing the stability of the data center network. Moreover, as the data center network grows in size, the state and action spaces of the reinforcement learning defined by GreenTE.ai grow accordingly, which brings unpredictable risks to reinforcement learning training and decision making, and may even cause the reinforcement learning training to fail to converge.
That is, because of the huge number of devices and links in a data center, the power expenditure of the equipment is the largest single operating cost of the data center. In general, however, the number of devices and the link capacity of a data center network often far exceed the actual traffic demand, and the operating efficiency of many devices is often low. In some cases it may even happen that more than 20% of the devices are powered on but do no actual work. As the global data center market has matured in recent years, the number and capacity of data centers have even doubled over the last five years, which has also led to a dramatic increase in data centers' electricity expenditure. The result is significant power waste and considerable cost. The application aims to realize a decision mechanism that keeps the links and devices in a data center in sleep mode as much as possible, changing the working process from keeping links continuously on and waiting for traffic to one in which a link enters working mode only after a decision directs traffic onto it. In this mode of operation, the stability of the overall topology of the data center can be ensured, and the problem of excessive data center power consumption can be effectively alleviated.
Meanwhile, the number of powered links in each time slice is reduced through on/off decisions on the links, so that the overall power consumption is reduced. The decision algorithm needs to act on the port state of every device, but an algorithm that makes on/off decisions for each device port in turn over a topology updated in real time would have excessive complexity, given the huge number of devices and links in a data center; decisions would then not be updated in time and could not adapt, to the greatest extent, to updates of the network state and the real-time distribution of traffic. Therefore, the application seeks to shorten decision generation time by mitigating the excessive complexity of the algorithm, so that the speed at which decisions are generated can match, at fine granularity, the speed at which the network state and traffic of the data center change.
Based on this, the embodiment of the application designs a closed-loop control system, which may be called a reinforcement learning-based data center network energy saving system, abbreviated GreenDCN.ai; its overall structure is shown in FIG. 1. GreenDCN.ai consists of a monitoring device (which may be written as GreenDCN-monitor) and an agent (which may be written as GreenDCN-agent). A network monitoring module in the monitoring device collects network device status information within the network based on in-band network telemetry (Inband Network Telemetry, INT), namely port-level device internal state information (queue length, queuing delay, etc. of the ports), and manages the network device port states (open or closed) through a network management module and a controller. The agent is implemented based on deep reinforcement learning (Deep Reinforcement Learning, DRL); it makes action decisions according to the network device state information provided by the monitoring device through its communication module, where an action decision adjusts the open or closed state of a network device port, and the action decision is passed to the network management module through the communication module so that the controller changes the actual running state of the corresponding port. The monitoring device then feeds the updated network state back to the agent through the same process, and the agent calculates the energy saving benefit brought by the action decision from the change of the network state before and after the update. Here, a denotes an action, r denotes a reward, s and s' denote the states at different moments, and Q and Q' denote different Q networks of the agent, respectively.
The deep reinforcement learning algorithm has no experience in its initial state, and its actions are random. Through network state information acquisition, action decision making, decision issuing, new state acquisition and benefit calculation, the information generated in this process is combined into an experience and stored in a replay buffer; by continuously extracting experiences from the replay buffer and learning from them, the deep reinforcement learning finally achieves the effect of intelligent decision making.
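The closed loop just described can be summarized in a short sketch. The fragment below is illustrative only: the monitor and agent objects, their method names and the hyperparameter values are assumptions made for illustration, since the text does not define a concrete programming interface.

```python
import random
from collections import deque, namedtuple

# Hypothetical names; the patent does not define a concrete API.
Experience = namedtuple("Experience", ["state", "action", "reward", "next_state"])

class ReplayBuffer:
    """Fixed-size buffer holding (s, a, r, s') experiences for the agent."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def store(self, exp: Experience):
        self.buffer.append(exp)

    def sample(self, batch_size=32):
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

def training_loop(monitor, agent, buffer, episodes=500,
                  eps=1.0, eps_min=0.05, eps_decay=0.995):
    """Closed loop: observe state -> decide action -> apply -> observe reward and new state -> learn."""
    for _ in range(episodes):
        state = monitor.get_network_state()              # current link-state vector
        done = False
        while not done:
            # Early on the policy is essentially random (zero experience), as described above.
            if random.random() < eps:
                action = agent.random_action()
            else:
                action = agent.best_action(state)        # argmax over the Q network output
            next_state, reward, done = monitor.apply(action)   # open/close one link, update topology
            buffer.store(Experience(state, action, reward, next_state))
            if len(buffer) >= 32:
                agent.learn(buffer.sample(32))           # gradient step on the deep Q network
            state = next_state
        eps = max(eps_min, eps * eps_decay)              # gradually shift from exploration to exploitation
```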
The following examples are provided to illustrate the application in more detail.
The embodiment of the application provides a data center network energy saving method based on reinforcement learning, which can be realized by a monitoring device, referring to fig. 2, the data center network energy saving method based on reinforcement learning, which is realized by the monitoring device, specifically comprises the following contents:
a data center network energy conservation decision step, the data center network energy conservation decision step comprising:
step 100: and sending the network state data of the current moment of the data center network to an agent so that the agent can carry out complexity reduction processing on the network state data, and based on a deep reinforcement learning algorithm, applying the network state data subjected to the complexity reduction processing and the deep neural network to generate energy-saving action decision data of a target link at the next moment of the data center network.
In step 100, the network status data may include internal status information of the port-level device (queue length, queuing delay, etc. of the port). The energy-saving action decision data is used for adjusting the opening or closing state of the network equipment port.
It can be understood that, for the action space corresponding to the deep reinforcement learning DRL algorithm: if each action could change the switching state of any set of links, then for a network containing k links the action space would be any subset of the link set, i.e. there would be 2^k possible action decisions under a particular link state. Obviously, such a large action space is detrimental to training and decision making by the deep reinforcement learning algorithm. Therefore, the present application sets each action to change only one link state at a time, i.e. only one link is opened or closed at a time, thereby reducing the action space from 2^k to 2k+1.
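As an illustration of this reduced action space, the sketch below maps an action index in [0, 2k] to an operation on at most one link. The specific index encoding is an assumption; the text only fixes that each action changes at most one link, giving 2k+1 actions for k links.

```python
def decode_action(action_id: int, k: int):
    """Map an action index in [0, 2k] to an operation on at most one link.

    Encoding (an assumption; the patent only states that one action changes
    at most one link, giving 2k+1 actions for k links):
      0           -> no-op (keep all links unchanged)
      1 .. k      -> open link (action_id - 1)
      k+1 .. 2k   -> close link (action_id - k - 1)
    """
    if action_id == 0:
        return ("noop", None)
    if 1 <= action_id <= k:
        return ("open", action_id - 1)
    if k < action_id <= 2 * k:
        return ("close", action_id - k - 1)
    raise ValueError(f"action_id {action_id} out of range for k={k}")
```

With an encoding of this kind, the output layer of the deep Q network simply needs 2k+1 units, one per possible action.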
Step 200: and receiving the energy-saving action decision data, opening or closing the target link according to the energy-saving action decision data, and correspondingly updating the network topology corresponding to the data center network.
Step 300: and sending the updated network topology to a controller of the data center network so that the controller correspondingly updates the switch state in the data center network according to the updated network topology.
The intelligent agent informs the network management module of the energy-saving action decision data through the communication module so as to enable the controller to change the actual running state of the corresponding port.
In one or more embodiments of the application, the data center network topology is a group of switches or router clusters arranged and connected according to certain rules. Shown in fig. 1 is a Spine-Leaf data center network topology, formed by a layer of leaf-node (Leaf) switches and a layer of spine-node (Spine) switches in which every leaf switch is connected to every spine switch, forming a fully connected topology.
It can be understood that, after step 300, the monitoring device may also feed back the updated network status to the agent again through the above process, and the agent calculates the energy saving benefit brought by the action decision according to the change of the network status before and after the update. That is, the step 300 in the embodiment of the data center network energy saving method based on reinforcement learning may further specifically include:
step 400: and carrying out real-time network state detection on the data center network to acquire network state data of the data center network at the current moment.
And then returning to execute the data center network energy saving decision step (steps 100 to 300), so that the intelligent agent combines the corresponding relation among the action decision data, the previous network state data, the updated network state data and rewards obtained when the data center network energy saving decision step is executed each time into an experience, and stores the experience in a replay buffer; and then in the process of repeatedly executing the data center network energy-saving decision step, the intelligent agent continuously extracts the experience from the replay buffer and learns and trains the deep neural network until the deep neural network converges.
In order to improve the intermediate structures used for training the algorithm, reduce the convergence difficulty of the deep neural network and further reduce the computational complexity, in the reinforcement learning-based data center network energy saving method implementable by the monitoring device provided by the embodiment of the application, the state space corresponding to the deep reinforcement learning algorithm stores the network topology structure of the data center network in the form of a one-dimensional array;
the reward function corresponding to the deep reinforcement learning algorithm is set according to a power consumption simplification rule, the power consumption simplification rule comprising: the power consumption of the data center network is taken to be the sum of the power consumption of the switches in the data center network;
and decision making by the deep neural network is implemented based on a DQN algorithm whose performance has been improved through a performance improvement mode, wherein the performance improvement mode comprises at least one of: a preset learning efficiency improvement mode, a preset stability improvement mode, and a preset mode separating the state value from the action reward.
Specifically, the state space improvement is as follows: in general, an adjacency matrix is used to store a network topology structure, and the application likewise uses an adjacency matrix to store the link states in the network; specifically, the element at each position in the adjacency matrix corresponds to the state of one link, and the states are classified into four types: closed; open but congested; open and operating normally; and open but carrying no load. In practice, however, this adjacency-matrix representation contains a great deal of redundancy: firstly, the links considered by the application are bidirectional, so the adjacency matrix obtained is always symmetric; secondly, many elements of the matrix have no practical meaning, such as the diagonal elements. Therefore, the application further converts the adjacency matrix into a one-dimensional array, which facilitates information transfer between the agent and the monitoring device and allows the network state to be used directly as the input of the deep reinforcement learning algorithm.
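As an illustration of this one-dimensional state representation, the following sketch removes the redundancy by keeping only the strict upper triangle of the symmetric link-state matrix; the numeric codes for the four link states and the use of numpy are assumptions made for illustration.

```python
import numpy as np

# Illustrative numeric codes for the four link states described above
# (the patent does not fix concrete values): 0 = closed, 1 = open but congested,
# 2 = open and operating normally, 3 = open but carrying no load.
def adjacency_to_state_vector(adj: np.ndarray) -> np.ndarray:
    """Flatten a symmetric link-state adjacency matrix into a 1-D state vector.

    Links are bidirectional, so the matrix is symmetric, and the diagonal
    carries no information; keeping only the strict upper triangle removes
    that redundancy before the vector is fed to the deep Q network.
    """
    n = adj.shape[0]
    upper = np.triu_indices(n, k=1)       # indices strictly above the diagonal
    return adj[upper].astype(np.float32)  # length n*(n-1)/2
```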
The improvement of the reward function is as follows: to achieve the goal of data center network energy saving, the application defines the reward function to be inversely proportional to the network power consumption and directly proportional to the network link quality. Based on switch energy consumption data already given in the prior art for different traffic patterns and configurations, the application further models the power consumption of the whole data center network, and simply takes the power consumption of the data center network to be the sum of the power consumption of its switches.
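One possible concrete form of this reward, written as a sketch: total power is the sum of per-switch power (the simplification stated above), and the reward is proportional to link quality and inversely proportional to total power. The exact functional form and the weight alpha are assumptions; the text only fixes the proportionality relations.

```python
def network_power(switch_powers):
    """Simplified power model from the text: total power = sum of per-switch power."""
    return sum(switch_powers)

def reward(switch_powers, link_quality, alpha=1.0, eps=1e-6):
    """Reward directly proportional to link quality and inversely proportional
    to total network power consumption; alpha and the exact form are illustrative."""
    return alpha * link_quality / max(network_power(switch_powers), eps)
```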
In addition, the data center network energy saving scheme provided by the application uses a deep reinforcement learning algorithm to make intelligent decisions. Because the action space and state space of the actual problem to be handled are discrete, the application selects the deep Q-network (DQN) algorithm to realize the intelligent decision function, and on this basis simultaneously introduces three schemes to further improve the algorithm performance of DQN: a preset learning efficiency improvement mode, a stability improvement mode, and a mode separating the state value from the action reward.
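The text does not name the concrete algorithms behind these three modes. The "state value and action reward separating mode" reads naturally as a dueling-style Q-network head that splits the state value from per-action advantages; the sketch below shows such a head in PyTorch as one possible realization, and this mapping is an assumption rather than a statement of the patent.

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Illustrative dueling Q-network: a shared body followed by separate heads
    for the state value V(s) and the per-action advantages A(s, a)."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden), nn.ReLU())
        self.value_head = nn.Linear(hidden, 1)               # V(s)
        self.advantage_head = nn.Linear(hidden, n_actions)   # A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.body(state)
        v = self.value_head(h)
        a = self.advantage_head(h)
        # Standard dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return v + a - a.mean(dim=-1, keepdim=True)
```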
In order to further improve the application effectiveness and reliability of the energy-saving decision, in the data center network energy-saving method based on reinforcement learning, which can be realized by the monitoring device, after the intelligent agent generates the energy-saving action decision data of a target link at the next moment of the data center network, action preprocessing is performed on the energy-saving action decision data to determine whether to filter the energy-saving action decision data, and if not, the intelligent agent sends the energy-saving action decision data.
In order to further reduce the computational complexity by utilizing the regularity of the data center network topology, in the data center network energy saving method based on reinforcement learning, which can be implemented by the monitoring device, the complexity reduction processing includes: connectivity assurance processing for a data center network topology and/or topology segmentation processing based on a data center network topology rule.
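A minimal sketch of one possible form of the connectivity assurance processing mentioned above, assuming the currently open links are kept as a per-switch adjacency set: a decision that would close a link is filtered out if the remaining open links no longer connect all switches.

```python
from collections import deque

def stays_connected(adj_open, link_to_close):
    """Connectivity-guarantee check (one possible reading of the 'connectivity
    assurance processing' above): closing a link is rejected if the remaining
    open links no longer connect all switches.

    adj_open: dict mapping switch -> set of neighbours reachable over open links.
    link_to_close: (u, v) pair identifying the link the decision would close.
    """
    u, v = link_to_close
    # Tentatively remove the link, then BFS from an arbitrary switch.
    neighbours = {n: set(ns) for n, ns in adj_open.items()}
    neighbours[u].discard(v)
    neighbours[v].discard(u)
    start = next(iter(neighbours))
    seen, queue = {start}, deque([start])
    while queue:
        n = queue.popleft()
        for m in neighbours[n]:
            if m not in seen:
                seen.add(m)
                queue.append(m)
    return len(seen) == len(neighbours)
```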
In addition, another embodiment of the present application further provides a data center network energy saving method based on reinforcement learning, which can be implemented by an agent, referring to fig. 3, the data center network energy saving method based on reinforcement learning implemented by an agent specifically includes the following contents:
step 500: and receiving network state data of the current moment of the data center network sent by the monitoring device.
Step 600: and performing complexity reduction processing on the network state data, and based on a deep reinforcement learning algorithm, applying the network state data subjected to the complexity reduction processing and a deep neural network to generate energy-saving action decision data aiming at a target link at the next moment of the data center network.
Step 700: the energy-saving action decision data is sent to the monitoring device, so that the monitoring device receives the energy-saving action decision data, the target link is opened or closed according to the energy-saving action decision data, the network topology corresponding to the data center network is correspondingly updated, and the monitoring device sends the updated network topology to the controller of the data center network, so that the controller correspondingly updates the switch state in the data center network according to the updated network topology.
In order to improve the intermediate structure of the algorithm for training, reduce the convergence difficulty of the deep neural network and further reduce the computational complexity, the data center network energy-saving method based on reinforcement learning, which can be realized by an intelligent agent, provided by the embodiment of the application further comprises the following specific contents:
step 800: after receiving the network state data of the data center network at the current moment sent by the monitoring device each time, merging the corresponding relation among the current action decision data, the previous network state data, the updated network state data and rewards into an experience, and storing the experience in a replay buffer; the experience is then continually extracted from the replay buffer and used to learn and train the deep neural network until the deep neural network converges.
In order to improve the intermediate structures used for training the algorithm, reduce the convergence difficulty of the deep neural network and further reduce the computational complexity, in the reinforcement learning-based data center network energy saving method implementable by the agent provided by the embodiment of the application, the state space corresponding to the deep reinforcement learning algorithm stores the network topology structure of the data center network in the form of a one-dimensional array;
the reward function corresponding to the deep reinforcement learning algorithm is set according to a power consumption simplification rule, the power consumption simplification rule comprising: the power consumption of the data center network is taken to be the sum of the power consumption of the switches in the data center network;
and decision making by the deep neural network is implemented based on a DQN algorithm whose performance has been improved through a performance improvement mode, wherein the performance improvement mode comprises at least one of: a learning efficiency improvement mode, a stability improvement mode, and a mode separating the state value from the action reward.
In order to further improve the application effectiveness and reliability of the energy-saving decision, in the reinforcement learning-based data center network energy-saving method that can be implemented by the intelligent agent provided by the embodiment of the application, after energy-saving action decision data for a target link at the next moment of generating the data center network is generated, action preprocessing is performed on the energy-saving action decision data to determine whether to filter the energy-saving action decision data, and if not, the energy-saving action decision data is sent to the monitoring device.
In order to further reduce the computational complexity by utilizing the regularity of the data center network topology, in the data center network energy saving method based on reinforcement learning, which can be realized by an agent, provided by the embodiment of the application, the complexity reduction processing includes: connectivity assurance processing for a data center network topology and/or topology segmentation processing based on a data center network topology rule.
From the software aspect, the present application further provides a monitoring device for executing all or part of the reinforcement learning-based data center network energy saving method shown in fig. 2, referring to fig. 4, where the monitoring device specifically includes the following contents:
the communication module 10 is configured to send network state data of a current moment of a data center network to an agent, so that the agent performs complexity reduction processing on the network state data, and based on a deep reinforcement learning algorithm, applies the network state data after the complexity reduction processing and a deep neural network to generate energy-saving action decision data of a next moment of the data center network for a target link;
the network management module 20 is configured to receive the energy-saving action decision data through the communication module, perform opening or closing processing on the target link according to the energy-saving action decision data, and correspondingly update a network topology corresponding to the data center network;
the network management module 20 is further configured to send the updated network topology to a controller of the data center network, so that the controller correspondingly updates the switch state in the data center network according to the updated network topology.
The network monitoring module 30 is configured to perform real-time network state detection on the data center network to acquire network state data at the current moment of the data center network, and repeatedly perform the data center network energy-saving decision step, so that the agent merges the corresponding relationship among the action decision data acquired each time the data center network energy-saving decision step is performed, the previous network state data, the updated network state data and the rewards into an experience, and stores the experience in the replay buffer; and then in the process of repeatedly executing the data center network energy-saving decision step, the intelligent agent continuously extracts the experience from the replay buffer and learns and trains the deep neural network until the deep neural network converges.
The embodiment of the monitoring device provided by the application can be specifically used for executing the processing flow of the embodiment of the data center network energy saving method based on reinforcement learning in the embodiment shown in fig. 2, and the functions of the processing flow are not repeated herein, and reference can be made to the detailed description of the embodiment of the data center network energy saving method based on reinforcement learning shown in fig. 2.
The part of the monitoring device for data center network energy saving based on reinforcement learning can be completed in the client equipment. Specifically, the selection may be made according to the processing capability of the client device, and restrictions of the use scenario of the user. The application is not limited in this regard. If all operations are performed in the client device, the client device may further include a processor for reinforcement learning based specific processing of data center network energy savings.
The client device may have a communication module (i.e. a communication unit) and may be connected to a remote server in a communication manner, so as to implement data transmission with the server. The server may include a server on the side of the task scheduling center, and in other implementations may include a server of an intermediate platform, such as a server of a third party server platform having a communication link with the task scheduling center server. The server may include a single computer device, a server cluster formed by a plurality of servers, or a server structure of a distributed device.
Any suitable network protocol may be used between the server and the client device, including those not yet developed on the filing date of the present application. The network protocols may include, for example, TCP/IP protocol, UDP/IP protocol, HTTP protocol, HTTPS protocol, etc. Of course, the network protocol may also include, for example, RPC protocol (Remote Procedure Call Protocol ), REST protocol (Representational State Transfer, representational state transfer protocol), etc. used above the above-described protocol.
From a software aspect, the present application further provides an agent for performing all or part of the reinforcement learning-based data center network energy saving method shown in fig. 3, referring to fig. 5, where the agent specifically includes the following contents:
a state acquisition module 40, configured to receive network state data of a data center network at a current moment sent by the monitoring device;
the state preprocessing module 50 is configured to perform complexity reduction processing on the network state data, and apply the network state data after the complexity reduction processing and the deep neural network to generate energy-saving action decision data for a target link at a next moment of the data center network based on a deep reinforcement learning algorithm;
the learning and decision module 60 is configured to send the energy-saving action decision data to the monitoring device, so that the monitoring device receives the energy-saving action decision data, performs opening or closing processing on the target link according to the energy-saving action decision data, and correspondingly updates the network topology corresponding to the data center network, and the monitoring device sends the updated network topology to the controller of the data center network, so that the controller correspondingly updates the switch state in the data center network according to the updated network topology.
Based on the embodiments of the monitoring device and the agent, in the data center network energy saving system based on reinforcement learning provided by the application, the agent is used for executing the content of the data center network energy saving method based on reinforcement learning shown in fig. 3, and the monitoring device is used for executing the content of the data center network energy saving method based on reinforcement learning shown in fig. 2.
In order to further explain the scheme, the application also provides a specific application example of the reinforced learning data center network energy saving method realized by the reinforced learning data center network energy saving system.
In the application example, in the data center network energy-saving system based on reinforcement learning, the monitoring device is responsible for monitoring and managing network states, and the intelligent agent is an intelligent energy-saving decision algorithm based on DRL.
The monitoring device mainly comprises a data center network topology and three functional modules:
(1) Data center network topology: a group of switches or router clusters arranged and connected according to certain rules. Shown in fig. 1 is a Spine-Leaf data center network topology, formed by a layer of leaf-node (Leaf) switches and a layer of spine-node (Spine) switches in which every leaf switch is connected to every spine switch, forming a fully connected topology; a construction of such a topology is sketched after this module list.
(2) Network monitoring module (Network Monitor): its main function is to collect the internal state information of the devices in the data center network as the input of the agent. This state information is real-time and fine-grained, and includes output and input port queue lengths, buffer information, link utilization, queuing delay, and the like. The application uses INT as the network information collection framework: when a data packet carrying INT information fields passes through a switch port, the relevant information of that switch port is embedded into the INT fields and reported to the OVS controller at the last hop of the monitoring path, after which the network monitoring module can extract the complete information of the network environment from the controller. An illustrative per-port record is sketched after this module list.
(3) Communication module (Communication): the communication module plays the important role of information transmission and translation between the monitoring device and the agent: an action decision from the agent is parsed into an operation that opens or closes a specific link in the data center network and then passed to the network management module; conversely, the network-wide device internal state information from the network monitoring module of the monitoring device is processed into data that meets the input format of the algorithm and sent to the agent.
(4) Network management module (Network Management): the network management module is responsible for managing the switch states in the data center network. When an action instruction output by the communication module is received, the network management module opens or closes the corresponding port through the controller, thereby obtaining a new network topology, and the controller then completes operations such as route updating according to the new network topology.
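As referenced in item (1) above, a fully connected Spine-Leaf topology of this kind can be generated as a link matrix as follows; this is an illustrative sketch, and the switch counts are parameters rather than values from the text.

```python
import numpy as np

def spine_leaf_adjacency(n_leaf: int, n_spine: int) -> np.ndarray:
    """Build the fully connected Spine-Leaf topology described above as a link matrix:
    every leaf switch connects to every spine switch (no leaf-leaf or spine-spine
    links).  A value of 1 marks an existing (initially open) link."""
    n = n_leaf + n_spine
    adj = np.zeros((n, n), dtype=int)
    for leaf in range(n_leaf):
        for spine in range(n_leaf, n):
            adj[leaf, spine] = adj[spine, leaf] = 1
    return adj
```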
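As referenced in item (2) above, the per-port information collected via INT can be thought of as a small record per switch port; the field names and units below are illustrative assumptions rather than definitions from the text.

```python
from dataclasses import dataclass

@dataclass
class PortState:
    """Per-port state collected by the network monitoring module via INT,
    as listed above.  Field names and units are illustrative assumptions."""
    switch_id: str
    port_id: int
    queue_length: int        # packets currently queued at the port
    buffer_occupancy: float  # fraction of the port buffer in use
    link_utilization: float  # fraction of link capacity in use
    queuing_delay_us: float  # queuing delay in microseconds
```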
The decision-making ability of the agent is realized based on the deep reinforcement learning algorithm. The action decisions made by an untrained deep reinforcement learning algorithm are almost random and erratic, but after executing an action decision and obtaining the new network state fed back by the monitoring device, the agent can combine the four elements of action, old network state, updated network state and reward into an experience and store it in the replay buffer. By continually extracting experiences from the replay buffer and using them to learn and train its neural network, the agent is ultimately able to make energy-saving decisions quickly and efficiently in any state.
However, a larger-scale data center network topology clearly brings a larger state and action space to the agent and generates a large amount of experience stored in the replay buffer, resulting in problems such as sparse rewards and inefficient learning; in this case the deep reinforcement learning algorithm will be difficult to converge. In order to solve this problem, the application on one hand improves the intermediate structure of the algorithm used for training, and on the other hand reduces the computational complexity by utilizing the regularity of the data center network topology, finally realizing an intelligent agent that can stably converge. The algorithm used by the agent and its improvements are described in detail in section (II), and the methods of reducing the computational complexity are described in detail in section (III).
(II) Agent algorithm description
In the closed loop architecture shown in fig. 1, the agent is responsible for deciding which action should be selected under the current full-network link state reported by the monitoring device. A change in the state of one link in a data center network typically has little impact on current traffic forwarding and energy consumption, but it may have a profound impact on future network connections and on the state of switches at some important locations. Such long-term cumulative returns are difficult to judge directly, and samples for supervised learning are difficult to obtain.
Therefore, the application chooses to utilize the learning ability of deep reinforcement learning (DRL), which continuously interacts with the environment, to realize the decision function of the intelligent agent. The four major elements of the deep reinforcement learning DRL algorithm are:
(1) A state space, which is used as the input of the DRL algorithm, and represents the information of the current network environment required by the algorithm to make a decision; an action space, which is an output of the DRL algorithm, representing a set of actions that the DRL algorithm can take;
(2) A reward function that accurately and precisely describes the benefits obtained by taking an action in a particular state;
(3) An algorithm that decides how to select an action based on the current state, i.e., a connection is established between the state and the selection of an action;
(4) The environment, which changes and gives feedback based on the actions. The function of the environment is performed by the monitoring device, as already described above.
The application introduces the designs of the three elements mentioned in (1) to (3) as follows:
(1) State and action space
State space: In general, the present application uses an adjacency matrix to store the network topology, and also uses an adjacency matrix to store the link states in the network; specifically, the element value at each position in the adjacency matrix corresponds to the state of one link, and the states are classified into four types: closed, open but congested, open and operating normally, and open but unloaded. In practice, however, this adjacency matrix representation contains a great deal of redundancy: firstly, the links considered by the present application are bidirectional, so the adjacency matrix will always be a symmetric matrix; secondly, many elements in the matrix have no practical meaning, such as the diagonal elements. Therefore, the application further converts the adjacency matrix into a one-dimensional array, which is beneficial to information transfer between the intelligent agent and the monitoring device and allows the network state to be used directly as the input of the deep reinforcement learning algorithm.
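A minimal sketch of this conversion (assuming a symmetric adjacency matrix whose elements are link-state codes; variable names are illustrative):

```python
import numpy as np

def flatten_link_states(adj: np.ndarray) -> np.ndarray:
    """Convert a symmetric link-state adjacency matrix into a one-dimensional array.

    Only the strict upper triangle is kept: the matrix is symmetric (bidirectional
    links) and the diagonal elements carry no information.
    """
    rows, cols = np.triu_indices(adj.shape[0], k=1)  # indices above the diagonal
    return adj[rows, cols]

# Example: 4 nodes, states coded as 0=closed, 1=open+congested, 2=open+normal, 3=open+unloaded
adj = np.array([[0, 2, 3, 0],
                [2, 0, 1, 2],
                [3, 1, 0, 0],
                [0, 2, 0, 0]])
state_vector = flatten_link_states(adj)  # length n*(n-1)/2 = 6
```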
Action space: If each action could change the switch state of any link, then for a network containing k links the action space would be any subset of the link set, i.e., under a particular link state there would be 2^k possible action decisions. Obviously, such a large action space is detrimental to training and decision making by deep reinforcement learning algorithms. Therefore, the application sets each action to change the state of only one link at a time, i.e., only open or close one link per action, thereby reducing the action space from 2^k to 2k+1.
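One possible encoding of this 2k+1-sized action space (an illustrative sketch, not necessarily the application's encoding): action 0 is a no-op, actions 1..k open a link, and actions k+1..2k close a link.

```python
from typing import Optional, Tuple

def decode_action(action: int, k: int) -> Optional[Tuple[int, str]]:
    """Map an action index in [0, 2k] to (link index, 'open'/'close'), or None for a no-op."""
    if action == 0:
        return None                      # do nothing this step
    if 1 <= action <= k:
        return action - 1, "open"        # open link action-1
    if k < action <= 2 * k:
        return action - k - 1, "close"   # close link action-k-1
    raise ValueError("action index out of range")
```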
Action preprocessing (Action Pretreatment): during the training stage, the intelligent agent may generate some actions that should not be issued to the monitoring device for execution. One kind is meaningless actions, i.e., opening an already-open link or closing an already-closed link; the other kind is actions that destroy network connectivity, i.e., pairwise communication between hosts can no longer be guaranteed after the action is issued. Both from a practical point of view and from an experimental point of view, it is necessary to design an action preprocessing module to filter out these non-issuable actions.
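A sketch of such a filter, assuming links are keyed by endpoint pairs and using a simple breadth-first connectivity check (helper names are hypothetical):

```python
from collections import deque

def is_connected(links_open: dict) -> bool:
    """Check that the graph formed by open links still connects all nodes (BFS)."""
    nodes = {n for edge in links_open for n in edge}
    if not nodes:
        return False
    adjacency = {n: [] for n in nodes}
    for (u, v), is_open in links_open.items():
        if is_open:
            adjacency[u].append(v)
            adjacency[v].append(u)
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        for nb in adjacency[queue.popleft()]:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return seen == nodes

def should_issue(links_open: dict, edge, op: str) -> bool:
    """Filter meaningless actions and actions that would break connectivity."""
    if op == "open" and links_open[edge]:
        return False                       # already open: meaningless
    if op == "close" and not links_open[edge]:
        return False                       # already closed: meaningless
    if op == "close":
        trial = dict(links_open)
        trial[edge] = False
        return is_connected(trial)         # refuse actions that disconnect hosts
    return True
```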
(2) Reward function
After performing action a_t in state s_t, the state is updated to s_{t+1}. Upon receiving the state update feedback from the monitoring device, the agent should calculate the benefit of the state transition based on the reward function. The deep reinforcement learning algorithm learns continuously, ultimately learning how to maximize the benefit of each state transition. Therefore, in order to achieve the goal of data center network power conservation, the present application defines the reward function as inversely proportional to network power consumption and directly proportional to network link quality. Based on the energy consumption data of switches under different traffic modes and configurations already given by the prior art, the application models the energy consumption of a single switch as follows:
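A plausible form of this single-switch power model, written here only as an illustrative assumption consistent with the variable definitions that follow (the application's exact formula may differ):

$$
P_w \;=\; S_{w,n}\left[(1-\rho)\,P_{\max} \;+\; \rho\,\frac{P_{\max}}{|T_w|}\sum_{t\in T_w} S_{t,n}\right]
$$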
In the above formula, ρ represents the proportion of the power consumed by the ports to the total power of the switch, and P_max represents the power consumption of a single switch at full load. S_{w,x} (a Boolean variable, the same applies below) represents whether switch w is in state x, where x covers 3 possible states: n represents on, f represents off, and o represents no load. S_{t,x} and S_{l,x} represent the port state and the link state, respectively. T_w represents the port set of switch w.
Based on this, the present application further models the power consumption of the entire data center network, which the present application simply considers to be the sum of the power consumption of the switches therein. The application selects a data center network architecture conforming to a typical two-layer Spine-Leaf design; for a network composed of a number of spine switches and γ leaf switches, the total power consumption is as follows:
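Consistent with the statement that the network power is simply the sum of the power of its switches, the network-level model can be sketched (again as an illustrative assumption) as:

$$
P_n \;=\; \sum_{w\in\text{Spine}} P_w \;+\; \sum_{w\in\text{Leaf}} P_w
$$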
the network link quality can be measured as a weighted sum of the number of congested links, the number of idle links (idle) and the availability of links that are operating properly, as follows:
in summary, the overall return function may be expressed as r=ζp n +τP lq Where ζ and τ are parameters that adjust the weight of each portion.
(3) Deep reinforcement learning algorithm
The data center network energy-saving scheme provided by the application utilizes a deep reinforcement learning algorithm to make intelligent decisions. Because the action space and the state space of the actual problem to be handled are discrete, the application selects the deep Q network (DQN) algorithm to realize the intelligent decision function, and on this basis simultaneously introduces three improvements to further enhance the algorithm performance of the DQN:
1) Improving learning efficiency.
The experience replay mechanism is one of the two key technologies in the DQN algorithm. DQN learns by randomly selecting some experiences from the Replay Buffer, but it processes all experiences with the same probability without considering the importance of each experience. The application therefore introduces a mechanism that preferentially replays experiences according to their importance to improve the effect of the algorithm. Specifically, the present application calculates the priority of a sample based on the TD-error in the DQN algorithm and introduces a correction term so that low TD-error samples retain some likelihood of being extracted. In addition, the present application utilizes a SumTree to implement sampling by priority, a storage structure that does not require consuming a large amount of resources to sort before each sampling.
In addition, the application also needs to establish a balance between importance-based prioritized experience replay and purely random replay, so as to better ensure the unbiasedness of the algorithm while guaranteeing sample utilization; specifically, the application multiplies the update formula by a weight derived from the sampling transition probability, so that samples are extracted and used in a mixed manner.
The design of this section is embodied in lines 14-16 of algorithm 1 shown in Table 1.
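An illustrative sketch of this prioritized sampling (a simplified stand-in for the SumTree described above; the exponents alpha/beta and the eps term are conventional prioritized-replay hyperparameters assumed here, not values disclosed by the application):

```python
import numpy as np

class PrioritizedSampler:
    def __init__(self, alpha=0.6, beta=0.4, eps=1e-3):
        self.alpha, self.beta, self.eps = alpha, beta, eps
        self.priorities = []

    def add(self, td_error):
        # eps keeps low-TD-error samples from losing all probability of extraction
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size):
        p = np.asarray(self.priorities)
        probs = p / p.sum()                              # P(i): mixes priority and randomness
        idx = np.random.choice(len(p), batch_size, p=probs)
        weights = (len(p) * probs[idx]) ** (-self.beta)  # importance-sampling correction
        weights /= weights.max()                         # normalize for stable updates
        return idx, weights
```

A production version would store the priorities in a SumTree so that sampling and priority updates cost O(log n) instead of O(n).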
2) Improving stability.
Another key technique in DQN is the dual-network architecture, namely Q-target and Q-evaluation. However, when the DQN computes its target value it uses the estimated maximum value over actions; because of inaccurate function approximation or experience error, this estimated maximum often deviates, and the resulting fluctuation affects the learning effect and causes overestimation. The application therefore introduces Double DQN, further decoupling the Q-target into two neural networks maintaining different parameters, one for evaluating the action and the other for selecting the action.
More specifically, in the algorithm of the application, in order to keep the program simpler, the Q-evaluation parameters are used for the part of the target network that evaluates the action, so that without consuming extra computing power, the evaluation and selection of actions in the target network are decoupled, the overestimation problem is alleviated, and the stability of the model is improved.
The design of this section is embodied in line 17 of algorithm 1 shown in table 1.
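For reference, a PyTorch-style sketch of the standard Double DQN target computation (the application's variant reuses Q-evaluation parameters inside the target network as described above; this block is an illustrative assumption, not the application's code):

```python
import torch

def double_dqn_target(q_eval, q_target, next_states, rewards, dones, gamma=0.99):
    """Double DQN target: one network selects the action, the other evaluates it."""
    with torch.no_grad():
        best_actions = q_eval(next_states).argmax(dim=1, keepdim=True)      # selection
        next_q = q_target(next_states).gather(1, best_actions).squeeze(1)   # evaluation
        return rewards + gamma * next_q * (1.0 - dones)
```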
3) Separating state value and action reward.
On the basis of the original DQN algorithm, the neural network output is divided into two parts: a state value function and an advantage function. The value function depends only on the state, while the advantage function depends on both the state and the action; the two parts have both shared and their own unique network parameters.
The reason for selecting the Dueling DQN is that, in the design of the application, there are many cases where the RL agent takes different actions that correspond to the same value function, and the Dueling DQN can improve the stability of the algorithm by improving only the intermediate structure of the neural network, without changing other parts of the algorithm.
The design of this section is embodied in lines 3-4 of algorithm 1 shown in Table 1.
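A minimal PyTorch-style sketch of such a dueling network head (layer sizes and names are illustrative assumptions):

```python
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())  # shared parameters
        self.value = nn.Linear(hidden, 1)               # state value V(s)
        self.advantage = nn.Linear(hidden, n_actions)   # action advantage A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.shared(state)
        v = self.value(h)
        a = self.advantage(h)
        # Combine: Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a)), the usual dueling aggregation
        return v + a - a.mean(dim=1, keepdim=True)
```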
TABLE 1
(III) Complexity reduction
One potential problem with DRL is the "curse of dimensionality", i.e., the computational complexity of the solution may increase exponentially as the scale of the problem grows linearly. The GreenDCN-agent faces the same problem; when the application expands the network scale for experiments, the existence of this problem is confirmed, manifesting as difficulty in algorithm convergence, and a non-converged algorithm cannot make intelligent energy-saving decisions. Therefore, the application further provides two methods for reducing the computational complexity of the intelligent agent.
(1) Connectivity guarantee: the first method provided by the application is network connectivity assurance, i.e., the application sets the minimum link set K_min that can guarantee communication between any two terminal hosts to an always-on state, because adjusting link states without damaging network connectivity is the most basic principle of the GreenDCN-agent. Thus, the GreenDCN-agent only needs to make on/off action decisions for links in the complement of the link set K_min. For the Spine-Leaf topology shown in fig. 1, one of the Spine switches and all of its connected links, i.e., the links marked with orange dashed lines, are selected by the present application and placed into the set K_min. Then, for the topology of 6 Leaf switches and 4 Spine switches shown in fig. 1, the size of the state space is reduced from 2^24 to 2^18. This method makes the target state of the DRL algorithm more deterministic, thereby being more conducive to convergence of the DRL algorithm.
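An illustrative sketch of this connectivity guarantee for a Spine-Leaf topology (choosing spine index 0 as the protected switch is an assumption made only for illustration):

```python
def always_on_links(n_leaf: int, n_spine: int, protected_spine: int = 0):
    """K_min for a Spine-Leaf topology: keep one spine switch and all its leaf links open.

    Links are represented as (leaf index, spine index) pairs; every leaf can still
    reach every other leaf through the protected spine, so connectivity is preserved.
    """
    return {(leaf, protected_spine) for leaf in range(n_leaf)}

k_min = always_on_links(n_leaf=6, n_spine=4)
# The remaining 6*4 - 6 = 18 links are the only ones the agent may open or close,
# so the state space shrinks from 2**24 to 2**18 for the fig. 1 topology.
```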
(2) Topology segmentation: the second method provided by the application is topology segmentation based on the regularity of the data center network topology. First, the links are grouped according to the regularity of the DCN topology. Next, each group performs deep reinforcement learning training separately and then makes its decisions in parallel. The action decisions of all groups are then executed sequentially. Note that the way the topology is segmented can be flexibly customized, because what the deep reinforcement learning algorithm essentially does is build potential connections between input states and output actions by training the neural network, without needing to know the topology itself. For the Spine-Leaf topology consisting of 6 Leaf switches and 4 Spine switches shown in fig. 1, the present application takes the Leaf switch as the smallest unit of division, i.e., for each Leaf switch the links connected to it are divided into one group (the links marked by green solid lines), thereby obtaining 6 groups, which are put into the algorithm for training and decision making respectively. Thereby, the state space and action space of the algorithm are greatly reduced. After each group makes its action decision, all actions are ordered according to the link utilization of the links to be acted upon, and are sequentially issued to the monitoring device for execution.
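A sketch of this per-leaf grouping and of ordering the resulting decisions for serial issuing (function and field names are illustrative assumptions):

```python
def group_links_by_leaf(n_leaf: int, n_spine: int):
    """Split a Spine-Leaf link set into one group per leaf switch."""
    return [[(leaf, spine) for spine in range(n_spine)] for leaf in range(n_leaf)]

def order_decisions(decisions, link_utilization):
    """Sort per-group decisions by the utilization of the link each one acts on,
    so they can be issued to the monitoring device one by one."""
    return sorted(decisions, key=lambda d: link_utilization[d["link"]])

groups = group_links_by_leaf(6, 4)   # 6 groups of 4 links each for the fig. 1 topology
```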
This "parallel training, serial decision" approach may bring many benefits to reinforcement learning based data center network energy saving systems: first, parallel training can significantly reduce the number of states and actions, thereby reducing the computational complexity of the algorithm, making the algorithm easier to converge. In addition, this divide-and-conquer approach helps to improve concurrency and usability of the system. The energy saving planning of other parts can continue even if some parts fail in decision making. Also, serial downstream execution of actions helps to avoid collisions or network disruption caused by different packets simultaneously executing multiple actions.
(3) Hybrid method: the two methods together form the hybrid strategy for reducing training complexity. First, the links that need to be set to always-on are marked, and then the remaining links are grouped according to the regularity of the data center network topology. These groups are trained in parallel and make action decisions, which are issued serially for execution after ordering. Fig. 1 illustrates the overall process of complexity reduction (Complexity Reduction), which may also be called state preprocessing (State Pretreatment) according to its position in the overall reinforcement learning-based data center network energy saving system of the present application.
(IV) Scheme flow of the data center network energy saving system based on reinforcement learning
The scheme flow proposed by the present application is shown with solid lines in fig. 1; the following describes how this closed-loop system operates:
First step: the Network Monitor module (Network Monitor) extracts the queue length information of each port of each switch from the data center network controller (also called the OVS controller);
Second step: the Communication module (Communication) belongs to the same network process as the network monitoring module, so it can conveniently obtain the port queue length information and send it to the intelligent agent as the current network state (state) of the data center network;
Third step: the complexity reduction module (Complexity Reduction) first performs the two complexity reduction methods, "connectivity assurance" and "topology splitting". Specifically, "connectivity assurance" marks the links contained in the minimum spanning tree guaranteeing data center network connectivity as non-closeable (i.e., the links shown by the dashed lines above the "connectivity assurance" text in fig. 1), and "topology splitting" divides the incoming data center network links into several groups according to the feature that the data center network is built by replicating simple subunits; for the spine-leaf topology shown in fig. 1, the subunit selected by the present application is a leaf switch and its connected links (i.e., the links shown by the solid lines above the "topology splitting" text in fig. 1). The application further integrates the two methods to form the final hybrid scheme, obtaining several groups of segmented link sets for training and decision making respectively;
Fourth step: the deep reinforcement learning algorithm (Deep Reinforcement Learning) receives the segmented network state information, performs deep neural network training and decision making, and generates a decision to open or close a certain link;
fifth step: the action preprocessing module (Action Pretreatment) judges whether the action is legal (namely, whether the execution of the action breaks network connectivity) and whether the action is meaningful (namely, whether the action instructs the data center network to open an already-opened link or close an already-closed link), if not, the action is not issued by the round, and if so, the action is issued;
sixth step: the Communication module (Communication) receives the decision after the action pretreatment and forwards the decision to the network management module (Network Management);
seventh step: the network management module opens or closes the link according to the received action, regenerates new network topology, and places a switch in a sleep mode if all the connected links of the switch are closed at the moment;
eighth step: the data center network controller receives the new topology calculated by the network management module and executes corresponding change;
Ninth step: the network monitoring module extracts the state information of each port of each switch from the data center network controller, and the information flows through the closed-loop system again.
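Putting the nine steps together, one iteration of the closed loop could be sketched roughly as follows (all function names are hypothetical placeholders for the modules described above, not the application's code):

```python
def control_loop(monitor, agent, comm, net_mgmt, controller):
    """One iteration of the closed-loop energy-saving system described in steps 1-9."""
    state = monitor.collect_port_queue_lengths(controller)          # steps 1-2
    groups = agent.complexity_reduction(state)                      # step 3
    for group_state in groups:
        action = agent.decide(group_state)                          # step 4
        if not agent.action_pretreatment(action, state):            # step 5
            continue                                                 # illegal or meaningless
        decision = comm.translate(action)                           # step 6
        topology = net_mgmt.apply(decision)                         # step 7 (sleep idle switches)
        controller.update(topology)                                 # step 8
    # step 9: the next iteration re-collects the state and the loop continues
```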
(V) Advantageous effects of the application example of the present application
(1) Algorithm stable convergence
For the deep reinforcement learning algorithm used in the present application, whether the algorithm converges after training is an important precondition for its effectiveness. A "round" on the horizontal axis in fig. 6 and fig. 7 denotes the process of the data center network going from the initial state, where all links are open, to the final, most power-efficient state. After about 500 rounds, the number of steps per round stabilizes around 20, and the effective action percentage rises into the 0.8-1 interval, which shows that the algorithm of the present application converges effectively.
(2) The algorithm can flexibly and adaptively adjust along with the change of the network state
The application utilizes INT to collect real-time queue length information of each port of the data center network equipment at fine granularity, and utilizes a deep reinforcement learning algorithm to make decisions; each decision is generated and issued at the microsecond level, so the closed-loop system can continuously learn and make timely decisions, adaptively adjusting to changes in the network traffic state very flexibly and without manual intervention. Fig. 8 shows that the behavior of the application can follow network state changes in time, i.e., open links when the load is high and close links when the load is low. Fig. 9 shows a comparison between the present application and ElasticTree: the experimental curve of the present application follows the traffic change more closely (the traffic change is the grey line), whereas ElasticTree shows a significant lag due to its higher computation time. In addition, a comparison of the convergence rounds and energy saving efficiency of the complexity-reduced algorithm with the non-reduced algorithm provided in the application example of the present application is shown in fig. 10.
(3) The technical scheme of the application has good scalability and deployability
The application proves that the two complexity reduction schemes designed based on the regularity of the data center network topology, connectivity guarantee and topology segmentation, are effective by comparing them with the scheme without complexity reduction. Under a large-scale data center network topology, the method without complexity reduction (GA_NoCR) does not converge even after thousands of rounds of training, and the non-converged algorithm performs poorly in terms of energy saving efficiency; in contrast, the training cost of the method with complexity reduction (GA_CR) does not increase significantly as the network scale grows, and its energy saving efficiency remains at a good level throughout. The experiments prove that the complexity reduction scheme greatly improves the scalability of the algorithm, so that the algorithm can be deployed in a large-scale data center network.
That is, the technical scheme provided by the application example of the application has the following improvements:
(1) Port level fine granularity energy-saving algorithm design
Different from other known technical schemes that directly control the state of the whole switch, the application collects the internal state information of network equipment at the port level and makes decisions about opening or closing links (the opening or closing of each link is essentially controlled by the two ports at its two ends); when modeling switch power, it considers the power consumption of both the switch itself and its ports. Through these designs the application jointly realizes a fine-grained, port-level data center network energy-saving algorithm.
(2) Data center network scene-oriented high-deployability system
In the design process, the whole-network equipment state information collection part takes into account technologies actually available on data center network equipment, such as programmable switches and in-band network telemetry; the methods and tools used for network equipment state management conform to the actual conditions of data center networks; the complexity reduction algorithm is designed based on the regularity of the data center network topology and actual operation requirements; and the processing principle of the action preprocessing module is likewise considered from the perspective of the actual operation requirements of the data center network. In summary, the application is a highly deployable system specifically designed for data center network scenarios and optimized by utilizing data center network features.
(3) Deep reinforcement learning algorithm
The deep reinforcement learning algorithm used by the application is designed based on the DQN algorithm. On this basis, considering the actual problem to be solved by the application and the potential performance deficiencies of the DQN algorithm, three improvements are proposed: improving learning efficiency, improving stability, and separating state value and action reward. These three improvements jointly improve the algorithm's training and decision-making process and enhance the usability and deployability of the algorithm.
(4) Complexity reduction method
The application provides two methods for reducing algorithm complexity, connectivity guarantee and topology segmentation, and combines them into a hybrid method. These methods on one hand consider the basic requirements of data center network operation, and on the other hand perform grouping by utilizing the regularity of the data center network topology; this is a reasonable design developed around the specific data center scenario, and the complexity reduction methods are proven effective by the experimental results.
(5) Port action decision generation and execution phase separation
In the design of the system architecture, the generation and execution of the port-level decisions produced by the deep reinforcement learning algorithm are divided into two modules: the generation of decisions is performed by the GreenDCN-agent, and the execution of decisions is performed by the Network Management module of the monitoring device. The coupling between the two processes is reduced; when either process fails, the other is not prevented from completing its own task by an occupied main process, and fault location becomes clearer.
The embodiment of the application also provides an electronic device, which may include a processor, a memory, a receiver, and a transmitter, where the processor is configured to perform the reinforcement learning-based data center network energy saving method mentioned in the foregoing embodiments. The processor and the memory may be connected by a bus or in other manners, for example through a bus connection. The receiver may be connected to the processor and the memory by wire or wirelessly.
The processor may be a central processing unit (Central Processing Unit, CPU). The processor may also be any other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the reinforcement learning-based data center network energy saving method in the embodiments of the present application. The processor executes the non-transitory software programs, instructions and modules stored in the memory to perform various functional applications and data processing of the processor, i.e., to implement the reinforcement learning-based data center network energy saving method in the above-described method embodiments.
The memory may include a memory program area and a memory data area, wherein the memory program area may store an operating system, at least one application program required for a function; the storage data area may store data created by the processor, etc. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory may optionally include memory located remotely from the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory that, when executed by the processor, perform the reinforcement learning based data center network power saving method of an embodiment.
In some embodiments of the present application, a user equipment may include a processor, a memory, and a transceiver unit, which may include a receiver and a transmitter, the processor, the memory, the receiver, and the transmitter may be connected by a bus system, the memory being configured to store computer instructions, the processor being configured to execute the computer instructions stored in the memory to control the transceiver unit to transmit and receive signals.
As an implementation manner, the functions of the receiver and the transmitter in the present application may be considered to be implemented by a transceiver circuit or a dedicated chip for transceiver, and the processor may be considered to be implemented by a dedicated processing chip, a processing circuit or a general-purpose chip.
As another implementation manner, a manner of using a general-purpose computer may be considered to implement the server provided by the embodiment of the present application. I.e. program code for implementing the functions of the processor, the receiver and the transmitter are stored in the memory, and the general purpose processor implements the functions of the processor, the receiver and the transmitter by executing the code in the memory.
Embodiments of the present application also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the reinforcement learning based data center network energy saving method described above. The computer readable storage medium may be a tangible storage medium such as Random Access Memory (RAM), memory, read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, floppy disks, hard disk, a removable memory disk, a CD-ROM, or any other form of storage medium known in the art.
Those of ordinary skill in the art will appreciate that the various illustrative components, systems, and methods described in connection with the embodiments disclosed herein can be implemented as hardware, software, or a combination of both. The particular implementation is hardware or software dependent on the specific application of the solution and the design constraints. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the application are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine readable medium or transmitted over transmission media or communication links by a data signal carried in a carrier wave.
It should be understood that the application is not limited to the particular arrangements and instrumentality described above and shown in the drawings. For the sake of brevity, a detailed description of known methods is omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and shown, and those skilled in the art can make various changes, modifications and additions, or change the order between steps, after appreciating the spirit of the present application.
In the present application, features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments and/or in combination with or instead of the features of the other embodiments.
The above description is only of the preferred embodiments of the present application and is not intended to limit the present application, and various modifications and variations can be made to the embodiments of the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A data center network energy saving method based on reinforcement learning, comprising: a data center network energy conservation decision step, the data center network energy conservation decision step comprising:
the network state data of the current moment of the data center network is sent to an intelligent agent, so that the intelligent agent performs complexity reduction processing on the network state data, and based on a deep reinforcement learning algorithm, energy-saving action decision data of a target link at the next moment of the data center network is generated by applying the network state data subjected to the complexity reduction processing and a deep neural network;
receiving the energy-saving action decision data, opening or closing the target link according to the energy-saving action decision data, and correspondingly updating the network topology corresponding to the data center network;
and sending the updated network topology to a controller of the data center network so that the controller correspondingly updates the switch state in the data center network according to the updated network topology.
2. The reinforcement learning-based data center network energy saving method of claim 1, further comprising:
The method comprises the steps of carrying out real-time network state detection on a data center network to acquire network state data of the data center network at the current moment, and repeatedly executing the data center network energy-saving decision step, so that the intelligent agent combines the corresponding relation among action decision data, previous network state data, updated network state data and rewards acquired when the data center network energy-saving decision step is executed each time into an experience, and stores the experience in a replay buffer; and then in the process of repeatedly executing the data center network energy-saving decision step, the intelligent agent continuously extracts the experience from the replay buffer and learns and trains the deep neural network until the deep neural network converges.
3. The reinforcement learning-based data center network energy saving method according to claim 1, wherein a state space corresponding to the deep reinforcement learning algorithm adopts a network topology structure formed by a one-dimensional array to store the data center network;
the reward function corresponding to the deep reinforcement learning algorithm is set according to a power consumption simplification rule, and the power consumption simplification rule comprises: the power consumption of the data center network is formed by the sum of the power consumption of the switches in the data center network;
And implementing decision of the deep neural network based on the DQN algorithm after improving the algorithm performance by a performance improvement mode, wherein the performance improvement mode comprises the following steps: at least one of a preset learning efficiency improving mode, a preset stability improving mode and a preset state value and action rewarding separating mode.
4. The reinforcement learning-based data center network energy saving method of claim 1, wherein after the agent generates energy saving action decision data for a target link at a next time of the data center network, the energy saving action decision data is further subjected to action preprocessing to determine whether to filter the energy saving action decision data, and if not, the agent sends the energy saving action decision data.
5. The reinforcement learning-based data center network energy saving method of any one of claims 1 to 4, wherein the complexity reduction process comprises: connectivity assurance processing for a data center network topology and/or topology segmentation processing based on a data center network topology rule.
6. A data center network energy saving method based on reinforcement learning, comprising:
Receiving network state data of the data center network at the current moment, which is sent by a monitoring device;
performing complexity reduction processing on the network state data, and based on a deep reinforcement learning algorithm, applying the network state data subjected to the complexity reduction processing and a deep neural network to generate energy-saving action decision data aiming at a target link at the next moment of the data center network;
the energy-saving action decision data is sent to the monitoring device, so that the monitoring device receives the energy-saving action decision data, the target link is opened or closed according to the energy-saving action decision data, the network topology corresponding to the data center network is correspondingly updated, and the monitoring device sends the updated network topology to the controller of the data center network, so that the controller correspondingly updates the switch state in the data center network according to the updated network topology.
7. The reinforcement learning-based data center network energy saving method of claim 6, further comprising:
after receiving the network state data of the data center network at the current moment sent by the monitoring device each time, merging the corresponding relation among the current action decision data, the previous network state data, the updated network state data and rewards into an experience, and storing the experience in a replay buffer; the experience is then continually extracted from the replay buffer and used to learn and train the deep neural network until the deep neural network converges.
8. The reinforcement learning-based data center network energy saving method according to claim 6, wherein the state space corresponding to the deep reinforcement learning algorithm adopts a network topology structure formed by a one-dimensional array to store the data center network;
the reward function corresponding to the deep reinforcement learning algorithm is set according to a power consumption simplification rule, and the power consumption simplification rule comprises: the power consumption of the data center network is formed by the sum of the power consumption of the switches in the data center network;
and implementing decision of the deep neural network based on the DQN algorithm after improving the algorithm performance by a performance improvement mode, wherein the performance improvement mode comprises the following steps: at least one of a preset learning efficiency improving mode, a preset stability improving mode and a preset state value and action rewarding separating mode.
9. The reinforcement learning-based data center network energy saving method of claim 6, further comprising performing an action pre-process on the energy saving action decision data after generating the energy saving action decision data for a target link at a next time of the data center network to determine whether to filter the energy saving action decision data, and if not, transmitting the energy saving action decision data to the monitoring device.
10. A reinforcement learning-based data center network energy saving system, comprising:
monitoring means for performing the reinforcement learning based data center network energy saving method of any one of claims 1 to 5;
an agent for performing the reinforcement learning-based data center network energy saving method of any one of claims 6 to 9.
CN202211423259.8A 2022-11-15 2022-11-15 Data center network energy saving method and system based on reinforcement learning Pending CN117240636A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211423259.8A CN117240636A (en) 2022-11-15 2022-11-15 Data center network energy saving method and system based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211423259.8A CN117240636A (en) 2022-11-15 2022-11-15 Data center network energy saving method and system based on reinforcement learning

Publications (1)

Publication Number Publication Date
CN117240636A true CN117240636A (en) 2023-12-15

Family

ID=89085008

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211423259.8A Pending CN117240636A (en) 2022-11-15 2022-11-15 Data center network energy saving method and system based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN117240636A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117474295A (en) * 2023-12-26 2024-01-30 长春工业大学 Multi-AGV load balancing and task scheduling method based on lasting DQN algorithm
CN117474295B (en) * 2023-12-26 2024-04-26 长春工业大学 Dueling DQN algorithm-based multi-AGV load balancing and task scheduling method


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination