CN107948083B - SDN data center congestion control method based on reinforcement learning - Google Patents


Info

Publication number
CN107948083B
Authority
CN
China
Prior art keywords
flow
congestion control
data center
matrix
state
Prior art date
Legal status
Active
Application number
CN201711081371.7A
Other languages
Chinese (zh)
Other versions
CN107948083A (en)
Inventor
金蓉
王伟明
李姣姣
庹鑫
Current Assignee
Zhejiang Gongshang University
Original Assignee
Zhejiang Gongshang University
Priority date
Filing date
Publication date
Application filed by Zhejiang Gongshang University
Priority to CN201711081371.7A
Publication of CN107948083A
Application granted
Publication of CN107948083B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2425 Traffic characterised by specific attributes for supporting services specification, e.g. SLA
    • H04L47/25 Flow control; Congestion control with rate being modified by the source upon detecting a change of network conditions

Abstract

The invention discloses an SDN data center congestion control method based on reinforcement learning. Against the network background of SDN, the method proposes a flow-based congestion control idea and introduces the Q-learning algorithm from reinforcement learning to allocate flow rates intelligently and globally, keeping the utilization of the network's data links as high as possible while the whole network avoids congestion, thereby achieving congestion control of the data center. First, a quintuple model is built to describe the problem; then an improved Q-learning algorithm is proposed and the Q matrix is trained; finally, congestion control is performed for each flow request using the trained Q matrix. The invention provides an adaptive SDN data center congestion control method with good control effect, good stability and high efficiency, and the control algorithm is easy to implement. It offers an intelligent, reinforcement-learning-based solution to the congestion control problem of SDN data centers.

Description

SDN data center congestion control method based on reinforcement learning
Technical Field
The invention relates to the technical field of network communication, and in particular to a reinforcement-learning-based congestion control method for an SDN (Software Defined Network) Data Center Network (DCN).
Background
In recent years, cloud computing has become a hotspot and a clear trend in information infrastructure, and the user base of many new Internet online services (such as search, social networking and instant messaging) is growing rapidly. Throughout this rapid development of cloud computing and online services, the data center, as the underlying information infrastructure, has remained central. With the growth of these services and the adoption of new technologies, data centers are undergoing significant changes, which bring new challenges and problems to data center networks. Emerging services require large amounts of one-to-many and many-to-many communication between servers, so traffic inside the data center is increasing dramatically and exhibits characteristics that differ from Internet traffic. Under current technical conditions, data center networks are frequently congested, which increases packet loss and delay, reduces throughput, and seriously affects service performance and quality of service. To guarantee service performance and quality, traffic management and optimization in data center networks has become an important and urgent problem.
Reinforcement learning has developed from theories such as animal learning, stochastic approximation and optimal control, and is an online learning technique that requires no teacher. By learning a mapping from environment states to actions, the agent selects actions that maximize the reward received from the environment, so that the environment's evaluation of the learning system (or of the running performance of the whole system) is, in some sense, as favorable as possible. The Q-learning algorithm is a model-free reinforcement learning algorithm: it uses the discounted reward, i.e. the Q value of each state-action pair, as the function being estimated, considers the available actions in every learning iteration, and its learning process is guaranteed to converge. Q-learning can learn without prior knowledge and therefore has broad application prospects for complex optimization and decision problems.
The invention provides an SDN data center congestion control method based on reinforcement learning. Against the network background of SDN, it proposes a flow-based congestion control idea and introduces the Q-learning algorithm from reinforcement learning to allocate flow rates intelligently and globally, keeping the utilization of the network's data links as high as possible while the whole network avoids congestion, thereby achieving congestion control of the data center. The invention provides an intelligent, reinforcement-learning-based solution to the congestion control problem of SDN data centers; it can optimize the use of data center network resources and improve network throughput, service performance and quality of service, thereby supporting the healthy development of emerging Internet and cloud computing services, promoting energy saving in data centers, and contributing to green communication.
Disclosure of Invention
The invention aims to solve the congestion control problem of a data center network based on an SDN architecture, and provides a reinforcement-learning-based congestion control method for the SDN data center network.
The purpose of the invention is achieved by the following technical scheme. An SDN data center congestion control method based on reinforcement learning comprises the following steps:
Step 1: Introduce reinforcement learning into the SDN-based data center to solve the congestion control problem. The SDN-based data center congestion control problem is first described as the quintuple <F, S, R, A, Q>.
Reinforcement learning is a teacher-free online learning technique: an agent senses the state information of the environment, selects an optimal action, which changes the state and yields a reward value, and updates its evaluation function; after completing one learning pass it enters the next round of training, iterating until the termination condition of the whole learning process is met. The SDN-based data center congestion control problem considered here is a flow-based congestion control problem, i.e. rates are allocated to all flows globally so that the rate requests of the flows are satisfied as far as possible while the whole data center network is guaranteed not to become congested.
The quintuple is <F, S, R, A, Q>. F (flow) represents the flows whose rates are to be allocated, a queue of length N; S (link state) represents the state of all links, a vector of length M; R (reward) represents the matrix of reward values obtained after an action is selected; A (action) represents the behavior of allocating rates to flows according to the link demand, a vector of length N; Q (Q-matrix) represents the Q matrix obtained through training, which encodes the knowledge the agent has learned from experience.
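For illustration only, the quintuple could be held in data structures like the following minimal Python sketch; the names, sizes and rate set are assumptions for a small example and are not prescribed by the method.

```python
import numpy as np

N = 10                    # number of flows to allocate (queue length of F, assumed)
M = 5                     # number of links (length of the state vector S, assumed)
LEVELS = 8                # quantization levels per link (assumed)
RATES = [1, 2, 3, 4, 5]   # candidate rates an action may assign to a flow (assumed)

# F: the flow queue; each flow records the links it traverses (filled in later).
F = [{"id": i, "links": ()} for i in range(N)]

# S: state of all links, one quantized occupancy level per link.
S = np.zeros(M, dtype=int)

# A: one rate choice per flow, filled in as flows are processed.
A = np.zeros(N, dtype=int)

# R and Q: reward and learned-value tables. A dense 8^5 x 8^5 matrix indexed by
# (current state, next state) is possible but large, so this sketch keeps them
# as dictionaries keyed by (state tuple, action).
R = {}
Q = {}
```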
Step 2: According to the problem requirements, an improved Q-learning algorithm is proposed and the Q matrix is trained.
The Q-learning algorithm is one of the classic reinforcement learning algorithms. Each state-action pair has an associated Q value; the algorithm selects the action to execute according to these Q values, and the optimal policy is obtained by estimating the value function of the state-action pairs.
Training the Q matrix with the improved Q-learning algorithm comprises the following steps:
2-1. Give the reward matrix R according to prior knowledge, and initialize the Q matrix.
2-2. Improve the action-selection method of the Q-learning algorithm: the classical Q-learning algorithm selects the action with the maximum reward in the R matrix according to the current state only, whereas the improved algorithm selects the action with the maximum reward in the R matrix by considering both the current state and the path traversed by the current flow.
2-3. Execute the action, observe the reward and the new link state, and iteratively update the Q value Q(S, a) according to Q(S, a) ← Q(S, a) + α[r + γ·max Q(S′, a′) − Q(S, a)].
This is the Q-learning iterative update formula. Here Q(S, a) is the Q value of executing action a in the current state S, Q(S′, a′) is the Q value of executing action a′ in the next state S′, r is the reward obtained after executing action a in state S, γ is the discount factor, α is the learning rate, γ·max Q(S′, a′) is the discounted value of the best subsequent state-action pair, and r + γ·max Q(S′, a′) − Q(S, a) is the temporal-difference error used to improve the current estimate.
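Written out in code, the update is only a few lines. The sketch below assumes states and actions are hashable keys of a Q dictionary; the values of α and γ are illustrative, since the text does not fix them here.

```python
def q_update(Q, s, a, r, s_next, next_actions, alpha=0.1, gamma=0.9):
    """One Q-learning step: Q(S,a) <- Q(S,a) + alpha * [r + gamma * max Q(S',a') - Q(S,a)]."""
    q_sa = Q.get((s, a), 0.0)
    # Best discounted value attainable from the next state S'.
    best_next = max((Q.get((s_next, a2), 0.0) for a2 in next_actions), default=0.0)
    td_error = r + gamma * best_next - q_sa          # temporal-difference error
    Q[(s, a)] = q_sa + alpha * td_error
    return Q[(s, a)]
```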
2-4. Repeat this training process until S reaches a final state, yielding the trained Q matrix.
Step 3: Perform congestion control for the specific flow requests using the Q matrix trained in step 2.
The congestion control procedure comprises the following steps:
3-1. Determine the N specific flow requests and the quantization standard for the occupied bandwidth of the links.
3-2. Input a flow request, obtain the current link state, consider the links traversed by the current flow, and select and execute the action with the maximum reward according to the Q matrix obtained by training, i.e. select a rate for the current flow. Then update the current link state and record the rate allocated to the current flow.
3-3. Judge whether all N flows have been allocated. If not, return to step 3-2 and continue the loop until every flow has been allocated a rate.
3-4. Output the mapping table of the N flows to their rates, thereby performing global congestion control on the data center. A minimal code sketch of this allocation loop is given below.
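This sketch assumes the trained Q dictionary from step 2; get_link_state, allowed_actions and apply_rate are hypothetical helpers standing in for the controller operations described above.

```python
def allocate_rates(flows, Q, get_link_state, allowed_actions, apply_rate):
    """Assign a rate to every flow using the trained Q matrix (steps 3-1 to 3-4)."""
    mapping = {}                                     # flow id -> allocated rate
    for flow in flows:                               # steps 3-2 / 3-3: loop over all N flows
        s = get_link_state()                         # current quantized link state
        candidates = allowed_actions(s, flow)        # actions restricted to the flow's links
        rate = max(candidates, key=lambda a: Q.get((s, a), 0.0))   # highest learned value
        apply_rate(flow, rate)                       # update the links traversed by the flow
        mapping[flow["id"]] = rate
    return mapping                                   # step 3-4: flow -> rate mapping table
```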
The invention has the following beneficial effects: it provides an intelligent, reinforcement-learning-based solution to the congestion control problem of SDN data centers, which can optimize the use of data center network resources and improve network throughput, service performance and quality of service, thereby supporting the healthy development of emerging Internet and cloud computing services, promoting energy saving in data centers, and contributing to green communication.
Drawings
FIG. 1 is a system architecture diagram.
Fig. 2 is a data center network topology diagram adopted by the embodiment.
FIG. 3 is a flow chart of a training algorithm.
Fig. 4 is a flow chart of a congestion control method.
Fig. 5 is a diagram showing the variation of the bandwidth of each link in the embodiment.
Fig. 6 is a rate allocation diagram of flows in an embodiment.
Detailed Description
The invention is further described below with reference to the figures and examples.
The invention provides an SDN data center congestion control method based on reinforcement learning, which comprises the following steps:
Step 1: Introduce reinforcement learning into the SDN-based data center to solve the congestion control problem. The data center congestion control problem is first described as the quintuple <F, S, R, A, Q>.
Reinforcement learning is a teacher-free online learning technique: an agent senses the state information of the environment, selects an optimal action, which changes the state and yields a reward value, and updates its evaluation function; after completing one learning pass it enters the next round of training, iterating until the termination condition of the whole learning process is met. The reinforcement-learning-based congestion control problem of the SDN data center is a flow-based congestion control problem, i.e. rates are allocated to all flows globally so that the rate requests of the flows are satisfied as far as possible while the whole data center network is guaranteed not to become congested.
The quintuple is <F, S, R, A, Q>. F (flow) represents the flows whose rates are to be allocated, a queue of length N; S (link state) represents the state of all links, a vector of length M; R (reward) represents the matrix of reward values obtained after an action is selected; A (action) represents the behavior of allocating rates to flows according to the link demand, a vector of length N; Q (Q-matrix) represents the Q matrix obtained through training, which encodes the knowledge the agent has learned from experience.
Step 2: According to the problem requirements, an improved Q-learning algorithm is proposed and the Q matrix is trained.
The Q-learning algorithm is one of the classic reinforcement learning algorithms. Each state-action pair has an associated Q value; the algorithm selects the action to execute according to these Q values, and the optimal policy is obtained by estimating the value function of the state-action pairs.
Step 2 specifically comprises the following steps:
2-1. Give the reward matrix R according to prior knowledge, and initialize the Q matrix.
2-2. Improve the action-selection method of the Q-learning algorithm: the classical Q-learning algorithm selects the action with the maximum reward in the R matrix according to the current state only, whereas the improved algorithm selects the action with the maximum reward in the R matrix by considering both the current state and the path traversed by the current flow.
2-3. Execute the action, observe the reward and the new link state, and iteratively update the Q value Q(S, a) according to Q(S, a) ← Q(S, a) + α[r + γ·max Q(S′, a′) − Q(S, a)].
This is the Q-learning iterative update formula. Here Q(S, a) is the Q value of executing action a in the current state S, Q(S′, a′) is the Q value of executing action a′ in the next state S′, r is the reward obtained after executing action a in state S, γ is the discount factor, α is the learning rate, γ·max Q(S′, a′) is the discounted value of the best subsequent state-action pair, and r + γ·max Q(S′, a′) − Q(S, a) is the temporal-difference error used to improve the current estimate.
2-4. Repeat these steps until S reaches a final state, obtaining the trained Q matrix.
Step 3: Perform congestion control for the specific flow requests using the Q matrix trained in step 2.
The congestion control procedure comprises the following steps:
3-1. Determine the N specific flow requests and the quantization standard for the occupied bandwidth of the links.
3-2. Input a flow request, obtain the current link state, consider the links traversed by the current flow, and select and execute the action with the maximum reward according to the Q matrix obtained by training, i.e. select a rate for the current flow. Then update the current link state and record the rate allocated to the current flow.
3-3. Judge whether all N flows have been allocated. If not, return to step 3-2 and continue the loop until every flow has been allocated a rate.
3-4. Output the mapping table of the N flows to their rates, thereby performing global congestion control on the data center.
In the quintuple description, the congestion control problem of a data center with a software-defined network architecture is expressed as the quintuple <F, S, R, A, Q>: the flows whose rates are to be allocated are described as F, the link state as S, the rewards as the R matrix, the rate allocation as action A, and the training result of the agent is recorded in the Q matrix.
Examples
To help those skilled in the art understand and implement the invention, the technical scheme is further described below with reference to the accompanying drawings, and a specific embodiment of the method is given.
The invention introduces the reinforcement learning method into the software-defined-network-based data center to solve the congestion control problem. Fig. 1 is the system architecture diagram; the basic functions of its modules are: (1) perception module: collects the current link state information of the data center network; (2) learning module: learns from the received link state information, or derives quantitative information from related empirical knowledge, to provide a decision basis for the decision module; (3) decision module: formulates the corresponding control strategy according to the information provided by the learning module; (4) execution module: executes the control strategy made by the decision module. The learning module of this embodiment adopts an improved Q-learning algorithm: the classical Q-learning algorithm selects the action with the maximum reward in the R matrix according to the current state only, and the improvement is that the action with the maximum reward is selected in the R matrix by considering both the current state and the path traversed by the current flow. The Q matrix obtained by training in the learning module is provided to the decision module, which allocates a rate to each flow according to the Q matrix to realize congestion control. A skeleton of these four modules is sketched below.
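The four modules could be organized as in the following skeleton; the class and method names are illustrative and not taken from the patent.

```python
class PerceptionModule:
    """Collects the current link state information of the data center network."""
    def collect_link_state(self, network):
        raise NotImplementedError

class LearningModule:
    """Improved Q-learning: selects actions from both the current state and the flow's path."""
    def train_q_matrix(self, link_states, reward_matrix):
        raise NotImplementedError

class DecisionModule:
    """Allocates a rate to each flow according to the trained Q matrix."""
    def decide_rates(self, q_matrix, flows):
        raise NotImplementedError

class ExecutionModule:
    """Pushes the decided rate allocation into the network (e.g. as flow rules)."""
    def enforce(self, rate_mapping):
        raise NotImplementedError
```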
Fig. 2 is a network topology diagram of an SDN data center employed in the embodiment. The whole network has 5 links, and the link bandwidth is 8G. The length of the flow queue adopted in the embodiment is 10.
The specific congestion control method comprises the following steps:
Step 1: The data center congestion control problem is described as the quintuple <F, S, R, A, Q>.
In this quintuple, the flows whose rates are to be allocated are described as F, the link state as S, the rewards as the R matrix, the rate allocation as action A, and the training result of the agent is recorded in the Q matrix.
F (flow) represents the flows to be allocated. In this embodiment the queue of flows to be allocated bandwidth has length 10; there are 5 links in total and each flow occupies two links. The flows can be expressed as:
F = (flow_1, flow_2, ..., flow_i, ..., flow_10)   (1)
flow_i in formula (1) takes values as follows:
flow_i ∈ {f_jk}, where j, k ∈ {1, 2, ..., 5}   (2)
f_jk in formula (2) indicates that flow_i occupies the two links j and k.
S (link state) represents the state of all links and is a vector of length 5. It can be expressed as:
S = (ls_1, ls_2, ..., ls_i, ..., ls_5)   (3)
ls_i in formula (3) takes values as follows:
ls_i ∈ {g_j}, where j ∈ {1, 2, ..., 8}   (4)
g_j in formula (4) represents the quantization level of the used bandwidth of the link.
In this embodiment we divide the occupied bandwidth of a link into 8 levels, where B is the link bandwidth, i.e. the maximum transmission rate; the link state is discretized into the 8 levels g1-g8 as shown in the quantization table of the original (reproduced there only as images).
Further, the link bandwidth B in this embodiment is 40G. A code sketch of this quantization is given below.
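Assuming the 8 levels split the bandwidth B into equal-width bins (the exact table survives only as images in the original), the quantization could be sketched as:

```python
import math

B = 40        # link bandwidth in G, as stated for this embodiment
LEVELS = 8    # quantization levels g1..g8

def quantize(used_bandwidth, bandwidth=B, levels=LEVELS):
    """Map the used bandwidth of a link to a level 1..8 (equal-width bins assumed)."""
    if used_bandwidth <= 0:
        return 1
    return min(levels, math.ceil(used_bandwidth / (bandwidth / levels)))

# Example: the initial loads [18, 20, 18, 14, 29] map to levels [4, 4, 4, 3, 6].
print([quantize(x) for x in [18, 20, 18, 14, 29]])
```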
A (action) represents the behavior of allocating a rate to a flow according to the link demand, and is a vector of length 10. It can be expressed as:
A = (a_1, a_2, ..., a_i, ..., a_10)   (5)
a_i in formula (5) takes values as follows:
a_i ∈ {1, 2, 3, 4, 5}
R (reward) is the matrix of reward values obtained after an action is selected. Its rows are indexed by the current state S and its columns by the next state S′; it is therefore a matrix with 8^5 rows and 8^5 columns.
R = (r_ij)   (6)
r_ij in formula (6) represents the reward value obtained when a certain action is executed in state S_i and the system transitions to state S_j.
Various schemes are possible for determining the reward. In this embodiment we adopt the following scheme: a unimodal function F = min(i/7, 100·(35 − i)) is used, where i denotes the occupied bandwidth of the link. The reward then distinguishes two cases, given in the original as formula (7) (reproduced there only as an image); an illustrative sketch follows below.
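Since the two cases of formula (7) survive only as an image, the following is a hedged illustration rather than the patent's definition: it assumes the reward is the unimodal F evaluated at the link's new occupancy when the allocation fits, and a fixed penalty otherwise; the reading of F itself from the garbled text is also an assumption.

```python
def f_unimodal(i):
    """Assumed reading of the unimodal function: F = min(i/7, 100*(35 - i)),
    where i is the occupied bandwidth of the link (in G)."""
    return min(i / 7, 100 * (35 - i))

def reward(new_occupancy, capacity=40):
    """Illustrative two-case reward; formula (7) itself is not reproduced in the text."""
    if new_occupancy <= capacity:      # the allocation still fits on the link
        return f_unimodal(new_occupancy)
    return -100.0                      # assumed penalty for overloading the link
```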
Q (Q-matrix) represents the Q matrix obtained through training, encoding the knowledge the agent has learned from experience. The Q matrix has the same order as the R matrix; its rows represent the current state S and its columns the next state S′ reached after the corresponding action is taken.
Q = (q_ij)   (8)
q_ij in formula (8) represents the knowledge learned by the agent for the transition from state S_i to state S_j.
Step 2: According to the problem requirements, an improved Q-learning algorithm is proposed and the Q matrix is trained, yielding the trained Q matrix.
In the architecture of the Q-learning-based congestion control system shown in Fig. 1, the overall process consists of the following parts: the detector collects flow information and feeds it into the state-detection processor for analysis; all link state information is input into the Q-learning optimization control decision maker; the Q values of the strategy are obtained in the Q-learning control decision maker; the strategy decision maker then derives a better flow allocation strategy; and through continuous iteration, allocation strategies are found for all flows on all links, thereby realizing congestion control of the whole data center.
According to the problem requirements, an improved Q-learning algorithm is used to train the Q matrix. The classical Q-learning algorithm selects the action with the maximum reward in the R matrix according to the current state only; the improved algorithm selects the action with the maximum reward in the R matrix by considering both the current state and the path traversed by the current flow. The improved algorithm is described as follows:
[The listing of the improved Q-learning training algorithm is given in the original only as an image; see also the training flow in Fig. 3.]
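Because the algorithm listing is only available as an image, the following sketch reconstructs the training loop from the textual description of steps 2-1 to 2-4; the step environment function and the per-flow action sets are assumptions.

```python
def train_q(R, flows, initial_state, step, alpha=0.1, gamma=0.9, episodes=1000):
    """Improved Q-learning training (steps 2-1 to 2-4), reconstructed from the description.

    R[(s, a)]       -- reward entries built from prior knowledge (step 2-1)
    flow["actions"] -- rate actions on the links this flow traverses (assumed field)
    step(s, a, f)   -- hypothetical environment step returning (next_state, reward, done)
    """
    Q = {}                                                   # step 2-1: initialize Q
    for _ in range(episodes):
        s = initial_state
        for flow in flows:
            # Step 2-2: combine the current state with the current flow's path,
            # then pick the action with the maximum reward in R.
            candidates = flow["actions"]
            a = max(candidates, key=lambda act: R.get((s, act), 0.0))
            s_next, r, done = step(s, a, flow)               # step 2-3: execute and observe
            best_next = max((Q.get((s_next, act), 0.0) for act in candidates), default=0.0)
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best_next - Q.get((s, a), 0.0))
            s = s_next
            if done:                                         # step 2-4: stop at a final state
                break
    return Q
```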
Fig. 3 is the Q-matrix training flow chart. Training comprises the following steps:
2-1. Give the reward matrix R according to step 1 and initialize the Q matrix. The initial load of the 5 links is [18, 20, 18, 14, 29].
2-2. Select an action according to the improved Q-learning algorithm.
2-3. Execute the action, observe the reward and the new link state, and iteratively update the Q value Q(S, a) according to Q(S, a) ← Q(S, a) + α[r + γ·max Q(S′, a′) − Q(S, a)].
This is the Q-learning iterative update formula. Here Q(S, a) is the Q value of executing action a in the current state S, Q(S′, a′) is the Q value of executing action a′ in the next state S′, r is the reward obtained after executing action a in state S, γ is the discount factor, α is the learning rate, γ·max Q(S′, a′) is the discounted value of the best subsequent state-action pair, and r + γ·max Q(S′, a′) − Q(S, a) is the temporal-difference error used to improve the current estimate.
2-4. Iterate this loop until S reaches a final state.
Step 3: Perform congestion control for the specific flow requests using the Q matrix trained in step 2.
The flow chart of the congestion control procedure is shown in Fig. 4; it comprises the following steps:
3-1. Given the 5 links of the data center network, determine the quantization standards g1-g8 for the occupied bandwidth of the links. There are 10 flow requests to be allocated; their occupied links and bandwidth requirements are as follows:
Flow                    flow1  flow2  flow3  flow4  flow5  flow6  flow7  flow8  flow9  flow10
Occupied links          l1,l2  l1,l3  l1,l4  l1,l5  l2,l3  l2,l4  l2,l5  l3,l4  l3,l5  l4,l5
Required bandwidth (G)  5      5      5      5      5      5      5      5      5      5
3-2. Input the 10 flow requests and set the initial load of the 5 links to [18, 20, 18, 14, 29]. Considering the links traversed by the current flow, select and execute the action with the maximum reward according to the Q matrix obtained by training, i.e. select a rate for the current flow. Then update the current link state and record the rate allocated to the current flow.
3-3. Judge whether all 10 flows have been allocated. If not, return to step 3-2 and continue the loop until every flow has been allocated a rate.
3-4. Output the mapping table of the 10 flows to their rates, as follows:
Flow                     flow1  flow2  flow3  flow4  flow5  flow6  flow7  flow8  flow9  flow10
Occupied links           l1,l2  l1,l3  l1,l4  l1,l5  l2,l3  l2,l4  l2,l5  l3,l4  l3,l5  l4,l5
Required bandwidth (G)   5      5      5      5      5      5      5      5      5      5
Allocated bandwidth (G)  4      4      4      1      5      1      1      2      1      3
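As a quick sanity check of the table above (an illustrative script, not part of the patent), adding the allocated bandwidths to the initial loads [18, 20, 18, 14, 29] shows every link staying within the 40G bandwidth:

```python
load = {1: 18, 2: 20, 3: 18, 4: 14, 5: 29}        # initial load per link (G)
allocations = [                                    # (occupied links, allocated bandwidth) per flow
    ((1, 2), 4), ((1, 3), 4), ((1, 4), 4), ((1, 5), 1), ((2, 3), 5),
    ((2, 4), 1), ((2, 5), 1), ((3, 4), 2), ((3, 5), 1), ((4, 5), 3),
]
for links, bw in allocations:
    for link in links:
        load[link] += bw
print(load)   # {1: 31, 2: 31, 3: 30, 4: 24, 5: 35} -- every link stays at or below 35G, within 40G
```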
Fig. 5 shows how the bandwidth of each link changes after each allocation. Abscissa 0 represents the initial bandwidth occupancy [18, 20, 18, 14, 29]; abscissa 1 represents the bandwidth occupancy of each link after the first flow has been allocated bandwidth (in this embodiment the first flow occupies link 1 and link 2, and the allocated rate is 3G). As can be seen from Fig. 5, after the rate allocation of all 10 flows is completed, no link is congested. The method therefore achieves effective congestion control.
Fig. 6 shows the rate allocation of the flows: 1 flow is allocated its full 5G link demand, 3 flows are allocated 4G, 1 flow 3G, 1 flow 2G, and 4 flows only 1G. The bandwidth requirement of each flow is satisfied as far as possible while the data center network remains free of congestion.
Without the congestion control method of the invention, allocating bandwidth purely on demand would give every flow 5G, but the actual link bandwidth might not satisfy every flow, resulting in congestion.
The congestion control method of the invention has been described above with reference to a specific embodiment, which shows that the proposed data center congestion control method is effective. The method performs flow-based congestion control for the SDN data center network, with the controller allocating rates to flows globally, so that congestion is avoided while the bandwidth utilization is kept as high as possible.

Claims (2)

1. An SDN data center congestion control method based on reinforcement learning is characterized by comprising the following steps:
step 1: introducing reinforcement learning into a data center based on a software-defined network, and describing the SDN-based data center congestion control problem as a quintuple <F, S, R, A, Q>; wherein F represents the flows to be allocated, with queue length N; S represents the state of all links and is a vector of length M; R represents the matrix of reward values obtained after selecting an action; A represents the behavior of allocating rates to flows according to the link demand, and is a vector of length N; Q represents the Q matrix obtained through training, representing the knowledge the agent has learned from experience;
step 2: training a Q matrix based on an improved Q-learning algorithm; the method specifically comprises the following steps:
2-1. according to prior knowledge, giving the reward matrix R and initializing the Q matrix;
2-2. improving the action-selection method of the Q-learning algorithm, so that the algorithm considers both the current state and the path traversed by the current flow and selects the action with the maximum reward in the R matrix;
2-3. executing the action, observing the reward and the new link state, and iteratively updating the Q value Q(S, a) according to Q(S, a) ← Q(S, a) + α[r + γ·max Q(S′, a′) − Q(S, a)]; wherein Q(S, a) is the Q value after executing action a in the current state S, Q(S′, a′) is the Q value after executing action a′ in the next state S′, r is the reward after executing action a in the current state S, γ is the discount factor, α is the learning rate, γ·max Q(S′, a′) is the discounted value of the best subsequent state-action pair, and r + γ·max Q(S′, a′) − Q(S, a) is the temporal-difference error used to improve the current estimate;
2-4. cyclically executing the Q matrix training process until S reaches a final state, obtaining the trained Q matrix;
step 3: performing congestion control for the specific flow requests using the Q matrix obtained by training in step 2;
the congestion control in step 3 comprises the following steps:
3-1. determining the number N of flow requests and the quantization standard of the link utilization;
3-2. inputting a flow request, obtaining the current link state, considering the links traversed by the current flow, and selecting and executing the action with the maximum reward according to the Q matrix obtained by training, i.e. selecting a rate for the current flow; then updating the current link state and recording the rate allocated to the current flow;
3-3. judging whether all N flows have been allocated; if not, returning to step 3-2 and continuing the loop until all flows have been allocated rates; if the allocation is finished, executing step 3-4;
3-4. outputting the mapping table of the N flows to their rates, thereby performing global congestion control on the data center.
2. The reinforcement-learning-based SDN data center congestion control method of claim 1, wherein the SDN-based data center congestion control problem is a flow-based congestion control problem, namely rates are allocated to all flows globally, so that the rate requests of the flows are satisfied as far as possible and the whole data center network is guaranteed not to become congested.
CN201711081371.7A 2017-11-07 2017-11-07 SDN data center congestion control method based on reinforcement learning Active CN107948083B (en)

Priority Applications (1)

Application Number  Priority Date  Filing Date  Title
CN201711081371.7A  2017-11-07  2017-11-07  SDN data center congestion control method based on reinforcement learning

Publications (2)

Publication Number  Publication Date
CN107948083A (en)  2018-04-20
CN107948083B (en)  2021-03-30






Legal Events

Code  Title
PB01  Publication
SE01  Entry into force of request for substantive examination
GR01  Patent grant