CN112968834B - SDN route convergence method under reinforcement learning based on network characteristics - Google Patents

SDN route convergence method under reinforcement learning based on network characteristics

Info

Publication number
CN112968834B
CN112968834B CN202110145046.2A
Authority
CN
China
Prior art keywords
node
network
theta
reinforcement learning
agent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110145046.2A
Other languages
Chinese (zh)
Other versions
CN112968834A (en)
Inventor
李传煌
陈忠良
汤中运
谭天
王峥
方春涛
陈超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou DPTech Technologies Co Ltd
Zhejiang Gongshang University
Original Assignee
Hangzhou DPTech Technologies Co Ltd
Zhejiang Gongshang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou DPTech Technologies Co Ltd and Zhejiang Gongshang University
Priority to CN202110145046.2A
Publication of CN112968834A
Application granted
Publication of CN112968834B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 45/00: Routing or path finding of packets in data switching networks
    • H04L 45/12: Shortest path evaluation
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 45/00: Routing or path finding of packets in data switching networks
    • H04L 45/12: Shortest path evaluation
    • H04L 45/122: Shortest path evaluation by minimising distances, e.g. by selecting a route with minimum of number of hops
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 45/00: Routing or path finding of packets in data switching networks
    • H04L 45/22: Alternate routing

Abstract

The invention discloses an SDN route convergence method under reinforcement learning based on network characteristics. Reinforcement learning is applied to SDN route convergence, the Q-learning algorithm is used as the reinforcement learning model, and a direction factor θ is defined according to the input network topology to describe the direction of each transfer in a path. During path transfers, the reinforcement learning agent's exploration is guided by the θ value. In the early episodes the agent is allowed to select actions whose θ value is negative during the exploration phase, and as episodes iterate the probability of exploring actions with negative θ decreases. The agent thus obtains sufficient experience from the environment while exploration efficiency improves and the generation of loops in the training phase is reduced. The method exploits the continuous interaction and strategy adjustment between reinforcement learning and the network environment and, compared with traditional route convergence algorithms, can find the optimal path in the route convergence process.

Description

SDN route convergence method under reinforcement learning based on network characteristics
Technical Field
The invention relates to the field of network communication technology and reinforcement learning, in particular to an SDN route convergence method under reinforcement learning based on network characteristics.
Background
The reinforcement learning process can be summarized as the agent learning a mapping from environment states to actions such that the accumulated reward is maximized. In route planning, the agent receives the current state and reward information from the routing system, the action it selects is the input the routing system receives from the agent, and the actions and rewards in the current routing system influence the agent's action selection for a long time afterwards. In the whole route planning system, the agent must learn the optimal actions that maximize the accumulated reward, and the sequence of actions it selects forms the optimal path of the traffic. Among reinforcement learning methods, Q-learning does not depend on an environment model, and in a finite Markov decision process Q-learning can find the optimal policy; what the agent needs to do is to keep trying in the system in order to learn a policy. A policy is evaluated by the accumulated reward obtained after executing it, and the best policy is to select the action with the maximum Q value in each state. Exploration means the agent selects actions it has not performed before, while exploitation means the agent takes the currently optimal action according to previously learned experience. In the invention, exploration means selecting links that have not been selected before in order to search more possibilities, and exploitation means selecting currently chosen links to refine the known planned route.
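For readers unfamiliar with Q-learning, the following is a minimal sketch of the tabular Q-value update the above paragraph refers to; the table sizes, learning rate and discount value are illustrative assumptions, not taken from this section.

    import numpy as np

    # Minimal tabular Q-learning update (illustrative sketch).
    n_states, n_actions = 16, 16
    alpha, gamma = 0.8, 0.6          # assumed learning rate and discount
    Q = np.zeros((n_states, n_actions))

    def q_update(s, a, r, s_next, done):
        """One Bellman update: Q(s,a) <- Q(s,a) + alpha * (target - Q(s,a))."""
        target = r if done else r + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (target - Q[s, a])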
If shortest-path planning is realized with reinforcement learning, the agent's exploration tendency is to find nodes of smaller and smaller depth. In a network, besides link length, the link bandwidth, delay, hop count and so on can serve as dominant characteristics, and several characteristics can also be combined by weighting into a new characteristic. Through the dominant characteristics of the network topology, the agent can be guided from highly random behaviour in the exploration phase towards efficient exploration, so that the learning network converges more quickly. To avoid convergence to a sub-optimal solution, the agent's exploration behaviour is allowed to be highly random in the initial phase of training. As the number of training steps increases, the gradient difference of the dominant characteristic of each link is increased, so the agent transitions from highly random exploration to efficient exploration, which improves the convergence speed while still guaranteeing convergence to the optimal solution.
Disclosure of Invention
The invention combines reinforcement learning and provides an SDN route convergence method under reinforcement learning based on network characteristics, solving the problems that existing route convergence algorithms easily form loops, cannot find the optimal path, and so on.
The technical solution adopted by the invention to solve this technical problem is as follows: an SDN route convergence method under reinforcement learning based on network characteristics, comprising the following steps:
step 1: establishing an SDN network area topological graph, and dividing network areas in a fine-grained manner;
Step 2: defining a direction factor θ with respect to a set source node, where θ ∈ {-1, 0, 1}; a transfer from one node to another node in the topological graph has θ = -1 when it moves closer to the source node, θ = 1 when it moves farther from the source node, and θ = 0 when the two nodes have the same shortest distance to the source node; a network topology hierarchy graph is constructed according to the θ values between different nodes in the topological graph and their relationship to the source node, and all nodes in each layer have the same shortest distance to the source node;
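As an illustration of this step, the sketch below derives the direction factor θ and the layers of the hierarchy graph from shortest-hop depths computed by BFS; the adjacency-list representation, the function names and the BFS-based depth computation are assumptions for the sketch, since the text only specifies the resulting θ values.

    from collections import deque

    def node_depths(adj, source):
        """BFS shortest-hop distance of every node from the source node."""
        depth = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in depth:
                    depth[v] = depth[u] + 1
                    queue.append(v)
        return depth

    def direction_factor(depth, u, v):
        """theta = -1 toward the source, 1 away from it, 0 within the same layer."""
        if depth[v] < depth[u]:
            return -1
        if depth[v] > depth[u]:
            return 1
        return 0

    def layers(depth):
        """Layers of the hierarchy graph: nodes grouped by equal distance to the source."""
        grouped = {}
        for node, d in depth.items():
            grouped.setdefault(d, []).append(node)
        return [grouped[d] for d in sorted(grouped)]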
Step 3: inputting the network topology hierarchy graph obtained in step 2 into the reinforcement learning model, with the Q-learning algorithm as the reinforcement learning model, to guide the agent's exploration; when the height difference between layers tends to 0, the nodes are essentially in the same layer, transfers between nodes in the same layer are little affected by the layering, and the agent behaves in a random exploration state; as the height difference between layers increases, the agent is guided to explore towards lower layers, which appears as an efficient exploration state.
The following formula is set:
(formula defining f(θt); shown only as an image in the original)
h(θ) = f(θt)
where t is the episode iteration count and step is a set threshold. The function f(θt) takes the absolute value of θ according to the episode iteration progress, and h(θ) determines the specific value of θ according to the state of the corresponding action. As episodes continue to iterate, among the selectable actions the θ value corresponding to actions moving closer to the source node becomes smaller and the θ value corresponding to actions moving away from the source node becomes larger, so the interval D is defined:
D = [0, θ1 + θ2 + … + θn]
The interval D ranges from 0 to the sum of the θ values corresponding to all currently selectable actions; the interval is divided into n parts, where n is the number of currently selectable actions, and the length of each subinterval is the θ value corresponding to that action.
η = random(D)
The random number η is obtained by sampling the interval D uniformly, i.e. with equal probability.
Calculated by the following formula:
a = g(D, η)
The function g(D, η) returns the action whose subinterval of D contains the random number η, i.e. the action a selected by the agent during exploration.
The strategy formula of reinforcement learning based on the network area characteristics is as follows:
a = g(D, η),            σ ≤ ε
a = argmax_a' Q(s, a'), σ > ε
The ε-greedy policy balances exploration and exploitation based on an exploration factor ε (ε ∈ [0,1]). A random number σ (σ ∈ [0,1]) is generated; when σ ≤ ε, the agent uses the random strategy above, exploring the environment by randomly selecting actions to gain experience; when σ > ε, the agent uses a greedy strategy to exploit the experience already gained.
When the agent makes a state transition, the current state s and the selected action a are input to a function R, which produces a reward evaluating the transition. The reward function is set as:
Rt(s,a)=αB-βt+γδ(s_-d)-δ
r is the reward earned at node i, selecting a link to node j. Alpha, beta, gamma and delta are used as four positive parameters to weigh the weights of the four parts of rewards. B is the residual bandwidth of the link corresponding to the selected action, and t is the delay of the corresponding link. δ (s _ -d) is the stimulus function, s _ representing the state to transition after selecting action a based on state s.
The Q-value table is trained according to the set reward function, and a path is obtained from the trained Q-value table; this path is the converged optimal path after the link failure.
Further, the fine-grained network area division is specifically as follows: a network connection matrix containing the adjacency relations among all nodes is constructed from the SDN network topology. The connection matrix and the number of nodes n in the topology are input to a hub node election algorithm, which records the number of connections of each node, expressed as:
node_link[i] = Σj T[i][j], j = 1, …, n
where node_link[i] is the number of links of node i and T[i][j] indicates whether node i is connected to node j; the node with the highest number of link connections is elected as the hub node. The hub node and its adjacent nodes form one divided network area.
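A minimal sketch of this hub-node election: node_link[i] is the row sum of the connection matrix, the hub is the node with the most links, and the region is the hub plus its neighbours. The NumPy representation of T is an assumption.

    import numpy as np

    def elect_hub(T):
        """T: n x n 0/1 connection matrix. Returns the hub node and its network area."""
        T = np.asarray(T)
        node_link = T.sum(axis=1)                 # node_link[i] = sum_j T[i][j]
        hub = int(np.argmax(node_link))           # node with the most link connections
        region = [hub] + [j for j in range(T.shape[0]) if T[hub, j] == 1]
        return hub, region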
Further, the process of training the Q-value table with the Q-learning algorithm is as follows (a code sketch is given after the list):
Set the maximum number of steps for a single training round.
(1) Initialize the Q-value table and the reward function R;
(2) select an action a using the strategy based on network area characteristics;
(3) execute action a, transfer to state s_, compute the reward with the reward function R, and update the Q-value table;
(4) judge whether s_ is the destination node; if not, set s = s_ and return to (2); if s_ is the destination node, the training round ends.
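A minimal sketch of steps (1)-(4); it reuses the select_action and reward helpers sketched earlier, and the environment interface (source, dest, selectable_actions, bandwidth, delay) is an assumption introduced only for illustration.

    def train_q_table(Q, env, episodes, max_steps, epsilon, lr=0.8, gamma=0.6):
        """Train the Q-value table: select an action with the network-area strategy,
        execute it, compute the reward, update Q, and end the round at the destination."""
        for _ in range(episodes):
            s = env.source
            for _ in range(max_steps):
                actions, weights = env.selectable_actions(s)        # A(s) and their theta-derived weights
                a = select_action(Q, s, actions, weights, epsilon)  # step (2)
                s_next = a                                          # moving to node a is the transition
                r = reward(env.bandwidth(s, a), env.delay(s, a), s_next, env.dest)  # step (3)
                Q[s, a] += lr * (r + gamma * Q[s_next].max() - Q[s, a])             # Q-value update
                if s_next == env.dest:                              # step (4)
                    break
                s = s_next
        return Q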
Further, when planning the backup path the method focuses on link bandwidth performance and link delay, so α is set to 0.4, β to 0.3, γ to 0.1 and δ to 0.2.
The beneficial effects of the invention are as follows: a direction factor θ is defined to describe the direction of each transfer in a path, and the reinforcement learning agent's exploration during path transfers is guided according to the θ value. Exploration efficiency is therefore improved while the agent still obtains sufficient experience from the environment, and the generation of loops in the training phase is reduced. Compared with traditional route convergence algorithms, the method exploits the continuous interaction and strategy adjustment between reinforcement learning and the network environment and can find the optimal path in the route convergence process.
Drawings
Fig. 1 is an SDN network topology diagram;
Fig. 2 is a network topology hierarchy diagram.
Detailed Description
The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.
Aiming at the fact that existing SDN controllers adopt the Dijkstra algorithm as the shortest-path route convergence algorithm, the invention applies reinforcement learning to SDN route convergence. Exploiting the separation of forwarding and control in SDN, the network topology environment is used directly for training the Q-value table. Since the residual bandwidth of each link in the topology changes dynamically as different flows are forwarded, the invention introduces reinforcement learning and uses its self-exploration of the environment to cope with the dynamics of the network environment, finding an optimal route convergence path while guaranteeing the route convergence speed.
The invention provides an SDN route convergence method under reinforcement learning based on network characteristics, which comprises the following steps:
Step 1: establishing an SDN network area topological graph and dividing network areas in a fine-grained manner; specifically: a network connection matrix containing the adjacency relations among the nodes is constructed from the SDN network topology. The connection matrix and the number of nodes n in the topology are input to a hub node election algorithm, which records the number of connections of each node, expressed as:
node_link[i] = Σj T[i][j], j = 1, …, n
where node_link[i] is the number of links of node i and T[i][j] indicates whether node i is connected to node j; the node with the highest number of link connections is elected as the hub node. The hub node and its adjacent nodes form one divided network area, as shown in Fig. 1.
Step 2: defining a direction factor θ with respect to a set source node, where θ ∈ {-1, 0, 1}; a transfer from one node to another node in the topological graph has θ = -1 when it moves closer to the source node, θ = 1 when it moves farther from the source node, and θ = 0 when the two nodes have the same shortest distance to the source node; a network topology hierarchy graph is constructed according to the θ values between different nodes in the topological graph and their relationship to the source node, as shown in Fig. 2, and all nodes in each layer have the same shortest distance to the source node;
Step 3: inputting the network topology hierarchy graph obtained in step 2 into the reinforcement learning model, with the Q-learning algorithm as the reinforcement learning model, to guide the agent's exploration; when the height difference between layers tends to 0, the nodes are almost in the same layer, transfers between nodes in the same layer are little affected by the layering, and the agent behaves in a random exploration state; as the height difference between layers increases, the agent is guided to explore towards lower layers, which appears as an efficient exploration state.
The following formula is set:
(formula defining f(θt); shown only as an image in the original)
h(θ) = f(θt)
where t is the episode iteration count and step is a set threshold. The function f(θt) takes the absolute value of θ according to the episode iteration progress, and h(θ) determines the specific value of θ according to the state of the corresponding action. As episodes continue to iterate, among the selectable actions the θ value corresponding to actions moving closer to the source node becomes smaller and the θ value corresponding to actions moving away from the source node becomes larger, so the interval D is defined:
D = [0, θ1 + θ2 + … + θn]
The interval D ranges from 0 to the sum of the θ values corresponding to all currently selectable actions; the interval is divided into n parts, where n is the number of currently selectable actions, and the length of each subinterval is the θ value of the corresponding action.
η = random(D)
The random number η is obtained by sampling the interval D uniformly, i.e. with equal probability.
Calculated by the following formula:
a = g(D, η)
The function g(D, η) returns the action whose subinterval of D contains the random number η, i.e. the action a selected by the agent during exploration.
The strategy formula of reinforcement learning based on the network area characteristics is as follows:
a = g(D, η),            σ ≤ ε
a = argmax_a' Q(s, a'), σ > ε
The ε-greedy policy balances exploration and exploitation based on an exploration factor ε (ε ∈ [0,1]). A random number σ (σ ∈ [0,1]) is generated; when σ ≤ ε, the agent uses the random strategy above, exploring the environment by randomly selecting actions to gain experience; when σ > ε, the agent uses a greedy strategy to exploit the experience already gained.
When the agent makes a state transition, the current state s and the selected action a are input to a function R, which produces a reward evaluating the transition. The reward function is set as:
Rt(s,a)=αB-βt+γδ(s_-d)-δ
r is the reward earned at node i, selecting a link to node j. Alpha, beta, gamma and delta are used as four positive value parameters to weigh four parts of rewarding weights. B is the residual bandwidth of the link corresponding to the selected action, and t is the delay of the corresponding link. δ (s _ -d) is the stimulus function, s _ representing the state to transition after selecting action a based on state s.
The Q-value table is trained according to the set reward function; the specific steps are as follows:
and setting the maximum step number of the single training.
(1) Initializing a Q value table and a reward function R;
(2) adopting a strategy based on network area characteristics, and selecting an action a;
(3) executing action a, transferring to a state s _, calculating a reward value by using a reward function R, and updating a Q value table;
(4) and judging whether s _ is a destination node or not. If not, let s be s _, go back to (2). If s _ is the destination node, the training ends.
A path is obtained from the trained Q-value table; this path is the converged optimal path after the link failure.
One specific application example of the present invention is as follows:
Step 1: an SDN network area topological graph is constructed; Mininet is used to build the network topology shown in Fig. 1, which contains 16 OpenFlow switches and 5 hosts. Step 2: the Q-learning algorithm is used as the reinforcement learning model, and the route convergence method provided by the invention is modeled as a Markov decision process; the MDP quadruple of the model is defined as follows:
(1) State set: in the network topology each switch represents a state, so the network state set is defined according to the topology as:
S = [s1, s2, s3, …, s16]
where s1~s16 represent the 16 OpenFlow switches in the network. The source node of a packet indicates its initial state, and the destination node indicates its termination state. When a packet reaches the destination node it reaches the termination state; once the termination state is reached, one round of training ends and the packet returns to the initial state for the next round of training.
(2) Action space: in an SDN network the transmission path of a packet is determined by the network state, i.e. a packet can only be transmitted between connected network nodes. According to the network topology graph, the network connection state is defined by the following formula:
T[si][sj] = 1 if switches si and sj are directly connected, and T[si][sj] = 0 otherwise
Since packets can only be transmitted between connected network nodes, the action set of each state si ∈ S is defined from the network state set and the network connection state as:
A(si) = {sj | T[si][sj] = 1}
This indicates that when the current state is si, the selectable action set consists of the nodes sj directly connected to si in the network topology, i.e. from the current state si the agent can only select a state sj connected to it. For example, the action set of state s1 is A(s1) = {s2, s4}. A short code sketch of this action-set construction follows.
(3) State transition: in each round of training, when the packet is in state si, if the selected action is not the termination state of this round, the packet moves to the next state.
(4) The reward function:
Rt(s,a)=αB-βt+γδ(s_-d)-δ
the present invention focuses on the link bandwidth performance and the link delay when planning the backup path, and therefore, α is set to 0.4, β is set to 0.3, γ is set to 0.1, and δ is set to 0.2.
In the system model, each time a packet passes through a switch it receives a negative reward representing its forwarding cost; the more switches it traverses during forwarding, the more negative reward accumulates and the higher the cost. To increase link bandwidth utilization, the packet is encouraged to select links with high link bandwidth utilization, and each time it passes through a switch it also receives a reward equal to the link utilization. To drive the packet to reach the destination node as soon as possible, an extra reward of size 1 is given when the destination node is reached. This is expressed by the formula:
(per-hop reward formula; shown only as an image in the original, combining the negative forwarding cost, the link-utilization reward and the extra reward of 1 at the destination)
In the formula, si indicates the current state, i.e. the packet is currently at switch number i, and aj indicates that the switch numbered j is selected.
In the invention, a network region characteristic strategy is adopted to carry out reinforcement learning model training.
After the MDP quadruple is determined, when a link fails, a new path is searched from the source node to the destination node, and the Q-value table is trained with the Q-learning algorithm:
and setting the maximum step number of the single training.
(1) Initializing a Q value table and a reward function R;
(2) adopting a strategy based on network area characteristics, and selecting an action a;
(3) executing action a, transferring to a state s _, calculating a reward value by using a reward function R, and updating a Q value table;
(4) and judging whether s' is a destination node or not. If not, let s be s _, go back to (2).
In the reinforcement-learning-based route convergence planning, the learning rate α is set to 0.8, the discount rate γ is set to 0.6, and the exploration factor ε of the ε-greedy action strategy is set to 0.3.
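As a usage illustration with these parameters, the training-loop sketch given earlier could be invoked as follows on the 16-switch example; the env object, the episode count and the step bounds are assumptions introduced only for this sketch.

    import numpy as np

    Q = np.zeros((16, 16))
    Q = train_q_table(Q, env, episodes=500, max_steps=50, epsilon=0.3, lr=0.8, gamma=0.6)

    # Greedy read-out of the converged path from the source to the destination.
    path, s = [env.source], env.source
    for _ in range(50):                      # safety bound on the path length
        if s == env.dest:
            break
        s = max(action_set(env.T, s), key=lambda a: Q[s, a])
        path.append(s)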
A path is obtained from the trained Q-value table; this path is the converged optimal path after the link failure.
The above-described embodiments are intended to illustrate rather than to limit the invention, and any modifications and variations of the present invention are within the spirit of the invention and the scope of the appended claims.

Claims (4)

1. An SDN route convergence method under reinforcement learning based on network characteristics is characterized by comprising the following steps:
step 1: establishing an SDN network area topological graph, and dividing network areas in a fine-grained manner;
step 2: defining a direction factor θ with respect to a set source node, wherein θ ∈ {-1, 0, 1}; a transfer from one node to another node in the topological graph has θ = -1 when it moves closer to the source node, θ = 1 when it moves farther from the source node, and θ = 0 when the two nodes have the same shortest distance to the source node; constructing a network topology hierarchy graph according to the θ values between different nodes in the topological graph and their relationship to the source node, wherein all nodes in each layer have the same shortest distance to the source node;
and step 3: inputting the network topology hierarchy graph obtained in step 2 into the reinforcement learning model, with the Q-learning algorithm as the reinforcement learning model, to guide agent exploration; when the height difference between layers tends to 0, the nodes are in the same layer, transfers between nodes in the same layer are less affected by the layering, and the agent behaves in a random exploration state; when the height difference between layers increases, the agent is guided to explore towards lower layers, which appears as an efficient exploration state;
the following formula is set:
(formula defining f(θt); shown only as an image in the original)
h(θ) = f(θt)
wherein t represents the episode iteration count and step is a set threshold; the function f(θt) takes the absolute value of θ according to the episode iteration progress, and h(θ) determines the specific value of θ according to the state of the corresponding action; as episodes continue to iterate, among the selectable actions the θ value corresponding to actions moving closer to the source node becomes smaller and the θ value corresponding to actions moving away from the source node becomes larger, so the interval D is defined:
D = [0, θ1 + θ2 + … + θn]
the interval D ranges from 0 to the sum of the θ values corresponding to all currently selectable actions; the interval is divided into n parts, where n is the number of currently selectable actions, and the length of each subinterval is the θ value of the corresponding action;
η = random(D)
the random number η is obtained by sampling the interval D uniformly, i.e. with equal probability;
calculated by the following formula:
a = g(D, η)
the function g(D, η) returns the action whose subinterval of D contains the random number η, i.e. the action a selected by the agent during exploration;
the strategy formula of reinforcement learning based on the network area characteristics is as follows:
a = g(D, η),            σ ≤ ε
a = argmax_a' Q(s, a'), σ > ε
the ε-greedy policy balances exploration and exploitation based on an exploration factor ε (ε ∈ [0,1]); a random number σ (σ ∈ [0,1]) is generated; when σ ≤ ε, the agent uses the random strategy above, exploring the environment by randomly selecting actions to gain experience; when σ > ε, the agent uses a greedy strategy to exploit the experience already gained;
when the agent makes a state transition, the current state s and the selected action a are input to a function R, which produces a reward evaluating the state transition; the reward function is set as:
Rt(s,a)=αB-βt+γδ(s_-d)-δ
r is the reward obtained by selecting a link to the node j at the node i; alpha, beta, gamma and delta are used as four positive value parameters to weigh the weights of the four parts of rewards; b is the residual bandwidth of the link corresponding to the selected action, and t is the time delay of the corresponding link; δ (s _ -d) is an excitation function, s _ representing the state to which action a is transferred after selection based on state s;
and training the Q-value table according to the set reward function, and obtaining a path from the trained Q-value table, wherein the path is the converged optimal path after the link failure.
2. The SDN route convergence method under reinforcement learning based on network characteristics according to claim 1, wherein the fine-grained network area division specifically comprises: constructing a network connection matrix according to the SDN network topology, the network connection matrix containing the adjacency relations among the nodes of the network; inputting the network connection matrix and the number of nodes m in the network topology into a hub node election algorithm, which records the number of connections of each node, expressed as:
node_link[i] = Σj T[i][j], j = 1, …, m
wherein node_link[i] is the number of links of node i and T[i][j] indicates whether node i is connected to node j; the node with the highest number of link connections is elected as the hub node; the hub node and its adjacent nodes form one divided network area.
3. The SDN route convergence method under reinforcement learning based on network characteristics according to claim 1, wherein the process of training the Q-value table with the Q-learning algorithm is specifically as follows:
setting the maximum number of steps for a single training round;
(1) initializing the Q-value table and the reward function R;
(2) selecting an action a using the strategy based on network area characteristics;
(3) executing action a, transferring to state s_, computing the reward with the reward function R, and updating the Q-value table;
(4) judging whether s_ is the destination node; if not, setting s = s_ and returning to (2); if s_ is the destination node, the training ends.
4. The SDN route convergence method under reinforcement learning based on network characteristics according to claim 1, wherein link bandwidth performance and link delay are considered when planning the backup path, so that α = 0.4, β = 0.3, γ = 0.1 and δ = 0.2 are set.
CN202110145046.2A 2021-02-02 2021-02-02 SDN route convergence method under reinforcement learning based on network characteristics Active CN112968834B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110145046.2A CN112968834B (en) 2021-02-02 2021-02-02 SDN route convergence method under reinforcement learning based on network characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110145046.2A CN112968834B (en) 2021-02-02 2021-02-02 SDN route convergence method under reinforcement learning based on network characteristics

Publications (2)

Publication Number Publication Date
CN112968834A CN112968834A (en) 2021-06-15
CN112968834B true CN112968834B (en) 2022-05-24

Family

ID=76271994

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110145046.2A Active CN112968834B (en) 2021-02-02 2021-02-02 SDN route convergence method under reinforcement learning based on network characteristics

Country Status (1)

Country Link
CN (1) CN112968834B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106411749A (en) * 2016-10-12 2017-02-15 国网江苏省电力公司苏州供电公司 Path selection method for software defined network based on Q learning
CN108667734A (en) * 2018-05-18 2018-10-16 南京邮电大学 A fast routing decision algorithm based on Q-learning and LSTM neural networks
CN111770019A (en) * 2020-05-13 2020-10-13 西安电子科技大学 Q-learning optical network-on-chip self-adaptive route planning method based on Dijkstra algorithm

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11126929B2 (en) * 2017-11-09 2021-09-21 Ciena Corporation Reinforcement learning for autonomous telecommunications networks

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106411749A (en) * 2016-10-12 2017-02-15 国网江苏省电力公司苏州供电公司 Path selection method for software defined network based on Q learning
CN108667734A (en) * 2018-05-18 2018-10-16 南京邮电大学 A fast routing decision algorithm based on Q-learning and LSTM neural networks
CN111770019A (en) * 2020-05-13 2020-10-13 西安电子科技大学 Q-learning optical network-on-chip self-adaptive route planning method based on Dijkstra algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Q-DATA: Enhanced Traffic Flow Monitoring in Software-Defined Networks applying Q-learning; Trung V. Phan et al.; 2019 15th International Conference on Network and Service Management (CNSM); 2020-02-27; full text *

Also Published As

Publication number Publication date
CN112968834A (en) 2021-06-15

Similar Documents

Publication Publication Date Title
Qi et al. Knowledge-driven service offloading decision for vehicular edge computing: A deep reinforcement learning approach
Liu et al. DRL-R: Deep reinforcement learning approach for intelligent routing in software-defined data-center networks
CN110488861A (en) Unmanned plane track optimizing method, device and unmanned plane based on deeply study
CN114697229B (en) Construction method and application of distributed routing planning model
CN108521375A (en) The transmission of the network multi-service flow QoS based on SDN a kind of and dispatching method
CN112437020A (en) Data center network load balancing method based on deep reinforcement learning
Liu et al. Drl-or: Deep reinforcement learning-based online routing for multi-type service requirements
CN108075975B (en) Method and system for determining route transmission path in Internet of things environment
Singh et al. OANTALG: an orientation based ant colony algorithm for mobile ad hoc networks
CN113194034A (en) Route optimization method and system based on graph neural network and deep reinforcement learning
CN114500360A (en) Network traffic scheduling method and system based on deep reinforcement learning
Zhang et al. IFS-RL: An intelligent forwarding strategy based on reinforcement learning in named-data networking
CN114143264A (en) Traffic scheduling method based on reinforcement learning in SRv6 network
Oužecki et al. Reinforcement learning as adaptive network routing of mobile agents
Mani Kandan et al. Fuzzy hierarchical ant colony optimization routing for weighted cluster in MANET
CN110225493A (en) Based on D2D route selection method, system, equipment and the medium for improving ant colony
Zhou et al. Multi-task deep learning based dynamic service function chains routing in SDN/NFV-enabled networks
Cárdenas et al. A multimetric predictive ANN-based routing protocol for vehicular ad hoc networks
Zhang et al. A service migration method based on dynamic awareness in mobile edge computing
Liu et al. BULB: lightweight and automated load balancing for fast datacenter networks
Wei et al. GRL-PS: Graph embedding-based DRL approach for adaptive path selection
CN112968834B (en) SDN route convergence method under reinforcement learning based on network characteristics
Guo et al. A deep reinforcement learning approach for deploying sdn switches in isp networks from the perspective of traffic engineering
Chen et al. Traffic engineering based on deep reinforcement learning in hybrid IP/SR network
Cui et al. Particle swarm optimization for multi-constrained routing in telecommunication networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant