CN110233763B - Virtual network embedding algorithm based on temporal difference learning - Google Patents


Info

Publication number
CN110233763B
Authority
CN
China
Prior art keywords
vne
function
state
node
vnr
Prior art date
Legal status
Active
Application number
CN201910527020.7A
Other languages
Chinese (zh)
Other versions
CN110233763A (en)
Inventor
王森
张标
Current Assignee
Chongqing University
Original Assignee
Chongqing University
Priority date
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN201910527020.7A priority Critical patent/CN110233763B/en
Publication of CN110233763A publication Critical patent/CN110233763A/en
Application granted granted Critical
Publication of CN110233763B publication Critical patent/CN110233763B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12 Discovery or management of network topologies
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network

Abstract

The invention relates to a virtual network embedding algorithm based on temporal difference learning. It models the VNE problem as a Markov decision process (MDP) and builds a neural network to approximate the value function of VNE states. On this basis, an algorithm named VNE-TD, based on temporal difference (TD) learning, a reinforcement learning method, is proposed. In VNE-TD, multiple embedding candidates for node mapping are generated probabilistically, and TD learning is used to evaluate the long-term potential of each candidate. Extensive simulation results show that VNE-TD significantly outperforms previous algorithms in both blocking ratio and revenue.

Description

Virtual network embedding algorithm based on temporal difference learning
Technical Field
The invention relates to computer networks, and in particular to a virtual network embedding algorithm based on temporal difference learning.
Background
In recent years, network virtualization has received much attention from both the research community and industry, because it offers a promising path toward future networks. It is regarded as a tool for overcoming the current Internet's resistance to fundamental architectural change, and it is also a key enabler of cloud computing. The main entity of network virtualization is the virtual network (VN). As shown in Fig. 1, a VN is a combination of virtual nodes and links embedded on a substrate network (SN), where the number on a node or under a link denotes the node capacity or the link bandwidth, respectively. Virtual nodes are interconnected by virtual links mapped onto one or more SN paths. By virtualizing the node and link resources of one SN, multiple VNs with widely different characteristics can be hosted simultaneously on the same physical hardware. Given a set of virtual network requests (VNRs), each with resource requirements on both nodes and links, the problem of finding a subset of nodes and links in the SN that satisfies each VNR is known as the virtual network embedding (VNE) problem. In most settings the VNE problem must be treated as an online problem: VNRs are not known in advance; instead, they arrive at the system dynamically and may stay in the SN for a period of time. The VNE algorithm must therefore handle each VNR on arrival, rather than a batch of VNRs at once (offline VNE). In making online embedding decisions, the infrastructure provider (InP), typically the owner of the SN, aims to maximize its long-term revenue, which makes the VNE problem more challenging.
Disclosure of Invention
The technical problem to be solved by the invention is to achieve a better balance between performance and computational complexity when embedding virtual networks.
In order to achieve this purpose, the invention adopts the following technical scheme: a virtual network embedding algorithm based on temporal difference learning, comprising the following steps:
S101: establishing a VNE model
The substrate network SN is modeled as a weighted undirected graph, denoted $G_s(V_s, E_s)$, where $V_s$ is the set of substrate nodes and $E_s$ is the set of substrate links. Each substrate node $v_s \in V_s$ has computing capacity $c_{v_s}$, and each substrate link $e_s \in E_s$ has bandwidth $b_{e_s}$.
$VNR_k$ is modeled as an undirected graph, denoted $G_k(V_k, E_k)$, where $V_k$ is the set of virtual nodes and $E_k$ is the set of virtual links. Each virtual node $v_k \in V_k$ has a computing requirement $c_{v_k}$, and each virtual link $e_k \in E_k$ has a bandwidth requirement $b_{e_k}$.
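The graph model of S101 can be encoded directly. The following minimal Python sketch uses plain dicts; the node names and numbers are invented for illustration and are not taken from the patent's figures.

```python
# Illustrative encoding of the S101 model as plain Python dicts.
def make_graph(node_caps, link_bws):
    """node_caps: {node: capacity}; link_bws: {(u, v): bandwidth} (undirected)."""
    return {"nodes": dict(node_caps), "links": dict(link_bws)}

# A substrate network G_s(V_s, E_s): node computing capacities, link bandwidths.
sn = make_graph({"A": 10, "B": 8, "C": 6}, {("A", "B"): 5, ("B", "C"): 7})

# A VNR G_k(V_k, E_k): node computing requirements, link bandwidth requirements.
vnr = make_graph({"a": 3, "b": 2}, {("a", "b"): 4})
```

The same dict layout serves for both the SN and each VNR, since both are weighted undirected graphs with node and link attributes.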
S102: defining states
S102a: defining a reward function for $VNE_k$, where $VNE_k$ denotes the embedding procedure for the k-th VNR, as in equation (1):
$$R_{vn}(k) = \eta \sum_{v \in V_k} c_v + \beta \sum_{e \in E_k} b_e \qquad (1)$$
where $c_v$ is the node capacity of node $v$, $b_e$ is the link bandwidth of link $e$, $\eta$ is the unit price of computing resources, and $\beta$ is the unit price of bandwidth resources. It is therefore natural to define the immediate reward after processing $VNR_k$ as $R_{vn}(k)$, i.e., $r_k = R_{vn}(k)$.
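The immediate reward of S102a can be sketched as follows; the dict layout is illustrative, and the default unit prices are assumptions for the example.

```python
# Sketch of equation (1): the revenue (and immediate reward r_k) of VNR k is
# eta times its total node requirement plus beta times its total bandwidth
# requirement.
def rvn(vnr, eta=1.0, beta=1.0):
    return eta * sum(vnr["nodes"].values()) + beta * sum(vnr["links"].values())

vnr = {"nodes": {"a": 3, "b": 2}, "links": {("a", "b"): 4}}
print(rvn(vnr))  # 1.0*(3+2) + 1.0*4 = 9.0
```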
S102b: defining an operation set for the VNE: the operation set of the VNE is defined as the set of all possible node mappings.
S102c: defining a Markov state for the VNE:
The state $s_k$ is represented by the normalized remaining node capacities and link bandwidths of the SN, written $\hat{c}_{v_s}$ and $\hat{b}_{e_s}$. $s_k$ is an ordered set, as shown in equation (3):
$$s_k = \left( \hat{c}_{v_1}, \ldots, \hat{c}_{v_{|V_s|}}, \hat{b}_{e_1}, \ldots, \hat{b}_{e_{|E_s|}} \right) \qquad (3)$$
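Constructing this state vector amounts to normalizing each remaining resource by its initial value, in a fixed order. A minimal sketch (resource key names are illustrative):

```python
# Sketch of the Markov state of S102c: normalized remaining node capacities
# and link bandwidths of the SN in a fixed (sorted-key) order.
def vne_state(remaining, initial):
    """remaining, initial: {resource_id: value} over the same keys."""
    return [remaining[r] / initial[r] for r in sorted(initial)]

initial = {"cA": 10.0, "cB": 8.0, "bAB": 5.0}
remaining = {"cA": 7.0, "cB": 8.0, "bAB": 1.0}
print(vne_state(remaining, initial))  # [0.2, 0.7, 1.0] (key order: bAB, cA, cB)
```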
In RL, a state signal that successfully retains all relevant information is called Markov. If the state signal has the Markov property, then the response of the environment at step k+1 depends only on the state and action at step k, in which case the dynamics of the environment can be determined by specifying only:
$$\Pr\{s_{k+1} = s', r_{k+1} = r \mid s_k, a_k\} \qquad (5)$$
S103: modeling the VNE as a Markov decision process (MDP).
S103a: defining the policy and value function: the policy of the VNE agent is a mapping from each state $s$ and action $a$ to the probability of taking action $a$ in state $s$. Given a policy $\pi$, the value function of the VNE is a function of the VNE state, denoted $V^{\pi}(s), s \in S$. $V^{\pi}(s)$ can be viewed as the potential to accommodate future VNRs and generate long-term revenue; it is defined as equation (8):
$$V^{\pi}(s) = E_{\pi}\{R_k \mid s_k = s\} = E_{\pi}\left\{\sum_{i=0}^{\infty} \gamma^{i} r_{k+i+1} \,\Big|\, s_k = s\right\} \qquad (8)$$
where $R_k$ is the discounted sum of all rewards from $VNR_k$ onward and $\gamma$ is the discount rate that determines the present value of future rewards.
S103b: defining the optimal value function:
The objective of studying the VNE problem from the RL point of view is to find an optimal policy that yields the maximum return in the long run. Let $\pi^*$ be an optimal policy: $\pi^* \geq \pi$ for any policy $\pi$ if and only if, for all $s \in S$,
$$V^{\pi^*}(s) \geq V^{\pi}(s)$$
The optimal value function is defined as
$$V^*(s) = \max_{\pi} V^{\pi}(s)$$
For the optimal value function $V^*(s)$, the following recursive (Bellman optimality) expression holds:
$$V^*(s) = \max_{a} E\left\{ r_{k+1} + \gamma V^*(s_{k+1}) \mid s_k = s, a_k = a \right\}$$
S104: approximating the optimal value function $V^*(s)$, the value function under the optimal policy, using a neural network:
A standard feedforward neural network with two fully connected (fc) layers is used to approximate $V^*(s)$. Layers fc1 and fc2 have the same number of nodes, denoted H, and use the rectifier as the activation function. The input of the neural network is the state $s$ of equation (3); the network takes the state $s$ as input and outputs a value $V(s)$ that is expected to approximate $V^*(s)$.
Supervised learning of the approximation function $V(s)$ adjusts the neural network parameters $\vec{\theta}$ so as to minimize the difference between $V(s)$ and $V^*(s)$, which can be expressed as:
$$\min_{\vec{\theta}} \left[ V^*(s) - V(s, \vec{\theta}) \right]^2$$
As the RL process proceeds, $V^*(s_k)$ can be regarded as a training sample for the approximation function $V(s)$ learned in parallel by supervised learning. Following the gradient descent method, for $VNR_k$ the parameters $\vec{\theta}$ are updated as:
$$\vec{\theta} \leftarrow \vec{\theta} + \alpha \left[ V^*(s_k) - V(s_k, \vec{\theta}) \right] \nabla_{\vec{\theta}} V(s_k, \vec{\theta})$$
where $\alpha$ is a positive step-size parameter controlling the learning speed.
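As an illustration of S104, the following NumPy sketch builds a two-fully-connected-layer rectifier network that outputs a scalar V(s) and applies the gradient update above toward a target sample. The layer sizes, initialization, and learning rate are assumptions for the example, not values from the patent.

```python
import numpy as np

# Illustrative two-fc-layer value network for S104: input is the state
# vector s, output is the scalar V(s). All sizes and constants are assumed.
rng = np.random.default_rng(0)
n_in, H = 3, 8
W1, b1 = rng.normal(0.0, 0.1, (H, n_in)), np.zeros(H)   # fc1
W2, b2 = rng.normal(0.0, 0.1, (H, H)), np.zeros(H)      # fc2
w3, b3 = rng.normal(0.0, 0.1, H), 0.0                   # linear output head

def forward(s):
    h1 = np.maximum(0.0, W1 @ s + b1)    # fc1 with rectifier activation
    h2 = np.maximum(0.0, W2 @ h1 + b2)   # fc2 with rectifier activation
    return h1, h2, float(w3 @ h2 + b3)   # V(s)

def train_step(s, target, alpha=0.01):
    """theta <- theta + alpha * (target - V(s)) * grad_theta V(s)."""
    global W1, b1, W2, b2, w3, b3
    h1, h2, v = forward(s)
    delta = target - v                   # error toward the V*(s_k) sample
    g2 = w3 * (h2 > 0)                   # gradient at fc2 pre-activation
    g1 = (W2.T @ g2) * (h1 > 0)          # gradient at fc1 pre-activation
    w3 = w3 + alpha * delta * h2
    b3 = b3 + alpha * delta
    W2 = W2 + alpha * delta * np.outer(g2, h1)
    b2 = b2 + alpha * delta * g2
    W1 = W1 + alpha * delta * np.outer(g1, s)
    b1 = b1 + alpha * delta * g1
    return v

s = np.array([0.2, 0.7, 1.0])
v0 = forward(s)[2]
for _ in range(200):
    train_step(s, 1.0)
v1 = forward(s)[2]  # V(s) has moved toward the target 1.0
```

Repeated steps drive V(s) toward the target, mirroring how samples of the optimal value function train the approximator in parallel with the RL process.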
S105: in VNE, given a VNR, the possible operations and the corresponding next states are known; hence $P^{a}_{ss'}$ and $R^{a}_{ss'}$ are deterministic and known. The possible node-mapping matchings are traversed and used as the operation set, and the set of resulting states obtained by simulated embedding of these operations is fed as input to the neural network of S104, yielding a value of the optimal value function for each candidate. Since the optimal policy $\pi(s)$ can be expressed as:
$$\pi(s) = \arg\max_{a} \left[ r + \gamma V^*(s') \right]$$
the candidate with the maximum value satisfies the optimal policy.
S106: the matching corresponding to the optimal value function with the maximum value is selected and the VNR is actually embedded accordingly; then the shortest path between the two SN nodes with sufficient bandwidth is found to match each VN link.
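The selection in S105 and S106 reduces to scoring each candidate's simulated resulting state with the value function and taking the maximum. A toy sketch, where the value function and states are simple stand-ins:

```python
# Sketch of the S105/S106 selection: simulate embedding each node-mapping
# candidate, score the resulting SN state with the value function, and keep
# the candidate with the largest value. value_fn stands in for the S104 net.
def choose_candidate(candidates, simulate_embed, value_fn):
    return max(candidates, key=lambda m: value_fn(simulate_embed(m)))

# Toy stand-ins: resulting states are resource vectors, value = their sum.
result_states = {"m1": [0.2, 0.8], "m2": [0.5, 0.7], "m3": [0.1, 0.4]}
best = choose_candidate(list(result_states), result_states.get, sum)
print(best)  # "m2", whose resulting state has the largest value, 1.2
```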
As an improvement, in S105, when traversing the node-mapping matchings, the following reduction is performed first: a probabilistic method is used to generate multiple node-mapping candidates, using the metric RW and a uniform value, so that candidates are drawn with RW-based and uniform selection probabilities.
Compared with the prior art, the invention has at least the following advantages:
1. Using a neural network to approximate the value function of VNE states helps generalize from previously experienced states to never-seen states, which matters for VNE problems with large state spaces.
2. Based on temporal difference learning, passive load balancing is abandoned; the contradiction between online embedding decisions and the long-term objective is overcome through active learning and online decision making based on past experience, solving the resource-allocation problem more effectively and improving resource utilization.
Drawings
Fig. 1 is an example of the VNE problem.
Fig. 2 is an example topology.
Fig. 3 shows an example of the VNE problem.
Fig. 4 shows the embedding results of the example.
Fig. 5 illustrates the VNE process in RL terms.
Fig. 6 shows the neural network approximating the optimal value function.
Fig. 7(a) is a graph of the blocking ratio versus the parameter d for different algorithms, and fig. 7(b) is a graph of the benefit per second versus the parameter d for different algorithms.
Fig. 8(a) is a graph of blocking ratio versus time for different algorithms, fig. 8(b) is a graph of revenue per second versus time for different algorithms, and fig. 8(c) is a graph of WAPL versus time for different algorithms.
Fig. 9 is a graph of loss versus the number of training iterations.
Fig. 10(a) is a graph of blocking ratio versus workload for different algorithms, fig. 10(b) is a graph of revenue per second versus workload for different algorithms, and fig. 10(c) is a graph of WAPL versus workload for different algorithms.
Fig. 11(a) is a graph showing an influence of the blocking ratio on the number of node mapping candidates, and fig. 11(b) is a graph showing an influence of the profit per second on the number of node mapping candidates.
Fig. 12(a) is a graph of blocking ratio versus VNRs link connectivity for different algorithms, and fig. 12(b) is a graph of revenue per second versus VNRs link connectivity for different algorithms.
Detailed Description
The present invention is described in further detail below.
The main challenge of the VNE problem is the contradiction between online decision making and the pursuit of long-term goals. Prior work attempts to overcome this challenge by balancing the SN workload, in the hope of accommodating more future VNRs. The problem is that a node's connectivity is coupled with that of other nodes: consuming connectivity capability at one node does not necessarily reduce only its own capability. In Fig. 3, a SN is shown together with a VNR that needs to be embedded into it. Take a node-level metric from the prior art (named GRC) as an example. With the parameter d set to 0.85, the GRC values of the SN nodes are shown as "Origin" in Fig. 4. To balance the SN workload, GRC-VNE selects the two nodes with the strongest connectivity capability as measured by GRC, namely node B and node G, to match the two nodes of the VNR (node a and node b). The remaining GRC values are shown in Fig. 4 as "After VNR embedded by GRC-VNE", and the variance of these values is 0.0032. In contrast, the VNE-TD algorithm proposed by the invention selects node B and node C. The remaining GRC values are shown in Fig. 4 as "After VNR embedded by VNE-TD", and their variance is 0.0016. This shows that the basic assumption of prior work on balancing SN workloads is problematic: it brings neither a more balanced workload nor more remaining resources.
A virtual network embedding algorithm based on temporal difference learning comprises the following steps:
S101: establishing a VNE model
The substrate network SN is modeled as a weighted undirected graph, denoted $G_s(V_s, E_s)$, where $V_s$ is the set of substrate nodes and $E_s$ is the set of substrate links. Each substrate node $v_s \in V_s$ has computing capacity $c_{v_s}$ (e.g., CPU cycles), and each substrate link $e_s \in E_s$ has bandwidth $b_{e_s}$. An example of a SN is given at the bottom of Fig. 1; the numbers around the nodes and links are their available resources.
$VNR_k$ is modeled as an undirected graph, denoted $G_k(V_k, E_k)$, where $V_k$ is the set of virtual nodes and $E_k$ is the set of virtual links. Each virtual node $v_k \in V_k$ has a computing requirement $c_{v_k}$, and each virtual link $e_k \in E_k$ has a bandwidth requirement $b_{e_k}$.
An example of a VNR is given at the top of Fig. 1. For $VNR_k$, $t_k$ is its arrival time and $l_k$ is its lifetime.
S102: defining states
S102a: defining a reward function for $VNE_k$, where $VNE_k$ denotes the embedding procedure for the k-th VNR, as in equation (1):
$$R_{vn}(k) = \eta \sum_{v \in V_k} c_v + \beta \sum_{e \in E_k} b_e \qquad (1)$$
where $c_v$ is the node capacity of node $v$, $b_e$ is the link bandwidth of link $e$, $\eta$ is the unit price of computing resources, and $\beta$ is the unit price of bandwidth resources.
The goal is to maximize the long-term time-averaged revenue of the InP, as follows:
$$\max \lim_{T \to \infty} \frac{\sum_{k \in K_T} R_{vn}(k)}{T} \qquad (2)$$
where $K_T = \{k \mid 0 < t_k < T\}$ denotes the set of VNRs arriving before time instance T.
The reward function is intended to provide an immediate measure of how good a certain action is in a given state. As equation (2) shows, the objective of the VNE problem is to maximize the long-term time-averaged revenue of the InP. It is therefore natural to define the immediate reward after processing $VNR_k$ as $R_{vn}(k)$, i.e., $r_k = R_{vn}(k)$.
S102b: defining an operation set for the VNE: the operation set of the VNE is defined as the set of all possible node mappings.
S102c: defining a Markov state for the VNE:
The state $s_k$ is represented by the normalized remaining node capacities and link bandwidths of the SN, written $\hat{c}_{v_s}$ and $\hat{b}_{e_s}$. $s_k$ is an ordered set, as shown in equation (3):
$$s_k = \left( \hat{c}_{v_1}, \ldots, \hat{c}_{v_{|V_s|}}, \hat{b}_{e_1}, \ldots, \hat{b}_{e_{|E_s|}} \right) \qquad (3)$$
In RL, a state signal that successfully retains all relevant information is called Markov. For a Markov state, all that matters is the current state signal; its meaning is independent of the path or history leading to it. In the most general causal setting, the response of the environment may depend on everything that has occurred before, and in most RL problems the transition function is probabilistic. In that case the dynamics can only be represented by specifying the complete probability distribution:
$$\Pr\{s_{k+1} = s', r_{k+1} = r \mid s_k, a_k, r_k, s_{k-1}, a_{k-1}, \ldots, r_1, s_0, a_0\} \qquad (4)$$
If, on the other hand, the state signal has the Markov property, then the response of the environment at step k+1 depends only on the state and action at step k, in which case the dynamics can be determined by specifying only:
$$\Pr\{s_{k+1} = s', r_{k+1} = r \mid s_k, a_k\} \qquad (5)$$
S103: modeling the VNE as a Markov decision process (MDP).
S103a: defining the policy and value function:
A reinforcement learning task satisfying the Markov property is called a Markov decision process. Since the VNE state defined above is a Markov state, the decision process of the VNE problem can be modeled exactly as an MDP.
In an MDP, given any state $s$ and action $a$, the probability of each possible next state $s'$ is expressed as:
$$P^{a}_{ss'} = \Pr\{s_{k+1} = s' \mid s_k = s, a_k = a\} \qquad (6)$$
These quantities are called transition probabilities. Likewise, the expected value of the next reward is written as:
$$R^{a}_{ss'} = E\{r_{k+1} \mid s_k = s, a_k = a, s_{k+1} = s'\} \qquad (7)$$
From the RL perspective, the objective of the VNE is to find a policy whose action choice is optimal at any time and in any state. The policy of the VNE agent is a mapping from each state $s$ and action $a$ to the probability of taking action $a$ in state $s$; the policy and the corresponding probability are denoted $\pi$ and $\pi(s, a)$. Almost all reinforcement learning algorithms estimate how good it is for the agent to be in a given state by estimating a value function of the state. Given a policy $\pi$, the value function of the VNE is a function of the VNE state, denoted $V^{\pi}(s), s \in S$. $V^{\pi}(s)$ can be viewed as the potential to accommodate future VNRs and generate long-term revenue, formally defined as equation (8):
$$V^{\pi}(s) = E_{\pi}\{R_k \mid s_k = s\} = E_{\pi}\left\{\sum_{i=0}^{\infty} \gamma^{i} r_{k+i+1} \,\Big|\, s_k = s\right\} \qquad (8)$$
where $R_k$ is the discounted sum of all rewards from $VNR_k$ onward and $\gamma$ is the discount rate that determines the present value of future rewards.
S103b: defining the optimal value function:
The objective of studying the VNE problem from the RL point of view is to find an optimal policy that yields the maximum return in the long run. Let $\pi^*$ be an optimal policy: $\pi^* \geq \pi$ for any policy $\pi$ if and only if, for all $s \in S$,
$$V^{\pi^*}(s) \geq V^{\pi}(s)$$
The optimal value function is defined as
$$V^*(s) = \max_{\pi} V^{\pi}(s)$$
For the optimal value function $V^*(s)$, the following recursive (Bellman optimality) expression holds:
$$V^*(s) = \max_{a} E\left\{ r_{k+1} + \gamma V^*(s_{k+1}) \mid s_k = s, a_k = a \right\}$$
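Because VNE transitions are deterministic (the next SN state after an embedding is known), the recursive expression for $V^*(s)$ can be iterated to a fixed point on a small example. The tiny two-state MDP below is invented purely for illustration.

```python
# Toy illustration of the recursive optimality expression for V*(s): with
# deterministic transitions (as in VNE), iterate
#   V(s) <- max_a [ r(s, a) + gamma * V(s') ]
# to a fixed point.
gamma = 0.9
# transitions[s][a] = (reward, next_state); "T" is absorbing.
transitions = {
    "s0": {"accept": (5.0, "s1"), "reject": (0.0, "T")},
    "s1": {"accept": (3.0, "T"), "reject": (0.0, "T")},
    "T": {},
}
V = {s: 0.0 for s in transitions}
for _ in range(100):  # sweep until the fixed point is reached
    for s, acts in transitions.items():
        if acts:
            V[s] = max(r + gamma * V[ns] for r, ns in acts.values())
print(V["s1"], V["s0"])  # 3.0 and 5.0 + 0.9*3.0 = 7.7
```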
S104: solving $V(s)$ with the neural network so that $V(s)$ approximates the optimal value function $V^*(s)$:
A standard feedforward neural network with two fully connected (fc) layers is used to approximate $V^*(s)$, as shown in Fig. 6. Layers fc1 and fc2 have the same number of nodes, denoted H, and use the rectifier as the activation function. The input of the neural network is the state $s$ of equation (3); the network takes the state $s$ as input and outputs a value $V(s)$ that is expected to approximate $V^*(s)$.
Supervised learning of the approximation function $V(s)$ adjusts the neural network parameters $\vec{\theta}$ so as to minimize the difference between $V(s)$ and $V^*(s)$, which can be expressed as:
$$\min_{\vec{\theta}} \left[ V^*(s) - V(s, \vec{\theta}) \right]^2$$
As the RL process proceeds, $V^*(s_k)$ can be regarded as a training sample for the approximation function $V(s)$ learned in parallel by supervised learning. Following the gradient descent method, for $VNR_k$ the parameters $\vec{\theta}$ are updated as:
$$\vec{\theta} \leftarrow \vec{\theta} + \alpha \left[ V^*(s_k) - V(s_k, \vec{\theta}) \right] \nabla_{\vec{\theta}} V(s_k, \vec{\theta})$$
where $\alpha$ is a positive step-size parameter controlling the learning speed.
S105: in VNE, given a VNR, the possible operations and the corresponding next states are known; hence $P^{a}_{ss'}$ and $R^{a}_{ss'}$ are deterministic and known. The possible node-mapping matchings are traversed and used as the operation set, and the set of resulting states obtained by simulated embedding of these operations is fed as input to the neural network of S104, yielding a value of the optimal value function for each candidate. Since the optimal policy $\pi(s)$ can be expressed as:
$$\pi(s) = \arg\max_{a} \left[ r + \gamma V^*(s') \right]$$
the candidate with the maximum value satisfies the optimal policy.
S106: the node mapping corresponding to the value function with the maximum value is selected and the VNR is actually embedded accordingly; then the shortest path between the two SN nodes with sufficient bandwidth is found to match each VN link.
The invention uses an RL method, temporal difference (TD) learning, to update the estimate of the optimal value function and to make embedding decisions based on that estimate. Specifically, TD learning updates its estimate of $V^*(s)$ as follows:
$$V(s_k) \leftarrow V(s_k) + \alpha \left[ r_{k+1} + \gamma V(s_{k+1}) - V(s_k) \right] \qquad (11)$$
As mentioned above, $V^*(s)$ is approximated by a neural network; combining equation (11) with the TD algorithm, the parameter update becomes:
$$\vec{\theta} \leftarrow \vec{\theta} + \alpha \left[ r_{k+1} + \gamma V(s_{k+1}) - V(s_k) \right] \nabla_{\vec{\theta}} V(s_k, \vec{\theta}) \qquad (12)$$
According to the above update rules, $V^*(s)$ and $V(s)$ are estimated by TD learning and supervised learning, respectively, and the two processes proceed simultaneously.
The algorithm VNE-TD is a function that makes the embedding decision when a VNR arrives. As in algorithm VNE-TD, the input states of the neural network are the resulting states of the simulated embedding of each node-mapping candidate, and the candidate with the largest value is selected to actually embed the VNR. After the node mapping is established, the shortest path between the two SN nodes with sufficient bandwidth is found to match each VN link. If splittable flows are allowed, the same multi-commodity flow algorithm as in [12] is used to map the virtual links. According to expression (12), the matching j that maximizes $r + \gamma V(s_j)$ should be selected; since the reward $r = R_{vn}(VNR)$ is the same for all candidates, the matching j that maximizes $V(s_j)$ can be chosen. When the lifetime of a VNR ends, it leaves the SN and the resources allocated to it are released, as described earlier, so the state of the SN changes. However, the parameters of the neural network are not updated upon VNR departures.
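The TD update quoted above can be sketched in its simplest (tabular) form; the dict-based table and the constants are illustrative stand-ins for the neural approximator.

```python
# Sketch of the TD update used by VNE-TD: after handling a VNR with reward r
# and moving the SN from state s to s_next, nudge the estimate by the TD error.
def td_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    V[s] += alpha * (r + gamma * V.get(s_next, 0.0) - V[s])
    return V[s]

V = {"s": 0.0, "s_next": 2.0}
td_update(V, "s", 1.0, "s_next")  # TD error: 1.0 + 0.9*2.0 - 0.0 = 2.8
print(V["s"])  # 0.1 * 2.8 = 0.28
```

In the invention the table lookup is replaced by the neural network, and the TD error multiplies the gradient of the network output, as in expression (12).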
As an improvement, in S105, when determining the optimal value function with the maximum value, the set of possible operations may be too large to traverse. The operation set is therefore first reduced as follows: a probabilistic method is used to generate multiple node-mapping candidates, using the metric RW and a uniform value, so that candidates are drawn with RW-based and uniform selection probabilities.
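One plausible reading of this reduction step is a roulette-wheel draw of candidates. The sketch below selects each substrate node with probability proportional to a per-node score; the scores are illustrative inputs standing in for the patent's RW metric, whose exact formula is not reproduced here.

```python
import random

# Sketch of probabilistic candidate generation: draw node-mapping candidates
# instead of enumerating all of them, picking each substrate node with
# probability proportional to its score (a stand-in for the RW metric).
def draw_mapping(virtual_nodes, sn_scores, rng):
    chosen, mapping = set(), {}
    for v in virtual_nodes:
        pool = [(n, w) for n, w in sn_scores.items() if n not in chosen]
        nodes, weights = zip(*pool)
        picked = rng.choices(nodes, weights=weights)[0]  # roulette-wheel draw
        chosen.add(picked)  # keep the node mapping one-to-one
        mapping[v] = picked
    return mapping

rng = random.Random(42)
m = draw_mapping(["a", "b"], {"A": 0.5, "B": 0.3, "C": 0.2}, rng)
# m maps "a" and "b" to two distinct substrate nodes.
```

Calling `draw_mapping` several times yields the multiple candidates that S105 then scores with the value function.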
The method of the invention is described in detail as follows:
TABLE 1 symbols and notations used in the present invention
1.1 VNE model
The SN is modeled as a weighted undirected graph, denoted $G_s(V_s, E_s)$, where $V_s$ is the set of substrate nodes and $E_s$ is the set of substrate links. Each substrate node $v_s \in V_s$ has computing capacity $c_{v_s}$ (e.g., CPU cycles), and each substrate link $e_s \in E_s$ has bandwidth $b_{e_s}$. An example of a SN is given at the bottom of Fig. 1; the numbers around the nodes and links are their available resources.
1.1.1 Virtual network request
A $VNR_k$ can also be modeled as an undirected graph, denoted $G_k(V_k, E_k)$, where $V_k$ is the set of virtual nodes and $E_k$ is the set of virtual links. Each virtual node $v_k \in V_k$ has a computing requirement $c_{v_k}$, and each virtual link $e_k \in E_k$ has a bandwidth requirement $b_{e_k}$. An example of a VNR is given at the top of Fig. 1. For $VNR_k$, $t_k$ is its arrival time and $l_k$ is its lifetime.
2.2 VNE process
For $VNR_k$, the VNE flow consists of two key components: node mapping and link mapping.
2.2.1 Node mapping
The node mapping can be described as a one-to-one mapping $M_N: V_k \to V_s$. For $M_N(v_k) = v_s$, with $v_k \in V_k$ and $v_s \in V_s$, the following two conditions must be satisfied:
(1) if $u_k \neq v_k$, then $M_N(u_k) \neq M_N(v_k)$;
(2) $c_{v_k} \leq c_{M_N(v_k)}$.
The first constraint ensures that any two nodes of the VNR map to two different nodes of the SN, and the second requires that each VN node maps to a SN node with sufficient node capacity.
2.2.2 Link mapping
In the link-mapping phase, for each virtual link of the VNR, a set of paths must be found between the two mapped nodes in the SN whose total available bandwidth is at least the requirement of the virtual link. The invention considers only single-path mapping, i.e., one virtual link can map to only one SN path. In the single-path case, the link mapping can be represented as $M_L: E_k \to \mathcal{P}_s$, where $\mathcal{P}_s$ is the set of all paths in $G_s$. For $M_L(e_k) = p_s \in \mathcal{P}_s$, the following condition must be satisfied:
$$b_{e_k} \leq \min_{e_s \in p_s} b_{e_s}$$
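The single-path condition above (every substrate link on the chosen path must carry at least the required bandwidth) can be checked while searching for the shortest feasible path by hop count. A BFS sketch over an illustrative link dict:

```python
from collections import deque

# Sketch of single-path link mapping: a virtual link may map to an SN path
# only if every substrate link on the path has at least the required
# bandwidth. Shortest feasible path (by hop count) via BFS.
def shortest_feasible_path(links, src, dst, bw_req):
    """links: {(u, v): bandwidth}, undirected; returns a node list or None."""
    adj = {}
    for (u, v), bw in links.items():
        if bw >= bw_req:  # keep only sufficiently wide links
            adj.setdefault(u, []).append(v)
            adj.setdefault(v, []).append(u)
    prev, queue = {src: None}, deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:  # walk predecessors back to src
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for w in adj.get(u, []):
            if w not in prev:
                prev[w] = u
                queue.append(w)
    return None

links = {("A", "B"): 5, ("B", "C"): 2, ("A", "C"): 1}
print(shortest_feasible_path(links, "A", "C", 2))  # ['A', 'B', 'C']
print(shortest_feasible_path(links, "A", "C", 3))  # None: no wide-enough path
```

Filtering the links before the search guarantees that any returned path satisfies the bandwidth condition by construction.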
the VNE problem must be handled as an online problem. VNRs arrive dynamically to the system, and the VNE algorithm must handle VNRs as they arrive.
2.3 VNE revenue model and objective
Similar to prior VNE revenue models, the revenue generated for the InP is represented by:
$$R_{vn}(k) = \eta \sum_{v \in V_k} c_v + \beta \sum_{e \in E_k} b_e \qquad (1)$$
where $\eta$ and $\beta$ represent the unit prices of computing resources and bandwidth resources, respectively.
The goal is to maximize the long-term time-averaged revenue of the InP, as follows:
$$\max \lim_{T \to \infty} \frac{\sum_{k \in K_T} R_{vn}(k)}{T} \qquad (2)$$
where $K_T = \{k \mid 0 < t_k < T\}$ denotes the set of VNRs arriving before time instance T.
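Over a finite horizon, the time-averaged objective can be computed directly by summing the revenue of VNRs in $K_T$ and dividing by T. A small sketch with invented arrival data:

```python
# Sketch of objective (2): the long-term time-averaged revenue, approximated
# over a finite horizon T by summing R_vn(k) over VNRs with 0 < t_k < T.
def time_averaged_revenue(arrivals, T):
    """arrivals: list of (t_k, revenue_k) pairs for accepted VNRs."""
    return sum(r for t, r in arrivals if 0 < t < T) / T

arrivals = [(1.0, 9.0), (4.0, 6.0), (12.0, 8.0)]
print(time_averaged_revenue(arrivals, 10.0))  # (9.0 + 6.0) / 10.0 = 1.5
```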
3. Fitting the VNE into the RL model
RL concerns how a learning agent maps situations to actions so as to maximize a numerical reward signal. As shown in Fig. 5, the agent is the subject of learning and the environment is the object being learned about. The agent can perform operations; performing an operation may leave the agent in the current state or cause a transition to another state in the state space. The transition function may be probabilistic or deterministic. The environment generates a reward for the agent as a result of the agent's action. Typically, the value of the reward is calculated by a predetermined reward function, which is used to steer the reinforcement process of the agent.
The purpose of the reward function is to provide an immediate measure of how good a certain action is in a particular state. The reward of each action depends on whether the new state is better than the current one. Over time, the agent attempts to learn the best operation to perform in each particular state, i.e., the operation that maximizes the overall long-term return. RL therefore involves a function that indicates what is best in the long run by accumulating the relevant immediate rewards over some horizon.
As explained in more detail below, on the one hand the objective of the VNE problem is to maximize the long-term time-averaged revenue of the InP; on the other hand, an embedding decision must be made immediately when a VNR appears, based on the present situation and past experience. Having both a long-term goal and online decisions, the nature of the VNE problem provides a good setting for RL. Fig. 5 shows how the VNE problem fits into the RL model: from the RL point of view, the SN and the ever-arriving VNRs together constitute the environment, and processing one VNR forms one RL cycle. For $VNR_{k+1}$, the VNE agent, depending on the current state $s_k$ and previous experience (which may include all previous states and rewards), gives an embedding action $a_k$. After action $a_k$, the environment gives the resulting state $s_{k+1}$ and a reward $r_{k+1}$.
3.2 Defining a reward function for the VNE
As previously mentioned, the reward function is intended to provide an immediate measure of how good a certain behavior is in a given state. As equation (2) shows, the objective of the VNE problem is to maximize the long-term time-averaged revenue of the InP. It is therefore natural to define the immediate reward after processing $VNR_k$ as $R_{vn}(k)$, i.e., $r_k = R_{vn}(k)$.
Clearly, such a reward function can easily be adapted to other VNE objectives, which means that solving the VNE problem with RL is very flexible. For example, if the objective of the VNE is to minimize the blocking ratio, the reward can be set to 1 if the VNR is successfully embedded and 0 otherwise.
3.3 defining an operation set and a Markov State for the VNE
How states and behaviors are defined is the key to the performance of the RL. In the present invention, the set of operations of the VNE is defined as the set of all possible node mappings. If the embedding is unsuccessful according to the action of the node mapping, the VNR will be blocked and no action is done on the SN.
In the VNE problem, we know the current VNR but not the next one. Therefore, before the next VNR arrives, if the VNR state representing the environment is included, the next state of the environment cannot be determined. Thus, while the environment of the VNE problem includes a SN and multiple VNRs as shown in fig. 5, we use only the state of the SN to represent the environment.
We use the normalized remaining node capacity and link bandwidth of the SN to represent state skIn the form of
Figure BDA0002098520450000121
And
Figure BDA0002098520450000122
skis an ordered set, as follows:
Figure BDA0002098520450000131
in the RL, the state signal that successfully retains all relevant information is referred to as Markov.
For markov states, all that is important is the current state signal; its meaning is independent of the path or history to it. More specifically, in the most common cause and effect relationships, the response of the environment may depend on everything that has occurred before. In most RL problems, the transfer function is a probability function. In this case, the dynamics can only be represented by specifying a complete probability distribution:
Pr{s_{k+1}=s', r_{k+1}=r | s_k, a_k, r_k, s_{k-1}, a_{k-1}, ..., r_1, s_0, a_0} (4)
On the other hand, if the state signal has the Markov property, then the response of the environment at step k+1 depends only on the state and action at step k. In this case the dynamics of the environment can be determined by specifying only:

Pr{s_{k+1}=s', r_{k+1}=r | s_k, a_k} (5)
3.4 Modeling the VNE as a Markov decision process
A reinforcement learning task that satisfies the Markov property is called a Markov decision process (MDP). Since the VNE state defined by the present invention is a Markov state, the decision process of the VNE problem can be modeled exactly as an MDP.
In MDP, given an arbitrary state s and action a, the probability of each possible next state s' is expressed as:
Figure BDA0002098520450000132
these quantities are called transition probabilities. Likewise, the expected value of the next prize is noted as:
Figure BDA0002098520450000133
From the RL point of view, the goal of the VNE is to find an optimal policy that selects the best action at any time and in any state.
Definition: the policy of the VNE agent is a mapping from each state s and action a to the probability of taking action a in state s. We denote the policy and the corresponding probability as π and π(s, a).
Definition: given a policy π, the value function of the VNE is a function of the VNE state, denoted V^π(s), s ∈ S. V^π(s) can be viewed as the potential to accommodate future VNRs and generate long-term revenue. It is formally defined as:

V^π(s) = E_π{R_k | s_k = s} = E_π{ Σ_{i=0..∞} γ^i r_{k+i+1} | s_k = s } (8)

where R_k is the discounted sum of all rewards from VNR k onward and γ is the discount rate that determines the present value of future rewards.
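The return R_k that the value function estimates can be illustrated with a short helper (a generic discounted sum, not patent-specific code):

```python
def discounted_return(rewards, gamma=1.0):
    """R_k = r_{k+1} + gamma*r_{k+2} + gamma^2*r_{k+3} + ... :
    the discounted sum of rewards that V_pi(s) estimates in
    expectation (eq. (8))."""
    return sum((gamma ** i) * r for i, r in enumerate(rewards))
```

With γ = 1 (the setting later used in the evaluation), the return is simply the plain sum of all future rewards.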
The objective of studying the VNE problem from the RL point of view is to find an optimal policy π* that yields the maximum return in the long term.
Definition: π* is an optimal policy if and only if, for any policy π, π* ≥ π, meaning that for all s ∈ S:

V^{π*}(s) ≥ V^π(s)
Definition: the optimal value function is defined as

V*(s) = max_π V^π(s)
Proposition: for the optimal value function V*(s), we have the following iterative expression:

V*(s) = max_a Σ_{s'} P^a_{ss'} [R^a_{ss'} + γ V*(s')] (9)

Proof: V*(s) = max_a E{r_{k+1} + γ V*(s_{k+1}) | s_k = s, a_k = a} = max_a Σ_{s'} P^a_{ss'} [R^a_{ss'} + γ V*(s')].
Equation (9) relates the optimal value of the current state to the optimal values of the possible next states; given the optimal value function, it shows how to obtain the optimal action.
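Equation (9) can be exercised as a one-step backup on a toy MDP; the transition and reward tables below are purely illustrative:

```python
def bellman_backup(V, P, R, gamma=1.0):
    """One application of eq. (9):
    V*(s) = max_a sum_{s'} P[s][a][s'] * (R[s][a][s'] + gamma * V[s'])."""
    return {
        s: max(
            sum(p * (R[s][a][s2] + gamma * V[s2]) for s2, p in P[s][a].items())
            for a in P[s]
        )
        for s in P
    }

# Toy MDP: from s0, action "embed" earns reward 2 and reaches s1,
# which is absorbing with zero reward.
P = {"s0": {"embed": {"s1": 1.0}}, "s1": {"stay": {"s1": 1.0}}}
R = {"s0": {"embed": {"s1": 2.0}}, "s1": {"stay": {"s1": 0.0}}}
V = bellman_backup({"s0": 0.0, "s1": 0.0}, P, R, gamma=0.9)
```

Iterating this backup to a fixed point is classical value iteration; the patent instead approximates V* with a neural network, as described next.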
3.5 Approximating the optimal value function
In the present invention, we approximate the optimal value function V*(s) using a standard feedforward neural network with two fully-connected (fc) layers, as shown in FIG. 6. fc1 and fc2 have the same number of nodes, denoted H. A rectifier is used as the activation function, which is probably the most common activation function for deep neural networks as of 2018. The input to the neural network is the state s of equation (3); the network outputs a value V(s) that is expected to approximate V*(s).
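A pure-Python sketch of such a two-fc-layer value network follows; the final linear readout to a scalar is an assumption, since the text only specifies the two hidden layers (the actual implementation used TensorFlow):

```python
def relu(xs):
    # Rectifier activation, applied elementwise.
    return [max(0.0, x) for x in xs]

def dense(x, W, b):
    # Fully-connected layer: y_i = sum_j W[i][j] * x[j] + b[i]
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]

def value_net(s, params):
    """V(s; theta): state -> fc1 (H units, ReLU) -> fc2 (H units, ReLU)
    -> linear readout to a scalar value."""
    W1, b1, W2, b2, w_out, b_out = params
    h1 = relu(dense(s, W1, b1))
    h2 = relu(dense(h1, W2, b2))
    return sum(w * h for w, h in zip(w_out, h2)) + b_out
```

In the patent's setup the input size equals |V_s| + |E_s| and H is set to 300 (see the simulation setup below).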
Supervised learning of the approximation function V(s) is the process of adjusting the neural-network parameters θ. The objective is to minimize the difference between V(s) and V*(s), which can be expressed as:

min_θ [V*(s) − V(s; θ)]^2 (10)
As the RL process proceeds, V*(s_k) can be regarded as a training sample for the parallel supervised learning of the approximation function V(s). Following gradient descent, for VNR k the parameters θ are updated as:

θ_{k+1} = θ_k + α [V*(s_k) − V(s_k)] ∇_θ V(s_k) (11)
where α is a positive step size parameter that controls the learning rate.
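The update of equation (11) can be illustrated for the simplest case of a linear value function V(s) = θ·s, for which ∇_θ V(s) = s (a deliberate simplification of the neural-network case):

```python
def sgd_step(theta, s, target, alpha=0.01):
    """One step of eq. (11) for a linear V(s) = theta . s:
    theta <- theta + alpha * (target - V(s)) * grad V(s),
    with grad_theta V(s) = s for the linear case."""
    v = sum(t * x for t, x in zip(theta, s))
    err = target - v          # V*(s_k) - V(s_k)
    return [t + alpha * err * x for t, x in zip(theta, s)]
```

For the actual network, the same rule applies with the gradient obtained by backpropagation.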
3.6 Solving the VNE problem with TD learning
During learning, V*(s) is computed approximately by the neural network. In VNE, given a VNR, we know the possible actions and the corresponding next states. Therefore, P^a_{ss'} and R^a_{ss'} are deterministic and known. The optimal action π*(s) can be calculated from:

π*(s) = argmax_a Σ_{s'} P^a_{ss'} [R^a_{ss'} + γ V*(s')] (12)
However, the set of possible actions is too large to traverse, so the search space must be reduced significantly. As shown in the algorithm GC_GRC below, a probabilistic method of generating multiple node-mapping candidates was developed using a node-ranking metric called GRC. The algorithm of the present invention is, however, independent of the GRC metric; two other metrics are also considered, namely RW and a uniform value. The algorithms that generate node-mapping candidates with RW and uniform selection probabilities are GC_RW and GC_UNI, respectively. In GC_GRC, the parameter L is the number of node-mapping candidates generated.
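A hedged sketch of this candidate-generation scheme; the function name and the `score` callable are hypothetical, and plugging in a GRC, RW, or constant score would correspond to GC_GRC, GC_RW, and GC_UNI respectively:

```python
import random

def generate_candidates(vnr_nodes, sn_nodes, score, L=40, rng=None):
    """Draw L node-mapping candidates, choosing each substrate host
    with probability proportional to a ranking metric.
    `score` maps an SN node to its nonnegative rank."""
    rng = rng or random.Random()
    candidates = []
    for _ in range(L):
        hosts = list(sn_nodes)
        weights = [score(n) for n in hosts]
        mapping = {}
        for v in vnr_nodes:
            # Weight-proportional sampling without replacement,
            # so no two virtual nodes share a substrate host.
            i = rng.choices(range(len(hosts)), weights=weights, k=1)[0]
            mapping[v] = hosts.pop(i)
            weights.pop(i)
        candidates.append(mapping)
    return candidates
```

Each candidate is then scored by the value network on its simulated post-embedding state, as described in the VNE-TD algorithm below.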
In the present invention, an RL method, temporal-difference (TD) learning, is used to update the estimate of the optimal value function and to make embedding decisions based on that estimate. Specifically, TD learning updates its estimate of V*(s) as follows:

V*(s_k) ← V*(s_k) + α [r_{k+1} + γ V*(s_{k+1}) − V*(s_k)] (13)

Here V*(s) is approximated by the neural network. Combining equation (11) with the TD update, the parameter update becomes:

θ_{k+1} = θ_k + α [r_{k+1} + γ V(s_{k+1}) − V(s_k)] ∇_θ V(s_k) (14)
According to the above update rules, the TD learning of V*(s) and the supervised learning of V(s) proceed simultaneously.
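The tabular form of the TD(0) update (13) is compact enough to sketch directly:

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=1.0):
    """Tabular TD(0) update: move V(s) toward the bootstrap target
    r + gamma * V(s') by a fraction alpha of the TD error."""
    delta = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)
    V[s] = V.get(s, 0.0) + alpha * delta
    return V
```

With function approximation, the same TD error multiplies the gradient of V instead of updating a table entry, as in equation (14).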
The algorithm VNE-TD is the embedding-decision function invoked when a VNR arrives. In VNE-TD, the neural-network parameters θ are initialized according to a normal distribution. As shown in the algorithm VNE-TD, the input states to the neural network are the result states of simulating the embedding of each node-mapping candidate, and the candidate with the largest value is selected to actually embed the VNR. After the node mapping is established, the shortest path with sufficient bandwidth between the two SN nodes is found to map each VN link. If splittable flows are allowed, the virtual links are mapped using a multi-commodity flow algorithm. According to expression (12), the mapping j that maximizes r + γ V(s_j^n) should be selected; since the reward r = Rvn(VNR) is the same for all candidates, one can simply choose the mapping j that maximizes V(s_j^n). After embedding the VNR, algorithm VNE-TD stores the triple <s_c, r, s_n> in memory, as shown on line 26. The maximum number of triples the memory can store is set to 1000, and the memory follows the FIFO (first in, first out) replacement rule. To make the training of the neural network smoother, instead of the single-step mode of expression (14), the parameters θ are updated in batches: VNE-TD randomly draws a batch of triples from the memory and trains the neural network with them. As in equation (14), the training error of a triple <s_c, r, s_n> is r + γ V_k(s_n) − V_k(s_c), and the goal of batch training is to minimize the mean square error (loss) over the batch. As shown on line 2, VNE-TD may use any of the three candidate-generation algorithms GC_GRC, GC_RW, or GC_UNI; the resulting algorithms are named VNE-TD-GRC, VNE-TD-RW, and VNE-TD-UNI, respectively.
[The pseudocode of algorithm VNE-TD is given as a figure in the original document.]
When a VNR's lifecycle ends, it leaves the SN and releases the resources previously allocated to it, and the state of the SN changes accordingly. However, the parameters of the neural network are not updated at VNR departures.
Evaluation of
1. Benchmark test and performance index
The VNE-TD is compared to prior art algorithms.
VNE-TD is compared with other algorithms using three main performance indicators: (1) the blocking ratio, the number of blocked VNRs divided by the total number of VNRs; (2) revenue per second, the total revenue obtained so far divided by the number of seconds elapsed; (3) the weighted average path length (WAPL), the sum of all bandwidth actually allocated in the SN divided by the sum of the link bandwidths of all VNRs, i.e., the weighted average length of all paths to which VNR links are mapped.
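The three indicators reduce to simple ratios; a sketch with assumed argument names:

```python
def blocking_ratio(num_blocked, num_total):
    """(1) Fraction of VNRs that could not be embedded."""
    return num_blocked / num_total

def revenue_per_second(total_revenue, elapsed_seconds):
    """(2) Total revenue obtained so far per elapsed second."""
    return total_revenue / elapsed_seconds

def wapl(allocated_sn_bandwidth, requested_vnr_bandwidth):
    """(3) Weighted average path length: SN bandwidth actually
    consumed divided by the bandwidth the VNR links requested;
    1.0 means every virtual link mapped to a single-hop path."""
    return allocated_sn_bandwidth / requested_vnr_bandwidth
```

A WAPL above 1.0 indicates that virtual links consume extra substrate bandwidth by traversing multi-hop paths.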
2. Simulation setup
An event-driven simulation environment is implemented in Python. The neural network and its training are implemented with TensorFlow, a popular open-source software library for machine-learning applications such as neural networks. In the simulation, the topologies of the SN and the VNs are randomly generated using the GT-ITM tool. The SN has 60 nodes and 150 links. The number of VN nodes is uniformly distributed between 2 and 20, and the link connectivity between any two VN nodes is 0.2. 4000 VNRs need to be embedded into the SN. For both the SN and the VNs, the initial node capacities and link bandwidths are chosen randomly from uniform distributions with the same average; the averages for the SN are 40 times those of the VNs. VNRs arrive one after another, forming a Poisson process with an average arrival rate of one request per second. VNR lifetimes follow an exponential distribution with mean μ = 70 seconds. The parameters η and β of the revenue model in expression (1) are set to 1. The discount rate γ in equation (8) is set to 1, because we find that γ = 1 makes the neural network converge more smoothly and faster. For the neural network, we set the number of hidden-layer nodes H to 300, the same size as the input of the neural network. The batch size in the following evaluation subsections is empirically set to 50, and the number of node-mapping candidates L is set to 40. Unless otherwise stated, these parameters are not altered in the following subsections.
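The arrival process can be sketched as follows; the function name and seed are illustrative:

```python
import random

def generate_vnr_events(n, arrival_rate=1.0, mean_lifetime=70.0, seed=0):
    """(arrival, departure) times matching the stated setup: Poisson
    arrivals at one request per second (exponential inter-arrival
    gaps) and exponentially distributed lifetimes with mean 70 s."""
    rng = random.Random(seed)
    t, events = 0.0, []
    for _ in range(n):
        t += rng.expovariate(arrival_rate)             # next arrival
        lifetime = rng.expovariate(1.0 / mean_lifetime)
        events.append((t, t + lifetime))
    return events
```

In an event-driven simulator, the merged, time-sorted stream of these arrival and departure events drives the embedding and resource-release logic.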
Each simulation series in the following subsections, except subsection 4, is run three times, each time with the same SN and VNR topologies described above but a different set of random node capacities and link bandwidths. The standard deviation of the three runs is represented by error bars in the simulation results below.
1. Robustness of the GRC parameter d
In general, the computation of GRC is based on two factors, node capacity and connectivity to other nodes, which are balanced by the GRC parameter d. Fig. 7(a) shows the blocking ratio of the different algorithms, and fig. 7(b) shows the revenue per second. As can be seen from FIG. 7, VNE-TD-GRC is insensitive to the parameter d, while the performance of GRC-VNE depends significantly on it. Furthermore, when d is relatively small, the deviation of GRC-VNE is very large, whereas the deviation of VNE-TD-GRC is small and stable. Under the load set in the simulation, link bandwidth is in greater demand than node capacity and is therefore more critical; for GRC-VNE, the parameter d thus needs to be tuned close to 1.00 to favor the connectivity factor while almost ignoring the node-capacity factor. In contrast, VNE-TD-GRC uses the GRC metric only to narrow the search range and relies on the value function to make the final node-mapping decision. This is why VNE-TD-GRC is insensitive to d compared with GRC-VNE. This is clearly a very desirable property, since VNRs are not known in advance and can vary greatly over time.
Therefore, the present invention sets d to 0.95 for VNE-TD-GRC and 0.995 for GRC-VNE.
2. Influence of TD learning
To show the effect of TD learning, we compare VNE-TD-GRC with the algorithm Rand-GRC (random-selection GRC). Like VNE-TD-GRC, Rand-GRC uses GC_GRC to probabilistically generate L node-mapping candidates. The difference is that instead of selecting the candidate with the maximum V(s), Rand-GRC randomly selects one of the candidates that can be successfully embedded. This means that Rand-GRC has no learning ability compared with VNE-TD-GRC. In the simulations of this subsection, L is set to 10.
As can be seen from fig. 8(a), although its node mapping is probabilistic, the blocking ratio of Rand-GRC is better than that of GRC-VNE thanks to the multiple candidates. This means that even during training, VNE-TD-GRC can still outperform GRC-VNE. Furthermore, when TD learning is used to select the best of the multiple candidates, the blocking ratio improves significantly, by 67.2% at the 3900th VNR compared with GRC-VNE. As can be seen from FIG. 8(b), VNE-TD-GRC increases revenue per second by 13.9% at the 3900th VNR compared with GRC-VNE. Interestingly, Rand-GRC is only about as good as GRC-VNE in revenue per second, although it beats GRC-VNE in blocking ratio; it appears that Rand-GRC is only good at embedding VNRs that are of low revenue and relatively easy to handle. As can be seen from FIG. 8(c), Rand-GRC noticeably worsens the WAPL compared with GRC-VNE because of its probabilistic node mapping, a drawback that VNE-TD-GRC effectively overcomes. This means that TD learning helps increase revenue per second by keeping both the blocking ratio and the WAPL low.
In fig. 9, we show how the loss varies as training proceeds. The loss is the mean square error of the training batch, which is the minimization objective of the training process. As can be seen from fig. 9, the loss converges to a local optimum at around the 700th training step, i.e., after processing the 700th VNR. At the local optimum the loss is about 400 (an error of about 20). Since the average reward is about 92, the loss at the local optimum is relatively small, which suggests that the proposed neural-network approximation works well.
3. Effects of workload
We demonstrate the effect of workload by varying the mean lifetime of VNRs from 40 seconds to 100 seconds. We also add the algorithm LC-GRC (lowest-cost GRC) as a contrast: it uses GC_GRC to generate L node-mapping candidates and selects the candidate with the lowest cost in the SN, whereas our algorithm selects the candidate with the largest value.
As can be seen from fig. 10, the blocking ratio and revenue per second of the three proposed VNE-TD algorithms improve steadily with increasing workload relative to the other algorithms. At the highest workload, the revenue per second of VNE-TD-GRC is 24.8% and 17.1% higher than that of GRC-VNE and RW-MM-SP, respectively.
Among the three versions of VNE-TD, VNE-TD-GRC performs best, while VNE-TD-UNI performs worst and has the greatest variance. This means that the two metrics GRC and RW do help VNE-TD focus on a more promising search area, although the magnitude of the improvement is not large. It also shows the potential of combining VNE-TD with other VNE algorithms.
4. Influence of the parameter L
In fig. 11(a) and (b), we show the effect of the number of node-mapping candidates, i.e., the parameter L. As L increases from 40 to 60, the improvements of VNE-TD-GRC over GRC-VNE in blocking ratio and revenue per second grow from 79.6% and 17.4% to 82.3% and 18.3%, respectively. According to the computational-complexity analysis of VNE-TD in section 3.7, increasing L from 40 to 60 does not cause an unacceptable increase in computation time.
5. Influence of topological properties
In fig. 12, we demonstrate the effect of VN link connectivity. As link connectivity increases, the degree of the VN nodes also increases, which makes embedding more difficult. As can be seen from fig. 12, VNE-TD-GRC outperforms GRC-VNE more clearly when link connectivity is higher: at a link connectivity of 0.5, the revenue per second of VNE-TD-GRC is 23.1% higher than that of GRC-VNE.
Finally, the above embodiments are intended only to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope, and all such modifications should be covered by the claims of the present invention.

Claims (2)

1. A virtual network embedding method based on time sequence difference learning is characterized in that: the method comprises the following steps:
s101: establishing a VNE model
The underlying network SN is modeled as a weighted undirected graph denoted G_s(V_s, E_s), where V_s is the set of substrate nodes and E_s is the set of substrate links; each substrate node v_s ∈ V_s has computing capacity c_{v_s}, and each substrate link e_s ∈ E_s has bandwidth b_{e_s};
VNR_k is modeled as an undirected graph denoted G_k(V_k, E_k), where V_k is the set of virtual nodes and E_k is the set of virtual links; each virtual node v_k ∈ V_k has a computing-capacity requirement c_{v_k}, and each virtual link e_k ∈ E_k has a bandwidth requirement b_{e_k};
s102: defining states
S102a: defining a reward function for VNE_k, as in formula (1), where VNE_k denotes the embedding procedure for the k-th VNR:
Rvn(k) = η Σ_{v∈V_k} c_v + β Σ_{e∈E_k} b_e (1)
where c_v denotes the node capacity of node v, b_e denotes the link bandwidth of link e, η denotes the unit price of computing resources, and β denotes the unit price of bandwidth resources; it is therefore natural to define the immediate reward after processing VNR_k as Rvn(k), i.e., r_k = Rvn(k);
S102 b: define a set of operations for the VNE: the operational set of the VNE is defined as the set of all possible node mappings;
S102c: defining a Markov state for the VNE:
the normalized remaining node capacities c̄_v and link bandwidths b̄_e of the SN are used to represent the state s_k; s_k is an ordered set, as in formula (3):
s_k = (c̄_{v_1}, ..., c̄_{v_|Vs|}, b̄_{e_1}, ..., b̄_{e_|Es|}) (3)
in the RL, the state signal that successfully retains all relevant information is called Markov;
if the state signal has the Markov property, then the response of the environment at step k+1 depends only on the state and action at step k, in which case the dynamics of the environment can be determined by specifying only:
Pr{s_{k+1}=s', r_{k+1}=r | s_k, a_k} (5)
s103: modeling the VNE as a Markov decision process MDP;
S103a: defining a policy and a value function: a policy of the VNE agent is a mapping from each state s and action a to the probability π(s, a) of taking action a in state s; given a policy π, the value function of the VNE is a function of the VNE state, denoted V^π(s), s ∈ S; V^π(s) can be viewed as the potential to accommodate future VNRs and generate long-term revenue, and measures the quality of the current state; it is defined as in formula (8):
V^π(s) = E_π{R_k | s_k = s} = E_π{ Σ_{i=0..∞} γ^i r_{k+i+1} | s_k = s } (8)
where R_k is the sum of all rewards from VNR_k onward and γ is the discount rate that determines the present value of future rewards;
S103b: defining the optimal value function:
the objective of studying the VNE problem from the RL point of view is to find an optimal policy that yields the maximum return in the long term;
let π* be an optimal policy if and only if, for any policy π, π* ≥ π means that for all s ∈ S, V^{π*}(s) ≥ V^π(s);
the optimal value function is defined as
V*(s) = max_π V^π(s)
and for the optimal value function V*(s) there is the following iterative expression:
V*(s) = max_a Σ_{s'} P^a_{ss'} [R^a_{ss'} + γ V*(s')] (9)
S104: approximating the optimal value function V*(s), i.e., the value function under the optimal policy, using a neural network:
the optimal value function V*(s) is approximated using a standard feedforward neural network with two fully-connected (fc) layers; fc1 and fc2 have the same number of nodes, denoted H, and a rectifier is used as the activation function; the input of the neural network is the state s of formula (3); by calculation, the neural network takes the state s as input and outputs a value V(s) that is expected to approximate V*(s);
supervised learning of the approximation function V(s) is the adjustment of the neural-network parameters θ in order to minimize the difference between V(s) and V*(s), which can be expressed as:
min_θ [V*(s) − V(s; θ)]^2 (10)
as the RL process proceeds, V*(s_k) can be regarded as a training sample of the approximation function V(s) for parallel supervised learning; following gradient descent, for VNR k the parameters θ are updated as:
θ_{k+1} = θ_k + α [V*(s_k) − V(s_k)] ∇_θ V(s_k) (11)
wherein α is a positive step size parameter controlling learning speed;
S105: in VNE, given a VNR, the possible actions and the corresponding next states are known; therefore, P^a_{ss'} and R^a_{ss'} are deterministic and known; the matchings of each node mapping are traversed as the action set, and the set of result states of simulated embedding for this action set is used as the input of the neural network of S104 to obtain a plurality of values of the optimal value function; since the optimal policy π*(s) can be expressed as:
π*(s) = argmax_a Σ_{s'} P^a_{ss'} [R^a_{ss'} + γ V*(s')] (12)
the mapping with the maximum value satisfies the optimal policy;
S106: selecting the mapping corresponding to the maximum value of the optimal value function to actually embed the VNR, and then finding the shortest path with sufficient bandwidth between the two SN nodes to map each VN link.
2. The virtual network embedding method based on time sequence difference learning of claim 1, wherein in S105, before traversing the matchings of each node mapping, the following reduction is performed first:
a probabilistic method that generates a plurality of node-mapping candidates is used, generating node-mapping candidates with selection probabilities given by the metric RW or by a uniform value.
CN201910527020.7A 2019-07-19 2019-07-19 Virtual network embedding algorithm based on time sequence difference learning Active CN110233763B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910527020.7A CN110233763B (en) 2019-07-19 2019-07-19 Virtual network embedding algorithm based on time sequence difference learning


Publications (2)

Publication Number Publication Date
CN110233763A CN110233763A (en) 2019-09-13
CN110233763B true CN110233763B (en) 2021-06-18






Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant