CN115686846B - Container cluster online deployment method integrating graph neural network and reinforcement learning in edge computing - Google Patents
Container cluster online deployment method integrating graph neural network and reinforcement learning in edge computing
- Publication number: CN115686846B (application CN202211347967A)
- Authority: CN (China)
- Legal status: Active (an assumption, not a legal conclusion)
Classifications
- Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management (Y: general tagging of cross-sectional technologies; Y02: technologies for mitigation or adaptation against climate change; Y02D: climate change mitigation technologies in information and communication technologies)
Abstract
The invention provides a container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing, which comprises the following steps: S1, extracting the topological association relations existing between containers through a graph convolutional network; S2, inferring the deployment strategy with a sequence-to-sequence network aided by the graph convolutional network. With this method, containers can be reasonably deployed in edge computing according to the constructed optimization model.
Description
Technical Field
The invention relates to the technical field of edge deployment, and in particular to a container cluster online deployment method integrating a graph neural network and reinforcement learning in edge computing.
Background
With the rapid development of wireless access technology in recent years, mobile internet and novel internet-of-things applications are continuously emerging, and services increasingly present new characteristics: shorter response-time requirements, higher service-quality requirements, growing resource demands, and dynamically changing resource-demand scale. It is difficult to meet these new requirements with the cloud computing mode, which concentrates IT resources in a data center to provide services for users. The near-end computing mode represented by edge computing has therefore attracted great attention: by deploying service nodes in a distributed manner at the network edge, closer to the user, edge computing lets mobile users access services on a nearby edge service node, which can remarkably improve service quality and effectively reduce the resource load of the data center. By introducing virtualization technology, an edge service provider can abstract the physical resources of an edge node into virtual network function units (Virtual Network Function, VNF), improve the utilization efficiency of IT resources while meeting users' service demands, and thus reduce its operating expenditure (OPEX). Currently, virtual machine (VM) based virtualization (VM-VNF) is the most widely used. However, VM-VNFs have limitations such as slow start-up and migration and large resource overhead, which make them slow to respond to the dynamic demands of tasks. With the recent rise of serverless computing (Serverless Computing), network functions can instead be deployed in the form of containers (Container, CT), forming container-based virtualization (CT-VNF). CT-VNFs are increasingly used by edge service providers thanks to their lighter resource usage, shorter service start-up time, and higher migration efficiency.
Providing services to tasks at the edge often requires deploying multiple container units on edge service nodes and interconnecting them to build a Container Cluster (CC). For example, a real-time data analysis service with information security requirements may require functional units including a firewall, an IDS, several computing units, and a load balancer. These functional units are all mapped, in the form of containers, onto the same or different edge service nodes, and a virtual network is built for their interconnection. The complexity of the service itself and the high demands on service efficiency make optimized CC deployment in an edge computing environment a challenging problem, which must simultaneously consider: 1) the multiple resource-demand characteristics of the service; 2) the logical associations between multiple containers; 3) the IT resources remaining on the currently available edge nodes; 4) the energy-consumption expenditure of container deployment; 5) the quality-of-service degradation that container deployment may cause; and so on.
Disclosure of Invention
The invention aims to solve at least the above technical problems in the prior art, and in particular creatively provides a container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing.
In order to achieve the above object, the present invention provides a container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing, comprising the following steps:
s1, extracting the topological association relations existing between containers through a graph convolutional network;
s2, inferring the deployment strategy with a sequence-to-sequence network aided by the graph convolutional network.
In a preferred embodiment of the invention, the layer-wise propagation of the graph convolutional network in step S1 is:
H^(l+1) = σ( D̃^(-1/2) Ã D̃^(-1/2) H^(l) W^(l) )
wherein H^(l+1) represents the features of layer l+1;
σ(·) represents an activation function;
Ã represents the relationship (adjacency) matrix between the nodes in graph G, with added self-connections, and D̃ its degree matrix;
H^(l) represents the features of the l-th layer;
W^(l) represents the training parameter matrix of the l-th layer.
In a preferred embodiment of the present invention, the deployment strategy in step S2 is:
π(p|c, θ) = P_r{ A_t = p | S_t = c, θ_t = θ }
wherein π(p|c, θ) represents the probability of outputting deployment policy p for a given input c;
θ represents the training parameters of the model;
P_r represents the probability of outputting the deployment policy p;
A_t represents the action at time t;
S_t represents the state at time t;
θ_t represents the training parameters at time t.
In a preferred embodiment of the present invention, step S1 is followed by step S3, in which the critic network evaluates the return obtained after the actor's action is executed.
In a preferred embodiment of the present invention, step S1 is followed by step S4, in which the actor network updates the optimization model parameters according to the output of the critic network.
In a preferred embodiment of the invention, the optimization model is:
max(total revenue − total energy expenditure)   (1.1)
total revenue = Σ_{k∈N} Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k [ G_c (1 − η_{k,c}) d_{i,j}^c + G_m d_{i,j}^m + G_s d_{i,j}^s ]
wherein N represents the set of physical nodes;
G_c represents the benefit per unit computing resource;
η_{k,c} represents the utilization of computing resources on physical node k;
I represents the service request set;
V_i represents the container set of service request i;
x_{i,j}^k represents a binary flag bit; x_{i,j}^k = 1 when container j of request i is deployed on physical node k;
d_{i,j}^c, d_{i,j}^m and d_{i,j}^s represent the demand of container j of request i for computing, memory and storage resources;
G_m represents the benefit per unit memory resource;
G_s represents the benefit per unit storage resource;
total energy expenditure = C Σ_{k∈N} [ (E_k^max − E_k^min) η_{k,c} + u_k E_k^min ]
wherein E_k^max and E_k^min represent the energy consumption of physical node k at full load and when idle;
u_k represents a binary flag bit; u_k = 1 when physical node k is in an active state;
C represents the unit energy consumption expenditure coefficient.
In a preferred embodiment of the invention, the optimization model is: min(total energy expenditure), where min(·) denotes taking the minimum and max(·) taking the maximum, with
total energy expenditure = C Σ_{k∈N} [ (E_k^max − E_k^min) η_{k,c} + u_k E_k^min ]
wherein N represents the set of physical nodes;
I represents the service request set;
V_i represents the container set of service request i;
x_{i,j}^k represents a binary flag bit; x_{i,j}^k = 1 when container j of request i is deployed on physical node k;
d_{i,j}^c represents the demand of container j of request i for computing resources;
u_k represents a binary flag bit; u_k = 1 when physical node k is in an active state;
C represents the unit energy consumption expenditure coefficient.
In a preferred embodiment of the invention, the constraints of the optimization model are:
0 ≤ η_{k,c} = ( Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k d_{i,j}^c ) / C_k^c ≤ 1,  ∀k ∈ N
wherein η_{k,c} represents the utilization of computing resources on physical node k;
I represents the service request set;
N represents the set of physical nodes;
x_{i,j}^k represents a binary flag bit; x_{i,j}^k = 1 when container j of request i is deployed on physical node k;
d_{i,j}^c represents the demand of container j of request i for computing resources, and C_k^c the total computing resources of node k;
Σ_{k∈N} x_{i,j}^k = 1,  ∀i ∈ I, ∀j ∈ V_i
wherein V_i represents the container set of service request i;
Σ_{i∈I} Σ_{m,n∈V_i} x_{i,m}^{k_u} x_{i,n}^{k_v} d_i^{m,n} ≤ B_{k_u,k_v}
wherein x_{i,m}^{k_u} represents a binary flag bit; x_{i,m}^{k_u} = 1 when container m of request i is deployed on physical node k_u;
x_{i,n}^{k_v} represents a binary flag bit; x_{i,n}^{k_v} = 1 when container n of request i is deployed on physical node k_v;
d_i^{m,n} represents the bandwidth demand between containers m and n of request i, and B_{k_u,k_v} the total bandwidth between nodes k_u and k_v;
Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k d_{i,j}^r ≤ C_k^r,  r ∈ {c, m, s},  ∀k ∈ N
wherein d_{i,j}^r and C_k^r denote the demand of container j of request i and the capacity of node k for resource r (computing, memory, storage).
In a preferred embodiment of the invention, the model is updated as:
θ_{k+1} = θ_k + α ∇_θ J_L(λ, θ_k)
wherein θ_{k+1} represents the model parameters at the next moment;
θ_k represents the model parameters at the current moment;
α represents the learning rate.
In a preferred embodiment of the present invention, the model update further comprises:
L = (1/m) Σ_{i=1}^{m} ( b(c, p_i) − Q(c, p_i) )²
wherein L represents the mean square error between the evaluation value b(c, p) given by the benchmark evaluator and the reward value Q(c, p);
m represents the number of samples;
Q(c, p_i) represents the reward obtained under the decision p_i made by the algorithm for a given input container cluster c;
b(c, p_i) represents the evaluation value given by the benchmark evaluator b for the given input container cluster c and decision p_i.
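As a concrete illustration of this mean-squared-error criterion, the following minimal sketch (the function and variable names are ours, not from the patent) computes the critic loss over m sampled decisions:

```python
import numpy as np

def critic_loss(Q, b):
    """Mean squared error between the benchmark evaluator's values b(c, p_i)
    and the observed reward values Q(c, p_i) over m sampled decisions."""
    Q = np.asarray(Q, dtype=float)
    b = np.asarray(b, dtype=float)
    return ((b - Q) ** 2).mean()
```

Minimizing this loss by gradient descent moves the critic's baseline toward the observed rewards.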
In summary, by adopting the above technical scheme, containers can be reasonably deployed in edge computing according to the constructed optimization model.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 is a schematic diagram of a container cluster deployment in an edge network environment of the present invention.
FIG. 2 is a schematic diagram of a reinforcement learning model decision-reward cycle of the present invention.
FIG. 3 is a schematic diagram of the model training process of the present invention.
Fig. 4 is a detailed schematic diagram of the actor network model of the present invention.
FIG. 5 is a schematic representation of training history of the present invention in three experimental scenarios;
wherein, (a) is a training history (small scale scene), (b) is a training history (medium scale scene), (c) is a training history (large scale scene), (d) is a training penalty (small scale scene), (e) is a training penalty (medium scale scene), (f) is a training penalty (large scale scene).
FIG. 6 is a comparative schematic of the solution time of the present invention.
FIG. 7 is a comparative schematic diagram of the present invention in terms of deployment error rate.
FIG. 8 is a graph showing a comparison of cumulative benefits over a period of time in accordance with the present invention;
where (a) is a cumulative revenue comparison (small scale scenario), (b) is a cumulative revenue comparison (medium scale scenario), (c) is a cumulative revenue comparison (large scale scenario).
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
The invention mainly comprises: modeling of the container cluster deployment problem in an edge computing network environment, and a solution framework for the edge computing container cluster deployment strategy based on Actor-Critic reinforcement learning. A graph convolutional network is introduced to extract features of the network topology among the containers in a container cluster, and the result is used as input to the attention mechanism of a Seq2Seq network to improve the quality of the output solution; the encoder part of the Seq2Seq performs embedded encoding of the container cluster, and the decoder part outputs the corresponding container deployment positions. The Actor-Critic reinforcement learning framework is adopted to train the network; no label mapping is needed, the actor network and the critic network train and learn from each other for autonomous improvement, and the solution given by the trained network significantly improves system revenue.
In the same period of time an edge computing platform may receive different numbers of service requests; each request may require different functions, different functions require different types and numbers of containers, and containers of the same kind may have uncertain communication requirements between them. The most intuitive impact of service request size and kind is the change in virtual nodes and links, i.e. a change in the configuration of the structure. Workload fluctuations typically change the resource demand of a virtual node or link, i.e. a change in the resource configuration. FIG. 1 shows two different container clusters mapped to the underlying physical network.
1. Reinforcement learning solving framework combined with graph convolution network
In the invention, the model is trained with an Actor-Critic reinforcement learning framework. The entire model involves two neural networks: the actor network and the critic network. Their workflow is shown in fig. 2: for a given container cluster input to the decision system, the agent (actor network) gives an appropriate decision A_t according to the current network state S_t; in our problem this is the deployment policy Placement, which indicates the deployment location of each container in the container cluster. The environment then evaluates the deployment policy and generates corresponding feedback (a reward) R_{t+1} indicating the quality of the deployment policy; at the same time the environment is updated to the new state S_{t+1} after deployment. The critic network evaluates the return (i.e. the Lagrangian value) obtained after the actor's action is executed, and its evaluation result is the Baseline; the actor network updates the model parameters based on the output of the critic (the actor network updates its parameters in the direction of higher return). The training process of the model is shown in detail in fig. 3.
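The decision-reward cycle of fig. 2 can be sketched as follows; `actor`, `critic` and `env_step` are stand-in callables for the real networks and environment, not part of the patent:

```python
def actor_critic_cycle(actor, critic, env_step, clusters):
    """One pass over a batch of container clusters following Fig. 2:
    the actor proposes a placement A_t for state S_t, the environment
    returns a reward R_{t+1}, the critic supplies the Baseline, and the
    advantage (reward - baseline) is what drives the actor's update."""
    history = []
    for c in clusters:                    # S_t: the input container cluster
        placement = actor(c)              # A_t: the deployment policy
        reward = env_step(c, placement)   # R_{t+1}: quality of the policy
        baseline = critic(c, placement)   # critic's evaluation (Baseline)
        history.append((c, placement, reward, reward - baseline))
    return history
```

With stub callables the cycle runs end to end; in the real model the actor and critic are the networks of fig. 4 and the RNN benchmark evaluator.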
In the invention, on the basis of neural combinatorial optimization theory, the topological link relations existing in a container cluster are extracted by a graph convolutional network (Graph Convolutional Network, GCN), so that the agent can perceive the topological structure of the container cluster in advance and give a deployment strategy more accurately. Specifically, we use a graph convolutional network and a sequence-to-sequence model based on an encoder-decoder structure to infer deployment policies. For container clusters of the same training batch, we use the following method: the feature information of several container clusters is grouped with a block-diagonal matrix and input into the graph convolutional network for training. To explain the operation of the model more clearly, assume that a set of container clusters [Q, V, W] needs to be mapped into the underlying physical network. Each container cluster corresponding to a service request has a variable number m of containers, for example Q = (f_1, f_2, ..., f_m). The container clusters [Q, V, W] serve as input to the GCN network, the containers Q = (f_1, f_2, ..., f_m) of a cluster serve as input to the encoder, and the decoder part outputs a deployment policy P = (p_1, p_2, ..., p_m) indicating the deployment location of each container. The actor network model in the proposed method is shown in fig. 4.
One part of the task request is input to the GCN network for topology feature extraction, and the other part is input to the encoder of the Seq2Seq network to control the order of container deployment. The output of the GCN network and the output of the encoder are combined by a matrix point-multiplication operation and fed to the decoder of the Seq2Seq network, and the decoder finally gives the deployment strategy of the containers.
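The fusion step described above can be sketched numerically as follows; the tensor shapes and the softmax attention scoring are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def fuse_and_attend(gcn_out, enc_out, dec_state):
    """Combine the GCN topology features with the Seq2Seq encoder states by
    a matrix point-multiplication (element-wise product), then score each
    container position against the decoder state with a softmax."""
    assert gcn_out.shape == enc_out.shape      # (m containers, hidden dim)
    fused = gcn_out * enc_out                  # point-multiplication
    logits = fused @ dec_state                 # one score per container
    e = np.exp(logits - logits.max())          # numerically stable softmax
    return e / e.sum()
```

The returned weights sum to one and can be read as the decoder's attention over candidate containers at one decoding step.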
The invention builds an optimization model from the perspective of an edge computing service provider, and hopes to reduce the total energy consumption expenditure on the premise of meeting the service request of the user as much as possible so as to realize the maximization of the benefit of the service provider.
max(total revenue − total energy expenditure)   (1.1)
The objective function is divided into two parts. Equation (1.2) is the edge computing service provider's charging rule for leased resources, i.e. for the physical resources occupied by each container j ∈ V_i contained in a service request i ∈ I: the computing resource demand d_{i,j}^c, memory resource demand d_{i,j}^m and storage resource demand d_{i,j}^s are multiplied by the corresponding charging coefficients G_c, G_m and G_s respectively. Notably, we creatively add a service-effect coefficient (1 − η_{k,c}) to the charging rule for computing resources, to account for the reduced service capability caused by the increased competition of containers for physical resources.
total revenue = Σ_{k∈N} Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k [ G_c (1 − η_{k,c}) d_{i,j}^c + G_m d_{i,j}^m + G_s d_{i,j}^s ]   (1.2)
wherein N represents the set of physical nodes;
G_c represents the benefit per unit computing resource;
η_{k,c} represents the utilization of computing resources on physical node k;
I represents the service request set;
V_i represents the container set of service request i;
x_{i,j}^k represents a binary flag bit; x_{i,j}^k = 1 when container j of request i is deployed on physical node k;
G_m represents the benefit per unit memory resource;
G_s represents the benefit per unit storage resource.
In equation (1.3) we define the energy expenditure generated by the underlying physical network. Considering that energy accounts for a large part of a service provider's daily operating expenditure, our optimization model only considers the energy expenditure as the operator's expenditure. Let E_k^max be the maximum energy consumption of physical node k and E_k^min its minimum (idle) energy consumption. We use the product of (E_k^max − E_k^min) and the computing resource occupancy η_{k,c} to represent the load-dependent energy consumption of physical node k; energy is also consumed when a physical node is idle, so the idle energy consumption u_k E_k^min of node k is added, and the sum of the two is finally multiplied by the unit energy consumption expenditure coefficient to represent the total energy expenditure of the service provider:
total energy expenditure = C Σ_{k∈N} [ (E_k^max − E_k^min) η_{k,c} + u_k E_k^min ]   (1.3)
wherein N represents the set of physical nodes;
I represents the service request set;
V_i represents the container set of service request i;
x_{i,j}^k represents a binary flag bit; x_{i,j}^k = 1 when container j of request i is deployed on physical node k;
u_k represents a binary flag bit; u_k = 1 when physical node k is in an active state;
C represents the unit energy consumption expenditure coefficient.
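Under this reading of (1.1)-(1.3), the provider's profit for a given deployment can be evaluated as below; the flattened matrix encoding of x and the example coefficients are our own assumptions for illustration:

```python
import numpy as np

def provider_profit(x, d_cpu, d_mem, d_sto, cap_cpu,
                    G_c, G_m, G_s, E_max, E_min, C_e):
    """Objective (1.1): total revenue (1.2) minus total energy expenditure (1.3).

    x : (V, N) binary matrix, x[j, k] = 1 if container j is on node k
        (containers of all requests flattened into one axis)
    """
    eta = (x.T @ d_cpu) / cap_cpu            # eta_{k,c}: CPU utilisation of node k
    u = (x.sum(axis=0) > 0).astype(float)    # u_k = 1 if node k hosts any container
    revenue = (G_c * ((1.0 - eta) * (x.T @ d_cpu)).sum()   # CPU charge scaled by (1 - eta)
               + G_m * (x.T @ d_mem).sum()                 # memory charge
               + G_s * (x.T @ d_sto).sum())                # storage charge
    energy = C_e * ((E_max - E_min) * eta + E_min * u).sum()
    return revenue - energy
```

With two unit-demand containers on two half-loaded nodes, revenue and energy can be checked against the formulas by hand.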
The optimization model is subject to several constraints. Constraint (1.4) defines the utilization η_{k,c} of computing resources on physical node k and limits its value to the range [0, 1]:
0 ≤ η_{k,c} = ( Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k d_{i,j}^c ) / C_k^c ≤ 1,  ∀k ∈ N   (1.4)
wherein η_{k,c} represents the utilization of computing resources on physical node k;
I represents the service request set;
N represents the set of physical nodes;
x_{i,j}^k represents a binary flag bit; x_{i,j}^k = 1 when container j of request i is deployed on physical node k;
d_{i,j}^c represents the demand of container j of request i for computing resources, and C_k^c the total computing resources of node k.
Constraint (1.5) specifies that the j-th container of the i-th service request can only be deployed on one physical node and cannot be redeployed:
Σ_{k∈N} x_{i,j}^k = 1,  ∀i ∈ I, ∀j ∈ V_i   (1.5)
wherein N represents the set of physical nodes;
x_{i,j}^k represents a binary flag bit; x_{i,j}^k = 1 when container j of request i is deployed on physical node k;
I represents the service request set;
V_i represents the container set of service request i.
Constraint (1.6) specifies that the bandwidth resources occupied by communication between containers m and n of a service request i, located on physical nodes k_u and k_v respectively, do not exceed the total bandwidth resources between k_u and k_v:
Σ_{i∈I} Σ_{m,n∈V_i} x_{i,m}^{k_u} x_{i,n}^{k_v} d_i^{m,n} ≤ B_{k_u,k_v}   (1.6)
wherein I represents the service request set;
V_i represents the container set of service request i;
x_{i,m}^{k_u} represents a binary flag bit; x_{i,m}^{k_u} = 1 when container m of request i is deployed on physical node k_u;
x_{i,n}^{k_v} represents a binary flag bit; x_{i,n}^{k_v} = 1 when container n of request i is deployed on physical node k_v;
d_i^{m,n} represents the bandwidth demand between containers m and n of request i, and B_{k_u,k_v} the total bandwidth between nodes k_u and k_v.
Constraints (1.7), (1.8) and (1.9) specify that, on each physical node, the total resources of all deployed containers do not exceed the node's total computing, memory and storage resources respectively:
Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k d_{i,j}^r ≤ C_k^r,  r ∈ {c, m, s},  ∀k ∈ N   (1.7)-(1.9)
wherein I represents the service request set;
N represents the set of physical nodes;
x_{i,j}^k represents a binary flag bit; x_{i,j}^k = 1 when container j of request i is deployed on physical node k;
d_{i,j}^r and C_k^r denote the demand of container j of request i and the capacity of node k for resource r.
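A feasibility check for constraints (1.5) and (1.7)-(1.9) can be sketched as follows; the flattened matrix encoding of x is our assumption, and the bandwidth constraint (1.6) is omitted for brevity:

```python
import numpy as np

def placement_feasible(x, d_cpu, d_mem, d_sto, cap_cpu, cap_mem, cap_sto):
    """x : (V, N) binary matrix, x[j, k] = 1 if container j is on node k."""
    # (1.5): every container is deployed on exactly one physical node
    if not np.all(x.sum(axis=1) == 1):
        return False
    # (1.7)-(1.9): per-node demand must not exceed each resource capacity
    for d, cap in ((d_cpu, cap_cpu), (d_mem, cap_mem), (d_sto, cap_sto)):
        if np.any(x.T @ d > cap + 1e-12):
            return False
    return True
```

Such a check is what the environment would apply before rewarding (or penalising) a proposed placement.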
2. topological relation description based on graph convolution network
The invention adopts a graph convolutional network to extract the topological relations of the input container cluster, and uses the extracted features to help the agent give a more accurate deployment strategy without violating the constraints, thereby reducing the container deployment cost and improving the overall benefit of the edge computing service provider.
Let the graph of a container cluster be denoted G = (N, E), where N represents the vertices of the graph, i.e. the containers in the container cluster, and E represents the edges, i.e. the links resulting from communication between containers. The features of the vertices in G form an N×D matrix X, where D is the number of features. The relationship between containers is represented by an N×N matrix A, i.e. the adjacency matrix of G. The layer-wise propagation of the graph convolutional network is shown in equation (10):
H^(l+1) = σ( D̃^(-1/2) Ã D̃^(-1/2) H^(l) W^(l) )   (10)
wherein H^(l+1) represents the features of layer l+1;
σ(·) represents an activation function, such as ReLU or Sigmoid (in our model we use ReLU);
Ã = A + I_N is the adjacency matrix of the undirected graph G with added self-connections, where A is the adjacency matrix of G and I_N the identity matrix of order N;
D̃, with D̃_ii = Σ_j Ã_ij, is the degree matrix of Ã;
W^(l) is the training parameter matrix of the l-th layer;
H^(l) represents the features of the l-th layer, with H^(0) = X for the input layer.
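Equation (10) can be sketched numerically as a single layer; the ReLU activation matches the model described above, while the matrix sizes in the example are illustrative:

```python
import numpy as np

def gcn_layer(A, H, W):
    """H^(l+1) = ReLU( D~^(-1/2) A~ D~^(-1/2) H^(l) W^(l) ) with A~ = A + I_N."""
    A_hat = A + np.eye(A.shape[0])                 # add self-connections
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))  # diagonal of D~^(-1/2)
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)         # ReLU activation
```

Stacking two or three such layers (with different W per layer) gives the topology features that are fused with the encoder output.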
3. Constraint optimization based on policy gradients
Let C denote the set of container clusters and c ∈ C a single cluster; the policy function for c is expressed as:
π(p|c, θ) = P_r{ A_t = p | S_t = c, θ_t = θ }
wherein π(p|c, θ) represents the probability of outputting deployment policy p for a given input c;
θ represents the training parameters of the model;
P_r represents the probability of outputting the deployment policy p;
A_t represents the action at time t;
S_t represents the state at time t;
θ_t represents the training parameters at time t.
The policy function gives, at time t, with input c and parameters θ, the probability P_r of outputting deployment policy p. The policy assigns a higher probability to a high-benefit deployment policy p and a lower probability to a low-benefit one. The interaction of the input container clusters with the output policies during a period T generates a trajectory τ = (c_1, p_1, ..., c_T, p_T) of a Markov decision process, whose probability can be expressed as:
P_θ(c_1, p_1, ..., c_T, p_T) = p(c_1) ∏_{t=1}^{T} π_θ(p_t|c_t) p(c_{t+1}|c_t, p_t)
wherein P_θ(c_1, p_1, ..., c_T, p_T) represents the probability that trajectory τ = (c_1, p_1, ..., c_T, p_T) occurs under parameters θ;
p(c_1) represents the probability that state c_1 occurs (i.e. the input at time t = 1 is c_1);
T represents the period;
π_θ(p_t|c_t) represents the probability that at time t, with current state c_t (the input container cluster), the agent takes action p_t (the output deployment policy) in an environment with parameters θ;
p(c_{t+1}|c_t, p_t) represents the probability that, given that the state (the input container cluster) at time t is c_t and the action (the output deployment policy) is p_t, the system state at time t+1 is c_{t+1};
c_1 represents the system state (the input container cluster) at time t = 1;
p_1 represents the deployment policy at time t = 1;
c_t represents the input at time t;
p_t represents the deployment policy output at time t.
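The trajectory probability factorizes into per-step terms, which can be checked numerically; the probabilities in the example are arbitrary illustrative values:

```python
def trajectory_probability(p_c1, policy_probs, transition_probs):
    """P_theta(c_1, p_1, ..., c_T, p_T) =
       p(c_1) * product over t of pi_theta(p_t|c_t) * p(c_{t+1}|c_t, p_t)."""
    prob = p_c1
    for pi_t, trans_t in zip(policy_probs, transition_probs):
        prob *= pi_t * trans_t
    return prob
```
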
In the above policy function, the probability of the deployment policy p_t for the current input container cluster c_t depends on the deployment positions p_(<t) of the previous container clusters and on the system state. For simplicity we assume that the system state is fully determined by the container clusters C, so the policy function only outputs a probability indicating the deployment locations of a container cluster. The goal of the policy gradient method is to find the optimal set of parameters θ* that yields the optimal deployment locations of the container clusters. To this end, we need to define an objective function describing the quality of a deployment strategy:
J_R(θ|c) = E_{p ~ π_θ(·|c)} [ R(p) ]
wherein J_R(θ|c) represents the policy quality corresponding to input c;
R(p) represents the service benefit corresponding to deployment policy p;
p ~ π_θ(·|c) denotes the deployment policies p drawn for the given input c.
In the above formula, we use the expected service benefit R(p) of the deployment policies for a given container cluster c as the objective function describing the quality of the deployment strategy. Because the agent infers deployment policies for all container clusters, the expected benefit can then be defined as an expectation over the container probability distribution:
J_R(θ) = E_{c ~ C} [ J_R(θ|c) ]
wherein J_R(θ) represents the policy quality, i.e. the expected benefit;
J_R(θ|c) represents the policy quality corresponding to input c;
c ~ C denotes taking the expectation over all container clusters C.
Similarly, the expected penalty due to violating the constraints can be expressed as:
J_C(θ) = E_{c ~ C} [ J_C(θ|c) ]
wherein J_C(θ) represents the expected penalty value;
J_C(θ|c) represents the penalty value corresponding to input c;
c ~ C denotes taking the expectation over all container clusters C.
Here we define four constraint signals: the computing resource cpu, the memory resource mem, the storage resource sto, and the bandwidth resource bw. The final optimization objective can be converted into an unconstrained problem by the Lagrangian relaxation technique:
J_L(λ, θ) = J_R(θ) + Σ_i λ_i J_{C_i}(θ) = J_R(θ) + J_ξ(θ)
wherein J_L(λ, θ) represents the Lagrangian value, obtained by adding to the expected benefit J_R(θ) the weighted sum of the expected penalty values J_{C_i}(θ) corresponding to the various resources;
λ represents the weights of the four constraint signals, λ_i the weight of the i-th constraint signal;
J_R(θ) represents the policy quality, i.e. the expected benefit;
J_{C_i}(θ) represents the expected penalty value of the i-th constraint signal;
J_ξ(θ) represents the weighted sum of the expected penalty values of the four constraint signals.
Here λ contains the weights of the four constraint signals, and J_ξ(θ) is the weighted sum of the expected penalties of the four constraint signals. Next, we compute the gradient of J_L(λ,θ) using the log-likelihood method.
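The Lagrangian relaxation step above can be sketched in a few lines of code. This is an illustrative sketch, not the patent's implementation: the function name, the weights λ_i, and all numeric values are assumptions chosen for the example, and penalties are taken to be non-positive so that "adding" them lowers the objective.

```python
# Sketch of J_L(lambda, theta) = J_R(theta) + sum_i lambda_i * J_Ci(theta).
# Names and values are illustrative, not from the patent.

def lagrangian_value(j_r, penalties, weights):
    """Combine the expected benefit with the weighted expected penalties
    of the constraint signals (assumed non-positive when violated)."""
    assert len(penalties) == len(weights)
    j_xi = sum(lam * j_c for lam, j_c in zip(weights, penalties))  # J_xi(theta)
    return j_r + j_xi

# Four constraint signals: cpu, mem, sto, bw (example weights and penalties).
weights = [1.0, 1.0, 0.5, 0.5]        # lambda_i
penalties = [-0.2, 0.0, -0.1, 0.0]    # expected penalties J_Ci(theta)
j_l = lagrangian_value(j_r=3.0, penalties=penalties, weights=weights)
```

With these example values the cpu and sto violations reduce the unconstrained objective from 3.0 to 2.75.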
∇_θ J_L(λ,θ) = E_{c~C} E_{p~π_θ(·|c)}[Q(c,p)·∇_θ log π_θ(p|c)]
wherein J_L(λ,θ) represents the Lagrangian value, computed by adding the weighted sum of the expected penalty values J_C(θ) of the various resources to the expected benefit J_R(θ);
π_θ(p|c) represents the policy function for input c;
Q(c,p) represents the reward obtained given the decision p made by the algorithm for the input container cluster c;
p~π_θ(·|c) indicates that the expectation is taken over deployment policies p sampled for the given input c.
In the above equation, Q(c,p) describes the reward available given the input c and the decision p made by the algorithm. It is calculated by adding the weighted sum of all constraint violation values C_i(p) to the benefit value R(p), as shown in (18):
Q(c,p) = R(p) + Σ_i λ_i·C_i(p) = R(p) + ξ(p)    (18)
wherein Q(c,p) represents the reward obtained under the decision p made by the algorithm for the given input container cluster c;
R(p) represents the reward available to the system corresponding to decision p;
ξ(p) represents the weighted sum of the penalty values of all constraint signals under decision p;
λ_i represents the weight of the i-th constraint signal;
C_i(p) represents the penalty value generated by the i-th constraint signal under decision p.
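Equation (18) can be illustrated directly. The following sketch uses assumed weights and violation values (not from the patent) and treats violations as non-positive, so an infeasible decision receives a lower shaped reward than a feasible one with the same benefit:

```python
# Sketch of equation (18): Q(c, p) = R(p) + xi(p),
# with xi(p) = sum_i lambda_i * C_i(p). Values are illustrative.

def shaped_reward(benefit, violations, weights):
    """Combine the benefit R(p) with the weighted cpu/mem/sto/bw penalties."""
    xi = sum(lam * c for lam, c in zip(weights, violations))
    return benefit + xi

# A feasible decision incurs zero penalty, so Q equals the raw benefit.
q_feasible = shaped_reward(2.0, [0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0])
# An infeasible one is pushed down by the (negative) violation terms.
q_violating = shaped_reward(2.0, [-0.5, 0.0, 0.0, -0.3], [1.0, 1.0, 1.0, 1.0])
```

This penalty shaping is what lets the unconstrained policy gradient steer the agent away from deployments that exceed node capacities.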
Then we approximate the Lagrangian gradient using Monte Carlo sampling, where m is the number of samples. To reduce the variance of the gradient and accelerate the convergence of the model, we use a critic network as the baseline evaluator b, which consists of a simple RNN. The Lagrangian gradient can then be expressed as:
∇_θ J_L(λ,θ) ≈ (1/m)·Σ_{i=1..m} (Q(c,p_i) − b(c,p_i))·∇_θ log π_θ(p_i|c)
wherein m represents the number of samples;
Q(c,p_i) represents the reward obtained under the decision p_i made by the algorithm for the given input container cluster c;
b(c,p_i) represents the evaluation value given by the baseline evaluator b for the given input container cluster c and decision p_i.
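The Monte Carlo estimate with a baseline can be sketched as follows. This is an illustrative simplification, not the patent's network: a softmax over logits replaces the GCN + sequence-to-sequence actor, and a fixed constant stands in for the RNN critic's output b(c,p_i).

```python
import numpy as np

# Sketch: average (Q - b) * grad log pi over m sampled decisions.
# The baseline b shifts the reward but not the gradient's expectation,
# which is what reduces variance. Names and values are illustrative.

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def mc_gradient(theta, rng, q_fn, m=256, baseline=0.0):
    pi = softmax(theta)
    grad = np.zeros_like(theta)
    for _ in range(m):
        p = rng.choice(len(theta), p=pi)     # sample a deployment decision
        score = -pi.copy()
        score[p] += 1.0                      # grad log pi for softmax
        grad += (q_fn(p) - baseline) * score # baseline-corrected term
    return grad / m

rng = np.random.default_rng(0)
q_fn = lambda p: [1.0, 0.0, 0.0, 0.0][p]     # only node 0 yields reward
grad = mc_gradient(np.zeros(4), rng, q_fn, baseline=0.25)
```

The estimated gradient points toward the rewarding decision (its component is positive) while the components sum to zero, preserving normalization.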
Finally, the parameters θ of the network model are updated using stochastic gradient descent:
wherein θ_{k+1} represents the model parameters at the next step;
θ_k represents the model parameters at the current step;
α represents the learning rate.
The baseline evaluator gives the evaluation value b(c,p) of the reward for the current container cluster; the parameters σ of the baseline evaluator are then updated by stochastic gradient descent based on the mean square error of b(c,p) and the reward value Q(c,p), i.e. (1/m)·Σ_{i=1..m}(b(c,p_i) − Q(c,p_i))²:
wherein the loss represents the mean square error of the evaluation value b(c,p) given by the baseline evaluator and the reward value Q(c,p);
m represents the number of samples;
Q(c,p_i) represents the reward obtained under the decision p_i made by the algorithm for the given input container cluster c;
b(c,p_i) represents the evaluation value given by the baseline evaluator b for the given input container cluster c and decision p_i.
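The baseline update can be sketched with the simplest possible critic. As an assumption for illustration (the patent uses an RNN critic), a single learnable constant σ plays the role of b(c,p); one SGD step on the mean square error then pulls σ toward the sampled rewards:

```python
# Sketch of the critic update: fit b(c, p) ~ Q(c, p) by SGD on the MSE.
# A learnable constant stands in for the patent's RNN baseline evaluator.

def critic_step(sigma, q_samples, lr=0.1):
    """One SGD step on L(sigma) = (1/m) * sum_i (sigma - Q_i)^2."""
    m = len(q_samples)
    grad = 2.0 / m * sum(sigma - q for q in q_samples)
    return sigma - lr * grad

sigma = 0.0
rewards = [1.0, 0.5, 1.5, 1.0]           # Q(c, p_i) for m = 4 samples
for _ in range(200):
    sigma = critic_step(sigma, rewards)  # converges toward mean(Q) = 1.0
```

For a constant baseline the MSE minimizer is exactly the mean reward, which is why repeated steps drive σ to 1.0 here.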
The training process of the container cluster deployment algorithm based on graph convolutional networks and neural combinatorial optimization can be described as in Table 1:
TABLE 1 Description of the training process of the container cluster deployment algorithm based on graph convolutional networks and neural combinatorial optimization
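The overall actor-critic loop of the training process can be sketched end to end under strong simplifying assumptions (all names and hyperparameters are illustrative): a softmax policy over logits replaces the GCN + sequence-to-sequence actor, a learnable constant replaces the RNN critic, and a toy reward function replaces the deployment benefit plus penalties.

```python
import numpy as np

# End-to-end sketch of the training loop: sample decisions from the
# policy, form the baseline-corrected policy gradient, ascend J_L, and
# fit the critic to the sampled rewards by an MSE step.

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def train(q_fn, n_actions=4, m=64, epochs=300, alpha=0.05, lr_critic=0.1, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(n_actions)       # actor parameters
    sigma = 0.0                       # critic (baseline) parameter
    for _ in range(epochs):
        pi = softmax(theta)
        grad = np.zeros_like(theta)
        qs = []
        for _ in range(m):
            p = rng.choice(n_actions, p=pi)
            q = q_fn(p)                         # benefit + weighted penalties
            score = -pi.copy()
            score[p] += 1.0
            grad += (q - sigma) * score         # baseline-corrected gradient
            qs.append(q)
        theta += alpha * grad / m               # gradient ascent on J_L
        sigma -= lr_critic * 2.0 * (sigma - np.mean(qs))  # MSE step on critic
    return theta

theta = train(lambda p: 1.0 if p == 0 else 0.0)  # node 0 is the good placement
```

After training, the policy concentrates most of its probability mass on the rewarding placement, mirroring how the patent's agent learns to favor feasible, high-benefit deployments.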
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.
Claims (7)
1. A container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing, characterized by comprising the following steps:
S1, extracting the topological association relations existing between containers through a graph convolution network, and updating the parameters of the optimization model by the actor network according to the output of the critic module; wherein the optimization model is:
max (total revenue − total energy expenditure) (1.1)
wherein N represents the set of physical nodes;
G_c represents the benefit per unit of computing resource;
η_{k,c} represents the utilization of computing resources on physical node k;
I represents the set of service requests;
V_i represents the container set of service request i;
the binary flag bit equals 1 when container j of request i is deployed on physical node k;
G_m represents the benefit per unit of memory resource;
G_s represents the benefit per unit of storage resource;
wherein N represents the set of physical nodes;
I represents the set of service requests;
V_i represents the container set of service request i;
the binary flag bit equals 1 when container j of request i is deployed on physical node k;
u_k represents a binary flag bit; u_k = 1 indicates that physical node k is in the active state;
c represents the expenditure coefficient per unit of energy consumption;
alternatively, min (total energy expenditure)
wherein N represents the set of physical nodes;
I represents the set of service requests;
V_i represents the container set of service request i;
the binary flag bit equals 1 when container j of request i is deployed on physical node k;
u_k represents a binary flag bit; u_k = 1 indicates that physical node k is in the active state;
c represents the expenditure coefficient per unit of energy consumption;
S2, inferring the deployment policy through a sequence-to-sequence network with the aid of the graph convolution network.
2. The method for online deployment of container clusters fusing graph neural networks and reinforcement learning in edge computing according to claim 1, wherein the layer-wise propagation of the graph convolution network in step S1 is:
H^{(l+1)} = σ(A·H^{(l)}·W^{(l)})
wherein H^{(l+1)} represents the features of layer l+1;
σ(·) represents the activation function;
A represents the relationship matrix between the nodes in graph G;
H^{(l)} represents the features of layer l;
W^{(l)} represents the training parameter matrix of layer l.
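For illustration, one layer of the propagation rule in claim 2 can be sketched as follows. This is an assumed minimal example (ReLU as the activation, a tiny hand-written adjacency matrix, identity input features); common GCN variants also normalize A, which the claim leaves implicit.

```python
import numpy as np

# Sketch of one graph-convolution layer H^(l+1) = sigma(A H^(l) W^(l)).
# A, H, and W below are illustrative, not from the patent.

def gcn_layer(A, H, W):
    """Aggregate neighbor features via A, transform with the trainable
    matrix W, then apply the activation (ReLU here)."""
    return np.maximum(A @ H @ W, 0.0)

A = np.array([[1.0, 1.0, 0.0],   # 3 containers; self-loops included
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])
H = np.eye(3)                    # initial per-container features
W = np.ones((3, 2)) * 0.5        # layer-l training parameter matrix
H_next = gcn_layer(A, H, W)      # features of layer l + 1
```

Each output row mixes the features of a container and its topological neighbors, which is how the network captures the inter-container association relations of step S1.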
3. The method for on-line deployment of container clusters fusing graph neural networks and reinforcement learning in edge computing according to claim 1, wherein the deployment policy in step S2 is:
π(p|c,θ) = P_r{A_t = p | S_t = c, θ_t = θ}
wherein π(p|c,θ) represents the probability of outputting the deployment policy p for a given input c;
θ represents the training parameters of the model;
P_r represents the probability of outputting the deployment policy p;
A_t represents the action at time t;
S_t represents the state at time t;
θ_t represents the training parameters at time t.
4. The method for online deployment of container clusters fusing graph neural networks and reinforcement learning in edge computing according to claim 1, further comprising, after step S1, a step S3 in which the critic network evaluates the return obtained after the actor performs an action.
5. The method for on-line deployment of container clusters fusing graph neural networks and reinforcement learning in edge computing according to claim 1, wherein the constraint conditions of the optimization model are:
wherein η_{k,c} represents the utilization of computing resources on physical node k;
I represents the set of service requests;
N represents the set of physical nodes;
the binary flag bit equals 1 when container j of request i is deployed on physical node k, and the corresponding demand coefficient represents the demand of container j of request i for computing resources;
wherein N represents the set of physical nodes;
the binary flag bit equals 1 when container j of request i is deployed on physical node k; I represents the set of service requests;
V_i represents the container set of service request i;
wherein I represents the set of service requests;
V_i represents the container set of service request i;
one binary flag bit equals 1 when container m of request i is deployed on physical node k_u, and another equals 1 when container n of request i is deployed on physical node k_v; B_{k_u,k_v} represents the total amount of bandwidth resources between physical nodes k_u and k_v;
wherein I represents the set of service requests;
N represents the set of physical nodes;
the binary flag bit equals 1 when container j of request i is deployed on physical node k.
6. The method for online deployment of container clusters fusing graph neural networks and reinforcement learning in edge computing according to claim 1, wherein the model is updated as:
wherein θ_{k+1} represents the model parameters at the next step;
θ_k represents the model parameters at the current step;
α represents the learning rate.
7. The method for online deployment of container clusters fusing graph neural networks and reinforcement learning in edge computing according to claim 6, wherein the model updating further comprises:
wherein the loss represents the mean square error of the evaluation value b(c,p) given by the baseline evaluator and the reward value Q(c,p);
m represents the number of samples;
Q(c,p_i) represents the reward obtained under the decision p_i made by the algorithm for the given input container cluster c;
b(c,p_i) represents the evaluation value given by the baseline evaluator b for the given input container cluster c and decision p_i.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211347967.8A CN115686846B (en) | 2022-10-31 | 2022-10-31 | Container cluster online deployment method integrating graph neural network and reinforcement learning in edge calculation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115686846A CN115686846A (en) | 2023-02-03 |
CN115686846B true CN115686846B (en) | 2023-05-02 |
Family
ID=85045641
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211347967.8A Active CN115686846B (en) | 2022-10-31 | 2022-10-31 | Container cluster online deployment method integrating graph neural network and reinforcement learning in edge calculation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115686846B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116069512B (en) * | 2023-03-23 | 2023-08-04 | 之江实验室 | Serverless efficient resource allocation method and system based on reinforcement learning |
CN117149443B (en) * | 2023-10-30 | 2024-01-26 | 江西师范大学 | Edge computing service deployment method based on neural network |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110008819A (en) * | 2019-01-30 | 2019-07-12 | 武汉科技大学 | A kind of facial expression recognizing method based on figure convolutional neural networks |
CN113568675A (en) * | 2021-07-08 | 2021-10-29 | 广东利通科技投资有限公司 | Internet of vehicles edge calculation task unloading method based on layered reinforcement learning |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3792831A1 (en) * | 2019-09-11 | 2021-03-17 | Siemens Aktiengesellschaft | Method for generating an adapted task graph |
CN112631717B (en) * | 2020-12-21 | 2023-09-05 | 重庆大学 | Asynchronous reinforcement learning-based network service function chain dynamic deployment system and method |
CN112711475B (en) * | 2021-01-20 | 2022-09-06 | 上海交通大学 | Workflow scheduling method and system based on graph convolution neural network |
US20220124543A1 (en) * | 2021-06-30 | 2022-04-21 | Oner Orhan | Graph neural network and reinforcement learning techniques for connection management |
CN113778648B (en) * | 2021-08-31 | 2023-07-11 | 重庆理工大学 | Task scheduling method based on deep reinforcement learning in hierarchical edge computing environment |
2022-10-31: application CN202211347967.8A filed; granted as CN115686846B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN115686846A (en) | 2023-02-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115686846B (en) | Container cluster online deployment method integrating graph neural network and reinforcement learning in edge calculation | |
CN109818786B (en) | Method for optimally selecting distributed multi-resource combined path capable of sensing application of cloud data center | |
Guim et al. | Autonomous lifecycle management for resource-efficient workload orchestration for green edge computing | |
Rkhami et al. | Learn to improve: A novel deep reinforcement learning approach for beyond 5G network slicing | |
Bahrpeyma et al. | An adaptive RL based approach for dynamic resource provisioning in Cloud virtualized data centers | |
CN114936708A (en) | Fault diagnosis optimization method based on edge cloud collaborative task unloading and electronic equipment | |
Aslam et al. | Using artificial neural network for VM consolidation approach to enhance energy efficiency in green cloud | |
CN108073442B (en) | Simulation request execution time prediction method based on depth fuzzy stack self-coding | |
CN116009990B (en) | Cloud edge collaborative element reinforcement learning computing unloading method based on wide attention mechanism | |
Luan et al. | LRP‐based network pruning and policy distillation of robust and non‐robust DRL agents for embedded systems | |
CN113543160A (en) | 5G slice resource allocation method and device, computing equipment and computer storage medium | |
Huang et al. | Learning-aided fine grained offloading for real-time applications in edge-cloud computing | |
Qin et al. | Dynamic IoT service placement based on shared parallel architecture in fog-cloud computing | |
CN116126534A (en) | Cloud resource dynamic expansion method and system | |
CN112906745B (en) | Integrity intelligent network training method based on edge cooperation | |
CN115499511A (en) | Micro-service active scaling method based on space-time diagram neural network load prediction | |
CN113783726B (en) | SLA-oriented resource self-adaptive customization method for edge cloud system | |
Liu et al. | Hidden markov model based spot price prediction for cloud computing | |
Bhargavi et al. | Uncertainty aware resource provisioning framework for cloud using expected 3-SARSA learning agent: NSS and FNSS based approach | |
Li et al. | An automated VNF manager based on parameterized action MDP and reinforcement learning | |
CN111913780A (en) | Resource prediction and scheduling method in cloud computing | |
CN117648174B (en) | Cloud computing heterogeneous task scheduling and container management method based on artificial intelligence | |
Damaševičius et al. | Short time prediction of cloud server round-trip time using a hybrid neuro-fuzzy network | |
Dixit et al. | Machine Learning Based Adaptive Auto-scaling Policy for Resource Orchestration in Kubernetes Clusters | |
Su et al. | An Attention Mechanism-based Microservice Placement Scheme for On-star Edge Computing Nodes |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||