CN115686846B - Container cluster online deployment method integrating graph neural network and reinforcement learning in edge computing - Google Patents
Container cluster online deployment method integrating graph neural network and reinforcement learning in edge computing
- Publication number: CN115686846B (application CN202211347967A)
- Authority: CN (China)
- Legal status: Active (an assumption, not a legal conclusion)
Classifications
- Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management (Y: general tagging of cross-sectional technologies; Y02: technologies for mitigation or adaptation against climate change; Y02D: climate change mitigation technologies in information and communication technologies)
Abstract
The invention provides a container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing, which comprises the following steps: S1, extracting the topological association relations existing between containers through a graph convolutional network; S2, inferring the deployment strategy with a sequence-to-sequence network aided by the graph convolutional network. With this method, containers can be reasonably deployed in edge computing according to the constructed optimization model.
Description
Technical Field
The invention relates to the technical field of edge deployment, and in particular to a container cluster online deployment method integrating a graph neural network and reinforcement learning in edge computing.
Background
With the rapid development of wireless access technology in recent years, mobile internet and novel internet-of-things applications are continuously emerging, and services increasingly present new characteristics: shorter response-time requirements, higher service-quality requirements, growing resource demands, and dynamically changing resource-demand scale. It is difficult to meet these new requirements with the cloud computing mode, which concentrates IT resources in a data center to provide services for users. The near-end computing mode represented by edge computing has therefore attracted great attention: by deploying service nodes in a distributed manner at the network edge, closer to the user, edge computing lets mobile users access services on a nearby edge service node, which can remarkably improve service quality and effectively reduce the resource load of the data center. By introducing virtualization technology, an edge service provider can abstract the physical resources of an edge node into virtual network function units (Virtual Network Function, VNF), improve the utilization efficiency of IT resources while meeting users' service demands, and thus reduce its operating expenditure (OPEX). Currently, virtual machine (VM) based virtualization (VM-VNF) is the most widely used. However, VM-VNFs have limitations such as slow start-up and migration and large resource overhead, which make them slow to respond to the dynamic demands of tasks. With the recent rise of serverless computing (Serverless Computing), network functions can instead be deployed in the form of containers (Container, CT), forming container-based virtualization (CT-VNF). CT-VNFs are increasingly used by edge service providers thanks to their lighter resource usage, shorter service start-up time, and higher migration efficiency.
Providing services to tasks at the edge often requires deploying multiple container units on edge service nodes and interconnecting them to build a Container Cluster (CC). For example, a real-time data analysis service with information security requirements may require functional units including a firewall, an IDS, several computing units, and a load balancer. These functional units are all mapped, in the form of containers, onto the same or different edge service nodes, and a virtual network is built for their interconnection. The complexity of the service itself and the high demands on service efficiency make optimized CC deployment in an edge computing environment a challenging problem, which must simultaneously consider: 1) the multiple resource-demand characteristics of the service; 2) the logical associations between multiple containers; 3) the IT resources remaining on the currently available edge nodes; 4) the energy-consumption expenditure of container deployment; 5) the quality-of-service degradation that container deployment may cause; and so on.
Disclosure of Invention
The invention aims to solve at least the above technical problems in the prior art, and in particular creatively provides a container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing.
In order to achieve the above object, the present invention provides a container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing, comprising the following steps:
s1, extracting the topological association relations existing between containers through a graph convolutional network;
s2, inferring the deployment strategy with a sequence-to-sequence network aided by the graph convolutional network.
In a preferred embodiment of the invention, the layer-wise propagation of the graph convolutional network in step S1 is:
H^(l+1) = σ( D̃^(-1/2) Ã D̃^(-1/2) H^(l) W^(l) )
wherein H^(l+1) represents the features of layer l+1;
σ(·) represents an activation function;
Ã represents the relationship (adjacency) matrix between the nodes in graph G, with added self-connections, and D̃ its degree matrix;
H^(l) represents the features of the l-th layer;
W^(l) represents the training parameter matrix of the l-th layer.
In a preferred embodiment of the present invention, the deployment strategy in step S2 is:
π(p|c, θ) = P_r{ A_t = p | S_t = c, θ_t = θ }
wherein π(p|c, θ) represents the probability of outputting deployment policy p for a given input c;
θ represents the training parameters of the model;
P_r represents the probability of outputting the deployment policy p;
A_t represents the action at time t;
S_t represents the state at time t;
θ_t represents the training parameters at time t.
In a preferred embodiment of the present invention, step S1 is followed by step S3, in which the critic network evaluates the return obtained after the actor's action is executed.
In a preferred embodiment of the present invention, step S1 is followed by step S4, in which the actor network updates the optimization model parameters according to the output of the critic network.
In a preferred embodiment of the invention, the optimization model is:
max(total revenue − total energy expenditure)   (1.1)
total revenue = Σ_{k∈N} Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k [ G_c (1 − η_{k,c}) d_{i,j}^c + G_m d_{i,j}^m + G_s d_{i,j}^s ]
wherein N represents the set of physical nodes;
G_c represents the benefit per unit computing resource;
η_{k,c} represents the utilization of computing resources on physical node k;
I represents the service request set;
V_i represents the container set of service request i;
x_{i,j}^k represents a binary flag bit; x_{i,j}^k = 1 when container j of request i is deployed on physical node k;
d_{i,j}^c, d_{i,j}^m and d_{i,j}^s represent the demand of container j of request i for computing, memory and storage resources;
G_m represents the benefit per unit memory resource;
G_s represents the benefit per unit storage resource;
total energy expenditure = C Σ_{k∈N} [ (E_k^max − E_k^min) η_{k,c} + u_k E_k^min ]
wherein E_k^max and E_k^min represent the energy consumption of physical node k at full load and when idle;
u_k represents a binary flag bit; u_k = 1 when physical node k is in an active state;
C represents the unit energy consumption expenditure coefficient.
In a preferred embodiment of the invention, the optimization model is: min(total energy expenditure), where min(·) denotes taking the minimum and max(·) taking the maximum, with
total energy expenditure = C Σ_{k∈N} [ (E_k^max − E_k^min) η_{k,c} + u_k E_k^min ]
wherein N represents the set of physical nodes;
I represents the service request set;
V_i represents the container set of service request i;
x_{i,j}^k represents a binary flag bit; x_{i,j}^k = 1 when container j of request i is deployed on physical node k;
d_{i,j}^c represents the demand of container j of request i for computing resources;
u_k represents a binary flag bit; u_k = 1 when physical node k is in an active state;
C represents the unit energy consumption expenditure coefficient.
In a preferred embodiment of the invention, the constraints of the optimization model are:
0 ≤ η_{k,c} = ( Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k d_{i,j}^c ) / C_k^c ≤ 1,  ∀k ∈ N
wherein η_{k,c} represents the utilization of computing resources on physical node k;
I represents the service request set;
N represents the set of physical nodes;
x_{i,j}^k represents a binary flag bit; x_{i,j}^k = 1 when container j of request i is deployed on physical node k;
d_{i,j}^c represents the demand of container j of request i for computing resources, and C_k^c the total computing resources of node k;
Σ_{k∈N} x_{i,j}^k = 1,  ∀i ∈ I, ∀j ∈ V_i
wherein V_i represents the container set of service request i;
Σ_{i∈I} Σ_{m,n∈V_i} x_{i,m}^{k_u} x_{i,n}^{k_v} d_i^{m,n} ≤ B_{k_u,k_v}
wherein x_{i,m}^{k_u} represents a binary flag bit; x_{i,m}^{k_u} = 1 when container m of request i is deployed on physical node k_u;
x_{i,n}^{k_v} represents a binary flag bit; x_{i,n}^{k_v} = 1 when container n of request i is deployed on physical node k_v;
d_i^{m,n} represents the bandwidth demand between containers m and n of request i, and B_{k_u,k_v} the total bandwidth between nodes k_u and k_v;
Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k d_{i,j}^r ≤ C_k^r,  r ∈ {c, m, s},  ∀k ∈ N
wherein d_{i,j}^r and C_k^r denote the demand of container j of request i and the capacity of node k for resource r (computing, memory, storage).
In a preferred embodiment of the invention, the model is updated as:
θ_{k+1} = θ_k + α ∇_θ J_L(λ, θ_k)
wherein θ_{k+1} represents the model parameters at the next moment;
θ_k represents the model parameters at the current moment;
α represents the learning rate.
In a preferred embodiment of the present invention, the model update further comprises:
L = (1/m) Σ_{i=1}^{m} ( b(c, p_i) − Q(c, p_i) )²
wherein L represents the mean square error between the evaluation value b(c, p) given by the benchmark evaluator and the reward value Q(c, p);
m represents the number of samples;
Q(c, p_i) represents the reward obtained under the decision p_i made by the algorithm for a given input container cluster c;
b(c, p_i) represents the evaluation value given by the benchmark evaluator b for the given input container cluster c and decision p_i.
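As a concrete illustration of this mean-squared-error criterion, the following minimal sketch (the function and variable names are ours, not from the patent) computes the critic loss over m sampled decisions:

```python
import numpy as np

def critic_loss(Q, b):
    """Mean squared error between the benchmark evaluator's values b(c, p_i)
    and the observed reward values Q(c, p_i) over m sampled decisions."""
    Q = np.asarray(Q, dtype=float)
    b = np.asarray(b, dtype=float)
    return ((b - Q) ** 2).mean()
```

Minimizing this loss by gradient descent moves the critic's baseline toward the observed rewards.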
In summary, by adopting the above technical scheme, containers can be reasonably deployed in edge computing according to the constructed optimization model.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 is a schematic diagram of a container cluster deployment in an edge network environment of the present invention.
FIG. 2 is a schematic diagram of a reinforcement learning model decision-reward cycle of the present invention.
FIG. 3 is a schematic diagram of the model training process of the present invention.
Fig. 4 is a detailed schematic diagram of the actor network model of the present invention.
FIG. 5 is a schematic representation of training history of the present invention in three experimental scenarios;
wherein, (a) is a training history (small scale scene), (b) is a training history (medium scale scene), (c) is a training history (large scale scene), (d) is a training penalty (small scale scene), (e) is a training penalty (medium scale scene), (f) is a training penalty (large scale scene).
FIG. 6 is a comparative schematic of the solution time of the present invention.
FIG. 7 is a comparative schematic diagram of the present invention in terms of deployment error rate.
FIG. 8 is a graph showing a comparison of cumulative benefits over a period of time in accordance with the present invention;
where (a) is a cumulative revenue comparison (small scale scenario), (b) is a cumulative revenue comparison (medium scale scenario), (c) is a cumulative revenue comparison (large scale scenario).
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
The invention mainly comprises: modeling of the container cluster deployment problem in an edge computing network environment, and a solution framework for the edge computing container cluster deployment strategy based on Actor-Critic reinforcement learning. A graph convolutional network is introduced to extract features of the network topology among the containers in a container cluster, and the result is used as input to the attention mechanism of a Seq2Seq network to improve the quality of the output solution; the encoder part of the Seq2Seq performs embedded encoding of the container cluster, and the decoder part outputs the corresponding container deployment positions. The Actor-Critic reinforcement learning framework is adopted to train the network; no label mapping is needed, the actor network and the critic network train and learn from each other for autonomous improvement, and the solution given by the trained network significantly improves system revenue.
In the same period of time an edge computing platform may receive different numbers of service requests; each request may require different functions, different functions require different types and numbers of containers, and containers of the same kind may have uncertain communication requirements between them. The most intuitive impact of service request size and kind is the change in virtual nodes and links, i.e. a change in the configuration of the structure. Workload fluctuations typically change the resource demand of a virtual node or link, i.e. a change in the resource configuration. FIG. 1 shows two different container clusters mapped to the underlying physical network.
1. Reinforcement learning solving framework combined with graph convolution network
In the invention, the model is trained with an Actor-Critic reinforcement learning framework. The entire model involves two neural networks: the actor network and the critic network. Their workflow is shown in fig. 2: for a given container cluster input to the decision system, the agent (actor network) gives an appropriate decision A_t according to the current network state S_t; in our problem this is the deployment policy Placement, which indicates the deployment location of each container in the container cluster. The environment then evaluates the deployment policy and generates corresponding feedback (a reward) R_{t+1} indicating the quality of the deployment policy; at the same time the environment is updated to the new state S_{t+1} after deployment. The critic network evaluates the return (i.e. the Lagrangian value) obtained after the actor's action is executed, and its evaluation result is the Baseline; the actor network updates the model parameters based on the output of the critic (the actor network updates its parameters in the direction of higher return). The training process of the model is shown in detail in fig. 3.
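The decision-reward cycle of fig. 2 can be sketched as follows; `actor`, `critic` and `env_step` are stand-in callables for the real networks and environment, not part of the patent:

```python
def actor_critic_cycle(actor, critic, env_step, clusters):
    """One pass over a batch of container clusters following Fig. 2:
    the actor proposes a placement A_t for state S_t, the environment
    returns a reward R_{t+1}, the critic supplies the Baseline, and the
    advantage (reward - baseline) is what drives the actor's update."""
    history = []
    for c in clusters:                    # S_t: the input container cluster
        placement = actor(c)              # A_t: the deployment policy
        reward = env_step(c, placement)   # R_{t+1}: quality of the policy
        baseline = critic(c, placement)   # critic's evaluation (Baseline)
        history.append((c, placement, reward, reward - baseline))
    return history
```

With stub callables the cycle runs end to end; in the real model the actor and critic are the networks of fig. 4 and the RNN benchmark evaluator.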
In the invention, on the basis of neural combinatorial optimization theory, the topological link relations existing in a container cluster are extracted by a graph convolutional network (Graph Convolutional Network, GCN), so that the agent can perceive the topological structure of the container cluster in advance and give a deployment strategy more accurately. Specifically, we use a graph convolutional network and a sequence-to-sequence model based on an encoder-decoder structure to infer deployment policies. For container clusters of the same training batch, we use the following method: the feature information of several container clusters is grouped with a block-diagonal matrix and input into the graph convolutional network for training. To explain the operation of the model more clearly, assume that a set of container clusters [Q, V, W] needs to be mapped into the underlying physical network. Each container cluster corresponding to a service request has a variable number m of containers, for example Q = (f_1, f_2, ..., f_m). The container clusters [Q, V, W] serve as input to the GCN network, the containers Q = (f_1, f_2, ..., f_m) of a cluster serve as input to the encoder, and the decoder part outputs a deployment policy P = (p_1, p_2, ..., p_m) indicating the deployment location of each container. The actor network model in the proposed method is shown in fig. 4.
One part of the task request is input to the GCN network for topology feature extraction, and the other part is input to the encoder of the Seq2Seq network to control the order of container deployment. The output of the GCN network and the output of the encoder are combined by a matrix point-multiplication operation and fed to the decoder of the Seq2Seq network, and the decoder finally gives the deployment strategy of the containers.
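The fusion step described above can be sketched numerically as follows; the tensor shapes and the softmax attention scoring are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def fuse_and_attend(gcn_out, enc_out, dec_state):
    """Combine the GCN topology features with the Seq2Seq encoder states by
    a matrix point-multiplication (element-wise product), then score each
    container position against the decoder state with a softmax."""
    assert gcn_out.shape == enc_out.shape      # (m containers, hidden dim)
    fused = gcn_out * enc_out                  # point-multiplication
    logits = fused @ dec_state                 # one score per container
    e = np.exp(logits - logits.max())          # numerically stable softmax
    return e / e.sum()
```

The returned weights sum to one and can be read as the decoder's attention over candidate containers at one decoding step.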
The invention builds an optimization model from the perspective of an edge computing service provider, and hopes to reduce the total energy consumption expenditure on the premise of meeting the service request of the user as much as possible so as to realize the maximization of the benefit of the service provider.
max(total revenue − total energy expenditure)   (1.1)
The objective function is divided into two parts. Equation (1.2) is the edge computing service provider's charging rule for leased resources, i.e. for the physical resources occupied by each container j ∈ V_i contained in a service request i ∈ I: the computing resource demand d_{i,j}^c, memory resource demand d_{i,j}^m and storage resource demand d_{i,j}^s are multiplied by the corresponding charging coefficients G_c, G_m and G_s respectively. Notably, we creatively add a service-effect coefficient (1 − η_{k,c}) to the charging rule for computing resources, to account for the reduced service capability caused by the increased competition of containers for physical resources.
total revenue = Σ_{k∈N} Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k [ G_c (1 − η_{k,c}) d_{i,j}^c + G_m d_{i,j}^m + G_s d_{i,j}^s ]   (1.2)
wherein N represents the set of physical nodes;
G_c represents the benefit per unit computing resource;
η_{k,c} represents the utilization of computing resources on physical node k;
I represents the service request set;
V_i represents the container set of service request i;
x_{i,j}^k represents a binary flag bit; x_{i,j}^k = 1 when container j of request i is deployed on physical node k;
G_m represents the benefit per unit memory resource;
G_s represents the benefit per unit storage resource.
In equation (1.3) we define the energy expenditure generated by the underlying physical network. Considering that energy accounts for a large part of a service provider's daily operating expenditure, our optimization model only considers the energy expenditure as the operator's expenditure. Let E_k^max be the maximum energy consumption of physical node k and E_k^min its minimum (idle) energy consumption. We use the product of (E_k^max − E_k^min) and the computing resource occupancy η_{k,c} to represent the load-dependent energy consumption of physical node k; energy is also consumed when a physical node is idle, so the idle energy consumption u_k E_k^min of node k is added, and the sum of the two is finally multiplied by the unit energy consumption expenditure coefficient to represent the total energy expenditure of the service provider:
total energy expenditure = C Σ_{k∈N} [ (E_k^max − E_k^min) η_{k,c} + u_k E_k^min ]   (1.3)
wherein N represents the set of physical nodes;
I represents the service request set;
V_i represents the container set of service request i;
x_{i,j}^k represents a binary flag bit; x_{i,j}^k = 1 when container j of request i is deployed on physical node k;
u_k represents a binary flag bit; u_k = 1 when physical node k is in an active state;
C represents the unit energy consumption expenditure coefficient.
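Under this reading of (1.1)-(1.3), the provider's profit for a given deployment can be evaluated as below; the flattened matrix encoding of x and the example coefficients are our own assumptions for illustration:

```python
import numpy as np

def provider_profit(x, d_cpu, d_mem, d_sto, cap_cpu,
                    G_c, G_m, G_s, E_max, E_min, C_e):
    """Objective (1.1): total revenue (1.2) minus total energy expenditure (1.3).

    x : (V, N) binary matrix, x[j, k] = 1 if container j is on node k
        (containers of all requests flattened into one axis)
    """
    eta = (x.T @ d_cpu) / cap_cpu            # eta_{k,c}: CPU utilisation of node k
    u = (x.sum(axis=0) > 0).astype(float)    # u_k = 1 if node k hosts any container
    revenue = (G_c * ((1.0 - eta) * (x.T @ d_cpu)).sum()   # CPU charge scaled by (1 - eta)
               + G_m * (x.T @ d_mem).sum()                 # memory charge
               + G_s * (x.T @ d_sto).sum())                # storage charge
    energy = C_e * ((E_max - E_min) * eta + E_min * u).sum()
    return revenue - energy
```

With two unit-demand containers on two half-loaded nodes, revenue and energy can be checked against the formulas by hand.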
The optimization model is subject to several constraints. Constraint (1.4) defines the utilization η_{k,c} of computing resources on physical node k and limits its value to the range [0, 1]:
0 ≤ η_{k,c} = ( Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k d_{i,j}^c ) / C_k^c ≤ 1,  ∀k ∈ N   (1.4)
wherein η_{k,c} represents the utilization of computing resources on physical node k;
I represents the service request set;
N represents the set of physical nodes;
x_{i,j}^k represents a binary flag bit; x_{i,j}^k = 1 when container j of request i is deployed on physical node k;
d_{i,j}^c represents the demand of container j of request i for computing resources, and C_k^c the total computing resources of node k.
Constraint (1.5) specifies that the j-th container of the i-th service request can only be deployed on one physical node and cannot be redeployed:
Σ_{k∈N} x_{i,j}^k = 1,  ∀i ∈ I, ∀j ∈ V_i   (1.5)
wherein N represents the set of physical nodes;
x_{i,j}^k represents a binary flag bit; x_{i,j}^k = 1 when container j of request i is deployed on physical node k;
I represents the service request set;
V_i represents the container set of service request i.
Constraint (1.6) specifies that the bandwidth resources occupied by communication between containers m and n of a service request i, located on physical nodes k_u and k_v respectively, do not exceed the total bandwidth resources between k_u and k_v:
Σ_{i∈I} Σ_{m,n∈V_i} x_{i,m}^{k_u} x_{i,n}^{k_v} d_i^{m,n} ≤ B_{k_u,k_v}   (1.6)
wherein I represents the service request set;
V_i represents the container set of service request i;
x_{i,m}^{k_u} represents a binary flag bit; x_{i,m}^{k_u} = 1 when container m of request i is deployed on physical node k_u;
x_{i,n}^{k_v} represents a binary flag bit; x_{i,n}^{k_v} = 1 when container n of request i is deployed on physical node k_v;
d_i^{m,n} represents the bandwidth demand between containers m and n of request i, and B_{k_u,k_v} the total bandwidth between nodes k_u and k_v.
Constraints (1.7), (1.8) and (1.9) specify that, on each physical node, the total resources of all deployed containers do not exceed the node's total computing, memory and storage resources respectively:
Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k d_{i,j}^r ≤ C_k^r,  r ∈ {c, m, s},  ∀k ∈ N   (1.7)-(1.9)
wherein I represents the service request set;
N represents the set of physical nodes;
x_{i,j}^k represents a binary flag bit; x_{i,j}^k = 1 when container j of request i is deployed on physical node k;
d_{i,j}^r and C_k^r denote the demand of container j of request i and the capacity of node k for resource r.
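A feasibility check for constraints (1.5) and (1.7)-(1.9) can be sketched as follows; the flattened matrix encoding of x is our assumption, and the bandwidth constraint (1.6) is omitted for brevity:

```python
import numpy as np

def placement_feasible(x, d_cpu, d_mem, d_sto, cap_cpu, cap_mem, cap_sto):
    """x : (V, N) binary matrix, x[j, k] = 1 if container j is on node k."""
    # (1.5): every container is deployed on exactly one physical node
    if not np.all(x.sum(axis=1) == 1):
        return False
    # (1.7)-(1.9): per-node demand must not exceed each resource capacity
    for d, cap in ((d_cpu, cap_cpu), (d_mem, cap_mem), (d_sto, cap_sto)):
        if np.any(x.T @ d > cap + 1e-12):
            return False
    return True
```

Such a check is what the environment would apply before rewarding (or penalising) a proposed placement.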
2. topological relation description based on graph convolution network
The invention adopts a graph convolutional network to extract the topological relations of the input container cluster, and uses the extracted features to help the agent give a more accurate deployment strategy without violating the constraints, thereby reducing the container deployment cost and improving the overall benefit of the edge computing service provider.
Let the graph of a container cluster be denoted G = (N, E), where N represents the vertices of the graph, i.e. the containers in the container cluster, and E represents the edges, i.e. the links resulting from communication between containers. The features of the vertices in G form an N×D matrix X, where D is the number of features. The relationship between containers is represented by an N×N matrix A, i.e. the adjacency matrix of G. The layer-wise propagation of the graph convolutional network is shown in equation (10):
H^(l+1) = σ( D̃^(-1/2) Ã D̃^(-1/2) H^(l) W^(l) )   (10)
wherein H^(l+1) represents the features of layer l+1;
σ(·) represents an activation function, such as ReLU or Sigmoid (in our model we use ReLU);
Ã = A + I_N is the adjacency matrix of the undirected graph G with added self-connections, where A is the adjacency matrix of G and I_N the identity matrix of order N;
D̃, with D̃_ii = Σ_j Ã_ij, is the degree matrix of Ã;
W^(l) is the training parameter matrix of the l-th layer;
H^(l) represents the features of the l-th layer, with H^(0) = X for the input layer.
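Equation (10) can be sketched numerically as a single layer; the ReLU activation matches the model described above, while the matrix sizes in the example are illustrative:

```python
import numpy as np

def gcn_layer(A, H, W):
    """H^(l+1) = ReLU( D~^(-1/2) A~ D~^(-1/2) H^(l) W^(l) ) with A~ = A + I_N."""
    A_hat = A + np.eye(A.shape[0])                 # add self-connections
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))  # diagonal of D~^(-1/2)
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)         # ReLU activation
```

Stacking two or three such layers (with different W per layer) gives the topology features that are fused with the encoder output.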
3. Constraint optimization based on policy gradients
Let C denote the set of container clusters and c ∈ C a single cluster; the policy function for c is expressed as:
π(p|c, θ) = P_r{ A_t = p | S_t = c, θ_t = θ }
wherein π(p|c, θ) represents the probability of outputting deployment policy p for a given input c;
θ represents the training parameters of the model;
P_r represents the probability of outputting the deployment policy p;
A_t represents the action at time t;
S_t represents the state at time t;
θ_t represents the training parameters at time t.
The policy function gives, at time t, with input c and parameters θ, the probability P_r of outputting deployment policy p. The policy assigns a higher probability to a high-benefit deployment policy p and a lower probability to a low-benefit one. The interaction of the input container clusters with the output policies during a period T generates a trajectory τ = (c_1, p_1, ..., c_T, p_T) of a Markov decision process, whose probability can be expressed as:
P_θ(c_1, p_1, ..., c_T, p_T) = p(c_1) ∏_{t=1}^{T} π_θ(p_t|c_t) p(c_{t+1}|c_t, p_t)
wherein P_θ(c_1, p_1, ..., c_T, p_T) represents the probability that trajectory τ = (c_1, p_1, ..., c_T, p_T) occurs under parameters θ;
p(c_1) represents the probability that state c_1 occurs (i.e. the input at time t = 1 is c_1);
T represents the period;
π_θ(p_t|c_t) represents the probability that at time t, with current state c_t (the input container cluster), the agent takes action p_t (the output deployment policy) in an environment with parameters θ;
p(c_{t+1}|c_t, p_t) represents the probability that, given that the state (the input container cluster) at time t is c_t and the action (the output deployment policy) is p_t, the system state at time t+1 is c_{t+1};
c_1 represents the system state (the input container cluster) at time t = 1;
p_1 represents the deployment policy at time t = 1;
c_t represents the input at time t;
p_t represents the deployment policy output at time t.
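The trajectory probability factorizes into per-step terms, which can be checked numerically; the probabilities in the example are arbitrary illustrative values:

```python
def trajectory_probability(p_c1, policy_probs, transition_probs):
    """P_theta(c_1, p_1, ..., c_T, p_T) =
       p(c_1) * product over t of pi_theta(p_t|c_t) * p(c_{t+1}|c_t, p_t)."""
    prob = p_c1
    for pi_t, trans_t in zip(policy_probs, transition_probs):
        prob *= pi_t * trans_t
    return prob
```
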
In the above policy function, the probability of the deployment policy p_t for the current input container cluster c_t depends on the deployment positions p_(<t) of the previous container clusters and on the system state. For simplicity we assume that the system state is fully determined by the container clusters C, so the policy function only outputs a probability indicating the deployment locations of a container cluster. The goal of the policy gradient method is to find the optimal set of parameters θ* that yields the optimal deployment locations of the container clusters. To this end, we need to define an objective function describing the quality of a deployment strategy:
J_R(θ|c) = E_{p ~ π_θ(·|c)} [ R(p) ]
wherein J_R(θ|c) represents the policy quality corresponding to input c;
R(p) represents the service benefit corresponding to deployment policy p;
p ~ π_θ(·|c) denotes the deployment policies p drawn for the given input c.
In the above formula, we use the expected service benefit R(p) of the deployment policies for a given container cluster c as the objective function describing the quality of the deployment strategy. Because the agent infers deployment policies for all container clusters, the expected benefit can then be defined as an expectation over the container probability distribution:
J_R(θ) = E_{c ~ C} [ J_R(θ|c) ]
wherein J_R(θ) represents the policy quality, i.e. the expected benefit;
J_R(θ|c) represents the policy quality corresponding to input c;
c ~ C denotes taking the expectation over all container clusters C.
Similarly, the expected penalty due to violating the constraints can be expressed as:
J_C(θ) = E_{c ~ C} [ J_C(θ|c) ]
wherein J_C(θ) represents the expected penalty value;
J_C(θ|c) represents the penalty value corresponding to input c;
c ~ C denotes taking the expectation over all container clusters C.
Here we define four constraint signals: the computing resource cpu, the memory resource mem, the storage resource sto, and the bandwidth resource bw. The final optimization objective can be converted into an unconstrained problem by the Lagrangian relaxation technique:
J_L(λ, θ) = J_R(θ) + Σ_i λ_i J_{C_i}(θ) = J_R(θ) + J_ξ(θ)
wherein J_L(λ, θ) represents the Lagrangian value, obtained by adding to the expected benefit J_R(θ) the weighted sum of the expected penalty values J_{C_i}(θ) corresponding to the various resources;
λ represents the weights of the four constraint signals, λ_i the weight of the i-th constraint signal;
J_R(θ) represents the policy quality, i.e. the expected benefit;
J_{C_i}(θ) represents the expected penalty value of the i-th constraint signal;
J_ξ(θ) represents the weighted sum of the expected penalty values of the four constraint signals.
Here λ contains the weights of the four constraint signals, and J_ξ(θ) is the weighted sum of the expected penalties of the four constraint signals. Next, we compute the gradient of J_L(λ,θ) using the log-likelihood method.
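The Lagrangian relaxation step above can be sketched in a few lines of code. This is an illustrative sketch, not the patent's implementation: the function name, the weights λ_i, and all numeric values are assumptions chosen for the example, and penalties are taken to be non-positive so that "adding" them lowers the objective.

```python
# Sketch of J_L(lambda, theta) = J_R(theta) + sum_i lambda_i * J_Ci(theta).
# Names and values are illustrative, not from the patent.

def lagrangian_value(j_r, penalties, weights):
    """Combine the expected benefit with the weighted expected penalties
    of the constraint signals (assumed non-positive when violated)."""
    assert len(penalties) == len(weights)
    j_xi = sum(lam * j_c for lam, j_c in zip(weights, penalties))  # J_xi(theta)
    return j_r + j_xi

# Four constraint signals: cpu, mem, sto, bw (example weights and penalties).
weights = [1.0, 1.0, 0.5, 0.5]        # lambda_i
penalties = [-0.2, 0.0, -0.1, 0.0]    # expected penalties J_Ci(theta)
j_l = lagrangian_value(j_r=3.0, penalties=penalties, weights=weights)
```

With these example values the cpu and sto violations reduce the unconstrained objective from 3.0 to 2.75.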
∇_θ J_L(λ,θ) = E_{c~C} E_{p~π_θ(·|c)}[Q(c,p)·∇_θ log π_θ(p|c)]
wherein J_L(λ,θ) represents the Lagrangian value, computed by adding the weighted sum of the expected penalty values J_C(θ) of the various resources to the expected benefit J_R(θ);
π_θ(p|c) represents the policy function for input c;
Q(c,p) represents the reward obtained given the decision p made by the algorithm for the input container cluster c;
p~π_θ(·|c) indicates that the expectation is taken over deployment policies p sampled for the given input c.
In the above equation, Q(c,p) describes the reward available given the input c and the decision p made by the algorithm. It is calculated by adding the weighted sum of all constraint violation values C_i(p) to the benefit value R(p), as shown in (18):
Q(c,p) = R(p) + Σ_i λ_i·C_i(p) = R(p) + ξ(p)    (18)
wherein Q(c,p) represents the reward obtained under the decision p made by the algorithm for the given input container cluster c;
R(p) represents the reward available to the system corresponding to decision p;
ξ(p) represents the weighted sum of the penalty values of all constraint signals under decision p;
λ_i represents the weight of the i-th constraint signal;
C_i(p) represents the penalty value generated by the i-th constraint signal under decision p.
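Equation (18) can be illustrated directly. The following sketch uses assumed weights and violation values (not from the patent) and treats violations as non-positive, so an infeasible decision receives a lower shaped reward than a feasible one with the same benefit:

```python
# Sketch of equation (18): Q(c, p) = R(p) + xi(p),
# with xi(p) = sum_i lambda_i * C_i(p). Values are illustrative.

def shaped_reward(benefit, violations, weights):
    """Combine the benefit R(p) with the weighted cpu/mem/sto/bw penalties."""
    xi = sum(lam * c for lam, c in zip(weights, violations))
    return benefit + xi

# A feasible decision incurs zero penalty, so Q equals the raw benefit.
q_feasible = shaped_reward(2.0, [0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0])
# An infeasible one is pushed down by the (negative) violation terms.
q_violating = shaped_reward(2.0, [-0.5, 0.0, 0.0, -0.3], [1.0, 1.0, 1.0, 1.0])
```

This penalty shaping is what lets the unconstrained policy gradient steer the agent away from deployments that exceed node capacities.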
Then we approximate the Lagrangian gradient using Monte Carlo sampling, where m is the number of samples. To reduce the variance of the gradient and accelerate the convergence of the model, we use a critic network as the baseline evaluator b, which consists of a simple RNN. The Lagrangian gradient can then be expressed as:
∇_θ J_L(λ,θ) ≈ (1/m)·Σ_{i=1..m} (Q(c,p_i) − b(c,p_i))·∇_θ log π_θ(p_i|c)
wherein m represents the number of samples;
Q(c,p_i) represents the reward obtained under the decision p_i made by the algorithm for the given input container cluster c;
b(c,p_i) represents the evaluation value given by the baseline evaluator b for the given input container cluster c and decision p_i.
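The Monte Carlo estimate with a baseline can be sketched as follows. This is an illustrative simplification, not the patent's network: a softmax over logits replaces the GCN + sequence-to-sequence actor, and a fixed constant stands in for the RNN critic's output b(c,p_i).

```python
import numpy as np

# Sketch: average (Q - b) * grad log pi over m sampled decisions.
# The baseline b shifts the reward but not the gradient's expectation,
# which is what reduces variance. Names and values are illustrative.

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def mc_gradient(theta, rng, q_fn, m=256, baseline=0.0):
    pi = softmax(theta)
    grad = np.zeros_like(theta)
    for _ in range(m):
        p = rng.choice(len(theta), p=pi)     # sample a deployment decision
        score = -pi.copy()
        score[p] += 1.0                      # grad log pi for softmax
        grad += (q_fn(p) - baseline) * score # baseline-corrected term
    return grad / m

rng = np.random.default_rng(0)
q_fn = lambda p: [1.0, 0.0, 0.0, 0.0][p]     # only node 0 yields reward
grad = mc_gradient(np.zeros(4), rng, q_fn, baseline=0.25)
```

The estimated gradient points toward the rewarding decision (its component is positive) while the components sum to zero, preserving normalization.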
Finally, the parameters θ of the network model are updated using stochastic gradient descent:
wherein θ_{k+1} represents the model parameters at the next step;
θ_k represents the model parameters at the current step;
α represents the learning rate.
The baseline evaluator gives the evaluation value b(c,p) of the reward for the current container cluster; the parameters σ of the baseline evaluator are then updated by stochastic gradient descent based on the mean square error of b(c,p) and the reward value Q(c,p), i.e. (1/m)·Σ_{i=1..m}(b(c,p_i) − Q(c,p_i))²:
wherein the loss represents the mean square error of the evaluation value b(c,p) given by the baseline evaluator and the reward value Q(c,p);
m represents the number of samples;
Q(c,p_i) represents the reward obtained under the decision p_i made by the algorithm for the given input container cluster c;
b(c,p_i) represents the evaluation value given by the baseline evaluator b for the given input container cluster c and decision p_i.
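The baseline update can be sketched with the simplest possible critic. As an assumption for illustration (the patent uses an RNN critic), a single learnable constant σ plays the role of b(c,p); one SGD step on the mean square error then pulls σ toward the sampled rewards:

```python
# Sketch of the critic update: fit b(c, p) ~ Q(c, p) by SGD on the MSE.
# A learnable constant stands in for the patent's RNN baseline evaluator.

def critic_step(sigma, q_samples, lr=0.1):
    """One SGD step on L(sigma) = (1/m) * sum_i (sigma - Q_i)^2."""
    m = len(q_samples)
    grad = 2.0 / m * sum(sigma - q for q in q_samples)
    return sigma - lr * grad

sigma = 0.0
rewards = [1.0, 0.5, 1.5, 1.0]           # Q(c, p_i) for m = 4 samples
for _ in range(200):
    sigma = critic_step(sigma, rewards)  # converges toward mean(Q) = 1.0
```

For a constant baseline the MSE minimizer is exactly the mean reward, which is why repeated steps drive σ to 1.0 here.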
The training process of the container cluster deployment algorithm based on graph convolutional networks and neural combinatorial optimization can be described as in Table 1:
TABLE 1 Description of the training process of the container cluster deployment algorithm based on graph convolutional networks and neural combinatorial optimization
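The overall actor-critic loop of the training process can be sketched end to end under strong simplifying assumptions (all names and hyperparameters are illustrative): a softmax policy over logits replaces the GCN + sequence-to-sequence actor, a learnable constant replaces the RNN critic, and a toy reward function replaces the deployment benefit plus penalties.

```python
import numpy as np

# End-to-end sketch of the training loop: sample decisions from the
# policy, form the baseline-corrected policy gradient, ascend J_L, and
# fit the critic to the sampled rewards by an MSE step.

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def train(q_fn, n_actions=4, m=64, epochs=300, alpha=0.05, lr_critic=0.1, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(n_actions)       # actor parameters
    sigma = 0.0                       # critic (baseline) parameter
    for _ in range(epochs):
        pi = softmax(theta)
        grad = np.zeros_like(theta)
        qs = []
        for _ in range(m):
            p = rng.choice(n_actions, p=pi)
            q = q_fn(p)                         # benefit + weighted penalties
            score = -pi.copy()
            score[p] += 1.0
            grad += (q - sigma) * score         # baseline-corrected gradient
            qs.append(q)
        theta += alpha * grad / m               # gradient ascent on J_L
        sigma -= lr_critic * 2.0 * (sigma - np.mean(qs))  # MSE step on critic
    return theta

theta = train(lambda p: 1.0 if p == 0 else 0.0)  # node 0 is the good placement
```

After training, the policy concentrates most of its probability mass on the rewarding placement, mirroring how the patent's agent learns to favor feasible, high-benefit deployments.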
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.
Claims (7)
1. A container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing, characterized by comprising the following steps:
S1, extracting the topological association relations existing between containers through a graph convolution network, and updating the parameters of the optimization model by the actor network according to the output of the critic module; wherein the optimization model is:
max (total revenue − total energy expenditure) (1.1)
wherein N represents the set of physical nodes;
G_c represents the benefit per unit of computing resource;
η_{k,c} represents the utilization of computing resources on physical node k;
I represents the set of service requests;
V_i represents the container set of service request i;
the binary flag bit equals 1 when container j of request i is deployed on physical node k;
G_m represents the benefit per unit of memory resource;
G_s represents the benefit per unit of storage resource;
wherein N represents the set of physical nodes;
I represents the set of service requests;
V_i represents the container set of service request i;
the binary flag bit equals 1 when container j of request i is deployed on physical node k;
u_k represents a binary flag bit; u_k = 1 indicates that physical node k is in the active state;
c represents the expenditure coefficient per unit of energy consumption;
alternatively, min (total energy expenditure)
wherein N represents the set of physical nodes;
I represents the set of service requests;
V_i represents the container set of service request i;
the binary flag bit equals 1 when container j of request i is deployed on physical node k;
u_k represents a binary flag bit; u_k = 1 indicates that physical node k is in the active state;
c represents the expenditure coefficient per unit of energy consumption;
S2, inferring the deployment policy through a sequence-to-sequence network with the aid of the graph convolution network.
2. The method for online deployment of container clusters fusing graph neural networks and reinforcement learning in edge computing according to claim 1, wherein the layer-wise propagation of the graph convolution network in step S1 is:
H^{(l+1)} = σ(A·H^{(l)}·W^{(l)})
wherein H^{(l+1)} represents the features of layer l+1;
σ(·) represents the activation function;
A represents the relationship matrix between the nodes in graph G;
H^{(l)} represents the features of layer l;
W^{(l)} represents the training parameter matrix of layer l.
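For illustration, one layer of the propagation rule in claim 2 can be sketched as follows. This is an assumed minimal example (ReLU as the activation, a tiny hand-written adjacency matrix, identity input features); common GCN variants also normalize A, which the claim leaves implicit.

```python
import numpy as np

# Sketch of one graph-convolution layer H^(l+1) = sigma(A H^(l) W^(l)).
# A, H, and W below are illustrative, not from the patent.

def gcn_layer(A, H, W):
    """Aggregate neighbor features via A, transform with the trainable
    matrix W, then apply the activation (ReLU here)."""
    return np.maximum(A @ H @ W, 0.0)

A = np.array([[1.0, 1.0, 0.0],   # 3 containers; self-loops included
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])
H = np.eye(3)                    # initial per-container features
W = np.ones((3, 2)) * 0.5        # layer-l training parameter matrix
H_next = gcn_layer(A, H, W)      # features of layer l + 1
```

Each output row mixes the features of a container and its topological neighbors, which is how the network captures the inter-container association relations of step S1.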
3. The method for on-line deployment of container clusters fusing graph neural networks and reinforcement learning in edge computing according to claim 1, wherein the deployment policy in step S2 is:
π(p|c,θ) = P_r{A_t = p | S_t = c, θ_t = θ}
wherein π(p|c,θ) represents the probability of outputting the deployment policy p for a given input c;
θ represents the training parameters of the model;
P_r represents the probability of outputting the deployment policy p;
A_t represents the action at time t;
S_t represents the state at time t;
θ_t represents the training parameters at time t.
4. The method for online deployment of container clusters fusing graph neural networks and reinforcement learning in edge computing according to claim 1, further comprising, after step S1, a step S3 in which the critic network evaluates the return obtained after the actor performs an action.
5. The method for on-line deployment of container clusters fusing graph neural networks and reinforcement learning in edge computing according to claim 1, wherein the constraint conditions of the optimization model are:
wherein η_{k,c} represents the utilization of computing resources on physical node k;
I represents the set of service requests;
N represents the set of physical nodes;
the binary flag bit equals 1 when container j of request i is deployed on physical node k, and the corresponding demand coefficient represents the demand of container j of request i for computing resources;
wherein N represents the set of physical nodes;
the binary flag bit equals 1 when container j of request i is deployed on physical node k; I represents the set of service requests;
V_i represents the container set of service request i;
wherein I represents the set of service requests;
V_i represents the container set of service request i;
one binary flag bit equals 1 when container m of request i is deployed on physical node k_u, and another equals 1 when container n of request i is deployed on physical node k_v; B_{k_u,k_v} represents the total amount of bandwidth resources between physical nodes k_u and k_v;
wherein I represents the set of service requests;
N represents the set of physical nodes;
the binary flag bit equals 1 when container j of request i is deployed on physical node k.
6. The method for online deployment of container clusters fusing graph neural networks and reinforcement learning in edge computing according to claim 1, wherein the model is updated as:
wherein θ_{k+1} represents the model parameters at the next step;
θ_k represents the model parameters at the current step;
α represents the learning rate.
7. The method for online deployment of container clusters fusing graph neural networks and reinforcement learning in edge computing according to claim 6, wherein the model updating further comprises:
wherein the loss represents the mean square error of the evaluation value b(c,p) given by the baseline evaluator and the reward value Q(c,p);
m represents the number of samples;
Q(c,p_i) represents the reward obtained under the decision p_i made by the algorithm for the given input container cluster c;
b(c,p_i) represents the evaluation value given by the baseline evaluator b for the given input container cluster c and decision p_i.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211347967.8A CN115686846B (en) | 2022-10-31 | 2022-10-31 | Container cluster online deployment method integrating graph neural network and reinforcement learning in edge calculation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115686846A CN115686846A (en) | 2023-02-03 |
CN115686846B true CN115686846B (en) | 2023-05-02 |
Family
ID=85045641
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211347967.8A Active CN115686846B (en) | 2022-10-31 | 2022-10-31 | Container cluster online deployment method integrating graph neural network and reinforcement learning in edge calculation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115686846B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116069512B (en) * | 2023-03-23 | 2023-08-04 | 之江实验室 | Serverless efficient resource allocation method and system based on reinforcement learning |
CN117149443B (en) * | 2023-10-30 | 2024-01-26 | 江西师范大学 | Edge computing service deployment method based on neural network |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110008819A (en) * | 2019-01-30 | 2019-07-12 | 武汉科技大学 | A kind of facial expression recognizing method based on figure convolutional neural networks |
CN113568675A (en) * | 2021-07-08 | 2021-10-29 | 广东利通科技投资有限公司 | Internet of vehicles edge calculation task unloading method based on layered reinforcement learning |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3792831A1 (en) * | 2019-09-11 | 2021-03-17 | Siemens Aktiengesellschaft | Method for generating an adapted task graph |
CN112631717B (en) * | 2020-12-21 | 2023-09-05 | 重庆大学 | Asynchronous reinforcement learning-based network service function chain dynamic deployment system and method |
CN112711475B (en) * | 2021-01-20 | 2022-09-06 | 上海交通大学 | Workflow scheduling method and system based on graph convolution neural network |
US20220124543A1 (en) * | 2021-06-30 | 2022-04-21 | Oner Orhan | Graph neural network and reinforcement learning techniques for connection management |
CN113778648B (en) * | 2021-08-31 | 2023-07-11 | 重庆理工大学 | Task scheduling method based on deep reinforcement learning in hierarchical edge computing environment |
2022-10-31: application CN202211347967.8A filed; granted as CN115686846B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN115686846A (en) | 2023-02-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115686846B (en) | Container cluster online deployment method integrating graph neural network and reinforcement learning in edge calculation | |
CN109818786B (en) | Method for optimally selecting distributed multi-resource combined path capable of sensing application of cloud data center | |
Guim et al. | Autonomous lifecycle management for resource-efficient workload orchestration for green edge computing | |
Rkhami et al. | Learn to improve: A novel deep reinforcement learning approach for beyond 5G network slicing | |
Bahrpeyma et al. | An adaptive RL based approach for dynamic resource provisioning in Cloud virtualized data centers | |
CN114936708A (en) | Fault diagnosis optimization method based on edge cloud collaborative task unloading and electronic equipment | |
Aslam et al. | Using artificial neural network for VM consolidation approach to enhance energy efficiency in green cloud | |
CN108073442B (en) | Simulation request execution time prediction method based on depth fuzzy stack self-coding | |
CN116009990B (en) | Cloud edge collaborative element reinforcement learning computing unloading method based on wide attention mechanism | |
Luan et al. | LRP‐based network pruning and policy distillation of robust and non‐robust DRL agents for embedded systems | |
CN113543160A (en) | 5G slice resource allocation method and device, computing equipment and computer storage medium | |
Huang et al. | Learning-aided fine grained offloading for real-time applications in edge-cloud computing | |
Qin et al. | Dynamic IoT service placement based on shared parallel architecture in fog-cloud computing | |
CN116126534A (en) | Cloud resource dynamic expansion method and system | |
CN112906745B (en) | Integrity intelligent network training method based on edge cooperation | |
CN115499511A (en) | Micro-service active scaling method based on space-time diagram neural network load prediction | |
CN113783726B (en) | SLA-oriented resource self-adaptive customization method for edge cloud system | |
Liu et al. | Hidden markov model based spot price prediction for cloud computing | |
Bhargavi et al. | Uncertainty aware resource provisioning framework for cloud using expected 3-SARSA learning agent: NSS and FNSS based approach | |
Li et al. | An automated VNF manager based on parameterized action MDP and reinforcement learning | |
CN111913780A (en) | Resource prediction and scheduling method in cloud computing | |
CN117648174B (en) | Cloud computing heterogeneous task scheduling and container management method based on artificial intelligence | |
Damaševičius et al. | Short time prediction of cloud server round-trip time using a hybrid neuro-fuzzy network | |
Dixit et al. | Machine Learning Based Adaptive Auto-scaling Policy for Resource Orchestration in Kubernetes Clusters | |
Su et al. | An Attention Mechanism-based Microservice Placement Scheme for On-star Edge Computing Nodes |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||