CN111813539A

CN111813539A - Edge computing resource allocation method based on priority and cooperation

Info

Publication number: CN111813539A
Application number: CN202010473969.6A
Authority: CN
Inventors: 袁新杰; 杜清河
Original assignee: Xian Jiaotong University
Current assignee: Xian Jiaotong University
Priority date: 2020-05-29
Filing date: 2020-05-29
Publication date: 2020-10-23

Abstract

The application belongs to the technical field of resource allocation strategies, and particularly relates to an edge computing resource allocation method based on priority and cooperation. The computing resources in the edge computing servers and cloud servers are represented by the number of CPU cycles per unit time available, which can be allocated to tasks running on the servers according to different allocation schemes, but it is difficult to guarantee that long-term revenue is maximized. The application provides a method for distributing edge computing resources based on priority and cooperation, which comprises the following steps: 1) defining edge computation model states, actions and rewards; 2) defining the structure of a neural network and the structure of input and output; 3) and updating, training and applying the neural network according to a given training method. By reasonably distributing computing resources in the edge computing server and the cloud server, specifically CPU (Central processing Unit) cycles in unit time, the long-term benefits related to relative time delay and server energy consumption are improved.

Description

Edge computing resource allocation method based on priority and cooperation

Technical Field

The application belongs to the technical field of resource allocation strategies, and particularly relates to an edge computing resource allocation method based on priority and cooperation.

Background

Mobile users usually have less computing resources due to their own volume and other limitations, and cannot perform a large amount of computation for a long time due to the limitations of device energy consumption and battery capacity. Therefore, for some compute-intensive tasks, if the user only uses his own computing resources to perform processing, it is difficult to meet the requirements of the tasks for low latency, and problems such as shortened standby time and excessive heat generation of the device may be caused. Mobile users therefore need to resort to external computing resources, which in existing networks are typically from cloud computing nodes, also referred to as cloud nodes or cloud servers. However, with the increase of internet of things devices and the development of 5G, the cloud computing technology is increasingly weak in performance, so that the edge computing technology comes along with the increase of the internet of things devices as a supplement. The method aims to configure computing resources at the edge of a network so as to achieve the aims of reducing the bandwidth occupation of a core network, shortening time delay and the like.

In a traditional cloud computing mode, a user uploads a computing-intensive task to a cloud server for processing through a core network, although computing resources of the cloud server are sufficient and computing can be completed in a short time, transmission delay caused by factors such as limited bandwidth of the core network and network jitter is large. To reduce transmission delays, mobile edge computing technology deploys computing resources near the edge of the user's network, such as at a wireless router or base station. Therefore, only one-hop connection exists between the edge computing server and the user, and the data of the user does not need to be uploaded to the cloud computing server through the core network for processing, so that the lower transmission delay is realized. However, compared to a cloud computing server, the computing resources of the edge computing server are relatively limited, and therefore how to efficiently allocate and utilize the computing resources becomes one of the challenges in the mobile edge computing technology. The edge computing environment is modeled as a Markov decision process, and a deep reinforcement learning method is used for optimizing the success rate of tasks and long-term income in consideration of the complexity of the model.

The computing resources in the edge computing servers and cloud servers are represented by the number of CPU cycles per unit time available, which can be allocated to tasks running on the servers according to different allocation schemes, but it is difficult to guarantee that long-term revenue is maximized. Where the benefits are mainly related to relative latency and server energy consumption.

Disclosure of Invention

1. Technical problem to be solved

Based on the problem that the number of CPU cycles in the available unit time of computing resources in an edge computing server and a cloud server can be distributed to tasks running on the server according to different distribution schemes, but long-term profit maximization is difficult to guarantee, the application provides an edge computing resource distribution method based on priority and cooperation.

2. Technical scheme

To achieve the above objective, the set of edge computing servers and cloud servers in an edge computing server cluster in this application can be expressed as

Where number 0 represents a cloud computing node,

number

1, 2. Their computational resource capacity may be expressed as V ═ V₀,v₁,v₂,...,v_MIn which v is_mRepresenting nodes

The computing resource capacity of (a). After an edge user unloads a task to a certain edge computing server, the edge computing server is called as an origin server of the task, the origin server can further decide to process the task by itself or unload the task to a cloud server or other edge computing servers in the same cluster for processing, and a server for processing the task is called as a destination server.

In such an environment, the method of the present application comprises:

1): designing a state with priority and network node attributes under the condition of considering edge-to-edge coordination, edge-to-cloud coordination and task self-priority, designing an action comprising destination server decision and computing resource allocation decision, and rewarding aiming at the task;

2): for the 1) defined states, actions, and rewards, designing a first neural network structure for destination server decisions, and designing a second neural network structure for computing resource allocation decisions;

3): and according to a given algorithm, training and updating the first neural network and the second neural network in the process of interaction of the agent and the edge computing environment, and applying the neural networks after training.

Dividing the edge computing environment into a plurality of frames in time, wherein the time length of each frame is t^frame. Assuming that at the k frame, the total number of tasks uploaded to the cloud computing node in and from the cluster is N_kWill mark them as

Where the index k denotes the kth frame.

Another embodiment provided by the present application is: the state in 1) is an attribute of all tasks uploaded to the cloud computing node within and by the cluster, the action is a decision of destination server and computing resource allocation made by all tasks, and the reward is a contribution of each task to a utility function.

Another embodiment provided by the present application is: the rewards include a delayed revenue item, a task failure penalty item, and an energy consumption penalty item.

Another embodiment provided by the present application is: the first neural network in the step 2) is an h network, the h network comprises a state perceptron and an h actor network, and the state perceptron is used for extracting characteristic information in a state and inputting the characteristic information into the h actor network.

Another embodiment provided by the present application is: in the destination server decision process, the destination of each task is served as a different decision process, each decision has M +1 actions, and the final output is (M +1) multiplied by N scalars, wherein N represents the input task number which can be maximally responded by the neural network, so that N is more than or equal to N_kAnd M +1 is the number of compute nodes.

Another embodiment provided by the present application is: the second neural network in the step 2) is an f network, the f network comprises a state perceptron, an f actor network and an f criticizing network, and the state perceptron is used for extracting characteristic information in a state.

Another embodiment provided by the present application is: the f actor receives the output of the state sensor, thenPost-outputting the computing resources f allocated to each task_k＝[f_1,k,f_2,k,...,f_N,k]Here, the task set is selected only among the N outputs

Corresponding to N_kN of_kThe value is used to represent the number of CPU cycles per unit time allocated to the corresponding task; the f criticizing network receives the output of the state perceptron and the calculation resource allocation scheme and then outputs an action state cost function [ Q ] for the actions¹(s_k,f_k),Q²(s_k,f_k),...,Q^N(s_k,f_k)]Wherein s is_kIs the state defined in said 1), Q¹(s_k,f_k) Corresponds to f_1,kState cost function of, Q²(s_k,f_k) Corresponds to f_2,kThe state cost function of (c), and so on.

Another embodiment provided by the present application is: and 3) when the first neural network is updated, the mean square error function is taken as a Loss function, and when the second neural network is updated, the mean square error function is taken as the Loss function.

The updating method of the first neural network (h network) in the step 3) comprises the following steps: suppose task T_i,kIs inherited to the k +1 frame and is noted as T_m,k+1(in this time d_i,k0) or has been successfully completed or has failed due to a timeout (d is noted here)_i,k1), then

Updating the neural network for the Loss function, where θ^h,policyA parameter representing the h-network is shown,

Q_h,targetand Q_h,policyRespectively representing the outputs of the h-target network and the h-network, s_kAnd s_k+1For the state of the environment in the k frame, s_i,kAnd s_m,k+1Respectively represent tasks T_i,kAnd T_m,k+1OfHaving an attribute of D_i,kFor task T_i,kA destination server of, and R_i,kRepresenting a task T_i,kThe prize awarded, γ, is a discount factor.

The f actor network and the f critics network of the second neural network in 3) are updated respectively.

The updating method of the criticizing family network and the state perceptron comprises the following steps: suppose task T_i,kIs inherited to the k +1 frame and is noted as T_m,k+1(in this time d_i,k0) or has been successfully completed or has failed due to a timeout (d is noted here)_i,k1) to

Updating the neural network for the Loss function, where θ^f,policyParameters representing the critics and the status perceptors,

s_kand s_k+1For the state of the environment at the k frame, f_kRepresenting the computational resource allocation decision for the k-th frame,

and

respectively representing the ith output of the f critic network and the mth output of the target network corresponding to the f critics, pi_f,targetAnd f, showing the output of the target network corresponding to the actor.

The updating method of the actor network and the state perceptron comprises the following steps: with L_fa(θ^f,policy)＝-E[Q_f,policy(s_k,π_f,policy(s_k))]Updating the neural network as a Loss function, where_f,policyThe term "output" means "output of an actor" and the other terms are as defined above.

3. Advantageous effects

Compared with the prior art, the edge computing resource allocation method based on priority and cooperation has the advantages that:

according to the marginal computing resource allocation method based on priority and cooperation, effective perception and decision can be conducted on marginal computing environments through state, action and reward definition methods, a neural network structure, a neural network input and output structure, training and application methods and the like, and long-term profit maximization is achieved through side-to-side cooperation, side-to-cloud cooperation and load balancing.

According to the priority and cooperation-based edge computing resource allocation method, after the environment state is decoupled into the states of all tasks, the states are input into a specially designed neural network, and the output and the acquired rewards of the neural network also correspond to all the tasks.

According to the edge computing resource allocation method based on priority and cooperation, destination server decision and computing resource allocation decision are carried out by two sets of neural networks, namely the first neural network and the second neural network, and long-term profit maximization is achieved by fully utilizing the cooperation effect.

According to the edge computing resource allocation method based on priority and cooperation, computing resources in the edge computing server and the cloud server are reasonably allocated, specifically the number of CPU cycles in unit time, so that long-term benefits related to relative time delay and server energy consumption are improved.

Drawings

FIG. 1 is a schematic diagram of a first neural network architecture of the present application;

FIG. 2 is a second neural network architecture diagram of the present application;

FIG. 3 is a schematic diagram illustrating an effect of the edge computing resource allocation method based on priority and cooperation according to the present application.

Detailed Description

Hereinafter, specific embodiments of the present application will be described in detail with reference to the accompanying drawings, and it will be apparent to those skilled in the art from this detailed description that the present application can be practiced. Features from different embodiments may be combined to yield new embodiments, or certain features may be substituted for certain embodiments to yield yet further preferred embodiments, without departing from the principles of the present application.

Although the defects of factors such as network jitter and the like in the traditional cloud computing can not completely meet the requirements of 5G application and service, the abundant computing resources have certain advantages when processing computing-intensive tasks, and meanwhile, when the load of the edge computing nodes is high, the cloud computing nodes can share part of the load, so that edge cloud cooperation is realized, and the user requirements are met. The edge computing nodes and the cloud computing nodes need to be connected through a core network, the bandwidth of the edge computing nodes is relatively limited, and the edge computing nodes can be directly connected with each other in a certain area due to the close spatial distribution of the edge computing nodes, so that the bandwidth is relatively sufficient, and the edge computing nodes can be mutually matched to realize edge-edge cooperation and load balancing.

Different tasks typically require different priorities. For example, in a certain commercial city, a task request for photographing and identifying an object initiated by a common tourist should have a lower priority and can tolerate a longer time delay or even a task failure to a certain extent, while a task request for identifying a suspicious person or a behavior initiated by a security camera should have an extremely high priority and needs to be successfully processed within a shorter time delay.

Therefore, it is necessary to design an edge computing resource allocation method under a scenario in which task priority, edge-to-edge collaboration, and edge-to-cloud collaboration are considered.

With reference to fig. 1 to 3, the present application provides a method for allocating edge computing resources based on priority and collaboration, the method comprising:

Further, the state in 1) is an attribute of all tasks uploaded to the cloud computing node within and from the cluster, the action is used as a decision of destination server and computing resource allocation made for all tasks, and the reward is a contribution of each task to a utility function.

Further, the rewards include a delayed revenue item, a task failure penalty item, and an energy consumption penalty item.

Further, the first neural network in 2) is an h-network, the h-network includes a state perceptron and an h-actor network, and the state perceptron is configured to extract feature information in a state and input the feature information into the h-actor network.

Further, in the destination server decision process, the destination server decision of each task is regarded as different decision processes, each decision has M +1 actions, and the final output is (M +1) multiplied by N scalars, wherein N represents the number of input tasks which can be maximally handled by the neural network, so that N is more than or equal to N_kAnd M +1 is the number of compute nodes.

Further, the second neural network in 2) is an f network, the f network comprises a state perceptron, an f actor network and an f criticizing family network, and the state perceptron is used for extracting characteristic information in a state.

Further, the f actor receives the output of the state perceptron and then outputs the computing resources f allocated for each task_k＝[f_1,k,f_2,k,...,f_N,k](ii) a The f criticizing network receives the output of the state perceptron and the calculation resource allocation scheme and then outputs an action state cost function [ Q ] for the actions¹(s_k,f_k),Q²(s_k,f_k),...,Q^N(s_k,f_k)]Wherein s is_kIs the state defined in said 1), Q¹(s_k,f_k) Corresponds to f_1,kState cost function of, Q²(s_k,f_k) Corresponds to f_2,kThe state cost function of (c), and so on.

Further, in the step 3), the mean square error function is used as a Loss function when the first neural network is updated, and the mean square error function is used as a Loss function when the second neural network is updated.

Further, the first neural network updating process in 3) is as follows:

suppose task T_i,kIs inherited to the k +1 frame and is noted as T_m,k+1(in this time d_i,k0) or has been successfully completed or has failed due to a timeout (d is noted here)_i,k1), then

Q_h,targetand Q_h,policyRespectively representing the outputs of the h-target network and the h-network, s_kAnd s_k+1For the state of the environment in the k frame, s_i,kAnd s_m,k+1Respectively represent tasks T_i,kAnd T_m,k+1All attributes of, D_i,kFor task T_i,kA destination server of, and R_i,kRepresenting a task T_i,kThe prize awarded, γ, is a discount factor.

F actor and f critics of the second neural network will be updated separately;

and

respectively representing the ith output of the f critic network and the mth output of the target network corresponding to the f critics, pi_f,targetAn output representing a target network corresponding to the actor;

the updating method of the actor network and the state perceptron comprises the following steps: to be provided with

Updating the neural network as a Loss function, where_f,policyThe term "output" means "output of an actor" and the other terms are as defined above.

In the step 1), the process of defining the state, the action and the reward of the edge calculation model is as follows, taking the k-th frame as an example:

before defining the state, firstly, the attribute of the task is required to be acquired, and the task T is used for_i，kFor example, the required attributes are: amount of data to be transmitted

Number of CPU cycles to process

Remaining allowable delay

Maximum allowed time delay

Task priority l_i,kSource server

And destination server

Then the process of the first step is carried out,

state s_k: attributes of all tasks within and uploaded by the cluster to the cloud computing nodes. Namely, it is

Wherein

For each task attribute.

Action a_kThe decision of destination server and computing resource allocation made for all tasks. Namely, it is

Wherein a is_i,kDenoted as task T_i,kThe decision made, a_i,k＝[h_i,k,f_i,k]。

A destination server representing a processing task; f. of_i,kIndicating the computing resources that the destination server allocated for the task.

Prize r_kThe contribution of each task to the utility function. Namely, it is

Wherein R is_i,kAlso consists of three terms:

the delayed gain term:

task failure penalty item:

an energy consumption penalty term:

then the three terms are weighted and combined in the same way as the calculation utility function to obtain R_i，k：

Where α, η, and β are weighting coefficients associated with the edge computing environment.

In the above 2), the structure and input-output structure of the neural network are as follows

Note that the states, actions, and rewards defined in said 1) are all vectors, and the length of these three vectors is all equal to N_kIn this regard, the length thereof varies. The number of input nodes and output nodes of the used neural network is fixed, that is, the input dimension and the output dimension are fixed. Thus in the state s_kBefore inputting into the neural network, except normalization processing, zero filling expansion is also needed. While taking into account s_i,kD in (1)_i,kAnd R_i,kIs a server number, and does not indicate a relative size, and thus a one-hot code (D) is required to be encoded_i,kAnd R_i,k. For the action of the output of the neural network and the action state cost function, only meaningful N is taken_kAs a function of action and action state cost.

The schematic structure of the neural network is shown in fig. 1 and 2. In both figures it is assumed that there are a maximum of N tasks, N, within the scope of the study_kN is less than or equal to N. The leftmost side of the graph is the neural network input, the rightmost side is the neural network output, except for the smallest cube of the output layer, each cube represents a network structure formed by a plurality of network layers, and each smallest cube of the output layer represents a scalar.

Fig. 1 depicts a first neural network used for destination server decision making, which network is named h-network for convenience of expression. In this configuration, the two leftmost layers are state perceptrons (state per), which are used to determine the state of the deviceAnd the device is responsible for extracting characteristic information in the state. The characteristic information extracted by the state perceptron and the attribute information of a certain task are input to an h actor (vector), and the h actor network outputs a plurality of action state value functions (Q(s) corresponding to the task_k,s_i,k,h_i,k＝0),Q(s_k,s_i,k,h_i,k＝1),...,Q(s_k,s_i,k,h_i,k＝M)]Wherein Q(s)_k,s_i,k,h_i,k0) represents an action state cost function for offloading the task to compute node 0 (cloud server) processing, Q(s)_k,s_i,k,h_i,k1) represents the action state cost function that is handled to offload tasks to compute node 1 (edge server numbered 1), and so on. In this algorithm, the destination server decisions for each task are treated as a different decision process, so there are M +1 possible actions for each decision, so the final output is (M +1) × N scalars.

FIG. 2 depicts a second neural network used for the computational resource allocation problem, which is named f-network for convenience of expression. The first layers of the input layer of the structure are the same as the first layers of the h network in structure and are all state sensors. The two network blocks to the right of the state perceptron are named f actor (f actor) and f criticist (f critic). The actor receives the output of the state perceptron and then outputs the computing resources f allocated for each task_k＝[f_1,k,f_2,k,...,f_N,k]In which meaningless terms can be ignored, taking only the meaningful N thereof_kAn item. f criticizing the network receives the output of the state perceptron and the calculation resource allocation scheme, and then outputs the action state value function [ Q ] aiming at the actions¹(s_k,f_k),Q²(s_k,f_k),...,Q^N(s_k,f_k)]. Will f is_kEach of which is considered as an action made for the corresponding task, so that the output of the f critics also has N dimensions, Qⁱ(s_k,f_k) Just correspond to f_i,kThe state cost function of. With this structure, it can be understood that the function of the f criticist is similar to that of the h actor, and the input is performedThe action state cost function is generated, and the function of the actor is to seek the action that maximizes the action state cost function.

The training method and the application process in the step 3) are as follows:

target networks with the same structure are set for each network, namely an h target network with the same structure as the h network and an f target network with the same structure as the f network. In addition, the status, actions, rewards, next actions and whether the task was successfully processed or failed due to timeout for each step are stored in the memory bank using an empirical replay technique. In the interaction process of the agent and the edge computing environment, a concept of an ensemble (epicode) is also defined, each L frame is defined as one ensemble, and the update of the neural network is also carried out after each ensemble, but not after each frame.

Destination server decision process and h-network update algorithm

For the sake of simplicity, the h network and the h target network are distinguished by subscripts, i.e. the h network is marked as Q_h,policyH target network is Q_h,target。

And at the beginning of each frame, the h network acquires the current state information and outputs an action state cost function of each action. But only new task T_i,kRequiring the use of output corresponding thereto to make destination server decisions, i.e.

In the training process, in order to ensure that the intelligent agent can fully explore the environment, only a certain probability of 1-epsilon_kTake the action with ∈_kTake a random action. Namely, it is

The exploration of the environment by the agent should decrease as the number of iteration rounds increases, so belongs to_kWill decrease as the number of iteration rounds increases.

Unlike a typical reinforcement learning environment, the environment is a hybridActions other than per frame decisions may be performed. In this environment, once the destination server is determined, the destination server decisions for the next several frames are meaningless and cannot be performed, which makes it impossible to iteratively update the h-network directly using the bellman optimal equation. Therefore, the update method of the h-network needs to be adjusted. Specifically, assume task T_i,kIs inherited to the k +1 frame and is noted as T_m,k+1Then h network is updated as follows:

this equation may be understood as meaning that for an existing task, in whichever frame, its destination server decision action is only D_i,kThis option; if task T_i,kAfter the kth frame has been successfully completed or failed due to timeout, it is updated as follows:

by combining the above two equations, the update process can be merged into:

wherein d is_i,kRepresents the task T_i,kIf the task is successfully completed or fails due to timeout, d _i,k1 is ═ 1; otherwise, the task is not completed and will be inherited to the (k +1) th frame, then d _i,k0. It can thus also be seen that the transitions stored in the memory banks do not only require s_k,a_k,r_k,s_k+1And also stores the completion vector

The mean square error function is taken as the Loss function when updating, i.e.

Wherein

In the formula, h network is iteratively updated in each set, and h target network updates itself directly by using h network parameters after h network updates C round (C is a constant). If using theta^h,policyParameter, theta, representing h-network^h,targetThe parameter of the h target network is expressed, and every C round, the order theta^h,target＝θ^h,policy

Computational resource allocation decision process and f network update algorithm

For convenience of presentation, the state perceptron and the f actor network are denoted as π_f,policyWith its corresponding target network denoted as pi_f,targetWhile the status sensor and the f criticizing network are denoted as Q_f,policyIts corresponding target network is denoted as Q_f,target。

At the beginning of each frame, the computational resources f allocated to each task are output by the actors based on the status_k＝[f_1,k,f_2,k,...,f_N,k]However, in order for the agent to fully explore the environment, a certain amount of noise must be added to the action during the training process, i.e. the agent is not aware of the environment, but only needs to make a certain amount of noise

f′_i,k＝clip(f_i,k+n,0,v_m) (12)

Wherein

For random noise, the noise standard deviation σ is the same as the destination server decision process_kAlso decreases as the number of iteration rounds increases; clip represents a clipping function, which is defined as

The function is used for ensuring that the action after adding the noise still satisfies f which is more than or equal to 0_i,k≤v_m. Wherein for a task still in the wired transmission phase that has not yet reached the destination server, it is forced to set the computing resources allocated to it to 0. If the action after adding noise is not satisfied

The actions need to be further processed:

Q_f,policyand Q_f,targetThe network update process is very similar to the h-network update process, assuming task T_i,kSuccessfully processed in the k frame or failed due to timeout, or inherited to the k +1 frame and noted as T_m,k+1Then the update process can be expressed as follows

The relationship between the f actor and the f critics' network is very similar to that between the generator and the evaluator in the generative confrontation network, and the goal of the f actor is to maximize the output of the f critics, i.e., its training process can be expressed as

As with h-network, the mean square error function is used as Q_f,policyLoss function of network:

in the formula

And pi_f,policyThen is directly used

L_fa(θ^f,policy)＝-E[Q_f,policy(s_k,π_f,policy(s_k))](19)

As a Loss function. It is noted that L_fcFor updating Q only_f,policyNot the entire f-network, L_faAlso only used to update pi_f,policyUpdating pi_f,policyWhen f critics' network is fixed.

Similarly, the f network is iteratively updated in each set, and the f target network updates itself directly with the parameters of the f network after the f network updates C rounds (C is a constant). However, unlike h-networks, updates to f-networks use soft updates, i.e., if θ is used^f,policyRepresenting a parameter of the f-network, theta^f,targetParameters representing the f target network, let θ every C rounds^f,target＝τθ^f,policy+(1-τ)θ^f,targetWherein τ is the update rate, and generally takes a smaller value.

In summary, the interaction process of the agent with the environment and the learning process of the agent are shown in algorithm 1.

When in application, according to the structure and the input and output method in the 2), a state action value function is obtained from the h network, an action corresponding to the maximum value is selected from the state action value function as a destination server decision, and a computing resource decision for each task is obtained from an f actor in the f network.

Examples

In this section, a specific edge calculation model is given, but it should be noted that this is only a specific embodiment of the present application, but the scope of the present application is not limited thereto, and those skilled in the art should understand that the present application includes, but is not limited to, the contents described in the drawings and the above detailed description. Any modification which does not depart from the functional and structural principles of the present application is intended to be included within the scope of the claims.

The edge computing network model is composed of three layers of structures, wherein the three layers are respectively as follows from top to bottom: a cloud computing node layer, an edge computing server cluster layer, and an IoT device layer (user layer). The cloud settlement node layer comprises a cloud computing node which has relatively more computing resources. The edge computing server cluster layer comprises a plurality of edge computing server clusters, each server cluster comprises a plurality of edge computing servers, a schematic diagram of three edge computing server clusters is drawn in the diagram, each server cluster comprises three edge computing servers, and each edge computing server is placed beside one wireless access point (such as a base station, a wireless router and the like), so that the transmission delay from the wireless access point to the edge computing servers can be ignored. The edge calculation server cluster is divided according to position distribution, that is, a plurality of edge calculation servers in one edge calculation server cluster are relatively close in spatial position, so that they are directly connected with each other by optical fibers and the like. Different edge computing server clusters are connected to the core network together and connected to the cloud computing nodes through the core network. The IoT device layer contains several IoT devices, each connected with one wireless access point through a wireless link.

Consider edge-edge collaboration within a cluster of servers and collaboration of edge computing servers with cloud computing nodes within the cluster. Recording a set formed by cloud nodes and edge computing servers in a certain cluster as a set

Where number 0 represents a cloud computing node,

number

1, 2. The computational resource capacity of these nodes may be denoted V ═ { V ═ V₀,v₁,v₂,...,v_MIn which v is_mRepresenting nodes

The computing resources of the cloud node are relatively sufficient, i.e., v₀＞v_m(m.noteq.0). In this model, the computational resources are expressed in terms of the number of CPU cycles per unit time. Is connected to the clusterThe IoT device of the wireless access point at the edge computing server side can be represented as

In a network, users are involved in the transfer of data in the process of offloading tasks. In the wireless transmission process, the bandwidth is B^wireless(ii) a During wired transmission, data is assumed to be in the node

Is transmitted between, then its bandwidth is

Because the edge computing nodes in the same cluster are distributed close to each other in space and are directly connected by using optical fibers and the like, the bandwidth, the data transmission rate and the like between the edge nodes are relatively high; the distance between the edge computing node and the cloud computing node is longer, and the edge computing node and the cloud computing node need to be connected through a core network, so that the transmission delay is relatively higher, the network bandwidth is relatively narrower, and the network bandwidth is relatively shorter, so that the network bandwidth is relatively longer

Due to the limitation of the battery life and the limitation of the computing capacity of the internet of things equipment, the internet of things equipment needs to borrow computing resources of the edge computing nodes and the cloud computing nodes to process tasks which are continuously generated by the internet of things equipment. The task is assumed to be atomic, that is, the task can only be processed at one edge computing node or at a cloud node, and cannot be processed after being divided. Thus, a task will first be offloaded to an edge computing server near its corresponding wireless access point, which is referred to as the origin server for the task, which may have three further choices for the task:

a) the source server processes the task by itself;

b) the task is further transmitted to other edge computing servers in the same cluster for processing;

c) and further unloading the task to the cloud computing node for processing.

The server that ultimately processes the task is referred to as the destination server for the task.

A task may be abstracted into several key attributes. Assume that at the k-th frame, there is a total of N tasks uploaded to the cloud computing nodes within and from the cluster_kWill be described as

Some of the tasks are tasks which are not completed or failed in the k-1 th frame, and the tasks are inherited by the k-1 th frame, so that the tasks are called existing tasks; still other tasks are just arriving at the edge compute server at the beginning of this frame and are yet to be further offloaded or processed, which are referred to as new tasks. The tasks own some attributes related to transmission, processing and priority, namely the task T_i,kFor example, its own attributes can be expressed as:

a) amount of data to be transmitted

After the task reaches the source server, the task is processed by the destination server, and in the process, the source server needs to transmit data including user input data, code data and the like to the destination server. If the destination server and the source server are the same server, then

If it is a different server, then

The amount of data that needs to be transferred remains from the source server to the destination server, for a new task,

b) number of CPU cycles to process

The task itself requires a certain amount of computation,the amount of this calculation is expressed in number of CPU cycles. For new tasks and tasks that have not yet reached the destination server,

c) remaining allowable delay

The task is transmitted and processed with a certain time consumption, and the remaining allowable time delay is reduced. The desired task can be in

The task is completed within time, but it is also possible that the task is completed over time, when the task processing times out,

possibly negative. For new tasks

Wherein

In order to maximize the allowed time delay,

the wireless transmission process is time consuming.

d) Task priority l_i,k. The priority is an integer, and 1 represents the lowest priority.

The task has network-related attributes, i.e. origin server, in addition to its own attributes

And destination server

At the k frame, the utility function will contain a delay gain term

Task failure penalty item

And calculating node energy consumption penalty item

Three parts:

the delayed gain item is the gain obtained when the task in the current frame is completed, and is relative to the delayed gain of the task

And task priority l_i,kIs proportional, i.e.

Wherein 1 is_A(x) For indicating the function, it is used to indicate whether the element x is in the set A, and its expression is

While

Indicating a set of successfully completed tasks processed in the kth frame.

The task failure penalty item is the penalty suffered by the current intra-frame task processing overtime and failure, and the task failure penalty item is eliminated when the environment is eliminated, and is in direct proportion to the task priority, and the higher the priority of the task is, the greater the penalty is obtained after the task processing overtime and failure, namely

Wherein

Indicating a set of tasks that failed processing at the kth frame.

The penalty term of the energy consumption of the computing nodes is the sum of the energy consumption of each computing node

Where (x) is defined as a binary function, the expression for which is:

kappa is the power coefficient and has the unit of W.Hz^-3；

Representing the power of server m.

The utility function being a weighted combination of the three above, i.e. utility function

Where α, β and η are weighting coefficients, care should be taken to satisfy η > α, i.e. to ensure successful processing of a task over time

The negative gain obtained is greater than the penalty obtained when the task fails over time.

According to said 1), in such an environment, the states, actions and rewards in the edge calculation model are first defined, taking the k-th frame as an example:

Wherein

For each task attribute.

Action a_kPurpose made for all tasksAnd the location server and the computing resource allocation. Namely, it is

Prize r_kThe contribution of each task to the utility function. Namely, it is

Wherein R is_i,kAlso consists of three terms:

the delayed gain term:

task failure penalty item:

an energy consumption penalty term:

Where α, β and η are defined identically to the corresponding variables in the utility function.

After defining the states, actions and rewards, the structure of the neural network and the input-output structure are defined as follows:

note that the states, actions, and rewards defined in 1) are all vectors, and the lengths of these three vectors are all equal to N_kIn this regard, the length thereof varies. The number of input nodes and output nodes of the used neural network is fixed, that is, the input dimension and the output dimension are fixed. Thus in the state s_kBefore inputting into the neural network, except normalization processing, zero filling expansion is also needed. While taking into account s_i,kD in (1)_i,kAnd R_i,kIs a server number, and does not indicate a relative size, and thus a one-hot code (D) is required to be encoded_i,kAnd R_i,k. For the action of the output of the neural network and the action state cost function, only meaningful N is taken_kAs a function of action and action state cost.

Fig. 1 depicts a first neural network used for destination server decision making, which network is named h-network for convenience of expression. In this structure, the two leftmost layers are state perceptrons (state perceptrons) which are responsible for extracting feature information in the state. The characteristic information extracted by the state perceptron and the attribute information of a certain task are input to an h actor (vector), and the h actor outputs a plurality of action state value functions (Q(s) corresponding to the task_k,s_i,k,h_i,k＝0),Q(s_k,s_i,k,h_i,k＝1),…,Q(s_k,s_i,k,h_i,k＝M)]Wherein Q(s)_k,s_i,k,h_i,k0) represents an action state cost function for offloading the task to compute node 0 (cloud server) processing, Q(s)_k,s_i,k,h_i,k1) represents an action of offloading a task to a processing node 1 (edge server numbered 1)As a function of state cost, and so on. In this algorithm, the destination server decisions for each task are treated as a different decision process, so there are M +1 possible actions for each decision, so the final output is (M +1) × N scalars.

FIG. 2 depicts a second neural network used for the computational resource allocation problem, which is named f-network for convenience of expression. The first layers of the input layer of the structure are the same as the first layers of the h network in structure and are all state sensors. The two network blocks to the right of the state perceptron are named f actor (f actor) and f criticist (f critic). The actor receives the output of the state perceptron and then outputs the computing resources f allocated for each task_k＝[f_1,k,f_2,k,…,f_N,k]In which meaningless terms can be ignored, taking only the meaningful N thereof_kAn item. f criticizing the network receives the output of the state perceptron and the calculation resource allocation scheme, and then outputs the action state value function [ Q ] aiming at the actions¹(s_k,f_k),Q²(s_k,f_k),…,Q^N(s_k,f_k)]. Will f is_kEach of which is considered as an action made for the corresponding task, so that the output of the f critics also has N dimensions, Qⁱ(s_k,f_k) Just correspond to f_i,kThe state cost function of. With this structure, it can be understood that the function of the f criticist is similar to that of the h actor, and outputs the action state cost function, and the function of the f actor seeks to maximize the action state cost function.

Finally, the training method and the application process are given as follows:

Destination server decision process and h-network update algorithm

Unlike a typical reinforcement learning environment, not every frame of decision-making action may be performed in this environment. In this environment, once the destination server is determined, the destination server decisions for the next several frames are meaningless and cannot be performed, which makes it impossible to iteratively update the h-network directly using the bellman optimal equation. Therefore, the update method of the h-network needs to be adjusted. Specifically, assume task T_i,kIs inherited to the k +1 frame and is noted as T_m,k+1Then h network is updated as follows:

this formula can be understood as being forTasking, with only D being the destination server decision-making action, regardless of which frame it is in_i,kThis option; if task T_i,kAfter the kth frame has been successfully completed or failed due to timeout, it is updated as follows:

by combining the above two equations, the update process can be merged into:

Wherein

f′_i,k＝clip(f_i,k+n,0,v_m) (37)

Wherein

The actions need to be further processed:

in the formula

And pi_f,policyThen is directly used

L_fa(θ^f,policy)＝-E[Q_f,policy(s_k,π_f,policy(s_k))](44)

Similarly, the f network is iteratively updated in each set, and the f target network updates itself directly with the parameters of the f network after the f network updates C rounds (C is a constant). However, unlike h-networks, updates to f-networks use soft updates, i.e., if θ is used^f,policyParameters representing f network，θ^f,targetParameters representing the f target network, let θ every C rounds^f,target＝τθ^f,policy+(1-τ)θ^f,targetWherein τ is the update rate, and generally takes a smaller value.

With 500 frames as a set, the figure 3 is made with the sum of the utility functions of all frames in each set as the reward (reward) as the index. The "no collaboration function benchmark" is a curve corresponding to the algorithm provided by the present application without considering collaboration, the "disposed" is a curve corresponding to the algorithm provided by the present application, the abscissa represents the training process, and the ordinate represents the benefit of each set, i.e., the sum of the utility functions of the frames in each set. It can be seen that after a period of training, the algorithm is stable, with approximately three times more gain than the no-collaboration scheme.

Although the present application has been described above with reference to specific embodiments, those skilled in the art will recognize that many changes may be made in the configuration and details of the present application within the principles and scope of the present application. The scope of protection of the application is determined by the appended claims, and all changes that come within the meaning and range of equivalency of the technical features are intended to be embraced therein.

Claims

1. A method for distributing edge computing resources based on priority and cooperation is characterized in that: the method comprises the following steps:

2. The method of claim 1, wherein the edge computing resource allocation based on priority and cooperation is: the state in 1) is an attribute of all tasks uploaded to the cloud computing node within and by the cluster, the action is a decision of destination server and computing resource allocation made by all tasks, and the reward is a contribution of each task to a utility function.

3. The method of claim 2, wherein the edge computing resource allocation based on priority and cooperation is: the rewards include a delayed revenue item, a task failure penalty item, and an energy consumption penalty item.

4. The method of claim 1, wherein the edge computing resource allocation based on priority and cooperation is: the first neural network in the step 2) is an h network, the h network comprises a state perceptron and an h actor network, and the state perceptron is used for extracting characteristic information in a state and inputting the characteristic information into the h actor network.

5. The method of claim 4, wherein the edge computing resource allocation based on priority and cooperation is: the destination server decision processThe destination server decision of each task is regarded as different decision processes, each decision has M +1 actions, and the final output is (M +1) multiplied by N scalars, wherein N represents the input task number which can be handled by the neural network at most, so that N is more than or equal to N_kAnd M +1 is the number of compute nodes.

6. The method of claim 1, wherein the edge computing resource allocation based on priority and cooperation is: the second neural network in the step 2) is an f network, the f network comprises a state perceptron, an f actor network and an f criticizing network, and the state perceptron is used for extracting characteristic information in a state.

7. The method of claim 6, wherein the edge computing resource allocation based on priority and cooperation is: the actor receives the output of the state perceptron and then outputs the computational resources f allocated for each task_k＝[f_1,k,f_2,k,...,f_N,k](ii) a The f criticizing network receives the output of the state perceptron and the calculation resource allocation scheme and then outputs an action state cost function [ Q ] for the actions¹(s_k,f_k),Q²(s_k,f_k),...,Q^N(s_k,f_k)]Wherein s is_kIs the state defined in said 1), Q¹(s_k,f_k) Corresponds to f_1,kState cost function of, Q²(s_k,f_k) Corresponds to f_2,kThe state cost function of (c), and so on.

8. The method of claim 7, wherein the edge computing resource allocation based on priority and cooperation is: and 3) when the first neural network is updated, the mean square error function is taken as a Loss function, and when the second neural network is updated, the mean square error function is taken as the Loss function.

9. The method of claim 8, wherein the edge computing resource allocation based on priority and cooperation is: the first neural network updating process in the step 3) is as follows:

to be provided with

Q_h,targetand Q_h,policyRespectively representing the outputs of the h-target network and the h-network, s_kAnd s_k+1For the state of the environment in the k frame, s_i,kAnd s_m,k+1Respectively represent tasks T_i,kAnd T_m,k+1All attributes of, D_i,kFor task T_i,kA destination server of, and R_i,kRepresenting a task T_i,kThe prize won, γ is the discount factor;

f actor and f critics of the second neural network will be updated separately;

and

the updating method of the actor network and the state perceptron comprises the following steps: with L_fa(θ^f,policy)＝-E[Q_f,policy(s_k,π_f,policy(s_k))]Updating the neural network as a Loss function, where_f,policyIndicates the output of the actor.