CN111813539B - Priority and collaboration-based edge computing resource allocation method - Google Patents

Priority and collaboration-based edge computing resource allocation method

Info

Publication number
CN111813539B
CN111813539B (application CN202010473969.6A)
Authority
CN
China
Prior art keywords
network
task
neural network
state
policy
Prior art date
Legal status
Active
Application number
CN202010473969.6A
Other languages
Chinese (zh)
Other versions
CN111813539A (en)
Inventor
袁新杰
杜清河
Current Assignee
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202010473969.6A priority Critical patent/CN111813539B/en
Publication of CN111813539A publication Critical patent/CN111813539A/en
Application granted granted Critical
Publication of CN111813539B publication Critical patent/CN111813539B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The application belongs to the technical field of resource allocation strategies, and particularly relates to an edge computing resource allocation method based on priority and cooperation. The computing resources of edge computing servers and cloud servers can be represented by the number of CPU cycles per unit time and can be allocated to the tasks running on the servers according to different allocation schemes, but existing schemes find it difficult to guarantee long-term benefit maximization. The application provides a priority and collaboration-based edge computing resource allocation method, which comprises the following steps: 1) defining the states, actions and rewards of the edge computing model; 2) defining the structure of the neural networks and the structure of their inputs and outputs; 3) training, updating and applying the neural networks according to a given training method. The computing resources of the edge computing server and the cloud server, namely the number of CPU cycles per unit time, are allocated reasonably, so that long-term benefits related to relative delay and server energy consumption are improved.

Description

Priority and collaboration-based edge computing resource allocation method
Technical Field
The application belongs to the technical field of resource allocation strategies, and particularly relates to an edge computing resource allocation method based on priority and cooperation.
Background
Mobile user devices usually have limited computing resources because of their small size, and cannot perform long-duration, large-scale computation because of constraints such as device energy consumption and battery capacity. Therefore, for some computation-intensive tasks, if a user relies only on local computing resources, it is difficult to meet the low-latency requirement of the task, and problems such as shortened standby time and overheating of the device may also arise. The mobile user therefore needs to resort to external computing resources, which in existing networks are usually provided by cloud computing nodes, also called cloud nodes or cloud servers. However, with the growth of Internet of Things devices and the development of 5G, cloud computing alone is increasingly insufficient, and edge computing has emerged to supplement it. Edge computing places computing resources at the network edge in order to reduce core-network bandwidth occupation, shorten latency, and achieve related goals.
In the traditional cloud computing mode, a user uploads a computation-intensive task to a cloud server for processing through a core network, and although the computing resources of the cloud server are sufficient, the computation can be completed in a short time, but the transmission delay is larger due to factors such as limited bandwidth of the core network, network jitter and the like. To reduce transmission latency, mobile edge computing technologies deploy computing resources near the network edge of a user, such as at a wireless router or base station. Therefore, only one-hop connection exists between the edge computing server and the user, and the data of the user does not need to be uploaded to the cloud computing server for processing through the core network, so that the data has lower transmission delay. However, compared to cloud computing servers, computing resources of edge computing servers are relatively limited, so how to efficiently allocate and utilize computing resources becomes one of the challenges in mobile edge computing technology. The edge computing environment is modeled as a Markov decision process, and the task success rate and long-term benefits are optimized using a deep reinforcement learning method, taking the complexity of the model into account.
The computing resources of the edge computing server and the cloud server can be represented by the number of CPU cycles per unit time and can be allocated to the tasks running on the servers according to different allocation schemes, but it is difficult to guarantee long-term benefit maximization. The benefit is mainly related to the relative delay and the server energy consumption.
Disclosure of Invention
1. Technical problem to be solved
Based on the problem that the computing resources of the edge computing server and the cloud server, represented by the number of CPU cycles per unit time, can be allocated to the tasks running on the servers according to different allocation schemes but long-term benefit maximization is difficult to guarantee, the application provides an edge computing resource allocation method based on priority and cooperation.
2. Technical solution
To achieve the above object, in the present application, the set formed by the edge computing servers in an edge computing server cluster and the cloud server may be expressed as M = {0, 1, 2, ..., M}, where number 0 represents the cloud computing node and numbers 1, 2, ..., M represent the edge computing servers. Their computing resource capacity may be represented as V = {v_0, v_1, v_2, ..., v_M}, where v_m represents the computing resource capacity of node m. After an edge user offloads a task to an edge computing server, that edge computing server is called the source server of the task; the source server may further decide to process the task itself, or offload the task to the cloud server or to other edge computing servers in the same cluster, and the server that processes the task is called the destination server.
In this context, the method of the application comprises:
1): designing a state with priority and network node attributes in consideration of edge-to-edge coordination, edge-cloud coordination and the task's own priority, designing actions including destination server decisions and computing resource allocation decisions, and rewards for tasks;
2): designing a first neural network structure for the destination server decision, and a second neural network structure for the computing resource allocation decision for the state, action, and reward defined in 1);
3): according to a given algorithm, training, updating the first neural network and the second neural network in the process of interaction between the intelligent agent and the edge computing environment, and applying after training is finished.
The edge computing environment is divided in time into frames, each frame having a time length t_frame. Assuming that there are a total of N_k tasks within the cluster and uploaded by the cluster to the cloud computing node in the kth frame, they are noted as T_k = {T_{1,k}, T_{2,k}, ..., T_{N_k,k}}, where the subscript k denotes the kth frame.
Another embodiment provided by the application is: the state in 1) is an attribute of all tasks within and uploaded by the cluster to the cloud computing node, the action makes a decision of destination server and computing resource allocation for all tasks, and the reward is a contribution of each task to the utility function.
Another embodiment provided by the application is: the rewards include a delayed revenue term, a task failure penalty term, and an energy consumption penalty term.
Another embodiment provided by the application is: the first neural network in the step 2) is an h network, the h network comprises a state sensor and an h actor network, and the state sensor is used for extracting characteristic information in a state and inputting the characteristic information into the h actor network.
Another embodiment provided by the application is: in the destination server decision process, the destination server decision of each task is treated as a separate decision process, each decision has M+1 actions, and the final output is (M+1)×N scalars, where N represents the maximum number of input tasks the neural network can handle, so N ≥ N_k, and M+1 is the number of computing nodes.
Another embodiment provided by the application is: the second neural network in 2) is an f network, where the f network comprises a state sensor, an f actor network and an f critic network, and the state sensor is used for extracting feature information from the state.
Another embodiment provided by the application is: the f actor receives the output of the state sensor and then outputs the computing resources f_k = [f_{1,k}, f_{2,k}, ..., f_{N,k}] allocated to each task; among the N outputs, only the N_k values corresponding to the task set T_k are used, and these N_k values represent the number of CPU cycles per unit time allocated to the corresponding tasks. The f critic network receives the output of the state sensor and the computing resource allocation scheme, and then outputs the action state cost functions [Q_1(s_k, f_k), Q_2(s_k, f_k), ..., Q_N(s_k, f_k)] for these actions, where s_k is the state defined in 1), Q_1(s_k, f_k) is the action state cost function corresponding to f_{1,k}, Q_2(s_k, f_k) is the one corresponding to f_{2,k}, and so on.
Another embodiment provided by the application is: the first neural network in the 3) uses a mean square error function as a Loss function when updated, and the second neural network uses a mean square error function as a Loss function when updated.
The updating method of the first neural network (h network) in 3) is as follows: assuming that task T_{i,k} is inherited to the (k+1)th frame and noted as T_{m,k+1} (note d_{i,k} = 0 at this time), or has been completed successfully or failed by a timeout (note d_{i,k} = 1 at this time), the neural network is updated with L_h(θ_{h,policy}) = E[(y_{i,k} - Q_{h,policy}(s_k, s_{i,k}, D_{i,k}))^2] as the Loss function, where y_{i,k} = R_{i,k} + γ(1 - d_{i,k})·max_h Q_{h,target}(s_{k+1}, s_{m,k+1}, h), θ_{h,policy} represents the parameters of the h network, Q_{h,target} and Q_{h,policy} represent the outputs of the h target network and the h network, respectively, s_k and s_{k+1} are the states of the environment in the kth and (k+1)th frames, s_{i,k} and s_{m,k+1} represent all the attributes of tasks T_{i,k} and T_{m,k+1}, respectively, D_{i,k} is the destination server of task T_{i,k}, R_{i,k} represents the reward obtained by task T_{i,k}, and γ is the discount factor.
The f actor network and the f critic network of the second neural network in 3) are updated separately.
The updating method of the f critic network and the state sensor is as follows: assuming that task T_{i,k} is inherited to the (k+1)th frame and noted as T_{m,k+1} (note d_{i,k} = 0 at this time), or has been completed successfully or failed by a timeout (note d_{i,k} = 1 at this time), the neural network is updated with L_{fc}(θ_{f,policy}) = E[(y_{i,k} - Q^i_{f,policy}(s_k, f_k))^2] as the Loss function, where y_{i,k} = R_{i,k} + γ(1 - d_{i,k})·Q^m_{f,target}(s_{k+1}, π_{f,target}(s_{k+1})), θ_{f,policy} represents the parameters of the f critic and the state sensor, s_k and s_{k+1} are the states of the environment in the kth and (k+1)th frames, f_k represents the computing resource allocation decision of the kth frame, Q^i_{f,policy} and Q^m_{f,target} respectively represent the ith output of the f critic network and the mth output of the target network corresponding to the f critic, and π_{f,target} represents the output of the target network corresponding to the f actor.
The updating method of the f actor network and the state sensor is as follows: L_{fa}(π_{f,policy}) = -E[Q_{f,policy}(s_k, π_{f,policy}(s_k))] is used as the Loss function, where π_{f,policy} represents the output of the f actor, and the remaining quantities are as above.
3. Advantageous effects
Compared with the prior art, the edge computing resource allocation method based on priority and cooperation has the beneficial effects that:
According to the priority and collaboration-based edge computing resource allocation method provided by the application, effective sensing and decision-making can be performed on an edge computing environment through the state, action and reward definition method, the neural network structures, the neural network input and output structures, and the training and application methods, and long-term benefit maximization is realized through edge-to-edge collaboration, edge-cloud collaboration and load balancing.
According to the priority and collaboration-based edge computing resource allocation method provided by the application, after the environmental state is decoupled into the state of each task, the state is input into the specially designed neural network, and the output and obtained rewards of the neural network also correspond to each task.
According to the priority and collaboration-based edge computing resource allocation method provided by the application, two sets of neural networks are used, the first neural network and the second neural network are used for respectively making a destination server decision and a computing resource allocation decision, and the long-term benefit maximization is achieved by fully utilizing the collaboration effect.
According to the priority and collaboration-based edge computing resource allocation method, computing resources in the edge computing server and the cloud server, particularly CPU cycles in unit time, are reasonably allocated, so that long-term benefits related to relative time delay and server energy consumption are improved.
Drawings
FIG. 1 is a schematic diagram of a first neural network architecture of the present application;
FIG. 2 is a schematic diagram of a second neural network architecture of the present application;
fig. 3 is an effect schematic diagram of the priority and collaboration-based edge computing resource allocation method of the present application.
Detailed Description
Hereinafter, specific embodiments of the present application will be described in detail with reference to the accompanying drawings, and according to these detailed descriptions, those skilled in the art can clearly understand the present application and can practice the present application. Features from various embodiments may be combined to obtain new implementations, or substituted for certain features from certain embodiments to obtain further preferred implementations, without departing from the principles of the application.
Although traditional cloud computing suffers from drawbacks such as network jitter and cannot completely meet the requirements of 5G applications and services, its abundant computing resources still have certain advantages when processing computation-intensive tasks; meanwhile, when the load of the edge computing nodes is high, the cloud computing node can share part of the load, thereby realizing edge-cloud cooperation and meeting user requirements. Edge computing nodes and cloud computing nodes must be connected through the core network, whose bandwidth is relatively limited; edge computing nodes within a certain area, being spatially close to one another, can be directly connected with relatively sufficient bandwidth, so the edge computing nodes can cooperate with each other to achieve edge-to-edge coordination and load balancing.
Different tasks typically require different priorities. For example, in a shopping mall, photo-recognition task requests initiated by ordinary visitors should have a lower priority and can tolerate longer latency and even, to a certain extent, task failure, whereas task requests for suspicious person or behavior identification initiated by security cameras should have an extremely high priority and need to be processed successfully within a shorter latency.
Therefore, there is a need to design an edge computing resource allocation method for scenarios in which task priority, edge-to-edge collaboration and edge-cloud collaboration are all considered.
Referring to fig. 1 to 3, the present application provides a priority and collaboration-based edge computing resource allocation method, which includes:
1): designing a state with priority and network node attributes in consideration of edge-to-edge coordination, edge-cloud coordination and the task's own priority, designing actions including destination server decisions and computing resource allocation decisions, and rewards for tasks;
2): designing a first neural network structure for the destination server decision, and a second neural network structure for the computing resource allocation decision for the state, action, and reward defined in 1);
3): according to a given algorithm, training, updating the first neural network and the second neural network in the process of interaction between the intelligent agent and the edge computing environment, and applying after training is finished.
Further, the state in 1) is an attribute of all tasks within and uploaded by the cluster to the cloud computing node, the action is a decision made for destination servers and computing resource allocation for all tasks, and the reward is a contribution of each task to the utility function.
Further, the rewards include a delayed revenue term, a task failure penalty term, and an energy consumption penalty term.
Further, the first neural network in 2) is an h network, and the h network includes a state sensor and an h actor network, where the state sensor is configured to extract feature information in a state and input the feature information into the h actor network.
Further, in the destination server decision process, the destination server decision of each task is treated as a separate decision process, each decision has M+1 actions, and the final output is (M+1)×N scalars, where N represents the maximum number of input tasks the neural network can handle, so N ≥ N_k, and M+1 is the number of computing nodes.
Further, the second neural network in 2) is an f network, where the f network comprises a state sensor, an f actor network and an f critic network, and the state sensor is used for extracting feature information from the state.
Further, the f actor receives the output of the state sensor and then outputs the computing resources f_k = [f_{1,k}, f_{2,k}, ..., f_{N,k}] allocated to each task; the f critic network receives the output of the state sensor and the computing resource allocation scheme, and then outputs the action state cost functions [Q_1(s_k, f_k), Q_2(s_k, f_k), ..., Q_N(s_k, f_k)] for these actions, where s_k is the state defined in 1), Q_1(s_k, f_k) is the action state cost function corresponding to f_{1,k}, Q_2(s_k, f_k) is the one corresponding to f_{2,k}, and so on.
Further, in the 3), the first neural network is updated with a mean square error function as a Loss function, and the second neural network is updated with a mean square error function as a Loss function.
Further, the first neural network updating process in 3) is as follows:
Assuming that task T_{i,k} is inherited to the (k+1)th frame and noted as T_{m,k+1} (note d_{i,k} = 0 at this time), or has been completed successfully or failed by a timeout (note d_{i,k} = 1 at this time), the neural network is updated with L_h(θ_{h,policy}) = E[(y_{i,k} - Q_{h,policy}(s_k, s_{i,k}, D_{i,k}))^2] as the Loss function, where y_{i,k} = R_{i,k} + γ(1 - d_{i,k})·max_h Q_{h,target}(s_{k+1}, s_{m,k+1}, h), θ_{h,policy} represents the parameters of the h network, Q_{h,target} and Q_{h,policy} represent the outputs of the h target network and the h network, respectively, s_k and s_{k+1} are the states of the environment in the kth and (k+1)th frames, s_{i,k} and s_{m,k+1} represent all the attributes of tasks T_{i,k} and T_{m,k+1}, respectively, D_{i,k} is the destination server of task T_{i,k}, R_{i,k} represents the reward obtained by task T_{i,k}, and γ is the discount factor.
The f actor network and the f critic network of the second neural network are updated separately;
The updating method of the f critic network and the state sensor is as follows: assume that task T_{i,k} is inherited to the (k+1)th frame and noted as T_{m,k+1} (note d_{i,k} = 0 at this time), or has been completed successfully or failed by a timeout (note d_{i,k} = 1 at this time); the neural network is updated with L_{fc}(θ_{f,policy}) = E[(y_{i,k} - Q^i_{f,policy}(s_k, f_k))^2] as the Loss function, where y_{i,k} = R_{i,k} + γ(1 - d_{i,k})·Q^m_{f,target}(s_{k+1}, π_{f,target}(s_{k+1})), θ_{f,policy} represents the parameters of the f critic and the state sensor, s_k and s_{k+1} are the states of the environment in the kth and (k+1)th frames, f_k represents the computing resource allocation decision of the kth frame, Q^i_{f,policy} and Q^m_{f,target} respectively represent the ith output of the f critic network and the mth output of the target network corresponding to the f critic, and π_{f,target} represents the output of the target network corresponding to the f actor;
The updating method of the f actor network and the state sensor is as follows: the neural network is updated with L_{fa}(π_{f,policy}) = -E[Q_{f,policy}(s_k, π_{f,policy}(s_k))] as the Loss function, where π_{f,policy} represents the output of the f actor, and the remaining quantities are as above.
In 1), the state, action and reward definition process for the edge computing model is as follows, taking the kth frame as an example:
Before defining the state, the attributes of each task need to be acquired. Taking task T_{i,k} as an example, the required attributes are: the amount of data to be transmitted, the number of CPU cycles required for processing, the remaining allowable delay, the maximum allowable delay, the task priority l_{i,k}, the source server, and the destination server. These attributes together form the attribute vector s_{i,k} of task T_{i,k}.
State s_k: the attributes of all tasks within the cluster and uploaded by the cluster to the cloud computing node, i.e. s_k = [s_{1,k}, s_{2,k}, ..., s_{N_k,k}], where s_{i,k} is the attribute vector of task T_{i,k}.
Action a_k: the destination server and computing resource allocation decisions made for all tasks, i.e. a_k = [a_{1,k}, a_{2,k}, ..., a_{N_k,k}], where a_{i,k} = [h_{i,k}, f_{i,k}] represents the decision made for task T_{i,k}: h_{i,k} represents the destination server that processes the task, and f_{i,k} denotes the computing resources allocated by the destination server for the task.
Reward r_k: the contribution of each task to the utility function, i.e. r_k = [R_{1,k}, R_{2,k}, ..., R_{N_k,k}], where R_{i,k} is composed of three terms:
delay benefit term:
task failure penalty term:
energy consumption penalty term:
and then the three terms are weighted and combined in the same way as the utility function is calculated to obtain R i,k:
Where α, η and β are weighting coefficients related to the edge computing environment.
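For illustration, this per-task weighted combination might be sketched as follows; the weight values, the sign convention and the function name are assumptions, since the patent only states that the three terms are combined in the same way as the utility function:

```python
def task_reward(delay_benefit, failure_penalty, energy_penalty,
                alpha=1.0, eta=2.0, beta=0.1):
    """Illustrative weighted combination of the three reward terms for one task.

    alpha, eta, beta and the sign convention (benefit positive, penalties
    negative) are assumptions used only to make the weighting concrete.
    """
    return alpha * delay_benefit - eta * failure_penalty - beta * energy_penalty
```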
In the 2), the structure of the neural network and the input/output structure are as follows
Note that the states, actions, and rewards defined in 1) are vectors, and the lengths of these three vectors are all related to N_k, which varies from frame to frame. The numbers of input nodes and output nodes of the neural network are fixed, i.e., the input dimension and the output dimension are fixed. Therefore, before inputting the state s_k into the neural network, zero-padding expansion is needed in addition to normalization processing. Also, considering that D_{i,k} and R_{i,k} in s_{i,k} are server numbers and do not represent relative sizes, D_{i,k} and R_{i,k} need to be encoded with one-hot codes. For the actions and action state cost functions output by the neural network, only the meaningful N_k of them are taken as the actual actions and action state cost functions.
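The following sketch illustrates the zero-padding and one-hot encoding step described above; the attribute names, the dictionary layout of a task and the helper function are hypothetical, and the scalar attributes are assumed to be already normalized:

```python
import numpy as np

def preprocess_state(task_states, num_nodes, max_tasks):
    """Hypothetical preprocessing of the per-task attribute vectors s_{i,k}.

    Server numbers are one-hot encoded because they do not represent relative
    magnitudes, and the result is zero-padded to a fixed number of task slots
    (max_tasks >= N_k) so the neural network input dimension stays fixed.
    """
    def one_hot(idx, size):
        v = np.zeros(size)
        v[idx] = 1.0
        return v

    rows = []
    for t in task_states:
        scalars = np.array([t["data"], t["cycles"], t["delay"],
                            t["max_delay"], t["priority"]], dtype=float)
        rows.append(np.concatenate([scalars,
                                    one_hot(t["src"], num_nodes),
                                    one_hot(t["dst"], num_nodes)]))
    width = rows[0].shape[0] if rows else 5 + 2 * num_nodes
    padded = np.zeros((max_tasks, width))
    if rows:
        padded[:len(rows)] = np.stack(rows)
    return padded.flatten()
```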
In fig. 1 and 2, a schematic structural diagram of a neural network is shown. In both figures it is assumed that there are at most N tasks in the scope of the study, i.e. N k. Ltoreq.N. The leftmost side of the figure is the neural network input, the rightmost side is the neural network output, except the smallest cubes of the output layers, each cube represents a network structure formed by a plurality of network layers, and each smallest cube of the output layers represents a scalar.
Fig. 1 depicts a first neural network used for destination server decision, which is designated as an h-network for ease of description. In this structure, the leftmost two layers are state sensors (state perceptron) which are responsible for extracting the feature information in the state. The feature information extracted by the state sensor will be input to the h actor (hactor) together with the attribute information of a certain task, the h actor network will output a plurality of action state cost functions [Q(sk,si,k,hi,k=0),Q(sk,si,k,hi,k=1),...,Q(sk,si,k,hi,k=M)], corresponding to the task, wherein Q (s k,si,k,hi,k =0) represents the action state cost function processed by offloading the task to the computing node 0 (cloud server), Q (s k,si,k,hi,k =1) represents the action state cost function processed by offloading the task to the computing node 1 (edge server numbered 1), and so on. In this algorithm, the destination server decision for each task is treated as a different decision process, so each decision has m+1 possible actions, so the final output is (m+1) x N scalar quantities.
Fig. 2 depicts the second neural network, used for the computing resource allocation problem, which is designated as the f network for ease of description. The first layers on its input side have the same structure as the first layers of the h network; they are the state sensors. The two network blocks to the right of the state sensor are named the f actor and the f critic. The f actor receives the output of the state sensor and then outputs the computing resources f_k = [f_{1,k}, f_{2,k}, ..., f_{N,k}] allocated to each task, where the meaningless items can be ignored and only the N_k items of interest are taken. The f critic network receives the output of the state sensor and the computing resource allocation scheme, and then outputs the action state cost functions [Q_1(s_k, f_k), Q_2(s_k, f_k), ..., Q_N(s_k, f_k)] for these actions; each component of f_k is regarded as the action made by the corresponding task, so the output of the f critic also has N dimensions, and Q_i(s_k, f_k) corresponds to the action state cost function of f_{i,k}. With this configuration, the role of the f critic is similar to that of the h actor network in that it outputs action state cost functions, whereas the role of the f actor is to output the actions that maximize those action state cost functions.
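As a rough PyTorch sketch of the f network described above (layer widths, activation functions and class names are assumptions not specified in the patent), the shared state sensor, the f actor and the f critic might look like this:

```python
import torch
import torch.nn as nn

class StatePerceptron(nn.Module):
    """Shared feature extractor over the zero-padded state vector (sizes assumed)."""
    def __init__(self, state_dim, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                                 nn.Linear(256, feat_dim), nn.ReLU())
    def forward(self, s):
        return self.net(s)

class FActor(nn.Module):
    """Outputs the computing resources f_k = [f_1, ..., f_N] for the N task slots."""
    def __init__(self, feat_dim, max_tasks, v_max):
        super().__init__()
        self.v_max = v_max
        self.head = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                  nn.Linear(128, max_tasks), nn.Sigmoid())
    def forward(self, feat):
        return self.v_max * self.head(feat)   # keeps each output in [0, v_max]

class FCritic(nn.Module):
    """Outputs one action state cost value Q_i(s_k, f_k) per task slot."""
    def __init__(self, feat_dim, max_tasks):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(feat_dim + max_tasks, 128), nn.ReLU(),
                                  nn.Linear(128, max_tasks))
    def forward(self, feat, f_k):
        return self.head(torch.cat([feat, f_k], dim=-1))
```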
The training method and the application process in the 3) are as follows:
Each network is provided with a target network of the same structure, namely an h target network with the same structure as the h network and an f target network with the same structure as the f network. In addition, experience replay is used: the state, action, reward and next state of each step, together with a flag indicating whether each task was successfully processed or failed due to a timeout, are stored in a memory bank. In the interaction process between the agent and the edge computing environment, the concept of a set (episode) is also defined: every L frames form one set, and the neural networks are updated after each set ends, rather than after every frame.
Destination server decision making process and h-network updating algorithm
For simplicity of expression, the subscript is used to distinguish the h network from the h target network, i.e., the h network is denoted as Q h,policy, and the h target network is denoted as Q h,target.
At the beginning of each frame, the h network acquires the current state information and outputs the action state cost function of each action. However, only a new task T_{i,k} needs to make a destination server decision using its corresponding outputs, i.e. h_{i,k} = argmax_h Q_{h,policy}(s_k, s_{i,k}, h).
In the training process, in order to ensure that the agent can fully explore the environment, this action is only taken with probability 1 - ε_k, and a random action is taken with probability ε_k.
The agent's exploration of the environment should decrease as the number of iteration rounds increases, so ε_k decreases as the number of iteration rounds increases.
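A minimal sketch of this ε-greedy destination-server selection with a decaying ε_k is given below; the decay schedule and parameter values are assumptions, since the patent only requires that ε_k decreases with the number of iteration rounds:

```python
import random

def select_destination(q_values, num_nodes, episode,
                       eps_start=1.0, eps_end=0.05, decay=0.995):
    """Epsilon-greedy destination-server choice with a decaying exploration rate.

    q_values: the M+1 action state cost values output by the h network for one
    new task. The exponential decay schedule is an illustrative assumption.
    """
    eps_k = max(eps_end, eps_start * (decay ** episode))
    if random.random() < eps_k:
        return random.randrange(num_nodes)                      # explore
    return max(range(num_nodes), key=lambda h: q_values[h])     # exploit
```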
Unlike a typical reinforcement learning environment, not every action decided in a frame can actually be executed in this environment. Once the destination server of a task is determined, the destination server decisions for that task in subsequent frames are meaningless and cannot be executed, which makes it impossible to iteratively update the h network by directly applying the Bellman optimality equation. The update method of the h network therefore needs to be adjusted. Specifically, assuming task T_{i,k} is inherited to the (k+1)th frame and noted as T_{m,k+1}, the h network is updated towards the target y_{i,k} = R_{i,k} + γ·max_h Q_{h,target}(s_{k+1}, s_{m,k+1}, h).
This formula can be understood as follows: for an existing task, no matter in which frame, its destination server decision action is always D_{i,k}. If task T_{i,k} has been completed successfully in the kth frame or has failed due to a timeout, the target is simply y_{i,k} = R_{i,k}.
Combining the two cases, the update target can be written as y_{i,k} = R_{i,k} + γ(1 - d_{i,k})·max_h Q_{h,target}(s_{k+1}, s_{m,k+1}, h),
where d_{i,k} denotes whether task T_{i,k} has completed successfully or failed due to a timeout: if so, d_{i,k} = 1; otherwise the task is not completed, is inherited to the (k+1)th frame, and d_{i,k} = 0. It can also be seen that the transition stored in the memory bank requires not only s_k, a_k, r_k, s_{k+1}, but also the completion vector d_k = [d_{1,k}, d_{2,k}, ..., d_{N_k,k}].
The update is performed by taking the mean square error function as the Loss function, i.e. L_h(θ_{h,policy}) = E[(y_{i,k} - Q_{h,policy}(s_k, s_{i,k}, D_{i,k}))^2].
The h network is updated iteratively in each set, and the h target network updates itself directly with the parameters of the h network after the h network has been updated for C rounds (C is a constant): if θ_{h,policy} represents the parameters of the h network and θ_{h,target} the parameters of the h target network, then θ_{h,target} = θ_{h,policy}.
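The following sketch illustrates one h-network update step and the hard target-network update, under assumed batch-field names and network signatures; it is an illustration of the update rule above rather than the patent's exact implementation:

```python
import torch
import torch.nn.functional as F

def h_network_update(h_policy, h_target, optimizer, batch, gamma=0.99):
    """One illustrative h-network update step using the completion flag d_{i,k}.

    batch fields (names assumed): global state features, per-task features,
    chosen destination D_{i,k}, reward R_{i,k}, next-frame features of the
    inherited task, and d (1 if the task finished or timed out, else 0).
    """
    q_all = h_policy(batch["s_k"], batch["s_ik"])                 # shape [B, M+1]
    q_taken = q_all.gather(1, batch["D_ik"].unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = h_target(batch["s_k1"], batch["s_mk1"]).max(dim=1).values
        y = batch["R_ik"] + gamma * (1.0 - batch["d"]) * q_next
    loss = F.mse_loss(q_taken, y)          # mean square error Loss function
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def hard_update(target_net, policy_net):
    """Every C rounds the h target network copies the h network parameters."""
    target_net.load_state_dict(policy_net.state_dict())
```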
Computing resource allocation decision process and f network update algorithm
For convenience of description, the state sensor and the f actor network are jointly denoted π_{f,policy}, their corresponding target networks are denoted π_{f,target}, and the state sensor and the f critic network are jointly denoted Q_{f,policy}, with their corresponding target networks denoted Q_{f,target}.
At the beginning of each frame, the f actor outputs the computing resources f_k = [f_{1,k}, f_{2,k}, ..., f_{N,k}] allocated to each task according to the state. However, in order for the agent to fully explore the environment, a certain amount of noise needs to be added to the action during training, namely
f'_{i,k} = clip(f_{i,k} + n, 0, v_m),
where n is random noise with standard deviation σ_k; as in the destination server decision process, σ_k decreases as the number of iterations increases. clip denotes a clipping function, defined as clip(x, a, b) = min(max(x, a), b).
This function is used to ensure that the action after noise addition still satisfies 0 ≤ f'_{i,k} ≤ v_m. For tasks that are still in the wired transmission phase and have not yet reached the destination server, the computing resources allocated to them are forcibly set to 0. If the actions after adding noise still violate the computing resource constraints of the nodes (for example, the total resources allocated on one node exceeding its capacity), the actions need to be further processed.
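A small sketch of this exploration-noise step is shown below; Gaussian noise is an assumption (the patent only specifies a standard deviation σ_k that decreases during training), and the mask name is hypothetical:

```python
import numpy as np

def noisy_allocation(f_k, sigma_k, v_m, in_wired_transmission):
    """Adds exploration noise to the f actor output and clips to [0, v_m].

    in_wired_transmission: boolean mask for tasks that have not yet reached
    their destination server; their allocation is forced to zero.
    """
    noise = np.random.normal(0.0, sigma_k, size=f_k.shape)
    f_noisy = np.clip(f_k + noise, 0.0, v_m)
    f_noisy[in_wired_transmission] = 0.0
    return f_noisy
```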
The update procedure for the Q_{f,policy} and Q_{f,target} networks is very similar to that of the h network. Assuming that task T_{i,k} is successfully processed to completion or fails due to a timeout in the kth frame, or is inherited to the (k+1)th frame and noted as T_{m,k+1}, the update target can be expressed as y_{i,k} = R_{i,k} + γ(1 - d_{i,k})·Q^m_{f,target}(s_{k+1}, π_{f,target}(s_{k+1})).
The relationship between the f actor and the f critic network is very similar to the relationship between the generator and the discriminator in a generative adversarial network: the goal of the f actor is to maximize the output of the f critic, that is, its training objective can be expressed as maximizing E[Q_{f,policy}(s_k, π_{f,policy}(s_k))].
Like the h network, the mean square error function is used as the Loss function of the Q_{f,policy} network: L_{fc}(θ_{f,policy}) = E[(y_{i,k} - Q^i_{f,policy}(s_k, f_k))^2], where y_{i,k} is the update target defined above.
For π_{f,policy}, L_{fa}(π_{f,policy}) = -E[Q_{f,policy}(s_k, π_{f,policy}(s_k))] is used directly as the Loss function. Note that L_{fc} is only used to update Q_{f,policy} rather than the entire f network, and L_{fa} is likewise only used to update π_{f,policy}; the f critic network is held fixed while π_{f,policy} is updated.
Similarly, the f network is updated iteratively in each set, and the f target network updates itself with the parameters of the f network after the f network has been updated for C rounds (C is a constant). However, unlike the h network, a soft update is used: if θ_{f,policy} represents the parameters of the f network and θ_{f,target} the parameters of the f target network, then every C rounds θ_{f,target} = τ·θ_{f,policy} + (1 - τ)·θ_{f,target}, where τ is the update rate and typically takes a small value.
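The soft update of the f target network can be sketched as follows; the value of τ is an assumption, as the patent only notes that it is typically small:

```python
def soft_update(target_net, policy_net, tau=0.005):
    """Soft update: theta_target <- tau * theta_policy + (1 - tau) * theta_target."""
    for tp, pp in zip(target_net.parameters(), policy_net.parameters()):
        tp.data.copy_(tau * pp.data + (1.0 - tau) * tp.data)
```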
In summary, the interaction process of the agent and the environment and the learning process of the agent are shown as algorithm 1.
When in application, according to the structure and the input/output method described in the 2), a state action cost function is obtained from the h network, an action corresponding to the maximum value is selected from the state action cost function as a destination server decision, and a calculation resource decision for each task is obtained from the f actor in the f network.
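Putting the two networks together at application time might look like the following sketch, with assumed function and variable names:

```python
import torch

def act(h_policy, f_actor, state_perceptron, s_k, task_feats, n_k):
    """Illustrative inference step after training (names and shapes assumed).

    For each of the N_k real tasks, the destination server is the argmax of
    the h network's action state cost values; the resource allocation is read
    directly from the f actor, discarding the padded outputs beyond N_k.
    """
    with torch.no_grad():
        feat = state_perceptron(s_k)
        destinations = [int(torch.argmax(h_policy(s_k, task_feats[i])))
                        for i in range(n_k)]
        f_k = f_actor(feat)[:n_k]      # keep only the meaningful N_k outputs
    return destinations, f_k
```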
Examples
In this section, a specific edge calculation model is given, but it should be noted that this is only a specific embodiment of the present application, but the scope of the present application is not limited thereto, and it should be understood by those skilled in the art that the present application includes but is not limited to the accompanying drawings and the description of the above specific embodiment. Any modifications which do not depart from the functional and structural principles of the present application are intended to be included within the scope of the appended claims.
The edge computing network model consists of three layers, namely: a cloud computing node layer, an edge computing server cluster layer, and an IoT device layer (user layer). The cloud computing node layer contains one cloud computing node, which has relatively abundant computing resources. The edge computing server cluster layer comprises several edge computing server clusters, each containing several edge computing servers; the figure illustrates three edge computing server clusters, each containing three edge computing servers, and each edge computing server is placed beside a wireless access point (such as a base station or a wireless router), so the transmission delay from the wireless access point to the edge computing server can be ignored. The edge computing server clusters are partitioned according to location distribution, that is, the edge computing servers within a cluster are relatively close in spatial location, so they are directly connected to each other by means such as optical fibers. Different edge computing server clusters are all connected to the core network and connected to the cloud computing node through the core network. The IoT device layer contains several IoT devices, each connected with a wireless access point through a wireless link.
Consider edge-to-edge collaboration within a server cluster and collaboration between the edge computing servers within the cluster and the cloud computing node. The set formed by the cloud node and the edge computing servers in a certain cluster is recorded as M = {0, 1, 2, ..., M}, where number 0 represents the cloud computing node and numbers 1, 2, ..., M represent the edge computing servers. The computing resource capacity of these nodes may be represented as V = {v_0, v_1, v_2, ..., v_M}, where v_m represents the computing resource capacity of node m; the computing resources of the cloud node are relatively plentiful, i.e. v_0 > v_m (m ≠ 0). In this model, the computing resources are expressed in terms of the number of CPU cycles per unit time. The IoT devices connected to the wireless access points beside the edge computing servers within the cluster form the user set of the cluster.
In the network, users transmit data when offloading tasks. In the wireless transmission process, the bandwidth is B_wireless; in the wired transmission process, data is transmitted between nodes of the set M over wired links with corresponding bandwidths. Because edge computing nodes in the same cluster are spatially close and directly connected by optical fibers and the like, the bandwidth and data transmission rate between edge nodes are relatively high; the distance between an edge computing node and the cloud computing node is large and they must be connected through the core network, so the transmission delay is relatively high and the network bandwidth to the cloud computing node is relatively narrow.
Due to limitation of battery life, limitation of computing power and the like of the internet of things equipment, the internet of things equipment needs to process tasks which are continuously generated by the internet of things equipment by means of computing resources of the edge computing nodes and the cloud computing nodes. It is assumed that the task is atomic, i.e. the task can only be processed at one edge computing node or at cloud nodes, and cannot be processed in a separate way after being split. Thus, a task will first be offloaded to its corresponding edge computing server next to the wireless access point, which server is called the source server for the task, which can have three further choices for the task:
a) The source server processes the task itself;
b) The task is further transmitted to other edge computing servers in the same cluster for processing;
c) And further offloading the task to the cloud computing node for processing.
The server that ultimately processes the task is referred to as the destination server for the task.
A task may be abstracted into several key attributes. Assuming that there are a total of N_k tasks within the cluster and uploaded by the cluster to the cloud computing node in the kth frame, they are noted as T_k = {T_{1,k}, T_{2,k}, ..., T_{N_k,k}}. Some of these tasks were neither completed nor failed in the (k-1)th frame and are therefore "inherited" from the (k-1)th frame; these are called existing tasks. Other tasks have just reached the edge computing server at the beginning of this frame and have yet to be further offloaded or processed; these are called new tasks. The tasks themselves possess attributes related to transmission, processing and priority. Taking task T_{i,k} as an example, they can be expressed as:
a) The amount of data to be transmitted. After reaching the source server, the task is further processed by the destination server, and in this process the source server needs to transmit data, including user input data and code data, to the destination server. If the destination server and the source server are the same server, this amount is zero; if they are different servers, it is the size of the data still to be transferred from the source server to the destination server. For a new task, it equals the full amount of data to be transferred.
b) The number of CPU cycles required for processing. The task itself requires a certain amount of computation, expressed in terms of CPU cycles. For new tasks and for tasks that have not yet reached the destination server, this value equals the total number of CPU cycles required by the task.
c) The remaining allowable delay. As the task is transmitted and processed, the remaining allowable delay decreases. Ideally the task is completed within the remaining allowable delay, but the task may also time out, in which case the remaining allowable delay becomes negative. For a new task, the remaining allowable delay equals the maximum allowable delay minus the time consumed by the wireless transmission process.
D) Task priority l i,k. The priority value is an integer, and 1 represents the lowest priority.
In addition to these properties of the task itself, there are network-related attributes, namely the source server and the destination server.
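For illustration, the task attributes listed above can be collected in a simple record; the field names are hypothetical and do not reproduce the symbols used in the patent:

```python
from dataclasses import dataclass

@dataclass
class Task:
    """Per-task attributes used to build s_{i,k} (field names are illustrative)."""
    data_to_transmit: float   # remaining data to move from source to destination server
    cpu_cycles: float         # CPU cycles still required for processing
    remaining_delay: float    # remaining allowable delay (may become negative on timeout)
    max_delay: float          # maximum allowable delay
    priority: int             # l_{i,k}, where 1 is the lowest priority
    source_server: int        # edge server that first received the task
    dest_server: int          # server chosen to process the task
```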
In the kth frame, the utility function contains three parts: a delay benefit term, a task failure penalty term, and a computing node energy consumption penalty term:
The delay benefit term is the benefit obtained when a task is completed within the current frame; it is proportional to the relative delay benefit of the task and to the task priority l_{i,k}. Here 1_A(x) is the indicator function used to indicate whether the element x is in the set A, i.e. 1_A(x) = 1 if x ∈ A and 1_A(x) = 0 otherwise, and it is applied to the set of tasks that have been successfully processed in the kth frame.
The task failure penalty term is the penalty incurred when a task times out and fails in the current frame and is removed from the environment. It is proportional to the task priority: the higher the priority, the greater the penalty when the task times out and fails. The corresponding set is the set of tasks that failed to be processed in the kth frame.
The computing node energy consumption penalty term is the sum of the energy consumption of all computing nodes. Here δ(x) is a binary function defined as δ(x) = 1 if x > 0 and δ(x) = 0 otherwise, and κ is a power coefficient with unit W·Hz^-3, so that the power of server m is determined by κ and the CPU frequency allocated on server m.
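Given the W·Hz^-3 unit of κ, the node power is presumably a cubic function of the allocated CPU frequency; a sketch under that assumption is:

```python
def node_power(kappa, allocated_frequency_hz):
    """Dynamic power of a computing node under an assumed kappa * f^3 model,
    consistent with the W * Hz^-3 unit of kappa (the exact formula is not
    reproduced in the extracted text)."""
    return kappa * allocated_frequency_hz ** 3
```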
The utility function is a weighted combination of the three terms, where α, β and η are weighting coefficients. Care should be taken to satisfy η > α, i.e., the penalty incurred when a task times out and fails should be greater than the benefit that would have been obtained had the task been processed successfully.
According to said 1), in such an environment, states, actions and rewards in the edge computation model are first defined, taking the kth frame as an example:
State s_k: the attributes of all tasks within the cluster and uploaded by the cluster to the cloud computing node, i.e. s_k = [s_{1,k}, s_{2,k}, ..., s_{N_k,k}], where s_{i,k} is the attribute vector of task T_{i,k}.
Action a_k: the destination server and computing resource allocation decisions made for all tasks, i.e. a_k = [a_{1,k}, a_{2,k}, ..., a_{N_k,k}], where a_{i,k} = [h_{i,k}, f_{i,k}] represents the decision made for task T_{i,k}: h_{i,k} represents the destination server that processes the task, and f_{i,k} denotes the computing resources allocated by the destination server for the task.
Reward r_k: the contribution of each task to the utility function, i.e. r_k = [R_{1,k}, R_{2,k}, ..., R_{N_k,k}], where R_{i,k} is composed of three terms:
delay benefit term:
task failure penalty term:
energy consumption penalty term:
and then the three terms are weighted and combined in the same way as the utility function is calculated to obtain R i,k:
wherein α, β and η are defined the same as the corresponding variables in the utility function.
After defining the states, actions and rewards, the structure and input/output structure of the defined neural network are as follows:
Note that the states, actions, and rewards defined in 1) are vectors, and the lengths of these three vectors are all related to N_k, which varies from frame to frame. The numbers of input nodes and output nodes of the neural network are fixed, i.e., the input dimension and the output dimension are fixed. Therefore, before inputting the state s_k into the neural network, zero-padding expansion is needed in addition to normalization processing. Also, considering that D_{i,k} and R_{i,k} in s_{i,k} are server numbers and do not represent relative sizes, D_{i,k} and R_{i,k} need to be encoded with one-hot codes. For the actions and action state cost functions output by the neural network, only the meaningful N_k of them are taken as the actual actions and action state cost functions.
In fig. 1 and 2, a schematic structural diagram of a neural network is shown. In both figures it is assumed that there are at most N tasks in the scope of the study, i.e. N k. Ltoreq.N. The leftmost side of the figure is the neural network input, the rightmost side is the neural network output, except the smallest cubes of the output layers, each cube represents a network structure formed by a plurality of network layers, and each smallest cube of the output layers represents a scalar.
Fig. 1 depicts a first neural network used for destination server decision, which is designated as an h-network for ease of description. In this structure, the leftmost two layers are state sensors (state perceptron) which are responsible for extracting the feature information in the state. The feature information extracted by the state sensor is input to the h actor (hactor) together with the attribute information of a certain task, and the h actor outputs a plurality of action state cost functions [Q(sk,si,k,hi,k=0),Q(sk,si,k,hi,k=1),…,Q(sk,si,k,hi,k=M)], corresponding to the task, wherein Q (s k,si,k,hi,k =0) represents an action state cost function processed by offloading the task to the computing node 0 (cloud server), Q (s k,si,k,hi,k =1) represents an action state cost function processed by offloading the task to the computing node 1 (edge server numbered 1), and the like. In this algorithm, the destination server decision for each task is treated as a different decision process, so each decision has m+1 possible actions, so the final output is (m+1) x N scalar quantities.
Fig. 2 depicts the second neural network, used for the computing resource allocation problem, which is designated as the f network for ease of description. The first layers on its input side have the same structure as the first layers of the h network; they are the state sensors. The two network blocks to the right of the state sensor are named the f actor and the f critic. The f actor receives the output of the state sensor and then outputs the computing resources f_k = [f_{1,k}, f_{2,k}, ..., f_{N,k}] allocated to each task, where the meaningless items can be ignored and only the N_k items of interest are taken. The f critic network receives the output of the state sensor and the computing resource allocation scheme, and then outputs the action state cost functions [Q_1(s_k, f_k), Q_2(s_k, f_k), ..., Q_N(s_k, f_k)] for these actions; each component of f_k is regarded as the action made by the corresponding task, so the output of the f critic also has N dimensions, and Q_i(s_k, f_k) corresponds to the action state cost function of f_{i,k}. With this configuration, the role of the f critic is similar to that of the h actor network in that it outputs action state cost functions, whereas the role of the f actor is to output the actions that maximize those action state cost functions.
Finally, a training method and an application process are provided as follows:
Each network is provided with a target network of the same structure, namely an h target network with the same structure as the h network and an f target network with the same structure as the f network. In addition, experience replay is used: the state, action, reward and next state of each step, together with a flag indicating whether each task was successfully processed or failed due to a timeout, are stored in a memory bank. In the interaction process between the agent and the edge computing environment, the concept of a set (episode) is also defined: every L frames form one set, and the neural networks are updated after each set ends, rather than after every frame.
Destination server decision making process and h-network updating algorithm
For simplicity of expression, the subscript is used to distinguish the h network from the h target network, i.e., the h network is denoted as Q h,policy, and the h target network is denoted as Q h,target.
At the beginning of each frame, the h network acquires the current state information and outputs the action state cost function of each action. However, only a new task T_{i,k} needs to make a destination server decision using its corresponding outputs, i.e. h_{i,k} = argmax_h Q_{h,policy}(s_k, s_{i,k}, h).
In the training process, in order to ensure that the agent can fully explore the environment, this action is only taken with probability 1 - ε_k, and a random action is taken with probability ε_k.
The agent's exploration of the environment should decrease as the number of iteration rounds increases, so ε_k decreases as the number of iteration rounds increases.
Unlike a typical reinforcement learning environment, not every action decided in a frame can actually be executed in this environment. Once the destination server of a task is determined, the destination server decisions for that task in subsequent frames are meaningless and cannot be executed, which makes it impossible to iteratively update the h network by directly applying the Bellman optimality equation. The update method of the h network therefore needs to be adjusted. Specifically, assuming task T_{i,k} is inherited to the (k+1)th frame and noted as T_{m,k+1}, the h network is updated towards the target y_{i,k} = R_{i,k} + γ·max_h Q_{h,target}(s_{k+1}, s_{m,k+1}, h).
This formula can be understood as follows: for an existing task, no matter in which frame, its destination server decision action is always D_{i,k}. If task T_{i,k} has been completed successfully in the kth frame or has failed due to a timeout, the target is simply y_{i,k} = R_{i,k}.
Combining the two cases, the update target can be written as y_{i,k} = R_{i,k} + γ(1 - d_{i,k})·max_h Q_{h,target}(s_{k+1}, s_{m,k+1}, h),
where d_{i,k} denotes whether task T_{i,k} has completed successfully or failed due to a timeout: if so, d_{i,k} = 1; otherwise the task is not completed, is inherited to the (k+1)th frame, and d_{i,k} = 0. It can also be seen that the transition stored in the memory bank requires not only s_k, a_k, r_k, s_{k+1}, but also the completion vector d_k = [d_{1,k}, d_{2,k}, ..., d_{N_k,k}].
The update is performed by taking the mean square error function as the Loss function, i.e. L_h(θ_{h,policy}) = E[(y_{i,k} - Q_{h,policy}(s_k, s_{i,k}, D_{i,k}))^2].
The h network is updated iteratively in each set, and the h target network updates itself directly with the parameters of the h network after the h network has been updated for C rounds (C is a constant): if θ_{h,policy} represents the parameters of the h network and θ_{h,target} the parameters of the h target network, then θ_{h,target} = θ_{h,policy}.
Computing resource allocation decision process and f network update algorithm
For convenience of description, the state sensor and the f actor network are jointly denoted π_{f,policy}, their corresponding target networks are denoted π_{f,target}, and the state sensor and the f critic network are jointly denoted Q_{f,policy}, with their corresponding target networks denoted Q_{f,target}.
At the beginning of each frame, the f actor outputs the computing resources f_k = [f_{1,k}, f_{2,k}, ..., f_{N,k}] allocated to each task according to the state. However, in order for the agent to fully explore the environment, a certain amount of noise needs to be added to the action during training, namely
f'_{i,k} = clip(f_{i,k} + n, 0, v_m),
where n is random noise with standard deviation σ_k; as in the destination server decision process, σ_k decreases as the number of iterations increases. clip denotes a clipping function, defined as clip(x, a, b) = min(max(x, a), b).
This function is used to ensure that the action after noise addition still satisfies 0 ≤ f'_{i,k} ≤ v_m. For tasks that are still in the wired transmission phase and have not yet reached the destination server, the computing resources allocated to them are forcibly set to 0. If the actions after adding noise still violate the computing resource constraints of the nodes (for example, the total resources allocated on one node exceeding its capacity), the actions need to be further processed.
The update procedure for the Q_{f,policy} and Q_{f,target} networks is very similar to that of the h network. Assuming that task T_{i,k} is successfully processed to completion or fails due to a timeout in the kth frame, or is inherited to the (k+1)th frame and noted as T_{m,k+1}, the update target can be expressed as y_{i,k} = R_{i,k} + γ(1 - d_{i,k})·Q^m_{f,target}(s_{k+1}, π_{f,target}(s_{k+1})).
The relationship between the f actor and the f critic network is very similar to the relationship between the generator and the discriminator in a generative adversarial network: the goal of the f actor is to maximize the output of the f critic, that is, its training objective can be expressed as maximizing E[Q_{f,policy}(s_k, π_{f,policy}(s_k))].
Like the h network, the mean square error function is used as the Loss function of the Q_{f,policy} network: L_{fc}(θ_{f,policy}) = E[(y_{i,k} - Q^i_{f,policy}(s_k, f_k))^2], where y_{i,k} is the update target defined above.
For π_{f,policy}, L_{fa}(π_{f,policy}) = -E[Q_{f,policy}(s_k, π_{f,policy}(s_k))] is used directly as the Loss function. Note that L_{fc} is only used to update Q_{f,policy} rather than the entire f network, and L_{fa} is likewise only used to update π_{f,policy}; the f critic network is held fixed while π_{f,policy} is updated.
Similarly, the f network is updated iteratively in each set, and the f target network updates itself with the parameters of the f network after the f network has been updated for C rounds (C is a constant). However, unlike the h network, a soft update is used: if θ_{f,policy} represents the parameters of the f network and θ_{f,target} the parameters of the f target network, then every C rounds θ_{f,target} = τ·θ_{f,policy} + (1 - τ)·θ_{f,target}, where τ is the update rate and typically takes a small value.
In summary, the interaction process of the agent and the environment and the learning process of the agent are shown as algorithm 1.
When in application, according to the structure and the input/output method described in the 2), a state action cost function is obtained from the h network, an action corresponding to the maximum value is selected from the state action cost function as a destination server decision, and a calculation resource decision for each task is obtained from the f actor in the f network.
With 500 frames as one set, fig. 3 is drawn with the sum of the utility functions of all frames in each set as the return index. "no cooperation benchmark" is the curve corresponding to the non-cooperative baseline scheme, and "proposed" is the curve corresponding to the algorithm of the present application; the abscissa represents the training process, and the ordinate represents the return of each set, i.e. the sum of the utility functions of all frames in the set. It can be seen that after a period of training the algorithm becomes stable, and its return is about three times that of the non-cooperative scheme.
Although the application has been described with reference to specific embodiments, those skilled in the art will appreciate that many modifications are possible in the construction and detail of the application disclosed within the spirit and scope thereof. The scope of the application is to be determined by the appended claims, and it is intended that the claims cover all modifications that are within the literal meaning or range of equivalents of the technical features of the claims.

Claims (1)

1. The edge computing resource allocation method based on priority and cooperation is characterized by comprising the following steps of: the method comprises the following steps:
1): designing a state with priority and network node attributes in consideration of edge-to-edge coordination, edge-cloud coordination and the task's own priority, designing actions including destination server decisions and computing resource allocation decisions, and rewards for tasks;
2): designing a first neural network structure for the destination server decision, and a second neural network structure for the computing resource allocation decision for the state, action, and reward defined in 1);
3): training and updating the first neural network and the second neural network in the process of interaction between the intelligent agent and the edge computing environment according to a given algorithm, and applying after the training is finished;
The state in 1) is the attribute of all tasks within the cluster and uploaded by the cluster to the cloud computing node, the actions make the destination server and computing resource allocation decisions for all tasks, and the rewards are the contributions of each task to the utility function; the rewards include a delay benefit term, a task failure penalty term, and an energy consumption penalty term; the first neural network in 2) is an h network, where the h network comprises a state sensor and an h actor network, and the state sensor is used for extracting feature information from the state and inputting the feature information into the h actor network; in the DQN algorithm, the destination server decision of each task is regarded as a separate decision process, each decision has M+1 actions, and the final output is (M+1)×N scalars, where N represents the maximum number of input tasks the neural network can handle, so N ≥ N_k, and (M+1) is the number of computing nodes; the second neural network in 2) is an f network, where the f network comprises a state sensor, an f actor network and an f critic network, and the state sensor is used for extracting feature information from the state; the f actor receives the output of the state sensor and then outputs the computing resources f_k = [f_{1,k}, f_{2,k}, ..., f_{N,k}] allocated to each task; the f critic network receives the output of the state sensor and the computing resource allocation scheme, and then outputs the action state cost functions [Q_1(s_k, f_k), Q_2(s_k, f_k), ..., Q_N(s_k, f_k)] for these actions, where s_k is the state defined in 1), Q_1(s_k, f_k) is the action state cost function corresponding to f_{1,k}, Q_2(s_k, f_k) is the one corresponding to f_{2,k}, and so on; the first neural network in 3) uses the mean square error function as the Loss function when updated, and the second neural network uses the mean square error function as the Loss function when updated; the first neural network updating process in 3) is as follows:
The first neural network is updated with $L_h(\theta_{h,policy})=E\big[\big(R_{i,k}+\gamma\max_{D}Q_{h,target}(s_{m,k+1},D)-Q_{h,policy}(s_{i,k},D_{i,k})\big)^2\big]$ as the Loss function, where $\theta_{h,policy}$ represents the parameters of the h network, $Q_{h,target}$ and $Q_{h,policy}$ represent the outputs of the h target network and the h network, respectively, $s_k$ and $s_{k+1}$ are the states of the environment in the k-th and (k+1)-th frames, $s_{i,k}$ and $s_{m,k+1}$ represent all the attributes of tasks $T_{i,k}$ and $T_{m,k+1}$, respectively, $D_{i,k}$ is the destination server of task $T_{i,k}$, $R_{i,k}$ represents the reward obtained by task $T_{i,k}$, and $\gamma$ is the discount factor;
The f actor and the f critic of the second neural network are updated separately;
The method for updating the f critic network and the state sensor is as follows: assume that task $T_{i,k}$ is inherited into the (k+1)-th frame and denoted $T_{m,k+1}$ (in which case $d_{i,k}=0$), or has been completed successfully or has failed by timeout (in which case $d_{i,k}=1$); the neural network is updated with $L_{fc}(\theta_{f,policy})=E\big[\big(R_{i,k}+\gamma(1-d_{i,k})Q^{(m)}_{f,target}\big(s_{k+1},\pi_{f,target}(s_{k+1})\big)-Q^{(i)}_{f,policy}(s_k,f_k)\big)^2\big]$ as the Loss function, where $\theta_{f,policy}$ represents the parameters of the f critic network and the state sensor, $s_k$ and $s_{k+1}$ are the states of the environment in the k-th and (k+1)-th frames, $f_k$ represents the computing resource allocation decision of the k-th frame, $Q^{(i)}_{f,policy}$ and $Q^{(m)}_{f,target}$ respectively represent the i-th output of the f critic network and the m-th output of the target network corresponding to the f critic, and $\pi_{f,target}$ represents the output of the target network corresponding to the f actor;
The method for updating the f actor network and the state sensor is as follows: $L_{fa}(\theta_{f,policy})=-E\big[Q_{f,policy}\big(s_k,\pi_{f,policy}(s_k)\big)\big]$ is taken as the Loss function, where $\pi_{f,policy}$ represents the output of the f actor.
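For illustration of the two network structures recited in the claim, the following is a minimal PyTorch-style sketch. The fully connected layers, layer sizes, activations, and the sigmoid normalization of the allocated resources are assumptions chosen for the example, not details fixed by the claim.

```python
import torch
import torch.nn as nn

class StateSensor(nn.Module):
    """Extracts feature information from the state s_k (attributes of up to N tasks)."""
    def __init__(self, state_dim: int, feature_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, feature_dim), nn.ReLU(),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

class HNetwork(nn.Module):
    """h network: state sensor + h actor; outputs (M+1) Q-values for each of N tasks,
    i.e. (M+1) x N scalars for the DQN-style destination server decision."""
    def __init__(self, state_dim: int, n_tasks: int, n_nodes: int, feature_dim: int = 128):
        super().__init__()
        self.sensor = StateSensor(state_dim, feature_dim)
        self.actor = nn.Linear(feature_dim, n_tasks * n_nodes)
        self.n_tasks, self.n_nodes = n_tasks, n_nodes

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        q = self.actor(self.sensor(state))
        return q.view(-1, self.n_tasks, self.n_nodes)   # Q-values per task and per compute node

class FNetwork(nn.Module):
    """f network: state sensor + f actor (allocation f_k) + f critic (per-task values Q_i(s_k, f_k))."""
    def __init__(self, state_dim: int, n_tasks: int, feature_dim: int = 128):
        super().__init__()
        self.sensor = StateSensor(state_dim, feature_dim)
        self.actor = nn.Sequential(nn.Linear(feature_dim, n_tasks), nn.Sigmoid())  # f_k as fractions in (0,1)
        self.critic = nn.Linear(feature_dim + n_tasks, n_tasks)                    # [Q_1 ... Q_N]

    def act(self, state: torch.Tensor) -> torch.Tensor:
        return self.actor(self.sensor(state))

    def evaluate(self, state: torch.Tensor, f_k: torch.Tensor) -> torch.Tensor:
        return self.critic(torch.cat([self.sensor(state), f_k], dim=-1))
```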
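Likewise for illustration, a sketch of one training update under the Loss functions above, using target networks and the discount factor gamma. The tensor shapes, the illustrative gamma value, the omission of a termination mask in the h-network target, and the assumption that each optimizer holds only the corresponding sub-network's parameters are choices of this example rather than details specified by the claim.

```python
import torch
import torch.nn.functional as F

GAMMA = 0.9  # discount factor (illustrative value)

def update_h(h_policy, h_target, opt_h, s_k, dest, reward, s_k1):
    """DQN-style h-network update: MSE between Q_policy(s_k)[task, D_ik] and the
    bootstrapped target reward + gamma * max_D Q_target(s_k1)[task, D].
    Shapes: s_k, s_k1 -> (batch, state_dim); dest -> (batch, N) long; reward -> (batch, N)."""
    q = h_policy(s_k).gather(2, dest.unsqueeze(-1)).squeeze(-1)
    with torch.no_grad():
        target = reward + GAMMA * h_target(s_k1).max(dim=2).values
    loss = F.mse_loss(q, target)
    opt_h.zero_grad(); loss.backward(); opt_h.step()
    return loss.item()

def update_f(f_policy, f_target, opt_critic, opt_actor, s_k, f_k, reward, s_k1, done):
    """Actor-critic f-network update: the critic (with the state sensor) minimizes the MSE to
    reward + gamma * (1 - d) * Q_target(s_k1, pi_target(s_k1)); the actor maximizes
    Q_policy(s_k, pi_policy(s_k)), i.e. minimizes its negative expectation.
    opt_critic is assumed to hold the critic/sensor parameters, opt_actor the actor/sensor parameters."""
    with torch.no_grad():
        y = reward + GAMMA * (1.0 - done) * f_target.evaluate(s_k1, f_target.act(s_k1))
    critic_loss = F.mse_loss(f_policy.evaluate(s_k, f_k), y)
    opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()

    actor_loss = -f_policy.evaluate(s_k, f_policy.act(s_k)).mean()
    opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()
    return critic_loss.item(), actor_loss.item()
```

In such a sketch the target networks would be refreshed by periodic or soft copies of the policy parameters, as is customary for DQN and actor-critic training; the copy schedule is not specified here.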
CN202010473969.6A 2020-05-29 Priority and collaboration-based edge computing resource allocation method Active CN111813539B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010473969.6A CN111813539B (en) 2020-05-29 Priority and collaboration-based edge computing resource allocation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010473969.6A CN111813539B (en) 2020-05-29 Priority and collaboration-based edge computing resource allocation method

Publications (2)

Publication Number Publication Date
CN111813539A CN111813539A (en) 2020-10-23
CN111813539B true CN111813539B (en) 2024-06-28


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109819047A (en) * 2019-02-26 2019-05-28 吉林大学 A kind of mobile edge calculations resource allocation methods based on incentive mechanism
CN110503195A (en) * 2019-08-14 2019-11-26 北京中科寒武纪科技有限公司 The method and its Related product of task are executed using artificial intelligence process device

Similar Documents

Publication Publication Date Title
CN112860350B (en) Task cache-based computation unloading method in edge computation
CN113543176B (en) Unloading decision method of mobile edge computing system based on intelligent reflecting surface assistance
CN113225377B (en) Internet of things edge task unloading method and device
CN114143346B (en) Joint optimization method and system for task unloading and service caching of Internet of vehicles
Fan et al. DNN deployment, task offloading, and resource allocation for joint task inference in IIoT
CN116489712B (en) Mobile edge computing task unloading method based on deep reinforcement learning
CN116260871A (en) Independent task unloading method based on local and edge collaborative caching
CN113573363A (en) MEC calculation unloading and resource allocation method based on deep reinforcement learning
CN113946423B (en) Multi-task edge computing, scheduling and optimizing method based on graph attention network
CN115659803A (en) Intelligent unloading method for computing tasks under unmanned aerial vehicle twin network mapping error condition
CN112667406A (en) Task unloading and data caching method in cloud edge fusion heterogeneous network
Lakew et al. Adaptive partial offloading and resource harmonization in wireless edge computing-assisted IoE networks
CN116489708A (en) Meta universe oriented cloud edge end collaborative mobile edge computing task unloading method
Cui et al. Multiagent reinforcement learning-based cooperative multitype task offloading strategy for internet of vehicles in B5G/6G network
CN114219074A (en) Wireless communication network resource allocation algorithm dynamically adjusted according to requirements
Gong et al. Dependent Task‐Offloading Strategy Based on Deep Reinforcement Learning in Mobile Edge Computing
Henna et al. Distributed and collaborative high-speed inference deep learning for mobile edge with topological dependencies
CN117459112A (en) Mobile edge caching method and equipment in LEO satellite network based on graph rolling network
CN111813539B (en) Priority and collaboration-based edge computing resource allocation method
CN112312299A (en) Service unloading method, device and system
CN116827515A (en) Fog computing system performance optimization algorithm based on blockchain and reinforcement learning
CN113157344B (en) DRL-based energy consumption perception task unloading method in mobile edge computing environment
CN114698125A (en) Method, device and system for optimizing computation offload of mobile edge computing network
Wang et al. Adaptive compute offloading algorithm for metasystem based on deep reinforcement learning
Wang et al. Task offloading for edge computing in industrial Internet with joint data compression and security protection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant