CN111813539A - Edge computing resource allocation method based on priority and cooperation - Google Patents

Edge computing resource allocation method based on priority and cooperation

Info

Publication number
CN111813539A
CN111813539A (application number CN202010473969.6A)
Authority
CN
China
Prior art keywords
network
task
state
resource allocation
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010473969.6A
Other languages
Chinese (zh)
Inventor
袁新杰
杜清河
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Jiaotong University
Original Assignee
Xi'an Jiaotong University
Application filed by Xi'an Jiaotong University
Priority to CN202010473969.6A
Publication of CN111813539A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 - Allocation of resources, e.g. of the central processing unit [CPU], to service a request
    • G06F 9/5027 - Allocation of resources, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F 9/5038 - Allocation of resources, considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2209/00 - Indexing scheme relating to G06F 9/00
    • G06F 2209/50 - Indexing scheme relating to G06F 9/50
    • G06F 2209/5021 - Priority
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer And Data Communications (AREA)

Abstract

The application belongs to the technical field of resource allocation strategies, and particularly relates to an edge computing resource allocation method based on priority and cooperation. The computing resources in edge computing servers and cloud servers, measured as the number of CPU cycles available per unit time, can be allocated to the tasks running on the servers according to different allocation schemes, but it is difficult to guarantee that the long-term return is maximized. The application provides an edge computing resource allocation method based on priority and cooperation, which comprises the following steps: 1) defining the states, actions, and rewards of the edge computing model; 2) defining the structure of the neural networks and their input-output structure; 3) updating, training, and applying the neural networks according to a given training method. By reasonably allocating the computing resources in the edge computing servers and the cloud server, specifically the number of CPU (Central Processing Unit) cycles per unit time, the long-term return related to relative delay and server energy consumption is improved.

Description

Edge computing resource allocation method based on priority and cooperation
Technical Field
The application belongs to the technical field of resource allocation strategies, and particularly relates to an edge computing resource allocation method based on priority and cooperation.
Background
Mobile users usually have limited computing resources due to constraints such as device size, and cannot perform large amounts of computation for long periods because of device energy consumption and battery capacity limitations. Therefore, for some compute-intensive tasks, if a user processes them using only its own computing resources, it is difficult to meet the tasks' low-latency requirements, and problems such as shortened standby time and excessive device heating may result. Mobile users therefore need to resort to external computing resources, which in existing networks typically come from cloud computing nodes, also referred to as cloud nodes or cloud servers. However, with the growth of internet of things devices and the development of 5G, cloud computing increasingly struggles to keep up, so edge computing has emerged as a supplement. It aims to deploy computing resources at the edge of the network so as to reduce core network bandwidth occupation, shorten delay, and achieve related goals.
In the traditional cloud computing mode, a user uploads a compute-intensive task through the core network to a cloud server for processing. Although the computing resources of the cloud server are sufficient and the computation can be completed in a short time, the transmission delay caused by factors such as the limited bandwidth of the core network and network jitter is large. To reduce transmission delay, mobile edge computing deploys computing resources near the edge of the user's network, for example at a wireless router or base station. There is then only a one-hop connection between the edge computing server and the user, and the user's data need not be uploaded through the core network to the cloud computing server for processing, achieving lower transmission delay. However, compared with a cloud computing server, the computing resources of an edge computing server are relatively limited, so how to efficiently allocate and utilize computing resources becomes one of the challenges in mobile edge computing. In this application, the edge computing environment is modeled as a Markov decision process, and in view of the complexity of the model, a deep reinforcement learning method is used to optimize the task success rate and the long-term return.
The computing resources in edge computing servers and cloud servers, measured as the number of CPU cycles available per unit time, can be allocated to the tasks running on the servers according to different allocation schemes, but it is difficult to guarantee that the long-term return is maximized. The return is mainly related to relative delay and server energy consumption.
Disclosure of Invention
1. Technical problem to be solved
Based on the problem that the computing resources in edge computing servers and cloud servers, i.e., the number of CPU cycles available per unit time, can be allocated to the tasks running on the servers according to different allocation schemes while it is difficult to guarantee that the long-term return is maximized, the application provides an edge computing resource allocation method based on priority and cooperation.
2. Technical scheme
To achieve the above objective, the set formed by the edge computing servers in an edge computing server cluster and the cloud server in this application can be expressed as $\mathcal{M} = \{0, 1, 2, \dots, M\}$, where number 0 represents the cloud computing node and numbers $1, 2, \dots, M$ represent the edge computing servers. Their computational resource capacity may be expressed as $V = \{v_0, v_1, v_2, \dots, v_M\}$, where $v_m$ represents the computing resource capacity of node $m \in \mathcal{M}$. After an edge user offloads a task to an edge computing server, that server is called the origin server of the task. The origin server may further decide to process the task itself, or to offload it to the cloud server or to another edge computing server in the same cluster for processing; the server that processes the task is called the destination server.
In such an environment, the method of the present application comprises:
1): designing, in consideration of edge-edge collaboration, edge-cloud collaboration, and the tasks' own priorities, a state with priority and network node attributes, an action comprising the destination server decision and the computing resource allocation decision, and a reward defined per task;
2): for the states, actions, and rewards defined in 1), designing a first neural network structure for the destination server decision and a second neural network structure for the computing resource allocation decision;
3): according to a given algorithm, training and updating the first neural network and the second neural network during the interaction of the agent with the edge computing environment, and applying the trained neural networks.
The edge computing environment is divided into frames in time, each frame having duration $t_{frame}$. Assume that at the $k$-th frame, the total number of tasks within the cluster and uploaded by the cluster to the cloud computing node is $N_k$; these tasks are denoted $\mathcal{T}_k = \{T_{1,k}, T_{2,k}, \dots, T_{N_k,k}\}$, where the index $k$ denotes the $k$-th frame.
Another embodiment provided by the present application is: the state in 1) consists of the attributes of all tasks within the cluster and uploaded by the cluster to the cloud computing node, the action is the destination server and computing resource allocation decision made for all tasks, and the reward is the contribution of each task to the utility function.
Another embodiment provided by the present application is: the reward includes a delay gain term, a task failure penalty term, and an energy consumption penalty term.
Another embodiment provided by the present application is: the first neural network in 2) is an h network; the h network comprises a state perceptron and an h actor network, and the state perceptron is used to extract feature information from the state and input it into the h actor network.
Another embodiment provided by the present application is: in the destination server decision process, the destination server decision of each task is treated as a separate decision process, each decision has M+1 actions, and the final output is (M+1)×N scalars, where N represents the maximum number of input tasks the neural network can handle, so that $N \ge N_k$, and M+1 is the number of computing nodes.
Another embodiment provided by the present application is: the second neural network in 2) is an f network; the f network comprises a state perceptron, an f actor network, and an f critic network, and the state perceptron is used to extract feature information from the state.
Another embodiment provided by the present application is: the f actor receives the output of the state sensor, thenPost-outputting the computing resources f allocated to each taskk=[f1,k,f2,k,...,fN,k]Here, the task set is selected only among the N outputs
Figure BDA0002515240070000031
Corresponding to NkN ofkThe value is used to represent the number of CPU cycles per unit time allocated to the corresponding task; the f criticizing network receives the output of the state perceptron and the calculation resource allocation scheme and then outputs an action state cost function [ Q ] for the actions1(sk,fk),Q2(sk,fk),...,QN(sk,fk)]Wherein s iskIs the state defined in said 1), Q1(sk,fk) Corresponds to f1,kState cost function of, Q2(sk,fk) Corresponds to f2,kThe state cost function of (c), and so on.
Another embodiment provided by the present application is: and 3) when the first neural network is updated, the mean square error function is taken as a Loss function, and when the second neural network is updated, the mean square error function is taken as the Loss function.
The updating method of the first neural network (h network) in 3) is as follows: suppose task $T_{i,k}$ is either inherited to the $(k+1)$-th frame and denoted $T_{m,k+1}$ (in which case $d_{i,k} = 0$), or has been successfully completed or has failed due to a timeout (in which case $d_{i,k} = 1$); then the neural network is updated with

$L_h(\theta_{h,policy}) = E\big[(y_{i,k} - Q_{h,policy}(s_k, s_{i,k}, D_{i,k}))^2\big]$

as the Loss function, where $\theta_{h,policy}$ represents the parameters of the h network,

$y_{i,k} = R_{i,k} + \gamma (1 - d_{i,k}) \, Q_{h,target}(s_{k+1}, s_{m,k+1}, D_{i,k})$,

$Q_{h,target}$ and $Q_{h,policy}$ respectively represent the outputs of the h target network and the h network, $s_k$ and $s_{k+1}$ are the states of the environment at the $k$-th and $(k+1)$-th frames, $s_{i,k}$ and $s_{m,k+1}$ respectively represent all attributes of tasks $T_{i,k}$ and $T_{m,k+1}$, $D_{i,k}$ is the destination server of task $T_{i,k}$, $R_{i,k}$ represents the reward obtained by task $T_{i,k}$, and $\gamma$ is the discount factor.
The f actor network and the f critic network of the second neural network in 3) are updated separately.
The updating method of the f critic network and the state perceptron is as follows: suppose task $T_{i,k}$ is either inherited to the $(k+1)$-th frame and denoted $T_{m,k+1}$ (in which case $d_{i,k} = 0$), or has been successfully completed or has failed due to a timeout (in which case $d_{i,k} = 1$); then the neural network is updated with

$L_{fc}(\theta_{f,policy}) = E\big[(y_{i,k} - Q^{i}_{f,policy}(s_k, f_k))^2\big]$

as the Loss function, where $\theta_{f,policy}$ represents the parameters of the f critic network and the state perceptron,

$y_{i,k} = R_{i,k} + \gamma (1 - d_{i,k}) \, Q^{m}_{f,target}(s_{k+1}, \pi_{f,target}(s_{k+1}))$,

$s_k$ and $s_{k+1}$ are the states of the environment at the $k$-th and $(k+1)$-th frames, $f_k$ denotes the computing resource allocation decision of the $k$-th frame, $Q^{i}_{f,policy}$ and $Q^{m}_{f,target}$ respectively denote the $i$-th output of the f critic network and the $m$-th output of its corresponding target network, and $\pi_{f,target}$ denotes the output of the target network corresponding to the f actor.
The updating method of the f actor network and the state perceptron is as follows: the neural network is updated with $L_{fa}(\theta_{f,policy}) = -E\big[Q_{f,policy}(s_k, \pi_{f,policy}(s_k))\big]$ as the Loss function, where $\pi_{f,policy}$ denotes the output of the f actor, and the other symbols have the same meanings as above.
3. Advantageous effects
Compared with the prior art, the edge computing resource allocation method based on priority and cooperation provided by the application has the following beneficial effects:
According to the edge computing resource allocation method based on priority and cooperation, the edge computing environment can be effectively perceived and acted upon through the methods for defining states, actions, and rewards, the neural network structures, the neural network input-output structures, and the training and application methods, and long-term return maximization is achieved through edge-edge collaboration, edge-cloud collaboration, and load balancing.
According to the priority- and cooperation-based edge computing resource allocation method, the environment state is decoupled into the states of the individual tasks and then input into a specially designed neural network, and the outputs of the neural network and the rewards obtained likewise correspond to the individual tasks.
According to the edge computing resource allocation method based on priority and cooperation, the destination server decision and the computing resource allocation decision are carried out by two sets of neural networks, namely the first neural network and the second neural network, and long-term return maximization is achieved by fully utilizing the cooperation effect.
According to the edge computing resource allocation method based on priority and cooperation, the computing resources in the edge computing servers and the cloud server, specifically the number of CPU cycles per unit time, are reasonably allocated, so that the long-term return related to relative delay and server energy consumption is improved.
Drawings
FIG. 1 is a schematic diagram of a first neural network architecture of the present application;
FIG. 2 is a second neural network architecture diagram of the present application;
FIG. 3 is a schematic diagram illustrating an effect of the edge computing resource allocation method based on priority and cooperation according to the present application.
Detailed Description
Hereinafter, specific embodiments of the present application will be described in detail with reference to the accompanying drawings, so that those skilled in the art can practice the application in light of the specification. Features from different embodiments may be combined to yield new embodiments, or certain features may be substituted into an embodiment to yield yet further preferred embodiments, without departing from the principles of the present application.
Although traditional cloud computing suffers from defects such as network jitter and cannot fully meet the requirements of 5G applications and services, its abundant computing resources still offer advantages when processing compute-intensive tasks; moreover, when the load of the edge computing nodes is high, the cloud computing node can take over part of the load, realizing edge-cloud collaboration and meeting user requirements. The edge computing nodes and the cloud computing node must be connected through the core network, whose bandwidth is relatively limited, whereas edge computing nodes within an area are spatially close to each other and can be interconnected directly, so the bandwidth between them is relatively sufficient and they can cooperate with each other to realize edge-edge collaboration and load balancing.
Different tasks typically require different priorities. For example, in a commercial district, a photograph-and-identify task request initiated by an ordinary tourist should have a lower priority and can tolerate a longer delay or even, to a certain extent, task failure, while a task request initiated by a security camera to identify a suspicious person or behavior should have an extremely high priority and must be processed successfully within a short delay.
Therefore, it is necessary to design an edge computing resource allocation method under a scenario in which task priority, edge-to-edge collaboration, and edge-to-cloud collaboration are considered.
With reference to fig. 1 to 3, the present application provides a method for allocating edge computing resources based on priority and collaboration, the method comprising:
1): designing, in consideration of edge-edge collaboration, edge-cloud collaboration, and the tasks' own priorities, a state with priority and network node attributes, an action comprising the destination server decision and the computing resource allocation decision, and a reward defined per task;
2): for the states, actions, and rewards defined in 1), designing a first neural network structure for the destination server decision and a second neural network structure for the computing resource allocation decision;
3): according to a given algorithm, training and updating the first neural network and the second neural network during the interaction of the agent with the edge computing environment, and applying the trained neural networks.
Further, the state in 1) consists of the attributes of all tasks within the cluster and uploaded by the cluster to the cloud computing node, the action is the destination server and computing resource allocation decision made for all tasks, and the reward is the contribution of each task to the utility function.
Further, the reward includes a delay gain term, a task failure penalty term, and an energy consumption penalty term.
Further, the first neural network in 2) is an h network; the h network comprises a state perceptron and an h actor network, and the state perceptron is used to extract feature information from the state and input it into the h actor network.
Further, in the destination server decision process, the destination server decision of each task is treated as a separate decision process, each decision has M+1 actions, and the final output is (M+1)×N scalars, where N represents the maximum number of input tasks the neural network can handle, so that $N \ge N_k$, and M+1 is the number of computing nodes.
Further, the second neural network in 2) is an f network; the f network comprises a state perceptron, an f actor network, and an f critic network, and the state perceptron is used to extract feature information from the state.
Further, the f actor receives the output of the state perceptron and then outputs the computing resources allocated to each task, $f_k = [f_{1,k}, f_{2,k}, \dots, f_{N,k}]$; the f critic network receives the output of the state perceptron together with the computing resource allocation scheme, and then outputs the action-state value functions for these actions, $[Q_1(s_k, f_k), Q_2(s_k, f_k), \dots, Q_N(s_k, f_k)]$, where $s_k$ is the state defined in 1), $Q_1(s_k, f_k)$ corresponds to the value function of $f_{1,k}$, $Q_2(s_k, f_k)$ corresponds to the value function of $f_{2,k}$, and so on.
Further, in the step 3), the mean square error function is used as a Loss function when the first neural network is updated, and the mean square error function is used as a Loss function when the second neural network is updated.
Further, the updating process of the first neural network in 3) is as follows:
suppose task $T_{i,k}$ is either inherited to the $(k+1)$-th frame and denoted $T_{m,k+1}$ (in which case $d_{i,k} = 0$), or has been successfully completed or has failed due to a timeout (in which case $d_{i,k} = 1$); then the neural network is updated with

$L_h(\theta_{h,policy}) = E\big[(y_{i,k} - Q_{h,policy}(s_k, s_{i,k}, D_{i,k}))^2\big]$

as the Loss function, where $\theta_{h,policy}$ represents the parameters of the h network,

$y_{i,k} = R_{i,k} + \gamma (1 - d_{i,k}) \, Q_{h,target}(s_{k+1}, s_{m,k+1}, D_{i,k})$,

$Q_{h,target}$ and $Q_{h,policy}$ respectively represent the outputs of the h target network and the h network, $s_k$ and $s_{k+1}$ are the states of the environment at the $k$-th and $(k+1)$-th frames, $s_{i,k}$ and $s_{m,k+1}$ respectively represent all attributes of tasks $T_{i,k}$ and $T_{m,k+1}$, $D_{i,k}$ is the destination server of task $T_{i,k}$, $R_{i,k}$ represents the reward obtained by task $T_{i,k}$, and $\gamma$ is the discount factor.
The f actor and f critic of the second neural network are updated separately.
The updating method of the f critic network and the state perceptron is as follows: suppose task $T_{i,k}$ is either inherited to the $(k+1)$-th frame and denoted $T_{m,k+1}$ (in which case $d_{i,k} = 0$), or has been successfully completed or has failed due to a timeout (in which case $d_{i,k} = 1$); then the neural network is updated with

$L_{fc}(\theta_{f,policy}) = E\big[(y_{i,k} - Q^{i}_{f,policy}(s_k, f_k))^2\big]$

as the Loss function, where $\theta_{f,policy}$ represents the parameters of the f critic network and the state perceptron,

$y_{i,k} = R_{i,k} + \gamma (1 - d_{i,k}) \, Q^{m}_{f,target}(s_{k+1}, \pi_{f,target}(s_{k+1}))$,

$s_k$ and $s_{k+1}$ are the states of the environment at the $k$-th and $(k+1)$-th frames, $f_k$ denotes the computing resource allocation decision of the $k$-th frame, $Q^{i}_{f,policy}$ and $Q^{m}_{f,target}$ respectively denote the $i$-th output of the f critic network and the $m$-th output of its corresponding target network, and $\pi_{f,target}$ denotes the output of the target network corresponding to the f actor;
the updating method of the f actor network and the state perceptron is as follows: the neural network is updated with

$L_{fa}(\theta_{f,policy}) = -E\big[Q_{f,policy}(s_k, \pi_{f,policy}(s_k))\big]$

as the Loss function, where $\pi_{f,policy}$ denotes the output of the f actor, and the other symbols have the same meanings as above.
In 1), the process of defining the states, actions, and rewards of the edge computing model is as follows, taking the $k$-th frame as an example:
before defining the state, firstly, the attribute of the task is required to be acquired, and the task T is used fori,kFor example, the required attributes are: amount of data to be transmitted
Figure BDA0002515240070000071
Number of CPU cycles to process
Figure BDA0002515240070000072
Remaining allowable delay
Figure BDA0002515240070000073
Maximum allowed time delay
Figure BDA0002515240070000074
Task priority li,kSource server
Figure BDA0002515240070000075
And destination server
Figure BDA0002515240070000076
Then:
State $s_k$: the attributes of all tasks within the cluster and uploaded by the cluster to the cloud computing node, i.e. $s_k = [s_{1,k}, s_{2,k}, \dots, s_{N_k,k}]$, where $s_{i,k}$ is the attribute vector of task $T_{i,k}$.
Action $a_k$: the destination server and computing resource allocation decisions made for all tasks, i.e. $a_k = [a_{1,k}, a_{2,k}, \dots, a_{N_k,k}]$, where $a_{i,k}$ denotes the decision made for task $T_{i,k}$, $a_{i,k} = [h_{i,k}, f_{i,k}]$; $h_{i,k}$ represents the destination server that processes the task, and $f_{i,k}$ represents the computing resources the destination server allocates to the task.
Reward $r_k$: the contribution of each task to the utility function, i.e. $r_k = [R_{1,k}, R_{2,k}, \dots, R_{N_k,k}]$, where $R_{i,k}$ consists of three terms:
the delay gain term $R^{delay}_{i,k}$;
the task failure penalty term $R^{fail}_{i,k}$;
the energy consumption penalty term $R^{energy}_{i,k}$.
These three terms are then weighted and combined in the same way as in the utility function to obtain $R_{i,k}$:

$R_{i,k} = \alpha R^{delay}_{i,k} - \eta R^{fail}_{i,k} - \beta R^{energy}_{i,k}$

where α, η, and β are weighting coefficients associated with the edge computing environment.
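As an illustration, this weighted combination can be sketched as a one-line Python function; the sign convention (gain added, penalties subtracted) mirrors the utility function, and the function and argument names are assumptions.

```python
def task_reward(r_delay: float, r_fail: float, r_energy: float,
                alpha: float, eta: float, beta: float) -> float:
    """R_ik = alpha * R_delay - eta * R_fail - beta * R_energy (weights as in the utility)."""
    return alpha * r_delay - eta * r_fail - beta * r_energy
```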
In 2), the structure of the neural networks and their input-output structure are as follows.
Note that the states, actions, and rewards defined in 1) are all vectors, and the lengths of these three vectors are all related to $N_k$, so their lengths vary. The numbers of input and output nodes of the neural networks used are fixed, that is, the input and output dimensions are fixed. Therefore, before the state $s_k$ is input into the neural network, zero-padding expansion is needed in addition to normalization. Meanwhile, considering that the origin server $O_{i,k}$ and the destination server $D_{i,k}$ in $s_{i,k}$ are server numbers and do not indicate relative size, they need to be one-hot encoded. For the actions and action-state value functions output by the neural network, only the meaningful $N_k$ of them are taken as the actions and action-state value functions.
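A minimal sketch of this preprocessing, assuming the numeric attributes are already normalized and that one server-number column (origin or destination) is one-hot encoded per call; the function and argument names are assumptions.

```python
import numpy as np

def encode_state(task_features: np.ndarray, server_ids: np.ndarray,
                 n_max: int, n_servers: int) -> np.ndarray:
    """Zero-pad and one-hot encode the state before feeding the neural network.

    task_features: (n_k, d) normalized numeric attributes of the current tasks
    server_ids:    (n_k,) one server-number attribute per task (integer node ids)
    Returns a flat vector of fixed length n_max * (d + n_servers).
    """
    n_k, d = task_features.shape
    one_hot = np.zeros((n_k, n_servers))
    one_hot[np.arange(n_k), server_ids] = 1.0        # server number -> one-hot code
    per_task = np.concatenate([task_features, one_hot], axis=1)
    padded = np.zeros((n_max, d + n_servers))        # zero-fill expansion to N task slots
    padded[:n_k] = per_task
    return padded.ravel()
```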
The schematic structures of the neural networks are shown in Figs. 1 and 2. In both figures it is assumed that there are at most N tasks within the scope of the study, $N_k \le N$. The leftmost side of each figure is the neural network input and the rightmost side is the neural network output; except for the smallest cubes of the output layer, each cube represents a network structure formed by several network layers, and each smallest cube of the output layer represents a scalar.
Fig. 1 depicts the first neural network, used for destination server decision making; for convenience of expression it is named the h network. In this structure, the two leftmost layers are the state perceptron, which is responsible for extracting feature information from the state. The feature information extracted by the state perceptron, together with the attribute information of a given task, is input to the h actor, and the h actor network outputs the action-state value functions corresponding to that task, $[Q(s_k, s_{i,k}, h_{i,k}=0), Q(s_k, s_{i,k}, h_{i,k}=1), \dots, Q(s_k, s_{i,k}, h_{i,k}=M)]$, where $Q(s_k, s_{i,k}, h_{i,k}=0)$ is the action-state value function for offloading the task to computing node 0 (the cloud server), $Q(s_k, s_{i,k}, h_{i,k}=1)$ is the action-state value function for offloading the task to computing node 1 (the edge server numbered 1), and so on. In this algorithm, the destination server decision of each task is treated as a separate decision process, so each decision has M+1 possible actions, and the final output is (M+1)×N scalars.
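As an illustration, a minimal PyTorch sketch of such an h network is given below. The hidden sizes, activations, and the way the shared state features are concatenated with the per-task attributes are assumptions; only the output shape, N task slots by M+1 action-state values, follows the text.

```python
import torch
import torch.nn as nn

class HNetwork(nn.Module):
    """h network: shared state perceptron + h actor head producing, for each of the
    N task slots, M+1 action-state values Q(s_k, s_ik, h_ik = 0..M)."""

    def __init__(self, state_dim: int, task_dim: int, n_tasks: int, n_nodes: int,
                 hidden: int = 128):
        super().__init__()
        self.n_tasks = n_tasks
        self.perceptron = nn.Sequential(   # extracts feature information from the state
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.actor = nn.Sequential(        # h actor: per-task values over M+1 destinations
            nn.Linear(hidden + task_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_nodes))

    def forward(self, state: torch.Tensor, task_attrs: torch.Tensor) -> torch.Tensor:
        """state: (B, state_dim); task_attrs: (B, N, task_dim) -> (B, N, M+1)."""
        feat = self.perceptron(state).unsqueeze(1).expand(-1, self.n_tasks, -1)
        return self.actor(torch.cat([feat, task_attrs], dim=-1))
```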
FIG. 2 depicts the second neural network, used for the computing resource allocation problem; for convenience of expression it is named the f network. The first layers of its input side have the same structure as those of the h network and are likewise the state perceptron. The two network blocks to the right of the state perceptron are named the f actor and the f critic. The f actor receives the output of the state perceptron and then outputs the computing resources allocated to each task, $f_k = [f_{1,k}, f_{2,k}, \dots, f_{N,k}]$, in which the meaningless items are ignored and only the meaningful $N_k$ items are taken. The f critic network receives the output of the state perceptron together with the computing resource allocation scheme, and then outputs the action-state value functions for these actions, $[Q_1(s_k, f_k), Q_2(s_k, f_k), \dots, Q_N(s_k, f_k)]$. Each element of $f_k$ is regarded as an action made for the corresponding task, so the output of the f critic also has N dimensions, and $Q_i(s_k, f_k)$ corresponds to the value function of $f_{i,k}$. Under this structure, the function of the f critic is similar to that of the h actor: it outputs action-state value functions; the function of the f actor is to seek the action that maximizes the action-state value function.
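Likewise, a minimal PyTorch sketch of the f network. Squashing the actor output into [0, v_max] with a sigmoid is an assumption (the patent only requires $0 \le f_{i,k} \le v_m$), as are the layer sizes.

```python
import torch
import torch.nn as nn

class FNetwork(nn.Module):
    """f network: state perceptron, f actor emitting an allocation f_k = [f_1, ..., f_N],
    and f critic scoring it with N values [Q_1(s_k, f_k), ..., Q_N(s_k, f_k)]."""

    def __init__(self, state_dim: int, n_tasks: int, v_max: float, hidden: int = 128):
        super().__init__()
        self.v_max = v_max
        self.perceptron = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.actor = nn.Sequential(        # f actor: CPU cycles per unit time per task
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_tasks), nn.Sigmoid())
        self.critic = nn.Sequential(       # f critic: one value per task's allocation
            nn.Linear(hidden + n_tasks, hidden), nn.ReLU(),
            nn.Linear(hidden, n_tasks))

    def act(self, state: torch.Tensor) -> torch.Tensor:
        return self.v_max * self.actor(self.perceptron(state))   # keep 0 <= f_ik <= v_max

    def evaluate(self, state: torch.Tensor, alloc: torch.Tensor) -> torch.Tensor:
        feat = self.perceptron(state)
        return self.critic(torch.cat([feat, alloc], dim=-1))
```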
The training method and the application process in the step 3) are as follows:
target networks with the same structure are set for each network, namely an h target network with the same structure as the h network and an f target network with the same structure as the f network. In addition, the status, actions, rewards, next actions and whether the task was successfully processed or failed due to timeout for each step are stored in the memory bank using an empirical replay technique. In the interaction process of the agent and the edge computing environment, a concept of an ensemble (epicode) is also defined, each L frame is defined as one ensemble, and the update of the neural network is also carried out after each ensemble, but not after each frame.
Destination server decision process and h-network update algorithm
For simplicity, the h network and the h target network are distinguished by subscripts, i.e., the h network is denoted $Q_{h,policy}$ and the h target network $Q_{h,target}$.
At the beginning of each frame, the h network acquires the current state information and outputs the action-state value function of each action, but only a new task $T_{i,k}$ uses the outputs corresponding to it to make its destination server decision, i.e. $h_{i,k} = \arg\max_{h} Q_{h,policy}(s_k, s_{i,k}, h)$.
During training, to ensure that the agent explores the environment sufficiently, this greedy action is taken only with probability $1 - \epsilon_k$, and a random action is taken with probability $\epsilon_k$. The agent's exploration of the environment should decrease as the number of iteration rounds increases, so $\epsilon_k$ decreases as the number of iteration rounds increases.
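A sketch of this epsilon-greedy destination decision for a single new task; the annealing schedule shown in the trailing comment is one common choice, not taken from the patent.

```python
import random
import torch

def choose_destination(q_values: torch.Tensor, epsilon_k: float) -> int:
    """Epsilon-greedy destination decision for one new task.

    q_values: the M+1 action-state values [Q(s_k, s_ik, h_ik = 0), ..., Q(..., h_ik = M)].
    """
    if random.random() < epsilon_k:              # explore: random computing node
        return random.randrange(q_values.numel())
    return int(torch.argmax(q_values).item())    # exploit: node with the highest value

# epsilon_k is annealed over training, e.g. epsilon_k = max(0.01, 0.9 * 0.99 ** episode)
```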
Unlike a typical reinforcement learning environment, not every frame's decision action can be executed in this environment: once the destination server is determined, the destination server decisions of the following frames are meaningless and cannot be executed, which makes it impossible to iteratively update the h network directly with the Bellman optimality equation. The update method of the h network therefore needs to be adjusted. Specifically, suppose task $T_{i,k}$ is inherited to the $(k+1)$-th frame and denoted $T_{m,k+1}$; then the h network is updated as follows:

$Q(s_k, s_{i,k}, D_{i,k}) \leftarrow R_{i,k} + \gamma \, Q_{h,target}(s_{k+1}, s_{m,k+1}, D_{i,k})$

This equation may be understood as saying that for an existing task, in whichever frame, its destination server decision has only the single option $D_{i,k}$. If task $T_{i,k}$ has been successfully completed, or has failed due to timeout, after the $k$-th frame, it is updated as follows:

$Q(s_k, s_{i,k}, D_{i,k}) \leftarrow R_{i,k}$

Combining the above two equations, the update process can be merged into:

$Q(s_k, s_{i,k}, D_{i,k}) \leftarrow R_{i,k} + \gamma (1 - d_{i,k}) \, Q_{h,target}(s_{k+1}, s_{m,k+1}, D_{i,k})$
where $d_{i,k}$ indicates the completion status of task $T_{i,k}$: if the task is successfully completed or fails due to timeout, $d_{i,k} = 1$; otherwise the task is unfinished and will be inherited to the $(k+1)$-th frame, and $d_{i,k} = 0$. It can thus also be seen that the transitions stored in the memory bank require not only $s_k, a_k, r_k, s_{k+1}$ but also the completion vector $d_k = [d_{1,k}, d_{2,k}, \dots, d_{N_k,k}]$.
The mean square error function is taken as the Loss function when updating, i.e.

$L_h(\theta_{h,policy}) = E\big[(y_{i,k} - Q_{h,policy}(s_k, s_{i,k}, D_{i,k}))^2\big]$

where

$y_{i,k} = R_{i,k} + \gamma (1 - d_{i,k}) \, Q_{h,target}(s_{k+1}, s_{m,k+1}, D_{i,k})$

The h network is iteratively updated in each episode, and after every C rounds of h network updates (C is a constant) the h target network updates itself directly with the h network parameters: if $\theta_{h,policy}$ denotes the parameters of the h network and $\theta_{h,target}$ the parameters of the h target network, then every C rounds $\theta_{h,target} = \theta_{h,policy}$.
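The merged update and the Loss function can be sketched as a single training step. The batch layout (per-task rewards, chosen destinations, and completion flags stored with the states) and the assumption that task slots stay aligned between consecutive frames are simplifications for illustration.

```python
import torch
import torch.nn.functional as F

def h_update_step(h_policy, h_target, optimizer, batch, gamma: float):
    """One h-network update on a replayed minibatch (sketch).

    batch: (state, task_attrs, dest, reward, next_state, next_task_attrs, done), where
    dest holds D_ik (long tensor) and reward/done are per task slot. The target keeps
    the already-chosen destination D_ik instead of taking a max over actions.
    """
    state, task_attrs, dest, reward, next_state, next_task_attrs, done = batch

    with torch.no_grad():
        q_next = h_target(next_state, next_task_attrs)                # (B, N, M+1)
        q_next_d = q_next.gather(-1, dest.unsqueeze(-1)).squeeze(-1)  # Q_target(s', s'_m, D_ik)
        y = reward + gamma * (1.0 - done) * q_next_d                  # y_ik

    q = h_policy(state, task_attrs).gather(-1, dest.unsqueeze(-1)).squeeze(-1)
    loss = F.mse_loss(q, y)                                           # mean square error Loss
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

# Every C update rounds, the target is overwritten outright:
# h_target.load_state_dict(h_policy.state_dict())
```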
Computational resource allocation decision process and f network update algorithm
For convenience of presentation, the state perceptron together with the f actor network is denoted $\pi_{f,policy}$, with corresponding target network $\pi_{f,target}$, while the state perceptron together with the f critic network is denoted $Q_{f,policy}$, with corresponding target network $Q_{f,target}$.
At the beginning of each frame, the f actor outputs, based on the state, the computing resources allocated to each task, $f_k = [f_{1,k}, f_{2,k}, \dots, f_{N,k}]$. However, for the agent to explore the environment sufficiently, a certain amount of noise must be added to the action during training, i.e.

$f'_{i,k} = \mathrm{clip}(f_{i,k} + n, 0, v_m)$ (12)

where $n$ is random noise; as in the destination server decision process, the noise standard deviation $\sigma_k$ decreases as the number of iteration rounds increases. clip denotes the clipping function, defined as

$\mathrm{clip}(x, a, b) = \min(\max(x, a), b)$

This function ensures that the action after adding noise still satisfies $0 \le f'_{i,k} \le v_m$. For a task still in the wired transmission phase that has not yet reached its destination server, the computing resources allocated to it are forced to 0. If the actions after adding noise do not satisfy the capacity constraint of each node, i.e., the total allocation on a node exceeding its capacity $v_m$, the actions need to be further processed by scaling them so that the constraint holds.
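A sketch of this exploration step with NumPy arrays. The proportional rescaling applied when a node's total allocation exceeds its capacity is an assumption; the patent states only that such actions are further processed.

```python
import numpy as np

def explore_allocation(f_k: np.ndarray, dest: np.ndarray, v: np.ndarray,
                       sigma_k: float, in_transit: np.ndarray) -> np.ndarray:
    """Add exploration noise to the f actor output and enforce the constraints above.

    f_k: (N,) proposed allocations; dest: (N,) destination node of each task (int);
    v: (M+1,) node capacities; in_transit: (N,) bool, task not yet at its destination.
    """
    v_dest = v[dest]                                              # capacity of each task's node
    f = np.clip(f_k + np.random.normal(0.0, sigma_k, f_k.shape), 0.0, v_dest)
    f[in_transit] = 0.0                                           # no resources while in wired transit
    for m in np.unique(dest):                                     # per-node capacity check
        mask = dest == m
        total = f[mask].sum()
        if total > v[m]:
            f[mask] *= v[m] / total                               # assumed proportional rescaling
    return f
```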
The update process of $Q_{f,policy}$ and $Q_{f,target}$ is very similar to that of the h network. Suppose task $T_{i,k}$ is successfully processed in the $k$-th frame or fails due to timeout, or is inherited to the $(k+1)$-th frame and denoted $T_{m,k+1}$; then the update process can be expressed as follows:

$Q^{i}_{f}(s_k, f_k) \leftarrow R_{i,k} + \gamma (1 - d_{i,k}) \, Q^{m}_{f,target}(s_{k+1}, \pi_{f,target}(s_{k+1}))$
The relationship between the f actor and the f critic network is very similar to that between the generator and the discriminator in a generative adversarial network: the goal of the f actor is to maximize the output of the f critic, i.e., its training process can be expressed as

$\max_{\pi_{f,policy}} E\big[Q_{f,policy}(s_k, \pi_{f,policy}(s_k))\big]$
As with the h network, the mean square error function is used as the Loss function of $Q_{f,policy}$:

$L_{fc}(\theta_{f,policy}) = E\big[(y_{i,k} - Q^{i}_{f,policy}(s_k, f_k))^2\big]$

where

$y_{i,k} = R_{i,k} + \gamma (1 - d_{i,k}) \, Q^{m}_{f,target}(s_{k+1}, \pi_{f,target}(s_{k+1}))$

and $\pi_{f,policy}$ directly uses

$L_{fa}(\theta_{f,policy}) = -E\big[Q_{f,policy}(s_k, \pi_{f,policy}(s_k))\big]$ (19)

as its Loss function. Note that $L_{fc}$ is used only to update $Q_{f,policy}$ and not the entire f network; likewise, $L_{fa}$ is used only to update $\pi_{f,policy}$, and the f critic network is held fixed while $\pi_{f,policy}$ is updated.
Similarly, the f network is iteratively updated in each episode, and after every C rounds of f network updates (C is a constant) the f target network updates itself with the parameters of the f network. Unlike the h network, however, the f network uses soft updates: if $\theta_{f,policy}$ denotes the parameters of the f network and $\theta_{f,target}$ the parameters of the f target network, then every C rounds $\theta_{f,target} = \tau \theta_{f,policy} + (1 - \tau) \theta_{f,target}$, where the update rate $\tau$ generally takes a small value.
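A sketch of one f-network update combining the critic Loss $L_{fc}$, the actor Loss $L_{fa}$, and the soft target update. Giving the actor optimizer only the state perceptron and f actor parameters keeps the f critic fixed during the actor step; applying the soft update every step rather than every C rounds is a simplification.

```python
import torch
import torch.nn.functional as F

def f_update_step(f_policy, f_target, critic_opt, actor_opt, batch, gamma: float, tau: float):
    """One f-network update step (sketch, using the FNetwork interface above).

    critic_opt holds the state perceptron + f critic parameters (theta_f,policy);
    actor_opt holds the state perceptron + f actor parameters (pi_f,policy).
    batch: (state, alloc, reward, next_state, done), rewards/done per task slot.
    """
    state, alloc, reward, next_state, done = batch

    with torch.no_grad():                       # y = R + gamma*(1-d)*Q_target(s', pi_target(s'))
        target_q = f_target.evaluate(next_state, f_target.act(next_state))
        y = reward + gamma * (1.0 - done) * target_q

    critic_loss = F.mse_loss(f_policy.evaluate(state, alloc), y)        # L_fc
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    actor_loss = -f_policy.evaluate(state, f_policy.act(state)).mean()  # L_fa
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    with torch.no_grad():                       # soft update: theta' = tau*theta + (1-tau)*theta'
        for p, pt in zip(f_policy.parameters(), f_target.parameters()):
            pt.copy_(tau * p + (1.0 - tau) * pt)
    return critic_loss.item(), actor_loss.item()
```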
In summary, the interaction process of the agent with the environment and the learning process of the agent are shown in Algorithm 1.
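Since Algorithm 1 survives only as an image in the source, the skeleton below sketches the interaction and learning loop it summarizes, with the environment and the update routines injected as callables; all names are assumptions.

```python
from typing import Any, Callable, Tuple

def train_loop(env_reset: Callable[[], Any],
               env_step: Callable[[Any], Tuple[Any, Any, Any]],
               select_action: Callable[[Any, int], Any],
               store_transition: Callable[[Any, Any, Any, Any, Any], None],
               update_networks: Callable[[], None],
               episodes: int, frames_per_episode: int) -> None:
    """Skeleton of the agent-environment interaction and learning process.

    Per frame: choose destinations for new tasks (epsilon-greedy h network) and a
    noisy resource allocation (f actor), step the environment, and store the
    transition including the completion vector d_k. The networks are updated once
    per episode (every L frames), not after every frame.
    """
    for episode in range(episodes):
        state = env_reset()
        for _ in range(frames_per_episode):              # one episode = L frames
            action = select_action(state, episode)       # (destinations h_k, allocation f_k)
            next_state, reward, done_vec = env_step(action)
            store_transition(state, action, reward, next_state, done_vec)
            state = next_state
        update_networks()                                # h and f updates from replay memory
```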
In application, according to the structure and the input-output method in 2), the state-action value functions are obtained from the h network and the action corresponding to the maximum value is selected as the destination server decision, and the computing resource decision for each task is obtained from the f actor in the f network.
Examples
In this section, a specific edge computing model is given. It should be noted that this is only a specific embodiment of the present application and the scope of the present application is not limited thereto; those skilled in the art should understand that the present application includes, but is not limited to, the contents described in the drawings and the detailed description above. Any modification that does not depart from the functional and structural principles of the present application is intended to be included within the scope of the claims.
The edge computing network model is composed of three layers, from top to bottom: a cloud computing node layer, an edge computing server cluster layer, and an IoT device layer (user layer). The cloud computing node layer comprises a cloud computing node with relatively abundant computing resources. The edge computing server cluster layer comprises several edge computing server clusters, each containing several edge computing servers; the schematic shows three edge computing server clusters, each with three edge computing servers. Each edge computing server is placed beside a wireless access point (such as a base station or wireless router), so the transmission delay from the wireless access point to the edge computing server can be neglected. The edge computing server clusters are divided according to position distribution, that is, the edge computing servers in one cluster are spatially close to each other, so they are directly interconnected by optical fiber or the like. Different edge computing server clusters are all connected to the core network and reach the cloud computing node through it. The IoT device layer contains several IoT devices, each connected to a wireless access point through a wireless link.
Consider edge-edge collaboration within a server cluster and the collaboration of the edge computing servers in the cluster with the cloud computing node. The set formed by the cloud node and the edge computing servers in a certain cluster is denoted $\mathcal{M} = \{0, 1, 2, \dots, M\}$, where number 0 represents the cloud computing node and numbers $1, 2, \dots, M$ represent the edge computing servers. The computational resource capacity of these nodes may be denoted $V = \{v_0, v_1, v_2, \dots, v_M\}$, where $v_m$ represents the computing resource capacity of node $m \in \mathcal{M}$. The computing resources of the cloud node are relatively sufficient, i.e., $v_0 > v_m$ ($m \ne 0$). In this model, computing resources are expressed as the number of CPU cycles per unit time. The IoT devices connected to the wireless access points beside the cluster's edge computing servers form the user set of the cluster.
In the network, users are involved in transferring data during task offloading. In the wireless transmission process, the bandwidth is $B_{wireless}$; in the wired transmission process, assuming data is transmitted between nodes $m, m' \in \mathcal{M}$, the bandwidth is $B_{m,m'}$. Because the edge computing nodes in the same cluster are spatially close and directly connected by optical fiber or the like, the bandwidth and data transmission rate between edge nodes are relatively high; the distance between an edge computing node and the cloud computing node is larger and they must be connected through the core network, so the transmission delay is relatively higher and the network bandwidth relatively narrower, i.e., $B_{m,0} < B_{m,m'}$ for edge nodes $m, m' \ne 0$.
Due to battery life and computing capacity limitations, the internet of things devices need to borrow the computing resources of the edge computing nodes and the cloud computing node to process the tasks they continuously generate. Tasks are assumed to be atomic, that is, a task can only be processed at one edge computing node or at the cloud node and cannot be divided for processing. Thus, a task is first offloaded to the edge computing server near its corresponding wireless access point; this server is referred to as the origin server of the task, which has three further choices for the task:
a) the origin server processes the task itself;
b) the task is further transmitted to another edge computing server in the same cluster for processing;
c) the task is further offloaded to the cloud computing node for processing.
The server that ultimately processes the task is referred to as the destination server for the task.
A task can be abstracted into several key attributes. Assume that at the $k$-th frame there are in total $N_k$ tasks within the cluster and uploaded by the cluster to the cloud computing node, described as $\mathcal{T}_k = \{T_{1,k}, T_{2,k}, \dots, T_{N_k,k}\}$. Some of these tasks were neither completed nor failed at the $(k-1)$-th frame and are inherited from it; these are called existing tasks. Other tasks arrive at an edge computing server just at the beginning of this frame and await further offloading or processing; these are called new tasks. Tasks carry attributes related to transmission, processing, and priority. Taking task $T_{i,k}$ as an example, its own attributes can be expressed as:
a) The amount of data to be transmitted $b_{i,k}$. After the task reaches the origin server, it is processed by the destination server; in this process, the origin server needs to transmit data including the user input data and code data to the destination server. If the destination server and the origin server are the same server, then $b_{i,k} = 0$; if they are different servers, then $b_{i,k}$ is the amount of data that still needs to be transferred from the origin server to the destination server. For a new task, $b_{i,k}$ is the full amount of data to be transmitted.
b) The number of CPU cycles to process $c_{i,k}$. The task itself requires a certain amount of computation, expressed as a number of CPU cycles. For new tasks and tasks that have not yet reached the destination server, $c_{i,k}$ is the total number of CPU cycles the task requires.
c) The remaining allowable delay $t^{rem}_{i,k}$. Transmission and processing of the task consume time, and the remaining allowable delay decreases accordingly. Ideally, the task is completed within $t^{rem}_{i,k}$; however, the task may also overrun, in which case its processing has timed out and $t^{rem}_{i,k}$ may be negative. For a new task, $t^{rem}_{i,k} = t^{max}_{i,k} - t^{wireless}_{i,k}$, where $t^{max}_{i,k}$ is the maximum allowed delay and $t^{wireless}_{i,k}$ is the time consumed by the wireless transmission process.
d) The task priority $l_{i,k}$. The priority is an integer, and 1 represents the lowest priority.
In addition to its own attributes, the task has network-related attributes, namely the origin server $O_{i,k}$ and the destination server $D_{i,k}$.
At the $k$-th frame, the utility function contains three parts: a delay gain term $U^{delay}_k$, a task failure penalty term $U^{fail}_k$, and a computing node energy consumption penalty term $U^{energy}_k$.
The delay gain term is the gain obtained from the tasks completed in the current frame; it is proportional to each task's relative remaining delay and its priority $l_{i,k}$, i.e.

$U^{delay}_k = \sum_{i=1}^{N_k} 1_{\mathcal{C}_k}(T_{i,k}) \, l_{i,k} \, \frac{t^{rem}_{i,k}}{t^{max}_{i,k}}$

where $1_A(x)$ is the indicator function, used to indicate whether the element $x$ is in the set $A$: $1_A(x) = 1$ if $x \in A$ and $1_A(x) = 0$ otherwise, and $\mathcal{C}_k$ denotes the set of successfully completed tasks processed in the $k$-th frame.
The task failure penalty term is the penalty incurred when task processing times out and fails within the current frame, the task then being removed from the environment. It is directly proportional to the task priority: the higher the priority of a task, the greater the penalty after it times out and fails, i.e.

$U^{fail}_k = \sum_{i=1}^{N_k} 1_{\mathcal{F}_k}(T_{i,k}) \, l_{i,k}$

where $\mathcal{F}_k$ denotes the set of tasks that failed processing at the $k$-th frame.
The computing node energy consumption penalty term is the sum of the energy consumed by each computing node,

$U^{energy}_k = \sum_{m=0}^{M} \epsilon(f_m) \, p_m \, t_{frame}$

where $f_m$ denotes the total computing resources allocated on node $m$, $\epsilon(x)$ is defined as a binary function with $\epsilon(x) = 1$ for $x > 0$ and $\epsilon(x) = 0$ otherwise, $\kappa$ is the power coefficient with unit $\mathrm{W \cdot Hz^{-3}}$, and $p_m = \kappa f_m^3$ represents the power of server $m$.
The utility function is a weighted combination of the above three terms, i.e.

$U_k = \alpha U^{delay}_k - \eta U^{fail}_k - \beta U^{energy}_k$

where α, β, and η are weighting coefficients. Care should be taken to satisfy η > α, i.e., to ensure that the negative gain obtained when a task is successfully processed after its deadline ($t^{rem}_{i,k} < 0$) remains greater than the penalty obtained when the task fails due to timeout.
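A sketch of this utility computation; the delay-gain form (priority times relative remaining delay) follows the reconstruction above and should be read as an assumption.

```python
from typing import Iterable, Tuple

def frame_utility(completed: Iterable[Tuple[int, float, float]],
                  failed_priorities: Iterable[int],
                  node_energy: Iterable[float],
                  alpha: float, beta: float, eta: float) -> float:
    """U_k = alpha * U_delay - eta * U_fail - beta * U_energy.

    completed: (priority, t_remaining, t_max) for each task finished this frame
    failed_priorities: priorities of tasks that failed by timeout this frame
    node_energy: energy consumed by each computing node this frame
    """
    u_delay = sum(l * (t_rem / t_max) for l, t_rem, t_max in completed)
    u_fail = sum(failed_priorities)
    u_energy = sum(node_energy)
    return alpha * u_delay - eta * u_fail - beta * u_energy
```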
According to 1), in such an environment the states, actions, and rewards of the edge computing model are defined, taking the $k$-th frame as an example, exactly as above: the state $s_k = [s_{1,k}, s_{2,k}, \dots, s_{N_k,k}]$ collects the attributes of all tasks within the cluster and uploaded by the cluster to the cloud computing node; the action $a_k = [a_{1,k}, a_{2,k}, \dots, a_{N_k,k}]$ collects the decisions $a_{i,k} = [h_{i,k}, f_{i,k}]$ made for all tasks, with $h_{i,k}$ the destination server and $f_{i,k}$ the computing resources allocated; and the reward $r_k = [R_{1,k}, R_{2,k}, \dots, R_{N_k,k}]$ assigns each task its contribution to the utility function, $R_{i,k} = \alpha R^{delay}_{i,k} - \eta R^{fail}_{i,k} - \beta R^{energy}_{i,k}$, where α, β, and η are defined identically to the corresponding variables in the utility function.
After defining the states, actions and rewards, the structure of the neural network and the input-output structure are defined as follows:
note that the states, actions, and rewards defined in 1) are all vectors, and the lengths of these three vectors are all equal to NkIn this regard, the length thereof varies. The number of input nodes and output nodes of the used neural network is fixed, that is, the input dimension and the output dimension are fixed. Thus in the state skBefore inputting into the neural network, except normalization processing, zero filling expansion is also needed. While taking into account si,kD in (1)i,kAnd Ri,kIs a server number, and does not indicate a relative size, and thus a one-hot code (D) is required to be encodedi,kAnd Ri,k. For the action of the output of the neural network and the action state cost function, only meaningful N is takenkAs a function of action and action state cost.
The schematic structure of the neural network is shown in fig. 1 and 2. In both figures it is assumed that there are a maximum of N tasks, N, within the scope of the studykN is less than or equal to N. The leftmost side of the graph is the neural network input, the rightmost side is the neural network output, except for the smallest cube of the output layer, each cube represents a network structure formed by a plurality of network layers, and each smallest cube of the output layer represents a scalar.
Fig. 1 depicts a first neural network used for destination server decision making, which network is named h-network for convenience of expression. In this structure, the two leftmost layers are state perceptrons (state perceptrons) which are responsible for extracting feature information in the state. The characteristic information extracted by the state perceptron and the attribute information of a certain task are input to an h actor (vector), and the h actor outputs a plurality of action state value functions (Q(s) corresponding to the taskk,si,k,hi,k=0),Q(sk,si,k,hi,k=1),…,Q(sk,si,k,hi,k=M)]Wherein Q(s)k,si,k,hi,k0) represents an action state cost function for offloading the task to compute node 0 (cloud server) processing, Q(s)k,si,k,hi,k1) represents an action of offloading a task to a processing node 1 (edge server numbered 1)As a function of state cost, and so on. In this algorithm, the destination server decisions for each task are treated as a different decision process, so there are M +1 possible actions for each decision, so the final output is (M +1) × N scalars.
FIG. 2 depicts a second neural network used for the computational resource allocation problem, which is named f-network for convenience of expression. The first layers of the input layer of the structure are the same as the first layers of the h network in structure and are all state sensors. The two network blocks to the right of the state perceptron are named f actor (f actor) and f criticist (f critic). The actor receives the output of the state perceptron and then outputs the computing resources f allocated for each taskk=[f1,k,f2,k,…,fN,k]In which meaningless terms can be ignored, taking only the meaningful N thereofkAn item. f criticizing the network receives the output of the state perceptron and the calculation resource allocation scheme, and then outputs the action state value function [ Q ] aiming at the actions1(sk,fk),Q2(sk,fk),…,QN(sk,fk)]. Will f iskEach of which is considered as an action made for the corresponding task, so that the output of the f critics also has N dimensions, Qi(sk,fk) Just correspond to fi,kThe state cost function of. With this structure, it can be understood that the function of the f criticist is similar to that of the h actor, and outputs the action state cost function, and the function of the f actor seeks to maximize the action state cost function.
Finally, the training method and the application process are given as follows:
target networks with the same structure are set for each network, namely an h target network with the same structure as the h network and an f target network with the same structure as the f network. In addition, the status, actions, rewards, next actions and whether the task was successfully processed or failed due to timeout for each step are stored in the memory bank using an empirical replay technique. In the interaction process of the agent and the edge computing environment, a concept of an ensemble (epicode) is also defined, each L frame is defined as one ensemble, and the update of the neural network is also carried out after each ensemble, but not after each frame.
Destination server decision process and h-network update algorithm
For the sake of simplicity, the h network and the h target network are distinguished by subscripts, i.e. the h network is marked as Qh,policyH target network is Qh,target
And at the beginning of each frame, the h network acquires the current state information and outputs an action state cost function of each action. But only new task Ti,kRequiring the use of output corresponding thereto to make destination server decisions, i.e.
Figure BDA0002515240070000181
In the training process, in order to ensure that the intelligent agent can fully explore the environment, only a certain probability of 1-epsilonkTake the action with ∈kTake a random action. Namely, it is
Figure BDA0002515240070000182
The exploration of the environment by the agent should decrease as the number of iteration rounds increases, so belongs tokWill decrease as the number of iteration rounds increases.
Unlike a typical reinforcement learning environment, not every frame of decision-making action may be performed in this environment. In this environment, once the destination server is determined, the destination server decisions for the next several frames are meaningless and cannot be performed, which makes it impossible to iteratively update the h-network directly using the bellman optimal equation. Therefore, the update method of the h-network needs to be adjusted. Specifically, assume task Ti,kIs inherited to the k +1 frame and is noted as Tm,k+1Then h network is updated as follows:
Figure BDA0002515240070000183
this formula can be understood as being forTasking, with only D being the destination server decision-making action, regardless of which frame it is ini,kThis option; if task Ti,kAfter the kth frame has been successfully completed or failed due to timeout, it is updated as follows:
Figure BDA0002515240070000184
by combining the above two equations, the update process can be merged into:
Figure BDA0002515240070000185
wherein d isi,kRepresents the task Ti,kIf the task is successfully completed or fails due to timeout, d i,k1 is ═ 1; otherwise, the task is not completed and will be inherited to the (k +1) th frame, then d i,k0. It can thus also be seen that the transitions stored in the memory banks do not only require sk,ak,rk,sk+1And also stores the completion vector
Figure BDA0002515240070000186
The mean square error is taken as the Loss function for the update, i.e.
$$L_h(\theta_{h,policy}) = \mathbb{E}\left[\left(y_{i,k} - Q_{h,policy}^{i}(s_k, D_{i,k})\right)^2\right],$$
where
$$y_{i,k} = R_{i,k} + \gamma\,(1 - d_{i,k}) \max_{a} Q_{h,target}^{m}(s_{k+1}, a).$$
In this formula, the h network is updated iteratively in every episode, while the h target network copies the h network's parameters directly after every C updates of the h network (C a constant): if $\theta_{h,policy}$ denotes the parameters of the h network and $\theta_{h,target}$ those of the h target network, then every C rounds we set $\theta_{h,target} = \theta_{h,policy}$.
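The following PyTorch sketch shows one h network update of this form plus the hard target copy. It assumes the h network maps a state batch to per-task, per-action values of shape [batch, N, M+1] and that task slots stay aligned between frames; the patent instead tracks each task's inherited index m, which a full implementation would have to respect.

import torch
import torch.nn.functional as F

def h_update(q_policy, q_target, optimizer, batch, gamma: float = 0.99):
    s, dest, r, s_next, d = batch         # dest: chosen server per task [B, N]
    q_all = q_policy(s)                   # [B, N, M+1]
    q_taken = q_all.gather(2, dest.unsqueeze(2)).squeeze(2)  # Q for D_{i,k}
    with torch.no_grad():
        # y = R + gamma * (1 - d) * max_a Q_target(s', a)
        y = r + gamma * (1.0 - d) * q_target(s_next).max(dim=2).values
    loss = F.mse_loss(q_taken, y)         # mean square error Loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def hard_update(q_policy, q_target):
    # Every C rounds: theta_{h,target} <- theta_{h,policy}
    q_target.load_state_dict(q_policy.state_dict())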
Computational resource allocation decision process and f network update algorithm
For convenience of presentation, the state perceptron together with the f actor network is denoted $\pi_{f,policy}$, with corresponding target network $\pi_{f,target}$; the state perceptron together with the f critic network is denoted $Q_{f,policy}$, with corresponding target network $Q_{f,target}$.
At the beginning of each frame, the f actor outputs, based on the state, the computing resources $f_k = [f_{1,k}, f_{2,k}, \ldots, f_{N,k}]$ allocated to each task. During training, however, a certain amount of noise must be added to the action so that the agent explores the environment sufficiently, i.e.
$$f'_{i,k} = \mathrm{clip}\left(f_{i,k} + n,\, 0,\, v_m\right) \quad (37)$$
where
$$n \sim \mathcal{N}(0, \sigma_k^2)$$
is random noise; as in the destination server decision process, the noise standard deviation $\sigma_k$ decreases as the number of iteration rounds increases. clip denotes the clipping function, defined as
$$\mathrm{clip}(x, x_{\min}, x_{\max}) = \min\left(\max(x, x_{\min}),\, x_{\max}\right).$$
This function ensures that the action after adding noise still satisfies $0 \le f'_{i,k} \le v_m$. For a task still in the wired transmission phase that has not yet reached its destination server, the computing resources allocated to it are forced to 0. If the action after adding noise does not satisfy
$$\sum_{i:\, D_{i,k} = m} f'_{i,k} \le v_m,$$
The actions need to be further processed:
$$f'_{i,k} \leftarrow \frac{v_m\, f'_{i,k}}{\sum_{j:\, D_{j,k} = m} f'_{j,k}}.$$
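A NumPy sketch of this exploration-noise post-processing is given below. The per-node capacity check and the proportional rescaling implement one reading of the two image-only equations above and are therefore assumptions; the zeroing of tasks still in the wired transmission phase follows the text directly.

import numpy as np

def perturb_and_project(f, dest, v_m, sigma_k, in_wired_phase):
    """f: allocations [N]; dest: destination node index per task [N];
    in_wired_phase: boolean mask of tasks not yet at their server."""
    noise = np.random.normal(0.0, sigma_k, size=f.shape)
    f_noisy = np.clip(f + noise, 0.0, v_m)     # eq. (37)
    f_noisy[in_wired_phase] = 0.0              # not yet arrived: allocate 0
    for m in np.unique(dest):                  # assumed per-node capacity check
        mask = dest == m
        total = f_noisy[mask].sum()
        if total > v_m:
            f_noisy[mask] *= v_m / total       # assumed proportional rescale
    return f_noisy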
The update process of $Q_{f,policy}$ and $Q_{f,target}$ closely parallels that of the h network. Suppose task $T_{i,k}$ is either successfully processed in frame $k$ or fails due to timeout, or is inherited into frame $k+1$ and denoted $T_{m,k+1}$; then the update target can be expressed as
$$y_{i,k} = R_{i,k} + \gamma\,(1 - d_{i,k})\, Q_{f,target}^{m}\!\left(s_{k+1}, \pi_{f,target}(s_{k+1})\right).$$
The relationship between the f actor and the f critic network is much like that between the generator and the discriminator in a generative adversarial network: the goal of the f actor is to maximize the output of the f critic, so its training process can be expressed as
$$\max_{\theta_{f,policy}} \; \mathbb{E}\left[Q_{f,policy}\!\left(s_k, \pi_{f,policy}(s_k)\right)\right].$$
As with the h network, the mean square error is used as the Loss function of the $Q_{f,policy}$ network:
$$L_{fc}(\theta_{f,policy}) = \mathbb{E}\left[\left(y_{i,k} - Q_{f,policy}^{i}(s_k, f_k)\right)^2\right],$$
where
$$y_{i,k} = R_{i,k} + \gamma\,(1 - d_{i,k})\, Q_{f,target}^{m}\!\left(s_{k+1}, \pi_{f,target}(s_{k+1})\right).$$
For $\pi_{f,policy}$,
$$L_{fa}(\theta_{f,policy}) = -\mathbb{E}\left[Q_{f,policy}\!\left(s_k, \pi_{f,policy}(s_k)\right)\right] \quad (44)$$
is used directly as the Loss function. Note that $L_{fc}$ is used to update only $Q_{f,policy}$, not the entire f network; likewise $L_{fa}$ is used to update only $\pi_{f,policy}$, and the f critic network is held fixed while $\pi_{f,policy}$ is updated.
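A PyTorch sketch of these two updates, in the style of the earlier FNetwork sketch, follows. It assumes critic_opt was constructed over the critic and state perceptron parameters and actor_opt over the actor parameters, so that each Loss updates only its own sub-network as required above.

import torch
import torch.nn.functional as F

def f_update(f_net, f_target, critic_opt, actor_opt, batch, gamma: float = 0.99):
    s, f_taken, r, s_next, d = batch
    # Critic update: L_fc = MSE(y, Q_policy^i(s, f)); y uses the target actor.
    with torch.no_grad():
        y = r + gamma * (1.0 - d) * f_target.evaluate(s_next,
                                                      f_target.act(s_next))
    critic_loss = F.mse_loss(f_net.evaluate(s, f_taken), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()
    # Actor update: L_fa = -E[Q_policy(s, pi_policy(s))]; the critic stays
    # fixed because actor_opt steps only the actor (and perceptron) parameters.
    actor_loss = -f_net.evaluate(s, f_net.act(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
    return critic_loss.item(), actor_loss.item()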
Similarly, the f network is updated iteratively in every episode, and the f target network is updated after every C updates of the f network (C a constant). Unlike the h network, however, the f target network uses soft updates: if $\theta_{f,policy}$ denotes the parameters of the f network and $\theta_{f,target}$ those of the f target network, then every C rounds $\theta_{f,target} = \tau\,\theta_{f,policy} + (1-\tau)\,\theta_{f,target}$, where the update rate $\tau$ generally takes a small value.
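A minimal soft update sketch, with $\tau = 0.005$ as an assumed illustrative value:

import torch

def soft_update(policy_net, target_net, tau: float = 0.005):
    # theta_target <- tau * theta_policy + (1 - tau) * theta_target
    with torch.no_grad():
        for p_t, p in zip(target_net.parameters(), policy_net.parameters()):
            p_t.mul_(1.0 - tau).add_(tau * p)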
In summary, the interaction of the agent with the environment and the agent's learning process are shown in Algorithm 1.
[Algorithm 1: interaction of the agent with the edge computing environment and joint training of the h network and the f network; presented as an image in the original document. A hedged Python reconstruction follows.]
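Since Algorithm 1 survives only as an image, the following Python sketch reconstructs its outer loop from the prose above. The environment API (env.reset, env.step, env.v_m, env.wired_mask), the helpers choose_destinations, sigma_k, collate_h, and collate_f, and the episode and batch constants are all hypothetical names introduced for illustration.

def train(env, q_h, q_h_target, f_net, f_target, memory,
          h_opt, critic_opt, actor_opt,
          episodes: int = 1000, frames_per_episode: int = 500,
          batch_size: int = 64, C: int = 10):
    round_idx = 0
    for _ in range(episodes):
        s = env.reset()
        for _ in range(frames_per_episode):
            # Destination decisions for new tasks (epsilon-greedy) and noisy
            # resource allocations from the f actor.
            dest = choose_destinations(q_h, s, round_idx)
            f = perturb_and_project(f_net.act(s).detach().numpy(), dest,
                                    env.v_m, sigma_k(round_idx),
                                    env.wired_mask())
            s_next, r, d = env.step(dest, f)     # d is the completion vector
            memory.push(s, (dest, f), r, s_next, d)
            s = s_next
        # Networks are updated once per episode, not once per frame.
        if len(memory) >= batch_size:
            batch = memory.sample(batch_size)
            h_update(q_h, q_h_target, h_opt, collate_h(batch))
            f_update(f_net, f_target, critic_opt, actor_opt, collate_f(batch))
        round_idx += 1
        if round_idx % C == 0:
            hard_update(q_h, q_h_target)   # h target network: direct copy
            soft_update(f_net, f_target)   # f target network: soft update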
In application, following the structure and input/output method in 2), the state-action value function is obtained from the h network and the action corresponding to its maximum value is selected as the destination server decision, while the computing resource decision for each task is obtained from the f actor in the f network.
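At deployment, exploration is disabled; a minimal sketch under the earlier assumptions:

import torch

def act(q_h, f_net, s):
    with torch.no_grad():
        dest = q_h(s).argmax(dim=-1)  # greedy destination per new task
        f = f_net.act(s)              # deterministic resource allocation
    return dest, f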
Taking 500 frames as one episode, FIG. 3 plots the sum of the utility functions of all frames in each episode as the reward metric. The "no cooperation baseline" is the curve for the algorithm provided by the present application without cooperation, and "proposed" is the curve for the algorithm provided by the present application; the abscissa represents the training process, and the ordinate represents the gain of each episode, i.e., the sum of the utility functions of the frames in that episode. It can be seen that after a period of training the algorithm stabilizes, with approximately three times the gain of the no-cooperation scheme.
Although the present application has been described above with reference to specific embodiments, those skilled in the art will recognize that many changes may be made in the configuration and details of the present application within the principles and scope of the present application. The scope of protection of the application is determined by the appended claims, and all changes that come within the meaning and range of equivalency of the technical features are intended to be embraced therein.

Claims (9)

1. An edge computing resource allocation method based on priority and cooperation, characterized in that the method comprises the following steps:
1): under the conditions of edge-to-edge cooperation, edge-to-cloud cooperation, and the tasks' own priorities, designing a state carrying priority and network node attributes, designing an action comprising the destination server decision and the computing resource allocation decision, and designing a reward for each task;
2): for the states, actions, and rewards defined in 1), designing a first neural network structure for the destination server decision and a second neural network structure for the computing resource allocation decision;
3): according to the given algorithm, training and updating the first neural network and the second neural network during the interaction of the agent with the edge computing environment, and applying the neural networks after training is complete.
2. The priority-and-cooperation-based edge computing resource allocation method of claim 1, characterized in that: the state in 1) comprises the attributes of all tasks within the cluster and of tasks uploaded by the cluster to the cloud computing node, the action comprises the destination server and computing resource allocation decisions made for all tasks, and the reward is each task's contribution to the utility function.
3. The priority-and-cooperation-based edge computing resource allocation method of claim 2, characterized in that: the reward comprises a delay revenue item, a task failure penalty item, and an energy consumption penalty item.
4. The priority-and-cooperation-based edge computing resource allocation method of claim 1, characterized in that: the first neural network in 2) is the h network, which comprises a state perceptron and an h actor network, the state perceptron being used to extract feature information from the state and feed it into the h actor network.
5. The priority-and-cooperation-based edge computing resource allocation method of claim 4, characterized in that: the destination server decision process regards the destination server decision of each task as a separate decision process; each decision has M+1 actions, and the final output is (M+1)×N scalars, where N denotes the maximum number of input tasks the neural network can handle, so that $N \ge N_k$, and M+1 is the number of computing nodes.
6. The priority-and-cooperation-based edge computing resource allocation method of claim 1, characterized in that: the second neural network in 2) is the f network, which comprises a state perceptron, an f actor network, and an f critic network, the state perceptron being used to extract feature information from the state.
7. The priority-and-cooperation-based edge computing resource allocation method of claim 6, characterized in that: the f actor receives the output of the state perceptron and outputs the computing resources $f_k = [f_{1,k}, f_{2,k}, \ldots, f_{N,k}]$ allocated to each task; the f critic network receives the output of the state perceptron together with the computing resource allocation scheme and outputs the action-state value function $[Q_1(s_k, f_k), Q_2(s_k, f_k), \ldots, Q_N(s_k, f_k)]$ for these actions, where $s_k$ is the state defined in 1), $Q_1(s_k, f_k)$ corresponds to the state value function of $f_{1,k}$, $Q_2(s_k, f_k)$ to that of $f_{2,k}$, and so on.
8. The priority-and-cooperation-based edge computing resource allocation method of claim 7, characterized in that: in 3), the mean square error function is taken as the Loss function when updating the first neural network, and the mean square error function is likewise taken as the Loss function when updating the second neural network.
9. The priority-and-cooperation-based edge computing resource allocation method of claim 8, characterized in that: the updating process of the first neural network in 3) is as follows:
the neural network is updated with
$$L_h(\theta_{h,policy}) = \mathbb{E}\left[\left(y_{i,k} - Q_{h,policy}^{i}(s_k, D_{i,k})\right)^2\right]$$
as the Loss function, where $\theta_{h,policy}$ denotes the parameters of the h network,
$$y_{i,k} = R_{i,k} + \gamma\,(1 - d_{i,k}) \max_{a} Q_{h,target}^{m}(s_{k+1}, a),$$
$Q_{h,target}$ and $Q_{h,policy}$ denote the outputs of the h target network and of the h network respectively, $s_k$ and $s_{k+1}$ are the states of the environment in frames $k$ and $k+1$, $s_{i,k}$ and $s_{m,k+1}$ denote all attributes of tasks $T_{i,k}$ and $T_{m,k+1}$ respectively (the superscripts $i$ and $m$ selecting the outputs corresponding to these tasks), $D_{i,k}$ is the destination server of task $T_{i,k}$, $R_{i,k}$ is the reward obtained by task $T_{i,k}$, and $\gamma$ is the discount factor;
the f actor and the f critic of the second neural network are updated separately;
the update method for the f critic network and the state perceptron is as follows: suppose task $T_{i,k}$ is inherited into frame $k+1$ and denoted $T_{m,k+1}$ (in which case $d_{i,k} = 0$), or has been successfully completed or has failed due to timeout (in which case $d_{i,k} = 1$); the neural network is updated with
$$L_{fc}(\theta_{f,policy}) = \mathbb{E}\left[\left(y_{i,k} - Q_{f,policy}^{i}(s_k, f_k)\right)^2\right]$$
as the Loss function, where $\theta_{f,policy}$ denotes the parameters of the f critic and the state perceptron,
$$y_{i,k} = R_{i,k} + \gamma\,(1 - d_{i,k})\, Q_{f,target}^{m}\!\left(s_{k+1}, \pi_{f,target}(s_{k+1})\right),$$
$s_k$ and $s_{k+1}$ are the states of the environment in frames $k$ and $k+1$, $f_k$ denotes the computing resource allocation decision of frame $k$, $Q_{f,policy}^{i}$ and $Q_{f,target}^{m}$ denote respectively the $i$th output of the f critic network and the $m$th output of the corresponding target network, and $\pi_{f,target}$ denotes the output of the target network corresponding to the f actor;
the updating method of the actor network and the state perceptron comprises the following steps: with Lfaf,policy)=-E[Qf,policy(skf,policy(sk))]Updating the neural network as a Loss function, wheref,policyIndicates the output of the actor.
CN202010473969.6A 2020-05-29 2020-05-29 Edge computing resource allocation method based on priority and cooperation Pending CN111813539A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010473969.6A CN111813539A (en) 2020-05-29 2020-05-29 Edge computing resource allocation method based on priority and cooperation


Publications (1)

Publication Number Publication Date
CN111813539A true CN111813539A (en) 2020-10-23

Family

ID=72848732

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010473969.6A Pending CN111813539A (en) 2020-05-29 2020-05-29 Edge computing resource allocation method based on priority and cooperation

Country Status (1)

Country Link
CN (1) CN111813539A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190266489A1 (en) * 2017-10-12 2019-08-29 Honda Motor Co., Ltd. Interaction-aware decision making
CN109819047A (en) * 2019-02-26 2019-05-28 吉林大学 A kind of mobile edge calculations resource allocation methods based on incentive mechanism
CN110503195A (en) * 2019-08-14 2019-11-26 北京中科寒武纪科技有限公司 The method and its Related product of task are executed using artificial intelligence process device
CN110798849A (en) * 2019-10-10 2020-02-14 西北工业大学 Computing resource allocation and task unloading method for ultra-dense network edge computing
US20200136920A1 (en) * 2019-12-20 2020-04-30 Kshitij Arun Doshi End-to-end quality of service in edge computing environments

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YU MENGDI; TANG JUNHUA; LI JIANHUA: "A multi-node MEC computing resource allocation scheme based on reinforcement learning", Communications Technology (通信技术), No. 12 *
DENG XIAOHENG; GUAN PEIYUAN; WAN ZHIWEN; LIU ENLU; LUO JIE; ZHAO ZHIHUI; LIU YAJUN; ZHANG HONGGANG: "Research on edge computing resource collaboration based on comprehensive trust", Journal of Computer Research and Development (计算机研究与发展), No. 03 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112288478A (en) * 2020-10-28 2021-01-29 中山大学 Edge computing service incentive method based on reinforcement learning
CN112738767A (en) * 2020-11-30 2021-04-30 中南大学 Trust-based mobile edge user task scheduling method
CN112738767B (en) * 2020-11-30 2021-12-17 中南大学 Trust-based mobile edge user task scheduling method
CN113014649A (en) * 2021-02-26 2021-06-22 济南浪潮高新科技投资发展有限公司 Cloud Internet of things load balancing method, device and equipment based on deep learning
CN113590335A (en) * 2021-08-11 2021-11-02 南京大学 Task load balancing method based on grouping and delay estimation in tree edge network
CN113590335B (en) * 2021-08-11 2023-11-17 南京大学 Task load balancing method based on grouping and delay estimation in tree edge network
CN114116156A (en) * 2021-10-18 2022-03-01 武汉理工大学 Cloud-edge cooperative double-profit equilibrium taboo reinforcement learning resource allocation method
CN113676559A (en) * 2021-10-23 2021-11-19 深圳希研工业科技有限公司 Information processing system and method for multi-device mobile edge calculation of Internet of things
CN113676559B (en) * 2021-10-23 2022-02-08 深圳希研工业科技有限公司 Information processing system and method for multi-device mobile edge calculation of Internet of things

Similar Documents

Publication Publication Date Title
CN111813539A (en) Edge computing resource allocation method based on priority and cooperation
CN111756812B (en) Energy consumption perception edge cloud cooperation dynamic unloading scheduling method
Zhan et al. A deep reinforcement learning based offloading game in edge computing
CN111835827B (en) Internet of things edge computing task unloading method and system
CN112860350B (en) Task cache-based computation unloading method in edge computation
Chen et al. Dynamic task offloading for internet of things in mobile edge computing via deep reinforcement learning
CN113225377B (en) Internet of things edge task unloading method and device
CN114143346B (en) Joint optimization method and system for task unloading and service caching of Internet of vehicles
CN113810233B (en) Distributed computation unloading method based on computation network cooperation in random network
CN113626104B (en) Multi-objective optimization unloading strategy based on deep reinforcement learning under edge cloud architecture
Heidari et al. A QoS-aware technique for computation offloading in IoT-edge platforms using a convolutional neural network and Markov decision process
Huang et al. Toward decentralized and collaborative deep learning inference for intelligent iot devices
CN111488528A (en) Content cache management method and device and electronic equipment
CN113573363A (en) MEC calculation unloading and resource allocation method based on deep reinforcement learning
CN116321293A (en) Edge computing unloading and resource allocation method based on multi-agent reinforcement learning
CN116489712A (en) Mobile edge computing task unloading method based on deep reinforcement learning
CN113946423B (en) Multi-task edge computing, scheduling and optimizing method based on graph attention network
Matrouk et al. Mobility aware-task scheduling and virtual fog for offloading in IoT-fog-cloud environment
CN113032149B (en) Edge computing service placement and request distribution method and system based on evolution game
Jiang et al. Energy-saving service offloading for the internet of medical things using deep reinforcement learning
Henna et al. Distributed and collaborative high-speed inference deep learning for mobile edge with topological dependencies
CN116489708B (en) Meta universe oriented cloud edge end collaborative mobile edge computing task unloading method
CN115361453B (en) Load fair unloading and migration method for edge service network
CN116367190A (en) Digital twin function virtualization method for 6G mobile network
Wang et al. Task offloading for edge computing in industrial Internet with joint data compression and security protection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination