CN111813539A - Edge computing resource allocation method based on priority and cooperation - Google Patents
- Publication number
- CN111813539A (application CN202010473969.6A, filed 2020)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5038—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/5021—Priority
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application belongs to the technical field of resource allocation strategies, and particularly relates to an edge computing resource allocation method based on priority and cooperation. The computing resources of edge computing servers and cloud servers are represented by the number of CPU cycles available per unit time; these cycles can be allocated to the tasks running on a server according to different allocation schemes, but it is difficult to guarantee that long-term revenue is maximized. The application provides a priority- and cooperation-based method for allocating edge computing resources, comprising the following steps: 1) defining the states, actions and rewards of the edge computing model; 2) defining the structure of the neural networks and the structure of their inputs and outputs; 3) updating, training and applying the neural networks according to a given training method. By reasonably allocating the computing resources of the edge computing servers and the cloud server, specifically the CPU (Central Processing Unit) cycles per unit time, the long-term revenue related to relative delay and server energy consumption is improved.
Description
Technical Field
The application belongs to the technical field of resource allocation strategies, and particularly relates to an edge computing resource allocation method based on priority and cooperation.
Background
Mobile user devices usually have limited computing resources because of their volume and other constraints, and cannot perform a large amount of computation for a long time because of device energy consumption and battery capacity limits. Therefore, for some compute-intensive tasks, if a user relies only on local computing resources for processing, it is difficult to meet the tasks' low-latency requirements, and problems such as shortened standby time and excessive heat generation may arise. Mobile users therefore need to resort to external computing resources, which in existing networks typically come from cloud computing nodes, also referred to as cloud nodes or cloud servers. However, with the growth of Internet of Things devices and the development of 5G, cloud computing alone is increasingly unable to keep up, and edge computing technology has emerged as a supplement. Its aim is to configure computing resources at the edge of the network so as to reduce the bandwidth occupation of the core network and shorten delay.
In the traditional cloud computing mode, a user uploads a compute-intensive task through the core network to a cloud server for processing. Although the computing resources of the cloud server are sufficient and the computation can be completed in a short time, the transmission delay caused by factors such as the limited bandwidth of the core network and network jitter is large. To reduce transmission delay, mobile edge computing deploys computing resources near the network edge close to the user, for example at a wireless router or base station. There is then only a one-hop connection between the edge computing server and the user, and the user's data does not need to be uploaded through the core network to a cloud computing server for processing, so a lower transmission delay is achieved. However, compared with a cloud computing server, the computing resources of an edge computing server are relatively limited, so efficiently allocating and utilizing these computing resources becomes one of the challenges in mobile edge computing. In this application, the edge computing environment is modeled as a Markov decision process, and, in view of the complexity of the model, a deep reinforcement learning method is used to optimize the task success rate and the long-term revenue.
The computing resources of edge computing servers and cloud servers are represented by the number of CPU cycles available per unit time, which can be allocated to the tasks running on the servers according to different allocation schemes, but it is difficult to guarantee that the long-term revenue is maximized. The revenue is mainly related to relative delay and server energy consumption.
Disclosure of Invention
1. Technical problem to be solved
To address the problem that the computing resources of edge computing servers and cloud servers (the number of CPU cycles available per unit time) can be allocated to the tasks running on the servers according to different allocation schemes, yet long-term revenue maximization is difficult to guarantee, the application provides an edge computing resource allocation method based on priority and cooperation.
2. Technical scheme
To achieve the above objective, the set consisting of the edge computing servers in an edge computing server cluster and the cloud server in this application can be expressed as {0, 1, 2, ..., M}, where number 0 represents the cloud computing node and numbers 1, 2, ..., M represent the edge computing servers. Their computing resource capacities can be expressed as V = {v_0, v_1, v_2, ..., v_M}, where v_m represents the computing resource capacity of node m. After an edge user offloads a task to an edge computing server, that server is called the origin server of the task; the origin server can further decide to process the task itself or offload it to the cloud server or to another edge computing server in the same cluster for processing, and the server that processes the task is called the destination server.
In such an environment, the method of the present application comprises:
1): designing, while taking into account edge-edge cooperation, edge-cloud cooperation and the task's own priority, a state with priority and network node attributes, an action comprising the destination server decision and the computing resource allocation decision, and a reward aimed at the task;
2): for the states, actions and rewards defined in 1), designing a first neural network structure for the destination server decision, and designing a second neural network structure for the computing resource allocation decision;
3): according to a given algorithm, training and updating the first neural network and the second neural network during the interaction between the agent and the edge computing environment, and applying the neural networks after training.
The edge computing environment is divided in time into frames, each of length t_frame. Assume that in the k-th frame the total number of tasks within the cluster and of tasks uploaded from the cluster to the cloud computing node is N_k; these tasks are denoted T_{1,k}, T_{2,k}, ..., T_{N_k,k}, where the subscript k denotes the k-th frame.
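The per-task bookkeeping implied by this notation can be sketched as a small record; the field names below are illustrative assumptions based on the task attributes listed later in the description (data size, required CPU cycles, remaining and maximum allowable delay, priority l_{i,k}, origin server, destination server):

```python
from dataclasses import dataclass

# Illustrative record for one task T_{i,k} in frame k; field names are
# assumptions, not the symbols used in the patent's figures.
@dataclass
class Task:
    data_size: float        # amount of data to be transmitted
    cpu_cycles: float       # number of CPU cycles required for processing
    remaining_delay: float  # remaining allowable delay
    max_delay: float        # maximum allowed delay
    priority: int           # task priority l_{i,k}
    origin_server: int      # server the user offloaded the task to
    dest_server: int = -1   # destination server, -1 until decided

t = Task(data_size=2.0e6, cpu_cycles=1.0e9, remaining_delay=0.05,
         max_delay=0.1, priority=2, origin_server=1)
```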
Another embodiment provided by the present application is: the state in 1) comprises the attributes of all tasks within the cluster and of those uploaded by the cluster to the cloud computing node; the action is the destination server and computing resource allocation decision made for all tasks; and the reward is the contribution of each task to the utility function.
Another embodiment provided by the present application is: the reward includes a delay revenue term, a task failure penalty term, and an energy consumption penalty term.
Another embodiment provided by the present application is: the first neural network in the step 2) is an h network, the h network comprises a state perceptron and an h actor network, and the state perceptron is used for extracting characteristic information in a state and inputting the characteristic information into the h actor network.
Another embodiment provided by the present application is: in the destination server decision process, the destination server decision of each task is treated as a separate decision process; each decision has M+1 possible actions, and the final output consists of (M+1) × N scalars, where N denotes the maximum number of input tasks the neural network can handle (so N ≥ N_k) and M+1 is the number of computing nodes.
Another embodiment provided by the present application is: the second neural network in 2) is an f network comprising a state perceptron, an f actor network and an f critic network; the state perceptron is used to extract feature information from the state.
Another embodiment provided by the present application is: the f actor receives the output of the state perceptron and then outputs the computing resources allocated to each task, f_k = [f_{1,k}, f_{2,k}, ..., f_{N,k}]; among the N outputs only the N_k values corresponding to the task set are used, each representing the number of CPU cycles per unit time allocated to the corresponding task. The f critic network receives the output of the state perceptron together with the computing resource allocation scheme and then outputs an action state value function for these actions, [Q_1(s_k, f_k), Q_2(s_k, f_k), ..., Q_N(s_k, f_k)], where s_k is the state defined in 1), Q_1(s_k, f_k) corresponds to the state value function of f_{1,k}, Q_2(s_k, f_k) corresponds to that of f_{2,k}, and so on.
Another embodiment provided by the present application is: in 3), the mean square error function is used as the Loss function when updating the first neural network, and likewise when updating the second neural network.
The updating method of the first neural network (h network) in 3) is as follows. Suppose task T_{i,k} is either inherited into the (k+1)-th frame and denoted T_{m,k+1} (in which case d_{i,k} = 0), or has been successfully completed or has failed due to timeout (in which case d_{i,k} = 1). The neural network is then updated with the Loss function L_h(θ_{h,policy}) = E[(R_{i,k} + γ(1 - d_{i,k}) Q_{h,target}(s_{k+1}, s_{m,k+1}, D_{i,k}) - Q_{h,policy}(s_k, s_{i,k}, D_{i,k}))^2], where θ_{h,policy} represents the parameters of the h network, Q_{h,target} and Q_{h,policy} represent the outputs of the h target network and the h network respectively, s_k and s_{k+1} are the states of the environment in the k-th and (k+1)-th frames, s_{i,k} and s_{m,k+1} represent all attributes of tasks T_{i,k} and T_{m,k+1} respectively, D_{i,k} is the destination server of task T_{i,k}, R_{i,k} represents the reward obtained by task T_{i,k}, and γ is a discount factor.
The f actor network and the f critic network of the second neural network in 3) are updated separately.
The updating method of the f critic network and the state perceptron is as follows. Suppose task T_{i,k} is either inherited into the (k+1)-th frame and denoted T_{m,k+1} (in which case d_{i,k} = 0), or has been successfully completed or has failed due to timeout (in which case d_{i,k} = 1). The neural network is then updated with the Loss function L_fc(θ_{f,policy}) = E[(R_{i,k} + γ(1 - d_{i,k}) Q^m_{f,target}(s_{k+1}, π_{f,target}(s_{k+1})) - Q^i_{f,policy}(s_k, f_k))^2], where θ_{f,policy} represents the parameters of the f critic network and the state perceptron, s_k and s_{k+1} are the states of the environment in the k-th and (k+1)-th frames, f_k represents the computing resource allocation decision of the k-th frame, Q^i_{f,policy} and Q^m_{f,target} represent the i-th output of the f critic network and the m-th output of the target network corresponding to the f critic respectively, and π_{f,target} represents the output of the target network corresponding to the f actor.
The updating method of the f actor network and the state perceptron is as follows: the neural network is updated with L_fa(θ_{f,policy}) = -E[Q_{f,policy}(s_k, π_{f,policy}(s_k))] as the Loss function, where π_{f,policy}(s_k) denotes the output of the f actor and the other symbols are as defined above.
3. Advantageous effects
Compared with the prior art, the edge computing resource allocation method based on priority and cooperation has the advantages that:
With the priority- and cooperation-based edge computing resource allocation method of the application, the state, action and reward definitions, the neural network structures, the neural network input and output structures, and the training and application methods enable effective perception of and decision-making in the edge computing environment, and long-term revenue maximization is achieved through edge-edge cooperation, edge-cloud cooperation and load balancing.
In the priority- and cooperation-based edge computing resource allocation method, the environment state is decoupled into the states of the individual tasks before being input into a specially designed neural network, and the outputs of the neural network and the obtained rewards likewise correspond to the individual tasks.
According to the edge computing resource allocation method based on priority and cooperation, destination server decision and computing resource allocation decision are carried out by two sets of neural networks, namely the first neural network and the second neural network, and long-term profit maximization is achieved by fully utilizing the cooperation effect.
According to the edge computing resource allocation method based on priority and cooperation, computing resources in the edge computing server and the cloud server are reasonably allocated, specifically the number of CPU cycles in unit time, so that long-term benefits related to relative time delay and server energy consumption are improved.
Drawings
FIG. 1 is a schematic diagram of a first neural network architecture of the present application;
FIG. 2 is a second neural network architecture diagram of the present application;
FIG. 3 is a schematic diagram illustrating an effect of the edge computing resource allocation method based on priority and cooperation according to the present application.
Detailed Description
Hereinafter, specific embodiments of the present application will be described in detail with reference to the accompanying drawings, and it will be apparent to those skilled in the art from this detailed description that the present application can be practiced. Features from different embodiments may be combined to yield new embodiments, or certain features may be substituted for certain embodiments to yield yet further preferred embodiments, without departing from the principles of the present application.
Although defects such as network jitter mean that traditional cloud computing cannot fully meet the requirements of 5G applications and services, its abundant computing resources still have certain advantages when processing compute-intensive tasks; moreover, when the load of the edge computing nodes is high, the cloud computing node can take over part of the load, realizing edge-cloud cooperation and meeting user demand. The edge computing nodes and the cloud computing node must be connected through the core network, whose bandwidth is relatively limited, whereas edge computing nodes within a given area are spatially close and can be directly connected to each other, so their mutual bandwidth is relatively sufficient and they can cooperate with each other to realize edge-edge cooperation and load balancing.
Different tasks typically require different priorities. For example, in a commercial district, a task request initiated by an ordinary tourist to photograph and identify an object should have a lower priority and can tolerate a longer delay or even, to a certain extent, task failure, while a task request initiated by a security camera to identify a suspicious person or suspicious behavior should have an extremely high priority and must be processed successfully within a short delay.
Therefore, it is necessary to design an edge computing resource allocation method under a scenario in which task priority, edge-to-edge collaboration, and edge-to-cloud collaboration are considered.
With reference to fig. 1 to 3, the present application provides a method for allocating edge computing resources based on priority and collaboration, the method comprising:
1): designing, while taking into account edge-edge cooperation, edge-cloud cooperation and the task's own priority, a state with priority and network node attributes, an action comprising the destination server decision and the computing resource allocation decision, and a reward aimed at the task;
2): for the states, actions and rewards defined in 1), designing a first neural network structure for the destination server decision, and designing a second neural network structure for the computing resource allocation decision;
3): according to a given algorithm, training and updating the first neural network and the second neural network during the interaction between the agent and the edge computing environment, and applying the neural networks after training.
Further, the state in 1) comprises the attributes of all tasks within the cluster and of those uploaded by the cluster to the cloud computing node; the action is the destination server and computing resource allocation decision made for all tasks; and the reward is the contribution of each task to the utility function.
Further, the reward includes a delay revenue term, a task failure penalty term, and an energy consumption penalty term.
Further, the first neural network in 2) is an h-network, the h-network includes a state perceptron and an h-actor network, and the state perceptron is configured to extract feature information in a state and input the feature information into the h-actor network.
Further, in the destination server decision process, the destination server decision of each task is treated as a separate decision process; each decision has M+1 possible actions, and the final output consists of (M+1) × N scalars, where N denotes the maximum number of input tasks the neural network can handle (so N ≥ N_k) and M+1 is the number of computing nodes.
Further, the second neural network in 2) is an f network comprising a state perceptron, an f actor network and an f critic network; the state perceptron is used to extract feature information from the state.
Further, the f actor receives the output of the state perceptron and then outputs the computing resources allocated to each task, f_k = [f_{1,k}, f_{2,k}, ..., f_{N,k}]; the f critic network receives the output of the state perceptron together with the computing resource allocation scheme and then outputs an action state value function for these actions, [Q_1(s_k, f_k), Q_2(s_k, f_k), ..., Q_N(s_k, f_k)], where s_k is the state defined in 1), Q_1(s_k, f_k) corresponds to the state value function of f_{1,k}, Q_2(s_k, f_k) corresponds to that of f_{2,k}, and so on.
Further, in 3), the mean square error function is used as the Loss function when updating the first neural network, and likewise when updating the second neural network.
Further, the first neural network in 3) is updated as follows:
suppose task T_{i,k} is either inherited into the (k+1)-th frame and denoted T_{m,k+1} (in which case d_{i,k} = 0), or has been successfully completed or has failed due to timeout (in which case d_{i,k} = 1). The neural network is then updated with the Loss function L_h(θ_{h,policy}) = E[(R_{i,k} + γ(1 - d_{i,k}) Q_{h,target}(s_{k+1}, s_{m,k+1}, D_{i,k}) - Q_{h,policy}(s_k, s_{i,k}, D_{i,k}))^2], where θ_{h,policy} represents the parameters of the h network, Q_{h,target} and Q_{h,policy} represent the outputs of the h target network and the h network respectively, s_k and s_{k+1} are the states of the environment in the k-th and (k+1)-th frames, s_{i,k} and s_{m,k+1} represent all attributes of tasks T_{i,k} and T_{m,k+1} respectively, D_{i,k} is the destination server of task T_{i,k}, R_{i,k} represents the reward obtained by task T_{i,k}, and γ is a discount factor.
The f actor and the f critic of the second neural network are updated separately;
the updating method of the f critic network and the state perceptron is as follows. Suppose task T_{i,k} is either inherited into the (k+1)-th frame and denoted T_{m,k+1} (in which case d_{i,k} = 0), or has been successfully completed or has failed due to timeout (in which case d_{i,k} = 1). The neural network is then updated with the Loss function L_fc(θ_{f,policy}) = E[(R_{i,k} + γ(1 - d_{i,k}) Q^m_{f,target}(s_{k+1}, π_{f,target}(s_{k+1})) - Q^i_{f,policy}(s_k, f_k))^2], where θ_{f,policy} represents the parameters of the f critic network and the state perceptron, s_k and s_{k+1} are the states of the environment in the k-th and (k+1)-th frames, f_k represents the computing resource allocation decision of the k-th frame, Q^i_{f,policy} and Q^m_{f,target} represent the i-th output of the f critic network and the m-th output of the target network corresponding to the f critic respectively, and π_{f,target} represents the output of the target network corresponding to the f actor;
the updating method of the f actor network and the state perceptron is as follows: the neural network is updated with L_fa(θ_{f,policy}) = -E[Q_{f,policy}(s_k, π_{f,policy}(s_k))] as the Loss function, where π_{f,policy}(s_k) denotes the output of the f actor and the other symbols are as defined above.
In 1), the process of defining the states, actions and rewards of the edge computing model is as follows, taking the k-th frame as an example:
before defining the state, firstly, the attribute of the task is required to be acquired, and the task T is used fori,kFor example, the required attributes are: amount of data to be transmittedNumber of CPU cycles to processRemaining allowable delayMaximum allowed time delayTask priority li,kSource serverAnd destination serverThen the process of the first step is carried out,
state s_k: the attributes of all tasks within the cluster and of those uploaded by the cluster to the cloud computing node, i.e. s_k = [s_{1,k}, s_{2,k}, ..., s_{N_k,k}], where s_{i,k} is the attribute vector of task T_{i,k}.
Action a_k: the destination server and computing resource allocation decisions made for all tasks, i.e. a_k = [a_{1,k}, a_{2,k}, ..., a_{N_k,k}], where a_{i,k} denotes the decision made for task T_{i,k}, a_{i,k} = [h_{i,k}, f_{i,k}]; h_{i,k} denotes the destination server that processes the task, and f_{i,k} denotes the computing resources the destination server allocates to it.
Reward r_k: the contribution of each task to the utility function, i.e. r_k = [R_{1,k}, R_{2,k}, ..., R_{N_k,k}], where R_{i,k} consists of three terms:
- a delay revenue term,
- a task failure penalty term, and
- an energy consumption penalty term.
These three terms are then weighted and combined, in the same way as in the utility function, to obtain R_{i,k}, where α, η and β are weighting coefficients associated with the edge computing environment.
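Under the stated structure of R_{i,k}, the weighted combination can be sketched as below; the sign convention (revenue added, penalties subtracted) and all numeric values are illustrative assumptions, since the document gives only the term names and the coefficients α, η and β:

```python
def reward(delay_revenue, failure_penalty, energy_penalty,
           alpha=1.0, eta=1.0, beta=0.1):
    """Combine the three reward terms for one task into R_{i,k}.

    The sign convention is an assumption: the delay revenue term adds
    to the reward, while the failure and energy penalties subtract.
    """
    return alpha * delay_revenue - eta * failure_penalty - beta * energy_penalty

# Example: a task that met its deadline (no failure penalty)
r = reward(delay_revenue=0.8, failure_penalty=0.0, energy_penalty=2.0)
```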
In 2), the structures of the neural networks and of their inputs and outputs are as follows.
Note that the state, action and reward defined in 1) are all vectors whose lengths depend on N_k and therefore vary over time, whereas the numbers of input and output nodes of a neural network are fixed, i.e. the input and output dimensions are fixed. Therefore, before the state s_k is input into the neural network, it must not only be normalized but also zero-padded to the fixed size. In addition, the origin server and destination server numbers contained in s_{i,k} are merely server identifiers and do not indicate relative magnitude, so they must be one-hot encoded. For the actions and action state value functions output by the neural network, only the N_k meaningful entries are taken as the actions and action state value functions.
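The preprocessing described above (zero-padding the variable-length state to the fixed input size N, and one-hot encoding the categorical server numbers) can be sketched as follows; the feature layout is an illustrative assumption:

```python
def one_hot(server_id, num_nodes):
    """Encode a server number as a one-hot vector (IDs are categorical,
    so their numeric value must not be fed in directly)."""
    v = [0.0] * num_nodes
    v[server_id] = 1.0
    return v

def pad_state(task_vectors, max_tasks, feature_len):
    """Zero-pad the variable-length state s_k (N_k task vectors) to the
    fixed input size N expected by the neural network."""
    padded = list(task_vectors)
    while len(padded) < max_tasks:
        padded.append([0.0] * feature_len)
    return padded

# One task with two normalized scalar features plus a one-hot server ID,
# padded up to N = 4 input slots (feature layout is assumed).
state = pad_state([[0.5, 0.2] + one_hot(1, 3)], max_tasks=4, feature_len=5)
```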
The schematic structures of the neural networks are shown in fig. 1 and fig. 2. In both figures it is assumed that there are at most N tasks within the scope of study, with N_k ≤ N. The leftmost side of each figure is the neural network input and the rightmost side is the neural network output; except for the smallest cubes of the output layer, each cube represents a network structure composed of several network layers, and each smallest cube of the output layer represents a scalar.
Fig. 1 depicts the first neural network, used for the destination server decision; for convenience of expression it is named the h network. In this structure, the two leftmost layers form the state perceptron, which is responsible for extracting feature information from the state. The feature information extracted by the state perceptron, together with the attribute information of a given task, is input into the h actor network, which outputs the action state value functions corresponding to that task, [Q(s_k, s_{i,k}, h_{i,k}=0), Q(s_k, s_{i,k}, h_{i,k}=1), ..., Q(s_k, s_{i,k}, h_{i,k}=M)], where Q(s_k, s_{i,k}, h_{i,k}=0) represents the action state value function for offloading the task to computing node 0 (the cloud server), Q(s_k, s_{i,k}, h_{i,k}=1) represents the action state value function for offloading the task to computing node 1 (the edge server numbered 1), and so on. In this algorithm the destination server decision of each task is treated as a separate decision process, so each decision has M+1 possible actions and the final output consists of (M+1) × N scalars.
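As a sketch of the greedy decision rule over the h network's output, assuming the (M+1)-dimensional rows of action values for each of the N input slots are already available:

```python
def destination_decisions(q_values, num_new_tasks):
    """Given the h network's N x (M+1) action-value output, pick the
    destination server for each new task by greedy argmax; only the
    first num_new_tasks rows are meaningful after zero-padding."""
    decisions = []
    for i in range(num_new_tasks):
        row = q_values[i]
        decisions.append(max(range(len(row)), key=lambda a: row[a]))
    return decisions

# Two new tasks, M + 1 = 3 computing nodes (node 0 is the cloud).
q = [[0.1, 0.9, 0.3],   # task 1: best value at node 1
     [0.7, 0.2, 0.4]]   # task 2: best value at node 0 (cloud)
```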
Fig. 2 depicts the second neural network, used for the computing resource allocation problem; for convenience of expression it is named the f network. The first layers of its input side have the same structure as those of the h network and likewise form a state perceptron. The two network blocks to the right of the state perceptron are named the f actor and the f critic. The f actor receives the output of the state perceptron and then outputs the computing resources allocated to each task, f_k = [f_{1,k}, f_{2,k}, ..., f_{N,k}], where the meaningless entries are ignored and only the N_k meaningful entries are taken. The f critic network receives the output of the state perceptron together with the computing resource allocation scheme, and then outputs the action state value functions for these actions, [Q_1(s_k, f_k), Q_2(s_k, f_k), ..., Q_N(s_k, f_k)]. Each entry of f_k is regarded as an action made for the corresponding task, so the output of the f critic also has N dimensions, and Q_i(s_k, f_k) corresponds exactly to the state value function of f_{i,k}. Under this structure, the function of the f critic can be understood as similar to that of the h actor, namely generating the action state value function for its input, while the function of the f actor is to seek the action that maximizes that action state value function.
The training method and the application process in the step 3) are as follows:
A target network with the same structure is set up for each network, namely an h target network with the same structure as the h network and an f target network with the same structure as the f network. In addition, using the experience replay technique, the state, action, reward, next state and whether each task was successfully processed or failed due to timeout are stored at each step in a memory bank. For the interaction between the agent and the edge computing environment, the concept of an episode is also defined: every L frames constitute one episode, and the neural networks are updated after each episode rather than after each frame.
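A minimal sketch of the memory bank and the per-episode target network synchronization, under the assumption of hard parameter copies (the document does not specify the synchronization rule):

```python
import random
from collections import deque

class ReplayMemory:
    """Memory bank storing (state, action, reward, next_state, done)
    transitions; 'done' is the completion vector described in the
    h-network update algorithm."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # old entries drop off

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

def sync_target(policy_params, target_params):
    """Hard-copy policy parameters into the target network; assumed to
    run once per episode (every L frames), not every frame."""
    target_params.clear()
    target_params.update(policy_params)
```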
Destination server decision process and h-network update algorithm
For the sake of simplicity, the h network and the h target network are distinguished by subscripts, i.e. the h network is written Q_h,policy and the h target network Q_h,target.
At the beginning of each frame, the h network acquires the current state information and outputs an action state value function for every action, but only a new task T_i,k needs to use the corresponding output to make its destination server decision, i.e.

D_i,k = argmax_{h ∈ {0, 1, ..., M}} Q_h,policy(s_k, s_i,k, h)
During training, in order to ensure that the agent can fully explore the environment, this greedy action is taken only with probability 1 - ε_k, and a random action is taken with probability ε_k.
The agent's exploration of the environment should decrease as the number of iteration rounds increases, so ε_k decreases as the number of iteration rounds increases.
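A minimal sketch of this decaying ε-greedy rule follows; the decay schedule and all constants are illustrative assumptions, not values from the application:

```python
import random

def epsilon(k, eps_start=1.0, eps_end=0.05, decay=0.995):
    """Exploration rate eps_k, shrinking as the iteration round k grows."""
    return max(eps_end, eps_start * decay ** k)

def select_destination(q_values, k, rng=random):
    """With probability 1 - eps_k pick the action with the largest
    action state value; with probability eps_k pick a random action."""
    if rng.random() < epsilon(k):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```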
Unlike a typical reinforcement learning environment, in this environment not every decided action can be executed in every frame. Once the destination server is determined, the destination server decisions for the next several frames are meaningless and cannot be executed, which makes it impossible to iteratively update the h network directly with the Bellman optimality equation, so the update method of the h network needs to be adjusted. Specifically, assume task T_i,k is inherited into frame k+1 and renumbered T_m,k+1; then the h network is updated with the target

y_i,k = R_i,k + γ Q_h,target(s_{k+1}, s_m,k+1, D_i,k)
This can be understood as saying that, for an existing task, whichever frame it is in, its destination server decision admits only the single option D_i,k. If task T_i,k has been successfully completed in frame k, or has failed due to timeout, the target becomes

y_i,k = R_i,k
By combining the two equations above, the update can be merged into

y_i,k = R_i,k + γ (1 - d_i,k) Q_h,target(s_{k+1}, s_m,k+1, D_i,k)
where d_i,k marks the completion of task T_i,k: if the task completes successfully or fails due to timeout, d_i,k = 1; otherwise the task is unfinished, is inherited into frame k+1, and d_i,k = 0. It can also be seen that the transitions stored in the memory bank require not only s_k, a_k, r_k, s_{k+1} but also the completion vector d_k.
The mean square error is taken as the Loss function for the update, i.e.

L_h(θ_h,policy) = E[(y_i,k - Q_h,policy(s_k, s_i,k, D_i,k))^2]

where y_i,k is the merged target defined above.
The h network is updated iteratively in every episode, and after every C updates of the h network (C a constant) the h target network refreshes itself directly with the h network's parameters: if θ_h,policy denotes the parameters of the h network and θ_h,target those of the h target network, then every C rounds θ_h,target = θ_h,policy.
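The merged target and the periodic hard copy can be sketched as below; this is an illustrative reconstruction around the completion flag d_i,k, and the function names are assumptions:

```python
def h_target(r_ik, d_ik, q_target_next, gamma=0.99):
    """TD target y_i,k = R_i,k + gamma * (1 - d_i,k) * Q_h,target(...).
    d_ik = 1 when the task completed or timed out, 0 when it is inherited,
    so a finished task bootstraps nothing from the next frame."""
    return r_ik + gamma * (1.0 - d_ik) * q_target_next

def hard_update(policy_params, target_params):
    """Every C rounds the h target network copies the h network parameters."""
    target_params[:] = policy_params
```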
Computational resource allocation decision process and f network update algorithm
For convenience of presentation, the state perceptron together with the f actor network is denoted π_f,policy, with corresponding target network π_f,target, while the state perceptron together with the f critic network is denoted Q_f,policy, with corresponding target network Q_f,target.
At the beginning of each frame, the f actor outputs, on the basis of the state, the computing resources allocated to each task, f_k = [f_{1,k}, f_{2,k}, ..., f_{N,k}]. However, in order for the agent to fully explore the environment, a certain amount of noise must be added to the action during training, i.e.
f'_i,k = clip(f_i,k + n, 0, v_m)    (12)
where n ~ N(0, σ_k^2) is random noise; as in the destination server decision process, the noise standard deviation σ_k also decreases as the number of iteration rounds increases. clip represents a clipping function, defined as

clip(x, a, b) = max(a, min(b, x))
This function ensures that the action still satisfies 0 ≤ f'_i,k ≤ v_m after the noise is added. For a task still in the wired transmission phase that has not yet reached its destination server, the computing resources allocated to it are forcibly set to 0. If the actions after adding noise do not satisfy the destination server's capacity constraint, they are further processed before execution.
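The clipped exploration noise of equation (12) can be sketched as follows; only stdlib Gaussian noise is used, and the function names are illustrative assumptions:

```python
import random

def clip(x, a, b):
    """clip(x, a, b) = max(a, min(b, x))."""
    return max(a, min(b, x))

def noisy_allocation(f_ik, sigma_k, v_m, rng=random):
    """f'_i,k = clip(f_i,k + n, 0, v_m) with n ~ N(0, sigma_k^2);
    sigma_k is meant to decay over the training rounds."""
    return clip(f_ik + rng.gauss(0.0, sigma_k), 0.0, v_m)
```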
The update processes of Q_f,policy and Q_f,target are very similar to that of the h network. Suppose task T_i,k is successfully processed in frame k or fails due to timeout, or is inherited into frame k+1 and renumbered T_m,k+1; then the update target can be expressed as

y_i,k = R_i,k + γ (1 - d_i,k) Q_f,target,m(s_{k+1}, π_f,target(s_{k+1}))
The relationship between the f actor and the f critic network is very similar to that between the generator and the discriminator in a generative adversarial network: the goal of the f actor is to maximize the output of the f critic, i.e. its training process can be expressed as

max E[Q_f,policy(s_k, π_f,policy(s_k))]
As with the h network, the mean square error is used as the Loss function of the Q_f,policy network:

L_fc(θ_f,policy) = E[Σ_i (y_i,k - Q_f,policy,i(s_k, f_k))^2]

where y_i,k is the target defined above.
π_f,policy directly uses

L_fa(θ_f,policy) = -E[Q_f,policy(s_k, π_f,policy(s_k))]    (19)

as its Loss function. Note that L_fc is used only to update Q_f,policy, not the entire f network, and L_fa likewise is used only to update π_f,policy; the f critic network is held fixed while π_f,policy is updated.
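The actor objective -E[Q(s, π(s))] can be sketched generically as below; q_fn and policy_fn are illustrative stand-ins for the frozen f critic and the f actor, not names from the application:

```python
def actor_loss(q_fn, policy_fn, states):
    """Monte Carlo estimate of L_fa = -E[Q_f,policy(s, pi_f,policy(s))];
    minimizing it drives the actor toward actions the critic rates highly."""
    return -sum(q_fn(s, policy_fn(s)) for s in states) / len(states)
```

A toy critic that peaks at action 3 illustrates the direction of the objective: the policy that outputs 3 attains a lower loss than one that outputs 1.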
Similarly, the f network is updated iteratively in every episode, and the f target network refreshes itself with the f network's parameters after every C updates of the f network (C a constant). Unlike the h network, however, the f network uses soft updates: if θ_f,policy denotes the parameters of the f network and θ_f,target those of the f target network, then every C rounds θ_f,target = τ θ_f,policy + (1 - τ) θ_f,target, where the update rate τ generally takes a small value.
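The soft update rule can be sketched for a flat parameter list as follows; the default value of τ is an assumption for illustration:

```python
def soft_update(policy_params, target_params, tau=0.005):
    """theta_target <- tau * theta_policy + (1 - tau) * theta_target,
    applied element-wise; with small tau the target network trails the
    policy network slowly, which stabilizes training."""
    for i, (p, t) in enumerate(zip(policy_params, target_params)):
        target_params[i] = tau * p + (1.0 - tau) * t
```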
In summary, the interaction process of the agent with the environment and the learning process of the agent are shown in algorithm 1.
In application, following the structure and the input/output method in 2), the state action value function is obtained from the h network and the action corresponding to the maximum value is selected as the destination server decision, while the computing resource decision for each task is obtained from the f actor in the f network.
Examples
In this section a specific edge computing model is given. It should be noted that this is only one specific embodiment of the present application and the scope of the application is not limited thereto; those skilled in the art will understand that the application includes, but is not limited to, the contents described in the drawings and the detailed description above. Any modification that does not depart from the functional and structural principles of the present application is intended to be included within the scope of the claims.
The edge computing network model is composed of three layers, from top to bottom: a cloud computing node layer, an edge computing server cluster layer, and an IoT device layer (user layer). The cloud computing node layer contains a cloud computing node with relatively abundant computing resources. The edge computing server cluster layer contains several edge computing server clusters, each comprising several edge computing servers; the figure shows a schematic with three clusters of three servers each. Each edge computing server is placed beside a wireless access point (such as a base station or wireless router), so the transmission delay from the wireless access point to the edge computing server can be ignored. The clusters are divided according to position distribution, that is, the edge computing servers in one cluster are spatially close to one another, so they are directly interconnected by optical fiber and the like. Different edge computing server clusters are jointly connected to the core network and, through it, to the cloud computing node. The IoT device layer contains several IoT devices, each connected to one wireless access point through a wireless link.
Consider edge-edge collaboration within a server cluster and collaboration between the cluster's edge computing servers and the cloud computing node. The set formed by the cloud node and the edge computing servers of a given cluster is written {0, 1, 2, ..., M}, where number 0 represents the cloud computing node and numbers 1, 2, ..., M the edge computing servers. The computational resource capacities of these nodes are written V = {v_0, v_1, v_2, ..., v_M}, where v_m represents the computing resources of node m; the cloud node's resources are relatively sufficient, i.e. v_0 > v_m (m ≠ 0). In this model, computing resources are expressed as the number of CPU cycles per unit time. The IoT devices connected to the wireless access points beside the cluster's edge computing servers form the user set of the cluster.
In the network, offloading a task involves transferring the user's data. In the wireless transmission process the bandwidth is B_wireless; in the wired transmission process, data is transmitted between nodes of the set, each link with its own wired bandwidth. Because the edge computing nodes of the same cluster are spatially close and directly connected by optical fiber and the like, the bandwidth and data transmission rate between edge nodes are relatively high; the distance between an edge computing node and the cloud computing node is greater and they must be connected through the core network, so the transmission delay there is relatively higher and the network bandwidth relatively narrower.
Because of the limits of battery life and of the computing capacity of IoT devices, an IoT device needs to borrow the computing resources of the edge computing nodes and the cloud computing node to process the tasks it continuously generates. Tasks are assumed to be atomic, that is, a task can only be processed at one edge computing node or at the cloud node and cannot be divided for processing. A task is therefore first offloaded to the edge computing server near its corresponding wireless access point; this server is called the source server of the task, and it has three further choices for the task:
a) the source server processes the task by itself;
b) the task is further transmitted to other edge computing servers in the same cluster for processing;
c) the task is further offloaded to the cloud computing node for processing.
The server that ultimately processes the task is referred to as the destination server for the task.
A task can be abstracted into several key attributes. Assume that in frame k there are N_k tasks in total within the cluster or uploaded from the cluster to the cloud computing node, written {T_1,k, T_2,k, ..., T_{N_k},k}. Some of these tasks were neither completed nor failed in frame k-1 and are inherited from it; they are called existing tasks. Others have only just arrived at an edge computing server at the beginning of this frame and await further offloading or processing; they are called new tasks. A task owns attributes related to transmission, processing and priority; taking task T_i,k as an example, its own attributes can be expressed as:
a) Amount of data to be transmitted. After the task reaches the source server it is processed by the destination server, and in this process the source server needs to transmit the data, including user input data and code data, to the destination server. If the destination server and the source server are the same server, the amount to be transmitted is 0; if they are different servers, it is the amount of data that still remains to be transferred from the source server to the destination server. For a new task it is the task's full data volume.
b) Number of CPU cycles required for processing. The task itself requires a certain amount of computation, expressed as a number of CPU cycles. For new tasks, and for tasks that have not yet reached the destination server, the full amount of computation remains.
c) Remaining allowable delay. Transmitting and processing the task consumes time, and the remaining allowable delay decreases accordingly. Ideally the task is completed within the remaining allowable delay, but it may also run over time; when task processing times out, the remaining allowable delay may be negative. For a new task, the remaining allowable delay equals the maximum allowed delay minus the time consumed by the wireless transmission process.
d) Task priority l_i,k. The priority is an integer, with 1 representing the lowest priority.
Besides its own attributes, the task has network-related attributes, namely its source server and its destination server.
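Gathering attributes a) through d) together with the network-related attributes, a task record could be sketched as follows; the field names and the use of -1 for an undecided destination are illustrative assumptions, not from the application:

```python
from dataclasses import dataclass

@dataclass
class Task:
    data_to_transfer: float    # remaining data volume, source -> destination
    cpu_cycles_left: float     # computation still required, in CPU cycles
    remaining_delay: float     # remaining allowable delay; negative on timeout
    priority: int              # integer priority, 1 is the lowest
    source_server: int         # edge server the task was first offloaded to
    destination_server: int    # node that will process it; -1 while undecided

    def timed_out(self):
        return self.remaining_delay < 0.0
```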
In frame k, the utility function contains three parts: a delay gain term, a task failure penalty term, and a computing node energy consumption penalty term.
The delay gain term is the gain obtained from the tasks completed in the current frame; it is proportional to the task's remaining allowable delay and to the task priority l_i,k. Here 1_A(x) denotes the indicator function, which indicates whether the element x lies in the set A:

1_A(x) = 1 if x ∈ A, and 0 otherwise
The task failure penalty term is the penalty incurred by tasks that time out and fail in the current frame and are removed from the environment. It is proportional to the task priority: the higher a task's priority, the greater the penalty when its processing times out and fails.
The computing node energy consumption penalty term is the sum of the energy consumption of each computing node, where the binary function involved takes the value 1 for a positive argument and 0 otherwise.
The utility function is a weighted combination of the three terms above, with weighting coefficients α, β and η. Care should be taken to satisfy η > α, i.e. to ensure that the penalty incurred when a task's processing times out and fails is greater than the gain obtained by successfully processing it.
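The weighted combination just described can be sketched as below; the sign convention (gain added, penalties subtracted) and the default weights are assumptions for illustration only:

```python
def utility(delay_gain, failure_penalty, energy_penalty,
            alpha=1.0, beta=0.1, eta=2.0):
    """U_k = alpha * gain - eta * failure penalty - beta * energy penalty.
    Requiring eta > alpha keeps a timeout more costly than a completion
    is rewarding, as the text stipulates."""
    assert eta > alpha
    return alpha * delay_gain - eta * failure_penalty - beta * energy_penalty
```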
Following 1), the states, actions and rewards of the edge computing model are first defined in this environment, taking frame k as an example:
State s_k: the attributes of all tasks within the cluster and uploaded by the cluster to the cloud computing node, i.e. s_k = [s_1,k, s_2,k, ..., s_{N_k},k], where s_i,k collects all attributes of task T_i,k.
Action a_k: the destination server and computing resource allocation decisions made for all tasks, i.e. a_k = [a_1,k, a_2,k, ..., a_{N_k},k], where a_i,k denotes the decision made for task T_i,k, a_i,k = [h_i,k, f_i,k]; h_i,k represents the destination server that processes the task and f_i,k the computing resources the destination server allocates for the task.
Reward r_k: the contribution of each task to the utility function, i.e. r_k = [R_1,k, R_2,k, ..., R_{N_k},k], where R_i,k likewise consists of three terms:
a delay gain term, a task failure penalty term and an energy consumption penalty term, each defined per task as in the utility function; the three terms are then weighted and combined in the same way as when computing the utility function to obtain R_i,k.
Where α, β and η are defined identically to the corresponding variables in the utility function.
After defining the states, actions and rewards, the structure of the neural network and the input-output structure are defined as follows:
Note that the states, actions and rewards defined in 1) are all vectors whose lengths are tied to N_k and therefore vary, while the numbers of input and output nodes of the neural networks used are fixed, i.e. the input and output dimensions are fixed. Therefore, before the state s_k is input into the neural network, zero-fill expansion is needed in addition to normalization. Moreover, since the server-number fields in s_i,k (the source server and the destination server D_i,k) are numbers that do not indicate relative size, they need one-hot encoding. For the actions and action state value functions output by the neural network, only the meaningful first N_k entries are taken.
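The zero-fill expansion and one-hot encoding can be sketched as follows; a fixed input size of N tasks times a fixed feature length is assumed for illustration:

```python
def one_hot(index, size):
    """Encode a server number as a one-hot vector; server numbers carry
    no relative magnitude, hence the encoding."""
    v = [0.0] * size
    v[index] = 1.0
    return v

def pad_state(task_features, max_tasks, feat_len):
    """Flatten the variable-length state and zero-fill it up to the
    fixed input dimension of the neural network."""
    flat = [x for feats in task_features for x in feats]
    flat += [0.0] * (max_tasks * feat_len - len(flat))
    return flat
```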
The schematic structure of the neural networks is shown in FIGS. 1 and 2. In both figures it is assumed that there are at most N tasks within the scope of the study, N_k ≤ N. The leftmost side of each figure is the neural network input and the rightmost side the output; except for the smallest cubes of the output layer, each cube represents a network structure formed by several network layers, and each smallest cube of the output layer represents a scalar.
FIG. 1 depicts the first neural network, used for destination server decision making; for convenience of expression it is named the h network. In this structure the two leftmost layers are the state perceptron, responsible for extracting the feature information in the state. The feature information extracted by the state perceptron, together with the attribute information of a given task, is input to the h actor, which outputs several action state value functions for that task, [Q(s_k, s_i,k, h_i,k=0), Q(s_k, s_i,k, h_i,k=1), ..., Q(s_k, s_i,k, h_i,k=M)], where Q(s_k, s_i,k, h_i,k=0) represents the action state value function for offloading the task to computing node 0 (the cloud server), Q(s_k, s_i,k, h_i,k=1) that for offloading the task to node 1 (the edge server numbered 1), and so on. In this algorithm the destination server decision of each task is treated as a separate decision process, so each decision has M+1 possible actions and the final output is (M+1) × N scalars.
Taking 500 frames as one episode, FIG. 3 plots, as the reward index, the sum of the utility functions of all frames in each episode. The "no-collaboration benchmark" is the curve for the algorithm provided by the present application with collaboration disabled, and "proposed" is the curve for the algorithm provided by the present application; the abscissa represents the training process and the ordinate the return of each episode, i.e. the sum of the utility functions of the frames in the episode. It can be seen that after a period of training the algorithm is stable, with roughly three times the gain of the no-collaboration scheme.
Although the present application has been described above with reference to specific embodiments, those skilled in the art will recognize that many changes may be made in the configuration and details of the present application within the principles and scope of the present application. The scope of protection of the application is determined by the appended claims, and all changes that come within the meaning and range of equivalency of the technical features are intended to be embraced therein.
Claims (9)
1. A priority- and cooperation-based edge computing resource allocation method, characterized in that the method comprises the following steps:
1): under consideration of edge-edge coordination, edge-cloud coordination and the tasks' own priorities, designing a state carrying priority and network node attributes, designing an action comprising the destination server decision and the computing resource allocation decision, and designing a reward for the tasks;
2): for the states, actions and rewards defined in 1), designing a first neural network structure for the destination server decision and a second neural network structure for the computing resource allocation decision;
3): according to the given algorithm, training and updating the first neural network and the second neural network during the interaction of the agent with the edge computing environment, and applying the neural networks after training is completed.
2. The priority- and cooperation-based edge computing resource allocation method of claim 1, wherein: the state in 1) comprises the attributes of all tasks within the cluster and uploaded by the cluster to the cloud computing node, the action is the destination server and computing resource allocation decision made for all tasks, and the reward is the contribution of each task to the utility function.
3. The priority- and cooperation-based edge computing resource allocation method of claim 2, wherein: the reward includes a delay gain term, a task failure penalty term and an energy consumption penalty term.
4. The priority- and cooperation-based edge computing resource allocation method of claim 1, wherein: the first neural network in 2) is an h network comprising a state perceptron and an h actor network, the state perceptron being used to extract the feature information in the state and input it into the h actor network.
5. The priority- and cooperation-based edge computing resource allocation method of claim 4, wherein: in the destination server decision process, the destination server decision of each task is regarded as a separate decision process, each decision has M+1 actions, and the final output is (M+1) × N scalars, where N represents the maximum number of input tasks the neural network can handle, so that N ≥ N_k, and M+1 is the number of computing nodes.
6. The priority- and cooperation-based edge computing resource allocation method of claim 1, wherein: the second neural network in 2) is an f network comprising a state perceptron, an f actor network and an f critic network, the state perceptron being used to extract the feature information in the state.
7. The priority- and cooperation-based edge computing resource allocation method of claim 6, wherein: the f actor receives the output of the state perceptron and outputs the computing resources allocated to each task, f_k = [f_{1,k}, f_{2,k}, ..., f_{N,k}]; the f critic network receives the output of the state perceptron and the computing resource allocation scheme and outputs an action state value function for these actions, [Q_1(s_k, f_k), Q_2(s_k, f_k), ..., Q_N(s_k, f_k)], where s_k is the state defined in 1), Q_1(s_k, f_k) corresponds to the state value function of f_{1,k}, Q_2(s_k, f_k) to that of f_{2,k}, and so on.
8. The priority- and cooperation-based edge computing resource allocation method of claim 7, wherein: in 3), the mean square error function is taken as the Loss function when updating the first neural network, and the mean square error function is likewise taken as the Loss function when updating the second neural network.
9. The method of claim 8, wherein the edge computing resource allocation based on priority and cooperation is: the first neural network updating process in the step 3) is as follows:
to be provided withUpdating the neural network for the Loss function, where θh,policyA parameter representing the h-network is shown,Qh,targetand Qh,policyRespectively representing the outputs of the h-target network and the h-network, skAnd sk+1For the state of the environment in the k frame, si,kAnd sm,k+1Respectively represent tasks Ti,kAnd Tm,k+1All attributes of, Di,kFor task Ti,kA destination server of, and Ri,kRepresenting a task Ti,kThe prize won, γ is the discount factor;
f actor and f critics of the second neural network will be updated separately;
the updating method of the criticizing family network and the state perceptron comprises the following steps: suppose task Ti,kIs inherited to the k +1 frame and is noted as Tm,k+1(in this time di,k0) or has been successfully completed or has failed due to a timeout (d is noted here)i,k1) toUpdating the neural network for the Loss function, where θf,policyParameters representing the critics and the status perceptors,skand sk+1For the state of the environment at the k frame, fkRepresenting the computational resource allocation decision for the k-th frame,andrespectively representing the ith output of the f critic network and the mth output of the target network corresponding to the f critics, pif,targetAn output representing a target network corresponding to the actor;
the f-actor network and the state perceptron are updated as follows: the neural network is updated with L_fa(θ_f,policy) = −E[Q_f,policy(s_k, π_f,policy(s_k))] as the Loss function, where π_f,policy denotes the output of the f-actor.
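Minimizing L_fa = −E[Q] is gradient ascent on the critic's action value, in the style of deterministic policy gradients. A toy sketch with a stand-in quadratic critic (F_STAR, the learning rate, and the analytic gradient are all assumptions for illustration, not the patent's learned critic):

```python
# Toy sketch of the actor step implied by L_fa(θ) = -E[Q(s_k, π(s_k))]:
# stepping down the Loss steps uphill on Q, driving the allocation toward
# the action the critic scores highest.
F_STAR = 0.7                      # allocation at which the toy Q peaks

def dq_df(f):
    """Analytic gradient of the toy critic Q(f) = -(f - F_STAR)**2."""
    return -2.0 * (f - F_STAR)

f, lr = 0.1, 0.25                 # initial allocation and learning rate
for _ in range(50):
    f += lr * dq_df(f)            # dL_fa/df = -dQ/df, so ascend Q
# f converges to (near) F_STAR, the maximizer of the toy action value
```

In the full method the same ascent direction is obtained by backpropagating Q_f,policy through the f-critic into the f-actor and state perceptron parameters.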
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010473969.6A CN111813539A (en) | 2020-05-29 | 2020-05-29 | Edge computing resource allocation method based on priority and cooperation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111813539A true CN111813539A (en) | 2020-10-23 |
Family
ID=72848732
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010473969.6A Pending CN111813539A (en) | 2020-05-29 | 2020-05-29 | Edge computing resource allocation method based on priority and cooperation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111813539A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190266489A1 (en) * | 2017-10-12 | 2019-08-29 | Honda Motor Co., Ltd. | Interaction-aware decision making |
CN109819047A (en) * | 2019-02-26 | 2019-05-28 | 吉林大学 | A kind of mobile edge calculations resource allocation methods based on incentive mechanism |
CN110503195A (en) * | 2019-08-14 | 2019-11-26 | 北京中科寒武纪科技有限公司 | The method and its Related product of task are executed using artificial intelligence process device |
CN110798849A (en) * | 2019-10-10 | 2020-02-14 | 西北工业大学 | Computing resource allocation and task unloading method for ultra-dense network edge computing |
US20200136920A1 (en) * | 2019-12-20 | 2020-04-30 | Kshitij Arun Doshi | End-to-end quality of service in edge computing environments |
Non-Patent Citations (2)
Title |
---|
余萌迪; 唐俊华; 李建华: "A multi-node MEC computing resource allocation scheme based on reinforcement learning" (in Chinese), Communications Technology, no. 12 * |
邓晓衡; 关培源; 万志文; 刘恩陆; 罗杰; 赵智慧; 刘亚军; 张洪刚: "Research on edge computing resource collaboration based on comprehensive trust" (in Chinese), Journal of Computer Research and Development, no. 03 * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112288478A (en) * | 2020-10-28 | 2021-01-29 | 中山大学 | Edge computing service incentive method based on reinforcement learning |
CN112738767A (en) * | 2020-11-30 | 2021-04-30 | 中南大学 | Trust-based mobile edge user task scheduling method |
CN112738767B (en) * | 2020-11-30 | 2021-12-17 | 中南大学 | Trust-based mobile edge user task scheduling method |
CN113014649A (en) * | 2021-02-26 | 2021-06-22 | 济南浪潮高新科技投资发展有限公司 | Cloud Internet of things load balancing method, device and equipment based on deep learning |
CN113590335A (en) * | 2021-08-11 | 2021-11-02 | 南京大学 | Task load balancing method based on grouping and delay estimation in tree edge network |
CN113590335B (en) * | 2021-08-11 | 2023-11-17 | 南京大学 | Task load balancing method based on grouping and delay estimation in tree edge network |
CN114116156A (en) * | 2021-10-18 | 2022-03-01 | 武汉理工大学 | Cloud-edge cooperative double-profit equilibrium taboo reinforcement learning resource allocation method |
CN113676559A (en) * | 2021-10-23 | 2021-11-19 | 深圳希研工业科技有限公司 | Information processing system and method for multi-device mobile edge calculation of Internet of things |
CN113676559B (en) * | 2021-10-23 | 2022-02-08 | 深圳希研工业科技有限公司 | Information processing system and method for multi-device mobile edge calculation of Internet of things |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111813539A (en) | Edge computing resource allocation method based on priority and cooperation | |
CN111756812B (en) | Energy consumption perception edge cloud cooperation dynamic unloading scheduling method | |
Zhan et al. | A deep reinforcement learning based offloading game in edge computing | |
CN111835827B (en) | Internet of things edge computing task unloading method and system | |
CN112860350B (en) | Task cache-based computation unloading method in edge computation | |
Chen et al. | Dynamic task offloading for internet of things in mobile edge computing via deep reinforcement learning | |
CN113225377B (en) | Internet of things edge task unloading method and device | |
CN114143346B (en) | Joint optimization method and system for task unloading and service caching of Internet of vehicles | |
CN113810233B (en) | Distributed computation unloading method based on computation network cooperation in random network | |
CN113626104B (en) | Multi-objective optimization unloading strategy based on deep reinforcement learning under edge cloud architecture | |
Heidari et al. | A QoS-aware technique for computation offloading in IoT-edge platforms using a convolutional neural network and Markov decision process | |
Huang et al. | Toward decentralized and collaborative deep learning inference for intelligent iot devices | |
CN111488528A (en) | Content cache management method and device and electronic equipment | |
CN113573363A (en) | MEC calculation unloading and resource allocation method based on deep reinforcement learning | |
CN116321293A (en) | Edge computing unloading and resource allocation method based on multi-agent reinforcement learning | |
CN116489712A (en) | Mobile edge computing task unloading method based on deep reinforcement learning | |
CN113946423B (en) | Multi-task edge computing, scheduling and optimizing method based on graph attention network | |
Matrouk et al. | Mobility aware-task scheduling and virtual fog for offloading in IoT-fog-cloud environment | |
CN113032149B (en) | Edge computing service placement and request distribution method and system based on evolution game | |
Jiang et al. | Energy-saving service offloading for the internet of medical things using deep reinforcement learning | |
Henna et al. | Distributed and collaborative high-speed inference deep learning for mobile edge with topological dependencies | |
CN116489708B (en) | Meta universe oriented cloud edge end collaborative mobile edge computing task unloading method | |
CN115361453B (en) | Load fair unloading and migration method for edge service network | |
CN116367190A (en) | Digital twin function virtualization method for 6G mobile network | |
Wang et al. | Task offloading for edge computing in industrial Internet with joint data compression and security protection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||