WO2024065903A1 - System and method for joint optimization of computation offloading and resource allocation in a multi-constrained edge environment - Google Patents

System and method for joint optimization of computation offloading and resource allocation in a multi-constrained edge environment

Info

Publication number
WO2024065903A1
WO2024065903A1 PCT/CN2022/126471 CN2022126471W WO2024065903A1 WO 2024065903 A1 WO2024065903 A1 WO 2024065903A1 CN 2022126471 W CN2022126471 W CN 2022126471W WO 2024065903 A1 WO2024065903 A1 WO 2024065903A1
Authority
WO
WIPO (PCT)
Prior art keywords
task
offloading
resource allocation
network
mds
Prior art date
Application number
PCT/CN2022/126471
Other languages
English (en)
French (fr)
Inventor
陈哲毅
黄思进
张俊杰
熊兵
Original Assignee
福州大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 福州大学 filed Critical 福州大学
Publication of WO2024065903A1 publication Critical patent/WO2024065903A1/zh

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/50Network services
    • H04L67/60Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W28/00Network traffic management; Network resource management
    • H04W28/02Traffic management, e.g. flow control or congestion control
    • H04W28/06Optimizing the usage of the radio link, e.g. header compression, information sizing, discarding information
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5072Grid computing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/70Admission control; Resource allocation
    • H04L47/78Architectures of resource allocation
    • H04L47/783Distributed allocation of resources, e.g. bandwidth brokers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W28/00Network traffic management; Network resource management
    • H04W28/02Traffic management, e.g. flow control or congestion control
    • H04W28/08Load balancing or load distribution
    • H04W28/09Management thereof
    • H04W28/0925Management thereof using policies
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W28/00Network traffic management; Network resource management
    • H04W28/02Traffic management, e.g. flow control or congestion control
    • H04W28/08Load balancing or load distribution
    • H04W28/09Management thereof
    • H04W28/0958Management thereof based on metrics or performance parameters
    • H04W28/0967Quality of Service [QoS] parameters
    • H04W28/0975Quality of Service [QoS] parameters for reducing delays

Definitions

  • the present invention relates to a system and method for joint optimization of computation offloading and resource allocation in a multi-constrained edge environment.
  • MEC mobile edge computing
  • WPT wireless power transfer
  • the purpose of the present invention is to provide a system and method for joint optimization of computing offloading and resource allocation in a multi-constrained edge environment, which can obtain the optimal strategy for computing offloading and resource allocation in a dynamic MEC environment.
  • the present invention adopts the following technical solution:
  • the MDs are equipped with energy harvesting (EH) components and are powered by energy harvested from radio frequency (RF) signals.
  • EH energy harvesting
  • RF radio frequency
  • Computing tasks are either offloaded to the MEC server for execution or executed locally.
  • Tasks with higher priority tend to be offloaded to the MEC server for execution. Specifically, this priority is defined as a function of the transmission channel gain within sub-time slot t, the data volume of the task, the computing capability f_i of MD_i, and the transmission power P_i of MD_i.
  • An optimization method for a joint optimization system of computation offloading and resource allocation under a multi-constrained edge environment comprises the following steps:
  • Step S1: Offloading decisions and resource allocation decisions are generated by the joint optimization model of computation offloading and resource allocation, according to the tasks generated on the different MDs, the offloading priorities of the tasks, the battery power of the MDs, and the computing resources currently available to the MEC server;
  • Step S2: Communication resources are issued according to the resource allocation decision, and each MD executes its task locally or offloads it to the MEC server according to the offloading decision;
  • Step S3: The job scheduler allocates jobs from the job sequence to the server based on the resource allocation decision.
  • When the number of training samples stored in the experience replay pool M reaches N, N records are randomly selected to train the network parameters, yielding the final joint optimization model of computation offloading and resource allocation.
  • The initialization of the system is specifically as follows: based on the state space, action space and reward function, first initialize the parameters θ^μ of the actor network and the parameters θ^Q of the critic network; then assign the actor network parameters θ^μ to the target actor network parameters θ^μ′ and the critic network parameters θ^Q to the target critic network parameters θ^Q′, and at the same time initialize the experience replay pool M, the number of training episodes P and the time-series length T_max.
  • State space: the state space includes the tasks Task_t generated on all MDs in sub-time slot t, the task offloading priorities pr_t, the MDs' battery power b_t and the computing resources currently available to the MEC server. Therefore, the system state at sub-time slot t is expressed by formula (14).
  • Action space: the DRL agent makes computation offloading and resource allocation actions based on the current system state; the action space includes the offloading decision α_t, the upload bandwidth allocation w_t of the task, and the MEC server computing resources p_t allocated to the task. Therefore, the action at sub-time slot t is expressed as a_t = {α_t, w_t, p_t} (formula (15)).
  • Reward function: the goal of the system is to minimize the weighted sum of system delay and energy consumption under the constraints of the optimization problem P1. Therefore, at sub-time slot t the instantaneous reward of the system is expressed by formula (16), where
  • w1 and w2 represent the weights of the delay and energy consumption caused by executing the task,
  • F represents the normalization function, and
  • Pu represents the penalty coefficient for task failure.
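As a rough illustration of how a reward of this shape can be evaluated, the sketch below (in Python, the language the model is implemented in) computes a per-slot reward from normalized delay and energy terms. The min-max normalization and the additive failure penalty are assumptions; the filing only states that w1 and w2 weight delay and energy, that F normalizes both to a common range, and that Pu penalizes failed tasks.

```python
# Illustrative sketch of a per-slot reward in the spirit of formula (16);
# the exact functional form in the filing is given only as an image, so the
# min-max normalization and additive penalty below are assumptions.
def normalize(value, low, high):
    """Map a raw delay/energy value into [0, 1] (the role of F)."""
    return (value - low) / (high - low) if high > low else 0.0

def instant_reward(delays, energies, failures, w1, w2, pu,
                   delay_range, energy_range):
    """Negative weighted cost of the tasks handled in sub-slot t,
    minus a penalty pu for every task that violated its constraints."""
    cost = 0.0
    for d, e in zip(delays, energies):
        cost += w1 * normalize(d, *delay_range) + w2 * normalize(e, *energy_range)
    return -(cost + pu * failures)
```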
  • The training is specifically: train the critic network θ^Q to fit Q(s_t, a_t).
  • When Q(s_t, a_t) is determined, for a fixed s_t there must exist an a_t that maximizes Q(s_t, a_t).
  • Q(s_t, a_t) is expressed as Q(s_t, a_t) = E_environment[r(s_t, a_t) + γQ(s_{t+1}, μ(s_{t+1}))] (formula (17)).
  • The actor network θ^μ outputs the action a_t that maximizes the Q value given the current state s_t.
  • This process is expressed as a_t = μ(s_t | θ^μ) (formula (18)).
  • The performance objective of the actor network is defined by formula (19).
  • The critic network is responsible for calculating the current Q value Q(s_t, a_t), and the target Q value y_t is defined as y_t = r_t + γQ′(s_{t+1}, μ′(s_{t+1} | θ^μ′) | θ^Q′) (formula (20)).
  • The gradient ascent method is used to approximate the optimal solution of the actor network's policy.
  • The loss function of the critic network is defined by formula (21).
  • At each training step, the target actor network and the target critic network move toward the actor network and the critic network with update step τ.
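The target computation, critic loss, actor objective and soft update described around formulas (17)-(21) correspond to the standard DDPG update. The PyTorch-style sketch below shows that update under assumed network interfaces and an assumed update step τ = 0.005; it is an illustration of the technique, not the patent's concrete implementation.

```python
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, gamma=0.95, tau=0.005):
    """One DDPG training step over a sampled batch (s, a, r, s_next)."""
    s, a, r, s_next = batch
    r = r.reshape(-1, 1)   # column vector so it broadcasts against the critic output

    # Target Q value y_t = r_t + gamma * Q'(s_{t+1}, mu'(s_{t+1}))  (role of formula (20))
    with torch.no_grad():
        y = r + gamma * target_critic(s_next, target_actor(s_next))

    # Critic loss: mean squared error between Q(s_t, a_t) and y_t   (role of formula (21))
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor objective: maximize Q(s_t, mu(s_t)) by gradient ascent  (role of formula (19)),
    # implemented here as minimizing its negative
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft update of the target networks with step tau
    for tgt, src in ((target_actor, actor), (target_critic, critic)):
        for tp, sp in zip(tgt.parameters(), src.parameters()):
            tp.data.mul_(1 - tau).add_(tau * sp.data)
```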
  • the present invention has the following beneficial effects:
  • the present invention can generate appropriate computing offloading and resource allocation schemes according to computing resources and network conditions, thereby improving the success rate of task execution and reducing the delay and energy consumption of task execution.
  • the present invention can assign priorities to tasks based on the amount of task data and the performance of the mobile device.
  • FIG. 1 is a single-edge, multi-mobile-device MEC system according to an embodiment of the present invention.
  • FIG. 2 is the workflow of sequential tasks in one embodiment of the present invention.
  • FIG. 3 is a flow chart of the JOA_RL method in one embodiment of the present invention.
  • FIG. 4 is a comparison of the convergence of different methods in one embodiment of the present invention.
  • FIG. 5 shows the effect of network bandwidth on different methods according to an embodiment of the present invention.
  • FIG. 6 shows the impact of the computing power of the MEC server on different methods according to an embodiment of the present invention.
  • FIG. 7 shows the influence of the maximum capacity of the MD battery on different methods in one embodiment of the present invention.
  • the present invention designs a unified computational offloading and resource allocation model for a dynamic MEC system under multiple constraints, and takes the latency and energy consumption of executing tasks as optimization targets.
  • a task priority preprocessing mechanism is designed, which can assign priorities to tasks according to the data volume of the task and the performance of the mobile device.
  • the state space, action space and reward function of the computational offloading and resource allocation problem in the MEC environment are defined, and the above optimization problem is formalized as a Markov decision process (MDP).
  • MDP Markov decision process
  • JOA-RL deep-reinforcement-learning-based joint optimization of computation offloading and resource allocation
  • the critic network adopts a single-step update method based on the value function to evaluate the current offloading plan and resource scheduling strategy; while the actor network adopts a policy gradient-based update method to output the offloading plan and resource scheduling strategy.
  • MDs access the BS via 5G or LTE, and the BS is equipped with a MEC server.
  • all MDs are equipped with energy harvesting (EH) components and are powered by energy harvested from radio frequency (RF) signals.
  • EH energy harvesting
  • At the beginning of each time slot T, each MD generates a computing task characterized by the amount of data of the task, the computing resources required by the task, and the maximum completion delay T_d allowed for the task.
  • MDs obtain power from the RF signal of the BS. A task must be completed within its maximum tolerable delay and with the battery power currently available; otherwise the task is judged to have failed.
  • tasks from MDs can be executed with the assistance of the MEC server.
  • the specific communication model, computing model, and energy collection model are defined as follows.
  • B_t represents the upload bandwidth shared by all MDs in the current sub-time slot t, and the allocation variable represents the fraction of that bandwidth allocated by the BS to MD_i for uploading its task at sub-time slot t.
  • The computing power (i.e., CPU frequency) of different MDs may differ, but it does not change during task execution. Therefore, the latency and energy consumption of the local computing mode are defined in terms of f_i, the CPU frequency of MD_i, the computing resources required by the task, and k, the effective capacitance coefficient.
  • When MDs offload tasks to the MEC server for execution, the MEC server allocates part of its currently available computing resources to them; after the task is completed, the MEC server returns the result to the MD. Usually the amount of data in the computation result is very small, so the delay and energy consumption of downloading the result can be ignored. Therefore, the delay and energy consumption of the edge computing mode are defined in terms of the computing resources available on the MEC server at the start of sub-slot t, the fraction of those resources allocated to MD_i, and the computing power P_e that the MEC server allocates to the task.
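The concrete latency and energy formulas appear only as images in the published text. The sketch below therefore uses the formulation that is standard for this type of MEC model and is consistent with the variables named above (CPU frequency f_i, required CPU cycles, effective capacitance coefficient k, allocated bandwidth and MEC resources); it should be read as an assumed reconstruction, not the patent's exact equations.

```python
import math

def local_cost(cycles, f_i, k):
    """Local computing mode: delay = cycles / f_i, energy = k * cycles * f_i^2
    (the usual dynamic-power model implied by the effective capacitance k)."""
    delay = cycles / f_i
    energy = k * cycles * f_i ** 2
    return delay, energy

def edge_cost(data_bits, cycles, bw_share, total_bw, snr,
              res_share, mec_capacity, p_tx, p_edge):
    """Edge computing mode: upload over the allocated bandwidth share,
    then execute on the allocated share of the MEC server's resources."""
    rate = bw_share * total_bw * math.log2(1.0 + snr)   # Shannon-style uplink rate (assumed)
    t_up = data_bits / rate                             # upload delay
    t_exec = cycles / (res_share * mec_capacity)        # execution delay on the MEC server
    energy = p_tx * t_up + p_edge * t_exec              # device upload energy + edge computing energy
    return t_up + t_exec, energy
```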
  • the present invention provides a task priority preprocessing mechanism that can assign priorities to tasks based on the amount of data in the task and the performance of the mobile device. This mechanism measures the suitability of different tasks to be uploaded to the MEC server for execution. Tasks with higher priorities tend to be offloaded to the MEC server for execution. Specifically, the above priorities are defined as
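The priority formula itself is not reproduced in this text, so the score used in the sketch below is purely illustrative: it combines the named inputs (channel gain, task data volume, CPU frequency f_i, transmission power P_i) into a single number only to show where the preprocessing step sits in the pipeline; the actual expression is the one defined in the original filing.

```python
def offload_priority(channel_gain, data_bits, f_i, p_i):
    """Illustrative stand-in for the priority: tasks that are expensive to run
    locally but cheap to upload score higher. NOT the patent's formula."""
    upload_favorability = channel_gain / p_i   # good channel, low tx power -> cheap upload
    local_burden = data_bits / f_i             # large task on a slow CPU -> costly local run
    return upload_favorability * local_burden

def rank_tasks(tasks):
    """Sort tasks so that higher-priority ones are considered for offloading first."""
    return sorted(tasks, key=lambda t: offload_priority(
        t["gain"], t["bits"], t["f_i"], t["P_i"]), reverse=True)
```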
  • all MDs are equipped with rechargeable batteries with a maximum capacity of B max .
  • The battery charge of MD_i at the beginning of sub-slot t is recorded. Specifically, an energy transmitter (ET) and the MEC server are deployed at the edge of the network, allowing the ET to provide on-demand energy to the central processing unit (CPU) and the radio transceiver of the wireless devices through WPT in a fully controllable manner; the harvested energy is fed into the batteries of the MDs. Using the harvested energy, MDs can offload computing tasks to the MEC server for execution or execute tasks locally.
  • CPU central processing unit
  • WPT wireless power transfer
  • the goal of the proposed MEC system is to minimize the sum of the weighted overhead of latency and energy consumption generated by executing sequential tasks on MDs, which can be formalized as an optimization problem P1 as follows:
  • w1 and w2 represent the weights of the latency and energy consumption generated by executing the task, respectively.
  • C1 means that a task can only be executed locally or offloaded to the MEC server.
  • C2 means that the energy consumption generated by executing the task cannot exceed the available power of the current device.
  • C3 means that the execution time of the task cannot exceed the maximum tolerable latency Td of the task.
  • C4 represents the constraint on the proportion of upload bandwidth allocated to the offloaded task.
  • C5 represents the constraint on the proportion of MEC server computing resources allocated to the offloaded task.
  • the present invention proposes a joint optimization method for computation offloading and resource allocation based on deep reinforcement learning (JOA_RL); the computation offloading and resource allocation in the MEC system are regarded as the environment, and the DRL agent selects the corresponding action by interacting with the environment.
  • JOA_RL deep-reinforcement-learning-based joint optimization of computation offloading and resource allocation
  • the state space, action space, and reward function defined in the JOA_RL method are as follows:
  • The state space includes the tasks Task_t generated on all MDs in sub-time slot t, the task offloading priorities pr_t, the MDs' battery power b_t and the computing resources currently available to the MEC server. Therefore, the system state at sub-time slot t can be expressed by formula (14).
  • Action space: the DRL agent makes computation offloading and resource allocation actions based on the current system state.
  • The action space includes the offloading decision α_t, the upload bandwidth allocation w_t of the task, and the MEC server computing resources p_t allocated to the task. Therefore, the action at sub-time slot t can be expressed as a_t = {α_t, w_t, p_t} (formula (15)).
  • The goal of the proposed MEC system is to minimize the weighted sum of system delay and energy consumption under the constraints of the optimization problem P1. Therefore, at sub-time slot t, the instantaneous reward of the system can be expressed by formula (16).
  • w1 and w2 represent the weights of the delay and energy consumption generated by executing the task, respectively.
  • F represents the normalization function, which is used to normalize the values of delay and energy consumption to the same numerical range.
  • Pu represents the penalty coefficient for task failure.
  • the DRL agent selects an action a t (computation offloading and resource allocation) under the current system state (including task state and resource usage) s t according to the strategy ⁇ .
  • The environment feeds back the reward r_t based on the action a_t and transitions to the new system state s_{t+1}.
  • This process can be formulated as a Markov decision process (MDP).
  • JOA-RL can effectively approximate the optimal strategy for computing offloading and resource allocation in a dynamic MEC environment, achieve a better balance between latency and energy consumption under the constraints of maximum task latency and device power, and show a higher task execution success rate.
  • The JOA-RL method uses Deep Deterministic Policy Gradient (DDPG) to train the DNNs and obtain the optimal computation offloading and resource allocation strategy.
  • DDPG Deep Deterministic Policy Gradient
  • the critic network adopts a single-step update method based on the value function, which is responsible for evaluating the Q value corresponding to each action.
  • the actor network adopts an update method based on the policy gradient, which is responsible for generating corresponding computational offloading and resource allocation actions under the current system state.
  • Using the critic network effectively reduces the policy-gradient error, because the critic network guides the actor network toward the optimal policy.
  • By integrating DNNs, the JOA-RL method can handle high-dimensional state spaces well.
  • The parameters θ^μ of the actor network and the parameters θ^Q of the critic network are first initialized. Then, the actor network parameters θ^μ are assigned to the target actor network parameters θ^μ′ and the critic network parameters θ^Q are assigned to the target critic network parameters θ^Q′, and the experience replay pool M, the number of training episodes P, and the time-series length T_max are initialized. In particular, independent target networks are used in this method, which reduces the correlation between data and enhances the stability and robustness of the method; the experience replay mechanism further reduces data correlation.
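A minimal initialization sketch in PyTorch is given below: build the actor and critic, copy their weights into the target networks, and create the experience replay pool M. The layer sizes, the Sigmoid output (so that allocation ratios fall in [0, 1]) and the deque-based pool are assumptions for illustration.

```python
import copy
from collections import deque
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps a system state to an action vector in [0, 1] (offloading + allocation ratios)."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, action_dim), nn.Sigmoid())
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Scores a (state, action) pair with a scalar Q value."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def initialize(state_dim, action_dim):
    actor, critic = Actor(state_dim, action_dim), Critic(state_dim, action_dim)
    # Target networks start as exact copies of the online networks
    target_actor, target_critic = copy.deepcopy(actor), copy.deepcopy(critic)
    replay_pool = deque(maxlen=100_000)   # experience replay pool M
    return actor, critic, target_actor, target_critic, replay_pool
```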
  • training begins.
  • The method inputs the system environment state s_t obtained at each step into the actor network, executes the actor network's output action a_t in the environment, and performs the corresponding offloading and resource allocation operations (lines 5-11).
  • The corresponding reward is calculated according to the formula, and the environment feeds back the cumulative execution reward r_t of the task and the next state s_{t+1} (line 12).
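The per-step interaction described here can be sketched as follows: the actor proposes an action for the current state, exploration noise is added, the environment applies the offloading and allocation decisions, and the transition is pushed into the replay pool. The env.step interface and the Gaussian exploration noise are assumptions for illustration.

```python
import torch

def rollout_step(env, actor, replay_pool, state, noise_std=0.1):
    """Execute one sub-slot: act, observe reward and next state, store the transition."""
    with torch.no_grad():
        action = actor(torch.as_tensor(state, dtype=torch.float32))
    # Exploration noise, clipped so offloading/allocation ratios stay valid
    action = (action + noise_std * torch.randn_like(action)).clamp(0.0, 1.0)
    next_state, reward, done = env.step(action.numpy())   # offload + allocate, get r_t, s_{t+1}
    replay_pool.append((state, action.numpy(), reward, next_state))
    return next_state, done
```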
  • Since the system state and the resource allocation action in the MEC environment are continuous values, the JOA-RL method considers MDPs in which both the state and the action are continuous.
  • The JOA-RL method trains the critic network θ^Q to fit Q(s_t, a_t).
  • When Q(s_t, a_t) is determined, for a fixed s_t there must exist an a_t that maximizes Q(s_t, a_t).
  • However, the mapping relationship between s_t and a_t is very complex.
  • For a given s_t, the Q value is a high-dimensional, multi-layer nested nonlinear function of a_t.
  • To solve this problem, the method uses the actor network θ^μ to fit this complex mapping. Specifically, Q(s_t, a_t) is expressed as Q(s_t, a_t) = E_environment[r(s_t, a_t) + γQ(s_{t+1}, μ(s_{t+1}))] (formula (17)).
  • N records are randomly selected for training the network parameters (line 14).
  • An important problem faced by this method when optimizing the loss function is that differentiating an expression containing a max operator is very unstable: updating the parameters does not necessarily move max(s_{t+1}, a_{t+1}) in the desired direction. This is especially pronounced when the action space is continuous, because the target that Q(s_t, a_t) is being trained toward is itself moving.
  • To address this, the target actor network θ^μ′ and the target critic network θ^Q′ are defined separately.
  • The critic network is responsible for calculating the current Q value Q(s_t, a_t), and the target Q value y_t is defined as y_t = r_t + γQ′(s_{t+1}, μ′(s_{t+1} | θ^μ′) | θ^Q′) (formula (20)).
  • The gradient ascent method is used to approximate the optimal solution of the actor network's policy.
  • The loss function of the critic network is defined by formula (21).
  • At each training step, the target actor network and the target critic network are updated to move toward the actor network and the critic network with update step τ.
  • Compared with simply copying the network parameters, this update method makes the training more stable.
  • Embodiment 1:
  • The joint optimization model of computation offloading and resource allocation proposed in this embodiment is built and trained with Python 3.6 and the open-source framework PyTorch. All simulation experiments are carried out on a laptop equipped with an Intel i5-7300HQ CPU with a 2.5 GHz clock frequency and 8 GB of memory. In the experiments, all MDs are randomly distributed within the coverage of the AP and share its bandwidth, and the AP is equipped with a MEC server. The computing power of each MD is drawn from [1, 1.2] GHz, and the computing power of the MEC server is 20 GHz. Under the default experimental settings, 10 MDs share a bandwidth of 10 MHz, the duration of each time slot T is 1 s, the duration of each sub-time slot t is 0.25 s, and one training episode comprises 48 time slots T.
  • the learning rate of the actor network is 0.0006
  • the learning rate of the critic network is 0.006
  • the reward discount factor gamma is set to 0.95.
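These settings can be collected into one configuration block, as sketched below; the optimizer choice (Adam) and the batch size are assumptions, while the learning rates and the discount factor are the values stated above.

```python
import torch

CONFIG = {
    "actor_lr": 6e-4,      # learning rate of the actor network
    "critic_lr": 6e-3,     # learning rate of the critic network
    "gamma": 0.95,         # reward discount factor
    "num_mds": 10,         # mobile devices sharing 10 MHz of bandwidth
    "slot_T": 1.0,         # duration of a time slot T in seconds
    "sub_slot_t": 0.25,    # duration of a sub-slot t in seconds
    "batch_size": 64,      # assumed; not specified in the filing
}

def make_optimizers(actor, critic, cfg=CONFIG):
    """Adam optimizers with the stated learning rates (optimizer type is assumed)."""
    return (torch.optim.Adam(actor.parameters(), lr=cfg["actor_lr"]),
            torch.optim.Adam(critic.parameters(), lr=cfg["critic_lr"]))
```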
  • MEC: all tasks are offloaded to the MEC server for execution.
  • Random: tasks are executed on the MDs or the MEC server in a random manner.
  • DQN: a value-based DRL method that learns a deterministic policy by computing the probability of each computation offloading and resource allocation action.
  • Greedy method: compared with the JOA-RL and DQN methods, the Greedy method focuses only on the immediate reward obtained by completing a task and does not account well for long-term rewards. In the early stage of training, the Greedy method therefore performs better than the two DRL-based methods, JOA-RL and DQN. In the later stage of training, however, JOA-RL and DQN overtake the Greedy method because they take the long-term reward of the system into account.
  • the JOA-RL method proposed in the present invention integrates the value-based and policy-based DRL methods, can cope with high-dimensional continuous action space and converge faster, so that the performance of the JOA-RL method is better than that of the DQN method.
  • the average energy consumption of different methods for successfully completing tasks is compared.
  • the MEC method and the Local method show the highest and lowest average task energy consumption, respectively.
  • the Greedy method prioritizes local execution of tasks on the premise of meeting the maximum tolerance delay of the task, so its average task energy consumption is only higher than that of the Local method.
  • the JOA-RL method also performs better than the DQN method after convergence.
  • the average task waiting time of different methods is compared.
  • the JOA-RL method is better than the other five methods in the average task waiting time after convergence.
  • the Local method takes a long time to complete the task due to the limited local computing power, so the average task waiting time is much higher than that of the other five methods.
  • the task success rates of different methods are compared.
  • the change in network bandwidth has no effect on the Local method because there is no computation offloading process.
  • When the network bandwidth is very low, the bandwidth allocated to each uploaded task is very small, which leads to long task upload times, and many tasks fail because they cannot meet the maximum delay constraint. Therefore, the performance of the MEC method is poor.
  • the performance of the five methods other than the Local method also shows an upward trend.
  • the performance improvement of the MEC method is the most obvious, because the performance of this method is very dependent on the network bandwidth.
  • the JOA-RL method proposed in this paper can better handle the continuous resource allocation problem compared with the DQN method, and achieve lower latency and energy consumption.
  • the JOA-RL method has more advantages in the joint optimization problem of computation offloading and resource allocation.
  • the performance of the five methods except the Local method is basically stable. This is because with the increase of network bandwidth, the number of tasks that fail due to exceeding the delay constraint during computation offloading is reduced, but due to the constraint of MDs battery power, the performance of these methods cannot be further improved.
  • the Local method does not have a computational offloading process, so the change in the computing power of the MEC server has no effect on it.
  • the performance of the five methods other than the Local method also shows an upward trend.
  • the JOA-RL method proposed in the present invention can achieve lower latency and energy consumption than the DQN method. This is because the JOA-RL method can better handle the continuous resource allocation problem, indicating that the JOA-RL method has more advantages in the joint optimization problem of computational offloading and resource allocation.
  • the performance of the five methods except the Local method is basically stable.
  • the power consumed by the local calculation of the task is lower than the maximum capacity of the battery, so the increase in the maximum capacity of the MD battery has no effect on the Local method.
  • the power consumed by the task upload is large, so when the maximum capacity of the MD battery is small, the task often fails because the battery power is insufficient to support the calculation offloading.
  • the stored power can support more calculation offloading, so the performance of these five methods shows an upward trend.
  • the failure of calculation offloading caused by insufficient maximum capacity of the MD battery basically disappears, and the performance of these methods also tends to be stable.
  • the JOA-RL method proposed in this paper can better handle the continuous resource allocation problem than the DQN method, and achieve lower latency and energy consumption. This shows that the JOA-RL method has more advantages in the joint optimization problem of calculation offloading and resource allocation.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The present invention relates to a system and method for the joint optimization of computation offloading and resource allocation in a multi-constrained edge environment. A unified computation offloading and resource allocation model is designed for a dynamic MEC system under multiple constraints, with the delay and energy consumption of task execution as the optimization objectives. A task priority preprocessing mechanism is designed that can assign priorities to tasks according to the data volume of the task and the performance of the mobile device, and a deep-reinforcement-learning-based joint optimization method for computation offloading and resource allocation, JOA-RL, is provided. In the JOA-RL method, the critic network adopts a value-function-based single-step update to evaluate the current offloading scheme and resource scheduling strategy, while the actor network adopts a policy-gradient-based update to output the offloading scheme and resource scheduling strategy. The present invention achieves significant improvements in the task execution success rate and in reducing the delay and energy consumption of task execution.

Description

System and method for joint optimization of computation offloading and resource allocation in a multi-constrained edge environment. Technical Field
The present invention relates to a system and method for the joint optimization of computation offloading and resource allocation in a multi-constrained edge environment.
Background Art
With the rapid development and popularization of communication technology and mobile devices, various emerging applications keep appearing. These applications usually collect large amounts of sensing data and involve computation-intensive tasks to support high-quality intelligent services, which poses a great challenge to the hardware performance of mobile devices. However, limited by device size and manufacturing cost, mobile devices are usually equipped only with batteries of limited capacity and processors of limited computing power, which can no longer support the demand of emerging applications for high-performance, sustained processing. Cloud computing provides abundant computing and storage resources, and mobile devices can rely on cloud services to make up for their shortcomings in hardware performance. A feasible solution is therefore to offload the computation-intensive tasks on mobile devices to a resource-rich remote cloud for execution and to feed the results back to the mobile devices after the tasks are completed. However, the long distance between mobile devices and the remote cloud causes severe data transmission delays, which cannot satisfy the requirements of delay-sensitive applications well and also significantly degrades the user's service experience.
Technical Problem
Compared with cloud computing, mobile edge computing (MEC) deploys computing and storage resources at the network edge, closer to mobile devices. Using MEC for computation offloading can therefore effectively avoid the network congestion that occurs in cloud computing, reduce the response time of network services, and better satisfy users' basic requirements for privacy protection. Compared with cloud servers, MEC servers are equipped with fewer resources but are more flexible, so achieving reasonable resource allocation in a resource-constrained MEC system is a difficult problem. In addition, mobile devices often need to run continuously to support various intelligent applications, but their limited battery capacity affects the computation offloading process to a certain extent. The integration of MEC with RF-based wireless power transfer (WPT) has recently become a feasible and promising solution that can provide on-demand energy to the radio transceivers of wireless mobile devices. However, the multiple constraints of energy and delay bring new challenges to computation offloading and resource allocation in the edge environment, so an effective computation offloading and resource allocation method needs to be designed.
Technical Solution
In view of this, the purpose of the present invention is to provide a system and method for the joint optimization of computation offloading and resource allocation in a multi-constrained edge environment, which can obtain the optimal strategy for computation offloading and resource allocation in a dynamic MEC environment.
To achieve the above purpose, the present invention adopts the following technical solution:
A system for the joint optimization of computation offloading and resource allocation in a multi-constrained edge environment comprises a base station BS, a MEC server and N rechargeable mobile devices MDs, where the N rechargeable mobile devices are denoted as the set MD = {MD_1, MD_2, ..., MD_i, ..., MD_N}; the rechargeable mobile devices MDs access the base station BS via 5G or LTE, and the base station BS is equipped with the MEC server.
Further, the MDs are equipped with energy harvesting (EH) components and are powered by energy harvested from radio frequency (RF) signals.
Further, when a rechargeable mobile device MD generates a task, the computing task is offloaded to the MEC server for execution or executed locally, and tasks with higher priority tend to be offloaded to the MEC server for execution. Specifically, the above priority is defined in terms of the transmission channel gain within sub-slot t, the data volume of Task_i, the computing capability f_i of MD_i, and the transmission power P_i of MD_i.
An optimization method for the above joint optimization system of computation offloading and resource allocation in a multi-constrained edge environment comprises the following steps:
Step S1: according to the tasks generated on the different MDs, the offloading priorities of the tasks, the battery power of the MDs and the computing resources currently available to the MEC server, generate offloading decisions and resource allocation decisions based on the joint optimization model of computation offloading and resource allocation;
Step S2: issue the communication resources according to the resource allocation decision, and the MDs execute their tasks locally or offload them to the MEC server according to the offloading decision;
Step S3: the job scheduler allocates jobs from the job sequence to the server according to the resource allocation decision.
Further, the joint optimization model of computation offloading and resource allocation is built and trained with Python 3.6 and the open-source framework PyTorch, specifically as follows:
(1) Obtain the computing capability f_i of each MD_i, the computing capability of the MEC server and the network bandwidth, and initialize the system;
(2) Perform training: feed the system environment state s_t obtained at each training step into the actor network, execute the actor network's output action a_t in the environment, and perform the corresponding offloading and resource allocation operations;
(3) Calculate the corresponding reward according to the formula; the environment feeds back the cumulative execution reward r_t of the task at this step and the next state s_{t+1}, and the training sample is stored in the experience replay pool via M.push(s_t, a_t, r_t, s_{t+1});
(4) When the number of training samples stored in M reaches N, randomly select N records to train the network parameters and obtain the final joint optimization model of computation offloading and resource allocation.
Further, the initialization of the system is specifically: based on the state space, the action space and the reward function, first initialize the parameters θ^μ of the actor network and the parameters θ^Q of the critic network; then assign the actor network parameters θ^μ to the target actor network parameters θ^μ′ and the critic network parameters θ^Q to the target critic network parameters θ^Q′, and at the same time initialize the experience replay pool M, the number of training episodes P and the time-series length T_max.
Further, the state space, action space and reward function are as follows:
State space: the state space includes the tasks Task_t generated on all MDs in sub-slot t, the offloading priorities pr_t of the tasks, the battery power b_t of the MDs and the computing resources currently available to the MEC server. Therefore, the system state at sub-slot t is expressed as the tuple of these four quantities (formula (14)).
Action space: the DRL agent makes computation offloading and resource allocation actions based on the current system state; the action space includes the offloading decision α_t, the upload bandwidth allocation w_t of the tasks and the MEC server computing resources p_t allocated to the tasks. Therefore, the action at sub-slot t is expressed as:
a_t = {α_t, w_t, p_t}    formula (15)
Reward function: the goal of the system is to minimize the weighted sum of system delay and energy consumption under the constraints of the optimization problem P1. Therefore, at sub-slot t the instantaneous reward of the system is expressed by formula (16),
where w_1 and w_2 respectively represent the weights of the delay and the energy consumption caused by executing the task, F represents the normalization function, and Pu represents the penalty coefficient of task failure.
Further, the training is specifically: train the critic network θ^Q to fit Q(s_t, a_t); when Q(s_t, a_t) is determined, for a fixed s_t there must exist an a_t that maximizes Q(s_t, a_t). Q(s_t, a_t) is expressed as:
Q(s_t, a_t) = E_environment[r(s_t, a_t) + γQ(s_{t+1}, μ(s_{t+1}))]    formula (17)
where the actor network θ^μ outputs, for the current state s_t, the action a_t that maximizes the Q value; this process is expressed as:
a_t = μ(s_t | θ^μ)    formula (18)
The performance objective of the actor network is defined by formula (19).
Further, a target actor network θ^μ′ and a target critic network θ^Q′ are defined.
The critic network is responsible for calculating the current Q value Q(s_t, a_t), and the target Q value y_t is defined as:
y_t = r_t + γQ′(s_{t+1}, μ′(s_{t+1} | θ^μ′) | θ^Q′)    formula (20)
The gradient ascent method is used to approximate the optimal solution of the actor network's policy, and the loss function of the critic network is defined by formula (21).
At each training step, the target actor network and the target critic network move toward the actor network and the critic network with update step τ.
Beneficial Effects
Compared with the prior art, the present invention has the following beneficial effects:
1. The present invention can generate appropriate computation offloading and resource allocation schemes according to the computing resources and network conditions, improving the success rate of task execution and reducing the delay and energy consumption of task execution.
2. The present invention can assign priorities to tasks according to the data volume of the task and the performance of the mobile device.
Brief Description of the Drawings
FIG. 1 is the single-edge, multi-mobile-device MEC system in an embodiment of the present invention;
FIG. 2 is the workflow of sequential tasks in an embodiment of the present invention;
FIG. 3 is a flow chart of the JOA_RL method in an embodiment of the present invention;
FIG. 4 is a comparison of the convergence of different methods in an embodiment of the present invention;
FIG. 5 shows the influence of network bandwidth on different methods in an embodiment of the present invention;
FIG. 6 shows the influence of the computing power of the MEC server on different methods in an embodiment of the present invention;
FIG. 7 shows the influence of the maximum capacity of the MD battery on different methods in an embodiment of the present invention.
Detailed Description of the Invention
The present invention is further described below with reference to the accompanying drawings and embodiments.
The present invention designs a unified computation offloading and resource allocation model for a dynamic MEC system under multiple constraints, and takes the delay and energy consumption of executing tasks as the optimization objectives. A task priority preprocessing mechanism is designed that can assign priorities to tasks according to the data volume of the task and the performance of the mobile device. Accordingly, for the DRL framework, the state space, action space and reward function of the computation offloading and resource allocation problem in the MEC environment are defined, and the above optimization problem is formalized as a Markov decision process (MDP). A deep-reinforcement-learning-based joint optimization method for computation offloading and resource allocation, JOA-RL, is then proposed. In the JOA-RL method, the critic network adopts a value-function-based single-step update to evaluate the current offloading scheme and resource scheduling strategy, while the actor network adopts a policy-gradient-based update to output the offloading scheme and resource scheduling strategy.
Referring to FIG. 1, the present invention provides a MEC system consisting of a base station (BS), a MEC server and N rechargeable mobile devices (MDs), where the N devices are denoted as the set MD = {MD_1, MD_2, ..., MD_N}. The MDs access the BS via 5G or LTE, and the BS is equipped with the MEC server. In addition, all MDs are equipped with energy harvesting (EH) components and are powered by the energy harvested from radio frequency (RF) signals.
As shown in FIG. 2, at the beginning of each time slot T, each MD generates a computing task characterized by its data volume, the computing resources it requires, and the maximum completion delay T_d allowed for the task. The MDs obtain power from the RF signal of the BS. A task must be completed within its maximum tolerable delay and with the existing battery power; otherwise the task is judged to have failed. In the proposed MEC system, tasks from the MDs can be executed with the assistance of the MEC server. The specific communication model, computing model and energy harvesting model are defined as follows.
1. Communication model
As shown in FIG. 2, an offloading decision is defined for the task generated on MD_i at the beginning of time slot T: when the decision indicates offloading, MD_i offloads the task to the MEC server for execution; otherwise MD_i executes the task locally. When MD_i chooses to offload its task to the MEC server, the data on which the task computation depends is uploaded accordingly, and the BS allocates the bandwidth for uploading the task. The signal-to-noise ratio of MD_i in sub-slot t is therefore determined by δ, the average power of the Gaussian white noise, together with the channel gain and the transmission power P_i of MD_i in sub-slot t. The rate at which MD_i transmits the computing task is then determined by B_t, the upload bandwidth shared by all MDs in the current sub-slot t, and the fraction of that bandwidth allocated by the BS to MD_i for uploading its task in sub-slot t.
2. Computing model
In the proposed MEC system, when an MD generates a task, the task is first added to the task buffer queue of that MD, and a task can be executed only after the tasks added to the queue before it have been completed. Since both the MDs and the MEC server can provide computing services, the two computing modes are defined as follows:
(1) Local computing mode
It is assumed that the computing capability (i.e., CPU frequency) of different MDs may differ but does not change during task execution. The delay and energy consumption of the local computing mode are therefore defined in terms of f_i, the CPU frequency of MD_i, the computing resources required by the task, and k, the effective capacitance coefficient.
(2) Edge computing mode
When an MD offloads a task to the MEC server for execution, the MEC server allocates part of its currently available computing resources to the MD, and returns the result to the MD after the task is completed. Usually the amount of data in the computation result is very small, so the delay and energy consumption of downloading the result can be ignored. The delay and energy consumption of the edge computing mode are therefore defined in terms of the computing resources available on the MEC server at the start of sub-slot t, the fraction of those resources allocated to MD_i in sub-slot t, and the computing power P_e allocated by the MEC server to the task.
The overall delay and energy consumption of executing a task are then expressed in terms of the offloading decision of the task, combining the local and edge expressions accordingly.
In order to make fast decisions and find a suitable computing mode for different tasks, the present invention provides a task priority preprocessing mechanism that can assign priorities to tasks according to the data volume of the task and the performance of the mobile device. This mechanism measures how suitable different tasks are for uploading to the MEC server for execution, and tasks with higher priority tend to be offloaded to the MEC server. Specifically, the above priority is defined in terms of the transmission channel gain within sub-slot t, the computing capability f_i of MD_i and the transmission power P_i of MD_i. By assigning each task a priority according to its computing environment, the total task computing time and energy consumption are reduced while the successful completion of high-priority tasks is guaranteed, thereby improving the quality of service.
3. Energy harvesting model
In the proposed MEC system, all MDs are equipped with rechargeable batteries whose maximum capacity is B_max, and the battery charge of MD_i at the beginning of sub-slot t is recorded. Specifically, an energy transmitter (ET) and the MEC server are deployed at the network edge, allowing the ET to provide on-demand energy to the central processing unit (CPU) and the radio transceiver of the wireless devices through WPT in a fully controllable manner; the harvested energy is fed into the batteries of the MDs. Using the harvested energy, the MDs can offload computing tasks to the MEC server for execution or execute tasks locally. To simplify the model, it is assumed that during energy harvesting the energy arrives at the MDs in the form of energy packets, i.e., at the beginning of each sub-slot t the MDs obtain an energy packet through the EH component and feed it into the battery; the size of the energy packet is denoted e_t. The battery level of the MDs under the different task execution states is considered as follows:
(1) If the task in sub-slot t cannot be completed within the power available on MD_i because of a failed decision, or no task is executed, then only the charging of the wireless component changes the battery level within sub-slot t, and the battery level of MD_i at the beginning of sub-slot t+1 is updated accordingly.
(2) If the task on MD_i in sub-slot t is executed locally, the battery level of MD_i at the beginning of sub-slot t+1 is updated by the harvested energy and the local-computing energy consumption.
(3) If the task on MD_i in sub-slot t is offloaded to the MEC server for execution, the battery level of MD_i at the beginning of sub-slot t+1 is updated by the harvested energy and the offloading energy consumption.
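A minimal sketch of this battery bookkeeping follows. The exact update formulas appear only as images in the published text, so the additive update and the capping at B_max are assumptions that follow the three cases described above.

```python
def update_battery(b_t, e_t, b_max, executed, consumed):
    """Battery level at the start of sub-slot t+1.

    b_t      battery level at the start of sub-slot t
    e_t      energy packet harvested via the EH component in sub-slot t
    executed whether a task actually ran (locally or offloaded) in sub-slot t
    consumed energy spent by local execution or offloading (0 if nothing ran)
    """
    if not executed:            # case (1): failed decision or idle, only charging
        consumed = 0.0
    # cases (2) and (3): subtract local or offloading energy, never exceed capacity
    return min(b_t + e_t - consumed, b_max)
```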
Based on the above system model, the goal of the proposed MEC system is to minimize the weighted sum of the delay and energy consumption generated by executing the sequential tasks on the MDs, which can be formalized as the optimization problem P1,
where w_1 and w_2 respectively represent the weights of the delay and the energy consumption generated by executing the task. C1 means that a task can only be executed locally or offloaded to the MEC server. C2 means that the energy consumption generated by executing the task cannot exceed the available power of the current device. C3 means that the execution time of the task cannot exceed the maximum tolerable delay T_d of the task. C4 is the constraint on the fraction of upload bandwidth allocated to the offloaded task. C5 is the constraint on the fraction of MEC server computing resources allocated to the offloaded task.
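Constraints C1-C5 translate naturally into a per-task feasibility check that can be applied before the reward is computed; the sketch below is one straightforward encoding, with the offloading decision written as 0 for local and 1 for offloaded purely as an illustrative convention.

```python
def feasible(decision, delay, energy, battery, t_max,
             bw_fraction, cpu_fraction):
    """Return True if a single task's decision satisfies constraints C1-C5."""
    if decision not in (0, 1):                 # C1: local or offloaded, nothing else
        return False
    if energy > battery:                       # C2: cannot exceed available device power
        return False
    if delay > t_max:                          # C3: cannot exceed maximum tolerable delay
        return False
    if decision == 1:
        if not 0.0 <= bw_fraction <= 1.0:      # C4: valid upload bandwidth share
            return False
        if not 0.0 <= cpu_fraction <= 1.0:     # C5: valid MEC computing resource share
            return False
    return True
```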
In this embodiment, referring to FIG. 3, the present invention proposes a deep-reinforcement-learning-based joint optimization method for computation offloading and resource allocation, JOA_RL. The computation offloading and resource allocation in the MEC system are regarded as the environment, and the DRL agent selects the corresponding actions by interacting with the environment.
The state space, action space and reward function defined in the JOA_RL method are as follows:
State space: the state space includes the tasks Task_t generated on all MDs in sub-slot t, the offloading priorities pr_t of the tasks, the battery power b_t of the MDs and the computing resources currently available to the MEC server. Therefore, the system state at sub-slot t can be expressed as the tuple of these four quantities (formula (14)).
Action space: the DRL agent makes computation offloading and resource allocation actions based on the current system state. The action space includes the offloading decision α_t, the upload bandwidth allocation w_t of the tasks and the MEC server computing resources p_t allocated to the tasks. Therefore, the action at sub-slot t can be expressed as:
a_t = {α_t, w_t, p_t}    formula (15)
Reward function: the goal of the proposed MEC system is to minimize the weighted sum of system delay and energy consumption under the constraints of the optimization problem P1. Therefore, at sub-slot t the instantaneous reward of the system can be expressed by formula (16),
where w_1 and w_2 respectively represent the weights of the delay and the energy consumption generated by executing the task, F represents the normalization function used to normalize the delay and energy values to the same numerical range, and Pu represents the penalty coefficient of task failure.
In the optimization of computation offloading and resource allocation in the multi-constrained MEC environment, the DRL agent selects an action a_t (computation offloading and resource allocation) under the current system state s_t (including the task state and resource usage) according to the policy μ. The environment feeds back the reward r_t based on the action a_t and transitions to the new system state s_{t+1}; this process can be formulated as an MDP.
In this embodiment, JOA-RL can effectively approximate the optimal strategy for computation offloading and resource allocation in a dynamic MEC environment, achieve a better balance between delay and energy consumption under the constraints of maximum task delay and device power, and exhibit a higher task execution success rate.
The JOA-RL method uses the Deep Deterministic Policy Gradient (DDPG) to train the DNNs and obtain the optimal computation offloading and resource allocation strategy.
In the JOA-RL method, the critic network adopts a value-function-based single-step update and is responsible for evaluating the Q value corresponding to each action, while the actor network adopts a policy-gradient-based update and is responsible for generating the corresponding computation offloading and resource allocation actions under the current system state.
Using the critic network effectively reduces the policy-gradient error, because the critic network guides the actor network to learn the optimal policy. In addition, by integrating DNNs, the JOA-RL method can handle high-dimensional state spaces well.
The key steps of the proposed JOA_RL method are shown in Algorithm 1 (reproduced as an image in the original publication).
Based on the definitions of the state space in formula (14), the action space in formula (15) and the reward function in formula (16), the parameters θ^μ of the actor network and the parameters θ^Q of the critic network are first initialized. Then the actor network parameters θ^μ are assigned to the target actor network parameters θ^μ′ and the critic network parameters θ^Q are assigned to the target critic network parameters θ^Q′, and the experience replay pool M, the number of training episodes P and the time-series length T_max are initialized. In particular, independent target networks are used in this method, which reduces the correlation between data and enhances the stability and robustness of the method; the experience replay mechanism further reduces data correlation.
After initialization, training begins. In each training episode, the method feeds the system environment state s_t obtained at each step into the actor network, executes the actor network's output action a_t in the environment, and performs the corresponding offloading and resource allocation operations (lines 5-11). The corresponding reward is calculated according to the formula, and the environment feeds back the cumulative execution reward r_t of the task at this step and the next state s_{t+1} (line 12).
Since the system state and the resource allocation action in the MEC environment are continuous values, the JOA-RL method considers MDPs in which both the state and the action are continuous. The JOA-RL method trains the critic network θ^Q to fit Q(s_t, a_t); when Q(s_t, a_t) is determined, for a fixed s_t there must exist an a_t that maximizes Q(s_t, a_t). However, the mapping between s_t and a_t is very complex: for a given s_t, the Q value is a high-dimensional, multi-layer nested nonlinear function of a_t. To solve this problem, the actor network θ^μ is used to fit this complex mapping. Specifically, Q(s_t, a_t) is expressed as:
Q(s_t, a_t) = E_environment[r(s_t, a_t) + γQ(s_{t+1}, μ(s_{t+1}))]    formula (17)
where the actor network θ^μ outputs, for the current state s_t, the action a_t that maximizes the Q value; this process can be expressed as:
a_t = μ(s_t | θ^μ)    formula (18)
In this method, the performance objective of the actor network is defined by formula (19).
When the number of training samples stored in M reaches N, N records are randomly selected to train the network parameters (line 14). An important problem faced by this method when optimizing the loss function is that differentiating an expression containing a max operator is very unstable: updating the parameters does not necessarily move max(s_{t+1}, a_{t+1}) in the desired direction. This is especially pronounced when the action space is continuous, because the target that Q(s_t, a_t) is being trained toward is itself moving.
To solve this problem, a target actor network θ^μ′ and a target critic network θ^Q′ are defined separately in this method.
The critic network is responsible for calculating the current Q value Q(s_t, a_t), and the target Q value y_t is defined as:
y_t = r_t + γQ′(s_{t+1}, μ′(s_{t+1} | θ^μ′) | θ^Q′)    formula (20)
The gradient ascent method is used to approximate the optimal solution of the actor network's policy, and the loss function of the critic network is defined by formula (21).
At each training step, the target actor network and the target critic network move toward the actor network and the critic network with update step τ. Compared with simply copying the network parameters, this update method makes the method more stable.
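Putting the pieces together, the outer training loop described above (initialize, interact for T_max steps per episode, sample N records once the pool is large enough, update the networks, and softly update the targets) can be sketched as follows; rollout_step and ddpg_update are the assumed helper functions from the earlier sketches, and env is an assumed environment object.

```python
import random
import numpy as np
import torch

def train(env, actor, critic, target_actor, target_critic,
          actor_opt, critic_opt, episodes, t_max, batch_n, replay_pool):
    """Outer JOA_RL-style training loop (structure only; not the patent's Algorithm 1)."""
    for episode in range(episodes):
        state = env.reset()
        for _ in range(t_max):
            state, done = rollout_step(env, actor, replay_pool, state)
            if len(replay_pool) >= batch_n:
                samples = random.sample(list(replay_pool), batch_n)
                batch = [torch.as_tensor(np.array(col), dtype=torch.float32)
                         for col in zip(*samples)]
                ddpg_update(actor, critic, target_actor, target_critic,
                            actor_opt, critic_opt, batch)
            if done:
                break
```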
Embodiment 1:
The joint optimization model of computation offloading and resource allocation proposed in this embodiment is built and trained with Python 3.6 and the open-source framework PyTorch. All simulation experiments are carried out on a laptop equipped with an Intel i5-7300HQ CPU with a 2.5 GHz clock frequency and 8 GB of memory. In the experiments, all MDs are randomly distributed within the coverage of the AP and share its bandwidth, and the AP is equipped with a MEC server. The computing power of each MD is drawn from [1, 1.2] GHz, and the computing power of the MEC server is 20 GHz. Under the default experimental settings, 10 MDs share a bandwidth of 10 MHz, the duration of each time slot T is 1 s, the duration of each sub-slot t is 0.25 s, and one training episode comprises 48 time slots T.
During training, the learning rate of the actor network is 0.0006, the learning rate of the critic network is 0.006, and the reward discount factor gamma is set to 0.95. Once the JOA-RL method has been trained, it can be applied to the joint optimization of computation offloading and resource allocation in a changing MEC environment.
Based on the above settings, extensive simulation experiments were conducted to evaluate the performance of the proposed deep-reinforcement-learning-based joint optimization method for computation offloading and resource allocation. To analyze the effectiveness and advantages of the proposed JOA_RL method, it is compared with the following five baseline methods.
Local: all tasks are executed on the MDs;
MEC: all tasks are offloaded to the MEC server for execution;
Random: tasks are executed on the MDs or the MEC server in a random manner;
Greedy: tasks are preferentially executed on the MDs provided the maximum tolerable delay of the task is satisfied;
DQN: a value-based DRL method that learns a deterministic policy by computing the probability of each computation offloading and resource allocation action.
As shown in FIG. 4(a), the convergence of the different methods is compared. Local, MEC, Random and Greedy are single-step decision methods with no learning process, whereas the two DRL-based methods, JOA-RL and DQN, gradually converge as training proceeds. Compared with JOA-RL and DQN, the Greedy method focuses only on the immediate reward obtained by completing the task and does not account well for long-term rewards; in the early stage of training it therefore performs better than the two DRL-based methods, but in the later stage JOA-RL and DQN surpass it because they take the long-term reward of the system into account. The JOA-RL method proposed in the present invention integrates value-based and policy-based DRL, can cope with a high-dimensional continuous action space and converges faster, so its performance is better than that of the DQN method. As shown in FIG. 4(b), the average energy consumption of successfully completed tasks is compared: the MEC method and the Local method show the highest and lowest average task energy consumption, respectively. The Greedy method preferentially executes tasks locally provided the maximum tolerable delay is satisfied, so its average task energy consumption is only higher than that of the Local method; after convergence, the JOA-RL method also outperforms the DQN method. As shown in FIG. 4(c), the average task waiting time of the different methods is compared: after convergence, the JOA-RL method outperforms the other five methods, while the Local method needs much longer to complete tasks because of its limited local computing power, so its average task waiting time is far higher than that of the other five methods. As shown in FIG. 4(d), the task success rates of the different methods are compared.
As shown in FIG. 5, the Local method has no computation offloading process, so changes in network bandwidth have no effect on it. For the MEC method, when the network bandwidth is very low, the bandwidth allocated to each uploaded task is very small, which leads to long task upload times and causes many tasks to fail because the maximum delay constraint cannot be met, so the MEC method performs poorly. As the network bandwidth increases, the performance of the five methods other than Local also shows an upward trend. Among them, the performance improvement of the MEC method is the most obvious, because its performance depends heavily on the network bandwidth. The JOA-RL method proposed herein can handle the continuous resource allocation problem better than the DQN method and achieves lower delay and energy consumption, which shows that JOA-RL has more advantages in the joint optimization of computation offloading and resource allocation. When the network bandwidth increases to a certain level, the performance of the five methods other than Local becomes essentially stable, because with higher bandwidth fewer tasks fail during computation offloading due to exceeding the delay constraint, while the battery power constraint of the MDs prevents further performance improvement.
As shown in FIG. 6, the Local method has no computation offloading process, so changes in the computing power of the MEC server have no effect on it. As the computing power of the MEC server increases, the performance of the five methods other than Local also shows an upward trend. The JOA-RL method proposed in the present invention achieves lower delay and energy consumption than the DQN method because it handles the continuous resource allocation problem better, which shows that JOA-RL has more advantages in the joint optimization of computation offloading and resource allocation. When the computing power of the MEC server increases to a certain level, the performance of the five methods other than Local also becomes essentially stable, because fewer tasks fail during computation offloading due to exceeding the delay constraint, while the battery power constraint of the MDs prevents further performance improvement.
As shown in FIG. 7, for the Local method the power consumed by local task execution is lower than the maximum battery capacity, so increasing the maximum capacity of the MD battery has no effect on it. For the other five methods, task uploading consumes considerable power, so when the maximum capacity of the MD battery is small, tasks often fail because the battery power is insufficient to support computation offloading. As the maximum battery capacity increases, the stored power can support more computation offloading, so the performance of these five methods shows an upward trend. When the maximum battery capacity increases to a certain level, failures of computation offloading caused by insufficient battery capacity essentially disappear, and the performance of these methods also becomes stable. The JOA-RL method proposed herein handles the continuous resource allocation problem better than the DQN method and achieves lower delay and energy consumption, which shows that JOA-RL has more advantages in the joint optimization of computation offloading and resource allocation.
The above are only preferred embodiments of the present invention; all equivalent changes and modifications made within the scope of the claims of the present invention shall fall within the scope of the present invention.

Claims (9)

  1. A system for the joint optimization of computation offloading and resource allocation in a multi-constrained edge environment, characterized in that it comprises a base station BS, a MEC server and N rechargeable mobile devices MDs, wherein the N rechargeable mobile devices MDs are denoted as the set MD = {MD_1, MD_2, ..., MD_i, ..., MD_N}; the rechargeable mobile devices MDs access the base station BS via 5G or LTE, and the base station BS is equipped with the MEC server.
  2. The system for the joint optimization of computation offloading and resource allocation in a multi-constrained edge environment according to claim 1, characterized in that the MDs are equipped with energy harvesting components and are powered by the energy harvested from radio frequency signals.
  3. The system for the joint optimization of computation offloading and resource allocation in a multi-constrained edge environment according to claim 1, characterized in that when a rechargeable mobile device MD generates a task, the computing task is offloaded to the MEC server for execution or executed locally, and tasks with higher priority tend to be offloaded to the MEC server for execution; specifically, the above priority pr_i^T is defined as a function of the transmission channel gain within sub-slot t, the data volume of Task_i, the computing capability f_i of MD_i, and the transmission power P_i of MD_i.
  4. An optimization method for the system for the joint optimization of computation offloading and resource allocation in a multi-constrained edge environment according to claim 1, characterized in that it comprises the following steps:
    Step S1: according to the tasks generated on the different MDs, the offloading priorities of the tasks, the battery power of the MDs and the computing resources currently available to the MEC server, generate offloading decisions and resource allocation decisions based on the joint optimization model of computation offloading and resource allocation;
    Step S2: issue the communication resources according to the resource allocation decision, and the MDs execute their tasks locally or offload them to the MEC server according to the offloading decision;
    Step S3: the job scheduler allocates jobs from the job sequence to the server according to the resource allocation decision.
  5. The optimization method according to claim 4, characterized in that the joint optimization model of computation offloading and resource allocation is built and trained with Python 3.6 and the open-source framework PyTorch, specifically as follows:
    (1) obtain the computing capability f_i of each MD_i, the computing capability of the MEC server and the network bandwidth, and initialize the system;
    (2) perform training, feed the system environment state s_t obtained at each training step into the actor network, execute the actor network's output action a_t in the environment, and perform the corresponding offloading and resource allocation operations;
    (3) calculate the corresponding reward according to the formula; the environment feeds back the cumulative execution reward r_t of the task at this step and the next state s_{t+1}, and the training sample is stored in the experience replay pool via M.push(s_t, a_t, r_t, s_{t+1});
    (4) when the number of training samples stored in M reaches N, randomly select N records to train the network parameters and obtain the final joint optimization model of computation offloading and resource allocation.
  6. The optimization method according to claim 4, characterized in that the initialization of the system is specifically: based on the state space, the action space and the reward function, first initialize the parameters θ^μ of the actor network and the parameters θ^Q of the critic network; then assign the actor network parameters θ^μ to the target actor network parameters θ^μ′ and the critic network parameters θ^Q to the target critic network parameters θ^Q′, and at the same time initialize the experience replay pool M, the number of training episodes P and the time-series length T_max.
  7. The optimization method according to claim 6, characterized in that the state space, the action space and the reward function are as follows:
    state space: the state space includes the tasks Task_t generated on all MDs in sub-slot t, the offloading priorities pr_t of the tasks, the battery power b_t of the MDs and the computing resources currently available to the MEC server; therefore, the system state at sub-slot t is expressed as the tuple of these four quantities (formula (14));
    action space: the DRL agent makes computation offloading and resource allocation actions based on the current system state; the action space includes the offloading decision α_t, the upload bandwidth allocation w_t of the tasks and the MEC server computing resources p_t allocated to the tasks; therefore, the action at sub-slot t is expressed as:
    a_t = {α_t, w_t, p_t}    formula (15)
    reward function: the goal of the system is to minimize the weighted sum of system delay and energy consumption under the constraints of the optimization problem P1; therefore, at sub-slot t the instantaneous reward of the system is expressed by formula (16),
    where w_1 and w_2 respectively represent the weights of the delay and the energy consumption caused by executing the task, F represents the normalization function, and Pu represents the penalty coefficient of task failure.
  8. The optimization method according to claim 4, characterized in that the training is specifically: train the critic network θ^Q to fit Q(s_t, a_t); when Q(s_t, a_t) is determined, for a fixed s_t there must exist an a_t that maximizes Q(s_t, a_t), and Q(s_t, a_t) is expressed as:
    Q(s_t, a_t) = E_environment[r(s_t, a_t) + γQ(s_{t+1}, μ(s_{t+1}))]    formula (17)
    where the actor network θ^μ outputs, for the current state s_t, the action a_t that maximizes the Q value; this process is expressed as:
    a_t = μ(s_t | θ^μ)    formula (18)
    the performance objective of the actor network is defined by formula (19).
  9. The optimization method according to claim 4, characterized in that a target actor network θ^μ′ and a target critic network θ^Q′ are defined;
    the critic network is responsible for calculating the current Q value Q(s_t, a_t), and the target Q value y_t is defined as:
    y_t = r_t + γQ′(s_{t+1}, μ′(s_{t+1} | θ^μ′) | θ^Q′)    formula (20)
    the gradient ascent method is used to approximate the optimal solution of the actor network's policy, and the loss function of the critic network is defined by formula (21);
    at each training step, the target actor network and the target critic network move toward the actor network and the critic network with update step τ.
PCT/CN2022/126471 2022-09-29 2022-10-20 System and method for joint optimization of computation offloading and resource allocation in a multi-constrained edge environment WO2024065903A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211200913.9 2022-09-29
CN202211200913.9A CN115567978A (zh) 2022-09-29 2022-09-29 多约束边环境下计算卸载与资源分配联合优化***及方法

Publications (1)

Publication Number Publication Date
WO2024065903A1 true WO2024065903A1 (zh) 2024-04-04

Family

ID=84742402

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/126471 WO2024065903A1 (zh) 2022-09-29 2022-10-20 多约束边环境下计算卸载与资源分配联合优化***及方法

Country Status (3)

Country Link
CN (1) CN115567978A (zh)
NL (1) NL2033996A (zh)
WO (1) WO2024065903A1 (zh)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111464208A (zh) * 2020-03-09 2020-07-28 深圳大学 基于扩频通信的无源边缘计算***、任务卸载方法及存储介质
CN113158544A (zh) * 2021-02-03 2021-07-23 大连理工大学 车载内容中心网络下基于联邦学习的边缘预缓存策略
CN113286317A (zh) * 2021-04-25 2021-08-20 南京邮电大学 一种基于无线供能边缘网络的任务调度方法
CN113573324A (zh) * 2021-07-06 2021-10-29 河海大学 工业物联网中协作式任务卸载和资源分配的联合优化方法
CN113645273A (zh) * 2021-07-06 2021-11-12 南京邮电大学 基于业务优先级的车联网任务卸载方法
CN113873022A (zh) * 2021-09-23 2021-12-31 中国科学院上海微***与信息技术研究所 一种可划分任务的移动边缘网络智能资源分配方法
CN114641076A (zh) * 2022-03-25 2022-06-17 重庆邮电大学 一种超密集网络中基于动态用户满意度的边缘计算卸载方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN XING; ZHANG JIANSHAN; LIN BING; CHEN ZHEYI; WOLTER KATINKA; MIN GEYONG: "Energy-Efficient Offloading for DNN-Based Smart IoT Systems in Cloud-Edge Environments", IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, IEEE, USA, vol. 33, no. 3, 27 July 2021 (2021-07-27), USA, pages 683 - 697, XP011871692, ISSN: 1045-9219, DOI: 10.1109/TPDS.2021.3100298 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118042495A (zh) * 2024-04-12 2024-05-14 华东交通大学 超密集网络中加压安全计算卸载与资源优化方法
CN118102386A (zh) * 2024-04-24 2024-05-28 南京邮电大学 D2d辅助mec网络中的服务缓存和任务卸载联合优化方法及***

Also Published As

Publication number Publication date
NL2033996A (en) 2024-04-08
CN115567978A (zh) 2023-01-03

Similar Documents

Publication Publication Date Title
WO2024065903A1 (zh) 多约束边环境下计算卸载与资源分配联合优化***及方法
Zhao et al. Contract-based computing resource management via deep reinforcement learning in vehicular fog computing
CN110493360B (zh) 多服务器下降低***能耗的移动边缘计算卸载方法
CN108924936B (zh) 无人机辅助无线充电边缘计算网络的资源分配方法
CN109753751B (zh) 一种基于机器学习的mec随机任务迁移方法
CN110928654B (zh) 一种边缘计算***中分布式的在线任务卸载调度方法
CN113543176B (zh) 基于智能反射面辅助的移动边缘计算***的卸载决策方法
CN114189892B (zh) 一种基于区块链和集体强化学习的云边协同物联网***资源分配方法
CN113286317B (zh) 一种基于无线供能边缘网络的任务调度方法
CN113286329B (zh) 基于移动边缘计算的通信和计算资源联合优化方法
CN115827108B (zh) 基于多目标深度强化学习的无人机边缘计算卸载方法
CN116260871A (zh) 一种基于本地和边缘协同缓存的独立任务卸载方法
CN114285853A (zh) 设备密集型工业物联网中基于端边云协同的任务卸载方法
Zhao et al. QoE aware and cell capacity enhanced computation offloading for multi-server mobile edge computing systems with energy harvesting devices
CN113573363B (zh) 基于深度强化学习的mec计算卸载与资源分配方法
CN116489712B (zh) 一种基于深度强化学习的移动边缘计算任务卸载方法
Hu et al. Dynamic task offloading in MEC-enabled IoT networks: A hybrid DDPG-D3QN approach
CN115499441A (zh) 超密集网络中基于深度强化学习的边缘计算任务卸载方法
Gong et al. Hierarchical deep reinforcement learning for age-of-information minimization in irs-aided and wireless-powered wireless networks
Wang et al. Deep reinforcement learning based joint partial computation offloading and resource allocation in mobility-aware MEC system
CN113946423A (zh) 基于图注意力网络的多任务边缘计算调度优化方法
Yu et al. Deep reinforcement learning based computing offloading decision and task scheduling in internet of vehicles
Amer et al. Qos-based task replication for alleviating uncertainty in edge computing
CN115134364B (zh) 基于o-ran物联网***的节能计算卸载***及方法
Hu et al. Distributed task offloading based on multi-agent deep reinforcement learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22960504

Country of ref document: EP

Kind code of ref document: A1