CN112822701A - Multi-user deep neural network model segmentation and resource allocation optimization method in edge computing scene - Google Patents

Multi-user deep neural network model segmentation and resource allocation optimization method in edge computing scene

Info

Publication number
CN112822701A
Authority
CN
China
Prior art keywords
neural network
deep neural
network model
modeling
segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011638611.0A
Other languages
Chinese (zh)
Inventor
Xu Chen
Xin Tang
Liekang Zeng
Zhi Zhou
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202011638611.0A
Publication of CN112822701A
Legal status: Pending


Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 - Supervisory, monitoring or testing arrangements
    • H04W24/02 - Arrangements for optimising operational condition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 - Supervisory, monitoring or testing arrangements
    • H04W24/06 - Testing, supervising or monitoring using simulated traffic

Abstract

The invention discloses a multi-user deep neural network model segmentation and resource allocation optimization method for edge computing scenarios. By comprehensively analyzing the execution characteristics of the deep neural network model segmentation technique in an edge computing environment, the method models the joint optimization problem of deep neural network model segmentation and edge-server computing resource allocation as a nonlinear integer programming problem, and further provides an iterative alternating optimization algorithm based on dynamic step-length adjustment. The algorithm efficiently obtains the optimal solution of the problem in polynomial time, and is strongly robust to various external influences in real deployment scenarios.

Description

Multi-user deep neural network model segmentation and resource allocation optimization method in edge computing scene
Technical Field
The invention relates to the technical fields of deep learning, edge computing, and distributed computing, and in particular to a multi-user deep neural network model segmentation and resource allocation optimization method in an edge computing scenario.
Background
With the gradual rollout of 5G and the continuous development of technologies such as mobile artificial intelligence (AI) and the Internet of Things (IoT), the number of devices at the network edge has grown explosively. Meanwhile, terminal devices at the network edge are gradually transitioning from pure consumers of intelligent applications to nodes that are both consumers and producers, continuously generating massive amounts of real-time data during operation. However, the conventional mobile cloud computing approach is limited by the transmission bandwidth of the backbone network and the high transmission delay caused by long physical distances, and it struggles to meet the real-time requirements of new mobile applications. Moreover, uploading raw data to a cloud server inevitably raises user concerns about potential privacy disclosure. To address these problems, mobile edge computing (MEC) provides users with high-bandwidth, low-latency, and privacy-preserving data processing and caching services by deploying resources at the network edge, closer to users. Owing to its superior performance on computation-intensive and delay-sensitive tasks, mobile edge computing is considered one of the most promising enablers of mobile AI technology.
For intelligent applications under mobile edge computing, practical deployment currently follows two main ideas: the first is to compress the model, through techniques such as model pruning and weight quantization, to a size that the mobile terminal can bear; the second is to deploy part of the model's computation tasks onto other devices using distributed deployment techniques, thereby reducing the local device's computation load, energy consumption, and other costs.
The first idea optimizes the model itself, which has the advantage of requiring no additional device support, but it also faces several problems that are difficult to solve: first, the accuracy of the compressed model is generally difficult to guarantee theoretically, and not all models are suitable for compression; second, structured weight pruning cannot be applied to all models, while general unstructured weight pruning hinders high-performance parallel optimization at the hardware level.
The second idea disassembles the model structure and deploys it across multiple devices, making full use of external computing resources. However, existing approaches all take a single-user perspective, typically assuming that the resources of the corresponding cloud or edge server are static and fixed. Actual deployment usually involves a resource-constrained multi-user scenario, where the distributed deployment challenge is more complex, and existing methods fail to provide a low-overhead solution.
Disclosure of Invention
To overcome at least one defect in the prior art, the invention provides a multi-user deep neural network model segmentation and resource allocation optimization method for edge computing scenarios, realizing efficient, low-latency inference of deep learning models on mobile terminal devices.
In order to solve the above technical problems, the invention adopts the following technical solution: a multi-user deep neural network model segmentation and resource allocation optimization method in an edge computing scenario, comprising the following steps:
S1, deep neural network model segmentation modeling step: defining a logical layer comprising a plurality of deep neural network concept layers, and abstracting the deep neural network model into a computational graph model comprising a plurality of consecutive serial tasks, with the logical layer as the minimum segmentation unit;
S2, resource allocation and deep neural network model segmentation decision modeling in the edge computing multi-user environment: in the multi-user environment, fitting and estimating the computation delay of deep neural network model segmentation with a heuristic method, and modeling the problem as a nonlinear integer programming problem;
S3, solving the user response delay optimization problem: solving the problem modeled in S2 with an iterative alternating optimization algorithm, and deploying the deep neural network models on the edge server according to the obtained solution.
Further, the step S1 specifically includes: abstracting parallel concept layers and concept layers with shortcut connections in the deep neural network model into single logical layers, and thereby abstracting the deep neural network model into a computation graph of sequentially connected logical layers; the deep neural network model deployed on device $i$ consists of $k_i$ logical layers connected in sequence, and an integer variable $s_i$ indicates that the model is split after the $s_i$-th logical layer; the segmentation decision satisfies $s_i \in \{0, 1, 2, \dots, k_i\}$; $w_i^{\mathrm{loc}}(s_i)$ denotes the computation amount of the neural network before the split point, and $w_i^{\mathrm{edge}}(s_i)$ denotes the computation amount of the neural network after the split point.
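To make the abstraction concrete, the following Python sketch represents such a chain of logical layers and derives the per-split quantities used below (computation before and after the split point, and the intermediate-result size). The layer names, FLOP counts, and byte sizes are invented placeholders, not data from the patent.

```python
from dataclasses import dataclass

@dataclass
class LogicalLayer:
    name: str
    flops: float      # computation amount of this logical layer (FLOPs)
    out_bytes: float  # size of this layer's output (the intermediate result)

# A chain model: parallel branches and shortcut connections are assumed to be
# already merged into single logical layers, so the model is a pure sequence.
model = [
    LogicalLayer("conv_block_1", 2.0e8, 1.6e6),
    LogicalLayer("conv_block_2", 4.0e8, 8.0e5),
    LogicalLayer("fc_block",     1.0e8, 4.0e3),
]
k = len(model)  # k_i: total number of logical layers

def w_loc(s: int) -> float:
    """Computation amount before the split point (layers 1..s run locally)."""
    return sum(layer.flops for layer in model[:s])

def w_edge(s: int) -> float:
    """Computation amount after the split point (layers s+1..k offloaded)."""
    return sum(layer.flops for layer in model[s:])

def d_intermediate(s: int, input_bytes: float = 6.0e5) -> float:
    """Size of the intermediate result transmitted at split point s;
    s = 0 means the raw input itself is uploaded."""
    return input_bytes if s == 0 else model[s - 1].out_bytes
```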
Further, the step S2 specifically includes:
S21, modeling the computing power division of the edge server: the computing capability of a minimum allocatable computing resource unit (MCRU) is denoted $C_{\min}$; $\beta$ denotes the total number of MCRUs on the edge server, and $f_i$ denotes the number of MCRUs allocated to each user $i$; naturally, $\sum_{i \in N} f_i \le \beta$;
S22, modeling the local execution delay of the device:

$T_i^{\mathrm{loc}}(s_i) = \frac{w_i^{\mathrm{loc}}(s_i)}{C_i^{\mathrm{loc}}}$  (1)

where $C_i^{\mathrm{loc}}$ in formula (1) denotes the computing capability of device $i$;
S23, modeling the computation delay of the neural network part offloaded to the edge server:

$T_i^{\mathrm{edge}}(s_i, f_i) = \theta(f_i) \cdot \frac{w_i^{\mathrm{edge}}(s_i)}{\gamma(f_i) \, C_{\min}}$  (2)

where $\theta$ is the unit step function, whose expression is

$\theta(x) = \begin{cases} 1, & x > 0 \\ 0, & x \le 0 \end{cases}$

and $\gamma$ is an approximate mapping fitted from real data, representing the multiple of the computing capability $C_{\min}$ that $f_i$ allocated computing resources actually reach;
S24, modeling the transmission delay of the intermediate result:

$T_i^{\mathrm{up}}(s_i) = \frac{d_i(s_i)}{B_i^{\mathrm{up}}}$  (3)

where $d_i(s_i)$ denotes the size of the intermediate result that user device $i$ needs to transmit at split point $s_i$, and $B_i^{\mathrm{up}}$ denotes the uplink bandwidth of user device $i$;
S25, modeling the transmission delay of returning the final result:

$T_i^{\mathrm{down}} = \frac{d_i^{\mathrm{out}}}{B_i^{\mathrm{down}}}$  (4)

where $d_i^{\mathrm{out}}$ denotes the size of the final result and $B_i^{\mathrm{down}}$ denotes the downlink bandwidth of user device $i$;
S26, deriving and modeling the total inference delay of the segmented deep neural network on a single user device:

combining the per-stage delay models (1), (2), (3), and (4) of steps S22, S23, S24, and S25, the total inference delay of the segmented deep neural network executed by user device $i$ is:

$T_i(s_i, f_i) = T_i^{\mathrm{loc}}(s_i) + T_i^{\mathrm{up}}(s_i) + T_i^{\mathrm{edge}}(s_i, f_i) + T_i^{\mathrm{down}}$  (5)
S27, modeling the global delay minimization over the multi-user devices:

$\min_{(S, F)} \ \max_{i \in N} T_i(s_i, f_i)$  (6)

$\mathrm{s.t.} \ \sum_{i \in N} f_i \le \beta$  (7)

$0 \le s_i \le k_i, \ \forall i \in N$  (8)

$f_i, s_i \in \mathbb{N}, \ \forall i \in N$  (9)

where formula (7) states that the total amount of resources on the edge server is limited; formula (8) states that the segmentation decision cannot exceed the total number of logical layers, and when the edge server allocates no computing resources to user device $i$ ($f_i = 0$), necessarily $s_i = k_i$, i.e., all computation tasks are executed locally; in formula (9), $\mathbb{N}$ denotes the set of natural numbers, so $f_i$ and $s_i$ are both non-negative integers.
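For concreteness, the following Python sketch evaluates the delay model of formulas (1)-(5) and checks the constraints of problem (6)-(9). The device dictionary schema, the linear stand-in for $\gamma$, the MCRU capability value, and the convention that a fully local split incurs no transmission are assumptions of this sketch, not specifications from the patent.

```python
C_MIN = 1.0e9  # assumed computing capability of one MCRU, in FLOPs/s

def theta(x: float) -> int:
    """Unit step function of formula (2): 1 if x > 0, else 0."""
    return 1 if x > 0 else 0

def gamma(f: int) -> float:
    """Stand-in for the data-fitted mapping gamma(f): the multiple of C_MIN
    that f allocated MCRUs actually reach (ideal linear scaling assumed)."""
    return float(f)

def T(dev: dict, s: int, f: int) -> float:
    """Formula (5): total inference delay of one user device for
    segmentation decision s and f allocated MCRUs."""
    k = len(dev["flops"])
    w_loc = sum(dev["flops"][:s])    # computation before the split point
    w_edge = sum(dev["flops"][s:])   # computation after the split point
    d = dev["in_bytes"] if s == 0 else dev["out_bytes_per_layer"][s - 1]
    offloads = s < k                 # s = k means fully local execution
    t_loc = w_loc / dev["C_loc"]                                       # (1)
    t_edge = theta(f) * w_edge / (gamma(f) * C_MIN) if f > 0 else 0.0  # (2)
    t_up = d / dev["B_up"] if offloads else 0.0                        # (3)
    t_down = dev["result_bytes"] / dev["B_down"] if offloads else 0.0  # (4)
    return t_loc + t_up + t_edge + t_down

def max_delay(devices: list, S: list, F: list, beta: int) -> float:
    """Objective (6) under constraints (7)-(9)."""
    assert sum(F) <= beta                      # (7): resource budget
    for dev, s, f in zip(devices, S, F):
        k = len(dev["flops"])
        assert 0 <= s <= k                     # (8)
        if f == 0:
            assert s == k                      # no MCRUs -> fully local
    return max(T(dev, s, f) for dev, s, f in zip(devices, S, F))
```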
Further, the step S3 specifically includes:
S31, generating an initial feasible solution vector $(S, F)$, where $S = \{s_i\}_{i \in N}$ and $F = \{f_i\}_{i \in N}$ represent the neural network segmentation decisions of all user devices and the amounts of computing resources the edge server allocates to each user device $i$;
S32, setting a decreasing coefficient $p$ and an adjustment step length $\tau$: the decreasing coefficient $p$ is a manually set hyperparameter, from which $q = \lfloor \log_p \beta \rfloor$ is computed according to the total computing resource amount $\beta$ of the edge server; the list of candidate adjustment step lengths is $[p^q, p^{q-1}, \dots, p^2, p^1, 1]$;
S33, adjusting the feasible solution vector $(S, F)$ according to each value in the adjustment step-length list in turn;
S34, traversing the adjustment step-length list, and for each value $\tau$:
starting from the current solution $(S, F)$, attempting to transfer $\tau$ resources from other devices to the device with the longest delay, and if a better local solution $(S', F')$ is generated, keeping it and recording it as $(S, F)$;
when the adjustment step length reaches $\tau = 1$, the final solution $(S, F)$ obtained is the global optimum.
Further, p is 2.
Further, the step S33 specifically includes:
S331, based on the current solution $(S, F)$, traversing all user devices $i$ and computing by formula (5) the optimal delay $T_i^{*}(f_i) = \min_{s_i} T_i(s_i, f_i)$ of each user device under its allocation $f_i$; finding the user device $k$ with the longest delay, whose delay is recorded as $T_k^{*}$;
S332, traversing all unmarked user devices with the adjustment step length $\tau$ and computing the optimal delay $T_j^{*}(f_j - \tau)$ that user device $j$ would reach after giving up $\tau$ resources; if $T_j^{*}(f_j - \tau) \ge T_k^{*}$, marking user device $j$; if $T_j^{*}(f_j - \tau) < T_k^{*}$, transferring $\tau$ resources from user device $j$ to $k$ and obtaining a new solution vector $(S, F)$;
S333, repeating step S332 until all user devices are marked.
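A possible implementation of steps S31-S34 and S331-S333 is sketched below in Python, reusing the T() delay function and device schema from the previous sketch. The even initial allocation, the formula $q = \lfloor \log_p \beta \rfloor$, and the detail of marking the current bottleneck device are implementation assumptions.

```python
import math

def best_split(dev: dict, f: int):
    """Inner step: optimal segmentation decision for a fixed allocation f;
    f = 0 forces the fully local decision s = k. Returns (s*, T*)."""
    k = len(dev["flops"])
    candidates = [k] if f == 0 else list(range(k + 1))
    s_star = min(candidates, key=lambda s: T(dev, s, f))
    return s_star, T(dev, s_star, f)

def optimize(devices: list, beta: int, p: int = 2):
    n = len(devices)
    F = [beta // n] * n                     # S31: a simple feasible start
    q = int(math.log(beta, p))              # S32 (assumed formula for q)
    for tau in [p ** e for e in range(q, 0, -1)] + [1]:   # S33/S34
        marked = set()
        while len(marked) < n:
            delays = [best_split(d, f)[1] for d, f in zip(devices, F)]
            k_star = max(range(n), key=lambda i: delays[i])   # S331
            marked.add(k_star)              # the bottleneck never donates
            for j in range(n):              # S332
                if j in marked:
                    continue
                if F[j] < tau:
                    marked.add(j)           # cannot give up tau MCRUs
                    continue
                t_j_after = best_split(devices[j], F[j] - tau)[1]
                if t_j_after >= delays[k_star]:
                    marked.add(j)           # donating would create a new bottleneck
                else:
                    F[j] -= tau             # transfer tau MCRUs from j to the
                    F[k_star] += tau        # longest-delay device k_star
                    break                   # S333: re-evaluate the bottleneck
    S = [best_split(d, f)[0] for d, f in zip(devices, F)]
    return S, F
```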
A multi-user deep neural network model segmentation and resource allocation optimization system in an edge computing scenario comprises:
a deep neural network model segmentation modeling module, configured to define a logical layer comprising a plurality of deep neural network concept layers and to abstract the deep neural network model into a computational graph model comprising a plurality of consecutive serial tasks, with the logical layer as the minimum segmentation unit;
a resource allocation and deep neural network model segmentation decision modeling module for the edge computing multi-user environment, configured to fit and estimate the computation delay of deep neural network model segmentation with a heuristic method in the multi-user environment, and to model the problem as a nonlinear integer programming problem;
a user response delay optimization problem solving module, configured to solve the problem modeled in the preceding module with an iterative alternating optimization algorithm, and to deploy the deep neural network models on the edge server according to the obtained solution.
Further, the deep neural network model segmentation modeling module is specifically configured to: abstract parallel concept layers and concept layers with shortcut connections in the deep neural network model into single logical layers, and thereby abstract the deep neural network model into a computation graph of sequentially connected logical layers; the deep neural network model deployed on device $i$ consists of $k_i$ logical layers connected in sequence, and an integer variable $s_i$ indicates that the model is split after the $s_i$-th logical layer; the segmentation decision satisfies $s_i \in \{0, 1, 2, \dots, k_i\}$; $w_i^{\mathrm{loc}}(s_i)$ denotes the computation amount of the neural network before the split point, and $w_i^{\mathrm{edge}}(s_i)$ denotes the computation amount of the neural network after the split point.
Further, the resource allocation and deep neural network model segmentation decision modeling module for the edge computing multi-user environment specifically includes:
an edge server computing power division modeling module, which denotes the computing capability of a minimum allocatable computing resource unit (MCRU) as $C_{\min}$, the total number of MCRUs on the edge server as $\beta$, and the number of MCRUs allocated to each user $i$ as $f_i$; naturally, $\sum_{i \in N} f_i \le \beta$;
a device local execution delay modeling module, configured to model:

$T_i^{\mathrm{loc}}(s_i) = \frac{w_i^{\mathrm{loc}}(s_i)}{C_i^{\mathrm{loc}}}$  (1)

where $C_i^{\mathrm{loc}}$ in formula (1) denotes the computing capability of device $i$;
the device is unloaded to the modeling module of the neural network part calculation time delay on the edge server, and is used for modeling to obtain:
Figure BDA0002879302240000055
where θ is a unit step function, and its expression is:
Figure BDA0002879302240000056
gamma is an approximate map fitted from real data, representing fiThe part of the computing resource actually reaches the computing capacity CminMultiples of (d);
the modeling module of the intermediate result transmission time delay is used for modeling to obtain:
Figure BDA0002879302240000061
wherein the content of the first and second substances,
Figure BDA0002879302240000062
indicating that the user equipment i is at the cut-off point siThe size of the intermediate result that needs to be transmitted,
Figure BDA0002879302240000063
representing the uplink bandwidth of user equipment i;
a modeling module for the transmission delay of returning the final result, configured to model:

$T_i^{\mathrm{down}} = \frac{d_i^{\mathrm{out}}}{B_i^{\mathrm{down}}}$  (4)

where $d_i^{\mathrm{out}}$ denotes the size of the final result and $B_i^{\mathrm{down}}$ denotes the downlink bandwidth of user device $i$;
a module for deriving and modeling the total delay of the segmented deep neural network on a single user device, configured to combine the per-stage delay models (1), (2), (3), and (4) of steps S22, S23, S24, and S25 to obtain the total inference delay of the segmented deep neural network executed by the user device:

$T_i(s_i, f_i) = T_i^{\mathrm{loc}}(s_i) + T_i^{\mathrm{up}}(s_i) + T_i^{\mathrm{edge}}(s_i, f_i) + T_i^{\mathrm{down}}$  (5)
modeling for global latency minimization of a multi-user device for modeling yielding:
Figure BDA0002879302240000067
Figure BDA0002879302240000068
Figure BDA0002879302240000069
Figure BDA00028793022400000610
wherein equation (7) represents the total resources of the edge serverThe number is limited, equation (8) indicates that the slicing decision must be smaller than the total number of logical layers, and when the edge server does not allocate computing resources to user equipment i (f)i0), must have si=kiThat is, all computing tasks are performed locally; in formula (9), N represents a natural number set, fiAnd siAre all non-negative integers.
Further, the user response delay optimization problem solving module includes:
generate initial feasible solution vector (S, F) module: wherein
$S = \{s_i\}_{i \in N}$ and $F = \{f_i\}_{i \in N}$ represent the neural network segmentation decisions of all user devices and the amounts of computing resources the edge server allocates to each user device $i$;
a module for setting a decreasing coefficient $p$ and an adjustment step length $\tau$: the decreasing coefficient $p$ is a manually set hyperparameter, from which $q = \lfloor \log_p \beta \rfloor$ is computed according to the total computing resource amount $\beta$ of the edge server; the list of candidate adjustment step lengths is $[p^q, p^{q-1}, \dots, p^2, p^1, 1]$;
the feasible solution vector $(S, F)$ is adjusted according to each value in the adjustment step-length list in turn;
the adjustment step-length list is traversed, and for each value $\tau$: starting from the current solution $(S, F)$, $\tau$ resources of other devices are transferred to the device with the longest delay, and if a better local solution $(S', F')$ is generated, it is kept and recorded as $(S, F)$;
when the adjustment step length reaches $\tau = 1$, the final solution $(S, F)$ obtained is the global optimum.
Compared with the prior art, the beneficial effects are:
1. The multi-user deep neural network model segmentation and resource allocation optimization method in an edge computing scenario improves the computing efficiency of the deep neural network segmentation technique in multi-user scenarios by segmenting the neural networks of multiple users in parallel, offloading the segmented parts to an edge server, and solving for the optimal allocation scheme with an iterative alternating optimization method, thereby realizing efficient, low-latency inference of deep learning models on mobile terminal devices.
2. The method considers deep neural network segmentation with multiple users and multiple choices, estimates the execution delay of a single user device through a heuristic function, and solves the joint scheme of optimal computation offloading and resource allocation with an iterative alternating optimization algorithm, giving it strong generalization ability and practicability.
3. The invention provides a data-driven fitting method to more accurately model and estimate the computing power division of multi-core CPUs in real scenarios, which has high practicability.
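As an illustration of this data-driven fitting, the sketch below interpolates measured multi-core speed-up samples to obtain the mapping $\gamma$ used in formula (2). The sample values are invented for illustration, and piecewise-linear interpolation via np.interp is one possible choice of fitting method, not necessarily the one used by the invention.

```python
import numpy as np

# f (number of 0.1-core MCRUs) vs. measured speed-up over a single MCRU;
# the sub-linear values below are invented placeholders.
f_samples = np.array([1, 5, 10, 20, 40, 80])
speedup   = np.array([1.0, 4.7, 9.1, 17.2, 31.8, 55.0])

def gamma_fit(f: float) -> float:
    """Piecewise-linear approximation of the fitted mapping gamma(f)."""
    return float(np.interp(f, f_samples, speedup))
```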
Drawings
FIG. 1 is a flow chart of the execution steps of the multi-user deep neural network segmentation optimization algorithm disclosed in the present invention;
FIG. 2 shows the relationship between task execution latency and the number of edge-server computing resources at an average bandwidth of 5 Mb/s;
FIG. 3 shows the relationship between task execution latency and the number of edge-server computing resources at high bandwidth (10 Mb/s for mobile devices, 100 Mb/s for fixed devices);
FIG. 4 shows the relationship between task execution latency and average bandwidth with 2 CPU cores of computing resources;
FIG. 5 shows the relationship between task execution latency and average bandwidth with 7 CPU cores of computing resources.
Detailed Description
This embodiment discloses a multi-user deep neural network model segmentation and resource allocation optimization method in an edge computing scenario, which estimates the execution delay of user devices through a heuristic function and solves the joint scheme of optimal computation offloading and resource allocation with an iterative alternating optimization algorithm.
The experimental environment of this embodiment is as follows: a workstation equipped with an eight-core 3.7 GHz Intel processor and 16 GB of memory serves as the edge server, providing computation offloading services to the user devices. The user devices consist of two Raspberry Pi development boards and two NVIDIA Jetson Nano boards. On the edge server side, virtual servers are built with Docker container technology to independently provide each user device with the DNN-partitioning-based computation offloading service. CPU cores (regarded as allocatable computing resources) are assigned to the different containers, with the minimum allocatable computing resource unit (MCRU) set to 0.1 core. The edge server serves 4 user devices. The two Raspberry Pis, running the MobileNetV2 model, connect to the edge server wirelessly via Wi-Fi and represent low-performance mobile devices (e.g., smartphones, smart wearables). The two NVIDIA Jetson Nano devices, running the VGG19 model, connect to the edge server through a wired LAN and represent higher-performance fixed devices (e.g., intelligent routers, smart home devices).
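For the container setup, one plausible way to pin each user's virtual server to its allocated share of CPU cores is Docker's standard --cpus quota. The following Python sketch is an assumption about how such containers might be launched; the image name and container naming scheme are placeholders, not the actual artifacts used in the experiments.

```python
import subprocess

MCRU_CORES = 0.1  # one minimum allocatable computing resource unit

def launch_user_container(user_id: int, f_i: int) -> None:
    """Start one per-user offloading container limited to f_i MCRUs."""
    subprocess.run([
        "docker", "run", "-d",
        "--name", f"dnn-user-{user_id}",
        "--cpus", str(f_i * MCRU_CORES),   # pin the container to f_i MCRUs
        "dnn-partition-server:latest",     # placeholder image name
    ], check=True)
```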
A multi-user deep neural network model segmentation and resource allocation optimization method in an edge computing scenario comprises the following steps:
S1, deep neural network model segmentation modeling step: for the VGG19 model, defining a logical layer comprising a plurality of deep neural network concept layers, and abstracting the deep neural network model into a computational graph model comprising a plurality of consecutive serial tasks, with the logical layer as the minimum segmentation unit;
S2, resource allocation and deep neural network model segmentation decision modeling in the edge computing multi-user environment: in the multi-user environment, fitting and estimating the computation delay of deep neural network model segmentation with a heuristic method, and, combining the computational graph model from S1, modeling the problem as a nonlinear integer programming problem;
S3, solving the user response delay optimization problem: solving the problem modeled in S2 with an iterative alternating optimization algorithm, and deploying the deep neural network models on the edge server according to the obtained solution. The specific steps of the iterative alternating optimization algorithm are shown in FIG. 1.
Further, the step S1 specifically includes: abstracting parallel concept layers and concept layers with shortcut connections in the deep neural network model into single logical layers, and thereby abstracting the deep neural network model into a computation graph of sequentially connected logical layers; the deep neural network model deployed on device $i$ consists of $k_i$ logical layers connected in sequence, and an integer variable $s_i$ indicates that the model is split after the $s_i$-th logical layer; the segmentation decision satisfies $s_i \in \{0, 1, 2, \dots, k_i\}$; $w_i^{\mathrm{loc}}(s_i)$ denotes the computation amount of the neural network before the split point, and $w_i^{\mathrm{edge}}(s_i)$ denotes the computation amount of the neural network after the split point.
Further, the step S2 specifically includes:
S21, modeling the computing power division of the edge server: the computing capability of a minimum allocatable computing resource unit (MCRU) is denoted $C_{\min}$; $\beta$ denotes the total number of MCRUs on the edge server, and $f_i$ denotes the number of MCRUs allocated to each user $i$; naturally, $\sum_{i \in N} f_i \le \beta$;
S22, modeling the local execution delay of the device:

$T_i^{\mathrm{loc}}(s_i) = \frac{w_i^{\mathrm{loc}}(s_i)}{C_i^{\mathrm{loc}}}$  (1)

where $C_i^{\mathrm{loc}}$ in formula (1) denotes the computing capability of device $i$;
S23, modeling the computation delay of the neural network part offloaded to the edge server:

$T_i^{\mathrm{edge}}(s_i, f_i) = \theta(f_i) \cdot \frac{w_i^{\mathrm{edge}}(s_i)}{\gamma(f_i) \, C_{\min}}$  (2)

where $\theta$ is the unit step function, whose expression is

$\theta(x) = \begin{cases} 1, & x > 0 \\ 0, & x \le 0 \end{cases}$

and $\gamma$ is an approximate mapping fitted from real data, representing the multiple of the computing capability $C_{\min}$ that $f_i$ allocated computing resources actually reach;
S24, modeling the transmission delay of the intermediate result:

$T_i^{\mathrm{up}}(s_i) = \frac{d_i(s_i)}{B_i^{\mathrm{up}}}$  (3)

where $d_i(s_i)$ denotes the size of the intermediate result that user device $i$ needs to transmit at split point $s_i$, and $B_i^{\mathrm{up}}$ denotes the uplink bandwidth of user device $i$;
S25, modeling the transmission delay of returning the final result:

$T_i^{\mathrm{down}} = \frac{d_i^{\mathrm{out}}}{B_i^{\mathrm{down}}}$  (4)

where $d_i^{\mathrm{out}}$ denotes the size of the final result and $B_i^{\mathrm{down}}$ denotes the downlink bandwidth of user device $i$;
S26, deriving and modeling the total inference delay of the segmented deep neural network on a single user device:

combining the per-stage delay models (1), (2), (3), and (4) of steps S22, S23, S24, and S25, the total inference delay of the segmented deep neural network executed by user device $i$ is:

$T_i(s_i, f_i) = T_i^{\mathrm{loc}}(s_i) + T_i^{\mathrm{up}}(s_i) + T_i^{\mathrm{edge}}(s_i, f_i) + T_i^{\mathrm{down}}$  (5)
S27, modeling the global delay minimization over the multi-user devices:

$\min_{(S, F)} \ \max_{i \in N} T_i(s_i, f_i)$  (6)

$\mathrm{s.t.} \ \sum_{i \in N} f_i \le \beta$  (7)

$0 \le s_i \le k_i, \ \forall i \in N$  (8)

$f_i, s_i \in \mathbb{N}, \ \forall i \in N$  (9)

where formula (7) states that the total amount of resources on the edge server is limited; formula (8) states that the segmentation decision cannot exceed the total number of logical layers, and when the edge server allocates no computing resources to user device $i$ ($f_i = 0$), necessarily $s_i = k_i$, i.e., all computation tasks are executed locally; in formula (9), $\mathbb{N}$ denotes the set of natural numbers, so $f_i$ and $s_i$ are both non-negative integers.
Further, the step S3 specifically includes:
S31, generating an initial feasible solution vector $(S, F)$, where $S = \{s_i\}_{i \in N}$ and $F = \{f_i\}_{i \in N}$ represent the neural network segmentation decisions of all user devices and the amounts of computing resources the edge server allocates to each user device $i$;
S32, setting a decreasing coefficient $p$ and an adjustment step length $\tau$: the decreasing coefficient $p$ is a manually set hyperparameter, from which $q = \lfloor \log_p \beta \rfloor$ is computed according to the total computing resource amount $\beta$ of the edge server; the list of candidate adjustment step lengths is $[p^q, p^{q-1}, \dots, p^2, p^1, 1]$;
S33, adjusting the feasible solution vector $(S, F)$ according to each value in the adjustment step-length list in turn;
S34, traversing the adjustment step-length list, and for each value $\tau$:
starting from the current solution $(S, F)$, attempting to transfer $\tau$ resources from other devices to the device with the longest delay, and if a better local solution $(S', F')$ is generated, keeping it and recording it as $(S, F)$;
when the adjustment step length reaches $\tau = 1$, the final solution $(S, F)$ obtained is the global optimum.
Further, p is 2.
Further, the step S33 specifically includes:
S331, based on the current solution $(S, F)$, traversing all user devices $i$ and computing by formula (5) the optimal delay $T_i^{*}(f_i) = \min_{s_i} T_i(s_i, f_i)$ of each user device under its allocation $f_i$; finding the user device $k$ with the longest delay, whose delay is recorded as $T_k^{*}$;
S332, traversing all unmarked user devices with the adjustment step length $\tau$ and computing the optimal delay $T_j^{*}(f_j - \tau)$ that user device $j$ would reach after giving up $\tau$ resources; if $T_j^{*}(f_j - \tau) \ge T_k^{*}$, marking user device $j$; if $T_j^{*}(f_j - \tau) < T_k^{*}$, transferring $\tau$ resources from user device $j$ to $k$ and obtaining a new solution vector $(S, F)$;
S333, repeating step S332 until all user devices are marked.
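Tying the algorithm to this test bed, a hypothetical driver might look as follows, reusing T(), best_split(), and optimize() from the earlier sketches. Every layer count, FLOP value, and bandwidth below is a rough illustrative assumption, not a measurement from the experiments.

```python
def make_device(n_layers, flops_per_layer, c_loc, bandwidth_bps):
    """Build a device record in the assumed schema of the earlier sketches."""
    return {
        "flops": [flops_per_layer] * n_layers,
        "out_bytes_per_layer": [4.0e5] * n_layers,
        "in_bytes": 6.0e5,
        "result_bytes": 4.0e3,
        "C_loc": c_loc,
        "B_up": bandwidth_bps / 8,   # bits/s converted to bytes/s
        "B_down": bandwidth_bps / 8,
    }

beta = 80  # eight cores at 0.1 core per MCRU

# two Wi-Fi Raspberry Pis (MobileNetV2) and two wired Jetson Nanos (VGG19)
devices = (
    [make_device(19, 3.0e7, 5.0e9, 10e6) for _ in range(2)] +
    [make_device(16, 1.2e9, 2.0e10, 100e6) for _ in range(2)]
)

S, F = optimize(devices, beta, p=2)
print("segmentation decisions:", S)
print("MCRU allocation:", F)
```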
FIG. 2 and FIG. 3 show the influence of the resource abundance of the edge server on the final computation delay under different bandwidths; the experimental results show that the solution obtained by the disclosed scheme achieves the best performance.
FIG. 4 and FIG. 5 show the influence of network bandwidth on the final computation delay under different amounts of computing resources; the experimental results show that the solution obtained by the disclosed scheme achieves the best performance.
Although embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and should not be construed as limiting the present invention; variations, modifications, substitutions, and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaust all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included in the protection scope of the claims of the present invention.

Claims (10)

1. A multi-user deep neural network model segmentation and resource allocation optimization method in an edge computing scenario, characterized by comprising the following steps:
S1, deep neural network model segmentation modeling step: defining a logical layer comprising a plurality of deep neural network concept layers, and abstracting the deep neural network model into a computational graph model comprising a plurality of consecutive serial tasks, with the logical layer as the minimum segmentation unit;
S2, resource allocation and deep neural network model segmentation decision modeling in the edge computing multi-user environment: in the multi-user environment, fitting and estimating the computation delay of deep neural network model segmentation with a heuristic method, and modeling the problem as a nonlinear integer programming problem;
S3, solving the user response delay optimization problem: solving the problem modeled in S2 with an iterative alternating optimization algorithm, and deploying the deep neural network models on the edge server according to the obtained solution.
2. The method for optimizing multi-user deep neural network model segmentation and resource allocation in an edge computing scenario of claim 1, wherein the step S1 specifically includes: abstracting parallel concept layers and concept layers with shortcut connections in the deep neural network model into single logical layers, and thereby abstracting the deep neural network model into a computation graph of sequentially connected logical layers; the deep neural network model deployed on device $i$ consists of $k_i$ logical layers connected in sequence, and an integer variable $s_i$ indicates that the model is split after the $s_i$-th logical layer; the segmentation decision satisfies $s_i \in \{0, 1, 2, \dots, k_i\}$; $w_i^{\mathrm{loc}}(s_i)$ denotes the computation amount of the neural network before the split point, and $w_i^{\mathrm{edge}}(s_i)$ denotes the computation amount of the neural network after the split point.
3. The method for optimizing multi-user deep neural network model segmentation and resource allocation in an edge computing scenario of claim 2, wherein the step S2 specifically includes:
S21, modeling the computing power division of the edge server: the computing capability of a minimum allocatable computing resource unit (MCRU) is denoted $C_{\min}$; $\beta$ denotes the total number of MCRUs on the edge server, and $f_i$ denotes the number of MCRUs allocated to each user $i$; naturally, $\sum_{i \in N} f_i \le \beta$;
S22, modeling the local execution delay of the device:

$T_i^{\mathrm{loc}}(s_i) = \frac{w_i^{\mathrm{loc}}(s_i)}{C_i^{\mathrm{loc}}}$  (1)

where $C_i^{\mathrm{loc}}$ in formula (1) denotes the computing capability of device $i$;
S23, modeling the computation delay of the neural network part offloaded to the edge server:

$T_i^{\mathrm{edge}}(s_i, f_i) = \theta(f_i) \cdot \frac{w_i^{\mathrm{edge}}(s_i)}{\gamma(f_i) \, C_{\min}}$  (2)

where $\theta$ is the unit step function, whose expression is

$\theta(x) = \begin{cases} 1, & x > 0 \\ 0, & x \le 0 \end{cases}$

and $\gamma$ is an approximate mapping fitted from real data, representing the multiple of the computing capability $C_{\min}$ that $f_i$ allocated computing resources actually reach;
S24, modeling the transmission delay of the intermediate result:

$T_i^{\mathrm{up}}(s_i) = \frac{d_i(s_i)}{B_i^{\mathrm{up}}}$  (3)

where $d_i(s_i)$ denotes the size of the intermediate result that user device $i$ needs to transmit at split point $s_i$, and $B_i^{\mathrm{up}}$ denotes the uplink bandwidth of user device $i$;
S25, modeling the transmission delay of returning the final result:

$T_i^{\mathrm{down}} = \frac{d_i^{\mathrm{out}}}{B_i^{\mathrm{down}}}$  (4)

where $d_i^{\mathrm{out}}$ denotes the size of the final result and $B_i^{\mathrm{down}}$ denotes the downlink bandwidth of user device $i$;
S26, deriving and modeling the total inference delay of the segmented deep neural network on a single user device:

combining the per-stage delay models (1), (2), (3), and (4) of steps S22, S23, S24, and S25, the total inference delay of the segmented deep neural network executed by user device $i$ is:

$T_i(s_i, f_i) = T_i^{\mathrm{loc}}(s_i) + T_i^{\mathrm{up}}(s_i) + T_i^{\mathrm{edge}}(s_i, f_i) + T_i^{\mathrm{down}}$  (5)
S27, modeling the global delay minimization over the multi-user devices:

$\min_{(S, F)} \ \max_{i \in N} T_i(s_i, f_i)$  (6)

$\mathrm{s.t.} \ \sum_{i \in N} f_i \le \beta$  (7)

$0 \le s_i \le k_i, \ \forall i \in N$  (8)

$f_i, s_i \in \mathbb{N}, \ \forall i \in N$  (9)

where formula (7) states that the total amount of resources on the edge server is limited; formula (8) states that the segmentation decision cannot exceed the total number of logical layers, and when the edge server allocates no computing resources to user device $i$ ($f_i = 0$), necessarily $s_i = k_i$, i.e., all computation tasks are executed locally; in formula (9), $\mathbb{N}$ denotes the set of natural numbers, so $f_i$ and $s_i$ are both non-negative integers.
4. The method for optimizing the multi-user deep neural network model segmentation and resource allocation under the edge computing scenario of claim 3, wherein the step S3 specifically includes:
S31, generating an initial feasible solution vector $(S, F)$, where $S = \{s_i\}_{i \in N}$ and $F = \{f_i\}_{i \in N}$ represent the neural network segmentation decisions of all user devices and the amounts of computing resources the edge server allocates to each user device $i$;
S32, setting a decreasing coefficient $p$ and an adjustment step length $\tau$: the decreasing coefficient $p$ is a manually set hyperparameter, from which $q = \lfloor \log_p \beta \rfloor$ is computed according to the total computing resource amount $\beta$ of the edge server; the list of candidate adjustment step lengths is $[p^q, p^{q-1}, \dots, p^2, p^1, 1]$;
S33, adjusting the feasible solution vector $(S, F)$ according to each value in the adjustment step-length list in turn;
S34, traversing the adjustment step-length list, and for each value $\tau$:
starting from the current solution $(S, F)$, attempting to transfer $\tau$ resources from other devices to the device with the longest delay, and if a better local solution $(S', F')$ is generated, keeping it and recording it as $(S, F)$;
when the adjustment step length reaches $\tau = 1$, the final solution $(S, F)$ obtained is the global optimum.
5. The method for the multi-user deep neural network model segmentation and resource allocation optimization in the edge computing scenario as claimed in claim 4, wherein p is 2.
6. The method for optimizing the multi-user deep neural network model segmentation and resource allocation under the edge computing scenario of claim 4, wherein the step S33 specifically includes:
S331, based on the current solution $(S, F)$, traversing all user devices $i$ and computing by formula (5) the optimal delay $T_i^{*}(f_i) = \min_{s_i} T_i(s_i, f_i)$ of each user device under its allocation $f_i$; finding the user device $k$ with the longest delay, whose delay is recorded as $T_k^{*}$;
S332, traversing all unmarked user devices with the adjustment step length $\tau$ and computing the optimal delay $T_j^{*}(f_j - \tau)$ that user device $j$ would reach after giving up $\tau$ resources; if $T_j^{*}(f_j - \tau) \ge T_k^{*}$, marking user device $j$; if $T_j^{*}(f_j - \tau) < T_k^{*}$, transferring $\tau$ resources from user device $j$ to $k$ and obtaining a new solution vector $(S, F)$;
S333, repeating step S332 until all user devices are marked.
7. A multi-user deep neural network model segmentation and resource allocation optimization system in an edge computing scenario, characterized by comprising:
a deep neural network model segmentation modeling module, configured to define a logical layer comprising a plurality of deep neural network concept layers and to abstract the deep neural network model into a computational graph model comprising a plurality of consecutive serial tasks, with the logical layer as the minimum segmentation unit;
a resource allocation and deep neural network model segmentation decision modeling module for the edge computing multi-user environment, configured to fit and estimate the computation delay of deep neural network model segmentation with a heuristic method in the multi-user environment, and to model the problem as a nonlinear integer programming problem;
a user response delay optimization problem solving module, configured to solve the problem modeled in the preceding module with an iterative alternating optimization algorithm, and to deploy the deep neural network models on the edge server according to the obtained solution.
8. The system for optimizing multi-user deep neural network model segmentation and resource allocation in an edge computing scenario of claim 7, wherein the deep neural network model segmentation modeling module is specifically configured to: abstract parallel concept layers and concept layers with shortcut connections in the deep neural network model into single logical layers, and thereby abstract the deep neural network model into a computation graph of sequentially connected logical layers; the deep neural network model deployed on device $i$ consists of $k_i$ logical layers connected in sequence, and an integer variable $s_i$ indicates that the model is split after the $s_i$-th logical layer; the segmentation decision satisfies $s_i \in \{0, 1, 2, \dots, k_i\}$; $w_i^{\mathrm{loc}}(s_i)$ denotes the computation amount of the neural network before the split point, and $w_i^{\mathrm{edge}}(s_i)$ denotes the computation amount of the neural network after the split point.
9. The system for optimizing multi-user deep neural network model segmentation and resource allocation in an edge computing scenario of claim 8, wherein the resource allocation and deep neural network model segmentation decision modeling module for the edge computing multi-user environment specifically includes:
an edge server computing power division modeling module, which denotes the computing capability of a minimum allocatable computing resource unit (MCRU) as $C_{\min}$, the total number of MCRUs on the edge server as $\beta$, and the number of MCRUs allocated to each user $i$ as $f_i$; naturally, $\sum_{i \in N} f_i \le \beta$;
a device local execution delay modeling module, configured to model:

$T_i^{\mathrm{loc}}(s_i) = \frac{w_i^{\mathrm{loc}}(s_i)}{C_i^{\mathrm{loc}}}$  (1)

where $C_i^{\mathrm{loc}}$ in formula (1) denotes the computing capability of device $i$;
the device is unloaded to the modeling module of the neural network part calculation time delay on the edge server, and is used for modeling to obtain:
Figure FDA0002879302230000045
where θ is a unit step function, and its expression is:
Figure FDA0002879302230000051
gamma is an approximate map fitted from real data, representing fiThe part of the computing resource actually reaches the computing capacity CminMultiples of (d);
the modeling module of the intermediate result transmission time delay is used for modeling to obtain:
Figure FDA0002879302230000052
wherein the content of the first and second substances,
Figure FDA0002879302230000053
representing user equipmenti at the point of tangency siThe size of the intermediate result that needs to be transmitted,
Figure FDA0002879302230000054
representing the uplink bandwidth of user equipment i;
a modeling module for the transmission delay of returning the final result, configured to model:

$T_i^{\mathrm{down}} = \frac{d_i^{\mathrm{out}}}{B_i^{\mathrm{down}}}$  (4)

where $d_i^{\mathrm{out}}$ denotes the size of the final result and $B_i^{\mathrm{down}}$ denotes the downlink bandwidth of user device $i$;
a module for deriving and modeling the total delay of the segmented deep neural network on a single user device, configured to combine the per-stage delay models (1), (2), (3), and (4) of steps S22, S23, S24, and S25 to obtain the total inference delay of the segmented deep neural network executed by the user device:

$T_i(s_i, f_i) = T_i^{\mathrm{loc}}(s_i) + T_i^{\mathrm{up}}(s_i) + T_i^{\mathrm{edge}}(s_i, f_i) + T_i^{\mathrm{down}}$  (5)
modeling for global latency minimization of a multi-user device for modeling yielding:
Figure FDA0002879302230000058
Figure FDA0002879302230000059
Figure FDA00028793022300000510
Figure FDA00028793022300000511
where equation (7) indicates that the total number of resources of the edge server is limited, equation (8) indicates that the slicing decision must be smaller than the total number of logical layers, and when the edge server does not allocate computing resources to the user equipment i (f)i0), must have si=kiThat is, all computing tasks are performed locally; in formula (9), N represents a natural number set, fiAnd siAre all non-negative integers.
10. The system for optimizing multi-user deep neural network model segmentation and resource allocation under the edge computing scenario of claim 9, wherein the user response delay optimization problem solving module comprises:
generate initial feasible solution vector (S, F) module: wherein
$S = \{s_i\}_{i \in N}$ and $F = \{f_i\}_{i \in N}$ represent the neural network segmentation decisions of all user devices and the amounts of computing resources the edge server allocates to each user device $i$;
a module for setting a decreasing coefficient $p$ and an adjustment step length $\tau$: the decreasing coefficient $p$ is a manually set hyperparameter, from which $q = \lfloor \log_p \beta \rfloor$ is computed according to the total computing resource amount $\beta$ of the edge server; the list of candidate adjustment step lengths is $[p^q, p^{q-1}, \dots, p^2, p^1, 1]$;
the feasible solution vector $(S, F)$ is adjusted according to each value in the adjustment step-length list in turn;
the adjustment step-length list is traversed, and for each value $\tau$: starting from the current solution $(S, F)$, $\tau$ resources of other devices are transferred to the device with the longest delay, and if a better local solution $(S', F')$ is generated, it is kept and recorded as $(S, F)$;
when the adjustment step length reaches $\tau = 1$, the final solution $(S, F)$ obtained is the global optimum.
CN202011638611.0A 2020-12-31 2020-12-31 Multi-user deep neural network model segmentation and resource allocation optimization method in edge computing scene Pending CN112822701A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011638611.0A CN112822701A (en) 2020-12-31 2020-12-31 Multi-user deep neural network model segmentation and resource allocation optimization method in edge computing scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011638611.0A CN112822701A (en) 2020-12-31 2020-12-31 Multi-user deep neural network model segmentation and resource allocation optimization method in edge computing scene

Publications (1)

Publication Number Publication Date
CN112822701A 2021-05-18

Family

ID=75857638

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011638611.0A Pending CN112822701A (en) 2020-12-31 2020-12-31 Multi-user deep neural network model segmentation and resource allocation optimization method in edge computing scene

Country Status (1)

Country Link
CN (1) CN112822701A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111312368A (en) * 2020-01-20 2020-06-19 广西师范大学 Method for accelerating medical image processing speed based on edge calculation
CN113315669A (en) * 2021-07-28 2021-08-27 江苏电力信息技术有限公司 Cloud edge cooperation-based throughput optimization machine learning inference task deployment method
CN113987692A (en) * 2021-12-29 2022-01-28 华东交通大学 Deep neural network partitioning method for unmanned aerial vehicle and edge computing server
CN115277452A (en) * 2022-07-01 2022-11-01 中铁第四勘察设计院集团有限公司 ResNet self-adaptive acceleration calculation method based on edge-end cooperation and application

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110996393A (en) * 2019-12-12 2020-04-10 大连理工大学 Single-edge computing server and multi-user cooperative computing unloading and resource allocation method
CN112148492A (en) * 2020-09-28 2020-12-29 南京大学 Service deployment and resource allocation method considering multi-user mobility

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110996393A (en) * 2019-12-12 2020-04-10 大连理工大学 Single-edge computing server and multi-user cooperative computing unloading and resource allocation method
CN112148492A (en) * 2020-09-28 2020-12-29 南京大学 Service deployment and resource allocation method considering multi-user mobility

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIN TANG et al.: "Joint Multiuser DNN Partitioning and Computational Resource Allocation for Collaborative Edge Intelligence", IEEE Internet of Things Journal *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111312368A (en) * 2020-01-20 2020-06-19 广西师范大学 Method for accelerating medical image processing speed based on edge calculation
CN113315669A (en) * 2021-07-28 2021-08-27 江苏电力信息技术有限公司 Cloud edge cooperation-based throughput optimization machine learning inference task deployment method
CN113987692A (en) * 2021-12-29 2022-01-28 华东交通大学 Deep neural network partitioning method for unmanned aerial vehicle and edge computing server
CN115277452A (en) * 2022-07-01 2022-11-01 中铁第四勘察设计院集团有限公司 ResNet self-adaptive acceleration calculation method based on edge-end cooperation and application
CN115277452B (en) * 2022-07-01 2023-11-28 中铁第四勘察设计院集团有限公司 ResNet self-adaptive acceleration calculation method based on edge-side coordination and application

Similar Documents

Publication Publication Date Title
CN112822701A (en) Multi-user deep neural network model segmentation and resource allocation optimization method in edge computing scene
CN107995660B (en) Joint task scheduling and resource allocation method supporting D2D-edge server unloading
CN111918339B (en) AR task unloading and resource allocation method based on reinforcement learning in mobile edge network
CN109246761B (en) Unloading method based on alternating direction multiplier method considering delay and energy consumption
CN110968426B (en) Edge cloud collaborative k-means clustering model optimization method based on online learning
CN110519370B (en) Edge computing resource allocation method based on facility site selection problem
CN110968366B (en) Task unloading method, device and equipment based on limited MEC resources
CN113615137B (en) CDN optimization platform
CN110162390B (en) Task allocation method and system for fog computing system
CN111813539A (en) Edge computing resource allocation method based on priority and cooperation
Liang et al. A location-aware service deployment algorithm based on k-means for cloudlets
CN113573363A (en) MEC calculation unloading and resource allocation method based on deep reinforcement learning
Li et al. Computation offloading and service allocation in mobile edge computing
CN110167031B (en) Resource allocation method, equipment and storage medium for centralized base station
CN111158893B (en) Task unloading method, system, equipment and medium applied to fog computing network
Ma Edge server placement for service offloading in internet of things
US11811429B2 (en) Variational dropout with smoothness regularization for neural network model compression
CN112686374A (en) Deep neural network model collaborative reasoning method based on adaptive load distribution
CN113515378A (en) Method and device for migration and calculation resource allocation of 5G edge calculation task
CN114745386B (en) Neural network segmentation and unloading method in multi-user edge intelligent scene
Malazi et al. Distributed service placement and workload orchestration in a multi-access edge computing environment
CN110944335B (en) Resource allocation method and device for virtual reality service
CN115499876A (en) Computing unloading strategy based on DQN algorithm under MSDE scene
CN114116052A (en) Edge calculation method and device
CN113709817A (en) Task unloading and resource scheduling method and device under multi-base-station multi-server scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210518)