CN112822701A - Multi-user deep neural network model segmentation and resource allocation optimization method in edge computing scene - Google Patents

Multi-user deep neural network model segmentation and resource allocation optimization method in edge computing scene

Info

Publication number
CN112822701A
Authority
CN
China
Prior art keywords
neural network
deep neural
network model
modeling
segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011638611.0A
Other languages
Chinese (zh)
Inventor
Xu Chen
Xin Tang
Liekang Zeng
Zhi Zhou
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202011638611.0A
Publication of CN112822701A
Legal status: Pending


Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 - Supervisory, monitoring or testing arrangements
    • H04W24/02 - Arrangements for optimising operational condition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 - Supervisory, monitoring or testing arrangements
    • H04W24/06 - Testing, supervising or monitoring using simulated traffic

Abstract

The invention discloses a multi-user deep neural network model segmentation and resource allocation optimization method for edge computing scenarios. By comprehensively analyzing the execution characteristics of the deep neural network model segmentation technique in an edge computing environment, the method models the joint optimization problem of deep neural network model segmentation and edge-server computing resource allocation as a nonlinear integer programming problem, and further provides an iterative alternating optimization algorithm based on dynamic step-length adjustment. The algorithm efficiently obtains the optimal solution of the problem in polynomial time, and is strongly robust to various external influences in real deployment scenarios.

Description

Multi-user deep neural network model segmentation and resource allocation optimization method in edge computing scene
Technical Field
The invention relates to the technical fields of deep learning, edge computing, and distributed computing, and in particular to a multi-user deep neural network model segmentation and resource allocation optimization method in an edge computing scenario.
Background
With the gradual rollout of 5G and the continuous development of technologies such as mobile artificial intelligence (AI) and the Internet of Things (IoT), the number of devices at the network edge has grown explosively. Meanwhile, terminal devices at the network edge are gradually transitioning from pure consumers of intelligent applications to nodes that are both consumers and producers, continuously generating massive amounts of real-time data during operation. However, the conventional mobile cloud computing approach is limited by the transmission bandwidth of the backbone network and the high transmission delay caused by long physical distances, and it struggles to meet the real-time requirements of new mobile applications. Moreover, uploading raw data to a cloud server inevitably raises user concerns about potential privacy disclosure. To address these problems, mobile edge computing (MEC) provides users with high-bandwidth, low-latency, and privacy-preserving data processing and caching services by deploying resources at the network edge, closer to users. Owing to its superior performance on computation-intensive and delay-sensitive tasks, mobile edge computing is considered one of the most promising enablers of mobile AI technology.
For intelligent applications under mobile edge computing, practical deployment currently follows two main ideas: the first is to compress the model, through techniques such as model pruning and weight quantization, to a size that the mobile terminal can bear; the second is to deploy part of the model's computation tasks onto other devices using distributed deployment techniques, thereby reducing the local device's computation load, energy consumption, and other costs.
The first idea optimizes the model itself, which has the advantage of requiring no additional device support, but it also faces several problems that are difficult to solve: first, the accuracy of the compressed model is generally difficult to guarantee theoretically, and not all models are suitable for compression; second, structured weight pruning cannot be applied to all models, while general unstructured weight pruning hinders high-performance parallel optimization at the hardware level.
The second idea disassembles the model structure and deploys it across multiple devices, making full use of external computing resources. However, existing approaches all take a single-user perspective, typically assuming that the resources of the corresponding cloud or edge server are static and fixed. Actual deployment usually involves a resource-constrained multi-user scenario, where the distributed deployment challenge is more complex, and existing methods fail to provide a low-overhead solution.
Disclosure of Invention
To overcome at least one defect in the prior art, the invention provides a multi-user deep neural network model segmentation and resource allocation optimization method for edge computing scenarios, realizing efficient, low-latency inference of deep learning models on mobile terminal devices.
In order to solve the above technical problems, the invention adopts the following technical solution: a multi-user deep neural network model segmentation and resource allocation optimization method in an edge computing scenario, comprising the following steps:
S1, deep neural network model segmentation modeling step: defining a logical layer comprising a plurality of deep neural network concept layers, and abstracting the deep neural network model into a computational graph model comprising a plurality of consecutive serial tasks, with the logical layer as the minimum segmentation unit;
S2, resource allocation and deep neural network model segmentation decision modeling in the edge computing multi-user environment: in the multi-user environment, fitting and estimating the computation delay of deep neural network model segmentation with a heuristic method, and modeling the problem as a nonlinear integer programming problem;
S3, solving the user response delay optimization problem: solving the problem modeled in S2 with an iterative alternating optimization algorithm, and deploying the deep neural network models on the edge server according to the obtained solution.
Further, the step S1 specifically includes: abstracting parallel concept layers and concept layers with shortcut connections in the deep neural network model into single logical layers, and thereby abstracting the deep neural network model into a computation graph of sequentially connected logical layers; the deep neural network model deployed on device $i$ consists of $k_i$ logical layers connected in sequence, and an integer variable $s_i$ indicates that the model is split after the $s_i$-th logical layer; the segmentation decision satisfies $s_i \in \{0, 1, 2, \dots, k_i\}$; $w_i^{\mathrm{loc}}(s_i)$ denotes the computation amount of the neural network before the split point, and $w_i^{\mathrm{edge}}(s_i)$ denotes the computation amount of the neural network after the split point.
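To make the abstraction concrete, the following Python sketch represents such a chain of logical layers and derives the per-split quantities used below (computation before and after the split point, and the intermediate-result size). The layer names, FLOP counts, and byte sizes are invented placeholders, not data from the patent.

```python
from dataclasses import dataclass

@dataclass
class LogicalLayer:
    name: str
    flops: float      # computation amount of this logical layer (FLOPs)
    out_bytes: float  # size of this layer's output (the intermediate result)

# A chain model: parallel branches and shortcut connections are assumed to be
# already merged into single logical layers, so the model is a pure sequence.
model = [
    LogicalLayer("conv_block_1", 2.0e8, 1.6e6),
    LogicalLayer("conv_block_2", 4.0e8, 8.0e5),
    LogicalLayer("fc_block",     1.0e8, 4.0e3),
]
k = len(model)  # k_i: total number of logical layers

def w_loc(s: int) -> float:
    """Computation amount before the split point (layers 1..s run locally)."""
    return sum(layer.flops for layer in model[:s])

def w_edge(s: int) -> float:
    """Computation amount after the split point (layers s+1..k offloaded)."""
    return sum(layer.flops for layer in model[s:])

def d_intermediate(s: int, input_bytes: float = 6.0e5) -> float:
    """Size of the intermediate result transmitted at split point s;
    s = 0 means the raw input itself is uploaded."""
    return input_bytes if s == 0 else model[s - 1].out_bytes
```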
Further, the step S2 specifically includes:
S21, modeling the computing power division of the edge server: the computing capability of a minimum allocatable computing resource unit (MCRU) is denoted $C_{\min}$; $\beta$ denotes the total number of MCRUs on the edge server, and $f_i$ denotes the number of MCRUs allocated to each user $i$; naturally, $\sum_{i \in N} f_i \le \beta$;
S22, modeling the local execution delay of the device:

$T_i^{\mathrm{loc}}(s_i) = \frac{w_i^{\mathrm{loc}}(s_i)}{C_i^{\mathrm{loc}}}$  (1)

where $C_i^{\mathrm{loc}}$ in formula (1) denotes the computing capability of device $i$;
S23, modeling the computation delay of the neural network part offloaded to the edge server:

$T_i^{\mathrm{edge}}(s_i, f_i) = \theta(f_i) \cdot \frac{w_i^{\mathrm{edge}}(s_i)}{\gamma(f_i) \, C_{\min}}$  (2)

where $\theta$ is the unit step function, whose expression is

$\theta(x) = \begin{cases} 1, & x > 0 \\ 0, & x \le 0 \end{cases}$

and $\gamma$ is an approximate mapping fitted from real data, representing the multiple of the computing capability $C_{\min}$ that $f_i$ allocated computing resources actually reach;
S24, modeling the transmission delay of the intermediate result:

$T_i^{\mathrm{up}}(s_i) = \frac{d_i(s_i)}{B_i^{\mathrm{up}}}$  (3)

where $d_i(s_i)$ denotes the size of the intermediate result that user device $i$ needs to transmit at split point $s_i$, and $B_i^{\mathrm{up}}$ denotes the uplink bandwidth of user device $i$;
S25, modeling the transmission delay of returning the final result:

$T_i^{\mathrm{down}} = \frac{d_i^{\mathrm{out}}}{B_i^{\mathrm{down}}}$  (4)

where $d_i^{\mathrm{out}}$ denotes the size of the final result and $B_i^{\mathrm{down}}$ denotes the downlink bandwidth of user device $i$;
S26, deriving and modeling the total inference delay of the segmented deep neural network on a single user device:

combining the per-stage delay models (1), (2), (3), and (4) of steps S22, S23, S24, and S25, the total inference delay of the segmented deep neural network executed by user device $i$ is:

$T_i(s_i, f_i) = T_i^{\mathrm{loc}}(s_i) + T_i^{\mathrm{up}}(s_i) + T_i^{\mathrm{edge}}(s_i, f_i) + T_i^{\mathrm{down}}$  (5)
S27, modeling the global delay minimization over the multi-user devices:

$\min_{(S, F)} \ \max_{i \in N} T_i(s_i, f_i)$  (6)

$\mathrm{s.t.} \ \sum_{i \in N} f_i \le \beta$  (7)

$0 \le s_i \le k_i, \ \forall i \in N$  (8)

$f_i, s_i \in \mathbb{N}, \ \forall i \in N$  (9)

where formula (7) states that the total amount of resources on the edge server is limited; formula (8) states that the segmentation decision cannot exceed the total number of logical layers, and when the edge server allocates no computing resources to user device $i$ ($f_i = 0$), necessarily $s_i = k_i$, i.e., all computation tasks are executed locally; in formula (9), $\mathbb{N}$ denotes the set of natural numbers, so $f_i$ and $s_i$ are both non-negative integers.
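For concreteness, the following Python sketch evaluates the delay model of formulas (1)-(5) and checks the constraints of problem (6)-(9). The device dictionary schema, the linear stand-in for $\gamma$, the MCRU capability value, and the convention that a fully local split incurs no transmission are assumptions of this sketch, not specifications from the patent.

```python
C_MIN = 1.0e9  # assumed computing capability of one MCRU, in FLOPs/s

def theta(x: float) -> int:
    """Unit step function of formula (2): 1 if x > 0, else 0."""
    return 1 if x > 0 else 0

def gamma(f: int) -> float:
    """Stand-in for the data-fitted mapping gamma(f): the multiple of C_MIN
    that f allocated MCRUs actually reach (ideal linear scaling assumed)."""
    return float(f)

def T(dev: dict, s: int, f: int) -> float:
    """Formula (5): total inference delay of one user device for
    segmentation decision s and f allocated MCRUs."""
    k = len(dev["flops"])
    w_loc = sum(dev["flops"][:s])    # computation before the split point
    w_edge = sum(dev["flops"][s:])   # computation after the split point
    d = dev["in_bytes"] if s == 0 else dev["out_bytes_per_layer"][s - 1]
    offloads = s < k                 # s = k means fully local execution
    t_loc = w_loc / dev["C_loc"]                                       # (1)
    t_edge = theta(f) * w_edge / (gamma(f) * C_MIN) if f > 0 else 0.0  # (2)
    t_up = d / dev["B_up"] if offloads else 0.0                        # (3)
    t_down = dev["result_bytes"] / dev["B_down"] if offloads else 0.0  # (4)
    return t_loc + t_up + t_edge + t_down

def max_delay(devices: list, S: list, F: list, beta: int) -> float:
    """Objective (6) under constraints (7)-(9)."""
    assert sum(F) <= beta                      # (7): resource budget
    for dev, s, f in zip(devices, S, F):
        k = len(dev["flops"])
        assert 0 <= s <= k                     # (8)
        if f == 0:
            assert s == k                      # no MCRUs -> fully local
    return max(T(dev, s, f) for dev, s, f in zip(devices, S, F))
```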
Further, the step S3 specifically includes:
S31, generating an initial feasible solution vector $(S, F)$, where $S = \{s_i\}_{i \in N}$ and $F = \{f_i\}_{i \in N}$ represent the neural network segmentation decisions of all user devices and the amounts of computing resources the edge server allocates to each user device $i$;
S32, setting a decreasing coefficient $p$ and an adjustment step length $\tau$: the decreasing coefficient $p$ is a manually set hyperparameter, from which $q = \lfloor \log_p \beta \rfloor$ is computed according to the total computing resource amount $\beta$ of the edge server; the list of candidate adjustment step lengths is $[p^q, p^{q-1}, \dots, p^2, p^1, 1]$;
S33, adjusting the feasible solution vector $(S, F)$ according to each value in the adjustment step-length list in turn;
S34, traversing the adjustment step-length list, and for each value $\tau$:
starting from the current solution $(S, F)$, attempting to transfer $\tau$ resources from other devices to the device with the longest delay, and if a better local solution $(S', F')$ is generated, keeping it and recording it as $(S, F)$;
when the adjustment step length reaches $\tau = 1$, the final solution $(S, F)$ obtained is the global optimum.
Further, p is 2.
Further, the step S33 specifically includes:
S331, based on the current solution $(S, F)$, traversing all user devices $i$ and computing by formula (5) the optimal delay $T_i^{*}(f_i) = \min_{s_i} T_i(s_i, f_i)$ of each user device under its allocation $f_i$; finding the user device $k$ with the longest delay, whose delay is recorded as $T_k^{*}$;
S332, traversing all unmarked user devices with the adjustment step length $\tau$ and computing the optimal delay $T_j^{*}(f_j - \tau)$ that user device $j$ would reach after giving up $\tau$ resources; if $T_j^{*}(f_j - \tau) \ge T_k^{*}$, marking user device $j$; if $T_j^{*}(f_j - \tau) < T_k^{*}$, transferring $\tau$ resources from user device $j$ to $k$ and obtaining a new solution vector $(S, F)$;
S333, repeating step S332 until all user devices are marked.
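A possible implementation of steps S31-S34 and S331-S333 is sketched below in Python, reusing the T() delay function and device schema from the previous sketch. The even initial allocation, the formula $q = \lfloor \log_p \beta \rfloor$, and the detail of marking the current bottleneck device are implementation assumptions.

```python
import math

def best_split(dev: dict, f: int):
    """Inner step: optimal segmentation decision for a fixed allocation f;
    f = 0 forces the fully local decision s = k. Returns (s*, T*)."""
    k = len(dev["flops"])
    candidates = [k] if f == 0 else list(range(k + 1))
    s_star = min(candidates, key=lambda s: T(dev, s, f))
    return s_star, T(dev, s_star, f)

def optimize(devices: list, beta: int, p: int = 2):
    n = len(devices)
    F = [beta // n] * n                     # S31: a simple feasible start
    q = int(math.log(beta, p))              # S32 (assumed formula for q)
    for tau in [p ** e for e in range(q, 0, -1)] + [1]:   # S33/S34
        marked = set()
        while len(marked) < n:
            delays = [best_split(d, f)[1] for d, f in zip(devices, F)]
            k_star = max(range(n), key=lambda i: delays[i])   # S331
            marked.add(k_star)              # the bottleneck never donates
            for j in range(n):              # S332
                if j in marked:
                    continue
                if F[j] < tau:
                    marked.add(j)           # cannot give up tau MCRUs
                    continue
                t_j_after = best_split(devices[j], F[j] - tau)[1]
                if t_j_after >= delays[k_star]:
                    marked.add(j)           # donating would create a new bottleneck
                else:
                    F[j] -= tau             # transfer tau MCRUs from j to the
                    F[k_star] += tau        # longest-delay device k_star
                    break                   # S333: re-evaluate the bottleneck
    S = [best_split(d, f)[0] for d, f in zip(devices, F)]
    return S, F
```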
A multi-user deep neural network model segmentation and resource allocation optimization system in an edge computing scenario comprises:
a deep neural network model segmentation modeling module, configured to define a logical layer comprising a plurality of deep neural network concept layers and to abstract the deep neural network model into a computational graph model comprising a plurality of consecutive serial tasks, with the logical layer as the minimum segmentation unit;
a resource allocation and deep neural network model segmentation decision modeling module for the edge computing multi-user environment, configured to fit and estimate the computation delay of deep neural network model segmentation with a heuristic method in the multi-user environment, and to model the problem as a nonlinear integer programming problem;
a user response delay optimization problem solving module, configured to solve the problem modeled in the preceding module with an iterative alternating optimization algorithm, and to deploy the deep neural network models on the edge server according to the obtained solution.
Further, the deep neural network model segmentation modeling module is specifically configured to: abstract parallel concept layers and concept layers with shortcut connections in the deep neural network model into single logical layers, and thereby abstract the deep neural network model into a computation graph of sequentially connected logical layers; the deep neural network model deployed on device $i$ consists of $k_i$ logical layers connected in sequence, and an integer variable $s_i$ indicates that the model is split after the $s_i$-th logical layer; the segmentation decision satisfies $s_i \in \{0, 1, 2, \dots, k_i\}$; $w_i^{\mathrm{loc}}(s_i)$ denotes the computation amount of the neural network before the split point, and $w_i^{\mathrm{edge}}(s_i)$ denotes the computation amount of the neural network after the split point.
Further, the resource allocation and deep neural network model segmentation decision modeling module for the edge computing multi-user environment specifically includes:
an edge server computing power division modeling module, which denotes the computing capability of a minimum allocatable computing resource unit (MCRU) as $C_{\min}$, the total number of MCRUs on the edge server as $\beta$, and the number of MCRUs allocated to each user $i$ as $f_i$; naturally, $\sum_{i \in N} f_i \le \beta$;
a device local execution delay modeling module, configured to model:

$T_i^{\mathrm{loc}}(s_i) = \frac{w_i^{\mathrm{loc}}(s_i)}{C_i^{\mathrm{loc}}}$  (1)

where $C_i^{\mathrm{loc}}$ in formula (1) denotes the computing capability of device $i$;
the device is unloaded to the modeling module of the neural network part calculation time delay on the edge server, and is used for modeling to obtain:
Figure BDA0002879302240000055
where θ is a unit step function, and its expression is:
Figure BDA0002879302240000056
gamma is an approximate map fitted from real data, representing fiThe part of the computing resource actually reaches the computing capacity CminMultiples of (d);
the modeling module of the intermediate result transmission time delay is used for modeling to obtain:
Figure BDA0002879302240000061
wherein the content of the first and second substances,
Figure BDA0002879302240000062
indicating that the user equipment i is at the cut-off point siThe size of the intermediate result that needs to be transmitted,
Figure BDA0002879302240000063
representing the uplink bandwidth of user equipment i;
a modeling module for the transmission delay of returning the final result, configured to model:

$T_i^{\mathrm{down}} = \frac{d_i^{\mathrm{out}}}{B_i^{\mathrm{down}}}$  (4)

where $d_i^{\mathrm{out}}$ denotes the size of the final result and $B_i^{\mathrm{down}}$ denotes the downlink bandwidth of user device $i$;
a module for deriving and modeling the total delay of the segmented deep neural network on a single user device, configured to combine the per-stage delay models (1), (2), (3), and (4) of steps S22, S23, S24, and S25 to obtain the total inference delay of the segmented deep neural network executed by the user device:

$T_i(s_i, f_i) = T_i^{\mathrm{loc}}(s_i) + T_i^{\mathrm{up}}(s_i) + T_i^{\mathrm{edge}}(s_i, f_i) + T_i^{\mathrm{down}}$  (5)
modeling for global latency minimization of a multi-user device for modeling yielding:
Figure BDA0002879302240000067
Figure BDA0002879302240000068
Figure BDA0002879302240000069
Figure BDA00028793022400000610
wherein equation (7) represents the total resources of the edge serverThe number is limited, equation (8) indicates that the slicing decision must be smaller than the total number of logical layers, and when the edge server does not allocate computing resources to user equipment i (f)i0), must have si=kiThat is, all computing tasks are performed locally; in formula (9), N represents a natural number set, fiAnd siAre all non-negative integers.
Further, the user response delay optimization problem solving module includes:
generate initial feasible solution vector (S, F) module: wherein
$S = \{s_i\}_{i \in N}$ and $F = \{f_i\}_{i \in N}$ represent the neural network segmentation decisions of all user devices and the amounts of computing resources the edge server allocates to each user device $i$;
a module for setting a decreasing coefficient $p$ and an adjustment step length $\tau$: the decreasing coefficient $p$ is a manually set hyperparameter, from which $q = \lfloor \log_p \beta \rfloor$ is computed according to the total computing resource amount $\beta$ of the edge server; the list of candidate adjustment step lengths is $[p^q, p^{q-1}, \dots, p^2, p^1, 1]$;
the feasible solution vector $(S, F)$ is adjusted according to each value in the adjustment step-length list in turn;
the adjustment step-length list is traversed, and for each value $\tau$: starting from the current solution $(S, F)$, $\tau$ resources of other devices are transferred to the device with the longest delay, and if a better local solution $(S', F')$ is generated, it is kept and recorded as $(S, F)$;
when the adjustment step length reaches $\tau = 1$, the final solution $(S, F)$ obtained is the global optimum.
Compared with the prior art, the beneficial effects are:
1. The multi-user deep neural network model segmentation and resource allocation optimization method in an edge computing scenario improves the computing efficiency of the deep neural network segmentation technique in multi-user scenarios by segmenting the neural networks of multiple users in parallel, offloading the segmented parts to an edge server, and solving for the optimal allocation scheme with an iterative alternating optimization method, thereby realizing efficient, low-latency inference of deep learning models on mobile terminal devices.
2. The method considers deep neural network segmentation with multiple users and multiple choices, estimates the execution delay of a single user device through a heuristic function, and solves the joint scheme of optimal computation offloading and resource allocation with an iterative alternating optimization algorithm, giving it strong generalization ability and practicability.
3. The invention provides a data-driven fitting method to more accurately model and estimate the computing power division of multi-core CPUs in real scenarios, which has high practicability.
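As an illustration of this data-driven fitting, the sketch below interpolates measured multi-core speed-up samples to obtain the mapping $\gamma$ used in formula (2). The sample values are invented for illustration, and piecewise-linear interpolation via np.interp is one possible choice of fitting method, not necessarily the one used by the invention.

```python
import numpy as np

# f (number of 0.1-core MCRUs) vs. measured speed-up over a single MCRU;
# the sub-linear values below are invented placeholders.
f_samples = np.array([1, 5, 10, 20, 40, 80])
speedup   = np.array([1.0, 4.7, 9.1, 17.2, 31.8, 55.0])

def gamma_fit(f: float) -> float:
    """Piecewise-linear approximation of the fitted mapping gamma(f)."""
    return float(np.interp(f, f_samples, speedup))
```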
Drawings
FIG. 1 is a flow chart of the execution steps of the multi-user deep neural network segmentation optimization algorithm disclosed in the present invention;
FIG. 2 shows the relationship between task execution latency and the number of edge-server computing resources at an average bandwidth of 5 Mb/s;
FIG. 3 shows the relationship between task execution latency and the number of edge-server computing resources at high bandwidth (10 Mb/s for mobile devices, 100 Mb/s for fixed devices);
FIG. 4 shows the relationship between task execution latency and average bandwidth with 2 CPU cores of computing resources;
FIG. 5 shows the relationship between task execution latency and average bandwidth with 7 CPU cores of computing resources.
Detailed Description
This embodiment discloses a multi-user deep neural network model segmentation and resource allocation optimization method in an edge computing scenario, which estimates the execution delay of user devices through a heuristic function and solves the joint scheme of optimal computation offloading and resource allocation with an iterative alternating optimization algorithm.
The experimental environment of this embodiment is as follows: a workstation equipped with an eight-core 3.7 GHz Intel processor and 16 GB of memory serves as the edge server, providing computation offloading services to the user devices. The user devices consist of two Raspberry Pi development boards and two NVIDIA Jetson Nano boards. On the edge server side, virtual servers are built with Docker container technology to independently provide each user device with the DNN-partitioning-based computation offloading service. CPU cores (regarded as allocatable computing resources) are assigned to the different containers, with the minimum allocatable computing resource unit (MCRU) set to 0.1 core. The edge server serves 4 user devices. The two Raspberry Pis, running the MobileNetV2 model, connect to the edge server wirelessly via Wi-Fi and represent low-performance mobile devices (e.g., smartphones, smart wearables). The two NVIDIA Jetson Nano devices, running the VGG19 model, connect to the edge server through a wired LAN and represent higher-performance fixed devices (e.g., intelligent routers, smart home devices).
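For the container setup, one plausible way to pin each user's virtual server to its allocated share of CPU cores is Docker's standard --cpus quota. The following Python sketch is an assumption about how such containers might be launched; the image name and container naming scheme are placeholders, not the actual artifacts used in the experiments.

```python
import subprocess

MCRU_CORES = 0.1  # one minimum allocatable computing resource unit

def launch_user_container(user_id: int, f_i: int) -> None:
    """Start one per-user offloading container limited to f_i MCRUs."""
    subprocess.run([
        "docker", "run", "-d",
        "--name", f"dnn-user-{user_id}",
        "--cpus", str(f_i * MCRU_CORES),   # pin the container to f_i MCRUs
        "dnn-partition-server:latest",     # placeholder image name
    ], check=True)
```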
A multi-user deep neural network model segmentation and resource allocation optimization method in an edge computing scenario comprises the following steps:
S1, deep neural network model segmentation modeling step: for the VGG19 model, defining a logical layer comprising a plurality of deep neural network concept layers, and abstracting the deep neural network model into a computational graph model comprising a plurality of consecutive serial tasks, with the logical layer as the minimum segmentation unit;
S2, resource allocation and deep neural network model segmentation decision modeling in the edge computing multi-user environment: in the multi-user environment, fitting and estimating the computation delay of deep neural network model segmentation with a heuristic method, and, combining the computational graph model from S1, modeling the problem as a nonlinear integer programming problem;
S3, solving the user response delay optimization problem: solving the problem modeled in S2 with an iterative alternating optimization algorithm, and deploying the deep neural network models on the edge server according to the obtained solution. The specific steps of the iterative alternating optimization algorithm are shown in FIG. 1.
Further, the step S1 specifically includes: abstracting parallel concept layers and concept layers with shortcut connections in the deep neural network model into single logical layers, and thereby abstracting the deep neural network model into a computation graph of sequentially connected logical layers; the deep neural network model deployed on device $i$ consists of $k_i$ logical layers connected in sequence, and an integer variable $s_i$ indicates that the model is split after the $s_i$-th logical layer; the segmentation decision satisfies $s_i \in \{0, 1, 2, \dots, k_i\}$; $w_i^{\mathrm{loc}}(s_i)$ denotes the computation amount of the neural network before the split point, and $w_i^{\mathrm{edge}}(s_i)$ denotes the computation amount of the neural network after the split point.
Further, the step S2 specifically includes:
S21, modeling the computing power division of the edge server: the computing capability of a minimum allocatable computing resource unit (MCRU) is denoted $C_{\min}$; $\beta$ denotes the total number of MCRUs on the edge server, and $f_i$ denotes the number of MCRUs allocated to each user $i$; naturally, $\sum_{i \in N} f_i \le \beta$;
S22, modeling the local execution delay of the device:

$T_i^{\mathrm{loc}}(s_i) = \frac{w_i^{\mathrm{loc}}(s_i)}{C_i^{\mathrm{loc}}}$  (1)

where $C_i^{\mathrm{loc}}$ in formula (1) denotes the computing capability of device $i$;
S23, modeling the computation delay of the neural network part offloaded to the edge server:

$T_i^{\mathrm{edge}}(s_i, f_i) = \theta(f_i) \cdot \frac{w_i^{\mathrm{edge}}(s_i)}{\gamma(f_i) \, C_{\min}}$  (2)

where $\theta$ is the unit step function, whose expression is

$\theta(x) = \begin{cases} 1, & x > 0 \\ 0, & x \le 0 \end{cases}$

and $\gamma$ is an approximate mapping fitted from real data, representing the multiple of the computing capability $C_{\min}$ that $f_i$ allocated computing resources actually reach;
S24, modeling the transmission delay of the intermediate result:

$T_i^{\mathrm{up}}(s_i) = \frac{d_i(s_i)}{B_i^{\mathrm{up}}}$  (3)

where $d_i(s_i)$ denotes the size of the intermediate result that user device $i$ needs to transmit at split point $s_i$, and $B_i^{\mathrm{up}}$ denotes the uplink bandwidth of user device $i$;
S25, modeling the transmission delay of returning the final result:

$T_i^{\mathrm{down}} = \frac{d_i^{\mathrm{out}}}{B_i^{\mathrm{down}}}$  (4)

where $d_i^{\mathrm{out}}$ denotes the size of the final result and $B_i^{\mathrm{down}}$ denotes the downlink bandwidth of user device $i$;
S26, deriving and modeling the total inference delay of the segmented deep neural network on a single user device:

combining the per-stage delay models (1), (2), (3), and (4) of steps S22, S23, S24, and S25, the total inference delay of the segmented deep neural network executed by user device $i$ is:

$T_i(s_i, f_i) = T_i^{\mathrm{loc}}(s_i) + T_i^{\mathrm{up}}(s_i) + T_i^{\mathrm{edge}}(s_i, f_i) + T_i^{\mathrm{down}}$  (5)
S27, modeling the global delay minimization over the multi-user devices:

$\min_{(S, F)} \ \max_{i \in N} T_i(s_i, f_i)$  (6)

$\mathrm{s.t.} \ \sum_{i \in N} f_i \le \beta$  (7)

$0 \le s_i \le k_i, \ \forall i \in N$  (8)

$f_i, s_i \in \mathbb{N}, \ \forall i \in N$  (9)

where formula (7) states that the total amount of resources on the edge server is limited; formula (8) states that the segmentation decision cannot exceed the total number of logical layers, and when the edge server allocates no computing resources to user device $i$ ($f_i = 0$), necessarily $s_i = k_i$, i.e., all computation tasks are executed locally; in formula (9), $\mathbb{N}$ denotes the set of natural numbers, so $f_i$ and $s_i$ are both non-negative integers.
Further, the step S3 specifically includes:
S31, generating an initial feasible solution vector $(S, F)$, where $S = \{s_i\}_{i \in N}$ and $F = \{f_i\}_{i \in N}$ represent the neural network segmentation decisions of all user devices and the amounts of computing resources the edge server allocates to each user device $i$;
S32, setting a decreasing coefficient $p$ and an adjustment step length $\tau$: the decreasing coefficient $p$ is a manually set hyperparameter, from which $q = \lfloor \log_p \beta \rfloor$ is computed according to the total computing resource amount $\beta$ of the edge server; the list of candidate adjustment step lengths is $[p^q, p^{q-1}, \dots, p^2, p^1, 1]$;
S33, adjusting the feasible solution vector $(S, F)$ according to each value in the adjustment step-length list in turn;
S34, traversing the adjustment step-length list, and for each value $\tau$:
starting from the current solution $(S, F)$, attempting to transfer $\tau$ resources from other devices to the device with the longest delay, and if a better local solution $(S', F')$ is generated, keeping it and recording it as $(S, F)$;
when the adjustment step length reaches $\tau = 1$, the final solution $(S, F)$ obtained is the global optimum.
Further, p is 2.
Further, the step S33 specifically includes:
S331, based on the current solution $(S, F)$, traversing all user devices $i$ and computing by formula (5) the optimal delay $T_i^{*}(f_i) = \min_{s_i} T_i(s_i, f_i)$ of each user device under its allocation $f_i$; finding the user device $k$ with the longest delay, whose delay is recorded as $T_k^{*}$;
S332, traversing all unmarked user devices with the adjustment step length $\tau$ and computing the optimal delay $T_j^{*}(f_j - \tau)$ that user device $j$ would reach after giving up $\tau$ resources; if $T_j^{*}(f_j - \tau) \ge T_k^{*}$, marking user device $j$; if $T_j^{*}(f_j - \tau) < T_k^{*}$, transferring $\tau$ resources from user device $j$ to $k$ and obtaining a new solution vector $(S, F)$;
S333, repeating step S332 until all user devices are marked.
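Tying the algorithm to this test bed, a hypothetical driver might look as follows, reusing T(), best_split(), and optimize() from the earlier sketches. Every layer count, FLOP value, and bandwidth below is a rough illustrative assumption, not a measurement from the experiments.

```python
def make_device(n_layers, flops_per_layer, c_loc, bandwidth_bps):
    """Build a device record in the assumed schema of the earlier sketches."""
    return {
        "flops": [flops_per_layer] * n_layers,
        "out_bytes_per_layer": [4.0e5] * n_layers,
        "in_bytes": 6.0e5,
        "result_bytes": 4.0e3,
        "C_loc": c_loc,
        "B_up": bandwidth_bps / 8,   # bits/s converted to bytes/s
        "B_down": bandwidth_bps / 8,
    }

beta = 80  # eight cores at 0.1 core per MCRU

# two Wi-Fi Raspberry Pis (MobileNetV2) and two wired Jetson Nanos (VGG19)
devices = (
    [make_device(19, 3.0e7, 5.0e9, 10e6) for _ in range(2)] +
    [make_device(16, 1.2e9, 2.0e10, 100e6) for _ in range(2)]
)

S, F = optimize(devices, beta, p=2)
print("segmentation decisions:", S)
print("MCRU allocation:", F)
```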
FIG. 2 and FIG. 3 show the influence of the resource abundance of the edge server on the final computation delay under different bandwidths; the experimental results show that the solution obtained by the disclosed scheme achieves the best performance.
FIG. 4 and FIG. 5 show the influence of network bandwidth on the final computation delay under different amounts of computing resources; the experimental results show that the solution obtained by the disclosed scheme achieves the best performance.
Although embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and should not be construed as limiting the present invention; variations, modifications, substitutions, and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaust all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included in the protection scope of the claims of the present invention.

Claims (10)

1. A multi-user deep neural network model segmentation and resource allocation optimization method in an edge computing scenario, characterized by comprising the following steps:
S1, deep neural network model segmentation modeling step: defining a logical layer comprising a plurality of deep neural network concept layers, and abstracting the deep neural network model into a computational graph model comprising a plurality of consecutive serial tasks, with the logical layer as the minimum segmentation unit;
S2, resource allocation and deep neural network model segmentation decision modeling in the edge computing multi-user environment: in the multi-user environment, fitting and estimating the computation delay of deep neural network model segmentation with a heuristic method, and modeling the problem as a nonlinear integer programming problem;
S3, solving the user response delay optimization problem: solving the problem modeled in S2 with an iterative alternating optimization algorithm, and deploying the deep neural network models on the edge server according to the obtained solution.
2. The method for optimizing multi-user deep neural network model segmentation and resource allocation in an edge computing scenario of claim 1, wherein the step S1 specifically includes: abstracting parallel concept layers and concept layers with shortcut connections in the deep neural network model into single logical layers, and thereby abstracting the deep neural network model into a computation graph of sequentially connected logical layers; the deep neural network model deployed on device $i$ consists of $k_i$ logical layers connected in sequence, and an integer variable $s_i$ indicates that the model is split after the $s_i$-th logical layer; the segmentation decision satisfies $s_i \in \{0, 1, 2, \dots, k_i\}$; $w_i^{\mathrm{loc}}(s_i)$ denotes the computation amount of the neural network before the split point, and $w_i^{\mathrm{edge}}(s_i)$ denotes the computation amount of the neural network after the split point.
3. The method for optimizing multi-user deep neural network model segmentation and resource allocation in an edge computing scenario of claim 2, wherein the step S2 specifically includes:
S21, modeling the computing power division of the edge server: the computing capability of a minimum allocatable computing resource unit (MCRU) is denoted $C_{\min}$; $\beta$ denotes the total number of MCRUs on the edge server, and $f_i$ denotes the number of MCRUs allocated to each user $i$; naturally, $\sum_{i \in N} f_i \le \beta$;
S22, modeling the local execution delay of the device:

$T_i^{\mathrm{loc}}(s_i) = \frac{w_i^{\mathrm{loc}}(s_i)}{C_i^{\mathrm{loc}}}$  (1)

where $C_i^{\mathrm{loc}}$ in formula (1) denotes the computing capability of device $i$;
S23, modeling the computation delay of the neural network part offloaded to the edge server:

$T_i^{\mathrm{edge}}(s_i, f_i) = \theta(f_i) \cdot \frac{w_i^{\mathrm{edge}}(s_i)}{\gamma(f_i) \, C_{\min}}$  (2)

where $\theta$ is the unit step function, whose expression is

$\theta(x) = \begin{cases} 1, & x > 0 \\ 0, & x \le 0 \end{cases}$

and $\gamma$ is an approximate mapping fitted from real data, representing the multiple of the computing capability $C_{\min}$ that $f_i$ allocated computing resources actually reach;
S24, modeling the transmission delay of the intermediate result:

$T_i^{\mathrm{up}}(s_i) = \frac{d_i(s_i)}{B_i^{\mathrm{up}}}$  (3)

where $d_i(s_i)$ denotes the size of the intermediate result that user device $i$ needs to transmit at split point $s_i$, and $B_i^{\mathrm{up}}$ denotes the uplink bandwidth of user device $i$;
S25, modeling the transmission delay of returning the final result:

$T_i^{\mathrm{down}} = \frac{d_i^{\mathrm{out}}}{B_i^{\mathrm{down}}}$  (4)

where $d_i^{\mathrm{out}}$ denotes the size of the final result and $B_i^{\mathrm{down}}$ denotes the downlink bandwidth of user device $i$;
S26, deriving and modeling the total inference delay of the segmented deep neural network on a single user device:

combining the per-stage delay models (1), (2), (3), and (4) of steps S22, S23, S24, and S25, the total inference delay of the segmented deep neural network executed by user device $i$ is:

$T_i(s_i, f_i) = T_i^{\mathrm{loc}}(s_i) + T_i^{\mathrm{up}}(s_i) + T_i^{\mathrm{edge}}(s_i, f_i) + T_i^{\mathrm{down}}$  (5)
S27, modeling the global delay minimization over the multi-user devices:

$\min_{(S, F)} \ \max_{i \in N} T_i(s_i, f_i)$  (6)

$\mathrm{s.t.} \ \sum_{i \in N} f_i \le \beta$  (7)

$0 \le s_i \le k_i, \ \forall i \in N$  (8)

$f_i, s_i \in \mathbb{N}, \ \forall i \in N$  (9)

where formula (7) states that the total amount of resources on the edge server is limited; formula (8) states that the segmentation decision cannot exceed the total number of logical layers, and when the edge server allocates no computing resources to user device $i$ ($f_i = 0$), necessarily $s_i = k_i$, i.e., all computation tasks are executed locally; in formula (9), $\mathbb{N}$ denotes the set of natural numbers, so $f_i$ and $s_i$ are both non-negative integers.
4. The method for optimizing the multi-user deep neural network model segmentation and resource allocation under the edge computing scenario of claim 3, wherein the step S3 specifically includes:
S31, generating an initial feasible solution vector $(S, F)$, where $S = \{s_i\}_{i \in N}$ and $F = \{f_i\}_{i \in N}$ represent the neural network segmentation decisions of all user devices and the amounts of computing resources the edge server allocates to each user device $i$;
S32, setting a decreasing coefficient $p$ and an adjustment step length $\tau$: the decreasing coefficient $p$ is a manually set hyperparameter, from which $q = \lfloor \log_p \beta \rfloor$ is computed according to the total computing resource amount $\beta$ of the edge server; the list of candidate adjustment step lengths is $[p^q, p^{q-1}, \dots, p^2, p^1, 1]$;
S33, adjusting the feasible solution vector $(S, F)$ according to each value in the adjustment step-length list in turn;
S34, traversing the adjustment step-length list, and for each value $\tau$:
starting from the current solution $(S, F)$, attempting to transfer $\tau$ resources from other devices to the device with the longest delay, and if a better local solution $(S', F')$ is generated, keeping it and recording it as $(S, F)$;
when the adjustment step length reaches $\tau = 1$, the final solution $(S, F)$ obtained is the global optimum.
5. The method for the multi-user deep neural network model segmentation and resource allocation optimization in the edge computing scenario as claimed in claim 4, wherein p is 2.
6. The method for optimizing the multi-user deep neural network model segmentation and resource allocation under the edge computing scenario of claim 4, wherein the step S33 specifically includes:
S331, based on the current solution $(S, F)$, traversing all user devices $i$ and computing by formula (5) the optimal delay $T_i^{*}(f_i) = \min_{s_i} T_i(s_i, f_i)$ of each user device under its allocation $f_i$; finding the user device $k$ with the longest delay, whose delay is recorded as $T_k^{*}$;
S332, traversing all unmarked user devices with the adjustment step length $\tau$ and computing the optimal delay $T_j^{*}(f_j - \tau)$ that user device $j$ would reach after giving up $\tau$ resources; if $T_j^{*}(f_j - \tau) \ge T_k^{*}$, marking user device $j$; if $T_j^{*}(f_j - \tau) < T_k^{*}$, transferring $\tau$ resources from user device $j$ to $k$ and obtaining a new solution vector $(S, F)$;
S333, repeating step S332 until all user devices are marked.
7. A multi-user deep neural network model segmentation and resource allocation optimization system in an edge computing scenario, characterized by comprising:
a deep neural network model segmentation modeling module, configured to define a logical layer comprising a plurality of deep neural network concept layers and to abstract the deep neural network model into a computational graph model comprising a plurality of consecutive serial tasks, with the logical layer as the minimum segmentation unit;
a resource allocation and deep neural network model segmentation decision modeling module for the edge computing multi-user environment, configured to fit and estimate the computation delay of deep neural network model segmentation with a heuristic method in the multi-user environment, and to model the problem as a nonlinear integer programming problem;
a user response delay optimization problem solving module, configured to solve the problem modeled in the preceding module with an iterative alternating optimization algorithm, and to deploy the deep neural network models on the edge server according to the obtained solution.
8. The system for optimizing multi-user deep neural network model segmentation and resource allocation in an edge computing scenario of claim 7, wherein the deep neural network model segmentation modeling module is specifically configured to: abstract parallel concept layers and concept layers with shortcut connections in the deep neural network model into single logical layers, and thereby abstract the deep neural network model into a computation graph of sequentially connected logical layers; the deep neural network model deployed on device $i$ consists of $k_i$ logical layers connected in sequence, and an integer variable $s_i$ indicates that the model is split after the $s_i$-th logical layer; the segmentation decision satisfies $s_i \in \{0, 1, 2, \dots, k_i\}$; $w_i^{\mathrm{loc}}(s_i)$ denotes the computation amount of the neural network before the split point, and $w_i^{\mathrm{edge}}(s_i)$ denotes the computation amount of the neural network after the split point.
9. The system for optimizing multi-user deep neural network model segmentation and resource allocation in an edge computing scenario of claim 8, wherein the resource allocation and deep neural network model segmentation decision modeling module for the edge computing multi-user environment specifically includes:
an edge server computing power division modeling module, which denotes the computing capability of a minimum allocatable computing resource unit (MCRU) as $C_{\min}$, the total number of MCRUs on the edge server as $\beta$, and the number of MCRUs allocated to each user $i$ as $f_i$; naturally, $\sum_{i \in N} f_i \le \beta$;
a device local execution delay modeling module, configured to model:

$T_i^{\mathrm{loc}}(s_i) = \frac{w_i^{\mathrm{loc}}(s_i)}{C_i^{\mathrm{loc}}}$  (1)

where $C_i^{\mathrm{loc}}$ in formula (1) denotes the computing capability of device $i$;
the device is unloaded to the modeling module of the neural network part calculation time delay on the edge server, and is used for modeling to obtain:
Figure FDA0002879302230000045
where θ is a unit step function, and its expression is:
Figure FDA0002879302230000051
gamma is an approximate map fitted from real data, representing fiThe part of the computing resource actually reaches the computing capacity CminMultiples of (d);
the modeling module of the intermediate result transmission time delay is used for modeling to obtain:
Figure FDA0002879302230000052
wherein the content of the first and second substances,
Figure FDA0002879302230000053
representing user equipmenti at the point of tangency siThe size of the intermediate result that needs to be transmitted,
Figure FDA0002879302230000054
representing the uplink bandwidth of user equipment i;
a modeling module for the transmission delay of returning the final result, configured to model:

$T_i^{\mathrm{down}} = \frac{d_i^{\mathrm{out}}}{B_i^{\mathrm{down}}}$  (4)

where $d_i^{\mathrm{out}}$ denotes the size of the final result and $B_i^{\mathrm{down}}$ denotes the downlink bandwidth of user device $i$;
a module for deriving and modeling the total delay of the segmented deep neural network on a single user device, configured to combine the per-stage delay models (1), (2), (3), and (4) of steps S22, S23, S24, and S25 to obtain the total inference delay of the segmented deep neural network executed by the user device:

$T_i(s_i, f_i) = T_i^{\mathrm{loc}}(s_i) + T_i^{\mathrm{up}}(s_i) + T_i^{\mathrm{edge}}(s_i, f_i) + T_i^{\mathrm{down}}$  (5)
modeling for global latency minimization of a multi-user device for modeling yielding:
Figure FDA0002879302230000058
Figure FDA0002879302230000059
Figure FDA00028793022300000510
Figure FDA00028793022300000511
where equation (7) indicates that the total number of resources of the edge server is limited, equation (8) indicates that the slicing decision must be smaller than the total number of logical layers, and when the edge server does not allocate computing resources to the user equipment i (f)i0), must have si=kiThat is, all computing tasks are performed locally; in formula (9), N represents a natural number set, fiAnd siAre all non-negative integers.
10. The system for optimizing multi-user deep neural network model segmentation and resource allocation under the edge computing scenario of claim 9, wherein the user response delay optimization problem solving module comprises:
generate initial feasible solution vector (S, F) module: wherein
$S = \{s_i\}_{i \in N}$ and $F = \{f_i\}_{i \in N}$ represent the neural network segmentation decisions of all user devices and the amounts of computing resources the edge server allocates to each user device $i$;
a module for setting a decreasing coefficient $p$ and an adjustment step length $\tau$: the decreasing coefficient $p$ is a manually set hyperparameter, from which $q = \lfloor \log_p \beta \rfloor$ is computed according to the total computing resource amount $\beta$ of the edge server; the list of candidate adjustment step lengths is $[p^q, p^{q-1}, \dots, p^2, p^1, 1]$;
the feasible solution vector $(S, F)$ is adjusted according to each value in the adjustment step-length list in turn;
the adjustment step-length list is traversed, and for each value $\tau$: starting from the current solution $(S, F)$, $\tau$ resources of other devices are transferred to the device with the longest delay, and if a better local solution $(S', F')$ is generated, it is kept and recorded as $(S, F)$;
when the adjustment step length reaches $\tau = 1$, the final solution $(S, F)$ obtained is the global optimum.
CN202011638611.0A 2020-12-31 2020-12-31 Multi-user deep neural network model segmentation and resource allocation optimization method in edge computing scene Pending CN112822701A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011638611.0A CN112822701A (en) 2020-12-31 2020-12-31 Multi-user deep neural network model segmentation and resource allocation optimization method in edge computing scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011638611.0A CN112822701A (en) 2020-12-31 2020-12-31 Multi-user deep neural network model segmentation and resource allocation optimization method in edge computing scene

Publications (1)

Publication Number Publication Date
CN112822701A 2021-05-18

Family

ID=75857638

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011638611.0A Pending CN112822701A (en) 2020-12-31 2020-12-31 Multi-user deep neural network model segmentation and resource allocation optimization method in edge computing scene

Country Status (1)

Country Link
CN (1) CN112822701A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111312368A (en) * 2020-01-20 2020-06-19 广西师范大学 Method for accelerating medical image processing speed based on edge calculation
CN113315669A (en) * 2021-07-28 2021-08-27 江苏电力信息技术有限公司 Cloud edge cooperation-based throughput optimization machine learning inference task deployment method
CN113987692A (en) * 2021-12-29 2022-01-28 华东交通大学 Deep neural network partitioning method for unmanned aerial vehicle and edge computing server
CN115277452A (en) * 2022-07-01 2022-11-01 中铁第四勘察设计院集团有限公司 ResNet self-adaptive acceleration calculation method based on edge-end cooperation and application

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110996393A (en) * 2019-12-12 2020-04-10 大连理工大学 Single-edge computing server and multi-user cooperative computing unloading and resource allocation method
CN112148492A (en) * 2020-09-28 2020-12-29 南京大学 Service deployment and resource allocation method considering multi-user mobility

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110996393A (en) * 2019-12-12 2020-04-10 大连理工大学 Single-edge computing server and multi-user cooperative computing unloading and resource allocation method
CN112148492A (en) * 2020-09-28 2020-12-29 南京大学 Service deployment and resource allocation method considering multi-user mobility

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIN TANG et al.: "Joint Multiuser DNN Partitioning and Computational Resource Allocation for Collaborative Edge Intelligence", IEEE Internet of Things Journal *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111312368A (en) * 2020-01-20 2020-06-19 广西师范大学 Method for accelerating medical image processing speed based on edge calculation
CN113315669A (en) * 2021-07-28 2021-08-27 江苏电力信息技术有限公司 Cloud edge cooperation-based throughput optimization machine learning inference task deployment method
CN113987692A (en) * 2021-12-29 2022-01-28 华东交通大学 Deep neural network partitioning method for unmanned aerial vehicle and edge computing server
CN115277452A (en) * 2022-07-01 2022-11-01 中铁第四勘察设计院集团有限公司 ResNet self-adaptive acceleration calculation method based on edge-end cooperation and application
CN115277452B (en) * 2022-07-01 2023-11-28 中铁第四勘察设计院集团有限公司 ResNet self-adaptive acceleration calculation method based on edge-side coordination and application

Similar Documents

Publication Publication Date Title
CN112822701A (en) Multi-user deep neural network model segmentation and resource allocation optimization method in edge computing scene
CN107995660B (en) Joint task scheduling and resource allocation method supporting D2D-edge server unloading
CN111918339B (en) AR task unloading and resource allocation method based on reinforcement learning in mobile edge network
CN109246761B (en) Unloading method based on alternating direction multiplier method considering delay and energy consumption
CN110968426B (en) Edge cloud collaborative k-means clustering model optimization method based on online learning
CN110519370B (en) Edge computing resource allocation method based on facility site selection problem
CN110968366B (en) Task unloading method, device and equipment based on limited MEC resources
CN113615137B (en) CDN optimization platform
CN110162390B (en) Task allocation method and system for fog computing system
CN111813539A (en) Edge computing resource allocation method based on priority and cooperation
Liang et al. A location-aware service deployment algorithm based on k-means for cloudlets
CN113573363A (en) MEC calculation unloading and resource allocation method based on deep reinforcement learning
Li et al. Computation offloading and service allocation in mobile edge computing
CN110167031B (en) Resource allocation method, equipment and storage medium for centralized base station
CN111158893B (en) Task unloading method, system, equipment and medium applied to fog computing network
Ma Edge server placement for service offloading in internet of things
US11811429B2 (en) Variational dropout with smoothness regularization for neural network model compression
CN112686374A (en) Deep neural network model collaborative reasoning method based on adaptive load distribution
CN113515378A (en) Method and device for migration and calculation resource allocation of 5G edge calculation task
CN114745386B (en) Neural network segmentation and unloading method in multi-user edge intelligent scene
Malazi et al. Distributed service placement and workload orchestration in a multi-access edge computing environment
CN110944335B (en) Resource allocation method and device for virtual reality service
CN115499876A (en) Computing unloading strategy based on DQN algorithm under MSDE scene
CN114116052A (en) Edge calculation method and device
CN113709817A (en) Task unloading and resource scheduling method and device under multi-base-station multi-server scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210518)