CN113312183B - Edge calculation method for deep neural network - Google Patents

Info

Publication number
CN113312183B
Authority
CN
China
Prior art keywords
data
edge device
layer
edge
convolution
Prior art date
Legal status
Active
Application number
CN202110870123.0A
Other languages
Chinese (zh)
Other versions
CN113312183A (en)
Inventor
罗喜伶
潘洋洋
王雪檬
董赋然
Current Assignee
Hangzhou Innovation Research Institute of Beihang University
Original Assignee
Hangzhou Innovation Research Institute of Beihang University
Priority date
Filing date
Publication date
Application filed by Hangzhou Innovation Research Institute of Beihang University filed Critical Hangzhou Innovation Research Institute of Beihang University
Priority to CN202110870123.0A priority Critical patent/CN113312183B/en
Publication of CN113312183A publication Critical patent/CN113312183A/en
Application granted granted Critical
Publication of CN113312183B publication Critical patent/CN113312183B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06F9/5072 Grid computing (G Physics; G06 Computing, calculating or counting; G06F Electric digital data processing; G06F9/00 Arrangements for program control; G06F9/06 using stored programs; G06F9/46 Multiprogramming arrangements; G06F9/50 Allocation of resources; G06F9/5061 Partitioning or combining of resources)
    • G06N3/045 Combinations of networks (G06N Computing arrangements based on specific computational models; G06N3/00 Computing arrangements based on biological models; G06N3/02 Neural networks; G06N3/04 Architecture, e.g. interconnection topology)
    • G06N3/08 Learning methods (G06N3/02 Neural networks)

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an edge computing method for deep neural networks, belonging to the fields of edge computing and deep learning. Compared with existing model-compression schemes, and with model-partitioning schemes that do not pad the data near the split points, the invention pads the data near the split points, so that after the feature-map slices offloaded to the edge nodes have been computed through the corresponding layers, splicing the partial results gives exactly the output of the unsplit feature map, and no accuracy is lost.

Description

Edge calculation method for deep neural network
Technical Field
The invention belongs to the fields of edge computing and deep learning, and in particular relates to an edge computing method for deep neural networks.
Background
In recent years, research on image processing, pattern recognition, and other applications of deep neural networks has made breakthrough progress, and deep neural networks are being applied in more and more internet-of-things scenarios. In a drone surveillance scenario, for example, an onboard camera collects image information from the surrounding environment in real time, while technologies such as image recognition or image classification rapidly analyze the collected information. Such data analysis is a delay-sensitive, computation-intensive task with high demands on both latency and computing power. The end devices that run such tasks are usually small terminals with weak computing power, such as drones, mobile phones, and small embedded devices, and cannot finish the tasks within the specified latency requirement. Deploying delay-sensitive, computation-intensive applications (such as image detection) on resource-constrained end devices therefore becomes challenging.
To address this problem, traditional solutions adopt the cloud computing model: the entire neural network is placed on a cloud with nearly unlimited computing resources, and the end devices that collect the data and ultimately receive the results send the source data directly to the cloud for processing. Because of bandwidth limitations, however, such a scheme can incur extremely high communication delay, so the application's processing latency fails to meet the user's requirement, and additional bandwidth costs are incurred as well. Moreover, since the data are concentrated for processing on a cloud server operated by a third party, privacy leakage becomes a concern for data related to a user's personal activities. Device-cloud collaboration schemes, now studied intensively at home and abroad, split the neural network model at the granularity of layers and deploy it across end devices and the cloud; these include Neurosurgeon, described in the ASPLOS paper "Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge", which computes, for each layer of the neural network, the computation delay on the different devices and the transmission delay of the layer's output, then traverses the schemes that take different layers as the split point and searches for the optimal one. Such schemes are still limited by the device-cloud network communication delay. These challenges gave rise to the edge computing model, which provides computing services through network edge devices close to the internet-of-things devices and data sources. Thanks to the higher computing power and large network bandwidth of edge devices, edge computing can reduce the end-to-end delay of applications and improve efficiency, while also reducing the energy consumption of end devices and providing good privacy protection.
Existing research on deep-neural-network inference under the edge computing model focuses on model compression and model partitioning. Model compression can reduce inference delay, but it affects the accuracy of the output. Model partitioning mainly uses two modes: horizontal spatial splitting of the input feature map, and channel splitting. Channel splitting divides the input feature map of each convolutional layer into several sub-channels before the convolution is computed; because the outputs of every convolutional layer must then be fully spliced back together, this brings extra data transmission overhead. Horizontal spatial splitting avoids that problem, but if the input feature map is split without padding the data near the split points, the output is affected and accuracy is lost. Some schemes (patent "A convolutional neural network vertical segmentation method for image processing", CN 112363844 A) split the input feature map of the last convolutional layer in a run of consecutive convolutional layers into consecutive sub-feature-maps and derive the coordinates of the corresponding sub-feature-maps in the first convolutional layer by reverse derivation; but the splitting process does not consider the heterogeneity of the edge devices or the device-edge network bandwidth environment.
Disclosure of Invention
To solve this technical problem, the invention provides an edge computing method for deep neural networks that jointly considers the heterogeneity of the edge devices and the device-edge network environment, and pads the split input feature maps with data, thereby guaranteeing that the output is consistent with that of the unsplit scheme.
In order to achieve the purpose, the invention adopts the following technical scheme:
a deep neural network-oriented edge computing method comprises the following steps:
step 1: acquiring a linear regression equation of the calculation time-floating point operand of a convolution layer, an activation layer and a pooling layer in the available edge equipment;
step 2: the end equipment sends parameter information of each layer of the deep neural network model to be calculated to each edge equipment, each edge equipment reconstructs the neural network model according to the received parameter information, and transmits computing power and inherent overhead parameters in the linear regression equation of the computing time-floating point operand of different layers to the end equipment;
and step 3: acquiring a side-end network environment to obtain a data transmission rate between edge equipment and end equipment;
and 4, step 4: calculating the predicted data transmission duration according to the data transmission rate, and dividing the input characteristic graph by combining the predicted data transmission duration and the predicted calculation duration to obtain an initial data distribution scheme;
and 5: filling data near the dividing points to realize data exchange distributed to adjacent edge equipment to obtain a final data distribution scheme;
step 6: the edge device with the most initial distribution data is used as the strongest computing force edge device, the end device unloads the rest layers except the convolutional layer and the adjacent activation layer in the deep neural network model to be computed and the pooling layer between the first convolutional layer and the last convolutional layer to the strongest computing force edge device, the output results of the rest edge devices are sent to the strongest computing force edge device for data splicing, the splicing result is continuously computed in the rest layers loaded by the strongest computing force edge device, and the final computing result is sent to the end device.
Further, step 1 specifically comprises:
1.1) on each edge device, for each layer type (convolutional, activation, pooling), set different parameters, feed in input feature maps of different sizes, and record the average execution time for each size as the computation time;
1.2) for each convolutional, activation, and pooling layer under each parameter setting, compute the floating-point operations. The FLOPs Fc of a convolutional layer are:
Fc = 2·H·W·(Cin·K^2 + 1)·Cout
The FLOPs Fr of an activation layer are:
Fr = H·W·Cin
The FLOPs Fp of a pooling layer are:
Fp = K^2·Hout·Wout·Cout
where H and W denote the height and width of the feature map, Cin is the number of input channels, Cout the number of output channels, K the size of the convolution kernel (or of the filter in the pooling layer), and Hout and Wout the height and width of the output feature map;
1.3) linearly fit computation time against FLOPs:
y = kx + b
where y is the predicted computation time and x the FLOPs of the layer under the corresponding parameter setting; the slope k represents the computing-power parameter and the intercept b the intrinsic-overhead parameter.
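As an illustrative sketch (not part of the patent), the per-layer FLOPs formulas of step 1.2) and the linear fit of step 1.3) can be written in Python; the function names are assumptions, and the fit uses ordinary least squares via `numpy.polyfit`:

```python
import numpy as np

def conv_flops(H, W, C_in, C_out, K):
    """Convolutional layer: Fc = 2*H*W*(C_in*K^2 + 1)*C_out."""
    return 2 * H * W * (C_in * K**2 + 1) * C_out

def act_flops(H, W, C_in):
    """Activation layer: Fr = H*W*C_in."""
    return H * W * C_in

def pool_flops(K, H_out, W_out, C_out):
    """Pooling layer: Fp = K^2*H_out*W_out*C_out."""
    return K**2 * H_out * W_out * C_out

def fit_compute_time(flops, times):
    """Least-squares fit of computation time y = k*x + b against FLOPs x;
    k is the computing-power parameter, b the intrinsic overhead."""
    k, b = np.polyfit(flops, times, 1)
    return float(k), float(b)
```

Measured (FLOPs, time) pairs for one layer type on one device would be passed to `fit_compute_time` to obtain that device's (k, b) pair.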
Further, to obtain the device-edge network environment, the end device sends feature-map data to each edge device; the edge device replies to the end device immediately upon receiving the data, and the transmission time is recorded, yielding Vi, the per-unit-data transmission time of the device-edge network, which represents the transmission time per unit of data between the i-th edge device and the end device.
Further, the initial data allocation scheme is obtained as follows:
4.1) the end device offloads each convolutional layer of the deep neural network model to be computed together with its adjacent activation layer;
4.2) from the convolution computing power kci and the adjacent-activation-layer computing power kri of each edge device, i = 1, 2, …, n, where n is the number of available edge devices, obtain the predicted computation time of the complete input feature map on each edge device, and from it the per-unit computation time Tsi for unit-length data along the longer side of the input feature map:
Tsi = (Fc·kci + Fr·kri) / L
where Fc is the FLOPs of the convolutional layer, Fr the FLOPs of the activation layer, and L the data length of the longer side of the input feature map;
4.3) compute the joint computation time Tpci of the convolutional layer and activation layer:
Tpci = Tai + Thi
Tai = Tsi · li
Thi = bc + br
where Tpci is the joint computation time of the convolutional layer and adjacent activation layer on the i-th edge device, Tai is the computation time for the data allocated to the i-th edge device, Thi is the intrinsic overhead of the i-th edge device, li is the length along the longer side of the input feature map allocated to the i-th edge device (the li sum to L), and bc and br are the intrinsic overheads of the convolutional layer and the adjacent activation layer, respectively;
4.4) compute the predicted data transmission time Tpti of each edge device:
Tpti = (Vi·D·li) / L
where D is the data size of the input feature map;
4.5) compute the total predicted time of each edge device: Tpi = Tpti + Tpci;
4.6) for the initial allocation, first assume that Tpi is equal across all edge devices and solve the formulas of steps 4.2) to 4.5) for the lengths li of the longer side L initially allocated to each edge device; take the edge device with the largest li as the strongest edge device.
Round all li down to integers and add the difference between L and the sum of the li to the maximum value Lmax, giving the integer lengths li along the longer side L of the data allocated to each edge device in the initial allocation scheme; if some li is smaller than the convolution kernel size, remove that edge device and return to step 4.2) to recompute the initial allocation.
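The equal-time assumption of step 4.6) has a closed form: writing Tpi = ai·li + Thi with ai = Tsi + Vi·D/L, the constraints Tp1 = … = Tpn and l1 + … + ln = L give T = (L + Σ Thi/ai) / Σ(1/ai). A hedged Python sketch of steps 4.2) to 4.6), with illustrative names and without the kernel-size pruning loop, might look like:

```python
def initial_allocation(L, ts, th, vi, D):
    """Initial split of the longer side L across n edge devices under the
    equal-total-time assumption Tp1 = ... = Tpn.

    ts[i]: per-unit computation time Tsi; th[i]: intrinsic overhead Thi;
    vi[i]: per-unit-data transmission time; D: feature-map data size.
    Tpi = (vi*D*li)/L + ts*li + th = a_i*li + th_i with a_i = ts_i + vi_i*D/L.
    """
    a = [t + v * D / L for t, v in zip(ts, vi)]
    # equal Tpi = T and sum(li) = L  =>  T = (L + sum(th_i/a_i)) / sum(1/a_i)
    T = (L + sum(h / x for h, x in zip(th, a))) / sum(1.0 / x for x in a)
    li = [(T - h) / x for h, x in zip(th, a)]
    ints = [int(x) for x in li]               # round every li down
    ints[li.index(max(li))] += L - sum(ints)  # give the remainder to the strongest device
    return ints
```

For two identical devices the side is split evenly; a device with twice the per-unit cost receives roughly half as much data.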
Further, step 5 specifically comprises:
5.1) mark the data portions on which convolution cannot be completed normally:
from the kernel size and stride of the convolutional layer, obtain the position of each convolution window along the longer side L of the unsplit input feature map, and mark the data on which the corresponding convolutions cannot be completed because the split leaves insufficient data. In the initial allocation scheme these data are distributed across the convolutional layers of two adjacent edge devices, denoted the first and second edge devices; record the start position Si and end position Ei of this portion relative to the longer side L, and denote the split point by Segi;
5.2) data padding:
pad into the first edge device the portion from Seg1 to (K + S1) that was allocated to the second edge device, and pad into the second edge device the portion from (S1 + Str) to Seg1 that was allocated to the first edge device;
5.3) compute the predicted computation time and data transmission time of the first and second edge devices under the padding scheme of step 5.2), and obtain the maximum of the total predicted times over the edge devices; here the transmitted data size is the sum of the forward data and the returned data, the returned data being the part of the output that serves as input to the next convolutional layer but on which, under this padding scheme, normal convolution cannot be completed because of the split;
5.4) repeat steps 5.2) to 5.3): the data padded into the first edge device is the previously padded data extended backwards by one stride Str, and the data padded into the second edge device is the previously padded data shortened at its start by one stride Str; continue looping until the data padded into the first edge device is the portion from Seg1 to E1 allocated to the second edge device and the data padded into the second edge device is the portion from (E1 - K) to Seg1 allocated to the first edge device, recording in each iteration the maximum of the total predicted times over the edge devices;
5.5) compare the maxima Tmax of the total predicted times over all schemes and choose the scheme with the smallest Tmax as the padding scheme for the first and second edge devices;
5.6) take the second edge device as the new first edge device and its adjacent third edge device as the new second edge device, and repeat steps 5.2) to 5.5) until the final data padding scheme among all available edge devices is obtained.
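A minimal sketch of the candidate search of steps 5.2) to 5.5) for a single split point, under stated assumptions: `cost1` and `cost2` are caller-supplied functions mapping the amount of padded data to each device's total predicted time (the patent derives these from the regression and bandwidth models), and the exact boundary bookkeeping of the enumeration is an assumption:

```python
def best_fill(seg, S, E, K, Str, cost1, cost2):
    """Enumerate candidate paddings around split point `seg` between two
    adjacent devices and return the one minimizing the slower device's
    total predicted time.

    S, E: start/end of the region whose convolutions straddle the split;
    K: kernel length along the split axis; Str: stride.
    cost1(f)/cost2(f): predicted total time of device 1/2 when f extra
    units of data are padded into it (assumed supplied by the caller)."""
    best = None
    steps = (E - S - K) // Str
    for j in range(steps + 1):
        fill1 = (S + K + j * Str) - seg      # units copied from device 2 into device 1
        fill2 = seg - (S + (j + 1) * Str)    # units copied from device 1 into device 2
        t = max(cost1(fill1), cost2(fill2))
        if best is None or t < best[0]:
            best = (t, fill1, fill2)
    return best
```

With identity cost functions and the toy values seg = 7, S = 3, E = 10, K = 5, Str = 1, the balanced candidate (two units padded into each device) wins, as expected of a min-max search.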
Further, according to the final data padding scheme among all available edge devices, the first convolutional layer of the deep neural network model on the end device and its adjacent activation layer are offloaded; step 5 is then repeated on the returned data, and the next convolutional layer with its adjacent activation layer is offloaded, until all convolutional layers have been offloaded. Pooling layers between the first and last convolutional layers of the model are offloaded with the same data padding scheme as the convolutional and activation layers; the pooling operation of such a pooling layer acts only on local regions of the data, not on the global data.
Compared with existing model-compression schemes, and with model-partitioning schemes that do not pad the data near the split points, the invention pads the data near the split points, so that after the feature-map slices offloaded to the edge nodes have been computed through the corresponding layers, splicing the partial results gives exactly the output of the unsplit feature map, and no accuracy is lost.
Drawings
FIG. 1 is a schematic diagram of the edge device collaborative inference flow of the present invention;
FIG. 2 is an edge device collaborative inference algorithm framework of the present invention;
FIG. 3 is a graph comparing the effects of the embodiments of the present invention;
FIG. 4 is a schematic diagram of data filling near the division point in the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
As shown in fig. 1, the edge computing method for deep neural networks proposed by the invention mainly comprises the following.
First, obtain the functional relation between computation time and floating-point operations (FLOPs) for the convolutional, activation, and pooling layers on each edge device.
In this embodiment, on each edge device, several different hyper-parameter settings are used for each layer type (convolutional, activation, pooling); feature maps of different sizes are fed in and their execution time, i.e. the computation time, is recorded. For the convolutional layer, for example, different kernel sizes, input channel counts, output channel counts, strides, and so on are set, and the hyper-parameters of each setting are recorded. Feature maps of different sizes matching each hyper-parameterized layer are then fed in; for every combination of hyper-parameters and feature map, the neural network layer is executed under that setting, and the average of its computation times is recorded as the final computation time for that setting.
Compute the FLOPs of the corresponding layers under the different settings. For a feature map passed through convolution, activation, and general pooling respectively, the convolutional-layer FLOPs are given by formula (1):
Fc = 2·H·W·(Cin·K^2 + 1)·Cout (1)
the activation-layer FLOPs by formula (2):
Fr = H·W·Cin (2)
and the general pooling-layer FLOPs by formula (3):
Fp = K^2·Hout·Wout·Cout (3)
where H is the height of the feature map, W its width, Cin the number of input channels, Cout the number of output channels, K the size of the convolution kernel (or of the filter in the pooling layer), Fc the convolutional-layer FLOPs, Fr the activation-layer FLOPs, Fp the pooling-layer FLOPs, and Hout and Wout the height and width of the output feature map. The FLOPs of the convolutional, activation, and pooling layers are obtained from these formulas.
Obtain the functional relation between computation time and FLOPs for the different layers (convolutional, activation, pooling). In this embodiment, the computation times and corresponding FLOPs obtained in the steps above, under the different settings of a given layer type, show that the computation time of the convolutional, activation, and pooling layers is linear in the corresponding FLOPs. In this embodiment, "pooling layer" refers to a general pooling layer whose pooling operation acts only on local regions of the data, not on the global data.
Assume the linear regression model between the FLOPs and the computation time of the convolutional, activation, and pooling layers is as in formula (4):
y = kx + b (4)
where y is the computation time of the convolutional, activation, or pooling layer, k represents the computing power of the device, b the intrinsic overhead of the layer's computation, and x the FLOPs of the layer.
Solve the linear regression equations of the convolutional, activation, and pooling layers of each edge device by least squares; with the solved equations, the computation time of a layer can later be predicted from its FLOPs before the layer is actually computed.
Second, the end device sends the information about each layer of the neural network model to be inferred to each edge device, and obtains the per-layer functional relations of step one for each edge device as well as the device-edge network environment.
In this embodiment, the end device sends the parameters, hyper-parameters, and related information of every convolutional, activation, pooling, and fully connected layer of the neural network to be inferred to each edge device. On receiving this information, each edge device returns to the end device the computing power k and intrinsic overhead b of its per-layer computation-time/FLOPs regression equations, and loads the neural network model from the received information in the corresponding order. To probe the device-edge network environment, the end device can send feature-map data of a certain size to each edge device; the edge device replies immediately upon receiving the data, and the transmission time is recorded, yielding Vi, the per-unit-data transmission time of the device-edge network between the i-th edge device and the end device.
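The round-trip probe described above can be sketched as follows (illustrative, not from the patent); `probe_transfer_time` assumes a connected stream socket to the edge device, whose peer acknowledges with one byte as soon as data arrive:

```python
import socket
import time

def probe_transfer_time(sock, probe):
    """Send a probe feature-map buffer to an edge device, wait for its
    immediate one-byte acknowledgement, and return the elapsed time per
    byte of probe data (the per-unit-data transmission time Vi)."""
    t0 = time.monotonic()
    sock.sendall(probe)
    sock.recv(1)  # the edge device replies immediately on receipt
    return (time.monotonic() - t0) / len(probe)
```

In practice the probe would be repeated and averaged, since a single round trip is noisy.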
Third, partition the input feature map according to the data transmission time and the predicted computation time to obtain an initial data allocation scheme.
In this embodiment, each convolutional layer of the neural network is offloaded together with its adjacent activation layer, and the convolutional-layer input feature map is split along the longer of its width and height; the data length of this longer side is L.
From the convolution computing power kci of each edge device and the computing power kri of the adjacent activation layer, i = 1, 2, …, n, where n is the number of available edge devices, the predicted computation time of the complete input feature map on that edge device is obtained, and from it the per-unit computation time Tsi corresponding to unit-length data along the longer side of the input feature map, as in formula (5):
Tsi = (Fc·kci + Fr·kri) / L (5)
where Fc and Fr are the FLOPs of the corresponding layers on the complete input feature map.
The predicted joint inference time Tpci of the convolutional layer and activation layer is the sum of the computation time Tai for the allocated data and the intrinsic overhead Thi of the corresponding layers. Tai is the product of the per-unit computation time Tsi and the length li of the allocated data along the longer side of the input feature map, formula (6):
Tai = Tsi · li (6)
The intrinsic overhead Thi of the corresponding layers is computed as in (7):
Thi = bc + br (7)
so the joint predicted time Tpci of the convolutional and activation layers is given by (8):
Tpci = Tai + Thi (8)
where bc and br are the intrinsic overheads of the convolutional-layer and activation-layer computations, respectively. In the initial allocation stage, the predicted transmission time Tpti is approximated by the data transmission time, formula (9):
Tpti=(Vi*D*li)/L (9)
wherein D is the data size corresponding to the convolutional layer input characteristic diagram.
The total predicted processing time Tpi of an edge device is the sum of the data transmission time Tpti and the joint predicted inference time Tpci of the convolutional and activation layers, as in formula (10):
Tpi = Tpti + Tpci (10)
due to the serial execution characteristic of the neural network, and the execution output of the previous layer is the execution input of the next layer, the time for each edge device to complete the same layer of inference is close as possible, that is, the Tpi of each edge device is approximately equal.
To solve the initial allocation scheme, we assume that the edge devices Tpi are equal, i.e. equation (11):
Tp1=Tp2=…=Tpn (11)
the sum of the lengths initially allocated by each edge device is the data length of the longer side of the convolutional layer input feature graph, which is L, namely formula (12):
Figure 4659DEST_PATH_IMAGE003
(12)
according to the formula, the length li of the longer side L corresponding to the initial distribution data of each edge device can be obtained, and the maximum value L in li is obtainedmaxWhich represents the data length of the most computationally intensive edge device among the edge devices compared to L. Since li obtained at this time is in the form of a decimal number, it needs to be converted into an integer to satisfy the practical significance. Rounding down all li, at which point li no longer satisfies equation (12), and adding the difference between L and the sum of li to a maximum value LmaxIn the above, the integer length li of the data allocated to each edge device in the initial allocation scheme corresponding to the longer side L of the input feature map can be obtained. If the li length is smaller than the convolution kernel size, the edge device is removed and then the allocation is carried out again.
Fourth, pad and exchange the data near the split points.
To guarantee that the output after splitting and offloading the data is identical to the output without offloading, the data of the split portion must be processed additionally.
Referring to the algorithm block diagram shown in fig. 2, the present embodiment includes the following steps:
step 4.1: the position of each convolution along the longer side of the undivided input feature map is obtained according to the corresponding parameters of the convolution kernel size, the step size and the like of the convolution layer (FIG. 4 is the input feature map of one convolution layer, wherein the convolution kernel size is 5, the step size is 1, and 1-13 are the positions of each convolution along the direction). Due to the segmentation of the feature map, the convolution operation related to the position of the segmentation point cannot be completed due to insufficient data (7-10 in the figure), and the rest convolution operation is sufficient in position data and can normally complete convolution (1-6 and 11-13 in the figure). Marking the data which can not complete the partial convolution due to insufficient data after division on the input characteristic diagram which is not divided, and recording the starting position S of the data relative to the longer side L of the characteristic diagramiAnd a termination position Ei(S1 and E1 in the figure, respectively), a pairThe partial data is additionally processed, and the consistency of results is guaranteed.
Step 4.2: assume this part of the convolution kernel has length K relative to the longer side of the feature map and that the convolution step size is Str. The data from Seg1 to (K + S1) that was allocated to the second edge device is filled into the first edge device, and the data from (S1 + Str) to Seg1 that was allocated to the first edge device is filled into the second edge device, realizing the data filling. Under this filling scheme, the predicted calculation durations Tpc1 and Tpc2 and the predicted data transmission durations Tpt1 and Tpt2 of the two edge devices are calculated, and the maximum value Tmax of the total predicted durations of all edge devices under the scheme is obtained. The size of the transmitted data is the sum of the sent data and the returned data; the returned data is the output data under the filling scheme that serves as input to the next convolutional layer but cannot complete its corresponding partial convolutions because of the data division. It can be obtained by executing step 4.1, the difference being that this data is marked according to the parameters of the next convolutional layer.
Step 4.3: step 4.2 is executed again, where the data filled into the first edge device is the data previously filled into it extended backwards by one step length Str, and the data filled into the second edge device is the data previously filled into it shortened from its start position by one step length Str. That is, the data from Seg1 to (K + S1 + Str) allocated to the second edge device is filled into the first edge device, and the data from (S1 + 2Str) to Seg1 allocated to the first edge device is filled into the second edge device. Step 4.2 is repeated until the data filled into the first edge device is the data from Seg1 to E1 in the second edge device, and the data filled into the second edge device is the data from (E1 - K) to Seg1 in the first edge device.
All the schemes are compared and the one with the smallest Tmax is selected as the final data filling scheme between the first and second edge devices. After this filling scheme is obtained, the second edge device is compared with the third in the same way: with the second edge device in the role of the first and the third in the role of the second, steps 4.2 and 4.3 are repeated to obtain the final data filling scheme between the second and third edge devices, and so on until the final data filling scheme among all available edge devices is obtained.
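The scheme selection of steps 4.2-4.3 amounts to a one-dimensional search over fill boundaries; a minimal sketch, where `predict_times` is an assumed stand-in for the patent's Tpc + Tpt prediction of the two adjacent devices and is not part of the source:

```python
def best_fill_scheme(boundaries, predict_times):
    """predict_times(b) -> (T1, T2): predicted total durations (compute +
    transfer) of the two adjacent devices with the fill boundary at b.
    Tmax of a scheme is the slower of the two; keep the boundary whose
    Tmax is smallest, as in the patent's comparison."""
    return min(boundaries, key=lambda b: max(predict_times(b)))

# Toy model: moving the boundary right loads device 1 and relieves device 2.
times = lambda b: (2.0 + 0.5 * b, 4.0 - 0.5 * b)
chosen = best_fill_scheme(range(5), times)
```

In this toy model the worst-case duration is minimized where the two devices balance, at boundary 2.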
Step 4.4: after the first convolutional layer and activation layer of the neural network are unloaded according to the final data filling scheme, steps 4.2 and 4.3 are repeated on the returned data to exchange data, and the next convolutional layer and activation layer are unloaded, until the calculation unloading of all convolutional layers is finished. A pooling layer lying between the first and last convolutional layers adopts the same data filling scheme as the convolutional and activation layers.
Fifthly, the remaining layers of the neural network are calculated and unloaded.
After the last convolutional layer and activation layer are unloaded, the output results of all edge devices are sent to the edge device with the largest initial allocation li from step three for data splicing; the spliced result is fed into the sub-neural network formed by the remaining layers for inference, and the final inference result is sent back to the end device.
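The splicing in step five is a concatenation of the per-device output slices along the divided (longer-side) axis before the remaining layers run; a minimal sketch with NumPy (the embodiment uses PyTorch, where `torch.cat` plays the same role), with illustrative shapes:

```python
import numpy as np

# Per-device outputs of the last convolutional/activation layer; the last
# axis is the divided longer side, slice lengths 3/2/1 are illustrative.
slices = [np.zeros((256, 6, l)) for l in (3, 2, 1)]
full = np.concatenate(slices, axis=2)   # reassembled 256 x 6 x 6 feature map
```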
Sixthly, execution and network-environment sensing.
After receiving an input picture for the first time, the end device executes step three to obtain the initial data allocation scheme based on the edge-end network environment from step two, executes step four to obtain the final data filling schemes near all division points, records the data transmission rate and all data division and filling schemes obtained when the convolutional and activation layers are first unloaded, and completes step five.
When the end device subsequently receives pictures, the corresponding layer calculations are unloaded directly according to the data division and filling schemes recorded in this process.
After each batch of pictures is inferred, it is judged whether the fluctuation between the currently recorded data transmission rate and the rate recorded before allocation exceeds a certain threshold (e.g. 30%); if so, the data allocation of steps two to five is performed again and the data filling schemes near the division points are re-acquired. Throughout the accelerated inference of the edge computing model, each device uses a deep learning framework for neural network inference calculation and a remote procedure call tool for communication and data exchange.
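The re-planning trigger can be sketched as a relative-change check against the recorded rate; the function name is illustrative and the 30% threshold is the one named in the embodiment:

```python
def needs_reallocation(recorded_rate, current_rate, threshold=0.30):
    """True when the relative change of the transmission rate exceeds the
    threshold, in which case the allocation and fill schemes of steps two
    to five are recomputed; otherwise the recorded schemes are reused."""
    return abs(current_rate - recorded_rate) / recorded_rate > threshold

replan = needs_reallocation(10.0, 6.5)   # 35% drop exceeds the 30% threshold
```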
Compared with cloud computing, the method and the device make full use of computing resources of the edge device, save cost of communication with the cloud and computing cost, and ensure privacy of data.
Compared with model compression methods and with model partitioning that does not fill data near the division points, the disclosed method fills the data near the division points, so that after the feature maps unloaded to each edge device are computed at the corresponding layers, the spliced result is identical to that of the unsplit feature map, ensuring that the disclosed inference method incurs no loss of precision.
In order to verify the implementation effect of the invention, which fully considers the heterogeneous computing resources of the edge devices and the dynamic edge-end network environment, one specific implementation uses a Raspberry Pi as the end device, and a MacBook Pro, a notebook computer (Lenovo) and a Mac mini as the edge devices. PyTorch is used as the deep learning inference framework. A trained classical neural network model, AlexNet, is used as the target neural network to be accelerated. Three methods are compared:
Method 1: inference using only the end device.
Method 2: multi-device collaborative inference with equal-proportion division of the input load, without considering the edge-end network environment.
Method 3: the method of the invention.
Fig. 3 shows the experimental results of classifying pictures multiple times with the AlexNet neural network using the above methods in the embodiment of the invention. As can be seen from Fig. 3, method 1 has the longest inference delay because no edge device is used to accelerate inference, while the method of the invention achieves the best inference acceleration effect.
The foregoing lists merely illustrate specific embodiments of the invention. It is obvious that the invention is not limited to the above embodiments, but that many variations are possible. All modifications which can be derived or suggested by a person skilled in the art from the disclosure of the present invention are to be considered within the scope of the invention.

Claims (8)

1. An edge calculation method facing a deep neural network is characterized by comprising the following steps:
step 1: acquiring a linear regression equation of the calculation time-floating point operand of a convolution layer, an activation layer and a pooling layer in the available edge equipment; the step 1 specifically comprises the following steps:
1.1) in different edge devices, for each layer type among convolutional, activation and pooling layers, setting different parameters and, for input feature maps of different sizes, recording the average execution time as the calculation time;
1.2) respectively calculating the floating point operands for each convolutional layer, activation layer and pooling layer under different parameter settings, wherein the floating point operand Fc of the convolutional layer is calculated as:
Fc = 2·H·W·(Cin·K² + 1)·Cout
the floating point operand Fr of the active layer is calculated by the formula:
Fr = H·W·Cin
the floating point operand Fp calculation formula of the pooling layer is as follows:
Fp = K²·Hout·Wout·Cout
wherein H and W denote the height and width of the feature map, Cin is the number of input channels, Cout is the number of output channels, K is the size of the convolution kernel in the convolutional layer or of the filter in the pooling layer, Hout is the height of the output feature map, and Wout is the width of the output feature map;
1.3) Linear fitting of computation time and floating point operands:
y=kx+b
y is used for representing the predicted calculation time, and x is a floating point operand of each layer under the corresponding parameter setting; k is a slope and is used for representing a calculation force parameter; b is an intercept, which is used for representing an inherent overhead parameter;
step 2: the end equipment sends parameter information of each layer of the deep neural network model to be calculated to each edge equipment, each edge equipment reconstructs the neural network model according to the received parameter information, and transmits computing power and inherent overhead parameters in the linear regression equation of the computing time-floating point operand of different layers to the end equipment;
and step 3: acquiring a side-end network environment to obtain a data transmission rate between edge equipment and end equipment;
and 4, step 4: calculating the predicted data transmission duration according to the data transmission rate, and dividing the input characteristic graph by combining the predicted data transmission duration and the predicted calculation duration to obtain an initial data distribution scheme;
and 5: filling data near the dividing points to realize data exchange distributed to adjacent edge equipment to obtain a final data distribution scheme; the step 5 specifically comprises the following steps:
5.1) marking data parts that cannot be normally convolved:
acquiring the position of each convolution along the longer side of the input feature map according to the convolution kernel size and step size of the convolutional layer, and marking the data whose corresponding partial convolutions cannot be completed after division due to insufficient data; because this data is allocated to the convolutional layers of two adjacent edge devices in the initial data allocation scheme, the two devices are denoted the first edge device and the second edge device; recording the start position Si and end position Ei of this data relative to the longer side of the input feature map, and denoting the division point as Segi;
5.2) data filling:
filling the data from Seg1 to (K + S1) that is allocated to the second edge device into the first edge device, and filling the data from (S1 + Str) to Seg1 that is allocated to the first edge device into the second edge device, realizing the data filling; K is the size of the convolution kernel in the convolutional layer or of the filter in the pooling layer, and Str is the convolution step size;
5.3) calculating the predicted calculation durations and data transmission durations of the first and second edge devices under the filling scheme of step 5.2) to obtain the maximum value of the total predicted durations of the edge devices, wherein the size of the transmitted data is the sum of the sent data and the returned data, and the returned data is the part of the next convolutional layer's input that, under the filling scheme, cannot complete normal convolution because of the data division;
5.4) repeating steps 5.2) to 5.3), where the data filled into the first edge device is the data previously filled into it extended backwards by one step length Str, and the data filled into the second edge device is the data previously filled into it shortened from its start position by one step length Str; cycling continuously until the data filled into the first edge device is the data from Seg1 to E1 allocated to the second edge device and the data filled into the second edge device is the data from (E1 - K) to Seg1 allocated to the first edge device, obtaining for each cycle the maximum value of the total predicted durations of the edge devices;
5.5) comparing the maximum values Tmax of the total predicted durations of the edge devices under all schemes, and choosing the scheme with the smallest Tmax as the filling scheme between the first and second edge devices;
5.6) taking the second edge device as a first edge device, taking a third edge device adjacent to the second edge device as a second edge device, and repeating the steps 5.2) to 5.5) until a final data filling scheme among all available edge devices is obtained;
step 6: the edge device with the most initial distribution data is used as the strongest computing force edge device, the end device unloads the rest layers except the convolutional layer and the adjacent activation layer in the deep neural network model to be computed and the pooling layer between the first convolutional layer and the last convolutional layer to the strongest computing force edge device, the output results of the rest edge devices are sent to the strongest computing force edge device for data splicing, the splicing result is continuously computed in the rest layers loaded by the strongest computing force edge device, and the final computing result is sent to the end device.
2. The edge computing method for the deep neural network as claimed in claim 1, wherein, to acquire the edge-end network environment, the end device sends feature map data to each edge device, the edge device sends a response to the end device upon receiving the data, and the data transmission time is recorded to obtain the data transmission rate Vi of the edge-end network, which represents the data transmission rate between the i-th edge device and the end device.
3. The method of claim 2, wherein the initial data distribution scheme is as follows:
4.1) calculating and simultaneously unloading the convolution layer and the adjacent activation layer in the deep neural network model to be calculated by the end equipment;
4.2) according to the convolution computing power kci of each edge device and the adjacent activation layer computing power kri, i = 1, 2, …, n, where n is the number of available edge devices, obtaining the predicted calculation duration of the complete input feature map on each edge device, and thus the unit calculation duration Tsi corresponding to unit-length data along the longer side of the input feature map, calculated as:
Tsi=(Fc*kci+Fr*kri)/L
wherein, Fc is the floating point operand of the convolution layer, Fr is the floating point operand of the activation layer, and L is the data length of the longer side of the input characteristic diagram;
4.3) calculating the joint calculation time length Tpci of the convolution layer and the activation layer:
Tpci=Tai+Thi
Tai=Tsi*li
Thi=bc+br
wherein Tpci is the joint calculation duration of the convolutional layer and adjacent activation layer of the i-th edge device, Tai is the calculation duration of the data allocated to the i-th edge device, Thi is the inherent overhead of the i-th edge device, li is the length along the longer side of the input feature map allocated to the i-th edge device, and bc and br are the inherent overheads of the convolutional layer and the adjacent activation layer, respectively;
4.4) calculating the data transmission time length Tpti of each edge device:
Tpti = (D·li)/(L·Vi)
wherein D is the data size corresponding to the input characteristic diagram;
4.5) calculating the total predicted time length Tpi = Tpti + Tpci of each edge device;
4.6) in the initial allocation scheme, first assuming that the Tpi of all edge devices are equal, obtaining the length li along the longer side of the input feature map for the initial allocation of each edge device according to the foregoing formulas, and taking the edge device corresponding to the maximum li as the strongest computing force edge device;
rounding down all li values and adding the difference between L and the sum of the li to the maximum value lmax among the li, obtaining the integer length li along the longer side of the input feature map of the data allocated to each edge device in the initial allocation scheme; if any li is smaller than the convolution kernel size, that edge device is removed and step 4.2) is returned to in order to recalculate the initial allocation scheme.
4. The method according to claim 1, wherein the first convolutional layer and the adjacent activation layer of the deep neural network model to be computed in the end device are unloaded according to the final data filling scheme among all available edge devices, and step 5 is repeated according to the returned data to unload the next convolutional layer and its adjacent activation layer, until all convolutional layers are unloaded.
5. The edge computing method facing the deep neural network as claimed in claim 4, wherein for pooled layers between the first convolutional layer and the last convolutional layer of the deep neural network model to be computed in the end device, the same data filling scheme as the convolutional layers and the activation layer is adopted to realize unloading of the pooled layers; the pooling operation of the pooling layer is only for partial region data and not global data.
6. The method of claim 5, wherein after all convolutional layers and adjacent active layers, and pooled layers from the first convolutional layer to the last convolutional layer are completed, the remaining layers are unloaded to the strongest computational edge device.
7. The edge calculation method for the deep neural network as claimed in claim 1, wherein after processing a batch of pictures each time, it is determined whether the recorded data transmission rate and the fluctuation value of the data transmission rate recorded before distribution exceed a threshold, and if the fluctuation exceeds the threshold, the data distribution in steps 3 to 6 and the data filling scheme near the segmentation point are performed again.
8. The edge computing method for the deep neural network of claim 1, wherein a remote calling tool is used for communication and data transmission between each edge device and the end device.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110870123.0A CN113312183B (en) 2021-07-30 2021-07-30 Edge calculation method for deep neural network

Publications (2)

Publication Number Publication Date
CN113312183A CN113312183A (en) 2021-08-27
CN113312183B true CN113312183B (en) 2021-12-21

Family

ID=77382187

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110870123.0A Active CN113312183B (en) 2021-07-30 2021-07-30 Edge calculation method for deep neural network

Country Status (1)

Country Link
CN (1) CN113312183B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113918354B (en) * 2021-12-14 2022-03-18 常州艾肯智造科技有限公司 Digital steam leakage monitoring data transmission system for edge calculation
CN114401063B (en) * 2022-01-10 2023-10-31 中国人民解放军国防科技大学 Edge equipment cooperative spectrum intelligent monitoring method and system based on lightweight model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200272899A1 (en) * 2019-02-22 2020-08-27 Ubotica Technologies Limited Systems and Methods for Deploying and Updating Neural Networks at the Edge of a Network
CN112214261A (en) * 2020-10-30 2021-01-12 内蒙古工业大学 Three-layer structure DNN calculation unloading method facing edge intelligence
CN112363844A (en) * 2021-01-12 2021-02-12 之江实验室 Convolutional neural network vertical segmentation method for image processing
CN112506667A (en) * 2020-12-22 2021-03-16 北京航空航天大学杭州创新研究院 Deep neural network training method based on multi-task optimization

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA3032188A1 (en) * 2018-01-31 2019-07-31 Pin-Han Ho Deep convolutional neural network architecture and system and method for building the deep convolutional neural network architecture
CN111445026B (en) * 2020-03-16 2023-08-22 东南大学 Edge intelligent application-oriented deep neural network multipath reasoning acceleration method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Rolling bearing fault diagnosis based on multi-sensor signals and convolutional neural network; Zhu Danchen; Journal of Vibration and Shock; 2020-02-28; Vol. 39, No. 4; full text *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant