CN111459670A - Method for performing cooperative processing at different levels of edge computing - Google Patents

Method for performing cooperative processing at different levels of edge computing

Info

Publication number
CN111459670A
CN111459670A (application CN202010234547.3A)
Authority
CN
China
Prior art keywords
service
task
calculation
service center
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010234547.3A
Other languages
Chinese (zh)
Inventor
Li Xinming (李新明)
Liu Bin (刘斌)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Edge Intelligence Of Cas Co ltd
Original Assignee
Edge Intelligence Of Cas Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Edge Intelligence Of Cas Co ltd
Priority to CN202010234547.3A
Publication of CN111459670A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5083 Techniques for rebalancing the load in a distributed system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The invention relates to a method for performing cooperative processing at different levels of edge computing, comprising the following steps: (a) based on a k-means clustering algorithm, aggregating the task components whose delay sensitivity and computation amount are most similar; (b) predicting the available service time of the service center from historical data, sensing dynamically changing communication factors in real time, and evaluating whether a computation-offloading task can complete processing and feedback within its delay tolerance, so as to predict and partition computing resources; (c) classifying the service nodes to be coordinated and assigning them priorities; (d) judging whether computing resources are sufficient: when the service center's computing resources are scarce, determining the cooperating service nodes in priority order; when the service center's computing resources are sufficient, performing cooperative processing across multiple service nodes. The method reduces the computation load of a single task and thereby optimizes the overall performance and cost of computation-task processing.

Description

Method for performing cooperative processing at different levels of edge computing
Technical Field
The invention belongs to the technical field of cooperative processing of computing tasks. It relates to a method for performing cooperative processing at different levels of edge computing, and in particular to a method that integrates multiple heterogeneous resources across the different levels of edge computing (service nodes, service node clusters, service centers, and the like) for cooperative processing.
Background
When a mobile terminal device needs a computation result, processing the task locally on the terminal incurs the lowest delay and thus guarantees that the result is obtained in real time. However, constrained by device size and weight, battery-powered mobile terminals often suffer from limited power supply and insufficient computing capability. To address these problems, mobile cloud computing, based on the idea of terminal-cloud fusion, was proposed: its basic idea is to offload compute-intensive task components from the terminal to a remote cloud platform. Mobile cloud computing can effectively combine the advantages of mobile devices and cloud platforms and adjust load distribution in real time, optimizing computing performance and device energy efficiency. Given the network communication delay between terminal and cloud, a terminal-cloud cooperative architecture suits applications with some tolerance for delay, but it cannot meet the pressing low-delay requirements of real-time interactive applications.
To compensate for the poor real-time performance of remote cloud processing, edge computing, built on the idea of processing computation tasks at the network edge close to the terminal, has emerged. The University of Miami studied cross-edge load balancing driven by renewable-energy availability and proposed an optimal load-scheduling algorithm based on machine learning. Imperial College London studied how to jointly optimize service placement and load scheduling across multiple edges and proposed an online algorithm that balances cost against quality of service. The University of Massachusetts Amherst proposed a fast and transparent cross-edge virtual machine migration algorithm that supports load scheduling and coordination across edges. Notably, although cross-edge resource cooperation helps exploit the energy-efficiency and delay heterogeneity among nodes and overcomes the limited resource capacity of a single node, in a real production environment, terminals in geographic proximity may belong to different network operators, and the profit-seeking nature of operators means that scarce edge computing resources will be shared with other operators only when sufficient economic incentives are provided. How to design an efficient economic incentive mechanism for cross-edge resource collaboration remains an open problem, and it is a key point this project attempts to break through.
Moreover, in a maneuver combat environment, the computation tasks of data processing are highly complex and delay-sensitive, while the computing capability of a service center or service node is limited, so neither can independently complete the processing and feedback of battlefield information quickly; communication bandwidth, meanwhile, is limited and unstable, making cooperation difficult.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a method for performing cooperative processing at different levels of edge computing, so that a complex task with a large workload is decomposed into a set of simple, fine-grained task components with small workloads. This reduces the computation load of any single task and thereby optimizes the overall performance and cost of computation-task processing.
In order to achieve the above purpose, the technical solution adopted by the invention is as follows: a method for performing cooperative processing at different levels of edge computing, comprising the following steps:
(a) based on a k-means clustering algorithm, aggregating the task components whose delay sensitivity and computation amount are most similar;
(b) predicting the available service time of the service center from historical data, sensing dynamically changing communication factors in real time, and evaluating whether a computation-offloading task can complete processing and feedback within its delay tolerance, so as to predict and partition computing resources;
(c) classifying the service nodes to be coordinated and assigning them priorities;
(d) judging whether computing resources are sufficient: when the service center's computing resources are scarce, determining the cooperating service nodes in priority order; when the service center's computing resources are sufficient, performing cooperative processing across multiple service nodes.
Preferably, in step (b), the service center also maps the service node group onto two dimensions, a network plane and a resource plane, by sensing network connectivity and idle-computing-resource information among the service nodes within its coverage.
Further, in step (c), on the network plane, a communication link graph capable of supporting computation offloading among service nodes is established from the direct communication relationships between them; on the resource plane, node weights are introduced to characterize the amount of idle surplus computing resources of each service node in the communication link graph; and by integrating the network plane and the resource plane, a service-node resource distribution map oriented to cooperative-node computation offloading is formed for classification.
Further, in step (b), the computation task components are dynamically adjusted according to the sensed network environment, realizing computation-adaptive task offloading based on task characteristics.
Further, in step (b), the adaptive task is a deep learning inference model task.
Further, in step (b), the deep learning inference model comprises:
(b1) an offline training stage: the branch network required by a task is trained, and regression models are trained for the different neural network layers in the branch network to estimate their running delay on the service center and on the service node;
(b2) an online optimization stage: the regression models are used to find an exit point and a model partition point that satisfy the task's delay requirement;
(b3) a collaborative inference stage: the service center and the service node device run the deep learning model according to the obtained scheme.
Owing to the above technical solution, the invention has the following advantage over the prior art: by aggregating task components with a k-means clustering algorithm, predicting and partitioning computing resources, and assigning priorities, the method for performing cooperative processing at different levels of edge computing reduces the computation load of a single task and thereby optimizes the overall performance and cost of computation-task processing.
Drawings
FIG. 1 is a diagram of cooperative processing between a service center and service nodes according to the present invention;
FIG. 2 is a diagram of multi-service-node ad hoc cluster cooperation construction according to the present invention;
FIG. 3 shows service node cluster cooperation construction based on community detection and aggregation according to the present invention;
FIG. 4 illustrates the multiple levels and different exit points of the deep learning branch network according to the present invention;
FIG. 5 is a framework diagram of deep-learning-model runtime optimization based on cooperation between a service center and service nodes.
Detailed Description
Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Cooperation between the service center and service nodes mainly arises in two scenarios. First, the computing capability of each service node is limited and largely occupied by its own computation tasks, so the capability a node can contribute to the node cluster as a whole is still relatively limited; the cluster's overall task load may then exceed the cluster's overall computing capability, so cooperation between the node cluster and a service center must be considered, offloading the excess computation tasks to a tactical service center with relatively abundant resources. Second, for tasks that demand high computing power and are difficult to split further (or split only coarsely), a single service node cannot meet the expected computing-capability, delay, and energy-consumption targets; in this case it is preferable to cooperate directly with the service center. Accordingly, a method for performing cooperative processing at different levels of edge computing is proposed, as shown in FIG. 1, comprising the following steps:
(a) Based on a k-means clustering algorithm, aggregate the task components whose delay sensitivity and computation amount are most similar (a minimal clustering sketch follows step (d) below).
(b) Predict the available service time of the service center from historical data, sense dynamically changing communication factors in real time, and evaluate whether a computation-offloading task can complete processing and feedback within its delay tolerance, so as to predict and partition computing resources. The service center also maps the service node group onto two dimensions, a network plane and a resource plane, by sensing network connectivity and idle-computing-resource information among the service nodes within its coverage.
(c) Classify the service nodes to be coordinated and assign them priorities. To this end, on the network plane, a communication link graph capable of supporting service-node computation offloading is established from the direct communication relationships between service nodes; on the resource plane, node weights are introduced to characterize each service node's idle surplus computing resources in the communication link graph; integrating the network plane and the resource plane yields a service-node resource distribution map, oriented to cooperative-node computation offloading, that is used for classification.
(d) Judge whether computing resources are sufficient. When the service center's computing resources are scarce, determine the cooperating service nodes in priority order; when they are sufficient, perform cooperative processing across multiple service nodes.
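The patent specifies only that k-means drives the aggregation of step (a). Below is a minimal Python sketch of that step under stated assumptions: each task component is described by two features, delay sensitivity and computation amount; the feature definitions, the scaling, and the choice of k are illustrative, not part of the patent.

```python
# Sketch of step (a): group task components by (delay sensitivity,
# computation amount) with k-means so that components with the most
# similar characteristics are aggregated and scheduled together.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Each row: [delay_sensitivity (e.g. 1/deadline_s), computation_amount (Mcycles)]
components = np.array([[10.0, 5.0], [9.5, 6.0], [1.0, 80.0],
                       [1.2, 75.0], [5.0, 30.0], [5.5, 28.0]])

features = StandardScaler().fit_transform(components)  # put features on one scale
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
print(labels)  # e.g. [0 0 1 1 2 2]: each cluster is then handled as one unit
```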
In this embodiment, in step (b), the computation task components are further dynamically adjusted according to the perceived network environment, realizing computation-adaptive task offloading based on task characteristics. The adaptive task is a deep learning inference model task, and the deep learning inference model comprises: (b1) an offline training stage, in which the branch network required by a task is trained and regression models are trained for the different neural network layers in the branch network, so as to estimate their running delay on the service center and on the service node; (b2) an online optimization stage, in which the regression models are used to find an exit point and a model partition point that satisfy the task's delay requirement; (b3) a collaborative inference stage, in which the service center and the service node device run the deep learning model according to the obtained scheme.
In this embodiment, when the tactical service center's resources are scarce, the cooperation requests of higher-priority nodes are satisfied first, so that delay-sensitive tasks with small computation demands execute preferentially and the queuing delay of subsequent tasks is effectively reduced. The dynamic cooperation problem between the service center and the service nodes can therefore be mathematically modeled as a non-convex 0-1 integer programming problem, which is solved with dynamic programming and genetic algorithms.
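The patent states only the problem class (non-convex 0-1 integer programming solved with dynamic programming and genetic algorithms) without a concrete formulation. As a hedged illustration, the dynamic-programming half can be sketched as a knapsack-style selection of which cooperation requests a resource-scarce center admits; the cost/value model below is an assumption.

```python
# Knapsack-style DP sketch of the 0-1 offloading decision: admit the subset
# of requests that maximizes total priority value within the center's spare
# capacity. Costs and priority values are illustrative, not the patent's model.
from typing import List, Tuple

def select_offload_tasks(tasks: List[Tuple[int, float]], capacity: int) -> List[int]:
    """tasks: (cpu_cost, priority_value) per request; capacity: spare CPU units.
    Returns indices of the admitted requests."""
    n = len(tasks)
    dp = [[0.0] * (capacity + 1) for _ in range(n + 1)]  # dp[i][c]: best value, first i tasks
    for i, (cost, value) in enumerate(tasks, start=1):
        for c in range(capacity + 1):
            dp[i][c] = dp[i - 1][c]                      # skip task i-1
            if c >= cost and dp[i - 1][c - cost] + value > dp[i][c]:
                dp[i][c] = dp[i - 1][c - cost] + value   # admit task i-1
    selected, c = [], capacity                           # backtrack the choices
    for i in range(n, 0, -1):
        if dp[i][c] != dp[i - 1][c]:
            selected.append(i - 1)
            c -= tasks[i - 1][0]
    return selected[::-1]

print(select_offload_tasks([(3, 5.0), (4, 6.0), (2, 3.0)], capacity=6))  # -> [1, 2]
```

The genetic-algorithm side named in the text would search the same 0-1 decision space when the instance is too large for exact methods.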
Cooperation within a service node cluster is mainly oriented to the situation in which multiple service nodes offload complex data processing tasks to the same service center simultaneously and, in a maneuver combat environment, the service center's computing capability cannot satisfy all offloading demands at once. In addition, service nodes move at any time as required, their communication with the service center is unstable, and when the distance between a node and the center is large, the transmission overhead may even exceed the performance gain brought by computation offloading. To keep computation tasks executing efficiently and to reduce dependence on the service center, an efficient service-node-cluster construction mechanism based on the service center's auxiliary sensing is adopted: the computing power of neighboring nodes is used cooperatively, and data is processed near its source as far as possible, enabling fast feedback and improving combat responsiveness. As shown in FIG. 2, by sensing network connectivity, idle computing resources, and other information among the tactical service nodes in its coverage, the service center can map the node group onto two dimensions: a network plane and a resource plane. On the network plane, a communication link graph capable of supporting node computation offloading is established from the direct communication relationships between service nodes; on the resource plane, node weights are introduced to describe each service node's idle surplus computing resources in the communication link graph. Integrating the two planes yields a service-node resource distribution map for cooperative-node computation offloading.
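A minimal sketch of this two-plane mapping, using networkx for illustration: the network plane is the communication link graph, the resource plane is a node-weight attribute, and their combination is the resource distribution map. The attribute name and resource unit are assumptions.

```python
# Sketch: fuse the network plane (direct links usable for offloading) and the
# resource plane (idle-resource node weights) into one resource-distribution map.
import networkx as nx

def build_resource_map(links, idle_resources):
    """links: iterable of (node_a, node_b) direct communication pairs.
    idle_resources: dict node -> idle surplus computing resource (e.g. GFLOPS)."""
    g = nx.Graph()
    g.add_edges_from(links)                            # network plane
    nx.set_node_attributes(g, idle_resources, "idle")  # resource plane
    return g

g = build_resource_map([("n1", "n2"), ("n2", "n3"), ("n3", "n4")],
                       {"n1": 2.0, "n2": 0.5, "n3": 3.5, "n4": 1.0})
# Candidate helpers for n1: reachable nodes with enough idle capacity for a
# 1.5-unit task component.
candidates = [n for n in nx.node_connected_component(g, "n1")
              if n != "n1" and g.nodes[n]["idle"] >= 1.5]
print(candidates)  # ['n3'] in this toy topology
```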
In this way, the micro computing cluster constructed from neighboring service nodes shares surplus computing resources and realizes cooperative data processing, forming a mobile resource-sharing domain. The application uses community detection and merging algorithms to design an efficient construction mechanism for service node clusters, driven by the service nodes' processing tasks. As shown in FIG. 3, a community detection algorithm from graph analysis identifies the mobile resource-sharing domain units, each composed of several service nodes, in the service-node resource distribution map. Furthermore, coalition formation techniques from cooperative game theory aggregate the discovered resource-domain units: according to the computation-offloading demands of different task types, heterogeneous computing resources are balanced and pooled from multiple resource domains, effectively supporting the offloading demands of multiple application types and improving the cluster's processing capability.
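A sketch of the two-stage construction under stated assumptions: networkx's greedy modularity communities stand in for the (unspecified) community detection algorithm, and a simple greedy merge by idle capacity stands in for coalition formation.

```python
# Sketch: discover mobile resource-sharing domains in the resource map, then
# merge ("coalition formation") domains until a task's resource demand is met.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def sharing_domains(g):
    """Partition the service-node resource map into candidate sharing domains."""
    return [set(c) for c in greedy_modularity_communities(g)]

def form_coalition(g, domains, demand):
    """Greedily merge domains, largest idle capacity first, until demand is met."""
    idle = lambda d: sum(g.nodes[n]["idle"] for n in d)
    coalition, total = set(), 0.0
    for d in sorted(domains, key=idle, reverse=True):
        coalition |= d
        total += idle(d)
        if total >= demand:
            return coalition
    return None  # demand exceeds all domains combined
```

In practice the merge rule would also weigh link quality between domains; the capacity-first rule here is only the simplest instance of the coalition-formation idea named in the text.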
As the number of requests service nodes submit to a service center grows, a single service center inevitably faces limited computing capacity and struggles to respond to newly added computation-offloading requests in real time. Through mutual cooperation among multiple service centers, computation tasks are offloaded to relatively idle centers, achieving real-time processing and guaranteeing service quality. Because candidate service centers differ, a suitable center must be selected along two dimensions: whether the candidate can be trusted, and whether the task can meet its delay requirement there. From the reliability viewpoint, in a mobile environment the geographic position of a service center changes dynamically; communication may be interrupted as distance grows, leaving data-processing results unable to be fed back. Resource provisioning also affects reliability: task execution may be interrupted because the center's stored energy is insufficient, and the embedded devices of service centers widely use flash memory, whose erase endurance is on the order of 10^5 cycles, so for data-processing tasks requiring many updates the erase/write lifetime of such non-volatile storage media is also a factor affecting service quality. Synthesizing three aspects of a service center (mobility, stored energy, and storage-device reliability) into a composite trust, a comprehensive trust evaluation system and model are formed; trust evaluation is performed over several neighboring service centers to judge whether each meets the computation task's trust requirement, guaranteeing normal offloading and feedback. From the response-delay viewpoint, the heterogeneous computing capability of each tactical service center is used to predict its available service time from historical data, and dynamically changing communication factors such as network bandwidth are sensed in real time, so whether a computation-offloading task can complete processing and feedback within its delay tolerance is evaluated in real time. This realizes sharing and optimal utilization of resources among tactical service centers, performs computation offloading efficiently, and better meets the service-quality requirements of data-processing tasks.
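The text names three trust factors (mobility, stored energy, storage-device reliability) but gives no scoring formula. Below is a minimal sketch of one plausible composite score; the weights, normalizations, and threshold are assumptions, with the flash erase-cycle endurance of about 10^5 taken from the passage above.

```python
# Sketch of a composite trust score over mobility, energy, and flash endurance.
def trust_score(link_stability, energy_ratio, erase_cycles_used,
                erase_cycle_limit=1e5, weights=(0.4, 0.4, 0.2)):
    """link_stability: predicted probability the link survives the task (0..1).
    energy_ratio: remaining energy / energy the task is expected to need.
    erase_cycles_used: flash erase cycles already consumed."""
    endurance = max(0.0, 1.0 - erase_cycles_used / erase_cycle_limit)
    w_mob, w_eng, w_sto = weights
    return w_mob * link_stability + w_eng * min(1.0, energy_ratio) + w_sto * endurance

# Offload only to neighboring centers whose trust clears an assumed threshold:
centers = {"c1": (0.9, 1.8, 2e4), "c2": (0.5, 0.9, 9e4)}
trusted = [c for c, args in centers.items() if trust_score(*args) >= 0.7]
print(trusted)  # ['c1']
```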
In this application, service nodes have weak computing capability and limited energy and communication resources, and the communication conditions in the overall environment are harsh and dynamically changing. Besides deciding whether a tactical service node's task should be offloaded, improvements to the task itself can be considered at the same time, reducing or relaxing, to some extent, the external resource constraints the task needs to run and ensuring as far as possible that the mission objective is achieved. The computation components can therefore be dynamically adjusted according to the perceived network environment, realizing a computation-adaptive task-offloading technique based on task characteristics that shortens the service node's response delay while keeping the accuracy of the computation result as high as possible. Most importantly, this applies to deep learning tasks built on trained neural networks.
Deep learning inference models, particularly convolutional neural networks, consist of convolutional layers, pooling layers, fully connected layers, and the like. Because they consume large amounts of computing resources, running a neural network model directly on a resource-limited service node device is very difficult. However, since different neural network layers differ markedly in computing-resource demand and in output data volume, an intuitive solution is to divide the whole deep learning model, i.e. the neural network, into two parts: the computation-heavy part is offloaded to the service center, while the computation-light part is retained at the service node. The service node and the service center then infer cooperatively, effectively reducing the inference delay of the deep learning model. Different model partition points, however, yield different computation times, so the optimal partition point must be selected adaptively per task to maximize the benefit of cooperation between the tactical service node device and the service center.
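A sketch of partition-point selection under stated assumptions: layers before the split run on the service node, the split layer's output crosses the link, and the remaining layers run on the service center. The per-layer latencies and output sizes would come from the regression models described below; the helper itself is illustrative, not the patent's algorithm.

```python
# Sketch: choose the split k that minimizes end-to-end latency.
# Layers [0, k) run on the node; layers [k, N) run on the center.
def best_split(node_lat, center_lat, out_bytes, bandwidth_bps, input_bytes):
    """node_lat/center_lat: per-layer latency (s) on each side; out_bytes[i]:
    output size of layer i. Returns (k, latency); k=0 is full offload,
    k=len(node_lat) is fully local execution (nothing transmitted)."""
    n = len(node_lat)
    best_k, best_t = None, float("inf")
    for k in range(n + 1):
        tx_bytes = input_bytes if k == 0 else out_bytes[k - 1]
        t = (sum(node_lat[:k])                                    # local part
             + (tx_bytes * 8 / bandwidth_bps if k < n else 0.0)   # uplink
             + sum(center_lat[k:]))                               # remote part
        if t < best_t:
            best_k, best_t = k, t
    return best_k, best_t
```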
Model simplification is another means of accelerating deep learning inference and optimizing the computation-offloading process: a small model with low computing-resource demand and faster completion time is selected instead of a large model with higher resource overhead. For any deep learning task, a branch network with multiple exit points can be trained offline; as shown in FIG. 4, the branch network itself has multiple levels, and increasing the number of layers improves model accuracy, but it also pushes the inference model's exit points later and produces long delays. This property of branch networks requires a trade-off between the accuracy of the deep learning model and its inference delay.
Therefore, when the completion time of a deep learning task is very urgent, accuracy can be sacrificed appropriately in exchange for better performance (i.e., lower latency). Model simplification clearly introduces a trade-off between delay and accuracy: although exiting the model early shortens computation time, the reduced computation also lowers the accuracy of the deep learning inference. For certain applications there are strict latency requirements while a certain loss of accuracy can be tolerated, so the balance between performance and accuracy must be struck carefully. Specifically, given a preset strict latency objective, the accuracy of the partitioning scheme is maximized without violating the latency requirement.
The two optimization means that adjust deep learning inference time, model partitioning and model simplification, are applied together, carefully balancing the performance/accuracy trade-off they introduce. For a deep learning task with a given delay requirement, the two decisions of model partitioning and model simplification are optimized jointly, maximizing the model's accuracy without violating the delay requirement. A deep-learning-model runtime optimization framework based on service-center/service-node cooperation can be adopted; as shown in FIG. 5, the optimization logic is divided into three phases: an offline training phase, an online optimization phase, and a collaborative inference phase.
The idea of the deep learning inference framework based on cooperation between the service center and the service node device is as follows: in the offline training phase, the branch network required by a task is trained, and regression models are trained for the different neural network layers in the branch network to estimate their running delay on the service center and on the service node; in the online optimization phase, the regression models are used to find an exit point and a model partition point that satisfy the task's delay requirement; in the collaborative inference phase, the service center and the service node device run the deep learning model according to the obtained scheme.
The offline training phase requires two initialization operations: (1) analyzing the performance of the service center and the service node devices, and generating regression-based delay-estimation models for the different types of deep learning network layers, such as convolutional layers and pooling layers. Preliminary experiments show that the delay of each layer type is determined by its own independent variables (such as input data size and output data size), so a regression model can be established to estimate each network layer's delay from those variables; (2) training the branch network model with multiple exit points to realize model simplification; the BranchyNet branch-network structure is adopted, under which a branch network with several exit points can be designed and trained. Note that the performance analysis depends on the device while the deep learning model depends on the application; therefore, with the service center and service node devices fixed and resources given, the two initialization operations need to be completed only once, in the offline phase.
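A sketch of initialization operation (1), assuming a linear regression on input and output data sizes as the per-layer-type delay estimator; the feature choice and the profiled numbers are illustrative.

```python
# Sketch: fit one latency regressor per (layer type, device) pair from
# profiled samples, then predict the delay of unseen layers from their sizes.
import numpy as np
from sklearn.linear_model import LinearRegression

# Profiled convolution layers on the service-node device:
# features = [input elements, output elements], target = measured latency (ms)
X = np.array([[32*32*3, 32*32*16], [64*64*3, 64*64*32], [128*128*3, 128*128*32]])
y = np.array([1.8, 9.5, 37.0])

conv_node_model = LinearRegression().fit(X, y)
est = conv_node_model.predict(np.array([[96*96*3, 96*96*32]]))
print(f"estimated conv-layer latency on the node: {est[0]:.1f} ms")
```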
In the online optimization phase, the main work is to use the offline-trained regression models to find the exit point and the model partition point in the branch network that satisfy the delay requirement. Because the accuracy of the resulting scheme must be maximized, the search iterates starting from the branch with the highest accuracy. During this process, the network bandwidth of the link between the current service node and the service center is measured in real time to estimate the data-transmission delay between them. Then, moving along the network branches from largest to smallest, the different partition points on each branch are traversed in turn, and the service node's response delay and model accuracy for each selected branch/partition pair are estimated from the current network bandwidth and the computation times of the different network layers. After traversing all branch networks and partition points, the combination with the highest accuracy among all branch/partition combinations that satisfy the delay requirement is output.
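A sketch of this online search under the same assumptions, reusing the best_split helper from the partitioning sketch above: branches are tried from highest accuracy downward, so the first branch admitting a deadline-feasible split is the most accurate feasible combination.

```python
# Sketch: jointly pick the exit branch and the partition point.
def choose_plan(branches, bandwidth_bps, input_bytes, deadline_s):
    """branches: dicts with keys 'accuracy', 'node_lat', 'center_lat',
    'out_bytes' (an assumed descriptor format, not the patent's API)."""
    for b in sorted(branches, key=lambda b: b["accuracy"], reverse=True):
        k, t = best_split(b["node_lat"], b["center_lat"],
                          b["out_bytes"], bandwidth_bps, input_bytes)
        if t <= deadline_s:                    # first feasible = most accurate
            return {"branch": b, "split": k, "latency": t}
    return None  # no branch/split combination meets the delay requirement
```

Because bandwidth_bps is measured immediately before the search, the returned plan automatically adapts to the current link quality, which is the point of performing this stage online.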
In the collaborative inference phase, the service center and the service node perform collaborative inference of the deep learning model according to the optimal combination of network branch and partition point output by the online optimization phase.
Each tactical service node has limited computing resources (energy, computing capability, and the like). Through computation offloading, its tasks can be decomposed into fine-grained task components, part of which are offloaded to a service center for remote computation while the rest complete computation locally. Meanwhile, because communication resources in a tactical environment are limited and unstable, the communication condition is one of the important factors in deciding whether to offload a computation task when a service node faces the limits of its own computing capability. On the other hand, most wireless relay devices (satellite communications, tactical wireless base stations, etc.) use multi-channel communication; when several service nodes request computation-task offloading, selecting a suitable communication channel for each node so that it can communicate effectively and complete the offloading becomes a key problem. If too many service nodes select the same wireless channel to offload computation tasks to the service center simultaneously, they interfere severely with one another, reducing the data-transmission rate of the offloading and possibly causing low energy efficiency and long transmission delays. To realize efficient computation offloading for multiple nodes, the offloading-decision problem among multiple nodes and multiple channels is modeled, based on game theory, as a multi-node computation-offloading game.
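The patent models this decision as a game without specifying payoffs. One hedged sketch uses best-response dynamics over channel choices with an assumed congestion-style delay: each node repeatedly switches to the channel that minimizes its own delay given the others' choices, stopping at a pure Nash equilibrium, which such congestion games are known to possess.

```python
# Sketch: multi-node, multi-channel offloading as best-response dynamics.
def channel_game(n_nodes, n_channels, base_delay=1.0, max_rounds=100):
    choice = [i % n_channels for i in range(n_nodes)]    # arbitrary start
    for _ in range(max_rounds):
        changed = False
        for i in range(n_nodes):
            def delay(ch):
                # Delay grows with the number of other nodes on the channel.
                load = sum(1 for j, c in enumerate(choice) if c == ch and j != i)
                return base_delay * (1 + load)
            best = min(range(n_channels), key=delay)
            if delay(best) < delay(choice[i]):
                choice[i] = best                         # unilateral improvement
                changed = True
        if not changed:                                  # equilibrium reached
            return choice
    return choice

print(channel_game(n_nodes=5, n_channels=3))  # loads spread evenly, e.g. [0, 1, 2, 0, 1]
```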
The above embodiments merely illustrate the technical ideas and features of the present invention; their purpose is to enable those skilled in the art to understand and implement the invention, not to limit its protection scope. All equivalent changes and modifications made according to the spirit of the present invention shall fall within the protection scope of the present invention.

Claims (6)

1. A method for performing cooperative processing at different levels of edge computing, characterized by comprising the following steps:
(a) based on a k-means clustering algorithm, aggregating the task components whose delay sensitivity and computation amount are most similar;
(b) predicting the available service time of the service center from historical data, sensing dynamically changing communication factors in real time, and evaluating whether a computation-offloading task can complete processing and feedback within its delay tolerance, so as to predict and partition computing resources;
(c) classifying the service nodes to be coordinated and assigning them priorities;
(d) judging whether computing resources are sufficient: when the service center's computing resources are scarce, determining the cooperating service nodes in priority order; and when the service center's computing resources are sufficient, performing cooperative processing across multiple service nodes.
2. The method for performing cooperative processing at different levels of edge computing according to claim 1, characterized in that: in step (b), the service center also maps the service node group onto two dimensions, a network plane and a resource plane, by sensing network connectivity and idle-computing-resource information among the service nodes within its coverage.
3. The method for performing cooperative processing at different levels of edge computing according to claim 2, characterized in that: in step (c), on the network plane, a communication link graph capable of supporting service-node computation offloading is established from the direct communication relationships between service nodes; on the resource plane, node weights are introduced to characterize the amount of idle surplus computing resources of each service node in the communication link graph; and by integrating the network plane and the resource plane, a service-node resource distribution map oriented to cooperative-node computation offloading is formed for classification.
4. The method for performing cooperative processing at different levels of edge computing according to claim 2, characterized in that: in step (b), the computation task components are dynamically adjusted according to the sensed network environment, realizing computation-adaptive task offloading based on task characteristics.
5. The method for performing cooperative processing at different levels of edge computing according to claim 4, characterized in that: in step (b), the adaptive task is a deep learning inference model task.
6. The method for performing cooperative processing at different levels of edge computing according to claim 5, characterized in that, in step (b), the deep learning inference model comprises:
(b1) an offline training stage: the branch network required by a task is trained, and regression models are trained for the different neural network layers in the branch network to estimate their running delay on the service center and on the service node;
(b2) an online optimization stage: the regression models are used to find an exit point and a model partition point that satisfy the task's delay requirement;
(b3) a collaborative inference stage: the service center and the service node device run the deep learning model according to the obtained scheme.
CN202010234547.3A 2020-03-30 2020-03-30 Method for performing cooperative processing at different levels of edge computing Pending CN111459670A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010234547.3A 2020-03-30 2020-03-30 Method for performing cooperative processing at different levels of edge computing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010234547.3A 2020-03-30 2020-03-30 Method for performing cooperative processing at different levels of edge computing

Publications (1)

Publication Number Publication Date
CN111459670A true CN111459670A (en) 2020-07-28

Family

ID=71681614

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010234547.3A Pending 2020-03-30 2020-03-30 Method for performing cooperative processing at different levels of edge computing

Country Status (1)

Country Link
CN (1) CN111459670A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019024445A1 (en) * 2017-07-31 2019-02-07 上海交通大学 Collaborative optimization method for geographic distribution interactive service cloud resource
CN110309914A (en) * 2019-07-03 2019-10-08 Deep learning model inference acceleration method based on edge server and mobile terminal device collaboration
CN110427261A (en) * 2019-08-12 2019-11-08 Edge computing task allocation method based on deep Monte Carlo tree search
CN110647391A (en) * 2019-09-27 2020-01-03 北京邮电大学 Edge computing method and system for satellite-ground cooperative network
CN110891093A (en) * 2019-12-09 2020-03-17 中国科学院计算机网络信息中心 Method and system for selecting edge computing node in delay sensitive network

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111984408A (en) * 2020-08-14 2020-11-24 薛亮 Data cooperative processing method based on big data and edge computing and edge cloud platform
EP4202791A4 (en) * 2020-09-21 2024-02-21 Huawei Tech Co Ltd Cooperative reasoning method and communication apparatus
CN112235385A (en) * 2020-10-09 2021-01-15 嘉兴学院 Offshore edge computing low-overhead cloud edge intelligent cooperative ally member discovery method
CN112235385B (en) * 2020-10-09 2022-08-30 嘉兴学院 Offshore edge computing low-overhead cloud edge intelligent cooperative ally member discovery method
CN112422169B (en) * 2020-11-04 2022-07-26 中国空间技术研究院 Method, device and system for coordinating nodes of composite link
CN112422169A (en) * 2020-11-04 2021-02-26 中国空间技术研究院 Method, device and system for coordinating nodes of composite link
US20220180178A1 (en) * 2020-12-08 2022-06-09 Nvidia Corporation Neural network scheduler
CN112850387A (en) * 2020-12-31 2021-05-28 北京航天特种设备检测研究发展有限公司 Elevator state acquisition and diagnosis system and method
CN112850387B (en) * 2020-12-31 2023-02-03 北京航天特种设备检测研究发展有限公司 Elevator state acquisition and diagnosis system and method
CN112887785A (en) * 2021-01-13 2021-06-01 浙江传媒学院 Remote video overlapping interactive computing method based on time delay optimization
CN112887785B (en) * 2021-01-13 2023-05-02 浙江传媒学院 Time delay optimization method based on remote video superposition interactive calculation
CN112905346A (en) * 2021-03-03 2021-06-04 湖南商务职业技术学院 Resource deployment method, cloud service center, computer medium and edge cloud cooperative computing system
CN114051254A (en) * 2021-11-08 2022-02-15 南京大学 Green cloud edge collaborative computing unloading method based on satellite-ground fusion network
CN114051254B (en) * 2021-11-08 2024-05-03 南京大学 Green cloud edge collaborative computing unloading method based on star-ground fusion network
CN114363328A (en) * 2021-11-19 2022-04-15 三维通信股份有限公司 Task coordination method and system for server cluster
CN114363328B (en) * 2021-11-19 2024-02-20 三维通信股份有限公司 Task collaboration method and system for server cluster
CN114125370A (en) * 2021-12-06 2022-03-01 山西双驱电子科技有限公司 Multi-node cooperative operation paperless conference execution method and system
CN114401134A (en) * 2022-01-14 2022-04-26 重庆邮电大学 Edge-side-cooperative Internet of things distributed trusted management method
CN114401134B (en) * 2022-01-14 2023-12-15 深圳市兴海物联科技有限公司 Internet of things distributed trusted management method with end-side cooperation
CN115841590A (en) * 2022-11-16 2023-03-24 中国烟草总公司湖南省公司 Neural network reasoning optimization method, device, equipment and readable storage medium
CN115841590B (en) * 2022-11-16 2023-10-03 中国烟草总公司湖南省公司 Neural network reasoning optimization method, device, equipment and readable storage medium
CN115883561A (en) * 2022-12-01 2023-03-31 重庆邮电大学 Safety scheduling method for DAG task flow in edge computing
CN115883561B (en) * 2022-12-01 2024-03-15 重庆邮电大学 DAG task flow safety scheduling method for edge computing

Similar Documents

Publication Publication Date Title
CN111459670A (en) Method for performing cooperative processing at different levels of edge computing
Lin et al. A survey on computation offloading modeling for edge computing
US11265369B2 (en) Methods and systems for intelligent distribution of workloads to multi-access edge compute nodes on a communication network
Chen et al. Efficiency and fairness oriented dynamic task offloading in internet of vehicles
Hameed et al. Energy-and performance-aware load-balancing in vehicular fog computing
Cui et al. A novel offloading scheduling method for mobile application in mobile edge computing
Hazarika et al. DRL-based resource allocation for computation offloading in IoV networks
Zhang et al. Joint parallel offloading and load balancing for cooperative-MEC systems with delay constraints
Kadhim et al. Proactive load balancing mechanism for fog computing supported by parked vehicles in IoV-SDN
Shan et al. A survey on computation offloading for mobile edge computing information
Wang et al. Dynamic offloading scheduling scheme for MEC-enabled vehicular networks
Qu et al. Study QoS optimization and energy saving techniques in cloud, fog, edge, and IoT
Sun et al. BARGAIN-MATCH: A game theoretical approach for resource allocation and task offloading in vehicular edge computing networks
Fu et al. Traffic prediction-enabled energy-efficient dynamic computing resource allocation in cran based on deep learning
Zamzam et al. Game theory for computation offloading and resource allocation in edge computing: A survey
Liu et al. Learning based fluctuation-aware computation offloading for vehicular edge computing system
Shalavi et al. Energy efficient deployment and orchestration of computing resources at the network edge: a survey on algorithms, trends and open challenges
Chen et al. A game theoretic approach to task offloading for multi-data-source tasks in mobile edge computing
Ju et al. Collaborative in-network processing for internet of battery-less things
El Khatib et al. Optimal proactive resource allocation at the extreme edge
Asghari et al. Server placement in mobile cloud computing: a comprehensive survey for edge computing, fog computing and cloudlet
Nguyen et al. EdgePV: collaborative edge computing framework for task offloading
Bi et al. Energy-aware task offloading with genetic particle swarm optimization in hybrid edge computing
Bahreini et al. Energy-aware resource management in vehicular edge computing systems
Grasso et al. Slicing a FANET for heterogeneous delay-constrained applications

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination