CN115756803A - Task scheduling method, device, equipment and medium for heterogeneous computing system - Google Patents


Info

Publication number: CN115756803A
Application number: CN202211520501.3A
Authority: CN (China)
Legal status: Pending
Original language: Chinese (zh)
Inventors: 陈雨濛, 刘松林, 陈彦君, 倪寒琦, 凌翔
Assignee (original and current): University of Electronic Science and Technology of China
Prior art keywords: queue, node, probability distribution, task, computing system
Classification: Y02D10/00 (energy efficient computing, e.g. low power processors, power management or thermal management)

Abstract

The invention discloses a task scheduling method, apparatus, device, and medium for a heterogeneous computing system. The method comprises: acquiring a directed acyclic task graph corresponding to an application task on a target heterogeneous computing system, determining a task node queue of the directed acyclic task graph, and determining a probability distribution matrix and a topologically feasible point queue based on the topological relation between the task node queue and the directed acyclic task graph; based on a random walk algorithm, taking a random number within a preset range and determining a plurality of designated points in the probability distribution matrix based on the random number to obtain a designated point set; converting the topologically feasible point queue into an edge coverage queue based on the designated point set; performing simulated scheduling on the edge coverage queue based on a probability distribution algorithm and updating the current probability distribution matrix; and repeating the random walk and distribution estimation steps a preset number of times to obtain a target probability distribution matrix and, from it, a target scheduling scheme. The invention proposes scheduling with an edge coverage queue and designs a method for generating the edge coverage queue based on an estimation of distribution algorithm and a graph random walk strategy, thereby reducing computational complexity and the number of iterations.

Description

Task scheduling method, device, equipment and medium for heterogeneous computing system
Technical Field
The invention relates to the technical field of task scheduling of heterogeneous computing systems, in particular to a task scheduling method, device, equipment and medium for a heterogeneous computing system.
Background
A heterogeneous computing system is an interconnected set of processors with different computing and storage capabilities. Owing to the diversity of tasks and differences in processor architectures, heterogeneous computing systems are widely present in various computing scenarios. In practical applications, each application is usually modeled as a Directed Acyclic Graph (DAG), in which each node represents a subtask of the application, and the DAG is then scheduled onto the heterogeneous computing system. An efficient scheduling scheme can improve heterogeneous computing system performance and user quality of experience. However, because of the heterogeneity of such systems, the precedence constraints between tasks, and the NP-hard nature of DAG scheduling, efficient scheduling is difficult to achieve.
Existing scheduling algorithms fall into two categories: list scheduling algorithms and evolutionary algorithms. List scheduling algorithms have low computational and time complexity, but their scheduling results are often unsatisfactory and easily fall into local optima. Evolutionary algorithms are global optimization algorithms that can obtain excellent scheduling results given sufficient computation, but their computational and time complexity is very high. In parallel and distributed heterogeneous computing systems, heuristic task scheduling algorithms typically comprise two phases: task prioritization and processor selection. In such algorithms, different priorities may result in different makespans (maximum completion times) on the heterogeneous computing system. Therefore, a good scheduling algorithm should efficiently assign priorities and processors to each subtask so as to minimize the makespan.
A list scheduling algorithm runs in two phases. First, tasks are sorted by weight according to task priority to obtain a task scheduling queue. The tasks in the queue are then placed onto processors in sequence in some fixed manner. Most early scheduling algorithms targeted homogeneous processing systems; among the representative algorithms of this category, the Earliest Start Time (EST) and Earliest Finish Time (EFT) algorithms are the simplest time-greedy algorithms. To accommodate heterogeneous environments, the Heterogeneous Earliest Finish Time (HEFT) algorithm and the critical path algorithm were proposed. In HEFT, tasks are ordered by the length of the longest path from each node to the bottom of the DAG, then sequentially allocated to the processor that achieves the earliest finish time, taking into account the processor's available idle time slots. In the critical path algorithm, the critical path of the DAG is computed first, the whole critical path is assigned to the single processor that minimizes its completion time, and the remaining tasks are allocated in sequence. Predictive list scheduling algorithms schedule subsequent tasks by estimating the influence of each task allocation. The Predict Earliest Finish Time (PEFT) algorithm, one such predictive list scheduling algorithm, constructs an Optimistic Cost Table (OCT) listing, for each combination of task and processor, the shortest path from the task's children to the exit node. Tasks are ordered from high to low by their average OCT value. Finally, an insertion-based idle-slot strategy is introduced, and each task is allocated to the processor with the smallest EFT + OCT.
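The priority phase of HEFT described above (ranking tasks by the longest path from a node to the bottom of the DAG) can be sketched as follows. This is an illustrative reconstruction, not code from the patent; the names `dag`, `avg_cost`, and `avg_comm` are assumptions.

```python
# Illustrative sketch of HEFT's priority phase ("upward rank"): rank each
# task by the longest average-cost path from that node to the DAG exit.
# All names here are assumptions, not identifiers from the patent.

def upward_rank(dag, avg_cost, avg_comm):
    """dag: {node: [successor nodes]}; avg_cost: {node: mean execution time};
    avg_comm: {(u, v): mean communication cost}. Returns nodes in schedule
    order (descending upward rank)."""
    memo = {}

    def rank(u):
        if u not in memo:
            tail = max((avg_comm.get((u, v), 0) + rank(v)
                        for v in dag.get(u, [])), default=0.0)
            memo[u] = avg_cost[u] + tail
        return memo[u]

    for u in dag:
        rank(u)
    return sorted(memo, key=memo.get, reverse=True)
```

In the full algorithm this ordering would feed the processor-selection phase, which greedily picks the processor giving the earliest finish time.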
The Lookahead (LO) algorithm is a relatively special list scheduling algorithm in which each task is allocated to the processor that minimizes the completion time of all of its subtasks. It performs well on medium-scale DAGs and in some specific scheduling scenarios, but it is also the type of list scheduling algorithm with the highest computational complexity.
Evolutionary algorithms are global optimization algorithms that can provide satisfactory solutions to complex problems within acceptable time. For the DAG scheduling problem (DAG-SP), various evolutionary algorithms have been tried, including the Genetic Algorithm (GA), Ant Colony Optimization (ACO), and Differential Evolution (DE). Besides directly producing a task-processor scheduling scheme, there is also research applying evolutionary algorithms to the priority of the initial task scheduling queue. The MPQGA method provides a task scheduling scheme for heterogeneous computing systems based on a multi-priority-queue genetic algorithm: a GA with crossover, mutation, and fitness functions adapted to Directed Acyclic Graph (DAG) scheduling scenarios assigns a priority to each subtask, while a heuristic Earliest Finish Time (EFT) approach searches for the task-to-processor mapping. However, the large number of iterations and the long iteration time of evolutionary algorithms limit their range of application.
Disclosure of Invention
The invention aims to solve the following problems in existing DAG scheduling algorithms for heterogeneous computing systems: evolutionary algorithms have high computational complexity and long iteration times, making them difficult to use in practice, while list scheduling algorithms yield poor results.
In order to achieve the above purpose, the invention provides the following technical scheme:
a task scheduling method for a heterogeneous computing system comprises the following steps:
S1, obtaining a directed acyclic graph corresponding to a target heterogeneous computing system;
S2, determining an initial scheduling scheme and a task node queue corresponding to the directed acyclic graph, determining a probability distribution matrix based on the task node queue and its topological relation, and constructing a topologically feasible point queue based on the topological relation of the task node queue;
S3, based on a random walk algorithm, taking a random number within a preset range, and determining a plurality of designated points in the probability distribution matrix based on the random number to obtain a designated point set; converting the topologically feasible point queue into an edge coverage queue based on the designated point set;
S4, performing simulated scheduling on the edge coverage queue based on a probability distribution algorithm, and updating the current probability distribution matrix;
and S5, repeating steps S3 to S4 until a preset number of times is reached, obtaining a target probability distribution matrix and its edge coverage queue, and then generating a target scheduling scheme.
According to a specific implementation manner, in the task scheduling method for the heterogeneous computing system, the preset range is:
[0,1/(n+1)]
and n is the total number of nodes in the task node queue.
According to a specific implementation manner, in the task scheduling method for the heterogeneous computing system, the determining a plurality of designated points in the probability distribution matrix based on the random number includes:
traversing all nodes in the probability distribution matrix and judging whether the imaginary part of the diagonal element corresponding to each node exceeds the random number; if so, the node is a designated point, and if not, it is not a designated point.
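A minimal sketch of this designated-point test, assuming the probability distribution matrix is stored as a list-of-lists of Python complex numbers (an illustrative representation, not specified by the patent):

```python
# Minimal sketch of the designated-point test above: draw one random number
# in the preset range [0, 1/(n+1)] and keep every node whose diagonal
# imaginary part exceeds it. Storing S as a list-of-lists of complex
# numbers is an illustrative assumption.
import random

def pick_designated_points(S, n, rng=random.random):
    """S: (n+1)x(n+1) complex matrix; returns the designated point indices."""
    threshold = rng() * (1.0 / (n + 1))   # random number in the preset range
    return [i for i in range(n + 1) if S[i][i].imag > threshold]
```

The `rng` parameter makes the random draw injectable, which is convenient when reproducing a particular walk.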
According to a specific implementation manner, in the task scheduling method for a heterogeneous computing system, converting the topologically feasible point queue into an edge coverage queue based on the designated point set includes:
traversing each node π_i starting from the second node in the topologically feasible point queue, and judging whether π_i is a node in the designated point set; if so, converting π_i and its successor node succ(π_i) into the edge (π_i, succ(π_i)) and placing it into the edge coverage queue; if not, selecting the target predecessor node pred(π_i) corresponding to node π_i and adding (pred(π_i), π_i) to the edge coverage queue; wherein the target predecessor node is the predecessor node with the largest communication cost to the current node.
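The conversion rule in this paragraph can be sketched as follows; the data-structure choices (`succ`/`pred` dictionaries, a `comm` cost map) are illustrative assumptions:

```python
# Sketch of the point-queue to edge-coverage-queue conversion described
# above; succ/pred/comm are assumed dictionary representations of the DAG.

def to_edge_coverage_queue(queue, designated, succ, pred, comm):
    """queue: topologically feasible point queue; designated: set of
    designated points; comm: {(u, v): communication cost}."""
    edges = []
    for node in queue[1:]:                      # start from the second node
        if node in designated:
            for s in succ.get(node, []):        # cover all successor edges
                edges.append((node, s))
        else:
            # target predecessor: the one with the largest communication cost
            p = max(pred[node], key=lambda u: comm.get((u, node), 0))
            edges.append((p, node))
    return edges
```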
According to a specific implementation manner, the task scheduling method for the heterogeneous computing system further includes a de-continuization step after converting the topologically feasible point queue into an edge coverage queue based on the designated point set;
the de-continuization step comprises: based on the designated point set [m_i], finding each (pred(m_i), m_i) in the edge coverage queue and checking whether deleting that edge would produce a logic error; if not, deleting it, and if so, keeping it.
According to a specific implementation manner, in the task scheduling method for the heterogeneous computing system, in S2, determining the probability distribution matrix based on the task node queue and the topological relation thereof includes:
initializing a complex matrix, and assigning values to the elements in the complex matrix based on the task node queue and its topological relation; calculating and sorting the PL value of each node in the task node queue; selecting, in descending order of PL value, a number of nodes given by a formula in n (shown as an equation image, which is not recoverable from the text) to form a point set PL_S; wherein n is the total number of nodes in the task node queue, and PL is the heterogeneity variance of the task across processors;
and performing probability distribution iteration on the complex matrix based on the point set PL _ S until the preset times are reached, and generating the probability distribution matrix.
According to a specific implementation manner, in the task scheduling method for the heterogeneous computing system, the preset number of times is given by a formula in n (shown as an equation image, which is not recoverable from the text), where n is the total number of nodes in the task node queue.
In another aspect of the present invention, a task scheduling apparatus for a heterogeneous computing system is provided, including:
an obtaining unit, configured to obtain a directed acyclic graph corresponding to a target heterogeneous computing system,
the device comprises an initialization unit, a probability distribution unit and a task node queue, wherein the initialization unit is used for determining an initial scheduling scheme and a task node queue corresponding to the directed acyclic graph, determining a probability distribution matrix based on the task node queue and a topological relation thereof, and constructing a topological feasible point queue based on the topological relation of the task node queue;
the edge coverage queue iteration unit is used for generating and outputting a target scheduling scheme, wherein the target scheduling scheme is generated as follows: S3, based on a random walk algorithm, taking a random number within a preset range, and determining a plurality of designated points in the probability distribution matrix based on the random number to obtain a designated point set; converting the topologically feasible point queue into an edge coverage queue based on the designated point set; S4, performing simulated scheduling on the edge coverage queue based on a probability distribution algorithm, and updating the current probability distribution matrix; and S5, repeating steps S3 to S4 until a preset number of times is reached, obtaining a target probability distribution matrix and its edge coverage queue, and then generating the target scheduling scheme.
In another aspect of the present invention, an electronic device includes a processor, a network interface, and a memory, where the processor, the network interface, and the memory are connected to each other, where the memory is configured to store a computer program, and the computer program includes program instructions, and the processor is configured to call the program instructions to execute the above task scheduling method for a heterogeneous computing system.
In another aspect of the present invention, a computer-readable storage medium is provided, wherein the computer-readable storage medium stores program instructions, and the program instructions, when executed by at least one processor, are configured to implement the above task scheduling method for a heterogeneous computing system.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides an idea of scheduling by using a side cover queue, wherein the side cover queue can explore local greedy on the basis of greedy points in a scheduling process, is a mixed scheduling strategy of greedy points and greedy edges, and designs a generation method of the side cover queue based on an Estimation of Distribution Algorithm (EDA) and a graph random walk strategy, and has lower operation complexity and iteration times.
Drawings
FIG. 1 is a flowchart of a task scheduling method for a heterogeneous computing system according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating operation of a task scheduler for a heterogeneous computing system according to an embodiment of the present invention;
FIG. 3 is a DAG graph and task representation intent thereof in one embodiment of the invention;
FIG. 4 is a diagram illustrating a final scheduling scheme in one embodiment of the present invention;
FIG. 5 is operation comparison chart 1 of the ECSA algorithm of the present invention against other algorithms in one embodiment;
FIG. 6 is operation comparison chart 2 of the ECSA algorithm of the present invention against other algorithms in one embodiment;
FIG. 7 is operation comparison chart 3 of the ECSA algorithm of the present invention against other algorithms in one embodiment;
FIG. 8 is a block diagram of an electronic device in one embodiment of the invention.
Detailed Description
The present invention will be described in further detail with reference to test examples and specific embodiments. It should be understood that the scope of the above-described subject matter is not limited to the following examples, and any techniques implemented based on the disclosure of the present invention are within the scope of the present invention.
Example 1
Fig. 1 illustrates a task scheduling method for a heterogeneous computing system according to an exemplary embodiment of the present invention, including:
S1, obtaining a directed acyclic graph corresponding to a target heterogeneous computing system;
S2, determining an initial scheduling scheme and a task node queue corresponding to the directed acyclic graph, determining a probability distribution matrix based on the task node queue and its topological relation, and constructing a topologically feasible point queue based on the topological relation of the task node queue;
S3, based on a random walk algorithm, taking a random number within a preset range, and determining a plurality of designated points in the probability distribution matrix based on the random number to obtain a designated point set; converting the topologically feasible point queue into an edge coverage queue based on the designated point set;
s4, performing simulated scheduling on the edge coverage queue based on a probability distribution algorithm, and updating a current probability distribution matrix;
and S5, repeating the steps S3 to S4 until the preset times are reached, obtaining a target probability distribution matrix and an edge coverage queue thereof, and further generating a target scheduling scheme (namely, what task each processor processes at what time).
This embodiment proposes the idea of scheduling with an edge coverage queue: during scheduling, the edge coverage queue can explore locally greedy edges on the basis of greedy points, forming a hybrid point-greedy and edge-greedy scheduling strategy. Based on an Estimation of Distribution Algorithm (EDA) and a graph random walk strategy, a generation method for the edge coverage queue is designed, reducing computational complexity and the number of iterations.
Example 2
In a possible implementation manner, in the task scheduling method for the heterogeneous computing system (hereinafter referred to as an ECSA algorithm), in S2, determining an initial scheduling scheme and a task node queue corresponding to the directed acyclic graph specifically includes:
generating a DAG graph point queue by applying any heuristic scheduling algorithm (such as the HEFT algorithm or the PEFT algorithm) to the acquired DAG graph (directed acyclic graph) corresponding to the target heterogeneous computing system; comparing π_HEFT with π_PEFT, and selecting the one with better initial performance as the initial scheduling scheme and the task node queue Q_i.
In a possible implementation manner, in the task scheduling method for the heterogeneous computing system, the determining, in S2, a probability distribution matrix based on the task node queue and the topological relation thereof, and constructing a topological feasible point queue based on the topological relation of the task node queue specifically include:
s21, converting the adjacent matrix of the DAG into a 01 matrix A, solving the longest path S of the A matrix through dynamic planning, and further solving the P = A 1 +A 2 +…+A s . The P matrix contains all directed path information of the DAG graph and can be used for representing the topological relation of the nodes, and in the obtained P matrix, P is ij >0 represents that the i node is a topology preorder node of the j node, and the i node and the j node can not be exchanged; p ij And =0 indicates that the i node has no topological relation with the j node, and the i and the j nodes can be exchanged freely. Corresponds to P ij Nodes i and j of =0 are dominant forces in random diffusion, P ij Nodes of =0 are the main subject of EDA algorithms; and constructing a topological feasible point queue (a queue without violating the topological relation) based on the topological relation of the task node queue
S22, initializing a complex matrix S of size (n+1)×(n+1). Each element S_ij in row i and column j is assigned according to the following rules to obtain the assigned complex matrix S:
If i = j: Re(S_ij) = 0;
If i ≠ j: Re(S_ij) = 0.5 or 0 (if P_ij = 0 in the P matrix, assign 0.5; if P_ij ≠ 0, assign 0).
Then, calculate the PL value (prediction lookahead, the variance of the task across processors) of each node q_i in the task node queue Q_i and sort (the PL formula is given as an equation image, which is not recoverable from the text). Select, in descending order of PL value, a number of nodes given by a formula in n (equation image, not recoverable) to form the point set PL_S.
Assign values to the imaginary parts of the elements of the matrix S according to the following rules:
If i = j: Im(S_ij) = 1/(n+1) or 0 (0 if i = 0), where n is the total number of nodes in the task node queue;
If i ≠ j: Im(S_ij) = 0;
S22, sequentially taking one point from the point set PL_S as the designated point, and converting the task node queue into an edge coverage queue based on that designated point: traverse each node q_i starting from the second node of the task node queue and judge whether q_i is the designated point; if so, convert q_i and its successor nodes succ(q_i) into edges (q_i, succ(q_i)) and place them into the edge coverage queue; if not, select the target predecessor node pred(q_i) of q_i and add (pred(q_i), q_i) to the edge coverage queue, where the target predecessor node is the predecessor with the largest communication cost to the current node. Then, based on the currently designated point s_i, find (pred(s_i), s_i) in the edge coverage queue and check whether deleting this edge would produce a logic error; if not, delete it, and if so, keep it, thereby generating a de-continuized edge coverage queue;
accordingly, the imaginary part of the S matrix is updated based on the generated scheduling result of the edge coverage queue:
Im(S_ii) = k*Im(S_ii) + (1-k)*Δt;
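The update above is an exponential moving average on the imaginary part of the diagonal; a minimal sketch (the retention factor k and the list-of-lists matrix layout are assumptions):

```python
# The diagonal update above is an exponential moving average: Im(S_ii)
# blends its previous value with the observed makespan change Δt. The
# default retention factor k and the matrix representation are assumptions.

def update_diagonal(S, i, delta_t, k=0.9):
    """In-place update: Im(S_ii) = k*Im(S_ii) + (1-k)*delta_t."""
    old = S[i][i]
    S[i][i] = complex(old.real, k * old.imag + (1 - k) * delta_t)
```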
s23, repeating the step S22 for preset times (
Figure BDA0003973599660000091
Second), to traverse each point in the point set PL _ S, normalization is performed on the imaginary part of the diagonal elements of the matrix S, and the sum of the elements on the diagonal is guaranteed to be an imaginary unit. So far, the probability distribution matrix S is initialized.
In a possible implementation manner, in the task scheduling method for a heterogeneous computing system, the S3 specifically includes:
S31, based on a random walk algorithm, taking a random number within a preset range; the preset range is [0, 1/(n+1)], where n is the total number of nodes in the task node queue;
S32, traversing all nodes in the probability distribution matrix and judging whether the imaginary part of the diagonal element corresponding to each node exceeds the random number; if so, the node is a designated point, otherwise it is not, thereby obtaining the designated point set [m_i];
S33, converting the topologically feasible point queue into an edge coverage queue based on the designated point set:
traverse each node π_i starting from the second node in the topologically feasible point queue and judge whether π_i is a node in the designated point set; if so, convert π_i and its successor node succ(π_i) into the edge (π_i, succ(π_i)) and place it into the edge coverage queue; if not, select the target predecessor node pred(π_i) of π_i and add (pred(π_i), π_i) to the edge coverage queue, where the target predecessor node is the predecessor with the largest communication cost to the current node.
Then, based on the designated point set [m_i], find each (pred(m_i), m_i) in the edge coverage queue and check whether deleting that edge would produce a logic error; if not, delete it, and if so, keep it, generating a de-continuized edge coverage queue;
Further, after the edge coverage queue is generated, S4 is executed (performing simulated scheduling on the edge coverage queue based on a probability distribution algorithm and updating the current probability distribution matrix), and the process returns to S3; steps S3 to S4 are repeated until a preset number of times is reached (the count is given by a formula in n shown as an equation image, which is not recoverable from the text), obtaining the target probability distribution matrix, from which the edge coverage queue and the target scheduling scheme are generated.
Specifically, the graph random walk algorithm traverses a graph starting from one vertex or a series of vertices. At any vertex, the traverser walks to a neighboring vertex with probability (1 - a), and randomly jumps to any vertex in the graph with probability a. Each walk yields a probability distribution describing the probability that each vertex in the graph is visited; this distribution is used as the input of the next walk, and the process is iterated repeatedly. When certain preconditions are met, the probability distribution converges; after convergence, a stationary probability distribution is obtained.
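The walk-and-jump iteration described above is the classic power iteration with restart; a generic sketch (the restart probability a and the row-normalization handling of dangling vertices are illustrative choices, not from the patent):

```python
# Generic power-iteration sketch of the walk described above: with
# probability (1 - a) follow an edge, with probability a jump uniformly.
# The restart probability and dangling-vertex handling are illustrative.
import numpy as np

def stationary_distribution(adj, a=0.15, iters=200):
    """adj: adjacency matrix; returns the converged visit probabilities."""
    A = adj.astype(float)
    n = A.shape[0]
    deg = A.sum(axis=1)[:, None]
    # Row-normalize; vertices with no outgoing edge jump uniformly.
    T = np.where(deg > 0, A / np.where(deg > 0, deg, 1.0), 1.0 / n)
    p = np.full(n, 1.0 / n)
    for _ in range(iters):
        p = (1 - a) * (p @ T) + a / n     # walk step plus uniform jump
    return p
```

Each iteration preserves the total probability mass of 1, so the returned vector is a proper distribution over vertices.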
In the ECSA algorithm, the random walk module works as follows: a random number is drawn in (0, 1/(n+1)), and points whose diagonal imaginary part is greater than or equal to this value are placed into the point set (if two selected points would form an edge, the one with the larger PL value is kept), in descending order of imaginary part.
In another aspect of the present invention, a task scheduling apparatus for a heterogeneous computing system is provided, including:
an obtaining unit, configured to obtain a directed acyclic graph corresponding to a target heterogeneous computing system,
the device comprises an initialization unit, a scheduling unit and a task node queue, wherein the initialization unit is used for determining an initial scheduling scheme and the task node queue corresponding to the directed acyclic graph, determining a probability distribution matrix based on the task node queue and the topological relation thereof, and constructing a topological feasible point queue based on the topological relation of the task node queue;
the edge coverage queue iteration unit is used for generating and outputting a target scheduling scheme, wherein the target scheduling scheme is generated as follows: S3, based on a random walk algorithm, taking a random number within a preset range, and determining a plurality of designated points in the probability distribution matrix based on the random number to obtain a designated point set; converting the topologically feasible point queue into an edge coverage queue based on the designated point set; S4, performing simulated scheduling on the edge coverage queue based on a probability distribution algorithm, and updating the current probability distribution matrix; and S5, repeating steps S3 to S4 until a preset number of times is reached, obtaining a target probability distribution matrix, and generating an edge coverage queue and the target scheduling scheme based on the target probability distribution matrix.
Specifically, the edge coverage queue iteration unit includes: an edge coverage queue generation module, a probability distribution module (EDA), and a random walk module.
1. The edge coverage queue generation module is divided into two sub-modules: a point queue-edge coverage queue conversion module and a de-continuization module.
1.1 Point queue-edge coverage queue conversion module:
The point queue-edge coverage queue conversion module receives a topologically feasible point queue π and a designated point queue [n_i]. Starting from the second node of π, each node π_i becomes (pred(π_i), π_i) and is placed into an explicit queue. If π_i is a member of the designated point queue [n_i], every succ(π_i) becomes (π_i, succ(π_i)) and is placed into the explicit queue; if not, the pred(π_i) with the largest edge weight is selected, and the remaining (pred(π_i), π_i) are placed into an implicit queue. Finally, the resulting explicit queue is the output of the module.
1.2 De-continuization module:
In the edge coverage queue generated by the point queue-edge coverage queue conversion module, if n_i needs to become a non-consecutive node, find (pred(n_i), n_i) and check whether deleting it would produce a logic error; delete it if no topology error results.
2. The probability distribution module (EDA) is divided into an EDA-exchange module that guides topological relation changes and an EDA-update module that guides updates of the S matrix.
2.1 EDA-exchange module:
All edges (n_i, succ(n_i)) are taken out and topologically sorted, then placed into an empty queue in turn. During placement, the position of each edge is determined by a roulette-wheel selection over the corresponding succ(n_i) entries in the S matrix, yielding (n_i, succ(1)), (n_i, succ(2)), …, and the edges are backfilled into the explicit queue in this order. If a backfill position would cause a topological error, the edge is placed at the head of the queue and the next edge is selected for backfilling. If the last [n_i] queue is not empty, the position is backfilled to the initial position.
2.2 EDA-update module:
The S matrix is updated according to Δt and the topological relation of succ(n_i) between the new queue and the old queue. For n_j ∈ succ(n_i) and n_k ∈ succ(n_i), if task n_j precedes n_k, the update is:
[Update formulas given in equation images BDA0003973599660000121 and BDA0003973599660000122; the formulas are not recoverable from the text.]
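Since the exact update formulas survive only as images, the following is a speculative sketch of an EDA-style update consistent with the surrounding description: pairwise successor-order probabilities are nudged toward the observed order with the learning rate k = 0.005 mentioned in the experiments. The function name, matrix layout, and the update rule itself are all assumptions:

```python
def eda_update(S, order, succ, k=0.005):
    """S: dict (j, m) -> probability that n_j precedes n_m;
    order: node -> position in the new queue; succ: node -> successor list."""
    for i, successors in succ.items():
        for j in successors:
            for m in successors:
                if m == j:
                    continue
                # reinforce the ordering observed in the new queue
                target = 1.0 if order[j] < order[m] else 0.0
                S[(j, m)] = (1 - k) * S.get((j, m), 0.5) + k * target
    return S
```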
3. Random walk module
A graph random walk algorithm traverses a graph starting from one vertex or a series of vertices. At any vertex, the walker moves to a neighboring vertex of that vertex with probability (1 − a) and randomly jumps to any vertex in the graph with probability a. Each walk yields a probability distribution describing the probability of each vertex in the graph being visited; this distribution is used as the input of the next walk, and the process is iterated repeatedly. When certain preconditions are met, this probability distribution tends to converge; after convergence, a stationary probability distribution is obtained.
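The random walk with restart described above can be sketched as follows; the function name and the use of NumPy are illustrative assumptions, not part of the patent:

```python
import numpy as np

def stationary_distribution(adj, a=0.15, tol=1e-12, max_iter=10000):
    """adj[i][j] = 1 if there is an edge i -> j; returns the visit probabilities."""
    n = adj.shape[0]
    out_deg = adj.sum(axis=1, keepdims=True)
    # Row-stochastic step matrix; vertices with no out-edges jump uniformly.
    P = np.divide(adj, out_deg, out=np.full(adj.shape, 1.0 / n), where=out_deg > 0)
    p = np.full(n, 1.0 / n)                      # initial distribution
    for _ in range(max_iter):
        p_next = (1 - a) * (P.T @ p) + a / n     # neighbor step + random jump
        if np.abs(p_next - p).sum() < tol:       # converged to the fixed point
            return p_next
        p = p_next
    return p
```

The restart probability `a = 0.15` is a conventional choice for such walks, not a value taken from the patent.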
In the ECSA algorithm, the contents of the random walk module are as follows:
A random number random(0, 1/(n + 1)) is drawn; points whose diagonal imaginary part is greater than or equal to this value are placed into the point set (if two selected points can form an edge, the one with the greater PL value is selected), ordered by decreasing imaginary part.
Together, these modules form the edge coverage queue iteration unit, in which the S matrix guides the generation of a new edge coverage queue. The specific steps are as follows:
(1) An initial queue and a specified scheduling algorithm are received.
(2) Using the random walk, a random number is drawn in a preset range; points whose diagonal elements of the S matrix have imaginary parts greater than or equal to this value are placed into the point set (if two selected points can form an edge, the one with the greater PL value is selected and the other is deleted), ordered by decreasing imaginary part.
(3) The initial point queue and the point set are passed to the edge coverage queue generation module to obtain an edge coverage queue.
(4) The edge coverage queue and the point set are passed to the de-continuity module to obtain a new edge coverage queue.
(5) Under the guidance of the real part of the S matrix, the local EDA module successively changes the topological relation of succ(n_i) for each point n_i, and the real part of the S matrix is updated according to the scheduling result.
(6) The imaginary parts of all the points in the set of points are updated and normalized.
(7) Steps (2), (3), (4), (5) and (6) are repeated for the preset number of iterations (formula given in equation image BDA0003973599660000131), and the scheduling result is output.
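The steps above can be sketched as one outer loop; every helper passed in via `modules` is a placeholder for the modules described above, and the complex-matrix layout (a list of lists of Python complex numbers) is an assumption:

```python
import random

def ecsa_iterate(pi, S, n, n_iter, modules):
    """pi: initial point queue; S: n x n complex matrix (list of lists);
    modules: the five callables sketched above (placeholders, not a real API)."""
    vertex_to_line, loss_continuity, eda_exchange, schedule, update_S = modules
    line = None
    for _ in range(n_iter):
        r = random.uniform(0, 1.0 / (n + 1))          # step (2): random threshold
        pts = [v for v in range(n) if S[v][v].imag >= r]
        pts.sort(key=lambda v: S[v][v].imag, reverse=True)
        line = vertex_to_line(pi, pts)                 # step (3)
        line = loss_continuity(line, pts)              # step (4)
        line = eda_exchange(line, pts, S)              # step (5): local EDA
        S = update_S(S, schedule(line))                # steps (5)-(6): update S
    return line, S
```

The PL-value tie-break inside the point-set selection is omitted here for brevity.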
Accordingly, the task scheduling apparatus for a heterogeneous computing system provided in the embodiment of the present invention takes the communication matrix and the heterogeneous computation matrix of the DAG as inputs and a scheduling scheme of the DAG as output; the workflow of the scheduling method is shown in fig. 2.
In summary, the embodiment of the present invention applies the edge coverage queue to a task scheduling model modeled as a directed acyclic graph, and uses the probability distribution algorithm and the random walk algorithm to generate a suitable edge coverage queue and thereby obtain a better scheduling result. Theory and experiments show that the scheduling length can be effectively reduced on random task graphs and on common engineering program graphs (FFT, GE).
Example 3
In a further embodiment of the present invention, a DAG graph corresponding to a certain heterogeneous computing system and a task node queue thereof shown in fig. 3 are taken as an example to illustrate a flow of the ECSA scheduling algorithm according to the embodiment of the present invention, including:
1. The adjacency matrix A of the DAG graph is obtained, and the longest path length is calculated to be 4; the topology matrix P = A^1 + A^2 + … + A^4 is then computed, yielding the topology matrix corresponding to the DAG;
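Step 1 can be sketched as follows: computing powers of the adjacency matrix until they vanish yields both the longest path length L and the topology matrix P = A^1 + … + A^L (the function name and NumPy usage are illustrative assumptions):

```python
import numpy as np

def topology_matrix(A):
    """A: adjacency matrix of a DAG. Returns (P, L) with P = A^1 + ... + A^L,
    where (A^k)[i][j] counts paths i -> j of length k and L is the longest path."""
    n = A.shape[0]
    P = np.zeros_like(A)
    Ak = np.eye(n, dtype=A.dtype)
    L = 0
    for k in range(1, n + 1):
        Ak = Ak @ A
        if not Ak.any():      # no path of length k exists in an acyclic graph
            break
        P = P + Ak
        L = k
    return P, L
```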
2. The DAG is scheduled by a heuristic greedy method using either the Fastest Time (FT) rule or the OEFT rule, and the better of the two greedy methods (FT or OEFT) is selected; in this example the OEFT greedy method performs better, so ECSA uses OEFT as its heuristic greedy method, and the initial point queue is [1,4,6,2,3,5,8,7,9,10];
3. initializing a probability matrix S:
The PL values are calculated to obtain PLS = [1,2,7]; the initial point queue [1,4,6,2,3,5,8,7,9,10] and [1] are passed into the vertex-to-line and loss-continuity functions to obtain: Line2 = [(1,4), (1,6), (1,2), (1,3), (1,5), (4,8), (3,7), (5,9), (8,10)];
EDA is performed on Line2 and Line1 to obtain Line3; Line3 is scheduled and the real and imaginary parts of the S matrix are updated; the same method is applied to [3] and [7], at which point the S matrix is initialized.
4. A random walk draws random(0, 1/(n + 1)); points whose diagonal imaginary part is larger than this value are put into the point set V_1, and the points of V_1 then fill V_2 according to the real parts of their corresponding diagonal elements in the S matrix. For example, in this iteration, V_1 = [1], V_2 = [1]:
Line1 = vertex-to-line(π, V_1);
Line2 = loss-continuity(Line1, V_2);
Line3 = EDA-exchange(Line2, V_1, S);
The above steps are iterated for the preset number of times (formula given in equation image BDA0003973599660000141). The output Line3 of the last iteration is the edge coverage queue obtained by ECSA; the edge coverage queue obtained this time is: [(1,3), (1,2), (1,4), (1,6), (1,5), (4,8), (3,7), (5,9), (8,10)]; the final scheduling result is shown in fig. 4.
Further, to evaluate the relative performance of the ECSA provided by the present invention, DAG graphs modeling heterogeneous computing systems are generated with the DAG generator TGFF (a DAG generation tool), which exposes many parameters related to the task graph structure, such as the average and lower limit of the number of nodes in the graph and the maximum in-degree and out-degree of graph nodes. Based on preset parameters, 1920 groups of different DAGs are generated randomly; in each group, 20 different random graphs with different communication-cost edges and computation-cost tasks are generated, so 38400 random DAGs are used in this study.
In ECSA, the learning rate k in the EDA and graph random walk updates is 0.005. To evaluate the performance of ECSA, we compared it with four algorithms (EFT, HEFT, PEFT, and LO), all coded in MATLAB R2021b and run on the same computer with an AMD 5800X at 3.8 GHz and 32 GB RAM; the results are shown in figures 5-7.
Fig. 5 and fig. 6 (in which the ordinate is the maximum completion time, m is the number of processors of the heterogeneous computing system, and n is the number of tasks) show the maximum completion time as a function of the DAG size n, the processor count m, and the CCR. As the DAG size gradually increases, ECSA effectively reduces the maximum completion time of the DAG graph; the greater the number of tasks, the larger the reduction. HEFT and PEFT show similar performance on random DAGs; EFT is the simplest greedy algorithm and performs worst; the LO algorithm is designed for DAGs of a particular structure and, despite its higher complexity, performs similarly to HEFT and PEFT when the DAG is small but worse when the DAG is large. Overall, across all random graphs, ECSA reduced the maximum completion time by 8% compared with EFT, 5% compared with HEFT, 4.3% compared with PEFT, and 6.2% compared with the LO algorithm.
The maximum completion time is the most direct objective in task scheduling, and the SLR is a parameter that better reflects the performance of a scheduling algorithm. As can be seen from fig. 7 (EFT, LO, HEFT, PEFT, and ECSA from top to bottom), the scheduling scheme given by ECSA has a lower SLR than the other scheduling algorithms; as the DAG size increases, the SLR increases, while ECSA reduces the SLR more:
SLR = makespan / CP_MIN
wherein makespan is the scheduling length of the scheduling scheme and CP_MIN is the shortest critical path of the DAG graph.
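The SLR above is direct to compute once CP_MIN is known; the sketch below takes CP_MIN as the critical-path length with each task at its minimum computation cost over all processors, which is the conventional definition of the schedule length ratio (the patent's exact CP_MIN definition may differ, and all names are illustrative):

```python
import functools

def cp_min(succ, cost, entry):
    """cost[v]: minimum computation time of task v over all processors.
    Returns the longest cost-weighted path from the entry task."""
    @functools.lru_cache(maxsize=None)
    def longest(v):
        return cost[v] + max((longest(s) for s in succ[v]), default=0)
    return longest(entry)

def slr(makespan, succ, cost, entry):
    """Schedule length ratio: makespan normalized by the critical-path bound."""
    return makespan / cp_min(succ, cost, entry)
```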
In summary, it can be seen from the experimental results that the ECSA has significantly improved performance compared to the conventional scheduling algorithm.
Example 4
In another aspect of the present invention, as shown in fig. 8, there is also provided an electronic device, including a processor, a network interface, and a memory, where the processor, the network interface, and the memory are connected to each other, where the memory is used to store a computer program, and the computer program includes program instructions, and the processor is configured to call the program instructions to execute the above task scheduling method for a heterogeneous computing system.
In an embodiment of the invention, the processor may be an integrated circuit chip having signal processing capabilities. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The various methods, steps, and logic blocks disclosed in the embodiments of the present invention may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be carried out directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The processor reads the information in the storage medium and completes the steps of the method in combination with its hardware.
In another aspect of the present invention, a computer storage medium is further provided, where program instructions are stored in the computer storage medium, and when the program instructions are executed by at least one processor, the program instructions are used to implement the above-mentioned task scheduling method for a heterogeneous computing system.
In one possible implementation, the storage medium may be a memory, for example, may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
The nonvolatile Memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash Memory.
The volatile memory may be a Random Access Memory (RAM), which serves as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).
The storage media described in connection with the embodiments of the invention are intended to comprise, without being limited to, these and any other suitable types of memory.
It should be understood that the disclosed system may be implemented in other ways. For example, the described division into modules is only a logical functional division and may be implemented differently in practice; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the communication connections between the modules may be indirect couplings or communication connections between servers or units through interfaces, and may be electrical or take other forms.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one processing unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A task scheduling method for a heterogeneous computing system is characterized by comprising the following steps:
s1, obtaining a directed acyclic graph corresponding to a target heterogeneous computing system,
s2, determining an initial scheduling scheme and a task node queue corresponding to the directed acyclic graph, determining a probability distribution matrix based on the task node queue and a topological relation thereof, and constructing a topological feasible point queue based on the topological relation of the task node queue;
s3, based on a random walk algorithm, taking any random number in a preset range, and determining a plurality of designated points in the probability distribution matrix based on the random number to obtain a designated point set; converting the topology feasible point queue to a side-covered queue based on the specified set of points;
s4, performing simulated scheduling on the edge coverage queue based on a probability distribution algorithm, and updating a current probability distribution matrix;
and S5, repeating the steps S3 to S4 until the preset times are reached, obtaining a target probability distribution matrix and a side coverage queue thereof, and further generating a target scheduling scheme.
2. The heterogeneous computing system-oriented task scheduling method according to claim 1, wherein the preset range is:
[0,1/(n+1)]
and n is the total number of nodes in the task node queue.
3. The heterogeneous computing system-oriented task scheduling method of claim 1, wherein the determining a plurality of specified points in the probability distribution matrix based on the random number comprises:
traversing all nodes in the probability distribution matrix, and judging whether the imaginary part of the diagonal element corresponding to each node exceeds the random number; if so, the node is a designated point, and if not, the node is not a designated point.
4. The heterogeneous computing system-oriented task scheduling method of claim 1, wherein the converting the topologically feasible point queue into an edge coverage queue based on the designated point set comprises:
traversing each node π_i starting from the second node of the topologically feasible point queue, and judging whether π_i is a node in the designated point set; if so, converting π_i and its successor nodes succ(π_i) into edges (π_i, succ(π_i)) and placing them into the edge coverage queue; if not, selecting the target predecessor node pred(π_i) corresponding to node π_i and adding (pred(π_i), π_i) to the edge coverage queue; wherein the target predecessor node is the predecessor node with the largest communication cost with respect to the current node.
5. The heterogeneous computing system-oriented task scheduling method of claim 4, further comprising: a de-continuity step after converting the topologically feasible point queue into an edge coverage queue based on the designated point set;
the de-continuity step includes: based on the designated point set [m_i], finding the edge (pred(m_i), m_i) in the edge coverage queue and checking whether a logic error would be produced after deleting the edge; if not, deleting the edge, and if so, keeping it.
6. The task scheduling method for the heterogeneous computing system according to claim 1, wherein in S2, determining a probability distribution matrix based on the task node queues and topological relationships thereof includes:
initializing a complex matrix, and assigning values to the elements of the complex matrix based on the task node queue and the topological relations thereof; calculating and sorting the PL value of each node in the task node queue; selecting, in descending order of PL value, the top (number given by the formula shown in equation image FDA0003973599650000021) nodes to form a point set PL_S; wherein n is the total number of nodes in the task node queue, and PL is the heterogeneity variance of the task on the processors;
and performing probability distribution iteration on the complex matrix based on the point set PL_S until the preset number of times is reached, generating the probability distribution matrix.
7. The heterogeneous computing system-oriented task scheduling method according to claim 1 or 6, wherein the preset number of times is given by the formula shown in equation image FDA0003973599650000022, where n is the total number of nodes in the task node queue.
8. A task scheduling apparatus for a heterogeneous computing system, comprising:
an obtaining unit, configured to obtain a directed acyclic graph corresponding to a target heterogeneous computing system;
an initialization unit, configured to determine an initial scheduling scheme and a task node queue corresponding to the directed acyclic graph, determine a probability distribution matrix based on the task node queue and the topological relations thereof, and construct a topologically feasible point queue based on the topological relations of the task node queue;
and an edge coverage queue iteration unit, configured to generate and output a target scheduling scheme, wherein the target scheduling scheme is generated by the following method: S3, based on a random walk algorithm, taking a random number in a preset range, and determining a plurality of designated points in the probability distribution matrix based on the random number to obtain a designated point set; converting the topologically feasible point queue into an edge coverage queue based on the designated point set; S4, performing simulated scheduling on the edge coverage queue based on a probability distribution algorithm, and updating the current probability distribution matrix; and S5, repeating steps S3 to S4 until a preset number of times is reached, obtaining a target probability distribution matrix and its edge coverage queue, and thereby generating a target scheduling scheme.
9. An electronic device, comprising a processor, a network interface, and a memory, the processor, the network interface, and the memory being interconnected, wherein the memory is configured to store a computer program, the computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the heterogeneous computing system-oriented task scheduling method of any of claims 1-7.
10. A computer-readable storage medium, having stored thereon program instructions, which when executed by at least one processor, are configured to implement the heterogeneous computing system-oriented task scheduling method of any one of claims 1-7.
CN202211520501.3A 2022-11-30 2022-11-30 Task scheduling method, device, equipment and medium for heterogeneous computing system Pending CN115756803A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211520501.3A CN115756803A (en) 2022-11-30 2022-11-30 Task scheduling method, device, equipment and medium for heterogeneous computing system

Publications (1)

Publication Number Publication Date
CN115756803A true CN115756803A (en) 2023-03-07

Family

ID=85341186

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211520501.3A Pending CN115756803A (en) 2022-11-30 2022-11-30 Task scheduling method, device, equipment and medium for heterogeneous computing system

Country Status (1)

Country Link
CN (1) CN115756803A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117251380A (en) * 2023-11-10 2023-12-19 中国人民解放军国防科技大学 Priority asynchronous scheduling method and system for monotone flow chart
CN117251380B (en) * 2023-11-10 2024-03-19 中国人民解放军国防科技大学 Priority asynchronous scheduling method and system for monotone flow chart


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination