CN110363700A

CN110363700A - A kind of custom instruction parallel enumerating method based on depth map segmentation

Info

Publication number: CN110363700A
Application number: CN201910627526.5A
Authority: CN
Inventors: 肖成龙; 王珊珊; 王心霖
Original assignee: Liaoning Technical University
Current assignee: Liaoning Technical University
Priority date: 2019-07-12
Filing date: 2019-07-12
Publication date: 2019-10-22

Abstract

The present invention provides a kind of custom instruction parallel enumerating method based on depth map segmentation, is related to EDA Technique field.This method uses Master-slave parallel mode first, and the data flow diagram that the host node in computing cluster receives the intermediate representation generation phase generation that dedicated custom instruction collection automatically generates process is used as input；Then initial data flow graph is divided by several subgraphs using the depth image segmentation method based on nonlinear regression Runtime prediction model, and the subgraph after segmentation is distributed to the idle calculate node in computing cluster；The runing time of the subtask of segmentation is predicted simultaneously, according to the number of the predicted time of all subtasks and calculate node, judges whether to need to continue to be split complicated subtask；Calculate node enumerates custom instruction from the subgraph received using convex portion enumeration of graph algorithm.The method of the present invention can more effectively guarantee the load balancing between calculate node, reach the speed-up ratio of approximately linear.

Description

A kind of custom instruction parallel enumerating method based on depth map segmentation

Technical field

The present invention relates to EDA Technique field more particularly to a kind of customized fingers based on depth map segmentation Enable parallel enumerating method.

Background technique

In order to meet Embedded Application to high-performance and low-power consumption increasing need, by using accelerator or make by oneself The custom calculation of adopted functional unit (Custom Function Unit) is applied in embedded system more and more.Its In, application specific processor is one of the most important scheme for realizing customization operation.

Application specific processor is the processor of a kind of framework and instruction set optimization design, by expansion instruction set, so that mesh The partial code for marking application program is executed in reference processor, and other computation-intensive codes are real in the hardware of custom instruction It is executed in existing-custom feature unit.Application specific chip is designed for some specific application, can only be run and specifically be answered With, therefore lack certain flexibility.Reference processor relative to application specific chip (ASIC), in application specific processor framework It can guarantee certain flexibility.Have the design cycle long high with testing cost secondly, designing and producing application specific chip Disadvantage.

Relative to general processor, application specific processor encapsulates a series of basic operations by using custom instruction and instructs (for example, addition, subtraction, multiplication and logical operation), so that between these basic operations instruction certainly according to data dependence relation Dynamic chain, the not elementary instruction parallelization of data dependence relation, to largely improve the speed of operation.In addition, by It is encapsulated into a custom instruction in multiple elementary instructions, so that instruction fetch number and data are between register and processor The number of transmission is reduced, and then the power consumption of application specific processor is substantially less than general processor.

Automatically generating for expansion instruction set is the key that application specific processor design is realized.The application specific of application specific processor expands Automatically generating for instruction set of exhibition generally comprises four steps as shown in Figure 1: intermediate representation generates, and custom instruction is enumerated, made by oneself Justice instruction selection, code building.Application program is converted to suitable intermediate representation by intermediate representation generation phase, for example, control Data flow diagram；Custom instruction enumeration stage enumerates all subgraphs for meeting constraint condition as candidate's under framework constraint Custom instruction；The custom instruction choice phase is according to different purposes of design from the subgraph (figure of custom instruction enumerated Shapeization indicates) in select a part of best subgraph as final custom instruction.These custom instructions selected just are constituted Final expansion instruction set.Code generation phase is responsible for automatically generating the hardware realization code of custom instruction, and by source Code conversion is the fresh code comprising custom instruction.

It is the most key during expansion instruction set automatically generates and multiple that custom instruction, which is enumerated with custom instruction selection, Two miscellaneous stages.Aspect is enumerated in custom instruction and has a lot of research work, these research work are all using serial Method, however, when the problem is large in scale, serial approach possibly can not provide within reasonable time optimization design scheme or Optimization design scheme cannot be provided.Custom instruction enumerate problem be enumerated from the corresponding data flow diagram of application program it is all Meet the convex portion figure of certain design constraint or user's constraint as candidate custom instruction.Son in one given data flow diagram Figure (AS) number may at most have 2n, and wherein n is the number of node in data flow diagram.It can be seen that custom instruction piece Act is the very high problem of algorithm complexity.In order to reduce the complexity of problem, research or introducing microarchitecture before Constraint or some constraint conditions are artificially added.According to the difference of constraint condition, research before can classify and divide as follows Analysis:

(1) tree-like subgraph (TS): in order to reduce the complexity enumerated, the research of early stage, which focuses primarily upon, enumerates all trees Shape subgraph.However, only enumerate tree-like subgraph as custom instruction, very limited ground improving performance or power consumption can only be reduced.

(2) multiple input single output subgraph (MISO): such research is focused on enumerating all single defeated with multiple inputs Subgraph out.Although the complexity of problem, multi input can largely be reduced by only enumerating multiple input single output subgraph Single output subgraph is as the performance boost of custom instruction bring or lower power consumption or very limited.

(3) multiple-input and multiple-output subgraph (MIMO): due to the Limited Number (I/O) of the reading-writing port of register, recent Some researchs are mainly focused on enumerating the subgraph for meeting I/O limitation.Wherein Pozzi et al. is proposed based on binary decision tree Enumeration, the algorithm cut down search using the monotonicity for the output number for forming subgraph in data flow diagram by topological order Space.In order to more preferably utilize the topological structure characteristic of data flow diagram, Xiao et al. proposes more effective algorithm.Xiao et al. is mentioned Go out based on disposable figure dividing method custom instruction parallel enumerating method, when calculate node is less, the method can be obtained Approximately linear speed-up ratio.The upper limit that Chen et al. demonstrates the subgraph number for meeting I/O condition for the first time is n^IN+OUT, wherein n is number According to the number of node in flow graph, while it also proposed one and can enumerate respectively connected subgraph and separation by parameter setting The algorithm of subgraph.However, this algorithm is in the runing time enumerated on all subgraphs (including connected subgraph and separation subgraph) It is suitable with the runing time of algorithm that Pozzi et al. is proposed.Xiao et al. proposes one and can be required neatly according to user It enumerates connected subgraph or separates the algorithm of subgraph.Experiment show the algorithm enumerate the runing time on all subgraphs than Fast one to two orders of magnitude of algorithm that Chen et al. is proposed.

(4) maximum convex portion figure (MaxMIMO): being found through experiments that, enumerates in the case where loosening I/O limitation customized Instruction, tends to bring higher performance boost.Therefore, in recent years, also there is part research to concentrate on and enumerate maximum convex portion figure, Limitation without considering I/O.It should be noted that although MaxMIMO subgraph can bring higher property as custom instruction It can be promoted, but it does not have good reusability or versatility usually.

In addition to the research that above type of custom instruction is enumerated, in recent years, also there are some scholars to propose relevant algorithm To enumerate all convex portion figures (ACS).The upper limit that Gutin et al. demonstrates the number of the convex portion figure in data flow diagram for the first time is 2ⁿ+n+1-d_n, wherein n is the number of network nodes in data flow diagram, if n is even number d_n=2*2^n/2If n is odd number d_n=3*2^(n-1)/2.Wang et al., which is proposed, can enumerate all convex portion figures or enumerate the algorithm for meeting the convex portion figure of size constraint, the calculation Method is fast 3.29 times average compared with the algorithm that Balistera et al. is proposed.Wang et al. also proposed a kind of based on disposable figure simultaneously The subgraph parallel enumerating method of segmentation, the experimental results showed that, when calculate node number is less, this is parallel relative to serial approach Method can reach the speed-up ratio of approximately linear.Increasing however as calculate node number, the problem of load imbalance, gradually shows, It is in downward trend that this method, which obtains speed-up ratio,.Rezgui et al. proposes a kind of enough by the way that initial problem to be decomposed into Subproblem guarantees the method for parallel processing of the load balancing between calculate node.The author is usually each by many experiments discovery It, being capable of preferably proof load equilibrium when the subproblem number of calculate node distribution is between 30 to 100.However, due to each Calculate node distribute suitable subproblem number be it is very doubt, this method still may cause the load between calculate node It is unbalanced, to influence the efficiency of parallel processing.

Summary of the invention

It is a kind of based on depth map segmentation the technical problem to be solved by the present invention is in view of the above shortcomings of the prior art, provide Custom instruction parallel enumerating method, using Master-slave parallel mode realize custom instruction parallel enumerating.

In order to solve the above technical problems, the technical solution used in the present invention is: a kind of making by oneself based on depth map segmentation Adopted parallel instructions enumeration methodology, comprising the following steps:

Step 1, using Master-slave parallel mode, the host node in computing cluster receives dedicated custom instruction collection and automatically generates The data flow diagram that the intermediate representation generation phase of process generates is as input；

The data flow diagram is a kind of directed acyclic graph G=(V, E), wherein nodal set V={ v₁..., v_nIndicate basic Instruction, n are the number of data flow diagram node, side collection E={ e₁..., e_m∈ V × V indicate instruction between data dependence relation, m Indicate the number on data flow diagram side；

Step 2, using the depth image segmentation method based on nonlinear regression Runtime prediction model by original number Several subgraphs are divided into according to flow graph, and the subgraph after segmentation is distributed into the idle calculate node in computing cluster, specific method Are as follows:

Step 2.1, custom instruction is enumerated task T progress be disposably divided into k subtask, shown in following formula:

Wherein, G_k=G- { v₁, v₂..., v_k-1It is k-th of subgraph generating after data flow diagram G segmentation, and k=1,2 ..., | V |, | v | for the nodal point number in figure G, E (G_k, v_k) indicate from k-th of subgraph G_kIn enumerate it is all comprising node v_kCustomized finger It enables；

Step 2.2 establishes the Runtime prediction model based on nonlinear regression, the operation to the subtask of segmentation Time is predicted；

When constraint condition is certain, custom instruction enumerate the time in figure nodal point number and number of edges it is related, make by oneself Justice instruction is enumerated the time and is increased with the increase of nodal point number or number of edges；The subgraph given for one enumerates task T_kAnd its it is right The data flow diagram G answered_k(V_k, E_k), the worst runing time that custom instruction is enumerated isWherein α is constant, | V_k | for figure G_kIn nodal point number；There are many extensive forms of the truth of a matter in formula, it is assumed here that:

Wherein, f (| V_k|, | E_k|) be nodal point number | V_k| and number of edges | E_k| polynomial function；

Taylor series expansion is carried out to formula (2) and obtains Runtime prediction model, shown in following formula:

Wherein, parameter k ' is used for the expansion of Controlling model, parameter a_{I, j}By using the method for nonlinear regression, according to reality Data fitting is tested to obtain；

Step 2.3, according to the predicted time of all subtasks and the number of calculate node, judge whether to need to continue to multiple Miscellaneous subtask is split；If the prediction runing time of subtask is more than given time upper limit, which is continued Several subtasks are divided into, until the prediction runing time of all subtasks is less than or equal to given time upper limit, are otherwise held Row step 3；The given time upper limit is the consensus forecast runing time of current all subtasks；

Assuming that subtask T_kPrediction runing time be greater than given time upper limit, to T_kContinue to divide, following formula It is shown:

Wherein, G_{K, l}=G_k-{w₁, w₂..., w_l-1It is subtask T_kFirst of the subgraph generated after segmentation, E (G_{K, l}, { v_k, w_l) indicate from subgraph G_{K, l}It enumerates comprising node v_kWith node w_lAll custom instructions；H is as cutoff value, T_{K, h-1}Prediction Runing time is greater than current consensus forecast runing time, T_{K, h}Prediction runing time be less than current consensus forecast runing time；Such as Fruit task T_{K, l}Prediction runing time upper limit value is still greater than, then continue T_{K, l}It is divided into several subtasks；

Step 3, calculate node enumerate custom instruction from the subgraph received using convex portion enumeration of graph algorithm；

It is described to enumerate custom instruction are as follows: to be enumerated from a given data flow diagram G=(V, E) and meet the following conditions All subgraph S:(1) subgraph S is convex portion figure；(2) subgraph S is connected graph；

The convex portion figure are as follows: for the subgraph S of G,v∈V_sIf all only passing through S in any path in G between u and v In node, then claiming S is the convex portion figure of G；

The connected graph are as follows: for the subgraph S of G,v∈V_s, there are at least one paths to connect u and v, then S is connection Figure.

The beneficial effects of adopting the technical scheme are that provided by the invention a kind of based on depth map segmentation A kind of custom instruction parallel enumerating method, it is contemplated that custom instruction enumerates the high complexity of problem, is transported using task based access control Initial problem is divided into several subproblems by the depth image segmentation method of row time prediction model, and is distinguished by calculate node only On the spot solve subproblem.Compared with existing parallel method, the method for the present invention can more effectively guarantee negative between calculate node Equilibrium is carried, the speed-up ratio of approximately linear is reached.

Detailed description of the invention

Fig. 1 is that the application specific expansion instruction set that background of invention provides automatically generates flow chart；

Fig. 2 is a kind of stream of the custom instruction parallel enumerating method based on depth map segmentation provided in an embodiment of the present invention Cheng Tu；

Fig. 3 is a kind of frame of the custom instruction parallel enumerating method based on depth map segmentation provided in an embodiment of the present invention Frame figure；

Fig. 4 is the speed-up ratio pair provided in an embodiment of the present invention for being directed to the acquirement of four test benchmark programs, three kinds of parallel methods Than figure, wherein (a) is benchmark program MP3, (b) is benchmark program MESA, (c) is benchmark program IIR, (d) is benchmark program DES3；

Fig. 5 is that the total run time provided in an embodiment of the present invention for using three kinds of each calculate nodes of parallel method compares, Wherein, (a) is MGP parallel method, (b) is EPS parallel method, (c) is ODP parallel method；

Fig. 6 is by the time provided in an embodiment of the present invention predicted using runing time prediction model of the invention and reality The comparison figure of runing time.

Specific embodiment

With reference to the accompanying drawings and examples, specific embodiments of the present invention will be described in further detail.Implement below Example is not intended to limit the scope of the invention for illustrating the present invention.

In the present embodiment, a kind of custom instruction parallel enumerating method based on depth map segmentation, as shown in Figures 2 and 3, The following steps are included:

Step 2.2 can be seen that the subtask that the above segmentation generates, some tasks, cannot obviously compared with other tasks complexity Guarantee the load balancing between calculate node, need the prediction runing time according to subtask, select complicated subtask carry out into The segmentation of one step；Therefore, the Runtime prediction model based on nonlinear regression is established, when to the operation of the subtask of segmentation Between predicted；

Step 2.3, according to the predicted time of all subtasks and the number of calculate node, judge whether to need to continue to multiple Miscellaneous subtask is split；If the prediction runing time of subtask is more than given time upper limit, which is continued Several subtasks are divided into, until the prediction runing time of all subtasks is less than or equal to given time upper limit；It is described to give Fixed time upper limit is the consensus forecast runing time of current all subtasks；

Wherein, G_{K, l}=G_k-{w₁, w₂..., w_l-1It is subtask T_kFirst of the subgraph generated after segmentation, E (G_{K, l}, { v_k, w_l) indicate from subgraph G_{K, l}It enumerates comprising node v_kWith node w₁All custom instructions；H is as cutoff value, T_{K, m-1}Prediction Runing time is greater than current consensus forecast runing time, T_{K, n}Prediction runing time be less than current consensus forecast runing time；Such as Fruit task T_{K, l}Prediction runing time upper limit value is still greater than, then continue T_{K, l}It is divided into several subtasks；

The present embodiment gives a kind of pseudocode of custom instruction parallel enumerating method based on depth map segmentation, such as Shown in algorithm 1:

Algorithm 1: custom instruction parallel enumerating method

Input: data flow diagram G (V, E)

Output: meet the subgraph set S of constraint condition

The environment for the calculate node that the present embodiment uses is i3-3240 3.4GHz processor, 4-GB main memory.It surveys It tries benchmark set and derives from MiBench.These benchmark are to carry out front end compiling and simulation by generic compilation platform GeCoS.Implement The feature of data flow diagram used in example and the runing time of serial algorithm are shown in Table 1.NV, column NE are arranged in table and column NS is respectively indicated The number of node, the number on side and the subgraph number enumerated in data flow diagram.The operating time log of serial algorithm is arranging Runtime (unit is millisecond).

1 benchmark program feature of table and serial algorithm runing time

Benchmark program	NV	NE	NS	Runtime
					MP3	43	66	181,533,673	221,554
MESA	37	65	7,554,499	8,027
					IIR	40	56	23,195,414	28,725
DES3	45	60	637,125,710	649,873

In the present embodiment, custom instruction parallel enumerating method (being denoted as MGP) of the invention is mentioned with Wang et al. respectively Parallel enumerating method (being denoted as ODP) out and Rezgui et al.^]The parallel method (being denoted as EPS) of proposition compares.It is involved And the configuration of calculate node be i3-32403.4GHz processor, 4-GB main memory, three parallel methods use Hadoop1.0.0 is realized.Calculate node proposes that subgraph enumeration carries out subgraph and enumerates using Wang et al..Institute in the present embodiment The test benchmark program used is as shown in table 1.For parallel method EPS, the subtask number of each calculate node distribution is set as 50, Therefore, subtask sum is 50*w, and wherein w is the number of calculate node.

For four test benchmark programs in table 1, three parallel algorithms obtain acceleration under different calculate node numbers It is more as shown in Figure 4 than result.By comparing result it is observed that three parallel algorithms take when calculate node is less than or equal to 10 The speed-up ratio obtained is similar, is in the speed-up ratio of approximately linear.But obtained with the increase of calculate node number, MGP method Speed-up ratio is substantially better than other two parallel algorithms, and EPS method is better than ODP method.Wherein, acquired by EPS method and ODP method Speed-up ratio cannot with the increase of calculate node keep linear increase.Particularly with ODP method, when calculate node increases to 18 After, the growth of speed-up ratio tends towards stability, and no longer linearly increases with the increase of calculate node.Comparing result also illustrates, Treatment effect of the MGP method in problem of load balancing is better than EPS method and ODP method.

In order to further analyze the speed-up ratio difference of three of the above parallel method acquirement, compares and analyze in the present embodiment When three kinds of parallel methods enumerate custom instruction using same number calculate node, the total run time of each calculate node.? On the basis of test before this, the total run time of each calculate node is counted.For benchmark DES3, in calculate node In the case that number is 8, distinguished using the total run time of three kinds of each calculate nodes of parallel method as shown in Figure 5.It can observe To MGP method is used, the total run time between each calculate node differs smaller (maximum runing time and minimum runing time Difference be 13.1%), and to use EPS method or ODP method, the total run time difference between each calculate node is more apparent (difference of maximum runing time and minimum runing time is respectively 21.3% and 39.6%).By comparison, it was found that EPS method and There are larger differences for the subgraph size that ODP method generates, and wherein EPS method is primary concern is that the subgraph quantity generated after segmentation Without considering the problems of that subgraph size, ODP method are brighter using subgraph difference in size caused by disposable dividing method It is aobvious, and larger subgraph can be divided further into several compared with boy by MGP method under the guidance of Runtime prediction model Figure, therefore the subgraph size that this method generates is more balanced.Comparing result further illustrates that MGP method of the invention may make point The task of each calculate node of dispensing has similar complexity, it is ensured that the load balancing between calculate node, to obtain The speed-up ratio of approximately linear.

Meanwhile the present embodiment also evaluates runing time prediction model.It has been first randomly generated 200 data flows Figure, the nodal point number of these data flow diagram | V | range is 20~55, the number on side | E |=1.5* | V |.For each random generation Data flow diagram, be configured to i3-3240 3.4GHz processor at one, subgraph piece used on the computer of 4-GB main memory It lifts algorithm and enumerates all convex portion figures for meeting constraint condition, and record corresponding actual algorithm runing time.Then, using wherein 185 parts of actual data (nodal point number, number of edges, runing time) carry out the parameter in formula (3) as training sample polynary non- Linear regression fit.In order to evaluate Runtime prediction model, the data flow diagram generated at random for 15 transports task The time of row time prediction model prediction compares with actual run time.Subgraph enumerates Runtime prediction mould The prediction runing time that type provides is as shown in Figure 6 compared with actual run time.Comparing result shows, prediction runing time with Actual run time is very close, and error range is 3%~12%.

Extra time expense, the present embodiment detailed analysis are needed due to carrying out figure segmentation and the prediction of subtask runing time The ratio of parallel enumerating method total run time is accounted for the time required to the figure dividing processing of task based access control runing time prediction model.Needle To four different test benchmark programs, under conditions of calculate node number is 12, task based access control runing time prediction model The ratio that total run time is accounted for the time required to figure segmentation is as shown in table 2.Arrange Benchmark, column Pre.&Par.Time, column Total It is (unit is millisecond) the time required to Time and column Ratio respectively indicates test benchmark program name, figure segmentation and task prediction, total The ratio between runing time and the two (%).According to result, it can be seen that needed for the figure dividing processing of task based access control runing time prediction Time is averagely about the 3.82% of total run time.This also indicates that the figure segmentation side of task based access control time prediction model of the invention Method efficiency with higher, the time required to be considerably less than total run time, when will not influence parallel method significantly and always running Between.

The time required to the figure segmentation of 2 task based access control runing time prediction model of table compared with total run time

Finally, it should be noted that the above embodiments are merely illustrative of the technical solutions of the present invention, rather than its limitations；Although Present invention has been described in detail with reference to the aforementioned embodiments, those skilled in the art should understand that: it still may be used To modify to technical solution documented by previous embodiment, or some or all of the technical features are equal Replacement；And these are modified or replaceed, model defined by the claims in the present invention that it does not separate the essence of the corresponding technical solution It encloses.

Claims

1. a kind of custom instruction parallel enumerating method based on depth map segmentation, it is characterised in that: the following steps are included:

Step 1, using Master-slave parallel mode, the host node in computing cluster receives dedicated custom instruction collection and automatically generates process Intermediate representation generation phase generate data flow diagram as input；

The data flow diagram is a kind of directed acyclic graph G=(V, E), wherein nodal set V={ v₁..., v_nIndicate elementary instruction, N is the number of data flow diagram node, side collection E={ e₁..., e_m∈ V × V indicate instruction between data dependence relation, m indicate number According to the number on flow graph side；

Step 2, using the depth image segmentation method based on nonlinear regression Runtime prediction model by original data stream Figure is divided into several subgraphs, and the subgraph after segmentation is distributed to the idle calculate node in computing cluster, method particularly includes:

Wherein, G_k=G- { v₁, v₂..., v_k-1It is k-th of subgraph generating after data flow diagram G segmentation, and k=1,2 ..., | v |, | V | for the nodal point number in figure G, E (G_k, v_k) indicate from k-th of subgraph G_kIn enumerate it is all comprising node v_kCustom instruction；

Step 2.2 establishes the Runtime prediction model based on nonlinear regression, to the runing time of the subtask of segmentation It is predicted；

Step 2.3, according to the predicted time of all subtasks and the number of calculate node, judge whether to need to continue to complicated Subtask is split；If the prediction runing time of subtask is more than given time upper limit, which is continued to divide For several subtasks, until the prediction runing time of all subtasks is less than or equal to given time upper limit, step is otherwise executed Rapid 3；The given time upper limit is the consensus forecast runing time of current all subtasks；

It is described to enumerate custom instruction are as follows: the institute for meeting the following conditions is enumerated from a given data flow diagram G=(V, E) Subgraph S:(1) subgraph S is convex portion figure；(2) subgraph S is connected graph；

The convex portion figure are as follows: for the subgraph S of G,If in any path in G between u and v all only by S Node, then claiming S is the convex portion figure of G；

The connected graph are as follows: for the subgraph S of G,There are at least one paths to connect u and v, then S is connected graph.

2. a kind of custom instruction parallel enumerating method based on depth map segmentation according to claim 1, feature exist In: the step 2.2 method particularly includes:

When constraint condition is certain, custom instruction enumerate the time in figure nodal point number and number of edges it is related, customized finger Order is enumerated the time and is increased with the increase of nodal point number or number of edges；The subgraph given for one enumerates task T_kAnd its it is corresponding Data flow diagram G_k(V_k, E_k), the worst runing time that custom instruction is enumerated isWherein α is constant, | V_k| it is Scheme G_kIn nodal point number；There are many extensive forms of the truth of a matter in formula, it is assumed here that:

Wherein, parameter k ' is used for the expansion of Controlling model, parameter a_{I, j}By using the method for nonlinear regression, according to experiment number It is obtained according to fitting.

3. a kind of custom instruction parallel enumerating method based on depth map segmentation according to claim 2, feature exist In: the step 2.3 method particularly includes:

Assuming that subtask T_kPrediction runing time be greater than given time upper limit, to T_kContinue to divide, following formula institute Show:

Wherein, G_{K, l}=G_k-{w₁, w₂..., w_l-1It is subtask T_kFirst of the subgraph generated after segmentation, E (G_{K, l}, { v_k, w_l}) It indicates from subgraph G_{K, l}It enumerates comprising node v_kWith node w_lAll custom instructions；H is as cutoff value, T_{K, h-1}Prediction fortune The row time is greater than current consensus forecast runing time, T_{K, h}Prediction runing time be less than current consensus forecast runing time；If Subtask T_{K, l}Prediction runing time upper limit value is still greater than, then continue T_{K, l}It is divided into several subtasks.