CN109960573A - Cross-domain computing task scheduling method and system based on intelligent sensing

Info

Publication number: CN109960573A
Application number: CN201811643211.1A
Authority: CN
Inventors: 樊文昌; 云亚娇; 武新
Original assignee: TIANJIN NANKAI UNIVERSITY GENERAL DATA TECHNOLOGIES Co Ltd
Current assignee: TIANJIN NANKAI UNIVERSITY GENERAL DATA TECHNOLOGIES Co Ltd
Priority date: 2018-12-29
Filing date: 2018-12-29
Publication date: 2019-07-02
Anticipated expiration: 2038-12-29
Also published as: CN109960573B

Abstract

The present invention proposes a cross-domain computing task scheduling method and system based on intelligent sensing, comprising: step 1, training a decision tree model on labeled data; step 2, estimating the execution time of the computing task from its relative time complexity; step 3, predicting a resource trend indicator for each domain from resource history records using the ARIMA algorithm; step 4, obtaining the real-time resource status indicators of each domain through a resource status interface; step 5, estimating from the available bandwidth the time needed to migrate the data to each domain; step 6, deciding the optimal execution domain of the task from the decision tree model and the combined indicators. By applying a trend prediction algorithm together with a decision tree algorithm to cross-domain computing task scheduling, the invention avoids resource contention between tasks and addresses the low accuracy of scheduling decisions. By using streaming machine learning techniques, it overcomes the performance limitations of the trend prediction and decision tree algorithms and substantially shortens the overall scheduling time of cross-domain computing tasks.

Description

Cross-domain computing task scheduling method and system based on intelligent sensing

Technical field

The invention belongs to the field of task scheduling, specifically cluster-level task scheduling, and in particular relates to a computing task scheduling technique for cross-domain computing environments.

Background

A cross-domain computing environment consists of multiple mutually isolated domains. Each domain contains one or more complete storage and computing clusters and can execute a given computing task independently. The domain that holds the main data involved in a computation is called the data domain. In a cross-domain computing environment, always submitting a computing task to the data domain is not the optimal scheduling strategy. When the data domain has insufficient remaining resources, the task enters a waiting queue, which makes its start time unpredictable. When the data domain's remaining resources are tight, the task's computing performance suffers and its execution time grows. When the task is instead submitted to another, lightly loaded domain, an excessive cross-domain data migration cost can also delay the task start considerably. A global task scheduling technique is therefore needed that intelligently decides the optimal execution domain of a task while jointly considering each domain's resource situation, the data migration cost, and other influencing factors.

A domain can execute a given computing task only if two preconditions hold: 1) the domain satisfies the task's demand for CPU, memory, and other resources; 2) the domain stores the data involved in the computation, which may have to be migrated to it from other domains. The size of the migrated data directly determines the length of the cross-domain migration time. For file data, the data volume is obtained by summing the sizes of the individual shard files; for database data, it can be estimated as the product of the table's row width and its number of records.
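The two data-volume estimates described above can be sketched as follows. This is a minimal illustration only; the function names and the choice of bytes as the unit are assumptions and do not come from the patent text.

```python
from typing import Iterable


def file_data_volume(shard_sizes_bytes: Iterable[int]) -> int:
    """File data: the volume is the sum of the sizes of all shard files."""
    return sum(shard_sizes_bytes)


def table_data_volume(row_width_bytes: int, record_count: int) -> int:
    """Database data: the volume is estimated as row width times record count."""
    return row_width_bytes * record_count


# Hypothetical inputs: three shard files and a table with 200-byte rows.
shards = [128 * 1024**2, 128 * 1024**2, 64 * 1024**2]
print(file_data_volume(shards))
print(table_data_volume(200, 1_000_000))
```

Either estimate can then serve as the data size passed to the migration time estimate.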

Summary of the invention

The purpose of the present invention is to solve two problems in cross-domain computing environments: resource utilization is unbalanced across domains, and insufficient resources within a domain cause tasks to fail or to run too long. To achieve the scheduling goal of shortening the overall execution time of computing tasks while keeping the resource utilization of the domains relatively balanced, a cross-domain computing task scheduling method and system based on intelligent sensing is proposed for determining the optimal execution domain of a given computing task.

To achieve the above purpose, the technical solution of the present invention is realized as follows:

A cross-domain computing task scheduling method based on intelligent sensing, comprising:

Step 1, training a decision tree model on labeled data;

Step 2, estimating the execution time of the computing task from its relative time complexity;

Step 3, predicting the resource trend indicator of each domain from resource history records using the ARIMA algorithm;

Step 4, obtaining the real-time resource status indicators of each domain through a resource status interface;

Step 5, estimating from the available bandwidth the time needed to migrate the data to each domain;

Step 6, deciding the optimal execution domain of the task from the decision tree model and the combined indicators.

Further, the decision tree model of step 1 is obtained by training with a decision tree algorithm, with the following steps:

1.1 Construct the initial labeled data and divide it into a training set and a test set;

1.2 Input the training set into the decision tree training algorithm and set the training parameters to obtain a decision tree model;

1.3 Input the decision tree model and the test set into the decision tree evaluation algorithm to obtain the evaluation metrics of the decision tree model;

1.4 When the evaluation metrics of the decision tree model do not meet the requirements:

a) adjust the training parameters and repeat steps 1.2 and 1.3 until the metrics meet the requirements; or,

b) adjust the labeling rules and repeat steps 1.1, 1.2 and 1.3 until the metrics meet the requirements.

Further, the estimation method of step 2 comprises:

2.1 Choose a benchmark algorithm and fit its time complexity curve;

2.2 Compute the expected execution time T of the task from the time complexity of the task to be estimated relative to the benchmark algorithm.

Further, the algorithm of step 3 specifically comprises:

3.1 Obtain the domain's resource history data for a past period;

3.2 Use the ARIMA algorithm to compute the domain's resource forecast data for a future period;

3.3 Obtain the current time t0 and the expected execution time T of the current task;

3.4 Extract the data in the interval [t0, t0+T] from the resource forecast data and compute the trend indicator;

3.5 Repeat steps 3.1 to 3.4 for every domain, so that each domain computes its own trend indicator.

Further, the real-time resource status indicators of step 4 specifically comprise the following 5 indicators:

4.1 Cluster CPU idle ratio, describing the overall usage of the cluster's CPUs;

4.2 Total number of cluster CPU cores, describing the total core count of the cluster's CPUs;

4.3 Cluster free memory, describing the sum of the free memory of all nodes in the cluster;

4.4 Cluster free disk space, describing the sum of the free disk space of all nodes in the cluster;

4.5 Available cross-domain network bandwidth, describing the usage of the network bandwidth between two clusters.

In another aspect, the present invention also proposes a cross-domain computing task scheduling system based on intelligent sensing, comprising:

A model training module, which trains a decision tree model on labeled data;

A task execution time estimator, which estimates the execution time of the computing task from its relative time complexity;

A resource trend predictor, which predicts the resource trend indicator of each domain from resource history records using the ARIMA algorithm;

A real-time resource indicator collector, which obtains the real-time resource status indicators of each domain through a resource status interface;

A data migration time estimator, which estimates from the available bandwidth the time needed to migrate the data to each domain;

An optimal execution domain decider, which decides the optimal execution domain of the task from the decision tree model and the combined indicators.

Further, the model training module comprises:

An initial data unit, for constructing the initial labeled data and dividing it into a training set and a test set;

A model unit, for inputting the training set into the decision tree training algorithm and setting the training parameters to obtain a decision tree model;

A metric unit, for inputting the decision tree model and the test set into the decision tree evaluation algorithm to obtain the evaluation metrics of the decision tree model;

An adjustment unit, for adjusting the training parameters when the evaluation metrics of the decision tree model do not meet the requirements and reusing the model unit and the metric unit until the metrics meet the requirements; or for adjusting the labeling rules and reusing the initial data unit, the model unit and the metric unit until the metrics meet the requirements.

Further, the task execution time estimator comprises:

A fitting unit, which chooses a benchmark algorithm and fits its time complexity curve;

A computing unit, which computes the expected execution time T of the task from the time complexity of the task to be estimated relative to the benchmark algorithm.

Further, the resource trend predictor specifically comprises:

A data acquisition unit, which obtains the domain's resource history data for a past period;

A data computing unit, which uses the ARIMA algorithm to compute the domain's resource forecast data for a future period;

A time acquisition unit, which obtains the current time t0 and the expected execution time T of the current task;

An indicator computing unit, which extracts the data in the interval [t0, t0+T] from the resource forecast data and computes the trend indicator;

Each domain uses the above units to compute its own trend indicator.

Further, the real-time resource indicator collector comprises:

A cluster CPU idle ratio unit, describing the overall usage of the cluster's CPUs;

A cluster CPU core count unit, describing the total core count of the cluster's CPUs;

A cluster free memory unit, describing the sum of the free memory of all nodes in the cluster;

A cluster free disk space unit, describing the sum of the free disk space of all nodes in the cluster;

An available cross-domain network bandwidth unit, describing the usage of the network bandwidth between two clusters.

Compared with the prior art, the novelty of the present invention lies in the following:

1) A trend prediction algorithm and a decision tree algorithm are applied together to cross-domain computing task scheduling, which avoids resource contention between tasks and addresses the low accuracy of scheduling decisions;

2) Streaming machine learning techniques overcome the performance limitations of the trend prediction and decision tree algorithms and substantially shorten the overall scheduling time of cross-domain computing tasks.

The value of the present invention lies in that it: 1) meets the resource requirements of running computing tasks, ensuring that each task has sufficient resources at run time; 2) avoids resource contention that may occur between computing tasks; 3) schedules a computing task to execute at the location nearest to its main data; 4) improves resource utilization and balances the load among domains; 5) meets the need of business applications to use cross-domain resources transparently.

The present invention can avoid resource contention that may occur between computing tasks. Introducing the ARIMA algorithm makes it possible to predict the resource trend of each domain, which avoids resource contention between the current task and periodic tasks. ARIMA obtains the current indicator value as a weighted average of the indicator values of past periods; when the indicators of past periods are periodic, the current-period indicator shows a similar property. The ARIMA algorithm predicts each domain's future trend from that domain's resource history data. The stronger the periodicity of the history data, the more accurately the expected execution periods of periodic tasks are predicted, and the higher the probability of avoiding resource contention.

The present invention can ensure that the initial decision accuracy for the optimal execution domain exceeds 80%, and allows that accuracy to keep improving as the system runs. Introducing a decision tree algorithm solves complex scheduling problems of the cross-domain computing kind. Labeling the training set is in essence adjusting the relative weights of three kinds of indicators: the cross-domain migration cost, the current resource usage of each domain, and the future resource trend. The significance is that the initial labeled training set can be customized, and with it the initial decision model. A customized initial decision model maximizes the initial decision accuracy. As the scheduling algorithm keeps executing, positive and negative examples are continually added to the training set, the decision model is iterated, and the decision accuracy keeps improving.

The present invention can provide efficient scheduling performance. With streaming machine learning techniques, the ARIMA algorithm and the decision tree algorithm are converted into streaming machine learning tasks, which achieves second-level response and greatly improves overall scheduling performance.

Brief description of the drawings

Fig. 1 is a schematic diagram of the steps of an embodiment of the present invention;

Fig. 2 is a resource forecast curve of an embodiment of the present invention;

Fig. 3 is a schematic diagram of the scheduling execution process of an embodiment of the present invention;

Fig. 4 is a schematic diagram of the streaming machine learning execution process of an embodiment of the present invention.

Detailed description of the embodiments

It should be noted that, provided they do not conflict, the embodiments of the present invention and the features in them may be combined with one another.

The technical solution of the present invention is described in further detail below with reference to the accompanying drawings:

The present invention realizes intelligent decision of a task's optimal execution domain by jointly considering three factors: the cross-domain data migration cost, the current resource usage of each domain, and each domain's future resource trend. It mainly comprises the following steps (as shown in Fig. 1):

Step 1, training a decision tree model on labeled data;

Step 1 is independent of the other steps: once step 1 has produced a decision tree model that meets the requirements, it need not be executed again. Thereafter, each scheduling run performs only steps 2 to 6 to obtain the unique identifier of the task's optimal execution domain.

1. Training a decision tree model on labeled data

The decision tree model is used to produce the correct label for the data to be decided, that is, the unique identifier of the task's optimal execution domain. The decision tree model is obtained by training with a decision tree algorithm, with the following steps:

1) Construct the initial labeled data and divide it into a training set and a test set;

2) Input the training set into the decision tree training algorithm and set the training parameters to obtain a decision tree model;

3) Input the decision tree model and the test set into the decision tree evaluation algorithm to obtain the evaluation metrics of the decision tree model;

4) When the evaluation metrics of the decision tree model do not meet the requirements:

a) adjust the training parameters and repeat steps 2 and 3 until the metrics meet the requirements; or,

b) adjust the labeling rules and repeat steps 1, 2 and 3 until the metrics meet the requirements.
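The train, evaluate, and adjust loop of steps 2 to 4a can be sketched as below. This is an illustration under stated assumptions: the function `train_until_acceptable`, the accuracy metric, and the depth-1 "stump" trainer are hypothetical stand-ins for whatever decision tree training and evaluation algorithms are actually used, and the tiny data set is invented for the example.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple, TypeVar

P = TypeVar("P")
Sample = Tuple[Sequence[float], str]       # (features F, label D)
Model = Callable[[Sequence[float]], str]   # trained model f, with D = f(F)


def train_until_acceptable(
    train_set: List[Sample],
    test_set: List[Sample],
    candidate_params: Iterable[P],
    train: Callable[[List[Sample], P], Model],
    evaluate: Callable[[Model, List[Sample]], float],
    required_score: float,
) -> Optional[Tuple[Model, P, float]]:
    """Try parameter settings in turn until the evaluation metric is met."""
    for params in candidate_params:          # step 4a: adjust training parameters
        model = train(train_set, params)     # step 2: train on the training set
        score = evaluate(model, test_set)    # step 3: evaluate on the test set
        if score >= required_score:
            return model, params, score
    return None  # caller falls back to step 4b: change the labeling rules


def train_stump(samples: List[Sample], feature_index: int) -> Model:
    """Toy depth-1 tree: split one feature midway between the two class means."""
    a = [f[feature_index] for f, d in samples if d == "A"]
    b = [f[feature_index] for f, d in samples if d == "B"]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    threshold = (mean_a + mean_b) / 2
    low, high = ("A", "B") if mean_a < mean_b else ("B", "A")
    return lambda f: low if f[feature_index] < threshold else high


def accuracy(model: Model, samples: List[Sample]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    return sum(model(f) == d for f, d in samples) / len(samples)


# Invented data: feature 0 is uninformative, feature 1 separates the labels.
train_set = [([5, 1], "B"), ([1, 9], "A"), ([5, 8], "A"), ([1, 2], "B")]
test_set = [([1, 8], "A"), ([5, 2], "B")]
result = train_until_acceptable(
    train_set, test_set, [0, 1], train_stump, accuracy, required_score=0.8
)
print(result[1:] if result else "adjust labeling rules")
```

Here the only "training parameter" is the feature index the stump splits on; a real decision tree would expose parameters such as maximum depth instead.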

Labeled data consists of two parts, a label and features. The label represents the unique identifier of the task's optimal execution domain. The features represent the factors that influence the decision result, covering three aspects: the cross-domain data migration cost, the current resource usage of each domain, and each domain's future resource trend. The structure of labeled data is as follows:

(D, F)

where D represents the label and F represents the features, which can be further expressed as:

(C_d0, R_d0, M_d0), ..., (C_di, R_di, M_di), ..., (C_dn, R_dn, M_dn)

where C represents the resource trend indicator, R the real-time resource status indicators, and M the cross-domain data migration time. di denotes the i-th domain, i ∈ [0, n], and d0 denotes the data domain. Thus C_d0 is the resource trend indicator of the data domain, R_d0 is the real-time resource status indicator of the data domain, and M_d0 is the time to migrate the data to the data domain, with M_d0 = 0. Similarly, C_di is the resource trend indicator of the i-th domain, R_di is the real-time resource status indicator of the i-th domain, and M_di is the time to migrate the data to the i-th domain.

An initial labeled data set containing n records can be expressed as:

{(D_i, F_i) | i ∈ [0, n]}

0.7n records are selected from it at random as the training set, and the remaining 0.3n records form the test set.

The trained decision tree model f computes the unique identifier D of the task's optimal execution domain from the features F, expressed as follows:

D = f(F)
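The (D, F) structure and the 70/30 random split can be sketched as follows. This is a simplified illustration: R is reduced to a single number per domain (the text defines five real-time indicators), and `build_feature`, `split_samples`, and the fixed random seed are assumptions made for the example.

```python
import random
from typing import List, Sequence, Tuple

DomainMetrics = Tuple[float, float, float]  # (C_di, R_di, M_di), R simplified to one value
LabeledSample = Tuple[str, List[float]]     # (D, F)


def build_feature(per_domain: Sequence[DomainMetrics]) -> List[float]:
    """Flatten (C_d0, R_d0, M_d0), ..., (C_dn, R_dn, M_dn) into one vector F."""
    return [value for metrics in per_domain for value in metrics]


def split_samples(
    samples: Sequence[LabeledSample], train_ratio: float = 0.7, seed: int = 0
) -> Tuple[List[LabeledSample], List[LabeledSample]]:
    """Randomly pick train_ratio of the records for training, the rest for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Invented two-domain example: d0 is the data domain, so M_d0 = 0.
F = build_feature([(0.1, 0.5, 0.0), (0.3, 0.8, 12.0)])
samples = [("d0" if i % 2 else "d1", F) for i in range(10)]
train_set, test_set = split_samples(samples)
print(len(train_set), len(test_set))
```

Fixing the seed only makes the split reproducible for the illustration; any random selection of 0.7n records matches the description.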

2. Estimating the execution time of the computing task from its relative time complexity

The execution time of a computing task is difficult to calculate directly, so an indirect estimation method based on relative time complexity is proposed here. First, a benchmark algorithm such as Terasort is chosen, and its time complexity curve is fitted. Then the expected execution time T of the task is computed from the time complexity of the task to be estimated relative to the benchmark algorithm, using the following formula:

T = C · s · log(s) · r + B

where s represents the size of the data involved in the computation, r the relative time complexity of the task, C the coefficient constant of the benchmark algorithm, and B a constant for fixed time overhead. s and r are passed in as variables by the external caller. The relative time complexity r can be determined from the relationship between the task's execution time and the benchmark algorithm's execution time at sample points with different data volumes.
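The formula above can be evaluated directly, as in the sketch below. Two points are assumptions rather than statements of the patent: the logarithm is taken as the natural logarithm (the base is not specified), and r is estimated as the mean ratio between the task's variable time and the benchmark's variable time at the sample points, which is only one possible reading of "determined from the relationship" between the two execution times.

```python
import math
from typing import Sequence, Tuple


def estimate_execution_time(s: float, r: float, C: float, B: float) -> float:
    """T = C * s * log(s) * r + B, with s > 0 and log assumed to be natural log."""
    return C * s * math.log(s) * r + B


def estimate_relative_complexity(
    samples: Sequence[Tuple[float, float]], C: float, B: float
) -> float:
    """Estimate r from (data size, measured task time) sample points.

    Assumed rule: average the ratio of the task's time (minus the fixed
    overhead B) to the benchmark's variable time C * s * log(s).
    """
    ratios = [(t - B) / (C * s * math.log(s)) for s, t in samples]
    return sum(ratios) / len(ratios)


# Hypothetical fitted benchmark constants and synthetic measurements.
C, B = 1e-3, 5.0
measured = [(s, estimate_execution_time(s, 2.5, C, B)) for s in (100.0, 1000.0, 10000.0)]
r_hat = estimate_relative_complexity(measured, C, B)
print(r_hat, estimate_execution_time(50000.0, r_hat, C, B))
```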

3. Predicting the resource trend indicator of each domain from resource history records using the ARIMA algorithm

If a domain's resource history curve shows clear periodicity, the domain runs long periodic tasks. After the resource history data has been obtained through the history record access interface, the ARIMA algorithm can predict the domain's future resource curve. When the resource history curve varies with a regular period, the resource forecast curve shows a similar periodicity, so the execution periods of future periodic tasks can be predicted and resource contention between the current task and periodic tasks avoided. The stronger the periodicity of the history data, the more accurately the expected execution periods of periodic tasks are predicted, and the higher the probability of avoiding resource contention.

The specific algorithm is as follows:

1) Obtain the domain's resource history data for a past period (for example, the past week);

2) Use the ARIMA algorithm to compute the domain's resource forecast data for a future period (for example, the next day);

3) Obtain the current time t0 and the expected execution time T of the current task;

4) Extract the data in the interval [t0, t0+T] from the resource forecast data and compute the trend indicator;

5) Repeat steps 1 to 4 for every domain, so that each domain computes its own trend indicator.

Once aggregated, the trend indicators of all domains form part of the data to be decided. Suppose the resource forecast curve of some domain is as shown in Fig. 2: although the domain currently has ample remaining resources, those resources are expected to be occupied by a periodic task within the coming time T, so the domain may not be the optimal execution domain for the current task.
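Step 4 of the algorithm, extracting the window [t0, t0+T] from the forecast and reducing it to an indicator, can be sketched as follows. The forecast is taken as given input (it would come from an ARIMA implementation, which this sketch does not include), and the indicator itself (mean, minimum, and end-to-start change of the predicted CPU idle ratio) is an assumption, since the text does not define how the trend indicator is calculated.

```python
from typing import Dict, List, Sequence, Tuple

Forecast = Sequence[Tuple[float, float]]  # (timestamp in seconds, predicted CPU idle ratio)


def forecast_window(forecast: Forecast, t0: float, T: float) -> List[float]:
    """Keep the predicted values whose timestamps fall inside [t0, t0 + T]."""
    return [value for t, value in forecast if t0 <= t <= t0 + T]


def trend_indicator(forecast: Forecast, t0: float, T: float) -> Dict[str, float]:
    """Assumed indicator: mean, minimum, and net change over the window.

    The forecast is assumed to be sorted by timestamp.
    """
    values = forecast_window(forecast, t0, T)
    if not values:
        raise ValueError("forecast does not cover [t0, t0 + T]")
    return {
        "mean": sum(values) / len(values),
        "min": min(values),
        "delta": values[-1] - values[0],
    }


# Invented forecast: idle ratio drops while a periodic task is expected to run.
forecast = [(0, 0.9), (600, 0.8), (1200, 0.3), (1800, 0.2), (2400, 0.85)]
indicator = trend_indicator(forecast, t0=600, T=1200)
print(indicator)
```

In this invented case the domain looks mostly idle at t0 but the window minimum is low, which is the situation the text associates with Fig. 2.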

4. Obtaining the real-time resource status indicators of each domain through the resource status interface

The real-time resource status indicators are used to assess the real-time resource state of a domain. Once aggregated, the real-time resource status indicators of all domains form part of the data to be decided. They specifically comprise the following 5 indicators:

1) Cluster CPU idle ratio, describing the overall usage of the cluster's CPUs;

2) Total number of cluster CPU cores, describing the total core count of the cluster's CPUs;

3) Cluster free memory, describing the sum of the free memory of all nodes in the cluster;

4) Cluster free disk space, describing the sum of the free disk space of all nodes in the cluster;

5) Available cross-domain network bandwidth, describing the usage of the network bandwidth between two clusters.
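The five indicators can be represented as one record per domain, aggregated from per-node readings, as in the sketch below. The class and field names, the units, and the core-weighted averaging of the CPU idle ratio are all assumptions for illustration; the text only states that memory and disk are summed over the nodes.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class NodeStatus:
    cpu_cores: int
    cpu_idle_ratio: float   # 0.0 (fully busy) to 1.0 (fully idle)
    free_memory_gb: int
    free_disk_gb: int


@dataclass(frozen=True)
class DomainStatus:
    cpu_idle_ratio: float               # 1) cluster CPU idle ratio
    cpu_core_total: int                 # 2) total number of cluster CPU cores
    free_memory_gb: int                 # 3) sum of free memory over all nodes
    free_disk_gb: int                   # 4) sum of free disk space over all nodes
    cross_domain_bandwidth_mbps: float  # 5) available cross-domain bandwidth


def summarize(nodes: Sequence[NodeStatus], bandwidth_mbps: float) -> DomainStatus:
    """Aggregate node readings; the idle ratio is weighted by core count (assumed)."""
    total_cores = sum(n.cpu_cores for n in nodes)
    idle = sum(n.cpu_idle_ratio * n.cpu_cores for n in nodes) / total_cores
    return DomainStatus(
        cpu_idle_ratio=idle,
        cpu_core_total=total_cores,
        free_memory_gb=sum(n.free_memory_gb for n in nodes),
        free_disk_gb=sum(n.free_disk_gb for n in nodes),
        cross_domain_bandwidth_mbps=bandwidth_mbps,
    )


nodes = [NodeStatus(8, 0.5, 16, 100), NodeStatus(24, 0.75, 48, 300)]
status = summarize(nodes, bandwidth_mbps=1000.0)
print(status)
```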

5. Estimating from the available bandwidth the time needed to migrate the data to each domain

Insufficient available bandwidth between two domains directly affects the cross-domain data migration time and thereby the final decision on the task's optimal execution domain. Once aggregated, the times to migrate the data to each domain form part of the data to be decided. The cross-domain data migration time M consists mainly of the time to transfer the data between domains and the file disk I/O time. The calculation formula is as follows:

(formula not reproduced in this text)

where s represents the size of the migrated data, n the available bandwidth, C a network performance fluctuation constant, I a distributed file system write performance constant, and O a distributed file system read performance constant. s and n are passed in as variables by the caller.

6. Deciding the optimal execution domain of the task from the decision tree model and the combined indicators

After the indicators from the preceding steps have been obtained, the unique identifier of the task's optimal execution domain is decided using the decision tree model trained in step 1.

The data to be decided consists of each domain's resource trend indicator (step 3), each domain's real-time resource status indicators (step 4), and the time to migrate the data to each domain (step 5). The indicators are regrouped by domain to form the features F, that is:

(C_d0, R_d0, M_d0), ..., (C_di, R_di, M_di), ..., (C_dn, R_dn, M_dn)

The unique identifier D of the task's optimal execution domain is then computed with the decision tree model f, that is:

D = f(F).

The system of the present invention is realized by the following 5 cooperating components:

1) Task execution time estimator: responsible for estimating the execution time of the computing task;

2) Data migration time estimator: responsible for estimating the cross-domain data migration time;

3) Real-time resource indicator collector: responsible for collecting the real-time resource status indicators of each domain;

4) Resource trend predictor: responsible for quickly predicting each domain's resource trend indicator for a coming period;

5) Optimal execution domain decider: responsible for quickly deciding the optimal execution domain of the computing task.

The scheduling execution process is as follows (as shown in Fig. 3):

1) Call the task execution time estimator with the data volume and the relative time complexity as input to obtain the task execution time T;

2) Call the resource trend predictor of each domain with the expected task execution time T as input to obtain each domain's resource trend indicator C_di over the time T;

3) Call the real-time resource indicator collector of each domain to obtain each domain's real-time resource indicators R_di;

4) Call the data migration time estimator with each domain's real-time resource indicators R_di and the data volume as input to obtain the time M_di to migrate the data to each domain;

5) Call the optimal execution domain decider with the data to be decided, (C_d0, R_d0, M_d0), ..., (C_di, R_di, M_di), ..., (C_dn, R_dn, M_dn), which is assembled from each domain's resource trend indicator C_di, real-time resource indicators R_di, and migration time M_di, to obtain the unique identifier D of the task's optimal execution domain.
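The five calls above can be wired together as in the following sketch. The `schedule` function and its callable parameters are hypothetical names, R is simplified to one number per domain, and every component is replaced by a stub with invented values; in particular the `decide` stub is a plain scoring rule standing in for the trained decision tree.

```python
from typing import Callable, Dict, List, Sequence


def schedule(
    domains: Sequence[str],
    data_size: float,
    relative_complexity: float,
    estimate_time: Callable[[float, float], float],
    predict_trend: Callable[[str, float], float],
    collect_status: Callable[[str], float],
    estimate_migration: Callable[[str, float, float], float],
    decide: Callable[[List[float]], str],
) -> str:
    """Run calls 1) to 5) and return the identifier D of the chosen domain."""
    T = estimate_time(data_size, relative_complexity)   # 1) task execution time
    features: List[float] = []
    for domain in domains:
        C = predict_trend(domain, T)                    # 2) trend indicator C_di
        R = collect_status(domain)                      # 3) real-time indicator R_di
        M = estimate_migration(domain, R, data_size)    # 4) migration time M_di
        features.extend([C, R, M])                      # regroup by domain into F
    return decide(features)                             # 5) D = f(F)


# Stub components with invented numbers; domain "A" is the data domain (M = 0).
domains = ["A", "B"]
trend: Dict[str, float] = {"A": 0.1, "B": 0.8}       # predicted idle ratio over T
status: Dict[str, float] = {"A": 0.9, "B": 0.7}      # current idle ratio
migration: Dict[str, float] = {"A": 0.0, "B": 30.0}  # seconds


def decide_stub(F: List[float]) -> str:
    """Stand-in for the decision tree: prefer high predicted idle, low migration time."""
    scores = [F[i] - F[i + 2] / 1000 for i in range(0, len(F), 3)]
    return domains[scores.index(max(scores))]


chosen = schedule(
    domains, 1e9, 2.5,
    estimate_time=lambda s, r: 600.0,
    predict_trend=lambda d, T: trend[d],
    collect_status=lambda d: status[d],
    estimate_migration=lambda d, R, s: migration[d],
    decide=decide_stub,
)
print(chosen)
```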

A specific example illustrates an embodiment of the invention. Taking an environment with two domains (assumed to be domain A and domain B) as an example, the cross-domain computing task scheduling technique based on intelligent sensing is carried out as follows:

1) Call the task execution time estimator to obtain the expected task execution time T;

2) Call the resource trend predictor of domain A to obtain domain A's resource trend indicator C_a over the expected task execution time T;

3) Call the resource trend predictor of domain B to obtain domain B's resource trend indicator C_b over the expected task execution time T;

4) Call the real-time resource indicator collector of domain A to obtain domain A's real-time resource status indicators R_a;

5) Call the real-time resource indicator collector of domain B to obtain domain B's real-time resource status indicators R_b;

6) Call the data migration time estimator to obtain the cross-domain data migration times M_a and M_b;

7) Aggregate and regroup all indicators to obtain the data to be decided, (C_a, R_a, M_a), (C_b, R_b, M_b), then input it into the optimal execution domain decider to obtain the unique identifier D of the task's optimal execution domain.

Both the resource trend predictor and the optimal execution domain decider are performance-optimized with streaming machine learning techniques to achieve second-level response. Run as offline computing tasks, both the ARIMA algorithm and the decision tree algorithm have response times on the order of minutes, which seriously degrades overall scheduling performance. Converting the ARIMA algorithm and the decision tree algorithm into streaming machine learning tasks that run in the background not only saves the algorithms' startup time but also gives a fast response to each individual request.

Taking the optimal execution domain decider as an example, the execution process of streaming machine learning is as follows (as shown in Fig. 4):

1) The optimal execution domain decision algorithm (a streaming machine learning task) starts, loads the decision tree model, and then enters a suspended state;

2) The optimal execution domain decision interface receives a request from the caller and forwards it to the request queue;

3) The optimal execution domain decision algorithm is woken up, fetches the request, computes the result with the decision tree model, sends the result to the response queue, and then returns to the suspended state;

4) The optimal execution domain decision interface takes the decision result from the response queue and returns it to the caller, completing this decision; steps 2 to 4 are then repeated continuously;

When a terminate command is sent, the optimal execution domain decision algorithm is woken up, releases its resources, and ends the streaming machine learning task.
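The request queue and response queue pattern described above can be sketched with Python's standard `queue` and `threading` modules. This is an in-process illustration, not the streaming framework the patent relies on (none is named); the `STOP` sentinel, the function names, and the stub model are assumptions, and the sketch assumes a single caller at a time so that responses cannot be mixed up.

```python
import queue
import threading
from typing import Callable, List, Sequence, Tuple

STOP = object()  # sentinel standing in for the terminate command
Model = Callable[[Sequence[float]], str]


def decision_worker(
    load_model: Callable[[], Model],
    requests: "queue.Queue",
    responses: "queue.Queue",
) -> None:
    """Long-running decision task: load the model once, then serve requests."""
    model = load_model()                  # 1) start and load the decision tree model
    while True:
        item = requests.get()             # blocks while idle (the suspended state)
        if item is STOP:
            break                         # terminate command: leave the loop and exit
        request_id, features = item
        responses.put((request_id, model(features)))  # 3) compute and respond


def decide(
    requests: "queue.Queue", responses: "queue.Queue",
    request_id: int, features: Sequence[float],
) -> Tuple[int, str]:
    """Decision interface: 2) forward the request, 4) return the result."""
    requests.put((request_id, features))
    return responses.get()


load_count: List[int] = []


def load_stub_model() -> Model:
    """Stub model: pick domain A unless its migration time exceeds domain B's."""
    load_count.append(1)
    return lambda F: "A" if F[2] <= F[5] else "B"


requests: "queue.Queue" = queue.Queue()
responses: "queue.Queue" = queue.Queue()
worker = threading.Thread(target=decision_worker, args=(load_stub_model, requests, responses))
worker.start()
first = decide(requests, responses, 1, [0.1, 0.9, 0.0, 0.8, 0.7, 30.0])
second = decide(requests, responses, 2, [0.1, 0.9, 50.0, 0.8, 0.7, 30.0])
requests.put(STOP)
worker.join()
print(first, second, len(load_count))
```

The point of the pattern is that the model is loaded once and stays resident, so each request pays only for the prediction itself rather than for starting the algorithm.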

The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A cross-domain computing task scheduling method based on intelligent sensing, characterized by comprising:

Step 1, training a decision tree model on labeled data;

2. The cross-domain computing task scheduling method based on intelligent sensing according to claim 1, characterized in that the decision tree model of step 1 is obtained by training with a decision tree algorithm, with the following steps:

1.3 inputting the decision tree model and the test set into the decision tree evaluation algorithm to obtain the evaluation metrics of the decision tree model;

1.4 when the evaluation metrics of the decision tree model do not meet the requirements:

3. The cross-domain computing task scheduling method based on intelligent sensing according to claim 1, characterized in that the estimation method of step 2 comprises:

4. The cross-domain computing task scheduling method based on intelligent sensing according to claim 1, characterized in that the algorithm of step 3 specifically comprises:

3.1 obtaining the domain's resource history data for a past period;

3.3 obtaining the current time t0 and the expected execution time T of the current task;

5. The cross-domain computing task scheduling method based on intelligent sensing according to claim 1, characterized in that the real-time resource status indicators of step 4 specifically comprise the following 5 indicators:

6. A cross-domain computing task scheduling system based on intelligent sensing, characterized by comprising:

A model training module, which trains a decision tree model on labeled data;

A resource trend predictor, which predicts the resource trend indicator of each domain from resource history records using the ARIMA algorithm;

7. The cross-domain computing task scheduling system based on intelligent sensing according to claim 6, characterized in that the model training module comprises:

A model unit, for inputting the training set into the decision tree training algorithm and setting the training parameters to obtain a decision tree model;

A metric unit, for inputting the decision tree model and the test set into the decision tree evaluation algorithm to obtain the evaluation metrics of the decision tree model;

An adjustment unit, for adjusting the training parameters when the evaluation metrics of the decision tree model do not meet the requirements and reusing the model unit and the metric unit until the metrics meet the requirements; or for adjusting the labeling rules and reusing the initial data unit, the model unit and the metric unit until the metrics meet the requirements.

8. The cross-domain computing task scheduling system based on intelligent sensing according to claim 6, characterized in that the task execution time estimator comprises:

A computing unit, which computes the expected execution time T of the task from the time complexity of the task to be estimated relative to the benchmark algorithm.

9. The cross-domain computing task scheduling system based on intelligent sensing according to claim 6, characterized in that the resource trend predictor specifically comprises:

An indicator computing unit, which extracts the data in the interval [t0, t0+T] from the resource forecast data and computes the trend indicator;

10. The cross-domain computing task scheduling system based on intelligent sensing according to claim 6, characterized in that the real-time resource indicator collector comprises: