CN103226467A - Data parallel processing method and system as well as load balancing scheduler - Google Patents


Info

Publication number
CN103226467A
Authority
CN
China
Prior art keywords
server
data
server cluster
load balancing
task
Prior art date
Legal status
Granted
Application number
CN2013101951796A
Other languages
Chinese (zh)
Other versions
CN103226467B (en)
Inventor
杨树强
华中杰
贾焰
尹洪
赵辉
李爱平
陈志坤
金松昌
周斌
韩伟红
韩毅
舒琦
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN201310195179.6A
Publication of CN103226467A
Application granted
Publication of CN103226467B
Legal status: Active


Landscapes

  • Multi Processors (AREA)

Abstract

The embodiment of the invention discloses a data parallel processing method and system as well as a load balancing scheduler. In the embodiment of the invention, any server in a server cluster is capable of both executing tasks and storing data. On this basis, at the job-scheduling level, the embodiment predicts the system-wide load balancing state under different execution orders according to a compute-localization strategy, selects the execution order that makes the system-wide load balancing state optimal, and schedules jobs in that order. At the task-scheduling level, the embodiment dispatches each job entering the executing state according to the compute-localization strategy, under which each data processing task is assigned to the server that stores the task's corresponding data block. When a task is processed, the same server therefore acts both as the server node storing the data block and as the server node executing the task, so network data transmission between server nodes is reduced and data processing performance is improved.

Description

Data parallel processing method, system and load balance scheduler
Technical field
The present invention relates to the technical field of data processing, and more particularly to a data parallel processing method and system and a load balancing scheduler.
Background technology
In a distributed computing environment, for example under the MapReduce (hereinafter MR) parallel programming model proposed by Google, the data to be processed by a job is divided into multiple data blocks, and these blocks are stored, block by block, on one or more server nodes. After a client submits a job, the job is divided into tasks in one-to-one correspondence with the data blocks, and these tasks are assigned to different server nodes for parallel execution. If the server node executing a task does not store the task's corresponding data block, the block must be transferred over the network from the node that stores it to the executing node. How to reduce network data transmission overhead between server nodes and thereby improve data processing performance has therefore become an active research topic.
Summary of the invention
In view of this, the purpose of the embodiments of the invention is to provide a data parallel processing method and system and a load balancing scheduler that address the above problem.
To achieve the above purpose, the embodiments of the invention provide the following technical solutions:
A data parallel processing method is based on a server cluster in which any server is capable of both executing tasks and storing data;
the method comprises:
placing jobs submitted by users through a client into a job waiting queue, and collecting the data distribution information of the jobs, wherein the data to be processed by a job is divided into a plurality of data blocks stored on servers of the server cluster, each data block corresponds to one data processing task, and the data distribution information comprises the distribution of the job's corresponding data blocks;
when the number of jobs the server cluster is executing is less than a first threshold, predicting, according to the data distribution information, the system-wide load balancing state caused by dispatching the jobs in the waiting queue under different execution orders according to a compute-localization strategy, and obtaining the optimal execution order;
reordering the jobs in the waiting queue by the optimal execution order, and scheduling the jobs in the waiting queue into the executing state one by one in the reordered order, until the number of jobs the server cluster is executing reaches the first threshold or the waiting queue is empty;
dispatching each job that enters the executing state according to the compute-localization strategy, so that servers execute the data processing tasks;
wherein dispatching according to the compute-localization strategy comprises:
creating one data processing task for each data block of the data to be processed by a job, and assigning each data processing task to the server that stores its corresponding data block.
A data parallel processing system comprises a server cluster and a load balancing scheduler;
any server in the server cluster is capable of both executing tasks and storing data;
the load balancing scheduler comprises:
a preprocessing unit, configured to place jobs submitted by users through a client into a job waiting queue and to collect the data distribution information of the jobs, wherein the data to be processed by a job is divided into a plurality of data blocks stored on servers of the server cluster, each data block corresponds to one data processing task, and the data distribution information comprises the distribution of the job's corresponding data blocks;
a predicting unit, configured to predict, when the number of jobs the server cluster is executing is less than a first threshold and according to the data distribution information, the system-wide load balancing state caused by dispatching the jobs in the waiting queue under different execution orders according to the compute-localization strategy, and to obtain the optimal execution order;
a job scheduling unit, configured to reorder the jobs in the waiting queue by the optimal execution order and to schedule the jobs into the executing state one by one in the reordered order, until the number of jobs the server cluster is executing reaches the first threshold or the waiting queue is empty;
a first task scheduling unit, configured to dispatch each job that enters the executing state according to the compute-localization strategy, so that servers execute the data processing tasks;
wherein dispatching according to the compute-localization strategy comprises:
creating one data processing task for each data block of the data to be processed by a job, and assigning each data processing task to the server that stores its corresponding data block.
A load balancing scheduler cooperates with a server cluster, any server in the server cluster being capable of both executing tasks and storing data; the load balancing scheduler comprises:
a preprocessing unit, configured to place jobs submitted by users through a client into a job waiting queue and to collect the data distribution information of the jobs, wherein the data to be processed by a job is divided into a plurality of data blocks stored on servers of the server cluster, each data block corresponds to one data processing task, and the data distribution information comprises the distribution of the job's corresponding data blocks;
a predicting unit, configured to predict, when the number of jobs the server cluster is executing is less than a first threshold and according to the data distribution information, the system-wide load balancing state caused by dispatching the jobs in the waiting queue under different execution orders according to the compute-localization strategy, and to obtain the optimal execution order;
a job scheduling unit, configured to reorder the jobs in the waiting queue by the optimal execution order and to schedule the jobs into the executing state one by one in the reordered order, until the number of jobs the server cluster is executing reaches the first threshold or the waiting queue is empty;
a first task scheduling unit, configured to dispatch each job that enters the executing state according to the compute-localization strategy, so that servers execute the data processing tasks;
wherein dispatching according to the compute-localization strategy comprises:
creating one data processing task for each data block of the data to be processed by a job, and assigning each data processing task to the server that stores its corresponding data block.
It can be seen that, in the embodiments of the invention, any server in the server cluster is capable of both executing tasks and storing data. On this basis, at the job-scheduling level, the embodiments predict the system-wide load balancing state under different execution orders according to the compute-localization strategy, select the execution order that makes the system-wide load balancing state optimal, and schedule jobs in that order. At the task-scheduling level, the embodiments dispatch each job entering the executing state according to the compute-localization strategy. Because the compute-localization strategy assigns each data processing task to the server that stores its corresponding data block, the same server acts both as the server node storing the data block and as the server node executing the task when the task is processed, which reduces network data transmission between server nodes and improves data processing performance.
Description of drawings
In order to illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention, and those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of MR-based data processing provided by an embodiment of the invention;
Fig. 2 is a flowchart of a data parallel processing method provided by an embodiment of the invention;
Fig. 3 is a system load balancing diagram provided by an embodiment of the invention;
Fig. 4 is a schematic diagram of a global search tree provided by an embodiment of the invention;
Fig. 5a and Fig. 5b are schematic flowcharts of a heuristic search strategy provided by an embodiment of the invention;
Fig. 6 is a schematic diagram of a data parallel processing system provided by an embodiment of the invention;
Fig. 7 is a schematic structural diagram of a load balancing scheduler provided by an embodiment of the invention.
Embodiment
For ease of reference and clarity, the technical terms, abbreviations and acronyms used hereinafter are summarized as follows:
Compute localization: in a distributed computing environment, distributing the computational logic so that the server that computes on a piece of data (the computing node) is the same as the server node that stores that data (the storage node), thereby reducing network data transmission overhead between computing and storage nodes and improving data processing performance;
Data locality: the degree to which compute localization can be satisfied, i.e. the extent to which the data required by a computation can be obtained directly at the node where the computation runs, without network transmission. In large-scale distributed computing environments the overall degree of localization is usually expressed as the localization ratio, the percentage of computations that are fully localized;
Load balancing: in a distributed computing environment, distributing load evenly across two or more nodes (servers) so as to obtain higher resource utilization and improve data processing performance while avoiding overloading some nodes. The load may be computational load, I/O load, network load, and so on.
The technical solutions in the embodiments of the invention are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the invention without creative effort fall within the protection scope of the invention.
MapReduce (hereinafter MR) is a parallel programming model proposed by Google. Its basic idea is to abstract the parallel computation of arbitrarily complex data processing requests (jobs) on large-scale clusters into two functions, a Map function and a Reduce function. The MR model has not only proved highly effective in practice but is also easy to learn and use, and it is favored by large Internet IT enterprises.
The MR model is well suited to data-intensive computation. The data to be processed by an MR job is divided into a plurality of data blocks (the blocks are independent of one another and can be computed separately), and these blocks are stored on one or more server nodes.
Fig. 1 is a schematic diagram of MR-based data processing. Suppose the data required to process a job is a set S, divided into n mutually disjoint data subsets (data blocks) S1 to Sn, i.e. S = S1 ∪ S2 ∪ … ∪ Sn. Each computation request (job) is decomposed into a large number of map computations (map tasks) and a small number of reduce computations (reduce tasks). The map computations correspond one-to-one with the data blocks S1 to Sn; each reduce computation operates independently on the results of the map computations (the intermediate results of the MR computation) and saves its output to a user-specified location. The map tasks must be assigned to different computing nodes for parallel execution, so under the MR computing environment the core problem is the scheduling of map tasks; tasks likewise need to be scheduled under other, similar distributed computing environments.
In a distributed computing environment the following problem exists: suppose server node A executes task job1 but does not store the data block corresponding to job1; the block must then be transferred over the network from the server node that stores it to node A. How to reduce such network data transmission overhead between server nodes during computation, and thereby improve data processing performance, has become an active research topic.
In fact, existing task scheduling approaches all take satisfying a particular demand (such as load balancing) as the primary goal and improving data locality only as a secondary goal, so in practical operation the localization ratio is not high.
The technical solution provided by the present invention instead takes improving data locality as the primary goal and resolves the conflict between data locality and system load balancing with a new approach: while improving data locality it also optimizes the system-wide load balance, reducing network I/O overhead during computation, increasing system throughput and shortening the execution time of individual jobs.
In addition, current MR scheduling does not distinguish job scheduling from task scheduling. This is because MR computation was originally intended mainly for batch data processing, where only a few jobs execute at a time and interference between jobs is small, so the distinction was unnecessary. When a large number of jobs execute concurrently, however, failing to separate job-level from task-level scheduling makes the scheduling approach much harder to optimize. The core of the technical solution provided by the present invention is precisely the separation into job-level scheduling and task-level scheduling: at the job level, load-balancing analysis and prediction yield the execution order with the best global load balance; at the task level, once a job enters the executing state it is divided into a number of Map tasks and Reduce tasks, and following the locality principle each map task is dispatched to the server where its data resides.
This is described in detail below.
The technical solution provided by the present invention is based on a server cluster, and the prerequisite for its implementation is that any server in the cluster is capable of both executing tasks and storing data, so that any server can serve simultaneously as a computing node and a storage node. In other words, the solution is based on the assumption that each server has independent storage and computing capability and that no storage is shared within the cluster. Large service providers and data centers today adopt exactly this pattern, namely large-scale cluster computing environments formed by connecting a large number of low-end and mid-range servers over a network, so the assumption is reasonable.
Referring to Fig. 2, the data parallel processing method claimed by the present invention comprises at least the following steps:
S1: place the jobs submitted by users through a client into the job waiting queue, and collect the data distribution information of the jobs.
It should be noted that the data to be processed by a submitted job is divided into a plurality of (at least two) data blocks (each data block corresponding to one map task) and stored on servers of the server cluster. The data distribution information comprises the distribution of those data blocks.
S2: when the number of jobs the server cluster is executing is less than the first threshold, predict, according to the data distribution information, the system-wide load balancing state caused by dispatching the jobs in the waiting queue under different execution orders according to the compute-localization strategy, and obtain the optimal execution order.
The choice of the first threshold is not discussed within this solution; those skilled in the art may set it based on other techniques or on experience. In the MR technique, the first threshold can be specified by the user.
The first threshold in fact represents the maximum number of jobs the server cluster can execute in parallel; if the threshold has not been reached, the cluster still has capacity to execute more jobs.
S3: reorder the jobs in the waiting queue by the optimal execution order, and schedule the jobs in the waiting queue into the executing state one by one in the reordered order, until the number of jobs the server cluster is executing reaches the first threshold or the waiting queue is empty.
How many jobs to schedule into the executing state is determined by the first threshold and the number of jobs already in the executing state.
For example, suppose the cluster has 4 servers and each server can execute at most 20 jobs; the first threshold can then be set to 80. If users have submitted 100 jobs, 20 of them must be placed in the job waiting queue.
Suppose the cluster has since completed 4 jobs, so the number of executing jobs is 76, which is less than 80. At this point prediction is performed to obtain the optimal execution order (corresponding to step S2). Afterwards, the 20 jobs in the waiting queue are reordered by the optimal execution order, and the first 4 jobs in the reordered queue are scheduled into the executing state (corresponding to step S3).
S4: dispatch each job that enters the executing state according to the compute-localization strategy, so that servers execute the data processing tasks.
It should be noted that whenever the number of jobs the server cluster is executing falls below the first threshold, steps S2 to S4 must be executed again.
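As an illustration only, the interplay of steps S2 to S4 can be sketched in Python as follows. The helper names predict_optimal_order (step S2), dispatch_job (step S4, sketched further below) and the cluster interface are assumptions for illustration, not part of the patent:

```python
def on_capacity_available(cluster, waiting_queue, first_threshold):
    """Run steps S2-S4 whenever the number of executing jobs has
    dropped below the first threshold."""
    # Step S2: predicting an order is only meaningful when there is a choice.
    if len(waiting_queue) > 1:
        waiting_queue[:] = predict_optimal_order(cluster, waiting_queue)
    # Step S3: move jobs into the executing state in the predicted order
    # until the first threshold is reached or the waiting queue is empty.
    while cluster.executing_job_count() < first_threshold and waiting_queue:
        job = waiting_queue.pop(0)
        dispatch_job(cluster, job)  # step S4: compute-localization dispatch
```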
" distributing according to calculating localization strategy " among above-mentioned steps S2 and the S4 specifically can comprise: each data block at the data of the required processing of operation is created a data Processing tasks, and each data processing task is dispensed on the server of its corresponding data block of storage.Also promptly, the data block store of only seeing map task correspondence is just carried out this map task scheduling on which server to this server.
For example, suppose the cluster contains 4 servers F1 to F4. The data corresponding to job X1 is divided into 2 data blocks stored on F1 and F2 respectively, and the data corresponding to job X2 is divided into 3 data blocks stored on F1, F2 and F4 respectively. Under the compute-localization strategy of this application, job X1 is divided into two map tasks assigned to F1 and F2 respectively, and job X2 is divided into three map tasks assigned to F1, F2 and F4 respectively.
Each server in the cluster maintains a local task queue (which may also be called a task waiting queue); map tasks dispatched to the server are placed in this queue, and the server executes the tasks in the local queue on a first-in-first-out basis.
More specifically, since the data block required by a given map task is already stored on a particular server, only the computational logic of that map task needs to be dispatched to that server for execution.
It should be noted that one map computation constitutes one task, and the computational logic refers to the map function, i.e. the computation method. The computational logic is identical for every data block of the same job; the computational logic of different jobs may be the same or different.
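A minimal sketch of this dispatch step, assuming the block-to-server mapping is known from the job's data distribution information; the Server and Job classes and the block_locations mapping are illustrative assumptions, not the patent's literal implementation:

```python
from collections import deque

class Server:
    """One cluster node that both stores data blocks and executes tasks."""
    def __init__(self, name):
        self.name = name
        self.local_queue = deque()  # local task queue, consumed first-in-first-out

class Job:
    """A job whose input is already split into data blocks (one map task each)."""
    def __init__(self, job_id, blocks, map_logic):
        self.job_id = job_id
        self.blocks = blocks        # the job's data blocks
        self.map_logic = map_logic  # same map function for every block

def dispatch_job(job, block_locations):
    """Compute-localization dispatch: create one map task per data block and
    enqueue it on the server that stores the block, so only the computational
    logic (the map function) travels, never the data block itself."""
    for block in job.blocks:
        server = block_locations[block]   # the server storing this block
        task = (job.map_logic, block)     # a map task: logic plus local block
        server.local_queue.append(task)   # joins that server's FIFO queue
```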
In addition, as a special case, if the number of jobs the cluster is executing is less than the first threshold but there is only one job in the waiting queue, steps S2 and S3 are unnecessary: the job in the waiting queue is scheduled directly into the executing state, after which step S4 is executed.
It can be seen that, in the embodiments of the invention, any server in the server cluster is capable of both executing tasks and storing data. On this basis, at the job-scheduling level, the embodiments predict the system-wide load balancing state under different execution orders according to the compute-localization strategy, select the execution order that makes that state optimal (the lowest and most evenly distributed load), and schedule jobs in that order.
At the task-scheduling level, the embodiments dispatch tasks according to the compute-localization strategy, so that every Map task is scheduled onto the server where its required data resides and its execution incurs no network data transmission overhead. Network data transmission between server nodes is thereby reduced and data processing performance is improved.
In other embodiments of the invention, the above method may further comprise:
periodically checking the idle time of each server in the server cluster;
scheduling data processing tasks from the server with the most tasks onto servers whose idle time exceeds a second threshold.
More specifically, data processing tasks at the tail of the local task queue of the server with the most tasks may be dispatched to servers whose idle time exceeds the second threshold.
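A sketch of this periodic rebalancing under the stated rule, assuming each server additionally exposes an idle_time() accessor (an assumed interface) on top of the Server class sketched earlier:

```python
def rebalance(servers, second_threshold):
    """Move work from the most heavily loaded server to servers whose
    idle time exceeds the second threshold."""
    for server in servers:
        if server.idle_time() > second_threshold:
            busiest = max(servers, key=lambda s: len(s.local_queue))
            if busiest is not server and busiest.local_queue:
                # Take from the tail of the longest local queue: tail tasks
                # were enqueued last and would otherwise wait the longest.
                server.local_queue.append(busiest.local_queue.pop())
```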
It should be noted that a simple scheduling approach is, whenever a scheduling opportunity arises, to compute the system-wide load balancing state that would result from each job in the waiting queue entering the executing state, and then to schedule the job whose resulting balance is best. This approach is cheap to compute and performs well in early operation, but jobs with poor balance characteristics slowly accumulate in the queue, and after long-term operation the system load may become extremely unbalanced. Fig. 3 illustrates this situation; the ordinate of Fig. 3 is the load variance (imbalance can be read from the variance) and the abscissa is time.
To avoid the situation of Fig. 3, the embodiments of the invention do not predict the system-wide load balancing state after a single job enters the executing state; instead they consider the system-wide load balancing states reached as all waiting jobs enter the executing state in a given execution order, and select the optimal order, thereby avoiding the sharp performance decay shown in Fig. 3.
Job scheduling is described in detail below.
At the job-scheduling level, the key is how to predict the optimal execution order. In one embodiment of the invention, "predicting the system-wide load balancing state caused by the jobs in the waiting queue under different execution orders and obtaining the optimal execution order" in step S2 may comprise the following substeps:
First, construct a global search tree comprising multiple search paths that share the same root node, each search path containing leaf nodes; the root node represents the current load balancing state of the server cluster, each leaf node represents a job in the waiting queue, and different search paths represent different execution orders.
Taking as an example a waiting queue holding 3 jobs whose IDs are job1 to job3, the global search tree (see Fig. 4) can be constructed by the following substeps:
Step 1: construct the first layer, which contains only the root node (the start node), denoted job0.
Step 2: since there are N (here 3) waiting jobs in the system, there are N possible choices of the next job to enter the executing state, so the second layer expands into N leaf nodes, each represented by a job ID.
Step 3: construct the third layer; since one job has already been chosen when constructing the second layer, each second-layer node expands into N − 1 (here 2) nodes, which form the third layer.
Step 4: continue by analogy until no further expansion is possible, which completes the construction of the whole global search tree.
Each search path in the global search tree can also be regarded as a job execution sequence, so searching for the optimal execution order is equivalent to searching for the optimal job execution sequence. This embodiment abstracts the search for the optimal job execution sequence into a graph-search mathematical model: based on some search algorithm, find an optimal path from the root node to a leaf node in a global search tree such as that of Fig. 4, thereby obtaining the optimal execution order.
Second, calculate the load balancing predicted values of the different search paths (the load balancing predicted value characterizes the system-wide load balancing state), and take the execution order corresponding to the search path with the smallest predicted value as the optimal execution order.
It should be noted, however, that with N waiting jobs there are A(N,N) = N! search paths, so globally evaluating all jobs in the current waiting queue at every scheduling occasion would clearly involve an enormous amount of computation. Preferably, therefore, the following embodiment of the invention computes with a heuristic search strategy.
Referring to Fig. 5a, the heuristic search strategy is as follows:
Step A: take the root node as the target node, calculate the evaluation value of the target node, and use the root node's evaluation value as the load balancing predicted value of every search path;
Step B: select the search path with the smallest load balancing predicted value as the target search path, and take the execution order corresponding to the target search path as the target execution order;
Step C: judge whether the target search path still contains a leaf node for which no load balancing predicted value has been calculated; if not, take the target execution order as the optimal execution order (step E); if so, take the leaf node following the current target node in the target search path as the new target node, calculate its evaluation value as the load balancing predicted value of the search path it belongs to (step D), and return to the step of selecting the search path with the smallest load balancing predicted value as the target search path (step B).
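Steps A to E amount to a best-first search over partial execution sequences, always extending the path that currently looks cheapest. The sketch below is one way to realize it, assuming an evaluate_f callback that implements the evaluation value f(M) = g(M) + h(M) defined later; the function and parameter names are illustrative:

```python
import heapq

def heuristic_search(waiting_jobs, evaluate_f):
    """Return the execution order with the smallest predicted
    load-balancing value, expanding the search tree lazily.

    waiting_jobs: list of job IDs in the waiting queue.
    evaluate_f:   function(partial_sequence) -> evaluation value f.
    """
    # Each heap entry is (f value, partial execution sequence).
    heap = [(evaluate_f(()), ())]  # root node job0: the empty sequence
    while heap:
        _, sequence = heapq.heappop(heap)
        remaining = [j for j in waiting_jobs if j not in sequence]
        if not remaining:
            return list(sequence)  # a fully evaluated path: the optimal order
        # Expand the cheapest partial path by one layer (step D); the heap
        # then reselects the cheapest path overall (step B).
        for job in remaining:
            child = sequence + (job,)
            heapq.heappush(heap, (evaluate_f(child), child))
    return []
```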
Referring now to Fig. 5b, the heuristic search strategy is introduced in more detail, taking as an example a waiting queue holding P (P = 5) jobs whose IDs are job1 to job5 (P determines the number of layers, i.e. the height, of the global search tree, which is P + 1):
S501: calculate the evaluation value f(0) of the root node;
S502: expand P leaf nodes from the root node, and calculate the evaluation value of each leaf node.
It should be noted that in step S502 every search path of the global search tree is in turn the target search path (because the f(0) of every search path is equal), and the first leaf node of each search path is in turn the target node.
How the evaluation value is calculated is described in detail later in this text.
S503: among the leaf nodes of the search tree expanded so far, find the node with the smallest evaluation value, expand its lower-layer nodes as target nodes, and calculate their evaluation values.
Suppose that after step S502, the node corresponding to job1 in the execution sequence job0→job1→job3→job5→job4→job2 has the smallest evaluation value. The lower-layer nodes of the leaf node corresponding to job1 are then expanded, namely the leaf nodes corresponding to job2, job3, job4 and job5, and the evaluation value of each is calculated.
S504: repeat until the expansion reaches the (P+1)-th layer of the global search tree, at which point the optimal job execution sequence has been found and the search ends.
How the evaluation value is calculated is introduced below.
Suppose the target node is the node at layer M of a certain target search path. For each target node, its evaluation value f(M) can be calculated with the following function:
f(M) = g(M) + h(M) (formula one)
where:
g(M) denotes the sum of the load balancing values of all jobs involved from the root node to the target node when the target node's job executes (including the load balancing value of the initial state). g(M) can be calculated with the following formula:
g(M) = \sum_{j=1}^{M} LB_j (formula two)
where j denotes the node at layer j of the target search path;
In formula two, LB_j denotes the load balancing predicted value of the server cluster when the job corresponding to the layer-j node of the target search path enters the executing state, and LB_1 denotes the current (actual) load balancing value of the server cluster.
Referring again to Fig. 5b, take the execution sequence job0→job1→job3→job5→job4→job2 as an example and suppose job3 is the target node; the LB_1 corresponding to job0 and the LB_2 corresponding to job1 (job1 being a second-layer node) must then be summed.
As noted above, LB_j denotes the load balancing predicted value of the server cluster when the layer-j job of the target search path enters the executing state; in other words, LB_j quantifies the cluster's load balance at the moment the job corresponding to the layer-j node enters the executing state.
LB_j can be expressed as the load variance across the servers (see formula three); a value of 0 means the system's load balance at that moment is ideal.
LB_j = \sum_{i=1}^{N} ( Load_i^j - \overline{Load^j} )^2 (formula three)
where Load_i^j denotes the load of the i-th server in the server cluster when the job corresponding to the layer-j node of the target search path enters the executing state (N denotes the total number of servers in the cluster), and \overline{Load^j} denotes the average load of the server cluster at that moment, i.e. \overline{Load^j} = (1/N) \sum_{i=1}^{N} Load_i^j.
In the MR technique, each map task corresponds to one data block, all data blocks have the same size, and all map computations of the same job share identical computational logic. In this embodiment, therefore, the load of a server is represented by the number of map tasks assigned to it. In an actual MR system, the load of the i-th server equals the length of its map task queue (counting both executing and waiting tasks).
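Under this convention, the load balancing value of formula three reduces to a sum of squared deviations of the per-server map task queue lengths from their mean. A minimal sketch:

```python
def load_balance_value(queue_lengths):
    """LB: sum of squared deviations of per-server map task queue
    lengths from the cluster mean; 0 means a perfectly even load."""
    n = len(queue_lengths)
    mean = sum(queue_lengths) / n
    return sum((load - mean) ** 2 for load in queue_lengths)
```

For instance, load_balance_value([4, 4, 2, 3, 2]) is 4 (mean 3), the figure the example below associates with job3.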
Taking the execution sequence job0→job1→job3→job5→job4→job2 as an example again, suppose the server cluster contains five servers (F1 to F5) and that in its initial state the length of the local task queue on every server is 2 (i.e. each holds 2 map tasks). Suppose the data corresponding to job1 is divided into 2 data blocks stored on F1 and F2 respectively, and the data corresponding to job3 is divided into 3 data blocks stored on F1, F2 and F4 respectively.
If the leaf node corresponding to job1 is the target node, then when job1 enters the executing state the load of F1 is 3, of F2 is 3, of F3 is 2, of F4 is 2 and of F5 is 2, and the g(M) corresponding to job1 is 1.25.
If instead the leaf node corresponding to job3 is the target node, then when job3 enters the executing state the load of F1 is 4, of F2 is 4, of F3 is 2, of F4 is 3 and of F5 is 2, and the g(M) corresponding to job3 is 4.
h(M) denotes the sum of the load balancing values expected to arise as the remaining jobs execute under ideal conditions, where the ideal condition is that the server load in the cluster is completely evenly distributed. h(M) can be calculated with the following formula:
h(M) = \sum_{j=M+1}^{P+1} lb_j (formula four)
where lb_j denotes the load balancing predicted value of the server cluster when the job corresponding to the layer-j node of the target search path enters the executing state, under the assumption that the server load in the cluster is completely evenly distributed.
lb_j can be calculated with the following formula:
lb_j = \sum_{i=1}^{N} ( l_i^j - \overline{l^j} )^2 (formula five)
where l_i^j denotes the load of the i-th server in the cluster when the job corresponding to the layer-j node of the target search path enters the executing state, under the fully-averaged assumption, and \overline{l^j} denotes the average load of the server cluster at that moment, i.e. \overline{l^j} = (1/N) \sum_{i=1}^{N} l_i^j.
Likewise, l_i^j can be taken to equal the map task queue length on the i-th server under the assumption that the server load in the cluster is completely evenly distributed.
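One plausible reading of the fully-averaged assumption is that the outstanding map tasks are spread across the N servers as evenly as integer queue lengths allow. The sketch below, reusing load_balance_value from the earlier sketch, follows that reading; it is an assumption for illustration, not the patent's stated computation:

```python
def ideal_load_balance_value(total_tasks, n_servers):
    """lb: load balancing value when total_tasks map tasks are spread
    across n_servers as evenly as integer queue lengths allow."""
    base, extra = divmod(total_tasks, n_servers)
    # 'extra' servers carry one task more than the remaining ones.
    loads = [base + 1] * extra + [base] * (n_servers - extra)
    return load_balance_value(loads)
```

When the total divides evenly across the servers the value is 0; 18 tasks on five servers, for instance, give queue lengths [4, 4, 4, 3, 3] and a value of 1.2.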
Taking the execution sequence job0→job1→job3→job5→job4→job2 as an example once more, suppose the server cluster contains five servers (F1 to F5) and that in its initial state the length of the local task queue on every server is 2 (i.e. each holds 2 map tasks).
Suppose the data corresponding to job1 is divided into 2 data blocks stored on F1 and F2 respectively; the data corresponding to job3 into 3 blocks stored on F1, F2 and F4; the data corresponding to job5 into 3 blocks stored on F1, F3 and F4; the data corresponding to job4 into 4 blocks stored on F2, F3, F4 and F5; and the data corresponding to job2 into 2 blocks stored on F1 and F5.
In the execution sequence job0→job1→job3→job5→job4→job2, if the leaf node corresponding to job1 is the target node, then when job1 enters the executing state the load of F1 is 3, of F2 is 3, of F3 is 2, of F4 is 2 and of F5 is 2; the g(M) corresponding to job1 is then 1.25, the corresponding h(M) is 4.4, and the corresponding evaluation value f(M) is 5.65.
If instead the leaf node corresponding to job3 is taken as the target node, then when job3 enters the executing state the g(M) corresponding to job3 is 4, the corresponding h(M) is 3.2, and the corresponding evaluation value f(M) is 7.2.
Corresponding to the above method, the present invention also seeks to protect a data parallel processing system. Referring to Fig. 6, the system may comprise at least a server cluster 1 and a load balancing scheduler 2;
any server in the server cluster 1 is capable of both executing tasks and storing data;
referring to Fig. 7, the above load balancing scheduler 2 may comprise:
a preprocessing unit 21, configured to place jobs submitted by users through a client into the job waiting queue and to collect the data distribution information of the jobs;
a predicting unit 22, configured to predict, when the number of jobs the server cluster is executing is less than the first threshold and according to the data distribution information, the system-wide load balancing state caused by dispatching the jobs in the waiting queue under different execution orders according to the compute-localization strategy, and to obtain the optimal execution order;
a job scheduling unit 23, configured to reorder the jobs in the waiting queue by the optimal execution order and to schedule the jobs into the executing state one by one in the reordered order, until the number of jobs the server cluster is executing reaches the first threshold or the waiting queue is empty;
a first task scheduling unit 24, configured to dispatch each job that enters the executing state according to the compute-localization strategy, so that servers execute the tasks.
For details, refer to the foregoing description, which is not repeated here.
In other embodiments of the invention, the above load balancing scheduler may further comprise a second task scheduling unit, configured to periodically check the idle time of each server in the server cluster and to schedule data processing tasks from the server with the most tasks onto servers whose idle time exceeds the second threshold. For details, refer to the foregoing description, which is not repeated here.
The embodiments of the invention also claim the load balancing scheduler of all of the above embodiments.
It should be noted that the load balancing scheduler may be a hardware device or a software program, and each unit in the load balancing scheduler may likewise be a hardware device (for example, the preprocessing unit may be a preprocessing server, and the predicting unit may in practice be a prediction server) or a software program. When the load balancing scheduler is a software program, it may be installed on any server of the server cluster.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and for the parts the embodiments have in common, cross-reference between them suffices. Since the devices provided by the embodiments correspond to the methods provided by the embodiments, their description is relatively brief; for the relevant parts, refer to the description of the methods.
It should also be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another and do not necessarily require or imply any such actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that comprises it.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present invention can be implemented by software plus the necessary general-purpose hardware, including general-purpose integrated circuits, general-purpose CPUs, general-purpose memories, general-purpose components and the like; it can of course also be implemented by dedicated hardware such as application-specific integrated circuits, dedicated CPUs, dedicated memories and dedicated components, but in many cases the former is the better implementation. Based on this understanding, the essence of the technical solution of the present invention, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product may be stored in a readable storage medium, such as a USB flash drive, removable storage medium, read-only memory (ROM), random access memory (RAM), magnetic disk or optical disc, capable of storing program code, and includes instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to execute the methods of the embodiments of the present invention.
The above description of the provided embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be obvious to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features provided herein.

Claims (8)

1. A data parallel processing method, characterized in that it is based on a server cluster, any server in the server cluster being capable of both executing tasks and storing data;
the method comprising:
placing jobs submitted by users through a client into a job waiting queue, and collecting the data distribution information of the jobs, wherein the data to be processed by a job is divided into a plurality of data blocks stored on servers of the server cluster, each data block corresponds to one data processing task, and the data distribution information comprises the distribution of the job's corresponding data blocks;
when the number of jobs the server cluster is executing is less than a first threshold, predicting, according to the data distribution information, the system-wide load balancing state caused by dispatching the jobs in the waiting queue under different execution orders according to a compute-localization strategy, and obtaining the optimal execution order;
reordering the jobs in the waiting queue by the optimal execution order, and scheduling the jobs in the waiting queue into the executing state one by one in the reordered order, until the number of jobs the server cluster is executing reaches the first threshold or the waiting queue is empty;
dispatching each job that enters the executing state according to the compute-localization strategy, so that servers execute the data processing tasks;
wherein dispatching according to the compute-localization strategy comprises:
creating one data processing task for each data block of the data to be processed by a job, and assigning each data processing task to the server that stores its corresponding data block.
2. The method of claim 1, characterized by further comprising:
periodically checking the idle time of each server in the server cluster;
scheduling data processing tasks from the server with the most tasks onto servers whose idle time exceeds a second threshold.
3. The method of claim 2, characterized in that:
predicting the system-wide load balancing state caused by the jobs in the waiting queue under different execution orders and obtaining the optimal execution order comprises:
constructing a global search tree comprising multiple search paths that share the same root node, each search path containing leaf nodes, wherein the root node represents the current load balancing state of the server cluster, each leaf node represents a job in the waiting queue, and different search paths represent different execution orders;
calculating the load balancing predicted values of the different search paths, and taking the execution order corresponding to the search path with the smallest load balancing predicted value as the optimal execution order, the load balancing predicted value being used to characterize the system-wide load balancing state.
4. The method of claim 3, characterized in that calculating the load balancing predicted values of the different search paths and taking the execution order corresponding to the search path with the smallest load balancing predicted value as the optimal execution order comprises:
taking the root node as the target node, calculating the evaluation value of the target node, and using the root node's evaluation value as the load balancing predicted value of each search path;
selecting the search path with the smallest load balancing predicted value as the target search path, and taking the execution order corresponding to the target search path as the target execution order;
judging whether the target search path still contains a leaf node for which no load balancing predicted value has been calculated; if not, taking the target execution order as the optimal execution order; if so, taking the leaf node following the current target node in the target search path as the new target node, calculating its evaluation value as the load balancing predicted value of the search path it belongs to, and returning to the step of selecting the search path with the smallest load balancing predicted value as the target search path.
5. The method of claim 4, characterized in that:
the number of jobs in the waiting queue is P, P being a positive integer;
in the target search path, the target node is the node at layer M, M being not less than 1 and not greater than P + 1;
calculating the evaluation value of the target node comprises:
calculating the evaluation value f(M) of the target node with the formula f(M) = g(M) + h(M);
wherein:
g(M) = \sum_{j=1}^{M} LB_j;
h(M) = \sum_{j=M+1}^{P+1} lb_j;
j denotes the node at layer j of the target search path;
LB_j denotes the load balancing predicted value of the server cluster when the job corresponding to the layer-j node of the target search path enters the executing state, and LB_1 denotes the current load balancing value of the server cluster;
lb_j denotes the load balancing predicted value of the server cluster when the job corresponding to the layer-j node of the target search path enters the executing state, under the assumption that the server load in the server cluster is completely evenly distributed;
Load_i^j denotes the load of the i-th server in the server cluster when the job corresponding to the layer-j node of the target search path enters the executing state, \overline{Load^j} denotes the average load of the server cluster at that moment, and N denotes the total number of servers in the server cluster;
l_i^j denotes the load of the i-th server in the server cluster at that moment under the fully-averaged assumption, and \overline{l^j} denotes the corresponding average load of the server cluster;
\overline{Load^j} = (1/N) \sum_{i=1}^{N} Load_i^j;
\overline{l^j} = (1/N) \sum_{i=1}^{N} l_i^j.
6. The method of claim 5, characterized in that the load is characterized by the number of tasks.
7. A data parallel processing system, characterized by comprising a server cluster and a load balancing scheduler;
any server in the server cluster being capable of both executing tasks and storing data;
the load balancing scheduler comprising:
a preprocessing unit, configured to place jobs submitted by users through a client into a job waiting queue and to collect the data distribution information of the jobs, wherein the data to be processed by a job is divided into a plurality of data blocks stored on servers of the server cluster, each data block corresponds to one data processing task, and the data distribution information comprises the distribution of the job's corresponding data blocks;
a predicting unit, configured to predict, when the number of jobs the server cluster is executing is less than a first threshold and according to the data distribution information, the system-wide load balancing state caused by dispatching the jobs in the waiting queue under different execution orders according to the compute-localization strategy, and to obtain the optimal execution order;
a job scheduling unit, configured to reorder the jobs in the waiting queue by the optimal execution order and to schedule the jobs into the executing state one by one in the reordered order, until the number of jobs the server cluster is executing reaches the first threshold or the waiting queue is empty;
a first task scheduling unit, configured to dispatch each job that enters the executing state according to the compute-localization strategy, so that servers execute the data processing tasks;
wherein dispatching according to the compute-localization strategy comprises:
creating one data processing task for each data block of the data to be processed by a job, and assigning each data processing task to the server that stores its corresponding data block.
8. A load balancing scheduler, characterized in that it cooperates with a server cluster, any server in the server cluster having the capability of both executing tasks and storing data; the load balancing scheduler comprises:
A preprocessing unit, configured to place the jobs submitted by users through clients into a job waiting queue and to collect the data distribution information of the jobs, wherein the data to be processed by a job is divided into a plurality of data blocks stored on servers in the server cluster, each data block corresponds to one data processing task, and the data distribution information comprises the distribution information of the data blocks corresponding to the job;
A prediction unit, configured to, when the number of jobs executing in the server cluster is less than a first threshold, predict, according to the data distribution information, the system-wide load balancing state that would result from distributing the jobs in the job waiting queue according to the calculation localization strategy under different execution orders, thereby obtaining the optimal execution order;
A job scheduling unit, configured to reorder the jobs in the job waiting queue according to the optimal execution order and to schedule the jobs of the reordered queue into the executing state one by one in the reordered order, until the number of jobs executing in the server cluster reaches the first threshold or the job waiting queue is empty;
A first task scheduling unit, configured to distribute each job entering the executing state according to the calculation localization strategy, so that the servers execute the data processing tasks;
Wherein distributing according to the calculation localization strategy comprises:
Creating one data processing task for each data block of the data to be processed by a job, and assigning each data processing task to the server that stores its corresponding data block.
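The prediction and job scheduling units can likewise be sketched under stated assumptions: candidate execution orders of the waiting queue are enumerated by brute force, each server's predicted load is the number of tasks it would receive under the calculation localization strategy (claim 6 characterizes load by task count), and the order with the smallest spread of predicted per-server task counts is chosen. The function predict_best_order, its parameters, and the standard-deviation balance measure are illustrative assumptions, not the claimed method itself.

from itertools import permutations
from statistics import pstdev

def predict_best_order(waiting_jobs, block_locations, first_threshold):
    """waiting_jobs: ids of queued jobs; block_locations[job][block] gives
    the storing server; first_threshold: jobs admitted to execution."""
    best_order, best_imbalance = None, float("inf")
    for order in permutations(waiting_jobs):
        load = {}  # predicted tasks per server for this execution order
        for job in order[:first_threshold]:  # jobs that would start now
            for server in block_locations[job].values():
                load[server] = load.get(server, 0) + 1
        imbalance = pstdev(load.values()) if len(load) > 1 else 0.0
        if imbalance < best_imbalance:
            best_order, best_imbalance = order, imbalance
    return list(best_order)

print(predict_best_order(
    ["jA", "jB", "jC"],
    {"jA": {"b0": "s1", "b1": "s1"},
     "jB": {"b2": "s2"},
     "jC": {"b3": "s2", "b4": "s3"}},
    first_threshold=2))

In this toy example the chosen order admits jA and jC first, spreading the predicted tasks over three servers instead of concentrating them on two.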
CN201310195179.6A 2013-05-23 2013-05-23 Data parallel processing method, system and load balance scheduler Active CN103226467B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310195179.6A CN103226467B (en) 2013-05-23 2013-05-23 Data parallel processing method, system and load balance scheduler

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310195179.6A CN103226467B (en) 2013-05-23 2013-05-23 Data parallel processing method, system and load balance scheduler

Publications (2)

Publication Number Publication Date
CN103226467A true CN103226467A (en) 2013-07-31
CN103226467B CN103226467B (en) 2015-09-30

Family

ID=48836933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310195179.6A Active CN103226467B (en) 2013-05-23 2013-05-23 Data parallel processing method, system and load balance scheduler

Country Status (1)

Country Link
CN (1) CN103226467B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040093420A1 (en) * 2002-11-13 2004-05-13 Gamble Jonathan Bailey Method and system for transferring large data files over parallel connections
US8056084B2 (en) * 2007-01-25 2011-11-08 Hewlett-Packard Development Company, L.P. Method and system for dynamically reallocating a resource among operating systems without rebooting of the computer system
US20110145511A1 (en) * 2009-12-14 2011-06-16 International Business Machines Corporation Page invalidation processing with setting of storage key to predefined value
EP2472397A1 (en) * 2010-12-28 2012-07-04 POLYTEDA Software Corporation Limited Load distribution scheduling method in data processing system
CN102708011A (en) * 2012-05-11 2012-10-03 南京邮电大学 Multistage load estimating method facing task scheduling of cloud computing platform

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
MENG LINGFEN: "Research on Job Scheduling Algorithms for PC Clusters", Master's Thesis, China University of Petroleum, 31 March 2010 (2010-03-31) *
PENG LIMIN et al.: "A Load Balancing Scheme for Dynamic Structured P2P Networks", Journal of South China University of Technology (Natural Science Edition), vol. 39, no. 10, 31 October 2011 (2011-10-31) *
WANG KAI: "Research and Implementation of Multi-User Job Scheduling Methods for MapReduce Clusters", Master's Thesis, National University of Defense Technology, 29 February 2012 (2012-02-29) *
WANG KAI et al.: "Design and Implementation of a Job Scheduling Algorithm for Multi-User MapReduce Clusters", Computer and Modernization, no. 10, 15 October 2010 (2010-10-15) *

Cited By (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103530182A (en) * 2013-10-22 2014-01-22 海南大学 Working scheduling method and device
CN104301241A (en) * 2014-06-05 2015-01-21 中国人民解放军信息工程大学 SOA dynamic load distribution method and system
CN104199912A (en) * 2014-08-28 2014-12-10 无锡天脉聚源传媒科技有限公司 Task processing method and device
CN104199912B (en) * 2014-08-28 2018-10-26 无锡天脉聚源传媒科技有限公司 A kind of method and device of task processing
CN105991705B (en) * 2015-02-10 2020-04-28 中兴通讯股份有限公司 Distributed storage system and method for realizing hard affinity of resources
CN105991705A (en) * 2015-02-10 2016-10-05 中兴通讯股份有限公司 Distributed storage system and method of realizing hard affinity of resource
CN104915250A (en) * 2015-06-03 2015-09-16 电子科技大学 Method for realizing MapReduce data localization in operations
CN104915250B (en) * 2015-06-03 2018-04-06 电子科技大学 It is a kind of to realize the method for making MapReduce data localization in the industry
WO2017020742A1 (en) * 2015-08-06 2017-02-09 阿里巴巴集团控股有限公司 Load balancing method and device
CN106445677A (en) * 2015-08-06 2017-02-22 阿里巴巴集团控股有限公司 Load balancing method and device
CN106487823A (en) * 2015-08-24 2017-03-08 上海斐讯数据通信技术有限公司 A kind of document transmission method based on SDN framework and system
CN106681823A (en) * 2015-11-05 2017-05-17 田文洪 Load balancing method for processing MapReduce data skew
CN105516325A (en) * 2015-12-18 2016-04-20 内蒙古农业大学 Cloud load balancing method for carrying out elastic expansion and traffic distribution expansion according to application load
CN107103009A (en) * 2016-02-23 2017-08-29 杭州海康威视数字技术股份有限公司 A kind of data processing method and device
CN107103009B (en) * 2016-02-23 2020-04-10 杭州海康威视数字技术股份有限公司 Data processing method and device
US11379271B2 (en) 2016-02-23 2022-07-05 Hangzhou Hikvision Digital Technology Co., Ltd. Parallel processing on data processing servers through even division of data records
CN105721595B (en) * 2016-03-03 2019-04-09 上海携程商务有限公司 The packaging method and system of the app of IOS system
CN105721595A (en) * 2016-03-03 2016-06-29 上海携程商务有限公司 IOS APP packaging method and system
CN105847356A (en) * 2016-03-23 2016-08-10 上海爱数信息技术股份有限公司 Communication system, electronic device, data processing method and system
CN106020988A (en) * 2016-06-03 2016-10-12 北京邮电大学 Off-line task scheduling method and device for intelligent video monitoring system
CN106020988B (en) * 2016-06-03 2019-03-15 北京邮电大学 A kind of offline method for scheduling task of intelligent video monitoring system and device
CN105959395A (en) * 2016-06-15 2016-09-21 徐州医科大学 Cluster self-feedback type load balancing scheduling system and method
CN106844051A (en) * 2017-01-19 2017-06-13 河海大学 The loading commissions migration algorithm of optimised power consumption in a kind of edge calculations environment
CN107766160B (en) * 2017-09-26 2019-12-13 平安科技(深圳)有限公司 queue message processing method and terminal equipment
CN107766160A (en) * 2017-09-26 2018-03-06 平安科技(深圳)有限公司 Queue message processing method and terminal device
CN108469988A (en) * 2018-02-28 2018-08-31 西北大学 A kind of method for scheduling task based on isomery Hadoop clusters
CN108563497A (en) * 2018-04-11 2018-09-21 中译语通科技股份有限公司 A kind of efficient various dimensions algorithmic dispatching method, task server
CN108563497B (en) * 2018-04-11 2022-03-29 中译语通科技股份有限公司 Efficient multi-dimensional algorithm scheduling method and task server
CN109343138A (en) * 2018-09-29 2019-02-15 深圳市华讯方舟太赫兹科技有限公司 A kind of load-balancing method and rays safety detection apparatus of safe examination system
CN109358959A (en) * 2018-10-23 2019-02-19 电子科技大学 Data distribution formula cooperative processing method based on prediction
CN109445282A (en) * 2018-11-07 2019-03-08 北京航空航天大学 A kind of Optimization Scheduling towards basic device processing technology
CN111290841B (en) * 2018-12-10 2024-04-05 北京沃东天骏信息技术有限公司 Task scheduling method, device, computing equipment and storage medium
CN111290841A (en) * 2018-12-10 2020-06-16 北京沃东天骏信息技术有限公司 Task scheduling method and device, computing equipment and storage medium
CN111552910A (en) * 2019-02-08 2020-08-18 萨沃伊公司 Method for ordering loads in an automated distribution system that reduces out-of-order during collection of loads on collectors
CN111552910B (en) * 2019-02-08 2023-07-14 萨沃伊公司 Method for ordering loads in an automated distribution system that reduces disorder during collection of loads on collectors
CN110018893A (en) * 2019-03-12 2019-07-16 平安普惠企业管理有限公司 A kind of method for scheduling task and relevant device based on data processing
CN110888919A (en) * 2019-12-04 2020-03-17 阳光电源股份有限公司 HBase-based big data statistical analysis method and device
WO2021147876A1 (en) * 2020-01-20 2021-07-29 北京一流科技有限公司 Memory resource in-situ sharing decision-making system and method
CN111158919A (en) * 2020-01-20 2020-05-15 北京一流科技有限公司 Memory resource in-place sharing decision system and method thereof
CN112150035B (en) * 2020-10-13 2023-06-13 中国农业银行股份有限公司 Data processing method and device
CN112150035A (en) * 2020-10-13 2020-12-29 中国农业银行股份有限公司 Data processing method and device
CN112328171A (en) * 2020-10-23 2021-02-05 苏州元核云技术有限公司 Data distribution prediction method, data equalization method, device and storage medium
CN112328171B (en) * 2020-10-23 2024-04-30 苏州元核云技术有限公司 Data distribution prediction method, data equalization method, device and storage medium
CN112631771A (en) * 2020-12-18 2021-04-09 江苏康融科技有限公司 Parallel processing method of big data system
CN112532464A (en) * 2021-02-08 2021-03-19 中国人民解放军国防科技大学 Data distributed processing acceleration method and system across multiple data centers

Also Published As

Publication number Publication date
CN103226467B (en) 2015-09-30

Similar Documents

Publication Publication Date Title
CN103226467B (en) Data parallel processing method, system and load balance scheduler
JP6898496B2 (en) Computation graph processing
Polo et al. Performance-driven task co-scheduling for mapreduce environments
EP3353655B1 (en) Stream-based accelerator processing of computational graphs
US20170031712A1 (en) Data-aware workload scheduling and execution in heterogeneous environments
US8887165B2 (en) Real time system task configuration optimization system for multi-core processors, and method and program
CN104952032A (en) Graph processing method and device as well as rasterization representation and storage method
Li et al. An effective scheduling strategy based on hypergraph partition in geographically distributed datacenters
Lei et al. CREST: Towards fast speculation of straggler tasks in MapReduce
Deng et al. A data and task co-scheduling algorithm for scientific cloud workflows
CN105468439A (en) Adaptive parallel algorithm for traversing neighbors in fixed radius under CPU-GPU (Central Processing Unit-Graphic Processing Unit) heterogeneous framework
CN104281495A (en) Method for task scheduling of shared cache of multi-core processor
WO2016018352A1 (en) Platform configuration selection based on a degraded makespan
Menouer et al. Scheduling and resource management allocation system combined with an economic model
US10402762B2 (en) Heterogeneous platform configurations
US10713096B2 (en) System and method for handling data skew at run time
CN107329826A (en) A kind of heuristic fusion resource dynamic dispatching algorithm based on Cloudsim platforms
Venugopal et al. A set coverage-based mapping heuristic for scheduling distributed data-intensive applications on global grids
Wieczorek et al. Comparison of workflow scheduling strategies on the Grid
CN116302327A (en) Resource scheduling method and related equipment
KR102045997B1 (en) Method for scheduling task in big data analysis platform based on distributed file system, program and computer readable storage medium therefor
CN110175172A (en) Very big two points of groups parallel enumerating method based on sparse bipartite graph
CN113238873B (en) Method for optimizing and configuring spacecraft resources
US11847490B2 (en) Intelligent workload scheduling using a ranking of sequences of tasks of a workload
Liu et al. A survey of speculative execution strategy in MapReduce

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant