CN103226467A - Data parallel processing method and system as well as load balancing scheduler - Google Patents


Info

Publication number
CN103226467A
Authority
CN
China
Prior art keywords
server
data
server cluster
load balancing
task
Prior art date
Legal status
Granted
Application number
CN2013101951796A
Other languages
Chinese (zh)
Other versions
CN103226467B (en)
Inventor
杨树强
华中杰
贾焰
尹洪
赵辉
李爱平
陈志坤
金松昌
周斌
韩伟红
韩毅
舒琦
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN201310195179.6A
Publication of CN103226467A
Application granted
Publication of CN103226467B
Legal status: Active


Landscapes

  • Multi Processors (AREA)

Abstract

The embodiment of the invention discloses a data parallel processing method and system as well as a load balancing scheduler. In the embodiment of the invention, any server in a server cluster is capable of both executing tasks and storing data. On this basis, at the job-scheduling level, the embodiment predicts the system-wide load balancing state under different execution orders according to a compute-localization strategy, selects the execution order that makes the system-wide load balancing state optimal, and schedules jobs in that order. At the task-scheduling level, the embodiment dispatches each job entering the executing state according to the compute-localization strategy, under which each data processing task is assigned to the server that stores the task's corresponding data block. When a task is processed, the same server therefore acts both as the server node storing the data block and as the server node executing the task, so network data transmission between server nodes is reduced and data processing performance is improved.

Description

Data parallel processing method, system and load balance scheduler
Technical field
The present invention relates to the technical field of data processing, and more particularly to a data parallel processing method and system and a load balancing scheduler.
Background technology
In a distributed computing environment, for example under the MapReduce (hereinafter MR) parallel programming model proposed by Google, the data to be processed by a job is divided into multiple data blocks, and these blocks are stored, block by block, on one or more server nodes. After a client submits a job, the job is divided into tasks in one-to-one correspondence with the data blocks, and these tasks are assigned to different server nodes for parallel execution. If the server node executing a task does not store the task's corresponding data block, the block must be transferred over the network from the node that stores it to the executing node. How to reduce network data transmission overhead between server nodes and thereby improve data processing performance has therefore become an active research topic.
Summary of the invention
In view of this, the purpose of the embodiments of the invention is to provide a data parallel processing method and system and a load balancing scheduler that address the above problem.
To achieve the above purpose, the embodiments of the invention provide the following technical solutions:
A data parallel processing method is based on a server cluster in which any server is capable of both executing tasks and storing data;
the method comprises:
placing jobs submitted by users through a client into a job waiting queue, and collecting the data distribution information of the jobs, wherein the data to be processed by a job is divided into a plurality of data blocks stored on servers of the server cluster, each data block corresponds to one data processing task, and the data distribution information comprises the distribution of the job's corresponding data blocks;
when the number of jobs the server cluster is executing is less than a first threshold, predicting, according to the data distribution information, the system-wide load balancing state caused by dispatching the jobs in the waiting queue under different execution orders according to a compute-localization strategy, and obtaining the optimal execution order;
reordering the jobs in the waiting queue by the optimal execution order, and scheduling the jobs in the waiting queue into the executing state one by one in the reordered order, until the number of jobs the server cluster is executing reaches the first threshold or the waiting queue is empty;
dispatching each job that enters the executing state according to the compute-localization strategy, so that servers execute the data processing tasks;
wherein dispatching according to the compute-localization strategy comprises:
creating one data processing task for each data block of the data to be processed by a job, and assigning each data processing task to the server that stores its corresponding data block.
A data parallel processing system comprises a server cluster and a load balancing scheduler;
any server in the server cluster is capable of both executing tasks and storing data;
the load balancing scheduler comprises:
a preprocessing unit, configured to place jobs submitted by users through a client into a job waiting queue and to collect the data distribution information of the jobs, wherein the data to be processed by a job is divided into a plurality of data blocks stored on servers of the server cluster, each data block corresponds to one data processing task, and the data distribution information comprises the distribution of the job's corresponding data blocks;
a predicting unit, configured to predict, when the number of jobs the server cluster is executing is less than a first threshold and according to the data distribution information, the system-wide load balancing state caused by dispatching the jobs in the waiting queue under different execution orders according to the compute-localization strategy, and to obtain the optimal execution order;
a job scheduling unit, configured to reorder the jobs in the waiting queue by the optimal execution order and to schedule the jobs into the executing state one by one in the reordered order, until the number of jobs the server cluster is executing reaches the first threshold or the waiting queue is empty;
a first task scheduling unit, configured to dispatch each job that enters the executing state according to the compute-localization strategy, so that servers execute the data processing tasks;
wherein dispatching according to the compute-localization strategy comprises:
creating one data processing task for each data block of the data to be processed by a job, and assigning each data processing task to the server that stores its corresponding data block.
A load balancing scheduler cooperates with a server cluster, any server in the server cluster being capable of both executing tasks and storing data; the load balancing scheduler comprises:
a preprocessing unit, configured to place jobs submitted by users through a client into a job waiting queue and to collect the data distribution information of the jobs, wherein the data to be processed by a job is divided into a plurality of data blocks stored on servers of the server cluster, each data block corresponds to one data processing task, and the data distribution information comprises the distribution of the job's corresponding data blocks;
a predicting unit, configured to predict, when the number of jobs the server cluster is executing is less than a first threshold and according to the data distribution information, the system-wide load balancing state caused by dispatching the jobs in the waiting queue under different execution orders according to the compute-localization strategy, and to obtain the optimal execution order;
a job scheduling unit, configured to reorder the jobs in the waiting queue by the optimal execution order and to schedule the jobs into the executing state one by one in the reordered order, until the number of jobs the server cluster is executing reaches the first threshold or the waiting queue is empty;
a first task scheduling unit, configured to dispatch each job that enters the executing state according to the compute-localization strategy, so that servers execute the data processing tasks;
wherein dispatching according to the compute-localization strategy comprises:
creating one data processing task for each data block of the data to be processed by a job, and assigning each data processing task to the server that stores its corresponding data block.
It can be seen that, in the embodiments of the invention, any server in the server cluster is capable of both executing tasks and storing data. On this basis, at the job-scheduling level, the embodiments predict the system-wide load balancing state under different execution orders according to the compute-localization strategy, select the execution order that makes the system-wide load balancing state optimal, and schedule jobs in that order. At the task-scheduling level, the embodiments dispatch each job entering the executing state according to the compute-localization strategy. Because the compute-localization strategy assigns each data processing task to the server that stores its corresponding data block, the same server acts both as the server node storing the data block and as the server node executing the task when the task is processed, which reduces network data transmission between server nodes and improves data processing performance.
Description of drawings
In order to illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention, and those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of MR-based data processing provided by an embodiment of the invention;
Fig. 2 is a flowchart of a data parallel processing method provided by an embodiment of the invention;
Fig. 3 is a system load balancing diagram provided by an embodiment of the invention;
Fig. 4 is a schematic diagram of a global search tree provided by an embodiment of the invention;
Fig. 5a and Fig. 5b are schematic flowcharts of a heuristic search strategy provided by an embodiment of the invention;
Fig. 6 is a schematic diagram of a data parallel processing system provided by an embodiment of the invention;
Fig. 7 is a schematic structural diagram of a load balancing scheduler provided by an embodiment of the invention.
Embodiment
For ease of reference and clarity, the technical terms, abbreviations and acronyms used hereinafter are summarized as follows:
Compute localization: in a distributed computing environment, distributing the computational logic so that the server that computes on a piece of data (the computing node) is the same as the server node that stores that data (the storage node), thereby reducing network data transmission overhead between computing and storage nodes and improving data processing performance;
Data locality: the degree to which compute localization can be satisfied, i.e. the extent to which the data required by a computation can be obtained directly at the node where the computation runs, without network transmission. In large-scale distributed computing environments the overall degree of localization is usually expressed as the localization ratio, the percentage of computations that are fully localized;
Load balancing: in a distributed computing environment, distributing load evenly across two or more nodes (servers) so as to obtain higher resource utilization and improve data processing performance while avoiding overloading some nodes. The load may be computational load, I/O load, network load, and so on.
The technical solutions in the embodiments of the invention are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the invention without creative effort fall within the protection scope of the invention.
MapReduce (hereinafter MR) is a parallel programming model proposed by Google. Its basic idea is to abstract the parallel computation of arbitrarily complex data processing requests (jobs) on large-scale clusters into two functions, a Map function and a Reduce function. The MR model has not only proved highly effective in practice but is also easy to learn and use, and it is favored by large Internet IT enterprises.
The MR model is well suited to data-intensive computation. The data to be processed by an MR job is divided into a plurality of data blocks (the blocks are independent of one another and can be computed separately), and these blocks are stored on one or more server nodes.
Fig. 1 is a schematic diagram of MR-based data processing. Suppose the data required to process a job is a set S, divided into n mutually disjoint data subsets (data blocks) S1 to Sn, i.e. S = S1 ∪ S2 ∪ … ∪ Sn. Each computation request (job) is decomposed into a large number of map computations (map tasks) and a small number of reduce computations (reduce tasks). The map computations correspond one-to-one with the data blocks S1 to Sn; each reduce computation operates independently on the results of the map computations (the intermediate results of the MR computation) and saves its output to a user-specified location. The map tasks must be assigned to different computing nodes for parallel execution, so under the MR computing environment the core problem is the scheduling of map tasks; tasks likewise need to be scheduled under other, similar distributed computing environments.
In a distributed computing environment the following problem exists: suppose server node A executes task job1 but does not store the data block corresponding to job1; the block must then be transferred over the network from the server node that stores it to node A. How to reduce such network data transmission overhead between server nodes during computation, and thereby improve data processing performance, has become an active research topic.
In fact, existing task scheduling approaches all take satisfying a particular demand (such as load balancing) as the primary goal and improving data locality only as a secondary goal, so in practical operation the localization ratio is not high.
The technical solution provided by the present invention instead takes improving data locality as the primary goal and resolves the conflict between data locality and system load balancing with a new approach: while improving data locality it also optimizes the system-wide load balance, reducing network I/O overhead during computation, increasing system throughput and shortening the execution time of individual jobs.
In addition, current MR scheduling does not distinguish job scheduling from task scheduling. This is because MR computation was originally intended mainly for batch data processing, where only a few jobs execute at a time and interference between jobs is small, so the distinction was unnecessary. When a large number of jobs execute concurrently, however, failing to separate job-level from task-level scheduling makes the scheduling approach much harder to optimize. The core of the technical solution provided by the present invention is precisely the separation into job-level scheduling and task-level scheduling: at the job level, load-balancing analysis and prediction yield the execution order with the best global load balance; at the task level, once a job enters the executing state it is divided into a number of Map tasks and Reduce tasks, and following the locality principle each map task is dispatched to the server where its data resides.
This is described in detail below.
The technical solution provided by the present invention is based on a server cluster, and the prerequisite for its implementation is that any server in the cluster is capable of both executing tasks and storing data, so that any server can serve simultaneously as a computing node and a storage node. In other words, the solution is based on the assumption that each server has independent storage and computing capability and that no storage is shared within the cluster. Large service providers and data centers today adopt exactly this pattern, namely large-scale cluster computing environments formed by connecting a large number of low-end and mid-range servers over a network, so the assumption is reasonable.
Referring to Fig. 2, the data parallel processing method claimed by the present invention comprises at least the following steps:
S1: place the jobs submitted by users through a client into the job waiting queue, and collect the data distribution information of the jobs.
It should be noted that the data to be processed by a submitted job is divided into a plurality of (at least two) data blocks (each data block corresponding to one map task) and stored on servers of the server cluster. The data distribution information comprises the distribution of those data blocks.
S2: when the number of jobs the server cluster is executing is less than the first threshold, predict, according to the data distribution information, the system-wide load balancing state caused by dispatching the jobs in the waiting queue under different execution orders according to the compute-localization strategy, and obtain the optimal execution order.
The choice of the first threshold is not discussed within this solution; those skilled in the art may set it based on other techniques or on experience. In the MR technique, the first threshold can be specified by the user.
The first threshold in fact represents the maximum number of jobs the server cluster can execute in parallel; if the threshold has not been reached, the cluster still has capacity to execute more jobs.
S3: reorder the jobs in the waiting queue by the optimal execution order, and schedule the jobs in the waiting queue into the executing state one by one in the reordered order, until the number of jobs the server cluster is executing reaches the first threshold or the waiting queue is empty.
How many jobs to schedule into the executing state is determined by the first threshold and the number of jobs already in the executing state.
For example, suppose the cluster has 4 servers and each server can execute at most 20 jobs; the first threshold can then be set to 80. If users have submitted 100 jobs, 20 of them must be placed in the job waiting queue.
Suppose the cluster has since completed 4 jobs, so the number of executing jobs is 76, which is less than 80. At this point prediction is performed to obtain the optimal execution order (corresponding to step S2). Afterwards, the 20 jobs in the waiting queue are reordered by the optimal execution order, and the first 4 jobs in the reordered queue are scheduled into the executing state (corresponding to step S3).
S4: dispatch each job that enters the executing state according to the compute-localization strategy, so that servers execute the data processing tasks.
It should be noted that whenever the number of jobs the server cluster is executing falls below the first threshold, steps S2 to S4 must be executed again.
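As an illustration only, the interplay of steps S2 to S4 can be sketched in Python as follows. The helper names predict_optimal_order (step S2), dispatch_job (step S4, sketched further below) and the cluster interface are assumptions for illustration, not part of the patent:

```python
def on_capacity_available(cluster, waiting_queue, first_threshold):
    """Run steps S2-S4 whenever the number of executing jobs has
    dropped below the first threshold."""
    # Step S2: predicting an order is only meaningful when there is a choice.
    if len(waiting_queue) > 1:
        waiting_queue[:] = predict_optimal_order(cluster, waiting_queue)
    # Step S3: move jobs into the executing state in the predicted order
    # until the first threshold is reached or the waiting queue is empty.
    while cluster.executing_job_count() < first_threshold and waiting_queue:
        job = waiting_queue.pop(0)
        dispatch_job(cluster, job)  # step S4: compute-localization dispatch
```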
" distributing according to calculating localization strategy " among above-mentioned steps S2 and the S4 specifically can comprise: each data block at the data of the required processing of operation is created a data Processing tasks, and each data processing task is dispensed on the server of its corresponding data block of storage.Also promptly, the data block store of only seeing map task correspondence is just carried out this map task scheduling on which server to this server.
For example, suppose the cluster contains 4 servers F1 to F4. The data corresponding to job X1 is divided into 2 data blocks stored on F1 and F2 respectively, and the data corresponding to job X2 is divided into 3 data blocks stored on F1, F2 and F4 respectively. Under the compute-localization strategy of this application, job X1 is divided into two map tasks assigned to F1 and F2 respectively, and job X2 is divided into three map tasks assigned to F1, F2 and F4 respectively.
Each server in the cluster maintains a local task queue (which may also be called a task waiting queue); map tasks dispatched to the server are placed in this queue, and the server executes the tasks in the local queue on a first-in-first-out basis.
More specifically, since the data block required by a given map task is already stored on a particular server, only the computational logic of that map task needs to be dispatched to that server for execution.
It should be noted that one map computation constitutes one task, and the computational logic refers to the map function, i.e. the computation method. The computational logic is identical for every data block of the same job; the computational logic of different jobs may be the same or different.
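A minimal sketch of this dispatch step, assuming the block-to-server mapping is known from the job's data distribution information; the Server and Job classes and the block_locations mapping are illustrative assumptions, not the patent's literal implementation:

```python
from collections import deque

class Server:
    """One cluster node that both stores data blocks and executes tasks."""
    def __init__(self, name):
        self.name = name
        self.local_queue = deque()  # local task queue, consumed first-in-first-out

class Job:
    """A job whose input is already split into data blocks (one map task each)."""
    def __init__(self, job_id, blocks, map_logic):
        self.job_id = job_id
        self.blocks = blocks        # the job's data blocks
        self.map_logic = map_logic  # same map function for every block

def dispatch_job(job, block_locations):
    """Compute-localization dispatch: create one map task per data block and
    enqueue it on the server that stores the block, so only the computational
    logic (the map function) travels, never the data block itself."""
    for block in job.blocks:
        server = block_locations[block]   # the server storing this block
        task = (job.map_logic, block)     # a map task: logic plus local block
        server.local_queue.append(task)   # joins that server's FIFO queue
```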
In addition, as a special case, if the number of jobs the cluster is executing is less than the first threshold but there is only one job in the waiting queue, steps S2 and S3 are unnecessary: the job in the waiting queue is scheduled directly into the executing state, after which step S4 is executed.
It can be seen that, in the embodiments of the invention, any server in the server cluster is capable of both executing tasks and storing data. On this basis, at the job-scheduling level, the embodiments predict the system-wide load balancing state under different execution orders according to the compute-localization strategy, select the execution order that makes that state optimal (the lowest and most evenly distributed load), and schedule jobs in that order.
At the task-scheduling level, the embodiments dispatch tasks according to the compute-localization strategy, so that every Map task is scheduled onto the server where its required data resides and its execution incurs no network data transmission overhead. Network data transmission between server nodes is thereby reduced and data processing performance is improved.
In other embodiments of the invention, the above method may further comprise:
periodically checking the idle time of each server in the server cluster;
scheduling data processing tasks from the server with the most tasks onto servers whose idle time exceeds a second threshold.
More specifically, data processing tasks at the tail of the local task queue of the server with the most tasks may be dispatched to servers whose idle time exceeds the second threshold.
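A sketch of this periodic rebalancing under the stated rule, assuming each server additionally exposes an idle_time() accessor (an assumed interface) on top of the Server class sketched earlier:

```python
def rebalance(servers, second_threshold):
    """Move work from the most heavily loaded server to servers whose
    idle time exceeds the second threshold."""
    for server in servers:
        if server.idle_time() > second_threshold:
            busiest = max(servers, key=lambda s: len(s.local_queue))
            if busiest is not server and busiest.local_queue:
                # Take from the tail of the longest local queue: tail tasks
                # were enqueued last and would otherwise wait the longest.
                server.local_queue.append(busiest.local_queue.pop())
```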
It should be noted that a simple scheduling approach is, whenever a scheduling opportunity arises, to compute the system-wide load balancing state that would result from each job in the waiting queue entering the executing state, and then to schedule the job whose resulting balance is best. This approach is cheap to compute and performs well in early operation, but jobs with poor balance characteristics slowly accumulate in the queue, and after long-term operation the system load may become extremely unbalanced. Fig. 3 illustrates this situation; the ordinate of Fig. 3 is the load variance (imbalance can be read from the variance) and the abscissa is time.
To avoid the situation of Fig. 3, the embodiments of the invention do not predict the system-wide load balancing state after a single job enters the executing state; instead they consider the system-wide load balancing states reached as all waiting jobs enter the executing state in a given execution order, and select the optimal order, thereby avoiding the sharp performance decay shown in Fig. 3.
Job scheduling is described in detail below.
At the job-scheduling level, the key is how to predict the optimal execution order. In one embodiment of the invention, "predicting the system-wide load balancing state caused by the jobs in the waiting queue under different execution orders and obtaining the optimal execution order" in step S2 may comprise the following substeps:
First, construct a global search tree comprising multiple search paths that share the same root node, each search path containing leaf nodes; the root node represents the current load balancing state of the server cluster, each leaf node represents a job in the waiting queue, and different search paths represent different execution orders.
Taking as an example a waiting queue holding 3 jobs whose IDs are job1 to job3, the global search tree (see Fig. 4) can be constructed by the following substeps:
Step 1: construct the first layer, which contains only the root node (the start node), denoted job0.
Step 2: since there are N (here 3) waiting jobs in the system, there are N possible choices of the next job to enter the executing state, so the second layer expands into N leaf nodes, each represented by a job ID.
Step 3: construct the third layer; since one job has already been chosen when constructing the second layer, each second-layer node expands into N − 1 (here 2) nodes, which form the third layer.
Step 4: continue by analogy until no further expansion is possible, which completes the construction of the whole global search tree.
Each search path in the global search tree can also be regarded as a job execution sequence, so searching for the optimal execution order is equivalent to searching for the optimal job execution sequence. This embodiment abstracts the search for the optimal job execution sequence into a graph-search mathematical model: based on some search algorithm, find an optimal path from the root node to a leaf node in a global search tree such as that of Fig. 4, thereby obtaining the optimal execution order.
Second, calculate the load balancing predicted values of the different search paths (the load balancing predicted value characterizes the system-wide load balancing state), and take the execution order corresponding to the search path with the smallest predicted value as the optimal execution order.
It should be noted, however, that with N waiting jobs there are A(N,N) = N! search paths, so globally evaluating all jobs in the current waiting queue at every scheduling occasion would clearly involve an enormous amount of computation. Preferably, therefore, the following embodiment of the invention computes with a heuristic search strategy.
Referring to Fig. 5a, the heuristic search strategy is as follows:
Step A: take the root node as the target node, calculate the evaluation value of the target node, and use the root node's evaluation value as the load balancing predicted value of every search path;
Step B: select the search path with the smallest load balancing predicted value as the target search path, and take the execution order corresponding to the target search path as the target execution order;
Step C: judge whether the target search path still contains a leaf node for which no load balancing predicted value has been calculated; if not, take the target execution order as the optimal execution order (step E); if so, take the leaf node following the current target node in the target search path as the new target node, calculate its evaluation value as the load balancing predicted value of the search path it belongs to (step D), and return to the step of selecting the search path with the smallest load balancing predicted value as the target search path (step B).
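Steps A to E amount to a best-first search over partial execution sequences, always extending the path that currently looks cheapest. The sketch below is one way to realize it, assuming an evaluate_f callback that implements the evaluation value f(M) = g(M) + h(M) defined later; the function and parameter names are illustrative:

```python
import heapq

def heuristic_search(waiting_jobs, evaluate_f):
    """Return the execution order with the smallest predicted
    load-balancing value, expanding the search tree lazily.

    waiting_jobs: list of job IDs in the waiting queue.
    evaluate_f:   function(partial_sequence) -> evaluation value f.
    """
    # Each heap entry is (f value, partial execution sequence).
    heap = [(evaluate_f(()), ())]  # root node job0: the empty sequence
    while heap:
        _, sequence = heapq.heappop(heap)
        remaining = [j for j in waiting_jobs if j not in sequence]
        if not remaining:
            return list(sequence)  # a fully evaluated path: the optimal order
        # Expand the cheapest partial path by one layer (step D); the heap
        # then reselects the cheapest path overall (step B).
        for job in remaining:
            child = sequence + (job,)
            heapq.heappush(heap, (evaluate_f(child), child))
    return []
```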
Referring now to Fig. 5b, the heuristic search strategy is introduced in more detail, taking as an example a waiting queue holding P (P = 5) jobs whose IDs are job1 to job5 (P determines the number of layers, i.e. the height, of the global search tree, which is P + 1):
S501: calculate the evaluation value f(0) of the root node;
S502: expand P leaf nodes from the root node, and calculate the evaluation value of each leaf node.
It should be noted that in step S502 every search path of the global search tree is in turn the target search path (because the f(0) of every search path is equal), and the first leaf node of each search path is in turn the target node.
How the evaluation value is calculated is described in detail later in this text.
S503: among the leaf nodes of the search tree expanded so far, find the node with the smallest evaluation value, expand its lower-layer nodes as target nodes, and calculate their evaluation values.
Suppose that after step S502, the node corresponding to job1 in the execution sequence job0→job1→job3→job5→job4→job2 has the smallest evaluation value. The lower-layer nodes of the leaf node corresponding to job1 are then expanded, namely the leaf nodes corresponding to job2, job3, job4 and job5, and the evaluation value of each is calculated.
S504: repeat until the expansion reaches the (P+1)-th layer of the global search tree, at which point the optimal job execution sequence has been found and the search ends.
How the evaluation value is calculated is introduced below.
Suppose the target node is the node at layer M of a certain target search path. For each target node, its evaluation value f(M) can be calculated with the following function:
f(M) = g(M) + h(M) (formula one)
where:
g(M) denotes the sum of the load balancing values of all jobs involved from the root node to the target node when the target node's job executes (including the load balancing value of the initial state). g(M) can be calculated with the following formula:
g(M) = \sum_{j=1}^{M} LB_j (formula two)
where j denotes the node at layer j of the target search path;
In formula two, LB_j denotes the load balancing predicted value of the server cluster when the job corresponding to the layer-j node of the target search path enters the executing state, and LB_1 denotes the current (actual) load balancing value of the server cluster.
Referring again to Fig. 5b, take the execution sequence job0→job1→job3→job5→job4→job2 as an example and suppose job3 is the target node; the LB_1 corresponding to job0 and the LB_2 corresponding to job1 (job1 being a second-layer node) must then be summed.
As noted above, LB_j denotes the load balancing predicted value of the server cluster when the layer-j job of the target search path enters the executing state; in other words, LB_j quantifies the cluster's load balance at the moment the job corresponding to the layer-j node enters the executing state.
LB_j can be expressed as the load variance across the servers (see formula three); a value of 0 means the system's load balance at that moment is ideal.
LB_j = \sum_{i=1}^{N} ( Load_i^j - \overline{Load^j} )^2 (formula three)
where Load_i^j denotes the load of the i-th server in the server cluster when the job corresponding to the layer-j node of the target search path enters the executing state (N denotes the total number of servers in the cluster), and \overline{Load^j} denotes the average load of the server cluster at that moment, i.e. \overline{Load^j} = (1/N) \sum_{i=1}^{N} Load_i^j.
In the MR technique, each map task corresponds to one data block, all data blocks have the same size, and all map computations of the same job share identical computational logic. In this embodiment, therefore, the load of a server is represented by the number of map tasks assigned to it. In an actual MR system, the load of the i-th server equals the length of its map task queue (counting both executing and waiting tasks).
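Under this convention, the load balancing value of formula three reduces to a sum of squared deviations of the per-server map task queue lengths from their mean. A minimal sketch:

```python
def load_balance_value(queue_lengths):
    """LB: sum of squared deviations of per-server map task queue
    lengths from the cluster mean; 0 means a perfectly even load."""
    n = len(queue_lengths)
    mean = sum(queue_lengths) / n
    return sum((load - mean) ** 2 for load in queue_lengths)
```

For instance, load_balance_value([4, 4, 2, 3, 2]) is 4 (mean 3), the figure the example below associates with job3.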
Taking the execution sequence job0→job1→job3→job5→job4→job2 as an example again, suppose the server cluster contains five servers (F1 to F5) and that in its initial state the length of the local task queue on every server is 2 (i.e. each holds 2 map tasks). Suppose the data corresponding to job1 is divided into 2 data blocks stored on F1 and F2 respectively, and the data corresponding to job3 is divided into 3 data blocks stored on F1, F2 and F4 respectively.
If the leaf node corresponding to job1 is the target node, then when job1 enters the executing state the load of F1 is 3, of F2 is 3, of F3 is 2, of F4 is 2 and of F5 is 2, and the g(M) corresponding to job1 is 1.25.
If instead the leaf node corresponding to job3 is the target node, then when job3 enters the executing state the load of F1 is 4, of F2 is 4, of F3 is 2, of F4 is 3 and of F5 is 2, and the g(M) corresponding to job3 is 4.
h(M) denotes the sum of the load balancing values expected to arise as the remaining jobs execute under ideal conditions, where the ideal condition is that the server load in the cluster is completely evenly distributed. h(M) can be calculated with the following formula:
h(M) = \sum_{j=M+1}^{P+1} lb_j (formula four)
where lb_j denotes the load balancing predicted value of the server cluster when the job corresponding to the layer-j node of the target search path enters the executing state, under the assumption that the server load in the cluster is completely evenly distributed.
lb_j can be calculated with the following formula:
lb_j = \sum_{i=1}^{N} ( l_i^j - \overline{l^j} )^2 (formula five)
where l_i^j denotes the load of the i-th server in the cluster when the job corresponding to the layer-j node of the target search path enters the executing state, under the fully-averaged assumption, and \overline{l^j} denotes the average load of the server cluster at that moment, i.e. \overline{l^j} = (1/N) \sum_{i=1}^{N} l_i^j.
Likewise, l_i^j can be taken to equal the map task queue length on the i-th server under the assumption that the server load in the cluster is completely evenly distributed.
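One plausible reading of the fully-averaged assumption is that the outstanding map tasks are spread across the N servers as evenly as integer queue lengths allow. The sketch below, reusing load_balance_value from the earlier sketch, follows that reading; it is an assumption for illustration, not the patent's stated computation:

```python
def ideal_load_balance_value(total_tasks, n_servers):
    """lb: load balancing value when total_tasks map tasks are spread
    across n_servers as evenly as integer queue lengths allow."""
    base, extra = divmod(total_tasks, n_servers)
    # 'extra' servers carry one task more than the remaining ones.
    loads = [base + 1] * extra + [base] * (n_servers - extra)
    return load_balance_value(loads)
```

When the total divides evenly across the servers the value is 0; 18 tasks on five servers, for instance, give queue lengths [4, 4, 4, 3, 3] and a value of 1.2.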
Taking the execution sequence job0→job1→job3→job5→job4→job2 as an example once more, suppose the server cluster contains five servers (F1 to F5) and that in its initial state the length of the local task queue on every server is 2 (i.e. each holds 2 map tasks).
Suppose the data corresponding to job1 is divided into 2 data blocks stored on F1 and F2 respectively; the data corresponding to job3 into 3 blocks stored on F1, F2 and F4; the data corresponding to job5 into 3 blocks stored on F1, F3 and F4; the data corresponding to job4 into 4 blocks stored on F2, F3, F4 and F5; and the data corresponding to job2 into 2 blocks stored on F1 and F5.
In the execution sequence job0→job1→job3→job5→job4→job2, if the leaf node corresponding to job1 is the target node, then when job1 enters the executing state the load of F1 is 3, of F2 is 3, of F3 is 2, of F4 is 2 and of F5 is 2; the g(M) corresponding to job1 is then 1.25, the corresponding h(M) is 4.4, and the corresponding evaluation value f(M) is 5.65.
If instead the leaf node corresponding to job3 is taken as the target node, then when job3 enters the executing state the g(M) corresponding to job3 is 4, the corresponding h(M) is 3.2, and the corresponding evaluation value f(M) is 7.2.
Corresponding to the above method, the present invention also seeks to protect a data parallel processing system. Referring to Fig. 6, the system may comprise at least a server cluster 1 and a load balancing scheduler 2;
any server in the server cluster 1 is capable of both executing tasks and storing data;
referring to Fig. 7, the above load balancing scheduler 2 may comprise:
a preprocessing unit 21, configured to place jobs submitted by users through a client into the job waiting queue and to collect the data distribution information of the jobs;
a predicting unit 22, configured to predict, when the number of jobs the server cluster is executing is less than the first threshold and according to the data distribution information, the system-wide load balancing state caused by dispatching the jobs in the waiting queue under different execution orders according to the compute-localization strategy, and to obtain the optimal execution order;
a job scheduling unit 23, configured to reorder the jobs in the waiting queue by the optimal execution order and to schedule the jobs into the executing state one by one in the reordered order, until the number of jobs the server cluster is executing reaches the first threshold or the waiting queue is empty;
a first task scheduling unit 24, configured to dispatch each job that enters the executing state according to the compute-localization strategy, so that servers execute the tasks.
For details, refer to the foregoing description, which is not repeated here.
In other embodiments of the invention, the above load balancing scheduler may further comprise a second task scheduling unit, configured to periodically check the idle time of each server in the server cluster and to schedule data processing tasks from the server with the most tasks onto servers whose idle time exceeds the second threshold. For details, refer to the foregoing description, which is not repeated here.
The embodiments of the invention also claim the load balancing scheduler of all of the above embodiments.
It should be noted that the load balancing scheduler may be a hardware device or a software program, and each unit in the load balancing scheduler may likewise be a hardware device (for example, the preprocessing unit may be a preprocessing server, and the predicting unit may in practice be a prediction server) or a software program. When the load balancing scheduler is a software program, it may be installed on any server of the server cluster.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and for the parts the embodiments have in common, cross-reference between them suffices. Since the devices provided by the embodiments correspond to the methods provided by the embodiments, their description is relatively brief; for the relevant parts, refer to the description of the methods.
It should also be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another and do not necessarily require or imply any such actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that comprises it.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present invention can be implemented by software plus the necessary general-purpose hardware, including general-purpose integrated circuits, general-purpose CPUs, general-purpose memories, general-purpose components and the like; it can of course also be implemented by dedicated hardware such as application-specific integrated circuits, dedicated CPUs, dedicated memories and dedicated components, but in many cases the former is the better implementation. Based on this understanding, the essence of the technical solution of the present invention, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product may be stored in a readable storage medium, such as a USB flash drive, removable storage medium, read-only memory (ROM), random access memory (RAM), magnetic disk or optical disc, capable of storing program code, and includes instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to execute the methods of the embodiments of the present invention.
The above description of the provided embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be obvious to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features provided herein.

Claims (8)

1. A data parallel processing method, characterized in that it is based on a server cluster, any server in the server cluster being capable of both executing tasks and storing data;
the method comprising:
placing jobs submitted by users through a client into a job waiting queue, and collecting the data distribution information of the jobs, wherein the data to be processed by a job is divided into a plurality of data blocks stored on servers of the server cluster, each data block corresponds to one data processing task, and the data distribution information comprises the distribution of the job's corresponding data blocks;
when the number of jobs the server cluster is executing is less than a first threshold, predicting, according to the data distribution information, the system-wide load balancing state caused by dispatching the jobs in the waiting queue under different execution orders according to a compute-localization strategy, and obtaining the optimal execution order;
reordering the jobs in the waiting queue by the optimal execution order, and scheduling the jobs in the waiting queue into the executing state one by one in the reordered order, until the number of jobs the server cluster is executing reaches the first threshold or the waiting queue is empty;
dispatching each job that enters the executing state according to the compute-localization strategy, so that servers execute the data processing tasks;
wherein dispatching according to the compute-localization strategy comprises:
creating one data processing task for each data block of the data to be processed by a job, and assigning each data processing task to the server that stores its corresponding data block.
2. The method of claim 1, characterized by further comprising:
periodically checking the idle time of each server in the server cluster;
scheduling data processing tasks from the server with the most tasks onto servers whose idle time exceeds a second threshold.
3. The method of claim 2, characterized in that:
predicting the system-wide load balancing state caused by the jobs in the waiting queue under different execution orders and obtaining the optimal execution order comprises:
constructing a global search tree comprising multiple search paths that share the same root node, each search path containing leaf nodes, wherein the root node represents the current load balancing state of the server cluster, each leaf node represents a job in the waiting queue, and different search paths represent different execution orders;
calculating the load balancing predicted values of the different search paths, and taking the execution order corresponding to the search path with the smallest load balancing predicted value as the optimal execution order, the load balancing predicted value being used to characterize the system-wide load balancing state.
4. The method of claim 3, characterized in that calculating the load balancing predicted values of the different search paths and taking the execution order corresponding to the search path with the smallest load balancing predicted value as the optimal execution order comprises:
taking the root node as the target node, calculating the evaluation value of the target node, and using the root node's evaluation value as the load balancing predicted value of each search path;
selecting the search path with the smallest load balancing predicted value as the target search path, and taking the execution order corresponding to the target search path as the target execution order;
judging whether the target search path still contains a leaf node for which no load balancing predicted value has been calculated; if not, taking the target execution order as the optimal execution order; if so, taking the leaf node following the current target node in the target search path as the new target node, calculating its evaluation value as the load balancing predicted value of the search path it belongs to, and returning to the step of selecting the search path with the smallest load balancing predicted value as the target search path.
5. The method of claim 4, characterized in that:
the number of jobs in the waiting queue is P, P being a positive integer;
in the target search path, the target node is the node at layer M, M being not less than 1 and not greater than P + 1;
calculating the evaluation value of the target node comprises:
calculating the evaluation value f(M) of the target node with the formula f(M) = g(M) + h(M);
wherein:
g(M) = \sum_{j=1}^{M} LB_j;
h(M) = \sum_{j=M+1}^{P+1} lb_j;
j denotes the node at layer j of the target search path;
LB_j denotes the load balancing predicted value of the server cluster when the job corresponding to the layer-j node of the target search path enters the executing state, and LB_1 denotes the current load balancing value of the server cluster;
lb_j denotes the load balancing predicted value of the server cluster when the job corresponding to the layer-j node of the target search path enters the executing state, under the assumption that the server load in the server cluster is completely evenly distributed;
Load_i^j denotes the load of the i-th server in the server cluster when the job corresponding to the layer-j node of the target search path enters the executing state, \overline{Load^j} denotes the average load of the server cluster at that moment, and N denotes the total number of servers in the server cluster;
l_i^j denotes the load of the i-th server in the server cluster at that moment under the fully-averaged assumption, and \overline{l^j} denotes the corresponding average load of the server cluster;
\overline{Load^j} = (1/N) \sum_{i=1}^{N} Load_i^j;
\overline{l^j} = (1/N) \sum_{i=1}^{N} l_i^j.
6. The method of claim 5, characterized in that the load is characterized by the number of tasks.
7. A data parallel processing system, characterized by comprising a server cluster and a load balancing scheduler;
any server in the server cluster being capable of both executing tasks and storing data;
the load balancing scheduler comprising:
a preprocessing unit, configured to place jobs submitted by users through a client into a job waiting queue and to collect the data distribution information of the jobs, wherein the data to be processed by a job is divided into a plurality of data blocks stored on servers of the server cluster, each data block corresponds to one data processing task, and the data distribution information comprises the distribution of the job's corresponding data blocks;
a predicting unit, configured to predict, when the number of jobs the server cluster is executing is less than a first threshold and according to the data distribution information, the system-wide load balancing state caused by dispatching the jobs in the waiting queue under different execution orders according to the compute-localization strategy, and to obtain the optimal execution order;
a job scheduling unit, configured to reorder the jobs in the waiting queue by the optimal execution order and to schedule the jobs into the executing state one by one in the reordered order, until the number of jobs the server cluster is executing reaches the first threshold or the waiting queue is empty;
a first task scheduling unit, configured to dispatch each job that enters the executing state according to the compute-localization strategy, so that servers execute the data processing tasks;
wherein dispatching according to the compute-localization strategy comprises:
creating one data processing task for each data block of the data to be processed by a job, and assigning each data processing task to the server that stores its corresponding data block.
8. A load balancing scheduler, characterized in that it cooperates with a server cluster, any server in the server cluster having the capability of both executing tasks and storing data; the load balancing scheduler comprises:
A preprocessing unit, configured to place the jobs submitted by users through clients into a job waiting queue and to collect the data distribution information of the jobs, wherein the data to be processed by a job is divided into a plurality of data blocks stored on servers in the server cluster, each data block corresponds to one data processing task, and the data distribution information comprises the distribution information of the data blocks corresponding to the job;
A prediction unit, configured to, when the number of jobs executing in the server cluster is less than a first threshold, predict, according to the data distribution information, the system-wide load balancing state that would result from distributing the jobs in the job waiting queue according to the calculation localization strategy under different execution orders, thereby obtaining the optimal execution order;
A job scheduling unit, configured to reorder the jobs in the job waiting queue according to the optimal execution order and to schedule the jobs of the reordered queue into the executing state one by one in the reordered order, until the number of jobs executing in the server cluster reaches the first threshold or the job waiting queue is empty;
A first task scheduling unit, configured to distribute each job entering the executing state according to the calculation localization strategy, so that the servers execute the data processing tasks;
Wherein distributing according to the calculation localization strategy comprises:
Creating one data processing task for each data block of the data to be processed by a job, and assigning each data processing task to the server that stores its corresponding data block.
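The prediction and job scheduling units can likewise be sketched under stated assumptions: candidate execution orders of the waiting queue are enumerated by brute force, each server's predicted load is the number of tasks it would receive under the calculation localization strategy (claim 6 characterizes load by task count), and the order with the smallest spread of predicted per-server task counts is chosen. The function predict_best_order, its parameters, and the standard-deviation balance measure are illustrative assumptions, not the claimed method itself.

from itertools import permutations
from statistics import pstdev

def predict_best_order(waiting_jobs, block_locations, first_threshold):
    """waiting_jobs: ids of queued jobs; block_locations[job][block] gives
    the storing server; first_threshold: jobs admitted to execution."""
    best_order, best_imbalance = None, float("inf")
    for order in permutations(waiting_jobs):
        load = {}  # predicted tasks per server for this execution order
        for job in order[:first_threshold]:  # jobs that would start now
            for server in block_locations[job].values():
                load[server] = load.get(server, 0) + 1
        imbalance = pstdev(load.values()) if len(load) > 1 else 0.0
        if imbalance < best_imbalance:
            best_order, best_imbalance = order, imbalance
    return list(best_order)

print(predict_best_order(
    ["jA", "jB", "jC"],
    {"jA": {"b0": "s1", "b1": "s1"},
     "jB": {"b2": "s2"},
     "jC": {"b3": "s2", "b4": "s3"}},
    first_threshold=2))

In this toy example the chosen order admits jA and jC first, spreading the predicted tasks over three servers instead of concentrating them on two.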
CN201310195179.6A 2013-05-23 2013-05-23 Data parallel processing method, system and load balance scheduler Active CN103226467B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310195179.6A CN103226467B (en) 2013-05-23 2013-05-23 Data parallel processing method, system and load balance scheduler

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310195179.6A CN103226467B (en) 2013-05-23 2013-05-23 Data parallel processing method, system and load balance scheduler

Publications (2)

Publication Number Publication Date
CN103226467A true CN103226467A (en) 2013-07-31
CN103226467B CN103226467B (en) 2015-09-30

Family

ID=48836933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310195179.6A Active CN103226467B (en) 2013-05-23 2013-05-23 Data parallel processing method, system and load balance scheduler

Country Status (1)

Country Link
CN (1) CN103226467B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040093420A1 (en) * 2002-11-13 2004-05-13 Gamble Jonathan Bailey Method and system for transferring large data files over parallel connections
US8056084B2 (en) * 2007-01-25 2011-11-08 Hewlett-Packard Development Company, L.P. Method and system for dynamically reallocating a resource among operating systems without rebooting of the computer system
US20110145511A1 (en) * 2009-12-14 2011-06-16 International Business Machines Corporation Page invalidation processing with setting of storage key to predefined value
EP2472397A1 (en) * 2010-12-28 2012-07-04 POLYTEDA Software Corporation Limited Load distribution scheduling method in data processing system
CN102708011A (en) * 2012-05-11 2012-10-03 南京邮电大学 Multistage load estimating method facing task scheduling of cloud computing platform

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
MENG LINGFEN: "Research on Job Scheduling Algorithms for PC Clusters", Master's Thesis, China University of Petroleum, 31 March 2010 (2010-03-31) *
PENG LIMIN et al.: "A Load Balancing Scheme for Dynamic Structured P2P Networks", Journal of South China University of Technology (Natural Science Edition), vol. 39, no. 10, 31 October 2011 (2011-10-31) *
WANG KAI: "Research and Implementation of Multi-User Job Scheduling Methods for MapReduce Clusters", Master's Thesis, National University of Defense Technology, 29 February 2012 (2012-02-29) *
WANG KAI et al.: "Design and Implementation of a Job Scheduling Algorithm for Multi-User MapReduce Clusters", Computer and Modernization, no. 10, 15 October 2010 (2010-10-15) *

Cited By (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103530182A (en) * 2013-10-22 2014-01-22 海南大学 Working scheduling method and device
CN104301241A (en) * 2014-06-05 2015-01-21 中国人民解放军信息工程大学 SOA dynamic load distribution method and system
CN104199912A (en) * 2014-08-28 2014-12-10 无锡天脉聚源传媒科技有限公司 Task processing method and device
CN104199912B (en) * 2014-08-28 2018-10-26 无锡天脉聚源传媒科技有限公司 A kind of method and device of task processing
CN105991705B (en) * 2015-02-10 2020-04-28 中兴通讯股份有限公司 Distributed storage system and method for realizing hard affinity of resources
CN105991705A (en) * 2015-02-10 2016-10-05 中兴通讯股份有限公司 Distributed storage system and method of realizing hard affinity of resource
CN104915250A (en) * 2015-06-03 2015-09-16 电子科技大学 Method for realizing MapReduce data localization in operations
CN104915250B (en) * 2015-06-03 2018-04-06 电子科技大学 It is a kind of to realize the method for making MapReduce data localization in the industry
WO2017020742A1 (en) * 2015-08-06 2017-02-09 阿里巴巴集团控股有限公司 Load balancing method and device
CN106445677A (en) * 2015-08-06 2017-02-22 阿里巴巴集团控股有限公司 Load balancing method and device
CN106487823A (en) * 2015-08-24 2017-03-08 上海斐讯数据通信技术有限公司 A kind of document transmission method based on SDN framework and system
CN106681823A (en) * 2015-11-05 2017-05-17 田文洪 Load balancing method for processing MapReduce data skew
CN105516325A (en) * 2015-12-18 2016-04-20 内蒙古农业大学 Cloud load balancing method for carrying out elastic expansion and traffic distribution expansion according to application load
CN107103009A (en) * 2016-02-23 2017-08-29 杭州海康威视数字技术股份有限公司 A kind of data processing method and device
CN107103009B (en) * 2016-02-23 2020-04-10 杭州海康威视数字技术股份有限公司 Data processing method and device
US11379271B2 (en) 2016-02-23 2022-07-05 Hangzhou Hikvision Digital Technology Co., Ltd. Parallel processing on data processing servers through even division of data records
CN105721595B (en) * 2016-03-03 2019-04-09 上海携程商务有限公司 The packaging method and system of the app of IOS system
CN105721595A (en) * 2016-03-03 2016-06-29 上海携程商务有限公司 IOS APP packaging method and system
CN105847356A (en) * 2016-03-23 2016-08-10 上海爱数信息技术股份有限公司 Communication system, electronic device, data processing method and system
CN106020988A (en) * 2016-06-03 2016-10-12 北京邮电大学 Off-line task scheduling method and device for intelligent video monitoring system
CN106020988B (en) * 2016-06-03 2019-03-15 北京邮电大学 A kind of offline method for scheduling task of intelligent video monitoring system and device
CN105959395A (en) * 2016-06-15 2016-09-21 徐州医科大学 Cluster self-feedback type load balancing scheduling system and method
CN106844051A (en) * 2017-01-19 2017-06-13 河海大学 The loading commissions migration algorithm of optimised power consumption in a kind of edge calculations environment
CN107766160B (en) * 2017-09-26 2019-12-13 平安科技(深圳)有限公司 queue message processing method and terminal equipment
CN107766160A (en) * 2017-09-26 2018-03-06 平安科技(深圳)有限公司 Queue message processing method and terminal device
CN108469988A (en) * 2018-02-28 2018-08-31 西北大学 A kind of method for scheduling task based on isomery Hadoop clusters
CN108563497A (en) * 2018-04-11 2018-09-21 中译语通科技股份有限公司 A kind of efficient various dimensions algorithmic dispatching method, task server
CN108563497B (en) * 2018-04-11 2022-03-29 中译语通科技股份有限公司 Efficient multi-dimensional algorithm scheduling method and task server
CN109343138A (en) * 2018-09-29 2019-02-15 深圳市华讯方舟太赫兹科技有限公司 A kind of load-balancing method and rays safety detection apparatus of safe examination system
CN109358959A (en) * 2018-10-23 2019-02-19 电子科技大学 Data distribution formula cooperative processing method based on prediction
CN109445282A (en) * 2018-11-07 2019-03-08 北京航空航天大学 A kind of Optimization Scheduling towards basic device processing technology
CN111290841B (en) * 2018-12-10 2024-04-05 北京沃东天骏信息技术有限公司 Task scheduling method, device, computing equipment and storage medium
CN111290841A (en) * 2018-12-10 2020-06-16 北京沃东天骏信息技术有限公司 Task scheduling method and device, computing equipment and storage medium
CN111552910A (en) * 2019-02-08 2020-08-18 萨沃伊公司 Method for ordering loads in an automated distribution system that reduces out-of-order during collection of loads on collectors
CN111552910B (en) * 2019-02-08 2023-07-14 萨沃伊公司 Method for ordering loads in an automated distribution system that reduces disorder during collection of loads on collectors
CN110018893A (en) * 2019-03-12 2019-07-16 平安普惠企业管理有限公司 A kind of method for scheduling task and relevant device based on data processing
CN110888919A (en) * 2019-12-04 2020-03-17 阳光电源股份有限公司 HBase-based big data statistical analysis method and device
WO2021147876A1 (en) * 2020-01-20 2021-07-29 北京一流科技有限公司 Memory resource in-situ sharing decision-making system and method
CN111158919A (en) * 2020-01-20 2020-05-15 北京一流科技有限公司 Memory resource in-place sharing decision system and method thereof
CN112150035B (en) * 2020-10-13 2023-06-13 中国农业银行股份有限公司 Data processing method and device
CN112150035A (en) * 2020-10-13 2020-12-29 中国农业银行股份有限公司 Data processing method and device
CN112328171A (en) * 2020-10-23 2021-02-05 苏州元核云技术有限公司 Data distribution prediction method, data equalization method, device and storage medium
CN112328171B (en) * 2020-10-23 2024-04-30 苏州元核云技术有限公司 Data distribution prediction method, data equalization method, device and storage medium
CN112631771A (en) * 2020-12-18 2021-04-09 江苏康融科技有限公司 Parallel processing method of big data system
CN112532464A (en) * 2021-02-08 2021-03-19 中国人民解放军国防科技大学 Data distributed processing acceleration method and system across multiple data centers

Also Published As

Publication number Publication date
CN103226467B (en) 2015-09-30

Similar Documents

Publication Publication Date Title
CN103226467B (en) Data parallel processing method, system and load balance scheduler
JP6898496B2 (en) Computation graph processing
Polo et al. Performance-driven task co-scheduling for mapreduce environments
EP3353655B1 (en) Stream-based accelerator processing of computational graphs
US20170031712A1 (en) Data-aware workload scheduling and execution in heterogeneous environments
US8887165B2 (en) Real time system task configuration optimization system for multi-core processors, and method and program
CN104952032A (en) Graph processing method and device as well as rasterization representation and storage method
Li et al. An effective scheduling strategy based on hypergraph partition in geographically distributed datacenters
Lei et al. CREST: Towards fast speculation of straggler tasks in MapReduce
Deng et al. A data and task co-scheduling algorithm for scientific cloud workflows
CN105468439A (en) Adaptive parallel algorithm for traversing neighbors in fixed radius under CPU-GPU (Central Processing Unit-Graphic Processing Unit) heterogeneous framework
CN104281495A (en) Method for task scheduling of shared cache of multi-core processor
WO2016018352A1 (en) Platform configuration selection based on a degraded makespan
Menouer et al. Scheduling and resource management allocation system combined with an economic model
US10402762B2 (en) Heterogeneous platform configurations
US10713096B2 (en) System and method for handling data skew at run time
CN107329826A (en) A kind of heuristic fusion resource dynamic dispatching algorithm based on Cloudsim platforms
Venugopal et al. A set coverage-based mapping heuristic for scheduling distributed data-intensive applications on global grids
Wieczorek et al. Comparison of workflow scheduling strategies on the Grid
CN116302327A (en) Resource scheduling method and related equipment
KR102045997B1 (en) Method for scheduling task in big data analysis platform based on distributed file system, program and computer readable storage medium therefor
CN110175172A (en) Very big two points of groups parallel enumerating method based on sparse bipartite graph
CN113238873B (en) Method for optimizing and configuring spacecraft resources
US11847490B2 (en) Intelligent workload scheduling using a ranking of sequences of tasks of a workload
Liu et al. A survey of speculative execution strategy in MapReduce

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant