CN106502790A - A kind of task distribution optimization method based on data distribution - Google Patents

A kind of task distribution optimization method based on data distribution

Info

Publication number
CN106502790A
Authority
CN
China
Prior art keywords
distribution
task
node
map
key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610890105.8A
Other languages
Chinese (zh)
Inventor
王洪添
李萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Inspur Cloud Service Information Technology Co Ltd
Original Assignee
Shandong Inspur Cloud Service Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Inspur Cloud Service Information Technology Co Ltd filed Critical Shandong Inspur Cloud Service Information Technology Co Ltd
Priority to CN201610890105.8A priority Critical patent/CN106502790A/en
Publication of CN106502790A publication Critical patent/CN106502790A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a task distribution optimization method based on data distribution. The method proceeds as follows: the data transfer cost of each reduce task is assessed from the network distance between nodes and the weight distribution of the intermediate results; the optimal execution node set of each task is derived from the data transfer cost of the reduce task on different nodes; and a specific task assignment policy and algorithm are given based on the optimal execution node set. Compared with the prior art, the method effectively reduces the data transfer incurred by executing reduce tasks: it reduces the network access requests of MapReduce programs by about 12% and shortens job response time by about 9%, and it is highly practical.

Description

A task distribution optimization method based on data distribution
Technical field
The present invention relates to the field of computer data integration technology, and specifically to a practical task distribution optimization method based on data distribution.
Background art
The explosive growth of information has pushed the Internet into the big data era. Big data has become an important strategic resource and a new basis for decision making, while cloud computing provides powerful computing and storage capacity for processing and analyzing it. With the rise of big data and cloud computing, more and more companies are using MapReduce and Hadoop to provide cloud services. MapReduce is a programming model proposed by Google that is generally used for parallel computation over large-scale data sets, and Hadoop is an open-source parallel programming framework that implements the MapReduce model and a distributed file system (HDFS); it is efficient, highly reliable, highly fault-tolerant, low-cost, and scalable.
Network bandwidth has always been a bottleneck restricting the development of cloud computing, and it is also a current research focus. As shown in Fig. 1, a MapReduce program can be abstracted into two specific functions, a map function and a reduce function: the map function splits the input data and performs preliminary processing, and the reduce function aggregates the intermediate results to obtain the final result. The MapReduce framework usually launches map tasks on the nodes that store the data blocks, which reduces data transfer and the use of network bandwidth. Reduce tasks, however, do not have this data locality advantage, because the input of a single reduce task usually comes from the output of multiple map tasks and every reduce task must write its final result to HDFS. Both the input and the output of the reduce function therefore consume network bandwidth.
On this basis, the present invention proposes a task distribution optimization method based on data distribution, which reduces the network and I/O overhead of data transfer by choosing the launch nodes of reduce tasks sensibly, and thereby improves the performance of MapReduce programs.
Summary of the invention
The technical task of the present invention is to address the above shortcomings by providing a practical task distribution optimization method based on data distribution.
A task distribution optimization method based on data distribution is implemented as follows:
Step 1: assess the data transfer cost of each reduce task according to the network distance between nodes and the weight distribution of the intermediate results;
Step 2: derive the optimal execution node set of each task from the data transfer cost of the reduce task on different nodes;
Step 3: give a specific task assignment policy and algorithm based on the optimal execution node set.
The network distance between nodes is defined as follows: a MapReduce program has m map tasks M_i and n reduce tasks R_j, where 0 ≤ i ≤ m and 0 ≤ j ≤ n, and the input of each reduce task comes from the output of all map tasks. The intermediate results produced by the map tasks are transferred over the network to the nodes running the reduce tasks, and the sum of the distances from the nodes hosting all map tasks to the node hosting reduce task R_j is the total network distance TND_Rj of R_j.
For the intermediate result weight distribution, the local prediction distribution map is restored from the obtained global distribution information, the weight distribution of the intermediate results is counted and predicted at key-value pair granularity, and the data transfer cost of the reduce tasks is assessed in combination with the network distance.
The detailed process for obtaining the global distribution information is as follows:
1) When the execution progress of the map phase is α, each node counts its intermediate result key-value pairs, where slowstart_conf ≤ α ≤ 1; slowstart_conf is a user-configured parameter meaning that reduce tasks start executing once the fraction of completed map tasks reaches slowstart_conf;
2) When each node partitions its intermediate results with the partition function, it counts the key-value pairs of the intermediate results, generates a series of (k, n) tuples, and sorts them by n in descending order;
3) A global truncation threshold θ is set, i.e. only the top θ% of the (k, n) tuples in each local distribution map are used to build the global distribution map. The key-value pair count n of the tuple at the θ% position in a local distribution map is called the local truncation threshold, and the distribution map after truncation is called the local truncated distribution map L;
4) The global distribution map G is built. First define the global distribution lower bound G_L and the global distribution upper bound G_U, which are respectively the minimum and maximum tuple counts for each key obtained from the local truncated distribution maps and the local truncation thresholds. Let the global distribution lower bound be G_L = {(k, N_L) | k ∈ K} and the global distribution upper bound be G_U = {(k, N_U) | k ∈ K}; N_L and N_U are then given by [formulas missing from source text];
5) Let the global distribution map be G = {(k, N) | k ∈ K}; the midpoint of the upper and lower bounds is taken as the result of the global distribution map, i.e. N = (N_L + N_U) / 2;
6) The global distribution map is corrected by prediction from the historical distribution. The distribution deviation of any key is defined as the difference between its current distribution ratio and its historical distribution ratio. The key with the largest distribution deviation is chosen as the correction key k_c; (k_c, N) and the historical distribution ratio of k_c are used to predict the total number of intermediate result key-value pairs, from which the predicted key-value pair count of each key is derived. The corrected global distribution map is called the global prediction distribution map G_c.
The detailed process for restoring the local prediction distribution map from the global distribution information is as follows:
The local prediction distribution map is L_c. Starting from the global distribution map G, for any key k, if (k, n) ∈ L_i, its contribution to L_c is n; otherwise its contribution is [formula missing from source text]. The key-value pair count N_c that will be generated is predicted from the global prediction distribution map and the global distribution map, and the tuple count is divided in proportion to the running progress of each map task: if the progress of a map task is [symbol missing from source text], the predicted key-value pair count of key k in that task's intermediate results is [formula missing from source text].
The data transfer cost of a reduce task assessed in step 1 is specifically: the data transfer cost Cost_{w/r} of node w executing reduce task r is the sum of the data transfer costs of r pulling its corresponding intermediate result key-value pairs from each node, i.e. [formula missing from source text], where m_i is the node executing map task i, d(w, m_i) is the network distance between the two nodes, and r_input is the set of input key-value pairs of r.
Deriving the optimal execution node set of each task in step 2 means deriving the optimal execution node set N_optimal(r) of reduce task r: executing task r on any node w in this set incurs the minimum data transfer cost Cost_{w/r}. The detailed process is:
The optimal task set R_optimal(n) of any node n is the set of tasks, among all reduce tasks not yet executed, for which node n incurs the minimum data transfer cost when pulling the intermediate result key-value pairs. When the current node is not an optimal execution node of any task, the task selector assigns it a task from R_optimal(n);
When a node requests a reduce task, the optimal execution node sets of the unexecuted tasks are examined in turn. If the current node is an optimal execution node of a task, that task is returned; otherwise the skipcount attribute of the task is incremented by 1, where skipcount records how many times each task has been skipped because the requesting node was not one of its optimal execution nodes. If the current node is not an optimal execution node of any task, the optimal task list of the current node is obtained and the task with the largest skip count is selected and assigned. The optimal execution nodes and optimal tasks are updated periodically until the map phase finishes, to keep the scheduling up to date.
The task distribution optimization method based on data distribution of the present invention has the following advantages:
The method restores the local prediction distribution map from reasonably accurate global distribution information, and counts and predicts the weight distribution of the intermediate results at key-value pair granularity. It assesses the data transfer cost of reduce tasks from the network distance between nodes and the weight distribution of the intermediate results, and gives a truncation-based prediction method that balances the accuracy of data awareness against network transmission overhead. Based on the optimal execution node set of each task and the optimal task set of each node, it optimizes the assignment policy for reduce tasks in a cloud computing environment and gives a concrete algorithm. On top of job-level scheduling policies, it reduces the network and I/O overhead of data transfer by choosing the launch nodes of reduce tasks sensibly, and thereby improves the performance of MapReduce programs. The method is practical, widely applicable, and easy to promote.
Description of the drawings
Fig. 1 is a MapReduce data flow diagram.
Fig. 2 is a schematic diagram of the Hadoop cluster network architecture.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and specific embodiments.
A Hadoop cluster uses a client/server architecture and a tree network topology. In a cloud computing environment, a data center usually contains multiple racks, and each rack holds multiple servers. The characteristic of this architecture is that the total bandwidth between nodes in the same rack is far higher than the bandwidth between nodes in different racks. The present invention takes full advantage of this characteristic, reducing data transfer between racks by choosing the launch nodes of reduce tasks sensibly.
As shown in Fig. 2, the task distribution optimization method based on data distribution of the present invention is centered on data distribution awareness and proposes a task distribution optimization policy that uses data transfer cost as its evaluation index. The policy follows the greedy approach: it builds the local prediction distribution map of the intermediate results, computes the optimal execution node sets from it, and minimizes data transfer during reduce task execution as far as possible, thereby reducing the network and I/O overhead of data transfer while improving the time performance of the application and the throughput of the whole cluster.
The main content includes:
Step 1: assess the data transfer cost of each reduce task according to the network distance between nodes and the weight distribution of the intermediate results;
Step 2: derive the optimal execution node set of each task from the data transfer cost of the reduce task on different nodes;
Step 3: give a specific task assignment policy and algorithm based on the optimal execution node set.
The network distance between nodes is defined as follows. In a cloud computing environment, assume a MapReduce program has m map tasks M_i (0 ≤ i ≤ m) and n reduce tasks R_j (0 ≤ j ≤ n), and that the input of each reduce task comes from the output of all map tasks. Because the intermediate results produced by the map tasks must be transferred over the network to the nodes running the reduce tasks, the sum of the distances from the nodes hosting all map tasks to the node hosting reduce task R_j is called the total network distance of R_j (TND_Rj). Clearly, the larger TND_Rj is, the farther the intermediate results must travel to reach the reduce task and the slower the data transfer.
As shown in Fig. 2, assume a Hadoop cluster contains two racks and N0 to N9 denote the 10 slave nodes of the cluster, with N0 to N4 in rack 1 and N5 to N9 in rack 2. Assume the MapReduce program has 6 map tasks and 4 reduce tasks, with the map tasks on nodes N0, N1, N2, N3, N5, and N6 and the reduce tasks on nodes N3, N4, N6, and N7 respectively. Also assume that the network distance from each child node to its parent node in the Hadoop cluster is 1, so the network distance between two nodes in the same rack is 2 and the network distance between two nodes in different racks is 4. The total network distances TND_R0, TND_R1, TND_R2, and TND_R3 of the reduce tasks are computed below:
TND_R0 = 3 × 2 + 2 × 4 = 14;
TND_R1 = 4 × 2 + 2 × 4 = 16;
TND_R2 = 4 × 4 + 1 × 2 = 18;
TND_R3 = 4 × 4 + 2 × 2 = 20;
It can be seen that the total network distance of a reduce task differs depending on which slave node hosts it. This shows that reduce tasks also exhibit data locality, but unlike map tasks, reduce tasks are concerned with the map input across the whole rack rather than the input data on an individual node. When the reduce task is on node N3 in rack 1, its total network distance is 14, and when it is on node N7 in rack 2, its total network distance is 20. Choosing the launch node of a reduce task sensibly can therefore reduce the total network distance and shorten the shuffle phase, which improves the time performance of the application.
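The four totals above can be checked with a short script. This is a minimal sketch rather than part of the patented method: the rack layout is taken from the worked example, and the names `RACK_OF`, `network_distance`, and `total_network_distance` are illustrative assumptions.

```python
# Rack layout from the worked example: N0-N4 in rack 1, N5-N9 in rack 2.
RACK_OF = {f"N{i}": (1 if i <= 4 else 2) for i in range(10)}


def network_distance(a: str, b: str) -> int:
    """Tree distance where every child-to-parent hop costs 1."""
    if a == b:
        return 0  # map output is already local, nothing crosses the network
    if RACK_OF[a] == RACK_OF[b]:
        return 2  # node -> rack switch -> node
    return 4      # node -> rack switch -> core -> rack switch -> node


def total_network_distance(reduce_node: str, map_nodes: list) -> int:
    """TND of a reduce task: sum of distances from every map node to its host."""
    return sum(network_distance(reduce_node, m) for m in map_nodes)


MAP_NODES = ["N0", "N1", "N2", "N3", "N5", "N6"]
REDUCE_NODES = {"R0": "N3", "R1": "N4", "R2": "N6", "R3": "N7"}

tnd = {r: total_network_distance(node, MAP_NODES) for r, node in REDUCE_NODES.items()}
print(tnd)  # {'R0': 14, 'R1': 16, 'R2': 18, 'R3': 20}
```

R0 and R2 each share a node with one map task, which is why only five of the six map outputs contribute to their totals.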
Besides the network distance of the reduce tasks, the weight distribution of the intermediate results is also a key factor in measuring data transfer cost. The distribution of intermediate result key-value pairs can be collected and counted at partition granularity, but two problems remain: 1) to reduce the delay caused by transferring intermediate results, the scheduling of reduce tasks usually begins before the map phase has finished, and the partition sizes at that moment may differ greatly from the final key-value pair distribution, which can make the scheduling result inaccurate; 2) the distribution of key-value pairs usually follows certain regularities, but even if these are used to predict the final partition distribution, the coarse partition granularity and the complete dependence on the partition function make it impossible to use existing knowledge to predict and correct the final key-value pair distribution. To address these problems, the present invention counts and predicts the weight distribution of the intermediate results at key-value pair granularity and assesses the data transfer cost of reduce tasks in combination with the network distance.
The number of intermediate result key-value pairs has to be counted on each node that executes map tasks, but because the data volume is large, having every node transfer all of its key-value pair tuples (k, n) to the data distribution collector on the master node would consume considerable network resources and time. Moreover, obtaining perfectly accurate key-value pair distribution information is of little value, so that approach is not reasonable. The present invention first obtains reasonably accurate global distribution information and then restores the local prediction distribution map from the global prediction distribution map to compute the data transfer cost of executing a task. The statistical process is as follows:
(1) When the execution progress of the map phase is α (slowstart_conf ≤ α ≤ 1), each node counts its intermediate result key-value pairs, where slowstart_conf is a user-configured parameter meaning that reduce tasks start executing once the fraction of completed map tasks reaches slowstart_conf.
(2) When each node partitions its intermediate results with the partition function, it counts the key-value pairs of the intermediate results, generates a series of (k, n) tuples, and sorts them by n in descending order.
(3) A global truncation threshold θ is set, i.e. only the top θ% of the (k, n) tuples in each local distribution map are used to build the global distribution map. The key-value pair count n of the tuple at the θ% position in a local distribution map is called the local truncation threshold, and the distribution map after truncation is called the local truncated distribution map L.
(4) The global distribution map G is built. First define the global distribution lower bound (G_L) and the global distribution upper bound (G_U), which are respectively the minimum and maximum tuple counts for each key obtained from the local truncated distribution maps and the local truncation thresholds. Let the global distribution lower bound be G_L = {(k, N_L) | k ∈ K} and the global distribution upper bound be G_U = {(k, N_U) | k ∈ K}; N_L and N_U are then given by [formulas missing from source text].
(5) Let the global distribution map be G = {(k, N) | k ∈ K}; the midpoint of the upper and lower bounds is taken as the result of the global distribution map, i.e. N = (N_L + N_U) / 2.
(6) The global distribution map is corrected by prediction from the historical distribution. The distribution deviation of any key is defined as the difference between its current distribution ratio and its historical distribution ratio. The key with the largest distribution deviation is chosen as the correction key k_c; (k_c, N) and the historical distribution ratio of k_c are used to predict the total number of intermediate result key-value pairs, from which the predicted key-value pair count of each key is derived. The corrected global distribution map is called the global prediction distribution map G_c.
(7) The local prediction distribution map L_c is restored from the global prediction distribution map. Starting from the global distribution map G, for any key k, if (k, n) ∈ L_i, its contribution to L_c is n; otherwise its contribution is [formula missing from source text]. The key-value pair count N_c that will be generated can be predicted from the global prediction distribution map and the global distribution map, and the tuple count is divided in proportion to the running progress of each map task: if the progress of a map task is [symbol missing from source text], the predicted key-value pair count of key k in that task's intermediate results is [formula missing from source text].
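Steps (2) to (5) can be illustrated with the sketch below. The bound formulas are not available in this text, so the code rests on a labeled assumption: the lower bound of a key sums the counts that nodes actually reported, and the upper bound additionally charges each non-reporting node its local truncation threshold, since a key that was cut off cannot have occurred more often than that on the node. All function names are hypothetical, and steps (6) and (7) are not covered.

```python
import math
from collections import Counter


def local_truncated_histogram(pairs, theta_percent):
    """Steps (2)-(3): count keys, sort by count descending, keep the top theta%.

    Returns (kept, threshold); threshold is the count of the last kept tuple.
    """
    ranked = Counter(k for k, _ in pairs).most_common()  # largest n first
    if not ranked:
        return {}, 0
    keep = max(1, math.ceil(len(ranked) * theta_percent / 100))
    return dict(ranked[:keep]), ranked[keep - 1][1]


def global_bounds(local_results):
    """Step (4), ASSUMED form: per-key lower and upper bounds across nodes."""
    keys = set().union(*(kept.keys() for kept, _ in local_results))
    lower, upper = {}, {}
    for k in keys:
        lower[k] = sum(kept.get(k, 0) for kept, _ in local_results)
        # A node that did not report k holds at most `threshold` pairs of it.
        upper[k] = sum(kept[k] if k in kept else thr for kept, thr in local_results)
    return lower, upper


def global_histogram(lower, upper):
    """Step (5): midpoint of the two bounds, N = (N_L + N_U) / 2."""
    return {k: (lower[k] + upper[k]) / 2 for k in lower}


# Two nodes with made-up intermediate results; values are irrelevant for counting.
node_a = [("a", None)] * 5 + [("b", None)] * 3 + [("c", None)] * 1
node_b = [("a", None)] * 4 + [("c", None)] * 2 + [("d", None)] * 1

locals_ = [local_truncated_histogram(p, theta_percent=50) for p in (node_a, node_b)]
lower, upper = global_bounds(locals_)
estimate = global_histogram(lower, upper)
print(sorted(estimate.items()))  # [('a', 9.0), ('b', 4.0), ('c', 3.5)]
```

In this toy input the true counts are a = 9, b = 3, c = 3, so the midpoint estimate is exact for the key reported by both nodes and approximate for keys that one node truncated away.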
In summary, the following can be derived from the network distance between nodes and the weight distribution of the intermediate results: the data transfer cost Cost_{w/r} of node w executing reduce task r is the sum of the data transfer costs of r pulling its corresponding intermediate result key-value pairs from each node, i.e. [formula missing from source text], where m_i is the node executing map task i, d(w, m_i) is the network distance between the two nodes, and r_input is the set of input key-value pairs of r.
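One plausible reading of this cost can be sketched as follows. Because the formula itself is not available in this text, the form used here is an assumption: for each map node m_i, the predicted number of key-value pairs whose key belongs to r_input is weighted by d(w, m_i), and the products are summed. The helper `rack_distance`, the argument names, and the sample numbers are all hypothetical.

```python
def rack_distance(a, b, rack_of):
    """Assumed tree distance: 0 on the same node, 2 within a rack, 4 across racks."""
    if a == b:
        return 0
    return 2 if rack_of[a] == rack_of[b] else 4


def transfer_cost(w, r_input, predicted, rack_of):
    """ASSUMED form of Cost_{w/r}.

    predicted maps each map node m_i to its predicted {key: pair count};
    only keys in r_input are pulled by reduce task r.
    """
    total = 0
    for m_i, histogram in predicted.items():
        pairs_for_r = sum(n for k, n in histogram.items() if k in r_input)
        total += rack_distance(w, m_i, rack_of) * pairs_for_r
    return total


rack_of = {"N0": 1, "N1": 1, "N5": 2, "N6": 2}
predicted = {"N0": {"a": 10, "b": 4}, "N5": {"a": 2, "c": 7}}
r_input = {"a", "c"}

costs = {w: transfer_cost(w, r_input, predicted, rack_of) for w in rack_of}
print(costs)  # {'N0': 36, 'N1': 56, 'N5': 40, 'N6': 58}
```

Under these made-up numbers, N0 is cheapest because the larger share of the reduce task's input is already on that node.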
Deriving the optimal execution node set of each task in step 2 means deriving the optimal execution node set N_optimal(r) of reduce task r: executing task r on any node w in this set incurs the minimum data transfer cost Cost_{w/r}. To reduce the network and I/O overhead of transferring intermediate results, the optimal assignment scheme for reduce tasks is to assign every task to one of its own optimal execution nodes, which achieves the minimum global data transfer cost. Sometimes, however, in order to meet users' real-time requirements on job response time in the service-level agreement, the service provider must finish assigning all tasks before a deadline. Under this constraint, some reduce tasks may be unable to execute on their optimal execution nodes.
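Given per-node costs for one reduce task, N_optimal(r) is simply the set of nodes tied for the minimum. The sketch below assumes the costs are already available as a dictionary; the function name and the sample values are illustrative only.

```python
def optimal_execution_nodes(cost_by_node):
    """Return every node whose transfer cost for this task equals the minimum."""
    best = min(cost_by_node.values())
    return {w for w, c in cost_by_node.items() if c == best}


single = optimal_execution_nodes({"N0": 36, "N1": 56, "N5": 40, "N6": 58})
tied = optimal_execution_nodes({"N3": 14, "N4": 14, "N6": 18})
print(sorted(single))  # ['N0']
print(sorted(tied))    # ['N3', 'N4']
```

Returning a set rather than a single node matters for scheduling: when several nodes tie, any of them can take the task without increasing the global cost.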
In addition, whether the optimal execution node has available resources is another factor that constrains task assignment. To solve this problem, define the optimal task set R_optimal(n) of any node n as the set of tasks, among all reduce tasks not yet executed, for which node n incurs the minimum data transfer cost when pulling the intermediate result key-value pairs. When the current node is not an optimal execution node of any task, the task selector assigns it a task from R_optimal(n). The specific assignment algorithm is as follows:
When a node requests a reduce task, the optimal execution node sets of the unexecuted tasks are examined in turn. If the current node is an optimal execution node of a task, that task is returned; otherwise the skipcount attribute of the task is incremented by 1, where skipcount records how many times each task has been skipped because the requesting node was not one of its optimal execution nodes (lines 1-8). If the current node is not an optimal execution node of any task, the optimal task list of the current node is obtained and the task with the largest skip count is selected and assigned (lines 9-16). The optimal execution nodes and optimal tasks are updated periodically until the map phase finishes, to keep the scheduling up to date.
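The two passes described above can be sketched as follows. The algorithm listing that the line references point to is not available in this text, so this is a reconstruction from the prose, not the original listing; `ReduceTask`, `assign_reduce_task`, and the cost table are hypothetical, and costs are assumed to be precomputed per task and node.

```python
from dataclasses import dataclass


@dataclass
class ReduceTask:
    name: str
    skipcount: int = 0  # times skipped because the requester was not optimal


def assign_reduce_task(node, pending, cost):
    """Pick a task for `node`; cost is {task name: {node: transfer cost}}."""
    if not pending:
        return None
    # Pass 1: return the first task for which `node` is an optimal execution node.
    for task in pending:
        node_costs = cost[task.name]
        if node_costs[node] == min(node_costs.values()):
            pending.remove(task)
            return task
        task.skipcount += 1
    # Pass 2: `node` is optimal for no task. Among its cheapest tasks,
    # i.e. R_optimal(node), take the one that has been skipped most often.
    cheapest = min(cost[t.name][node] for t in pending)
    candidates = [t for t in pending if cost[t.name][node] == cheapest]
    chosen = max(candidates, key=lambda t: t.skipcount)
    pending.remove(chosen)
    return chosen


cost = {
    "r0": {"N0": 36, "N1": 56, "N5": 50},
    "r1": {"N0": 50, "N1": 30, "N5": 44},
    "r2": {"N0": 20, "N1": 60, "N5": 44},
}
pending = [ReduceTask("r0"), ReduceTask("r1"), ReduceTask("r2", skipcount=2)]

order = [assign_reduce_task(n, pending, cost).name for n in ("N5", "N1", "N0")]
print(order)  # ['r2', 'r1', 'r0']
```

N5 is optimal for no task, so it falls through to the second pass, where r1 and r2 tie on cost and the higher skip count of r2 decides. Favoring frequently skipped tasks keeps a task from waiting indefinitely for an optimal node that never requests work.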
In a cloud computing environment, the task distribution optimization method implemented by the present invention effectively reduces the data transfer incurred by executing reduce tasks: it reduces the network access requests of MapReduce programs by about 12% and shortens job response time by about 9%.
The above embodiments are only specific cases of the present invention. The scope of patent protection of the present invention includes but is not limited to the above embodiments; any appropriate change or substitution that conforms to the claims of the task distribution optimization method based on data distribution of the present invention and is made by a person of ordinary skill in the art shall fall within the scope of patent protection of the present invention.

Claims (7)

1. A task distribution optimization method based on data distribution, characterized in that it is implemented as follows:
Step 1: assess the data transfer cost of each reduce task according to the network distance between nodes and the weight distribution of the intermediate results;
Step 2: derive the optimal execution node set of each task from the data transfer cost of the reduce task on different nodes;
Step 3: give a specific task assignment policy and algorithm based on the optimal execution node set.
2. The task distribution optimization method based on data distribution according to claim 1, characterized in that the network distance between nodes is defined as follows: a MapReduce program has m map tasks M_i and n reduce tasks R_j, where 0 ≤ i ≤ m and 0 ≤ j ≤ n, and the input of each reduce task comes from the output of all map tasks; the intermediate results produced by the map tasks are transferred over the network to the nodes running the reduce tasks, and the sum of the distances from the nodes hosting all map tasks to the node hosting reduce task R_j is the total network distance TND_Rj of R_j.
3. The task distribution optimization method based on data distribution according to claim 1, characterized in that, for the intermediate result weight distribution, the local prediction distribution map is restored from the obtained global distribution information, the weight distribution of the intermediate results is counted and predicted at key-value pair granularity, and the data transfer cost of the reduce tasks is assessed in combination with the network distance.
4. The task distribution optimization method based on data distribution according to claim 3, characterized in that the detailed process for obtaining the global distribution information is as follows:
1) when the execution progress of the map phase is α, each node counts its intermediate result key-value pairs, where slowstart_conf ≤ α ≤ 1; slowstart_conf is a user-configured parameter meaning that reduce tasks start executing once the fraction of completed map tasks reaches slowstart_conf;
2) when each node partitions its intermediate results with the partition function, it counts the key-value pairs of the intermediate results, generates a series of (k, n) tuples, and sorts them by n in descending order;
3) a global truncation threshold θ is set, i.e. only the top θ% of the (k, n) tuples in each local distribution map are used to build the global distribution map; the key-value pair count n of the tuple at the θ% position in a local distribution map is called the local truncation threshold, and the distribution map after truncation is called the local truncated distribution map L;
4) the global distribution map G is built: first define the global distribution lower bound G_L and the global distribution upper bound G_U, which are respectively the minimum and maximum tuple counts for each key obtained from the local truncated distribution maps and the local truncation thresholds; let the global distribution lower bound be G_L = {(k, N_L) | k ∈ K} and the global distribution upper bound be G_U = {(k, N_U) | k ∈ K}; N_L and N_U are then given by [formulas missing from source text];
5) let the global distribution map be G = {(k, N) | k ∈ K}; the midpoint of the upper and lower bounds is taken as the result of the global distribution map, i.e. N = (N_L + N_U) / 2;
6) the global distribution map is corrected by prediction from the historical distribution: the distribution deviation of any key is defined as the difference between its current distribution ratio and its historical distribution ratio; the key with the largest distribution deviation is chosen as the correction key k_c; (k_c, N) and the historical distribution ratio of k_c are used to predict the total number of intermediate result key-value pairs, from which the predicted key-value pair count of each key is derived; the corrected global distribution map is called the global prediction distribution map G_c.
5. a kind of task distribution optimization method based on data distribution according to claim 4, it is characterised in that by complete The detailed process of score of the game cloth information reverting local prediction distribution map is:
The local prediction distribution map is Lc, from global distribution map G, for arbitrary key k, if (k, n) ∈ is Li, then which is to LcTribute Offer as n, otherwise contribution isBased on the key assignments logarithm N that global prediction distribution map and global distribution map prediction will be generatedc, according to The proportional division number of tuples of the operation progress of each map task, the progress of even map tasks isSo the task is corresponding Intermediate result in the prediction key assignments logarithm of key k be
6. a kind of task distribution optimization method based on data distribution according to claim 5, it is characterised in that the step The data transfer cost for assessing reduce tasks in rapid one is specially:Node w executes the data transfer cost of reduce tasks r Costw/rFor the data transfer cost sum that r pulls corresponding intermediate result key-value pair from each node, i.e.,Wherein, miFor executing the node of map tasks i, d (w, mi) it is network distance between two nodes, rinputInput key-value pair set for r.
7. The task distribution optimization method based on data distribution according to claim 1, characterized in that deriving the optimal execution node set of each task in step two means deriving the optimal execution node set N_optimal(r) of reduce task r, such that executing task r on any node w in this set produces the minimum data transfer cost Cost_{w/r}. The specific process is:
The optimal task set R_optimal(n) of any node n is the set of tasks, among all reduce tasks not yet executed, for which node n incurs the minimum data transfer cost when pulling the intermediate-result key-value pairs. When the current node is not the optimal execution node of any task, the task selector assigns it a task from R_optimal(n);
When a node requests a reduce task, the optimal execution node set of each unexecuted task is obtained in turn. If the current node is an optimal execution node of the task, that task is returned; otherwise, the task's skipcount attribute is incremented by 1. skipcount records the number of times each task has been requested and skipped because the requesting node was not an optimal execution node. If the current node is not the optimal execution node of any task, the optimal task list of the current node is obtained, and the task with the largest skip count is selected and assigned. The optimal execution nodes and optimal tasks are updated periodically until the map stage finishes executing, to keep the scheduling up to date.
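The selection procedure of claim 7 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the names `ReduceTask`, `select_task`, `optimal_nodes` and `optimal_tasks` are hypothetical, the two lookup tables are assumed to be precomputed from the transfer costs, and returning `None` when nothing can be assigned is an added fallback that the claim does not describe.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

@dataclass
class ReduceTask:
    name: str
    skipcount: int = 0  # times the task was requested and skipped

def select_task(node: str,
                pending: List[ReduceTask],
                optimal_nodes: Dict[str, Set[str]],
                optimal_tasks: Dict[str, List[str]]) -> Optional[ReduceTask]:
    """Pick a reduce task for `node`.

    optimal_nodes -- task name -> N_optimal(r), its optimal execution nodes
    optimal_tasks -- node -> R_optimal(n), names of its cheapest tasks
    """
    # Pass 1: return the first pending task for which this node is optimal,
    # incrementing skipcount on every task that is passed over.
    for task in pending:
        if node in optimal_nodes[task.name]:
            pending.remove(task)
            return task
        task.skipcount += 1
    # Pass 2: the node is optimal for no task, so fall back to R_optimal(n)
    # and pick the task that has been skipped most often.
    wanted = optimal_tasks.get(node, [])
    candidates = [t for t in pending if t.name in wanted]
    if not candidates:
        return None  # assumed fallback, not part of the claim
    chosen = max(candidates, key=lambda t: t.skipcount)
    pending.remove(chosen)
    return chosen
```

Preferring the task with the largest skip count in the fallback pass keeps a task from waiting indefinitely when its optimal nodes never request work.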
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610890105.8A CN106502790A (en) 2016-10-12 2016-10-12 A kind of task distribution optimization method based on data distribution

Publications (1)

Publication Number Publication Date
CN106502790A true CN106502790A (en) 2017-03-15

Family

ID=58295238

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610890105.8A Pending CN106502790A (en) 2016-10-12 2016-10-12 A kind of task distribution optimization method based on data distribution

Country Status (1)

Country Link
CN (1) CN106502790A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102541858A (en) * 2010-12-07 2012-07-04 腾讯科技(深圳)有限公司 Data equality processing method, device and system based on mapping and protocol
US20120151292A1 (en) * 2010-12-14 2012-06-14 Microsoft Corporation Supporting Distributed Key-Based Processes
CN102629219A (en) * 2012-02-27 2012-08-08 北京大学 Self-adaptive load balancing method for Reduce ends in parallel computing framework
US20130290972A1 (en) * 2012-04-27 2013-10-31 Ludmila Cherkasova Workload manager for mapreduce environments
CN103279351A (en) * 2013-05-31 2013-09-04 北京高森明晨信息科技有限公司 Method and device for task scheduling
US20160034482A1 (en) * 2014-07-31 2016-02-04 International Business Machines Corporation Method and apparatus for configuring relevant parameters of mapreduce applications
CN105589752A (en) * 2016-02-24 2016-05-18 哈尔滨工业大学深圳研究生院 Cross-data center big data processing based on key value distribution

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王捷: "基于SLA的MapReduce调度机制研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109496321A (en) * 2017-07-10 2019-03-19 欧洲阿菲尼帝科技有限责任公司 For estimating the technology of the expection performance in task distribution system
CN107506388A (en) * 2017-07-27 2017-12-22 浙江工业大学 A kind of iterative data balancing optimization method towards Spark parallel computation frames
CN109871265A (en) * 2017-12-05 2019-06-11 航天信息股份有限公司 The dispatching method and device of Reduce task
CN109947559A (en) * 2019-02-03 2019-06-28 百度在线网络技术(北京)有限公司 Optimize method, apparatus, equipment and computer storage medium that MapReduce is calculated
CN109947559B (en) * 2019-02-03 2021-11-23 百度在线网络技术(北京)有限公司 Method, device, equipment and computer storage medium for optimizing MapReduce calculation
CN113467700A (en) * 2020-03-31 2021-10-01 阿里巴巴集团控股有限公司 Data distribution method and device based on heterogeneous storage
CN113467700B (en) * 2020-03-31 2024-04-23 阿里巴巴集团控股有限公司 Heterogeneous storage-based data distribution method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2017-03-15)