CN106502790A - A kind of task distribution optimization method based on data distribution - Google Patents
- Publication number
- CN106502790A (application number CN201610890105.8A)
- Authority
- CN
- China
- Prior art keywords
- distribution
- task
- node
- map
- key
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a task distribution optimization method based on data distribution. The method proceeds as follows: the data transfer cost of each reduce task is evaluated from the network distance between nodes and the weight distribution of the intermediate results; the optimal execution node set of each task is derived from the data transfer cost of the reduce task on different nodes; and a concrete task assignment strategy and algorithm are given on the basis of the optimal execution node sets. Compared with the prior art, the method effectively reduces the data transfer incurred by executing reduce tasks: it reduces the network access requests of MapReduce programs by about 12% and shortens job response time by about 9%, and it is highly practical.
Description
Technical field
The present invention relates to the technical field of computer data integration, and specifically to a practical task distribution optimization method based on data distribution.
Background
The explosive growth of information has pushed the internet into the big data era. Big data has become an important strategic resource and a new basis for decision-making, while cloud computing provides powerful computing and storage capacity for big data processing and analysis. With the rise of big data and cloud computing, more and more companies use MapReduce and Hadoop to provide cloud services. MapReduce is a programming model proposed by Google that is generally used for parallel computation over large-scale data sets, and Hadoop is an open-source parallel programming framework that implements the MapReduce model and a distributed file system (HDFS). Hadoop is efficient, reliable, fault-tolerant, low-cost and scalable.
Network bandwidth has always been a bottleneck restricting the development of cloud computing, and it is also a current research focus. As shown in Fig. 1, a MapReduce program can be abstracted into two functions, map and reduce: the map function decomposes the input data and performs preliminary processing, and the reduce function aggregates the intermediate results to obtain the final result. The MapReduce framework generally launches map tasks on the nodes that store the data blocks, which reduces data transfer and network bandwidth usage. Reduce tasks, however, do not enjoy this data locality: the input of a single reduce task usually comes from the output of multiple map tasks, and every reduce task must write its final result to HDFS, so both the input and the output of the reduce function consume network bandwidth.
On this basis, the present invention proposes a task distribution optimization method based on data distribution, which reduces the network and I/O overhead caused by data transfer by choosing the launch nodes of reduce tasks sensibly, and thereby improves the performance of MapReduce programs.
Summary of the invention
The technical task of the present invention is to address the above shortcomings by providing a practical task distribution optimization method based on data distribution.
A task distribution optimization method based on data distribution is implemented as follows:
Step 1: evaluate the data transfer cost of each reduce task according to the network distance between nodes and the weight distribution of the intermediate results;
Step 2: derive the optimal execution node set of each task from the data transfer cost of the reduce task on different nodes;
Step 3: give a concrete task assignment strategy and algorithm based on the optimal execution node sets.
The network distance between nodes is defined as follows: suppose a MapReduce program has m map tasks Mi and n reduce tasks Rj, where 0≤i≤m and 0≤j≤n, and the input of each reduce task comes from the output of all map tasks. The intermediate results produced by the map tasks are transferred over the network to the nodes running the reduce tasks, and the sum of the distances from the nodes hosting all map tasks to the node hosting reduce task Rj is the total network distance of Rj, denoted TND_Rj.
For the weight distribution of the intermediate results, global distribution information is obtained and used to restore a local predicted distribution; the weight distribution of the intermediate results is counted and predicted at key-value-pair granularity, and the data transfer cost of each reduce task is evaluated in combination with the network distance.
The global distribution information is obtained as follows:
1) When the execution progress of the map phase is α, each node counts its intermediate key-value pairs, where slowstart_conf ≤ α ≤ 1. slowstart_conf is a user-configured parameter: reduce tasks start executing once the proportion of completed map tasks reaches slowstart_conf;
2) When each node partitions its intermediate results with the partition function, it counts the key-value pairs of the intermediate results, generates a series of (k, n) tuples, and sorts them by n in descending order;
3) A global truncation threshold θ is set, i.e., only the top θ% of the (k, n) tuples in each local distribution are used to build the global distribution. The key-value count n of the tuple at the θ% position of the local distribution is called the local truncation threshold, and the truncated distribution is called the local truncated distribution L;
4) The global distribution G is built. First define the global distribution lower bound G_L and the global distribution upper bound G_U, which are, respectively, the minimum and maximum number of tuples per key that can be inferred from the local truncated distributions and the local truncation thresholds. Let G_L = {(k, N_L) | k ∈ K} and G_U = {(k, N_U) | k ∈ K}; then [equation image not reproduced], where [equation image not reproduced];
5) Let the global distribution be G = {(k, N) | k ∈ K}; the midpoint of the lower and upper bounds is taken as the global distribution, i.e., N = (N_L + N_U)/2;
6) The global distribution is corrected by prediction from the historical distribution. The distribution deviation of a key is defined as the difference between its current distribution ratio and its historical distribution ratio. The key with the largest deviation is chosen as the correction key k_c, and the total number of intermediate key-value pairs is predicted from (k_c, N) and the historical distribution ratio of k_c; the predicted number of key-value pairs for each key is then derived. The corrected global distribution is called the global predicted distribution G_c.
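The local counting and truncation in steps 2) and 3) can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: the function name `local_truncated_distribution`, rounding the cut-off position up, and breaking ties alphabetically are all assumptions made for the example.

```python
import math
from collections import Counter


def local_truncated_distribution(keys, theta_percent):
    """Count keys on one node, sort (k, n) by n descending, keep the top theta%.

    Returns the truncated list of (k, n) tuples and the local truncation
    threshold, i.e. the count n of the last tuple that was kept.
    """
    counts = Counter(keys)
    if not counts:
        return [], 0
    # Sort by count descending; the alphabetical tie-break is an assumption.
    ranked = sorted(counts.items(), key=lambda kn: (-kn[1], kn[0]))
    # Rounding up and keeping at least one tuple are assumptions.
    keep = max(1, math.ceil(len(ranked) * theta_percent / 100))
    truncated = ranked[:keep]
    return truncated, truncated[-1][1]


# Hypothetical intermediate keys emitted by the map tasks on one node.
emitted = ["a"] * 5 + ["b"] * 3 + ["c"] * 2 + ["d"]
truncated, threshold = local_truncated_distribution(emitted, theta_percent=50)
print(truncated, threshold)  # [('a', 5), ('b', 3)] 3
```

Only the truncated list and the threshold would need to be sent to the collector, which is what keeps the network cost of gathering statistics low.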
The local predicted distribution is restored from the global distribution information as follows:
Let the local predicted distribution be L_c. Starting from the global distribution G, for any key k, if (k, n) ∈ L_i then its contribution to L_c is n; otherwise its contribution is [equation image not reproduced]. The number of key-value pairs N_c that will be generated is predicted from the global predicted distribution and the global distribution, and the tuple counts are divided in proportion to the running progress of each map task: if the progress of a map task is [symbol not reproduced], then the predicted number of key-value pairs for key k in the intermediate result of that task is [equation image not reproduced].
In step 1, the data transfer cost of a reduce task is evaluated as follows: the data transfer cost Cost_w/r of executing reduce task r on node w is the sum of the data transfer costs of r pulling its intermediate key-value pairs from each node, i.e., [equation image not reproduced], where m_i is the node executing map task i, d(w, m_i) is the network distance between the two nodes, and r_input is the set of input key-value pairs of r.
In step 2, deriving the optimal execution node set of each task means deriving the optimal execution node set N_optimal(r) of reduce task r: executing task r on any node w in this set yields the minimum data transfer cost Cost_w/r. The process is as follows:
The optimal task set R_optimal(n) of a node n is the set of tasks, among all reduce tasks not yet executed, for which node n incurs the minimum data transfer cost when pulling the intermediate key-value pairs. When the current node is not an optimal execution node of any task, the task scheduler assigns it a task from R_optimal(n);
When a node requests a reduce task, the optimal execution node sets of the unexecuted tasks are examined in turn. If the current node is an optimal execution node of a task, that task is returned; otherwise the task's skipcount attribute is incremented by 1, where skipcount records how many times the task has been skipped because the request did not come from one of its optimal execution nodes. If the current node is not an optimal execution node of any task, the optimal task list of the current node is obtained and the task with the largest skip count is selected and assigned. The optimal execution nodes and optimal tasks are updated periodically until the map phase finishes, to keep scheduling decisions current.
The task distribution optimization method based on data distribution of the present invention has the following advantages:
It obtains reasonably accurate global distribution information and restores the local predicted distributions from it, counting and predicting the weight distribution of the intermediate results at key-value-pair granularity. It evaluates the data transfer cost of reduce tasks from the network distance between nodes and the weight distribution of the intermediate results, and provides a truncation-based prediction method that balances the accuracy of data-distribution awareness against network transfer overhead. Based on the optimal execution node sets of tasks and the optimal task sets of nodes, it optimizes the assignment strategy for reduce tasks in a cloud computing environment and gives a concrete algorithm. Building on job-level scheduling strategies, it reduces the network and I/O overhead caused by data transfer by choosing the launch nodes of reduce tasks sensibly, and thereby improves the performance of MapReduce programs. The method is practical, widely applicable and easy to adopt.
Description of the drawings
Figure 1 is a MapReduce data flow diagram.
Figure 2 is a schematic diagram of the Hadoop cluster network architecture.
Detailed description
The invention is further described below with reference to the drawings and a specific embodiment.
Hadoop clusters use a client/server architecture and a tree network topology. In a cloud computing environment a data center generally contains multiple racks, each holding multiple servers. A characteristic of this architecture is that the aggregate bandwidth between nodes within the same rack is far higher than the bandwidth between nodes in different racks. The present invention takes full advantage of this characteristic, reducing inter-rack data transfer by choosing the launch nodes of reduce tasks sensibly.
As shown in Fig. 2, the task distribution optimization method based on data distribution of the present invention is centered on data-distribution awareness and proposes a task assignment optimization strategy that uses data transfer cost as its evaluation metric. The strategy follows a greedy approach: by building the local predicted distribution of the intermediate results, it computes the optimal execution node sets and minimizes data transfer during reduce task execution, which reduces the network and I/O overhead caused by data transfer and improves both the time performance of the application and the throughput of the whole cluster.
The main steps are:
Step 1: evaluate the data transfer cost of each reduce task according to the network distance between nodes and the weight distribution of the intermediate results;
Step 2: derive the optimal execution node set of each task from the data transfer cost of the reduce task on different nodes;
Step 3: give a concrete task assignment strategy and algorithm based on the optimal execution node sets.
The network distance between nodes is defined as follows: in a cloud computing environment, suppose a MapReduce program has m map tasks Mi (0≤i≤m) and n reduce tasks Rj (0≤j≤n), and the input of each reduce task comes from the output of all map tasks. Because the intermediate results produced by the map tasks must be transferred over the network to the nodes running the reduce tasks, the sum of the distances from the nodes hosting all map tasks to the node hosting reduce task Rj is called the total network distance of Rj (TND_Rj). Clearly, the larger TND_Rj is, the farther the intermediate results must travel to reach the reduce task, and the slower the data transfer.
As shown in Fig. 2, suppose a Hadoop cluster contains two racks and N0 to N9 denote its 10 slave nodes, with N0 to N4 in rack 1 and N5 to N9 in rack 2. Suppose a MapReduce program has 6 map tasks and 4 reduce tasks, with the map tasks on nodes N0, N1, N2, N3, N5 and N6 and the reduce tasks on nodes N3, N4, N6 and N7. Suppose also that the network distance from each child node to its parent node in the Hadoop cluster is 1, so that the distance between two nodes in the same rack is 2 and the distance between two nodes in different racks is 4; a map task on the same node as the reduce task contributes a distance of 0. The total network distances TND_R0, TND_R1, TND_R2 and TND_R3 of the reduce tasks are then:
TND_R0 = 3 × 2 + 2 × 4 = 14;
TND_R1 = 4 × 2 + 2 × 4 = 16;
TND_R2 = 4 × 4 + 1 × 2 = 18;
TND_R3 = 4 × 4 + 2 × 2 = 20;
As can be seen, the total network distance of a reduce task differs depending on which slave node it runs on. This shows that reduce tasks also have a data locality property; unlike map tasks, however, reduce tasks care about the map-side data across the whole rack rather than the input data on a single node. When the reduce task runs on node N3 in rack 1, its total network distance is 14; when it runs on node N7 in rack 2, its total network distance is 20. Choosing the launch node of a reduce task sensibly therefore reduces the total network distance, shortens the shuffle phase, and improves the time performance of the application.
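The worked example can be reproduced with a few lines of Python. This is only an illustrative sketch: the rack layout and distances come from the example, while the names `RACK_OF`, `distance` and `total_network_distance` are hypothetical helpers introduced here.

```python
# Rack layout from the example: N0-N4 in rack 1, N5-N9 in rack 2.
RACK_OF = {f"N{i}": (1 if i <= 4 else 2) for i in range(10)}


def distance(a, b):
    """Tree distance: 0 on the same node, 2 within a rack, 4 across racks."""
    if a == b:
        return 0
    return 2 if RACK_OF[a] == RACK_OF[b] else 4


def total_network_distance(reduce_node, map_nodes):
    """Sum of distances from every map-task node to the reduce-task node."""
    return sum(distance(reduce_node, m) for m in map_nodes)


MAP_NODES = ["N0", "N1", "N2", "N3", "N5", "N6"]
REDUCE_NODES = ["N3", "N4", "N6", "N7"]  # nodes hosting R0, R1, R2, R3

tnd = {r: total_network_distance(r, MAP_NODES) for r in REDUCE_NODES}
print(tnd)  # {'N3': 14, 'N4': 16, 'N6': 18, 'N7': 20}
```

The output matches the four totals computed by hand, and makes it easy to try other placements of the reduce tasks.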
Besides the network distance of reduce tasks, the weight distribution of the intermediate results is also a key factor in data transfer cost. The distribution of intermediate key-value pairs could be collected and counted at partition granularity, but two problems remain: 1) to reduce the delay caused by transferring intermediate results, reduce tasks are usually scheduled before the map phase has fully finished, and the partition sizes at that point may differ greatly from the final key-value distribution, which can make scheduling decisions inaccurate; 2) the distribution of key-value pairs usually follows some regularity, but even if the final partition distribution were predicted from it, partitions are coarse-grained and depend entirely on the partition function, so existing knowledge cannot be used to predict and correct the final key-value distribution. To address these problems, the present invention counts and predicts the weight distribution of the intermediate results at key-value-pair granularity and evaluates the data transfer cost of reduce tasks in combination with the network distance.
The number of intermediate key-value pairs must be counted on each node that executes map tasks. Because the data volume is large, however, having every node send all of its key-value tuples (k, n) to the data distribution collector on the master node would consume considerable network resources and time. Moreover, obtaining perfectly accurate key-value distribution information is of little value, so that approach is not reasonable. The present invention first obtains reasonably accurate global distribution information and then restores the local predicted distributions from the global predicted distribution in order to compute the data transfer cost of executing each task. The procedure is as follows:
(1) When the execution progress of the map phase is α (slowstart_conf ≤ α ≤ 1), each node counts its intermediate key-value pairs. slowstart_conf is a user-configured parameter: reduce tasks start executing once the proportion of completed map tasks reaches slowstart_conf.
(2) When each node partitions its intermediate results with the partition function, it counts the key-value pairs of the intermediate results, generates a series of (k, n) tuples, and sorts them by n in descending order.
(3) A global truncation threshold θ is set, i.e., only the top θ% of the (k, n) tuples in each local distribution are used to build the global distribution. The key-value count n of the tuple at the θ% position of the local distribution is called the local truncation threshold, and the truncated distribution is called the local truncated distribution L.
(4) The global distribution G is built. First define the global distribution lower bound (G_L) and the global distribution upper bound (G_U), which are, respectively, the minimum and maximum number of tuples per key that can be inferred from the local truncated distributions and the local truncation thresholds. Let G_L = {(k, N_L) | k ∈ K} and G_U = {(k, N_U) | k ∈ K}; then [equation image not reproduced], where [equation image not reproduced].
(5) Let the global distribution be G = {(k, N) | k ∈ K}; the midpoint of the lower and upper bounds is taken as the global distribution, i.e., N = (N_L + N_U)/2.
(6) The global distribution is corrected by prediction from the historical distribution. The distribution deviation of a key is defined as the difference between its current distribution ratio and its historical distribution ratio. The key with the largest deviation is chosen as the correction key k_c, and the total number of intermediate key-value pairs is predicted from (k_c, N) and the historical distribution ratio of k_c; the predicted number of key-value pairs for each key is then derived. The corrected global distribution is called the global predicted distribution G_c.
(7) The local predicted distribution L_c is restored from the global predicted distribution. Starting from the global distribution G, for any key k, if (k, n) ∈ L_i then its contribution to L_c is n; otherwise its contribution is [equation image not reproduced]. The number of key-value pairs N_c that will be generated can be predicted from the global predicted distribution and the global distribution, and the tuple counts are divided in proportion to the running progress of each map task: if the progress of a map task is [symbol not reproduced], then the predicted number of key-value pairs for key k in the intermediate result of that task is [equation image not reproduced].
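The expressions for N_L and N_U did not survive in this text, so the following Python sketch of steps (4) and (5) rests on a stated assumption: a key missing from a node's truncated distribution contributes 0 to the lower bound and that node's local truncation threshold to the upper bound (it cannot occur more often than the smallest count that was kept). The function name `merge_global_bounds` and the sample data are hypothetical.

```python
def merge_global_bounds(local_maps, thresholds):
    """Merge truncated per-node counts into (lower, upper, midpoint) per key.

    local_maps: one dict {key: n} per node (the local truncated distribution).
    thresholds: the local truncation threshold of each node, in the same order.
    Assumption: a key absent on a node counts as 0 in the lower bound and as
    that node's threshold in the upper bound.
    """
    all_keys = set().union(*local_maps) if local_maps else set()
    merged = {}
    for k in all_keys:
        lower = sum(m.get(k, 0) for m in local_maps)
        upper = sum(m.get(k, t) for m, t in zip(local_maps, thresholds))
        merged[k] = (lower, upper, (lower + upper) / 2)
    return merged


# Two hypothetical nodes with their truncated distributions and thresholds.
node_maps = [{"a": 5, "b": 3}, {"a": 4, "c": 6}]
node_thresholds = [3, 4]
bounds = merge_global_bounds(node_maps, node_thresholds)
print(bounds["b"])  # (3, 7, 5.0)
```

Under this assumption the bounds coincide for keys reported by every node (such as "a" here), and the gap widens only for keys that were truncated somewhere.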
In summary, the following can be derived from the network distance between nodes and the weight distribution of the intermediate results: the data transfer cost Cost_w/r of executing reduce task r on node w is the sum of the data transfer costs of r pulling its intermediate key-value pairs from each node, i.e., [equation image not reproduced], where m_i is the node executing map task i, d(w, m_i) is the network distance between the two nodes, and r_input is the set of input key-value pairs of r.
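Since the cost equation itself is not reproduced here, the sketch below assumes one plausible reading of the description: the cost of running r on w is the sum, over the map nodes m_i, of d(w, m_i) multiplied by the number of r's input key-value pairs held on m_i. The helper names, the three-node layout and the pair counts are hypothetical.

```python
# Hypothetical layout: N0 and N1 share rack 1, N5 sits in rack 2.
RACK = {"N0": 1, "N1": 1, "N5": 2}


def d(a, b):
    """Network distance: 0 on the same node, 2 within a rack, 4 across racks."""
    if a == b:
        return 0
    return 2 if RACK[a] == RACK[b] else 4


def transfer_cost(w, pairs_on_node):
    """Assumed cost: distance to each map node times the pairs pulled from it."""
    return sum(d(w, m) * n for m, n in pairs_on_node.items())


def optimal_execution_nodes(candidates, pairs_on_node):
    """Return the set of candidate nodes with minimum cost, plus all costs."""
    costs = {w: transfer_cost(w, pairs_on_node) for w in candidates}
    best = min(costs.values())
    return {w for w, c in costs.items() if c == best}, costs


# Predicted number of r's input key-value pairs on each map node.
pairs_for_r = {"N0": 100, "N1": 50, "N5": 10}
best_nodes, costs = optimal_execution_nodes(["N0", "N1", "N5"], pairs_for_r)
print(costs)       # {'N0': 140, 'N1': 240, 'N5': 600}
print(best_nodes)  # {'N0'}
```

Returning a set rather than a single node reflects that several nodes can tie for the minimum cost, which is why the text speaks of an optimal execution node set.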
In step 2, deriving the optimal execution node set of each task means deriving the optimal execution node set N_optimal(r) of reduce task r: executing task r on any node w in this set yields the minimum data transfer cost Cost_w/r. To reduce the network and I/O overhead caused by transferring intermediate results, the ideal assignment of reduce tasks places every task on one of its own optimal execution nodes, achieving the minimum global data transfer cost. Sometimes, however, to meet the job response time required in the user's service-level agreement, the service provider must finish assigning all tasks before a deadline. Under this constraint, some reduce tasks may not be able to run on their optimal execution nodes.
In addition, whether an optimal execution node has available resources is another constraint on task assignment. To handle this, define the optimal task set R_optimal(n) of any node n as the set of tasks, among all reduce tasks not yet executed, for which node n incurs the minimum data transfer cost when pulling the intermediate key-value pairs. When the current node is not an optimal execution node of any task, the task scheduler assigns it a task from R_optimal(n). The assignment algorithm is as follows:
When a node requests a reduce task, the optimal execution node sets of the unexecuted tasks are examined in turn. If the current node is an optimal execution node of a task, that task is returned; otherwise the task's skipcount attribute is incremented by 1, where skipcount records how many times the task has been skipped because the request did not come from one of its optimal execution nodes (lines 1-8). If the current node is not an optimal execution node of any task, the optimal task list of the current node is obtained and the task with the largest skip count is selected and assigned (lines 9-16). The optimal execution nodes and optimal tasks are updated periodically until the map phase finishes, to keep scheduling decisions current.
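The assignment procedure described above can be sketched in Python as follows. The patent's own algorithm listing is not reproduced in this text, so this is an illustrative reading of the prose, not that listing: the `ReduceTask` class, the `assign_task` function and the sample data are hypothetical, and the periodic refresh of the optimal sets is left out.

```python
from dataclasses import dataclass


@dataclass
class ReduceTask:
    name: str
    optimal_nodes: set  # N_optimal(r) for this task
    skipcount: int = 0  # times skipped by a non-optimal requester


def assign_task(node, pending, optimal_tasks_of_node):
    """Pick a reduce task for `node` from the list of unexecuted tasks.

    optimal_tasks_of_node: names of the tasks in R_optimal(node).
    Returns the chosen task (removed from `pending`) or None.
    """
    # First pass: return the first task for which this node is optimal,
    # incrementing skipcount on every task passed over.
    for task in pending:
        if node in task.optimal_nodes:
            pending.remove(task)
            return task
        task.skipcount += 1
    # Fallback: among the node's own optimal tasks, take the most-skipped one.
    candidates = [t for t in pending if t.name in optimal_tasks_of_node]
    if not candidates:
        return None
    chosen = max(candidates, key=lambda t: t.skipcount)
    pending.remove(chosen)
    return chosen


pending = [
    ReduceTask("r0", {"N3"}),
    ReduceTask("r1", {"N6"}),
    ReduceTask("r2", {"N3"}),
]
first = assign_task("N6", pending, optimal_tasks_of_node=set())
second = assign_task("N9", pending, optimal_tasks_of_node={"r0", "r2"})
print(first.name, second.name, second.skipcount)  # r1 r0 2
```

Preferring the task with the largest skip count in the fallback path keeps tasks whose optimal nodes are busy from waiting indefinitely.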
In a cloud computing environment, the task distribution optimization method of the present invention effectively reduces the data transfer incurred by executing reduce tasks: it reduces the network access requests of MapReduce programs by about 12% and shortens job response time by about 9%.
The above embodiment is only one specific example of the present invention. The scope of patent protection of the present invention includes, but is not limited to, the above embodiment; any suitable change or substitution that is consistent with the claims of this task distribution optimization method based on data distribution and is made by a person of ordinary skill in the art shall fall within the scope of patent protection of the present invention.
Claims (7)
1. A task distribution optimization method based on data distribution, characterized in that it is implemented as follows:
Step 1: evaluate the data transfer cost of each reduce task according to the network distance between nodes and the weight distribution of the intermediate results;
Step 2: derive the optimal execution node set of each task from the data transfer cost of the reduce task on different nodes;
Step 3: give a concrete task assignment strategy and algorithm based on the optimal execution node sets.
2. The task distribution optimization method based on data distribution according to claim 1, characterized in that the network distance between nodes is defined as follows: when a MapReduce program has m map tasks Mi and n reduce tasks Rj, where 0≤i≤m and 0≤j≤n, and the input of each reduce task comes from the output of all map tasks, the intermediate results produced by the map tasks are transferred over the network to the nodes running the reduce tasks, and the sum of the distances from the nodes hosting all map tasks to the node hosting reduce task Rj is the total network distance TND_Rj of Rj.
3. The task distribution optimization method based on data distribution according to claim 1, characterized in that, for the weight distribution of the intermediate results, global distribution information is obtained and used to restore a local predicted distribution, the weight distribution of the intermediate results is counted and predicted at key-value-pair granularity, and the data transfer cost of each reduce task is evaluated in combination with the network distance.
4. The task distribution optimization method based on data distribution according to claim 3, characterized in that the global distribution information is obtained as follows:
1) When the execution progress of the map phase is α, each node counts its intermediate key-value pairs, where slowstart_conf ≤ α ≤ 1. slowstart_conf is a user-configured parameter: reduce tasks start executing once the proportion of completed map tasks reaches slowstart_conf;
2) When each node partitions its intermediate results with the partition function, it counts the key-value pairs of the intermediate results, generates a series of (k, n) tuples, and sorts them by n in descending order;
3) A global truncation threshold θ is set, i.e., only the top θ% of the (k, n) tuples in each local distribution are used to build the global distribution. The key-value count n of the tuple at the θ% position of the local distribution is called the local truncation threshold, and the truncated distribution is called the local truncated distribution L;
4) The global distribution G is built. First define the global distribution lower bound G_L and the global distribution upper bound G_U, which are, respectively, the minimum and maximum number of tuples per key that can be inferred from the local truncated distributions and the local truncation thresholds. Let G_L = {(k, N_L) | k ∈ K} and G_U = {(k, N_U) | k ∈ K}; then [equation image not reproduced], where [equation image not reproduced];
5) Let the global distribution be G = {(k, N) | k ∈ K}; the midpoint of the lower and upper bounds is taken as the global distribution, i.e., N = (N_L + N_U)/2;
6) The global distribution is corrected by prediction from the historical distribution. The distribution deviation of a key is defined as the difference between its current distribution ratio and its historical distribution ratio. The key with the largest deviation is chosen as the correction key k_c, and the total number of intermediate key-value pairs is predicted from (k_c, N) and the historical distribution ratio of k_c; the predicted number of key-value pairs for each key is then derived. The corrected global distribution is called the global predicted distribution G_c.
5. The task distribution optimization method based on data distribution according to claim 4, characterized in that the local predicted distribution is restored from the global distribution information as follows:
Let the local predicted distribution be L_c. Starting from the global distribution G, for any key k, if (k, n) ∈ L_i then its contribution to L_c is n; otherwise its contribution is [equation image not reproduced]. The number of key-value pairs N_c that will be generated is predicted from the global predicted distribution and the global distribution, and the tuple counts are divided in proportion to the running progress of each map task: if the progress of a map task is [symbol not reproduced], then the predicted number of key-value pairs for key k in the intermediate result of that task is [equation image not reproduced].
6. The task distribution optimization method based on data distribution according to claim 5, characterized in that, in step 1, the data transfer cost of a reduce task is evaluated as follows: the data transfer cost Cost_w/r of executing reduce task r on node w is the sum of the data transfer costs of r pulling its intermediate key-value pairs from each node, i.e., [equation image not reproduced], where m_i is the node executing map task i, d(w, m_i) is the network distance between the two nodes, and r_input is the set of input key-value pairs of r.
7. The task distribution optimization method based on data distribution according to claim 1, characterized in that
deriving the optimal execution node set of each task in step two means deriving the optimal execution node set
N_optimal(r) of reduce task r, such that executing task r on any node w in this set produces the minimum data
transfer cost Cost_{w/r}; the detailed process is:
The optimal task set R_optimal(n) of any node n is the set of those tasks, among all reduce tasks not yet executed,
for which node n needs the minimum data transfer cost to pull the intermediate-result key-value pairs; when the
current node is not the optimal execution node of any task, the task selector assigns it a task from R_optimal(n);
When a node requests a reduce task, the optimal execution node sets of the unexecuted tasks are first obtained one
by one; if the current node is an optimal execution node of a task, that task is returned; otherwise, the skipcount
attribute of the task is incremented by 1, where skipcount records how many times each task has been requested and
skipped because the requesting node was not one of its optimal execution nodes; if the current node is not an
optimal execution node of any task, the optimal task list of the current node is obtained and the task with the
largest skip count is selected and assigned; the optimal execution nodes and optimal execution tasks are updated
periodically until the map phase finishes, to keep the scheduling decisions up to date.
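The request-handling procedure of claim 7 can be sketched in Python as follows. This is only one possible reading of the claim: the names `optimal_nodes`, `optimal_tasks`, and `assign_reduce_task` are hypothetical, the cost function is supplied by the caller, and the sketch recomputes the optimal sets on every request instead of refreshing them periodically during the map phase.

```python
from typing import Callable, Dict, Hashable, List, Optional, Set

Node = Hashable
Task = Hashable
CostFn = Callable[[Node, Task], float]


def optimal_nodes(task: Task, nodes: List[Node], cost: CostFn) -> Set[Node]:
    """N_optimal(r): the nodes on which running `task` costs the least."""
    best = min(cost(w, task) for w in nodes)
    return {w for w in nodes if cost(w, task) == best}


def optimal_tasks(node: Node, pending: List[Task], cost: CostFn) -> List[Task]:
    """R_optimal(n): the pending tasks that are cheapest to run on `node`."""
    best = min(cost(node, r) for r in pending)
    return [r for r in pending if cost(node, r) == best]


def assign_reduce_task(
    node: Node,
    pending: List[Task],
    nodes: List[Node],
    cost: CostFn,
    skipcount: Dict[Task, int],
) -> Optional[Task]:
    """Pick a reduce task for `node`; removes the chosen task from `pending`."""
    if not pending:
        return None
    # Pass 1: return the first task for which this node is an optimal node,
    # counting a skip for every task that is passed over on the way.
    for r in pending:
        if node in optimal_nodes(r, nodes, cost):
            pending.remove(r)  # safe: we return immediately after mutating
            return r
        skipcount[r] = skipcount.get(r, 0) + 1
    # Pass 2: the node is optimal for no task, so fall back to R_optimal(n)
    # and prefer the task that has been skipped most often.
    chosen = max(optimal_tasks(node, pending, cost), key=lambda r: skipcount.get(r, 0))
    pending.remove(chosen)
    return chosen


# Made-up costs: r1 and r2 are cheapest on node a, r3 is cheapest on node b.
COST = {
    ("a", "r1"): 1, ("b", "r1"): 5, ("c", "r1"): 6,
    ("a", "r2"): 2, ("b", "r2"): 4, ("c", "r2"): 6,
    ("a", "r3"): 9, ("b", "r3"): 3, ("c", "r3"): 7,
}


def demo_cost(w: Node, r: Task) -> int:
    return COST[(w, r)]


demo_pending = ["r1", "r2", "r3"]
demo_skips: Dict[Task, int] = {}
print(assign_reduce_task("b", demo_pending, ["a", "b", "c"], demo_cost, demo_skips))  # r3
print(assign_reduce_task("c", demo_pending, ["a", "b", "c"], demo_cost, demo_skips))  # r1
```

Node c is not an optimal node for any task, so it receives a task from its own cheapest set; the skip count breaks ties in favour of tasks that have waited longest, which keeps a task from being passed over indefinitely.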
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610890105.8A CN106502790A (en) | 2016-10-12 | 2016-10-12 | A kind of task distribution optimization method based on data distribution |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610890105.8A CN106502790A (en) | 2016-10-12 | 2016-10-12 | A kind of task distribution optimization method based on data distribution |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106502790A (en) | 2017-03-15 |
Family
ID=58295238
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201610890105.8A, CN106502790A (en) | Pending | 2016-10-12 | 2016-10-12 | A kind of task distribution optimization method based on data distribution |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106502790A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102541858A (en) * | 2010-12-07 | 2012-07-04 | 腾讯科技(深圳)有限公司 | Data equality processing method, device and system based on mapping and protocol |
US20120151292A1 (en) * | 2010-12-14 | 2012-06-14 | Microsoft Corporation | Supporting Distributed Key-Based Processes |
CN102629219A (en) * | 2012-02-27 | 2012-08-08 | 北京大学 | Self-adaptive load balancing method for Reduce ends in parallel computing framework |
US20130290972A1 (en) * | 2012-04-27 | 2013-10-31 | Ludmila Cherkasova | Workload manager for mapreduce environments |
CN103279351A (en) * | 2013-05-31 | 2013-09-04 | 北京高森明晨信息科技有限公司 | Method and device for task scheduling |
US20160034482A1 (en) * | 2014-07-31 | 2016-02-04 | International Business Machines Corporation | Method and apparatus for configuring relevant parameters of mapreduce applications |
CN105589752A (en) * | 2016-02-24 | 2016-05-18 | 哈尔滨工业大学深圳研究生院 | Cross-data center big data processing based on key value distribution |
Non-Patent Citations (1)
Title |
---|
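Wang Jie (王捷): "Research on SLA-based MapReduce Scheduling Mechanisms" (基于SLA的MapReduce调度机制研究), China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》) * |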
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109496321A (en) * | 2017-07-10 | 2019-03-19 | 欧洲阿菲尼帝科技有限责任公司 | For estimating the technology of the expection performance in task distribution system |
CN107506388A (en) * | 2017-07-27 | 2017-12-22 | 浙江工业大学 | A kind of iterative data balancing optimization method towards Spark parallel computation frames |
CN109871265A (en) * | 2017-12-05 | 2019-06-11 | 航天信息股份有限公司 | The dispatching method and device of Reduce task |
CN109947559A (en) * | 2019-02-03 | 2019-06-28 | 百度在线网络技术(北京)有限公司 | Optimize method, apparatus, equipment and computer storage medium that MapReduce is calculated |
CN109947559B (en) * | 2019-02-03 | 2021-11-23 | 百度在线网络技术(北京)有限公司 | Method, device, equipment and computer storage medium for optimizing MapReduce calculation |
CN113467700A (en) * | 2020-03-31 | 2021-10-01 | 阿里巴巴集团控股有限公司 | Data distribution method and device based on heterogeneous storage |
CN113467700B (en) * | 2020-03-31 | 2024-04-23 | 阿里巴巴集团控股有限公司 | Heterogeneous storage-based data distribution method and device |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170315 |