CN103218263A - Dynamic determining method and device for MapReduce parameter - Google Patents
- Publication number
- CN103218263A CN2013100785074A CN201310078507A
- Authority
- CN
- China
- Prior art keywords
- task
- reduce task
- reduce
- adjusted
- default
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention provides a method and device for dynamically determining a MapReduce parameter. The method comprises: acquiring a MapReduce job request, wherein the job request comprises a dataset to be processed, a preset number of Reduce tasks, and a parameter indicating whether a mechanism for adjusting the number of Reduce tasks may be started; if the parameter indicates that the adjustment mechanism may be started, monitoring the execution of the Map tasks; if the number of executed Map tasks satisfies a preset first threshold, determining an adjusted number of Reduce tasks; and, according to the adjusted number of Reduce tasks, mapping each unexecuted preset Reduce task to an adjusted Reduce task, so that a reasonable number of Reduce tasks is determined dynamically while the MapReduce job runs.
Description
Technical field
The present invention relates to the field of distributed computing, and in particular to a method and device for dynamically determining a MapReduce parameter.
Background technology
MapReduce is a distributed computing framework that borrows ideas from functional programming to perform efficient distributed computation over large-scale datasets. The MapReduce framework divides a computation job (Job) into a number of Map tasks and Reduce tasks. The input and output of both the Map phase and the Reduce phase use a Key-Value data model, and the Reduce phase takes the output of the Map phase as its input. The number of Map tasks is determined by the input dataset, whereas the number of Reduce tasks is specified by the user. Because the input dataset is generally large, it is split into multiple data blocks (chunks). After a MapReduce job is submitted, the scheduler (Master) of the MapReduce framework determines the number of Map tasks from the number of data blocks in the input dataset, so that each Map task processes one data block.
The data block input to each Map task is converted into Key-Value form. The Map computation outputs intermediate results in Key-Value form, which are sorted by Key and written to the local disk of the computing node on which the Map task runs. Using the Key ordering and the number of Reduce tasks specified by the user, the MapReduce framework partitions the Keys and aggregates the Values that fall into the same partition. The input of each Reduce task is a portion of the intermediate results output by multiple Map tasks. For example, if the user specifies n Reduce tasks, there are n partitions; the intermediate results belonging to each partition are transmitted over the network to the Reduce task that computes that partition, which runs the user-specified Reduce algorithm and finally outputs the result.
In the existing MapReduce framework, the exact number of Reduce tasks must be known before the Map tasks are executed, so that the intermediate results output in the Map phase can be partitioned according to the number of Reduce tasks specified by the user. Because this number is normally chosen manually by the user, the Reduce tasks run according to the fixed number regardless of how much intermediate data the Map phase outputs. When the Map phase outputs very little intermediate data, the data could be processed entirely in one or two Reduce tasks; if the user has specified far more than two Reduce tasks and the job still runs with that number, resources are wasted unnecessarily. Conversely, when the Map phase outputs a large amount of intermediate data and the user has specified relatively few Reduce tasks, running with that number makes the execution time too long.
Summary of the invention
The object of the present invention is to provide a method and device for dynamically determining a MapReduce parameter, so that a reasonable number of Reduce tasks is determined dynamically while a MapReduce job runs.
A first aspect of the present invention provides a method for dynamically determining a MapReduce parameter, comprising:
acquiring a MapReduce job request, wherein the MapReduce job request comprises a dataset to be processed, a preset number of Reduce tasks, and a parameter indicating whether a mechanism for adjusting the number of Reduce tasks may be started;
if the parameter indicates that the adjustment mechanism may be started, monitoring the execution of the Map tasks;
if the number of executed Map tasks satisfies a preset first threshold, determining an adjusted number of Reduce tasks, wherein the output of the executed Map tasks has been mapped to partitions whose number equals the preset number of Reduce tasks;
according to the adjusted number of Reduce tasks, mapping each unexecuted preset Reduce task to an adjusted Reduce task, so that each adjusted Reduce task can be executed.
Another aspect of the present invention provides a device for dynamically determining a MapReduce parameter, comprising:
a job request acquisition module, configured to acquire a MapReduce job request, wherein the MapReduce job request comprises a dataset to be processed, a preset number of Reduce tasks, and a parameter indicating whether a mechanism for adjusting the number of Reduce tasks may be started;
a monitoring module, configured to monitor the execution of the Map tasks if the parameter indicates that the adjustment mechanism may be started;
a determination module, configured to determine an adjusted number of Reduce tasks if the number of executed Map tasks satisfies a preset first threshold, wherein the output of the executed Map tasks has been mapped to partitions whose number equals the preset number of Reduce tasks;
a mapping module, configured to map, according to the adjusted number of Reduce tasks, each unexecuted preset Reduce task to an adjusted Reduce task, so that each adjusted Reduce task can be executed.
The beneficial effect of the above technical solution is as follows: by monitoring the Master of the MapReduce framework, the number of Reduce tasks in the MapReduce job request can be adjusted dynamically according to how the Map tasks execute, and the Reduce tasks are then executed according to the adjusted number. This addresses the wasted resources or excessive execution time that result in the prior art from the user statically specifying the number of Reduce tasks.
Description of drawings
Fig. 1 is a flowchart of a method for dynamically determining a MapReduce parameter provided by Embodiment One of the present invention;
Fig. 2 is a schematic structural diagram of a device for dynamically determining a MapReduce parameter provided by Embodiment Two of the present invention.
Embodiment
Fig. 1 is a flowchart of a method for dynamically determining a MapReduce parameter provided by Embodiment One of the present invention. As shown in Fig. 1, the method may comprise the following steps:
It should be noted that the method of this embodiment may be executed by a device for dynamically determining a MapReduce parameter. The device monitors the Master of the MapReduce framework and can therefore obtain the MapReduce job request through the Master. The job request may comprise the dataset to be processed, the preset number of Reduce tasks, and a parameter indicating whether the mechanism for adjusting the number of Reduce tasks may be started. Because the exact number of Reduce tasks must be known before the Map tasks are executed, so that the intermediate results output in the Map phase can be partitioned accordingly, every MapReduce job request presets a number of Reduce tasks. The device decides whether to start adjusting the number of Reduce tasks according to the parameter. For example, the parameter may take a value indicating that starting is or is not allowed, such as True, False, or a default (unset) value. In this embodiment, True may indicate that, while the MapReduce job runs on the dataset, the device is allowed to adjust the number of Reduce tasks dynamically, so that the Reduce tasks are executed according to the adjusted number. False or the default value may indicate that the device is not allowed to adjust the number of Reduce tasks, in which case the Reduce tasks can only be executed according to the preset number of Reduce tasks.
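As an illustration only, the job request and its three-valued adjustment parameter might be modelled in Python as follows. The class and field names (JobRequest, allow_reduce_adjustment, and so on) are hypothetical and do not appear in the original text; None stands in for the default (unset) value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobRequest:
    """Hypothetical model of the MapReduce job request described above."""
    dataset: str                 # dataset to be processed
    preset_reduce_tasks: int     # preset number of Reduce tasks
    # True: adjustment allowed; False or None (default): not allowed.
    allow_reduce_adjustment: Optional[bool] = None


def adjustment_enabled(request: JobRequest) -> bool:
    """Only an explicit True starts the adjustment mechanism."""
    return request.allow_reduce_adjustment is True
```

With this sketch, a request that leaves the parameter unset behaves exactly like one that sets it to False: the job runs with the preset number of Reduce tasks.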
In this embodiment, when the parameter indicates that the adjustment mechanism may be started, the parameter triggers the device to start adjusting the number of Reduce tasks. The device therefore continues to monitor the Master of the MapReduce framework, determines the number of Map tasks from the dataset to be processed (or obtains the number of Map tasks in this MapReduce job from the Master), and at the same time monitors the execution of the Map tasks according to the job request.
In this embodiment, so that the number of Reduce tasks can be adjusted reasonably, the total input data volume of the Reduce tasks in this MapReduce job may be determined after the Map tasks have run for some time, from the number of executed Map tasks and the data volume of the intermediate results they have output. Specifically, because every Map task receives the same amount of input data, the data volume of the intermediate results each one outputs is also substantially the same. Because the input of the Reduce tasks is exactly the output of the Map tasks, the total input data volume of the Reduce tasks can be determined from the number of executed Map tasks, the data volume of the intermediate results output by the executed Map tasks, and the number of Map tasks in this MapReduce job; the number of Reduce tasks is then adjusted according to this total. Because the intermediate results are partitioned according to the preset number of Reduce tasks when the Map tasks start executing, the intermediate results output by the executed Map tasks are mapped to partitions whose number equals the preset number of Reduce tasks. In this embodiment, the first threshold determines when the device starts adjusting the number of Reduce tasks. Specifically, when the number of executed Map tasks satisfies the preset first threshold, the device is triggered to adjust the number of Reduce tasks dynamically according to the executed Map tasks and thereby determines the adjusted number of Reduce tasks. In general, the adjusted number of Reduce tasks is smaller than the preset number.
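The extrapolation described above can be sketched as follows, assuming (as the text does) that every Map task outputs roughly the same amount of intermediate data. The function and argument names are hypothetical.

```python
def estimate_reduce_input_total(completed_output_sizes, total_map_tasks):
    """Estimate the total Reduce input from the Map tasks finished so far.

    completed_output_sizes: intermediate-output size of each executed Map task.
    total_map_tasks: number of Map tasks in the whole job.
    """
    if not completed_output_sizes:
        raise ValueError("at least one executed Map task is required")
    # Average output per executed Map task, scaled to all Map tasks.
    average = sum(completed_output_sizes) / len(completed_output_sizes)
    return average * total_map_tasks
```

For example, if three executed Map tasks output 100, 120 and 80 units and the job has 30 Map tasks, the estimate is 3000 units.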
In this embodiment, because the intermediate results are partitioned according to the preset number of Reduce tasks when the Map tasks start executing, each preset Reduce task corresponds to one partition. After the number of Reduce tasks has been adjusted, the partitions of the preset Reduce tasks that have not yet been executed are therefore mapped to the adjusted Reduce tasks; that is, one adjusted Reduce task may correspond to the partitions of one or more preset Reduce tasks, so that the Master can schedule and execute the adjusted Reduce tasks.
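The text does not say how partitions are grouped, so the following is only one possible sketch: the partitions of the unexecuted preset Reduce tasks are distributed round-robin over the adjusted Reduce tasks. The round-robin policy and all names are assumptions made for illustration.

```python
def assign_partitions(unexecuted_partitions, adjusted_count):
    """Group preset partitions so each adjusted Reduce task gets one or more.

    Returns a list of partition-id lists, one list per adjusted Reduce task.
    """
    if adjusted_count <= 0:
        raise ValueError("adjusted_count must be positive")
    partitions = sorted(unexecuted_partitions)
    # Never create more adjusted tasks than there are partitions to serve.
    task_count = min(adjusted_count, len(partitions))
    groups = [[] for _ in range(task_count)]
    for index, partition in enumerate(partitions):
        groups[index % task_count].append(partition)
    return groups
```

Any other grouping (for example contiguous ranges, or grouping by partition size) would satisfy the description equally well, as long as every unexecuted partition is served by exactly one adjusted Reduce task.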
In this embodiment, by monitoring the Master of the MapReduce framework, the number of Reduce tasks in the MapReduce job request can be adjusted dynamically according to how the Map tasks execute, and the Reduce tasks are then executed according to the adjusted number. This addresses the wasted resources or excessive execution time that result in the prior art from the user statically specifying the number of Reduce tasks.
Specifically, the preset first threshold in the above embodiment may be a preset threshold on the number of Map tasks, for example 30 Map tasks or 100 Map tasks. The concrete value may be chosen according to the actual job, and this embodiment does not limit it. When the number of executed Map tasks satisfies the preset threshold on the number of Map tasks, the device is triggered to start adjusting the number of Reduce tasks and thereby determines the adjusted number of Reduce tasks.
Preferably, the preset first threshold in the above embodiment may be a preset ratio, for example 1/5 or 1/3. The concrete value may be chosen according to the actual job, and this embodiment does not limit it. When the ratio of the number of executed Map tasks to the total number of Map tasks satisfies the preset ratio, the device is triggered to start adjusting the number of Reduce tasks and thereby determines the adjusted number of Reduce tasks. In this embodiment, the total number of Map tasks may be determined from the dataset to be processed or obtained from the Master.
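A minimal sketch of the two forms of the first threshold follows. It assumes that "satisfies" means "reaches or exceeds", which the text does not state explicitly; the function name and the use of int for a count and Fraction for a ratio are illustrative choices.

```python
from fractions import Fraction


def first_threshold_met(executed_maps, total_maps, threshold):
    """Check the first threshold, given as a count (int) or a ratio (Fraction)."""
    if isinstance(threshold, int):
        # Count form, e.g. 30 or 100 executed Map tasks.
        return executed_maps >= threshold
    # Ratio form, e.g. Fraction(1, 5) or Fraction(1, 3) of all Map tasks.
    return Fraction(executed_maps, total_maps) >= threshold
```

Using Fraction keeps the ratio comparison exact, so 40 of 200 Map tasks meets a 1/5 threshold while 39 of 200 does not.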
Further, on the basis of any of the above embodiments, the MapReduce job request may also comprise a preset second threshold used to schedule the execution of Reduce tasks early. The second threshold may likewise be a preset threshold on the number of Map tasks or a preset ratio; that is, when the number of executed Map tasks satisfies the second threshold, the Master may be triggered to start executing the preset Reduce tasks.
In this embodiment, when the second threshold is smaller than the first threshold, that is, when the number of executed Map tasks satisfies the preset second threshold but not yet the preset first threshold, the device may also monitor the execution of the preset Reduce tasks, each of which corresponds to one of the partitions whose number equals the preset number of Reduce tasks. After the number of executed Map tasks satisfies the preset first threshold, the device may also instruct that the unexecuted preset Reduce tasks be stopped. The data volume of the unexecuted Reduce tasks is then determined from the number of executed Map tasks and the total data volume of the Reduce tasks in this MapReduce job, and the corresponding number of unexecuted Reduce tasks is determined from that data volume. Specifically, when the Map tasks that have not yet been executed run, their intermediate results are mapped to partitions whose number equals the redetermined number of Reduce tasks; that is, the work of the unexecuted preset Reduce tasks is carried out by the redetermined Reduce tasks.
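The interplay of the two thresholds can be sketched as a small phase function. It assumes both thresholds are expressed as counts of executed Map tasks and that the second threshold is smaller than the first; the phase names are hypothetical labels, not terms from the original text.

```python
def scheduling_phase(executed_maps, first_threshold, second_threshold):
    """Return the phase implied by the number of executed Map tasks.

    "map_only":     neither threshold reached; only Map tasks run.
    "early_reduce": second threshold reached; preset Reduce tasks may start
                    and are monitored.
    "adjust":       first threshold reached; unexecuted preset Reduce tasks
                    are stopped and the Reduce task count is adjusted.
    """
    if second_threshold >= first_threshold:
        raise ValueError("this sketch assumes second_threshold < first_threshold")
    if executed_maps >= first_threshold:
        return "adjust"
    if executed_maps >= second_threshold:
        return "early_reduce"
    return "map_only"
```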
A MapReduce job has two phases, Map and Reduce. Because the number of Map tasks is fixed, the cost of the Map tasks is also relatively fixed. Using the MapReduce performance model to calculate the total time cost and the execution time of the same MapReduce job for different numbers of Reduce tasks shows that the total time cost of the job usually increases as the number of Reduce tasks increases, while at the same time a larger number of Reduce tasks raises the computing power of the cluster and the parallelism between tasks and so shortens the execution time of the job; the converse also holds. Therefore, in one specific implementation of the present invention, the cost of the Reduce tasks is calculated with the MapReduce performance model, and an equilibrium point between time cost and execution time is sought in order to adjust the number of Reduce tasks.
For example, in the MapReduce performance model, the time cost of a Reduce task comprises TR1_init, TR2_read, TR3_net, TR4_merge, TR5_serial, TR6_io, TR7_parse, TR8_Reducer, TR9_net and TR10_write, where:
TR1_init is the system time cost of initializing the Reduce task, that is, the time to initialize, open and close the task, load programs, and so on; usually TR1_init=RedSysCost+RedInit;
TR2_read is the IO cost of reading data when the Reduce task starts executing; usually TR2_read=ReduceInput/seqRead;
TR3_net is the network cost of transferring data while the Reduce task executes (formula missing from this text);
TR4_merge is the CPU cost of sorting while the Reduce task executes; usually TR4_merge=SortCEF*RIRNumber*log(NumberOfMap);
TR5_serial is the time cost of serialization while the Reduce task executes; usually TR5_serial=(se1*RIRNumber+se2*ReduceInput);
TR6_io is the disk IO cost incurred while the Reduce task executes (formula missing from this text);
TR7_parse is the time cost of parsing data while the Reduce task executes; usually TR7_parse=pa1*RIRNumber+pa2*ReduceInput;
TR8_Reducer is the time cost of the function computation while the Reduce task executes; usually TR8_Reducer=ReduceInput*ComplexOfReduce*CEF;
TR9_net is the network cost of transferring data after the Reduce task has executed (formula missing from this text);
TR10_write is the IO cost of writing data to disk after the Reduce task has executed (formula missing from this text).
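As a sketch only, the cost terms whose formulas are stated in the text can be evaluated as follows. The terms TR3_net, TR6_io, TR9_net and TR10_write are left out because their formulas are not available here. The dictionary keys reuse the parameter names from the description; the function name is hypothetical, and the base of the logarithm in TR4_merge is an assumption (natural logarithm), since the text does not specify it.

```python
import math


def known_reduce_cost_terms(p):
    """Evaluate the Reduce cost terms whose formulas appear in the text.

    p: dict keyed by the parameter names used in the description.
    """
    return {
        "TR1_init": p["RedSysCost"] + p["RedInit"],
        "TR2_read": p["ReduceInput"] / p["seqRead"],
        # Assumption: natural logarithm; the text does not state the base.
        "TR4_merge": p["SortCEF"] * p["RIRNumber"] * math.log(p["NumberOfMap"]),
        "TR5_serial": p["se1"] * p["RIRNumber"] + p["se2"] * p["ReduceInput"],
        "TR7_parse": p["pa1"] * p["RIRNumber"] + p["pa2"] * p["ReduceInput"],
        "TR8_Reducer": p["ReduceInput"] * p["ComplexOfReduce"] * p["CEF"],
    }
```

The result is deliberately a partial breakdown, so it should not be summed and presented as the full per-task cost TR.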
Therefore, the time cost TR of each Reduce task equals the sum of the above terms, that is:
$$TR = \sum_{i=1}^{10} TR_{i}$$
where $TR_{i}$ denotes the i-th term above, from TR1_init to TR10_write.
Suppose the MapReduce job has n Reduce tasks. The total time cost TRS of all Reduce tasks in the MapReduce job is then:
TRS=n*TR.
Substituting the above expressions into the cost formula for TR gives the time cost TR of executing one Reduce task (expanded formula missing from this text).
Here:
RedSysCost is the time cost of initializing, opening and closing the task when the Reduce task is initialized, generally set to 2000 milliseconds (ms);
RedInit is the time needed to load additional programs or data when the Reduce task is initialized;
ReduceInput is the data volume input to one Reduce task;
seqRead is the data volume read from disk per preset unit of time;
Bandwidth is the network transfer rate per preset unit of time;
SortCEF is the sorting coefficient;
RIRNumber is the number of data records input when the Reduce task executes;
NumberOfMap is the number of Map tasks;
seqWrite is the data volume written to disk per preset unit of time;
ComplexOfReduce is the complexity of the computation performed when the Reduce task executes;
HDFSReplica is the number of replicas;
ReduceOutput is the data volume output after the Reduce task has executed;
se1 and se2 are the two parameters of the serialization stage: in the MapReduce performance model the time cost of serialization is linear in the data volume, and se1 and se2 are the two parameters of this linear relationship, which do not change for a given cluster and can therefore be treated as constants;
pa1 and pa2 are the two constant parameters of the parsing stage;
CEF is the time the cluster machines need to process a standard operation, and is also a constant.
Apart from ReduceInput, ReduceOutput, RIRNumber and NumberOfMap, which depend on the data volumes of the Map tasks and Reduce tasks in the MapReduce job, all of the above parameters are known, system-preset information.
Further, let n be the number of Reduce tasks in the MapReduce job. Then ReduceInput=Input/n, where Input is the total input data volume of all Reduce tasks in the MapReduce job. Let Y be the data conversion rate of Reduce, that is, the ratio of the Reduce output data volume to the Reduce input data volume: Y=ReduceOutput/ReduceInput. Then ReduceOutput=Y*ReduceInput=Y*Input/n, and RIRNumber=ReduceInput/RIRLength=Input/n/RIRLength, where RIRLength is the average length of an input data record.
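The three substitutions above can be checked with a short sketch. The function name and the returned tuple layout are hypothetical; the arithmetic follows the relations stated in the text.

```python
def per_task_quantities(total_input, n, conversion_rate, record_length):
    """Per-Reduce-task quantities for a job with n Reduce tasks.

    total_input:     Input, total input data volume of all Reduce tasks.
    conversion_rate: Y = ReduceOutput / ReduceInput.
    record_length:   RIRLength, average length of an input record.
    Returns (ReduceInput, ReduceOutput, RIRNumber).
    """
    reduce_input = total_input / n                 # ReduceInput = Input / n
    reduce_output = conversion_rate * reduce_input # ReduceOutput = Y * Input / n
    rir_number = reduce_input / record_length      # RIRNumber = Input / n / RIRLength
    return reduce_input, reduce_output, rir_number
```

For instance, with Input=1200, n=4, Y=0.5 and RIRLength=10, each Reduce task receives 300 units, outputs 150 units, and processes 30 records.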
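Substituting these relations into the above formula and simplifying yields an expression for TR in terms of Input/n (simplified formula missing from this text).
Let a=RedSysCost+RedInit, and let b denote the coefficient of Input/n in the simplified expression (the explicit expression for b is missing from this text).
For a given MapReduce job, the values of a and b can be computed from the above parameters and are effectively constants, so the time cost of executing one Reduce task is TR=a+b*Input/n; correspondingly, TRS=n*TR=a*n+b*Input.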
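The simplified model TR=a+b*Input/n and TRS=n*TR can be written directly as two functions, which makes the trade-off visible: as n grows, the per-task cost TR falls while the total cost TRS rises. The function names are hypothetical; a and b are taken as given constants.

```python
def reduce_task_cost(a, b, total_input, n):
    """TR = a + b * Input / n: time cost of one Reduce task."""
    return a + b * total_input / n


def total_reduce_cost(a, b, total_input, n):
    """TRS = n * TR = a * n + b * Input: total cost of all Reduce tasks."""
    return n * reduce_task_cost(a, b, total_input, n)
```

With a=2000, b=0.5 and Input=8000, going from n=4 to n=8 lowers TR from 3000 to 2500 but raises TRS from 12000 to 20000.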
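Given the form of the time cost TR of one Reduce task and of the total time cost TRS of all Reduce tasks in the MapReduce job, the point at which the sum of the first derivatives of TR and TRS with respect to n equals 0 may be chosen as the equilibrium point for adjusting the number of Reduce tasks, that is:
$$\frac{dTR}{dn} + \frac{dTRS}{dn} = 0$$
Substituting TR=a+b*Input/n and TRS=a*n+b*Input into this equation gives:
$$-\frac{b \cdot Input}{n^{2}} + a = 0, \qquad n = \sqrt{\frac{b \cdot Input}{a}}$$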
The number of Reduce tasks can thus be determined from the above formula. Further, to avoid the situation in which each Reduce task processes so much data that the execution time of a single Reduce task becomes too long, the maximum data volume processed by a single Reduce task may be limited to U. Each MapReduce job then needs at least Input/U Reduce tasks to process all the data, where U may specifically be U=ReduceJVM*ShuffleBufferPercent*ShuffleMergePercent. Therefore, writing $n_{0}$ for the number of Reduce tasks given by the above formula:
$$n = \max\left(n_{0},\ \frac{Input}{U}\right)$$
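Putting the pieces together, the choice of the adjusted number of Reduce tasks might be sketched as follows. The sketch takes the equilibrium point implied by TR=a+b*Input/n and TRS=a*n+b*Input, namely n=sqrt(b*Input/a), and enforces the lower bound Input/U. Rounding up to a whole number of tasks and the function names are assumptions made for illustration; the text does not specify a rounding rule.

```python
import math


def max_data_per_reduce(reduce_jvm, shuffle_buffer_percent, shuffle_merge_percent):
    """U = ReduceJVM * ShuffleBufferPercent * ShuffleMergePercent."""
    return reduce_jvm * shuffle_buffer_percent * shuffle_merge_percent


def adjusted_reduce_count(a, b, total_input, u):
    """Choose the number of Reduce tasks from the simplified cost model."""
    # Equilibrium point: d(TR)/dn + d(TRS)/dn = -b*Input/n^2 + a = 0.
    balance = math.sqrt(b * total_input / a)
    # At least Input / U tasks so that no task exceeds U units of data.
    lower_bound = total_input / u
    # Assumption: round up to a whole task count, and use at least one task.
    return max(1, math.ceil(max(balance, lower_bound)))
```

With a=2000, b=0.5 and Input=1,600,000 the equilibrium point is 20 tasks; a limit of U=100,000 requires only 16 tasks, so 20 is used, whereas U=50,000 requires 32 tasks and overrides the equilibrium value.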
In this embodiment, by monitoring the Master of the MapReduce framework, the total data volume of the Reduce tasks in the MapReduce job can be determined according to how the Map tasks execute. The number of Reduce tasks is then adjusted according to this total data volume while also taking into account the time cost of executing the Reduce tasks, so that a reasonable number of Reduce tasks is determined, and the Reduce tasks are executed according to the adjusted number. This addresses the wasted resources or excessive execution time that result in the prior art from the user statically specifying the number of Reduce tasks, while also driving the time cost of executing the Reduce tasks toward its minimum.
A person of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium may be any medium capable of storing program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Fig. 2 is a schematic structural diagram of a device for dynamically determining a MapReduce parameter provided by Embodiment Two of the present invention. As shown in Fig. 2, the device of this embodiment may comprise:
a job request acquisition module 201, configured to acquire a MapReduce job request, wherein the MapReduce job request comprises a dataset to be processed, a preset number of Reduce tasks, and a parameter indicating whether a mechanism for adjusting the number of Reduce tasks may be started;
The device of this embodiment may be used to carry out the technical solution of the method embodiment shown in Fig. 1. Its implementation principle and technical effect are similar and are not repeated here.
Further, the determination module may specifically be configured to determine the adjusted number of Reduce tasks according to the total data volume of the unexecuted Reduce tasks.
Preferably, the determination module may specifically be configured to:
determine the adjusted number of Reduce tasks according to the following formula:
$$\frac{dTR}{dn} + \frac{dTRS}{dn} = 0$$
where n is the adjusted number of Reduce tasks, TR is the time cost of executing one adjusted Reduce task, TRS is the total time cost of executing all adjusted Reduce tasks, and TR depends on the total data volume of the unexecuted Reduce tasks and on the adjusted number of Reduce tasks.
Specifically, the MapReduce job request in the above embodiment may also comprise a preset second threshold used to schedule the execution of Reduce tasks early. If the second threshold is smaller than the first threshold, the monitoring module may specifically be configured to:
monitor the execution of the preset Reduce tasks if the number of executed Map tasks satisfies the second threshold but not the preset first threshold, wherein each preset Reduce task corresponds to one of the partitions whose number equals the preset number of Reduce tasks;
and, after the number of executed Map tasks satisfies the preset first threshold, instruct that the unexecuted preset Reduce tasks be stopped.
Specifically, in any of the above embodiments, the preset first threshold may be a preset threshold on the number of Map tasks or a preset ratio, in which case the determination module is specifically configured to:
start the mechanism for adjusting the number of Reduce tasks and determine the adjusted number of Reduce tasks if the number of executed Map tasks satisfies the preset threshold on the number of Map tasks; or
start the mechanism for adjusting the number of Reduce tasks and determine the adjusted number of Reduce tasks if the ratio of the number of executed Map tasks to the total number of Map tasks satisfies the preset ratio, wherein the total number of Map tasks is determined from the dataset to be processed.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A method for dynamically determining a MapReduce parameter, characterized by comprising:
acquiring a MapReduce job request, wherein the MapReduce job request comprises a dataset to be processed, a preset number of Reduce tasks, and a parameter indicating whether a mechanism for adjusting the number of Reduce tasks may be started;
if the parameter indicates that the adjustment mechanism may be started, monitoring the execution of the Map tasks;
if the number of executed Map tasks satisfies a preset first threshold, determining an adjusted number of Reduce tasks, wherein the output of the executed Map tasks has been mapped to partitions whose number equals the preset number of Reduce tasks;
according to the adjusted number of Reduce tasks, mapping each unexecuted preset Reduce task to an adjusted Reduce task, so that each adjusted Reduce task can be executed.
2. The method according to claim 1, characterized in that determining the adjusted number of Reduce tasks specifically comprises:
determining the adjusted number of Reduce tasks according to the total data volume of the unexecuted Reduce tasks.
3. The method according to claim 2, characterized in that determining the adjusted number of Reduce tasks according to the total data volume of the unexecuted Reduce tasks specifically comprises:
determining the adjusted number of Reduce tasks according to the following formula:
$$\frac{dTR}{dn} + \frac{dTRS}{dn} = 0$$
where n is the adjusted number of Reduce tasks, TR is the time cost of executing one adjusted Reduce task, TRS is the total time cost of executing all adjusted Reduce tasks, and TR depends on the total data volume of the unexecuted Reduce tasks and on the adjusted number of Reduce tasks.
4. The method according to claim 1, characterized in that the MapReduce job request further comprises a preset second threshold used to schedule the execution of Reduce tasks early, and, if the second threshold is smaller than the first threshold, the method further comprises, before the number of executed Map tasks satisfies the preset first threshold:
monitoring the execution of the preset Reduce tasks if the number of executed Map tasks satisfies the second threshold but not the preset first threshold, wherein each preset Reduce task corresponds to one of the partitions whose number equals the preset number of Reduce tasks;
and further comprises, after the number of executed Map tasks satisfies the preset first threshold:
instructing that the unexecuted preset Reduce tasks be stopped.
5. The method according to any one of claims 1 to 4, characterized in that the preset first threshold is a preset Map task count threshold or a preset ratio, and determining the number of adjusted Reduce tasks if the number of executed Map tasks satisfies the preset first threshold is specifically:
if the number of executed Map tasks satisfies the preset Map task count threshold, starting the adjustment mechanism for the number of Reduce tasks and determining the number of adjusted Reduce tasks; or,
if the ratio of the number of executed Map tasks to the total number of Map tasks satisfies the preset ratio, starting the adjustment mechanism for the number of Reduce tasks and determining the number of adjusted Reduce tasks, wherein the total number of Map tasks is determined according to the dataset to be operated on.
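The two forms of the first threshold in claim 5 (an absolute count or a ratio) can be checked with a single helper. This Python sketch is illustrative only: the function name and parameters are hypothetical, and "satisfies" is again assumed to mean "greater than or equal to".

```python
def first_threshold_met(completed_maps, total_maps,
                        count_threshold=None, ratio_threshold=None):
    """Return True when the adjustment mechanism should start.

    Exactly one of count_threshold (absolute number of Map tasks) or
    ratio_threshold (fraction of all Map tasks) must be given.
    """
    if (count_threshold is None) == (ratio_threshold is None):
        raise ValueError("give exactly one of count_threshold or ratio_threshold")
    if count_threshold is not None:
        return completed_maps >= count_threshold
    if total_maps <= 0:
        raise ValueError("total_maps must be positive for a ratio threshold")
    return completed_maps / total_maps >= ratio_threshold
```

The ratio form needs the total number of Map tasks, which the claim derives from the dataset to be operated on; the count form does not.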
6. A device for dynamically determining a MapReduce parameter, characterized by comprising:
a job request acquisition module, configured to acquire a MapReduce job request, the MapReduce job request comprising a dataset to be operated on, a preset number of Reduce tasks, and a parameter indicating whether starting the adjustment mechanism for the number of Reduce tasks is allowed;
a monitoring module, configured to monitor execution of the Map tasks if the parameter indicating whether starting the adjustment mechanism for the number of Reduce tasks is allowed is set to allow;
a determination module, configured to determine the number of adjusted Reduce tasks if the number of executed Map tasks satisfies the preset first threshold, the output results of the executed Map tasks being mapped to partitions whose number equals the preset number of Reduce tasks;
a mapping module, configured to map, according to the number of adjusted Reduce tasks, each unexecuted preset Reduce task to an adjusted Reduce task, so that each adjusted Reduce task is executed.
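The mapping module's job of assigning unexecuted preset Reduce tasks (partitions) to adjusted Reduce tasks can be sketched as a grouping problem. The claim does not say how partitions are grouped; the Python sketch below assumes a greedy largest-first assignment to the currently lightest group, and the function name and the size dictionary are hypothetical. It covers only the many-to-one case, where n is no larger than the number of unexecuted partitions.

```python
import heapq


def map_partitions(partition_sizes, n):
    """Group preset partitions into n adjusted Reduce tasks.

    partition_sizes: {preset_partition_id: size} for unexecuted preset tasks.
    Returns a list of n lists of partition ids, one list per adjusted task.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    groups = [[] for _ in range(n)]
    heap = [(0, i) for i in range(n)]  # (current load, group index)
    # Largest partitions first; ties broken by partition id for determinism.
    ordered = sorted(partition_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for pid, size in ordered:
        load, i = heapq.heappop(heap)  # lightest group so far
        groups[i].append(pid)
        heapq.heappush(heap, (load + size, i))
    return groups
```

With sizes `{0: 40, 1: 10, 2: 30, 3: 20}` and `n = 2`, the sketch returns `[[0, 1], [2, 3]]`, giving each adjusted Reduce task a load of 50.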
7. The device according to claim 6, characterized in that the determination module is specifically configured to:
determine the number of adjusted Reduce tasks according to the total data volume of the unexecuted Reduce tasks.
8. The device according to claim 7, characterized in that the determination module is specifically configured to:
determine the number of adjusted Reduce tasks according to the following formula:
wherein n is the number of adjusted Reduce tasks, TR is the time cost of executing one adjusted Reduce task, TRS is the total time cost of executing all adjusted Reduce tasks, and TR depends on the total data volume of the unexecuted Reduce tasks and on the number of adjusted Reduce tasks.
9. The device according to claim 6, characterized in that the MapReduce job request further comprises a preset second threshold used to schedule execution of the Reduce tasks in advance, and if the second threshold is smaller than the first threshold, the monitoring module is specifically configured to:
if the number of executed Map tasks satisfies the second threshold but does not satisfy the preset first threshold, monitor execution of the preset Reduce tasks, each preset Reduce task corresponding to one of the partitions, the number of partitions being equal to the preset number of Reduce tasks;
and, after the number of executed Map tasks satisfies the preset first threshold, instruct that execution of the unexecuted preset Reduce tasks be stopped.
10. The device according to any one of claims 6 to 9, characterized in that the preset first threshold is a preset Map task count threshold or a preset ratio, and the determination module is specifically configured to:
if the number of executed Map tasks satisfies the preset Map task count threshold, start the adjustment mechanism for the number of Reduce tasks and determine the number of adjusted Reduce tasks; or,
if the ratio of the number of executed Map tasks to the total number of Map tasks satisfies the preset ratio, start the adjustment mechanism for the number of Reduce tasks and determine the number of adjusted Reduce tasks, wherein the total number of Map tasks is determined according to the dataset to be operated on.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310078507.4A CN103218263B (en) | 2013-03-12 | 2013-03-12 | The dynamic defining method of MapReduce parameter and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103218263A (en) | 2013-07-24 |
CN103218263B (en) | 2016-03-23 |
Family
ID=48816085
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310078507.4A Active CN103218263B (en) | 2013-03-12 | 2013-03-12 | The dynamic defining method of MapReduce parameter and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103218263B (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103645952A (en) * | 2013-08-08 | 2014-03-19 | 中国人民解放军国防科学技术大学 | Non-accurate task parallel processing method based on MapReduce |
CN104598304A (en) * | 2013-10-31 | 2015-05-06 | 国际商业机器公司 | Dispatch method and device used in operation execution |
CN104978228A (en) * | 2014-04-09 | 2015-10-14 | 腾讯科技(深圳)有限公司 | Scheduling method and scheduling device of distributed computing system |
CN105302536A (en) * | 2014-07-31 | 2016-02-03 | 国际商业机器公司 | Configuration method and apparatus for related parameters of MapReduce application |
WO2017031961A1 (en) * | 2015-08-24 | 2017-03-02 | 华为技术有限公司 | Data processing method and apparatus |
WO2017162027A1 (en) * | 2016-03-21 | 2017-09-28 | 阿里巴巴集团控股有限公司 | Control method and device for map end aggregation regarding user task in mr computing platform |
CN107402952A (en) * | 2016-05-20 | 2017-11-28 | 伟萨科技有限公司 | Big data processor accelerator and big data processing system |
CN108196970A (en) * | 2017-12-29 | 2018-06-22 | 东软集团股份有限公司 | The dynamic memory management method and device of Spark platforms |
CN110209645A (en) * | 2017-12-30 | 2019-09-06 | ***通信集团四川有限公司 | Task processing method, device, electronic equipment and storage medium |
CN110222105A (en) * | 2019-05-14 | 2019-09-10 | 联动优势科技有限公司 | Data summarization processing method and processing device |
CN110413396A (en) * | 2019-07-30 | 2019-11-05 | 广东工业大学 | A kind of resource regulating method, device, equipment and readable storage medium storing program for executing |
CN110795301A (en) * | 2018-08-01 | 2020-02-14 | 马上消费金融股份有限公司 | Job monitoring method, device, terminal and computer storage medium |
CN113157448A (en) * | 2014-06-30 | 2021-07-23 | 亚马逊科技公司 | System and method for managing feature processing |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101770402A (en) * | 2008-12-29 | 2010-07-07 | ***通信集团公司 | Map task scheduling method, equipment and system in MapReduce system |
CN102096603A (en) * | 2009-12-14 | 2011-06-15 | ***通信集团公司 | Task decomposition control method in MapReduce system and scheduling node equipment |
US20120304186A1 (en) * | 2011-05-26 | 2012-11-29 | International Business Machines Corporation | Scheduling Mapreduce Jobs in the Presence of Priority Classes |
US20120317579A1 (en) * | 2011-06-13 | 2012-12-13 | Huan Liu | System and method for performing distributed parallel processing tasks in a spot market |
Non-Patent Citations (4)
Title |
---|
KEWEN WANG et al.: "Predator - An Experience Guided Configuration Optimizer for Hadoop MapReduce", 《2012 IEEE 4TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING TECHNOLOGY AND SCIENCE》 * |
XUELIAN LIN et al.: "A Practical Performance Model for Hadoop MapReduce", 《2012 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING WORKSHOPS》 * |
周锋: "一种改进的MapReduce并行编程模型", 《科协论坛(下半月)》 * |
奚建清: "基于MapReduce的封闭立方体并行计算方法", 《华南理工大学学报(自然科学版)》 * |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103645952B (en) * | 2013-08-08 | 2017-06-06 | 中国人民解放军国防科学技术大学 | A kind of non-precision tasks in parallel processing method based on MapReduce |
CN103645952A (en) * | 2013-08-08 | 2014-03-19 | 中国人民解放军国防科学技术大学 | Non-accurate task parallel processing method based on MapReduce |
CN104598304A (en) * | 2013-10-31 | 2015-05-06 | 国际商业机器公司 | Dispatch method and device used in operation execution |
CN104598304B (en) * | 2013-10-31 | 2018-03-13 | 国际商业机器公司 | Method and apparatus for the scheduling in Job execution |
CN104978228B (en) * | 2014-04-09 | 2019-08-30 | 腾讯科技(深圳)有限公司 | A kind of dispatching method and device of distributed computing system |
CN104978228A (en) * | 2014-04-09 | 2015-10-14 | 腾讯科技(深圳)有限公司 | Scheduling method and scheduling device of distributed computing system |
CN113157448B (en) * | 2014-06-30 | 2024-04-12 | 亚马逊科技公司 | System and method for managing feature processing |
CN113157448A (en) * | 2014-06-30 | 2021-07-23 | 亚马逊科技公司 | System and method for managing feature processing |
US10831716B2 (en) | 2014-07-31 | 2020-11-10 | International Business Machines Corporation | Method and apparatus for configuring relevant parameters of MapReduce applications |
CN105302536A (en) * | 2014-07-31 | 2016-02-03 | 国际商业机器公司 | Configuration method and apparatus for related parameters of MapReduce application |
WO2017031961A1 (en) * | 2015-08-24 | 2017-03-02 | 华为技术有限公司 | Data processing method and apparatus |
CN106484689A (en) * | 2015-08-24 | 2017-03-08 | 杭州华为数字技术有限公司 | Data processing method and device |
CN106484689B (en) * | 2015-08-24 | 2019-09-03 | 杭州华为数字技术有限公司 | Data processing method and device |
WO2017162027A1 (en) * | 2016-03-21 | 2017-09-28 | 阿里巴巴集团控股有限公司 | Control method and device for map end aggregation regarding user task in mr computing platform |
CN107220247A (en) * | 2016-03-21 | 2017-09-29 | 阿里巴巴集团控股有限公司 | The control method and device that user task map ends polymerize in MR calculating platforms |
TWI730051B (en) * | 2016-03-21 | 2021-06-11 | 香港商阿里巴巴集團服務有限公司 | Method and device for controlling user task mapping (map) end aggregation in a mapping induction (MR) computing platform |
CN107402952A (en) * | 2016-05-20 | 2017-11-28 | 伟萨科技有限公司 | Big data processor accelerator and big data processing system |
CN108196970A (en) * | 2017-12-29 | 2018-06-22 | 东软集团股份有限公司 | The dynamic memory management method and device of Spark platforms |
CN110209645A (en) * | 2017-12-30 | 2019-09-06 | ***通信集团四川有限公司 | Task processing method, device, electronic equipment and storage medium |
CN110795301A (en) * | 2018-08-01 | 2020-02-14 | 马上消费金融股份有限公司 | Job monitoring method, device, terminal and computer storage medium |
CN110222105A (en) * | 2019-05-14 | 2019-09-10 | 联动优势科技有限公司 | Data summarization processing method and processing device |
CN110222105B (en) * | 2019-05-14 | 2021-06-29 | 联动优势科技有限公司 | Data summarization processing method and device |
CN110413396A (en) * | 2019-07-30 | 2019-11-05 | 广东工业大学 | A kind of resource regulating method, device, equipment and readable storage medium storing program for executing |
CN110413396B (en) * | 2019-07-30 | 2022-02-15 | 广东工业大学 | Resource scheduling method, device and equipment and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN103218263B (en) | 2016-03-23 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN103218263A (en) | Dynamic determining method and device for MapReduce parameter | |
US20200342322A1 (en) | Method and device for training data, storage medium, and electronic device | |
US20150295970A1 (en) | Method and device for augmenting and releasing capacity of computing resources in real-time stream computing system | |
US20070038987A1 (en) | Preprocessor to improve the performance of message-passing-based parallel programs on virtualized multi-core processors | |
CN103399800B (en) | Based on the dynamic load balancing method of Linux parallel computing platform | |
US11144330B2 (en) | Algorithm program loading method and related apparatus | |
CN105022670A (en) | Heterogeneous distributed task processing system and processing method in cloud computing platform | |
CN106339252B (en) | Self-adaptive optimization method and device for distributed DAG system | |
US20200193964A1 (en) | Method and device for training an acoustic model | |
CN109726004B (en) | Data processing method and device | |
WO2015094269A1 (en) | Hybrid flows containing a continuous flow | |
CN103019855A (en) | Method for forecasting executive time of Map Reduce operation | |
CN110618860A (en) | Spark-based Kafka consumption concurrent processing method and device | |
US20220300323A1 (en) | Job Scheduling Method and Job Scheduling Apparatus | |
CN106648839B (en) | Data processing method and device | |
TW201723878A (en) | Method and system for recommending application parameter setting and system specification setting in distributed computation | |
CN111198754A (en) | Task scheduling method and device | |
CN110134646B (en) | Knowledge platform service data storage and integration method and system | |
CN104717251A (en) | Scheduling method and system for Cell nodes through OpenStack cloud computing management platform | |
CN105095515A (en) | Bucket dividing method, device and equipment supporting fast query of Map-Reduce output result | |
CN103442087B (en) | A kind of Web service system visit capacity based on response time trend analysis controls apparatus and method | |
CN110362387B (en) | Distributed task processing method, device, system and storage medium | |
WO2021017701A1 (en) | Spark performance optimization control method and apparatus, and device and storage medium | |
CN110109970B (en) | Data query processing method and device | |
CN106874129A (en) | A kind of operating system process scheduling order determines method and control method |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C14 | Grant of patent or utility model | |
| | GR01 | Patent grant | |