CN103218263B - Method and device for dynamically determining MapReduce parameters - Google Patents

Info

Publication number
CN103218263B
Authority
CN
China
Prior art keywords
task
reduce task
reduce
default
threshold
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310078507.4A
Other languages
Chinese (zh)
Other versions
CN103218263A (en
Inventor
林学练
于晨晖
韩军
叶玥
崔晓龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN201310078507.4A
Publication of CN103218263A
Application granted
Publication of CN103218263B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a device for dynamically determining a MapReduce parameter. The method comprises: obtaining a MapReduce job request, which comprises the data set to be processed, a preset number of Reduce tasks, and a parameter indicating whether the mechanism for adjusting the number of Reduce tasks may be started; if the parameter indicates that the adjustment mechanism may be started, monitoring the execution of the Map tasks; if the number of executed Map tasks meets a preset first threshold, determining the adjusted number of Reduce tasks; and, according to the adjusted number of Reduce tasks, mapping each unexecuted preset Reduce task to an adjusted Reduce task. A reasonable number of Reduce tasks is thereby determined dynamically within a MapReduce job.

Description

Method and device for dynamically determining MapReduce parameters
Technical field
The present invention relates to the field of distributed computing, and in particular to a method and a device for dynamically determining a MapReduce parameter.
Background
MapReduce is a distributed computing framework that borrows ideas from functional programming to process large data sets efficiently in a distributed manner. The MapReduce framework divides a computing job (Job) into several Map tasks and Reduce tasks. The input and output data of both the Map phase and the Reduce phase are in Key-Value form, and the Reduce phase takes the output of the Map phase as its input. The number of Map tasks is determined by the input data set, whereas the number of Reduce tasks is specified by the user. Because the input data set is usually large, it is split into multiple data blocks (chunks). After a MapReduce job is submitted, the scheduler (Master) of the MapReduce framework determines the number of Map tasks from the number of data blocks in the input data set, so that each Map task processes one data block.
The data block input to each Map task is converted into Key-Value form and processed by the Map operation, which outputs an intermediate result in Key-Value form. The intermediate result is sorted by Key and written to the local disk of the computing node on which the Map task runs. Based on the sorted Keys and the user-specified number of Reduce tasks, the MapReduce framework partitions the Keys and aggregates the Values that fall into the same partition. The input of each Reduce task is a part of the intermediate results output by multiple Map tasks. For example, if the user specifies n Reduce tasks, there are n partitions; the intermediate results belonging to each partition are transmitted over the network to the Reduce task responsible for that partition, which executes the user-specified Reduce algorithm and finally outputs the result.
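The partitioning of Map output by Key can be sketched as follows. This is only an illustration: the function names `partition_of` and `partition_output` and the byte-sum hash are assumptions made for the example, not the hash function of any particular MapReduce framework.

```python
from collections import defaultdict


def partition_of(key: str, num_reduce_tasks: int) -> int:
    """Assign a key to one of num_reduce_tasks partitions."""
    # Assumption: a deterministic byte-sum hash, chosen only so that the
    # example is reproducible; real frameworks use their own hash functions.
    return sum(key.encode("utf-8")) % num_reduce_tasks


def partition_output(pairs, num_reduce_tasks):
    """Sort intermediate Key-Value pairs by key and group them by partition."""
    partitions = defaultdict(list)
    for key, value in sorted(pairs):
        partitions[partition_of(key, num_reduce_tasks)].append((key, value))
    return dict(partitions)


if __name__ == "__main__":
    intermediate = [("b", 1), ("a", 2), ("a", 1)]
    print(partition_output(intermediate, 4))
```

Because the partition index depends on the number of Reduce tasks, that number has to be fixed before the Map output is written, which is the constraint the following paragraph discusses.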
In existing MapReduce frameworks, the exact number of Reduce tasks must be known before the Map tasks execute, so that the Map phase can partition its intermediate results according to that number. Because the number is normally specified manually by the user, the Reduce tasks run in the specified number regardless of how much intermediate data the Map phase outputs. When the Map phase outputs little intermediate data, one or two Reduce tasks could handle it, yet the user may have specified far more than two; running that many Reduce tasks wastes resources. Conversely, when the Map phase outputs a large amount of intermediate data and the user has specified relatively few Reduce tasks, running with the specified number leads to an excessively long execution time.
Summary of the invention
The object of the present invention is to provide a method and a device for dynamically determining a MapReduce parameter, so that a reasonable number of Reduce tasks is determined dynamically within a MapReduce job.
A first aspect of the present invention provides a method for dynamically determining a MapReduce parameter, comprising:
Obtaining a MapReduce job request, the MapReduce job request comprising the data set to be processed, a preset number of Reduce tasks, and a parameter indicating whether the mechanism for adjusting the number of Reduce tasks may be started;
If the parameter indicates that the adjustment mechanism may be started, monitoring the execution of the Map tasks;
If the number of executed Map tasks meets a preset first threshold, the output of the executed Map tasks having been mapped to partitions whose number equals the preset number of Reduce tasks, determining the adjusted number of Reduce tasks;
According to the adjusted number of Reduce tasks, mapping each unexecuted preset Reduce task to an adjusted Reduce task, so that the adjusted Reduce tasks are executed.
Another aspect of the present invention provides a device for dynamically determining a MapReduce parameter, comprising:
A job request acquisition module, configured to obtain a MapReduce job request, the MapReduce job request comprising the data set to be processed, a preset number of Reduce tasks, and a parameter indicating whether the mechanism for adjusting the number of Reduce tasks may be started;
A monitoring module, configured to monitor the execution of the Map tasks if the parameter indicates that the adjustment mechanism may be started;
A determination module, configured to determine the adjusted number of Reduce tasks if the number of executed Map tasks meets a preset first threshold, the output of the executed Map tasks having been mapped to partitions whose number equals the preset number of Reduce tasks;
A mapping module, configured to map, according to the adjusted number of Reduce tasks, each unexecuted preset Reduce task to an adjusted Reduce task, so that the adjusted Reduce tasks are executed.
The beneficial effect of the above technical solution is as follows: by monitoring the Master of the MapReduce framework, the number of Reduce tasks in a MapReduce job request can be adjusted dynamically according to the execution status of the Map tasks, and the Reduce tasks are then executed in the adjusted number. This solves problems of the prior art, in which the number of Reduce tasks is specified statically by the user, such as wasted resources or an excessively long execution time.
Brief description of the drawings
Fig. 1 is a flowchart of a method for dynamically determining a MapReduce parameter according to Embodiment 1 of the present invention;
Fig. 2 is a schematic structural diagram of a device for dynamically determining a MapReduce parameter according to Embodiment 2 of the present invention.
Detailed description of the embodiments
Fig. 1 is a flowchart of a method for dynamically determining a MapReduce parameter according to Embodiment 1 of the present invention. As shown in Fig. 1, the method may comprise the following steps:
Step 101: obtain a MapReduce job request. The MapReduce job request comprises the data set to be processed, a preset number of Reduce tasks, and a parameter indicating whether the mechanism for adjusting the number of Reduce tasks may be started;
It should be noted that the method of this embodiment may be executed by a device for dynamically determining a MapReduce parameter. The device monitors the Master of the MapReduce framework and can therefore obtain the MapReduce job request through the Master. The job request may comprise the data set to be processed, a preset number of Reduce tasks, and a parameter indicating whether the mechanism for adjusting the number of Reduce tasks may be started. The exact number of Reduce tasks must be known before the Map tasks execute, so that the Map phase can partition its intermediate results according to that number; a number of Reduce tasks is therefore preset for each MapReduce job request so that the Map tasks can run. In this embodiment, the device decides whether to start adjusting the number of Reduce tasks according to this parameter. For example, the parameter may take the value True or False, or be left unset (default). True indicates that, when the MapReduce job is run on the data set, the device is allowed to adjust the number of Reduce tasks dynamically, and the Reduce tasks may be executed in the adjusted number. False or default indicates that the device is not allowed to adjust the number of Reduce tasks dynamically, and the Reduce tasks can only be executed in the preset number.
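As a minimal sketch of the job request described in Step 101, the structure below models the three fields and the True / False / default semantics of the parameter. The class name `JobRequest`, its field names, and the use of `None` for "left unset" are assumptions made for illustration; the text does not prescribe a concrete representation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobRequest:
    """Hypothetical container for the contents of a MapReduce job request."""
    dataset: str                              # data set to be processed
    preset_reduce_tasks: int                  # preset number of Reduce tasks
    allow_adjustment: Optional[bool] = None   # None models "left unset"


def adjustment_enabled(request: JobRequest) -> bool:
    """Only an explicit True starts the adjustment mechanism."""
    return request.allow_adjustment is True


if __name__ == "__main__":
    print(adjustment_enabled(JobRequest("logs", 8, True)))
    print(adjustment_enabled(JobRequest("logs", 8)))
```

Treating an unset value the same as False keeps the existing behaviour (the preset number of Reduce tasks) as the fallback, matching the description above.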
Step 102: if the parameter indicates that the adjustment mechanism may be started, monitor the execution of the Map tasks;
In this embodiment, when the parameter indicates that the adjustment mechanism may be started, it triggers the device to start adjusting the number of Reduce tasks. The device therefore continues to monitor the Master of the MapReduce framework. It determines the number of Map tasks from the data set to be processed, or obtains the number of Map tasks in the MapReduce job from the Master, and at the same time monitors the execution of the Map tasks according to the job request.
Step 103: if the number of executed Map tasks meets a preset first threshold, determine the adjusted number of Reduce tasks;
In this embodiment, to adjust the number of Reduce tasks reasonably, the total input data volume of the Reduce tasks in the MapReduce job can be determined after the Map tasks have run for some time, from the number of executed Map tasks and the volume of intermediate results they have output. Specifically, because every Map task receives the same amount of input data, the intermediate results the Map tasks output are also of roughly the same size; and because the input of the Reduce tasks is exactly the output of the Map tasks, the total input data volume of the Reduce tasks can be determined from the number of executed Map tasks, the volume of intermediate results they have output, and the total number of Map tasks in the job. The number of Reduce tasks is then adjusted according to this total. Because the Map tasks initially partition their intermediate results according to the preset number of Reduce tasks, the intermediate results output by the executed Map tasks are mapped to partitions whose number equals the preset number of Reduce tasks. The first threshold sets the moment at which the device starts adjusting the number of Reduce tasks: when the number of executed Map tasks meets the preset first threshold, the device is triggered to adjust the number of Reduce tasks dynamically on the basis of the executed Map tasks, thereby determining the adjusted number. Normally, the adjusted number of Reduce tasks is smaller than the preset number.
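The estimate of the total Reduce input can be sketched as a simple extrapolation. The text only states that the total is determined from the executed Map tasks, their output volume, and the total number of Map tasks; scaling the average output per executed Map task, and the function name `estimate_reduce_input`, are assumptions for this illustration.

```python
def estimate_reduce_input(executed_output_sizes, total_map_tasks):
    """Extrapolate the total Reduce input from the executed Map tasks.

    executed_output_sizes: intermediate-result size of each executed Map task.
    total_map_tasks: total number of Map tasks in the job.
    """
    executed = len(executed_output_sizes)
    if executed == 0:
        raise ValueError("at least one executed Map task is required")
    # Assumption: Map outputs are roughly equal in size, so the average
    # of the executed tasks is representative of the remaining ones.
    average = sum(executed_output_sizes) / executed
    return average * total_map_tasks


if __name__ == "__main__":
    # Three executed Map tasks out of twelve.
    print(estimate_reduce_input([10, 20, 30], 12))
```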
Step 104: according to the adjusted number of Reduce tasks, map each unexecuted preset Reduce task to an adjusted Reduce task, so that the adjusted Reduce tasks are executed.
In this embodiment, the Map tasks initially partition their intermediate results according to the preset number of Reduce tasks, that is, each preset Reduce task corresponds to one partition. Once the number of Reduce tasks has been adjusted, the partitions corresponding to the unexecuted preset Reduce tasks are mapped to the adjusted Reduce tasks; one adjusted Reduce task may correspond to the partitions of one or more preset Reduce tasks. The Master can then schedule and execute the adjusted Reduce tasks.
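One possible way to map preset partitions onto the adjusted Reduce tasks is sketched below. The text does not say how partitions are grouped, so the round-robin policy and the function name `map_partitions` are assumptions chosen for simplicity.

```python
def map_partitions(unexecuted_partitions, adjusted_count):
    """Assign each unexecuted preset partition to one adjusted Reduce task.

    Returns a list with one entry per adjusted Reduce task; each entry is
    the list of preset partition indices that task will process.
    """
    if adjusted_count < 1:
        raise ValueError("adjusted_count must be at least 1")
    groups = [[] for _ in range(adjusted_count)]
    # Assumption: round-robin assignment; any many-to-one mapping would
    # satisfy the description.
    for position, partition in enumerate(unexecuted_partitions):
        groups[position % adjusted_count].append(partition)
    return groups


if __name__ == "__main__":
    # Five preset partitions merged into two adjusted Reduce tasks.
    print(map_partitions([0, 1, 2, 3, 4], 2))
```

Because the Map output was already written per preset partition, merging whole partitions avoids re-partitioning data that has been produced.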
By monitoring the Master of the MapReduce framework, this embodiment can adjust the number of Reduce tasks in a MapReduce job request dynamically according to the execution status of the Map tasks, and then execute the Reduce tasks in the adjusted number. This solves problems of the prior art, in which the number of Reduce tasks is specified statically by the user, such as wasted resources or an excessively long execution time.
Specifically, the preset first threshold in the above embodiment may be a preset threshold on the number of Map tasks, for example 30 Map tasks or 100 Map tasks. The specific value can be set according to the actual job and is not limited in this embodiment. When the number of executed Map tasks meets this threshold, the device is triggered to start adjusting the number of Reduce tasks and determines the adjusted number.
Preferably, the preset first threshold in the above embodiment may be a preset ratio, for example 1/5 or 1/3. The specific value can be set according to the actual job and is not limited in this embodiment. When the ratio of the number of executed Map tasks to the total number of Map tasks meets the preset ratio, the device is triggered to start adjusting the number of Reduce tasks and determines the adjusted number. In this embodiment, the total number of Map tasks may be determined from the data set to be processed, or obtained from the Master.
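The two forms of the first threshold can be sketched as follows. Reading "meets" as "greater than or equal to" is an assumption, as are the function names; `Fraction` is used so that ratios such as 1/5 are compared exactly.

```python
from fractions import Fraction


def count_threshold_met(executed_maps: int, threshold_count: int) -> bool:
    """First threshold given as a number of executed Map tasks."""
    return executed_maps >= threshold_count


def ratio_threshold_met(executed_maps: int, total_maps: int,
                        threshold_ratio: Fraction) -> bool:
    """First threshold given as a ratio of executed to total Map tasks."""
    return Fraction(executed_maps, total_maps) >= threshold_ratio


if __name__ == "__main__":
    print(count_threshold_met(30, 30))
    print(ratio_threshold_met(20, 100, Fraction(1, 5)))
```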
Further, on the basis of any of the above embodiments, the MapReduce job request may also comprise a preset second threshold for scheduling Reduce task execution early. The second threshold may likewise be a preset number of Map tasks or a ratio: when the number of executed Map tasks meets the second threshold, the Master is triggered to start executing the preset Reduce tasks.
In this embodiment, when the second threshold is smaller than the first threshold, that is, when the number of executed Map tasks meets the preset second threshold but not yet the preset first threshold, the device may also monitor the execution of the preset Reduce tasks, each of which corresponds to one of the partitions whose number equals the preset number of Reduce tasks. Once the number of executed Map tasks meets the preset first threshold, the device may instruct that the unexecuted preset Reduce tasks be stopped. The data volume of the unexecuted Reduce tasks is then determined from the number of executed Map tasks and the total data volume of the Reduce tasks in the job, and the corresponding number of Reduce tasks is determined from that data volume. Specifically, when the remaining unexecuted Map tasks run, their intermediate results are mapped to partitions whose number equals the redetermined number of Reduce tasks; in other words, the unexecuted preset Reduce tasks are carried out by the redetermined Reduce tasks.
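The interaction of the two thresholds can be summarized as three phases. The sketch below assumes both thresholds are given as numbers of executed Map tasks and that "meets" means "greater than or equal to"; the phase labels and the function name `scheduling_phase` are hypothetical.

```python
def scheduling_phase(executed_maps: int, second_threshold: int,
                     first_threshold: int) -> str:
    """Return the phase the job is in when second_threshold < first_threshold."""
    if second_threshold >= first_threshold:
        raise ValueError("this sketch covers only second_threshold < first_threshold")
    if executed_maps >= first_threshold:
        # Stop unexecuted preset Reduce tasks and determine the adjusted number.
        return "adjust"
    if executed_maps >= second_threshold:
        # Preset Reduce tasks have been scheduled early; monitor them.
        return "run_preset_reduce"
    return "map_only"


if __name__ == "__main__":
    for executed in (5, 10, 30):
        print(executed, scheduling_phase(executed, 10, 30))
```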
A MapReduce job has two phases, Map and Reduce. Because the number of Map tasks is fixed, the cost of the Map tasks is also relatively fixed. Using a MapReduce performance model to compute the total time cost and the execution time of the same job for different numbers of Reduce tasks shows that the total time cost of the job usually grows as the number of Reduce tasks increases, while increasing the number of Reduce tasks makes better use of the cluster's computing power and raises the parallelism between tasks, shortening the execution time of the job; and vice versa. Therefore, in one specific implementation of the present invention, the cost of the Reduce tasks is computed with the MapReduce performance model, and the number of Reduce tasks is adjusted to a balance point between time cost and execution time.
For example, in the MapReduce performance model, the time cost of a Reduce task comprises the following ten components:
  • TR1_init: the system time cost of initializing the Reduce task, that is, the time to initialize, open, and close the task, load programs, and so on; usually $\mathrm{TR1\_init} = \mathrm{RedSysCost} + \mathrm{RedInit}$;
  • TR2_read: the IO cost of reading data when the Reduce task starts executing; usually $\mathrm{TR2\_read} = \mathrm{ReduceInput} / \mathrm{SeqRead}$;
  • TR3_net: the network cost of transmitting data while the Reduce task executes; usually $\mathrm{TR3\_net} = \mathrm{ReduceInput} / \mathrm{BandWidth}$;
  • TR4_merge: the CPU cost of sorting while the Reduce task executes; usually $\mathrm{TR4\_merge} = \mathrm{SortCEF} \cdot \mathrm{RIRNumber} \cdot \log \mathrm{NumberOfMap}$;
  • TR5_serial: the time cost of serialization while the Reduce task executes; usually $\mathrm{TR5\_serial} = \mathrm{se1} \cdot \mathrm{RIRNumber} + \mathrm{se2} \cdot \mathrm{ReduceInput}$;
  • TR6_io: the disk IO cost incurred while the Reduce task executes; usually $\mathrm{TR6\_io} = \mathrm{ReduceInput} \cdot \left( \frac{1}{\mathrm{SeqRead}} + \frac{1}{\mathrm{SeqWrite}} \right)$;
  • TR7_parse: the time cost of parsing data while the Reduce task executes; usually $\mathrm{TR7\_parse} = \mathrm{pa1} \cdot \mathrm{RIRNumber} + \mathrm{pa2} \cdot \mathrm{ReduceInput}$;
  • TR8_Reducer: the time cost of the function computation while the Reduce task executes; usually $\mathrm{TR8\_Reducer} = \mathrm{ReduceInput} \cdot \mathrm{ComplexOfReduce} \cdot \mathrm{CEF}$;
  • TR9_net: the network cost of transmitting data after the Reduce task has executed; usually $\mathrm{TR9\_net} = \mathrm{HDFSReplica} \cdot \frac{\mathrm{ReduceOutput}}{\mathrm{BandWidth}}$;
  • TR10_write: the IO cost of writing data after the Reduce task has executed; usually $\mathrm{TR10\_write} = \mathrm{HDFSReplica} \cdot \frac{\mathrm{ReduceOutput}}{\mathrm{SeqWrite}}$.
The time cost TR of each Reduce task is therefore the sum of the above components, where $TR_k$ denotes the k-th component:
$$TR = \sum_{k=1}^{10} TR_k$$
Suppose a MapReduce job has n Reduce tasks. The total time cost TRS of all Reduce tasks in the job is then:
$$TRS = n \cdot TR$$
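Substituting the above expressions into the cost formula for TR, the time cost of executing one Reduce task is:
$$
\begin{aligned}
TR ={}& \mathrm{RedSysCost} + \mathrm{RedInit} + \frac{\mathrm{ReduceInput}}{\mathrm{SeqRead}} + \frac{\mathrm{ReduceInput}}{\mathrm{BandWidth}} \\
&+ \mathrm{SortCEF} \cdot \mathrm{RIRNumber} \cdot \log \mathrm{NumberOfMap} + \left( \mathrm{se1} \cdot \mathrm{RIRNumber} + \mathrm{se2} \cdot \mathrm{ReduceInput} \right) \\
&+ \mathrm{ReduceInput} \cdot \left( \frac{1}{\mathrm{SeqRead}} + \frac{1}{\mathrm{SeqWrite}} \right) + \mathrm{pa1} \cdot \mathrm{RIRNumber} + \mathrm{pa2} \cdot \mathrm{ReduceInput} \\
&+ \mathrm{ReduceInput} \cdot \mathrm{ComplexOfReduce} \cdot \mathrm{CEF} + \mathrm{HDFSReplica} \cdot \frac{\mathrm{ReduceOutput}}{\mathrm{BandWidth}} + \mathrm{HDFSReplica} \cdot \frac{\mathrm{ReduceOutput}}{\mathrm{SeqWrite}}
\end{aligned}
$$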
Here, RedSysCost is the time cost of initializing, opening, and closing the task when a Reduce task is initialized, usually set to 2000 milliseconds (ms); RedInit is the time needed to load additional programs or data when a Reduce task is initialized; ReduceInput is the data volume input to a Reduce task; SeqRead is the volume of data read from disk within a preset time; BandWidth is the network transfer rate within a preset time; SortCEF is the sorting coefficient; RIRNumber is the number of data records input to a Reduce task; NumberOfMap is the number of Map tasks; SeqWrite is the volume of data written to disk within a preset time; ComplexOfReduce is the computational complexity of the Reduce task; HDFSReplica is the number of replicas; ReduceOutput is the data volume output by a Reduce task. se1 and se2 are the two parameters of the serialization phase: in the MapReduce performance model, the time cost of serialization is linear in the data volume, and se1 and se2 are the two parameters of this linear relationship; for a given cluster they do not change and can be regarded as constants. pa1 and pa2 are the two constant parameters of the parsing phase. CEF is the time a cluster machine needs for a standard computation operation, and is also a constant. Of the above parameters, ReduceInput, ReduceOutput, RIRNumber, and NumberOfMap depend on the data volumes of the Map tasks and Reduce tasks in the MapReduce job; all other parameters are known information preset by the system.
Further, let n be the number of Reduce tasks in the MapReduce job. Then ReduceInput = Input/n, where Input is the total input data volume of all Reduce tasks in the job. Let Y be the data conversion rate of Reduce, that is, the ratio of the Reduce output data volume to the input data volume, Y = ReduceOutput/ReduceInput. Then ReduceOutput = Y*ReduceInput = Y*Input/n, and
RIRNumber = ReduceInput/RIRLength = Input/(n*RIRLength), where RIRLength is the average length of an input record.
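Substituting these into the above formula and simplifying gives:
$$
\begin{aligned}
TR ={}& \mathrm{RedSysCost} + \mathrm{RedInit} + \Bigg[ \frac{2}{\mathrm{SeqRead}} + \left( 1 + \mathrm{HDFSReplica} \cdot Y \right) \cdot \left( \frac{1}{\mathrm{SeqWrite}} + \frac{1}{\mathrm{BandWidth}} \right) \\
&+ \frac{\mathrm{se1} + \mathrm{pa1} + \mathrm{SortCEF} \cdot \log \mathrm{NumberOfMap}}{\mathrm{RIRLength}} + \left( \mathrm{se2} + \mathrm{pa2} + \mathrm{ComplexOfReduce} \cdot \mathrm{CEF} \right) \Bigg] \cdot \frac{\mathrm{Input}}{n}
\end{aligned}
$$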
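Let
$$a = \mathrm{RedSysCost} + \mathrm{RedInit}$$
$$b = \frac{2}{\mathrm{SeqRead}} + \left( 1 + \mathrm{HDFSReplica} \cdot Y \right) \cdot \left( \frac{1}{\mathrm{SeqWrite}} + \frac{1}{\mathrm{BandWidth}} \right) + \frac{\mathrm{se1} + \mathrm{pa1} + \mathrm{SortCEF} \cdot \log \mathrm{NumberOfMap}}{\mathrm{RIRLength}} + \left( \mathrm{se2} + \mathrm{pa2} + \mathrm{ComplexOfReduce} \cdot \mathrm{CEF} \right)$$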
For a given MapReduce job, the values of a and b can be computed from the above parameters and are effectively constants. The time cost of executing one Reduce task is therefore TR = a + b*Input/n, and correspondingly TRS = n*TR = a*n + b*Input.
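The coefficients a and b and the two cost functions can be computed as in the following sketch. The snake_case field names are transliterations of the parameters in the text, and the use of the natural logarithm is an assumption, since the text does not state the base of the logarithm.

```python
import math
from dataclasses import dataclass


@dataclass
class CostParams:
    """Constants of the Reduce cost model (field names mirror the text)."""
    red_sys_cost: float       # RedSysCost
    red_init: float           # RedInit
    seq_read: float           # SeqRead
    seq_write: float          # SeqWrite
    band_width: float         # BandWidth
    hdfs_replica: float       # HDFSReplica
    y: float                  # Y, ratio of Reduce output to Reduce input
    se1: float
    se2: float
    pa1: float
    pa2: float
    sort_cef: float           # SortCEF
    number_of_map: int        # NumberOfMap
    rir_length: float         # RIRLength
    complex_of_reduce: float  # ComplexOfReduce
    cef: float                # CEF


def coefficient_a(p: CostParams) -> float:
    """a = RedSysCost + RedInit (fixed cost per Reduce task)."""
    return p.red_sys_cost + p.red_init


def coefficient_b(p: CostParams) -> float:
    """b, the cost per unit of Reduce input data."""
    io_and_net = (2 / p.seq_read
                  + (1 + p.hdfs_replica * p.y)
                  * (1 / p.seq_write + 1 / p.band_width))
    # Assumption: natural logarithm; the base is not given in the text.
    per_record = (p.se1 + p.pa1
                  + p.sort_cef * math.log(p.number_of_map)) / p.rir_length
    per_unit = p.se2 + p.pa2 + p.complex_of_reduce * p.cef
    return io_and_net + per_record + per_unit


def reduce_task_cost(p: CostParams, total_input: float, n: int) -> float:
    """TR = a + b * Input / n."""
    return coefficient_a(p) + coefficient_b(p) * total_input / n


def total_reduce_cost(p: CostParams, total_input: float, n: int) -> float:
    """TRS = n * TR = a * n + b * Input."""
    return n * reduce_task_cost(p, total_input, n)
```

The shape of the two functions shows the trade-off: TR falls as n grows, while TRS rises linearly with n.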
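Given the form of the time cost TR of one Reduce task and the total time cost TRS of all Reduce tasks in a MapReduce job, the point at which the sum of the first derivatives of TR and TRS with respect to n equals 0 can be chosen as the balance point for adjusting the number of Reduce tasks, that is:
$$\frac{\partial (TR)}{\partial (n)} + \frac{\partial (TRS)}{\partial (n)} = 0$$
With TR = a + b*Input/n and TRS = a*n + b*Input, this condition reads $-\frac{b \cdot \mathrm{Input}}{n^2} + a = 0$, which gives:
$$n = \sqrt{\frac{b \cdot \mathrm{Input}}{a}} = \sqrt{\frac{b}{a}} \cdot \sqrt{\mathrm{Input}}$$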
The number of Reduce tasks can thus be determined from the above formula. Further, to avoid the situation in which each Reduce task processes so much data that a single Reduce task runs too long, the maximum data volume processed by a single Reduce task can be limited to U. Each MapReduce job then needs at least Input/U Reduce tasks to process all the data. The value of U may specifically be U = ReduceJVM*ShuffleBufferPercent*ShuffleMergePercent. Therefore:
$$
n = \begin{cases}
\sqrt{\dfrac{b}{a} \cdot \mathrm{Input}}, & \text{if } \mathrm{Input} < U^2 \cdot \dfrac{b}{a} \\[2ex]
\dfrac{\mathrm{Input}}{U}, & \text{otherwise}
\end{cases}
$$
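The piecewise rule for n can be sketched as below. Rounding the result up to a whole number of tasks and enforcing a minimum of one task are assumptions added for the example; the text gives only the real-valued formula. The function names are likewise hypothetical.

```python
import math


def max_task_input(reduce_jvm: float, shuffle_buffer_percent: float,
                   shuffle_merge_percent: float) -> float:
    """U = ReduceJVM * ShuffleBufferPercent * ShuffleMergePercent."""
    return reduce_jvm * shuffle_buffer_percent * shuffle_merge_percent


def adjusted_reduce_count(a: float, b: float, total_input: float,
                          u: float) -> int:
    """Number of Reduce tasks at the balance point, capped by U per task."""
    if total_input < u * u * b / a:
        # Balance point of TR and TRS: n = sqrt(b * Input / a).
        n = math.sqrt(b * total_input / a)
    else:
        # Each task may process at most U, so at least Input / U tasks.
        n = total_input / u
    # Assumption: round up to an integer and never return fewer than 1.
    return max(1, math.ceil(n))


if __name__ == "__main__":
    print(adjusted_reduce_count(a=2000, b=5, total_input=1_000_000, u=100_000))
    print(adjusted_reduce_count(a=2000, b=5, total_input=1_000_000, u=10_000))
```

The condition Input < U²·b/a is equivalent to the per-task input at the balance point, Input/n, being below U, so the second branch applies exactly when the balance point would overload a single Reduce task.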
By monitoring the Master of the MapReduce framework, this embodiment can determine the total data volume of the Reduce tasks in a MapReduce job from the execution status of the Map tasks. It then adjusts the number of Reduce tasks by weighing the total data volume against the time cost of executing the Reduce tasks, determines a reasonable number of Reduce tasks, and executes the Reduce tasks in the adjusted number. This solves problems of the prior art, in which the number of Reduce tasks is specified statically by the user, such as wasted resources or an excessively long execution time, and at the same time drives the time cost of executing the Reduce tasks toward its minimum.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk, or an optical disc.
Fig. 2 is a schematic structural diagram of a device for dynamically determining a MapReduce parameter according to Embodiment 2 of the present invention. As shown in Fig. 2, the device of this embodiment may comprise:
A job request acquisition module 201, configured to obtain a MapReduce job request, the MapReduce job request comprising the data set to be processed, a preset number of Reduce tasks, and a parameter indicating whether the mechanism for adjusting the number of Reduce tasks may be started;
A monitoring module 202, configured to monitor the execution of the Map tasks if the parameter indicates that the adjustment mechanism may be started;
A determination module 203, configured to determine the adjusted number of Reduce tasks if the number of executed Map tasks meets a preset first threshold, the output of the executed Map tasks having been mapped to partitions whose number equals the preset number of Reduce tasks;
A mapping module 204, configured to map, according to the adjusted number of Reduce tasks, each unexecuted preset Reduce task to an adjusted Reduce task, so that the adjusted Reduce tasks are executed.
The device of this embodiment may be used to carry out the technical solution of the method embodiment shown in Fig. 1. Its implementation principle and technical effect are similar and are not repeated here.
Further, the determination module may specifically be configured to determine the adjusted number of Reduce tasks according to the total data volume of the unexecuted Reduce tasks.
Preferably, the determination module may specifically be configured to:
Determine the adjusted number of Reduce tasks according to the following formula:
$$\frac{\partial (TR)}{\partial (n)} + \frac{\partial (TRS)}{\partial (n)} = 0$$
Here, n is the adjusted number of Reduce tasks, TR is the time cost of executing one adjusted Reduce task, TRS is the total time cost of executing all adjusted Reduce tasks, and TR depends on the total data volume of the unexecuted Reduce tasks and on the adjusted number of Reduce tasks.
Specifically, the MapReduce job request in the above embodiment may also comprise a preset second threshold for scheduling Reduce task execution early. If the second threshold is smaller than the first threshold, the monitoring module may specifically be configured to:
Monitor the execution of the preset Reduce tasks if the number of executed Map tasks meets the second threshold but not the preset first threshold, each preset Reduce task corresponding to one of the partitions whose number equals the preset number of Reduce tasks;
And, after the number of executed Map tasks meets the preset first threshold, instruct that execution of the unexecuted preset Reduce tasks be stopped.
Specifically, in any of the above embodiments, the preset first threshold may be a preset threshold on the number of Map tasks or a preset ratio, in which case the determination module is specifically configured to:
Start the mechanism for adjusting the number of Reduce tasks and determine the adjusted number of Reduce tasks if the number of executed Map tasks meets the preset threshold on the number of Map tasks; or,
Start the mechanism for adjusting the number of Reduce tasks and determine the adjusted number of Reduce tasks if the ratio of the number of executed Map tasks to the total number of Map tasks meets the preset ratio, wherein the total number of Map tasks is determined from the data set to be processed.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments can still be modified, or some or all of their technical features replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (6)

1. A method for dynamically determining a MapReduce parameter, characterized by comprising:
Obtaining a MapReduce job request, the MapReduce job request comprising a dataset to be processed, a preset number of Reduce tasks, and a parameter indicating whether starting the adjustment mechanism for the number of Reduce tasks is allowed;
If the parameter indicating whether starting the adjustment mechanism for the number of Reduce tasks is allowed is set to allow, monitoring the execution of Map tasks;
If the number of executed Map tasks meets a preset first threshold, the output results of the executed Map tasks having been mapped to as many partitions as the preset number of Reduce tasks, determining the adjusted number of Reduce tasks;
According to the adjusted number of Reduce tasks, mapping each unexecuted preset Reduce task to one of the adjusted Reduce tasks, so that the adjusted Reduce tasks are executed;
Wherein determining the adjusted number of Reduce tasks specifically comprises:
Determining the adjusted number of Reduce tasks according to the total data volume of the unexecuted Reduce tasks;
And determining the adjusted number of Reduce tasks according to the total data volume of the unexecuted Reduce tasks specifically comprises:
Determining the adjusted number of Reduce tasks according to the following formula:
$$\frac{\partial T_R}{\partial n} + \frac{\partial T_{RS}}{\partial n} = 0;$$
Where $n$ is the adjusted number of Reduce tasks, $T_R$ (TR) is the time cost of executing one adjusted Reduce task, $T_{RS}$ (TRS) is the total time cost of executing all adjusted Reduce tasks, and $T_R$ depends on the total data volume of the unexecuted Reduce tasks and on the adjusted number of Reduce tasks.
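The formula in claim 1 places the adjusted number n where the derivative of TR plus TRS with respect to n vanishes. As a rough illustration only, the sketch below searches the integers for the n that minimizes TR + TRS, which approximates that stationary point when it is a minimum. The claim does not give closed forms for TR or TRS, so the two example cost functions, their coefficients, and the `max_tasks` bound are hypothetical.

```python
from typing import Callable

CostFn = Callable[[float, int], float]


def choose_reduce_count(data_total: float, max_tasks: int,
                        t_r: CostFn, t_rs: CostFn) -> int:
    """Pick the integer n in [1, max_tasks] that minimizes TR(n) + TRS(n).

    Assumption: the stationary point of TR + TRS is a minimum, so an
    integer search over n approximates the condition d(TR)/dn + d(TRS)/dn = 0.
    """
    if max_tasks < 1:
        raise ValueError("max_tasks must be at least 1")
    return min(range(1, max_tasks + 1),
               key=lambda n: t_r(data_total, n) + t_rs(data_total, n))


# Hypothetical cost model, not taken from the patent:
def example_t_r(data_total: float, n: int) -> float:
    # One task handles data_total / n units at 2.0 time units per data unit.
    return 2.0 * data_total / n


def example_t_rs(data_total: float, n: int) -> float:
    # Overall cost term that grows linearly with the number of tasks.
    return 0.5 * n


best_n = choose_reduce_count(100.0, 64, example_t_r, example_t_rs)
```

Under this hypothetical model the sum is 2D/n + 0.5n, whose derivative vanishes at n = 2·sqrt(D), so a data total of 100 gives 20 adjusted Reduce tasks. A real deployment would substitute cost functions measured on its own cluster.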
2. The method according to claim 1, characterized in that the MapReduce job request further comprises a preset second threshold for scheduling Reduce task execution in advance, and, if the second threshold is less than the first threshold, the method further comprises, before the number of executed Map tasks meets the preset first threshold:
If the number of executed Map tasks meets the second threshold but does not meet the preset first threshold, monitoring the execution of the preset Reduce tasks, each preset Reduce task corresponding to a respective one of the partitions equal in number to the preset number of Reduce tasks;
And further comprises, after the number of executed Map tasks meets the preset first threshold:
Instructing that execution of the unexecuted preset Reduce tasks be stopped.
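The interplay of the two thresholds in claim 2 can be read as three phases of a job. The following Python sketch is one possible reading; the phase names, the function name, the use of task counts for both thresholds, and the interpretation of "meets" as "greater than or equal to" are assumptions, and the sketch covers only the claimed case in which the second threshold is less than the first.

```python
def scheduling_phase(completed_maps: int,
                     first_threshold: int,
                     second_threshold: int) -> str:
    """Classify the job state from the number of executed Map tasks.

    Returns one of three hypothetical phase names:
      "map_only"     - neither threshold met; only Map tasks are monitored
      "early_reduce" - second threshold met, first not met; preset Reduce
                       tasks are scheduled in advance and monitored
      "adjust"       - first threshold met; unexecuted preset Reduce tasks
                       are told to stop and the adjusted count is determined
    """
    if second_threshold >= first_threshold:
        raise ValueError("sketch only covers second_threshold < first_threshold")
    if completed_maps >= first_threshold:
        return "adjust"
    if completed_maps >= second_threshold:
        return "early_reduce"
    return "map_only"
```

For example, with a first threshold of 10 and a second threshold of 5, a job with 7 executed Map tasks is in the `"early_reduce"` phase.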
3. The method according to claim 1 or 2, characterized in that the preset first threshold is a preset Map task count threshold or a preset ratio, and determining the adjusted number of Reduce tasks if the number of executed Map tasks meets the preset first threshold is specifically:
If the number of executed Map tasks meets the preset Map task count threshold, starting the adjustment mechanism for the number of Reduce tasks and determining the adjusted number of Reduce tasks; or,
If the ratio of the number of executed Map tasks to the total number of Map tasks meets the preset ratio, starting the adjustment mechanism for the number of Reduce tasks and determining the adjusted number of Reduce tasks, the total number of Map tasks being determined from the dataset to be processed.
4. A device for dynamically determining a MapReduce parameter, characterized by comprising:
A job request acquisition module, configured to obtain a MapReduce job request, the MapReduce job request comprising a dataset to be processed, a preset number of Reduce tasks, and a parameter indicating whether starting the adjustment mechanism for the number of Reduce tasks is allowed;
A monitoring module, configured to monitor the execution of Map tasks if the parameter indicating whether starting the adjustment mechanism for the number of Reduce tasks is allowed is set to allow;
A determination module, configured to determine the adjusted number of Reduce tasks if the number of executed Map tasks meets a preset first threshold, the output results of the executed Map tasks having been mapped to as many partitions as the preset number of Reduce tasks;
A mapping module, configured to map, according to the adjusted number of Reduce tasks, each unexecuted preset Reduce task to one of the adjusted Reduce tasks, so that the adjusted Reduce tasks are executed;
The determination module being specifically configured to:
Determine, from the total data volume of the unexecuted Reduce tasks, the adjusted number of Reduce tasks according to the following formula:
$$\frac{\partial T_R}{\partial n} + \frac{\partial T_{RS}}{\partial n} = 0;$$
Where $n$ is the adjusted number of Reduce tasks, $T_R$ (TR) is the time cost of executing one adjusted Reduce task, $T_{RS}$ (TRS) is the total time cost of executing all adjusted Reduce tasks, and $T_R$ depends on the total data volume of the unexecuted Reduce tasks and on the adjusted number of Reduce tasks.
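The mapping module of claim 4 assigns each unexecuted preset Reduce task, and therefore its partition, to one of the adjusted Reduce tasks. The claim does not say how partitions are distributed, so the round-robin rule in the Python sketch below is an assumption chosen for simplicity, as are the function and parameter names.

```python
from typing import Dict, List


def map_partitions(unexecuted_partitions: List[int],
                   adjusted_count: int) -> Dict[int, List[int]]:
    """Assign unexecuted preset partitions to adjusted Reduce tasks.

    Keys are adjusted Reduce task indices (0 .. adjusted_count - 1);
    values are the preset partition ids each adjusted task will process.
    Assumption: round-robin over the sorted partition ids.
    """
    if adjusted_count < 1:
        raise ValueError("adjusted_count must be at least 1")
    assignment: Dict[int, List[int]] = {i: [] for i in range(adjusted_count)}
    for position, partition_id in enumerate(sorted(unexecuted_partitions)):
        assignment[position % adjusted_count].append(partition_id)
    return assignment
```

If preset partitions 3 through 7 are still unexecuted and the adjusted count is 2, adjusted task 0 receives partitions 3, 5 and 7 while adjusted task 1 receives 4 and 6. A size-aware assignment that balances data volume per adjusted task would be a natural alternative.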
5. The device according to claim 4, characterized in that the MapReduce job request further comprises a preset second threshold for scheduling Reduce task execution in advance, and, if the second threshold is less than the first threshold, the monitoring module is specifically configured to:
If the number of executed Map tasks meets the second threshold but does not meet the preset first threshold, monitor the execution of the preset Reduce tasks, each preset Reduce task corresponding to a respective one of the partitions equal in number to the preset number of Reduce tasks;
And, after the number of executed Map tasks meets the preset first threshold, instruct that execution of the unexecuted preset Reduce tasks be stopped.
6. The device according to claim 4 or 5, characterized in that the preset first threshold is a preset Map task count threshold or a preset ratio, and the determination module is specifically configured to:
If the number of executed Map tasks meets the preset Map task count threshold, start the adjustment mechanism for the number of Reduce tasks and determine the adjusted number of Reduce tasks; or,
If the ratio of the number of executed Map tasks to the total number of Map tasks meets the preset ratio, start the adjustment mechanism for the number of Reduce tasks and determine the adjusted number of Reduce tasks, the total number of Map tasks being determined from the dataset to be processed.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310078507.4A CN103218263B (en) 2013-03-12 2013-03-12 The dynamic defining method of MapReduce parameter and device

Publications (2)

Publication Number Publication Date
CN103218263A CN103218263A (en) 2013-07-24
CN103218263B true CN103218263B (en) 2016-03-23

Family

ID=48816085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310078507.4A Active CN103218263B (en) 2013-03-12 2013-03-12 The dynamic defining method of MapReduce parameter and device

Country Status (1)

Country Link
CN (1) CN103218263B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103645952B (en) * 2013-08-08 2017-06-06 中国人民解放军国防科学技术大学 A kind of non-precision tasks in parallel processing method based on MapReduce
CN104598304B (en) * 2013-10-31 2018-03-13 国际商业机器公司 Method and apparatus for the scheduling in Job execution
CN104978228B (en) * 2014-04-09 2019-08-30 腾讯科技(深圳)有限公司 A kind of dispatching method and device of distributed computing system
JP6371870B2 (en) * 2014-06-30 2018-08-08 アマゾン・テクノロジーズ・インコーポレーテッド Machine learning service
CN105302536A (en) * 2014-07-31 2016-02-03 国际商业机器公司 Configuration method and apparatus for related parameters of MapReduce application
CN106484689B (en) * 2015-08-24 2019-09-03 杭州华为数字技术有限公司 Data processing method and device
CN107220247B (en) * 2016-03-21 2019-03-01 阿里巴巴集团控股有限公司 The control method and device that the end user task map polymerize in MR computing platform
CN107402952A (en) * 2016-05-20 2017-11-28 伟萨科技有限公司 Big data processor accelerator and big data processing system
CN108196970A (en) * 2017-12-29 2018-06-22 东软集团股份有限公司 The dynamic memory management method and device of Spark platforms
CN110209645A (en) * 2017-12-30 2019-09-06 ***通信集团四川有限公司 Task processing method, device, electronic equipment and storage medium
CN110795301A (en) * 2018-08-01 2020-02-14 马上消费金融股份有限公司 Job monitoring method, device, terminal and computer storage medium
CN110222105B (en) * 2019-05-14 2021-06-29 联动优势科技有限公司 Data summarization processing method and device
CN110413396B (en) * 2019-07-30 2022-02-15 广东工业大学 Resource scheduling method, device and equipment and readable storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101770402A (en) * 2008-12-29 2010-07-07 ***通信集团公司 Map task scheduling method, equipment and system in MapReduce system
CN102096603A (en) * 2009-12-14 2011-06-15 ***通信集团公司 Task decomposition control method in MapReduce system and scheduling node equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120304186A1 (en) * 2011-05-26 2012-11-29 International Business Machines Corporation Scheduling Mapreduce Jobs in the Presence of Priority Classes
US9063790B2 (en) * 2011-06-13 2015-06-23 Accenture Global Services Limited System and method for performing distributed parallel processing tasks in a spot market

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A Practical Performance Model for Hadoop MapReduce;Xuelian Lin等;《2012 IEEE International Conference on Cluster Computing Workshops》;20120928;第231-239页 *
Predator - An Experience Guided Configuration Optimizer for Hadoop MapReduce;Kewen Wang等;《2012 IEEE 4th International Conference on Cloud Computing Technology and Science》;20121206;第419-426页 *
一种改进的MapReduce并行编程模型;周锋;《科协论坛(下半月)》;20091231(第02期);第65-66页 *
基于MapReduce的封闭立方体并行计算方法;奚建清;《华南理工大学学报(自然科学版)》;20091231;第37卷(第1期);第91-95页 *

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant