CN104281492A - Fair Hadoop task scheduling method in heterogeneous environment

Info

Publication number: CN104281492A
Application number: CN201310283998.6A
Authority: CN
Inventors: 李千目; 侯君; 魏士祥
Original assignee: Nanjing University of Science and Technology; Wuxi Nanligong Technology Development Co Ltd
Current assignee: Nanjing University of Science and Technology; Wuxi Nanligong Technology Development Co Ltd
Priority date: 2013-07-08
Filing date: 2013-07-08
Publication date: 2015-01-14

Abstract

The invention discloses a fair Hadoop task scheduling method for heterogeneous environments. The method first decides whether a large job should be executed. If so, it calls the job scheduling algorithm to select a suitable job and then calls the task scheduling algorithm. If not, it calls the resource pool scheduling algorithm, then the job scheduling algorithm to select a suitable job, and finally the task scheduling algorithm. Compared with the prior art, the method supports large-memory jobs, shortens the response time of a single job, and makes resource allocation fairer in heterogeneous cluster environments.

Description

A fair Hadoop task scheduling method for heterogeneous environments

Technical field

The invention belongs to the field of computer task scheduling, and in particular relates to a fair Hadoop task scheduling method for heterogeneous environments.

Background

With the rapid development of information technology, the internet and the mobile internet have given people everywhere a rich, connected life. While enjoying this convenience, people also generate large amounts of data, and we have entered the era of big data. How to process this massive volume of data is a key task for many companies.

Among distributed programming models, Google proposed the Map/Reduce model. With the map and reduce framework, large-scale distributed computation can be carried out on a distributed cluster with very high stability. (Dean J. and Ghemawat S. MapReduce: simplified data processing on large clusters[J], Commun. ACM, 51, 1, pp. 107-113, 2008.)

Inspired by Google's work, Doug Cutting began developing an open-source system, Hadoop. Hadoop is the umbrella name of the project, which consists mainly of HDFS and MapReduce. HDFS is an open-source implementation of the Google File System (GFS), and MapReduce is an open-source implementation of Google MapReduce.

This distributed architecture is highly creative and very scalable, and it gives Google a strong competitive edge in system throughput. The Apache Foundation therefore implemented an open-source version in Java that supports Linux platforms such as Fedora and Ubuntu. Hadoop implements the HDFS file system and MapReduce. A user only needs to extend MapReduceBase, provide two classes that implement Map and Reduce respectively, and register a Job; the job then runs in a distributed manner automatically.

Because Hadoop is open source and performs extremely well, more and more companies have chosen it as their data analysis platform. Abroad, companies such as Facebook use it to analyze the data produced every day; in China, Taobao uses it to analyze user data and has derived much useful business data from it.

Task scheduling on Hadoop, however, is a difficult problem. Major companies generally customize their task scheduling algorithms to their own internal needs. By default Hadoop provides a time-based FIFO algorithm; the capacity scheduling algorithm (Capacity-Scheduler) was proposed later, and afterwards Facebook proposed the fair scheduling algorithm (Fair-scheduler).

Summary of the invention

1. Object of the invention.

To solve the problems in the prior art that large jobs are not supported, that the response time of a single job is too long, and that resource allocation is unfair in heterogeneous cluster environments, the invention proposes a fair Hadoop task scheduling method for heterogeneous environments.

2. Technical solution adopted by the invention.

The technical solution that achieves the object of the invention is as follows.

The steps of the fair Hadoop task scheduling method for heterogeneous environments are:

Step 1: read the configuration information

Initialize the system information that the scheduling method needs, inheriting the classes defined by the Fair Scheduler: resource pool, minimum share, weight, and so on. Each resource pool corresponds to a user group, and all jobs submitted by that user group are placed in that pool. The minimum share is the minimum number of slots a resource pool needs to occupy in order to work normally; the weight expresses the priority that each resource pool or job has when obtaining resources;

Step 2: start the update thread

The update thread is mainly responsible for updating: it updates the job state of every group in the current system, updates how many Map slots and Reduce slots each group needs, and updates the fair share (Fairshare) and the reallocation amount of each group. If preemption is allowed, it preempts according to the current state of the system;

Step 3: decide whether to execute a large job

If a task executor in the system has a free slot, check whether the system currently has a job with a large memory requirement that needs scheduling. If there is a large-memory job and the priority of the large-memory job group has reached its maximum value, perform Step 4; otherwise perform Step 5;

Step 4: using the job scheduling method, select a job to allocate from the job queue of the large-memory job group, and check whether the current task executor is suitable for running the large-memory job. If it is not suitable, select the next job, and so on until a suitable job is chosen. If there is no suitable job, proceed to Step 5. If a suitable job has been chosen, call the task allocation algorithm and allocate the current slot to the task; the scheduling method then ends the current allocation;

Step 5: if no large-memory job could be allocated, call the resource pool scheduling algorithm to select a suitable job group from the current resource pools, then call the job scheduling method to select a suitable job from the job set of that group, and finally call the task allocation algorithm to select a task and allocate the slot to it; the scheduling method then ends the current allocation.
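
The control flow of Steps 3 to 5 can be sketched as a single slot-assignment routine. The Python sketch below is illustrative only: the names `Job`, `Pool`, `Executor` and `assign_slot`, the memory-based fit test, and the `pick_pool` and `order_jobs` callbacks (which stand in for the resource pool scheduling algorithm and the job scheduling method) are assumptions made for the example, not part of the patent text or of Hadoop's API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Job:
    name: str
    memory_mb: int          # hypothetical per-task memory requirement
    pending_tasks: int = 1  # tasks that still need a slot


@dataclass
class Pool:
    name: str
    min_share: int          # minimum number of slots the pool needs
    weight: float           # priority of the pool when obtaining resources
    jobs: List[Job] = field(default_factory=list)


@dataclass
class Executor:
    name: str
    free_memory_mb: int     # hypothetical measure of what the node can host


def assign_slot(
    executor: Executor,
    large_pool: Pool,
    pools: List[Pool],
    max_weight: float,
    pick_pool: Callable[[List[Pool]], Pool],
    order_jobs: Callable[[List[Job]], List[Job]],
) -> Optional[Job]:
    """Assign one free slot on `executor`; return the job that received it."""
    # Step 3: a large-memory job is waiting and its group has reached
    # the maximum priority.
    if large_pool.jobs and large_pool.weight >= max_weight:
        # Step 4: walk the large-memory queue until a job fits this executor.
        for job in order_jobs(large_pool.jobs):
            if job.pending_tasks > 0 and job.memory_mb <= executor.free_memory_mb:
                job.pending_tasks -= 1
                return job
    # Step 5: no large-memory job was allocated, so fall back to the
    # ordinary pools: choose a pool, then a job inside it.
    candidates = [p for p in pools if any(j.pending_tasks > 0 for j in p.jobs)]
    if not candidates:
        return None
    pool = pick_pool(candidates)
    for job in order_jobs(pool.jobs):
        if job.pending_tasks > 0:
            job.pending_tasks -= 1
            return job
    return None
```

Passing the pool and job orderings in as callbacks keeps the top-level flow separate from the two ranking algorithms, which the description specifies separately.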

3. Beneficial effects of the invention.

Compared with the prior art, the invention has the following notable advantages: (1) it supports large-memory jobs; (2) it shortens the response time of a single job; (3) it makes resource allocation fairer in heterogeneous cluster environments.

The invention is described in further detail below with reference to the accompanying drawing.

Brief description of the drawing

Fig. 1 is a flow chart of a program written with the MapReduce model.

Embodiment

The invention relates to a fair Hadoop task scheduling method for heterogeneous environments, with the following steps:

Step 1: read the configuration information

Step 2: start the update thread

Step 3: decide whether to execute a large job

The concrete steps of the job scheduling algorithm are as follows:

Step a: compute a weight for each job in the resource pool according to the following formula: [formula missing], where [symbol missing] is the time the job has waited from its submission to the present, and [formula missing], where Remainer is the number of remaining tasks of the job;

Step b: sort the jobs by weight;

Step c: select jobs for scheduling in order from the head of the job queue;
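
Steps a to c amount to "weigh, sort, take from the head". Because the weight formula itself is not reproduced in this text, the Python sketch below takes the weight function as a parameter. The function `placeholder_weight` is a purely hypothetical stand-in that combines the two quantities the text names (waiting time and remaining task count); it is not the patented formula. The names `JobInfo` and `order_jobs`, and the choice to put the largest weight at the head of the queue, are likewise assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class JobInfo:
    name: str
    wait_seconds: float   # time waited from submission until now
    remaining_tasks: int  # "Remainer" in the text


def order_jobs(
    jobs: Iterable[JobInfo],
    weight_of: Callable[[JobInfo], float],
) -> List[JobInfo]:
    """Steps a and b: weigh every job, then sort the queue by weight.

    Assumption: the job with the largest weight goes to the head.
    """
    return sorted(jobs, key=weight_of, reverse=True)


def placeholder_weight(job: JobInfo) -> float:
    # HYPOTHETICAL formula, for illustration only: favour jobs that have
    # waited longer and have fewer tasks left. The real formula is not
    # available in the source text.
    return job.wait_seconds / (1 + job.remaining_tasks)


queue = order_jobs(
    [
        JobInfo("a", wait_seconds=100, remaining_tasks=9),
        JobInfo("b", wait_seconds=30, remaining_tasks=0),
        JobInfo("c", wait_seconds=200, remaining_tasks=99),
    ],
    placeholder_weight,
)
# Step c: jobs are then taken for scheduling from the head of the queue.
head = queue[0]
```

Any weight function with the same signature can be substituted without touching the sorting or selection logic.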

Step 5: if no large-memory job could be allocated, call the resource pool scheduling algorithm to select a suitable job group from the current resource pools, then call the job scheduling method to select a suitable job from the job set of that group, and finally call the task allocation algorithm to select a task and allocate the slot to it; the scheduling method then ends the current allocation;

The resource pool scheduling algorithm is as follows:

Step a: obtain the numbers of running tasks, t1 and t2, in the two resource pools being compared, together with the executors tt1 and tt2 that run these tasks;

Step b: for each of the two resource pools, obtain its minimum share and its demand, and take the smaller of the two values as minShare1 and minShare2 respectively;

Step c: compare t1 with minShare1, and likewise t2 with minShare2. If the minimum share of each of the two pools is smaller than its number of tasks, both pools need slots. The formula [formula missing], in which [symbol missing] is the CPU clock frequency and [symbol missing] is the memory size, is then used to calculate the weight of the slots currently occupied in each resource pool. The slot is allocated to the resource pool with the smaller weight, which is placed at the front of the list;

Step d: if the minimum share of only one of the two pools is smaller than its task count, only that pool needs a slot, so the slot is allocated to that pool, which is placed at the front of the list.
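
The pairwise comparison in Steps a to d can be written as a comparator and used to sort the pools. The Python sketch below rests on several assumptions. The slot-weight formula is not reproduced in this text, so the weight of the slots a pool occupies is supplied as a precomputed field, `slot_weight`. The neediness test follows the wording above literally (effective minimum share smaller than the task count) and should be inverted if the opposite test is intended. Pools that do not need a slot keep their relative order, a case the text does not cover. All names (`PoolState`, `needs_slot`, `compare_pools`, `rank_pools`) are hypothetical.

```python
from dataclasses import dataclass
from functools import cmp_to_key
from typing import List


@dataclass
class PoolState:
    name: str
    running_tasks: int  # t1 / t2 in the text
    min_share: int      # configured minimum share
    demand: int         # slots the pool currently asks for
    slot_weight: float  # weight of the slots it occupies (formula not given)


def effective_min_share(pool: PoolState) -> int:
    # Step b: the smaller of the minimum share and the demand.
    return min(pool.min_share, pool.demand)


def needs_slot(pool: PoolState) -> bool:
    # ASSUMPTION: literal reading of the text, i.e. the effective minimum
    # share is smaller than the number of tasks.
    return effective_min_share(pool) < pool.running_tasks


def compare_pools(a: PoolState, b: PoolState) -> int:
    a_needy, b_needy = needs_slot(a), needs_slot(b)
    if a_needy and b_needy:
        # Step c: both need slots; the pool with the smaller slot weight
        # goes to the front of the list.
        return (a.slot_weight > b.slot_weight) - (a.slot_weight < b.slot_weight)
    if a_needy != b_needy:
        # Step d: only one pool needs a slot; it goes to the front.
        return -1 if a_needy else 1
    # Neither pool needs a slot: not specified in the text, keep the order.
    return 0


def rank_pools(pools: List[PoolState]) -> List[PoolState]:
    return sorted(pools, key=cmp_to_key(compare_pools))


ranked = rank_pools(
    [
        PoolState("A", running_tasks=5, min_share=2, demand=10, slot_weight=3.0),
        PoolState("B", running_tasks=4, min_share=3, demand=8, slot_weight=1.5),
        PoolState("C", running_tasks=1, min_share=4, demand=6, slot_weight=0.5),
    ]
)
```

The pool at the head of `ranked` is the one that would receive the free slot under these assumptions.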

The embodiment described above does not limit the invention in any way; every technical solution obtained by equivalent replacement or equivalent transformation falls within the scope of protection of the invention.

Claims

1. A fair Hadoop task scheduling method for heterogeneous environments, characterized in that the steps are as follows:

Step 1: read the configuration information

Step 2: start the update thread

The update thread is mainly responsible for updating: it updates the job state of every group in the current system, updates how many Map slots and Reduce slots each group needs, and updates the fair share (Fairshare) and the reallocation amount of each group. If preemption is allowed, it preempts according to the current state of the system;

Step 3: decide whether to execute a large job

2. The fair Hadoop task scheduling method for heterogeneous environments according to claim 1, characterized in that the concrete steps of the job scheduling algorithm in said Step 4 are as follows:

Step 2.1: compute a weight for each job in the resource pool according to the following formula: [formula missing], where [symbol missing] is the time the job has waited from its submission to the present, and [formula missing], where Remainer is the number of remaining tasks of the job;

Step 2.2: sort the jobs by weight;

Step 2.3: select jobs for scheduling in order from the head of the job queue.

3. The fair Hadoop task scheduling method for heterogeneous environments according to claim 1 or 2, characterized in that the resource pool scheduling algorithm in said Step 5 is as follows:

Step 3.1: obtain the numbers of running tasks, t1 and t2, in the two resource pools being compared, together with the executors tt1 and tt2 that run these tasks;

Step 3.2: for each of the two resource pools, obtain its minimum share and its demand, and take the smaller of the two values as minShare1 and minShare2 respectively;

Step 3.3: compare t1 with minShare1, and likewise t2 with minShare2; if the minimum share of each of the two pools is smaller than its number of tasks, both pools need slots, and the formula [formula missing], in which [symbol missing] is the CPU clock frequency and [symbol missing] is the memory size, is then used to calculate the weight of the slots currently occupied in each resource pool; the slot is allocated to the resource pool with the smaller weight, which is placed at the front of the list;

Step 3.4: if the minimum share of only one of the two pools is smaller than its task count, only that pool needs a slot, so the slot is allocated to that pool, which is placed at the front of the list.