CN114077486B - MapReduce task scheduling method and system - Google Patents

Info

Publication number
CN114077486B
CN114077486B (application CN202111386374.8A)
Authority
CN
China
Prior art keywords
job
resource
task
resources
central
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111386374.8A
Other languages
Chinese (zh)
Other versions
CN114077486A (en)
Inventor
高永强
张凯丰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inner Mongolia University
Original Assignee
Inner Mongolia University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inner Mongolia University filed Critical Inner Mongolia University
Priority to CN202111386374.8A
Publication of CN114077486A
Application granted
Publication of CN114077486B

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 — Arrangements for program control, e.g. control units
    • G06F 9/06 — Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 — Multiprogramming arrangements
    • G06F 9/48 — Program initiating; program switching, e.g. by interrupt
    • G06F 9/4806 — Task transfer initiation or dispatching
    • G06F 9/4843 — Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 — Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F 9/50 — Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 — Allocation of resources to service a request
    • G06F 9/5011 — Allocation of resources to service a request, the resources being hardware resources other than CPUs, servers and terminals
    • G06F 9/5016 — Allocation of resources to service a request, the resource being the memory
    • G06F 9/5027 — Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 9/5038 — Allocation of resources considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G06F 9/5044 — Allocation of resources considering hardware capabilities

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Multi Processors (AREA)

Abstract

The invention provides a MapReduce task scheduling method and system that introduce a Docker-container-based preemption mechanism, overcoming the drawback of Yarn's existing kill-based preemption, which terminates tasks outright. The container-based mechanism releases the resources a task occupies while preserving its progress; combined with a service-level-agreement-aware task policy, it lets high-priority tasks preempt the running resources of other tasks so that task completion times meet the Service Level Agreement (SLA) target. The scheduling method maintains high cluster resource utilization while also providing low latency and fast response for tasks.

Description

MapReduce task scheduling method and system
Technical Field
The invention relates to the technical field of task scheduling in heterogeneous cluster environments, in particular to a MapReduce task scheduling method and system.
Background
At present, with the development of internet technology, the volume of data that must be computed and processed in daily production and life keeps growing, and processing large-scale data with distributed computing systems has become common practice. The scheduler is a vital part of such a distributed system: a well-designed scheduling strategy can efficiently match program requirements to available cluster resources and reduce the operating costs of the data center. The most widely used distributed computing framework today is Apache's flagship project Hadoop, whose programming and computing framework is MapReduce. Hadoop abstracts its resource management part into an independent framework, Yarn, a universal resource management platform that provides computing programs such as MapReduce with the resources they need.
At present, Yarn implements three schedulers based on different scheduling strategies: a first-in first-out scheduler, a capacity scheduler and a fair scheduler. Although these three strategies improve cluster utilization and optimize cluster performance to a certain extent, scheduling jobs with different resource requirements and QoS constraints in a complex heterogeneous cluster environment remains an open challenge. By completion time, jobs can be classified into short jobs and long jobs. Short jobs generally have low-latency requirements, while long jobs can tolerate higher latency but have quality-of-service requirements. Short jobs therefore need to be scheduled immediately after they are submitted, to avoid queuing delays. For long jobs, if the cluster has free resources, the scheduler should let them use those resources, which increases the cluster's resource utilization.
In a real working environment, many long jobs and short jobs are usually scheduled together, and existing solutions either forcibly terminate running long jobs to guarantee low latency for short jobs, or forbid resource preemption entirely to improve cluster resource utilization. Such simple scheduling strategies cannot handle jobs with varied resource requirements in a complex heterogeneous environment. Our goal is to strike a trade-off between resource utilization and job queuing delay: minimize job queuing delay while improving hardware resource utilization and performance, thereby achieving the service level agreement goal.
In view of this problem, there is an urgent need to develop a new scheduling strategy to meet the needs of actual work.
Disclosure of Invention
Aiming at the problems existing in the prior art, the invention provides a MapReduce task scheduling method and system for heterogeneous Yarn cluster environments that maintain high cluster resource utilization while also providing low latency and fast response for jobs.
The MapReduce task scheduling method provided by the invention comprises the following steps:
S1: a client creates a JobSubmitter instance, computes the input splits of the job through the JobSubmitter's internal methods, copies the resources required to run the job into a distributed file system, and submits the MapReduce job to the resource scheduler;
S2: after receiving the job submission message, the resource scheduler forwards the request to the central resource scheduler, which analyzes the job's detailed information through an internal job analysis method and derives the latest deadline by which the service level agreement can still be met;
S3: the central resource scheduler adds the new task to a central task queue and reorders all tasks by deadline, from nearest to farthest;
S4: the central resource scheduler receives heartbeat information from the node resource schedulers, obtains the number of tasks assigned to each node resource scheduler, selects in turn the node with the fewest tasks, and assigns to it the task with the nearest current deadline for execution;
S5: after receiving the new task, the node resource scheduler adds it to a local task queue and reorders the queue by deadline;
S6: the node resource scheduler checks the position of the new task in the task queue, and if the new task's deadline is nearer than that of the currently executing task, the new task preempts the executing task.
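The six steps above amount to earliest-deadline-first dispatch from a central queue to the least-loaded node. A minimal sketch (all class and field names are illustrative, not from the patent):

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Task:
    deadline: float              # SLA-derived latest finish time (seconds)
    name: str = field(compare=False)

class CentralQueue:
    """Earliest-deadline-first central task queue (steps S3-S4)."""
    def __init__(self):
        self._heap = []

    def add(self, task):
        heapq.heappush(self._heap, task)   # keeps the nearest deadline on top

    def dispatch(self, node_task_counts):
        """Pick the least-loaded node and hand it the most urgent task."""
        if not self._heap:
            return None
        node = min(node_task_counts, key=node_task_counts.get)
        task = heapq.heappop(self._heap)
        node_task_counts[node] += 1
        return node, task

q = CentralQueue()
q.add(Task(deadline=120.0, name="job-a"))
q.add(Task(deadline=60.0, name="job-b"))
node, task = q.dispatch({"n1": 3, "n2": 1})
# node == "n2" (fewest tasks), task.name == "job-b" (nearest deadline)
```

Steps S5-S6 would repeat the same deadline ordering inside each node's local queue.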
Further, in step S3, the central resource scheduler obtains the total CPU resource amount C and the total memory resource amount M, derives the job share of long jobs from the number of jobs, and periodically calculates the resource share of each job in the central task queue according to a fairness principle.
Further, in step S4, after receiving a resource request of a job, the central resource scheduler analyzes whether the job can be completed before its deadline by combining the deadline constraint, the cluster's resource condition and the job's resource requirements; if it determines that the job can be completed before its deadline, the job is added to the central task queue; otherwise, the central resource scheduler refuses to execute the job.
Further, in step S4, when a job arrives, the central resource scheduler determines the current amount of cluster resources from the heartbeat information sent by each node resource scheduler and estimates the amount of resources the job requests from its execution history logs; if the job has never been executed on the cluster, the scheduler runs it on a small portion of the original data set as a pre-test set.
Further, in step S4, if the amount of resources requested by the job does not exceed the amount of resources available in the cluster, the central resource scheduler adds the job to the central task queue;
otherwise, two cases must be distinguished: in the first case, if the job, executed immediately by preempting resources directly from currently running jobs, can still be completed in time, the job is executed;
in the second case, where the job cannot meet its deadline even by preempting the resources of other jobs, the central resource scheduler refuses outright to execute it.
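The admission decision described above can be sketched as a three-way test; the resource quantities are abstracted to single numbers and all names are illustrative:

```python
def admit(job_req, cluster_free, reclaimable, can_finish_by_deadline):
    """Deadline-aware admission test (step S4): enqueue, enqueue with
    preemption, or reject. A sketch of the patent's three-way decision."""
    if job_req <= cluster_free:
        return "enqueue"                      # fits in the free resources
    if job_req <= cluster_free + reclaimable and can_finish_by_deadline:
        return "enqueue-with-preemption"      # can run by preempting others
    return "reject"                           # would miss its deadline anyway

assert admit(4, 8, 0, True) == "enqueue"
assert admit(10, 8, 4, True) == "enqueue-with-preemption"
assert admit(10, 8, 1, True) == "reject"
```

Rejecting a job that cannot meet its deadline even with preemption avoids wasting cluster resources on a guaranteed SLA miss.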
Further, in step S4, the amount of resources to preempt under the service level agreement is determined by the following scheme:
the reduce tasks start executing once W map tasks have finished. T_up denotes the upper bound on the execution time of these W map tasks and is computed from M_avg, the average execution time of a map task in job j, m, the number of map tasks in job j, and M_max, the maximum execution time of a map task in job j.
When Q jobs can finish before the time limit T_up, the amount of resources released after these jobs complete is R, computed over each job j from n_j^reduce, the number of reduce tasks of job j.
The amount of resources required for the reduce phase is E, computed from C_r, the amount of resources available in the cluster at the next time instant, and r_j^map, the amount of resources required by the map tasks of job j.
Further, in step S6, when job preemption is required, the resource share of the candidate job k to be preempted is calculated as the ratio of the resources job k actually occupies during execution to the amount it should receive under the fair resource allocation principle; the resource share requested by the job j waiting to execute is then obtained and compared with it.
Further, in step S6, if the comparison shows that preemption is warranted, the amount of resources to preempt is calculated as follows: the CPU and memory resources are first compared and divided into a main resource and a secondary resource, and the recovery amount of the secondary resource is then derived in proportion to the recovery amount of the main resource, where:
C_j and M_j respectively denote the CPU and memory resource amounts requested by job j, and C_a and M_a respectively denote the CPU and memory resource amounts that the current job k additionally occupies in the cluster;
if the CPU resources requested by job j dominate, CPU is taken as the main resource: all CPU resources additionally occupied by job k are preempted, and the memory it additionally occupies is preempted in proportion; otherwise, memory is taken as job j's main resource, all memory resources additionally occupied by job k are preempted, and the CPU resources it additionally occupies are preempted in proportion.
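The dominant-resource split described above can be sketched as follows. The patent's exact formulas are rendered as figures and are not reproduced here; this sketch only follows the prose rule (take all of the main resource job k additionally occupies, reclaim the secondary resource in proportion), and all names are illustrative:

```python
def preemption_amounts(c_j, m_j, c_a, m_a):
    """Split preemption between CPU and memory (step S6).
    c_j, m_j: CPU / memory requested by the waiting job j;
    c_a, m_a: CPU / memory that job k occupies beyond its fair share."""
    if c_j / m_j >= c_a / m_a:
        # CPU is job j's dominant demand: take all extra CPU of job k,
        # and memory scaled by job j's own CPU:memory ratio
        return c_a, min(m_a, c_a * m_j / c_j)
    else:
        # memory dominates: take all extra memory, scale the CPU part
        return min(c_a, m_a * c_j / m_j), m_a

cpu, mem = preemption_amounts(c_j=8, m_j=4, c_a=2, m_a=6)
# CPU dominant (8/4 >= 2/6): preempt both extra CPUs and 2*4/8 = 1.0 memory
```

Capping with `min` keeps the proportional share from exceeding what job k actually over-occupies.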
Further, the MapReduce task scheduling method is performed according to a scheduling policy based on a service level agreement, the steps of which comprise:
when job j arrives, analyzing the expiration date, required throughput and required resource amount of the job;
the central resource scheduler analyzes whether the current resource quantity of the cluster meets the resource demand quantity of the job j, if so, the job j is added into the central task queue;
if not, judging whether the resource quantity of the cluster can meet the resource demand quantity of the map task of the job j and whether the released resource after the execution of the map task is finished can meet the resource demand quantity of the reduce task;
if the two conditions are met, adding the job j into a central task queue, and marking the job j as high priority, so that the job j can occupy the resources of other jobs in the execution process; if the two conditions cannot be met simultaneously, the central resource scheduler refuses to execute the job j;
the central resource scheduler sorts the jobs in the central task queue according to the expiration date, and respectively traverses each job; for the job j in the central task queue, the central resource scheduler judges whether the map task of the job j is completely executed, if not, the priority of the job j is judged, if the job is a high-priority job, the job is immediately communicated with the node resource scheduler, the designated resource is preempted from the cluster to execute the map task of the job j, otherwise, the cluster is waited to generate idle resources and the map task of the job j is allocated;
the node resource scheduler reports the task execution state to the central resource scheduler through heartbeat information; if the map tasks have all been executed, the central resource scheduler determines whether the number of map tasks that have finished executing exceeds the threshold W; if it does, the reduce tasks of job j start to execute and the priority of job j is checked: if job j is a high-priority job, the job preemption is carried out together with the node resource scheduler; otherwise, job j waits for idle resources to be allocated.
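One pass of the SLA-aware policy over the central queue might look like the following sketch, which returns symbolic actions instead of communicating with node resource schedulers; all field and action names are illustrative:

```python
def schedule_round(queue, done_maps, threshold_w):
    """One traversal of the central queue in deadline order:
    jobs still in the map phase either preempt (high priority) or wait;
    jobs past the map threshold W move on to their reduce tasks."""
    actions = []
    for job in sorted(queue, key=lambda j: j["deadline"]):
        if not job["maps_done"]:
            actions.append((job["id"],
                            "preempt-and-map" if job["high_prio"] else "wait-map"))
        elif done_maps[job["id"]] >= threshold_w:
            actions.append((job["id"],
                            "preempt-and-reduce" if job["high_prio"] else "wait-reduce"))
    return actions

jobs = [{"id": "a", "deadline": 50, "maps_done": True,  "high_prio": True},
        {"id": "b", "deadline": 20, "maps_done": False, "high_prio": False}]
acts = schedule_round(jobs, {"a": 5, "b": 0}, threshold_w=3)
# b is traversed first (nearest deadline) and waits for map slots;
# a has 5 >= 3 completed maps and, being high priority, preempts for reduce
```

In the real system each action would be a message to the node resource scheduler chosen in step S4.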
The invention also provides a MapReduce task scheduling system adopting the MapReduce task scheduling method, which comprises the following steps: the distributed data center cluster comprises a center resource scheduler and a plurality of node resource schedulers;
a central task queue is maintained in the central resource scheduler, and when a new job arrives, the central resource scheduler analyzes the job characteristics to obtain the running time and the expiration date of the job;
an executing task queue and a suspended task queue, both ordered by deadline, are maintained in the node resource scheduler, which continuously reports the task information and resource usage on its node to the central resource scheduler through a heartbeat mechanism.
The invention provides a MapReduce task scheduling method for heterogeneous Yarn cluster environments. The central resource scheduler runs as a daemon on the ResourceManager. It receives task information sent by the node resource schedulers, periodically checks the current task scheduling policy and the resource availability of each working node, and obtains the resource requirements of newly arrived tasks; from this it infers which queues occupy surplus resources and which are under-allocated, calculates the amount of resources to preempt, derives an optimal resource allocation scheme for the task queues in each time period, and sends the scheduling decisions to the node resource schedulers for execution.
The scheduling method creatively introduces a Docker-container-based preemption mechanism, overcoming the drawback of Yarn's existing kill-based preemption, which terminates tasks outright. The container-based mechanism releases the resources a task occupies while preserving its progress; combined with a service-level-agreement-aware task policy, it lets high-priority tasks preempt the running resources of other tasks so that task completion times meet the Service Level Agreement (SLA) target. The scheduling method thus maintains high cluster resource utilization while also providing low latency and fast response for tasks.
Drawings
The invention will be described in more detail hereinafter on the basis of embodiments and with reference to the accompanying drawings. Wherein:
fig. 1 is a schematic flow chart of a MapReduce task scheduling method in the present invention;
fig. 2 is a system architecture diagram of a MapReduce task scheduling method in the present invention;
fig. 3 is an example deployment diagram of a MapReduce task scheduling method in the present invention.
Detailed Description
In order to clearly illustrate the inventive content of the present invention, the present invention will be described below with reference to examples.
In the description of the present invention, it should be noted that the positional or positional relationship indicated by the terms such as "upper", "lower", "horizontal", "top", "bottom", etc. are based on the positional or positional relationship shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the apparatus or element in question must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Referring to fig. 1-3, the central resource scheduler in the present invention is a daemon running on the ResourceManager. It receives task information sent by the node resource schedulers, periodically checks the current task scheduling policy and the resource availability of each working node, and obtains the resource requirements of newly arrived tasks; from this it infers which queues occupy surplus resources and which are under-allocated, calculates the amount of resources to preempt, derives an optimal resource allocation scheme for the task queues in each time period, and sends the scheduling decisions to the node resource schedulers for execution.
The node resource scheduler is a daemon running on a worker NodeManager. It integrates Docker containers with the Yarn framework, replacing native Yarn's preemption mode of directly killing the task container. After receiving a task request, the node resource scheduler loads the task into a Docker container and configures the container according to the task's resource request. The node resource scheduler is also responsible for container suspension and container recovery, suspending or resuming containers as required.
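Container suspension and recovery of this kind can be realized with the standard `docker pause` / `docker unpause` CLI commands, which freeze a container's processes while keeping their in-memory state. A sketch of how a node resource scheduler might wrap them (the function names and the `dry_run` flag are illustrative; the patent does not specify the exact mechanism):

```python
import subprocess

def _docker(cmd, container, dry_run=False):
    """Build (and, unless dry_run, execute) a docker CLI invocation."""
    argv = ["docker", cmd, container]
    if not dry_run:
        subprocess.run(argv, check=True)  # requires a running Docker daemon
    return argv

def suspend_task(container, dry_run=False):
    # "docker pause" freezes the container's cgroup: the task keeps its
    # progress in memory while its CPU is yielded to the preempting task
    return _docker("pause", container, dry_run)

def resume_task(container, dry_run=False):
    # "docker unpause" thaws the cgroup; the task resumes where it stopped
    return _docker("unpause", container, dry_run)
```

In a real deployment the node resource scheduler would track which container backs which task and pause the victim before launching the preemptor.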
In the actual job scheduling process, the specific job to be preempted must be determined before the preemption operation; the invention therefore designs a preemptive job scheduling strategy that guarantees quality of service (QoS) to the greatest extent and satisfies the service level agreement (SLA). The idea of the strategy is to preferentially execute the job with the earliest deadline, which minimizes the number of jobs that miss their deadlines and greatly improves job execution. Specifically, after receiving a resource request of a job, the central resource scheduler analyzes whether the job can be completed before its deadline based on the deadline constraint, the cluster's resource condition and the job's resource requirements. If it determines that the job can be completed before its deadline, the job is added to the job queue; otherwise, it refuses to execute the job.
When a job j arrives, the central resource scheduler determines the current amount of cluster resources from the heartbeat information sent by each node resource scheduler and estimates the amount of resources job j requests from its execution history logs; if job j has never been executed on the cluster, the scheduler runs it on a small portion of the original data set as a pre-test set to obtain the job's performance indicators. Let r_j denote the total amount of resources job j needs, r_j^map the amount required by its map tasks, and r_j^reduce the amount required by its reduce tasks; C_r denotes the amount of resources available in the cluster at the next time instant. If the amount of resources requested by the job does not exceed the amount available in the cluster, i.e. r_j ≤ C_r, the central resource scheduler adds job j to the job queue. Otherwise, when r_j > C_r, two cases must be distinguished: in the first case, if job j, executed immediately by preempting resources directly from the currently running job k, can still be completed in time, job j is executed; in the second case, where job j cannot meet its deadline even by preempting the resources of other jobs, the central resource scheduler refuses outright to execute it.
The MapReduce task scheduling method is performed according to a scheduling strategy based on a Service Level Agreement (SLA), and the specific deployment mode based on the Service Level Agreement (SLA) comprises the following steps:
step 1: and integrating the central resource scheduler provided by the invention in the resource manager nodes in the Yarn cluster, and integrating the node resource schedulers provided by the invention by the rest NodeManager nodes.
Step 2: when a user submits a batch of jobs to the cluster for allocation, the central resource scheduler analyzes each job, extracting the size of its input data, the CPU, memory and other resources it needs, and the deadline specified by the user.
Step 3: the central resource scheduler collects node state information sent from each node resource scheduler, and counts the execution progress of the currently executing job and the utilization rate of various resources in the cluster.
Step 4: the central resource scheduler combines the cluster's currently available resources with the characteristics of the jobs to be executed and analyzes whether the current cluster resources can meet the resource demand of job j. If so, job j is added to the job queue to be executed; otherwise, it judges whether the cluster resources can meet the resource demand of job j's map tasks, and whether the resources released after the map tasks finish can meet the resource demand of the reduce tasks. If both conditions are met, job j is added to the job queue and marked as high priority.
Step 5: the central resource scheduler sorts the jobs in the job queue according to the expiration date, and traverses each job respectively. For the job j in the job queue, the central resource scheduler judges whether the map task of the job j is completely executed, if not, the priority of the job j is judged, if the job is a high-priority job, the job is immediately communicated with the node resource scheduler, the designated resource is preempted from the cluster to execute the map task of the job j, otherwise, the cluster is waited to generate idle resources and the map task of the job j is allocated;
Step 6: the node resource scheduler reports the state of task execution to the central resource scheduler via heartbeat information. If the map tasks have all been executed, the central resource scheduler determines whether the number of map tasks that have finished executing exceeds the threshold W. If it does, the reduce tasks of job j start to execute and the priority of job j is checked: if job j is a high-priority job, the preemption is carried out together with the node resource scheduler; otherwise, job j waits for idle resources to be allocated.
Based on the Service Level Agreement (SLA) scheduling policy, the scheduling method in the invention lets high-priority tasks preempt the running resources of other tasks and ensures that job completion times meet the SLA target, while maintaining high cluster resource utilization and providing low latency and fast response for jobs. The MapReduce task scheduling system balances resource utilization against job queuing delay, effectively improving hardware resource utilization and efficiency while greatly reducing job queuing delay, thereby achieving the service level agreement goal.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although the present specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. This manner of description is adopted for clarity only; the specification should be taken as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other embodiments apparent to those skilled in the art.

Claims (10)

1. The MapReduce task scheduling method is characterized by comprising the following steps:
s1: a client creates a JobSubmitter instance, computes the input splits of the job through an internal method of the JobSubmitter, copies the resources required to run the job into a distributed file system, and submits the MapReduce job to a resource scheduler;
s2: after receiving the job submission message, the resource scheduler forwards the request message to the central resource scheduler, and the central resource scheduler analyzes the detailed information of the job through an internal job analysis method and derives the latest deadline by which the job must finish to satisfy the service level agreement;
s3: the central resource scheduler adds the new task to a central task queue and reorders all tasks by deadline, from nearest to farthest;
s4: the central resource scheduler receives heartbeat information from the node resource schedulers, obtains the number of tasks assigned to each node resource scheduler, selects the node with the fewest tasks, and dispatches to it the task with the nearest current deadline for execution;
s5: after receiving the new task, the node resource scheduler adds it to a local task queue and reorders the queue by deadline;
s6: the node resource scheduler checks the position of the new task in the task queue, and if the deadline of the new task is nearer than that of the executing task, the new task preempts the executing task.
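Steps S3-S6 amount to earliest-deadline-first (EDF) dispatch with preemption at the node. A minimal sketch of that loop follows; the class names (`CentralScheduler`, `NodeScheduler`, `Task`) and the heap-based queues are illustrative choices, not structures prescribed by the claims:

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Task:
    deadline: float               # absolute deadline; earlier sorts first (S3)
    name: str = field(compare=False)

class NodeScheduler:
    """Holds a deadline-ordered local queue; the heap head is 'executing' (S5-S6)."""
    def __init__(self):
        self.queue = []           # min-heap keyed by deadline

    def submit(self, task):
        running = self.queue[0] if self.queue else None
        heapq.heappush(self.queue, task)
        # S6: a task with a nearer deadline becomes the new heap head,
        # i.e. it preempts the previously executing task.
        return running is not None and task.deadline < running.deadline

class CentralScheduler:
    """S4: dispatch the most urgent task to the least-loaded node."""
    def __init__(self, nodes):
        self.nodes = nodes
        self.central_queue = []   # min-heap keyed by deadline

    def submit(self, task):
        heapq.heappush(self.central_queue, task)

    def dispatch_one(self):
        # heartbeat-reported load stands in for len(queue) here
        node = min(self.nodes, key=lambda n: len(n.queue))
        task = heapq.heappop(self.central_queue)   # nearest deadline first
        preempted = node.submit(task)
        return node, task, preempted
```

Pushing a task with a nearer deadline onto a node's heap makes it the new head, which models step S6's preemption of the executing task.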
2. The MapReduce task scheduling method according to claim 1, wherein in step S2 the central resource scheduler obtains the total CPU resource amount C and the total memory resource amount M of the cluster, derives the job share of long jobs from the job amounts, and periodically calculates the resource share of each job in the central task queue according to the fairness principle.
3. The MapReduce task scheduling method according to claim 2, wherein in step S3, after receiving a resource request of a job, the central resource scheduler analyzes, in combination with the deadline constraint, the resource condition of the cluster and the resource requirement of the job, whether the job can be completed before its deadline; if the central resource scheduler determines that the job can be completed before the deadline, the job is added to the central task queue; otherwise, the central resource scheduler refuses to execute the job.
4. The MapReduce task scheduling method of claim 1, wherein in step S4, when a job arrives, the central resource scheduler determines the current amount of cluster resources from the heartbeat information sent by each node resource scheduler and estimates the amount of resources requested by the job from the history logs of previous runs of the job; if the job has never been executed on the cluster, the scheduler executes the job on a small portion of the original data set as a pre-test set.
5. The MapReduce task scheduling method of claim 4, wherein in step S4, if the amount of resources requested by the job does not exceed the amount of resources available in the cluster, the central resource scheduler adds the job to the central task queue;
otherwise two cases are distinguished: in the first case, the job directly preempts resources from currently running jobs and is executed, provided that it can still be completed in time if executed immediately;
in the second case, the central resource scheduler directly refuses to execute the job, because even if the job preempts the resources of other jobs it cannot meet its deadline.
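Claims 4 and 5 together describe an admission test: accept outright when free resources suffice, accept with preemption only when immediate execution would still meet the deadline, and reject otherwise. A sketch under the assumption that the runtime estimate (from history logs or the pre-test run) is already available; the dictionary keys and return labels are illustrative:

```python
def admit(job, free_resources, now):
    """Return one of 'accept', 'accept_preempt', 'reject'.

    job is a dict with keys:
      'demand'   - resources the job requests (from history logs or a pre-test run)
      'runtime'  - estimated running time
      'deadline' - absolute deadline
    """
    finishes_in_time = now + job['runtime'] <= job['deadline']
    if job['demand'] <= free_resources:
        return 'accept' if finishes_in_time else 'reject'
    # Not enough free resources: preemption only helps if the job,
    # started immediately, would still finish before its deadline.
    return 'accept_preempt' if finishes_in_time else 'reject'
```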
6. The MapReduce task scheduling method according to claim 4, wherein in step S4 the amount of resources to preempt under the service level agreement is determined as follows:
after the W map tasks have been executed, the reduce tasks start to execute, and T_up represents the upper time limit for executing the W map tasks, which is obtained from M_avg, the average execution time of a map task in job j, m, the number of map tasks in job j, and M_max, the maximum execution time of a map task in job j;
when Q jobs can finish before the time limit T_up, the amount of resources released after these jobs complete is R, which is obtained by summing over the jobs, where j represents a certain job and n_j^r represents the number of reduce tasks of job j;
the amount of resources required for the reduce phase is E, which is obtained from C_r, the amount of resources available in the cluster at the next time, and r_j^m, the amount of resources required by the map tasks of job j.
7. The MapReduce task scheduling method according to claim 6, wherein in step S6, when job preemption is required, the excess resource share of the job k to be preempted is calculated in advance as A_k - F_k, wherein A_k represents the resources actually occupied by job k during execution and F_k represents the amount of resources job k should obtain according to the fair resource-allocation principle; the resource share S_j requested by the job j to be executed is then acquired, and if S_j does not exceed the excess share A_k - F_k, the resources that need to be preempted can be calculated directly from S_j.
8. The MapReduce task scheduling method of claim 7, wherein in step S6, if the resource share requested by job j exceeds the resources that job k occupies beyond its fair share, the resources that need to be preempted are calculated by an algorithm comprising: first comparing the CPU resources and the memory resources and dividing them into a main resource and a secondary resource, and then reclaiming the secondary resource in proportion to the amount of the main resource reclaimed, wherein:
C_j and M_j respectively represent the CPU resource amount and the memory resource amount requested by job j, and C_a and M_a respectively represent the CPU resource amount and the memory resource amount actually additionally occupied by the current job k in the cluster;
if the comparison determines that the CPU resource requested by job j is the main resource, all CPU resources additionally occupied by job k are preempted and the memory resources additionally occupied by job k are preempted in proportion; otherwise, the memory is taken as the main resource requested by job j, all memory resources additionally occupied by job k are preempted, and the CPU resources additionally occupied by job k are preempted in proportion.
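One plausible reading of the main-resource/secondary-resource rule, using the claim's quantities C_j, M_j (requested by job j) and C_a, M_a (additionally occupied by job k): the ratio test below and the proportional formula are a reconstruction from those definitions, since the patent's own formula is not reproduced in this text:

```python
def preempt_amounts(c_j, m_j, c_a, m_a):
    """Split the preemption between CPU and memory.

    (c_j, m_j): CPU / memory requested by the arriving job j.
    (c_a, m_a): CPU / memory job k occupies beyond its fair share.
    The resource with the larger relative demand is treated as the 'main'
    resource (a reconstructed assumption): all of job k's extra main
    resource is reclaimed, and the secondary resource is reclaimed in the
    same proportion as job j's request, capped at what k actually holds.
    """
    if c_j / c_a >= m_j / m_a:            # CPU is the main resource
        cpu = c_a
        mem = min(m_a, c_a * m_j / c_j)   # proportional reclaim of memory
    else:                                 # memory is the main resource
        mem = m_a
        cpu = min(c_a, m_a * c_j / m_j)   # proportional reclaim of CPU
    return cpu, mem
```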
9. The MapReduce task scheduling method of claim 8, wherein the MapReduce task scheduling method is performed according to a scheduling policy based on the service level agreement, and the scheduling policy based on the service level agreement comprises:
when job j arrives, analyzing the expiration date, required throughput and required resource amount of the job;
the central resource scheduler analyzes whether the current resource quantity of the cluster meets the resource demand quantity of the job j, if so, the job j is added into the central task queue;
if not, judging whether the resource quantity of the cluster can meet the resource demand of the map tasks of job j and whether the resources released after the map tasks finish executing can meet the resource demand of the reduce tasks;
if the two conditions are met, adding the job j into a central task queue, and marking the job j as high priority, so that the job j can occupy the resources of other jobs in the execution process; if the two conditions cannot be met simultaneously, the central resource scheduler refuses to execute the job j;
the central resource scheduler sorts the jobs in the central task queue according to the expiration date, and respectively traverses each job; for the job j in the central task queue, the central resource scheduler judges whether the map task of the job j is completely executed, if not, the priority of the job j is judged, if the job is a high-priority job, the job is immediately communicated with the node resource scheduler, the designated resource is preempted from the cluster to execute the map task of the job j, otherwise, the cluster is waited to generate idle resources and the map task of the job j is allocated;
the node resource scheduler reports the task execution state to the central resource scheduler through the heartbeat information; the central resource scheduler judges whether the number of map tasks of job j that have finished executing exceeds the threshold W; if it does, the reduce tasks of job j start to execute and the priority of job j is judged; if job j is a high-priority job, job preemption is completed together with the node resource scheduler, otherwise the allocation of idle resources is awaited.
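The two-stage admission at the start of this policy (admit on full demand; otherwise admit as high priority only if the map-phase demand fits now and the resources released by the finished map tasks cover the reduce phase) might be sketched as follows; the dictionary keys are illustrative names, not terms from the patent:

```python
def sla_admit(cluster_free, job):
    """Admission per the two checks in the SLA scheduling policy.

    job: dict with 'demand' (total), 'map_demand', 'reduce_demand',
    and 'map_release' (resources freed when its map tasks finish).
    Returns ('accept', high_priority_flag) or ('reject', None).
    """
    if job['demand'] <= cluster_free:
        return 'accept', False            # normal priority, no preemption needed
    map_ok = job['map_demand'] <= cluster_free
    reduce_ok = job['reduce_demand'] <= job['map_release']
    if map_ok and reduce_ok:
        return 'accept', True             # high priority: may preempt other jobs
    return 'reject', None
```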
10. A MapReduce task scheduling system employing the MapReduce task scheduling method of any one of claims 1 to 9, comprising a distributed data center cluster, wherein the distributed data center cluster comprises a central resource scheduler and a plurality of node resource schedulers;
a central task queue is maintained in the central resource scheduler, and when a new job arrives, the central resource scheduler analyzes the job characteristics to obtain the running time and the expiration date of the job;
and a running task queue and a suspended task queue, each ordered by deadline, are maintained in the node resource scheduler, which continuously reports the task information and resource usage on its node to the central resource scheduler through a heartbeat mechanism.
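The heartbeat payload of the system claim could be modeled as a small record; the field names are chosen for illustration only:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Heartbeat:
    """What a node resource scheduler reports to the central scheduler."""
    node_id: str
    running: List[str]      # deadline-ordered running task queue
    suspended: List[str]    # deadline-ordered suspended (preempted) task queue
    cpu_used: float
    mem_used: float

    def task_count(self) -> int:
        # Used by the central scheduler to pick the least-loaded node (step S4).
        return len(self.running) + len(self.suspended)
```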
CN202111386374.8A 2021-11-22 2021-11-22 MapReduce task scheduling method and system Active CN114077486B (en)

Publications (2)

Publication Number Publication Date
CN114077486A (en) 2022-02-22
CN114077486B (en) 2024-03-29



Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104991830A (en) * 2015-07-10 2015-10-21 山东大学 YARN resource allocation and energy-saving scheduling method and system based on service level agreement
WO2020248226A1 (en) * 2019-06-13 2020-12-17 东北大学 Initial hadoop computation task allocation method based on load prediction
CN112395052A (en) * 2020-12-03 2021-02-23 华中科技大学 Container-based cluster resource management method and system for mixed load

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Design of a locality scheduling algorithm integrating a preemptive scheduling strategy in a Hadoop cluster environment; Wang Yuefeng; Wang Xibo; Computer Science; 2017-12-31 (Issue S1); pp. 567-570 *

