CN107797863B - Fine-grained resource matching method in cloud computing platform - Google Patents

Fine-grained resource matching method in cloud computing platform Download PDF

Info

Publication number
CN107797863B
CN107797863B CN201710909672.8A
Authority
CN
China
Prior art keywords
resource
task
server
resources
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710909672.8A
Other languages
Chinese (zh)
Other versions
CN107797863A (en
Inventor
董小社
周墨颂
张兴军
陈衡
陈跃辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN201710909672.8A priority Critical patent/CN107797863B/en
Publication of CN107797863A publication Critical patent/CN107797863A/en
Application granted granted Critical
Publication of CN107797863B publication Critical patent/CN107797863B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5033Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering data affinity

Abstract

The invention discloses a fine-grained resource matching method in a cloud computing platform, belonging to the field of computers. Task execution time and the demand for each kind of resource are estimated from tasks in the cloud computing load that share the same execution logic; task execution is divided into stages according to the task's resource demands, and the resource demands are matched stage by stage, which avoids problems such as resource fragmentation and over-allocation. During resource matching, computing resources are abstracted into compressible and incompressible types, and resource allocation is compressed when necessary, further improving resource utilization. The invention refines the matching granularity in both allocation time and resource quantity while keeping the average scheduling response time low. The method can be applied to resource management, job scheduling and other aspects of cloud computing platforms; it avoids resource fragmentation and over-allocation, improves the utilization efficiency of computing resources in the platform, and ultimately improves the overall throughput of the cloud computing platform.

Description

Fine-grained resource matching method in cloud computing platform
Technical Field
The invention belongs to the field of computers, relates to a resource management and job scheduling method in a cloud computing platform, and particularly relates to a fine-grained resource matching method in the cloud computing platform.
Background
Cloud computing is an Internet-based computing model through which users can obtain computing resources and computing power on demand. The infrastructure of a cloud computing platform is typically formed by interconnecting a large number of computer nodes over a high-performance network, which organizes the nodes into a high-performance, highly available and scalable single system image for users.
In the resource management of a cloud computing platform, fixed amounts of one computing resource (usually memory) or of two computing resources are typically bundled and defined as a slot, and resources are allocated in units of slots. Because cloud computing workloads have diverse resource demands, allocation based on a fixed unit easily produces resource fragments, which wastes resources, or causes poor resource sharing through over-allocation. Poor sharing of some resources (e.g., CPU) causes competition between tasks and significantly increases execution time. In data centers, more than 53% of straggler tasks are caused by the high resource utilization that results from poor sharing, and 4%-6% of abnormal tasks affect 37%-49% of jobs, greatly extending job completion time. Over-allocation of other resources (such as memory) can directly cause task failure or server crashes.
Existing cloud computing resource management platforms such as Yarn, Fuxi and Borg, and cloud computing scheduling algorithms such as Apollo, Omega, Tetris, DRF and Carbyne, allocate a fixed amount of resources based on a job's resource application in order to avoid problems such as resource fragmentation and over-allocation. Since the resource application amount of a job is usually specified manually, there is a large gap between the resource application and actual use. In addition, a task's resource usage fluctuates considerably and does not always stay at its peak; even when the task's maximum resource usage is used as the application amount, the applied amount and the actually used amount still differ. Therefore, when the resource management platform or scheduler allocates according to the application amount, resource fragments remain. An allocation mode based on resource applications can hardly achieve high resource utilization and, to some extent, limits the resource utilization of the cluster.
Therefore, how to avoid resource fragmentation and over-allocation in cloud computing resource management and job scheduling becomes an important problem in cloud computing platform research.
Disclosure of Invention
The invention aims to provide a fine-grained resource matching method in a cloud computing platform, which improves the matching granularity in the aspects of allocation time and resource quantity, has lower average scheduling response time, can effectively improve the utilization efficiency of computing resources in the platform and improves the overall throughput rate of the cloud computing platform.
The invention is realized by the following technical scheme:
the invention discloses a fine-grained resource matching method in a cloud computing platform, which comprises the following steps:
step 1: the servers in the cloud computing platform are divided by role into computing servers and a management server; a computing server is responsible for executing the actual workload and periodically reports its resource state to the management server; the management server is responsible for the management of the whole cloud computing platform, including distributing computing tasks to the computing servers;
step 2: the management server receives the information periodically reported by each computing server and speculates various resource requirements and duration of a certain task according to the similar task and the resource compression rate;
wherein, the similar task refers to a task which has the same execution logic and the same input data volume as the certain task in the load;
step 3: the management server analyzes the estimated CPU and memory space resource requirements at each task progress point and divides the task into a plurality of execution stages;
step 4: the management server selects tasks from the set to be scheduled, matches the task resource requirements with the servers' available computing resources stage by stage, and compresses the resource requirements during matching when necessary;
step 5: if the resource requirements are matched successfully, the management server assumes that the resources have been allocated, checks whether any task on the computing server would be affected and unable to meet its constraints, and allocates the computing resources if all task constraints are met.
Preferably, the following operations are further included after the step 5: the management server checks whether enough resources remain on the computing server to enter the next round of matching, and if the remaining resources meet the conditions, the management server enters the next round of matching, namely, the steps 4 and 5 are repeated.
Preferably, in step1, the computing server is responsible for executing the specific load, and periodically reports the resource status to the management server, specifically: the method comprises the steps that a calculation server periodically collects the service condition of task resources in the running process of the server, calculates the available resource information of the server and reports the available resource information to a management server;
wherein, the available resource amount r of a certain resource on the server is calculated according to the formula (1):
[Formula (1): published as an image; not reproduced in text form]
in the formula, r_i is the resource amount of the ith sample, t_i is the duration of the ith sample, T is the total sampling time, and n is the number of samples.
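Since formula (1) is available only as an image, the following is a minimal LaTeX sketch of one plausible reading, assuming the available amount is the duration-weighted average of the sampled amounts; this reconstruction is an assumption, not the patent's verbatim formula:

```latex
% Assumed reading of formula (1): duration-weighted average of the samples.
% r_i: sampled available amount, t_i: duration of sample i, T: total sampling time.
r = \frac{1}{T}\sum_{i=1}^{n} r_i \, t_i , \qquad T = \sum_{i=1}^{n} t_i .
```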
Preferably, in step2, various resource requirements and durations of a certain task are presumed according to similar tasks and resource compression ratios, and the resource requirement types include a CPU, a memory space, a disk bandwidth and a network bandwidth;
the method comprises the following steps of dividing the disk and network bandwidth resource demand of a task into three categories according to the relative position of the task and data for speculation, wherein the first category is as follows: the task and the data are in the same server; the second type: tasks and data are in the same rack; in the third category: others;
the specific operation is as follows:
the resource requirement and the duration time required by the task at a certain progress are calculated according to the formula (2):
[Formula (2): published as an image; not reproduced in text form]
in the formula, α_n is the nth estimation result, β_n is the resource information of the nth similar task, r_c is the resource compression rate in the task information, Th_r is the maximum compression limit factor, and e is the natural base.
Preferably, step3 is specifically operative to:
the management server traverses the CPU and memory space resource demand of the task at each progress, and respectively records the maximum value and the minimum value of the two resource quantities;
when the difference between the maximum value and the minimum value of the CPU or the memory space resource demand is greater than a division threshold value and the traversal progress reaches the division length, dividing the traversal progress into an execution stage of a task;
the resource demand values of the CPU, the memory space and the disk space in the task execution stage are the maximum value of the resource demand of each progress in the execution stage, and the demand values of the disk bandwidth and the network bandwidth resource are the average values of the resource demand of each progress in the execution stage.
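As a brief illustration of the aggregation rule above, the sketch below computes one stage's demand from the per-progress demands; the resource names and the dictionary layout are assumptions made for the example, not taken from the patent.

```python
# Sketch: aggregate per-progress demands into a single stage demand.
# Maxima are taken for CPU, memory space and disk space; averages for disk and
# network bandwidth, following the rule described above. Names are illustrative.
PEAK_RESOURCES = ("cpu", "memory", "disk_space")
AVERAGED_RESOURCES = ("disk_bandwidth", "network_bandwidth")

def stage_demand(per_progress_demands):
    """per_progress_demands: list of {resource_name: demand} dicts, one per
    progress point belonging to the stage."""
    demand = {}
    for name in PEAK_RESOURCES:
        demand[name] = max(d.get(name, 0.0) for d in per_progress_demands)
    for name in AVERAGED_RESOURCES:
        demand[name] = (sum(d.get(name, 0.0) for d in per_progress_demands)
                        / len(per_progress_demands))
    return demand
```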
Further preferably, referring to fig. 1, the method for dividing the task execution phases specifically includes the following steps (a code sketch of this procedure is given after the list):
1) P, P_s and P_e denote the task progress, the phase start progress and the phase end progress, respectively, and C_max, C_min, M_max and M_min denote the maximum and minimum CPU and memory space resource requirements within the phase; P, P_s and P_e are initialized to 0; C_min and M_min are initialized to 100; C_max and M_max are initialized to 0;
2) increase the task progress P and the phase end progress P_e by 1; if the task progress P reaches 100%, divide the range from P_s to 100% into a new phase and end; otherwise continue with step 3);
3) if the CPU requirement C_p at the current progress is greater than C_max, update C_max to C_p and go to step 5);
4) if the CPU requirement C_p at the current progress is less than C_min, update C_min to C_p;
5) if the memory requirement M_p at the current progress is greater than M_max, update M_max to M_p and go to step 7);
6) if the memory requirement M_p at the current progress is less than M_min, update M_min to M_p;
7) if the difference between C_max and C_min is greater than the product of the total CPU resource amount C and the threshold Th_c, or the difference between M_max and M_min is greater than the product of the total memory amount M and the threshold Th_m, go to step 8); otherwise go to step 2);
8) if the difference between P_e and P_s is greater than the threshold Th_p, divide the range from P_s to P_e into a new phase, update P_s to P_e, and reinitialize C_min and M_min to 100 and C_max and M_max to 0; otherwise go back to step 2).
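The sketch below mirrors steps 1) to 8) above in Python; the argument names and the per-progress demand lists are assumptions for illustration, and the default thresholds follow the 20% dividing threshold and 5% dividing length mentioned later in the detailed description.

```python
# Illustrative sketch of the phase-division procedure (steps 1-8 above).
# cpu_demand[p] and mem_demand[p] give the demand at progress point p+1 (1..100).
def divide_phases(cpu_demand, mem_demand, total_cpu, total_mem,
                  th_c=0.20, th_m=0.20, th_p=5):
    phases = []
    p_s = 0                         # phase start progress (step 1)
    c_min = m_min = 100.0           # running minima
    c_max = m_max = 0.0             # running maxima
    for p in range(1, 101):         # step 2: advance progress / phase end
        c_p, m_p = cpu_demand[p - 1], mem_demand[p - 1]
        if c_p > c_max:             # steps 3-4: CPU extrema
            c_max = c_p
        elif c_p < c_min:
            c_min = c_p
        if m_p > m_max:             # steps 5-6: memory extrema
            m_max = m_p
        elif m_p < m_min:
            m_min = m_p
        fluctuates = (c_max - c_min > total_cpu * th_c or
                      m_max - m_min > total_mem * th_m)       # step 7
        if fluctuates and p - p_s > th_p:                     # step 8
            phases.append((p_s, p))
            p_s = p
            c_min = m_min = 100.0
            c_max = m_max = 0.0
    if p_s < 100:                   # step 2: close the last phase at 100%
        phases.append((p_s, 100))
    return phases
```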
Preferably, step4 is specifically operative to:
firstly, the management server checks whether the available computing resources on the computing server meet the matching requirements;
secondly, the management server sorts the set to be scheduled according to the resource allocation fairness and the data locality strategy;
finally, the resource management server takes out the task and obtains the guess information of the task from the guess result;
if the inferred information is failed to be obtained, computing resources are matched according to the application amount of the task resources; and if the presumed information is successfully obtained, sequentially matching the resource requirements of the tasks in stages.
Preferably, the management server abstracts the computing resources into compressible resources and incompressible resources according to the resource characteristics; the computing resources comprise CPU resources, memory resources, disk resources and network resources;
in the task execution process, if a certain resource allocated to the task is less than the resource demand of the task, the task can be normally completed by prolonging the execution time, the resource is a compressible resource, otherwise, the resource is an incompressible resource;
the resource compression rate r_c on a server is calculated according to the formula (3):
[Formula (3): published as an image; not reproduced in text form]
where r_c is the resource compression rate, R_r is the resource demand, and R_u is the resource allocation amount;
if the resource is an incompressible resource, its resource compression rate r_c is always 0.
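Formula (3) is likewise published only as an image; the sketch below shows one plausible definition, assuming the compression rate expresses how far the allocation R_u falls short of the demand R_r. The exact functional form is an assumption, not the patent's formula; only the rule that incompressible resources always have a compression rate of 0 is taken from the text.

```python
# Hedged sketch of a possible resource compression rate r_c (assumed form).
def compression_rate(demand_r: float, allocated_u: float,
                     compressible: bool) -> float:
    # Incompressible resources are never compressed (stated in the description).
    if not compressible or demand_r <= 0:
        return 0.0
    # Assumed form: fraction of the demand not covered by the allocation.
    return max(0.0, (demand_r - allocated_u) / demand_r)
```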
Preferably, in the matching process, if the demand is greater than the available resource amount, the matching fails;
for the incompressible resource, the calculation server does not process the available resource amount of the resource;
for compressible resources, the management server calculates, according to the resources and load condition of the computing server, the maximum compression rate of each resource in each stage on that server; the maximum compression rate r_max of a compressible resource in a stage is calculated according to the formula (4):
[Formula (4): published as an image; not reproduced in text form]
in the formula, n tasks are already matched on the server, the currently matched task is the (n+1)th task, the total resource demand of the n+1 tasks is greater than the total resource amount, w_i is the workload completed by the ith task in the stage, and Δp is the performance change caused by resource compression;
in compressible resource matching, the actual available resource amount of the resource on the computing server is calculated according to the formula (5):
a_i = n_i + r_i × N_i    (5);
in the formula, r_i is the maximum compression rate of this resource on the server in the current matching stage, N_i is the total amount of this resource on the server, and n_i is the collected available amount of this resource;
if no resource requirement in any execution stage of the task fails to match, the matching is successful.
Compared with the prior art, the invention has the following beneficial technical effects:
according to the fine-grained resource matching method in the cloud computing platform, computing resources are abstracted into two types, namely compressible type and incompressible type according to resource characteristics; the resource demand and the duration of the task are presumed according to the similar task and the resource compression rate existing in the cloud computing load; dividing a task into a plurality of execution stages according to task resource requirements, and classifying the resource types by stages to respectively match the resource requirements with available resources; in the matching process, the overall resource utilization rate and the load performance are improved at the cost of slightly prolonging the task completion time in a resource compression mode. The method has the advantages that the matching granularity is improved in the aspects of allocation time and resource quantity, and meanwhile, the average response time of scheduling is low, so that the method can be applied to the aspects of resource management, job scheduling and the like of each cloud computing platform, the problems of resource fragmentation, excessive allocation and the like are avoided, the utilization efficiency of computing resources in the platform is improved, and the overall throughput rate of the cloud computing platform is finally improved.
Drawings
FIG. 1 is a flow diagram of a task execution phase division algorithm.
Fig. 2 is an architecture diagram for implementing the fine-grained resource matching method in the Yarn platform.
Detailed Description
The present invention will now be described in further detail with reference to specific examples, which are intended to be illustrative, but not limiting, of the invention.
The invention discloses a fine-grained resource matching method in a cloud computing platform, which is used for conjecturing task resource requirements and duration according to similar tasks; abstracting the resource into a compressible type and an incompressible type according to the resource characteristics; dividing the task into a plurality of execution stages and matching the requirements of various types of resources of each execution stage in sequence. According to the method, the problems of resource fragmentation, excessive allocation and the like in resource management can be avoided by improving the time granularity and the resource quantity granularity of resource allocation when the resource management and the job scheduling of the cloud computing platform are matched with resources, so that the resource utilization rate and the performance are improved.
The invention discloses a fine-grained resource matching method in a cloud computing platform, which comprises the following steps:
step 1: the roles of the servers in the cloud computing platform are divided into a computing server and a management server: the computing server is responsible for executing the specific load and regularly reports the resource state to the management server; the management server is responsible for various management tasks of the whole cloud platform, including distributing computing tasks to the computing servers.
The calculation server regularly collects the service condition of the task resources in the running of the server, calculates the available resource information of the server and reports the available resource information to the management server.
The available resource amount of a certain resource on a server is calculated according to the following formula:
[Formula: published as an image; not reproduced in text form]
where r_i is the resource amount of the ith sample, t_i is the duration of the ith sample, T is the total sampling time, and n is the number of samples.
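For illustration, the sketch below shows how a computing server could turn its periodic samples into the available resource amounts it reports, assuming the duration-weighted-average reading of the formula above; the ResourceSample type and function names are not from the patent.

```python
# Sketch of the computing server's periodic availability calculation (assumed
# duration-weighted average over the samples of the reporting interval).
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ResourceSample:
    amounts: Dict[str, float]   # sampled available amount per resource (r_i)
    duration: float             # how long the sample was observed (t_i)

def available_resources(samples: List[ResourceSample]) -> Dict[str, float]:
    total_time = sum(s.duration for s in samples)           # T
    if total_time == 0:
        return {}
    weighted: Dict[str, float] = {}
    for s in samples:
        for name, amount in s.amounts.items():
            weighted[name] = weighted.get(name, 0.0) + amount * s.duration
    return {name: value / total_time for name, value in weighted.items()}
```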
Step 2: the management server receives the information periodically reported by each computing server and estimates the various resource requirements and the duration of a task according to similar tasks and the resource compression rate. Here, a similar task refers to a task in the load that has the same execution logic and the same amount of input data as the task. The disk and network bandwidth resource demands of the task are estimated separately for three categories according to the relative position of the task and its data: task and data on the same server, task and data in the same rack, and all other cases.
The required amount of computing resources and the required time of a task at a certain progress are calculated according to the following formula:
[Formula: published as an image; not reproduced in text form]
where α_n is the nth estimation result, β_n is the resource information of the nth similar task, r_c is the resource compression rate in the task information, Th_r is the maximum compression limit factor, and e is the natural base.
Step 3: the management server analyzes the CPU and memory space resource requirements of the task at each progress point and divides the task into a plurality of execution stages.
The management server traverses the CPU and memory space resource demand of the task at each progress, and respectively records the maximum value and the minimum value of the two resource quantities. When the difference between the maximum value and the minimum value of the CPU or the memory space resource demand is larger than a certain threshold value and the traversal progress reaches a certain length, dividing the traversal progress into an execution stage of the task. The resource demand values of a CPU, a memory space, a disk space and the like in the task execution stage are the maximum value of the resource demand of each progress in the stage, and the resource demand values of memory bandwidth, disk bandwidth, network bandwidth and the like are the average value of the resource demand of each progress in the stage.
Step 4: when the available computing resources on the computing server reach a certain amount, the management server selects tasks from the set to be scheduled and matches the task resource requirements with the available computing resources of the server stage by stage.
The management server first checks whether the resources available on the compute server meet the matching requirements. And secondly, the management server sorts the sets to be scheduled according to the strategies of resource allocation fairness, data locality and the like. And thirdly, the resource management server takes out the task and tries to acquire the guess information of the task from the guess result. And if the inferred information is failed to obtain, matching the computing resources according to the application amount of the task resources. And if the speculative information is successfully acquired, sequentially matching the resource requirements of the tasks in stages.
The management server abstracts the computing resources such as CPU, memory, disk, network and the like into two types of compressible and incompressible according to the resource characteristics. In the process of task execution, if a certain resource allocated to the task is less than the amount of the resource required by the task, the task can be normally completed by prolonging the execution time, and the resource is a compressible resource, otherwise, the resource is an incompressible resource.
Calculating the resource compression rate on the server according to the following formula:
[Formula: published as an image; not reproduced in text form]
where r_c is the resource compression rate, R_r is the resource demand, and R_u is the resource allocation amount. If the resource is an incompressible resource, its compression rate r_c is always 0.
In the matching process, if the demand is greater than the available resource amount, the matching fails. For the incompressible resource, calculating the available resource amount of the resource of the server without processing; for compressible resources, the management server calculates the maximum compression rate of various resources at each stage of the server according to the resources of the calculation server and the load condition.
The maximum compression rate of a certain compressible resource in a phase of the computing server is calculated according to the following formula:
[Formula: published as an image; not reproduced in text form]
Here n tasks are already matched on the server, the currently matched task is the (n+1)th task, the total resource demand of the n+1 tasks is greater than the total resources, w_i is the workload completed by the ith task in the phase, and Δp is the performance change caused by resource compression.
In compressible resource matching, the actual available resource amount of such a resource on the server is calculated according to the formula a_i = n_i + r_i × N_i, where r_i is the maximum compression rate of this resource on the server in the current matching stage, N_i is the total amount of this resource on the server, and n_i is the collected available amount of this resource. If no resource requirement in any execution phase of the task fails to match, the matching is successful.
Specifically, the matching method performs resource matching according to the following steps; an illustrative code sketch follows the list:
step 1): check the available resource amount of the server; if the available resource amount is less than the scheduling threshold, end the scheduling;
step 2): sort the task set to be scheduled according to policies such as resource allocation fairness and data locality;
step 3): select a task from the task set to be scheduled and acquire the resource requirement and duration of each of its execution stages; if the acquisition succeeds, go to step 5); otherwise go to step 4) and match according to the application amount;
step 4): match resources taking the task's resource application amount as the demand; if the matching succeeds, go to step 7);
step 5): let s be a task execution stage and n the corresponding available resource stage on the server; compare in turn the requirement s_i of the ith resource in the execution stage with the available amount a_i in the available resource stage; if the ith resource is an incompressible resource, a_i takes the corresponding value n_i of the available resource stage; if it is a compressible resource, a_i is calculated according to the formula a_i = n_i + r_i × N_i, where r_i is the maximum compression rate of this resource on the server in the corresponding available resource stage and N_i is the total amount of this resource on the server; if the resource requirement is greater than the available resource, the matching fails and step 3) is entered; otherwise go to step 6);
step 6): if s is the last execution stage of the task, the matching succeeds and step 7) is entered; otherwise repeat step 5) to match the next stage;
step 7): check whether this match affects other tasks executing on the server; if the check passes, the matching is complete; otherwise go back to step 3).
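The sketch below walks through steps 1) to 7) above; the Task, Stage and ServerState types, the constraints_ok callback and the per-resource scheduling threshold are assumptions introduced for the example and do not come from the patent text.

```python
# Illustrative sketch of the staged matching loop (steps 1-7 above).
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class Stage:
    demand: Dict[str, float]            # per-resource demand s_i of this stage

@dataclass
class Task:
    application: Dict[str, float]       # applied resource amount (fallback)
    stages: Optional[List[Stage]]       # estimated stages, or None if unknown

@dataclass
class ServerState:
    available: Dict[str, float]         # collected available amount n_i
    total: Dict[str, float]             # total amount N_i
    max_compression: Dict[str, float]   # maximum compression rate r_i (per-stage values collapsed here)
    compressible: Dict[str, bool]

def stage_fits(stage: Stage, server: ServerState) -> bool:
    """Step 5: compare each requirement s_i with a_i, where a_i = n_i for an
    incompressible resource and a_i = n_i + r_i * N_i for a compressible one."""
    for name, s_i in stage.demand.items():
        n_i = server.available.get(name, 0.0)
        if server.compressible.get(name, False):
            a_i = n_i + server.max_compression.get(name, 0.0) * server.total.get(name, 0.0)
        else:
            a_i = n_i
        if s_i > a_i:
            return False
    return True

def match_one_task(pending: List[Task], server: ServerState,
                   scheduling_threshold: Dict[str, float],
                   constraints_ok: Callable[[Task, ServerState], bool]) -> Optional[Task]:
    # step 1: end scheduling if available resources fall below the threshold
    if any(server.available.get(r, 0.0) < t for r, t in scheduling_threshold.items()):
        return None
    # step 2: 'pending' is assumed to be sorted by fairness and data locality
    for task in pending:                                         # step 3
        stages = task.stages or [Stage(demand=task.application)]  # step 4 fallback
        if all(stage_fits(stage, server) for stage in stages):   # steps 5-6
            if constraints_ok(task, server):                     # step 7
                return task
    return None
```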
Step 5: if the resource matching is successful, the management server assumes that the resources have been allocated and checks whether any task on the computing server would be affected and unable to satisfy its constraints. If all task constraints can still be satisfied, the computing resources are allocated.
Step 6: the management server checks whether enough resources are left on the computing server to enter the next round of matching, and if the remaining resources meet the conditions, the management server enters the next round of matching.
According to the method, fine-grained matching of computing resources of the cloud computing platform can be achieved.
An example of a specific application of the method of the invention is given below:
referring to fig. 2, this example is combined with the open-source cloud computing platform Yarn, and it should be noted that the method of the present invention is not only applied to the open-source cloud computing platform Yarn, but also applied to other application platforms meeting the requirements.
Step 1: when the Application Master registers with Resource Manager, the MD5 values and input data size information of the Application code and parameters are provided. When the Application Master applies for the computing resources required by the task from the Resource Manager, the Application Master and the task type information are used for identifying the task.
Step 2: the Node Manager obtains the task information and the remaining resource information of the node at runtime by analyzing the information under the Linux /proc directory.
The available resource amount of a certain resource of the Node Manager is calculated according to the following formula:
[Formula: published as an image; not reproduced in text form]
where r_i is the resource amount of the ith sample, t_i is the duration of the ith sample, T is the total sampling time, and n is the number of samples.
The Node Manager returns the collected Resource information and relevant information such as available Resource amount obtained by analysis and calculation to the Resource Manager in a heartbeat mode.
Step 3: the Resource Manager receives the requests and reports of the Application Master and the Node Manager, sends the application registration and resource application information of the Application Master to the Scheduler for processing, and sends the resource information reported by the Node Manager to the Estimator for processing.
Step 4: the Estimator identifies similar tasks according to the Application registration information and the resource request information of the Application Master. A similar task refers to a task in the load that has the same execution logic as the task and the same amount of input data.
The Estimator processes the Node Manager report information and estimates the resource demand and duration of the task. The disk and network bandwidth resource demands of the task are estimated separately for three categories according to the relative position of the task and its data: task and data on the same server, task and data in the same rack, and all other cases.
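A minimal sketch of how the Estimator's history of similar tasks could be keyed follows, assuming the execution logic is identified by the MD5 of the application code and parameters (as registered in Step 1) together with the input data amount, and that records are kept separately per locality category; all class and method names are illustrative.

```python
# Hedged sketch of a similar-task history for the Estimator (names assumed).
from typing import Dict, List, Optional, Tuple

LOCALITY_CATEGORIES = ("same_server", "same_rack", "other")

class SimilarTaskHistory:
    def __init__(self) -> None:
        # (code_md5, input_data_size, locality) -> recorded resource information
        self._records: Dict[Tuple[str, int, str], List[dict]] = {}

    def record(self, code_md5: str, input_data_size: int, locality: str,
               resource_info: dict) -> None:
        key = (code_md5, input_data_size, locality)
        self._records.setdefault(key, []).append(resource_info)

    def lookup(self, code_md5: str, input_data_size: int,
               locality: str) -> Optional[List[dict]]:
        """Tasks with the same execution logic and input data amount, kept
        separately for each task/data locality category."""
        return self._records.get((code_md5, input_data_size, locality))
```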
The required computing resource demand and time of a task at a certain progress are calculated according to the following formula:
[Formula: published as an image; not reproduced in text form]
where α_n is the nth estimation result, β_n is the resource information of the nth similar task, r_c is the resource compression rate recorded in the similar task information, Th_r is the maximum compression limit factor, and e is the natural base.
The Estimator analyzes the CPU and memory space demand information of the task and divides the task into a plurality of execution stages. The Estimator traverses the CPU and memory space resource demand of the task at each progress, and respectively records the maximum value and the minimum value of the two resource quantities. And when the difference between the maximum value and the minimum value of the CPU or the memory space resource demand is greater than the division threshold value and the traversal progress reaches the division length, dividing the traversal progress into an execution stage of the task. The dividing threshold value is 20% of the total resource amount by default, and the dividing length is 5% of the total task progress by default. The partition threshold and the partition length should be adjusted according to the specific platform and load. The resource demand values of CPU, memory space, disk space and the like in the task execution stage are the maximum value of the resource demand in the stage, and the demand values of resources such as memory bandwidth, disk bandwidth, network bandwidth and the like are the average value of the resource demand in the stage.
Step 5: the Scheduler matches the resource application of the Application Master with the available computing resources on the Node Manager.
And the Scheduler adds the resource Application of the Application Master to the set to be scheduled.
The Scheduler checks whether the available computing resources on the Node Manager meet the matching requirements.
The Scheduler sorts the to-be-scheduled set according to the strategies of resource allocation fairness, data locality and the like
The Scheduler takes out the tasks to be scheduled from the sorted sets to be scheduled, and obtains the presumed information such as task resource demand and duration from the Estimator. And if the information acquisition fails, matching according to the resource application amount. And if the information acquisition is successful, sequentially matching all resource requirements of all execution stages of the task.
In the matching process, the computing resources such as a CPU, a memory, a disk, a network and the like are abstracted into two types of compressible and incompressible according to the resource characteristics. In the process of task execution, if a certain resource allocated to the task is less than the amount of the resource required by the task, the task can be normally completed by prolonging the execution time, and the resource is a compressible resource, otherwise, the resource is an incompressible resource.
The degree of Node Manager resource compression is calculated according to the following formula:
[Formula: published as an image; not reproduced in text form]
where r_c is the resource compression rate, R_r is the resource demand, and R_u is the resource allocation amount. If the resource is an incompressible resource, its compression rate r_c is always 0.
The Node Manager calculates the maximum compression rate of a compressible resource at a certain stage according to the following formula:
[Formula: published as an image; not reproduced in text form]
Here the Node Manager already has n matched tasks, the currently matched task is the (n+1)th, the total resource demand of the n+1 tasks is greater than the total resources, w_i is the workload completed by the ith task in the phase, and Δp is the performance change caused by resource compression.
In the matching process, if the resource demand of a task is greater than the corresponding available resource amount on the Node Manager, the matching fails. The available amount of an incompressible resource on the Node Manager is the collected amount; the available amount of a compressible resource is calculated according to the formula a_i = n_i + r_i × N_i, where r_i is the maximum compression rate of this resource on the server in the matching stage, N_i is the total amount of this resource on the server, and n_i is the collected available amount of this resource. If no resource requirement in any execution phase of the task fails to match, the matching is successful.
Step 6: after the matching is successful, the Scheduler will check if the constraint conditions of all executing tasks on the Node Manager can be satisfied if the matching decision is valid. If the check passes, the Scheduler allocates computing resources to the Application Master that issued the resource request.
After allocating the resources, the Scheduler will check the remaining resources of the server where the Node Manager is located, and determine whether to enter the next scheduling.
Step 7: the Application Master that obtains the resource allocation communicates with the Node Manager where the computing resources are located and starts the corresponding task.
Actual test results show that the resource matching result obtained by the method can avoid resource fragmentation and excessive allocation, the resource utilization efficiency of the cloud computing platform is improved, and the overall throughput rate of the cloud computing platform is finally improved.
As can be seen from the embodiment, the method can be used for resource management and job scheduling of a cloud computing platform. The method estimates task resource demand and duration from similar tasks, divides a task into several execution stages based on its resource demand, matches the task resource demand against the server's computing resources stage by stage according to the resource characteristics, and, when necessary, extends the completion time of an individual task within an acceptable range to obtain a higher resource utilization rate and a larger number of parallel tasks, finally improving overall performance.
The invention can be used for resource management and job scheduling in cloud computing platforms, and cluster-based resource management platforms can draw on it for improvement.

Claims (8)

1. A fine-grained resource matching method in a cloud computing platform is characterized by comprising the following steps:
step 1: the method comprises the following steps that the roles of servers in the cloud computing platform are divided into a computing server and a management server, the computing server is responsible for executing specific loads and regularly reports resource states to the management server; the management server is responsible for the management work of the whole cloud computing platform and comprises the steps of distributing computing tasks to the computing server;
step 2: the management server receives the information periodically reported by each computing server and speculates various resource requirements and duration of a certain task according to the similar task and the resource compression rate;
wherein, the similar task refers to a task which has the same execution logic and the same input data volume as the certain task in the load;
step 3: the management server analyzes the estimated CPU and memory space resource requirements at each task progress point and divides the task into a plurality of execution stages;
step 4: the management server selects tasks from the set to be scheduled, matches the task resource requirements with the servers' available computing resources stage by stage, and compresses the resource requirements during matching when necessary; the specific operations are as follows:
firstly, the management server checks whether the available computing resources on the computing server meet the matching requirements;
secondly, the management server sorts the set to be scheduled according to the resource allocation fairness and the data locality strategy;
finally, the resource management server takes out the task and obtains the guess information of the task from the guess result;
if the inferred information is failed to be obtained, computing resources are matched according to the application amount of the task resources; if the guessed information is successfully obtained, sequentially matching the resource requirements of the tasks in stages;
step 5: if the resource requirements are matched successfully, the management server assumes that the resources have been allocated, checks whether any task on the computing server would be affected and unable to meet its constraints, and allocates the computing resources if all task constraints are met.
2. The fine-grained resource matching method in the cloud computing platform according to claim 1, characterized in that, after step5, the following operations are further included: the management server checks whether enough resources remain on the computing server to enter the next round of matching, and if the remaining resources meet the conditions, the management server enters the next round of matching, namely, the steps 4 and 5 are repeated.
3. The fine-grained resource matching method in the cloud computing platform according to claim 1, wherein in step1, the computing server is responsible for executing specific loads and periodically reports resource states to the management server, specifically: the method comprises the steps that a calculation server periodically collects the service condition of task resources in the running process of the server, calculates the available resource information of the server and reports the available resource information to a management server;
wherein, the available resource amount r of a certain resource on the server is calculated according to the formula (1):
[Formula (1): published as an image; not reproduced in text form]
in the formula, r_i is the resource amount of the ith sample, t_i is the duration of the ith sample, T is the total sampling time, and n is the number of samples.
4. The fine-grained resource matching method in the cloud computing platform according to claim 1, wherein in step2, various resource requirements and durations of a certain task are presumed according to similar tasks and resource compression ratios, and the resource requirement types include a CPU, a memory space, a disk bandwidth and a network bandwidth;
the method comprises the following steps of dividing the disk and network bandwidth resource demand of a task into three categories according to the relative position of the task and data for speculation, wherein the first category is as follows: the task and the data are in the same server; the second type: tasks and data are in the same rack; in the third category: others;
the specific operation is as follows:
the resource requirement and the duration time required by the task at a certain progress are calculated according to the formula (2):
[Formula (2): published as an image; not reproduced in text form]
in the formula, α_n is the nth estimation result, β_n is the resource information of the nth similar task, r_c is the resource compression rate in the task information, Th_r is the maximum compression limit factor, and e is the natural base.
5. The fine-grained resource matching method in the cloud computing platform according to claim 1, wherein step3 is specifically operated as: the management server traverses the CPU and memory space resource demand of the task at each progress, and respectively records the maximum value and the minimum value of the two resource quantities;
when the difference between the maximum value and the minimum value of the CPU or the memory space resource demand is greater than a division threshold value and the traversal progress reaches the division length, dividing the traversal progress into an execution stage of a task;
the resource demand values of the CPU, the memory space and the disk space in the task execution stage are the maximum value of the resource demand of each progress in the execution stage, and the demand values of the disk bandwidth and the network bandwidth resource are the average values of the resource demand of each progress in the execution stage.
6. The fine-grained resource matching method in the cloud computing platform according to claim 5, wherein the method for dividing the task execution stage specifically comprises the following steps:
1) P, P_s and P_e denote the task progress, the phase start progress and the phase end progress, respectively, and C_max, C_min, M_max and M_min denote the maximum and minimum CPU and memory space resource requirements within the stage; P, P_s and P_e are initialized to 0; C_min and M_min are initialized to 100; C_max and M_max are initialized to 0;
2) increase the task progress P and the phase end progress P_e by 1; if the task progress P reaches 100%, divide the range from P_s to 100% into a new stage and end; otherwise continue with step 3);
3) if the CPU requirement C_p at the current progress is greater than C_max, update C_max to C_p and go to step 5);
4) if the CPU requirement C_p at the current progress is less than C_min, update C_min to C_p;
5) if the memory requirement M_p at the current progress is greater than M_max, update M_max to M_p and go to step 7);
6) if the memory requirement M_p at the current progress is less than M_min, update M_min to M_p;
7) if the difference between C_max and C_min is greater than the product of the total CPU resource amount C and the threshold Th_c, or the difference between M_max and M_min is greater than the product of the total memory amount M and the threshold Th_m, go to step 8); otherwise go to step 2);
8) if the difference between P_e and P_s is greater than the threshold Th_p, divide the range from P_s to P_e into a new stage, update P_s to P_e, and reinitialize C_min and M_min to 100 and C_max and M_max to 0; otherwise go back to step 2).
7. The fine-grained resource matching method in the cloud computing platform according to claim 1, wherein the management server abstracts computing resources into compressible resources and incompressible resources according to resource characteristics; the computing resources comprise CPU resources, memory resources, disk resources and network resources;
in the task execution process, if a certain resource allocated to the task is less than the resource demand of the task, the task can be normally completed by prolonging the execution time, the resource is a compressible resource, otherwise, the resource is an incompressible resource;
the resource compression rate r_c on a server is calculated according to the formula (3):
[Formula (3): published as an image; not reproduced in text form]
where r_c is the resource compression rate, R_r is the resource demand, and R_u is the resource allocation amount;
if the resource is an incompressible resource, its resource compression rate r_c is always 0.
8. The fine-grained resource matching method in the cloud computing platform according to claim 7, wherein in the matching process, if the demand is greater than the available resource amount, the matching fails;
for the incompressible resource, the calculation server does not process the available resource amount of the resource;
for compressible resources, the management server calculates, according to the resources and load condition of the computing server, the maximum compression rate of each resource in each stage on that server; the maximum compression rate r_max of a compressible resource in a stage is calculated according to the formula (4):
[Formula (4): published as an image; not reproduced in text form]
in the formula, n tasks are already matched on the server, the currently matched task is the (n+1)th task, the total resource demand of the n+1 tasks is greater than the total resource amount, w_i is the workload completed by the ith task in the stage, and Δp is the performance change caused by resource compression;
in compressible resource matching, the actual available resource amount of the resource on the computing server is calculated according to the formula (5):
a_i = n_i + r_i × N_i    (5);
in the formula, r_i is the maximum compression rate of this resource on the server in the current matching stage, N_i is the total amount of this resource on the server, and n_i is the collected available amount of this resource;
if no resource requirement in any execution stage of the task fails to match, the matching is successful.
CN201710909672.8A 2017-09-29 2017-09-29 Fine-grained resource matching method in cloud computing platform Active CN107797863B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710909672.8A CN107797863B (en) 2017-09-29 2017-09-29 Fine-grained resource matching method in cloud computing platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710909672.8A CN107797863B (en) 2017-09-29 2017-09-29 Fine-grained resource matching method in cloud computing platform

Publications (2)

Publication Number Publication Date
CN107797863A CN107797863A (en) 2018-03-13
CN107797863B true CN107797863B (en) 2020-07-28

Family

ID=61532969

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710909672.8A Active CN107797863B (en) 2017-09-29 2017-09-29 Fine-grained resource matching method in cloud computing platform

Country Status (1)

Country Link
CN (1) CN107797863B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108572875B (en) * 2018-04-28 2020-09-04 辽宁工程技术大学 Resource allocation method, device and system
CN110659126B (en) * 2018-06-29 2023-04-14 中兴通讯股份有限公司 Resource management method, device and computer readable storage medium
CN110166282B (en) * 2019-04-16 2020-12-01 苏宁云计算有限公司 Resource allocation method, device, computer equipment and storage medium
CN110247979B (en) * 2019-06-21 2021-08-17 北京邮电大学 Scheduling scheme determination method and device and electronic equipment
CN110780977B (en) * 2019-10-25 2022-06-03 杭州安恒信息技术股份有限公司 Task issuing method, device and system based on cloud computing and readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102460393A (en) * 2009-05-01 2012-05-16 思杰***有限公司 Systems and methods for establishing a cloud bridge between virtual storage resources
CN103533086A (en) * 2013-10-31 2014-01-22 中国科学院计算机网络信息中心 Uniform resource scheduling method in cloud computing system
CN105718364A (en) * 2016-01-15 2016-06-29 西安交通大学 Dynamic assessment method for ability of computation resource in cloud computing platform

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9319464B2 (en) * 2013-01-22 2016-04-19 International Business Machines Corporation Storage management in a multi-tiered storage architecture
US10007556B2 (en) * 2015-12-07 2018-06-26 International Business Machines Corporation Reducing utilization speed of disk storage based on rate of resource provisioning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102460393A (en) * 2009-05-01 2012-05-16 思杰***有限公司 Systems and methods for establishing a cloud bridge between virtual storage resources
CN103533086A (en) * 2013-10-31 2014-01-22 中国科学院计算机网络信息中心 Uniform resource scheduling method in cloud computing system
CN105718364A (en) * 2016-01-15 2016-06-29 西安交通大学 Dynamic assessment method for ability of computation resource in cloud computing platform

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Hadoop resource license scheduling method using resource partitioning in a cloud environment; Zhou Mosong et al.; Journal of Xi'an Jiaotong University; 2015-08-31; full text *

Also Published As

Publication number Publication date
CN107797863A (en) 2018-03-13

Similar Documents

Publication Publication Date Title
CN107797863B (en) Fine-grained resource matching method in cloud computing platform
Jalaparti et al. Network-aware scheduling for data-parallel jobs: Plan when you can
Grandl et al. Altruistic scheduling in {Multi-Resource} clusters
CN112162865B (en) Scheduling method and device of server and server
US8869164B2 (en) Scheduling a parallel job in a system of virtual containers
US9875135B2 (en) Utility-optimized scheduling of time-sensitive tasks in a resource-constrained environment
US9201690B2 (en) Resource aware scheduling in a distributed computing environment
US8209695B1 (en) Reserving resources in a resource-on-demand system for user desktop utility demand
US20140019987A1 (en) Scheduling map and reduce tasks for jobs execution according to performance goals
US20130290972A1 (en) Workload manager for mapreduce environments
Chard et al. Cost-aware cloud provisioning
US8843929B1 (en) Scheduling in computer clusters
WO2011076608A2 (en) Goal oriented performance management of workload utilizing accelerators
US8756307B1 (en) Translating service level objectives to system metrics
CN112272203A (en) Cluster service node selection method, system, terminal and storage medium
US8819239B2 (en) Distributed resource management systems and methods for resource management thereof
CN114116173A (en) Method, device and system for dynamically adjusting task allocation
Chin et al. Adaptive service scheduling for workflow applications in service-oriented grid
JPH1027167A (en) Method for distributing load of parallel computer
Postoaca et al. h-Fair: asymptotic scheduling of heavy workloads in heterogeneous data centers
Zacheilas et al. Orion: Online resource negotiator for multiple big data analytics frameworks
Xie et al. A novel independent job rescheduling strategy for cloud resilience in the cloud environment
Thai et al. Algorithms for optimising heterogeneous Cloud virtual machine clusters
CN113301087A (en) Resource scheduling method, device, computing equipment and medium
Sun et al. Quality of service of grid computing: resource sharing

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant