CN111258746A - Resource allocation method and service equipment

Info

Publication number: CN111258746A
Application number: CN201811455536.7A
Authority: CN
Inventors: 张杨; 冯亦挥; 李治; 汤志鹏
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2018-11-30
Filing date: 2018-11-30
Publication date: 2020-06-09
Anticipated expiration: 2038-11-30
Also published as: CN111258746B

Abstract

The application provides a resource allocation method and service equipment, wherein the method comprises the following steps: acquiring resource use data of allocated resources in a resource pool; determining surplus resources from the allocated resources according to the resource usage data of the allocated resources; and allocating the surplus resources to target jobs of resources to be allocated. By the scheme, the problem of resource waste in the existing resource allocation method in the distributed system is solved, and the technical effects of effectively reducing the resource waste and improving the resource utilization rate are achieved.

Description

Resource allocation method and service equipment

Technical Field

The present application belongs to the technical field of data processing, and in particular, to a resource allocation method and a service device.

Background

With the development of data processing technology, processing jobs on distributed systems has gradually become widespread. In most existing resource allocation methods for a distributed system, a job manager sends a resource usage application to a resource scheduler according to the scale of the target job, to apply for physical resources that meet the resource demand of job execution; the resource scheduler searches the unallocated resources for static resources that meet the resource demand according to the resource usage application and provides the static resources to the job manager; the job manager then sends the job node for executing the target job to the machine where the static resources are located, so as to complete the corresponding job.

However, in the above resource allocation method, in order to ensure that the target job can be executed stably, a relatively large amount of resources is often set as the resource demand according to the scale of the target job. In actual execution the target job often does not require such a large amount of resources, so the excess resources are wasted.

No effective solution has yet been proposed for the problem of resource waste in the existing resource allocation method.

Disclosure of Invention

The application aims to provide a resource allocation method and service equipment to solve the problem of existing resource waste.

The application provides a resource allocation method and service equipment, which are realized as follows:

a method of resource allocation, comprising:

acquiring resource use data of allocated resources in a resource pool;

determining surplus resources from the allocated resources according to the resource usage data of the allocated resources;

and allocating the surplus resources to target jobs of resources to be allocated.

A service device comprising a processor and a memory for storing processor-executable instructions that when executed by the processor implement:

acquiring resource use data of allocated resources in a resource pool;

A computer readable storage medium having stored thereon computer instructions that when executed perform the steps of:

acquiring resource use data of allocated resources in a resource pool;

According to the resource allocation method and the service equipment provided by the application, the surplus resources among the allocated resources are determined and allocated to the target job awaiting resource allocation; that is, the surplus resources among the allocated resources are allocated a second time. This solves the technical problem of low resource utilization in the existing method, makes full use of the resources, and achieves the technical effect of improving job processing efficiency.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some of the embodiments described in the present application; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of an architecture of a resource allocation system provided herein;

FIG. 2 is a schematic diagram of another architecture of a resource allocation system provided herein;

FIG. 3 is a schematic illustration of user request job processing provided herein;

FIG. 4 is a timing diagram of resource allocation provided herein;

FIG. 5 is a flow chart of a method of resource allocation provided herein;

FIG. 6 is an architecture diagram of a service device provided herein;

FIG. 7 is a block diagram of a resource allocation apparatus according to the present application.

Detailed Description

In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The existing resource allocation method, when allocating running resources for a target job, determines the peak resource amount required by the target job and then searches the unallocated resources for resources that can reach this peak amount to allocate to the target job. However, in actual job processing, the resource usage of the target job does not always stay at the peak; during most of its execution, the target job consumes less than the peak resource amount.

For example, the resources consumed by the target job may reach the total amount previously allocated to it only during a very short period in the middle of execution, while the resources actually used to run the target job in other periods may be far below the peak. In this case, the unused surplus resources (i.e., the difference between the total resources allocated to the target job and the resources actually used to run it) sit idle during those periods, which wastes resources.

Further, since the amount of resources in the system is limited, when there are many jobs and all resources have been allocated, other jobs can only wait for allocation. Even if surplus resources exist within the resources allocated to some target jobs, these surplus resources cannot be allocated to the waiting jobs.

Based on this, this example considers that if surplus resources among the allocated resources can be allocated to a job awaiting resources, the processing load of the system when there are many jobs can be relieved to some extent. Specifically, the surplus resources may be determined according to the resource usage of the jobs to which resources have been allocated, and then allocated to jobs waiting in the system queue so that those jobs can be processed.

As shown in FIG. 1, an embodiment of the present application provides a resource allocation system. The resource allocation system may include a resource allocation server 101 and a plurality of physical resources 102 (machines). The resource allocation server 101 is configured to allocate the plurality of physical resources, that is, to allocate them to target jobs requesting resources.

In an embodiment, the resources provided by a physical resource may include one or more of the following: disk resources, network resources, CPU resources, GPU resources, and other resources required for job execution. The physical resource may be a server or a server cluster, or may be a cloud processor, cloud storage, or the like. It should be noted that the resources listed above are given only to better illustrate the embodiments of the present application. In a specific implementation, other types of resources may be introduced as the resources provided by the physical resources according to specific situations and job requirements. The present application is not limited thereto. The physical resources may be the processing resources, storage resources, and so on needed to process the target job.

The resource allocation server 101 may be a single server or a processor, or may be a server cluster. In actual implementation, the specific form of the resource allocation server 101 may be selected according to actual needs, which is not limited in this application.

Further, if a surplus resource among the allocated resources is to be allocated to a job waiting for resources so that the waiting job can be processed, the resource amount of the surplus resource needs to be known. To determine this amount, a machine node for monitoring resource usage may be set on each physical resource. The machine node may acquire the resource usage data of the physical resource in real time and upload the determined resource surplus to the resource allocation server in real time, so that the resource allocation server can reallocate the surplus resources.

It should be noted that the setting of a machine node on a physical resource to detect the resource usage is merely an exemplary description, and in practical implementation, other ways of determining the resource usage may be adopted, for example, a centralized detector may be set to monitor the usage of all physical resources, or a machine node may be set to monitor the resource usage of one or more physical resources that are associated with the physical resource where the machine node is located, or the usage of the resource amount allocated to each target job may be calculated according to the execution of the job. The specific method for determining the resource use condition may be determined according to the actual use scenario and use condition, which is not limited in the present application.

Further, surplus resources are usually relatively small in amount. If allocation started from a job's demand and searched the surplus resources for one that satisfies it, a job with a large demand would very likely fail to match any suitable surplus resource, the search would move on to the next job, and matching efficiency would be low. For this reason, matching may instead start from the surplus resource: the first job that the surplus resource can process is taken as the job to which the surplus resource is reallocated. For example, if the resource amount of the surplus resource is 15, the job list in the queue may be called and the resource amounts required by the jobs in the list matched in sequence. If job 1 in the list requires 20, then 20 > 15 and the requirement is not satisfied, so the next job is examined; job 2 requires 13, and 13 < 15 is satisfied, so the surplus resource may be allocated to job 2 to process job 2.
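
The matching order described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `match_job_to_surplus` and the representation of the queue as (name, demand) pairs are hypothetical, and a job whose demand exactly equals the surplus is assumed to fit.

```python
from typing import Optional, Sequence, Tuple


def match_job_to_surplus(
    surplus: int, queue: Sequence[Tuple[str, int]]
) -> Optional[str]:
    """Return the first queued job whose demand fits within the surplus.

    `queue` holds (job name, required resource amount) pairs in queue order.
    Returns None when no queued job fits.
    """
    for name, demand in queue:
        if demand <= surplus:  # assumption: an exact fit is acceptable
            return name
    return None


# The example from the text: surplus 15, job 1 needs 20, job 2 needs 13.
queue = [("job 1", 20), ("job 2", 13)]
chosen = match_job_to_surplus(15, queue)
print(chosen)
```

Starting from the surplus and scanning the queue means every surplus resource is tested against all waiting jobs once, rather than each large job failing against every small surplus in turn.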

However, if the determined surplus resources are allocated to a job in the queue (say, job A) and the resource demand of the original job (i.e., the job to which the resources were originally allocated, say, job M) then increases, the resources allocated to job A need to be returned to job M, which means that the processing of job A must be suspended or stopped.

To this end, this example proposes a resource allocation server which, as shown in FIG. 2, may include a resource scheduler and a job manager coupled to the resource scheduler. The job manager may communicate with a plurality of job nodes, which may be used to perform specific job tasks.

In a specific implementation, as shown in FIG. 3, a user may send a job request to the resource allocation server. A job manager in the resource allocation server may determine the corresponding target job according to the job request and analyze the resource demand for executing the target job, so as to generate a resource usage application including the resource demand. The job manager sends the resource usage application to the resource scheduler, so that the resource scheduler can allocate corresponding resources from a plurality of physical resources to execute the target job. The resource scheduler is coupled to the machine node of each of the plurality of physical resources. Each physical resource is provided with a machine node, which monitors and records the resource usage data of the corresponding physical resource in real time. The resource usage data may include the status of unallocated resources in the physical resource, the usage of allocated resources in the physical resource, and the like.

Specifically, the usage of the allocated resources in a physical resource may further include the current usage rate of the allocated resources, their current remaining amount, their usage rate within a preset time period, and the like. The resource scheduler may obtain the current resource usage data of each physical resource from its machine node. The resource scheduler can communicate with the machine node in real time, so that the current resource usage data is acquired in real time. Alternatively, to reduce the data processing pressure, the current resource usage data transmitted by the machine node may be received at preset time intervals (e.g., every 2 minutes).

After receiving the current resource usage data, the resource scheduler may first search each physical resource according to the current resource usage data and determine whether any physical resource contains an unallocated resource that meets the resource demand of the target job, i.e., a first resource (also referred to as a Normal resource). A resource meets the resource demand of the target job when its amount is equal to or larger than that demand. For example, if the resource demand of target job A is 5G CPUs, the resource scheduler may search the plurality of physical resources according to the current resource usage data and find that 6G CPUs are unallocated in physical resource 2; the resource scheduler may then allocate 5G of those 6G CPUs in physical resource 2 to target job A as the first resource.
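
As a minimal sketch of this first-resource search, assume the current resource usage data has been reduced to a mapping from physical resource name to unallocated amount. The function name `allocate_first_resource` and the amount of 3 shown for physical resource 1 are illustrative assumptions, not part of the application.

```python
from typing import Dict, Optional


def allocate_first_resource(
    unallocated: Dict[str, int], demand: int
) -> Optional[str]:
    """Find a physical resource whose unallocated amount covers `demand`.

    On success, deduct the demand from that physical resource and return
    its name; return None when no physical resource can cover the demand.
    """
    for machine, free in unallocated.items():
        if free >= demand:  # "meets the demand" means equal to or larger
            unallocated[machine] = free - demand
            return machine
    return None


# Target job A needs 5; physical resource 2 has 6 unallocated.
unallocated = {"physical resource 1": 3, "physical resource 2": 6}
machine_for_a = allocate_first_resource(unallocated, 5)
print(machine_for_a, unallocated)
```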

When it is determined that the first resource exists in the physical resources, the resource scheduler may send a resource usage list including the first resource information to both the job manager and the machine node of the physical resource where the first resource is located. The first resource information may include location information of the first resource, indicating which unallocated resource of which physical resource the first resource is. The job manager can then send the job node for executing the target job to the physical resource on which the first resource is located according to the resource usage list. The machine node of that physical resource may, according to the resource usage list, allow the job node to temporarily use the first resource to execute the target job. It should be noted that the first resource may be understood as a resource with a higher reliability level. Once it is allocated to a target job, the system protects the target job's use of the first resource; that is, the first resource is not reclaimed until the target job is completed.

When it is determined that the first resource does not exist in the physical resources, the resource scheduler may continue to search the allocated resources in the physical resources according to the current resource usage data, to determine whether there is a second resource (which may also be referred to as an oversold resource), that is, a surplus resource that is not currently used and whose resource amount meets the resource demand of the target job. For example, suppose the resource demand of target job B is 2G CPUs. The resource scheduler may search the unallocated resources of the plurality of physical resources according to the current resource usage data. On determining that the unallocated resources contain no CPU of 2G or more (i.e., the first resource does not exist), it may continue to search the allocated resource entries in each physical resource and find that physical resource 2 contains an allocated resource (previously allocated for processing target job A) whose total allocated amount is 5G CPUs, of which only 3G CPUs are currently used. That is, the allocated resource includes a surplus of 2G CPUs that is currently idle. To improve resource utilization and to avoid target job B being unable to proceed because no unallocated resource is temporarily available, the 2G CPUs not currently used among the allocated resources in physical resource 2 may be temporarily allocated to target job B as the second resource. In this way, although no unallocated resource currently meets the resource demand of target job B, part of the currently unused surplus of previously allocated resources can be temporarily called up and lent to other jobs.
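
The two-stage search (first resource, then second resource) might be sketched as follows. The `Allocation` record, its `lent` field, and the `allocate` function are hypothetical names introduced for illustration; the surplus is assumed to be the allocated amount minus the amount in use minus anything already lent out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Allocation:
    """A first resource already granted to a job (hypothetical record)."""
    machine: str
    job: str
    allocated: int  # total amount granted as a first resource
    used: int       # amount the job currently uses
    lent: int = 0   # amount currently lent out as second resources

    @property
    def surplus(self) -> int:
        return self.allocated - self.used - self.lent


def allocate(
    unallocated: Dict[str, int], allocations: List[Allocation], demand: int
) -> Optional[Tuple[str, str]]:
    """Return ("first" | "second", machine), or None if nothing fits."""
    # Stage 1: look for an unallocated (first) resource.
    for machine, free in unallocated.items():
        if free >= demand:
            unallocated[machine] = free - demand
            return ("first", machine)
    # Stage 2: fall back to the idle surplus of an existing allocation.
    for alloc in allocations:
        if alloc.surplus >= demand:
            alloc.lent += demand
            return ("second", alloc.machine)
    return None


# Target job A holds 5 and uses 3; target job B needs 2, and only 1 is unallocated.
unallocated = {"physical resource 2": 1}
allocations = [Allocation("physical resource 2", "target job A", 5, 3)]
grant_for_b = allocate(unallocated, allocations, 2)
print(grant_for_b)
```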

Relative to the first resource, the second resource may be understood as a currently unused surplus resource that is temporarily borrowed from the first resource allocated to another job. Furthermore, the second resource has a lower reliability level than the first resource. Specifically, when the target job originally assigned the first resource needs, at some stage, to use the surplus resource that has been lent to another target job as a second resource, the system gives priority to the execution of the target job originally assigned the first resource: it stops the target job that was later assigned the second resource and returns the second resource to the original target job, so that at least the target job assigned the first resource can be executed smoothly. For example, when no unallocated resource meets the resource demand, target job A and target job B can both be executed by having previously allocated to target job B, as the second resource, part of the currently unused surplus of the first resource originally allocated to target job A. After a period of execution, the resource scheduler finds from the updated current resource usage data that target job A now needs the resource that was temporarily lent out. To ensure smooth execution of target job A, the resource scheduler may stop target job B and release and return the surplus resource previously borrowed from the first resource of target job A, that is, the second resource used by target job B, so that target job A can still be executed stably without stopping for lack of resources.
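
The priority given to the original job on conflict can be illustrated with a small sketch. The `FirstAllocation` record and `resolve_conflict` function are hypothetical, and the rule of stopping borrowers one at a time until the owner's need fits is an assumption; the text only states that the borrowing job is stopped and the second resource returned.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FirstAllocation:
    """A first resource and the amounts lent from it (hypothetical record)."""
    job: str
    allocated: int
    loans: Dict[str, int] = field(default_factory=dict)  # borrower -> amount


def resolve_conflict(alloc: FirstAllocation, needed: int) -> List[str]:
    """Stop borrowers until the owner can use `needed` units.

    Returns the names of the borrowing jobs that were stopped; their
    second resources are returned to the owner by removing the loans.
    """
    stopped: List[str] = []
    for borrower in list(alloc.loans):
        if needed + sum(alloc.loans.values()) <= alloc.allocated:
            break  # the owner's need already fits; no more reclaiming
        del alloc.loans[borrower]
        stopped.append(borrower)
    return stopped


# Target job A holds 5 and has lent 2 to target job B.
a = FirstAllocation("target job A", 5, {"target job B": 2})
no_conflict = resolve_conflict(a, 3)   # A uses 3: 3 + 2 still fits in 5
conflict = resolve_conflict(a, 5)      # A needs all 5: B must be stopped
print(no_conflict, conflict)
```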

As can be seen, the reliability level of the second resource is lower than that of the first resource. When the use of the first resource and the use of the second resource conflict (for example, while a target job is being executed with a second resource borrowed from a first resource, the target job originally allocated the first resource needs to use the borrowed second resource), the system gives priority to the execution of the target job allocated the first resource, stops the execution of the target job allocated the second resource, and recovers the second resource. The target job assigned the second resource therefore runs a higher risk of being stopped during execution. To reduce this risk, when the second resource is searched for and determined from the already assigned first resources, a resource whose risk of conflicting with the use of the first resource within a preset time period is smaller than a threshold parameter may be selected as the second resource. For example, based on the current resource usage data and the historical resource usage data (i.e., the previously obtained current resource usage data), it is possible to search the first resources of target jobs whose resource usage has already passed the resource amount peak within the preset time period and determine whether they contain a currently unused resource that meets the resource demand, to serve as the second resource.

It should be noted that, while a target job is executing, the amount of resources it uses is not at the resource amount peak all the time. Usually, once the resource amount peak has passed, a second peak occurs only after a relatively long period of time. Therefore, a second resource determined from the first resource of a target job whose resource usage has already passed the resource amount peak within the preset time period is relatively less likely to conflict with the use of the first resource in the coming period, so the target job assigned this second resource is more likely to be executed to completion. Alternatively, a prediction model of resource usage may be established from the historical resource usage data; allocated resources predicted by the model to have a low usage rate in a future time period are taken as target resources, and a surplus resource meeting the resource demand is then searched for among the target resources and determined as the second resource. Of course, the above ways of reducing the risk that an allocated second resource is recovered prematurely are given only to better explain the embodiments of the present application; in a specific implementation, the second resource may be determined in any suitable manner according to the jobs to be processed.
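
One possible reading of the "already passed the peak" heuristic is sketched below. It assumes the historical resource usage data is available as a list of usage samples per job, that the preset time period corresponds to the last `window` samples (a positive integer), and that the peak equals the allocated amount. The function names are hypothetical.

```python
from typing import Dict, List, Sequence


def passed_peak_recently(
    usage_history: Sequence[int], peak: int, window: int
) -> bool:
    """True if usage hit `peak` within the last `window` samples
    and the most recent sample is below the peak again."""
    recent = usage_history[-window:]
    return bool(recent) and max(recent) >= peak and recent[-1] < peak


def low_risk_donors(
    histories: Dict[str, Sequence[int]], peaks: Dict[str, int], window: int
) -> List[str]:
    """Jobs whose first resources are candidate sources of second resources."""
    return [
        job
        for job, history in histories.items()
        if passed_peak_recently(history, peaks[job], window)
    ]


histories = {
    "target job A": [2, 3, 5, 3, 3],  # peaked at 5, now back at 3
    "target job C": [1, 2, 2, 3, 3],  # has not reached its peak of 5 yet
}
peaks = {"target job A": 5, "target job C": 5}
donors = low_risk_donors(histories, peaks, window=4)
print(donors)
```

A job that has not yet reached its peak is excluded because its usage may still rise and force the loan to be recalled.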

In the present embodiment, temporarily allocating, as the second resource, a resource that is not used among the resources already allocated in a physical resource itself increases the operating load of the physical resource. If the load borne by the physical resource exceeds a certain limit, the whole physical resource may go down or restart, which poses a risk to the overall running of the jobs. To avoid this risk, the resource scheduler may obtain current resource usage data through the machine node arranged on the physical resource, determine a current operating state parameter of the physical resource from that data, and compare it with a threshold state parameter of the physical resource. If the current operating state parameter is greater than the threshold state parameter, the physical resource can be judged to be at risk of downtime or restart. To protect the overall stability of job execution, the allocated second resources, which have a lower reliability level, may be recovered first and the execution of the target jobs assigned those second resources stopped, so as to ensure the overall stability of the physical resource.
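
A minimal sketch of this overload protection follows. It assumes the operating state parameter is a single load figure (here a percentage) and that each job running on a second resource contributes a known share of that load; both are simplifications for illustration, and `second_jobs_to_stop` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def second_jobs_to_stop(
    load: int, threshold: int, second_jobs: Sequence[Tuple[str, int]]
) -> List[str]:
    """Choose second-resource jobs to stop while load exceeds threshold.

    `second_jobs` holds (job name, load share) pairs. Jobs are stopped in
    the given order until the load is no longer greater than the threshold.
    First-resource jobs are never considered here.
    """
    stopped: List[str] = []
    for job, share in second_jobs:
        if load <= threshold:
            break
        load -= share
        stopped.append(job)
    return stopped


# Load 95 against a threshold of 90: stopping target job B is enough.
to_stop = second_jobs_to_stop(95, 90, [("target job B", 10), ("target job C", 20)])
print(to_stop)
```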

Further, when the execution of a target job assigned the second resource conflicts with the use of the first resource, the second resource is recovered to protect the use of the first resource, and the target job assigned the second resource is terminated. Ordinarily, the execution progress of that target job would then return to zero, and the job would have to apply for a new resource and start over. In fact, the target job has been executed with the second resource for a period of time before being stopped and has obtained some intermediate results. When the second resource is recovered and the target job is stopped, these intermediate results can be recorded, so that when the target job subsequently obtains a new resource, the job node can continue the target job using the intermediate results as the starting point. This avoids wasting the intermediate data obtained with the previous second resource and improves resource utilization and processing efficiency.
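
The idea of recording intermediate results on recovery and resuming from them can be sketched as a simple checkpoint. Here a job is modelled as a fixed number of steps and the "intermediate result" is just the index of the next step; `CheckpointStore` and `run_job` are hypothetical names, and a real system would persist actual intermediate data.

```python
from typing import Dict, Optional


class CheckpointStore:
    """In-memory record of how far each job has progressed."""

    def __init__(self) -> None:
        self._next_step: Dict[str, int] = {}

    def save(self, job: str, next_step: int) -> None:
        self._next_step[job] = next_step

    def load(self, job: str) -> int:
        return self._next_step.get(job, 0)  # 0 means start from scratch


def run_job(
    job: str, total_steps: int, store: CheckpointStore,
    reclaim_at: Optional[int] = None,
) -> bool:
    """Run `job` from its last checkpoint; return True if it completed.

    `reclaim_at` simulates the second resource being recovered just
    before that step; progress is recorded before the job stops.
    """
    step = store.load(job)
    while step < total_steps:
        if reclaim_at is not None and step == reclaim_at:
            store.save(job, step)  # record intermediate progress
            return False
        step += 1  # stands in for doing one unit of real work
    store.save(job, step)
    return True


store = CheckpointStore()
first_run = run_job("target job B", 10, store, reclaim_at=4)  # stopped early
resume_from = store.load("target job B")
second_run = run_job("target job B", 10, store)               # resumes at step 4
print(first_run, resume_from, second_run)
```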

Although the above measures can greatly reduce the risk that an allocated second resource is recovered before the job is completed, they cannot fully guarantee that the corresponding target job will run to completion on the second resource. Because the reliability level of the first resource is higher than that of the second resource, using a first resource can ensure that the target job is executed completely. Therefore, in a specific implementation, after the second resource is allocated and the job node starts to run the corresponding target job with it, and before the target job is completed, the resource scheduler may continue to acquire updated current resource usage data and search, according to that data, for an unallocated resource in any physical resource that meets the resource demand of the target job, that is, a first resource. When a first resource is found, it is allocated to the target job to which the second resource has already been allocated.

As shown in FIG. 4, when the newly retrieved first resource is allocated to the target job to which the second resource has already been allocated, the processing may be divided into cases. Specifically, it may be determined whether the newly retrieved first resource and the allocated second resource are in the same physical resource. If they are located in the same physical resource, the newly retrieved first resource may be directly given back to the other target job from whose allocation the second resource was borrowed, and the label of the second resource being used by the target job may be changed to first resource. This ensures that the target job can be executed stably without interruption. If they are located in different physical resources, the original execution link of the target job using the allocated second resource may be maintained, and another execution link of the same target job may be opened using the newly retrieved first resource. Specifically, the resource scheduler may send a resource usage list including the newly retrieved first resource to the job manager and to the machine node of the physical resource where the newly retrieved first resource is located. The job manager can send a job node for executing the same target job to the newly retrieved first resource according to the resource usage list; the machine node may allow the job node to invoke the first resource on the physical resource to execute the target job according to the resource usage list. This is equivalent to executing the same target job on two different physical resources.
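
The two cases can be summarized in a short sketch. The `Grant` record and the `upgrade` function are hypothetical, and the sketch tracks only which grants the target job holds afterwards; the bookkeeping on the lending job's side is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Grant:
    """A resource held by a job (hypothetical record)."""
    job: str
    machine: str
    kind: str  # "first" or "second"


def upgrade(current: Grant, new_first_machine: str) -> List[Grant]:
    """Return the grants the job holds once a first resource is found."""
    if new_first_machine == current.machine:
        # Same physical resource: relabel the grant in place, so the
        # running job node is not interrupted.
        return [Grant(current.job, current.machine, "first")]
    # Different physical resource: keep the existing run and open a
    # second execution of the same job on the new first resource.
    return [current, Grant(current.job, new_first_machine, "first")]


b = Grant("target job B", "physical resource 2", "second")
same_machine = upgrade(b, "physical resource 2")
other_machine = upgrade(b, "physical resource 3")
print(same_machine)
print(other_machine)
```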

The target job executed with the newly retrieved first resource has a higher reliability level and can be executed to completion. This avoids the problem that a target job relying on the second resource alone might not be executed smoothly. Meanwhile, the job node using the second resource may continue executing the target job. The resource scheduler may acquire execution information of the target job on the first resource and on the second resource to determine the respective execution progress. From this execution information, the resource scheduler can further determine whether the target job has been completed on either the first resource or the second resource; once the target job has been completed on one of them, the resource scheduler may stop the execution of the target job on the other resource, release the first resource, and return the second resource. For example, if the resource scheduler determines from the execution information that the target job has been completed on the first resource, it may stop the execution of the target job on the second resource, release the first resource, and return the borrowed second resource.
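
The decision the resource scheduler makes once either copy finishes can be written as a small function. This is a sketch under the assumption that the execution information has been reduced to two completion flags; `settle_duplicate_runs` and the action strings are illustrative only.

```python
from typing import List


def settle_duplicate_runs(done_on_first: bool, done_on_second: bool) -> List[str]:
    """Return the scheduler's actions once one copy of the job has finished.

    Returns an empty list while neither copy has completed.
    """
    if not (done_on_first or done_on_second):
        return []
    actions: List[str] = []
    if not done_on_second:
        actions.append("stop execution on second resource")
    if not done_on_first:
        actions.append("stop execution on first resource")
    # Whichever copy won, both resources are given up afterwards.
    actions.append("release first resource")
    actions.append("return second resource")
    return actions


first_wins = settle_duplicate_runs(done_on_first=True, done_on_second=False)
print(first_wins)
```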

Further, when a plurality of target jobs to be executed are processed, resources are generally allocated first to the target jobs with higher priority. The priority may be determined according to the importance of the target job. This leads to the following situation when allocating resources for a plurality of target jobs: a target job with higher priority is more important and is given precedence so that it can be executed and completed smoothly. Therefore, even if no unallocated resource (i.e., first resource) is currently available for the higher-priority target job, a first resource meeting the requirement is preferentially allocated to that job as soon as one appears; that is, the higher-priority target job has a higher probability of acquiring a first resource with a higher stability level. In this case, if a second resource is found and allocated to such a higher-priority target job when no unallocated resource is currently available, it is highly likely that the target job acquires a more stable first resource before it finishes executing on the second resource, and the second resource previously allocated to it is then, to some extent, wasted. Therefore, in view of this feature of resource allocation among a plurality of target jobs, and in order to avoid resource waste, the second resource may be preferentially determined and allocated to target jobs having a relatively low priority.

Specifically, in a case where the jobs to be executed include a plurality of target jobs, the resource scheduler may determine the priority of each target job according to the resource usage application for that target job; further, when no unallocated resource is available, corresponding second resources can be determined and allocated to the target jobs one by one in order from low priority to high priority. This avoids the resource waste that arises when a higher-priority target job is allocated a second resource and then obtains a first resource.
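The low-to-high ordering described above may be illustrated with a minimal sketch. The `Job` record, the convention that a larger `priority` value means a more important job, and the single pooled `surplus_units` figure are assumptions made for illustration only and are not defined in the application.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int  # assumption: larger value = more important job
    demand: int    # resource units requested by the job


def assign_second_resources(jobs, surplus_units):
    """Hand out surplus (second) resources from lowest to highest priority.

    Returns (assignments, remaining), where assignments maps a job name to
    the number of surplus units lent to it.
    """
    assignments = {}
    remaining = surplus_units
    # Lowest priority first: these jobs are least likely to receive a
    # first resource soon, so lending to them wastes the least.
    for job in sorted(jobs, key=lambda j: j.priority):
        if job.demand <= remaining:
            assignments[job.name] = job.demand
            remaining -= job.demand
    return assignments, remaining
```

With three waiting jobs that each request 4 units and 8 surplus units available, the two lowest-priority jobs would be served and the highest-priority job would keep waiting for a first resource.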

Fig. 5 is a flowchart of a method of resource allocation provided in the present application. Although the present application provides method operational steps or apparatus configurations as illustrated in the following examples or figures, more or fewer operational steps or modular units may be included in the methods or apparatus based on conventional or non-inventive efforts. In the case of steps or structures which do not logically have the necessary cause and effect relationship, the execution sequence of the steps or the module structure of the apparatus is not limited to the execution sequence or the module structure described in the embodiments and shown in the drawings of the present application. When the described method or module structure is applied in an actual device or end product, the method or module structure according to the embodiments or shown in the drawings can be executed sequentially or executed in parallel (for example, in a parallel processor or multi-thread processing environment, or even in a distributed processing environment).

As shown in fig. 5, the resource allocation method may include the following steps:

step 501: acquiring resource use data of allocated resources in a resource pool;

step 502: determining surplus resources from the allocated resources according to the resource usage data of the allocated resources;

step 503: and allocating the surplus resources to target jobs of resources to be allocated.

Specifically, in step 503, allocating the surplus resources to the target job of resources to be allocated may include: allocating the surplus resources to a target job of resources to be allocated whose resource demand is less than or equal to the resource amount of the surplus resources.
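Steps 501 to 503 may be sketched as follows. The application does not fix how the surplus is computed from the usage data; this sketch assumes, purely for illustration, that the surplus of an allocation is its allocated amount minus its currently used amount. The function names and the dictionary layouts are likewise hypothetical.

```python
def find_surplus(allocations):
    """Step 502 (sketch): derive surplus from usage data.

    allocations maps an owning job to (allocated_units, used_units).
    Assumption: surplus = allocated - used; owners with no slack are omitted.
    """
    return {
        owner: allocated - used
        for owner, (allocated, used) in allocations.items()
        if allocated > used
    }


def allocate_surplus(allocations, pending):
    """Step 503 (sketch): place pending jobs whose demand fits a surplus.

    pending maps a waiting job to its demand in units. Returns a mapping
    from each placed job to the owner whose surplus it borrows.
    """
    surplus = find_surplus(allocations)
    placed = {}
    for job, demand in pending.items():
        for owner, free in surplus.items():
            if demand <= free:  # demand must not exceed the surplus amount
                placed[job] = owner
                surplus[owner] = free - demand
                break
    return placed
```

Here the usage data passed in as `allocations` stands in for step 501; in a real system it would be reported by the machine nodes rather than supplied as a dictionary.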

When resource allocation is actually performed and unallocated resources are sufficient, allocation may be made from the unallocated resources first, and allocation from surplus resources is not required. In implementation, before determining the surplus resources from the allocated resources according to the resource usage data of the allocated resources, the resource usage data of the unallocated resources can be acquired; according to that data it is determined whether an unallocated resource meeting the requirement exists among the unallocated resources, where an unallocated resource meeting the requirement is one whose resource amount is greater than or equal to the resource demand of the target job of resources to be allocated; and if such an unallocated resource exists, it is allocated to the target job.
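The order of preference described above (unallocated resources first, surplus resources only as a fallback) may be sketched as below. The function name, the resource identifiers and the `"normal"` / `"surplus"` tags are hypothetical and chosen only for illustration.

```python
def choose_resource(demand, unallocated, surplus):
    """Pick a resource for a job that needs `demand` units.

    unallocated and surplus map a resource id to its available units.
    Returns ("normal", id) if an unallocated resource is large enough,
    otherwise ("surplus", id) if a surplus resource is large enough,
    otherwise None (the job keeps waiting).
    """
    for rid, amount in unallocated.items():
        if amount >= demand:
            return ("normal", rid)
    for rid, amount in surplus.items():
        if amount >= demand:
            return ("surplus", rid)
    return None
```

For example, a job needing 4 units would take a 6-unit unallocated resource even if an 8-unit surplus resource also exists, and would borrow the surplus only when no unallocated resource is large enough.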

In order to realize effective integration of the resource usage, a machine node for performing statistics on the resource may be provided, and the machine node may be provided on the physical resource, so that the resource usage may be integrated. That is, the resource usage data of the allocated resources may be acquired by a machine node preset on the physical resources.

Because the surplus resources being reallocated are taken from allocated resources, the original job may find its resources insufficient while its surplus is lent out. Therefore, once unallocated resources can meet the requirement of the target job, the unallocated resources may be allocated to the target job and the surplus resources returned to the original job. That is, after the surplus resources are allocated to the target job of resources to be allocated, it may be detected whether an unallocated resource meeting the requirement exists; and when such a resource exists, it is allocated to the target job. Further, it may be determined whether the unallocated resource meeting the requirement and the surplus resource allocated to the target job are located in the same physical resource. In the case where they are located in the same physical resource, the unallocated resource meeting the requirement is returned to the job corresponding to the allocated resource to which the surplus resource belongs. In the case where they are not located in the same physical resource, the unallocated resource meeting the requirement is allocated to the target job to execute the target job.

After the unallocated resource meeting the requirement is allocated to the target job to execute the target job, the execution information of the target job on the surplus resource and the execution information of the target job on the unallocated resource meeting the requirement can be acquired respectively. According to these two pieces of execution information, it is determined whether the target job has completed on at least one of the surplus resource and the unallocated resource meeting the requirement; when it has, execution of the target job on both the surplus resource and the unallocated resource meeting the requirement is stopped. That is, if resources are sufficient, the surplus resource and a regular unallocated resource may be allocated to the target job at the same time and both executions run concurrently; once either execution completes, both resources are released, thereby shortening the execution time of the target job.

Further, although the resources are allocated a second time, it is clearly unreasonable for this allocation to affect the normal running of the job to which the resources were originally allocated. To avoid affecting the orderly running of that job, in actual implementation, after the surplus resource is allocated to the target job of resources to be allocated, the resource usage data of the allocated resource to which the surplus resource belongs may be acquired; an operation state characterization parameter of that allocated resource is determined according to the resource usage data; and in the case that the operation state characterization parameter is greater than the threshold state parameter, the surplus resource is withdrawn.

In one embodiment, allocating the surplus resource to a target job of resources to be allocated whose resource demand is less than or equal to the resource amount of the surplus resource may include: when there are a plurality of such target jobs, allocating the surplus resource to the target job with the lowest priority among them. That is, a low-priority job is selected, which avoids the disruption that would arise if a high-priority job were chosen and then soon received regular resources.

In actual implementation, if a job node receives a Normal resource (i.e., a regular resource) while running in over-sell form, the following processing may be performed:

1) if the Normal resource and the over-sell resource are on the same machine (i.e., on the same machine node), the job manager may notify the machine node to convert the job node from over-sell form to Normal form;

2) if the Normal resource and the over-sell resource are not on the same machine, the job manager may start a copy of the job node on the Normal resource, which is equivalent to having 2 identical job node instances running simultaneously; the instance that finishes first is taken as the actually completed job node, and the other instance, still executing, is killed.
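The two cases above may be sketched as follows. The function names, the string return values and the way instances are identified are hypothetical; the sketch only shows the decision logic, not the messaging between the job manager and the machine node.

```python
def handle_normal_grant(oversell_machine, normal_machine):
    """Decide how to react when a job node running in over-sell form
    is granted a Normal resource.

    Case 1: same machine -> convert the node to Normal form in place.
    Case 2: different machines -> start a copy on the Normal resource.
    """
    if oversell_machine == normal_machine:
        return "convert"
    return "start_copy"


def settle_copies(instances, finished_first):
    """After one of the identical instances finishes, keep it as the
    completed job node and list the others to be killed."""
    to_kill = [inst for inst in instances if inst != finished_first]
    return finished_first, to_kill
```

Case 1 is the cheaper outcome because nothing is restarted; case 2 trades some duplicated work for a guaranteed completion on the more stable resource.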

The machine node is the actual executor that monitors the real utilization of the machine's physical resources. When resource utilization is low, the machine node allows over-sell job nodes to execute. When resource utilization is at a high water level, the machine node refuses to start over-sell job nodes in order to guarantee the execution of Normal job nodes, and may even proactively kill over-sell job nodes that are already executing.

Specifically, after the surplus resources are allocated to the target job of resources to be allocated, the operating parameters of the machine node whose surplus resources have been allocated can be detected in real time; whether that machine node exceeds a preset load threshold is determined according to the operating parameters; and in the case that the preset load threshold is determined to be exceeded, the allocated surplus resources are reclaimed.

The monitoring of resources by the machine node may be multidimensional and may include, but is not limited to: disk IO, network transmission, machine Load, CPU utilization, memory utilization and the like. If any dimension is at a high water level, the running environment of the machine is unstable. For each dimension, two values can be set: an early warning value and a danger value. If a dimension reaches its early warning value, the machine node can be set to refuse to start new over-sell job nodes; if a dimension reaches its danger value, the machine node proactively kills some over-sell job nodes until resource usage falls below the danger value.
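The two-level check may be illustrated with the short sketch below. The dimension names, the percentage units and the three action labels are assumptions for illustration; the application only states that each dimension has an early warning value and a danger value.

```python
def machine_action(usage, warning, danger):
    """Decide how a machine node treats over-sell job nodes.

    usage, warning and danger map a dimension name to a number in the
    same unit (for example, percent). Returns:
      "kill"   if any dimension has reached its danger value,
      "refuse" if any dimension has reached its early warning value,
      "accept" otherwise.
    """
    if any(usage[d] >= danger[d] for d in usage):
        return "kill"    # kill some over-sell nodes until below danger
    if any(usage[d] >= warning[d] for d in usage):
        return "refuse"  # do not start new over-sell nodes
    return "accept"
```

Checking the danger value before the early warning value ensures that the stronger reaction is chosen when both thresholds are crossed.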

While monitoring its own resource usage, the machine node can periodically send the resource usage of each dimension (actual usage and early warning value) to the resource manager, so that the resource manager can select machines with low real physical resource utilization on which to start over-sell job nodes, and the running stability of the over-sell job nodes can be guaranteed to the greatest extent.

For the selection of job nodes, job nodes are considered to have different priorities, which can be distinguished according to the importance of the service; the higher the priority, the more the resource scheduler should preferentially allocate Normal resources to the job node. When cluster resources are in short supply, many job nodes with different priorities queue to wait for resources. In this example, over-sell resources are allocated to low-priority jobs. The main reason is that, when the cluster is short of resources, the resource scheduler cannot say exactly when a given job node will be allocated resources, but overall the cluster resources are bound to flow to high-priority job nodes; that is, the higher the priority of a job node, the higher its probability of being allocated resources. If a high-priority job node in the queue were selected for over-selling, the probability that it is allocated a Normal resource while running in over-sell form would therefore be high. If the Normal resource and the over-sell resource are on the same machine, this is the best case and the gain in cluster physical resource utilization is greatest. Otherwise, a copy of the same job node is started on the machine where the Normal resource is located; whichever of the over-sell node and the Normal node finishes first, the other has run in vain, so that although cluster resource utilization is improved, cluster resources are wasted from the perspective of the job node.
Therefore, a low-priority job node can be selected for over-selling: its probability of being allocated a Normal resource is low, so the probability of such waste is lower. If the low-priority job node runs to completion in over-sell form, that is the best outcome; if it is killed by the machine node while running, the total time it needs is no longer than if it had simply waited for a Normal resource before running.

Specifically, the allocating the surplus resource to the target job of the resource to be allocated may include: determining whether a job with the resource demand less than or equal to the resource amount of the surplus resource exists in a job pool; and when a plurality of jobs with the resource demand less than or equal to the resource amount of the surplus resource exist in the job pool, allocating the surplus resource to the job with the lowest priority in the job pool.

For the selection of over-sell machine nodes, each machine node reports to the resource manager the real physical resource usage and the early warning value of its machine in every dimension, so that the resource manager can compute a health score for each dimension, namely the early warning value minus the resource usage. If the health score of any dimension is less than or equal to 0, no over-sell job node can be allocated to that machine; otherwise, the health scores of all dimensions are summed to give the health score of the machine. All machines to which over-sell resources can be allocated are sorted from high to low by health score, the TopN machines are selected, and the over-sell resources are allocated on those machines. In implementation, the health score of each machine may be updated once per second.
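The health score calculation and the TopN selection may be sketched as below. The function names and the report layout are hypothetical, and the sketch assumes that all dimensions are expressed on a comparable scale (for example, percent) so that their per-dimension scores can be summed.

```python
def machine_health(usage, warning):
    """Health score of one machine.

    Per-dimension score = early warning value - actual usage.
    Returns None if any dimension scores <= 0 (machine is not eligible
    for over-sell job nodes), otherwise the sum over all dimensions.
    """
    scores = {d: warning[d] - usage[d] for d in warning}
    if any(s <= 0 for s in scores.values()):
        return None
    return sum(scores.values())


def top_n_machines(reports, n):
    """Rank eligible machines by health score, highest first.

    reports maps a machine name to (usage, warning), each a dict keyed
    by dimension. Returns the names of the n healthiest machines.
    """
    scored = {}
    for machine, (usage, warning) in reports.items():
        health = machine_health(usage, warning)
        if health is not None:
            scored[machine] = health
    return sorted(scored, key=scored.get, reverse=True)[:n]
```

A machine that is idle in most dimensions but has reached its early warning value in a single one is still excluded, which matches the rule that one unhealthy dimension disqualifies the machine.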

Specifically, when over-sell scheduling is performed, the machines most suitable for allocating over-sell resources can be determined as follows: the operating parameters of multiple dimensions of multiple machine nodes in the resource pool are obtained; the machine nodes are ranked by allocability according to those operating parameters; and the surplus resources in the allocated resources of a preset number of machine nodes with the highest allocability are taken as the determined surplus resources to be allocated to the target job of resources to be allocated. The plurality of dimensions may include, but are not limited to, at least one of: disk IO, download load, CPU utilization, memory utilization.

In practical implementation, once the TopN machines to which over-sell resources can be allocated have been selected, the over-sell resources are allocated to the job nodes in the waiting queue in order of priority from low to high, and each machine is allocated only 1 over-sell job node per second.
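One scheduling round of this rule may be sketched as follows. The application does not say which waiting job goes to which of the TopN machines; pairing the healthiest machine with the lowest-priority job is an assumption of this sketch, as are the function name and the queue layout. Demand fitting is omitted for brevity.

```python
def oversell_round(top_machines, queue):
    """One round (one second in the description) of over-sell placement.

    top_machines: machine names, already ordered by health score.
    queue: list of (job_name, priority) pairs waiting for resources;
           a larger priority value means a more important job (assumption).
    Each machine receives at most one over-sell job node per round, and
    jobs are taken from the queue in order of priority from low to high.
    """
    waiting = sorted(queue, key=lambda item: item[1])
    names = (name for name, _priority in waiting)
    # zip stops at the shorter input, so no machine gets more than one
    # job and no job is placed twice in the same round.
    return dict(zip(top_machines, names))
```

Jobs left unplaced in one round stay in the queue and are considered again in the next round, after the health scores have been refreshed.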

In the embodiment of the present application, the resource allocation system and the resource allocation method, according to the obtained current resource usage data and in the absence of an available unallocated resource (i.e., the first resource), search the currently allocated resources for a surplus resource that meets the resource demand and determine it as the second resource. The second resource, which is not currently in use, is temporarily reallocated to execute the target job, which solves the technical problem of low resource utilization in the existing method and achieves the technical effects of making full use of existing resources and improving overall job processing efficiency. After the second resource is allocated to the target job, a first resource with a higher reliability level is retrieved and determined for the target job according to the updated current resource usage data, which ensures the stability of the overall job processing. The second resource is preferentially allocated to target jobs with relatively low priority, which reduces resource waste and further improves resource utilization. In addition, second resources with a lower risk level are preferentially selected and allocated, which reduces the risk that a target job allocated a second resource is stopped midway through execution and improves the stability of its execution.

The method provided by the embodiment of the application can be executed in a mobile terminal, a computer terminal or a similar computing device. Taking execution on a service device as an example, fig. 6 is a hardware structure block diagram of a service device for a resource allocation method according to an embodiment of the present invention. As shown in fig. 6, the service device 10 may include one or more (only one shown) processors 102 (the processors 102 may include, but are not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 104 for storing data, and a transmission module 106 for communication functions. It will be understood by those skilled in the art that the structure shown in fig. 6 is only an illustration and is not intended to limit the structure of the electronic device. For example, the service device 10 may also include more or fewer components than shown in fig. 6, or have a different configuration than shown in fig. 6.

The memory 104 may be used to store software programs and modules of application software, such as program instructions/modules corresponding to the resource allocation method in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, that is, implements the resource allocation method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the service device 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission module 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the service device 10. In one example, the transmission module 106 includes a Network Interface Controller (NIC) that can be connected to other network devices through a base station to communicate with the internet. In one example, the transmission module 106 may be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.

In the software aspect, the resource allocation apparatus may be as shown in fig. 7, and includes:

an obtaining module 701, configured to obtain resource usage data of allocated resources in a resource pool;

a determining module 702, configured to determine, according to the resource usage data of the allocated resources, a surplus resource from the allocated resources;

an allocating module 703 is configured to allocate the surplus resource to a target job of a resource to be allocated.

In one embodiment, the determining module 702 may specifically obtain operating parameters of multiple dimensions of multiple machine nodes in the resource pool; rank the machine nodes by allocability according to those operating parameters; and take the surplus resources in the allocated resources of a preset number of machine nodes with the highest allocability as the determined surplus resources to be allocated to the target job of resources to be allocated.

In one embodiment, the plurality of dimensions may include, but are not limited to, at least one of: disk IO, download load, CPU utilization, memory utilization.

In an embodiment, the allocating module 703 may specifically determine whether there is a job in the job pool whose resource demand is less than or equal to the resource amount of the surplus resource; and when a plurality of jobs with the resource demand less than or equal to the resource amount of the surplus resource exist in the job pool, allocating the surplus resource to the job with the lowest priority in the job pool.

In one embodiment, the apparatus may further, after allocating the surplus resource to the target job of resources to be allocated, detect in real time the operating parameters of the machine node whose surplus resource has been allocated; determine according to the operating parameters whether that machine node exceeds a preset load threshold; and in the case that the preset load threshold is determined to be exceeded, reclaim the allocated surplus resource.

In one embodiment, the apparatus may further determine whether an allocable regular resource exists in the resource pool after the surplus resource is allocated to the target job of the resource to be allocated; in the event that it is determined that there are regular resources that can be allocated, allocating the regular resources to the target job.

In one embodiment, the apparatus may further, after allocating the regular resource to the target job in the case where it is determined that an allocable regular resource exists, determine whether the regular resource and the surplus resource are located in the same machine node; in the case where they are determined to be located in the same machine node, convert the surplus resource into a regular resource; and in the case where they are determined not to be located in the same machine node, run the target job in parallel on the surplus resource and the regular resource.

In the above example, the surplus resources in the allocated resources are determined and allocated to the target job of resources to be allocated; that is, the surplus resources in the allocated resources are allocated a second time, so that the technical problem of low resource utilization in the existing method can be solved, and the technical effects of making full use of resources and improving job processing efficiency are achieved.

Although the present application provides method steps as described in an embodiment or flowchart, additional or fewer steps may be included based on conventional or non-inventive efforts. The order of steps recited in the embodiments is merely one manner of performing the steps in a multitude of orders and does not represent the only order of execution. When an actual apparatus or client product executes, it may execute sequentially or in parallel (e.g., in the context of parallel processors or multi-threaded processing) according to the embodiments or methods shown in the figures.

The apparatuses or modules illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. The functionality of the modules may be implemented in the same one or more software and/or hardware implementations of the present application. Of course, a module that implements a certain function may be implemented by a plurality of sub-modules or sub-units in combination.

The methods, apparatus or modules described herein may be implemented by a controller executing computer readable program code, and the controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer readable medium storing computer readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, Application Specific Integrated Circuits (ASICs), programmable logic controllers, or embedded microcontrollers. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic for the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer readable program code, the same functionality can be implemented by logically programming the method steps such that the controller takes the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be considered a hardware component, and the means included therein for performing the various functions may also be considered structures within the hardware component. The means for performing the functions may even be regarded both as software modules for performing the method and as structures within the hardware component.

Some of the modules in the apparatus described herein may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary hardware. Based on such understanding, the technical solutions of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product, or may be embodied in the implementation process of data migration. The computer software product may be stored in a storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, mobile terminal, server, or network device, etc.) to perform the methods described in the various embodiments or portions of the embodiments of the present application.

The embodiments in the present specification are described in a progressive manner, and the same or similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. All or portions of the present application are operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, mobile communication terminals, multiprocessor systems, microprocessor-based systems, programmable electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

While the present application has been described with examples, those of ordinary skill in the art will appreciate that there are numerous variations and permutations of the present application without departing from the spirit of the application, and it is intended that the appended claims encompass such variations and permutations without departing from the spirit of the application.

Claims

1. A method for resource allocation, comprising:

acquiring resource use data of allocated resources in a resource pool;

2. The method of claim 1, wherein determining the surplus resources from the allocated resources based on the resource usage data of the allocated resources comprises:

obtaining operation parameters of multiple dimensions of multiple machine nodes in the resource pool;

performing distributability degree sequencing on the plurality of machine nodes according to the operating parameters of the plurality of dimensions of the plurality of machine nodes;

and taking the surplus resources in the allocated resources of a preset number of machine nodes with the highest allocability as the determined surplus resources to be allocated to the target job of resources to be allocated.

3. The method of claim 2, wherein the plurality of dimensions comprises: disk IO, download load, CPU utilization, memory utilization.

4. The method of claim 1, wherein allocating the surplus resources to a target job of resources to be allocated comprises:

determining whether a job with the resource demand less than or equal to the resource amount of the surplus resource exists in a job pool;

and when a plurality of jobs with the resource demand less than or equal to the resource amount of the surplus resource exist in the job pool, allocating the surplus resource to the job with the lowest priority in the job pool.

5. The method of claim 1, wherein after allocating the surplus resources to a target job of resources to be allocated, the method further comprises:

detecting, in real time, the operating parameters of the machine node whose surplus resources have been allocated;

determining, according to the operating parameters, whether the machine node whose surplus resources have been allocated exceeds a preset load threshold;

and in the case that the preset load threshold is determined to be exceeded, reclaiming the allocated surplus resources.
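
By way of non-limiting illustration, the reclaim check of claim 5 might be sketched as follows. The single scalar `load`, the threshold value of 0.8, and the names `Node` and `reclaim_if_overloaded` are assumptions for this example only; the source does not state how the operating parameters are reduced to a load figure.

```python
from dataclasses import dataclass, field

LOAD_THRESHOLD = 0.8  # hypothetical preset load threshold


@dataclass
class Node:
    name: str
    load: float                               # latest sampled load, assumed in [0, 1]
    lent: dict = field(default_factory=dict)  # job name -> surplus amount lent out


def reclaim_if_overloaded(node, threshold=LOAD_THRESHOLD):
    """Reclaim all lent surplus if the node exceeds the threshold.

    Returns the names of the jobs whose surplus allocation was reclaimed.
    """
    if node.load <= threshold:
        return []
    reclaimed = list(node.lent)
    node.lent.clear()
    return reclaimed


node = Node("node-a", load=0.55, lent={"job-1": 2.0})
print(reclaim_if_overloaded(node))  # load is below the threshold
node.load = 0.93
print(reclaim_if_overloaded(node))  # load now exceeds the threshold
```

A real monitor would call such a check on every sampling interval; the sketch shows only the decision itself.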

6. The method of claim 1, wherein after allocating the surplus resources to a target job of resources to be allocated, the method further comprises:

determining whether there are allocatable regular resources in the resource pool;

in the event that it is determined that there are regular resources that can be allocated, allocating the regular resources to the target job.

7. The method of claim 6, wherein after allocating the regular resources to the target job in the event that it is determined that there are regular resources that can be allocated, the method further comprises:

determining whether the regular resources and the surplus resources are located in the same machine node;

in the case that they are determined to be located in the same machine node, converting the surplus resources into regular resources;

and in the case that they are determined not to be located in the same machine node, running the target job in parallel on the surplus resources and the regular resources.
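
By way of non-limiting illustration, the branch recited in claim 7 might be sketched as follows. The sketch assumes one reading of "converting", namely that the surplus grant is relabeled as a regular grant in place so that the job keeps running on the same node; the names `Grant` and `reconcile` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    node: str
    amount: float
    kind: str  # "surplus" or "regular"


def reconcile(surplus, regular):
    """Return the grants the target job holds after a regular grant arrives."""
    if surplus.node == regular.node:
        # Same node: relabel the surplus grant as regular (assumed reading).
        return [Grant(surplus.node, surplus.amount, "regular")]
    # Different nodes: the job runs in parallel on both grants.
    return [surplus, regular]


same = reconcile(Grant("node-a", 2.0, "surplus"), Grant("node-a", 2.0, "regular"))
split = reconcile(Grant("node-a", 2.0, "surplus"), Grant("node-b", 2.0, "regular"))
print(same)
print(split)
```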

8. A service device comprising a processor and a memory for storing processor-executable instructions that when executed by the processor implement:

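acquiring resource usage data of allocated resources in a resource pool;

determining surplus resources from the allocated resources according to the resource usage data of the allocated resources;

and allocating the surplus resources to a target job of resources to be allocated.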

9. The apparatus of claim 8, wherein the processor determines the surplus resources from the allocated resources based on the resource usage data of the allocated resources, comprising:

10. The apparatus of claim 9, wherein the plurality of dimensions comprise: disk IO, download load, CPU utilization, memory utilization.

11. The apparatus of claim 8, wherein the processor allocates the surplus resources to a target job of resources to be allocated, comprising:

12. The apparatus of claim 8, wherein the processor, after allocating the surplus resources to the target job of resources to be allocated, further:

13. The apparatus of claim 8, wherein the processor, after allocating the surplus resources to the target job of resources to be allocated, further:

14. The apparatus of claim 13, wherein the processor, in the event that it is determined that regular resources are present, allocates the regular resources to the target job, and thereafter further:

15. A computer readable storage medium having stored thereon computer instructions which, when executed, implement the steps of the method of any one of claims 1 to 7.