CN113590317A - Scheduling method, device, medium and computing equipment of offline service - Google Patents

Info

Publication number
CN113590317A
Authority
CN
China
Prior art keywords
offline
node
computing node
resources
resource
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110850872.7A
Other languages
Chinese (zh)
Inventor
***
张晓龙
陈谔
汪源
李莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Langhe Technology Co Ltd
Original Assignee
Hangzhou Langhe Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Langhe Technology Co Ltd
Priority to CN202110850872.7A
Publication of CN113590317A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • G06F9/5088Techniques for rebalancing the load in a distributed system involving task migration

Abstract

Embodiments of the present disclosure provide a scheduling method, apparatus, medium and computing device for offline services. The method comprises: obtaining the offline resource capacity corresponding to each computing node, wherein the offline resource capacity comprises idle resources of the online service running on the computing node, the idle resources being the part of the resources requested by the online service other than the first used resources actually occupied; determining a target computing node from the computing nodes according to the offline resource capacity of each computing node; and scheduling the offline service to be scheduled to the target computing node. Because the actual resource occupation of services is considered during scheduling, the amount of available resources on each computing node is reflected more accurately and the load among computing nodes is better balanced; in addition, because the offline service uses the idle resources of the online service, waste of those idle resources is reduced and the resource utilization of the computing nodes is improved.

Description

Scheduling method, device, medium and computing equipment of offline service
Technical Field
The embodiment of the disclosure relates to the technical field of computers, and more particularly, to a scheduling method, device, medium and computing device for offline service.
Background
This section is intended to provide a background or context to the embodiments of the disclosure recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
A Kubernetes cluster is based on container technology and automates resource and service management, including service discovery, automatic deployment, load balancing, and fault detection and recovery. The cluster may include a control node and a plurality of computing nodes (Nodes); a user's service runs on a computing node, which may be a physical machine or a virtual machine. The control node is responsible for scheduling a user service to a computing node according to a scheduling policy and for allocating resources on that computing node to the service.
In practice, it has been found that the resource utilization of computing nodes, especially nodes running online services, is often low; yet under the current scheduling policy the control node cannot schedule more services onto such nodes, which wastes node resources. Moreover, load balancing among the computing nodes does not work well.
Disclosure of Invention
In this context, embodiments of the present disclosure are expected to provide a scheduling method, an apparatus, a medium, and a computing device for offline service, so as to improve resource utilization of a computing node.
In a first aspect of the disclosed embodiments, a method for scheduling an offline service is provided, where the method includes: obtaining offline resource capacity corresponding to each computing node, wherein the offline resource capacity comprises: idle resources of the online service running on the computing node, wherein the idle resources are the parts of the resources taken by the online service except for the first used resources actually occupied; determining a target computing node from each computing node according to the offline resource capacity of each computing node; and scheduling the offline service to be scheduled to the target computing node.
In an embodiment of the present disclosure, the obtaining offline resource capacities corresponding to the computing nodes respectively includes: for each computing node, acquiring the first usage resource actually occupied by the online service running on the computing node; determining an offline resource capacity of the compute node based on the expected resource utilization of the compute node and the first used resource of the online service.
In another embodiment of the present disclosure, the method further comprises: and if the offline resource capacity determined based on the expected resource utilization amount of the computing node and the first used resource is smaller than a preset minimum offline resource amount, taking the minimum offline resource amount as the offline resource capacity of the computing node.
In another embodiment of the present disclosure, the determining a target computing node from the computing nodes according to the offline resource capacity of the computing nodes includes: for each computing node, determining available offline resources on the computing node based on the offline resource capacity of the computing node and the offline resource usage amount on the computing node, wherein the available offline resources are used for representing the remaining available offline resources in the offline resource capacity of the computing node; determining a plurality of candidate computing nodes from each computing node according to the available offline resources corresponding to each computing node; and/or determining the target computing node from a plurality of candidate computing nodes according to available offline resources on each candidate computing node.
In yet another embodiment of the present disclosure, the determining available offline resources on the computing node based on the offline resource capacity of the computing node and the offline resource usage amount on the computing node comprises: acquiring a second use resource actually occupied by the existing offline service on the computing node and an offline demand resource of the existing offline service; taking the larger value of the second used resource and the offline demand resource as the offline resource usage amount; and removing the offline resource usage amount from the offline resource capacity of the computing node, and determining available offline resources on the computing node.
In yet another embodiment of the present disclosure, the determining a plurality of candidate computing nodes from each computing node according to the available offline resources respectively corresponding to each computing node includes: and in response to the available offline resource being greater than or equal to an offline demand resource of the offline service to be scheduled, taking the computing node as the candidate computing node.
In yet another embodiment of the present disclosure, the determining the target computing node from the plurality of candidate computing nodes according to available offline resources on each candidate computing node comprises: for each candidate computing node, determining a node evaluation parameter corresponding to the candidate computing node according to available offline resources on the candidate computing node; wherein the more available offline resources on the candidate compute node, the higher the node evaluation parameter; and determining the candidate computing node as the target node in response to the node evaluation parameter meeting a preset parameter condition.
In yet another embodiment of the present disclosure, the determining a node evaluation parameter corresponding to the candidate computing node according to the available offline resource on the candidate computing node includes: acquiring a plurality of node evaluation parameters in a historical preset time period corresponding to the candidate computing node; weighting the node evaluation parameters based on time, wherein the closer the current time for processing the offline service to be scheduled, the higher the weight of the node evaluation parameters is; and taking the result after the weighting processing as a final node evaluation parameter of the candidate computing node.
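The time-based weighting described in this embodiment can be illustrated with a minimal Python sketch. The disclosure only states that more recent node evaluation parameters receive higher weights; the exponential-decay weights, the `half_life` parameter, and the function name `weighted_score` are assumptions made for illustration and are not taken from the source.

```python
from typing import Sequence, Tuple


def weighted_score(
    samples: Sequence[Tuple[float, float]],
    now: float,
    half_life: float = 60.0,
) -> float:
    """Combine historical node evaluation parameters into one final value.

    samples holds (timestamp_seconds, evaluation_parameter) pairs with
    timestamps not later than `now`. Exponential decay is an assumed
    weighting scheme: a sample that is `half_life` seconds old counts half
    as much as a sample taken right now.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    weights = [0.5 ** ((now - t) / half_life) for t, _ in samples]
    total = sum(weights)
    # Weighted average: newer samples dominate the final parameter.
    return sum(w * s for w, (_, s) in zip(weights, samples)) / total


# Hypothetical history: an older low score and a recent high score.
final = weighted_score([(0.0, 10.0), (60.0, 40.0)], now=60.0)
```

Any monotone weighting (for example, linearly increasing weights) would satisfy the same requirement; exponential decay is chosen here only because it needs a single tunable parameter.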
In a second aspect of the disclosed embodiments, there is provided an apparatus for scheduling an offline service, the apparatus including: an information obtaining module, configured to obtain offline resource capacities corresponding to the respective computing nodes, where the offline resource capacities include: idle resources of the online service running on the computing node, wherein the idle resources are the parts of the resources taken by the online service except for the first used resources actually occupied; a node determining module, configured to determine a target computing node from the computing nodes according to the offline resource capacity of each computing node; and the scheduling processing module is used for scheduling the offline service to be scheduled to the target computing node.
In a third aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, and the program, when executed by a processor, implements the scheduling method of offline service of any embodiment of the present disclosure.
In a fourth aspect of embodiments of the present disclosure, a computing device is provided, which includes a memory for storing computer instructions executable on a processor, and the processor is configured to implement the scheduling method of offline services of any of the embodiments of the present disclosure when executing the computer instructions.
The scheduling method, apparatus, medium and computing device for offline services of the embodiments of the present disclosure adopt dynamic scheduling: both the resources actually occupied by a Pod and the resources it requests are considered during scheduling, and offline services are scheduled to computing nodes that run online services, so that the idle resources of the online services can be used to execute the offline services. Deploying online and offline services together on the same computing node reduces the waste of idle online-service resources as far as possible, makes full use of the resources the online services leave unused, improves the resource utilization of the nodes, and reduces an enterprise's server purchase costs and operation and maintenance costs. After the scheme is implemented, the resource utilization of online service servers and the average CPU utilization of node servers are significantly improved while the service level of the online services is maintained, which lowers the enterprise's IT infrastructure cost.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
FIG. 1 schematically illustrates a system architecture diagram of a Kubernetes cluster according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flowchart of a scheduling method of offline service according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flowchart for determining a target computing node from a plurality of candidate computing nodes according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flowchart of a scheduling method of offline service according to yet another embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a scheduling apparatus for offline service according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a computer-readable storage medium according to an embodiment of the present disclosure;
FIG. 7 illustrates one configuration of the computing device.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present disclosure will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the present disclosure, and are not intended to limit the scope of the present disclosure in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to the embodiment of the disclosure, a scheduling method, a medium, a device and a computing device of an offline service are provided. In this document, it is to be understood that any number of elements in the figures are provided by way of illustration and not limitation, and any nomenclature is used for differentiation only and not in any limiting sense.
The principles and spirit of the present disclosure are explained in detail below with reference to several representative embodiments of the present disclosure.
The inventor of the present disclosure has found in practice that the low resource utilization of computing nodes, especially nodes running online services, is caused by the scheduling policy used in the related art. In the related art, when the control node of a Kubernetes cluster selects a computing node to run a service to be scheduled, it does not consider the actual resource utilization of the computing node and refers only to the static resource request of the service. An online service generally requests resources on a computing node according to its peak-period demand, but the peak period is short and the service is below its peak most of the time. The online service therefore requests more resources than it normally uses; because those resources are already allocated to it, they cannot be allocated to other services, which leads to low resource utilization on the computing node. For the same reason, this scheduling policy does not balance the load well among the computing nodes.
Based on the above, the embodiments of the present disclosure provide a scheduling method for offline service, so as to change a scheduling policy, and consider the actual resource usage of the computing nodes in service scheduling, so as to improve the resource utilization of the computing nodes and better achieve load balancing among the computing nodes.
In order to better understand the following description of the scheduling method, some concepts involved in the scheduling method are briefly explained as follows.
Kubernetes cluster: the cluster may include a control node and a plurality of computing nodes (Nodes). FIG. 1 illustrates a system architecture diagram of a Kubernetes cluster according to an embodiment of the present disclosure. The control node 11 in the cluster is responsible for service scheduling and schedules services to the computing nodes 12; the cluster may include a plurality of computing nodes 12, whose resources, such as CPU and memory (Mem), are used to run the services. The control node 11 selects a target computing node from the plurality of computing nodes 12 according to a scheduling policy, the target computing node being the node on which the scheduled service is to run. The basic unit of scheduling in Kubernetes is the Pod; a Pod includes at least one service container, and the service runs in the container.
Online service: a service that interacts with users and is sensitive to interaction latency; it has low error tolerance and high SLO (service-level objective) requirements. Examples include instant messaging services, gaming services, and payment services.
Offline service: a batch-processing task that does not interact with users and is insensitive to latency; it has high error tolerance and low SLO requirements. Examples include Spark tasks, machine learning training tasks, and audio/video transcoding services.
It should be noted that the scheduling method of offline service in the embodiments of the present disclosure is described using a Kubernetes cluster as an example, but the method is not limited thereto and may also be applied in other settings.
FIG. 2 schematically shows a flowchart of a scheduling method of offline service according to an embodiment of the present disclosure, which may be performed, for example, by a control node of a Kubernetes cluster. The method can schedule offline services to computing nodes running online services. As shown in FIG. 2, the method may include the following processes:
In step 200, the offline resource capacity corresponding to each computing node is obtained, where the offline resource capacity includes idle resources of the online service running on the computing node.
In this step, the computing nodes include nodes running online services. The idle resources of an online service are the part of the resources requested by the online service other than the first used resources that it actually occupies.
For example, assume that an online service requests resources S1 according to its peak-period demand, but during ordinary off-peak periods it actually uses only S2, where S2 is smaller than S1. S2 may be referred to as the first used resource, and (S1-S2) may be referred to as the idle resource of the online service, that is, the resource that the online service has requested but that is currently idle.
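The S1/S2 example can be written out as a short Python sketch. The function name and the figures of 8 and 3 CPU cores are hypothetical; clamping the result at zero when usage briefly exceeds the request is also an assumption, since the disclosure only discusses the case where S2 is smaller than S1.

```python
def idle_online_resource(requested: float, used: float) -> float:
    """Return the idle part (S1 - S2) of an online service's requested resources."""
    if requested < 0 or used < 0:
        raise ValueError("resource amounts must be non-negative")
    # Assumption: if usage exceeds the request, report zero idle resources
    # rather than a negative amount.
    return max(requested - used, 0.0)


s1 = 8.0  # CPU cores requested for the peak period (hypothetical figure)
s2 = 3.0  # CPU cores actually used off-peak, the "first used resource" (hypothetical)
idle = idle_online_resource(s1, s2)
```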
The offline resource capacity is a resource that can be allocated to the offline service, and the offline resource capacity includes the idle resource of the online service, and may also include other resources. For example, for a certain computing node, an online service runs on the computing node, and the offline resource capacity on the computing node may include not only idle resources of the online service, but also resources that have not been allocated to any service on the node.
In addition, the manner in which the control node of the Kubernetes cluster acquires the offline resource capacity of each computing node may include, but is not limited to, the following two acquisition manners:
In one example, each computing node may monitor its own resource status and report it to the control node. The monitored and reported parameters may include, but are not limited to, node-level resource usage, the resource usage of online services on the node, and the resource usage of offline services on the node; in short, the computing node monitors and reports the parameters needed to compute the offline resource capacity. The control node then calculates the offline resource capacity of each node from the parameters reported by that node.
In another example, each computing node may itself calculate the offline resource capacity from the monitored parameters and report the result to the control node. In this case the control node does not need to perform the calculation; it only collects and stores the offline resource capacity reported by each computing node.
The following example is a way to determine offline resource capacity:
Taking the CPU as the example resource, the calculation can be performed according to the following formula (1):
nodeColocationCpuCapacity = tc * nodeCpuCapacity - ServiceCpuUsg … (1)
In formula (1), nodeColocationCpuCapacity represents the offline resource capacity that can be allocated to offline services on a node, and nodeCpuCapacity represents the total CPU resources on the node. tc is a configurable system parameter that represents the CPU resource utilization rate the computing node is expected to reach, that is, the expected resource utilization rate of the computing node. ServiceCpuUsg represents the amount of CPU resources actually used by the online service on the computing node, and may be referred to as the first used resource. The formula shows that the resource obtained by removing the first used resource from the expected resource usage amount of the computing node (tc * nodeCpuCapacity) may be referred to as the offline resource capacity; the offline resource capacity may include idle resources of the online service as well as other idle resources on the node.
As formula (1) shows, the expected resource utilization rate of the computing node is taken into account when determining its offline resource capacity. The resources on the computing node can therefore be used as fully as possible while utilization stays within the expected range and the node is not overloaded, which helps keep the resource utilization of the computing node at a moderate level. In addition, because the resources actually occupied by the online service on the computing node are considered, the offline resource capacity reflects the truly idle resources on the computing node more accurately, which is more favorable to load balancing among the nodes.
In addition, to prevent the offline service from being allocated so few resources that its throughput is seriously affected or it is even starved, and to keep the scheduled offline service running as stably as possible, the embodiment of the present disclosure further provides a minimum resource guarantee for the offline service. A minimum offline resource amount may be preset and may be expressed as rc * nodeCpuCapacity, where rc represents the minimum percentage of CPU resources allocated to offline services and nodeCpuCapacity represents the total CPU resources on the node. If tc * nodeCpuCapacity - ServiceCpuUsg calculated according to formula (1) is smaller than rc * nodeCpuCapacity, the minimum offline resource amount (rc * nodeCpuCapacity) may be used as the offline resource capacity of the computing node, that is, the offline resource capacity is given by the following formula (2):
nodeColocationCpuCapacity = rc * nodeCpuCapacity … (2)
The CPU resource is taken as an example above; the Mem resource can be calculated in the same manner to obtain the memory resource nodeColocationMemCapacity in the offline resource capacity. That is, the offline resource capacity on a computing node may include both CPU resources and Mem resources.
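Formulas (1) and (2) together amount to taking the larger of the two values. The following Python sketch shows this under stated assumptions: the `NodeResource` type, the function name, the requirement that tc and rc lie in [0, 1], and the numeric values are all illustrative and do not appear in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeResource:
    capacity: float      # total amount on the node (nodeCpuCapacity, or the Mem equivalent)
    online_usage: float  # first used resource (ServiceCpuUsg, or the Mem equivalent)


def offline_capacity(res: NodeResource, tc: float, rc: float) -> float:
    """Offline resource capacity: formula (1), floored at the guarantee of formula (2)."""
    # Assumption: tc and rc are fractions of the node's total capacity.
    if not (0.0 <= tc <= 1.0 and 0.0 <= rc <= 1.0):
        raise ValueError("tc and rc are expected to lie in [0, 1]")
    by_target = tc * res.capacity - res.online_usage  # formula (1)
    guaranteed = rc * res.capacity                    # formula (2)
    return max(by_target, guaranteed)


# Hypothetical 32-core node, target utilization 75%, minimum offline share 25%.
light_load = offline_capacity(NodeResource(32.0, 10.0), tc=0.75, rc=0.25)
heavy_load = offline_capacity(NodeResource(32.0, 20.0), tc=0.75, rc=0.25)
```

With light online load, formula (1) gives 24 - 10 = 14 cores; with heavy online load it would give only 4 cores, so the 8-core guarantee of formula (2) applies instead. The same function can be called with Mem figures to obtain the memory part of the offline resource capacity.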
In step 202, a target computing node is determined from the computing nodes according to the offline resource capacity of the computing nodes.
In this step, after the offline resource capacity of each computing node has been obtained, the target computing node is determined according to it. The computing node selected from the computing nodes to run the offline service being scheduled is referred to as the target computing node. For example, if an offline service is being scheduled and is to run on one of the computing nodes, a computing node is selected according to the offline resource capacity of each computing node, and the control node then schedules the offline service to that computing node to run; that computing node is the target computing node.
The control node may include two stages in the process of determining the target computing node:
One stage, which may be referred to as the Filtering stage, traverses all the computing nodes in the cluster and screens out, as candidate computing nodes, those that satisfy the conditions. This stage mainly determines whether a computing node has sufficient resources for the service to be scheduled, and takes nodes with sufficient resources as candidate computing nodes.
The other stage, which may be referred to as the Prioritizing stage, evaluates the candidate computing nodes and selects the optimal one as the target computing node. In this stage, the computing node with the lowest load among the candidates is preferentially selected as the target computing node, mainly to balance the load among computing nodes and to improve service stability.
In the embodiment of the present disclosure, after the offline resource capacity of each computing node has been determined, it may be used in both the Filtering and Prioritizing stages, or in only one of them.
For example, for each computing node, the available offline resources on the computing node may be determined based on the offline resource capacity of the computing node and the offline resource usage amount on the computing node. The available offline resources represent the offline resources that remain available within the offline resource capacity of the computing node. The available offline resources differ from the offline resource capacity: the offline resource capacity is the total resource on the computing node that can be allocated to offline services, whereas the available offline resources are the resources that can be allocated to the offline service currently being scheduled. For example, if the offline resource capacity on a node is 10 CPU cores, but the node already runs offline services that occupy 6 cores, then the available offline resource that can be allocated to the offline service currently being scheduled is 4 cores.
A plurality of candidate computing nodes may then be determined from the computing nodes according to the available offline resources corresponding to each computing node, and the target computing node may be determined from the plurality of candidate computing nodes according to the available offline resources on each candidate computing node.
As previously mentioned, the available offline resources may also be used in only one of the stages. For example, the target computing node is determined according to the available offline resources only in the Prioritizing stage, and the manner of determining candidate computing nodes in the Filtering stage is not limited; or the candidate computing nodes are determined according to the available offline resources only in the Filtering stage, and the manner of determining the target computing node in the Prioritizing stage is not limited.
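The two stages can be sketched as a small generic Python function. This is only an illustration of the Filtering-then-Prioritizing flow: the function name `pick_target`, the pluggable `fits` and `score` callables, and the node names and core counts are hypothetical, and either callable may or may not be based on available offline resources, as the text allows.

```python
from typing import Callable, Optional, Sequence, TypeVar

N = TypeVar("N")


def pick_target(
    nodes: Sequence[N],
    fits: Callable[[N], bool],
    score: Callable[[N], float],
) -> Optional[N]:
    """Run the Filtering stage, then the Prioritizing stage.

    Returns None when no node passes the Filtering stage.
    """
    candidates = [n for n in nodes if fits(n)]  # Filtering: keep nodes with enough resources
    if not candidates:
        return None
    return max(candidates, key=score)           # Prioritizing: highest evaluation wins


# Hypothetical available offline CPU cores per node.
free_cores = {"node-a": 4.0, "node-b": 1.0, "node-c": 6.0}
demand = 2.0  # cores requested by the offline service being scheduled (hypothetical)

target = pick_target(
    list(free_cores),
    fits=lambda n: free_cores[n] >= demand,
    score=lambda n: free_cores[n],  # more available offline resources, higher score
)
```

Here node-b is filtered out, and node-c is chosen over node-a because it has more available offline resources.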
The following example is a way to determine available offline resources:
The available offline resources may include CPU resources and Mem resources. The following formula (3) shows the calculation for the CPU resources:
freeColocationCpu_n = ColocationCpuCapacity_n - max(ColocationCpuUsg_n, ColocationCpuRequest_n) … (3)
where the subscript n denotes any computing node n; the formula can be evaluated for each computing node. freeColocationCpu_n represents the CPU resource among the available offline resources of computing node n. ColocationCpuUsg_n represents the resource actually occupied by the existing offline services on the computing node, and may be referred to as the second used resource. ColocationCpuRequest_n represents the offline demand resource of the existing offline services on the computing node, that is, the resource requested by those offline services. The larger of the second used resource and the offline demand resource, max(ColocationCpuUsg_n, ColocationCpuRequest_n), may be referred to as the offline resource usage amount on the computing node. Removing the offline resource usage amount from the offline resource capacity ColocationCpuCapacity_n of the computing node yields the available offline resource freeColocationCpu_n on the node. Here the available offline resource, that is, the offline resource remaining in the offline resource capacity of the computing node, refers to the CPU resource.
In the same way, the Mem resource among the available offline resources can be calculated, as shown in equation (4):
$$\mathrm{freeColocationMem}_n = \mathrm{ColocationMemCapacity}_n - \max(\mathrm{ColocationMemUsg}_n,\ \mathrm{ColocationMemRequest}_n) \quad (4)$$
where freeColocationMem_n denotes the Mem (memory) resource among the available offline resources, ColocationMemUsg_n denotes the Mem resource actually occupied by the offline services already on the compute node, and ColocationMemRequest_n denotes the offline demand resource of those offline services, that is, the Mem resource they requested. The larger of ColocationMemUsg_n and ColocationMemRequest_n is determined first. Removing this larger value from ColocationMemCapacity_n, the Mem part of the offline resource capacity of the compute node, yields the Mem resource freeColocationMem_n among the available offline resources on the node.
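Equations (3) and (4) share one form, so they can be illustrated by a single helper. The following is a minimal sketch only; the class name, field names and sample values are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class OfflineResource:
    """One resource dimension (CPU or Mem) of a compute node's offline pool."""
    capacity: float  # offline resource capacity of the node
    usage: float     # second used resource: actually occupied by existing offline services
    request: float   # offline demand resource: requested by existing offline services


def free_offline(res: OfflineResource) -> float:
    """Equations (3)/(4): capacity minus max(actual usage, requested amount)."""
    return res.capacity - max(res.usage, res.request)


# Made-up sample values: CPU in cores, Mem in GiB.
cpu = OfflineResource(capacity=8.0, usage=2.5, request=4.0)
mem = OfflineResource(capacity=32.0, usage=20.0, request=16.0)
print(free_offline(cpu))  # 4.0  (the request dominates)
print(free_offline(mem))  # 12.0 (the actual usage dominates)
```

Taking the larger of usage and request means a node is never credited with offline resources that existing offline services have either reserved or are already consuming.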
In the Filtering stage, candidate compute nodes may be screened out based on the available offline resources. For example, a candidate compute node is a node with sufficient resources: if the available offline resources of a compute node are greater than or equal to the offline demand resources of the offline service to be scheduled, the compute node may be used as a candidate compute node. Specifically, the node is taken as a candidate compute node only when both conditions hold: the CPU resource among the available offline resources on the node is greater than or equal to the CPU resource required by the offline service to be scheduled, and the Mem resource among the available offline resources on the node is greater than or equal to the Mem resource required by the offline service to be scheduled. Of course, besides CPU and Mem resources, other conditions for screening candidate compute nodes may also be set, and the embodiments of the present disclosure are not limited in this respect.
In addition, as described above, in the Kubernetes cluster example the control node schedules offline services with the Pod as the basic scheduling unit. The offline service to be scheduled is therefore usually an offline Pod, whose container contains the offline service to be executed, and the resource requirement of the offline service is the resource requirement of the offline Pod. Similarly, the resources occupied by the offline services already on a compute node may be the resources occupied by the offline Pods existing on that node.
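The screening condition of the Filtering stage can be sketched as follows. This is an illustrative sketch, not the disclosed implementation; the function name, the node names and the numbers are hypothetical.

```python
from typing import Dict, List, Tuple

# node name -> (free offline CPU in cores, free offline Mem in GiB); illustrative values
FreeMap = Dict[str, Tuple[float, float]]


def filter_candidates(free: FreeMap, need_cpu: float, need_mem: float) -> List[str]:
    """Keep only nodes whose free offline CPU and Mem both cover the request."""
    return [
        name
        for name, (free_cpu, free_mem) in free.items()
        if free_cpu >= need_cpu and free_mem >= need_mem
    ]


nodes = {"node-a": (4.0, 12.0), "node-b": (1.0, 30.0), "node-c": (2.0, 8.0)}
print(filter_candidates(nodes, need_cpu=2.0, need_mem=8.0))  # ['node-a', 'node-c']
```

In this example node-b is rejected even though it has ample Mem, because its free offline CPU is below the request.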
In the Prioritizing stage, a target compute node may be determined from a plurality of candidate compute nodes based on the available offline resources. FIG. 3 illustrates one manner in which a target compute node is determined from among the candidate compute nodes during the Prioritizing stage. The following processes may be included:
In step 300, for each candidate computing node, a node evaluation parameter corresponding to the candidate computing node may be determined according to the available offline resources of the candidate computing node.
For example, the node evaluation parameter may be a node score. The more offline resources available on the candidate compute node, the higher the node score.
For example, equations (3) and (4) above give the CPU resource freeColocationCpu_n and the Mem resource freeColocationMem_n among the available offline resources. freeColocationCpu_n may be converted into a score score_n(cpu) of the node in terms of available offline CPU resources, and freeColocationMem_n into a score score_n(mem) of the node in terms of available offline Mem resources. The total score of a node is then the weighted sum of these two scores, as in the following equation:
$$\mathrm{score}_n = w_{cpu} \cdot \mathrm{score}_n(cpu) + w_{mem} \cdot \mathrm{score}_n(mem) \quad (5)$$
In equation (5), the subscript n denotes any compute node n and score_n denotes the score of compute node n. score_n(cpu) is the node's score in terms of CPU resources, which may be obtained from freeColocationCpu_n, and score_n(mem) is the node's score in terms of Mem resources, which may be obtained from freeColocationMem_n. w_cpu is the weight corresponding to the CPU score score_n(cpu), and w_mem is the weight corresponding to the Mem score score_n(mem).
Each candidate compute node may compute a corresponding score according to equation (5) above. And the more offline resources available on a node, the higher the score.
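The weighted sum of equation (5) can be sketched as below. The text does not say how a free amount is converted into score_n(cpu) or score_n(mem); this sketch assumes a 0 to 100 scale relative to the largest free amount among the candidates, so that more available offline resources always give a higher score. That conversion, the weights and all names are assumptions made for illustration.

```python
from typing import Dict


def dimension_scores(free_by_node: Dict[str, float]) -> Dict[str, float]:
    """Assumed conversion: 100 * free / (largest free value among the candidates)."""
    top = max(free_by_node.values(), default=0.0)
    if top <= 0.0:
        return {name: 0.0 for name in free_by_node}
    return {name: 100.0 * max(value, 0.0) / top for name, value in free_by_node.items()}


def node_scores(free_cpu: Dict[str, float], free_mem: Dict[str, float],
                w_cpu: float, w_mem: float) -> Dict[str, float]:
    """Equation (5): score_n = w_cpu * score_n(cpu) + w_mem * score_n(mem)."""
    cpu = dimension_scores(free_cpu)
    mem = dimension_scores(free_mem)
    return {name: w_cpu * cpu[name] + w_mem * mem[name] for name in free_cpu}


scores = node_scores(
    free_cpu={"node-a": 4.0, "node-c": 2.0},   # cores, made-up values
    free_mem={"node-a": 8.0, "node-c": 16.0},  # GiB, made-up values
    w_cpu=0.75, w_mem=0.25,                    # hypothetical weights
)
print(scores)  # {'node-a': 87.5, 'node-c': 62.5}
```

With a CPU-heavy weighting, node-a wins despite having less free Mem; other weight choices would shift the balance.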
In step 302, in response to the node evaluation parameter of a candidate computing node satisfying a preset parameter condition, the candidate computing node is determined as the target computing node.
In one example, the preset parameter condition may be the highest score, that is, the candidate computing node with the highest score may be determined as the target computing node. The candidate computing node with the highest score also has the most available offline resources, so scheduling the offline Pod to be scheduled onto this target computing node helps balance the load among the nodes.
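The "highest score" condition can be sketched in a few lines. The tie-breaking rule used here (alphabetical node name) is an assumption added only to make the result deterministic; the text does not specify one.

```python
from typing import Dict, Optional


def pick_target(scores: Dict[str, float]) -> Optional[str]:
    """Return the candidate with the highest score, or None if there is no candidate."""
    if not scores:
        return None
    # sorted() first so that, among equal scores, the alphabetically first name wins.
    return max(sorted(scores), key=scores.__getitem__)


print(pick_target({"node-a": 87.5, "node-c": 62.5}))  # node-a
print(pick_target({"node-b": 70.0, "node-a": 70.0}))  # node-a (tie broken by name)
print(pick_target({}))                                # None
```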
In step 204, the offline service to be scheduled is scheduled to the target computing node.
For example, the control node may schedule the offline Pod to the target compute node to run offline traffic in the container of the offline Pod using resources such as CPU and Mem on the target compute node.
The offline service scheduling method of the embodiment of the disclosure adopts dynamic scheduling and considers both the actually occupied resources and the requested resources of the Pods during scheduling. Offline services are scheduled to computing nodes that run online services, so that the idle resources of the online services can be used to execute the offline services. With online and offline services deployed together on the same computing node, the waste of idle online resources is reduced as far as possible, idle resources not used by the online services are fully utilized, the resource utilization of the node is improved, and the server purchase cost and the operation and maintenance cost of an enterprise are greatly reduced. After the scheme is implemented, the resource utilization of the online service servers is significantly improved, the average CPU utilization of the node servers is greatly improved while the service level of the online services is guaranteed, and the IT infrastructure cost of the enterprise is greatly reduced.
In addition, the scheduling method of the embodiment of the present disclosure takes the real load on the node server into account during scheduling, for example the resources actually occupied by the online service on the node and the available offline resources of the node, and schedules the offline Pod to the computing node with the largest available offline resources.
In another example, FIG. 4 schematically shows a flowchart of a scheduling method of offline service according to another embodiment of the present disclosure. In this example, time-based weighting is used when determining the score of each computing node. As shown in FIG. 4, the method may include the following processes; steps that are the same as in the foregoing embodiments are described only briefly, and the foregoing embodiments may be consulted for details.
In step 400, the offline resource capacity corresponding to each computing node is obtained, where the offline resource capacity includes: idle resources of the online service running on the computing node.
In step 402, for each compute node, available offline resources on the compute node are determined based on the offline resource capacity of the compute node and the offline resource usage on the compute node.
In step 404, a plurality of candidate computing nodes are determined from each computing node according to the available offline resources corresponding to the computing node.
In step 406, for each candidate computing node, a node evaluation parameter corresponding to the candidate computing node is determined according to the available offline resources on the candidate computing node; a plurality of node evaluation parameters of the candidate computing node within a historical preset time period are acquired; the plurality of node evaluation parameters are weighted based on time; and the weighted result is taken as the final node evaluation parameter of the candidate computing node.
In this embodiment, the score of each candidate compute node may be calculated as follows: a plurality of scores of the candidate computing node within a historical preset time period are acquired and weighted based on time.
In one example, the historical preset time period may be the 24 hours preceding the current time. For example, the time at which the control node, having received a scheduling request for an offline Pod, processes the task of scheduling that Pod to a computing node may be referred to as the current time, and the 24 hours before it form the historical preset time period. In practice, the score of a computing node may be calculated at regular intervals, for example every two hours, so that a plurality of scores is available within the 24-hour historical preset time period.
The weighting of the multiple scores may be performed according to equation (6):
[Equation (6): formula image (Figure BDA0003182452560000131) not reproduced in this text]
where i denotes any compute node. In equation (6), score_i[t] denotes the score of compute node i at time t, w[t] denotes the weight corresponding to score_i[t], and SCORE_i denotes the final score of compute node i. Offline services are preferentially scheduled to the compute node with the highest SCORE_i.
The closer the time t is to the current time at which the offline service to be scheduled is processed, the higher the weight w[t] of the corresponding node evaluation parameter. Equation (7) illustrates one way of calculating the weights:
[Equation (7): formula image (Figure BDA0003182452560000132) not reproduced in this text]
The closer to the current time, the higher the weight. The weight decays exponentially and continuously with a half-life of 6 h, that is, the weight value is halved every 6 h.
The final node score is obtained by weighting multiple scores over a period for the following reason. Offline resources on a computing node carry no strong quality guarantee, because the service quality of the online service is guaranteed first. Idle resources of the online service that an offline service is using may therefore be reclaimed by the online service, which reduces the throughput of the offline service and may starve it. Rescheduling the offline service to another idle node with a lower load may cause data loss. To avoid frequent rescheduling and starvation of offline services, and to guarantee their service quality as far as possible, the final score of a node is determined from its scores over a period, so that a more stable target computing node can be selected.
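The time-based weighting can be sketched as follows. Equations (6) and (7) are not reproduced in this text, so their exact form is unknown; the sketch assumes a weight of 0.5 raised to (age / 6 h), which matches the stated 6 h half-life, and a final score normalised by the sum of the weights. Both forms, and all names, are assumptions for illustration.

```python
from typing import Iterable, Tuple

HALF_LIFE_HOURS = 6.0  # from the text: the weight halves every 6 h
WINDOW_HOURS = 24.0    # from the text: historical preset time period of 24 h


def weight(age_hours: float) -> float:
    """Assumed form of w[t]: exponential decay with a 6 h half-life."""
    return 0.5 ** (age_hours / HALF_LIFE_HOURS)


def final_score(samples: Iterable[Tuple[float, float]]) -> float:
    """samples are (age_hours, score) pairs; age 0 is the current time.

    Assumed form of SCORE_i: weighted average of the scores inside the window.
    """
    kept = [(weight(age), score) for age, score in samples if 0.0 <= age <= WINDOW_HOURS]
    total_weight = sum(w for w, _ in kept)
    if total_weight == 0.0:
        return 0.0
    return sum(w * score for w, score in kept) / total_weight


# Made-up history: a node whose score rose only recently.
history = [(0.0, 80.0), (6.0, 40.0), (12.0, 40.0)]
print(weight(6.0), weight(12.0))  # 0.5 0.25
print(final_score(history))       # between 40 and 80, pulled toward the recent 80
```

A node whose free offline resources spiked only in the last sample scores lower than a node that has been consistently idle, which is the stabilising effect the text describes.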
In step 408, in response to the node evaluation parameter of a candidate computing node satisfying a preset parameter condition, the candidate computing node is determined as the target computing node.
For example, the final score SCORE_i of each candidate computing node is calculated according to equation (6), and the computing node with the highest SCORE_i is used as the target computing node.
In step 410, offline traffic to be scheduled is scheduled to the target computing node.
According to the scheduling method of the offline service, the target computing node is selected by adopting the node evaluation parameter based on time weighting, so that frequent rescheduling of the scheduled offline service can be avoided as much as possible, and the service quality of the offline service is ensured.
In order to implement the foregoing scheduling method for offline service, the embodiment of the present disclosure further provides a device for implementing the method. Fig. 5 schematically illustrates an offline service scheduling apparatus according to an embodiment of the present disclosure, and as shown in fig. 5, the apparatus may include: an information acquisition module 51, a node determination module 52 and a scheduling processing module 53.
An information obtaining module 51, configured to obtain the offline resource capacity corresponding to each computing node, where the offline resource capacity includes: idle resources of the online service running on the computing node, the idle resources being the part of the resources requested by the online service other than the first used resources actually occupied.
A node determining module 52, configured to determine a target computing node from the computing nodes according to the offline resource capacity of each computing node.
A scheduling processing module 53, configured to schedule the offline service to be scheduled to the target computing node.
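The division into three modules can be pictured as three pluggable callables driven in sequence. This is only a structural sketch; the class, its field names and the toy callables are hypothetical and do not reflect any particular scheduler API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class OfflineSchedulerSketch:
    # Role of module 51: return node name -> offline resource capacity.
    obtain_capacity: Callable[[], Dict[str, float]]
    # Role of module 52: pick a target node for a given demand, or None.
    determine_target: Callable[[Dict[str, float], float], Optional[str]]
    # Role of module 53: place the offline service on the chosen node.
    schedule: Callable[[str, str], None]

    def run(self, service: str, demand: float) -> Optional[str]:
        capacities = self.obtain_capacity()
        target = self.determine_target(capacities, demand)
        if target is not None:
            self.schedule(service, target)
        return target


placed = []
scheduler = OfflineSchedulerSketch(
    obtain_capacity=lambda: {"node-a": 4.0, "node-b": 10.0},
    determine_target=lambda caps, need: max(
        (name for name in caps if caps[name] >= need), key=caps.get, default=None
    ),
    schedule=lambda service, node: placed.append((service, node)),
)
print(scheduler.run("offline-pod-1", demand=3.0))  # node-b
print(placed)                                      # [('offline-pod-1', 'node-b')]
```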
In an example, the information obtaining module 51, when configured to obtain the offline resource capacity corresponding to each computing node, is configured to: for each computing node, acquire the first used resource actually occupied by the online service running on the computing node; and determine the offline resource capacity of the computing node based on the expected resource utilization of the computing node and the first used resource of the online service.
In an example, the information obtaining module 51 is further configured to: when the offline resource capacity determined based on the expected resource utilization of the computing node and the first used resource is smaller than a preset minimum offline resource amount, take the minimum offline resource amount as the offline resource capacity of the computing node.
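The minimum-amount rule can be sketched as a simple floor. How the capacity is derived from the expected resource utilization and the first used resource is not given in this passage; the sketch assumes it is the node's total resource multiplied by the expected utilization, minus the first used resource. That formula, the parameter names and the numbers are assumptions for illustration.

```python
def offline_capacity(total: float, expected_utilization: float,
                     online_used: float, min_offline: float) -> float:
    """Offline resource capacity with a preset minimum.

    Assumed raw form: total * expected_utilization - online_used.
    """
    raw = total * expected_utilization - online_used
    # If the raw capacity falls below the preset minimum, use the minimum instead.
    return max(raw, min_offline)


# Made-up values in cores: 32-core node, 50 % expected utilization, 2-core minimum.
print(offline_capacity(32.0, 0.5, online_used=6.0, min_offline=2.0))   # 10.0
print(offline_capacity(32.0, 0.5, online_used=15.5, min_offline=2.0))  # 2.0
```

The floor keeps a small offline pool available even when the online service is busy, which helps offline services keep making progress.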
In one example, the node determining module 52, when configured to determine a target computing node from the computing nodes according to offline resource capacities of the computing nodes, includes: for each computing node, determining available offline resources on the computing node based on the offline resource capacity of the computing node and the offline resource usage amount on the computing node, wherein the available offline resources are used for representing the remaining available offline resources in the offline resource capacity of the computing node; determining a plurality of candidate computing nodes from each computing node according to the available offline resources corresponding to each computing node; and/or determining the target computing node from the plurality of candidate computing nodes according to available offline resources on each candidate computing node.
In one example, node determination module 52, when configured to determine available offline resources on the computing node based on the offline resource capacity of the computing node and the offline resource usage amount on the computing node, comprises: acquiring a second use resource actually occupied by the existing offline service on the computing node and an offline demand resource of the existing offline service; taking the larger value of the second used resource and the offline demand resource as the offline resource usage amount; and removing the offline resource usage amount from the offline resource capacity of the computing node, and determining available offline resources on the computing node.
In an example, the node determining module 52, when configured to determine a plurality of candidate computing nodes from the computing nodes according to the available offline resources corresponding to the computing nodes respectively, is configured to: in response to the available offline resources of a computing node being greater than or equal to the offline demand resource of the offline service to be scheduled, take the computing node as a candidate computing node.
In one example, node determination module 52, when configured to determine the target computing node from the plurality of candidate computing nodes based on available offline resources on each candidate computing node, comprises: for each candidate computing node, determining a node evaluation parameter corresponding to the candidate computing node according to available offline resources on the candidate computing node; wherein the more available offline resources on the candidate compute node, the higher the node evaluation parameter; and determining the candidate computing node as the target node in response to the node evaluation parameter meeting a preset parameter condition.
In one example, the node determining module 52, when configured to determine the node evaluation parameter corresponding to the candidate computing node according to the available offline resources on the candidate computing node, is configured to: acquire a plurality of node evaluation parameters of the candidate computing node within a historical preset time period; weight the node evaluation parameters based on time, wherein the closer a node evaluation parameter is to the current time at which the offline service to be scheduled is processed, the higher its weight; and take the weighted result as the final node evaluation parameter of the candidate computing node.
It should be noted that although several units/modules or sub-units/modules of the scheduling apparatus of offline service are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functions of two or more of the units/modules described above may be embodied in one unit/module, in accordance with embodiments of the present disclosure. Conversely, the features and functions of one unit/module described above may be further divided so as to be embodied by a plurality of units/modules.
The embodiment of the disclosure also provides a computer readable storage medium. As shown in fig. 6, the storage medium has a computer program 601 stored thereon, and when executed by a processor, the computer program 601 may perform the scheduling method of offline service according to any embodiment of the present disclosure.
The disclosed embodiments also provide a computing device, which may include a memory for storing computer instructions executable on a processor, and the processor for implementing the scheduling method of offline service of any embodiment of the present disclosure when executing the computer instructions.
FIG. 7 illustrates one configuration of the computing device, and as shown in FIG. 7, the computing device 70 may include, but is not limited to: a processor 71, a memory 72, and a bus 73 that connects the various system components, including the memory 72 and the processor 71.
The memory 72 stores computer instructions executable by the processor 71, such that the processor 71 is capable of performing the scheduling method of offline service of any of the embodiments of the present disclosure. The memory 72 may include a random access memory unit RAM 721, a cache memory unit 722, and/or a read only memory unit ROM 723. The memory 72 may also include a program tool 725 having a set of program modules 724, the program modules 724 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The bus 73 may include, for example, a data bus, an address bus, a control bus, and the like. The computing device 70 may also communicate with external devices 75 through the I/O interface 74, the external devices 75 may be, for example, a keyboard, a bluetooth device, etc. The computing device 70 may also communicate with one or more networks, such as a local area network, a wide area network, a public network, etc., through the network adapter 76. The network adapter 76 may also communicate with other modules of the computing device 70 via the bus 73, as shown in FIG. 7.
Further, while the operations of the disclosed methods are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
While the spirit and principles of the present disclosure have been described with reference to several particular embodiments, it is to be understood that the present disclosure is not limited to the particular embodiments disclosed, nor does the division into aspects, which is for convenience of description only, imply that features in these aspects cannot be combined to advantage. The disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A method for scheduling offline traffic, the method comprising:
obtaining offline resource capacity corresponding to each computing node, wherein the offline resource capacity comprises: idle resources of the online service running on the computing node, wherein the idle resources are the part of the resources requested by the online service other than the first used resources actually occupied;
determining a target computing node from each computing node according to the offline resource capacity of each computing node;
and scheduling the offline service to be scheduled to the target computing node.
2. The method of claim 1, wherein
the obtaining of the offline resource capacity corresponding to each computing node comprises:
for each computing node, acquiring the first used resource actually occupied by the online service running on the computing node;
determining an offline resource capacity of the computing node based on the expected resource utilization of the computing node and the first used resource of the online service.
3. The method of claim 2, wherein
the method further comprises:
if the offline resource capacity determined based on the expected resource utilization of the computing node and the first used resource is smaller than a preset minimum offline resource amount, taking the minimum offline resource amount as the offline resource capacity of the computing node.
4. The method of claim 1, wherein determining a target compute node from the compute nodes based on offline resource capacity of the compute nodes comprises:
for each computing node, determining available offline resources on the computing node based on the offline resource capacity of the computing node and the offline resource usage amount on the computing node, wherein the available offline resources are used for representing the remaining available offline resources in the offline resource capacity of the computing node;
determining a plurality of candidate computing nodes from each computing node according to the available offline resources corresponding to each computing node;
and/or determining the target computing node from the plurality of candidate computing nodes according to available offline resources on each candidate computing node.
5. The method of claim 4, wherein determining a plurality of candidate compute nodes from the compute nodes according to the available offline resources corresponding to the compute nodes, respectively, comprises:
and in response to the available offline resource being greater than or equal to an offline demand resource of the offline service to be scheduled, taking the computing node as the candidate computing node.
6. The method of claim 4, wherein determining the target computing node from the plurality of candidate computing nodes according to available offline resources on each candidate computing node comprises:
for each candidate computing node, determining a node evaluation parameter corresponding to the candidate computing node according to available offline resources on the candidate computing node; wherein the more available offline resources on the candidate compute node, the higher the node evaluation parameter;
and determining the candidate computing node as the target node in response to the node evaluation parameter meeting a preset parameter condition.
7. The method of claim 6, wherein determining node evaluation parameters corresponding to the candidate computing nodes according to available offline resources on the candidate computing nodes comprises:
acquiring a plurality of node evaluation parameters in a historical preset time period corresponding to the candidate computing node; weighting the node evaluation parameters based on time, wherein the closer a node evaluation parameter is to the current time at which the offline service to be scheduled is processed, the higher its weight;
and taking the result after the weighting processing as a final node evaluation parameter of the candidate computing node.
8. An apparatus for scheduling off-line traffic, the apparatus comprising:
an information obtaining module, configured to obtain offline resource capacities corresponding to the respective computing nodes, where the offline resource capacities include: idle resources of the online service running on the computing node, wherein the idle resources are the part of the resources requested by the online service other than the first used resources actually occupied;
a node determining module, configured to determine a target computing node from the computing nodes according to the offline resource capacity of each computing node;
and the scheduling processing module is used for scheduling the offline service to be scheduled to the target computing node.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method of any one of claims 1 to 7.
10. A computing device comprising a memory for storing computer instructions executable on a processor, the processor for implementing the method of any one of claims 1 to 7 when executing the computer instructions.
CN202110850872.7A 2021-07-27 2021-07-27 Scheduling method, device, medium and computing equipment of offline service Pending CN113590317A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110850872.7A CN113590317A (en) 2021-07-27 2021-07-27 Scheduling method, device, medium and computing equipment of offline service

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110850872.7A CN113590317A (en) 2021-07-27 2021-07-27 Scheduling method, device, medium and computing equipment of offline service

Publications (1)

Publication Number Publication Date
CN113590317A true CN113590317A (en) 2021-11-02

Family

ID=78250465

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110850872.7A Pending CN113590317A (en) 2021-07-27 2021-07-27 Scheduling method, device, medium and computing equipment of offline service

Country Status (1)

Country Link
CN (1) CN113590317A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023174037A1 (en) * 2022-03-14 2023-09-21 抖音视界有限公司 Resource scheduling method, apparatus and system, device, medium, and program product

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107357661A (en) * 2017-07-12 2017-11-17 北京航空航天大学 A kind of fine granularity GPU resource management method for mixed load
CN108228354A (en) * 2017-12-29 2018-06-29 杭州朗和科技有限公司 Dispatching method, system, computer equipment and medium
CN110647394A (en) * 2018-06-27 2020-01-03 阿里巴巴集团控股有限公司 Resource allocation method, device and equipment
CN111506414A (en) * 2019-01-30 2020-08-07 阿里巴巴集团控股有限公司 Resource scheduling method, device, equipment, system and readable storage medium
CN111858030A (en) * 2020-06-17 2020-10-30 北京百度网讯科技有限公司 Job resource processing method and device, electronic equipment and readable storage medium
CN112199194A (en) * 2020-10-14 2021-01-08 广州虎牙科技有限公司 Container cluster-based resource scheduling method, device, equipment and storage medium
CN112269641A (en) * 2020-11-18 2021-01-26 网易(杭州)网络有限公司 Scheduling method, scheduling device, electronic equipment and storage medium
CN112363813A (en) * 2020-11-20 2021-02-12 上海连尚网络科技有限公司 Resource scheduling method and device, electronic equipment and computer readable medium
CN112559182A (en) * 2020-12-16 2021-03-26 北京百度网讯科技有限公司 Resource allocation method, device, equipment and storage medium
CN112783607A (en) * 2021-01-29 2021-05-11 上海哔哩哔哩科技有限公司 Task deployment method and device in container cluster


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
张若峰;张巍;: "浅析吉林移动经营分析多租户***资源调度算法", 营销界, no. 24 *
王霆;董启文;范斐斐;: "基于虚拟机整合的云数据中心资源管理研究", 计算机工程, no. 09, pages 2 *
葛浙奉;王济伟;蒋从锋;张纪林;俞俊;林江彬;闫龙川;任祖杰;万健;: "混部集群资源利用分析", 计算机学报, no. 06 *


PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 310052 Room 301, Building No. 599, Changhe Street Network Business Road, Binjiang District, Hangzhou City, Zhejiang Province

Applicant after: Hangzhou NetEase Shuzhifan Technology Co.,Ltd.

Address before: 310052 Room 301, Building No. 599, Changhe Street Network Business Road, Binjiang District, Hangzhou City, Zhejiang Province

Applicant before: HANGZHOU LANGHE TECHNOLOGY Ltd.