WO2022002148A1 - Resource scheduling method, resource scheduling system, and device - Google Patents

Resource scheduling method, resource scheduling system, and device

Info

Publication number
WO2022002148A1
WO2022002148A1 · PCT/CN2021/103638 · CN2021103638W
Authority
WO
WIPO (PCT)
Prior art keywords
scheduling
resource
crd
scheduler
pod
Prior art date
Application number
PCT/CN2021/103638
Other languages
English (en)
French (fr)
Inventor
张乘铭
唐波
王科文
韩炳涛
王永成
屠要峰
高洪
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 filed Critical 中兴通讯股份有限公司
Priority to US18/004,067 (published as US20230266999A1)
Priority to JP2023500093A (published as JP7502550B2)
Priority to EP21833960.4A (published as EP4177751A4)
Publication of WO2022002148A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1036Load balancing of requests to servers for services different from user content provisioning, e.g. load balancing across domain name servers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5077Logical partitioning of resources; Management or configuration of virtualized resources
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004Server selection for load balancing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1095Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5017Task decomposition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Definitions

  • the present application relates to the field of computer technologies, and in particular, to a resource scheduling method, a resource scheduling system, a device, and a computer-readable storage medium.
  • Kubernetes is currently the most mainstream container orchestration and scheduling platform.
  • Through its good extensibility, Kubernetes supports the management of Custom Resource Definitions (CRDs), which makes it convenient for users to manage a custom resource as a single object entity.
  • However, Kubernetes currently only supports the scheduling of Pods; scheduling CRDs requires a dedicated scheduler, and multiple schedulers cause resource scheduling conflicts.
  • The following problems also arise: the available resources cannot satisfy a CRD's resource request, so the CRD cannot be scheduled; and even if the CRD can be scheduled successfully, it is not scheduled according to the optimal resource allocation, which reduces operating efficiency.
  • the present application provides a resource scheduling method, a resource scheduling system, a device, and a computer-readable storage medium.
  • The resource scheduling method provided by an embodiment of the present application includes: obtaining a scheduling object from a scheduling queue; when the scheduling object is a custom resource, splitting the custom resource according to the current resource state to obtain a scheduling unit list, the scheduling unit list including scheduling units configured to constitute the custom resource; and scheduling the scheduling units in the scheduling unit list in sequence.
  • The resource scheduling system provided by an embodiment of the present application includes: a scheduler configured to obtain a scheduling object from a scheduling queue; and a splitter configured to, when the scheduling object is a custom resource, split the custom resource according to the current resource state to obtain a scheduling unit list, the scheduling unit list including scheduling units configured to constitute the custom resource; where the scheduler schedules the scheduling units in the scheduling unit list in sequence.
  • The device provided by an embodiment of the present application includes a memory, a processor, and a computer program that is stored in the memory and executable on the processor; when the processor executes the computer program, the resource scheduling method of the first aspect is implemented.
  • a computer-readable storage medium provided by an embodiment of the present application stores computer-executable instructions, where the computer-executable instructions are used to execute the resource scheduling method according to the embodiment of the first aspect.
  • FIG. 1 is a schematic diagram of a system architecture platform provided by an embodiment of the present application.
  • FIG. 2 is a flowchart of a resource scheduling method provided by an embodiment of the present application
  • FIG. 3 is a flowchart of a resource scheduling method provided by another embodiment of the present application.
  • FIG. 4 is a flowchart of a resource scheduling method provided by another embodiment of the present application.
  • FIG. 5 is a flowchart of a resource scheduling method provided by another embodiment of the present application.
  • FIG. 6 is a flowchart of a resource scheduling method provided by another embodiment of the present application.
  • FIG. 7 is a flowchart of a resource scheduling method provided by another embodiment of the present application.
  • FIG. 8 is a flowchart of a resource scheduling method provided by another embodiment of the present application.
  • FIG. 9 is a flowchart of a resource scheduling method provided by another embodiment of the present application.
  • FIG. 10 is a flowchart of a resource scheduling method provided by another embodiment of the present application.
  • FIG. 11 is a flowchart of a resource scheduling method provided by another embodiment of the present application.
  • Kubernetes is an open-source platform for managing containerized applications across multiple hosts in a cloud platform. Its goal is to make the deployment of containerized applications simple and efficient, and it provides mechanisms for application deployment, planning, updating, and maintenance. In Kubernetes, multiple containers can be created, each running an application instance, and this group of application instances is then managed, discovered, and accessed through the built-in load-balancing strategy, without operation and maintenance personnel having to perform complex manual configuration. Kubernetes is widely used; many cloud computing and artificial intelligence platforms of enterprises and research institutions are built on it. Through its good extensibility, Kubernetes supports the management of Custom Resource Definitions (CRDs), which makes it convenient for users to manage a custom resource as a single object entity.
  • Kubernetes currently only supports Pod scheduling.
  • Pod is the smallest unit that can be created and deployed in Kubernetes. It is an application instance in a Kubernetes cluster and is always deployed on the same node.
  • a Pod contains one or more containers. It includes resources shared by various containers such as storage and network.
  • Kubernetes requires a dedicated scheduler to schedule CRDs, and resource scheduling conflicts will arise between multiple schedulers.
  • the default scheduler of Kubernetes only supports the scheduling of Pods and does not support the scheduling of CRD objects.
  • the default scheduler of Kubernetes cannot automatically and reasonably disassemble CRD objects into Pods according to the current resource status.
  • The present application provides a resource scheduling method, a resource scheduling system, a device, and a computer-readable storage medium. During resource scheduling, a scheduling object is obtained from a scheduling queue; if the scheduling object is a custom resource, the custom resource is split according to the current resource state to obtain a scheduling unit list, the scheduling unit list including first scheduling units configured to constitute the custom resource, and each first scheduling unit is then scheduled in turn according to the scheduling unit list.
  • The resource scheduling method can be applied to the Kubernetes scheduling platform, where the first scheduling unit is a CRD object. During scheduling, if the scheduling object is a CRD, the CRD is split according to the current resource state to obtain a scheduling unit list that includes a set of Pods, so that the Kubernetes scheduling platform can perform atomic scheduling on the Pods according to the scheduling unit list, with all Pods scheduled in sequence according to the queue and no other Pods inserted. This ensures that CRDs can be scheduled reasonably with high scheduling efficiency, making the Kubernetes scheduling platform compatible with various business scenarios.
  • FIG. 1 is a schematic diagram of a system architecture platform 100 for executing a resource scheduling method provided by an embodiment of the present application, and the system architecture platform 100 is also a resource scheduling system.
  • The system architecture platform 100 includes a scheduler 110 and a splitter 120, where the scheduler 110 is configured to schedule scheduling objects, and the splitter 120 is configured to respond to split requests from the scheduler 110 and split a scheduling object so as to meet the scheduling requirements of the scheduler 110.
  • During scheduling, the scheduler 110 obtains a scheduling object from the scheduling queue.
  • When the scheduling object is a custom resource, the splitter 120 splits the custom resource according to the current resource state to obtain a scheduling unit list, where the scheduling unit list includes first scheduling units configured to constitute the custom resource.
  • the scheduler 110 sequentially schedules the first scheduling units in the scheduling unit list according to the scheduling unit list, so as to complete the scheduling of the user-defined resources.
  • the Kubernetes scheduling platform is used as an example to illustrate.
  • the Kubernetes scheduling system of the embodiment includes a scheduler (Scheduler) 110 , a splitter (Pod-Splitor) 120 and a controller (CRD-Controller) 130 .
  • the scheduler 110 is responsible for scheduling Pods, the splitter is responsible for splitting CRD objects, the first scheduling unit is the CRD object, and the second scheduling unit is the native Pod object. In this embodiment, the CRD and the Pod are placed in the same scheduling queue.
  • When the scheduling object is a CRD, the scheduler 110 obtains the split Pod set through the extended Split interface, and the scheduler 110 schedules all of these Pods in turn.
  • The splitter 120 is a user-defined extension component. It mainly responds to split requests from the scheduler 110, decomposes the CRD into a reasonable set of Pods according to the current cluster resource occupancy, creates a scheduling unit list containing these Pods, and returns the scheduling unit list to the scheduler 110 for scheduling. The splitter 120 can also respond to node binding requests from the scheduler 110 to complete the binding operation between Pods and nodes (Node).
  • the binding of Pod and node can be understood as adding some node information and resource information to the Pod object, and then the scheduling system will have special components to run the Pod on the corresponding node according to the binding information.
  • the controller 130 is a user-defined extension component, which is used for the status and life cycle management of a specific CRD.
  • The CRD status is updated according to the status of the CRD and its corresponding Pods, according to user commands, or according to the CRD's own policy (for example, the CRD life cycle ends after its Pods end normally), so as to maintain the CRD life cycle.
  • the controller 130 is a functional component of the Kubernetes scheduling platform, and details are not described here.
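  • Purely as an illustration, the following Go sketch outlines the kind of reconcile logic such a CRD controller might run; the TrainingJob type, the phase strings, and the function names are hypothetical and are not part of Kubernetes or of the implementation described here.

```go
package crdcontroller

// PodPhase is a simplified stand-in for the phase of one Pod split from the CRD.
type PodPhase string

const (
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

// TrainingJob is a hypothetical CRD whose life cycle the controller maintains.
type TrainingJob struct {
	Name  string
	Phase string // e.g. "Running", "Succeeded", "Failed"
}

// reconcile updates the CRD status from the phases of its Pods, so that the
// CRD's life cycle ends once all of its Pods have ended normally.
func reconcile(job *TrainingJob, podPhases []PodPhase) {
	succeeded, failed := 0, 0
	for _, p := range podPhases {
		switch p {
		case PodSucceeded:
			succeeded++
		case PodFailed:
			failed++
		}
	}
	switch {
	case failed > 0:
		job.Phase = "Failed"
	case len(podPhases) > 0 && succeeded == len(podPhases):
		job.Phase = "Succeeded" // all Pods ended normally: CRD life cycle ends
	default:
		job.Phase = "Running"
	}
}
```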
  • The user creates CRD and Pod objects through the Api-Server 140, and the scheduler 110 monitors the binding information of CRD and Pod objects through the Api-Server. After the scheduling of all Pods is completed, the splitter 120 completes the binding of Pods to nodes through the Api-Server.
  • The scheduler 110 currently has two extension modes: the extender (Extender) and the scheduling framework (Scheduling Framework).
  • A new Split interface is added to the original extension interfaces; when the scheduler 110 schedules a CRD, the Pod set split from the CRD is obtained through the Split interface.
  • The extender extends the scheduler 110 by means of web hooks, while the scheduling framework compiles the extension interfaces directly into the scheduler 110.
  • To split CRD resources reasonably, the embodiment of the present application introduces a new extension interface, namely the Split interface, which is configured to split a CRD resource object and convert the CRD into a set of Pods. Different CRD resources may be split in different ways.
  • The specific implementation of the Split interface is carried out in the Extender or the Scheduling Framework and is mainly responsible for two tasks: adopting a certain strategy to split the CRD into a set of 1 to N Pods and allocating a specific amount of resources to each Pod; and, during splitting, determining whether the remaining resources of the cluster nodes (such as GPU and CPU resources) can satisfy the splitting requirements. If they cannot, an error message is returned to the scheduler 110; if they can, the split Pod set is returned.
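  • For illustration, a minimal Go sketch of what such a Split extension point might look like is given below; the interface name, the types, and the fields are assumptions made for this example and are not an existing Kubernetes API.

```go
package splitplugin

import "errors"

// PodSpec is a simplified stand-in for the resource request of one split-out Pod.
type PodSpec struct {
	CPUMilli int64 // CPU request in millicores
	MemoryMi int64 // memory request in MiB
	GPUs     int64 // GPU request
}

// ClusterState is a simplified view of the allocatable resources per node.
type ClusterState struct {
	FreeGPUsPerNode map[string]int64
}

// Splitter is the hypothetical Split extension point: it turns one CRD object
// into a set of 1..N Pods, or reports an error when the cluster's remaining
// resources cannot satisfy the request.
type Splitter interface {
	Split(crdName string, requestedGPUs int64, state ClusterState) ([]PodSpec, error)
}

// ErrInsufficientResources is returned to the scheduler when the remaining
// node resources cannot satisfy the splitting requirement.
var ErrInsufficientResources = errors.New("cluster resources cannot satisfy the CRD request")
```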
  • When the scheduling system is scheduling, if the scheduling object is a CRD, it splits the CRD according to the current resource state to obtain a scheduling unit list that includes a set of Pods, so that the Kubernetes scheduling platform can schedule the Pods according to the scheduling unit list, with all Pods scheduled in sequence according to the queue and no other Pods inserted. This ensures that CRDs can be scheduled reasonably with high scheduling efficiency, making the Kubernetes scheduling platform compatible with various business scenarios.
  • When the scheduling object is a Pod, it is processed according to the original scheduling procedure of the Kubernetes scheduling system, except that the Pod binding operation is completed by the splitter 120. When the scheduling object is a CRD, the splitter 120 splits the CRD into one or more Pods according to the current resource state of the cluster; the splitter 120 only needs to determine the number of Pods into which the CRD is split and the resources (CPU, memory, GPU) used by each Pod. After the splitter 120 finishes splitting the CRD, the scheduler 110 completes the scheduling of these Pods.
  • The scheduler 110 applies optimization algorithms such as filtering, prioritizing, and scoring to the nodes to select suitable nodes for the Pods, and the splitter 120 then binds the Pods in the Pod list to those nodes, which ensures that the resources of the scheduler 110 and the splitter 120 remain synchronized.
  • In this way, the scheduler 110 of the Kubernetes scheduling platform can support mixed scheduling of CRDs and Pods as well as atomic scheduling of the Pods of a single CRD. It can be understood that, in mixed scheduling of CRDs and Pods, the scheduler 110 reads the configuration to learn which CRDs participate in scheduling and puts the Pods and the CRDs to be scheduled into the same scheduling queue; when the object scheduled by the scheduler 110 is a CRD, the list of Pod objects split from the CRD object is obtained through the extended Split interface, and each Pod is scheduled in turn, thereby realizing the mixed scheduling of CRDs and Pods.
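  • A minimal sketch of such a mixed scheduling loop is given below; SchedulingObject, the queue, and the helper callbacks are illustrative names only, not the actual implementation of the scheduler 110.

```go
package mixedscheduling

// SchedulingObject is either a native Pod or a CRD taken from the shared queue.
type SchedulingObject struct {
	Kind string // "Pod" or "CRD"
	Name string
}

// Pod is a simplified stand-in for a Pod to be scheduled.
type Pod struct{ Name string }

// schedulingLoop pops objects from the single shared queue; Pods are scheduled
// directly, while CRDs are first split into a Pod list via the Split interface
// and the resulting Pods are then scheduled back to back.
func schedulingLoop(queue <-chan SchedulingObject,
	split func(crd SchedulingObject) ([]Pod, error),
	schedulePod func(p Pod) error) {

	for obj := range queue {
		switch obj.Kind {
		case "Pod":
			_ = schedulePod(Pod{Name: obj.Name}) // native Pod: original flow
		case "CRD":
			pods, err := split(obj) // ask the splitter for the Pod list
			if err != nil {
				continue // resources cannot satisfy the CRD; leave it for retry
			}
			for _, p := range pods { // schedule the CRD's Pods in sequence
				_ = schedulePod(p)
			}
		}
	}
}
```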
  • Atomic scheduling of a CRD's Pods can be understood as follows: while the set of Pods split from a CRD is being scheduled, no other Pods can be scheduled.
  • The set of Pods split from the CRD must all be scheduled successfully for the scheduling to succeed; otherwise it fails. This avoids the situation where some Pods cannot be scheduled because of insufficient remaining resources, causing the scheduling of the entire CRD to fail.
  • CRD scheduling has a BackOff mechanism.
  • The BackOff mechanism can be understood as follows: if any Pod of the CRD fails to be scheduled, the scheduling of the entire CRD is considered to have failed. If CRD scheduling fails, the Pods of the CRD that have already been scheduled successfully need to be deleted and their resources released. In addition, splitting a CRD into Pods has re-entrancy protection.
  • the scheduling queue of the scheduler 110 stores the CRD object and the Pod object, and the Pod set belonging to the CRD object does not need to be inserted into the scheduling queue.
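  • A minimal sketch of the atomic scheduling and BackOff behaviour described above might look like the following; the function and type names are hypothetical, not the patent's implementation.

```go
package atomicscheduling

// Pod is a simplified stand-in for one Pod split from a CRD.
type Pod struct{ Name string }

// scheduleCRDAtomically schedules the Pods split from one CRD as a unit: no other
// Pod is interleaved, and if any Pod fails, every Pod already placed is rolled
// back (deleted and its resources released), so the whole CRD either succeeds or fails.
func scheduleCRDAtomically(pods []Pod,
	schedulePod func(Pod) error,
	deletePod func(Pod)) error {

	scheduled := make([]Pod, 0, len(pods))
	for _, p := range pods {
		if err := schedulePod(p); err != nil {
			// BackOff: the whole CRD is considered failed; release what was placed.
			for _, s := range scheduled {
				deletePod(s)
			}
			return err
		}
		scheduled = append(scheduled, p)
	}
	return nil // all Pods of the CRD were scheduled successfully
}
```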
  • The scheduler 110 and the splitter 120 have a resource synchronization mechanism.
  • To split CRDs reasonably and optimally, the splitter 120 needs to know the resource state of the cluster, monitor node and Pod information, and cache the allocatable resource information locally.
  • After the scheduler 110 successfully schedules the Pod set of a CRD, the scheduler 110 sends a Pod binding (Bind) request to the splitter 120.
  • After accepting the binding request, the splitter 120 first updates the allocatable resource information of the nodes in its local cache, and then sends the final binding request to the Api-Server 140, so that the resources can be kept synchronized.
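  • As an illustration of this resource synchronization only, a splitter-side bind handler might be sketched as follows; the cache structure and helper names are assumptions, and the final request to the Api-Server is shown through the Bind helper that recent client-go versions expose on Pods.

```go
package splitterbind

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// nodeCache is a simplified local cache of allocatable GPUs per node kept by the splitter.
type nodeCache struct {
	freeGPUs map[string]int64
}

// bindPod first updates the splitter's local allocatable-resource cache and then
// sends the final binding request to the Api-Server, keeping the scheduler's and
// splitter's views of cluster resources synchronized.
func bindPod(ctx context.Context, client kubernetes.Interface, cache *nodeCache,
	namespace, podName, nodeName string, gpus int64) error {

	// 1. Update the locally cached allocatable resources of the chosen node.
	cache.freeGPUs[nodeName] -= gpus

	// 2. Send the binding request to the Api-Server.
	binding := &corev1.Binding{
		ObjectMeta: metav1.ObjectMeta{Name: podName, Namespace: namespace},
		Target:     corev1.ObjectReference{Kind: "Node", Name: nodeName},
	}
	return client.CoreV1().Pods(namespace).Bind(ctx, binding, metav1.CreateOptions{})
}
```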
  • the system architecture platform 100 and the application scenarios described in the embodiments of the present application are for the purpose of illustrating the technical solutions of the embodiments of the present application more clearly, and do not constitute limitations on the technical solutions provided by the embodiments of the present application.
  • the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems.
  • The system architecture platform 100 shown in FIG. 1 does not constitute a limitation on the embodiments of the present application, and may include more or fewer components than shown in the figure, or combine certain components, or use a different arrangement of components.
  • FIG. 2 is a flowchart of a resource scheduling method provided by an embodiment of the present application.
  • the resource scheduling method includes but is not limited to step S100, step S200, and step S300.
  • Step S100 acquiring the scheduling object from the scheduling queue.
  • Resource scheduling can be understood as the reasonable and effective use of various resources. It can be understood that the objects to be scheduled are resource objects; the schedulable objects are arranged in a queue and are fetched in queue order or by priority during scheduling, which makes it easy to obtain the scheduling object quickly and also facilitates reasonable scheduling of resources.
  • The Kubernetes scheduling platform provides many default resource types, such as Pod, Deployment, Service, and Volume, which can meet most daily system deployment and management needs. However, in some scenarios with special requirements, these existing resource types are not sufficient; CRDs can be used to meet such requirements, effectively improving the extensibility of Kubernetes.
  • The Kubernetes scheduling platform supports the scheduling of Pods, i.e., Pods can be scheduled directly. It can be understood that CRD and Pod objects can be inserted into the same scheduling queue at the same time, or CRDs can be scheduled separately. Specifically, in mixed scheduling of CRDs and Pods, the scheduler of the Kubernetes scheduling platform reads the configuration to obtain the CRD objects and Pod objects that can participate in scheduling, puts the Pods and the CRDs to be scheduled into the same scheduling queue, and obtains scheduling objects from the scheduling queue in sequence for scheduling.
  • Step S200 when the scheduling object is a custom resource, disassemble the custom resource according to the current resource state to obtain a scheduling unit list.
  • The scheduling unit list includes first scheduling units configured to constitute the custom resource; the custom resource is a CRD, and the first scheduling unit is a CRD object. It can be understood that CRD objects and native Pod objects can be inserted into the same scheduling queue at the same time, i.e., CRD objects and Pod objects can be scheduled in a mixed manner. In mixed scheduling of CRDs and Pods, the scheduler obtains scheduling objects from the scheduling queue in sequence. During scheduling, the scheduler first determines the type of the scheduling object; if the scheduling object is a CRD, it splits the CRD according to the current resource state to obtain a scheduling unit list, i.e., the list of Pods that constitute the CRD, so that the Kubernetes scheduling platform can schedule the Pods directly according to the Pod list.
  • The CRD needs to be split according to the current resource state, where the current resource state can be understood as the remaining or available resources of the current scheduling platform.
  • On the premise that the resource request for splitting the CRD is satisfied, the splitter splits the CRD object reasonably, so that the CRD can be scheduled according to the optimal resource allocation and run more efficiently.
  • When the scheduling object is a native Pod, the Pod can be scheduled directly without splitting.
  • A Pod is the basic unit of the Kubernetes scheduling platform, the smallest component created or deployed by a user, and the resource object in which containerized applications run.
  • The other resource objects in a Kubernetes cluster exist to support the Pod resource object, so that Kubernetes can manage application services.
  • the Kubernetes scheduling platform supports the mixed scheduling of Pods and CRDs, and at the same time realizes atomic scheduling of Pods of a single CRD, which also ensures that CRDs can be scheduled reasonably and are compatible with various business scenarios.
  • Step S300 scheduling the scheduling units in the scheduling unit list in sequence.
  • A scheduling unit list is generated after the splitting is completed.
  • On the Kubernetes scheduling platform, the scheduling unit is a Pod and the scheduling unit list is a Pod set list.
  • According to the Pod set list, the scheduler schedules all Pods in the list in turn, thereby completing the scheduling of a single CRD. It can be understood that scheduling all Pods in sequence in list form prevents the remaining Pods in the list from failing to be scheduled due to insufficient remaining resources caused by the insertion of other Pods, which would cause the scheduling of the entire CRD to fail.
  • It also avoids the situation where, while some Pods of one CRD are being scheduled, some Pods of another CRD are inserted, so that the remaining Pods of both CRDs cannot be scheduled due to insufficient remaining resources, the occupied resources cannot be released, and the two CRDs enter a resource deadlock state.
  • In step S200, splitting the custom resource according to the current resource state to obtain the scheduling unit list may include, but is not limited to, the following step:
  • Step S210 When the remaining resources of the cluster nodes meet the requirements for disassembling the custom resource, disassemble the custom resource to obtain a list of scheduling units.
  • On the Kubernetes scheduling platform, the splitter mainly responds to split requests from the scheduler, decomposes the CRD into a reasonable set of Pods according to the resource occupancy of the current cluster nodes, creates a scheduling unit list containing these Pods, and returns the scheduling unit list to the scheduler for scheduling. It can be seen that the splitter can learn the resource state of the cluster nodes, for example by monitoring the binding state of the cluster nodes, and split the CRD reasonably according to that resource state so as to meet the requirement of optimal CRD splitting.
  • In this way, the splitter splits CRDs efficiently and reasonably while fully taking the resource state into account; at the same time, the scheduler does not need to understand CRDs and can focus on Pod scheduling, thereby realizing the splitting and scheduling of CRDs.
  • the CRD decomposition Pod has the function of reentrancy protection.
  • the scheduling queue of the scheduler stores the CRD object and the Pod object, and the Pod set belonging to the CRD object does not need to be inserted into the scheduling queue.
  • the resource scheduling method further includes but is not limited to the following steps:
  • Step S101 create a scheduling object according to the scheduling request
  • Step S102 Monitor the binding information of the scheduling object, and place the newly added scheduling object in the same queue to form a scheduling queue.
  • CRD objects and Pod objects are created according to the actual needs of the application scenario, for example when a deep learning CRD is required.
  • Users create CRD objects and Pod objects through Api-Server, and the scheduler monitors the binding information of CRD objects and Pod objects through Api-Server, and puts schedulable CRDs and Pods in the same queue.
  • CRDs and Pods are added to the queue to form a scheduling queue, and then scheduling objects are obtained from the scheduling queue.
  • the added scheduling objects can be CRDs and Pods, or all CRDs or all Pods.
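  • As an illustration, watching newly created objects and pushing them into one shared queue could look roughly like the following client-go sketch; the enqueue callback is a placeholder, and a real implementation would additionally watch the CRD type through a dynamic or generated informer.

```go
package watchqueue

import (
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// startPodWatch adds every newly created Pod to the shared scheduling queue via
// enqueue; CRD objects would be enqueued the same way from their own informer.
func startPodWatch(client kubernetes.Interface, enqueue func(obj interface{}), stop <-chan struct{}) {
	factory := informers.NewSharedInformerFactory(client, 30*time.Second)
	podInformer := factory.Core().V1().Pods().Informer()
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			if _, ok := obj.(*corev1.Pod); ok {
				enqueue(obj) // newly added Pod joins the same scheduling queue
			}
		},
	})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
}
```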
  • the resource scheduling method further includes but is not limited to the following steps:
  • Step S400 After completing the scheduling of all the scheduling objects, bind the scheduling unit to the corresponding node.
  • the Kubernetes scheduling platform can reasonably disassemble CRDs when scheduling CRD objects, and return the scheduling unit list to the scheduler for scheduling.
  • The scheduler only needs to focus on Pod scheduling to complete the scheduling of all scheduling objects.
  • the scheduler sends a node binding request to the splitter, and the splitter can respond to the scheduler's node binding request to complete the binding operation between the Pod and the node. Specifically, the splitter completes the binding of Pods and nodes through Api-Server.
  • the resource scheduling method further includes but is not limited to the following steps:
  • Step S500 when any scheduling unit fails to be scheduled, delete the scheduled scheduling unit and release resources.
  • If any Pod in the Pod set of the CRD fails to be scheduled, the scheduling of the entire CRD is considered to have failed. If CRD scheduling fails, the Pods of the CRD that have already been scheduled successfully need to be deleted and their resources released, so as to avoid resource occupation that would reduce operating efficiency.
  • step S400 after completing the scheduling of all scheduling objects, binding the scheduling unit to the corresponding node, which may include but not limited to the following steps:
  • Step S410 Initiating a node binding request, updating the allocatable resource information of the node, determining the optimal node according to the allocatable resource information, and assigning hosts to the scheduling unit respectively according to the optimal node;
  • Step S420 Bind the scheduling unit to the corresponding host.
  • After completing the scheduling of all Pods, the splitter completes the binding of Pods to nodes through the Api-Server.
  • The node binding procedure is to select suitable nodes through optimization algorithms such as filtering, prioritizing, and scoring, then go through the optimal nodes in turn to assign a host to each Pod, and send a Pod binding request to the Api-Server to bind the Pod to the corresponding host, thereby completing the binding operation.
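  • A simplified Go sketch of the filter-and-score node selection mentioned above is given below; the scoring rule (most free GPUs wins) is only an example and is not the actual optimization algorithm of the scheduler.

```go
package nodeselect

// Node is a simplified view of a candidate node for one Pod.
type Node struct {
	Name     string
	FreeGPUs int64
	FreeCPU  int64 // millicores
}

// selectNode filters out nodes that cannot hold the Pod and then scores the rest,
// returning the best-scoring node, mirroring the filter/priority/score steps.
func selectNode(nodes []Node, needGPUs, needCPU int64) (string, bool) {
	bestName, bestScore := "", int64(-1)
	for _, n := range nodes {
		if n.FreeGPUs < needGPUs || n.FreeCPU < needCPU {
			continue // filtered out: not enough remaining resources
		}
		score := n.FreeGPUs // example scoring rule only
		if score > bestScore {
			bestName, bestScore = n.Name, score
		}
	}
	return bestName, bestScore >= 0
}
```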
  • When the scheduling object is a Pod, the Kubernetes scheduling system processes it according to the original scheduling procedure, except that the Pod binding operation is completed by the splitter.
  • When the scheduling object is a CRD, the splitter splits the CRD into one or more Pods according to the current resource state of the cluster; the splitter only needs to determine the number of Pods into which the CRD is split and the resources (CPU, memory, GPU) used by each Pod.
  • After the splitter finishes splitting, the scheduler completes the scheduling of these Pods; the scheduler applies optimization algorithms such as filtering, prioritizing, and scoring to select suitable nodes for the Pods, and the splitter then binds the Pods in the Pod list to those nodes, which ensures that the scheduler and the splitter remain synchronized with respect to resources.
  • the scheduler and the splitter have a resource synchronization mechanism.
  • the splitter needs to disassemble the CRD reasonably and optimally. It needs to know the resource status of the cluster, monitor the node and Pod information, and cache the resource information that can be allocated locally. After the CRD Pod set is successfully scheduled by the scheduler, the scheduler sends a Pod binding request to the splitter. After the splitter accepts the binding request, it first updates the node allocation resource information in the splitter's local cache, and then Send the final binding request to Api-Server so that resources can be synchronized.
  • the resource scheduling method includes but is not limited to the following steps:
  • Step S610 Create CRD and Pod objects through Api-Server
  • Step S620 monitor CRD and Pod objects through Api-Server, and put the newly added CRD or Pod into the same scheduling queue;
  • Step S630 Obtain the scheduling object from the scheduling queue; if the scheduling object is a Pod, process it according to the Pod scheduling procedure; if the scheduling object is a CRD, proceed to step S640;
  • Step S640 Schedule the Pods in turn according to the Pod list returned by the splitter;
  • Step S650 After all Pods are scheduled, initiate a binding request to the splitter and complete the binding of Pods and nodes through Api-Server.
  • This embodiment is an example of the scheduler successfully performing mixed scheduling of CRDs and Pods.
  • The embodiment shows the process of mixed scheduling of CRDs and Pods on the Kubernetes scheduling platform: a deep learning job is defined as a CRD, and the Workers that execute the deep learning job in parallel are carried by Pods, so that mixed scheduling of deep learning jobs and Pods can be realized and run successfully.
  • Instance environment: a Kubernetes cluster running Ubuntu 16.04, containing two nodes with sufficient node resources; the modified scheduler has been deployed in the cluster, together with the controller and splitter for the custom deep learning job.
  • Step S710 define a deep learning job file, and create the CRD object
  • Step S720 define the file of a single Pod, and create the Pod object
  • Step S730 After the deep learning job is successfully created, the CRD corresponding to the deep learning job is in the running state;
  • Step S740 After the Pod related to the deep learning job is successfully created, the Pods disassembled by the deep learning job are all running.
  • It is obtained that the single Pod created in step S720 is in the running state, where the state of the CRD should be consistent with the state of the Pods split from it.
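  • Purely as an illustration of what such a custom deep learning job resource could look like when declared in Go (the usual way CRD schemas are generated for Kubernetes), a sketch is given below; the DeepLearningJob type and its fields are hypothetical and are not taken from the patent.

```go
package v1alpha1

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// DeepLearningJobSpec describes the overall resource request of the job; the
// splitter decides how many worker Pods to create and how to divide the GPUs.
type DeepLearningJobSpec struct {
	Image     string                      `json:"image"`
	TotalGPUs int64                       `json:"totalGPUs"` // e.g. 8 in later embodiments
	Resources corev1.ResourceRequirements `json:"resources,omitempty"`
}

// DeepLearningJobStatus mirrors the running state checked in steps S730/S740.
type DeepLearningJobStatus struct {
	Phase string `json:"phase,omitempty"` // e.g. "Running"
}

// DeepLearningJob is the CRD object placed in the scheduling queue.
type DeepLearningJob struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              DeepLearningJobSpec   `json:"spec"`
	Status            DeepLearningJobStatus `json:"status,omitempty"`
}
```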
  • Embodiment 2:
  • This example shows the scheduler successfully scheduling two CRD objects.
  • The example shows the process of mixed scheduling of different CRDs on the Kubernetes scheduling platform: a deep learning job is defined as one CRD and a machine learning job is defined as another CRD, and the Workers executed by the two CRD objects are all carried by Pods, so that mixed scheduling of deep learning jobs and machine learning jobs can be realized and run successfully.
  • Instance environment: a Kubernetes cluster running Ubuntu 16.04, containing two nodes with sufficient node resources; the modified scheduler has been deployed in the cluster, together with the controller and splitter for the custom deep learning job and the controller and splitter for the custom machine learning job.
  • Step S810 define the file of the deep learning job, and create the CRD object
  • Step S820 define the file of the machine learning job, and create the CRD object
  • Step S830 After the deep learning job is successfully created, the CRD corresponding to the deep learning job is in the running state;
  • Step S840 After the Pod related to the deep learning job is successfully created, the Pods disassembled by the deep learning job are all running;
  • Step S850 After the machine learning job is successfully created, the CRD corresponding to the machine learning job is in the running state;
  • Step S860 After the Pod related to the machine learning job is successfully created, the Pods split from the machine learning job are all running.
  • the state of the CRD should be consistent with the state of the disassembled Pod.
  • Embodiment 3:
  • In this embodiment, the scheduler schedules the CRD to run on the fewest possible nodes.
  • The embodiment shows that, when a CRD object is scheduled on the Kubernetes scheduling platform, the CRD can be split reasonably according to the resource state: a deep learning job is defined as a CRD, and the Workers that execute the job in parallel are carried by Pods.
  • When scheduling the CRD, the scheduler can automatically split it according to the current resource state and schedule the CRD's Pods to run on as few nodes as possible, reducing network overhead and ensuring that the split is reasonable (a sketch of one possible splitting heuristic follows the steps below).
  • Example environment: a Kubernetes cluster running Ubuntu 16.04, containing 3 nodes with sufficient CPU and memory resources, where node 1 has 8 idle GPUs and nodes 2 and 3 each have 4 idle GPUs; the modified scheduler has been deployed in the cluster, together with the controller and splitter for the custom deep learning job.
  • Step S910 a file defining a deep learning job, wherein the job applies for 8 GPU resources, and the CRD object is created;
  • Step S920 After the deep learning job is successfully created, the CRD corresponding to the deep learning job is in the running state;
  • Step S930 After the Pod related to the deep learning job is successfully created, the Pods disassembled by the deep learning job are all running;
  • Step S940 It is obtained that the number of Pods after the CRD is disassembled is 1, and the Pod runs on Node 1.
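  • One possible splitting heuristic consistent with this behaviour (pack the requested GPUs onto as few nodes as possible) is sketched below; it is an assumption about how a splitter could decide the Pod count, not the algorithm claimed by the patent.

```go
package gpupacking

import "sort"

// NodeGPUs records the idle GPUs of one node, as cached by the splitter.
type NodeGPUs struct {
	Node string
	Free int64
}

// PodPlan is one Pod to create, with the GPUs it should request.
type PodPlan struct {
	Node string
	GPUs int64
}

// packGPUs splits a CRD asking for totalGPUs into Pods on as few nodes as
// possible: nodes with the most idle GPUs are used first. It returns false when
// the cluster total cannot satisfy the request (the CRD would fail to split).
// Note that it sorts the caller's slice in place.
func packGPUs(nodes []NodeGPUs, totalGPUs int64) ([]PodPlan, bool) {
	sort.Slice(nodes, func(i, j int) bool { return nodes[i].Free > nodes[j].Free })
	var plan []PodPlan
	remaining := totalGPUs
	for _, n := range nodes {
		if remaining <= 0 {
			break
		}
		take := n.Free
		if take > remaining {
			take = remaining
		}
		if take > 0 {
			plan = append(plan, PodPlan{Node: n.Node, GPUs: take})
			remaining -= take
		}
	}
	return plan, remaining <= 0
}
```

  • With the node layouts of Embodiments 3 and 4, this heuristic would yield one Pod on node 1 and two Pods on nodes 1 and 3 respectively, matching steps S940 and S1040.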
  • Embodiment 4:
  • This example shows the scheduler successfully scheduling a CRD whose resource request has a large granularity.
  • On the Kubernetes scheduling platform, the CRD can be split reasonably according to the resource state: a deep learning job is defined as a CRD, and the Workers that execute the job in parallel are carried by Pods. When scheduling the CRD, the scheduler can automatically split it according to the current resource state; if the resource request granularity of the job is large, so that the resources of a single node cannot satisfy the job's resource request but the total resources of the cluster can, the CRD can still be split and scheduled successfully and run successfully, ensuring that the job is not starved of resources.
  • Example environment: a Kubernetes cluster running Ubuntu 16.04, containing 4 nodes with sufficient CPU and memory resources, where nodes 1 and 3 each have 4 idle GPUs and nodes 2 and 4 each have 2 idle GPUs; the modified scheduler has been deployed in the cluster, together with the controller and splitter for the custom deep learning job.
  • Step S1010 define a file of a deep learning job, wherein the job applies for 8 GPU resources, and creates the CRD object;
  • Step S1020 After the deep learning job is successfully created, the CRD corresponding to the deep learning job is in the running state;
  • Step S1030 After the Pod related to the deep learning job is successfully created, the Pods disassembled by the deep learning job are all running;
  • Step S1040 It is obtained that the number of Pods after the CRD is disassembled is 2, and the two Pods run on Node 1 and Node 3.
  • Embodiment 5:
  • In this embodiment, the scheduler atomically schedules the Pods split from a CRD.
  • The embodiment shows that, on the Kubernetes scheduling platform, the scheduler can atomically schedule the Pods of a single CRD object: a deep learning job is defined as one CRD and a machine learning job is defined as another CRD, and the Workers executed by the two CRD objects are carried by Pods, so that atomic scheduling of a CRD's Pods is realized, avoiding unreasonable scheduling of the CRDs and the problem of the two CRDs entering a resource deadlock.
  • Instance environment: a Kubernetes cluster running Ubuntu 16.04, containing 3 nodes with sufficient CPU and memory resources, each of the 3 nodes having 4 idle GPUs; the modified scheduler has been deployed in the cluster, together with the controller and splitter for the custom deep learning job and the controller and splitter for the custom machine learning job.
  • Step S1110 a file defining a deep learning job, the job applies for 8 GPU resources, and the CRD object is created;
  • Step S1120 a file defining a machine learning job, the job applies for 8 GPU resources, and the CRD object is created;
  • Step S1130 After the deep learning job is successfully created, the status of the CRD corresponding to the deep learning job is obtained;
  • Step S1140 After the machine learning job is successfully created, the status of the CRD corresponding to the machine learning job is obtained;
  • Step S1150 Obtaining that only one of the deep learning job and the machine learning job is in the running state, and the related Pods of the jobs in the running state are all in the running state.
  • an embodiment of the present application also provides a device, the device includes: a memory, a processor, and a computer program stored on the memory and executable on the processor.
  • the processor and memory may be connected by a bus or otherwise.
  • the memory can be used to store non-transitory software programs and non-transitory computer-executable programs.
  • the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device.
  • the memory may include memory located remotely from the processor, which may be connected to the processor through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • The terminal in this embodiment may include the system architecture platform 100 of the embodiment shown in FIG. 1; the terminal in this embodiment and the system architecture platform 100 of the embodiment shown in FIG. 1 belong to the same inventive concept, and therefore have the same implementation principle and technical effect, which are not described in detail here.
  • the non-transitory software programs and instructions required to implement the resource scheduling method of the above-mentioned embodiment are stored in the memory, and when executed by the processor, the resource scheduling method in the above-mentioned embodiment is executed, for example, the above-described method in FIG. 2 is executed.
  • An embodiment of the present application also provides a computer-readable storage medium storing computer-executable instructions, where the computer-executable instructions are executed by a processor or controller, for example by a processor in the above terminal embodiment, so that the processor performs the resource scheduling method in the above embodiments, for example method steps S100 to S300 in FIG. 2, method steps S101 to S102 in FIG. 3, method step S400 in FIG. 4, method steps S410 to S420 in FIG. 5, method steps S610 to S650 in FIG. 6, method steps S710 to S740 in FIG. 7, method steps S810 to S860 in FIG. 8, method steps S910 to S940 in FIG. 9, method steps S1010 to S1040 in FIG. 10, and method steps S1110 to S1150 in FIG. 11.
  • The embodiments of the present application include: obtaining a scheduling object from a scheduling queue during resource scheduling; if the scheduling object is a custom resource, splitting the custom resource according to the current resource state to obtain a scheduling unit list, the scheduling unit list including first scheduling units configured to constitute the custom resource; and then scheduling each first scheduling unit in turn according to the scheduling unit list. This can be applied to the Kubernetes scheduling platform: if the scheduling object is a CRD, the CRD is split according to the current resource state to obtain a scheduling unit list that includes a set of Pods, so that the Kubernetes scheduling platform can perform atomic scheduling on all Pods according to the scheduling unit list, with all Pods scheduled in sequence according to the queue and no other Pods inserted. This ensures that CRDs can be scheduled reasonably with high scheduling efficiency, making the Kubernetes scheduling platform compatible with various business scenarios.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cartridges, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • Communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and can include any information delivery media, as is well known to those of ordinary skill in the art.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A resource scheduling method, a resource scheduling system, a device, and a computer-readable storage medium. The resource scheduling method includes: obtaining a scheduling object from a scheduling queue (S100); when the scheduling object is a custom resource, splitting the custom resource according to the current resource state to obtain a scheduling unit list (S200), the scheduling unit list including first scheduling units configured to constitute the custom resource; and scheduling the first scheduling units in the scheduling unit list in sequence (S300).

Description

Resource scheduling method, resource scheduling system, and device
Cross-Reference to Related Applications
This application is filed on the basis of, and claims priority to, Chinese patent application No. 202010625668.0 filed on July 1, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technologies, and in particular to a resource scheduling method, a resource scheduling system, a device, and a computer-readable storage medium.
Background
Kubernetes is currently the most mainstream container orchestration and scheduling platform. Through its good extensibility, Kubernetes supports the management of Custom Resource Definitions (CRDs), which makes it convenient for users to manage a custom resource as a single object entity. However, Kubernetes currently only supports the scheduling of Pods; scheduling a CRD requires a dedicated scheduler, and multiple schedulers cause resource scheduling conflicts. The following problems also arise: the available resources cannot satisfy a CRD's resource request, so the CRD cannot be scheduled; and even if the CRD can be scheduled successfully, it is not scheduled according to the optimal resource allocation, which reduces operating efficiency.
Summary
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
The present application provides a resource scheduling method, a resource scheduling system, a device, and a computer-readable storage medium.
In a first aspect, an embodiment of the present application provides a resource scheduling method, including: obtaining a scheduling object from a scheduling queue; when the scheduling object is a custom resource, splitting the custom resource according to the current resource state to obtain a scheduling unit list, the scheduling unit list including scheduling units configured to constitute the custom resource; and scheduling the scheduling units in the scheduling unit list in sequence.
In a second aspect, an embodiment of the present application provides a resource scheduling system, including: a scheduler configured to obtain a scheduling object from a scheduling queue; and a splitter configured to, when the scheduling object is a custom resource, split the custom resource according to the current resource state to obtain a scheduling unit list, the scheduling unit list including scheduling units configured to constitute the custom resource; where the scheduler schedules the scheduling units in the scheduling unit list in sequence.
In a third aspect, an embodiment of the present application provides a device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the resource scheduling method of the embodiment of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to perform the resource scheduling method of the embodiment of the first aspect.
Other features and advantages of the present application will be set forth in the following description, and in part will become apparent from the description or be understood by practicing the present application. The objects and other advantages of the present application can be realized and obtained by the structures particularly pointed out in the description, the claims, and the accompanying drawings.
Brief Description of the Drawings
The accompanying drawings are used to provide a further understanding of the technical solutions of the present application and constitute a part of the description. Together with the embodiments of the present application, they serve to explain the technical solutions of the present application and do not constitute a limitation thereof.
FIG. 1 is a schematic diagram of a system architecture platform provided by an embodiment of the present application;
FIG. 2 is a flowchart of a resource scheduling method provided by an embodiment of the present application;
FIG. 3 is a flowchart of a resource scheduling method provided by another embodiment of the present application;
FIG. 4 is a flowchart of a resource scheduling method provided by another embodiment of the present application;
FIG. 5 is a flowchart of a resource scheduling method provided by another embodiment of the present application;
FIG. 6 is a flowchart of a resource scheduling method provided by another embodiment of the present application;
FIG. 7 is a flowchart of a resource scheduling method provided by another embodiment of the present application;
FIG. 8 is a flowchart of a resource scheduling method provided by another embodiment of the present application;
FIG. 9 is a flowchart of a resource scheduling method provided by another embodiment of the present application;
FIG. 10 is a flowchart of a resource scheduling method provided by another embodiment of the present application;
FIG. 11 is a flowchart of a resource scheduling method provided by another embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only intended to explain the present application and are not intended to limit it.
It should be noted that although functional modules are divided in the device diagrams and a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed with a module division different from that in the device, or in an order different from that in the flowcharts. The terms "first", "second", and the like in the description, the claims, or the drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.
Kubernetes is an open-source platform for managing containerized applications across multiple hosts in a cloud platform. Its goal is to make the deployment of containerized applications simple and efficient, and it provides mechanisms for application deployment, planning, updating, and maintenance. In Kubernetes, multiple containers can be created, each running an application instance, and this group of application instances is then managed, discovered, and accessed through the built-in load-balancing strategy, without operation and maintenance personnel having to perform complex manual configuration. Kubernetes is widely used; many cloud computing and artificial intelligence platforms of enterprises and research institutions are built on it. Through its good extensibility, Kubernetes supports the management of Custom Resource Definitions (CRDs), which makes it convenient for users to manage a custom resource as a single object entity.
However, Kubernetes currently only supports the scheduling of Pods. A Pod is the smallest unit that can be created and deployed in Kubernetes; it is an application instance in a Kubernetes cluster, is always deployed on one and the same node, and contains one or more containers as well as the resources shared by those containers, such as storage and networking. Scheduling CRDs in Kubernetes requires a dedicated scheduler, and multiple schedulers cause resource scheduling conflicts.
In view of the facts that the default Kubernetes scheduler only supports the scheduling of Pods and does not support the scheduling of CRD objects, and that it cannot automatically and reasonably split CRD objects into Pods according to the current resource state, the present application provides a resource scheduling method, a resource scheduling system, a device, and a computer-readable storage medium. During resource scheduling, a scheduling object is obtained from a scheduling queue; if the scheduling object is a custom resource, the custom resource is split according to the current resource state to obtain a scheduling unit list, the scheduling unit list including first scheduling units configured to constitute the custom resource, and each first scheduling unit is then scheduled in turn according to the scheduling unit list. The resource scheduling method can be applied to the Kubernetes scheduling platform, where the first scheduling unit is a CRD object; during scheduling, if the scheduling object is a CRD, the CRD is split according to the current resource state to obtain a scheduling unit list that includes a set of Pods, so that the Kubernetes scheduling platform can perform atomic scheduling on the Pods according to the scheduling unit list, with all Pods scheduled in sequence according to the queue and no other Pods inserted. This ensures that CRDs can be scheduled reasonably with high scheduling efficiency, making the Kubernetes scheduling platform compatible with various business scenarios.
The technical solutions of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the embodiments described below are only some, rather than all, of the embodiments of the present application.
Referring to FIG. 1, FIG. 1 is a schematic diagram of a system architecture platform 100 for performing the resource scheduling method provided by an embodiment of the present application; the system architecture platform 100 is also the resource scheduling system.
In the embodiment shown in FIG. 1, the system architecture platform 100 includes a scheduler 110 and a splitter 120, where the scheduler 110 is configured to schedule scheduling objects, and the splitter 120 is configured to respond to split requests from the scheduler 110 and split a scheduling object so as to meet the scheduling requirements of the scheduler 110. Specifically, during scheduling, the scheduler 110 obtains a scheduling object from the scheduling queue; when the scheduling object is a custom resource, the splitter 120 splits the custom resource according to the current resource state to obtain a scheduling unit list, the scheduling unit list including first scheduling units configured to constitute the custom resource. The scheduler 110 schedules the first scheduling units in the scheduling unit list in sequence according to the list, thereby completing the scheduling of the custom resource.
As shown in FIG. 1, the Kubernetes scheduling platform is taken as a specific example for description.
The Kubernetes scheduling system of this embodiment includes a scheduler (Scheduler) 110, a splitter (Pod-Splitor) 120, and a controller (CRD-Controller) 130.
The scheduler 110 is responsible for scheduling Pods, and the splitter is responsible for splitting CRD objects; the first scheduling unit is a CRD object, and the second scheduling unit is a native Pod object. In this embodiment, CRDs and Pods are placed in the same scheduling queue. When the scheduling object is a CRD, the scheduler 110 obtains the split Pod set through the extended split (Split) interface, and the scheduler 110 schedules all of these Pods in turn.
The splitter 120 is a user-defined extension component. It mainly responds to split requests from the scheduler 110, decomposes a CRD into a reasonable set of Pods according to the current cluster resource occupancy, creates a scheduling unit list containing these Pods, and returns the scheduling unit list to the scheduler 110 for scheduling. The splitter 120 can also respond to node binding requests from the scheduler 110 and complete the binding of Pods to nodes (Node). Binding a Pod to a node can be understood as adding node information and resource information to the Pod object; a dedicated component of the scheduling system then runs the Pod on the corresponding node according to the binding information.
The controller 130 is a user-defined extension component used for the state and life-cycle management of a specific CRD. The CRD state is updated according to the states of the CRD and its corresponding Pods, according to user commands, or according to the CRD's own policy (for example, the CRD life cycle ends after its Pods end normally), so as to maintain the CRD life cycle. The controller 130 is a functional component of the Kubernetes scheduling platform and is not described in detail here.
In addition, a user creates CRD and Pod objects through the Api-Server 140, and the scheduler 110 monitors the binding information of CRD and Pod objects through the Api-Server; after the scheduling of all Pods is completed, the splitter 120 completes the binding of Pods to nodes through the Api-Server.
Furthermore, the scheduler 110 currently has two extension modes: the extender (Extender) and the scheduling framework (Scheduling Framework). A new Split interface is added to the original extension interfaces; when the scheduler 110 schedules a CRD, it obtains the Pod set split from the CRD through this Split interface. The extender extends the scheduler 110 by means of web hooks, while the scheduling framework compiles the extension interfaces directly into the scheduler 110. In order to split CRD resources reasonably, the embodiment of the present application introduces a new extension interface, namely the Split interface, which is configured to split a CRD resource object and convert the CRD into a set of Pods. Different CRD resources may be split in different ways; the specific implementation of the Split interface is carried out in the Extender or the Scheduling Framework and is mainly responsible for two tasks: adopting a certain strategy to split the CRD into a set of 1 to N Pods and allocating a specific amount of resources to each Pod; and, during splitting, determining whether the remaining resources of the cluster nodes (such as GPU and CPU resources) can satisfy the splitting requirements. If they cannot, an error message is returned to the scheduler 110; if they can, the split Pod set is returned.
During scheduling, if the scheduling object is a CRD, the scheduling system splits the CRD according to the current resource state to obtain a scheduling unit list that includes a set of Pods. In this way, the Kubernetes scheduling platform can schedule the Pods according to the scheduling unit list, and all Pods are scheduled in sequence according to the queue without other Pods being inserted. This ensures that CRDs can be scheduled reasonably with high scheduling efficiency, making the Kubernetes scheduling platform compatible with various business scenarios.
It should be noted that when the scheduling object is a Pod, it is processed according to the original scheduling procedure of the Kubernetes scheduling system, except that the Pod binding operation is completed by the splitter 120. When the scheduling object is a CRD, the splitter 120 splits it into one or more Pods according to the current resource state of the cluster; the splitter 120 only needs to determine the number of Pods into which the CRD is split and the resources (CPU, memory, GPU) used by each Pod. After the splitter 120 finishes splitting the CRD, the scheduler 110 completes the scheduling of these Pods; the scheduler 110 applies optimization algorithms such as filtering, prioritizing, and scoring to the nodes to select suitable nodes for the Pods, and the splitter 120 then binds the Pods in the Pod list to those nodes, which ensures that the resources of the scheduler 110 and the splitter 120 remain synchronized.
In this way, the scheduler 110 of the Kubernetes scheduling platform can support mixed scheduling of CRDs and Pods as well as atomic scheduling of the Pods of a single CRD. It can be understood that, in mixed scheduling of CRDs and Pods, the scheduler 110 reads the configuration to learn which CRDs participate in scheduling and puts the Pods and the CRDs to be scheduled into the same scheduling queue; when the object scheduled by the scheduler 110 is a CRD, the list of Pod objects split from the CRD object is obtained through the extended Split interface, and each Pod is scheduled in turn, thereby realizing the mixed scheduling of CRDs and Pods.
Atomic scheduling of a CRD's Pods can be understood as follows: while the set of Pods split from a CRD is being scheduled, no other Pod can be scheduled, and the scheduling succeeds only if the entire Pod set split from the CRD is scheduled successfully; otherwise it fails. This avoids the situation where some Pods cannot be scheduled because of insufficient remaining resources, causing the scheduling of the entire CRD to fail.
It should be noted that CRD scheduling has a back-off (BackOff) mechanism, which can be understood as follows: if any Pod of the CRD fails to be scheduled, the scheduling of the entire CRD is considered to have failed. If CRD scheduling fails, the Pods of the CRD that have already been scheduled successfully need to be deleted and their resources released. In addition, splitting a CRD into Pods has re-entrancy protection: the scheduling queue of the scheduler 110 stores CRD objects and Pod objects, and a Pod set belonging to a CRD object does not need to be inserted into the scheduling queue again.
It should also be noted that the scheduler 110 and the splitter 120 have a resource synchronization mechanism. To split CRDs reasonably and optimally, the splitter 120 needs to know the resource state of the cluster, monitor node and Pod information, and cache the allocatable resource information locally. After the scheduler 110 successfully schedules the Pod set of a CRD, it sends a Pod binding (Bind) request to the splitter 120; after accepting the binding request, the splitter 120 first updates the allocatable resource information of the nodes in its local cache and then sends the final binding request to the Api-Server 140, so that the resources can be kept synchronized.
The system architecture platform 100 and the application scenarios described in the embodiments of the present application are intended to explain the technical solutions of the embodiments of the present application more clearly and do not constitute a limitation on the technical solutions provided by the embodiments of the present application. Those skilled in the art will appreciate that, with the evolution of the system architecture platform 100 and the emergence of new application scenarios, the technical solutions provided by the embodiments of the present application are also applicable to similar technical problems.
Those skilled in the art will understand that the system architecture platform 100 shown in FIG. 1 does not constitute a limitation on the embodiments of the present application, and may include more or fewer components than shown, or combine certain components, or use a different arrangement of components.
基于上述***架构平台100,下面提出本申请的资源调度方法的各个实施例。
参见图2所示,图2是本申请一个实施例提供的资源调度方法的流程图,该资源调度方法包括但不限于步骤S100、步骤S200和步骤S300。
步骤S100,从调度队列中获取调度对象。
在一实施例中,资源调度可理解为对各种资源进行合理有效的利用,可理解到,调度的对象为资源对象,将可调度的对象按队列形式进行排列,调度时根据队列的先后顺序或优先级进行调取,从而得到调度对象,便于快速获取调度对象,也有利于资源的合理调度。
以Kubernetes调度平台为示例进行说明,Kubernetes调度平台中可提供了很多默认资源类型,如Pod、Deployment、Service、Volume等一系列资源,能够满足大多数日常***部署和管理的需求。但是,在一些特殊需求的场景下,这些现有资源类型满足不了,那么就可以通过CRD来满足这些需求,有效提高Kubernetes的扩展能力。
需要说明的是,Kubernetes调度平台支持Pod的调度,即可直接调度Pod,可理解到,在同一调度队列中可同时***CRD和Pod对象,或单独调度CRD。具体的,CRD、Pod的混合调度时,通过Kubernetes调度平台的调度器读取配置,获取可以参与调度的CRD对象和Pod对象,调度器将Pod以及需要调度的CRD放入同一调度队列,并通过调度器依次从调度队列中获取调度对象进行调度。
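A toy version of such a mixed queue, with invented Item, Pod, and CRD types, might look like the following; it only illustrates that both kinds of objects share one FIFO queue and are told apart when popped.

```go
// Sketch of a mixed scheduling queue holding both CRDs and native Pods.
package queue

// Item is anything that can sit in the scheduling queue.
type Item interface{ ObjectName() string }

type Pod struct{ Name string }
type CRD struct{ Name string }

func (p Pod) ObjectName() string { return p.Name }
func (c CRD) ObjectName() string { return c.Name }

// Queue is a simple FIFO holding both CRDs and Pods.
type Queue struct{ items []Item }

func (q *Queue) Push(it Item) { q.items = append(q.items, it) }

func (q *Queue) Pop() (Item, bool) {
	if len(q.items) == 0 {
		return nil, false
	}
	it := q.items[0]
	q.items = q.items[1:]
	return it, true
}

// Dispatch shows how a scheduler distinguishes the two kinds of objects.
func Dispatch(it Item, schedulePod func(Pod), splitCRD func(CRD) []Pod) {
	switch obj := it.(type) {
	case Pod:
		schedulePod(obj) // native Pod: scheduled directly
	case CRD:
		for _, p := range splitCRD(obj) { // CRD: split first, then schedule
			schedulePod(p)
		}
	}
}
```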
Step S200: when the scheduling object is a custom resource, decompose the custom resource according to the current resource state to obtain a scheduling unit list.
Here, the scheduling unit list includes first scheduling units configured to constitute the custom resource; the custom resource is a CRD object, and the first scheduling units are the Pods into which the CRD is decomposed. It will be appreciated that CRD objects and native Pod objects can be inserted into the same scheduling queue at the same time, that is, CRD objects and Pod objects can be scheduled in a mixed manner; during mixed scheduling of CRDs and Pods, the scheduler fetches scheduling objects from the scheduling queue one by one. When scheduling, the scheduler first determines the type of the scheduling object; if the scheduling object is a CRD, the CRD is decomposed according to the current resource state to obtain a scheduling unit list, which is the list of Pods constituting the CRD, that is, the CRD is split into a set of Pods, so that the Kubernetes scheduling platform can schedule the Pods directly according to the Pod list.
It will be appreciated that the CRD must be decomposed according to the current resource state, where the current resource state can be understood as the remaining or available resources of the current scheduling platform. Provided that the resource request for decomposing the CRD can be satisfied, the splitter splits the CRD object reasonably, so that the CRD can be scheduled with the best resource allocation and run more efficiently.
It should be noted that when the scheduling object is a native Pod, the Pod can be scheduled directly without decomposition. It will be appreciated that the Pod is the basic unit of the Kubernetes scheduling platform, the smallest component created or deployed by the user, and the resource object in which containerized applications run; the other resource objects in a Kubernetes cluster exist to support the Pod resource object so that Kubernetes can manage application services. In this way, the Kubernetes scheduling platform supports mixed scheduling of Pods and CRDs while also achieving atomic scheduling of the Pods of a single CRD, ensuring that CRDs are scheduled reasonably and accommodating a wide range of service scenarios.
Step S300: schedule the scheduling units in the scheduling unit list one by one.
In one embodiment, the scheduling unit list is generated after the decomposition is completed. On the Kubernetes scheduling platform, the scheduling units are Pods and the scheduling unit list is the list of a Pod set; according to this list, the scheduler schedules all the Pods in the list in turn, thereby completing the scheduling of a single CRD. It will be appreciated that scheduling all Pods in list order prevents the insertion of other Pods from leaving the remaining Pods in the list unschedulable due to insufficient remaining resources, which would cause the scheduling of the whole CRD to fail. It also prevents the situation in which, while some Pods of one CRD are being scheduled, some Pods of another CRD are inserted, so that the remaining Pods of both CRDs cannot be scheduled due to insufficient remaining resources while the resources already occupied cannot be released, leaving the two CRDs in a resource deadlock.
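The effect described here can be illustrated with a toy simulation, under the assumption of a single pool of 8 free GPUs and two jobs that each decompose into two 4-GPU Pods: interleaving their Pods strands both jobs, whereas scheduling each CRD's Pod list as an uninterrupted, atomic unit lets one job complete while the other is cleanly rolled back. The whole setup is an illustrative assumption.

```go
// Toy simulation comparing interleaved scheduling with queue-ordered,
// atomic scheduling of two CRDs competing for 8 free GPUs.
package main

import "fmt"

type pod struct {
	job  string
	gpus int
}

// place tries to reserve GPUs for one pod; it returns false when the free
// pool cannot hold the pod.
func place(free *int, p pod) bool {
	if *free < p.gpus {
		return false
	}
	*free -= p.gpus
	return true
}

func main() {
	jobA := []pod{{"A", 4}, {"A", 4}}
	jobB := []pod{{"B", 4}, {"B", 4}}

	// Interleaved scheduling: A1, B1, A2, B2 against 8 free GPUs.
	free := 8
	interleaved := []pod{jobA[0], jobB[0], jobA[1], jobB[1]}
	placed := 0
	for _, p := range interleaved {
		if place(&free, p) {
			placed++
		}
	}
	fmt.Printf("interleaved: %d of 4 pods placed, neither job complete\n", placed)

	// Atomic, queue-ordered scheduling: all of A, then all of B, with
	// roll-back when a job cannot be placed completely.
	free = 8
	for _, job := range [][]pod{jobA, jobB} {
		before := free
		ok := true
		for _, p := range job {
			if !place(&free, p) {
				ok = false
				break
			}
		}
		if !ok {
			free = before // release what this job already took
		}
		fmt.Printf("atomic: job %s complete=%v, free GPUs now %d\n", job[0].job, ok, free)
	}
}
```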
In one embodiment, in step S200, decomposing the custom resource according to the current resource state to obtain the scheduling unit list may include, but is not limited to, the following step:
Step S210: when the remaining resources of the cluster nodes satisfy the requirements for decomposing the custom resource, decompose the custom resource to obtain the scheduling unit list.
In one embodiment, on the Kubernetes scheduling platform the splitter mainly responds to decomposition requests from the scheduler, decomposes the CRD into an appropriate set of Pods according to the current resource usage of the cluster nodes, creates the scheduling unit list containing these Pods, and returns the scheduling unit list to the scheduler for scheduling. The splitter can thus learn the resource state of the cluster nodes, for example by watching the binding state of the cluster nodes, and decompose the CRD reasonably according to that resource state so as to satisfy the requirement of an optimal decomposition of the CRD.
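One possible decomposition policy, sketched below purely for illustration, packs a GPU request onto as few nodes as possible and fails when the cluster as a whole cannot satisfy it. The application leaves the concrete policy to the splitter implementation, so the greedy rule and all names here are assumptions.

```go
// Sketch of a resource-aware split policy: distribute the CRD's GPU request
// over the nodes with the most free GPUs first, one Pod per node used.
package policy

import (
	"errors"
	"sort"
)

type Node struct {
	Name     string
	FreeGPUs int
}

type PodPlan struct {
	Node string
	GPUs int
}

// SplitByGPU produces one Pod per node used, or fails if the cluster as a
// whole cannot satisfy the request.
func SplitByGPU(requestGPUs int, nodes []Node) ([]PodPlan, error) {
	sorted := append([]Node(nil), nodes...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].FreeGPUs > sorted[j].FreeGPUs })

	var plan []PodPlan
	remaining := requestGPUs
	for _, n := range sorted {
		if remaining == 0 {
			break
		}
		if n.FreeGPUs == 0 {
			continue
		}
		take := n.FreeGPUs
		if take > remaining {
			take = remaining
		}
		plan = append(plan, PodPlan{Node: n.Name, GPUs: take})
		remaining -= take
	}
	if remaining > 0 {
		return nil, errors.New("insufficient GPU resources in cluster")
	}
	return plan, nil
}
```

Under this assumed policy, a node with 8 free GPUs yields a single 8-GPU Pod, while two nodes with 4 free GPUs each yield two 4-GPU Pods, which matches the outcomes described in Embodiments 3 and 4 below.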
In this way, the splitter decomposes the CRD efficiently and reasonably while taking full account of the resource state, and the scheduler does not need to understand the CRD and can focus on scheduling Pods, thereby achieving both the splitting and the scheduling of the CRD.
It should be noted that the decomposition of a CRD into Pods has re-entry protection: the scheduling queue of the scheduler holds CRD objects and Pod objects, and the set of Pods belonging to a CRD object does not need to be inserted into the scheduling queue again.
Referring to FIG. 3, in one embodiment the resource scheduling method further includes, but is not limited to, the following steps:
Step S101: create scheduling objects according to a scheduling request;
Step S102: watch the binding information of the scheduling objects, and place newly added scheduling objects into the same queue to form the scheduling queue.
It will be appreciated that the user creates CRD objects and Pod objects according to the actual needs of the application scenario, for example a deep learning CRD. The user creates the CRD objects and Pod objects through the Api-Server; the scheduler watches the binding information of the CRD and Pod objects through the Api-Server and places the schedulable CRDs and Pods into the same queue. The CRDs and Pods added to the queue form the scheduling queue, from which scheduling objects are then obtained; the added scheduling objects may be a mixture of CRDs and Pods, or all CRDs, or all Pods.
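The watch-and-enqueue step can be pictured with the following sketch, in which a plain Go channel stands in for the Api-Server watch stream; a production implementation would normally rely on client-go informers, which are not reproduced here, and all names are assumptions.

```go
// Sketch of steps S101/S102: newly created CRD and Pod objects are observed
// and appended to a single scheduling queue in arrival order.
package watch

type Kind int

const (
	KindPod Kind = iota
	KindCRD
)

type Event struct {
	Kind Kind
	Name string
}

// Enqueue consumes creation events until the stream closes and returns the
// resulting scheduling queue, with CRDs and Pods kept in arrival order.
func Enqueue(events <-chan Event) []Event {
	var queue []Event
	for ev := range events {
		queue = append(queue, ev)
	}
	return queue
}
```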
Referring to FIG. 4, in one embodiment the resource scheduling method further includes, but is not limited to, the following step:
Step S400: after the scheduling of all scheduling objects is completed, bind the scheduling units to the corresponding nodes.
In one embodiment, when scheduling a CRD object the Kubernetes scheduling platform can decompose the CRD reasonably and return the scheduling unit list to the scheduler for scheduling, so the scheduler only needs to focus on scheduling Pods to complete the scheduling of all scheduling objects. After all scheduling objects have been scheduled, the scheduler sends a node binding request to the splitter, and the splitter responds to the node binding request from the scheduler and completes the binding of the Pods to nodes. Specifically, the splitter completes the binding of the Pods to nodes through the Api-Server.
In one embodiment, the resource scheduling method further includes, but is not limited to, the following step:
Step S500: when any scheduling unit fails to be scheduled, delete the scheduling units that have already been scheduled and release their resources.
In this embodiment, if any Pod in the set of Pods of a CRD fails to be scheduled, the scheduling of the whole CRD is considered to have failed. If the CRD scheduling fails, the Pods of that CRD that have already been scheduled successfully must be deleted and their resources released, so as to avoid occupying resources and degrading operating efficiency.
Referring to FIG. 5, in one embodiment, step S400, namely binding the scheduling units to the corresponding nodes after the scheduling of all scheduling objects is completed, may include, but is not limited to, the following steps:
Step S410: initiate a node binding request, update the allocatable resource information of the nodes, determine the optimal node according to the allocatable resource information, and allocate a host to each scheduling unit according to the optimal node;
Step S420: bind the scheduling units to the corresponding hosts.
In one embodiment, after all Pods have been scheduled, the splitter completes the binding of the Pods to nodes through the Api-Server. The node binding flow is as follows: a suitable node is selected by applying optimization algorithms such as node filtering, prioritization, and scoring; the optimal node is then selected to allocate a host to the Pod, and a binding request for the Pod is sent to the Api-Server, so that the Pod is bound to the corresponding host and the binding operation is completed.
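A condensed sketch of this filter/score/bind flow is shown below. The scoring rule (prefer the tightest fit) and all names are assumptions made for the example; the real scheduler uses pluggable filtering, prioritization, and scoring algorithms.

```go
// Sketch of the filter, score, and bind flow for placing one Pod.
package binding

import "errors"

type Node struct {
	Name     string
	FreeGPUs int
}

// SelectNode filters out nodes that cannot hold the Pod, scores the rest,
// and returns the best-scoring node.
func SelectNode(podGPUs int, nodes []Node) (string, error) {
	best := ""
	bestScore := 0
	for _, n := range nodes {
		if n.FreeGPUs < podGPUs { // filter
			continue
		}
		score := podGPUs - n.FreeGPUs // score: tighter fit gives a higher value
		if best == "" || score > bestScore {
			best, bestScore = n.Name, score
		}
	}
	if best == "" {
		return "", errors.New("no node fits the pod")
	}
	return best, nil
}

// Bind sends the final binding request; sendBind stands in for the call to
// the Api-Server performed by the splitter.
func Bind(pod, node string, sendBind func(pod, node string) error) error {
	return sendBind(pod, node)
}
```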
It should be noted that when the scheduling object is a Pod, the Kubernetes scheduling system handles it according to its original scheduling flow, except that the Pod binding operation is performed by the splitter. When the scheduling object is a CRD, the splitter decomposes the CRD into one or more Pods according to the current resource state of the cluster; the splitter only needs to determine how many Pods the CRD should be decomposed into and which resources (CPU, memory, GPU) each Pod uses. After the splitter has split the CRD, the scheduler schedules these Pods; the scheduler applies optimization algorithms such as node filtering, prioritization, and scoring to select a suitable node for each Pod, and the Pods in the Pod list are then bound to nodes through the splitter, which keeps the resources of the scheduler and the splitter synchronized.
In addition, the scheduler and the splitter have a resource synchronization mechanism. To decompose a CRD reasonably and optimally, the splitter must know the resource state of the cluster; it therefore watches node and Pod information and caches the allocatable resource information locally. After the scheduler has successfully scheduled the set of Pods of a CRD, the scheduler sends a Pod binding request to the splitter. Upon accepting the binding request, the splitter first updates the node allocatable-resource information in its local cache and only then sends the final binding request to the Api-Server; only in this way can the resources be kept synchronized.
Referring to FIG. 6, in one embodiment, taking the Kubernetes scheduling platform as an example, the resource scheduling method includes, but is not limited to, the following steps:
Step S610: create CRD and Pod objects through the Api-Server;
Step S620: watch the CRD and Pod objects through the Api-Server, and place newly added CRDs or Pods into the same scheduling queue;
Step S630: obtain a scheduling object from the scheduling queue;
when the scheduling object is a Pod, handle it according to the Pod scheduling flow;
when the scheduling object is a CRD, send a CRD decomposition request to the splitter, so that the splitter decomposes the CRD according to the current resource state and creates the decomposed Pods through the Api-Server;
Step S640: receive the Pod list returned by the splitter, and schedule the Pods one by one according to the Pod list;
Step S650: after all Pods have been scheduled, initiate a binding request to the splitter and complete the binding of the Pods to nodes through the Api-Server.
To explain more clearly the specific step flows of the resource scheduling method in the above embodiments, five embodiments are described below.
Embodiment 1:
This embodiment is an example of the scheduler successfully performing mixed scheduling of a CRD and Pods. It shows the process of mixed scheduling of a CRD and Pods on the Kubernetes scheduling platform: a deep learning job is defined as a CRD, and the workers that execute the deep learning job in parallel are carried by Pods, so that mixed scheduling of the deep learning job and Pods can be achieved and both can run successfully.
Example environment: a Kubernetes cluster running Ubuntu 16.04, containing two nodes with sufficient node resources; the modified scheduler has been deployed in the cluster; the controller and splitter customized for the deep learning job are deployed.
Referring to FIG. 7, the specific operation steps are as follows:
Step S710: define the deep learning job file and create the CRD object;
Step S720: define the file of a single Pod and create the Pod object;
Step S730: after the deep learning job is created successfully, the CRD corresponding to the deep learning job is in the running state;
Step S740: after the Pods related to the deep learning job are created successfully, the Pods decomposed from the deep learning job are all in the running state.
In this way, the single Pod created in step S720 is found to be in the running state; the state of the CRD should remain consistent with the states of the Pods decomposed from it.
Embodiment 2:
In this embodiment the scheduler successfully schedules two kinds of CRD objects. It shows the process of mixed scheduling of different CRDs on the Kubernetes scheduling platform: a deep learning job is defined as one CRD and a machine learning job is defined as another CRD, and the workers executed by both kinds of CRD objects are carried by Pods, so that mixed scheduling of the deep learning job and the machine learning job can be achieved and both can run successfully.
Example environment: a Kubernetes cluster running Ubuntu 16.04, containing two nodes with sufficient node resources; the modified scheduler has been deployed in the cluster; the controller and splitter customized for the deep learning job are deployed; the controller and splitter customized for the machine learning job are deployed.
Referring to FIG. 8, the specific operation steps are as follows:
Step S810: define the deep learning job file and create the corresponding CRD object;
Step S820: define the machine learning job file and create the corresponding CRD object;
Step S830: after the deep learning job is created successfully, the CRD corresponding to the deep learning job is in the running state;
Step S840: after the Pods related to the deep learning job are created successfully, the Pods decomposed from the deep learning job are all in the running state;
Step S850: after the machine learning job is created successfully, the CRD corresponding to the machine learning job is in the running state;
Step S860: after the Pods related to the machine learning job are created successfully, the Pods decomposed from the machine learning job are all in the running state.
Here, the state of each CRD should remain consistent with the states of the Pods decomposed from it.
Embodiment 3:
In this embodiment the scheduler schedules a CRD to run on the fewest possible nodes. It shows that, when a CRD object is scheduled on the Kubernetes scheduling platform, the CRD can be decomposed reasonably according to the resource state: a deep learning job is defined as a CRD, and the workers that execute the deep learning job in parallel are carried by Pods. When scheduling the CRD, the scheduler can automatically decompose it according to the current resource state and schedule the CRD's Pods onto as few nodes as possible, reducing network overhead and ensuring that the decomposition is reasonable.
Example environment: a Kubernetes cluster running Ubuntu 16.04, containing 3 nodes with sufficient CPU and memory resources; node 1 has 8 idle GPUs, and nodes 2 and 3 each have 4 idle GPUs; the modified scheduler has been deployed in the cluster; the controller and splitter customized for the deep learning job are deployed.
Referring to FIG. 9, the specific operation steps are as follows:
Step S910: define the deep learning job file, in which the job requests 8 GPU resources, and create the CRD object;
Step S920: after the deep learning job is created successfully, the CRD corresponding to the deep learning job is in the running state;
Step S930: after the Pods related to the deep learning job are created successfully, the Pods decomposed from the deep learning job are all in the running state;
Step S940: the number of Pods obtained by decomposing the CRD is 1, and this Pod runs on node 1.
Embodiment 4:
In this embodiment the scheduler successfully schedules a CRD with a coarse-grained resource request. It shows that, when a CRD object is scheduled on the Kubernetes scheduling platform, the CRD can be decomposed reasonably according to the resource state: a deep learning job is defined as a CRD, and the workers that execute the deep learning job in parallel are carried by Pods. When scheduling the CRD, the scheduler can automatically decompose it according to the current resource state; if the job's resource request is so large that no single node can satisfy it but the total resources of the cluster can, the CRD can still be decomposed, scheduled, and run successfully, ensuring that the job is not starved of resources.
Example environment: a Kubernetes cluster running Ubuntu 16.04, containing 4 nodes with sufficient CPU and memory resources; nodes 1 and 3 each have 4 idle GPUs, and nodes 2 and 4 each have 2 idle GPUs; the modified scheduler has been deployed in the cluster; the controller and splitter customized for the deep learning job are deployed.
Referring to FIG. 10, the specific operation steps are as follows:
Step S1010: define the deep learning job file, in which the job requests 8 GPU resources, and create the CRD object;
Step S1020: after the deep learning job is created successfully, the CRD corresponding to the deep learning job is in the running state;
Step S1030: after the Pods related to the deep learning job are created successfully, the Pods decomposed from the deep learning job are all in the running state;
Step S1040: the number of Pods obtained by decomposing the CRD is 2, and the two Pods run on node 1 and node 3.
Embodiment 5:
In this embodiment the scheduler atomically schedules the Pods decomposed from a CRD. It shows that on the Kubernetes scheduling platform the scheduler can achieve atomic scheduling of the Pods of a single CRD object: a deep learning job is defined as one CRD and a machine learning job is defined as another CRD, and the workers executed by both kinds of CRD objects are carried by Pods. Atomic scheduling of a CRD's Pods can thus be achieved, avoiding unreasonable CRD scheduling and the problem of two CRDs entering a resource deadlock.
Example environment: a Kubernetes cluster running Ubuntu 16.04, containing 3 nodes with sufficient CPU and memory resources; each of the 3 nodes has 4 idle GPUs; the modified scheduler has been deployed in the cluster; the controller and splitter customized for the deep learning job are deployed; the controller and splitter customized for the machine learning job are deployed.
Referring to FIG. 11, the specific operation steps are as follows:
Step S1110: define the deep learning job file, in which the job requests 8 GPU resources, and create the CRD object;
Step S1120: define the machine learning job file, in which the job requests 8 GPU resources, and create the CRD object;
Step S1130: after the deep learning job is created successfully, observe the state of the CRD corresponding to the deep learning job;
Step S1140: after the machine learning job is created successfully, observe the state of the CRD corresponding to the machine learning job;
Step S1150: only one of the deep learning job and the machine learning job is found to be in the running state, and the Pods related to the job that is in the running state are all in the running state.
In addition, an embodiment of this application also provides a device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor. The processor and the memory may be connected by a bus or in another manner.
As a non-transitory computer-readable storage medium, the memory may be used to store non-transitory software programs and non-transitory computer-executable programs. Furthermore, the memory may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some implementations, the memory may include memory located remotely from the processor, and such remote memory may be connected to the processor through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
It should be noted that the terminal in this embodiment may include the system architecture platform 100 of the embodiment shown in FIG. 1. The terminal in this embodiment and the system architecture platform 100 of the embodiment shown in FIG. 1 belong to the same inventive concept, so these embodiments have the same implementation principles and technical effects, which are not described in detail again here.
The non-transitory software programs and instructions required to implement the resource scheduling method of the above embodiments are stored in the memory and, when executed by the processor, perform the resource scheduling method of the above embodiments, for example performing the method steps S100 to S300 in FIG. 2, the method steps S101 to S102 in FIG. 3, the method step S400 in FIG. 4, the method steps S410 to S420 in FIG. 5, the method steps S610 to S650 in FIG. 6, the method steps S710 to S740 in FIG. 7, the method steps S810 to S860 in FIG. 8, the method steps S910 to S940 in FIG. 9, the method steps S1010 to S1040 in FIG. 10, and the method steps S1110 to S1150 in FIG. 11 described above.
The device embodiments described above are merely illustrative; the units described as separate components may or may not be physically separated, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, an embodiment of this application also provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor or controller, for example by a processor in the above terminal embodiment, cause the processor to perform the resource scheduling method of the above embodiments, for example performing the method steps S100 to S300 in FIG. 2, the method steps S101 to S102 in FIG. 3, the method step S400 in FIG. 4, the method steps S410 to S420 in FIG. 5, the method steps S610 to S650 in FIG. 6, the method steps S710 to S740 in FIG. 7, the method steps S810 to S860 in FIG. 8, the method steps S910 to S940 in FIG. 9, the method steps S1010 to S1040 in FIG. 10, and the method steps S1110 to S1150 in FIG. 11 described above.
The embodiments of this application include: obtaining a scheduling object from a scheduling queue during resource scheduling; if the scheduling object is a custom resource, decomposing the custom resource according to the current resource state to obtain a scheduling unit list, the scheduling unit list including first scheduling units configured to constitute the custom resource; and then scheduling each first scheduling unit in turn according to the scheduling unit list. This can be applied to a Kubernetes scheduling platform: during scheduling, if the scheduling object is a CRD, the CRD is decomposed according to the current resource state to obtain a scheduling unit list that includes a set of Pods, so that the Kubernetes scheduling platform can atomically schedule all the Pods according to the scheduling unit list, with all Pods scheduled in queue order and no other Pods inserted in between. This ensures that the CRD is scheduled reasonably, yields high scheduling efficiency, and enables the Kubernetes scheduling platform to accommodate a wide range of service scenarios.
Those of ordinary skill in the art will understand that all or some of the steps and systems in the methods disclosed above may be implemented as software, firmware, hardware, or appropriate combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor, or a microprocessor, or as hardware, or as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
The preferred implementations of this application have been described above in detail, but this application is not limited to the above implementations. Those familiar with the art may also make various equivalent modifications or substitutions without departing from the spirit of this application, and these equivalent modifications or substitutions are all included within the scope defined by the claims of this application.

Claims (16)

  1. A resource scheduling method, comprising:
    obtaining a scheduling object from a scheduling queue;
    when the scheduling object is a custom resource, decomposing the custom resource according to a current resource state to obtain a scheduling unit list, the scheduling unit list comprising first scheduling units configured to constitute the custom resource; and
    scheduling the first scheduling units in the scheduling unit list one by one.
  2. The resource scheduling method according to claim 1, wherein the decomposing the custom resource according to a current resource state to obtain a scheduling unit list comprises:
    when remaining resources of cluster nodes satisfy requirements for decomposing the custom resource, decomposing the custom resource to obtain the scheduling unit list.
  3. The resource scheduling method according to claim 1, further comprising:
    when the scheduling object is a second scheduling unit, directly scheduling the second scheduling unit.
  4. The resource scheduling method according to claim 3, further comprising:
    after scheduling of all the scheduling objects is completed, binding the first scheduling units and the second scheduling unit to corresponding nodes, respectively.
  5. The resource scheduling method according to claim 4, wherein after the scheduling of all the scheduling objects is completed, the method further comprises:
    initiating a node binding request, updating allocatable resource information of the nodes, and determining an optimal node according to the allocatable resource information.
  6. The resource scheduling method according to claim 1, further comprising:
    creating scheduling objects according to a scheduling request; and
    watching binding information of the scheduling objects, and placing newly added scheduling objects into a same queue to form the scheduling queue.
  7. The resource scheduling method according to claim 1, further comprising:
    when any one of the first scheduling units fails to be scheduled, deleting the first scheduling units that have already been scheduled and releasing resources.
  8. A resource scheduling system, comprising:
    a scheduler configured to obtain a scheduling object from a scheduling queue; and
    a splitter configured to, when the scheduling object is a custom resource, decompose the custom resource according to a current resource state to obtain a scheduling unit list, the scheduling unit list comprising first scheduling units configured to constitute the custom resource;
    wherein the scheduler schedules the first scheduling units in the scheduling unit list one by one.
  9. The resource scheduling system according to claim 8, wherein the splitter is further configured to:
    when remaining resources of cluster nodes satisfy requirements for decomposing the custom resource, decompose the custom resource to obtain the scheduling unit list.
  10. The resource scheduling system according to claim 8, wherein the scheduler is further configured to:
    when the scheduling object is a second scheduling unit, directly schedule the second scheduling unit.
  11. The resource scheduling system according to claim 10, wherein the splitter is further configured to:
    bind the first scheduling units and the second scheduling unit to corresponding nodes, respectively.
  12. The resource scheduling system according to claim 11, wherein the scheduler is further configured to:
    initiate a binding request, update allocatable resource information of the nodes, and determine an optimal node according to the allocatable resource information.
  13. The resource scheduling system according to claim 8, wherein the scheduler is further configured to:
    obtain a scheduling request of the scheduling object; and
    watch binding information of the scheduling object, and place newly added scheduling objects into a same queue to constitute the scheduling queue.
  14. The resource scheduling system according to claim 8, wherein the scheduler is further configured to:
    when any one of the first scheduling units fails to be scheduled, delete the first scheduling units that have already been scheduled and release resources.
  15. A device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the resource scheduling method according to any one of claims 1 to 7.
  16. A computer-readable storage medium storing computer-executable instructions, wherein the computer-executable instructions are used to perform the resource scheduling method according to any one of claims 1 to 7.
PCT/CN2021/103638 2020-07-01 2021-06-30 Resource scheduling method, resource scheduling system, and device WO2022002148A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US18/004,067 US20230266999A1 (en) 2020-07-01 2021-06-30 Resource scheduling method, resource scheduling system, and device
JP2023500093A JP7502550B2 (ja) 2020-07-01 2021-06-30 リソーススケジューリング方法、リソーススケジューリングシステム、及び機器
EP21833960.4A EP4177751A4 (en) 2020-07-01 2021-06-30 RESOURCE PLANNING METHOD, RESOURCE PLANNING SYSTEM AND APPARATUS

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010625668.0A 2020-07-01 2020-07-01 Resource scheduling method, resource scheduling system, and device
CN202010625668.0 2020-07-01

Publications (1)

Publication Number Publication Date
WO2022002148A1 true WO2022002148A1 (zh) 2022-01-06

Family

ID=79317431

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/103638 WO2022002148A1 (zh) 2020-07-01 2021-06-30 资源调度方法、资源调度***及设备

Country Status (4)

Country Link
US (1) US20230266999A1 (zh)
EP (1) EP4177751A4 (zh)
CN (1) CN113961335A (zh)
WO (1) WO2022002148A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115022198A (zh) * 2022-05-31 2022-09-06 阿里巴巴(中国)有限公司 Resource information acquisition method, device, and storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115080248B (zh) * 2022-08-19 2023-01-10 中兴通讯股份有限公司 Scheduling optimization method for a scheduling apparatus, scheduling apparatus, and storage medium
CN115145695B (zh) * 2022-08-30 2022-12-06 浙江大华技术股份有限公司 Resource scheduling method and apparatus, computer device, and storage medium
CN117033000B (zh) * 2023-10-09 2024-01-05 合肥中科类脑智能技术有限公司 Data scheduling method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102739770A (zh) * 2012-04-18 2012-10-17 上海和辰信息技术有限公司 Cloud-computing-based resource scheduling method and system
CN108228354A (zh) * 2017-12-29 2018-06-29 杭州朗和科技有限公司 Scheduling method, system, computer device, and medium
CN110244964A (zh) * 2019-05-28 2019-09-17 阿里巴巴集团控股有限公司 Operation and maintenance method based on an operation and maintenance application, operation and maintenance method, apparatus, and device
US20200153898A1 (en) * 2018-11-13 2020-05-14 International Business Machines Corporation Automated infrastructure updates in a cluster environment that includes containers
CN111274191A (zh) * 2020-01-08 2020-06-12 山东汇贸电子口岸有限公司 Method for managing a Ceph cluster, and cloud local storage coordinator

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11243818B2 (en) * 2017-05-04 2022-02-08 Salesforce.Com, Inc. Systems, methods, and apparatuses for implementing a scheduler and workload manager that identifies and optimizes horizontally scalable workloads

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102739770A (zh) * 2012-04-18 2012-10-17 上海和辰信息技术有限公司 Cloud-computing-based resource scheduling method and system
CN108228354A (zh) * 2017-12-29 2018-06-29 杭州朗和科技有限公司 Scheduling method, system, computer device, and medium
US20200153898A1 (en) * 2018-11-13 2020-05-14 International Business Machines Corporation Automated infrastructure updates in a cluster environment that includes containers
CN110244964A (zh) * 2019-05-28 2019-09-17 阿里巴巴集团控股有限公司 Operation and maintenance method based on an operation and maintenance application, operation and maintenance method, apparatus, and device
CN111274191A (zh) * 2020-01-08 2020-06-12 山东汇贸电子口岸有限公司 Method for managing a Ceph cluster, and cloud local storage coordinator

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4177751A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115022198A (zh) * 2022-05-31 2022-09-06 阿里巴巴(中国)有限公司 Resource information acquisition method, device, and storage medium
CN115022198B (zh) * 2022-05-31 2023-10-24 阿里巴巴(中国)有限公司 Resource information acquisition method, device, and storage medium

Also Published As

Publication number Publication date
EP4177751A1 (en) 2023-05-10
US20230266999A1 (en) 2023-08-24
EP4177751A4 (en) 2023-12-20
JP2023532358A (ja) 2023-07-27
CN113961335A (zh) 2022-01-21

Similar Documents

Publication Publication Date Title
WO2022002148A1 (zh) 资源调度方法、资源调度***及设备
CN107547596B (zh) 一种基于Docker的云平台控制方法及装置
CN106919445B (zh) 一种在集群中并行调度容器的方法和装置
WO2020001320A1 (zh) 一种资源分配方法、装置及设备
CN113918270A (zh) 基于Kubernetes的云资源调度方法及***
CN110838939B (zh) 一种基于轻量级容器的调度方法及边缘物联管理平台
CN111682973B (zh) 一种边缘云的编排方法及***
CN111897654A (zh) 将应用迁移到云平台的方法、装置、电子设备和存储介质
WO2019056771A1 (zh) 分布式存储***升级管理的方法、装置及分布式存储***
CN112910937B (zh) 容器集群中的对象调度方法、装置、服务器和容器集群
CN114153580A (zh) 一种跨多集群的工作调度方法及装置
CN113608834A (zh) 一种基于超融合的资源调度方法、装置、设备及可读介质
CN108170417B (zh) 一种在mesos集群中集成高性能的作业调度框架的方法和装置
CN116089009A (zh) 一种gpu资源管理方法、***、设备和存储介质
CN116010064A (zh) Dag作业调度和集群管理的方法、***及装置
WO2020108337A1 (zh) 一种cpu资源调度方法及电子设备
CN114721824A (zh) 一种资源分配方法、介质以及电子设备
CN117435324B (zh) 基于容器化的任务调度方法
CN113867911A (zh) 一种任务调度方法、设备及微服务***
CN111400021B (zh) 一种深度学习方法、装置及***
CN115964176B (zh) 云计算集群调度方法、电子设备和存储介质
CN116954816A (zh) 容器集群控制方法、装置、设备及计算机存储介质
JP7502550B2 (ja) リソーススケジューリング方法、リソーススケジューリングシステム、及び機器
CN113225269B (zh) 基于容器的工作流调度方法、装置、***及存储介质
CN114706663A (zh) 一种计算资源调度方法、介质及计算设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21833960

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2023500093

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2021833960

Country of ref document: EP

Effective date: 20230201