WO2022211980A1 - Planet-scale, fully managed artificial intelligence infrastructure service - Google Patents
- Publication number: WO2022211980A1 (PCT/US2022/019213)
- Authority: WIPO (PCT)
- Prior art keywords: workloads, workload, received, resource, processor
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5072—Grid computing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
- G06F9/5088—Techniques for rebalancing the load in a distributed system involving task migration
Definitions
- PLANET-SCALE FULLY MANAGED ARTIFICIAL INTELLIGENCE INFRASTRUCTURE SERVICE
- general-purpose infrastructure-as-a-service (IaaS) cloud-based environments have significant limitations, as AI workloads are fundamentally different and necessitate purpose-built AI infrastructure.
- managing the minutiae of current infrastructure presents substantial challenges to data scientists trying to accelerate the algorithmic innovations of AI.
- a computerized method for managing AI workloads in a cloud infrastructure platform is described.
- a set of distributed infrastructure resources are integrated into the cloud infrastructure platform via native support interfaces.
- AI workloads are received from a plurality of tenants, wherein the AI workloads include training workloads and inferencing workloads and resource subsets of the set of distributed infrastructure resources are assigned to the received AI workloads.
- the received AI workloads are scheduled for execution on the assigned resource subsets and based on the scheduling of the AI workloads, they are executed on the assigned resource subsets.
- FIG. 1 is a block diagram illustrating a system configured for providing infrastructure service for artificial intelligence (AI) workloads;
- FIG. 2 is a block diagram illustrating a runtime plane of the system of FIG. 1;
- FIG. 3 is a block diagram illustrating an infrastructure plane of the system of FIG. 1;
- FIG. 4 is a flowchart illustrating a method for managing AI workloads in a cloud infrastructure platform;
- FIG. 5 is a block diagram illustrating a hierarchical scheduling subsystem configured for scheduling AI workloads; and
- FIG. 6 is a block diagram of an example computing device for implementing aspects disclosed herein.
- In FIGs. 1 to 6, the systems are illustrated as schematic drawings. The drawings may not be to scale.
- aspects of the disclosure provide a computerized method and system for managing the execution of artificial intelligence (AI) workloads, such as training and inferencing workloads, using a diverse, distributed pool of infrastructure resources.
- Distributed infrastructure resources are integrated into the cloud infrastructure platform via native support interfaces, enabling many entities to make use of their own infrastructure to add to the global pool of resources.
- AI workloads are received from a plurality of tenants and resource subsets of the set of distributed infrastructure resources are assigned to the received AI workloads, including securing the AI workloads from each other using containers, enabling multiple AI workloads to be executed securely on the same server.
- the received AI workloads are scheduled for execution on the assigned resource subsets and, based on the scheduling of the AI workloads, they are then executed on the assigned resource subsets.
- Cloud infrastructure includes hardware accelerators, computer networking and storage — all of which are bundled together in a workload-aware manner.
- AI workloads (e.g., Deep Learning Training (DLT) and inferencing) are commonly run on general-purpose cloud-based IaaS, which requires data scientists to set up their DLT problems, execute them, and solve any resultant problems that arise from today's IaaS.
- DLT workloads are growing exponentially (e.g., 10x per year).
- the industry is responding to this uptick in DLT workloads by including more hardware in the IaaS environments, e.g., buying more graphics processing units (GPUs) or other hardware accelerators, adding more nodes, and building out more distributed clusters.
- as the models continue to grow exponentially, it becomes untenable to grow IaaS systems in such an exponential manner.
- There are limits to the size of cloud infrastructures, from a practical standpoint. Aspects of the disclosure solve these and other technical problems in unconventional ways.
- the disclosed examples provide a “Singularity” service that increases efficiencies from today’s fixed infrastructure resource (including hardware accelerators, networking, storage, etc.) and drives the most technical efficiencies as the models continue to grow or as the number of DLT jobs and/or other AI workloads increase.
- the disclosed service operates in an unconventional manner by allowing for an IaaS or other infrastructure to grow to accommodate large numbers of DLT jobs or function as smaller groups of IaaSs that facilitate different DLT job processing.
- Conventional general-purpose IaaSs are not able to handle these large increases in DLT jobs because today's general-purpose IaaSs are developed to be workload-agnostic.
- in contrast, the disclosed service is designed for purpose-built workloads that may be efficiently processed in an IaaS.
- the AI infrastructure service of the disclosure is operable with all AI workloads, including training (e.g., workloads for training new or updated AI models) and inferencing (e.g., workloads for using trained AI models to evaluate and make inferences from data).
- an example of the disclosed service is a fully managed, globally distributed, multi-tenant AI infrastructure service with native support for third-party (3P) hardware (e.g., from different companies than the company operating a cloud environment), custom silicon, application-specific integrated circuits (ASICs), GPUs, central processing units (CPUs), and first-party (1P) hardware (from the company operating the cloud environment), for DLT training and inferencing workloads.
- an AI planet-scale computer infrastructure is used for training and inferencing at any scale, with the highest technical efficiency and differentiated capabilities, which significantly improve the productivity of data scientists.
- the disclosed service manages 3P (e.g., GPU and field-programmable gate array (FPGA)) and 1P AI hardware capacity and enables high-level services, like AZURE® Machine Learning (ML), to build experiences and tools to serve customers.
- any kind of AI job may be migrated using the disclosed techniques. Such jobs may be long-running (e.g., processing for several hours, days, weeks, or months).
- the disclosed embodiments and examples mention the Azure cloud service provided by MICROSOFT CORPORATION, headquartered in Redmond, Washington, USA, but any large-scale cloud infrastructure may utilize the disclosed service.
- the disclosure provides high-efficiency AI training and inferencing by driving the high utilization of resources.
- Secure, fine-grained multi-tenancy service is provided with high-density containerized hosting.
- such service may be provided using Hyper-V isolated containers on bare-metal machines.
- the disclosed service is able to both securely and densely pack multiple 1P and 3P tenants on the same hosts, enabling highly efficient use of compute and AI hardware capacity across the cloud service.
- High-density workloads that belong to different tenants are enabled. For example, AI workloads can run alongside search workloads.
- the disclosure provides multiplexing or interspersing of inferencing and training workloads on the same shared pool of resources.
- By sharing the same pool of cloud-wide resources for both inferencing and training, more efficient scheduling and packing of workloads is enabled, maximizing use of hardware capacity and handling fluctuations in the mix of workloads and demand for resources of the shared pool.
- in conventional systems, inferencing workloads and training workloads are on different pools of resources, fragmenting the capacity.
- the disclosed service multiplexes the training and inferencing workloads on the same pool of cloud resources (e.g., hardware accelerators, compute resources, networking resources, and storage resources, etc.).
- DLT workloads and inferencing workloads need topological collocation of the nodes and the hardware associated with a job.
- the disclosed service intersperses inferencing workloads on top of or in between training workloads, helping drive efficiencies and finish more jobs through the IaaS.
- the disclosed service provides cloud-wide (e.g., global), topology- and workload-aware scheduling of AI workloads.
- a global scheduler is provided to exploit the heterogeneity of workloads (e.g., differing attributes between training jobs, inferencing jobs, etc.) and to provide dynamic, topology-aware scheduling of resources across the entire AI hardware capacity in the cloud.
- the disclosed scheduler is able to transparently preempt any running job, live migrate any running job, and/or elastically scale up/down and load balance the workers of the service to drive the highest utilization without impacting performance or incurring downtime. Additionally, the disclosed scheduler is configured to be aware of all the jobs across the entire IaaS (e.g., a global view of the workload(s) across the entire IaaS).
- the scheduler used by the disclosed service is configured to identify groups of GPUs/CPUs/hardware accelerators that are not being efficiently utilized and therefore migrate jobs on such groups to other GPUs/CPUs/hardware accelerators by transparently checkpointing and verifying processor device states for migration to occur.
- the scheduler is further configured to monitor and/or track workloads that are currently running and hardware capacity that is currently available anywhere around the world in the cloud of the disclosed service. Additionally, the scheduler is configured to decide if and/or when to preempt a job, migrate a job, scale up or scale down the job, or load-balance between different workers for a job.
- the disclosed service is configured to manage AI workloads in a priority-driven and/or tier-driven manner.
- the scheduler may consider the designated tier of a given job (or an inferencing model) or associated job submitter.
- Each tier may be defined with different technical requirements. For example, if a job is submitted with the highest tier level, indicating a best-capacity tier, the job is run with the least preemption, the equivalent of running on dedicated cloud resources. If a job is submitted at a middle tier, some preemption or migration may be experienced that "slows" the job somewhat but drives efficiencies and improves the overall utilization of the fixed pool of resources.
- DLT training and inferencing jobs may be scheduled based, at least partially, on their associated tier, which may be specific to the job, the customer, and/or the capacity kind.
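The tier-driven behavior described above can be sketched as a simple preemption policy. This is a minimal illustration, not the patent's implementation; the tier names, priority values, and GPU-count accounting are all assumptions made for the example.

```python
from dataclasses import dataclass

# Illustrative tiers; the description names only a "best-capacity" tier
# and a middle tier, so these labels are assumed.
TIER_PRIORITY = {"premium": 3, "standard": 2, "spot": 1}
TIER_PREEMPTIBLE = {"premium": False, "standard": True, "spot": True}

@dataclass
class Job:
    name: str
    tier: str
    gpus: int

def pick_victims(running, incoming):
    """Choose lower-tier jobs to preempt so `incoming` can run.

    Returns victims ordered lowest tier first, or [] if not enough
    preemptible capacity can be freed.
    """
    candidates = sorted(
        (j for j in running
         if TIER_PREEMPTIBLE[j.tier]
         and TIER_PRIORITY[j.tier] < TIER_PRIORITY[incoming.tier]),
        key=lambda j: TIER_PRIORITY[j.tier],
    )
    victims, freed = [], 0
    for job in candidates:
        if freed >= incoming.gpus:
            break
        victims.append(job)
        freed += job.gpus
    return victims if freed >= incoming.gpus else []
```

Under this sketch, a best-capacity job displaces spot-tier work first and standard-tier work only if needed, while a best-capacity job itself is never chosen as a victim.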
- the disclosed system is configured to provide reliable and performant AI infrastructure. Without reliable infrastructure, utilization will always be sub-optimal, because planned and unplanned failures result in lost GPU hours and productivity. For example, if a large job is running for months on hundreds of nodes and GPUs, eventually some of the GPUs will become unhealthy or need to be upgraded during the job's processing. This has an impact on the customer workload. By virtue of how AI workloads operate, any stall in the health of a GPU may stall the entire AI workload job and progress may be stopped. Worse still, if the job or model has not been checkpointed, precious processing may be lost. To overcome this, the disclosed system provides capabilities such as transparent preemption, dynamic load-balancing, defragmentation, and elasticity that all enable a highly reliable infrastructure.
- the disclosure deeply integrates the bare-metal computing, networking, and the driver stacks of 1P and 3P accelerators by providing at least the following technical contributions: (i) a bandwidth-optimal distributed barrier and rendezvous protocol implementation directly inside the backend network communication stack to implement a distributed agreement protocol among an ensemble of accelerator devices and worker processes, and (ii) transparent and consistent checkpointing and restoration of process and device state to enable transparent preemptive scheduling, failover, live migration, and dynamic elasticity, all without impacting the model convergence and without requiring any help from the user or frameworks.
- the disclosed service provides for AI jobs to be checkpointed so that their device state may be captured and then restored on other nodes, without impacting the correctness of the model or the model’s convergence — at the infrastructure layer.
- the disclosed service is configured to provide global distribution of inferencing endpoints for (a) predictable single digit millisecond latencies at 99th percentile (P99), anywhere around the world and (b) high availability in the face of regional disasters.
- the inferencing model may be deployed across different geographic regions and run in the closest region.
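Closest-region endpoint selection with failover, as described above, can be sketched as follows. The latency table and health set are hypothetical inputs; the disclosure does not specify this mechanism.

```python
def pick_endpoint(latency_ms, healthy):
    """Pick the healthy region with the lowest measured client latency.

    latency_ms: dict of region name -> measured latency in milliseconds.
    healthy: set of regions currently serving (survivors of any
    regional disaster).
    """
    candidates = {r: l for r, l in latency_ms.items() if r in healthy}
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=candidates.get)
```

The failover behavior falls out of the health filter: if the closest region is down, the request is routed to the next-closest healthy deployment of the inferencing model.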
- the disclosed service is configured to provide vertical integration for both 3P and 1P hardware.
- the example architecture illustrated in FIG. 1 below is designed for the future, with built-in extensibility to remain agile as new scenarios and technologies emerge.
- the disclosed design is flexible with respect to the following: providing first-class support for both 3P and 1P AI accelerators; providing disaggregated and aggregated topologies; providing non-uniform backend network configurations; providing an extensible, layered architecture; enabling extensible scheduling systems for customizability by tenants; enabling extensible heterogeneous accelerators, devices, and/or hardware; and providing a compiler toolchain that is agnostic of AI training and inferencing frameworks.
- the disclosure provides a unified abstraction on top of a wide range of both 3P and 1P AI accelerators, and can map a given training job or an inferencing endpoint across a mix of heterogeneous device types to drive the highest efficiency.
- the disclosed service is configured to support and drive a cloud computing environment’s disaggregation strategy and/or other similar strategies associated with other cloud platforms.
- Aggregated topologies include devices that are physically attached to the servers, such that one does not need to go through a backend network.
- Disaggregated topologies include a rack of compute nodes and a rack of hardware accelerators that may make use of a backend network. The disclosed service abstracts both of these topologies.
- the disclosed service is configured to support a variety of non-uniform backend network architectures envisioned by different first party and third-party hardware manufacturers.
- the disclosed service provides a layered architecture that supports extensibility at every level, including pluggable data planes (e.g., the orchestration layer extensibility supports plugging in alternate data planes or an orchestrator below its scheduler to support Kubernetes running in a customer’s private data center), pluggable scheduling subsystems (e.g., the scheduling layer extensibility supports plugging in alternate schedulers and custom policies below its control plane to support gradual migration to the disclosed service), and pluggable heterogeneous device types and accelerators (e.g., the disclosure is designed to enable a consistent model for provisioning and scaling accelerator devices with a pluggable device provider interface, including quantum-computing devices).
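The pluggable device-provider idea can be sketched as an abstract interface that heterogeneous accelerator backends implement. The interface name, methods, and mock backend below are assumptions for illustration, not the patent's actual API.

```python
from abc import ABC, abstractmethod

class AcceleratorProvider(ABC):
    """Hypothetical provider interface so heterogeneous 1P/3P devices
    (GPUs, FPGAs, even quantum devices) can be provisioned and scaled
    through one consistent model."""

    @abstractmethod
    def provision(self, count: int) -> list:
        """Allocate `count` devices and return their identifiers."""

    @abstractmethod
    def release(self, device_ids: list) -> None:
        """Return devices to the free pool."""

class MockGpuProvider(AcceleratorProvider):
    """In-memory stand-in for a real GPU backend."""

    def __init__(self):
        self._next = 0
        self._live = set()

    def provision(self, count):
        ids = [f"gpu-{self._next + i}" for i in range(count)]
        self._next += count
        self._live.update(ids)
        return ids

    def release(self, device_ids):
        self._live.difference_update(device_ids)
```

A scheduler written against `AcceleratorProvider` never needs to know which vendor's device library sits behind a given provider, which is the point of the pluggable layer.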
- the disclosed service is configured to provide a compiler toolchain that is agnostic of AI training and inferencing frameworks.
- the service does not rely on any help from the user or frameworks for providing its core capabilities. It is designed to be agnostic of AI training and inferencing frameworks and tools. It does not require the user to opt into any specific framework, compiler toolchain or library.
- the service integrates at the level of device drivers and the device-to-device communication channels for supporting various hardware specific capabilities.
- the disclosed service provides a highly scalable AI infrastructure.
- the service is designed to scale across hundreds of datacenters and tens of thousands of accelerators, training models with trillions of parameters.
- the service may be configured to cross geographical boundaries as well.
- the architecture is also capable of treating training jobs and inferencing services as equal when they originate from data centers as well as from on-premises sources.
- FIG. 1 is a block diagram illustrating a system 100 configured for providing infrastructure service for AI workloads according to an embodiment.
- the system 100 includes a control plane 102, a runtime plane 104, and an infrastructure plane 106.
- the system 100 is a distributed computing infrastructure system that includes hardware devices distributed across many different locations (e.g., a global or planet-scale distributed system).
- the system 100 is configured specifically to enable the execution of AI workloads, such that the hardware, firmware, and/or software of the system 100 is configured to enable efficient execution of tasks associated with AI workloads.
- the system 100 may include hardware, firmware, and/or software configured specifically to enable the execution of other types of workloads without departing from the description.
- the control plane 102 includes a manageability subsystem 108, pluggable data planes 110, and a global scheduling subsystem 112.
- the control plane 102 is configured to receive or accept AI workloads and associated data through a variety of extensible or pluggable data planes 110 that may be defined by the tenants of the system (e.g., plugging in an alternate data plane below the scheduler to support Kubernetes or another similar system running in a tenant’s private data center).
- Those AI workloads are scheduled for execution on the infrastructure of the system 100 (e.g., the infrastructure plane 106), as described herein.
- the manageability subsystem 108 includes hardware, firmware, and/or software configured to provide interactive processing of AI workload requests to tenants. Further, the manageability subsystem 108 is configured to provide all infrastructure resources of the system 100 in all regions of the system's operation. In some examples, the manageability subsystem 108 includes manageability replicas in various regions of the system 100 such that the infrastructure resources of the system 100 are multi-mastered by various replicas as an interface between tenants and the system 100. The manageability subsystem 108 may be decoupled from the global scheduling subsystem 112.
- the global scheduling subsystem 112 includes hardware, firmware, and/or software configured to schedule AI workloads/jobs for execution on the infrastructure resources of the system 100 as described herein.
- the global scheduling subsystem 112 includes hierarchical schedulers: global scheduler(s), regional schedulers, and coordinator services.
- the global scheduler is responsible for preparing schedules corresponding to the AI workloads (e.g., jobs, models, and/or pods) and handing them over to the regional schedulers.
- the regional scheduler is responsible for managing and reporting regional capacity with the global scheduler and then also executing the schedule received from the global scheduler.
- the coordinator service is responsible for translating the schedules into physical resource allocations across clusters of infrastructure resources within a region.
- the coordinator service may also constitute or otherwise be closely associated with the reliability subsystem 122 as described herein.
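The three-level hierarchy above (global scheduler, regional scheduler, coordinator service) might be sketched as follows. The class names, the capacity-based placement heuristic, and the slot counts are illustrative assumptions, not the disclosed algorithm.

```python
class Coordinator:
    """Translates a schedule into physical allocations across clusters."""

    def __init__(self, cluster_slots):
        self.cluster_slots = dict(cluster_slots)  # cluster name -> free slots

    def allocate(self, job, slots):
        for cluster, free in self.cluster_slots.items():
            if free >= slots:
                self.cluster_slots[cluster] = free - slots
                return (job, cluster)
        raise RuntimeError(f"no cluster fits {job}")

class RegionalScheduler:
    """Reports regional capacity upward and executes received schedules."""

    def __init__(self, region, coordinator):
        self.region, self.coordinator = region, coordinator

    def capacity(self):
        return sum(self.coordinator.cluster_slots.values())

    def execute(self, schedule):
        return [self.coordinator.allocate(job, slots) for job, slots in schedule]

class GlobalScheduler:
    """Prepares a schedule and hands it to a regional scheduler."""

    def __init__(self, regions):
        self.regions = regions  # region name -> RegionalScheduler

    def schedule(self, job, slots):
        # Illustrative heuristic: pick the region reporting the most
        # free capacity; the real placement policy is not specified here.
        region = max(self.regions.values(), key=lambda r: r.capacity())
        return region.execute([(job, slots)])
```

Note the direction of information flow matches the description: capacity reports flow up from coordinators through regional schedulers, while schedules flow down from the global scheduler.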
- the global scheduling subsystem 112 is described in greater detail below.
- the runtime plane 104 includes subsystems configured to enable the AI workloads to be distributed to and executed on the infrastructure plane 106 as described herein. Such subsystems may include a monitoring subsystem 114, a compilation subsystem 116, a communication subsystem 118, and/or a load balancing subsystem 120. Further, the runtime plane 104 includes a reliability subsystem 122 configured for securing the reliability of execution of AI workloads while enabling such workloads to be checkpointed and/or migrated throughout the infrastructure resources of the system 100. The runtime plane 104 further includes AI accelerator provider models 124 that are configured to enable the use of a variety of libraries and/or configurations for managing AI accelerators when executing AI workloads. The runtime plane 104 is described in greater detail below.
- the infrastructure plane 106 includes hardware, firmware, and/or software for executing the AI workloads based on the schedules provided by the control plane 102 and instructions received from the runtime plane 104.
- the infrastructure plane 106 includes hosting and activation subsystems 126, infrastructure resources 128, and devices/AI accelerators 130.
- the infrastructure plane 106 is described in greater detail below.
- FIG. 2 is a block diagram 200 illustrating a runtime plane 204 of the system 100 of FIG. 1 according to an embodiment.
- the runtime plane 204 is substantially the same as the runtime plane 104 described above with respect to FIG. 1.
- the runtime plane 204 includes a monitoring subsystem 214, a compilation subsystem 216, a communication subsystem 218, a load balancing subsystem 220, a reliability subsystem 222, and AI accelerator provider models 224.
- the reliability subsystem 222 includes routines for interacting with AI workloads to ensure their reliability.
- the routines include failover 232, suspend 234, resume 236, migrate 238, scale 240, checkpoint 242, and restore 244.
- the checkpoint 242 and restore 244 routines may be configured as the core routines and the other routines (failover 232, suspend 234, resume 236, migrate 238, and scale 240) may be configured to use checkpoint 242 and/or restore 244 routines to achieve the desired results.
- the checkpoint 242 routine is configured to save the state of an AI workload as it is executed, such that the saved state can be used to continue execution of the AI workload from the saved point in time.
- Checkpoint 242 may be used to perform the suspend 234 routine to halt the execution of an AI workload for a period of time and/or to perform the migrate 238 routine to save the state of the AI workload such that it can be moved to another set of infrastructure resources for continued execution.
- the restore 244 routine is configured to take a saved state of an AI workload as input and restore the execution of the AI workload on infrastructure resources starting at the point of the saved state.
- the restore 244 routine may be used to perform the resume 236 routine and/or to restore the execution of an AI workload that has been migrated to another set of infrastructure resources based on a migrate 238 routine.
- the failover 232 routine is configured to checkpoint the state of an AI workload based on detection of a failure of the current infrastructure resources and to restore the AI workload on a new set of infrastructure resources, such that the AI workload recovers from the detected failure.
- the scale 240 routine is configured to scale up and/or scale down the quantity, quality, and/or type of infrastructure resources being used to execute an AI workload. For instance, if additional infrastructure resources are available, an AI workload may be scaled up to make use of those additional infrastructure resources. Alternatively, if a new AI workload requires some infrastructure resources in use executing a current AI workload, the current AI workload may be scaled down to free up some resources for the new AI workload (e.g., the new AI workload may be associated with a higher priority or tier than the current AI workload).
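As described above, migrate (and suspend/resume) can be composed from the checkpoint and restore primitives. The sketch below stands in for real process and device state with a simple step counter; it is an illustration of the composition, not the disclosed implementation.

```python
class Workload:
    """Toy AI workload whose only state is a training-step counter."""

    def __init__(self, name):
        self.name, self.step, self.host = name, 0, None

    def run_steps(self, n):
        self.step += n

def checkpoint(workload):
    """Capture enough state to resume the workload elsewhere."""
    return {"name": workload.name, "step": workload.step}

def restore(state, host):
    """Rebuild a workload from a saved state on a new host."""
    w = Workload(state["name"])
    w.step, w.host = state["step"], host
    return w

def migrate(workload, new_host):
    """Move a workload without losing progress: checkpoint, then restore."""
    return restore(checkpoint(workload), new_host)
```

Failover follows the same pattern, with the checkpoint taken (or the latest one reused) when a failure is detected rather than on demand.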
- the reliability subsystem 222 further includes a rendezvous protocol 246 configured to synchronize or otherwise enforce synchronization on AI workloads upon which the above-described routines are to be applied. For instance, if an AI workload is going to be migrated, the rendezvous protocol 246 is configured to synchronize the operations of the system such that the resources involved in the migration are not altered during the migration process. Such a rendezvous protocol 246 may include use of locking or forming a barrier such that processes that are otherwise not associated with the migration do not affect the migration inadvertently.
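A rendezvous barrier of the kind described can be illustrated with threads: every worker involved must arrive before any proceeds, so no worker mutates resources mid-migration. `threading.Barrier` here is a local stand-in for the backend-network protocol implementation the disclosure describes.

```python
import threading

def run_with_barrier(n_workers, results):
    """Each worker records arrival, rendezvouses, then records resumption.

    Because the barrier releases only once all workers have arrived,
    every "arrived" entry precedes every "resumed" entry.
    """
    barrier = threading.Barrier(n_workers)

    def worker(i):
        results.append(("arrived", i))
        barrier.wait()  # rendezvous point: block until all workers arrive
        results.append(("resumed", i))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

In the migration case, the "resumed" phase is where the checkpoint or restore work would safely run, with stragglers held at the barrier until the ensemble agrees.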
- the AI accelerator provider models 224 are configured to enable the use of various software stacks, including 3P libraries 248 (e.g., libraries provided by tenants of the system 100) and/or 1P libraries 250 (e.g., libraries provided by the entity that manages the system 100).
- 3P libraries 248 may include a 3P-specific management library (ML) 252, 3P-specific multi-GPU communications library (MGCL) 254, and 3P-specific GPU library (GPUL) 256.
- 1P libraries 250 may include a management library 264, a communication library 266, and/or a compiler toolchain 268.
- the runtime plane 204 enables tenants to make use of a wide variety of software stacks and associated libraries, including their own software stacks, to execute AI workloads within the described system 100 based on its extensible, flexible configuration.
- FIG. 3 is a block diagram 300 illustrating an infrastructure plane 306 of the system 100 of FIG. 1 according to an embodiment.
- the infrastructure plane 306 is substantially the same as the infrastructure plane 106 of FIG. 1, as described above.
- the infrastructure plane 306 includes a hosting and activation subsystem 326, infrastructure resources 328, and devices and AI accelerators 330.
- the hosting and activation subsystem 326 includes host agents 370 and containers 372.
- the host agents 370 enable and organize the hosting of AI workloads on the infrastructure resources 328.
- the containers 372 (e.g., copy-on-write containers) keep different AI workloads (e.g., workloads from different tenants) separate and secure from each other, even when they are being executed on the same host.
- a host controlled by a host agent 370 may be a device that includes a set of infrastructure resources 328 that are configured to execute an AI workload or at least a portion thereof.
- some resources of a host may be used to execute an AI workload from one tenant, while other resources of the host may be used to execute an AI workload of another tenant at the same time.
- the containers 372 are configured such that the two separated AI workloads are prevented from interacting in any manner while they are being executed.
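A minimal sketch of this separation property (in Python; the `Container` and `Host` classes and their members are invented for illustration and do not reflect the actual container technology): two tenants' workloads share a host object, but each sees only its own container state.

```python
class Container:
    """Sketch of a secure container: each workload gets private state and
    holds no reference to any other container on the same host."""
    def __init__(self, tenant, workload):
        self.tenant = tenant
        self.workload = workload
        self._state = {}          # private to this container

    def write(self, key, value):
        self._state[key] = value

    def read(self, key):
        return self._state.get(key)

class Host:
    """One host whose resources are shared between tenants, with
    containers keeping the co-located workloads apart."""
    def __init__(self):
        self.containers = []

    def launch(self, tenant, workload):
        container = Container(tenant, workload)
        self.containers.append(container)
        return container

host = Host()
a = host.launch("tenant-a", "training-job")
b = host.launch("tenant-b", "inference-job")
a.write("secret", 42)
print(b.read("secret"))   # None: tenant-b cannot see tenant-a's state
```

Real isolation would of course rely on kernel or hypervisor mechanisms rather than object encapsulation; the sketch only captures the invariant that co-hosted workloads cannot interact.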
- the infrastructure resources 328 include a service fabric 374 interface, storage resources 376, networking resources 378, compute resources 380 which may include bare metal blades 382 (e.g., physical processing devices) and virtual machines 384, and other resources 386 (e.g., integration infrastructure resources).
- the infrastructure resources 328 are primarily provided for use by the entity that is offering services of the system 100 (e.g., IP resources), but in other examples, the infrastructure resources 328 may also include resources provided by other entities (e.g., 3P resources) such as resources owned and used by tenants of the system 100.
- Such integration may be enabled via the 3P libraries 248 and other configurations described above.
- the devices and AI accelerators 330 include GPUs 388, FPGA devices 390, other 3P devices 392, and other IP devices 394.
- the described processes may further be enabled by backend networks 396 and/or associated devices.
- the execution of AI workloads may uniquely benefit from the use of GPUs 388, FPGAs 390, and/or other specialized hardware.
- infrastructure resources 328, such as compute resources 380, may be linked to GPUs 388, for instance, such that a compute resource 380 provides instructions to the GPU 388 for how to execute steps of the AI workload.
- Such execution then takes advantage of specialized architecture of the GPU 388, such as the GPU 388 having many cores enabling parallel processing of data to a significant degree beyond the capabilities of the compute resources 380.
- the backend networks 396 are configured to support a variety of non-uniform backend network architectures that may be envisioned by a variety of entities that use the system, such as IP and 3P hardware manufacturers. Such backend networks 396 may be used to provide links between disaggregated topologies of compute nodes (e.g., compute resources 380) and hardware accelerators (e.g., GPUs 388).
- FIG. 4 is a flowchart illustrating a method 400 for managing AI workloads in a cloud infrastructure platform according to an embodiment.
- the cloud infrastructure platform of method 400 is a system such as system 100 of FIG.1.
- a set of distributed infrastructure resources (e.g., hosting and activation subsystems 126, infrastructure resources 128, and/or devices/AI accelerators 130 of the infrastructure plane 106) is integrated into the cloud infrastructure platform via native support interfaces of those resources.
- the native support interfaces may include interfaces and/or libraries of the providers of the resources, such as the 3P libraries 248 and IP libraries 250 of FIG. 2.
- a tenant of the cloud infrastructure platform may provide a subset of infrastructure resources for integration into the platform based on provided libraries, such that the tenant and/or other tenants of the platform may use those resources in execution of AI workloads.
- AI workloads are received from a plurality of tenants, wherein the received AI workloads include training workloads and inferencing workloads.
- the tenants provide AI workloads for execution on the platform via interfaces such as pluggable data planes 110 as described herein.
- resource subsets of the distributed infrastructure resources are assigned to the received AI workloads.
- the assignment of resource subsets to the AI workloads is performed by a global scheduling system 112 as described herein. Assigning the resources may include determining resource requirements of an AI workload and then identifying a subset of infrastructure resources that satisfy those requirements (e.g., an AI workload that requires the use of four GPUs in parallel may be assigned to a node of the system that has at least four GPUs).
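The requirement-matching step can be sketched as follows (a Python illustration; the node and workload dictionary shapes are assumptions made for this sketch, not the actual scheduler's data model):

```python
def assign_resources(workload, nodes):
    """Return the first node whose free resources satisfy the workload's
    requirements (e.g., a 4-GPU job needs a node with >= 4 free GPUs)."""
    for node in nodes:
        if all(node["free"].get(resource, 0) >= amount
               for resource, amount in workload["requires"].items()):
            return node["name"]
    return None  # no resource subset satisfies the requirements

nodes = [
    {"name": "node-1", "free": {"gpu": 2, "cpu": 16}},
    {"name": "node-2", "free": {"gpu": 8, "cpu": 64}},
]
job = {"name": "train-model", "requires": {"gpu": 4}}
print(assign_resources(job, nodes))  # node-2
```

A production scheduler would weigh many more constraints (locality, tiers, fragmentation); this only shows the requirements-to-capacity match described above.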
- the assignment of a subset of resources to an AI workload may include rearranging of other AI workloads with respect to the subset of resources. For instance, assigning a resource subset to an AI workload may include saving a state checkpoint of an AI workload that is currently being executed on a first resource subset, migrating that AI workload to a second resource subset, restoring the saved state checkpoint of the migrated AI workload on the second resource subset, and then assigning at least a portion of the first resource subset to another AI workload. In some examples, such processes may be performed using routines of a reliability subsystem 222 as described herein.
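The four-step rearrangement described above might look like the following in outline (a Python sketch; the function name, the placement map, and the checkpoint contents are illustrative assumptions):

```python
def rearrange(running, first_subset, second_subset, incoming_workload):
    """Sketch of the checkpoint -> migrate -> restore -> reassign flow that
    frees `first_subset` for `incoming_workload`."""
    # 1. Save a state checkpoint of the workload on the first subset.
    checkpoint = {"workload": running[first_subset], "state": "step-900"}
    # 2.-3. Migrate the workload and restore the checkpoint on the
    # second resource subset.
    running[second_subset] = checkpoint["workload"]
    del running[first_subset]
    # 4. Assign (a portion of) the freed first subset to the other workload.
    running[first_subset] = incoming_workload
    return running

placement = rearrange({"gpus-0-3": "low-tier-job"},
                      "gpus-0-3", "gpus-4-7", "high-tier-job")
print(placement)  # {'gpus-4-7': 'low-tier-job', 'gpus-0-3': 'high-tier-job'}
```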
- the received AI workloads are scheduled for execution on the assigned resource subsets.
- a global scheduling subsystem 112 generates a schedule for the AI workloads as described herein.
- scheduling the execution of the AI workloads may include scheduling training workloads and inferencing workloads on the same infrastructure resources, such that the two types of workloads are multiplexed on those resources (e.g., execution of a training workload is interspersed with execution of an inferencing workload on an infrastructure resource, such as a GPU).
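Such interleaving can be sketched as a simple time-slice loop (a Python illustration; the step/request structure is an assumption, and real multiplexing would operate at the accelerator level):

```python
def multiplex(training_steps, inference_requests):
    """Interleave a training workload with inferencing on one shared
    resource: inference is latency-sensitive, so queued requests are
    drained between training steps rather than after the whole job."""
    timeline = []
    for step in training_steps:
        timeline.append(("train", step))
        while inference_requests:            # drain pending inference
            timeline.append(("infer", inference_requests.pop(0)))
    return timeline

timeline = multiplex(["s1", "s2"], ["q1", "q2"])
print(timeline)
# [('train', 's1'), ('infer', 'q1'), ('infer', 'q2'), ('train', 's2')]
```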
- AI workloads are associated with priorities or tiers that affect how resources are assigned and how AI workloads are scheduled to be executed on those resources. For instance, lower-tier AI workloads may be more likely to be migrated to other resources to make space for higher-tier AI workloads, or higher-tier AI workloads may be scheduled for a greater share of resource usage time than lower-tier AI workloads, as described herein.
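The tier behavior might be sketched as follows (Python; the tier names and weights are invented for illustration and are not specified by the described system):

```python
def schedule_shares(workloads):
    """Give higher-tier workloads a larger share of resource time; the
    lowest-tier workload is also the first candidate for migration."""
    weights = {"premium": 4, "standard": 2, "best-effort": 1}
    total = sum(weights[w["tier"]] for w in workloads)
    shares = {w["name"]: weights[w["tier"]] / total for w in workloads}
    # Lowest weight -> most likely to be migrated to make space.
    evict_first = min(workloads, key=lambda w: weights[w["tier"]])["name"]
    return shares, evict_first

shares, evict_first = schedule_shares([
    {"name": "job-a", "tier": "premium"},
    {"name": "job-b", "tier": "best-effort"},
])
print(shares, evict_first)  # {'job-a': 0.8, 'job-b': 0.2} job-b
```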
- the AI workloads are executed based on the scheduling of the AI workloads on the assigned resource subsets.
- the AI workloads are hosted in a hosting and activation subsystem 126 and then infrastructure resources 128 and/or devices/AI accelerators 130 are used to execute the AI workloads.
- assigning and executing AI workloads on resource subsets includes isolating the AI workloads from each other in secure containers, whereby AI workloads associated with different tenants are securely executed alongside each other (e.g., on resources associated with the same server).
- executing AI workloads are monitored based on the performance of the cloud infrastructure platform and, based on that monitoring, the scheduling of the AI workloads is adjusted.
- the adjusting of the scheduling may include preempting an AI workload, migrating an AI workload, scaling up an AI workload, scaling down an AI workload, and/or load-balancing between two or more AI workloads.
- schedule adjustment may be performed by a global scheduling subsystem 112 or other component of the system 100.
- FIG. 5 is a block diagram illustrating a hierarchical scheduling subsystem 500 configured for scheduling AI workloads 512 according to an embodiment.
- the scheduling subsystem 500 is included in a system such as system 100 of FIG. 1.
- the scheduling subsystem 500 may be substantially the same as the global scheduling subsystem 112 of FIG. 1.
- the scheduling subsystem 500 includes a global scheduler 502 and multiple regional schedulers 504, coordinator services 506, and associated infrastructure resources 508.
- the global scheduler 502 is configured to use the global capacity data 510 (e.g., data indicating the current state of resource usage throughout the associated global infrastructure system, including resource usage in each region of the system) and AI workloads 512 to generate a global schedule 514 that schedules the AI workloads 512 to be executed on the infrastructure resources 508.
- the global schedule 514 includes regional schedules 520 for each region of the system, which are then provided to the regional schedulers 504 associated with those regions (e.g., the regional schedule 520 of a region is provided to the regional scheduler 504 associated with that particular region).
- the regional schedulers 504 monitor the current regional capacity data 516 of the infrastructure resources 508 associated with the respective regions and that regional capacity data 516 is provided to the global scheduler 502 periodically or based on a pattern or a triggering event. Further, the regional schedulers 504 receive the regional AI workloads 518 associated with their regions from the global scheduler 502 from the set of AI workloads 512. The regional schedulers 504 are also configured to instruct the coordinator services 506 to execute the associated regional schedules 520 using the data of the regional AI workloads 518 (each region includes a regional scheduler 504 and a coordinator service 506).
- the coordinator services 506 are configured to receive a regional schedule 522 and associated regional AI workloads 524 from an associated regional scheduler 504 and to use the reliability routines 526 (e.g., the routines of the reliability subsystem 222 of FIG. 2 as described above) to cause the regional AI workloads 524 to be executed using infrastructure resources 508 of the region based on the regional schedule 522.
- a coordinator service 506 may be configured to allocate a subset of the infrastructure resources 508 of the region to a regional AI workload 524 and cause that workload 524 to be executed on those allocated resources 508.
- a coordinator service 506 may be configured to checkpoint, restore, migrate, and/or perform other reliability routines 526 to arrange the use of the infrastructure resources 508 according to the regional schedule 522.
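The global-to-regional flow of FIG. 5 can be condensed into a sketch (Python; the data shapes, the capacity-greedy placement rule, and the function names are assumptions for illustration only):

```python
def global_schedule(workloads, regional_capacity):
    """Sketch of the global scheduler: split the workload set into
    per-region schedules based on reported regional capacity data."""
    regions = sorted(regional_capacity, key=regional_capacity.get,
                     reverse=True)
    schedules = {region: [] for region in regions}
    for workload in workloads:
        # Place each workload in the region with the most free capacity.
        region = max(regions, key=lambda r: regional_capacity[r])
        schedules[region].append(workload)
        regional_capacity[region] -= 1
    return schedules

def coordinator_execute(regional_workloads):
    """Sketch of a coordinator service: run each regional workload."""
    return [f"running:{w}" for w in regional_workloads]

sched = global_schedule(["w1", "w2", "w3"], {"us": 2, "eu": 1})
print(sched)                              # {'us': ['w1', 'w2'], 'eu': ['w3']}
print(coordinator_execute(sched["us"]))   # ['running:w1', 'running:w2']
```

The sketch omits the feedback loop in which regional capacity data 516 flows back to the global scheduler periodically or on triggering events.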
- FIG. 6 is a block diagram of an example computing device 600 for implementing aspects disclosed herein, and is designated generally as computing device 600.
- Computing device 600 is but one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the examples disclosed herein. Neither should computing device 600 be interpreted as having any dependency or requirement relating to any one or combination of components/modules illustrated.
- the examples disclosed herein may be described in the general context of computer code or machine-useable instructions, including computer-executable instructions such as program components, being executed by a computer or other machine, such as a personal data assistant or other handheld device.
- program components including routines, programs, objects, components, data structures, and the like, refer to code that performs particular tasks, or implement particular abstract data types.
- the disclosed examples may be practiced in a variety of system configurations, including personal computers, laptops, smart phones, mobile tablets, hand-held devices, consumer electronics, specialty computing devices, etc.
- the disclosed examples may also be practiced in distributed computing environments when tasks are performed by remote-processing devices that are linked through a communications network.
- Computing device 600 includes a bus 610 that directly or indirectly couples the following devices: computer-storage memory 612, one or more processors 614, one or more presentation components 616, input/output (I/O) ports 618, I/O components 620, a power supply 622, and a network component 624. While computing device 600 is depicted as a seemingly single device, multiple computing devices 600 may work together and share the depicted device resources. For example, memory 612 may be distributed across multiple devices, and processor(s) 614 may be housed in different devices.
- Bus 610 represents what may be one or more busses (such as an address bus, data bus, or a combination thereof). Although the various blocks of FIG. 6 are shown with lines for the sake of clarity, delineating various components may be accomplished with alternative representations.
- a presentation component such as a display device is an I/O component in some examples, and some examples of processors have their own memory. No distinction is made between such categories as “workstation,” “server,” “laptop,” “hand-held device,” etc., as all are contemplated within the scope of FIG. 6.
- Memory 612 may take the form of the computer-storage media references below and operatively provide storage of computer- readable instructions, data structures, program modules and other data for the computing device 600.
- memory 612 stores one or more of an operating system, a universal application platform, or other program modules and program data. Memory 612 is thus able to store and access data 612a and instructions 612b that are executable by processor 614 and configured to carry out the various operations disclosed herein.
- memory 612 includes computer-storage media in the form of volatile and/or nonvolatile memory, removable or non-removable memory, data disks in virtual environments, or a combination thereof.
- Memory 612 may include any quantity of memory associated with or accessible by the computing device 600.
- Memory 612 may be internal to the computing device 600 (as shown in FIG. 6), external to the computing device 600 (not shown), or both (not shown).
- Examples of memory 612 include, without limitation, random access memory (RAM); read only memory (ROM); electronically erasable programmable read only memory (EEPROM); flash memory or other memory technologies; CD-ROM, digital versatile disks (DVDs) or other optical or holographic media; magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices; memory wired into an analog computing device; or any other medium for encoding desired information and for access by the computing device 600. Additionally, or alternatively, the memory 612 may be distributed across multiple computing devices 600, for example, in a virtualized environment in which instruction processing is carried out on multiple devices 600.
- “computer storage media,” “computer- storage memory,” “memory,” and “memory devices” are synonymous terms for the computer-storage memory 612, and none of these terms include carrier waves or propagating signaling.
- Processor(s) 614 may include any quantity of processing units that read data from various entities, such as memory 612 or I/O components 620. Specifically, processor(s) 614 are programmed to execute computer-executable instructions for implementing aspects of the disclosure. The instructions may be performed by the processor, by multiple processors within the computing device 600, or by a processor external to the client computing device 600. In some examples, the processor(s) 614 are programmed to execute instructions such as those illustrated in the flow charts discussed below and depicted in the accompanying drawings. Moreover, in some examples, the processor(s) 614 represent an implementation of analog techniques to perform the operations described herein. For example, the operations are performed by an analog client computing device 600 and/or a digital client computing device 600.
- Presentation component(s) 616 present data indications to a user or other device.
- Exemplary presentation components include a display device, speaker, printing component, vibrating component, etc.
- I/O ports 618 allow computing device 600 to be logically coupled to other devices including I/O components 620, some of which may be built in.
- Example I/O components 620 include, for example but without limitation, a microphone, joystick, game pad, satellite dish, scanner, printer, wireless device, etc.
- the computing device 600 may operate in a networked environment via the network component 624 using logical connections to one or more remote computers.
- the network component 624 includes a network interface card and/or computer-executable instructions (e.g., a driver) for operating the network interface card. Communication between the computing device 600 and other devices may occur using any protocol or mechanism over any wired or wireless connection.
- network component 624 is operable to communicate data over public, private, or hybrid (public and private) networks using a transfer protocol, between devices wirelessly using short-range communication technologies (e.g., near-field communication (NFC), BLUETOOTH branded communications, or the like), or a combination thereof.
- Network component 624 communicates over wireless communication link 626 and/or a wired communication link 626a to a cloud resource 628 across network 630.
- Various different examples of communication links 626 and 626a include a wireless connection, a wired connection, and/or a dedicated link, and in some examples, at least a portion is routed through the internet.
- examples of the disclosure are capable of implementation with numerous other general- purpose or special-purpose computing system environments, configurations, or devices.
- Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with aspects of the disclosure include, but are not limited to, smart phones, mobile tablets, mobile computing devices, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, gaming consoles, microprocessor- based systems, set top boxes, programmable consumer electronics, mobile telephones, mobile computing and/or communication devices in wearable or accessory form factors (e.g., watches, glasses, headsets, or earphones), network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, virtual reality (VR) devices, augmented reality (AR) devices, mixed reality (MR) devices, holographic device, and the like.
- Such systems or devices may accept input from the user in any way, including from input devices such as a keyboard or pointing device, via gesture input, proximity input (such as by hovering), and/or via voice input.
- Examples of the disclosure may be described in the general context of computer- executable instructions, such as program modules, executed by one or more computers or other devices in software, firmware, hardware, or a combination thereof.
- the computer- executable instructions may be organized into one or more computer-executable components or modules.
- program modules include, but are not limited to, routines, programs, objects, components, and data structures that perform particular tasks or implement particular abstract data types.
- aspects of the disclosure may be implemented with any number and organization of such components or modules. For example, aspects of the disclosure are not limited to the specific computer-executable instructions or the specific components or modules illustrated in the figures and described herein. Other examples of the disclosure may include different computer-executable instructions or components having more or less functionality than illustrated and described herein.
- aspects of the disclosure transform the general-purpose computer into a special-purpose computing device when configured to execute the instructions described herein.
- An example system for managing AI workloads in a cloud infrastructure platform comprises: at least one processor of the cloud infrastructure platform; and at least one memory of the cloud infrastructure platform comprising computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the at least one processor to: integrate a set of distributed infrastructure resources via native support interfaces; receive AI workloads from a plurality of tenants, wherein the AI workloads include training workloads and inferencing workloads; assign resource subsets of the set of distributed infrastructure resources to the received AI workloads; schedule the received AI workloads for execution on the assigned resource subsets; and execute the AI workloads based on the scheduling of the AI workloads on the assigned resource subsets.
- An example computerized method for managing AI workloads in a cloud infrastructure platform comprises: integrating, by at least one processor of the cloud infrastructure platform, a set of distributed infrastructure resources via native support interfaces; receiving, by the at least one processor, AI workloads from a plurality of tenants, wherein the AI workloads include training workloads and inferencing workloads; assigning, by the at least one processor, resource subsets of the set of distributed infrastructure resources to the received AI workloads; scheduling, by the at least one processor, the received AI workloads for execution on the assigned resource subsets; and executing, by the at least one processor, the AI workloads based on the scheduling of the AI workloads on the assigned resource subsets.
- One or more computer storage media have computer-executable instructions for managing AI workloads in a cloud infrastructure platform that, upon execution by a processor, cause the processor to at least: integrate a set of distributed infrastructure resources via native support interfaces; receive AI workloads from a plurality of tenants, wherein the AI workloads include training workloads and inferencing workloads; assign resource subsets of the set of distributed infrastructure resources to the received AI workloads; schedule the received AI workloads for execution on the assigned resource subsets; and execute the AI workloads based on the scheduling of the AI workloads on the assigned resource subsets.
- examples include any combination of the following:
- assigning the resource subsets to the received AI workloads includes isolating the AI workloads from each other in secure containers, whereby AI workloads associated with different tenants are securely executed alongside each other.
- assigning resource subsets of the set of distributed infrastructure resources to the received AI workloads further includes: saving a state checkpoint of a first AI workload that is being executed on a first resource subset; migrating the first AI workload to a second resource subset; restoring the saved state checkpoint of the first AI workload on the second resource subset; and assigning at least a portion of the first resource subset to a second AI workload.
- scheduling the received AI workloads for execution on the assigned resource subsets includes multiplexing execution of at least two AI workloads on at least one resource of an assigned resource subset.
- the at least two AI workloads include a training workload and an inferencing workload; and wherein the multiplexing of execution of the training workload and the inferencing workload on the at least one resource is based on differing resource use between the training workload and the inferencing workload.
- further comprising: monitoring, by the at least one processor, the executing of the AI workloads based on performance of the cloud infrastructure platform; and based on the monitoring, adjusting, by the at least one processor, the scheduling of the AI workloads, whereby performance of the cloud infrastructure platform is improved, and wherein the adjusting includes at least one of the following: preempting an AI workload, migrating an AI workload, scaling up an AI workload, scaling down an AI workload, and load-balancing between at least two AI workloads.
- each AI workload of the received AI workloads is associated with a priority tier; and wherein assigning resource subsets to the received AI workloads and scheduling the received AI workloads for execution on the assigned resource subsets are based on the associated priority tiers of the AI workloads.
- the embodiments illustrated and described herein as well as embodiments not specifically described herein but within the scope of aspects of the claims constitute an exemplary means for integrating, by at least one processor of the cloud infrastructure platform, a set of distributed infrastructure resources via native support interfaces; exemplary means for receiving, by the at least one processor, AI workloads from a plurality of tenants, wherein the AI workloads include training workloads and inferencing workloads; exemplary means for assigning, by the at least one processor, resource subsets of the set of distributed infrastructure resources to the received AI workloads; exemplary means for scheduling, by the at least one processor, the received AI workloads for execution on the assigned resource subsets; and exemplary means for executing, by the at least one processor, the AI workloads based on the scheduling of the AI workloads on the assigned resource subsets.
- Computer readable media comprise computer storage media and communication media.
- Computer storage media include volatile and nonvolatile, removable and non-removable memory implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules, or the like.
- Computer storage media are tangible and mutually exclusive to communication media.
- Computer storage media are implemented in hardware and exclude carrier waves and propagated signals. Computer storage media for purposes of this disclosure are not signals per se.
- Exemplary computer storage media include hard disks, flash drives, solid-state memory, phase change random-access memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disk read-only memory (CD-ROM), digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that may be used to store information for access by a computing device.
- communication media typically embody computer readable instructions, data structures, program modules, or the like in a modulated data signal such as a carrier wave or other transport mechanism and include any information delivery media.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP22712724.8A EP4315057A1 (en) | 2021-03-30 | 2022-03-08 | Planet-scale, fully managed artificial intelligence infrastructure service |
CN202280022711.1A CN117015763A (en) | 2021-03-30 | 2022-03-08 | Planetary-scale fully managed artificial intelligence infrastructure service |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
IN202141014649 | 2021-03-30 | ||
IN202141014649 | 2021-03-30 | ||
US17/361,208 US20220318674A1 (en) | 2021-03-30 | 2021-06-28 | Planet-scale, fully managed artificial intelligence infrastructure service |
US17/361,208 | 2021-06-28 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022211980A1 true WO2022211980A1 (en) | 2022-10-06 |
Family
ID=80937234
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2022/019213 WO2022211980A1 (en) | 2021-03-30 | 2022-03-08 | Planet-scale, fully managed artificial intelligence infrastructure service |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP4315057A1 (en) |
WO (1) | WO2022211980A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220360643A1 (en) * | 2018-11-30 | 2022-11-10 | Vmware, Inc. | Distributed inline proxy |
- 2022-03-08 WO PCT/US2022/019213 patent/WO2022211980A1/en active Application Filing
- 2022-03-08 EP EP22712724.8A patent/EP4315057A1/en active Pending
Non-Patent Citations (1)
Title |
---|
CHAUDHARY, SHUBHAM, ET AL.: "Balancing efficiency and fairness in heterogeneous GPU clusters for deep learning", PROCEEDINGS OF THE 12TH IEEE/ACM INTERNATIONAL CONFERENCE ON UTILITY AND CLOUD COMPUTING, ACM, NEW YORK, NY, USA, 15 April 2020 (2020-04-15), pages 1-16, XP058553032, ISBN: 978-1-4503-6894-0, DOI: 10.1145/3342195.3387555 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220360643A1 (en) * | 2018-11-30 | 2022-11-10 | Vmware, Inc. | Distributed inline proxy |
US11882196B2 (en) * | 2018-11-30 | 2024-01-23 | VMware LLC | Distributed inline proxy |
Also Published As
Publication number | Publication date |
---|---|
EP4315057A1 (en) | 2024-02-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10467725B2 (en) | Managing access to a resource pool of graphics processing units under fine grain control | |
US20210089344A1 (en) | Methods and apparatus to deploy a hybrid workload domain | |
US10496503B2 (en) | Healing cloud services during upgrades | |
EP3270289A2 (en) | Container-based multi-tenant computing infrastructure | |
US10387179B1 (en) | Environment aware scheduling | |
US9661071B2 (en) | Apparatus, systems and methods for deployment and management of distributed computing systems and applications | |
US20190318240A1 (en) | Training machine learning models in distributed computing systems | |
JP5497201B2 (en) | Method for allocating resources, computer program for allocating resources, and system for allocating resources | |
US10176004B2 (en) | Workload-aware load balancing to minimize scheduled downtime during maintenance of host or hypervisor of a virtualized computing system | |
US10109030B1 (en) | Queue-based GPU virtualization and management system | |
US20190250946A1 (en) | Migrating a software container taking into account resource constraints | |
US10628199B2 (en) | Restoring and powering-off workloads during workflow execution based on policy triggers | |
US20220318674A1 (en) | Planet-scale, fully managed artificial intelligence infrastructure service | |
JP2013518330A5 (en) | ||
US11740921B2 (en) | Coordinated container scheduling for improved resource allocation in virtual computing environment | |
Gogouvitis et al. | Seamless computing in industrial systems using container orchestration | |
CN112099917B (en) | Regulation and control system containerized application operation management method, system, equipment and medium | |
KR20190028210A (en) | Cloud service method and system for deployment of artificial intelligence application using container | |
EP4315057A1 (en) | Planet-scale, fully managed artificial intelligence infrastructure service | |
US20220318052A1 (en) | Scheduler for planet-scale computing system | |
WO2022078060A1 (en) | Tag-driven scheduling of computing resources for function execution | |
CN117015763A (en) | Planetary-scale fully managed artificial intelligence infrastructure service | |
US11017417B1 (en) | Using incentives to manage computing resources | |
WO2022211981A1 (en) | Scheduler for planet-scale computing system | |
CN117099083A (en) | Scheduler for a planetary level computing system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22712724 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 202280022711.1 Country of ref document: CN |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2022712724 Country of ref document: EP |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2022712724 Country of ref document: EP Effective date: 20231030 |