CN113296921A - Cloud resource scheduling method, node, system and storage medium - Google Patents

Cloud resource scheduling method, node, system and storage medium

Info

Publication number
CN113296921A
Authority
CN
China
Prior art keywords
task
resource
gpu
gpu cards
offline
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010264008.4A
Other languages
Chinese (zh)
Other versions
CN113296921B (en)
Inventor
龚志刚
林立翔
游亮
龙欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN202010264008.4A priority Critical patent/CN113296921B/en
Publication of CN113296921A publication Critical patent/CN113296921A/en
Application granted granted Critical
Publication of CN113296921B publication Critical patent/CN113296921B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The embodiment of the application provides a cloud resource scheduling method, a node, a system and a storage medium. Aiming at the difference between online tasks and offline tasks in cloud computing resource requirements, a resource management mode is provided for dynamically adjusting the amount of cloud computing resources each occupies. According to the elastic demand of the online task for cloud computing resources, the amount of cloud computing resources occupied by the online task can be increased during the resource demand peak period while the amount occupied by the offline task is reduced; during the resource demand valley period, the amount occupied by the online task can be reduced while the amount occupied by the offline task is increased, so that cloud computing resources are reasonably utilized and resource waste is reduced.

Description

Cloud resource scheduling method, node, system and storage medium
Technical Field
The present application relates to the field of cloud computing technologies, and in particular, to a cloud resource scheduling method, node, system, and storage medium.
Background
In the fields of artificial intelligence and deep learning, two main parts are involved: training and inference. "Training" is the process of continuously training and debugging a model based on training samples until a model meeting the application requirements is obtained. "Inference" is the process of applying the trained model online and providing corresponding services for users.
Because model training and online inference generally require strong computing resources, a demand party (such as an enterprise) typically deploys its model training tasks and online inference tasks on high-performance cloud computing resources provided by a cloud service vendor, thereby reducing resource cost.
In some scenarios, such as live streaming, online inference tasks (for example, the model responsible for special-effect processing) are highly time-sensitive as live traffic fluctuates, and therefore place high elasticity demands on cloud computing resources. To meet the resource demand during traffic peaks, sufficient cloud computing resources are typically purchased in advance. During off-peak periods, however, these resources sit idle and are wasted.
Disclosure of Invention
Aspects of the present application provide a cloud resource scheduling method, node, system, and storage medium, which are used to dynamically adjust the amount of cloud resources occupied by offline tasks and online tasks, improve resource utilization, and reduce resource waste.
An embodiment of the present application provides a cloud resource scheduling system, including: a resource scheduling node and multiple GPU cards already allocated to a first customer, wherein the GPU cards can be shared by an online task and an offline task of the first customer. The resource scheduling node is used for receiving a resource adjustment request submitted by the first customer, and dynamically adjusting, according to the resource adjustment request, the number of GPU cards respectively occupied by the online task and the offline task among the multiple GPU cards.
An embodiment of the present application further provides a cloud resource scheduling system, including: a resource scheduling node and cloud computing resources already allocated to a first customer, the cloud computing resources being shareable by an online task and an offline task of the first customer. The resource scheduling node is used for receiving a resource adjustment request submitted by the first customer, and dynamically adjusting, according to the resource adjustment request, the amounts of resources respectively occupied by the online task and the offline task within the first customer's cloud computing resources.
The embodiment of the application also provides a cloud resource scheduling method, suitable for the resource scheduling node, the method comprising: receiving a resource adjustment request submitted by a first client, wherein the online task and the offline task of the first client share the multiple GPU cards allocated to it; and dynamically adjusting the number of GPU cards respectively occupied by the online task and the offline task among the multiple GPU cards according to the resource adjustment request.
The embodiment of the application further provides a cloud resource scheduling method, suitable for a resource management and control node, for scheduling the multiple GPU cards allocated to a first client, wherein the multiple GPU cards are shared by the online tasks and the offline tasks of the first client. The method includes: receiving a resource release notification sent by a resource scheduling node, the notification including the number K1 of GPU cards that the offline task can release; selecting K1 GPU cards from the N GPU cards currently occupied by the offline task according to the resource release notification; and controlling the K1 GPU cards to finish their offline tasks so as to release them; wherein K1 and N are natural numbers and 1 ≤ K1 ≤ N.
The embodiment of the application also provides a cloud resource scheduling method, suitable for the resource scheduling node, the method comprising: receiving a resource adjustment request submitted by a first client, wherein the online task and the offline task of the first client share the cloud computing resources allocated to the first client; and dynamically adjusting, according to the resource adjustment request, the amounts of resources respectively occupied by the online task and the offline task within the cloud computing resources.
The embodiment of the application further provides a cloud resource scheduling method, suitable for a resource management and control node, for scheduling the cloud computing resources allocated to a first customer, wherein the online tasks and the offline tasks of the first customer share the allocated cloud computing resources. The method includes: receiving a resource release notification sent by a resource scheduling node, the notification including the amount of cloud computing resources that the offline task can release; selecting releasable cloud computing resources from those currently occupied by the offline task according to that amount; and controlling the selected cloud computing resources to finish their offline tasks so as to release them.
The embodiment of the application also provides a resource scheduling node for dynamically adjusting the resources used by the online task and the offline task. The node comprises a memory and a processor; the memory stores a computer program, and the processor executes the computer program to: receive a resource adjustment request submitted by a first client, wherein the online task and the offline task of the first client share multiple allocated GPU cards; and dynamically adjust the number of GPU cards respectively occupied by the online task and the offline task among the multiple GPU cards according to the resource adjustment request.
The embodiment of the application further provides a resource management and control node for scheduling the multiple GPU cards allocated to the first client, wherein the multiple GPU cards are shared by the online task and the offline task of the first client. The node comprises a memory and a processor; the memory stores a computer program, and the processor executes the computer program to: receive a resource release notification sent by a resource scheduling node, the notification including the number K1 of GPU cards that the offline task can release; select K1 GPU cards from the N GPU cards currently occupied by the offline task according to the resource release notification; and control the K1 GPU cards to finish their offline tasks so as to release them; wherein K1 and N are natural numbers and 1 ≤ K1 ≤ N.
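The release flow handled by the resource management and control node above (on a notification carrying K1, select K1 of the N GPU cards the offline task occupies, let their offline work finish, and release them) can be sketched as follows. This is an illustrative sketch, not the patented implementation; the class and method names are hypothetical.

```python
# Hypothetical sketch of the resource management node's release flow.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ResourceControlNode:
    # GPU card ids currently occupied by the first customer's offline task
    offline_cards: List[int] = field(default_factory=list)

    def handle_release_notification(self, k1: int) -> List[int]:
        """Release k1 GPU cards, where 1 <= k1 <= N (N = cards currently held)."""
        n = len(self.offline_cards)
        if not 1 <= k1 <= n:
            raise ValueError(f"k1 must satisfy 1 <= k1 <= {n}")
        # Select K1 cards (here simply the last K1), finish their offline
        # work, and remove them from the offline task's holding.
        released, self.offline_cards = self.offline_cards[-k1:], self.offline_cards[:-k1]
        for card in released:
            self._finish_offline_task(card)  # drain/checkpoint the job on this card
        return released

    def _finish_offline_task(self, card_id: int) -> None:
        pass  # placeholder: stop the offline job running on card_id
```

For example, a node holding 40 cards that receives a notification with K1 = 20 releases 20 cards and keeps 20 for the offline task.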
Embodiments of the present application also provide a computer-readable storage medium storing a computer program, which, when executed by a processor, causes the processor to implement the steps in any one of the methods provided by the embodiments of the present application.
In the embodiment of the application, aiming at the difference between online tasks and offline tasks in cloud computing resource requirements, a resource management mode is provided for dynamically adjusting the amount of cloud computing resources occupied by the online task and the offline task. According to the elastic demand of the online task for cloud computing resources, the amount of cloud computing resources occupied by the online task can be increased during the resource demand peak period while the amount occupied by the offline task is reduced; during the resource demand valley period, the amount occupied by the online task can be reduced while the amount occupied by the offline task is increased, so that cloud computing resources are reasonably utilized and resource waste is reduced.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1a is a schematic structural diagram of a cloud resource scheduling system according to an exemplary embodiment of the present application;
fig. 1b is a schematic structural diagram of another cloud resource scheduling system provided in an exemplary embodiment of the present application;
FIG. 1c is a schematic flow chart of a model training process provided in an exemplary embodiment of the present application;
fig. 2 is a flowchart of a cloud resource scheduling method, which is implemented by taking a GPU card as an example, according to an exemplary embodiment of the present application;
fig. 3a is a flowchart of another cloud resource scheduling method, which is implemented by taking a GPU card as an example, according to an exemplary embodiment of the present application;
fig. 3b is a schematic flowchart of a cloud resource scheduling method according to an exemplary embodiment of the present disclosure;
fig. 3c is a schematic flowchart of another cloud resource scheduling method according to an exemplary embodiment of the present application;
fig. 4 is a schematic structural diagram of a resource scheduling node according to an exemplary embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Aiming at the resource waste problem in the prior art when training tasks and inference tasks are executed on cloud computing resources, the embodiment of the application exploits the fact that online tasks are highly time-sensitive and place high elasticity demands on cloud computing resources, and adopts a mode of dynamically adjusting the amounts of cloud computing resources occupied by the offline task and the online task. In this way, the online task can, as far as possible, be guaranteed sufficient resources during its resource demand peak period, while the cloud computing resources occupied by the online task can be released to the offline task during the online task's resource demand valley period, improving the utilization rate of cloud computing resources and reducing resource waste.
Based on the above, in some embodiments of the present application, a cloud resource scheduling system is provided, where the cloud resource scheduling system includes: a resource scheduling node and cloud computing resources already allocated to the first customer; wherein the cloud computing resources allocated to the first customer may be shared by the first customer's online and offline tasks.
The first client generally refers to a user of the cloud resource scheduling system and of the cloud computing resources it provides. It may be, for example, an individual user, various application systems (personal, enterprise, or government applications), a functional plug-in, a functional module, or a chip; it may also be a device such as a terminal device, a server, a set of computer application devices, an enterprise cluster server, a distributed server, or a cloud server. There may be one or more first clients, which is not limited herein.
In an embodiment of the present application, the cloud resource scheduling system may provide cloud computing resources for each first customer. Optionally, in addition to the cloud computing resources provided to each first customer, the resource pool of the cloud resource scheduling system may further include cloud computing resources not allocated to any customer. A "cloud computing resource" in this embodiment may be any resource form that has computing capability and can be provided as a cloud resource, for example a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), an Elastic Compute Service (ECS) instance, or an Elastic GPU Service (EGS) instance. An ECS instance is a service instance that uses CPUs as its computing resource; an EGS instance is a service instance that uses GPU cards as its computing resource.
In this embodiment, besides providing the first client with cloud computing resources, the resource scheduling system may also provide the first client with a scheduling management service for those resources, which is mainly implemented by the resource scheduling node in the resource scheduling system. The first client is communicatively connected to the resource scheduling node; the connection may be wired or wireless. Optionally, the first client may connect to the resource scheduling node through a mobile network, whose network format may be any of 2G (GSM), 2.5G (GPRS), 3G (WCDMA, TD-SCDMA, CDMA2000, UMTS), 4G (LTE), 4G+ (LTE+), WiMax, 5G, or other formats that may appear in the future. Alternatively, the first client may connect to the resource scheduling node through Bluetooth, WiFi, infrared, or the like.
The first client may submit a resource adjustment request to the resource scheduling node over this communication connection, requesting the resource scheduling node to adjust resources for it. Of course, besides requesting resource adjustment, the first client may also send a resource application request to the resource scheduling node to request allocation of resources, or send a resource release/recovery request to request the release or recovery of the resources it occupies. The way the first client requests resource adjustment differs according to the types of tasks it deploys on the cloud computing resources. In this embodiment, the first client has an online task and an offline task, and the two may share the cloud computing resources allocated to the first client, that is, both run on those resources. The first client may submit a resource adjustment request to the resource scheduling node to request adjustment of the amounts of cloud computing resources occupied by the online task and the offline task, respectively.
An online task is a task that needs to run online and has a high real-time requirement; correspondingly, an offline task is a task that can run offline and has a low real-time requirement. The embodiment of the application does not limit what the online and offline tasks are. Taking the field of artificial intelligence or deep learning as an example, the offline task may be a model training task and the online task a model inference task. The model training task is the process in which an algorithm engineer of the first client develops an algorithm model according to application requirements and continuously trains and debugs it in a training cluster until an algorithm model meeting the application requirements is obtained; it also includes the subsequent process of updating the model according to application requirements, where model updating likewise requires continuous training and debugging in the training cluster. The model inference task is the process of running the trained algorithm model online and using it to provide corresponding services for target users.
The algorithm model, and the functions it realizes, differ according to the application scenario. For example, the algorithm model may be an image recognition model applied to various social platforms, a speech recognition model enabling speech recognition in a smart speaker, a voice assistant model serving as a digital voice assistant in a smartphone, a face recognition model applied to various face recognition systems, a special-effect model responsible for special effects in various video processing systems, and so on.
In practical applications, the online task (e.g., a model inference task) is more time-sensitive than the offline task (e.g., a model training task). For example, for a live video application, only 4 to 6 hours a day constitute the traffic peak period; at other times, user traffic may be only half of the peak level or even lower. This traffic fluctuation gives the online task a high elasticity demand for cloud computing resources. If the online task does not obtain sufficient cloud computing resources during the traffic peak period, service quality degrades significantly, user experience drops, and users may even be lost. However, if a large amount of cloud computing resources is applied for (e.g., purchased or leased) to cover the online task's peak period, those resources sit idle during the traffic valley period, wasting resources.
In view of this dilemma, in this embodiment, considering the difference between the online task and the offline task in timeliness requirements, the first client is allowed to submit resource adjustment requests to the resource scheduling node according to the difference in their cloud computing resource demands, and the resource scheduling node dynamically adjusts the amounts of cloud computing resources they occupy. The resource scheduling node receives the resource adjustment request submitted by the first client and dynamically adjusts, according to the request, the amounts of resources respectively occupied by the online task and the offline task within the first client's cloud computing resources.
For example, through resource adjustment requests, during the online task's resource demand peak period (i.e., the traffic peak period) the amount of cloud computing resources occupied by the online task can be increased and the amount occupied by the offline task reduced; during the online task's resource demand valley period (i.e., the traffic valley period), the amount occupied by the online task can be reduced and the amount occupied by the offline task increased, so that cloud computing resources are reasonably utilized and the resource utilization rate improves. Moreover, by dynamically adjusting the amounts of cloud computing resources occupied by the online and offline tasks, internal optimization of the cloud computing resources already allocated to the first client is performed preferentially: the first client only needs to apply for an appropriate amount of cloud computing resources rather than a large surplus, which reduces resource waste.
The resource adjustment request sent by the first client to the resource scheduling node differs according to the actual situation. For example, when the online task enters or is about to enter its resource demand peak period, the first client may submit a resource expansion request to the resource scheduling node; the resource expansion request is one kind of resource adjustment request. When the resource adjustment request is a resource expansion request, the resource scheduling node controls the offline task to release at least part of the cloud computing resources it currently occupies, and runs the online task on the released resources, thereby expanding resources for the online task. For another example, when the online task enters or is about to enter its resource demand valley period, the first client may submit a resource reduction request to the resource scheduling node; the resource reduction request is another kind of resource adjustment request. When the resource adjustment request is a resource reduction request, the resource scheduling node controls the online task to release at least part of the cloud computing resources it currently occupies, and expands resources for the offline task based on the resources released by the online task.
Further, if the online task's demand for cloud computing resources is still not met after the resources released by the offline task have been added to it, new cloud computing resources can be allocated to the online task of the first client from the resources in the resource pool that are not yet allocated to any client, so as to meet the online task's demand.
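The fallback logic above (reclaim from the offline task first, down to its minimum, then draw on the unallocated pool) can be sketched as follows. The function and its parameters are illustrative assumptions, not part of the claimed method.

```python
def expand_online(needed: int, offline_cards: int, offline_min: int,
                  pool_cards: int) -> tuple:
    """Return (cards reclaimed from the offline task, cards taken from the
    free pool) to grow the online task by `needed` GPU cards."""
    # Reclaim from the offline task first, but never below its minimum.
    from_offline = min(needed, max(0, offline_cards - offline_min))
    # Cover any remaining shortfall from resources not allocated to any client.
    from_pool = min(needed - from_offline, pool_cards)
    return from_offline, from_pool
```

With the offline task holding 40 cards and a minimum of 20, a request for 20 more cards is met entirely by the offline task, while a request for 40 draws 20 from the offline task and 20 from the pool.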
According to differences in the detailed architecture of the cloud computing resource scheduling system and in the implementation form of the cloud computing resources, the way in which the resource scheduling node dynamically adjusts the amounts of cloud computing resources occupied by the online and offline tasks according to the resource adjustment request also differs. The cloud resource scheduling system and its resource scheduling process provided in the embodiments of the present application are described below with reference to the architectures shown in fig. 1a and fig. 1b, taking the GPU card as an example of the cloud computing resource.
Fig. 1a is a schematic structural diagram of a cloud computing resource scheduling system according to an exemplary embodiment of the present application. As shown in fig. 1a, the cloud resource scheduling system 100 includes a resource scheduling node 10 and multiple GPU cards that have been allocated to a first customer 20, where the multiple GPU cards can be shared by the online task and the offline task of the first customer 20. The first customer 20 is communicatively connected to the resource scheduling node 10. For detailed descriptions of the first customer 20, the online task, the offline task, and the communication connection between the first customer 20 and the resource scheduling node 10, reference may be made to the foregoing embodiments, which are not repeated here. Optionally, the cloud resource scheduling system 100 performs resource management and allocation at the granularity of GPU cards; for the first customer, the resources applied for (purchased or leased) from the system are one or more GPU cards. Alternatively, the system performs resource management and allocation at the granularity of EGS instances; the resources applied for are then one or more EGS instances, each including at least one GPU card. In either case, the granularity at which the resource scheduling node 10 schedules or adjusts resources is the GPU card.
In this embodiment, the first client 20 may submit a resource adjustment request to the resource scheduling node 10 to request that it dynamically adjust the numbers of GPU cards respectively occupied by the online task and the offline task. The resource scheduling node 10 receives the resource adjustment request and, according to it, dynamically adjusts the numbers of GPU cards occupied by the online and offline tasks among the multiple GPU cards allocated to the first client 20.
The resource adjustment request sent by the first client 20 to the resource scheduling node 10 varies according to the resource requirement of the online task. For example, assume the online task is the task in a live broadcast application responsible for beautification or special effects. The traffic of the live broadcast application differs across time periods of each day, which means the workload of the beautification or special effect task, and hence its demand for GPU resources, also differs across time periods. In one scenario, the user traffic of the live broadcast application between 0:00 and 18:00 each day is very small, so the amount of video the beautification or special effect task needs to process is very small and does not require many GPU resources. During the peak period from 18:00 to 22:00, live traffic is very large, and the beautification or special effect task needs a large amount of GPU resources to support the videos requiring processing, thereby improving the live broadcast experience. After the peak period, i.e., between 22:00 and 0:00 of the next day, user traffic gradually decreases; to keep service stable while avoiding resource waste as much as possible, the GPU resources occupied by the beautification or special effect task can be appropriately reduced compared with the traffic peak period.
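The daily schedule just described can be summarized as a simple mapping from hour of day to the kind of request the first client might send. This is an illustrative sketch only; the time windows come from the example above and are not fixed by the method.

```python
def request_for_hour(hour: int) -> str:
    """Map an hour of the day to a resource adjustment request, following
    the illustrative live-broadcast schedule above (hypothetical mapping)."""
    if 18 <= hour < 22:
        return "expand"  # traffic peak: grow the online task's GPU share
    return "reduce"      # off-peak or tapering: shrink it back
```

For instance, at 19:00 the client would request expansion, while at 3:00 it would request reduction.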
Based on the above analysis, when the online task is in its resource demand peak period, the first client 20 may send a resource expansion request to the resource scheduling node 10; after receiving the request, the resource scheduling node 10 controls the offline task to release at least part of the GPU cards it currently occupies and runs the online task on the released GPU cards, thereby expanding resources for the online task. Correspondingly, when the online task is in its resource demand valley period, the first client 20 may send a resource reduction request to the resource scheduling node 10; after receiving the request, the resource scheduling node 10 controls the online task to release at least part of the GPU cards it currently occupies and expands resources for the offline task based on the released GPU cards.
As shown in fig. 1a, assume that the first client 20 has been allocated 100 GPU cards, of which 60 are allocated to the online task and 40 to the offline task, and that this allocation meets the current task requirements. As the online task's demand for GPU resources changes, the resource scheduling node 10 may dynamically adjust, according to the resource adjustment request submitted by the first client 20, the number of GPU cards respectively occupied by the online task and the offline task of the first client 20, so as to implement capacity expansion and capacity reduction for the online task. In fig. 1a, case 1 represents capacity expansion of the online task, and case 2 represents capacity reduction of the online task. These two cases are explained separately below:
case 1 (online task capacity expansion): as the demand of the online task changes, the online task reaches the peak period of resource demand, and the current 60 GPU cards cannot meet its resource requirements. To obtain more GPU cards to support the online task, the first client 20 sends a resource capacity expansion request to the resource scheduling node 10.
In an optional embodiment, assume the offline task needs at least 20 GPU cards and the online task needs at least 80 GPU cards in the peak period. On the premise of meeting the basic resource requirement of the offline task, the resource scheduling node 10 may, after receiving the capacity expansion request for the online task, control the offline task to release 20 GPU cards and allocate those 20 GPU cards to the online task. At this point, as shown in FIG. 1a, the online task occupies 80 GPU cards and the offline task occupies 20 GPU cards. Or,
in another optional embodiment, assume the offline task needs at least 20 GPU cards and the online task needs at least 100 GPU cards in the peak period. On the premise of meeting the basic resource requirement of the offline task, the resource scheduling node 10 may, after receiving the capacity expansion request for the online task, control the offline task to release 20 GPU cards and allocate those 20 GPU cards to the online task. The online task is still short 20 GPU cards, but at this time the offline task can no longer release GPU cards for it; therefore, 20 additional GPU cards may be allocated to the online task from the GPU cards in the resource pool that have not been allocated to any client. At this point, as shown in FIG. 1a, the online task occupies 100 GPU cards and the offline task occupies 20 GPU cards. Of the 100 GPU cards occupied by the online task, 80 were previously allocated to the first client 20 (the 60 originally occupied by the online task plus the 20 released by the offline task), and the other 20 are newly allocated from the resource pool. The offline task occupies 20 GPU cards previously allocated to the first client 20.
Of course, case 1 above is described taking the example in which all 100 GPU cards are occupied by the offline task and the online task, but the present application is not limited to this. For example, in yet another alternative embodiment, of the 100 GPU cards, the offline task occupies 40, the online task occupies 40, and 20 remain vacant. In one case, assuming that 20 GPU cards need to be added to the online task due to application requirements, the first client 20 may submit a resource capacity expansion request to the resource scheduling node 10, and the resource scheduling node 10 may, according to the request, preferentially provide the 20 vacant GPU cards to the online task without reducing the GPU cards occupied by the offline task, which helps to guarantee the execution efficiency of the offline task. In another case, assuming that 40 GPU cards need to be added to the online task due to application requirements, the first client 20 may submit a resource capacity expansion request to the resource scheduling node 10; the resource scheduling node 10 may, according to the request, preferentially provide the 20 vacant GPU cards to the online task, and further control the offline task to release 20 GPU cards and then provide the 20 GPU cards released by the offline task to the online task.
Case 2 (online task scaling):during non-resource-demand peak periods, the on-line task does not need to occupy too many GPU cards, and it is assumed that 20 GPU cards are enough to meet the resource demand of the on-line task, while the current 60 GPU cards have surplus resources. At this point, the first client 20 may send a resource capacity reduction request to the resource scheduling node 10. Optionally, assuming that the offline task needs at least 20 GPU cards and at most 60 GPU cards, the resource scheduling node 10 may control the online task to release at least part of the currently occupied GPUs, for example, release 40 GPU cards, after receiving the resource capacity reduction request, and further expand the capacity of the offline task on the basis of the 40 GPU cards released by the online task, for example, expand the capacity of 20 GPU cards for the offline task, where the offline task occupies 60 (i.e., 40+20 GPU cards). So far, as shown in fig. 1a, the online task occupies 20 GPU cards, the offline task occupies 60 GPU cards, and there are 20 remaining GPU cards. It should be noted that, the 20 spare GPU cards may be released back to the resource pool, or may be left for capacity expansion for the online task again, which is not limited to this.
Further, as shown in fig. 1b, to facilitate dynamic management of the GPU cards, the cloud resource scheduling system 100 further includes a resource management and control node 30, which manages the GPU cards occupied by the offline task. For example, when the resource scheduling node 10 needs to increase or decrease the GPU cards occupied by the offline task according to the resource adjustment request submitted by the first client 20, it may send a corresponding notification to the resource management and control node 30, and the resource management and control node 30 adjusts the number of GPU cards occupied by the offline task. In this embodiment, the deployment of the resource management and control node 30 is not limited. For example, it may be deployed independently, e.g., in one or more physical machines, virtual machines, or containers. For another example, it may be deployed in the same physical machine, virtual machine, or container as the resource scheduling node 10. For another example, it may also be deployed together with the offline task, e.g., on one or more GPU cards occupied by the offline task, so that the resource management and control node 30 is closer to the offline task, which facilitates management of the GPU cards the offline task occupies. Further optionally, when the resource management and control node 30 is deployed on one or more GPU cards occupied by the offline task, the GPU cards on which it is deployed are not allowed to be released, so as to ensure that the resource management and control node 30 can continue to manage and control the GPU cards that are not released.
For case 1 above: when the first client 20 submits a resource capacity expansion request to the resource scheduling node 10, the resource scheduling node 10 may determine, according to the request, the number of GPU cards releasable by the offline task, denoted K1; then the resource scheduling node 10 sends a resource release notification to the resource management and control node 30, the notification carrying the number K1 of GPU cards the offline task can release. According to the resource release notification, the resource management and control node 30 selects K1 GPU cards from the N GPU cards currently occupied by the offline task and controls those K1 GPU cards to stop the offline task, thereby releasing them. The online task is then run (or started) on the K1 GPU cards released by the offline task, achieving the purpose of expanding the online task. K1 and N are natural numbers, and 1 ≤ K1 ≤ N.
In the embodiment of the present application, the manner in which the resource scheduling node 10 determines, according to the resource capacity expansion request, the number K1 of GPU cards releasable by the offline task is not limited. The following examples illustrate:
in an optional embodiment, the offline task has no lower-limit requirement on the number of occupied GPU cards. In this case, the resource scheduling node 10 may parse from the resource capacity expansion request the number M1 of GPU cards that need to be added for the online task, and then determine the releasable number K1 according to the number N of GPU cards currently occupied by the offline task and M1. For example, if N ≤ M1, the offline task may release all N GPU cards it currently occupies, i.e., K1 = N; if N > M1, the offline task may release M1 GPU cards, i.e., K1 = M1. Of course, in the case N ≤ M1, the offline task may also release only part of the GPU cards it currently occupies, e.g., R of them, where R is a natural number with 1 ≤ R < N ≤ M1, i.e., K1 = R.
In another optional embodiment, the offline task has a lower-limit requirement on the number of occupied GPU cards, and the lower limit of resources required by the offline task is set to L1 GPU cards. In this case, the resource scheduling node 10 may parse from the resource capacity expansion request the number M1 of GPU cards that need to be added for the online task, and determine the releasable number K1 according to the total number N of GPU cards currently occupied by the offline task and the required lower limit L1, in combination with M1. Theoretically K1 ≤ M1, which helps ensure the execution efficiency and quality of the offline task. Of course, K1 > M1 is also applicable to the embodiment of the present application; in that case M1 GPU cards may be selected from the K1 GPU cards to run the online task, or the online task may be run or started on all K1 GPU cards, so as to expand the online task.
In FIG. 1b, assume that the first client 20 has been allocated 100 GPU cards, of which 60 are allocated to the online task and 40 to the offline task, and that this allocation meets the current task requirements; in addition, assume the lower limit of resources L1 required by the offline task is 20 GPU cards. As the online task's demand for GPU resources changes, assume 40 GPU cards need to be added to the online task, i.e., M1 = 40, which is case 1 shown in fig. 1b. In this example, the resource scheduling node 10 may determine, from the 40 GPU cards currently occupied by the offline task and its required lower limit of 20 GPU cards, that the number K1 of releasable GPU cards is 20, so as to ensure that the number of GPU cards finally occupied by the offline task does not fall below the required lower limit. Then the resource scheduling node 10 sends a resource release notification to the resource management and control node 30, notifying it to reduce the number of GPU cards occupied by the offline task from 40 to 20. The resource management and control node 30 selects 20 GPU cards to be released from the 40 GPU cards currently occupied by the offline task, and controls them to stop the executing offline task so as to release them. After the online task is run or started on these 20 GPU cards, the purpose of expanding the online task is achieved. It is noted here that the stopped portion of the offline task may be reassigned to the 20 GPU cards that are not released to continue execution; alternatively, that portion of the offline task may be directly terminated, which is not limited here.
Further, determining the releasable number K1 of GPU cards for the offline task according to the number N of GPU cards it currently occupies and its required lower limit of resources L1 includes: judging whether N is larger than L1; if N > L1, calculating the difference (N-L1), which represents the maximum number of GPU cards the offline task can release; then judging whether (N-L1) is greater than or equal to M1. If (N-L1) ≥ M1, the offline task can release the M1 GPU cards that need to be added for the online task, so K1 = M1. If (N-L1) < M1, the offline task cannot release M1 GPU cards; even if it releases all the GPU cards it can, the number M1 needed by the online task cannot be reached. In this case, all releasable GPU cards may be released, i.e., K1 is determined to be the difference between the total number N of GPU cards currently occupied by the offline task and the required lower limit L1, namely (N-L1). L1 and M1 are natural numbers, L1 ≥ 1, and K1 ≤ M1. As shown in case 1 in fig. 1b, 40 GPU cards need to be added to the online task to meet its resource requirement, but after the offline task's minimum requirement of 20 GPU cards is satisfied, only 20 GPU cards can be released for the online task.
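The determination of K1 described here can be sketched as a small helper. This is a minimal illustration, not the patented implementation; the function name `releasable_gpu_cards` is hypothetical, and passing L1 = 0 covers the earlier embodiment with no lower-limit requirement.

```python
def releasable_gpu_cards(n, l1, m1):
    """Number of GPU cards the offline task may release (K1).

    n  -- GPU cards currently occupied by the offline task (N)
    l1 -- lower limit of resources required by the offline task (L1)
    m1 -- GPU cards that need to be added for the online task (M1)
    """
    if n <= l1:
        return 0  # releasing any card would violate the lower limit
    # release at most what the online task needs, and at most (N - L1)
    return min(n - l1, m1)

# Case 1 from fig. 1b: N=40, L1=20, M1=40 -> only 20 cards can be released
```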
Further, as shown in fig. 1b, the cloud resource scheduling system 100 further includes a resource pool 40, which holds a plurality of GPU cards, including the GPU cards already allocated to the first client 20 and GPU cards not yet allocated to any client. On the basis that the resource pool 40 contains GPU cards not allocated to any client, when it is determined that the number K1 of GPU cards releasable by the offline task is smaller than the number M1 of GPU cards that need to be added to the online task, i.e., K1 < M1, the resource scheduling node 10 may allocate (M1-K1) GPU cards to the online task from those unallocated GPU cards, so that the online task successfully gains M1 GPU cards, achieving successful capacity expansion and guaranteeing the service quality of the online task. As shown in fig. 1b, in case 1 the online task needs to expand by 40 GPU cards, but the offline task can only release 20; for this, the resource scheduling node 10 may allocate 20 GPU cards to the online task from the GPU cards in the resource pool 40 not yet allocated to any client, so as to meet the expansion requirement of the online task and guarantee its service quality.
In the embodiment of the present application, the manner in which the resource scheduling node 10 allocates (M1-K1) GPU cards from the resource pool 40 for the online task is not limited. For example, in an alternative embodiment, it may be preset or default that, when the number K1 of GPU cards releasable by the offline task is smaller than the number M1 of GPU cards that need to be added for the online task, the resource scheduling node 10 automatically allocates (M1-K1) GPU cards for the online task from the unallocated GPU cards in the resource pool 40; this embodiment does not require the first client 20 to participate. For another example, in another alternative embodiment, when K1 < M1, the resource scheduling node 10 may return to the first client 20 an insufficient-resource prompt message indicating that the offline task has released K1 GPU cards, which is (M1-K1) GPU cards short of the number M1 needed by the online task. Based on the prompt, the first client 20 decides whether to obtain the missing (M1-K1) GPU cards from the resource pool 40. If, after receiving the prompt, the first client 20 decides to continue acquiring GPU cards from the resource pool 40, it may submit to the resource scheduling node 10 confirmation information for acquiring GPU cards from the resource pool 40 for the online task. After receiving the confirmation, the resource scheduling node 10 allocates (M1-K1) GPU cards to the online task from the unallocated GPU cards in the resource pool 40, achieving the purpose of capacity expansion.
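The split of the online task's expansion need between offline-released cards and the shared pool can be sketched as follows. This is a hedged illustration under the assumptions above; `plan_expansion` is a hypothetical name, and a real scheduler would also handle the client-confirmation path.

```python
def plan_expansion(m1, k1, pool_unallocated):
    """Split the online task's expansion need M1 between GPU cards
    released by the offline task (K1) and unallocated pool cards."""
    from_offline = min(k1, m1)
    from_pool = min(m1 - from_offline, pool_unallocated)
    return {
        "from_offline": from_offline,
        "from_pool": from_pool,
        # cards still missing if the pool itself is exhausted
        "shortfall": m1 - from_offline - from_pool,
    }

# Case 1 from fig. 1b: M1=40, K1=20 -> 20 cards come from the pool
```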
Further, in the above embodiment, the manner in which the resource management and control node 30 selects K1 GPU cards from the N GPU cards occupied by the offline task is not limited; any manner capable of selecting K1 GPU cards is applicable to the embodiment of the present application.
For example, the resource management and control node 30 may use a random algorithm to select K1 GPU cards at random from the N GPU cards currently occupied by the offline task. Alternatively, it may use a hash algorithm: a hash function is applied to identification information such as the numbers of the N GPU cards to obtain hash values, and the K1 GPU cards with the smallest hash values are selected; alternatively, the K1 GPU cards with the largest hash values may be selected.
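The two selection strategies just described can be sketched as follows. The function names and the choice of SHA-256 are illustrative assumptions; the patent only requires some hash function over the card identifiers.

```python
import hashlib
import random

def select_by_hash(card_ids, k1, largest=False):
    """Hash each card identifier and pick the k1 cards with the
    smallest (or largest) hash values -- deterministic selection."""
    ranked = sorted(
        card_ids,
        key=lambda cid: hashlib.sha256(str(cid).encode()).hexdigest(),
        reverse=largest,
    )
    return ranked[:k1]

def select_at_random(card_ids, k1):
    """Pick k1 cards uniformly at random."""
    return random.sample(card_ids, k1)
```

A hash-based choice has the practical property of being repeatable: given the same card numbers, every node computes the same selection without coordination.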
For another example, the resource management and control node 30 may select, according to the source of the pending application traffic and the geographic locations of the N GPU cards currently occupied by the offline task, the K1 GPU cards closest to the traffic source. The pending application traffic refers to the application traffic that causes or triggers resource expansion for the online task. For example, in a live-broadcast scenario, suppose the number of live users in city A surges; to have enough beauty or special-effect tasks to process the video content of these users, K1 GPU cards need to be added to run these tasks. In this case, the K1 GPU cards closest to city A may be selected from the N GPU cards currently occupied by the offline task according to their physical locations. For example, GPU cards located within city A may be preferentially selected; if there are not enough such cards to make up K1, GPU cards in the city B nearest to city A may be selected, and so on, until K1 GPU cards are selected. It should be noted that the GPU cards in the cloud resource scheduling system 100 may be distributed across different geographic locations, or may be concentrated within a certain geographic range.
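The proximity-based selection can be sketched as a sort over distances. This is a simplified assumption: the distance of each card to the traffic source is taken as given, whereas a real deployment would derive it from the system's topology data.

```python
def select_nearest(cards, k1):
    """cards: list of (card_id, distance_to_traffic_source) pairs.
    Pick the k1 cards closest to the source of the pending traffic."""
    ranked = sorted(cards, key=lambda c: c[1])
    return [card_id for card_id, _ in ranked[:k1]]

# Cards inside city A (distance ~0) are naturally chosen first,
# then cards in the nearest city B, and so on.
```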
For case 2 above: when the first client 20 submits a resource capacity reduction request to the resource scheduling node 10, the resource scheduling node 10 may control the online task to release M2 of the GPU cards it currently occupies, then determine the number K2 of GPU cards that may be added to the offline task according to the upper limit of resources L2 required by the offline task and the number of currently vacant GPU cards, and send a resource increase notification to the resource management and control node 30, the notification carrying the number K2 of GPU cards that may be added to the offline task. The currently vacant GPU cards include the M2 GPU cards freed by the online task; if some GPU cards were already vacant before the online task released its M2 cards, the currently vacant GPU cards include both those previously vacant cards and the M2 released cards.
According to the resource increase notification sent by the resource scheduling node 10, the resource management and control node 30 adds K2 GPU cards to the offline task on the basis of the currently vacant GPU cards, thereby expanding the offline task while the online task is reduced. K2 and M2 are natural numbers, and 1 ≤ K2 ≤ M2.
In FIG. 1b, assume that the first client 20 has been allocated 100 GPU cards, of which 60 are allocated to the online task and 40 to the offline task, and that this allocation meets the current task requirements; in addition, assume the upper limit of resources L2 required by the offline task is 60 GPU cards. As the online task's demand for GPU resources changes, the online task can meet its current resource demand with only 20 GPU cards, so the current 60 GPU cards leave surplus resources. At this point, the first client 20 may send a resource capacity reduction request to the resource scheduling node 10, which is case 2 shown in fig. 1b. In this example, the online task currently occupies 60 GPU cards but needs only 20, so the resource scheduling node 10 may determine that the online task can release 40 GPU cards, i.e., M2 = 40. The GPU cards released by the online task can be used to expand the offline task, but it must be ensured that the number of GPU cards finally occupied by the offline task does not exceed the required upper limit of 60. The resource scheduling node 10 then sends a resource increase notification to the resource management and control node 30, notifying it to add 20 GPU cards to the offline task. At this point, as shown in fig. 1b, the offline task occupies 60 GPU cards, the online task occupies 20 GPU cards, and 20 GPU cards remain vacant. The 20 vacant GPU cards may be released back to the resource pool, or may be kept for expanding the online task again later, which is not limited here.
Further, in the above embodiment, the resource scheduling node 10 determining the number K2 of GPU cards that may be added to the offline task according to its required upper limit of resources L2 and the number of currently vacant GPU cards includes: judging whether the number N of GPU cards currently occupied by the offline task is smaller than L2; if N < L2, calculating the difference (L2-N) between the upper limit L2 and N, which represents the number of GPU cards that would need to be added for the offline task's occupancy to reach the upper limit L2; then judging whether (L2-N) is greater than or equal to the number K3 of currently vacant GPU cards. If (L2-N) ≥ K3, then even if all currently vacant GPU cards are provided to the offline task, the number of GPU cards it occupies still cannot exceed the upper limit L2, so K2 = K3; if (L2-N) < K3, then K2 = (L2-N).
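The determination of K2 mirrors the K1 case and can be sketched as follows; `addable_gpu_cards` is a hypothetical helper name used only for illustration.

```python
def addable_gpu_cards(n, l2, k3):
    """Number of GPU cards that may be added to the offline task (K2).

    n  -- GPU cards currently occupied by the offline task (N)
    l2 -- upper limit of resources required by the offline task (L2)
    k3 -- currently vacant GPU cards, incl. those released by the online task
    """
    if n >= l2:
        return 0  # already at the upper limit; nothing may be added
    # never exceed L2, and never add more than what is vacant
    return min(l2 - n, k3)

# Case 2 from fig. 1b: N=40, L2=60, K3=40 -> add 20, leaving 20 vacant
```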
Further, in the embodiment of the present application, the manner in which the resource management and control node 30 adds K2 GPU cards to the offline task based on the currently vacant GPU cards is not limited. In an optional embodiment, the resource management and control node 30 may select K2 GPU cards from the currently vacant GPU cards and deploy the offline task on them, while the N GPU cards originally running the offline task continue to run it, thereby adding K2 GPU cards to the offline task. In another optional embodiment, the resource management and control node 30 may judge whether (K2+N) is less than or equal to the number of currently vacant GPU cards; if so, it selects (K2+N) GPU cards from the currently vacant ones, redeploys the offline task on them, and controls the N GPU cards previously occupied by the offline task to stop it, likewise adding K2 GPU cards to the offline task. Of course, when (K2+N) is greater than the number of currently vacant GPU cards, the resource management and control node 30 may select K2 GPU cards from the currently vacant ones, deploy the offline task on them, and let the N GPU cards originally running the offline task continue to run it, thereby adding K2 GPU cards to the offline task.
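The choice between the two placement embodiments can be sketched as a small decision function; `plan_offline_placement` is a hypothetical name for illustration only.

```python
def plan_offline_placement(n, k2, vacant):
    """Decide how to add K2 cards to the offline task.

    If K2+N fresh vacant cards are available, the task may be redeployed
    entirely onto them and the original N cards stopped; otherwise the
    original N cards keep running and K2 vacant cards are added alongside.
    Returns (strategy, number_of_cards_taken_from_the_vacant_pool).
    """
    if k2 + n <= vacant:
        return ("redeploy", k2 + n)  # whole task moves to fresh cards
    return ("augment", k2)           # keep N running, add K2 more
```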
In the embodiment of the present application, when the offline task releases GPU cards, it is not limited whether the portion of the offline task originally running on the released GPU cards is migrated to the unreleased GPU cards for execution or handled otherwise (for example, directly abandoned). The handling may differ depending on the offline task. In an alternative embodiment, the offline task of the first client 20 is a model training task and, correspondingly, the online task is a model inference task. For descriptions of the model training task and the model inference task, reference may be made to the foregoing embodiments, which are not repeated here.
In the embodiments of the present application, the model training task involves several concepts: batch size, total number of samples T, number of steps of model training (step), one round of model training, and gradient synchronization between GPU cards. These concepts are first explained in combination with the technical solutions of the embodiments of the present application. The batch size refers to the number of samples each GPU card processes per round of model training; the total number of samples T refers to the number of all samples to be processed in the model training task; one round of model training refers to the process in which a GPU card executes the training task on one batch; the number of steps (step) of model training refers to the number of iterations required for all samples to participate in the training task once; gradient synchronization is needed among the GPU cards executing the model training task, and refers to the process of communicating and synchronizing task state among the GPU cards after all samples have participated in the training task. The total number of samples T, the number N of GPU cards executing the offline task, the batch size bs, and the number of training steps sp satisfy the relation: T = N × bs × sp.
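The relation T = N × bs × sp can be checked directly with the figures used in the later examples; the function name here is illustrative.

```python
def total_samples(n_cards, batch_size, steps):
    # T = N * bs * sp: samples processed across all cards between
    # two gradient synchronizations
    return n_cards * batch_size * steps
```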
In the embodiment of the application, the number of GPU cards occupied by the model training task is dynamically reduced or increased according to the elastic demand of the model inference task for GPU cards. In the model training scenario, if the model training task releases part of its GPU cards, the model training work originally running on the released cards must be migrated to the unreleased cards to ensure the execution quality and integrity of the model training task. Based on this, in this embodiment, while controlling the model training task to release GPU cards, the resource management and control node 30 may, keeping the original batch size of the model training task unchanged, calculate the increased number of steps sp_1 according to the number (N-K1) of GPU cards the model training task can still occupy after releasing K1 cards and the total number of samples T of the model training task, and control the (N-K1) remaining GPU cards to continue executing the model training task with the original batch size bs and the increased number of steps sp_1. Among the (N-K1) GPU cards occupied by the model training task, gradients are synchronized once every sp_1 rounds of model training. Here sp_1 = T/((N-K1) × bs).
For example, assume the total number of samples T of the model training task is 8000, the number N of GPU cards originally executing the model training task is 40, and the number of training steps sp is 1; then the original batch size used per GPU card per round of model training is bs = 8000/(40 × 1) = 200. With the expansion of the model inference task, assume the model training task releases 20 GPU cards and still occupies 20. In this case, while keeping the batch size bs = 200 unchanged, the number of training steps can be increased as the number of GPU cards decreases, i.e., sp_1 = 8000/(20 × 200) = 2. Each GPU card, following sp_1 = 2, performs two rounds of model training before synchronizing gradients once with the other GPU cards; each round of training processes 200 of the 8000 samples. Thus, between gradient synchronizations, each GPU card actually processes 400 samples and the 20 GPU cards together process all 8000 samples, guaranteeing the integrity of the model training task. In addition, when the number of GPU cards decreases, increasing the number of training steps executed by each GPU card keeps the number of gradient-synchronization communications unchanged, which helps reduce communication overhead.
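The step recomputation used in this example (and in the reverse, card-adding case) follows directly from T = N × bs × sp. The helper name is hypothetical, and the divisibility assertion is an assumption of these worked examples, where the sample total divides evenly.

```python
def recompute_steps(total_samples, n_cards, batch_size):
    """Recompute the per-card step count sp so that one full pass over
    the samples still occurs between gradient synchronizations,
    keeping the batch size unchanged (sp = T / (N * bs))."""
    assert total_samples % (n_cards * batch_size) == 0, \
        "sample total must divide evenly across cards and batches"
    return total_samples // (n_cards * batch_size)

# Shrinking from 40 to 20 cards at bs=200 doubles the steps: 1 -> 2
# Growing back from 20 to 40 cards halves them again: 2 -> 1
```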
Corresponding to the situation in which the model training task releases GPU cards, while GPU cards are being added to the model training task the batch size can likewise be kept unchanged, with the number of training steps each GPU card executes between gradient synchronizations changed, so that the number of gradient-synchronization communications stays unchanged as far as possible and changes to the execution logic of the model training task are reduced. Taking the addition of K2 GPU cards to the model training task in case 2 as an example: to keep the batch size of the model training task unchanged, the resource management and control node 30 may, while adding K2 GPU cards, judge whether the current number of training steps sp of the model training task is greater than 1. If sp > 1, indicating that the model training task was previously scaled down, then to keep the batch size unchanged the resource management and control node 30 may calculate the reduced number of training steps sp_2 according to the number (K2+N) of GPU cards occupied after adding K2 cards and the total number of samples T of the model training task, and control the (K2+N) GPU cards finally executing the model training task to execute it with the original batch size and the reduced number of steps sp_2.
Continuing with the above example, assume that after the model training task has released GPU cards for the model inference task, the number of GPU cards N executing the model training task is 20, the number of model training steps sp is 2, the batch size bs used by each GPU card is 200, and the total number of samples T of the model training task is 8000. With the scaling-down of the model inference task, assume that the model inference task releases 40 GPU cards, and 20 of them are added to the model training task. In this case, keeping the batch size bs unchanged at 200, the number of model training steps can be reduced as the number of GPU cards occupied by the model training task increases, i.e., sp_2 = 8000/(40 × 200) = 1. According to sp_2 = 1, each GPU card synchronizes gradients with the other GPU cards after every round of model training; each round of model training uses 200 of the 8000 samples. Thus, between two gradient synchronizations, each GPU card actually processes 200 samples, and the 40 GPU cards process 8000 samples in total, so the integrity of the model training task is guaranteed.
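The step-count adjustment in the two examples above can be sketched as follows (an illustrative Python sketch assuming only the relation T = N × bs × sp from this embodiment; the function name is hypothetical):

```python
def adjust_training_steps(total_samples, batch_size, num_gpus):
    """Recompute the step count sp so that T = N * bs * sp still holds
    after the number of GPU cards N changes, with the batch size fixed."""
    if total_samples % (num_gpus * batch_size) != 0:
        raise ValueError("samples must divide evenly across cards and steps")
    return total_samples // (num_gpus * batch_size)

# Worked example from this embodiment: T = 8000, bs = 200
sp = adjust_training_steps(8000, 200, 40)    # 40 cards: sp = 1
sp_1 = adjust_training_steps(8000, 200, 20)  # after releasing 20 cards: sp_1 = 2
```

With the batch size held at 200, halving the card count doubles the number of local training steps between two gradient synchronizations, so the communication count stays the same.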
Further, in the case of adding GPU cards to the model training task, if the resource control node 30 determines that the current training step number sp of the model training task is 1, the step number cannot be reduced any further; in that case, the batch size can be reduced instead, so that the model training task is still evenly distributed to each GPU card and executed successfully.
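This fallback can be sketched together with the step-count rule (an illustrative sketch under the T = N × bs × sp assumption; the function name is hypothetical):

```python
def rescale_for_added_cards(total_samples, batch_size, new_num_gpus):
    """When GPU cards are added: prefer shrinking the step count while
    keeping the batch size; once sp would fall below 1, fix sp = 1 and
    shrink the batch size so samples still divide evenly across cards."""
    per_sync = new_num_gpus * batch_size  # samples consumed per sync at sp = 1
    if total_samples >= per_sync:
        return batch_size, total_samples // per_sync  # keep bs, adjust sp
    return total_samples // new_num_gpus, 1           # sp already 1: shrink bs

bs1, sp1 = rescale_for_added_cards(8000, 200, 40)  # bs stays 200, sp = 1
bs2, sp2 = rescale_for_added_cards(8000, 200, 80)  # sp pinned at 1, bs = 100
```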
The above procedure for adjusting the number of training steps (step) is described below with reference to the overall model training flow shown in fig. 1c. In the embodiment shown in fig. 1c, resources are allocated in units of nodes, where a node may be a GPU instance; each node is configured with W GPU cards, and the value of W can be set flexibly according to application requirements. As shown in fig. 1c, the model training process includes the following steps:
11c, defining the resources needed by the model training task as X × W, wherein X is the number of nodes, W is the number of GPU cards to be configured on each node, and X and W are both natural numbers greater than or equal to 1.
12c, allocating X nodes for the model training task, configuring W GPU cards on each node, and starting the model training task.
13c, starting model training on the X × W GPU cards.
14c, judging whether the model training is finished or not; if yes, ending the training and storing the trained model; if not, go to step 15 c.
Alternatively, a model training end condition may be set, for example, a maximum number of iterations is set; when the iteration times reach the maximum iteration times, ending the model training and storing the trained model; otherwise, the subsequent steps are continuously executed.
15c, performing forward calculation on the current batch (batch), and executing step 16 c.
16c, performing backward calculation on the current batch (batch).
The forward calculation and the backward calculation are calculation processes in model training. The forward calculation computes the influence of each layer's nodes on the nodes of the next layer by traversing the neural network in the forward direction: input layer -> hidden layer -> output layer. The backward calculation is the process of solving the gradients of the parameters layer by layer according to the loss (loss) produced by the final output of model training, and updating each parameter by gradient descent. The forward and backward calculation algorithms used in different model training processes differ and are not limited here.
17c, judging whether the step number (step) value of the updated model parameter is reached; if yes, go to step 18 c; if not, go to step 19 c.
18c, synchronizing the communication gradients, updating the model parameters and continuing to execute step 19 c.
19c, judging whether the step number (step) value of the updated model parameter needs to be adjusted or not; if yes, go to step 20 c; if not, go to step 21 c.
The step number (step) value for updating the model parameters is the number of model training steps mentioned in the previous embodiment, i.e., the number of iterations required for all samples to participate in one round of model training. Each time the X × W GPU cards perform model training according to the batch size, it is judged whether the step number (step) value for updating the model parameters has been reached. If so, all samples have participated in one round of model training, and the X × W GPU cards need to communicate to synchronize gradients, so step 18c is entered, i.e., the gradients are synchronized and the model parameters are updated. Otherwise, some samples have not yet participated in the model training, no gradient synchronization is needed between the X × W GPU cards, and the process may proceed directly to step 19c to judge whether the step number (step) value for updating the model parameters needs to be adjusted in the current model training process.
If a notification of reducing the number of the GPU cards is received or a notification of increasing the number of the GPU cards is received in the model training process, it means that the step (step) value needs to be adjusted while the batch size (batch) is guaranteed to be unchanged. If the step number (step) value needs to be adjusted, go to step 20 c; if no step (step) value adjustment is required, step 21c is entered directly.
20c, adjusting the step number (step) value of the updated model parameter, and executing the step 21 c.
In step 20c, if a notification of reducing the number of GPU cards is received and the number of GPU cards needs to be reduced from X × W to (X × W)/2, then on the one hand (X × W)/2 GPU card resources can be released, and on the other hand the step number (step) value for updating the model parameters can be adjusted to 2 times its original value.
21c, judging whether GPU card resources need to be released or not; if yes, go to step 22 c; if not, go to step 14 c.
And 22c, releasing the GPU card resources currently occupied by the model training task.
In step 21c, judging whether GPU card resources need to be released or not in the model training process, if so, entering step 22c, releasing the GPU card resources currently occupied by the model training task, and ending the model training task; if not, returning to the step 14c, and continuing to enter the model training process.
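The judgments in steps 14c–22c can be sketched as the following loop (an illustrative skeleton only; the forward/backward computation is stubbed out, and the resize notification is modeled as a hypothetical `(iteration, new_card_count)` pair):

```python
def training_loop(total_samples, batch_size, num_gpus, max_iters, resize_at=None):
    """Per the flow of fig. 1c: run `step` batches between two gradient
    synchronizations, and re-derive `step` when the card count changes."""
    step = total_samples // (num_gpus * batch_size)  # step value (17c/19c)
    syncs, local_steps = 0, 0
    for it in range(max_iters):                      # 14c: end condition
        # 15c/16c: forward and backward calculation on one batch (stubbed)
        local_steps += 1
        if local_steps == step:                      # 17c: step value reached?
            syncs += 1                               # 18c: sync gradients, update params
            local_steps = 0
        if resize_at is not None and it + 1 == resize_at[0]:
            num_gpus = resize_at[1]                  # 19c/20c: adjust step value
            step = total_samples // (num_gpus * batch_size)
    return syncs, step

# 40 cards (sp = 1); halve to 20 cards after 4 iterations -> sp becomes 2
result = training_loop(8000, 200, 40, max_iters=8, resize_at=(4, 20))
```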
It should be noted that in the embodiments of the present application, there may be one or more online tasks, and likewise one or more offline tasks. The first client 20 may specify which online task the resource expansion or reduction is currently requested for; whichever online task it is, resource expansion or reduction can be performed for it by the method provided in the above embodiments. In addition, when there are multiple offline tasks, in the process of expanding the resources of a certain online task, the resource scheduling node 10 may select, in some manner, the offline tasks that can release resources from all the offline tasks, and control at least part of the cloud computing resources occupied by those offline tasks to be released for the expansion of the online task. Of course, if one offline task cannot release enough cloud computing resources when there are multiple offline tasks, multiple offline tasks can be controlled to release cloud computing resources for the online task at the same time. Further, as time goes on, new offline tasks are submitted and existing offline tasks complete and exit; in this process, if a new offline task is submitted when there are no spare cloud computing resources (for example, no spare GPU cards), the newly submitted offline task may be queued according to submission order, priority, or the like, until spare cloud computing resources become available for it to run.
The resource scheduling system of the embodiment of the application provides a resource management mode for dynamically adjusting the cloud computing resource amount occupied by the online task and the offline task according to the difference of the online task and the offline task on the cloud computing resource requirements, so that the cloud computing resource amount occupied by the online task in the resource requirement peak period can be increased and the cloud computing resource amount occupied by the offline task can be reduced according to the elastic requirements of the online task on the cloud computing resources; the cloud computing resource amount occupied by the online task in the resource demand valley period can be reduced, and the cloud computing resource amount occupied by the offline task is increased, so that the cloud computing resources are reasonably utilized, and the resource waste is reduced.
The embodiment of the application also provides a cloud resource scheduling method, which is suitable for the resource scheduling node, and the resource scheduling node can manage and control the cloud computing resources. Fig. 2 is a flowchart of a cloud resource scheduling method taking a GPU card as an example in the embodiment of the present application, and as shown in fig. 2, the method includes the following steps:
21a, receiving a resource adjustment request submitted by a first client, wherein the online task and the offline task of the first client share the multiple GPU cards allocated to the first client.
22a, dynamically adjusting, according to the resource adjustment request, the number of GPU cards respectively occupied by the online task and the offline task among the multiple GPU cards.
The "first client" may be a user, or may be various application systems (for example, personal application, enterprise application, or government application), some functional plug-ins, functional modules, or chips, and the like, or may be a terminal device, a server, or devices such as multiple computer application devices, an enterprise cluster server, a distributed server, and a cloud server. The number of the first clients may be one or multiple, which is not limited herein.
The first client has an online task and an offline task, and the two may share the GPU cards allocated to the first client. The online task refers to a task that needs to run online and has a high real-time requirement; correspondingly, the offline task refers to a task that can run offline and has a low real-time requirement. The embodiment of the present application does not limit what the online and offline tasks are. Taking the field of artificial intelligence or deep learning as an example, the offline task may be a model training task, and the online task a model inference task.
In practical applications, the actual resource demands of online tasks vary with application requirements; for example, a live-video application has peak and valley periods within a day, so its online tasks have strict timeliness requirements on resources. Therefore, considering the difference between the timeliness requirements of the online task and the offline task, and their different demands on GPU cards, the first client can submit a resource adjustment request to the resource scheduling node, and the resource scheduling node dynamically adjusts the number of GPU cards occupied by the online task and the offline task according to the resource adjustment request.
Resource scheduling requests may vary according to the resource requirements of the on-line tasks. For example, when the online task enters or is about to enter the peak period of resource demand, the first client may submit a resource capacity expansion request to the resource scheduling node, requesting to perform resource capacity expansion for the online task. For another example, when the online task enters or is about to enter the resource demand valley period, the first client may submit a resource capacity reduction request to the resource scheduling node, so as to request resource capacity reduction for the online task.
Optionally, in the case that the resource adjustment request is a resource expansion request, the implementation of dynamically adjusting, according to the resource adjustment request, the number of GPU cards respectively occupied by the online task and the offline task in the cloud computing resources includes: controlling the offline task to release at least part of the GPU cards it currently occupies, and controlling at least part of the GPU cards released by the offline task to run the online task, thereby expanding the resources of the online task.
Optionally, the offline task may be controlled by the resource management and control node to release at least part of the GPU cards it currently occupies. Specifically, when the first client submits a resource expansion request to the resource scheduling node, the number of GPU cards releasable by the offline task may be determined according to the resource expansion request and recorded as K1; then, a resource release notification is sent to the resource management and control node, carrying the number K1 of GPU cards releasable by the offline task. According to the resource release notification sent by the resource scheduling node, the resource management and control node selects K1 GPU cards from the N GPU cards currently occupied by the offline task and controls these K1 GPU cards to end the offline task, so as to release the K1 GPU cards. Then, the online task is run (or started) on the K1 GPU cards released by the offline task, so as to expand the capacity of the online task. Wherein K1 and N are natural numbers, and 1 ≤ K1 ≤ N.
In the embodiment of the present application, the implementation of determining the number K1 of GPU cards releasable by the offline task according to the resource expansion request is not limited. The following examples illustrate:
In an optional embodiment, the offline task has no lower limit on the number of occupied GPU cards. In this case, the number M1 of GPU cards that need to be added for the online task may be parsed from the resource expansion request; then, the number K1 of GPU cards releasable by the offline task is determined according to the number N of GPU cards currently occupied by the offline task and the number M1 of GPU cards to be added for the online task. For example, if N ≤ M1, the offline task may release all N GPU cards it currently occupies, that is, the number K1 of GPU cards releasable by the offline task is determined to be N; if N > M1, the offline task can release M1 GPU cards, that is, K1 is determined to be M1. Of course, in the case of N ≤ M1, the offline task may also release only part of the GPU cards it currently occupies, for example R of them, where R is a natural number and 1 ≤ R < N ≤ M1, i.e., K1 is determined to be R.
In another optional embodiment, the offline task has a lower limit on the number of occupied GPU cards, and the lower limit of resources required by the offline task is set to L1 GPU cards. In this case, the number M1 of GPU cards that need to be added for the online task may be parsed from the resource expansion request; the number K1 of GPU cards releasable by the offline task is then determined according to the total number N of GPU cards currently occupied by the offline task and the lower resource limit L1 required by the offline task, in combination with the number M1 of GPU cards to be added for the online task. Theoretically K1 ≤ M1, which ensures the execution efficiency and quality of the offline task. Of course, K1 > M1 is also applicable to the embodiment of the present application: M1 GPU cards may be selected from the K1 GPU cards to run the online task, or the online task may be run or started on all K1 GPU cards, so as to expand the capacity of the online task.
Further, the above embodiment of determining the number K1 of GPU cards releasable by the offline task according to the number N of GPU cards currently occupied by the offline task and the lower resource limit L1 required by the offline task includes: judging whether the number N of GPU cards currently occupied by the offline task is greater than the lower resource limit L1 required by the offline task; if N is greater than L1, calculating the difference (N-L1) between the number N of GPU cards and the lower resource limit L1, which represents the maximum number of GPU cards releasable by the offline task; then judging whether the difference (N-L1) is greater than or equal to M1. If (N-L1) ≥ M1, the offline task can release the M1 GPU cards that need to be added for the online task, so K1 can be determined to be M1. If (N-L1) < M1, the offline task cannot release the M1 GPU cards that need to be added for the online task: even if it releases all the GPU cards it can release, the number M1 of GPU cards to be added for the online task cannot be reached. In this case, all the GPU cards releasable by the offline task may be released, that is, K1 is determined to be the difference between the total number N of GPU cards currently occupied by the offline task and the required lower resource limit L1, i.e., (N-L1). Wherein L1 and M1 are natural numbers, L1 ≥ 1, and K1 ≤ M1.
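The K1 decision above can be sketched as follows (an illustrative sketch; setting `lower_limit` to 0 covers the earlier case where the offline task has no lower limit on card count; the function name is hypothetical):

```python
def releasable_cards(n_occupied, m1_needed, lower_limit=0):
    """K1: the offline task releases at most the cards above its lower
    resource limit L1, and no more than the M1 the online task needs."""
    surplus = max(0, n_occupied - lower_limit)  # (N - L1), the most it can give up
    return min(surplus, m1_needed)

k1 = releasable_cards(n_occupied=10, m1_needed=6, lower_limit=2)       # K1 = M1 = 6
k1_short = releasable_cards(n_occupied=5, m1_needed=6, lower_limit=2)  # K1 = N - L1 = 3
```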
In the embodiment of the present application, for the case of K1 < M1, (M1-K1) GPU cards may be allocated to the online task from the GPU cards in the resource pool that have not been allocated to any client, so that the online task successfully adds M1 GPU cards, the capacity expansion succeeds, and the service quality of the online task is guaranteed. The resource pool stores multiple GPU cards, including the multiple GPU cards allocated to the first client and multiple GPU cards not allocated to any client.
In the embodiment of the present application, the implementation of allocating (M1-K1) GPU cards from the resource pool for the online task is not limited. For example, in an alternative embodiment, if the number of GPU cards K1 that can be released by the offline task is smaller than the number of GPU cards M1 that need to be added to the online task, then GPU cards (M1-K1) are automatically allocated to the online task from the GPU cards that have not been allocated to any client from the resource pool, and this embodiment does not require the first client to participate. For another example, in another alternative embodiment, in a case where the number of GPU cards K1 that can be released by the offline task is less than the number of GPU cards M1 that need to be added for the online task, an insufficient resource notification message may be returned to the first client, the insufficient resource notification message indicating that the offline task releases K1 GPU cards and lacks (M1-K1) GPU cards compared to the number of GPU cards M1 that need to be added for the online task. The first client determines whether to continue to acquire the missing (M1-K1) GPU cards from the resource pool according to the prompt message. If the first client determines to continue to acquire the GPU card from the resource pool after receiving the prompt message, the first client can submit information for confirming to acquire the GPU card for the online task from the resource pool to the resource scheduling node. After receiving the confirmation information of the first client, the resource scheduling node allocates (M1-K1) GPU cards to the online tasks from the GPU cards which are not allocated to any client in the resource pool, so that the purpose of capacity expansion is achieved.
Optionally, in the case that the resource adjustment request is a resource capacity reduction request, the implementation of dynamically adjusting, according to the resource adjustment request, the number of GPU cards respectively occupied by the online task and the offline task in the cloud computing resources includes: controlling the online task to release at least part of the GPU cards it currently occupies, and expanding the resources of the offline task based on the GPU cards released by the online task.
Optionally, expanding the resources of the offline task based on the GPU cards released by the online task may be implemented by the resource management and control node. Specifically, when the first client submits a resource capacity reduction request to the resource scheduling node, the online task may be controlled to release the M2 GPU cards it currently occupies; then, the number K2 of GPU cards that can be added to the offline task is determined according to the upper resource limit L2 required by the offline task and the number of currently spare GPU cards, and a resource increase notification carrying K2 is sent to the resource management and control node. The currently spare GPU cards include the M2 GPU cards freed by the online task; if some GPU cards were already spare before the online task released its M2 cards, the currently spare GPU cards include both the previously spare cards and the M2 released cards. According to the resource increase notification sent by the resource scheduling node, the resource management and control node can add K2 GPU cards to the offline task from the currently spare GPU cards, thereby expanding the capacity of the offline task while reducing the capacity of the online task. Wherein K2 and M2 are natural numbers, and 1 ≤ K2 ≤ M2.
Further, the above embodiment of determining the number K2 of GPU cards that can be added to the offline task according to the upper resource limit L2 required by the offline task and the number of currently spare GPU cards includes: judging whether the number N of GPU cards currently occupied by the offline task is less than the upper resource limit L2 required by the offline task; if N is less than L2, calculating the difference (L2-N) between the upper resource limit L2 and the number N of GPU cards, which represents the number of GPU cards that would have to be added for the offline task to reach its upper limit L2; then judging whether the difference (L2-N) is greater than or equal to the number K3 of currently spare GPU cards. If (L2-N) ≥ K3, even providing all the currently spare GPU cards to the offline task would not bring the number of GPU cards it occupies up to the upper resource limit L2, so K2 can be determined to be K3; if (L2-N) < K3, K2 can be determined to be the difference (L2-N).
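Symmetrically, the K2 decision can be sketched as follows (an illustrative sketch; the function name is hypothetical):

```python
def addable_cards(n_occupied, upper_limit, spare):
    """K2: cards added to the offline task, bounded both by its upper
    resource limit L2 and by the number K3 of currently spare cards."""
    if n_occupied >= upper_limit:
        return 0  # already at or above L2: nothing to add
    return min(upper_limit - n_occupied, spare)

k2 = addable_cards(n_occupied=20, upper_limit=60, spare=30)      # (L2-N) >= K3: K2 = 30
k2_cap = addable_cards(n_occupied=20, upper_limit=30, spare=30)  # (L2-N) < K3: K2 = 10
```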
In the embodiment of the present application, in the case that the offline task releases GPU cards, whether the offline work originally running on the released GPU cards is transferred to the GPU cards that are not released for execution, or handled otherwise (for example, directly discarded), is not limited.
In the embodiment of the application, the cloud computing resource amount occupied by the online task and the offline task can be dynamically adjusted according to the difference of the online task and the offline task in the demands on the cloud computing resources, so that the cloud computing resource amount occupied by the online task in the resource demand peak period can be increased, and the cloud computing resource amount occupied by the offline task can be reduced; the cloud computing resource amount occupied by the online task in the resource demand valley period can be reduced, and the cloud computing resource amount occupied by the offline task is increased, so that the cloud computing resources are reasonably utilized, and the resource waste is reduced.
The embodiment of the application further provides a cloud resource scheduling method, which is applicable to a resource management and control node and used for scheduling the multiple GPU cards allocated to the first client in cooperation with the resource scheduling node, wherein the online tasks and the offline tasks of the first client share the multiple GPU cards allocated to the first client. Fig. 3a is a flowchart of another cloud resource scheduling method taking a GPU card as an example according to the embodiment of the present application, and as shown in fig. 3a, the method includes the following steps:
31a, receiving a resource release notification sent by the resource scheduling node, wherein the resource release notification includes a GPU card number K1 that the offline task can release.
32a, according to the resource release notice, selecting K1 GPU cards from the N GPU cards currently occupied by the offline task.
33a, controlling the K1 GPU cards to finish off-line tasks so as to release K1 GPU cards; wherein K1 and N are natural numbers, and K1 is more than or equal to 1 and less than or equal to N.
In this embodiment of the application, after the resource scheduling node receives the resource expansion request submitted by the first client, it may determine, according to the resource expansion request, the number K1 of GPU cards releasable by the offline task, and notify the resource management and control node, so that the resource management and control node controls the offline task to release K1 GPU cards out of the N GPU cards it currently occupies. For the detailed process of how the resource scheduling node determines K1 according to the resource expansion request, reference may be made to the foregoing embodiments, which is not repeated here. After receiving the notification, the resource management and control node may select K1 GPU cards from the N GPU cards currently occupied by the offline task, stop the offline task currently executing on these K1 GPU cards, and deploy the online task on the K1 GPU cards, so as to expand the capacity of the online task.
In this embodiment of the application, the manner in which K1 GPU cards are selected from the N GPU cards occupied by the offline task is not limited, and any manner capable of selecting K1 GPU cards is applicable. The following examples illustrate:
For example, a random algorithm may be used to randomly select K1 GPU cards from the N GPU cards currently occupied by the offline task. The random algorithm may be a hash algorithm: a hash function is applied to identification information such as the numbers of the N GPU cards to obtain hash values, and the K1 GPU cards with the smallest hash values are selected; alternatively, the K1 GPU cards with the largest hash values may be selected.
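A minimal sketch of this hash-based selection (illustrative only; MD5 stands in for whatever hash function an implementation uses, and the card identifiers are hypothetical):

```python
import hashlib

def select_by_hash(card_ids, k1, largest=False):
    """Hash each card's identifier and pick the K1 cards with the
    smallest (or largest) hash values."""
    def h(card_id):
        return int(hashlib.md5(str(card_id).encode()).hexdigest(), 16)
    return sorted(card_ids, key=h, reverse=largest)[:k1]

chosen = select_by_hash(["gpu-0", "gpu-1", "gpu-2", "gpu-3"], k1=2)
```

Because the hash of a fixed identifier is deterministic, repeated calls over the same card set pick the same cards, while the choice is spread pseudo-randomly across the cards.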
For another example, the K1 GPU cards closest to the source of the application traffic to be processed may be selected from the N GPU cards currently occupied by the offline task according to the source of the application traffic to be processed and the geographic locations of the N GPU cards currently occupied by the offline task. The application traffic to be processed refers to application traffic that causes or triggers resource expansion for the online task. For example, in a live application scenario, assuming that the number of live users in city a is suddenly increased, in order to have enough beauty or special effects tasks to beautify or special effects on video content of these live users, K1 GPU cards need to be added to run the beauty or special effects tasks, and in this case, K1 GPU cards closest to city a may be selected from the N GPU cards currently occupied by the offline tasks according to their physical positions. For example, a GPU card located within city a may be preferentially selected; if there are not enough K1 GPU cards located in city A, then the GPU card in city B that is closest to city A may be selected, and so on until K1 GPU cards are selected. It should be noted that the GPU cards in the cloud resource scheduling system 100 may be distributed in different geographic locations, and of course, may be centralized in a certain geographic range.
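The proximity-based selection can be sketched as follows (illustrative only; a real deployment would more likely compare region or data-center tags than raw coordinates, and the names are hypothetical):

```python
def select_nearest(cards, source_location, k1):
    """Pick the K1 cards geographically closest to the source of the
    application traffic; `cards` maps card id -> (x, y) coordinates."""
    def dist2(item):
        (cx, cy), (sx, sy) = item[1], source_location
        return (cx - sx) ** 2 + (cy - sy) ** 2  # squared Euclidean distance
    return [cid for cid, _ in sorted(cards.items(), key=dist2)[:k1]]

# Cards in "city A" (a1, a2) are preferred over cards in "city B" (b1, b2)
cards = {"a1": (0, 0), "a2": (1, 0), "b1": (10, 10), "b2": (12, 9)}
nearest = select_nearest(cards, source_location=(0, 1), k1=2)  # ["a1", "a2"]
```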
In the embodiment of the application, when receiving a resource capacity reduction request submitted by the first client, the resource scheduling node may determine, according to the resource capacity reduction request, the number K2 of GPU cards that can be added to the offline task, and send a resource increase notification to the resource management and control node, carrying K2. On this basis, the resource management and control node receives the resource increase notification and, according to it, adds K2 GPU cards to the offline task from the currently spare GPU cards. The currently spare GPU cards include the M2 GPU cards released by the online task; M2 and K2 are natural numbers, 1 ≤ K2 ≤ M2, K2+N ≤ L2, and N is the number of GPU cards currently occupied by the offline task.
Optionally, one embodiment of adding K2 GPU cards to the offline task from the currently spare GPU cards includes: selecting K2 GPU cards from the currently spare GPU cards and deploying the offline task on these K2 GPU cards, thereby expanding the capacity of the offline task. Alternatively, another embodiment of adding K2 GPU cards to the offline task from the currently spare GPU cards includes: in the case that the number (K2+N) of GPU cards is not greater than the number of currently spare GPU cards, selecting (K2+N) GPU cards from the currently spare GPU cards, redeploying the offline task on the selected (K2+N) GPU cards, and controlling the N GPU cards previously occupied by the offline task to end the offline task.
In an alternative embodiment, the offline task is a model training task and the online task is a model inference task. When the offline task is a model training task, the total number of samples T, the number N of GPU cards executing the task, the batch size bs, and the number of training steps sp satisfy the relation T = N × bs × sp. Therefore, when controlling the model training task to release K1 GPU cards, the original batch size can be kept unchanged, and the increased step number sp_1 can be calculated from the number of GPU cards (N-K1) remaining to the offline task after the release and the total sample count T of the model training task; the (N-K1) GPU cards still occupied by the model training task are then controlled to continue executing it with the original batch size bs and the increased step number sp_1. The gradient is synchronized among the (N-K1) GPU cards once every sp_1 training steps executed. Here sp_1 = T/((N-K1) × bs).
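The step-number recalculation above can be sketched in Python; the function name and the assumption that T divides evenly are illustrative, not part of the patent:

```python
def increased_steps(total_samples, n, k1, batch_size):
    """After releasing K1 of N cards, keep batch size bs fixed and
    recompute the step count: sp_1 = T / ((N - K1) * bs)."""
    remaining_cards = n - k1
    if remaining_cards < 1:
        raise ValueError("the offline task must keep at least one card")
    # Integer division assumes T is a multiple of (N - K1) * bs.
    return total_samples // (remaining_cards * batch_size)

# With T = 1024 samples on N = 8 cards at bs = 32, sp = 4; releasing
# K1 = 4 cards doubles the per-card step count to sp_1 = 8.
```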
Corresponding to the case in which the model training task releases GPU cards, when GPU cards are added to the model training task its batch size can likewise be kept unchanged while the number of training steps executed per gradient synchronization is changed, so that the number of synchronization communications stays as close to unchanged as possible and changes to the execution logic of the model training task are minimized. Taking the case in which the resource capacity reduction request adds K2 GPU cards to the model training task as an example: to keep the batch size unchanged while adding the K2 GPU cards, it may first be determined whether the current training step number sp of the model training task is greater than 1. If sp is greater than 1, the model training task is determined to have been scaled down previously; to keep its batch size unchanged, the reduced step number sp_2 is calculated from the number of GPU cards (K2+N) occupied after the addition and the total sample count T of the model training task, and the (K2+N) GPU cards that finally execute the model training task are controlled to execute it with the original batch size and the reduced step number sp_2.
Further, for the case in which GPU cards are added to the model training task, if the current training step number sp is judged to be 1, the step number cannot be reduced any further; instead, the batch size can be reduced so that the model training task is distributed evenly across the GPU cards, ensuring that it still executes successfully.
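The two expansion cases above (reduce the step count when sp > 1, otherwise reduce the batch size) can be sketched as follows; the function name and the even-division assumption are illustrative:

```python
def rebalance_on_expand(total_samples, n, k2, batch_size, current_sp):
    """Return (batch_size, steps) for the model training task after
    K2 cards are added to its current N, keeping T = cards * bs * sp."""
    new_cards = k2 + n
    if current_sp > 1:
        # Batch size unchanged; shrink the step count instead:
        # sp_2 = T / ((K2 + N) * bs).
        return batch_size, total_samples // (new_cards * batch_size)
    # Step count already 1: shrink the batch size so the work still
    # spreads evenly over all (K2 + N) cards.
    return total_samples // (new_cards * current_sp), current_sp
```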
According to the cloud resource scheduling method described above, through the cooperation of the resource management and control node and the resource scheduling node, the number of GPU cards occupied by the offline task can be adjusted dynamically according to the online task's demand for GPU cards, which improves resource utilization and reduces resource waste. In the case that the offline task is a model training task, when GPU cards are released by or added to it, the batch size of the model training task is kept unchanged by increasing or decreasing its step number, which in turn keeps the number of synchronization communications unchanged and saves communication resources.
In the above method embodiment, the cloud resource scheduling method is described by taking the GPU card as an example, but it is not limited to GPU cards. The cloud resource scheduling method provided by the embodiment of the application is applicable to various cloud computing resources, such as CPUs, GPUs, FPGA boards, ECS (Elastic Compute Service) instances, and EGS (Elastic GPU Service) instances. As shown in fig. 3b, a cloud resource scheduling method described from the perspective of the resource scheduling node includes:
31b, receiving a resource adjustment request submitted by the first client, wherein the online task and the offline task of the first client share the cloud computing resources allocated to the first client.

And 32b, dynamically adjusting, according to the resource adjustment request, the amounts of resources respectively occupied by the online task and the offline task in the cloud computing resources of the first client.
In an optional embodiment, when the resource adjustment request is a resource expansion request, the offline task is controlled to release at least part of the cloud computing resources it currently occupies, and the online task is run on the resources released by the offline task, thereby expanding resources for the online task.
In an optional embodiment, when the resource adjustment request is a resource capacity reduction request, the online task is controlled to release at least part of currently occupied cloud computing resources, and resource capacity expansion is performed for the offline task based on the cloud computing resources released by the online task.
Further optionally, when controlling the offline task to release at least part of the cloud computing resources it currently occupies, the resource scheduling node may determine the amount of cloud computing resources releasable by the offline task and send a resource release notification, containing that amount, to the resource management and control node responsible for managing the cloud computing resources occupied by the offline task. Correspondingly, as shown in fig. 3c, the process by which the resource management and control node controls the offline task to release resources includes the following steps:
31c, receiving a resource release notification sent by the resource scheduling node, wherein the resource release notification includes the amount of cloud computing resources that can be released by the offline task.
And 32c, selecting the cloud computing resources which can be released from the cloud computing resources currently occupied by the offline task according to the cloud computing resource amount contained in the resource release notification.
And 33c, controlling the selected cloud computing resources to finish off-line tasks so as to release the selected cloud computing resources.
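Steps 31c-33c can be sketched generically; the function name and the first-fit selection policy are illustrative assumptions, not part of the patent:

```python
def release_resources(occupied, amount, finish_task):
    """31c: `amount` comes from the resource release notification.
    32c: select releasable resources from those the offline task occupies.
    33c: let the offline task finish on them, then report them as freed."""
    selected = occupied[:amount]  # selection policy is pluggable
    for resource in selected:
        finish_task(resource)     # e.g. drain the offline task gracefully
    return selected
```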
In an alternative embodiment, the cloud computing resources allocated to the first client are multiple GPU cards. In that case, for the detailed implementation of the above operations, reference may be made to the description of the foregoing embodiments, which is not repeated here. Optionally, the cloud computing resources allocated to the first client may also be CPUs, FPGA boards, ECS instances, or EGS instances; for the detailed implementation of the corresponding operations, reference may likewise be made to the foregoing embodiments that take the GPU card as an example, which is not repeated here.
It should be noted that the execution subjects of the steps of the methods provided in the above embodiments may be the same device, or different devices may be used as the execution subjects of the methods. For example, the execution subjects of steps 31a to 33a may be device a; for another example, the execution subject of steps 31a and 32a may be device a, and the execution subject of step 33a may be device B; and so on.
In addition, in some of the flows described in the above embodiments and the drawings, a plurality of operations are included in a specific order, but it should be clearly understood that the operations may be executed out of the order presented herein or in parallel, and the sequence numbers of the operations, such as 31a, 32a, etc., are merely used for distinguishing various operations, and the sequence numbers themselves do not represent any execution order. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that, the descriptions of "first", "second", etc. in this document are used for distinguishing different messages, devices, modules, etc., and do not represent a sequential order, nor limit the types of "first" and "second" to be different.
Fig. 4 is a schematic structural diagram of a resource scheduling node according to an exemplary embodiment of the present application, where the resource scheduling node is configured to dynamically adjust resources used by an online task and an offline task. As shown in fig. 4, the resource scheduling node includes: a processor 401, a memory 402, and a communication component 403.
The memory 402 is configured to store a computer program and may further store various other data to support operations on the resource scheduling node. Examples of such data include instructions, data, messages, pictures, and videos for any application or method operating on the resource scheduling node.
The processor 401 is coupled to the memory 402 and executes the computer program stored therein to: receive a resource adjustment request submitted by the first client, wherein the online task and the offline task of the first client share the multiple GPU cards allocated to the first client; and dynamically adjust, according to the resource adjustment request, the numbers of GPU cards respectively occupied by the online task and the offline task among the multiple GPU cards.
In some optional embodiments, when the resource adjustment request is a resource expansion request and the processor 401 dynamically adjusts the numbers of GPU cards respectively occupied by the online task and the offline task among the multiple GPU cards, the processor is specifically configured to: control the offline task to release at least part of the GPU cards it currently occupies, and run the online task on the GPU cards released by the offline task, thereby expanding resources for the online task.
In some optional embodiments, when the resource adjustment request is a resource capacity reduction request and the processor 401 dynamically adjusts the numbers of GPU cards respectively occupied by the online task and the offline task in the cloud computing resources, the processor is specifically configured to: control the online task to release at least part of the GPU cards it currently occupies, and perform resource expansion for the offline task based on the GPU cards released by the online task.
In some optional embodiments, when controlling the offline task to release at least part of the GPU cards it currently occupies, the processor 401 is specifically configured to: determine, according to the resource capacity expansion request, the number K1 of GPU cards releasable by the offline task, and send a resource release notification carrying K1 to the resource management and control node, so that the resource management and control node can release K1 GPU cards from the N GPU cards currently occupied by the offline task according to the notification; wherein K1 and N are natural numbers, and 1 ≤ K1 ≤ N.
In some optional embodiments, when determining the number K1 of GPU cards, the processor 401 is specifically configured to: parse, from the resource capacity expansion request, the number M1 of GPU cards that need to be added for the online task; and determine the number K1 of GPU cards releasable by the offline task according to the number N of GPU cards currently occupied by the offline task and the lower limit L1 of resources required by the offline task; wherein M1 and L1 are natural numbers, L1 ≥ 1, and K1 ≤ M1.
In some optional embodiments, the processor 401 is further configured to: in the case that the number N of GPU cards currently occupied by the offline task is greater than the lower limit L1 of resources required by the offline task, calculate the difference (N-L1) between the number N of GPU cards and the resource lower limit L1; when the difference (N-L1) is greater than or equal to M1, determine that the number K1 of GPU cards releasable by the offline task is M1; and when the difference (N-L1) is smaller than M1, determine that K1 is the difference (N-L1).
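The K1 determination described above reduces to K1 = min(M1, N − L1) when N > L1; a minimal sketch (function name illustrative):

```python
def releasable_cards(n, l1, m1):
    """N: cards the offline task holds; L1: its resource lower limit;
    M1: cards the online task needs. Returns the releasable count K1."""
    if n <= l1:
        return 0          # the offline task is already at its lower limit
    return min(n - l1, m1)
```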
In some optional embodiments, the processor 401 is further configured to: when K1 is smaller than M1, allocate (M1-K1) GPU cards to the online task from the GPU cards in the resource pool that have not been allocated to any client.
In some optional embodiments, when allocating (M1-K1) GPU cards to the online task, the processor 401 is specifically configured to: return to the first client a prompt indicating insufficient resources, the prompt carrying the number (M1-K1) of missing GPU cards, so that the first client can decide whether to continue applying for GPU cards from the resource pool; and after receiving a request submitted by the first client to continue applying for GPU cards from the resource pool, allocate (M1-K1) GPU cards to the online task from the GPU cards in the resource pool that have not been allocated to any client.
In some optional embodiments, when performing resource expansion for the offline task based on the GPU cards released by the online task, the processor 401 is specifically configured to: determine the number K2 of GPU cards that can be added to the offline task according to the upper limit L2 of resources required by the offline task and the number of currently vacant GPU cards, and send a resource increase notification carrying K2 to the resource management and control node, so that the resource management and control node can add K2 GPU cards to the offline task according to the notification. The currently vacant GPU cards include the M2 GPU cards released by the online task; M2 and K2 are natural numbers, 1 ≤ K2 ≤ M2, K2+N ≤ L2, and N is the number of GPU cards currently occupied by the offline task.
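Symmetrically, the K2 determination is bounded by both the freed cards M2 and the offline task's resource upper limit L2, i.e. K2 = min(M2, L2 − N); a sketch with illustrative naming:

```python
def addable_cards(m2, n, l2):
    """M2: cards freed by the online task; N: cards the offline task
    holds; L2: its resource upper limit. Returns the addable count K2."""
    return max(0, min(m2, l2 - n))
```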
In some alternative embodiments, the offline task is a model training task; accordingly, the online task is a model inference task. For the description of the model training task and the model inference task, reference may be made to the foregoing embodiments, which are not described herein again.
It should be noted that the resource scheduling node of this embodiment is not limited to the GPU card as a resource form; it can apply the same processing to various cloud computing resources, such as CPUs, FPGA boards, ECS instances, and EGS instances. That is, the processor 401 may be configured to: receive a resource adjustment request submitted by the first client, wherein the online task and the offline task of the first client share the cloud computing resources allocated to the first client; and dynamically adjust, according to the resource adjustment request, the amounts of resources respectively occupied by the online task and the offline task in the cloud computing resources allocated to the first client.
Further, as shown in fig. 4, the resource scheduling node also includes a display 404, a power component 405, an audio component 406, and other components. Only some components are schematically shown in fig. 4, which does not mean that the resource scheduling node includes only the components shown there. In addition, the components within the dashed box in fig. 4 are optional rather than required, depending on the product form of the resource scheduling node. The resource scheduling node of this embodiment may be implemented as a terminal device such as a desktop computer, a notebook computer, a smartphone, or an IoT device, or as a server device such as a conventional server, a cloud server, or a server array. If implemented as a terminal device such as a desktop computer, notebook computer, or smartphone, the resource scheduling node may include the components within the dashed box in fig. 4; if implemented as a server device such as a conventional server, cloud server, or server array, it may omit them.
Accordingly, the present application further provides a computer-readable storage medium storing a computer program, where the computer program can implement the steps in the method embodiment shown in fig. 2 or fig. 3b when executed.
The embodiment of the present application further provides a resource management and control node, which has the same or similar structure as the resource scheduling node shown in fig. 4; for its internal structure, reference may be made to the embodiment shown in fig. 4. The resource management and control node of this embodiment differs from the resource scheduling node shown in fig. 4 in the functions performed by its processor when executing the computer program stored in the memory. The resource management and control node of this embodiment is used to schedule the multiple GPU cards allocated to the first client, which are shared by the online task and the offline task of the first client; its processor executes the computer program in the memory to: receive a resource release notification sent by the resource scheduling node, wherein the notification includes the number K1 of GPU cards releasable by the offline task; select, according to the notification, K1 GPU cards from the N GPU cards currently occupied by the offline task; and control the K1 GPU cards to finish the offline task so as to release them; wherein K1 and N are natural numbers, and 1 ≤ K1 ≤ N.
In some optional embodiments, where the offline task is a model training task, the processor is configured to: with the aim of keeping the batch size of the model training task unchanged, calculate the increased number of training steps according to the number of GPU cards (N-K1) and the total number of samples of the model training task; and control the (N-K1) GPU cards still occupied by the model training task to continue executing it with the original batch size and the increased number of training steps; wherein the gradient is synchronized among the (N-K1) GPU cards once each time the increased number of training steps has been executed.
In some optional embodiments, when selecting K1 GPU cards, the processor is specifically configured to: selecting K1 GPU cards closest to the source of the application traffic to be processed from N GPU cards occupied by the offline task currently; or randomly selecting K1 GPU cards from the N GPU cards currently occupied by the offline task.
In some optional embodiments, the processor is further configured to: receive a resource increase notification sent by the resource scheduling node, wherein the notification carries the number K2 of GPU cards that can be added to the offline task and is sent by the resource scheduling node after the online task has released M2 GPU cards; and add K2 GPU cards to the offline task from the currently vacant GPU cards according to the notification. The currently vacant GPU cards include the M2 GPU cards released by the online task; M2 and K2 are natural numbers, 1 ≤ K2 ≤ M2, K2+N ≤ L2, and N is the number of GPU cards currently occupied by the offline task.
In some optional embodiments, when adding K2 GPU cards to the offline task based on the currently vacant GPU cards, the processor is specifically configured to: select K2 GPU cards from the currently vacant GPU cards and deploy the offline task on those K2 GPU cards; or, in the case that (K2+N) is smaller than the number of currently vacant GPU cards, select (K2+N) GPU cards from the currently vacant GPU cards, redeploy the offline task on the selected (K2+N) GPU cards, and control the N GPU cards previously occupied by the offline task to finish the offline task.
In some optional embodiments, where the offline task is a model training task, the processor is configured to: judge whether the current number of training steps of the model training task is greater than 1; if it is greater than 1, keep the batch size of the model training task unchanged while adding K2 GPU cards, calculate the reduced number of training steps according to the number of GPU cards (K2+N) and the total number of samples of the model training task, and control the (K2+N) GPU cards that finally execute the model training task to execute it with the original batch size and the reduced number of training steps.
In some optional embodiments, the resource management and control node provided in this embodiment may be deployed on one or more GPU cards occupied by the offline task, and the GPU cards on which the resource management and control node is deployed may not be released.
It should be noted that the resource management and control node of this embodiment is likewise not limited to the GPU card as a resource form; it can apply the same processing to various cloud computing resources, such as CPUs, FPGA boards, ECS instances, and EGS instances. That is, the processor in the resource management and control node may be configured to: receive a resource release notification sent by the resource scheduling node, wherein the notification contains the amount of cloud computing resources releasable by the offline task; select, according to that amount, the releasable cloud computing resources from those currently occupied by the offline task; and control the selected cloud computing resources to finish the offline task so as to release them.
Accordingly, the present application further provides a computer-readable storage medium storing a computer program, where the computer program can implement the steps in the method embodiment shown in fig. 3a when executed.
The communication component in fig. 4 described above is configured to facilitate wired or wireless communication between the device in which it is located and other devices. That device can access a wireless network based on a communication standard, such as WiFi, a 2G, 3G, 4G/LTE, or 5G mobile communication network, or a combination thereof. In an exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component may further include a Near Field Communication (NFC) module, which may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and the like.
The display in fig. 4 described above includes a screen, which may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
The power supply assembly of fig. 4 described above provides power to the various components of the device in which the power supply assembly is located. The power components may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device in which the power component is located.
The audio component in fig. 4 described above may be configured to output and/or input audio signals. For example, the audio component includes a microphone (MIC) configured to receive external audio signals when the device in which the audio component is located is in an operational mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signal may further be stored in the memory or transmitted via the communication component. In some embodiments, the audio component further includes a speaker for outputting audio signals.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include forms of volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (37)

1. A cloud resource scheduling system, comprising: a resource scheduling node and a plurality of GPU cards already allocated to a first customer, wherein the GPU cards can be shared by an online task and an offline task of the first customer;
the resource scheduling node is used for receiving a resource adjustment request submitted by a first client; and dynamically adjusting the number of GPU cards respectively occupied by the online tasks and the offline tasks in the multiple GPU cards according to the resource adjustment request.
2. The system according to claim 1, wherein the resource scheduling node is specifically configured to:
and under the condition that the resource adjustment request is a resource expansion request, controlling the offline task to release at least part of the currently occupied GPU cards, and operating the online task on the GPU cards released by the offline task to expand the resources of the online task.
3. The system of claim 2, wherein the resource scheduling node is further configured to:
and under the condition that the resource adjustment request is a resource capacity reduction request, controlling the online task to release at least part of the currently occupied GPU cards, and performing resource capacity expansion for the offline task based on the GPU cards released by the online task.
4. The system of claim 3, further comprising: a resource management and control node for controlling the GPU cards occupied by the offline task;
when controlling the offline task to release at least part of the currently occupied GPU cards, the resource scheduling node is specifically configured to: determine, according to the resource expansion request, the number K1 of GPU cards releasable by the offline task, and send a resource release notification to the resource management and control node, wherein the resource release notification carries the number K1;
the resource management and control node is specifically configured to: select, according to the resource release notification, K1 GPU cards from the N GPU cards currently occupied by the offline task, and control the K1 GPU cards to end the offline task so as to release the K1 GPU cards; wherein K1 and N are natural numbers, and 1 ≤ K1 ≤ N.
5. The system according to claim 4, wherein, when determining the GPU card number K1, the resource scheduling node is specifically configured to:
parsing, from the resource expansion request, the number M1 of GPU cards to be added for the online task; and
determining the number K1 of GPU cards releasable by the offline task according to the number N of GPU cards currently occupied by the offline task and the lower resource limit L1 required by the offline task; wherein M1 and L1 are natural numbers, L1 ≥ 1, and K1 ≤ M1.
6. The system of claim 5, wherein the resource scheduling node is further to:
under the condition that the number N of GPU cards currently occupied by the offline task is greater than the lower resource limit L1 required by the offline task, calculating the difference (N-L1) between the number N of GPU cards and the lower resource limit L1; and
when the difference (N-L1) is greater than or equal to M1, determining that the number K1 of GPU cards releasable by the offline task is M1; when the difference (N-L1) is less than M1, determining that the number K1 of GPU cards releasable by the offline task is the difference (N-L1).
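Claims 5 and 6 together amount to clamping the release count by the offline task's surplus over its lower resource limit. A minimal sketch, assuming the function name and the behavior when N ≤ L1 (returning 0), which claim 6 leaves implicit:

```python
def releasable_cards(n_offline, lower_limit, m_needed):
    """Number K1 of GPU cards the offline task may release (claims 5-6).

    n_offline   -- N, cards currently occupied by the offline task
    lower_limit -- L1, minimum cards the offline task must retain
    m_needed    -- M1, cards requested for the online task
    """
    if n_offline <= lower_limit:
        return 0  # no surplus to release (case not spelled out in claim 6)
    surplus = n_offline - lower_limit          # the difference (N - L1)
    return m_needed if surplus >= m_needed else surplus
```

Equivalently, K1 = min(M1, max(N - L1, 0)).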
7. The system of claim 6, further comprising: a resource pool including GPU cards that have not been allocated to any client;
the resource scheduling node is further configured to: when the GPU card number K1< M1, allocating (M1-K1) GPU cards for the online task from GPU cards in the resource pool which are not allocated to any client.
8. The system of claim 7, wherein the resource scheduling node is specifically configured to:
returning, to the first client, a prompt indicating insufficient resources, wherein the prompt carries the number (M1-K1) of missing GPU cards, so that the first client can determine whether to continue applying for GPU cards from the resource pool; and
after receiving a request submitted by the first client to continue applying for GPU cards from the resource pool, allocating (M1-K1) GPU cards to the online task from the GPU cards in the resource pool that have not been allocated to any client.
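Claims 7 and 8 describe topping up from the shared resource pool when the offline task cannot release enough cards. A hedged sketch; `client_confirms` stands in for the customer's answer to the insufficient-resources prompt and is an assumed interface:

```python
def allocate_with_pool(m_needed, k_released, pool_free, client_confirms):
    """Return (cards_taken_from_offline, cards_taken_from_pool).

    If the offline task released fewer than M1 cards, prompt the client
    with the shortage (M1 - K1) and, on confirmation, draw the remainder
    from the pool of cards not allocated to any client (claims 7-8).
    """
    shortage = m_needed - k_released
    if shortage <= 0:
        return m_needed, 0                     # offline release sufficed
    if pool_free >= shortage and client_confirms(shortage):
        return k_released, shortage            # top up from the pool
    return k_released, 0                       # declined, or pool too small
```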
9. The system of claim 4, wherein the offline task is a model training task;
the resource management and control node is further configured to: keep the batch size of the model training task unchanged, calculate the increased number of training steps of the model training task according to the number (N-K1) of GPU cards and the total number of samples of the model training task, and control the (N-K1) GPU cards still occupied by the model training task to continue executing the model training task with the original batch size and the increased number of training steps; wherein, while the model training executes the increased number of training steps, gradients are synchronized among the (N-K1) GPU cards.
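Claim 9 keeps the batch size fixed and recomputes the step count for the surviving (N-K1) cards. Assuming conventional data-parallel training in which each card consumes one batch per step, the steps needed to cover the sample set once grow in inverse proportion to the card count; the formula below is that reading, not text from the patent:

```python
import math

def steps_per_epoch(total_samples, batch_size, num_cards):
    """Steps needed to cover every sample once when each of `num_cards`
    GPUs processes `batch_size` samples per step; in claim 9's scheme,
    gradients would be synchronized across the cards as training runs."""
    return math.ceil(total_samples / (batch_size * num_cards))
```

For example, shrinking from 8 to 4 cards with batch size 32 and 51,200 samples raises the per-epoch step count from 200 to 400.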
10. The system according to claim 4, wherein the resource management and control node, when selecting K1 GPU cards, is specifically configured to:
selecting K1 GPU cards closest to the source of the application traffic to be processed from the N GPU cards currently occupied by the offline task; or
randomly selecting K1 GPU cards from the N GPU cards currently occupied by the offline task.
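Claim 10's two selection strategies can be sketched as follows. The distance metric for "closest to the source of the application traffic" is not defined by the patent, so `distance_to_source` is an assumed callable (e.g. network hops to the traffic source):

```python
import random

def pick_cards(cards, k, distance_to_source=None):
    """Choose K1 cards to release: nearest to the pending traffic's
    source when a distance metric is available, otherwise at random."""
    if distance_to_source is not None:
        return sorted(cards, key=distance_to_source)[:k]
    return random.sample(cards, k)
```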
11. The system of claim 4, wherein the resource management and control node is deployed on one or more of the GPU cards occupied by the offline task, and the GPU cards on which the resource management and control node is deployed cannot be released.
12. The system according to any one of claims 4 to 11, wherein the resource scheduling node, when performing resource expansion for the offline task, is specifically configured to:
determining the number K2 of GPU cards that can be added for the offline task according to the upper resource limit L2 required by the offline task and the number of currently vacant GPU cards, and sending a resource increase notification to the resource management and control node, wherein the resource increase notification carries the number K2;
the resource management and control node is further configured to: add, according to the resource increase notification, K2 GPU cards for the offline task based on the currently vacant GPU cards; wherein the currently vacant GPU cards include the M2 GPU cards released by the online task; M2 and K2 are natural numbers, 1 ≤ K2 ≤ M2, K2+N ≤ L2, and N is the number of GPU cards currently occupied by the offline task.
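Claim 12 bounds the cards added to the offline task by both the freed capacity and the task's ceiling. A minimal sketch, assuming the intended constraint is K2 + N ≤ L2 (L2 being described as an upper limit) and clamping at zero when the task is already at or above its ceiling:

```python
def addable_cards(m_freed, n_offline, upper_limit):
    """Number K2 of GPU cards the offline task may gain (claim 12):
    at most the M2 cards the online task released, and never pushing
    the task past its upper resource limit L2."""
    return max(0, min(m_freed, upper_limit - n_offline))
```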
13. The system according to claim 12, wherein when adding K2 GPU cards to the offline task, the resource management and control node is specifically configured to:
selecting K2 GPU cards from the currently vacant GPU cards, and deploying the offline task on the K2 GPU cards;
or
in the case that the number (K2+N) is less than or equal to the number of currently vacant GPU cards, selecting (K2+N) GPU cards from the currently vacant GPU cards, redeploying the offline task on the selected (K2+N) GPU cards, and controlling the N GPU cards previously occupied by the offline task to end the offline task.
14. The system of claim 13, wherein the offline task is a model training task; the resource management and control node is further configured to:
judging whether the current number of training steps of the model training task is greater than 1; and
if so, keeping the batch size of the model training task unchanged while adding the K2 GPU cards to the model training task, calculating the reduced number of training steps of the model training task according to the number (K2+N) of GPU cards and the total number of samples of the model training task, and controlling the (K2+N) GPU cards that finally execute the model training task to execute the model training task with the original batch size and the reduced number of training steps.
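Claims 13-14's expansion path, with the step-count guard and the batch size held fixed, can be sketched as below. Function and parameter names are illustrative, and the same data-parallel reading of "training steps" as in claim 9 is assumed:

```python
import math

def expand_training(total_samples, batch_size, n_current, k_added, steps_left):
    """If more than one training step remains, add k_added cards and
    return (new_card_count, reduced_steps_per_epoch); otherwise leave
    the card count unchanged (claim 14 acts only when steps > 1)."""
    if steps_left <= 1:
        return n_current, None
    cards = n_current + k_added
    return cards, math.ceil(total_samples / (batch_size * cards))
```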
15. A cloud resource scheduling system, comprising: a resource scheduling node and cloud computing resources already allocated to a first client, the cloud computing resources being shareable by an online task and an offline task of the first client;
the resource scheduling node is used for receiving a resource adjustment request submitted by the first client, and dynamically adjusting, according to the resource adjustment request, the amounts of resources respectively occupied by the online task and the offline task in the cloud computing resources of the first client.
16. The system according to claim 15, wherein the resource scheduling node is specifically configured to:
under the condition that the resource adjustment request is a resource expansion request, controlling the offline task to release at least part of its currently occupied cloud computing resources, and running the online task on the cloud computing resources released by the offline task so as to expand the resources of the online task.
17. The system of claim 15 or 16, wherein the resource scheduling node is further configured to:
under the condition that the resource adjustment request is a resource capacity reduction request, controlling the online task to release at least part of its currently occupied cloud computing resources, and expanding the resources of the offline task based on the cloud computing resources released by the online task.
18. A cloud resource scheduling method is suitable for a resource scheduling node, and comprises the following steps:
receiving a resource adjustment request submitted by a first client, wherein an online task and an offline task of the first client share a plurality of GPU cards allocated to the first client; and
and dynamically adjusting the number of GPU cards respectively occupied by the online tasks and the offline tasks in the multiple GPU cards according to the resource adjustment request.
19. The method of claim 18, wherein dynamically adjusting the number of GPU cards respectively occupied by the online task and the offline task in the plurality of GPU cards according to the resource adjustment request comprises:
under the condition that the resource adjustment request is a resource expansion request, controlling the offline task to release at least part of the currently occupied GPU cards; and
running the online task on the GPU cards released by the offline task so as to expand the resources of the online task.
20. The method of claim 19, wherein controlling the offline task to release at least a portion of the GPU cards currently occupied comprises:
determining, according to the resource expansion request, the number K1 of GPU cards releasable by the offline task; and
sending a resource release notification to a resource management and control node, wherein the resource release notification carries the number K1, so that the resource management and control node releases K1 of the GPU cards currently occupied by the offline task.
21. The method of claim 20, wherein determining the releasable GPU card number K1 of the offline task according to the resource expansion request comprises:
parsing, from the resource expansion request, the number M1 of GPU cards to be added for the online task; and
determining the number K1 of GPU cards releasable by the offline task according to the number N of GPU cards currently occupied by the offline task and the lower resource limit L1 required by the offline task; wherein M1 and L1 are natural numbers, L1 ≥ 1, and K1 ≤ M1.
22. The method of claim 21, wherein determining the releasable GPU card number K1 of the offline task according to the currently occupied GPU card number N of the offline task and the lower limit of resources L1 required by the offline task comprises:
under the condition that the number N of GPU cards currently occupied by the offline task is greater than the lower resource limit L1 required by the offline task, calculating the difference (N-L1) between the number N of GPU cards and the lower resource limit L1; and
when the difference (N-L1) is greater than or equal to M1, determining that the number K1 of GPU cards releasable by the offline task is M1;
when the difference (N-L1) is less than M1, determining that the number K1 of GPU cards releasable by the offline task is the difference (N-L1).
23. The method of claim 22, further comprising:
when the number K1 of releasable GPU cards satisfies K1 < M1, allocating (M1-K1) GPU cards to the online task from a resource pool, wherein the resource pool includes GPU cards that have not been allocated to any client.
24. The method of claim 23, wherein allocating (M1-K1) GPU cards from a resource pool for the online task comprises:
returning, to the first client, a prompt indicating insufficient resources, wherein the prompt carries the number (M1-K1) of missing GPU cards, so that the first client can determine whether to continue applying for GPU cards from the resource pool; and
after receiving a request submitted by the first client to continue applying for GPU cards from the resource pool, allocating (M1-K1) GPU cards to the online task from the resource pool.
25. The method according to any of claims 18-24, wherein dynamically adjusting the number of GPU cards respectively occupied by the online task and the offline task in the plurality of GPU cards according to the resource adjustment request further comprises:
under the condition that the resource adjustment request is a resource capacity reduction request, controlling the online task to release at least part of the currently occupied GPU cards; and
expanding the resources of the offline task based on the GPU cards released by the online task.
26. The method of claim 25, wherein expanding the resources of the offline task based on the GPU cards released by the online task comprises:
determining the number K2 of GPU cards that can be added for the offline task according to the upper resource limit L2 required by the offline task and the number of currently vacant GPU cards; and
sending a resource increase notification to a resource management and control node, wherein the resource increase notification carries the number K2, so that the resource management and control node adds K2 GPU cards for the offline task;
wherein the currently vacant GPU cards include the M2 GPU cards released by the online task; M2 and K2 are natural numbers, 1 ≤ K2 ≤ M2, K2+N ≤ L2, and N is the number of GPU cards currently occupied by the offline task.
27. A cloud resource scheduling method, applicable to a resource management and control node for scheduling a plurality of GPU cards allocated to a first client, wherein an online task and an offline task of the first client share the plurality of GPU cards; the method comprising:
receiving a resource release notification sent by a resource scheduling node, wherein the resource release notification includes the number K1 of GPU cards releasable by the offline task;
selecting, according to the resource release notification, K1 GPU cards from the N GPU cards currently occupied by the offline task; and
controlling the K1 GPU cards to end the offline task so as to release the K1 GPU cards; wherein K1 and N are natural numbers, and 1 ≤ K1 ≤ N.
28. The method of claim 27, wherein the offline task is a model training task, the method further comprising:
keeping the batch size of the model training task unchanged, and calculating the increased number of training steps of the model training task according to the number (N-K1) of GPU cards and the total number of samples of the model training task; and
controlling the (N-K1) GPU cards still occupied by the model training task to continue executing the model training task with the original batch size and the increased number of training steps; wherein, while the model training executes the increased number of training steps, gradients are synchronized among the (N-K1) GPU cards.
29. The method of claim 27, wherein selecting K1 GPU cards from the N GPU cards currently occupied by the offline task comprises:
selecting K1 GPU cards closest to the source of the application traffic to be processed from the N GPU cards currently occupied by the offline task; or
randomly selecting K1 GPU cards from the N GPU cards currently occupied by the offline task.
30. The method according to any one of claims 27-29, further comprising:
receiving a resource increase notification sent by the resource scheduling node, wherein the resource increase notification carries the number K2 of GPU cards that can be added for the offline task, the resource increase notification being sent by the resource scheduling node after the online task releases M2 GPU cards; and
adding, according to the resource increase notification, K2 GPU cards for the offline task based on the currently vacant GPU cards;
wherein the currently vacant GPU cards include the M2 GPU cards released by the online task; M2 and K2 are natural numbers, 1 ≤ K2 ≤ M2, K2+N ≤ L2, and N is the number of GPU cards currently occupied by the offline task.
31. The method of claim 30, wherein adding K2 GPU cards for the offline task based on the currently free GPU cards comprises:
selecting K2 GPU cards from the currently vacant GPU cards, and deploying the offline task on the K2 GPU cards;
or
in the case that the number (K2+N) is less than the number of currently vacant GPU cards, selecting (K2+N) GPU cards from the currently vacant GPU cards, redeploying the offline task on the selected (K2+N) GPU cards, and controlling the N GPU cards previously occupied by the offline task to end the offline task.
32. The method of claim 31, wherein the offline task is a model training task, the method further comprising:
judging whether the current number of training steps of the model training task is greater than 1; and
if so, keeping the batch size of the model training task unchanged while adding the K2 GPU cards to the model training task, calculating the reduced number of training steps of the model training task according to the number (K2+N) of GPU cards and the total number of samples of the model training task, and controlling the (K2+N) GPU cards that finally execute the model training task to execute the model training task with the original batch size and the reduced number of training steps.
33. A cloud resource scheduling method is suitable for a resource scheduling node, and comprises the following steps:
receiving a resource adjustment request submitted by a first client, wherein an online task and an offline task of the first client share the cloud computing resources allocated to the first client; and
and dynamically adjusting the resource amount respectively occupied by the online task and the offline task in the cloud computing resources according to the resource adjustment request.
34. A cloud resource scheduling method, applicable to a resource management and control node for scheduling cloud computing resources allocated to a first client, wherein an online task and an offline task of the first client share the allocated cloud computing resources; the method comprising:
receiving a resource release notification sent by a resource scheduling node, wherein the resource release notification includes the amount of cloud computing resources releasable by the offline task;
selecting, according to the resource amount contained in the resource release notification, releasable cloud computing resources from the cloud computing resources currently occupied by the offline task; and
controlling the selected cloud computing resources to end the offline task so as to release the selected cloud computing resources.
35. A resource scheduling node for dynamically adjusting the resources used by an online task and an offline task, the node comprising: a memory and a processor;
the memory stores a computer program;
the processor is to execute the computer program to:
receiving a resource adjustment request submitted by a first client, wherein an online task and an offline task of the first client share a plurality of GPU cards allocated to the first client; and
and dynamically adjusting the number of GPU cards respectively occupied by the online tasks and the offline tasks in the multiple GPU cards according to the resource adjustment request.
36. A resource management and control node for scheduling a plurality of GPU cards allocated to a first client, wherein an online task and an offline task of the first client share the plurality of GPU cards, the node comprising: a memory and a processor;
the memory stores a computer program;
the processor is to execute the computer program to:
receiving a resource release notification sent by a resource scheduling node, wherein the resource release notification includes the number K1 of GPU cards releasable by the offline task;
selecting, according to the resource release notification, K1 GPU cards from the N GPU cards currently occupied by the offline task; and
controlling the K1 GPU cards to end the offline task so as to release the K1 GPU cards; wherein K1 and N are natural numbers, and 1 ≤ K1 ≤ N.
37. A computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, causes the processor to carry out the steps of the method of any one of claims 18-34.
CN202010264008.4A 2020-04-07 2020-04-07 Cloud resource scheduling method, node, system and storage medium Active CN113296921B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010264008.4A CN113296921B (en) 2020-04-07 2020-04-07 Cloud resource scheduling method, node, system and storage medium


Publications (2)

Publication Number Publication Date
CN113296921A true CN113296921A (en) 2021-08-24
CN113296921B CN113296921B (en) 2022-05-27

Family

ID=77317926

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010264008.4A Active CN113296921B (en) 2020-04-07 2020-04-07 Cloud resource scheduling method, node, system and storage medium

Country Status (1)

Country Link
CN (1) CN113296921B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102541640A (en) * 2011-12-28 2012-07-04 厦门市美亚柏科信息股份有限公司 Cluster GPU (graphic processing unit) resource scheduling system and method
CN104102543A (en) * 2014-06-27 2014-10-15 北京奇艺世纪科技有限公司 Load regulation method and load regulation device in cloud computing environment
US20150371354A1 (en) * 2014-06-19 2015-12-24 Vmware, Inc. Host-Based GPU Resource Scheduling
CN105242957A (en) * 2015-09-28 2016-01-13 广州云晫信息科技有限公司 Method and system for cloud computing system to allocate GPU resources to virtual machine
CN107357661A (en) * 2017-07-12 2017-11-17 北京航空航天大学 A kind of fine granularity GPU resource management method for mixed load
CN109814997A (en) * 2019-01-18 2019-05-28 创新奇智(广州)科技有限公司 A kind of distributed freedom equilibrium artificial intelligence method for scheduling task and system
CN109995862A (en) * 2019-03-29 2019-07-09 北京百度网讯科技有限公司 A kind of resource regulating method and terminal
CN110362407A (en) * 2019-07-19 2019-10-22 中国工商银行股份有限公司 Computing resource dispatching method and device
CN110389834A (en) * 2019-06-28 2019-10-29 苏州浪潮智能科技有限公司 A kind of method and apparatus for submitting deep learning training mission
CN110457135A (en) * 2019-08-09 2019-11-15 重庆紫光华山智安科技有限公司 A kind of method of resource regulating method, device and shared GPU video memory
CN110502340A (en) * 2019-08-09 2019-11-26 广东浪潮大数据研究有限公司 A kind of resource dynamic regulation method, device, equipment and storage medium
CN110502213A (en) * 2019-05-24 2019-11-26 网思科技股份有限公司 A kind of artificial intelligence capability development platform
CN110941481A (en) * 2019-10-22 2020-03-31 华为技术有限公司 Resource scheduling method, device and system


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Mochi Xue et al., "Scalable GPU Virtualization with Dynamic Sharing of Graphics Memory Space", IEEE Transactions on Parallel and Distributed Systems *
Zekai Technology (泽楷科技), "Efficient GPU sharing! Inspur AIStation pushes the limits of enterprise AI computing resources" (in Chinese), https://www.zekaikj.com/article/show/id/713.html *
Wen Jianfeng (闻剑峰) et al., "Research and application exploration of a deep learning platform based on open sharing" (in Chinese, 基于开放共享的深度学习平台研究及应用探索), Telecommunications Science (电信科学) *

Also Published As

Publication number Publication date
CN113296921B (en) 2022-05-27

Similar Documents

Publication Publication Date Title
US11146502B2 (en) Method and apparatus for allocating resource
CN106919449B (en) Scheduling control method of computing task and electronic equipment
CN108052384B (en) Task processing method, service platform and electronic equipment
WO2022161430A1 (en) Edge cloud system, edge management and control method, management and control node, and storage medium
CN109408205B (en) Task scheduling method and device based on hadoop cluster
CN111861412B (en) Completion time optimization-oriented scientific workflow scheduling method and system
CN113419846B (en) Resource allocation method and device, electronic equipment and computer readable storage medium
EP4242843A1 (en) Graphics card memory management method and apparatus, device, and system
CN111679904A (en) Task scheduling method and device based on edge computing network
CN113393092A (en) Production scheduling method, equipment, device and storage medium
CN112463535A (en) Multi-cluster exception handling method and device
CN112445575A (en) Multi-cluster resource scheduling method, device and system
CN114416352A (en) Computing resource allocation method and device, electronic equipment and storage medium
CN104301257A (en) Resource distribution method, device and equipment
CN114072767B (en) Resource scheduling, application and pricing method, equipment, system and storage medium
CN110908774A (en) Resource scheduling method, device, system and storage medium
CN116069493A (en) Data processing method, device, equipment and readable storage medium
CN113296921B (en) Cloud resource scheduling method, node, system and storage medium
CN113326025A (en) Single cluster remote continuous release method and device
CN114116220B (en) GPU sharing control method, GPU sharing control device and storage medium
CN113986511A (en) Task management method and related device
CN112156453B (en) Example adaptive adjustment method, apparatus, computer readable storage medium and device
CN111124268B (en) Data copying method, device and system and electronic equipment
CN110278160B (en) Rate control method and device and electronic equipment
CN112953993A (en) Resource scheduling method, device, network system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant