CN113791906A - Scheduling system and optimization algorithm based on GPU resources in artificial intelligence and engineering fields - Google Patents

Scheduling system and optimization algorithm based on GPU resources in artificial intelligence and engineering fields Download PDF

Info

Publication number
CN113791906A
CN113791906A (application CN202111081087.6A)
Authority
CN
China
Prior art keywords
gpu
task
priority
tasks
resources
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111081087.6A
Other languages
Chinese (zh)
Inventor
唐维昌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Daisy Shanghai Software Co ltd
Original Assignee
Daisy Shanghai Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Daisy Shanghai Software Co ltd filed Critical Daisy Shanghai Software Co ltd
Publication of CN113791906A publication Critical patent/CN113791906A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 - Allocation of resources to service a request
    • G06F9/5027 - Allocation of resources, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F2209/00 - Indexing scheme relating to G06F9/00
    • G06F2209/50 - Indexing scheme relating to G06F9/50
    • G06F2209/5011 - Pool
    • G06F2209/5021 - Priority

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a scheduling system and an optimization algorithm based on GPU resources in the fields of artificial intelligence and engineering. The scheduling system comprises: a human-computer interaction module, which lets an operator work visually; a resource management module, which organizes the usable GPU resources into a GPU resource pool composed of a plurality of nodes, each node corresponding to one device, where the multi-card ID relationships inside each node's GPUs and changes in each node's available GPU resources are fed back to the pool in real time for dynamic management; each GPU periodically reports its use state, including idle state, resource utilization rate, and task completion, and the resource management module releases GPU resources once it receives a task-completion signal so that subsequent tasks can use them; and a scheduling module, which budgets the computation amount of the tasks input through the human-computer interaction module and allocates GPU resources according to the computed amount, balancing computation speed against reasonable utilization of GPU resources.

Description

Scheduling system and optimization algorithm based on GPU resources in artificial intelligence and engineering fields
Technical Field
The invention relates to a computer technology, in particular to a scheduling optimization algorithm based on GPU resources in the fields of artificial intelligence and engineering.
Background
Regarding GPU resource requirements on computing platforms in the fields of artificial intelligence and engineering, the GPU differs from the CPU in a key way: the number of CPUs on a server of a given specification is fixed, while the number of GPUs on each server may well vary. A traditional CPU-based scheduler can schedule over a fixed number of CPU resources but cannot schedule an uncertain number of GPU resources, so traditional CPU resource scheduling technology is not suitable for GPU resource scheduling.
At present, the main technologies for GPU resource scheduling are Nvidia GPU Docker and Slurm. Nvidia GPU Docker can realize batch scheduling, but cannot realize interactive high-performance scheduling and has no priority scheduling algorithm. Slurm is an open-source technology with a weak scheduling algorithm and no interaction or priority support. Both technologies were developed abroad and are widely used with broad customer bases, but they are limited to scheduling CPUs, cannot meet the professional-field requirement of scheduling different numbers of GPUs on different nodes, cannot schedule GPUs across nodes, and, being developed abroad, carry potential security risks.
Disclosure of Invention
In view of the above defects in the prior art, the technical problem to be solved by the present invention is to provide a scheduling system and an optimization algorithm in the artificial intelligence and engineering field based on GPU resources, which can realize efficient scheduling of GPU resources.
In order to achieve the above object, the present invention provides a scheduling system based on GPU resources in the field of artificial intelligence and engineering, comprising:
the human-computer interaction module, used to let an operator work visually: it transmits the client's instruction for operating the 3D model to the server, and after the server computes the operation amount of the 3D model it feeds the result back to the client, so that, in cloud-computing mode, operating the 3D model at the client is no different from operating it at the server;
the resource management module, used to organize the usable GPU resources into a GPU resource pool composed of a plurality of nodes, each node corresponding to one device; the multi-card ID relationships inside each node's GPUs and changes in each node's available GPU resources are fed back to the pool in real time for dynamic management; each GPU periodically reports its use state, including idle state, resource utilization rate, and task completion, and the resource management module releases GPU resources once it receives a task-completion signal so that subsequent tasks can use them;
and the scheduling module, used to budget the computation amount of the tasks input through the human-computer interaction module and allocate GPU resources according to the computed amount, balancing computation speed against reasonable utilization of GPU resources.
Further, the scheduling mode of the scheduling module includes:
1) autonomous model-scale recognition and matched scheduling: budget the computing power required by the input task and allocate GPU resources for the operation accordingly;
2) first-in first-out: after budgeting the GPU resources required by the task, match the task against each node's GPU resources, preferentially assigning it to a single node and to idle GPUs; when one GPU corresponds to multiple tasks, the tasks queue and run in order;
3) optimal solution: according to the task amount and the current GPU resource state, select the fastest-running GPU resources on the premise of not affecting the computation of other tasks;
4) priority: priority levels are divided by urgency, with preset thresholds as the dividing basis; when a task must be assigned a priority, it is given a priority parameter directly; the scheduling module compares the task's parameter against each priority threshold to convert it to a priority level, and GPU resources are then used in order of priority; by default, priority tasks are allocated GPU resources in the optimal-solution mode, and higher-priority tasks preferentially occupy the best GPU resources under a shortest-time principle;
when one GPU corresponds to multiple priority tasks, they are queued and computed in order of their priority parameter values; tasks with equal priority parameters queue and run first-in first-out.
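The threshold-based priority conversion and per-GPU queuing of mode 4 can be sketched as follows; the threshold values, level names, and class shape are illustrative assumptions, since the patent does not specify them:

```python
import heapq
import itertools

# Hypothetical thresholds: a task's raw priority parameter is compared
# against each threshold to obtain a priority level.
PRIORITY_THRESHOLDS = [(100, "urgent"), (50, "high"), (0, "normal")]

def to_level(priority_param):
    """Convert a raw priority parameter into a priority level."""
    for threshold, level in PRIORITY_THRESHOLDS:
        if priority_param >= threshold:
            return level
    return "normal"

class GpuQueue:
    """Tasks queued on one GPU: larger priority parameter runs first;
    equal parameters run first-in first-out."""
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # FIFO tie-breaker

    def push(self, task_name, priority_param=0):
        # Negate the priority so the largest parameter pops first;
        # the counter preserves insertion order on ties.
        heapq.heappush(self._heap,
                       (-priority_param, next(self._counter), task_name))

    def pop(self):
        return heapq.heappop(self._heap)[2]
```

A task submitted with parameter 99 would therefore preempt the queue ahead of two earlier tasks submitted with parameter 10, which then run in submission order.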
Furthermore, the budgeting of the model mainly comprises:
model-size budget: using the data volume of the 3D model as the standard, budget the computation required to complete the specified task;
input-value budget: take the operation amount input by the operator at the client as the budgeted operation amount;
algorithm budget: establish an operation-amount estimation model and train it with a large amount of operation data, so that the budgeting algorithm can automatically estimate the required operation amount from the characteristics of the current task;
dynamic adjustment: monitor operation progress during the computation; if progress is below expectation, increase GPU resources, and if above expectation, appropriately reduce them, so that each operation task proceeds quickly and smoothly.
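A minimal Python sketch of the model-size budget and the dynamic-adjustment rule above; the calibration constant and the GPU-count limits are hypothetical, since the patent gives no concrete values:

```python
def budget_from_model_size(data_volume_mb, flops_per_mb=1e9):
    """Model-size budget: estimate required computation from the 3D
    model's data volume. flops_per_mb is a hypothetical calibration
    factor that a real system would fit from historical runs."""
    return data_volume_mb * flops_per_mb

def adjust_gpus(current_gpus, actual_progress, expected_progress,
                min_gpus=1, max_gpus=8):
    """Dynamic adjustment: add a GPU when progress lags expectation,
    release one when it runs ahead, staying within pool limits."""
    if actual_progress < expected_progress:
        return min(current_gpus + 1, max_gpus)
    if actual_progress > expected_progress:
        return max(current_gpus - 1, min_gpus)
    return current_gpus
```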
The invention also discloses a scheduling optimization algorithm in the fields of artificial intelligence and engineering based on GPU resources, which comprises the following steps:
s100, dynamically updating GPU resource pool information by a resource management module;
s200, an operator inputs a task needing to be operated at a client through a man-machine interaction module, the task needing to be operated is uploaded to a server, and the server forwards the task to a scheduling module for processing;
s300, the scheduling module firstly checks whether the task contains the priority parameter, if so, the task enters priority scheduling, and if not, the task is scheduled in a common scheduling mode.
Further, S100 further includes:
s110, periodically requesting information to each GPU, feeding back a signal by each GPU according to the request information, and judging whether the GPU can be normally used or not by the GPU resource pool according to the fed-back signal, so that the GPU which cannot be used is found in time, and the GPU is prevented from being used in the subsequent task allocation;
s120, when the number of GPUs of each node is increased or decreased, the node sends updating information to a GPU resource pool, and the GPU resource pool updates GPU resources of the node according to the information;
and S130, in the operation process, each node collects state information of the GPU therein and feeds the state information back to the resource management module periodically, the state information of the GPU comprises an idle state of the GPU, a resource utilization rate when the GPU runs and a task completion state, and the resource management module releases the computing power of the GPU after acquiring the task completion state information, so that the operation of the next task is started, and task assignment is carried out on each GPU according to the state of the GPU.
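Steps S110-S130 can be sketched as a small in-memory pool; the data layout and the 30-second staleness cutoff are illustrative assumptions not stated in the patent:

```python
import time

class GpuResourcePool:
    """Sketch of S110-S130: track per-node GPU status from periodic
    heartbeats and release a GPU when its task completes."""
    def __init__(self):
        self.nodes = {}  # node_id -> {gpu_id: status dict}

    def update_node(self, node_id, gpu_ids):
        """S120: a node reports its current (added/removed) GPU set."""
        current = self.nodes.setdefault(node_id, {})
        for gpu_id in gpu_ids:
            current.setdefault(gpu_id, {"idle": True, "util": 0.0,
                                        "last_seen": time.time()})
        for gpu_id in list(current):
            if gpu_id not in gpu_ids:
                del current[gpu_id]

    def heartbeat(self, node_id, gpu_id, idle, util, task_done=False):
        """S110/S130: periodic status feedback; release on completion."""
        status = self.nodes[node_id][gpu_id]
        status.update(idle=idle, util=util, last_seen=time.time())
        if task_done:
            status["idle"] = True  # free the GPU for the next task

    def usable_gpus(self, max_age=30.0):
        """GPUs whose heartbeat is recent enough to count as usable."""
        now = time.time()
        return [(n, g) for n, gpus in self.nodes.items()
                for g, s in gpus.items() if now - s["last_seen"] <= max_age]
```

A GPU that stops answering the periodic request simply ages out of `usable_gpus`, which matches S110's goal of excluding unusable GPUs from subsequent allocation.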
Further, S300 further includes:
s310, dividing priority levels according to emergency situations, wherein a preset threshold is adopted as a dividing basis; when the priority needs to be distributed in the task, priority parameters are directly given to the task, the scheduling module compares the priority parameters carried by the task with various priority threshold values, the priority level of the task is converted, and then GPU resources are preferentially used according to the priority level;
allocating GPU resources according to an optimal solution mode by default according to the priority, and preferentially occupying the optimal GPU resources for tasks with higher priorities by adopting a shortest time principle;
in the priority operation priority task with priority corresponding to the GPU, a plurality of priorities corresponding to the GPU are queued and sequentially calculated according to priority parameter values, and a plurality of priorities corresponding to the GPU with equal priority parameters are queued and operated according to a first-in first-out principle.
Further, S300 further includes: s320, the common scheduling method includes:
s321, autonomously identifying model scale and matching scheduling, and allocating GPU resources for operation according to the calculation power required by the input task budget and the required calculation power;
s322, firstly, after the GPU resources required by the task budget are calculated, matching the tasks according to the GPU resources of each node, preferentially distributing the tasks to the same node, preferentially distributing the tasks to idle GPU operation, and when the GPU corresponds to a plurality of tasks, sequentially queuing the tasks for operation;
and S323, selecting the GPU resource which is operated fastest on the premise of not influencing the calculation of other tasks according to the task amount and the current GPU resource state by the optimal solution.
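A sketch of the placement logic in S322/S323; treating the lowest-utilization GPU as the "fastest" candidate is an assumption, since the patent does not define how the fastest GPU resource is identified:

```python
def place_task(required_gpus, nodes):
    """S322 sketch: prefer a single node with enough idle GPUs; among
    candidates pick the node with the most idle GPUs.
    `nodes` maps node_id -> list of (gpu_id, is_idle)."""
    best = None
    for node_id, gpus in nodes.items():
        idle = [g for g, is_idle in gpus if is_idle]
        if len(idle) >= required_gpus:
            if best is None or len(idle) > len(best[1]):
                best = (node_id, idle)
    if best:
        return best[0], best[1][:required_gpus]
    return None  # no single node fits; the caller falls back to queuing

def fastest_free_gpu(gpus):
    """S323 sketch: among GPUs not computing other tasks, pick the one
    with the lowest utilization. `gpus` is a list of
    (gpu_id, utilization, busy) tuples."""
    free = [(g, u) for g, u, busy in gpus if not busy]
    return min(free, key=lambda x: x[1])[0] if free else None
```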
The invention also discloses an interaction method of the 3D model, which applies the optimization method; the method specifically comprises the following steps:
s1, the client side obtains the interactive operation instruction input by the user and then transmits the interactive operation instruction to the server side;
s2, the server side completes graph operation according to the obtained interactive operation instruction to obtain the graph variable quantity in the operation process;
s3, the server side forms an image set according to the image variable quantity obtained in the S2 and the frame extraction information, then encodes and compresses the image set according to the video stream, and transmits the compressed image set to the client side;
and S4, decoding the compressed image set received by the client, and continuously playing according to the information of the video stream to realize the feedback of the interactive operation instruction input by the user.
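The server-encode/client-decode round trip of S3 and S4 can be illustrated with generic compression standing in for real video-stream encoding (the patent names no codec); `zlib` and `pickle` here are illustrative stand-ins only:

```python
import pickle
import zlib

def server_encode(frames):
    """S3 sketch: pack a set of rendered frame records and compress
    them, standing in for video-stream encoding (e.g. H.264)."""
    return zlib.compress(pickle.dumps(frames))

def client_decode(payload):
    """S4 sketch: decompress and unpack so the client can replay the
    frames in order as feedback for the user's operation."""
    return pickle.loads(zlib.decompress(payload))
```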
The invention has the beneficial effects that:
1. With the automatic discovery and arrangement method, after a task request is received, a flexible temporary GPU resource pool is dynamically established according to the amount of resources the task requires, responding to different artificial-intelligence and engineering-field computing demands; when the task's computation finishes, the GPU resource group is released and the GPUs return to the shared resource pool, meeting large-scale artificial-intelligence computing-power requirements.
2. The scheduling module's algorithm supports multiple modes, including optimal-solution scheduling, first-in first-out scheduling, and autonomous model-scale recognition with matched scheduling; by analyzing the platform's computing power, it finds idle resources matching the task's conditions, runs the simulation task, completes the computation, releases the resources autonomously, and accepts the next task.
3. The invention optimizes the GPU resource scheduling algorithm: the scheduling system analyzes GPU resource utilization and feeds it back to the user in real time, and tasks submitted by users are automatically allocated to idle GPU resources for computation, so neither CPUs nor additional servers are occupied and performance improves rapidly.
4. The method automatically analyzes the priority of task data, finds an optimal solution by comparing the parameters submitted by the user against a standard library, and then performs priority-algorithm scheduling, meeting the requirement that important tasks execute first.
5. The invention suits all industrial design and simulation fields with professional graphics requirements, covering film and television design, biology and genetics, aerospace, automobile manufacturing and parts, molds, chips and semiconductors, the nuclear industry, industrial and civil electrical appliances, meteorology, and more.
Drawings
FIG. 1 is a schematic diagram of the system of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
Referring to fig. 1, a GPU resource based scheduling system in the artificial intelligence and engineering fields includes:
the human-computer interaction module, used to let an operator work visually: it transmits the client's instruction for operating the 3D model to the server, and after the server computes the operation amount of the 3D model it feeds the result back to the client, so that, in cloud-computing mode, operating the 3D model at the client is no different from operating it at the server;
the resource management module, used to organize the usable GPU resources into a GPU resource pool composed of a plurality of nodes, each node corresponding to one device; the multi-card ID relationships inside each node's GPUs and changes in each node's available GPU resources are fed back to the pool in real time for dynamic management: if the number of GPUs increases or decreases, the corresponding information, combined with the node information, is fed back to the pool so that it updates that node's GPU count, resources, and related information. Each GPU periodically reports its use state to the resource management module, mainly its idle state, resource utilization rate, and task completion; after receiving a task-completion signal, the resource management module releases the GPU resources so that subsequent tasks can use them.
And the scheduling module is used for carrying out calculation amount budget on the tasks input by the human-computer interaction module and distributing GPU resources according to the calculated calculation amount so as to realize balance between improvement of calculation speed and reasonable utilization of the GPU resources.
The scheduling mode of the scheduling module comprises the following steps:
1. and (4) autonomously identifying the model scale and matching scheduling, and allocating GPU resources for operation according to the input calculation power required by the task budget and the required calculation power. The model budgeting method mainly comprises the following steps:
and according to the model quantity budget, the data quantity of the 3D model is used as a standard, and the calculation quantity required for completing the specified task is budgeted.
And according to the input value budget, taking the calculation amount input by the operator at the client as the budget calculation amount.
And establishing an operand evaluation model according to the operand budget algorithm budget, and training by using a large amount of operation data, so that the operand budget algorithm can automatically estimate the required operand according to the characteristics of the current task.
And dynamically adjusting, namely monitoring the operation progress in the operation process, increasing GPU resources if the operation progress is lower than expected, and properly reducing the GPU resources if the operation progress is higher than expected so as to ensure that each operation task is performed quickly and smoothly.
2. And performing first-in first-out, after the GPU resources required by the task budget are calculated, matching the tasks according to the GPU resources of each node, preferentially distributing the tasks to the same node, preferentially distributing the tasks to idle GPU operation, and when the GPU corresponds to a plurality of tasks, sequentially queuing the tasks for operation.
3. And (3) selecting the GPU resource which is operated fastest on the premise of not influencing the calculation of other tasks according to the task amount and the current GPU resource state.
4. And the priority level is divided according to the emergency, and a preset threshold value is adopted as the dividing basis. When the priorities need to be allocated in the tasks, a priority parameter is directly given to the tasks, the scheduling module compares the priority parameter carried by the tasks with each priority threshold value, the priority level of the tasks is converted, then GPU resources are preferentially used according to the priority level, the priority defaults to allocate the GPU resources according to an optimal solution mode, and for the tasks with higher priorities, the shortest time principle is adopted, and the optimal GPU resources are preferentially occupied. In the priority operation priority task with priority corresponding to the GPU, a plurality of priorities corresponding to the GPU are queued and sequentially calculated according to priority parameter values, and a plurality of priorities corresponding to the GPU with equal priority parameters are queued and operated according to a first-in first-out principle.
The scheduling module of this embodiment utilizes GPU resources to the maximum; in this embodiment it is integrated as JSS FOR GPU. Scheduling in GPU mode differs from traditional CPU scheduling: the module first analyzes the idle states of the GPU resources on the different nodes and the relationships among the multi-card IDs inside each GPU, records them into the JSS GPU resource pool, and feeds them back in real time; then, combining the task's GPU requirements with the resource pool, it can schedule up to 8000 GPU resources across 10000 server nodes at high performance and high speed, realizing orderly task execution, automatic queuing, priority scheduling, on-demand resource allocation, and so on.
A scheduling optimization algorithm in the fields of artificial intelligence and engineering based on GPU resources comprises the following steps:
s100, the resource management module dynamically updates GPU resource pool information, the updating mode is mainly that information is periodically requested for each GPU, each GPU feeds back signals according to the request information, and the GPU resource pool judges whether the GPU can be normally used or not according to the fed-back signals, so that the GPU which cannot be used is found in time, and the GPU is prevented from being used in subsequent task allocation. In addition, when the number of GPUs of each node is increased or decreased, the node sends updating information to the GPU resource pool, and the GPU resource pool updates the GPU resources of the node according to the information. In the operation process, each node collects the state information of the GPU therein and periodically feeds the state information back to the resource management module, the state information of the GPU comprises the idle state of the GPU, the resource utilization rate when the GPU runs and the task completion state, and the resource management module releases the computing power of the GPU after acquiring the task completion state information, so that the operation of the next task is started, and task assignment is carried out on each GPU according to the state of the GPU.
S200, an operator inputs a task needing to be operated at a client through a man-machine interaction module, the task needing to be operated is uploaded to a server, and the server forwards the task to a scheduling module for processing; in this embodiment, the human-computer interaction module transmits an instruction for operating the 3D model by the client to the server, and the server calculates the operand of the 3D model and then feeds the operand back to the client, so that the client operates the large 3D model without the difference from the server.
S300, the scheduling module first checks whether the task carries a priority parameter; if so, the task enters priority scheduling, and if not, it is scheduled in the common scheduling mode.
S310, priority levels are divided by urgency, with preset thresholds as the dividing basis; when a task must be assigned a priority, it is given a priority parameter directly; the scheduling module compares the task's parameter against each priority threshold to convert it to a priority level, and GPU resources are then used in order of priority;
by default, priority tasks are allocated GPU resources in the optimal-solution mode, and higher-priority tasks preferentially occupy the best GPU resources under a shortest-time principle;
when one GPU corresponds to multiple priority tasks, they are queued and computed in order of their priority parameter values; tasks with equal priority parameters queue and run first-in first-out.
S320, the common scheduling modes mainly comprise:
S321, autonomous model-scale recognition and matched scheduling: budget the computing power required by the input task and allocate GPU resources for the operation accordingly. The model budgeting mainly comprises:
model-size budget: using the data volume of the 3D model as the standard, budget the computation required to complete the specified task;
input-value budget: take the operation amount input by the operator at the client as the budgeted operation amount;
algorithm budget: establish an operation-amount estimation model and train it with a large amount of operation data, so that the budgeting algorithm can automatically estimate the required operation amount from the characteristics of the current task;
dynamic adjustment: monitor operation progress during the computation; if progress is below expectation, increase GPU resources, and if above expectation, appropriately reduce them, so that each operation task proceeds quickly and smoothly.
S322, first-in first-out: after budgeting the GPU resources required by the task, match the task against each node's GPU resources, preferentially assigning it to a single node and to idle GPUs; when one GPU corresponds to multiple tasks, the tasks queue and run in order.
S323, optimal solution: according to the task amount and the current GPU resource state, select the fastest-running GPU resources on the premise of not affecting the computation of other tasks.
The invention also discloses an interaction method of the 3D model, which applies the optimization method; the method specifically comprises the following steps:
s1, the client side obtains the interactive operation instruction input by the user and then transmits the interactive operation instruction to the server side;
s2, the server side completes graph operation according to the obtained interactive operation instruction to obtain the graph variable quantity in the operation process;
s3, the server side forms an image set according to the image variable quantity obtained in the S2 and the frame extraction information, then encodes and compresses the image set according to the video stream, and transmits the compressed image set to the client side;
and S4, decoding the compressed image set received by the client, and continuously playing according to the information of the video stream to realize the feedback of the interactive operation instruction input by the user.
Matters not described in detail herein are well known to those skilled in the art.
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims (9)

1. A scheduling system based on GPU resources in the fields of artificial intelligence and engineering is characterized by comprising:
the human-computer interaction module, used to let an operator work visually: it transmits the client's instruction for operating the 3D model to the server, and after the server computes the operation amount of the 3D model it feeds the result back to the client, so that, in cloud-computing mode, operating the 3D model at the client is no different from operating it at the server;
the resource management module, used to organize the usable GPU resources into a GPU resource pool composed of a plurality of nodes, each node corresponding to one device; the multi-card ID relationships inside each node's GPUs and changes in each node's available GPU resources are fed back to the pool in real time for dynamic management; each GPU periodically reports its use state, including idle state, resource utilization rate, and task completion, and the resource management module releases GPU resources once it receives a task-completion signal so that subsequent tasks can use them;
and the scheduling module, used to budget the computation amount of the tasks input through the human-computer interaction module and allocate GPU resources according to the computed amount, balancing computation speed against reasonable utilization of GPU resources.
2. The GPU-resource-based scheduling system for the artificial intelligence and engineering fields of claim 1, wherein the scheduling modes of the scheduling module comprise:
autonomous model-scale identification and matched scheduling: estimating the computing power required by the input task and allocating GPU resources for the computation according to the required computing power;
node matching: after the GPU resources required by the task have been estimated, matching the task against the GPU resources of each node, preferentially assigning the task to a single node and preferentially to an idle GPU; when one GPU corresponds to a plurality of tasks, the tasks are queued and computed in sequence;
optimal solution: according to the task amount and the current GPU resource state, selecting the fastest-running GPU resources on the premise of not affecting the computation of other tasks;
priority: priority levels are divided according to urgency, with preset thresholds as the dividing basis; when a priority needs to be assigned to a task, a priority parameter is attached to the task directly; the scheduling module compares the priority parameter carried by the task against each priority threshold to convert it into the task's priority level, and GPU resources are used preferentially according to that level; by default each priority allocates GPU resources in the optimal-solution mode, and under the shortest-time principle tasks of higher priority preferentially occupy the optimal GPU resources;
when a GPU preferentially runs priority tasks, the plurality of priority tasks corresponding to that GPU are queued and computed in sequence according to their priority parameter values, and priority tasks with equal priority parameters are queued and run on a first-in-first-out basis.
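The priority mechanism described above can be illustrated with a minimal sketch: preset thresholds map a task's raw priority parameter onto a level, and the tasks queued on one GPU run in descending priority-parameter order, first-in-first-out among equal parameters. The threshold values and all class and method names here are illustrative assumptions, not part of the claim.

```python
from itertools import count

# Hypothetical thresholds dividing the raw priority parameter into levels
# (the claim leaves the concrete values unspecified).
PRIORITY_THRESHOLDS = [(100, "urgent"), (50, "high"), (0, "normal")]

def priority_level(param):
    """Map a task's raw priority parameter onto a level via the thresholds."""
    for threshold, level in PRIORITY_THRESHOLDS:
        if param >= threshold:
            return level
    return "normal"

class GpuQueue:
    """Per-GPU queue: higher priority parameter first, FIFO among equals."""
    def __init__(self):
        self._seq = count()   # arrival order, breaks ties first-in-first-out
        self._tasks = []

    def submit(self, task, priority_param=0):
        self._tasks.append((-priority_param, next(self._seq), task))
        self._tasks.sort()    # order: priority descending, arrival ascending

    def next_task(self):
        return self._tasks.pop(0)[2] if self._tasks else None
```

Sorting on `(-priority_param, arrival_sequence)` realizes both rules of the claim in one comparison key.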
3. The system of claim 2, wherein the methods for estimating the computation amount mainly comprise:
model-quantity estimation: estimating the computation required to complete the specified task using the data volume of the 3D model as the standard;
input-value estimation: taking the operation amount input by the operator at the client as the estimated operation amount;
algorithmic estimation: establishing a computation-amount estimation model and training it with a large amount of operation data, so that the estimation algorithm can automatically estimate the required computation amount from the characteristics of the current task;
and dynamic adjustment: monitoring the operation progress during computation, increasing the GPU resources if the progress is lower than expected and appropriately reducing them if it is higher than expected, so as to ensure that every operation task proceeds quickly and smoothly.
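The "dynamic adjustment" item above amounts to a simple feedback rule: grow the allocation when the task runs behind schedule, shrink it when it runs ahead. A minimal sketch, with `tolerance` and `max_gpus` as assumed parameters that the claim does not specify:

```python
def adjust_gpu_allocation(allocated_gpus, progress, expected_progress,
                          tolerance=0.05, max_gpus=8):
    """Return a new GPU count for a running task.

    Behind schedule by more than `tolerance`  -> add one GPU (up to max_gpus).
    Ahead of schedule by more than `tolerance` -> release one GPU (keep >= 1).
    Otherwise leave the allocation unchanged.
    """
    if progress < expected_progress - tolerance:
        return min(allocated_gpus + 1, max_gpus)   # behind: grow
    if progress > expected_progress + tolerance and allocated_gpus > 1:
        return allocated_gpus - 1                  # ahead: shrink
    return allocated_gpus
```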
4. A scheduling optimization algorithm based on GPU resources for the artificial intelligence and engineering fields, characterized by comprising the following steps:
S100, the resource management module dynamically updates the GPU resource pool information;
S200, an operator inputs a task to be computed at the client through the human-computer interaction module; the task is uploaded to the server, and the server forwards it to the scheduling module for processing;
S300, the scheduling module first checks whether the task carries a priority parameter; if it does, the task enters priority scheduling, and if not, the task is scheduled in the common scheduling mode.
5. The GPU-resource-based scheduling optimization algorithm for the artificial intelligence and engineering fields of claim 4, wherein S100 further comprises:
S110, periodically requesting information from each GPU, each GPU feeding back a signal according to the request; the GPU resource pool judges from the returned signal whether the GPU can be used normally, so that unusable GPUs are discovered in time and excluded from subsequent task allocation;
S120, when the number of GPUs in a node increases or decreases, the node sends update information to the GPU resource pool, and the GPU resource pool updates that node's GPU resources according to the information;
and S130, during computation each node collects the state information of its GPUs and periodically feeds it back to the resource management module; the state information of a GPU includes its idle state, its resource utilization while running, and its task completion state; after receiving task-completion information the resource management module releases that GPU's computing power so that the next task can start, and tasks are assigned to each GPU according to its state.
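Steps S110 to S130 can be sketched as a resource pool that records periodic status reports, treats GPUs that have not answered within a timeout as unusable, and frees a GPU when its task completes. All field names and the timeout value are illustrative assumptions:

```python
import time

class GpuResourcePool:
    """Sketch of the pool of S110-S130: nodes report GPU status periodically,
    unresponsive GPUs drop out of the usable set, completed tasks release
    their GPU for the next task."""
    def __init__(self, heartbeat_timeout=30.0):
        self.heartbeat_timeout = heartbeat_timeout
        self.gpus = {}   # (node_id, gpu_id) -> status dict

    def report(self, node_id, gpu_id, idle, utilization, task_done):
        """Record one periodic status report from a node (S130)."""
        key = (node_id, gpu_id)
        self.gpus[key] = {"idle": idle, "utilization": utilization,
                          "last_seen": time.monotonic()}
        if task_done:
            self.release(key)

    def release(self, key):
        self.gpus[key]["idle"] = True   # free the GPU for subsequent tasks

    def usable(self):
        """GPUs that answered a status request within the timeout (S110)."""
        now = time.monotonic()
        return [k for k, s in self.gpus.items()
                if now - s["last_seen"] < self.heartbeat_timeout]
```

Adding or removing a `(node_id, gpu_id)` entry covers the node-level updates of S120.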
6. The GPU-resource-based scheduling optimization algorithm for the artificial intelligence and engineering fields of claim 4, wherein S300 further comprises:
S310, dividing priority levels according to urgency, with preset thresholds as the dividing basis; when a priority needs to be assigned to a task, a priority parameter is attached to the task directly; the scheduling module compares the priority parameter carried by the task against each priority threshold to convert it into the task's priority level, and GPU resources are then used preferentially according to that level;
by default each priority allocates GPU resources in the optimal-solution mode, and under the shortest-time principle tasks of higher priority preferentially occupy the optimal GPU resources;
when a GPU preferentially runs priority tasks, the plurality of priority tasks corresponding to that GPU are queued and computed in sequence according to their priority parameter values, and priority tasks with equal priority parameters are queued and run on a first-in-first-out basis.
7. The GPU-resource-based scheduling optimization algorithm for the artificial intelligence and engineering fields of claim 4, wherein S300 further comprises: S320, the common scheduling mode comprising:
S321, autonomous model-scale identification and matched scheduling: estimating the computing power required by the input task and allocating GPU resources for the computation according to the required computing power;
S322, after the GPU resources required by the task have been estimated, matching the task against the GPU resources of each node, preferentially assigning the task to a single node and preferentially to an idle GPU; when one GPU corresponds to a plurality of tasks, the tasks are queued and computed in sequence;
and S323, optimal solution: according to the task amount and the current GPU resource state, selecting the fastest-running GPU resources on the premise of not affecting the computation of other tasks.
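The common scheduling of S321 to S323 — estimate the compute a task needs, match it against each node's resources, prefer idle GPUs, and pick the fastest candidate — can be sketched as follows. The `nodes` layout and the GFLOPS unit are assumptions made for illustration:

```python
def schedule_task(required_gflops, nodes):
    """Common-scheduling sketch (S321-S323).

    `nodes` maps node id -> list of (gpu_id, free_gflops, idle, speed)
    tuples (an assumed layout). A GPU qualifies if it can hold the task;
    among qualifying GPUs, idle ones win first, then the fastest one
    (the shortest-time rule of the optimal solution).
    Returns (node_id, gpu_id) or None if no GPU fits.
    """
    best = None
    for node_id, gpus in nodes.items():
        for gpu_id, free, idle, speed in gpus:
            if free < required_gflops:
                continue                  # this GPU cannot fit the task
            key = (idle, speed)           # rank: idle first, then speed
            if best is None or key > best[0]:
                best = (key, node_id, gpu_id)
    return (best[1], best[2]) if best else None
```

Because each candidate keeps its node id, a task that fits on one node is never split across nodes, matching the single-node preference of S322.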
8. A method for interacting with 3D models, characterized in that the optimization algorithm of any one of claims 4 to 7 is applied.
9. The method for interacting with 3D models of claim 8, characterized by comprising the following steps:
S1, the client obtains an interactive operation instruction input by a user and transmits it to the server;
S2, the server completes the graphics computation according to the received interactive operation instruction and obtains the graphics variation produced during the operation;
S3, the server forms an image set from the graphics variation obtained in S2 and the frame-extraction information, then encodes and compresses the image set as a video stream and transmits the compressed image set to the client;
and S4, the client decodes the received compressed image set and plays it continuously according to the video-stream information, thereby presenting feedback for the interactive operation instruction input by the user.
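One round of the S1 to S4 interaction loop of claim 9 can be sketched as a client-server exchange in which the real video codec is abstracted behind `encode`/`decode`; every method name on `server` and `client` here is hypothetical, not taken from the patent:

```python
def interaction_round(instruction, server, client):
    """Run one S1-S4 round of the 3D-model interaction method.

    `server` and `client` are assumed to expose the named methods; in a
    real system `encode`/`decode` would wrap a video-stream codec."""
    deltas = server.compute_graphics(instruction)   # S2: graphics variation
    frames = server.extract_frames(deltas)          # S3: form the image set
    stream = server.encode(frames)                  # S3: compress as a stream
    images = client.decode(stream)                  # S4: decode at the client
    client.play(images)                             # S4: continuous playback
    return images
```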
CN202111081087.6A 2021-08-09 2021-09-15 Scheduling system and optimization algorithm based on GPU resources in artificial intelligence and engineering fields Pending CN113791906A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110910166 2021-08-09
CN2021109101667 2021-08-09

Publications (1)

Publication Number Publication Date
CN113791906A true CN113791906A (en) 2021-12-14

Family

ID=79183549

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111081087.6A Pending CN113791906A (en) 2021-08-09 2021-09-15 Scheduling system and optimization algorithm based on GPU resources in artificial intelligence and engineering fields

Country Status (1)

Country Link
CN (1) CN113791906A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114416381A (en) * 2022-03-28 2022-04-29 维塔科技(北京)有限公司 Processing resource over-partitioning method, device, equipment and storage medium
CN114820279A (en) * 2022-05-18 2022-07-29 北京百度网讯科技有限公司 Distributed deep learning method and device based on multiple GPUs and electronic equipment
CN115220921A (en) * 2022-09-19 2022-10-21 浙江大华技术股份有限公司 Resource scheduling method, image processor, image pickup device, and medium
CN115269159A (en) * 2022-09-27 2022-11-01 苏州美集供应链管理股份有限公司 Scheduling system and method based on artificial intelligence and edge calculation support
CN115658311A (en) * 2022-10-31 2023-01-31 北京百度网讯科技有限公司 Resource scheduling method, device, equipment and medium
CN117170878A (en) * 2023-10-31 2023-12-05 北京蓝耘科技股份有限公司 Method for dynamically adjusting CPU and GPU caches
WO2024055168A1 (en) * 2022-09-13 2024-03-21 华为技术有限公司 Resource allocation method, processor, and computing platform
CN117785491A (en) * 2024-02-28 2024-03-29 北京蓝耘科技股份有限公司 GPU cloud computing resource management method, system and storage medium
CN117785491B (en) * 2024-02-28 2024-05-28 北京蓝耘科技股份有限公司 GPU cloud computing resource management method, system and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110198244A (en) * 2019-06-19 2019-09-03 北京百度网讯科技有限公司 Resource allocation method and device towards isomery cloud service
CN111176852A (en) * 2020-01-15 2020-05-19 上海依图网络科技有限公司 Resource allocation method, device, chip and computer readable storage medium
CN111258767A (en) * 2020-01-22 2020-06-09 中国人民解放军国防科技大学 Intelligent cloud computing resource allocation method and device for complex system simulation application
CN111367679A (en) * 2020-03-31 2020-07-03 中国建设银行股份有限公司 Artificial intelligence computing power resource multiplexing method and device
CN111399976A (en) * 2020-03-02 2020-07-10 上海交通大学 GPU virtualization implementation system and method based on API redirection technology
CN111966504A (en) * 2020-10-23 2020-11-20 腾讯科技(深圳)有限公司 Task processing method in graphics processor and related equipment
CN112416585A (en) * 2020-11-20 2021-02-26 南京大学 GPU resource management and intelligent scheduling method for deep learning
CN113157413A (en) * 2021-04-16 2021-07-23 上海交通大学 Deep learning task resource optimization configuration method and system based on service quality requirement

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Yunquan et al., "The Troika of Artificial Intelligence: Big Data, Computing Power and Algorithms", 31 July 2021, Scientific and Technical Documentation Press, pages 108-109 *


Similar Documents

Publication Publication Date Title
CN113791906A (en) Scheduling system and optimization algorithm based on GPU resources in artificial intelligence and engineering fields
CN112465129B (en) On-chip heterogeneous artificial intelligent processor
CN107888669B (en) Deep learning neural network-based large-scale resource scheduling system and method
CN105718479B (en) Execution strategy generation method and device under cross-IDC big data processing architecture
US9946563B2 (en) Batch scheduler management of virtual machines
CN104794194B (en) A kind of distributed heterogeneous concurrent computational system towards large scale multimedia retrieval
CN112463709A (en) Configurable heterogeneous artificial intelligence processor
CN111679904B (en) Task scheduling method and device based on edge computing network
CN112181613B (en) Heterogeneous resource distributed computing platform batch task scheduling method and storage medium
US9104491B2 (en) Batch scheduler management of speculative and non-speculative tasks based on conditions of tasks and compute resources
WO2021136512A1 (en) Method and device for scheduling on basis of deep learning node computation, and storage medium
CN105378668B (en) The interruption of operating system management in multicomputer system guides
CN104123182A (en) Map Reduce task data-center-across scheduling system and method based on master-slave framework
CN111209077A (en) Deep learning framework design method
CN110389816A (en) Method, apparatus and computer program product for scheduling of resource
CN108132840B (en) Resource scheduling method and device in distributed system
CN113867907A (en) CPU resource-based scheduling system and optimization algorithm in engineering field
CN111506434A (en) Task processing method and device and computer readable storage medium
CN113316116A (en) Vehicle calculation task unloading method based on multi-arm gambling machine
CN116048721A (en) Task allocation method and device for GPU cluster, electronic equipment and medium
CN115292046A (en) Calculation force distribution method and device, storage medium and electronic equipment
CN115543624A (en) Heterogeneous computing power arrangement scheduling method, system, equipment and storage medium
CN115586961A (en) AI platform computing resource task scheduling method, device and medium
CN111309488A (en) Method and system for sharing computing resources of unmanned aerial vehicle cluster and computer storage medium
CN114327894A (en) Resource allocation method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination