CN115509704A - Task scheduling method, device, equipment and storage medium - Google Patents

Task scheduling method, device, equipment and storage medium

Info

Publication number
CN115509704A
Authority
CN
China
Prior art keywords
task
video memory
video
releasable
memory capacity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211141190.XA
Other languages
Chinese (zh)
Inventor
陈旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd filed Critical Lenovo Beijing Ltd
Priority to CN202211141190.XA priority Critical patent/CN115509704A/en
Publication of CN115509704A publication Critical patent/CN115509704A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The application discloses a task scheduling method, apparatus, device, and storage medium. The method includes: obtaining a video memory application instruction by which a first task applies for video memory, wherein the instruction includes the video memory capacity to be applied for; determining, based on that capacity, a second task to be suspended, wherein the video memory use priority of the first task is higher than that of the second task and the video memory used by the second task is releasable video memory; with the second task suspended, allocating the releasable video memory of the second task to the first task; and, in the case that free video memory capacity matching the second task exists, resuming the second task.

Description

Task scheduling method, device, equipment and storage medium
Technical Field
The embodiments of the present application relate to the field of computer technology, and in particular, but not exclusively, to a task scheduling method, apparatus, device, and storage medium.
Background
In current schemes, on a scheduling platform where a task occupies a graphics card exclusively, a preemptible task occupies the whole card for data processing; after being preempted by another task, it must be shut down and re-enter the scheduling queue. Because programs differ, some tasks occupy one card exclusively yet cannot use all of the card's video memory and compute capacity, which wastes resources. Moreover, a preemptible task that occupies a card exclusively is shut down frequently, so scheduling and start-up are frequent and the running efficiency of the preemptible task is poor.
Disclosure of Invention
In view of this, embodiments of the present application provide a task scheduling method, apparatus, device, and storage medium.
The technical solutions of the embodiments of the present application are implemented as follows:
in a first aspect, an embodiment of the present application provides a task scheduling method, where the method includes:
obtaining a video memory application instruction by which a first task applies for video memory, wherein the instruction includes the video memory capacity to be applied for;
determining, based on the video memory capacity to be applied for, a second task to be suspended, wherein the running priority of the first task is higher than that of the second task, and the video memory occupied by the second task is releasable video memory;
with the second task suspended, allocating the releasable video memory of the second task to the first task;
and, in the case that free video memory capacity matching the second task exists, resuming the second task.
In a second aspect, an embodiment of the present application provides a task scheduling apparatus, where the apparatus includes:
an acquisition module, configured to obtain a video memory application instruction by which a first task applies for video memory, wherein the instruction includes the video memory capacity to be applied for;
a first determining module, configured to determine, based on the video memory capacity to be applied for, a second task to be suspended, wherein the running priority of the first task is higher than that of the second task, and the video memory occupied by the second task is releasable video memory;
an allocation module, configured to allocate the releasable video memory occupied by the second task to the first task while the second task is suspended;
and a running module, configured to resume the second task in the case that free video memory capacity matching the second task exists.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory and a processor, where the memory stores a computer program that is executable on the processor, and the processor implements the above method when executing the program.
In a fourth aspect, embodiments of the present application provide a storage medium storing executable instructions for causing a processor to implement the above method when executed.
In the embodiments of the application, a video memory application instruction, by which a first task applies for video memory and which includes the video memory capacity to be applied for, is first obtained; a second task to be suspended is then determined based on that capacity; with the second task suspended, the releasable video memory of the second task is allocated to the first task; and finally, in the case that free video memory capacity matching the second task exists, the second task resumes running. In this way, finer-grained scheduling of video memory is achieved based on video memory capacity, the second task does not need to be opened and closed frequently, and the running efficiency of the second task is effectively improved.
Drawings
Fig. 1 is a schematic flowchart illustrating an implementation process of a task scheduling method according to an embodiment of the present application;
fig. 2A is a schematic diagram of task scheduling provided in an embodiment of the present application;
fig. 2B is a schematic flowchart illustrating an implementation process of a task scheduling method according to an embodiment of the present application;
fig. 3A is a schematic flowchart illustrating an implementation process of a task scheduling method according to an embodiment of the present application;
fig. 3B is a schematic structural diagram of task management according to an embodiment of the present disclosure;
fig. 4 is a schematic flowchart illustrating an implementation process of a task scheduling method according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a task scheduling device according to an embodiment of the present disclosure;
fig. 6 is a hardware entity diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, specific technical solutions of the embodiments of the present application will be described in further detail below with reference to the drawings in the embodiments of the present application. The following examples are intended to illustrate the present application but are not intended to limit the scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first/second/third" merely distinguish similar objects and do not denote a particular order; where permitted, a specific order or sequence may be interchanged so that the embodiments of the application described herein can be practiced in an order other than that shown or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
An embodiment of the present application provides a task scheduling method, as shown in fig. 1, the method includes:
step S110, a video memory application instruction of a first task application video memory is obtained, wherein the video memory application instruction comprises video memory capacity to be applied;
the video memory, also called a frame buffer, is used to store rendering data processed or to be extracted by a Graphics Processing Unit (GPU) of the corresponding video card. As with the memory of a computer, video memory is the means used to store graphics information to be processed.
The task scheduling method can be applied to a computing node; one computing node may include multiple graphics cards, one graphics card may include multiple GPUs, and one computing node may run at least one task.
In some embodiments, the tasks running on a computing node may be classified by running priority; for example, tasks with a higher running priority may be classified as normal tasks, and tasks with a lower running priority as preemptible tasks. Here, the first task is a normal task with a higher video memory use priority.
In the implementation process, when the first task determines that it needs video memory on the computing node to run, it may first compute the required video memory capacity and then generate a video memory application instruction that includes that capacity. Obtaining the video memory application instruction thus determines the video memory capacity to be applied for.
Step S120, determining, based on the video memory capacity to be applied for, a second task to be suspended, wherein the running priority of the first task is higher than that of the second task, and the video memory occupied by the second task is releasable video memory;
Here, the running priority of the second task is lower than that of the first task. For example, the second task may be a parasitic task: when free video memory capacity exists, the parasitic task may run using that free capacity; when the first task needs video memory capacity, the capacity being used by the second task is released promptly for the first task, which has the higher running priority. That is, the video memory used by the second task is releasable video memory.
In the implementation process, since the computing node may host more than one second task, a second task matching the video memory capacity to be applied for may be determined based on that capacity, i.e., the second task to be suspended is determined.
Step S130, with the second task suspended, allocating the releasable video memory of the second task to the first task;
Here, the second task may be suspended, and the releasable video memory of the second task then allocated to the first task.
And step S140, under the condition that the idle video memory capacity matched with the second task exists, continuing to run the second task.
In some embodiments, after the second task is suspended, once other tasks release free video memory capacity matching the second task, video memory capacity can be re-acquired and the second task resumed.
In the embodiment of the application, a video memory application instruction, by which the first task applies for video memory and which includes the video memory capacity to be applied for, is first obtained; a second task to be suspended is then determined based on that capacity; with the second task suspended, the releasable video memory of the second task is allocated to the first task; and finally, in the case that free video memory capacity matching the second task exists, the second task resumes running. In this way, finer-grained scheduling of video memory is achieved based on video memory capacity, the second task does not need to be opened and closed frequently, and the running efficiency of the second task is effectively improved.
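The flow of steps S110 to S140 can be sketched as a small scheduler. This is a minimal illustration under assumed names and data structures (the patent specifies no implementation): a high-priority "first task" applies for capacity, and a lower-priority task whose memory is releasable is suspended to cover the shortfall.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    priority: int        # higher value = higher video memory use priority
    used_mib: int = 0    # video memory the task currently holds
    suspended: bool = False

class GpuMemoryScheduler:
    def __init__(self, total_mib: int):
        self.total_mib = total_mib
        self.tasks = []

    def free_mib(self):
        # suspended tasks have released their memory, so only running tasks count
        return self.total_mib - sum(t.used_mib for t in self.tasks if not t.suspended)

    def apply(self, first: Task, request_mib: int) -> bool:
        """Steps S110-S130: serve a video memory application from a
        high-priority task; if capacity is short, suspend a lower-priority
        task whose releasable memory covers the shortfall."""
        if self.free_mib() < request_mib:
            shortfall = request_mib - self.free_mib()
            victim = next((t for t in self.tasks
                           if not t.suspended
                           and t.priority < first.priority
                           and t.used_mib >= shortfall), None)
            if victim is None:
                return False          # nothing suitable to preempt
            victim.suspended = True   # its video memory becomes free
        if first not in self.tasks:
            self.tasks.append(first)
        first.used_mib += request_mib
        return True
```

For instance, on a card whose memory is fully used by a normal task (60%) and a parasitic task (40%), an application for 40 units by a new normal task suspends the parasitic task, mirroring the T1-to-T2 transition described later in Fig. 2A.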
In some embodiments, the above step S130 "in the case of suspending the running of the second task, allocating the releasable video memory of the second task to the first task" may be implemented by:
step 131, suspending running the second task;
step 132, migrating the video memory data in the releasable video memory of the second task to a specified memory space;
here, the Memory (Memory) is an important component of a computer, and is also called an internal Memory and a main Memory, and temporarily stores operation data in a Central Processing Unit (CPU) and data exchanged with an external Memory such as a hard disk. The computer is a bridge for communicating an external memory with a CPU, and the running of all programs in the computer is performed in the internal memory.
The specified memory may be a memory in the same computing node as the second task is run. In the implementation process, the video memory data in the releasable video memory may be copied to the designated memory space, and then the video memory data in the releasable video memory is cleared, so as to migrate the video memory data of the second task to the designated memory space.
Step 133, releasing the releasable video memory of the second task;
here, since the memory data in the releasable memory of the second task has been migrated to the designated memory space, the releasable memory of the second task may be released.
And step 134, allocating the releasable video memory of the second task to the first task.
In the embodiment of the application, the second task is first suspended; the video memory data in its releasable video memory is then migrated to the designated memory space; the releasable video memory of the second task is released; and finally the releasable video memory of the second task is allocated to the first task. In this way, the video memory data of the running second task is preserved, and the releasable video memory of the second task can be allocated to the first task.
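Steps 131 to 134 above can be sketched as a single migrate-and-release routine. The names and the dict-based stores are illustrative assumptions, not the patent's implementation; the point is that a copy survives in host memory before the VRAM entry is cleared.

```python
def suspend_and_migrate(task_name, vram, host_store):
    """Sketch of steps 131-134 (illustrative names): copy the second task's
    video memory data into the designated host-memory space, then clear the
    VRAM entry so that capacity can be handed to the first task. Returns
    the number of bytes made allocatable."""
    host_store[task_name] = bytes(vram[task_name])  # step 132: migrate a copy
    del vram[task_name]                             # step 133: release the VRAM
    return len(host_store[task_name])               # capacity freed for step 134
```

The copy-then-delete order matters: releasing first would lose the data needed to resume the second task later.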
In some embodiments, the above step S140 "in case that it is determined that there is an idle video memory capacity matching with the second task, continuing to run the second task" may be implemented by:
step 141, storing the second task to a waiting operation queue;
in the implementation process, since there may be a case that a plurality of second tasks need to wait for allocating the video memory capacity, a wait-to-run queue may be set, and the plurality of second tasks are queued according to a preset queuing rule. Here, the queuing rule may be based on the release sequence of the second task for releasing the video memory as a queuing sequence, may also be based on the size of the second task for releasing the video memory as a queuing sequence, and may also be based on the attribute of the second task as a sorting sequence. The attributes of the second task include at least an identification, a function, and a category of the second task.
Step 142, determining that an idle video memory matched with the second task exists;
in the implementation process, the free video memory matched with the second task can be determined by scanning the video memory space.
Step 143, migrating the video memory data corresponding to the second task from the designated memory to the idle video memory;
in the implementation process, since the video memory data of the second task is already migrated to the specified memory under the condition that the second task is temporarily suspended from running, the video memory data can be migrated from the specified memory to the idle video memory.
And step 144, continuing to run the second task.
In the embodiment of the application, the second task is first stored in the waiting-to-run queue; it is then determined that free video memory matching the second task exists; the video memory data corresponding to the second task is migrated from the designated memory to the free video memory; and finally the second task resumes running. In this way, once free video memory matching the second task exists, the task's video memory data can be migrated back from the designated memory, so that the second task can continue to run, effectively improving its running efficiency.
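Steps 141 to 144 above can be sketched as a queue-draining routine. A FIFO queue stands in for the patent's configurable queuing rules, and the field names are assumptions for illustration.

```python
from collections import deque

def try_resume(wait_queue, free_mib, host_store, vram):
    """Sketch of steps 141-144: suspended second tasks wait in a queue;
    when free video memory matching the head task appears, its data is
    migrated back from the designated memory and the task resumes."""
    resumed = []
    while wait_queue and wait_queue[0]["need_mib"] <= free_mib:
        task = wait_queue.popleft()
        vram[task["name"]] = host_store.pop(task["name"])  # migrate data back
        free_mib -= task["need_mib"]
        task["suspended"] = False
        resumed.append(task["name"])
    return resumed, free_mib
```

With 50 units free and two waiting tasks needing 40 and 30, only the first resumes; the second keeps waiting until more capacity is released.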
Fig. 2A is a schematic diagram of task scheduling provided in an embodiment of the present application. As shown in Fig. 2A, the diagram includes four video memory allocation states, at times T1, T2, T3, and T4, where
GPU0 and GPU1 in the four graphs belong to the same computing node.
At time T1, 60% of the video memory capacity of GPU0 is allocated to a common task and 40% to a parasitic task; 100% of the video memory capacity of GPU1 is allocated to a common task.
At time T2, 100% of the video memory capacity of GPU0 is allocated to the common task; 100% of the video memory capacity of GPU1 is allocated to the common task.
At time T3, 100% of the video memory capacity of GPU0 is allocated to the common task; 60% of the video memory capacity of GPU1 is allocated to the common task, and the remaining 40% is free.
At time T4, 100% of the video memory capacity of GPU0 is allocated to the common task; 60% of the video memory capacity of GPU1 is allocated to the common task and 40% to the parasitic task.
Thus, when the common task needs video memory capacity at time T2, the method releases the parasitic task's releasable video memory to the common task and suspends the parasitic task; when GPU1 is determined at time T3 to have free video memory, that free memory is allocated to the parasitic task at time T4, so that the parasitic task can continue to run.
In some embodiments, the idle video memory matched with the second task belongs to a video memory space of a video card occupied by a third task; wherein the third task has a higher running priority than the second task;
the above step S140 "in the case that it is determined that there is an idle video memory capacity matching the second task, continuing to run the second task" may be implemented by:
and continuing to run the second task based on the display card occupied by the third task.
Here, the video memory use priority of the third task is higher than that of the second task, and the third task may not fully occupy the video memory space of a video card, i.e., it occupies only part of that space. In this case, it is determined that the video memory space of the video card occupied by the third task contains free video memory matching the second task, i.e., the third task and the second task can share the video memory space of one video card.
In some embodiments, the third task may be set as a task having the same priority level of video memory usage as the first task, for example, the third task and the first task are both normal tasks, and the second task is a parasitic task.
In the embodiment of the application, the second task continues to run on the video card occupied by the third task. In this way, video memory capacity is scheduled at a finer granularity, and the utilization of the video card is greatly improved.
In some embodiments, before determining that there is free video memory capacity matching the second task in step 142, the method for determining the free video memory capacity matching the second task includes the following steps:
step 143, periodically scanning the video memory space to obtain the occupancy of the video memory space;
In the implementation process, a scanning period can be set based on actual requirements, so that the video memory space is scanned periodically and its usage obtained.
In some embodiments, the above method of task scheduling is applied to a compute node comprising at least one graphics card; the step 143 "scanning the memory space at regular time to obtain the occupation situation of the memory space" can be implemented through the following processes:
and scanning the video memory space of at least one video card on the computing node at regular time to acquire the occupation condition of the video memory space of at least one video card.
Here, a compute node may include at least one video card, where each video card corresponds to at least one video memory space, and in an implementation process, at least one video card in the compute node may be scanned at regular time to obtain a video memory space usage of each video card.
And 144, determining whether an idle video memory matched with the second task exists or not based on the occupation condition of the video memory space.
In the implementation process, an idle video memory matched with the second task can be determined from the video memory spaces corresponding to the multiple video cards, that is, the video card corresponding to the idle video memory is not limited.
In the embodiment of the application, the video memory space is scanned periodically to obtain its occupancy, and whether free video memory capacity matching the second task exists is determined based on that occupancy. In this way, the free video memory capacity matching the second task can be determined in a timely and efficient manner.
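The periodic scan of steps 143 and 144 can be sketched as follows. The card records are an assumed data layout; a real implementation would query the driver, but the matching logic is the same.

```python
def scan_vram(cards):
    """Sketch of step 143: scan the video memory space of every card on
    the node and report free capacity per card (field names illustrative)."""
    return {c["id"]: c["total_mib"] - sum(c["allocs"].values()) for c in cards}

def find_free_card(cards, need_mib):
    """Sketch of step 144: return a card whose free video memory matches
    the second task, or None if no card on this node qualifies."""
    for card_id, free in scan_vram(cards).items():
        if free >= need_mib:
            return card_id
    return None
```

Because the scan covers every card on the node, the card that resumes the second task need not be the one it originally ran on, matching the remark that the video card corresponding to the free memory is not limited.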
In some embodiments, the above method for task scheduling is applied to a computing cluster including at least one computing node, and as shown in fig. 2B, the method for task scheduling further includes:
step S210, under the condition that the computing node is determined to have no idle video memory capacity, scanning the occupation conditions of the video memory space of other computing nodes in the computing cluster;
here, since the compute cluster includes at least one compute node, if it is determined that the compute node in which the second task is running does not have a free video memory space, the occupation statuses of the video memory spaces of the remaining compute nodes in the compute cluster may be scanned.
Step S220, in the case that it is determined that at least one video card of the remaining computing nodes has free video memory capacity, determining the computing node with the free video memory capacity as a target computing node;
In the implementation process, it is determined that at least one video card of the remaining computing nodes has free video memory capacity, and that computing node is then determined to be the target computing node.
And step S230, restarting the second task at the target computing node.
In the embodiment of the application, when it is determined that no free video memory space exists on the computing node, the occupancy of the video memory spaces of the other computing nodes in the computing cluster is scanned; when at least one video card of the remaining computing nodes has free video memory capacity, the computing node with that capacity is determined to be the target computing node; and the second task is restarted on the target computing node. In this way, the second task can be started promptly when the remaining nodes in the computing cluster have free video memory capacity, effectively improving the utilization of the computing cluster.
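The cluster-level fallback of steps S210 to S230 can be sketched as a scan over the remaining nodes. The node records are an illustrative assumption; the patent does not fix a data structure.

```python
def find_target_node(local_name, cluster, need_mib):
    """Sketch of steps S210-S230: when the local node has no matching free
    video memory, scan the other nodes' cards; the first node with a card
    holding enough free capacity becomes the target node, where the second
    task is then restarted."""
    for node in cluster:
        if node["name"] == local_name:
            continue  # S210: scan only the remaining nodes
        if any(card_free >= need_mib for card_free in node["card_free_mib"]):
            return node["name"]   # S220: target computing node found
    return None
```

Returning None signals that the second task must keep waiting in the queue rather than be restarted elsewhere.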
In some embodiments, the step S120 "determining the second task to be suspended for running based on the video memory capacity to be applied" may be implemented by:
step 121, obtaining the video memory capacity to be applied;
here, the video memory capacity to be applied may be smaller than the video memory space of one video card, that is, the entire video memory space of one video card does not need to be applied.
Step 122, determining the releasable video memory matched with the video memory capacity to be applied;
here, since the releasable video memory is the video memory capacity corresponding to the second task, in the implementation process, a releasable video memory matched with the video memory capacity to be applied may be determined based on the size of the video memory capacity to be applied, and the releasable video memory may be equal to the video memory capacity to be applied, or may be slightly larger than the video memory capacity to be applied, and may be set according to the actual situation.
And step 123, determining the second task occupying the releasable video memory as the second task to be temporarily stopped to run.
In the embodiment of the application, firstly, the video memory capacity to be applied is obtained; then determining the releasable video memory matched with the video memory capacity to be applied; and finally, determining the second task occupying the releasable video memory as the second task to be temporarily stopped to run. Therefore, based on the video memory capacity to be applied, a second task for releasing the releasable video memory can be effectively determined.
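The victim selection of steps 121 to 123 can be sketched as a best-fit match. The "smallest amount that still covers the request" rule is one reasonable reading of "equal to or slightly larger than" and is an assumption, as are the field names.

```python
def pick_second_task(parasites, need_mib):
    """Sketch of steps 121-123: among running parasitic tasks, pick the one
    whose releasable video memory matches the applied-for capacity — equal
    to it or, failing that, the smallest amount that still covers it."""
    candidates = [t for t in parasites if t["releasable_mib"] >= need_mib]
    return min(candidates, key=lambda t: t["releasable_mib"], default=None)
```

A best-fit choice suspends the task whose loss wastes the least capacity; returning None means no single parasitic task can satisfy the application.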
In some embodiments, the method for task scheduling is applied to a parallel computing platform, where the parallel computing platform includes a video memory task management center and a task programming interface, and as shown in fig. 3A, the method includes:
step S310, the task programming interface sends, to the video memory task management center, a video memory application instruction by which a first task applies for video memory, wherein the instruction includes the video memory capacity to be applied for;
here, since the GPU is not an independently operating computing platform, but needs to work in cooperation with the CPU, and may be regarded as a coprocessor of the CPU, in the case of GPU parallel computing, it may refer to a heterogeneous computing architecture based on the CPU and the GPU. In the heterogeneous computing architecture, the GPU and the CPU are connected together through a PCIe bus to work together, where the location of the CPU is referred to as a host side (host), and the location of the GPU is referred to as a device side (device). For example, CUDA is a parallel computing platform, i.e., a general parallel computing architecture that enables a GPU to solve complex computing problems. In CUDA, a CPU and its memory may be referred to as host, and a GPU and its memory may be referred to as device. The CUDA program comprises a host program and a device program which are respectively run on the CPU and the GPU. Meanwhile, the host and the device can be communicated, so that data copying can be carried out between the host and the device.
Here, a task programming interface, namely a hijacked CUDA application programming interface (Hijacked CUDA API), and a video memory task management center (Application Manager) may be added to the CUDA platform. In the implementation process, the Hijacked CUDA API sends to the Application Manager a video memory application instruction by which a first task applies for video memory.
Fig. 3B is a schematic structural diagram of video memory management provided in the embodiment of the present application, and as shown in fig. 3B, the video memory management includes a CUDA application 31, a preemptive CUDA application programming interface (Hijacked CUDA API) 32, a CUDA application programming interface 33, a driver 34, and a CUDA application management (video memory task management center) 35, where,
the hijacked CUDA application programming interface (Hijacked CUDA API) 32 comprises at least the following functions: cudaMalloc(), cudaFree(), and cudaMallocLaunchDevice().
cudaMalloc (the video memory application function) is used to apply for video memory capacity for the first task.
cudaFree (the video memory release function) is used to release the releasable video memory corresponding to the second task.
cudaMallocLaunchDevice (the function that launches the actual computation) runs the actual computation on the GPU; by wrapping this function, it can be determined whether the second task is in a suspended state, task scheduling can be performed, and so on.
Step S320: the video memory task management center determines, based on the video memory capacity to be applied for, a second task to be suspended, where the video memory application priority of the first task is higher than that of the second task, and the video memory occupied by the second task is releasable video memory;
in the implementation process, the Application Manager searches the current node for a second task (parasitic task) that can be suspended and notifies it to suspend running.
Step S330: the video memory task management center, with the second task suspended, allocates the releasable video memory of the second task to the first task;
in the implementation process, the Application Manager notifies the first task that its device memory application has succeeded, and allocates the releasable video memory of the second task to the first task.
Step S340: the video memory task management center continues to run the second task when it determines that there is free video memory capacity matching the second task.
In the implementation process, the Application Manager continues to run the second task once it finds free video memory capacity matching the second task.
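The decision flow of steps S310 through S340 can be sketched as follows. This is a minimal host-side illustration of the patent's scheduling logic, not CUDA code; all class and method names (`Task`, `ApplicationManager`, `apply_memory`, `try_resume`) are hypothetical, and the memory accounting is simplified to single integer capacities.

```python
# Hypothetical sketch of the Application Manager flow in steps S310-S340.
# Names and structures are illustrative, not from the patent or CUDA.

class Task:
    def __init__(self, name, memory_needed, priority, releasable=0):
        self.name = name
        self.memory_needed = memory_needed   # video memory the task needs
        self.priority = priority             # higher value = higher priority
        self.releasable = releasable         # portion of its memory that can be freed
        self.suspended = False

class ApplicationManager:
    def __init__(self, total_memory):
        self.free_memory = total_memory
        self.waiting_queue = []              # suspended (parasitic) tasks

    def apply_memory(self, first_task, running_tasks):
        """Handle a video memory application instruction (step S310)."""
        if self.free_memory >= first_task.memory_needed:
            self.free_memory -= first_task.memory_needed
            return True
        # S320: pick a lower-priority task whose releasable memory suffices.
        for second in running_tasks:
            if (second.priority < first_task.priority
                    and second.releasable + self.free_memory >= first_task.memory_needed):
                # S330: suspend it and hand its releasable memory to the first task.
                second.suspended = True
                self.waiting_queue.append(second)
                self.free_memory += second.releasable
                self.free_memory -= first_task.memory_needed
                return True
        return False

    def try_resume(self):
        """S340: resume waiting tasks once matching free capacity exists."""
        for task in list(self.waiting_queue):
            if self.free_memory >= task.releasable:
                self.free_memory -= task.releasable
                task.suspended = False
                self.waiting_queue.remove(task)
```

In this sketch the second task is never killed, only parked in the waiting queue, which mirrors the patent's claim that the second task need not be started and stopped repeatedly.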
In this embodiment of the application, the task scheduling method is applied to a parallel computing platform that includes a video memory task management center and a task programming interface. The video memory task management center can receive the video memory application instruction sent by the first task, coordinate the second task to release video memory while maintaining the second task's life cycle, and continue to run the second task once matching video memory capacity is available. In this way, finer-grained scheduling based on video memory capacity is achieved: the second task does not need to be started and stopped repeatedly, which effectively improves its running efficiency.
Fig. 4 is a workflow of task scheduling provided in an embodiment of the present application, and as shown in fig. 4, the workflow includes the following steps:
Step S410: a common task sends a video memory capacity application instruction to the task management center through a video memory application function;
here, the common task applies for video memory capacity through the video memory application function in the preemptive CUDA API; on the CUDA runtime platform, the preemptive CUDA API may be named the Hijacked CUDA API. In the implementation process, the Hijacked CUDA API can be compiled into a dynamic link library file, which is loaded into the runtime environment by specifying it in the LD_PRELOAD environment variable.
The Hijacked CUDA API mechanism obtains a reference to the original CUDA function through dlsym() and wraps (encapsulates) that function.
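The wrap-and-forward pattern described above can be illustrated in Python. In the patent's setting this is done in C against the real CUDA runtime, with LD_PRELOAD interposing the library and dlsym() retrieving the original symbol; everything below is an illustrative stand-in, and all names (`real_cudaMalloc`, `hijacked_cudaMalloc`) are hypothetical.

```python
# Python analogue of the dlsym()-based wrapping: keep a reference to the
# original function, install a wrapper, and forward calls through it.

calls = []

def real_cudaMalloc(size):          # stand-in for the real CUDA function
    calls.append(("malloc", size))
    return 0                        # stand-in for cudaSuccess

_original = real_cudaMalloc         # analogue of dlsym(RTLD_NEXT, "cudaMalloc")

def hijacked_cudaMalloc(size):
    # Negotiate with the Application Manager before forwarding the call.
    calls.append(("negotiate", size))
    return _original(size)

real_cudaMalloc = hijacked_cudaMalloc   # analogue of LD_PRELOAD interposition
```

A caller that invokes `real_cudaMalloc` now transparently passes through the negotiation step first, which is the essence of the hijacking mechanism.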
Here, cudaMalloc (video memory application function) is used by a common task to apply for video memory capacity. By encapsulating this function and negotiating with the Application Manager, it is determined whether releasable video memory needs to be released; if so, the parasitic task holding the releasable video memory is put into a suspended state and enters the waiting queue;
cudaLaunchDevice (function for running the actual computation) runs the actual computation on the GPU; by encapsulating this function, it is determined whether a parasitic task is in a suspended state, and task scheduling and other work are performed;
cudaFree (video memory release function) releases video memory; by encapsulating this function, the Application Manager can be notified to trigger a scan of the compute node's state and check whether a parasitic task can continue to run.
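The gate that the wrapped launch function places in front of the actual computation can be sketched as below. This is a hedged illustration, not real kernel-launch code: `task_state` and the `"suspended"`/`"pending"` keys are invented for the example, and a real implementation would block on a synchronization primitive rather than merely record the deferral.

```python
# Illustrative sketch: before running the actual computation, the wrapped
# launch function checks whether the task has been suspended by the
# Application Manager and defers the kernel if so.

def wrapped_launch(task_state, run_kernel):
    """task_state is a dict with a 'suspended' flag maintained by the manager."""
    log = []
    if task_state["suspended"]:
        log.append("deferred")            # real code would block until resumed
        task_state["pending"] = run_kernel
    else:
        log.append("launched")
        run_kernel()
    return log
```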
In some embodiments, the CUDA program may run in container mode on a GPU cluster scheduling platform, so the compiled dynamic link library and the environment variables are set by the platform and are transparent to the CUDA program.
In the implementation process, a video memory capacity application instruction can be sent to the Application Manager by using the video memory application function in the preemptive CUDA API, and the dynamic link library on the CUDA runtime platform synchronously obtains the instructions issued by the Application Manager.
Step S420: the task management center determines a parasitic task to be suspended;
in the implementation process, the Application Manager searches the current node for a parasitic task that can be suspended and notifies it to suspend running;
step S430: the task management center releases the releasable video memory of the parasitic task and backs up the video memory data held in it;
in the implementation process, when the parasitic task receives the suspension notification, the process corresponding to the parasitic task is suspended, and the video memory data in the releasable video memory is copied to a designated memory area.
With the parasitic task suspended, the task management center puts it into the waiting queue.
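The backup step in S430 can be sketched as a move of the task's data from video memory into the designated host memory. The function name and the dict-based stand-ins for video memory and host memory are hypothetical; in practice this would be a device-to-host copy (e.g. via cudaMemcpy) followed by freeing the device allocation.

```python
# Sketch of step S430's backup: when a parasitic task is suspended, the data
# in its releasable video memory is copied into a designated host-memory
# area before the video memory is released. Structures are hypothetical.

def suspend_and_backup(video_memory, task_id, host_backup):
    """Move task_id's data out of video_memory into host_backup."""
    data = video_memory.pop(task_id)       # releasable video memory contents
    host_backup[task_id] = bytes(data)     # copy into the designated memory
    return len(host_backup[task_id])       # bytes preserved for later restore
```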
Step S440: the task management center allocates the released video memory to the common task;
in the implementation process, the task management center notifies the common task that its video memory capacity application has succeeded and allocates the video memory capacity to it.
Step S450: the task management center determines free video memory capacity matching the parasitic task;
step S460: the task management center allocates the free video memory capacity to the parasitic task and continues to run the parasitic task;
in the implementation process, when the task management center finds suitable free video memory capacity for the parasitic tasks in the waiting queue, it tries to resume them; after a resumed parasitic task receives the resume signal, it rebinds the free video memory capacity and copies the video memory data previously backed up in the designated memory into that capacity.
Step S470: the task management center periodically scans the video memory usage of the current node and tries to resume waiting parasitic tasks.
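The restore path of steps S450 and S460 is the mirror image of the backup. The sketch below is illustrative: `resume_from_backup` and its dict-based structures are invented names, and a real implementation would re-allocate device memory and perform a host-to-device copy.

```python
# Counterpart sketch of steps S450-S460: once matching free video memory is
# found, rebind it to the parasitic task and copy the backed-up data back in.

def resume_from_backup(host_backup, task_id, video_memory, free_capacity):
    """Restore task_id onto the GPU if free_capacity suffices; return success."""
    data = host_backup.get(task_id)
    if data is None or free_capacity < len(data):
        return False                         # keep waiting for a later scan
    video_memory[task_id] = bytearray(data)  # rebind and copy memory -> video memory
    del host_backup[task_id]
    return True
```

Returning `False` rather than raising keeps the task in the waiting queue, consistent with the periodic re-scan in step S470.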
The video memory management strategy provided by this embodiment implements a releasable video memory allocation strategy at the CUDA API layer. When a common task applies for video memory capacity through the CUDA API, the releasable video memory of a parasitic task can be freed. When the releasable video memory of a parasitic task is to be released, the Application Manager notifies the parasitic task, suspends the process to which it belongs, and copies the video memory data in the releasable video memory into memory. When the Application Manager determines that the compute node has free video memory capacity matching the parasitic task, the video memory data stored in memory can be copied into that free capacity, and once the Application Manager determines that the video memory acquired by the parasitic task is available, the parasitic task's process continues to run.
Based on the foregoing embodiments, an embodiment of the present application provides a task scheduling apparatus. The apparatus includes modules, each module may include sub-modules, and each sub-module may include units; these may be implemented by a processor in an electronic device, or by specific logic circuits. In the implementation process, the processor may be a Central Processing Unit (CPU), a Microprocessor Unit (MPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or the like.
Fig. 5 is a schematic structural diagram of a task scheduling apparatus according to an embodiment of the present application, and as shown in fig. 5, the apparatus 500 includes:
an obtaining module 510, configured to obtain a video memory application instruction for a first task to apply for a video memory, where the video memory application instruction includes a video memory capacity to be applied;
a first determining module 520, configured to determine, based on the video memory capacity to be applied, a second task to be suspended for operation, where an operation priority of the first task is higher than that of the second task, and a video memory used by the second task is a releasable video memory;
an allocating module 530, configured to, in a case where the second task is suspended from running, allocate the releasable video memory of the second task to the first task;
and an operation module 540, configured to continue to operate the second task when it is determined that there is an idle video memory capacity matching the second task.
In some embodiments, the allocating module 530 includes a suspending submodule, a first migration submodule, a releasing submodule, and an allocating submodule, wherein the suspending submodule is configured to suspend the second task from running; the first migration submodule is used for migrating the video memory data in the releasable video memory of the second task to a specified memory space; the release submodule is used for releasing the releasable video memory of the second task; and the distribution submodule is used for distributing the releasable video memory of the second task to the first task.
In some embodiments, the running module includes a storing submodule, a first determining submodule, a second migration submodule, and a running submodule, where the storing submodule is configured to store the second task in a waiting queue; the first determining submodule is configured to determine that there is idle video memory matching the second task; the second migration submodule is configured to migrate the video memory data corresponding to the second task from the designated memory to the idle video memory; and the running submodule is configured to continue to run the second task.
In some embodiments, the idle video memory matched with the second task belongs to a video memory space of a video card occupied by a third task; wherein the third task has a higher running priority than the second task; the running module 540 is further configured to continue to run the second task based on the graphics card occupied by the third task.
In some embodiments, the running module further includes a scanning submodule and a second determining submodule, where the scanning submodule is configured to periodically scan the video memory space to obtain the occupation of the video memory space; and the second determining submodule is configured to determine, based on the occupation of the video memory space, whether there is idle video memory capacity matching the second task.
In some embodiments, the apparatus is applied to a computing node comprising at least one graphics card; the scanning submodule is further configured to periodically scan the video memory space of the at least one graphics card on the computing node and acquire the occupation of that video memory space.
In some embodiments, the apparatus is applied to a computing cluster including at least one computing node, and further includes a scanning module, a second determining module, and a restarting module, where the scanning module is configured to scan the occupation of the video memory space of the other computing nodes in the computing cluster when there is no idle video memory capacity on the current computing node; the second determining module is configured to determine a computing node with idle video memory capacity as the target computing node when at least one graphics card of the remaining computing nodes has idle video memory capacity; and the restarting module is configured to restart the second task on the target computing node.
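The cluster-wide fallback can be sketched as a scan over the other nodes' per-card free capacities. This is a hedged illustration under invented structures: `find_target_node` and the mapping of node names to capacity lists are not from the patent, which leaves the scan protocol unspecified.

```python
# Sketch of the cluster fallback: when the local node has no idle video
# memory capacity, scan the other compute nodes' graphics cards and pick
# the first node with enough free capacity as the target node.

def find_target_node(nodes, needed):
    """nodes maps node name -> list of per-card free video memory capacities."""
    for name, cards in nodes.items():
        if any(free >= needed for free in cards):
            return name                    # restart the second task here
    return None                            # no node can host the task yet
```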
In some embodiments, the first determining module 520 includes an obtaining sub-module, a third determining sub-module, and a fourth determining sub-module, where the obtaining sub-module is configured to obtain the video memory capacity to be applied; the third determining submodule is used for determining the releasable video memory matched with the video memory capacity to be applied; the fourth determining submodule is configured to determine the second task occupying the releasable video memory as the second task to be suspended from running.
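The first determining module's sub-steps amount to matching the capacity to be applied for against tasks holding releasable video memory. A minimal sketch, with hypothetical names and a lowest-priority tie-break that the patent does not mandate:

```python
# Sketch of the first determining module: among tasks holding releasable
# video memory, pick one whose releasable capacity covers the capacity to
# be applied for; prefer the lowest-priority candidate for suspension.

def pick_second_task(tasks, needed):
    """tasks: list of (name, priority, releasable) tuples."""
    candidates = [t for t in tasks if t[2] >= needed]
    if not candidates:
        return None
    return min(candidates, key=lambda t: t[1])[0]
```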
The above description of the apparatus embodiments, similar to the above description of the method embodiments, has similar beneficial effects as the method embodiments. For technical details not disclosed in the embodiments of the apparatus of the present application, reference is made to the description of the embodiments of the method of the present application for understanding.
It should be noted that, in the embodiment of the present application, if the method is implemented in the form of a software functional module and sold or used as a standalone product, the method may also be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing an electronic device (which may be a mobile phone, a tablet computer, a notebook computer, a desktop computer, etc.) to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read Only Memory (ROM), a magnetic disk, or an optical disk. Thus, embodiments of the present application are not limited to any specific combination of hardware and software.
Correspondingly, the present application provides a storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps in the task scheduling method provided in the foregoing embodiments.
Correspondingly, an embodiment of the present application provides an electronic device. Fig. 6 is a schematic diagram of a hardware entity of the electronic device provided in the embodiment of the present application. As shown in fig. 6, the hardware entity of the device 600 includes a memory 601 and a processor 602, the memory 601 storing a computer program operable on the processor 602, and the processor 602 implementing the steps in the task scheduling method provided in the above embodiments when executing the program.
The Memory 601 is configured to store instructions and applications executable by the processor 602, and may also buffer data (e.g., image data, audio data, voice communication data, and video communication data) to be processed or already processed by the processor 602 and modules in the electronic device 600, and may be implemented by a FLASH Memory (FLASH) or a Random Access Memory (RAM).
Here, it should be noted that: the above description of the storage medium and device embodiments, similar to the description of the method embodiments above, has similar beneficial effects as the method embodiments. For technical details not disclosed in the embodiments of the storage medium and apparatus of the present application, reference is made to the description of the embodiments of the method of the present application for understanding.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application. The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; can be located in one place or distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps of implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer-readable storage medium, and when executed, executes the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as a removable Memory device, a Read Only Memory (ROM), a magnetic disk, or an optical disk.
Alternatively, the integrated unit described above may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing an electronic device (which may be a mobile phone, a tablet computer, a notebook computer, a desktop computer, etc.) to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a removable storage device, a ROM, a magnetic or optical disk, or other various media that can store program code.
The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily without conflict to arrive at new method embodiments.
Features disclosed in several of the product embodiments provided in the present application may be combined in any combination to yield new product embodiments without conflict.
The features disclosed in the several method or apparatus embodiments provided in the present application may be combined arbitrarily, without conflict, to arrive at new method embodiments or apparatus embodiments.
The above description is only for the embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of task scheduling, the method comprising:
acquiring a video memory application instruction of a first task application video memory, wherein the video memory application instruction comprises video memory capacity to be applied;
determining a second task to be suspended from running based on the video memory capacity to be applied for, wherein the running priority of the first task is higher than that of the second task, and the video memory occupied by the second task is releasable video memory;
under the condition of suspending the second task from running, allocating the releasable video memory occupied by the second task to the first task;
and under the condition that the idle video memory capacity matched with the second task exists, continuing to run the second task.
2. The method of claim 1, wherein the allocating the releasable memory occupied by the second task to the first task in the case of suspending the running of the second task comprises:
suspending running the second task;
migrating the video memory data in the releasable video memory occupied by the second task to a specified memory space;
releasing the releasable video memory of the second task;
and allocating the releasable video memory of the second task to the first task.
3. The method of claim 2, wherein continuing to run the second task in the event that it is determined that there is free video memory capacity that matches the second task, comprises:
storing the second task in a waiting queue;
determining that an idle video memory matched with the second task exists;
migrating the video memory data corresponding to the second task from the designated memory to the idle video memory;
and continuing to run the second task.
4. The method as claimed in claim 1, wherein the idle video memory matched with the second task belongs to the video memory space of the video card occupied by the third task; wherein the third task has a higher running priority than the second task;
under the condition that the idle video memory capacity matched with the second task exists, continuing to run the second task, wherein the method comprises the following steps:
and continuing to run the second task based on the display card occupied by the third task.
5. The method of claim 3, prior to the determining that there is free video memory capacity matching the second task, the method further comprising:
scanning a video memory space at regular time to acquire the occupation condition of the video memory space;
and determining whether an idle video memory matched with the second task exists or not based on the occupation condition of the video memory space.
6. The method of claim 5, applied to a compute node comprising at least one graphics card;
wherein the scanning the video memory space at regular time to acquire the occupation condition of the video memory space comprises:
and scanning the video memory space of at least one video card on the computing node at regular time to acquire the occupation condition of the video memory space of at least one video card.
7. The method of claim 6, applied to a computing cluster including at least one computing node, the method further comprising:
scanning the occupation condition of the video memory space of other computing nodes in the computing cluster under the condition that the computing nodes are determined not to have the free video memory capacity;
under the condition that at least one video card of the rest of computing nodes has the free video memory capacity, determining the computing nodes with the free video memory capacity as target computing nodes;
restarting the second task at the target compute node.
8. The method according to claim 1, wherein the determining, based on the video memory capacity to be applied for, a second task to be suspended for operation comprises:
acquiring the video memory capacity to be applied;
determining the releasable video memory matched with the video memory capacity to be applied;
and determining the second task occupying the releasable video memory as the second task to be suspended from running.
9. The method of claim 1, applied to a parallel computing platform, the parallel computing platform comprising a video memory task management center and a task programming interface, the method comprising:
the task programming interface initiates a video memory application instruction of a first task for applying for video memory to the video memory task management center, wherein the video memory application instruction comprises video memory capacity to be applied;
the video memory task management center determines, based on the video memory capacity to be applied for, a second task to be suspended from running, wherein the running priority of the first task is higher than that of the second task, and the video memory occupied by the second task is releasable video memory;
the video memory task management center allocates the releasable video memory of the second task to the first task under the condition of temporarily stopping running the second task;
and the video memory task management center continues to run the second task under the condition that the video memory task management center determines that the idle video memory capacity matched with the second task exists.
10. An electronic device comprising a memory and a processor, the memory storing a computer program operable on the processor, the processor implementing the steps of the method of any one of claims 1 to 9 when executing the program.
CN202211141190.XA 2022-09-20 2022-09-20 Task scheduling method, device, equipment and storage medium Pending CN115509704A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211141190.XA CN115509704A (en) 2022-09-20 2022-09-20 Task scheduling method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211141190.XA CN115509704A (en) 2022-09-20 2022-09-20 Task scheduling method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115509704A true CN115509704A (en) 2022-12-23

Family

ID=84503533

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211141190.XA Pending CN115509704A (en) 2022-09-20 2022-09-20 Task scheduling method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115509704A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117539639A (en) * 2024-01-05 2024-02-09 北京趋动智能科技有限公司 Video memory resource scheduling method, device, system, storage medium and electronic equipment
CN117539639B (en) * 2024-01-05 2024-06-14 北京趋动智能科技有限公司 Video memory resource scheduling method, device, system, storage medium and electronic equipment
CN118132273A (en) * 2024-04-29 2024-06-04 阿里云计算有限公司 Data processing method, device and equipment

Similar Documents

Publication Publication Date Title
CN110489213B (en) Task processing method and processing device and computer system
JP3678414B2 (en) Multiprocessor system
CN115509704A (en) Task scheduling method, device, equipment and storage medium
US8739171B2 (en) High-throughput-computing in a hybrid computing environment
CN101968751B (en) Sharing idled processor execution resources
US8402470B2 (en) Processor thread load balancing manager
CN114741207B (en) GPU resource scheduling method and system based on multi-dimensional combination parallelism
KR102334511B1 (en) Manage task dependencies
EP4242843A1 (en) Graphics card memory management method and apparatus, device, and system
US20120054771A1 (en) Rescheduling workload in a hybrid computing environment
US20050125793A1 (en) Operating system kernel-assisted, self-balanced, access-protected library framework in a run-to-completion multi-processor environment
CN107515781B (en) Deterministic task scheduling and load balancing system based on multiple processors
KR102338849B1 (en) Method and system for providing stack memory management in real-time operating systems
CN110888743A (en) GPU resource using method, device and storage medium
CN108228343B (en) Memory recovery method and device, computer device and computer readable storage medium
CN102047218A (en) Scheduler instances in a process
CN109840149B (en) Task scheduling method, device, equipment and storage medium
CN114625533A (en) Distributed task scheduling method and device, electronic equipment and storage medium
CN111813541B (en) Task scheduling method, device, medium and equipment
CN116724294A (en) Task allocation method and device
JP2015148909A (en) Parallel computer system, control method of parallel computer system, and control program of management node
CN113495787A (en) Resource allocation method, device, storage medium and electronic equipment
CN112486638A (en) Method, apparatus, device and storage medium for executing processing task
CN112114958A (en) Resource isolation method, distributed platform, computer device, and storage medium
CN113032154B (en) Scheduling method and device for virtual CPU, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination