CN111708639A - Task scheduling system and method, storage medium and electronic device - Google Patents

Task scheduling system and method, storage medium and electronic device

Info

Publication number
CN111708639A
Authority
CN
China
Prior art keywords
task
subtask
processed
subtasks
scheduling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010573441.6A
Other languages
Chinese (zh)
Inventor
安虹
李名凡
韩文廷
林晗
林增
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202010573441.6A
Publication of CN111708639A
Legal status: Pending (current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005: Allocation of resources to service a request
    • G06F9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F9/5038: Allocation of resources to service a request, considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G06F2209/00: Indexing scheme relating to G06F9/00
    • G06F2209/50: Indexing scheme relating to G06F9/50
    • G06F2209/5021: Priority

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)

Abstract

The invention provides a task scheduling system and method, a storage medium, and an electronic device. The system is applied to a graphics processing unit (GPU) and comprises a global task scheduling unit and a local task scheduling unit. The global task scheduling unit is configured to, upon detecting a target subtask in the task queue stored in global memory, send that subtask to the task buffer of the stream processor that currently has the least task amount; a target subtask is a subtask in the task queue with no forward dependency. The local task scheduling unit is configured to determine the target subtasks delivered to the task buffer as to-be-processed subtasks and, according to the processing priority of each to-be-processed subtask currently remaining in the task buffer, schedule them in turn to the execution core of that stream processor. Tasks can thus be allocated reasonably among the GPU's stream processors, improving the GPU's computing performance.

Description

Task scheduling system and method, storage medium and electronic device
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a task scheduling system and method, a storage medium, and an electronic device.
Background
With the development of science and technology, graphics processing units (GPUs) have come into wide use in fields such as graphics processing, video processing, machine learning, and data mining, where the GPU's computing speed greatly affects working efficiency.
To improve the operating efficiency of the graphics processor, the usual approach has been to raise hardware performance metrics such as capacity, working frequency, and bit width. With the arrival of the post-Moore's-law era, however, the GPU's hardware metrics are approaching their limits, so attention has turned to increasing the number of GPU cores to improve the GPU's data-parallel computing capability: each core can independently complete a computing task, which can greatly increase the GPU's computing speed.
When processing dependent tasks, a multi-core GPU generally allocates the stages of a multi-stage task with dependency relationships to independent stream processors in the GPU. Tasks with dependency relationships have a specific execution order; for example, if task B is a forward dependency of task A, task B must finish before task A can execute.
However, each stream processor cannot perceive the task-processing status of the other stream processors in the GPU, so the stream processors cannot reasonably arrange the execution order of their pending tasks, which limits the GPU's computing performance.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a task scheduling system and method, a storage medium, and an electronic device that can reasonably allocate tasks to the stream processors of a GPU and improve the GPU's computing performance.
The invention further provides a task scheduling apparatus to ensure that the method can be implemented and applied in practice.
A task scheduling system for a graphics processor, GPU, the GPU comprising a global memory and a plurality of stream processors, the system comprising:
a global task scheduling unit and a local task scheduling unit;
the global task scheduling unit is configured to send a target subtask to a task buffer of a stream processor with a smallest task amount among the current plurality of stream processors when it is detected that the target subtask exists in a task queue stored in the global memory;
the task queue comprises at least one task group, and the task group comprises one subtask or a plurality of subtasks; the target subtask is a subtask without forward dependency in the task queue;
the local task scheduling unit is configured to determine the target subtask that has been sent to the task buffer as a to-be-processed subtask, determine whether the number of currently remaining to-be-processed subtasks in the task buffer is multiple, and if yes, sequentially schedule each to-be-processed subtask to an execution kernel of the stream processor with the smallest task amount according to a processing priority of each to-be-processed subtask, so that the execution kernel executes the received to-be-processed subtask.
The task scheduling system optionally further includes: a detection unit;
the detection unit is used for acquiring a task dependency relationship table of each task group of the task queue; determining a forward dependency count for each subtask of each of the task groups based on each of the task dependency tables; and detecting whether a target subtask currently exists in the task queue according to the forward dependency count of each subtask.
The task scheduling system optionally further includes: an update unit;
and the updating unit is used for determining the task group to which the executed subtask to be processed belongs and updating the forward dependent task count of each currently remaining subtask in the task group.
The task scheduling system optionally further includes: a task decomposition unit;
the task decomposition unit is used for determining a task to be decomposed corresponding to a task decomposition instruction and a task granularity specified by the task decomposition instruction when the task decomposition instruction is received, and decomposing the task to be decomposed based on the task granularity to obtain a task group corresponding to the task to be decomposed, wherein the task group comprises sub-tasks; determining the operation type of each subtask, and constructing a dependency relationship table based on the operation type of each subtask, wherein the dependency relationship table records the dependency relationship among the subtasks; and sending the task group corresponding to the task to be decomposed and the dependency relationship table corresponding to the task group to the global memory, so that the global memory stores each subtask of the task group into the task queue.
Optionally, in the task scheduling system, the local task scheduling unit that sequentially schedules each to-be-processed sub-task to the execution core of the stream processor according to the processing priority of each to-be-processed sub-task currently stored in the task buffer includes:
a third determining subunit, configured to determine a task out-degree corresponding to each to-be-processed sub-task currently stored in the task buffer, where the task out-degree is a number of sub-tasks corresponding to a to-be-decomposed task to which the to-be-processed sub-task belongs;
the fourth determining subunit is configured to determine a processing priority of each to-be-processed sub-task based on the task out-degree of each to-be-processed sub-task;
and the scheduling subunit is used for sequentially scheduling each sub-task to be processed to the execution kernel of the stream processor according to the sequence that the processing priority of each sub-task to be processed is from high to low.
A task scheduling method applied to a graphics processing unit (GPU), wherein the GPU comprises a global memory and a plurality of stream processors, the method comprising the following steps:
when detecting that a target subtask exists in a task queue stored in the global memory, sending the target subtask to a task buffer area of a stream processor with the least task amount in the plurality of current stream processors;
the task queue comprises at least one task group, and the task group comprises one subtask or a plurality of subtasks with dependency relationship; the target subtask is a subtask without forward dependency in the task queue;
determining the target subtasks sent to the task buffer as subtasks to be processed, and judging whether multiple to-be-processed subtasks currently remain in the task buffer;
if multiple to-be-processed subtasks currently remain in the task buffer, sequentially scheduling each to-be-processed subtask to the execution core of the stream processor with the least task amount according to the processing priority of each to-be-processed subtask, so that the execution core executes the received to-be-processed subtask.
Optionally, in the task scheduling method, the process of detecting the target subtask in the task queue stored in the global memory includes:
acquiring a task dependency relationship table of each task group of the task queue;
determining a forward dependent task count for each subtask of each of the task groups based on each of the task dependency tables;
and detecting whether a target subtask exists in the task queue at present according to the forward dependent task count of each subtask.
Optionally, in the task scheduling method, after the executing core executes the received to-be-processed subtask, the method further includes:
and determining a task group to which the executed to-be-processed subtasks belong, and updating the forward dependent task count of each currently remaining subtask in the task group.
A storage medium, comprising stored instructions, wherein the instructions, when executed, control a device on which the storage medium is located to perform a task scheduling method as described above.
An electronic device comprising a memory, one or more processors, and one or more instructions, wherein the one or more instructions are stored in the memory and configured to be executed by the one or more processors to perform the task scheduling method described above.
Compared with the prior art, the invention has the following advantages:
the invention provides a task scheduling system and a method, which are applied to a GPU (graphics processing Unit), wherein the GPU comprises a global memory and a plurality of stream processors, and the system comprises: a global task scheduling unit and a local task scheduling unit; the global task scheduling unit is configured to send a target subtask to a task buffer of a stream processor with a smallest task amount among the current plurality of stream processors when it is detected that the target subtask exists in a task queue stored in the global memory; the task queue comprises at least one task group, and the task group comprises one subtask or a plurality of subtasks; the target subtask is a subtask without forward dependency in the task queue; the local task scheduling unit is configured to determine the target subtask that has been sent to the task buffer as a to-be-processed subtask, determine whether the number of currently remaining to-be-processed subtasks in the task buffer is multiple, and if yes, sequentially schedule each to-be-processed subtask to an execution kernel of the stream processor with the smallest task amount according to a processing priority of each to-be-processed subtask, so that the execution kernel executes the received to-be-processed subtask. By applying the task scheduling system provided by the invention, the subtasks without forward dependency relationship can be sent to each stream processor, so that the task execution process of each stream processor is not influenced by the dependency between the tasks, the processing efficiency of the stream processors can be improved, and the operation performance of the GPU is effectively improved.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a task scheduling system according to the present invention;
FIG. 2 is a schematic diagram of another structure of a task scheduling system according to the present invention;
FIG. 3 is a flowchart of a task scheduling method according to the present invention;
FIG. 4 is a diagram illustrating an exemplary backend architecture provided by the present invention;
fig. 5 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a task scheduling system that can be applied to a graphics processing unit (GPU) comprising a global memory and a plurality of stream processors. A structural schematic diagram of the task scheduling system is shown in FIG. 1; it specifically comprises:
a global task scheduling unit 101 and a local task scheduling unit 102;
the global task scheduling unit 101 is configured to, when detecting that a target subtask exists in the task queue stored in the global memory, send the target subtask to the task buffer of the stream processor with the least task amount among the current plurality of stream processors;
the task queue comprises at least one task group, and the task group comprises one subtask or a plurality of subtasks; the target subtask is a subtask without forward dependency in the task queue;
the local task scheduling unit 102 is configured to determine the target subtask sent to the task buffer as a to-be-processed subtask, determine whether more than one to-be-processed subtask currently remains in the task buffer, and if so, sequentially schedule each to-be-processed subtask to the execution core of the stream processor to which the task buffer belongs according to the processing priority of each to-be-processed subtask, so that the execution core executes the received to-be-processed subtask.
In the task scheduling system provided by the embodiment of the present invention, each stream processor includes an execution core and a shared memory, and a task buffer is arranged in the shared memory to store the to-be-processed subtasks.
One way to determine a stream processor's task amount is by the number of pending subtasks in its task buffer; that is, the stream processor with the least task amount may be the one with the fewest pending subtasks.
Another way is by the time the stream processor's execution core is estimated to need to execute the pending subtasks in its task buffer; that is, the stream processor with the least task amount may be the one with the shortest estimated processing time.
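As an illustration of these two metrics, the following host-side C++ sketch picks the stream processor with the least task amount either by pending-subtask count or by estimated processing time. The types and names are illustrative assumptions, not structures prescribed by the patent text:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical per-stream-processor load record; field names are illustrative.
struct StreamProcessorLoad {
    std::size_t pendingSubtasks;   // subtasks waiting in this SM's task buffer
    double estimatedSeconds;       // estimated time to drain that buffer
};

// Metric (a): the stream processor with the fewest pending subtasks.
std::size_t leastLoadedByCount(const std::vector<StreamProcessorLoad>& sms) {
    return static_cast<std::size_t>(
        std::min_element(sms.begin(), sms.end(),
            [](const StreamProcessorLoad& a, const StreamProcessorLoad& b) {
                return a.pendingSubtasks < b.pendingSubtasks;
            }) - sms.begin());
}

// Metric (b): the stream processor with the shortest estimated processing time.
std::size_t leastLoadedByTime(const std::vector<StreamProcessorLoad>& sms) {
    return static_cast<std::size_t>(
        std::min_element(sms.begin(), sms.end(),
            [](const StreamProcessorLoad& a, const StreamProcessorLoad& b) {
                return a.estimatedSeconds < b.estimatedSeconds;
            }) - sms.begin());
}
```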
Optionally, if the stream processor currently receiving the target subtask does not hold multiple to-be-processed subtasks, the single to-be-processed subtask it holds may be scheduled directly to its execution core, so that the execution core executes it.
Specifically, the global task scheduling unit may check at preset time intervals whether a target subtask exists in the task queue stored in the global memory.
It should be noted that the GPU's storage system is designed hierarchically: the global memory can be accessed by all stream processors, while the shared memory inside each stream processor is exclusive to that stream processor; that is, a stream processor generally cannot access the shared memory of the others. Consequently, when tasks with dependency relationships are distributed across the stream processors, no stream processor can reasonably arrange the execution order of the tasks in its own shared memory; during execution, some stream processors must wait for others to finish the forward dependencies of their tasks, which limits the execution performance of the GPU. By applying the task scheduling system provided by the invention, the globally stored subtasks without forward dependency relationships can be detected and sent to the stream processors, so that no stream processor is stalled by inter-task dependencies while executing; this improves the processing efficiency of the stream processors and effectively improves the computing performance of the GPU.
In the task scheduling system provided in the embodiment of the present invention, based on the above scheme, specifically, the task scheduling system further includes: a detection unit;
the detection unit is used for acquiring a task dependency relationship table of each task group of the task queue; determining a forward dependency count for each subtask of each of the task groups based on each of the task dependency tables; and detecting whether a target subtask currently exists in the task queue according to the forward dependency count of each subtask.
Specifically, each task group has a corresponding task dependency relationship table, and whether a subtask has no forward dependency can be determined from the forward task count of each subtask in that table; that is, a subtask whose forward task count is 0 may be determined to be a subtask without a forward dependency.
If there is exactly one subtask without a forward dependency, it is determined to be the target subtask. If there are several, the one with the highest processing priority may be taken as the target subtask, any one of them may be selected, or the selection may follow their queuing order in the task queue, i.e. the earliest-queued subtask without a forward dependency is preferred as the target subtask.
If no subtask without a forward dependency exists, whether one exists in the task queue can be judged again later according to the forward dependent task count of each subtask.
It should be noted that, after a target subtask is detected, it may be determined whether any subtasks remain in the task queue; if so, whether a further target subtask exists can again be determined from the forward dependent task count of each subtask.
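A minimal sketch of this detection step, assuming the forward dependency counts have already been derived from the task dependency relationship tables (the types and names here are illustrative assumptions):

```cpp
#include <vector>

struct Subtask {
    int id;
    int forwardCount;  // number of unfinished forward (predecessor) tasks
};

// Returns the ids of all current target subtasks: those whose forward
// dependency count is zero. An empty result means the queue must be
// re-checked after some subtask finishes and the counts are updated.
std::vector<int> findTargetSubtasks(const std::vector<Subtask>& taskQueue) {
    std::vector<int> targets;
    for (const Subtask& t : taskQueue) {
        if (t.forwardCount == 0) {
            targets.push_back(t.id);
        }
    }
    return targets;
}
```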
In the task scheduling system provided in the embodiment of the present invention, the task dependency relationship table may include dependency information of each subtask of the task group to which the task dependency relationship table belongs, where the dependency information includes: a forward task count, a successor task count, and a successor task list.
For example, suppose task 0, task 1, task 2, task 3, and task 4 exist in the task group.
For task 0, the forward task count is 0, the successor task count is 4, and the successor task list is [task 1, task 2, task 3, task 4].
For task 2, the forward task count may be 2, the successor task count 2, and the successor task list [task 3, task 4].
For task 4, the forward task count may be 4, the successor task count 0, and the successor task list empty.
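For illustration, the dependency information of this example can be encoded as follows (an assumed in-memory layout; the patent text does not fix one):

```cpp
#include <map>
#include <vector>

// One entry of a task dependency relationship table.
struct DependencyInfo {
    int forwardTaskCount;          // unfinished forward (predecessor) tasks
    int successorTaskCount;        // number of successor tasks
    std::vector<int> successors;   // successor task list
};

// The example group: task 0 starts the group, task 4 ends it.
const std::map<int, DependencyInfo> dependencyTable = {
    {0, {0, 4, {1, 2, 3, 4}}},   // no predecessors, four successors
    {2, {2, 2, {3, 4}}},         // two predecessors, two successors
    {4, {4, 0, {}}},             // four predecessors, no successors
};
```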
In the task scheduling system provided in the embodiment of the present invention, based on the above scheme, specifically, the task scheduling system further includes: an update unit;
and the updating unit is used for determining the task group to which the executed subtask to be processed belongs and updating the forward dependent task count of each currently remaining subtask in the task group.
It should be noted that the execution core may decrement by one the forward dependency count of each currently remaining successor of the executed subtask in the task group.
The currently remaining subtasks in the task group can be determined from the successor task list of the executed to-be-processed subtask, which records the task identifiers of its successor tasks.
The system provided by the embodiment of the invention can thus update the forward dependency counts of the subtasks currently remaining in the task group to which the executed to-be-processed subtask belongs, so that the global task scheduling unit can detect new target subtasks from those updated counts.
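A sketch of this update step under the same assumed table layout (illustrative names; the decrement-and-collect logic mirrors the text above):

```cpp
#include <map>
#include <vector>

struct DependencyInfo {
    int forwardTaskCount;
    int successorTaskCount;
    std::vector<int> successors;  // successor task list of this subtask
};

// Called after a subtask finishes: decrement the forward dependent task
// count of each of its successors; successors that reach zero are new
// target subtasks for the global task scheduling unit to dispatch.
std::vector<int> onSubtaskCompleted(int finishedId,
                                    std::map<int, DependencyInfo>& table) {
    std::vector<int> newTargets;
    for (int s : table.at(finishedId).successors) {
        if (--table.at(s).forwardTaskCount == 0) {
            newTargets.push_back(s);
        }
    }
    return newTargets;
}
```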
Referring to FIG. 2, which shows another exemplary structure of the task scheduling system provided by the invention, the task scheduling system further includes a task decomposition unit 103; the task decomposition unit 103 is configured to:
when a task decomposition instruction is received, determining a task to be decomposed corresponding to the task decomposition instruction and task granularity specified by the task decomposition instruction;
decomposing the task to be decomposed based on the task granularity to obtain a task group corresponding to the task to be decomposed, wherein the task group comprises each subtask;
determining the operation type of each subtask, and constructing a dependency relationship table based on the operation type of each subtask, wherein the dependency relationship table records the dependency relationship among the subtasks;
and sending the task group corresponding to the task to be decomposed and the dependency relationship table corresponding to the task group to the global memory, so that the global task scheduling unit 101 stores each subtask of the task group into a task queue of the global memory.
The task to be decomposed is obtained by applying a preset task-decomposition function to the application program to be executed, and may be an application task in matrix form, for example a 4 × 4 matrix, a 4 × 5 matrix, a 5 × 5 matrix, or a matrix of any other order. Execution of the application task begins with one or more start tasks whose forward task count is 0 and ends with end tasks that have no successor tasks.
The structure of the task group obtained by the decomposition may include: a task index, the number of subtasks, the number of dependent tasks, and the task data.
Specifically, the task granularity of the subtasks may be set according to actual requirements; for example, it may be a 1 × 1 matrix. The task to be decomposed can then be decomposed at that granularity to obtain subtasks of the specified granularity.
It should be noted that a preset matrix decomposition algorithm may be called to decompose the to-be-decomposed task according to the task granularity.
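The granularity-driven split itself can be pictured with the simple tiling sketch below; it shows only the partitioning of a matrix task into granularity-sized subtasks, not the QR-specific operations described next (names are illustrative):

```cpp
#include <algorithm>
#include <vector>

// One tile of the original matrix, to become one subtask.
struct TileTask {
    int row, col;      // top-left element of the tile
    int rows, cols;    // tile extent (edge tiles may be smaller)
};

// Split an n x m matrix task into tiles of granularity g x g.
std::vector<TileTask> decompose(int n, int m, int g) {
    std::vector<TileTask> tiles;
    for (int i = 0; i < n; i += g) {
        for (int j = 0; j < m; j += g) {
            tiles.push_back({i, j, std::min(g, n - i), std::min(g, m - j)});
        }
    }
    return tiles;
}
```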
Optionally, the matrix decomposition algorithm may be carried out through four operations: DGEQT, DLARFB, DTSQT, and DSSRFB.
DGEQT performs QR decomposition on the sub-matrix blocks on the matrix diagonal, producing two intermediate sub-matrix results: an upper triangular matrix and a lower triangular matrix. Based on the upper triangular matrix, DLARFB updates all sub-matrix blocks of that column; based on the lower triangular matrix, DTSQT updates all sub-matrix blocks of that row; DSSRFB then updates all remaining sub-matrix blocks based on the blocks produced by DTSQT and DLARFB.
Specifically, DGEQT mainly performs QR decomposition on the sub-matrix block A_kk on the main diagonal of the matrix, recording the transformation matrix V_kk used in the operation while simultaneously generating an upper triangular matrix U_kk and a lower triangular matrix L_kk:
U_kk, L_kk ← A_kk (the QR transform process records the matrix V_kk)
DLARFB: u obtained by DGEQT decompositionkkAnd transformation matrix V of the total operation recordskkUpdating the sub-matrix blocks of other rows in the column and updating the sub-matrix blocks of the rows from k to i, wherein the updating method comprises the following steps:
Figure BDA0002550200810000101
where I is an identity matrix. Such a matrix plays a role in matrix multiplication analogous to 1 in the multiplication of numbers: the elements on the diagonal from the upper-left to the lower-right corner (the main diagonal) are all 1, and all other elements are 0.
DTSQT: l decomposed by DGEQTkkAnd transformation matrix V of the total operation recordskkUpdating the sub-matrix blocks of other columns of the row and updating the sub-matrix blocks of k to j columns, wherein the updating method comprises the following steps:
Figure BDA0002550200810000102
DSSRFB: matrix block A updated according to DTSQT and DLARFBikAnd AkjAnd transformation matrix V of the total operation recordskkFor the corresponding matrix block A in the original matrixijUpdating:
Figure BDA0002550200810000103
Thus, by cycling through the four operations DGEQT, DLARFB, DTSQT, and DSSRFB, the task to be decomposed is decomposed into a plurality of subtasks.
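The cyclic decomposition can be summarized as the loop nest below. The four kernel names come from the text above; their bodies are left as stubs, and the loop structure follows the description (DLARFB down column k, DTSQT along row k, DSSRFB over the trailing blocks). This is a sketch of the task-generation order, not the patent's literal implementation:

```cpp
// Stub kernels: in a real implementation each call would emit one subtask
// into the task group rather than compute anything here.
inline void dgeqt(int /*k*/) {}                          // QR-factor diagonal block A_kk
inline void dlarfb(int /*k*/, int /*i*/) {}              // update block A_ik in column k
inline void dtsqt(int /*k*/, int /*j*/) {}               // update block A_kj in row k
inline void dssrfb(int /*k*/, int /*i*/, int /*j*/) {}   // update trailing block A_ij

// One full pass of the cyclic decomposition over an nTiles x nTiles block matrix.
void generateSubtasks(int nTiles) {
    for (int k = 0; k < nTiles; ++k) {
        dgeqt(k);
        for (int i = k + 1; i < nTiles; ++i) dlarfb(k, i);
        for (int j = k + 1; j < nTiles; ++j) dtsqt(k, j);
        for (int i = k + 1; i < nTiles; ++i)
            for (int j = k + 1; j < nTiles; ++j) dssrfb(k, i, j);
    }
}
```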
In the task scheduling system provided in the embodiment of the present invention, based on the above scheme, the local task scheduling unit 102 that sequentially schedules each to-be-processed subtask to the execution core of the stream processor, according to the processing priority of each to-be-processed subtask currently stored in the task buffer, may include:
a third determining subunit, configured to determine the task out-degree corresponding to each to-be-processed subtask currently stored in the task buffer, where the task out-degree is the number of subtasks corresponding to the to-be-decomposed task to which the to-be-processed subtask belongs;
a fourth determining subunit, configured to determine the processing priority of each to-be-processed subtask based on its task out-degree;
and a scheduling subunit, configured to sequentially schedule each to-be-processed subtask to the execution core of the stream processor in descending order of the task out-degree corresponding to each to-be-processed subtask.
Specifically, the larger the task out-degree of a to-be-processed subtask, the higher its processing priority.
For example, suppose the pending subtasks are task A, task B, task C, and task D, with task out-degrees of 4, 6, 5, and 8 respectively; the processing priorities of the to-be-processed subtasks, from high to low, are then: task D, task B, task C, task A.
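This ordering is simply a descending sort on task out-degree; the short sketch below reproduces the example (illustrative types):

```cpp
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

struct PendingSubtask {
    std::string name;
    int outDegree;  // larger out-degree means higher processing priority
};

int main() {
    std::vector<PendingSubtask> buffer = {
        {"A", 4}, {"B", 6}, {"C", 5}, {"D", 8}};
    std::stable_sort(buffer.begin(), buffer.end(),
        [](const PendingSubtask& x, const PendingSubtask& y) {
            return x.outDegree > y.outDegree;
        });
    for (const PendingSubtask& t : buffer) {
        std::cout << t.name << ' ';  // prints: D B C A
    }
}
```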
It should be noted that, in the system provided by the present invention, the local task scheduling unit 102 that sequentially schedules each to-be-processed subtask to the execution core of the stream processor according to processing priority may alternatively be configured to:
determine the task decomposition sequence corresponding to each to-be-processed subtask currently stored in the task buffer, where the task decomposition sequence is the order in which the to-be-processed subtask was produced when the task decomposition algorithm was applied to the task to be decomposed to which it belongs;
determine the processing priority of each to-be-processed subtask based on its task decomposition sequence;
and schedule each to-be-processed subtask to the execution core of the stream processor in turn, in order of processing priority from high to low.
Specifically, the processing priority of each to-be-processed subtask may be determined by its task decomposition sequence, from earliest to latest; that is, the earlier a to-be-processed subtask was produced during decomposition, the higher its processing priority.
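In this variant, priority follows the order of production during decomposition, effectively first-decomposed, first-served. A minimal sketch (illustrative names):

```cpp
#include <algorithm>
#include <vector>

struct PendingSubtask {
    int id;
    int decompositionSeq;  // position in which it was produced; smaller = earlier
};

// Order the buffer so that subtasks produced earlier are scheduled first.
void orderByDecomposition(std::vector<PendingSubtask>& buffer) {
    std::stable_sort(buffer.begin(), buffer.end(),
        [](const PendingSubtask& a, const PendingSubtask& b) {
            return a.decompositionSeq < b.decompositionSeq;
        });
}
```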
Based on the task scheduling system provided by the embodiment of the invention, an embodiment of the invention further provides a task scheduling method corresponding to that system. The method is applied to a graphics processing unit (GPU) comprising a global memory and a plurality of stream processors; its flow chart is shown in FIG. 3, and it specifically comprises the following steps:
s301: and when detecting that a target subtask exists in the task queue stored in the global memory, sending the target subtask to a task buffer area of a stream processor with the least task amount in the plurality of current stream processors.
The task queue comprises at least one task group, and the task group comprises one subtask or a plurality of subtasks with dependency relationship; the target subtask is a subtask without forward dependency in the task queue.
S302: and determining the target subtask which is sent to the task buffer area as a subtask to be processed.
S303: and judging whether the number of the currently remaining to-be-processed subtasks in the task cache area is multiple, if so, executing S304, and if not, executing S305.
S304: and according to the processing priority of each sub-task to be processed, scheduling each sub-task to be processed to the execution kernel of the stream processor with the least task quantity in sequence, and enabling the execution kernel to execute the received sub-task to be processed.
S305: and scheduling the to-be-processed subtask to an execution core of the stream processor with the least task quantity, so that the execution core executes the received to-be-processed subtask.
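Steps S301 to S305 can be read as a single dispatch routine. The sketch below strings them together using host-side stand-ins for the GPU structures; all types and helper names are illustrative assumptions:

```cpp
#include <algorithm>
#include <vector>

struct Subtask {
    int id;
    int forwardCount;  // zero means no forward dependency (a target subtask)
    int priority;      // e.g. the task out-degree
};

struct StreamProcessor {
    std::vector<Subtask> buffer;  // task buffer in shared memory
};

void execute(const Subtask&) { /* stand-in for the execution core */ }

// S301: move one target subtask to the stream processor with the least
// task amount. S302-S305: drain that buffer, sorting by priority when it
// holds several to-be-processed subtasks.
void dispatchOnce(std::vector<Subtask>& taskQueue,
                  std::vector<StreamProcessor>& sms) {
    auto target = std::find_if(taskQueue.begin(), taskQueue.end(),
        [](const Subtask& t) { return t.forwardCount == 0; });
    if (target == taskQueue.end()) return;          // no target subtask yet

    auto sm = std::min_element(sms.begin(), sms.end(),
        [](const StreamProcessor& a, const StreamProcessor& b) {
            return a.buffer.size() < b.buffer.size();
        });
    sm->buffer.push_back(*target);                  // S301
    taskQueue.erase(target);                        // S302

    if (sm->buffer.size() > 1) {                    // S303 -> S304
        std::stable_sort(sm->buffer.begin(), sm->buffer.end(),
            [](const Subtask& a, const Subtask& b) {
                return a.priority > b.priority;
            });
    }
    for (const Subtask& t : sm->buffer) execute(t); // S304 / S305
    sm->buffer.clear();
}
```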
In the method provided in the embodiment of the present invention, based on the implementation process, specifically, the process of detecting the target subtask in the task queue stored in the global memory includes:
acquiring a task dependency relationship table of each task group of the task queue;
determining a forward dependent task count for each subtask of each of the task groups based on each of the task dependency tables;
and detecting whether a target subtask exists in the task queue at present according to the forward dependent task count of each subtask.
In the method provided in the embodiment of the present invention, based on the foregoing implementation process, specifically, after the executing core executes the received to-be-processed sub task, the method further includes:
and determining a task group to which the executed to-be-processed subtasks belong, and updating the forward dependent task count of each currently remaining subtask in the task group.
The task scheduling method disclosed in the above embodiment of the invention follows the same principles and execution process as the units and modules of the task scheduling system disclosed above; for details, reference may be made to the corresponding parts of the task scheduling system provided in the above embodiment, which are not repeated here.
In an embodiment provided by the present invention, referring to FIG. 4, an example backend architecture for the GPU-based task scheduling method is provided;
the GPU comprises a global memory and a plurality of stream processors; each stream processor comprises a shared memory and an execution core, and the shared memory contains a task buffer. The global memory receives a dependency graph and the corresponding task queue from the dependency-analysis front end, which may be a central processing unit (CPU). The CPU decomposes the application program's tasks to obtain the data operations of each subtask, determines the dependency relationships among the subtasks from those data operations, and determines the execution order of the subtasks based on the data dependencies among them.
Specifically, the backend architecture includes a global scheduler and a local scheduler for each stream processor.
It should be noted that, in general, the GPU's storage system is designed hierarchically: the global memory can be accessed by the execution cores of all stream processors, while the shared memory inside each stream processor is exclusive to that stream processor and accessible only by its own execution core.
The global scheduler may search the task queue in the global memory for a target subtask, i.e. a subtask without a forward dependency, whose forward task count is zero.
After finding a target subtask, the global scheduler determines the stream processor with the least current task amount and sends the target subtask to the task buffer in that stream processor's shared memory.
The local scheduler can select the subtask with the highest processing priority from the task buffer and deliver it to the persistently running execution core, which executes the task.
The stream processor then takes control and checks the dependencies of each subtask in the completed subtask's successor task list; that is, it can consult the dependency graph to find the successors of the executed subtask and decrement the forward dependent task count of each successor, so that new subtasks whose forward dependent task count reaches 0 become available for scheduling by the global scheduler.
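Taken together, each stream processor's local scheduler behaves like a persistent worker loop: take the highest-priority subtask from the buffer, run it, then lower the forward counts of its successors and report newly ready subtasks back to the global scheduler. The following host-side sketch illustrates that loop with assumed structures (it is not device code):

```cpp
#include <map>
#include <queue>
#include <vector>

struct Ready {
    int id;
    int priority;
    bool operator<(const Ready& o) const { return priority < o.priority; }
};

void execute(int /*taskId*/) { /* stand-in for the persistent execution core */ }

// One stream processor's local scheduling loop: runs until its buffer is
// empty, decrementing successors' forward counts after each completion and
// returning newly ready subtasks for the global scheduler to dispatch.
std::vector<int> localSchedulerLoop(
        std::priority_queue<Ready>& buffer,
        std::map<int, std::vector<int>>& successorsOf,
        std::map<int, int>& forwardCount) {
    std::vector<int> newlyReady;
    while (!buffer.empty()) {
        Ready next = buffer.top();   // highest processing priority first
        buffer.pop();
        execute(next.id);
        for (int s : successorsOf[next.id]) {
            if (--forwardCount[s] == 0) {
                newlyReady.push_back(s);  // awaits the global scheduler
            }
        }
    }
    return newlyReady;
}
```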
The embodiment of the invention also provides a storage medium, which comprises a stored instruction, wherein when the instruction runs, the device where the storage medium is located is controlled to execute the task scheduling method.
An embodiment of the present invention provides an electronic device, whose structural diagram is shown in FIG. 5. It specifically includes a memory 501 and one or more instructions 502, where the one or more instructions 502 are stored in the memory 501 and are configured to be executed by one or more processors 503 to perform the following operations:
when detecting that a target subtask exists in a task queue stored in a global memory, sending the target subtask to a task buffer area of a stream processor with the least task amount in the plurality of current stream processors;
the task queue comprises at least one task group, and the task group comprises one subtask or a plurality of subtasks with dependency relationship; the target subtask is a subtask without forward dependency in the task queue;
determining the target subtasks sent to the task buffer as subtasks to be processed, and judging whether multiple to-be-processed subtasks currently remain in the task buffer;
if multiple to-be-processed subtasks currently remain in the task buffer, sequentially scheduling each to-be-processed subtask to the execution core of the stream processor with the least task amount according to the processing priority of each to-be-processed subtask, so that the execution core executes the received to-be-processed subtask.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the device-like embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Finally, it should also be noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," and any variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises it.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functions of the units may be implemented in the same software and/or hardware or in a plurality of software and/or hardware when implementing the invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The task scheduling method provided by the present invention is described in detail above, and the principle and the implementation of the present invention are explained in this document by applying specific examples, and the description of the above examples is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (10)

1. A task scheduling system for a graphics processor, GPU, the GPU comprising a global memory and a plurality of stream processors, the system comprising:
a global task scheduling unit and a local task scheduling unit;
the global task scheduling unit is configured to send a target subtask to a task buffer of a stream processor with a smallest task amount among the current plurality of stream processors when it is detected that the target subtask exists in a task queue stored in the global memory;
the task queue comprises at least one task group, and the task group comprises one subtask or a plurality of subtasks; the target subtask is a subtask without forward dependency in the task queue;
the local task scheduling unit is configured to determine the target subtask that has been sent to the task buffer as a to-be-processed subtask, determine whether the number of currently remaining to-be-processed subtasks in the task buffer is multiple, and if yes, sequentially schedule each to-be-processed subtask to an execution kernel of the stream processor with the smallest task amount according to a processing priority of each to-be-processed subtask, so that the execution kernel executes the received to-be-processed subtask.
2. The task scheduling system of claim 1, further comprising: a detection unit;
the detection unit is used for acquiring a task dependency relationship table of each task group of the task queue; determining a forward dependency count for each subtask of each of the task groups based on each of the task dependency tables; and detecting whether a target subtask currently exists in the task queue according to the forward dependency count of each subtask.
3. The task scheduling system of claim 2, further comprising: an update unit;
and the updating unit is used for determining the task group to which the executed subtask to be processed belongs and updating the forward dependent task count of each currently remaining subtask in the task group.
4. The task scheduling system of claim 2, further comprising: a task decomposition unit;
the task decomposition unit is used for determining a task to be decomposed corresponding to a task decomposition instruction and a task granularity specified by the task decomposition instruction when the task decomposition instruction is received, and decomposing the task to be decomposed based on the task granularity to obtain a task group corresponding to the task to be decomposed, wherein the task group comprises sub-tasks; determining the operation type of each subtask, and constructing a dependency relationship table based on the operation type of each subtask, wherein the dependency relationship table records the dependency relationship among the subtasks; and sending the task group corresponding to the task to be decomposed and the dependency relationship table corresponding to the task group to the global memory, so that the global memory stores each subtask of the task group into the task queue.
5. The task scheduling system of claim 4, wherein the local task scheduling unit that sequentially schedules each of the to-be-processed subtasks to the execution core of the stream processor according to the processing priority of each of the to-be-processed subtasks currently stored in the task buffer comprises:
a third determining subunit, configured to determine the task out-degree corresponding to each to-be-processed subtask currently stored in the task buffer, where the task out-degree is the number of subtasks corresponding to the to-be-decomposed task to which the to-be-processed subtask belongs;
a fourth determining subunit, configured to determine the processing priority of each to-be-processed subtask based on its task out-degree;
and a scheduling subunit, configured to sequentially schedule each to-be-processed subtask to the execution core of the stream processor in order of processing priority from high to low.
6. A task scheduling method applied to a graphics processing unit (GPU), wherein the GPU comprises a global memory and a plurality of stream processors, the method comprising:
when detecting that a target subtask exists in a task queue stored in the global memory, sending the target subtask to a task buffer area of a stream processor with the least task amount in the plurality of current stream processors;
the task queue comprises at least one task group, and the task group comprises one subtask or a plurality of subtasks with dependency relationship; the target subtask is a subtask without forward dependency in the task queue;
determining the target subtasks sent to the task buffer as subtasks to be processed, and judging whether multiple to-be-processed subtasks currently remain in the task buffer;
if multiple to-be-processed subtasks currently remain in the task buffer, sequentially scheduling each to-be-processed subtask to the execution core of the stream processor with the least task amount according to the processing priority of each to-be-processed subtask, so that the execution core executes the received to-be-processed subtask.
7. The task scheduling method according to claim 6, wherein the process of detecting the target subtask in the task queue stored in the global memory comprises:
acquiring a task dependency relationship table of each task group of the task queue;
determining a forward dependent task count for each subtask of each of the task groups based on each of the task dependency tables;
and detecting whether a target subtask exists in the task queue at present according to the forward dependent task count of each subtask.
8. The task scheduling method according to claim 7, wherein after the causing the execution core to execute the received pending subtasks, further comprising:
and determining a task group to which the executed to-be-processed subtasks belong, and updating the forward dependent task count of each currently remaining subtask in the task group.
9. A storage medium, characterized in that the storage medium comprises stored instructions, wherein when the instructions are executed, a device on which the storage medium is located is controlled to execute the task scheduling method according to any one of claims 6 to 8.
10. An electronic device comprising a memory, one or more processors, and one or more instructions, wherein the one or more instructions are stored in the memory and configured to be executed by the one or more processors to perform the task scheduling method according to any one of claims 6 to 8.
CN202010573441.6A (priority and filing date 2020-06-22) · Task scheduling system and method, storage medium and electronic device · Pending · CN111708639A (en)

Priority Applications (1)

Application Number: CN202010573441.6A · Priority Date: 2020-06-22 · Filing Date: 2020-06-22 · Title: Task scheduling system and method, storage medium and electronic device

Applications Claiming Priority (1)

Application Number: CN202010573441.6A · Priority Date: 2020-06-22 · Filing Date: 2020-06-22 · Title: Task scheduling system and method, storage medium and electronic device

Publications (1)

Publication Number Publication Date
CN111708639A (en) 2020-09-25

Family

ID=72542698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010573441.6A Pending CN111708639A (en) 2020-06-22 2020-06-22 Task scheduling system and method, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN111708639A (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6829764B1 (en) * 1997-06-23 2004-12-07 International Business Machines Corporation System and method for maximizing usage of computer resources in scheduling of application tasks
CN103823706A (en) * 2014-02-12 2014-05-28 浙江大学 RTLinux (real-time Linux) based real-time scheduling method for analog simulation of controlled object model
CN103995743A (en) * 2014-05-21 2014-08-20 中国人民解放军国防科学技术大学 Two-stage mixed task scheduling method based on resource reservation
CN106598707A (en) * 2015-10-19 2017-04-26 沈阳新松机器人自动化股份有限公司 Task scheduling optimization method
CN105893158A (en) * 2016-06-08 2016-08-24 北京工业大学 Big data hybrid scheduling model on private cloud condition
CN110494848A (en) * 2018-03-28 2019-11-22 深圳市大疆创新科技有限公司 Task processing method, equipment and machine readable storage medium
CN110991808A (en) * 2019-11-06 2020-04-10 中国建设银行股份有限公司 Task allocation method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Mingfan Li, "Gdarts: A GPU-Based Runtime System for Dataflow Task Programming on Dependency Applications," 2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom).
Deng Zhilong, "Research on time-slot-optimized task scheduling strategy based on Hadoop," Journal of Northwestern Polytechnical University (西北工业大学学报).

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112184085A (en) * 2020-11-05 2021-01-05 上海亿保健康管理有限公司 Task allocation method, device and equipment
WO2022160626A1 (en) * 2021-01-29 2022-08-04 上海阵量智能科技有限公司 Command processing apparatus and method, electronic device, and computer storage medium
WO2022160628A1 (en) * 2021-01-29 2022-08-04 上海阵量智能科技有限公司 Command processing apparatus and method, electronic device, and computer-readable storage medium
CN113641476A (en) * 2021-08-16 2021-11-12 腾讯科技(深圳)有限公司 Task scheduling method, game engine, equipment and storage medium
CN113641476B (en) * 2021-08-16 2023-07-14 腾讯科技(深圳)有限公司 Task scheduling method, game engine, device and storage medium
CN113934535A (en) * 2021-10-11 2022-01-14 广东科诺勘测工程有限公司 Mass point cloud data processing method, device, server and system
CN116954954A (en) * 2023-09-20 2023-10-27 摩尔线程智能科技(北京)有限责任公司 Method and device for processing multi-task queues, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN111708639A (en) Task scheduling system and method, storage medium and electronic device
US11550627B2 (en) Hardware accelerated dynamic work creation on a graphics processing unit
Yang et al. Intermediate data caching optimization for multi-stage and parallel big data frameworks
CN104714785A (en) Task scheduling device, task scheduling method and data parallel processing device
CN109840149B (en) Task scheduling method, device, equipment and storage medium
JP2003044295A (en) Sleep queue management
CN112363821A (en) Computing resource scheduling method and device and computer equipment
CN111831410A (en) Task processing method and device, storage medium and electronic equipment
CN111190712A (en) Task scheduling method, device, equipment and medium
WO2023082575A1 (en) Graph execution pipeline parallelism method and apparatus for neural network model computation
US7243354B1 (en) System and method for efficiently processing information in a multithread environment
CN107977275B (en) Task processing method based on message queue and related equipment
CN113010286A (en) Parallel task scheduling method and device, computer equipment and storage medium
CN115586961A (en) AI platform computing resource task scheduling method, device and medium
CN115509704A (en) Task scheduling method, device, equipment and storage medium
CN111597044A (en) Task scheduling method and device, storage medium and electronic equipment
CN114217930A (en) Accelerator system resource optimization management method based on mixed task scheduling
CN111930485A (en) Job scheduling method based on performance expression
CN114896295B (en) Data desensitization method, desensitization device and desensitization system in big data scene
CN115964164A (en) Computer-implemented method, hardware accelerator, and storage medium
CN116260876A (en) AI application scheduling method and device based on K8s and electronic equipment
CN112130977B (en) Task scheduling method, device, equipment and medium
CN117093335A (en) Task scheduling method and device for distributed storage system
CN112114967B (en) GPU resource reservation method based on service priority
JP2716019B2 (en) Job class determination method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200925