CN117389712B - GPU multithread scheduling management system - Google Patents

GPU multithread scheduling management system

Info

Publication number
CN117389712B
Authority
CN
China
Prior art keywords
thread
module
scheduling
breakpoint
threads
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311694615.4A
Other languages
Chinese (zh)
Other versions
CN117389712A (en)
Inventor
王爽
孔超
管叙民
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Muxi Integrated Circuit Nanjing Co ltd
Original Assignee
Muxi Integrated Circuit Nanjing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Muxi Integrated Circuit Nanjing Co ltd filed Critical Muxi Integrated Circuit Nanjing Co ltd
Priority to CN202311694615.4A
Publication of CN117389712A
Application granted
Publication of CN117389712B
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/48: Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806: Task transfer initiation or dispatching
    • G06F 9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881: Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027: Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a GPU multithread scheduling management system, belonging to the technical field of computing, which comprises a thread scheduling module, a thread configuration module, a software configuration module and a breakpoint storage module. The thread scheduling module is configured in the GPU chip and comprises a first thread scheduling module and a second thread scheduling module; the thread configuration module is connected with the first thread scheduling module, and the software configuration module and the breakpoint storage module are each connected with the second thread scheduling module. Through a scheduling mechanism in which the software and the hardware of the GPU chip cooperate, the invention provides an efficient multithread scheduling scheme: software only needs to initialize and configure each thread, and the GPU chip automatically carries out scheduling and thread switching. This greatly reduces the time spent on multithread scheduling, improves the execution efficiency of multithread scheduling, and can be widely used for managing and scheduling the threads of a system.

Description

GPU multithread scheduling management system
Technical Field
The invention relates to the technical field of computation, in particular to a GPU multithreading scheduling management system.
Background
GPU chips contain a huge number of computing cores and a powerful instruction set, and are widely used in fields such as data centers and artificial intelligence. A GPU chip supports multiple users processing data, each user using one thread (Thread). A thread is the smallest scheduling unit within a process and the smallest unit of program execution; it consists of a thread number, a program counter, registers, and so on. The introduction of threads reduces the overhead of concurrent program execution and improves the concurrency of an operating system. Because only one thread is active at any given time, the GPU chip needs to support scheduling among multiple threads.
In the prior art, multithread scheduling often adopts a traditional software-managed approach. First, the software creates multiple threads and schedules one of them to start working. Second, the software monitors the state of that thread. When the thread has worked for a certain time, the software stops the current thread and retains its information. The software then schedules a new thread. The advantage of this approach is that it is convenient for software management.
It has, however, the following problems. 1) Low efficiency: software scheduling takes a long time; the scheduling switches between threads and each operation within a thread incur delays, leading to long execution times and low efficiency. 2) Complex software operation: whether scheduling a thread or finishing the current thread and rescheduling, the software's operations are complex; in addition, the software must continuously monitor the state of the thread that currently wins scheduling.
Likewise, in the prior art, thread switching often adopts a traditional software-driven approach. First, the software monitors the state of the currently working thread. When the thread has run for a certain period, the software stops it. Second, the software saves the breakpoint information of the current thread, including the instruction type, pointer position, address, instruction length, and so on. Finally, after this information has been saved, the software switches to a new thread. Again, the advantage is convenience of software management.
It has the following problems. 1) Large delay and low efficiency: switching threads in software takes a relatively long time; in particular, stopping the current thread and saving its breakpoint information requires recording much information, so execution time is long and efficiency is low. 2) Complex software operation: in particular, saving the breakpoint information of the current thread involves the instruction type, pointer position, address, instruction length and other information that must be recorded in registers and the like, making the software operation very complex.
Disclosure of Invention
To address at least the technical problems described in the Background, the invention provides a GPU multithread scheduling management system.
A GPU multithread scheduling management system comprises a thread scheduling module, a thread configuration module, a software configuration module and a breakpoint storage module; the thread scheduling module is configured in the GPU chip and comprises a first thread scheduling module and a second thread scheduling module;
the thread configuration module is connected with the first thread scheduling module, and the software configuration module and the breakpoint storage module are respectively connected with the second thread scheduling module.
Optionally, the thread configuration module is configured to map the threads of the user to the threads of the GPU chip in a configuration mapping manner, and initialize the corresponding threads according to the thread numbers scheduled by the first thread scheduling module;
the first thread scheduling module is used for scheduling among a plurality of threads, sending the scheduled thread numbers to the thread configuration module, and simultaneously monitoring the state of the thread which wins scheduling and performing thread switching.
Optionally, the thread configuration module maps the thread of the user to the thread of the GPU chip by configuring RingBuffer.
Optionally, the content configured by the thread configuration module includes, but is not limited to, the thread's valid flag, space size, address, and scheduling priority; the scheduling modes of the first thread scheduling module include, but are not limited to, polling and absolute priority, and the scheduling mode is configured by the thread configuration module.
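For illustration only, the per-thread configuration content described above can be sketched in Python; the class and field names below are assumptions of this description, not register names defined by the invention:

```python
from dataclasses import dataclass
from enum import Enum

class SchedulingMode(Enum):
    POLLING = "polling"              # Round Robin among valid threads
    ABSOLUTE_PRIORITY = "absolute"   # always pick the highest-priority valid thread

@dataclass
class ThreadConfig:
    thread_id: int
    valid: int = 0        # default 0: invalid thread, work not started
    space_size: int = 0   # size of the thread's buffer space in bytes
    address: int = 0      # base address of the thread's buffer
    priority: int = 7     # scheduling priority; 0 = highest (assumed convention)

# Example mirroring thread 0 of Embodiment 1: 1 GByte at 0x1000000, priority 0
t0 = ThreadConfig(thread_id=0, valid=1, space_size=1 << 30,
                  address=0x1000000, priority=0)
```

In this sketch, software would set `valid = 1` to make a thread schedulable and set it back to 0 to stop or delete the thread, matching the valid-flag behavior described in the text.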
Optionally, the first thread scheduling module is specifically configured to monitor the state and running time of the thread that wins scheduling, stop the thread's work once the running time reaches the time slice configured by the thread configuration module, and send a stop instruction to the thread configuration module to trigger it to retain the information and data of the thread's work.
Optionally, the software configuration module is used for initializing and configuring threads;
the second thread scheduling module is used for scheduling among a plurality of threads and switching among the threads according to the breakpoint information of the corresponding threads sent by the breakpoint storage module;
the breakpoint storage module is used for recording the current scheduled thread number, storing the breakpoint information of the thread, and returning the breakpoint information of the corresponding thread to the second thread scheduling module.
Optionally, the scheduling mode and time slices of the second thread scheduling module are configured by the software configuration module.
Optionally, the second thread scheduling module is further configured to send a read request to the breakpoint storage module and, when the breakpoint storage module returns that the target thread is a breakpoint thread, jump directly to the breakpoint position and start processing data from that position.
Optionally, the second thread scheduling module is further configured to control the timer to start counting when the scheduled thread starts working, stop the thread's work when the count reaches the working time slice, and send a stop instruction to the software configuration module.
optionally, the second thread scheduling module is further configured to write current information into the breakpoint storage module when it is determined that the current thread is not executed; if the thread has been completed, then there is no need to write the current information to the breakpoint storage module.
Compared with the prior art, the invention has the following advantages:
1) High efficiency: multithread scheduling is carried out by the GPU chip itself, which greatly reduces scheduling time; at the same time, the GPU chip automatically monitors the currently working thread and completes thread switching, reducing the workload of users and software and significantly improving the efficiency of the system's multithreaded instruction execution.
2) Simple software operation: once the GPU chip and the software are connected, the whole scheduling process is handed to the GPU chip, which provides a reporting mechanism; the software only needs to be configured, which simplifies the operational complexity on the Host.
3) Low delay and high efficiency: during thread switching, the GPU chip actively stores breakpoint information and automatically judges whether a thread is a breakpoint thread, greatly shortening the thread switching time and significantly improving the efficiency of the system's multithreaded instruction execution.
4) Simple software operation: in the prior art, the amount of information a thread must store is large, the software has to record much of it, and operation is complex; here, breakpoint storage and switching judgment are carried out on the GPU chip, and the software only needs to be configured, which greatly simplifies the operational complexity on the Host.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic structural diagram of a GPU multithreading scheduling management system according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a portion of a GPU multithreading schedule management system according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of a thread configuration module according to an embodiment of the present invention.
FIG. 4 is a schematic diagram illustrating a first thread scheduling module according to an embodiment of the present invention.
FIG. 5 is a flow chart of a method for managing GPU multithreading scheduling in accordance with an embodiment of the present invention.
FIG. 6 is a schematic diagram of another portion of a GPU multithreading schedule management system in accordance with the present invention.
FIG. 7 is a schematic diagram of a second thread scheduling module according to the present invention.
Fig. 8 is a schematic diagram of the structure of the breakpoint memory module of the present invention.
Fig. 9 is a schematic diagram of the software configuration module in the present invention.
FIG. 10 is a flow chart of a GPU multithreading method according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terms first, second, third, etc. or module a, module B, module C and the like in the description and in the claims, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order, and it is to be understood that the specific order or sequence may be interchanged if permitted to implement embodiments of the invention described herein in other than those illustrated or described.
In the following description, reference numerals indicating steps such as S110, S120, … …, etc. do not necessarily indicate that the steps are performed in this order, and the order of the steps may be interchanged or performed simultaneously as allowed.
The term "comprising" as used in the description and claims should not be interpreted as being limited to what is listed thereafter; it does not exclude other elements or steps. Thus, it should be interpreted as specifying the presence of the stated features, integers, steps or components as referred to, but does not preclude the presence or addition of one or more other features, integers, steps or components, or groups thereof. Thus, the expression "a device comprising means a and B" should not be limited to a device consisting of only components a and B.
Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Thus, appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment, but may be. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments, as would be apparent to one of ordinary skill in the art from this disclosure.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. If there is a discrepancy, the meaning described in the present specification or the meaning obtained from the content described in the present specification is used. In addition, the terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
Referring to fig. 1, the embodiment of the invention discloses a GPU multithreading scheduling management system, which comprises a thread scheduling module, a thread configuration module, a software configuration module and a breakpoint storage module; the thread scheduling module is configured in the GPU chip and comprises a first thread scheduling module and a second thread scheduling module;
the thread configuration module is connected with the first thread scheduling module, and the software configuration module and the breakpoint storage module are respectively connected with the second thread scheduling module.
The invention constructs the thread scheduling module, which comprises a first thread scheduling module and a second thread scheduling module, and can be respectively used for realizing multi-thread scheduling and switching of the GPU.
Optionally, referring to fig. 2, a thread configuration module is configured to map a thread of a user to a thread of a GPU chip in a configuration mapping manner, and initialize a corresponding thread according to a thread number scheduled by the first thread scheduling module;
the first thread scheduling module is used for scheduling among a plurality of threads, sending the scheduled thread numbers to the thread configuration module, and simultaneously monitoring the state of the thread which wins scheduling and performing thread switching.
The invention adopts a scheduling mechanism in which the software and the hardware of the GPU chip cooperate: the software (via the thread configuration module) is responsible only for the initialization and configuration of each thread, and the GPU chip (via the first thread scheduling module) carries out scheduling according to that configuration, greatly reducing the time required for multithread scheduling. Meanwhile, the GPU chip monitors the state and running time of each thread and automatically switches threads according to the configured time slices and other settings, then reports the result to the software, improving the efficiency of multithread switching.
It should be noted that, the first thread scheduling module and the second thread scheduling module in the present invention may be integrated into a single scheduling module, which is not limited in particular.
Optionally, referring to fig. 3, the thread configuration module maps the thread of the user to the thread of the GPU chip by configuring RingBuffer.
Optionally, the thread configuration module configuration content includes, but is not limited to, thread valid flag, space size, address, scheduling priority of the thread; referring to FIG. 4, the scheduling modes of the first thread scheduling module include, but are not limited to, polling, absolute priority, and scheduling is configured by the thread configuration module.
The GPU chip supports N threads in hardware, each of which can be configured independently, including but not limited to the thread's valid flag, space size, address, scheduling priority, and other information. The default valid flag is 0, indicating that the current thread is invalid and has not started working. All of the above information can be configured by software through registers.
And the software builds a user thread according to the user demand. Meanwhile, the software maps the user thread to the thread of the GPU chip in a manner of configuring RingBuffer. And configuring registers such as the space size, the address, the scheduling priority and the like of the threads according to the demands of users.
The software configures the active flag for that thread to 1, indicating that the current thread is an active thread, which can be scheduled. If the software wants to stop or delete this thread, the valid flag can be directly configured to 0.
Optionally, the first thread scheduling module is specifically configured to monitor the state and running time of the thread that wins scheduling, stop the thread's work once the running time reaches the time slice configured by the thread configuration module, and send a stop instruction to the thread configuration module to trigger it to retain the information, data, and the like of the thread's work.
The first thread scheduling module is responsible for scheduling among multiple threads; the scheduling mode supports, but is not limited to, polling (Round Robin), absolute priority (Strict Priority), and the like, and is configured by software. The working time of a thread is called a time slice and is also configured by software.
The first thread scheduling module selects threads according to a scheduling mode: 1) Polling: the scheduling module selects 1 thread (default scheduling from thread 0) in the current effective threads; 2) Absolute priority: the scheduling module selects a thread with highest scheduling priority from the current effective threads.
When a thread wins the schedule, the thread starts to work, sends data according to the instruction of the user, calculates the data, and the like. At the same time, the first thread scheduling module controls the timer to start counting, the timer starts counting from 0, and the first thread scheduling module feeds back the thread number which wins scheduling to the software. When the timer reaches the software configured time slice, the scheduling module stops the thread. And sends a stop instruction to the software for retaining the information, data, etc. of the current thread operation.
After one round of thread scheduling completes, the first thread scheduling module reselects one thread among the valid threads. Taking the scheduling mode configured as polling as an example: if the last thread that won scheduling was thread 0, scheduling starts from thread 1; when thread 1 starts working, the scheduling module starts timing again, with the timer starting from 0; and so on. When the scheduled thread is the last valid thread N, the scheduling module resumes scheduling from thread 0. Taking the scheduling mode configured as absolute priority as an example: the scheduling module selects the thread with the highest scheduling priority among the current valid threads; if thread 0 has the highest priority, thread 0 will always be selected until the processing of thread 0's data is fully completed.
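The two selection rules described here, polling and absolute priority, can be sketched as follows; the function and attribute names are illustrative assumptions of this description, and a lower priority value is assumed to mean higher priority:

```python
class T:
    """Minimal stand-in for a configured thread (attribute names are assumptions)."""
    def __init__(self, thread_id, valid=1, priority=7, has_data=True):
        self.thread_id, self.valid = thread_id, valid
        self.priority, self.has_data = priority, has_data

def select_thread(threads, mode, last_winner=None):
    """Pick the next thread number according to the configured scheduling mode."""
    # A thread is schedulable only when its valid flag is 1 and it has data
    runnable = [t for t in threads if t.valid == 1 and t.has_data]
    if not runnable:
        return None
    if mode == "polling":
        # Round Robin: continue after the last winner; default from thread 0,
        # wrapping back to the first runnable thread after the last one
        ids = sorted(t.thread_id for t in runnable)
        start = 0 if last_winner is None else last_winner + 1
        return next((tid for tid in ids if tid >= start), ids[0])
    if mode == "absolute":
        # Absolute (strict) priority: lowest value = highest priority (assumed)
        return min(runnable, key=lambda t: t.priority).thread_id
    raise ValueError(f"unknown scheduling mode: {mode!r}")

print(select_thread([T(0), T(1), T(2)], "polling", last_winner=0))  # -> 1
```

The wrap-around branch mirrors the behavior described in the text: once the last valid thread N has been scheduled, polling resumes from thread 0.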
Referring to fig. 5, the embodiment of the invention also discloses a GPU multithreading scheduling method, which comprises the following steps:
s1, initializing a GPU chip: the thread configuration module maps threads to the GPU chip by configuring thread register information of the GPU chip according to the requirements of a user;
the registers include information such as thread valid flags, space size, addresses, scheduling priority, etc.
S2, the first thread scheduling module starts multithread scheduling: the first thread scheduling module reads the current state of each thread; when there is at least one schedulable thread, a target thread is selected according to the scheduling mode (a thread is considered schedulable when its valid flag is 1 and it has data; otherwise it cannot participate in scheduling); otherwise, the module keeps checking until a schedulable thread exists;
s3, the first thread scheduling module feeds back the scheduled thread number to the thread configuration module, and meanwhile, the first thread scheduling module controls a timer to start timing (from 0);
s4, the thread configuration module records the thread number and configures related information;
s5, the thread starts to work, and corresponding operation is executed according to the requirement of a user;
for example, reading data from an address, calculating data, writing an address after calculation, and the like.
S6, after the timer reaches the time slice configured by the thread configuration module, the first thread scheduling module ends the thread's work and feeds back the end information to the thread configuration module;
and S7, the first thread scheduling module repeats the flow of the steps S2-S6 and reschedules a new thread.
In addition, the following detailed description is provided.
Embodiment 1:
and initializing the GPU chip. The software maps the threads to the GPU chip by configuring the thread register information of the GPU chip according to the requirements of the user. The registers include information such as thread valid flags, space size, addresses, scheduling priority, etc. There are currently 3 threads, threads 0, 1, 2. Thread 0 valid flag is 1, space size is 1GByte, address is 0x1000000, priority is 0 (high); thread 1 valid flag is 1, space size is 2GByte, address is 0x2000000, priority is 4. Thread 2 valid flag is 1, space size 256MByte, address 0x8000000, priority 7 (low).
The multi-threaded scheduling is started. The first thread scheduling module begins operation.
The state of each thread is read. Since the valid flags of threads 0, 1 and 2 are all 1 and the threads have data, they are all considered schedulable.
It is then determined which thread to schedule. The current user has configured the scheduling mode as polling, so thread 0 is selected from threads 0, 1, 2. The time slice is configured as 1 ms.
Thread 0 wins scheduling, and thread number 0 is reported to the software. At the same time, the scheduling module starts timing from 0.
The software configures the relevant information according to thread number 0.
Thread 0 begins to work. According to the requirement of the user, the thread 0 needs to read 2 groups of 256byte data from the system side to the GPU and perform multiplication calculation. And transmitting the calculated data to the system.
When the timer reaches 1ms, the scheduling module ends the work of the thread 0 and feeds back the end information to the software. If the current thread 0 calculation has been completed, the software will reassign a new task to thread 0 and wait for the next dispatch. If the calculation of the current thread 0 is not completed, the software records the data which is calculated currently and waits for the next scheduling to resume the calculation.
The scheduling system will continually repeat the above-mentioned initialization process and reschedule a new thread.
Embodiment 2:
and initializing the GPU chip. The software maps the threads to the GPU chip by configuring the thread register information of the GPU chip according to the requirements of the user. The registers include information such as thread valid flags, space size, addresses, etc. There are currently 3 threads, threads 0, 8, 15. Thread 0 valid flag is 1, space size is 1GByte, address is 0x1000000, priority is 7 (low); thread 8 valid flag is 1, space size is 2GByte, address is 0x2000000, priority is 4. Thread 15 valid flag is 1, space size 256MByte, address 0x8000000, priority 0 (high).
The multithreading is started and the first thread scheduling module begins to operate.
The state of each thread is read. Since the valid flags of threads 0, 8 and 15 are all 1 and the threads have data, they are all considered schedulable.
It is then determined which thread to schedule. The current user has configured the scheduling mode as absolute priority, so thread 15 is selected from threads 0, 8, 15. The time slice is configured as 100 ms.
Thread 15 wins scheduling, and thread number 15 is reported to the software. At the same time, the scheduling module starts timing from 0.
The software configures the relevant information according to thread number 15.
Thread 15 begins to operate. According to the user's needs, the thread 15 needs to read 1024 data from the system side into the GPU and perform addition calculation. And transmitting the calculated data to the system.
When the timer reaches 100ms, the scheduling module ends the operation of the thread 15 and feeds back the end information to the software. If the computation of the current thread 15 has been completed, the software will reassign a new task to the thread 15 and wait for the next dispatch. If the calculation of the current thread 15 is not completed, the software records the data which is calculated currently, and waits for the next scheduling to resume the calculation.
The scheduling system will continually repeat the above-mentioned initialization process and reschedule a new thread.
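For this embodiment's absolute-priority selection, a minimal sketch follows; the mapping-based representation of the thread registers is an assumption, with a lower value taken to mean higher priority (so priority 0 wins):

```python
def pick_absolute_priority(threads):
    """Pick the valid thread with the highest scheduling priority.
    `threads` maps thread number -> (valid_flag, priority); a lower
    priority value is assumed to mean higher priority (0 = highest)."""
    runnable = {tid: prio for tid, (valid, prio) in threads.items() if valid == 1}
    return min(runnable, key=runnable.get) if runnable else None

# Embodiment 2: threads 0, 8, 15 with priorities 7, 4, 0 -> thread 15 wins
threads = {0: (1, 7), 8: (1, 4), 15: (1, 0)}
print(pick_absolute_priority(threads))  # -> 15
```

As the text notes for absolute priority, thread 15 would keep winning every round until its data is fully processed or its valid flag is cleared.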
Optionally, referring to fig. 6, the software configuration module is configured to initialize and configure threads;
the second thread scheduling module is used for scheduling among a plurality of threads and switching among the threads according to the breakpoint information of the corresponding threads sent by the breakpoint storage module;
the breakpoint storage module is used for recording the current scheduled thread number, storing the breakpoint information of the thread, and returning the breakpoint information of the corresponding thread to the second thread scheduling module.
When a thread is switched, the GPU chip actively stores the thread's breakpoint information, recording the not-yet-executed thread information in the breakpoint storage module. This greatly simplifies the software's operational complexity and reduces the time and space the software spends on record keeping. Second, when a thread is scheduled, the GPU chip judges whether the thread has a breakpoint. If it does, the GPU chip automatically jumps to the position where the thread last executed and continues executing on the original data. Thus the thread switching time is greatly shortened, and the efficiency of the system's multithreaded instruction execution is significantly improved.
Optionally, the scheduling mode and time slices of the second thread scheduling module are configured by the software configuration module.
Optionally, the scheduling mode is polling.
Optionally, the second thread scheduling module is further configured to send a read request to the breakpoint storage module, and when the breakpoint storage module returns that the target thread is a breakpoint thread, directly jump to a breakpoint position, and start working to process data from the breakpoint position of the thread.
Optionally, the second thread scheduling module is further configured to control the timer to start counting (counting from 0) when the scheduled thread starts working, stop the working of the thread when the counting reaches the working time slice, and send a stop instruction to the software configuration module.
Optionally, the second thread scheduling module is further configured to write the current information into the breakpoint storage module when it determines that the current thread has not finished executing; if the thread has finished, the current information need not be written into the breakpoint storage module.
Optionally, the current information includes, but is not limited to, the thread type, the size of the remaining space, and the address of the current breakpoint.
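A possible layout for one such breakpoint record, with field names that are assumptions based only on the information the text says is stored:

```python
# Illustrative layout of one breakpoint record; the field names are
# assumptions based on the information the text says is stored.
from dataclasses import dataclass

@dataclass
class BreakpointRecord:
    thread_id: int
    thread_type: str       # e.g. "matrix_add" or "matrix_mul"
    remaining_space: int   # size of remaining space, in bytes
    breakpoint_addr: int   # address at which execution stopped
    valid: bool = True     # breakpoint valid flag (True = resumable)

rec = BreakpointRecord(7, "matrix_add", 1 << 30, 0x5000000)
```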
As shown in fig. 7, the second thread scheduling module is responsible for scheduling among the plurality of threads, with the scheduling mode and the working time slices configured by the software configuration module.
After the second thread scheduling module starts working, it first selects a thread according to the scheduling mode. Taking polling as an example: the scheduling module selects one thread from the currently active threads, starting from thread 0 by default. Then, when a thread wins the schedule, that thread begins working; the second thread scheduling module sends a read request to the breakpoint storage module using the winning thread's number and sends that number to the software configuration module.
If the breakpoint storage module finds that the thread is a breakpoint thread, the thread jumps directly to the breakpoint position and processes data from there. For example, it continues reading data from the last address and continues the calculation based on the previous data. If the thread is not a breakpoint thread, the software performs initialization, sends data according to the user's instructions, computes the data, and so on. At the same time, the timer of the second thread scheduling module begins counting from 0.
Then, when the timer reaches the software-configured time slice, the second thread scheduling module stops the thread and sends a stop instruction to the software configuration module. It also judges whether the current thread has finished executing: if not, it writes the current information into the breakpoint storage module and sets that thread number's breakpoint valid flag to 1; if the thread has finished, the current information need not be written, and the thread's breakpoint valid flag defaults to 0.
Finally, the second thread scheduling module reselects one thread among the active threads. Taking polling as an example: if the last thread that won the schedule was thread 0, scheduling starts from thread 1. When thread 1 starts working, the scheduling module starts its timer from 0, and so on. When the scheduled thread is the last active thread N, the second thread scheduling module resumes scheduling from thread 0.
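The polling selection with wrap-around from the last active thread back to thread 0 can be sketched as follows; `next_thread` is an illustrative helper, not an API from the patent:

```python
# Minimal round-robin (polling) selection over active threads: after the
# last active thread, scheduling wraps back to the lowest-numbered one.
def next_thread(active, last):
    """Pick the next active thread number after `last` (round robin).

    `active` is the set of schedulable thread numbers; returns None if empty.
    """
    if not active:
        return None
    candidates = sorted(active)
    for t in candidates:
        if t > last:
            return t
    return candidates[0]   # wrap around to the lowest active thread

# thread 0 wins first, then 1, then 3, then back to 0, and so on
order = []
last = -1
for _ in range(5):
    last = next_thread({0, 1, 3}, last)
    order.append(last)
```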
As shown in fig. 8, the breakpoint storage module is responsible for recording the currently scheduled thread number, writing the thread's breakpoint information into the storage unit, and returning the thread's breakpoint information in response to a query carrying the switched thread's number. After scheduling completes, the breakpoint storage module queries whether the thread is a breakpoint thread according to the thread number sent by the second thread scheduling module. If the thread is a breakpoint thread, the breakpoint storage module returns breakpoint information including the thread type, the size of the remaining space, and the address of the current breakpoint. If the thread is not a breakpoint thread, an invalid flag is returned.
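The query behaviour can be sketched as a small store that returns either the saved record or an invalid flag; class and method names here are illustrative, not from the patent:

```python
# A sketch of the query behaviour described above: return the stored
# record for a breakpoint thread, or an invalid flag otherwise.
# Class and method names are illustrative, not from the patent.
class BreakpointStore:
    def __init__(self):
        self._records = {}   # thread number -> breakpoint info

    def write(self, tid, info):
        self._records[tid] = info       # thread switched out unfinished

    def clear(self, tid):
        self._records.pop(tid, None)    # thread finished: flag drops to 0

    def query(self, tid):
        """Return (valid, info); valid=False models the invalid flag."""
        info = self._records.get(tid)
        return (info is not None), info

store = BreakpointStore()
store.write(7, {"type": "matrix_add", "addr": 0x5000000, "remaining": 1 << 30})
valid7, info7 = store.query(7)    # breakpoint thread: record returned
valid15, _ = store.query(15)      # not a breakpoint thread: invalid flag
```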
Referring to fig. 9: first, the GPU chip supports N threads in hardware. Each thread can be configured independently, including but not limited to the thread's valid flag, space size, address, and scheduling priority. The valid flag defaults to 0, indicating that the thread is invalid and has not started working. All of the above information can be configured by the software configuration module through registers. Second, the software configuration module builds a user thread according to the user's requirements and maps it onto a GPU chip thread by configuring a RingBuffer, setting the thread's space size, address, scheduling priority, and other registers as the user requires. Finally, the software sets the thread's valid flag to 1, indicating that the thread is active and can be scheduled. To stop or delete the thread, the software simply sets the valid flag back to 0.
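A rough illustration of this register-based configuration flow; the field names and N = 16 are assumptions, not from the patent:

```python
# Hypothetical register image for one hardware thread, mirroring the
# fields the text lists (valid flag, space size, address, priority).
def make_thread_regs():
    return {"valid": 0, "space": 0, "addr": 0, "priority": 0}

threads = [make_thread_regs() for _ in range(16)]   # chip supports N threads

# Software maps a user thread onto hardware thread 7 and activates it.
threads[7].update(space=4 << 30, addr=0x1000000, priority=1)
threads[7]["valid"] = 1     # valid flag 1: thread is active and schedulable
activated = threads[7]["valid"] == 1

# Stopping or deleting the thread is just clearing the flag again.
threads[7]["valid"] = 0
```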
Referring to fig. 10, the embodiment of the invention discloses a GPU multithreading switching method, which comprises the following steps:
s1, initializing a GPU chip: the software configuration module maps threads to the GPU chip by configuring thread register information of the GPU chip according to the requirements of a user. The registers include information such as thread valid flags, space size, addresses, scheduling priority, etc.
S2, starting multithreading scheduling: and the second thread scheduling module reads the scheduling mode and the current state of each thread, and schedules the target thread from the effective threads.
It is determined whether there are threads that can be scheduled. If at least one thread can be scheduled, the second thread scheduling module selects a thread according to the scheduling mode; otherwise, the scheduler keeps checking until a schedulable thread appears.
And S3, the thread number of the target thread is transmitted to the software configuration module, while the second thread scheduling module controls a timer to start counting from 0.
S4, the second thread scheduling module sends a read request to a breakpoint storage module according to the thread number of the target thread and receives breakpoint information returned by the breakpoint storage module; if the breakpoint information indicates that the target thread is a breakpoint thread, the second thread scheduling module directly schedules the target thread to a breakpoint position, and starts working to process data from the breakpoint position (for example, continuously reading data from a last address and continuously calculating based on the last data); if the breakpoint information indicates that the target thread is a new thread, the software configuration module initializes the target thread, and the target thread begins to operate (e.g., reads data from an address, calculates data, writes an address after calculation, etc.) after the software configuration module has been initialized.
S5, when the timer of the second thread scheduling module reaches the software-configured time slice, the second thread scheduling module judges whether the current thread has finished executing; if the target thread has not finished, its current information is written into the breakpoint storage module; if the target thread has finished, the current information need not be written to the breakpoint storage module.
And S6, after the breakpoint information (including the thread number and the current information) is stored, the second thread scheduling module ends the target thread and feeds the end information back to the software configuration module.
And S7, the thread scheduling system repeats the flow of the steps S2-S6 so as to realize the switching of the new target thread.
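Steps S2–S6 above can be sketched as a minimal, purely illustrative simulation; the work model, item counts, and all names are assumptions, not the patent's implementation:

```python
# Illustrative sketch of steps S2-S6: pick a valid thread, restore its
# breakpoint if any, run one time slice, and save a breakpoint if the
# work is unfinished.
def run_one_slice(threads, breakpoints, last, slice_items):
    """threads: tid -> {'valid': 0/1, 'left': items remaining}."""
    active = sorted(t for t, r in threads.items() if r["valid"])
    if not active:
        return last                          # S2: nothing schedulable yet
    tid = next((t for t in active if t > last), active[0])   # polling order
    left = breakpoints.pop(tid, threads[tid]["left"])        # S4: resume/init
    left -= min(left, slice_items)           # one time slice of work
    if left > 0:
        breakpoints[tid] = left              # S5: unfinished -> save breakpoint
    else:
        threads[tid]["valid"] = 0            # S6: finished -> thread ends
    threads[tid]["left"] = left
    return tid                               # S7: caller repeats

threads = {7: {"valid": 1, "left": 5}, 15: {"valid": 1, "left": 2}}
breakpoints, last = {}, -1
trace = []
for _ in range(5):
    last = run_one_slice(threads, breakpoints, last, 2)
    trace.append(last)
```

The trace alternates between the two threads, with thread 7 resuming from its saved breakpoint until both finish, mirroring the S2–S6 cycle.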
Embodiments based on the above technical solutions are further provided below.
Embodiment 1:
Initialize the GPU chip. The software maps threads to the GPU chip by configuring the GPU chip's thread register information according to the user's requirements. The registers include the thread valid flag, space size, address, instruction type, and other information. There are currently 2 threads, threads 7 and 15. Thread 7's valid flag is 1, space size 4 GByte, address 0x1000000, priority 1 (high), instruction type matrix addition; thread 15's valid flag is 1, space size 1 GByte, address 0xa0000000, priority 2 (low), instruction type matrix multiplication. In the current breakpoint storage module, the breakpoint valid flags of all threads are 0.
The multi-threaded scheduling is started. The second thread scheduling module begins operation.
The state of each thread is read. The valid flags of threads 7 and 15 are found to be 1, so both are considered schedulable threads.
It is determined whether there are threads that can be scheduled. The user has configured the scheduling mode as priority, so thread 7 is selected from threads 7 and 15. The time slice is configured to 2 ms.
Thread 7 is dispatched and the information in the breakpoint storage module is read. The breakpoint valid flag of thread 7 is 0, indicating that it is a new thread; the scheduler notifies the software of thread number 7. At the same time, the scheduling module starts timing from 0.
The software configures the relevant information according to thread number 7.
Thread 7 begins to operate. According to the user's needs, thread 7 reads 2 sets of 1-MByte data from the system side into the GPU, performs matrix addition, and transmits the calculated data back to the system.
When the timer reaches 2 ms, the scheduling module ends thread 7's operation. The calculation of thread 7 is not complete, so the GPU writes the currently read address 0x5000000, the instruction type, the remaining space, the number of items already calculated, and other information into the breakpoint storage module, sets the breakpoint valid indication to 1, and waits for the next schedule to resume the calculation.
When the breakpoint information has been stored, the scheduling module ends the work of thread 7 and feeds the end information back to the software.
The scheduling system then repeats the procedure from step 2, rescheduling a new thread.
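The priority-based selection in this embodiment (thread 7 wins over thread 15 because priority 1 beats priority 2) can be sketched in a few lines; the data layout is illustrative, not from the patent:

```python
# Priority scheduling as used in this embodiment: the valid thread with
# the lowest priority number (1 = high) wins the schedule.
threads = {7: {"valid": 1, "priority": 1}, 15: {"valid": 1, "priority": 2}}
winner = min((t for t, regs in threads.items() if regs["valid"]),
             key=lambda t: threads[t]["priority"])
# winner is thread 7, matching the selection in the embodiment
```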
Embodiment 2:
There are currently 2 threads, threads 7 and 15. Thread 7's valid flag is 1, space size 4 GByte, address 0x1000000, priority 1 (high), instruction type matrix addition; thread 15's valid flag is 1, space size 1 GByte, address 0xa0000000, priority 2 (low), instruction type matrix multiplication. In the current breakpoint storage module, thread 7's breakpoint valid flag is 1, the read address is 0x5000000, the instruction type is matrix multiplication, the remaining space is 1 GByte, and the number of items already calculated is recorded. Thread 15's breakpoint valid flag is 0.
The multi-threaded scheduling is started. The second thread scheduling module begins operation.
The state of each thread is read. The valid flags of threads 7 and 15 are found to be 1, so both are considered schedulable threads.
It is determined whether there are threads that can be scheduled. The user has configured the scheduling mode as priority, so thread 7 is selected from threads 7 and 15. The time slice is configured to 2 ms.
Thread 7 is dispatched and the information in the breakpoint storage module is read. The breakpoint valid flag of thread 7 is 1, indicating that execution did not complete previously. The thread automatically jumps to the current address 0x5000000 and continues execution based on the other stored information, such as the instruction type (matrix multiplication), the 1 GByte of remaining space, and the number of items already calculated. At the same time, the scheduling module starts timing from 0.
Thread 7 begins to operate. According to the user's needs, thread 7 continues reading data from address 0x5000000 into the GPU, performs matrix addition, and transmits the calculated data back to the system.
When the timer reaches 2 ms, the scheduling module ends thread 7's operation. At this point the calculation of thread 7 is complete, so the GPU writes thread 7's breakpoint valid indication in the breakpoint storage module to 0, indicating that the thread has finished executing and no breakpoint information remains.
When the breakpoint information has been cleared, the scheduling module ends the work of thread 7 and feeds the end information back to the software.
The scheduling system then repeats the procedure from step 2, rescheduling a new thread.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the above embodiments, but may include many other equivalent embodiments without departing from the spirit of the invention, which fall within the scope of the invention.

Claims (7)

1. A GPU multithreading scheduling management system is characterized in that: the system comprises a thread scheduling module, a thread configuration module, a software configuration module and a breakpoint storage module; the thread scheduling module is configured in the GPU chip and comprises a first thread scheduling module and a second thread scheduling module;
the thread configuration module is connected with the first thread scheduling module, and the software configuration module and the breakpoint storage module are respectively connected with the second thread scheduling module;
the thread configuration module is used for mapping the threads of the user to the threads of the GPU chip in a configuration mapping mode, and initializing the corresponding threads according to the thread numbers scheduled by the first thread scheduling module;
the first thread scheduling module is used for scheduling among a plurality of threads, sending the scheduled thread numbers to the thread configuration module, and simultaneously monitoring the state of the thread which wins scheduling and performing thread switching;
the thread configuration module maps the threads of the user to the threads of the GPU chip in a manner of configuring RingBuffer;
the software configuration module is used for initializing and configuring threads;
the second thread scheduling module is used for scheduling among a plurality of threads and switching among the threads according to the breakpoint information of the corresponding threads sent by the breakpoint storage module;
the breakpoint storage module is used for recording the current scheduled thread number, storing the breakpoint information of the thread, and returning the breakpoint information of the corresponding thread to the second thread scheduling module.
2. A GPU multithreaded dispatch management system according to claim 1, wherein: the thread configuration module's configuration content includes, but is not limited to, the thread's valid flag, space size, address, and scheduling priority; the scheduling mode of the first thread scheduling module includes, but is not limited to, polling and absolute priority, and the scheduling mode is configured by the thread configuration module.
3. A GPU multithreaded dispatch management system according to claim 2, wherein: the first thread scheduling module is specifically configured to monitor and count the state of the thread winning the schedule, stop the work of the thread after the count reaches the time slice configured by the thread configuration module, and send a stop instruction to the thread configuration module to trigger the thread configuration module to retain the information and data of the work of the thread.
4. A GPU multithreaded dispatch management system according to claim 1, wherein: the scheduling mode and time slices of the second thread scheduling module are configured by the software configuration module.
5. A GPU multithreaded dispatch management system according to claim 4, wherein: the second thread scheduling module is further configured to send a read request to the breakpoint storage module, and when the breakpoint storage module returns that the target thread is a breakpoint thread, directly jump to a breakpoint position, and start working to process data from the breakpoint position of the thread.
6. A GPU multithreaded dispatch management system according to claim 5, wherein: the second thread scheduling module is further used for controlling the timer to start timing when the scheduled thread starts working, stopping the working of the thread when the timing reaches a working time slice, and sending a stopping instruction to the software configuration module.
7. A GPU multithreaded dispatch management system according to claim 6, wherein: the second thread scheduling module is further used for writing current information into the breakpoint storage module when judging that the current thread is not executed to be completed; if the thread has been completed, then there is no need to write the current information to the breakpoint storage module.
CN202311694615.4A 2023-12-12 2023-12-12 GPU multithread scheduling management system Active CN117389712B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311694615.4A CN117389712B (en) 2023-12-12 2023-12-12 GPU multithread scheduling management system


Publications (2)

Publication Number Publication Date
CN117389712A CN117389712A (en) 2024-01-12
CN117389712B true CN117389712B (en) 2024-03-12

Family

ID=89465204

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311694615.4A Active CN117389712B (en) 2023-12-12 2023-12-12 GPU multithread scheduling management system

Country Status (1)

Country Link
CN (1) CN117389712B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102648449A (en) * 2009-09-29 2012-08-22 辉达公司 Trap handler architecture for a parallel processing unit
CN104375804A (en) * 2013-08-13 2015-02-25 三星电子株式会社 Multiple threads execution processor and operating method thereof
CN104699461A (en) * 2013-12-10 2015-06-10 Arm有限公司 Configuring thread scheduling on a multi-threaded data processing apparatus
CN113946445A (en) * 2021-10-15 2022-01-18 杭州国芯科技股份有限公司 Multithreading module based on ASIC and multithreading control method
CN114942831A (en) * 2022-03-31 2022-08-26 上海阵量智能科技有限公司 Processor, chip, electronic device and data processing method
CN115617499A (en) * 2022-12-20 2023-01-17 深流微智能科技(深圳)有限公司 System and method for GPU multi-core hyper-threading technology
CN115756859A (en) * 2022-11-25 2023-03-07 昆易电子科技(上海)有限公司 Method, device and system for resolving radar point cloud and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11782720B2 (en) * 2020-11-16 2023-10-10 Ronald Chi-Chun Hui Processor architecture with micro-threading control by hardware-accelerated kernel thread


Also Published As

Publication number Publication date
CN117389712A (en) 2024-01-12

Similar Documents

Publication Publication Date Title
EP0491342B1 (en) Multiprocessing system and method of controlling the carrying out of tasks in a multiprocessing system
US7290261B2 (en) Method and logical apparatus for rename register reallocation in a simultaneous multi-threaded (SMT) processor
US7979680B2 (en) Multi-threaded parallel processor methods and apparatus
JP5678135B2 (en) A mechanism for scheduling threads on an OS isolation sequencer without operating system intervention
US5339415A (en) Dual level scheduling of processes to multiple parallel regions of a multi-threaded program on a tightly coupled multiprocessor computer system
US6931641B1 (en) Controller for multiple instruction thread processors
US6931639B1 (en) Method for implementing a variable-partitioned queue for simultaneous multithreaded processors
US6944850B2 (en) Hop method for stepping parallel hardware threads
JP2561801B2 (en) Method and system for managing process scheduling
US20060130062A1 (en) Scheduling threads in a multi-threaded computer
US20040172631A1 (en) Concurrent-multitasking processor
EP1031924A2 (en) Computer executing multiple operating system
JPH0760415B2 (en) Multitasking data processing system
JP5309703B2 (en) Shared memory control circuit, control method, and control program
JP2007200288A (en) System and method for grouping execution threads
US20050066149A1 (en) Method and system for multithreaded processing using errands
US8042116B2 (en) Task switching based on the execution control information held in register groups
CN117389712B (en) GPU multithread scheduling management system
KR20010036644A (en) Real-time control system for digital signal processor
CN111158875B (en) Multi-module-based multi-task processing method, device and system
US20020087844A1 (en) Apparatus and method for concealing switch latency
US7603673B2 (en) Method and system for reducing context switch times
EP0544822B1 (en) Dual level scheduling of processes
WO2002046887A2 (en) Concurrent-multitasking processor
JP2009515280A (en) Centralized interrupt controller

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant