CN117389712B - GPU multithread scheduling management system - Google Patents

GPU multithread scheduling management system

Info

Publication number
CN117389712B
Authority
CN
China
Prior art keywords
thread
module
scheduling
breakpoint
threads
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311694615.4A
Other languages
Chinese (zh)
Other versions
CN117389712A (en)
Inventor
王爽
孔超
管叙民
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Muxi Integrated Circuit Nanjing Co ltd
Original Assignee
Muxi Integrated Circuit Nanjing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Muxi Integrated Circuit Nanjing Co ltd filed Critical Muxi Integrated Circuit Nanjing Co ltd
Priority to CN202311694615.4A
Publication of CN117389712A
Application granted
Publication of CN117389712B
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/48: Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806: Task transfer initiation or dispatching
    • G06F 9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881: Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027: Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a GPU multithread scheduling management system, belonging to the technical field of computing, which comprises a thread scheduling module, a thread configuration module, a software configuration module and a breakpoint storage module. The thread scheduling module is configured in the GPU chip and comprises a first thread scheduling module and a second thread scheduling module; the thread configuration module is connected with the first thread scheduling module, and the software configuration module and the breakpoint storage module are each connected with the second thread scheduling module. Through a scheduling mechanism in which the software and the hardware of the GPU chip cooperate, the invention provides an efficient multithread scheduling scheme: software only needs to initialize and configure each thread, and the GPU chip automatically carries out scheduling and thread switching. This greatly reduces the time spent on multithread scheduling, improves the execution efficiency of multithread scheduling, and can be widely used for managing and scheduling the threads of a system.

Description

GPU multithread scheduling management system
Technical Field
The invention relates to the technical field of computation, in particular to a GPU multithreading scheduling management system.
Background
GPU chips contain a huge number of computing cores and a powerful instruction set, and are widely used in fields such as data centers and artificial intelligence. A GPU chip supports multiple users processing data, each user using one thread (Thread). A thread is the smallest scheduling unit within a process and the smallest unit of program execution; it consists of a thread number, a program counter, registers, and so on. The introduction of threads reduces the overhead of concurrent program execution and improves the concurrency of an operating system. Because only one thread is active at any given time, the GPU chip needs to support scheduling among multiple threads.
In the prior art, multithread scheduling often adopts a traditional software-managed approach. First, the software creates multiple threads and schedules one of them to start working. Second, the software monitors the state of that thread. When the thread has worked for a certain time, the software stops the current thread and retains its information. The software then schedules a new thread. The advantage of this approach is that it is convenient for software management.
It has, however, the following problems. 1) Low efficiency: software scheduling takes a long time; the scheduling switches between threads and each operation within a thread incur delays, leading to long execution times and low efficiency. 2) Complex software operation: whether scheduling a thread or finishing the current thread and rescheduling, the software's operations are complex; in addition, the software must continuously monitor the state of the thread that currently wins scheduling.
Likewise, in the prior art, thread switching often adopts a traditional software-driven approach. First, the software monitors the state of the currently working thread. When the thread has run for a certain period, the software stops it. Second, the software saves the breakpoint information of the current thread, including the instruction type, pointer position, address, instruction length, and so on. Finally, after this information has been saved, the software switches to a new thread. Again, the advantage is convenience of software management.
It has the following problems. 1) Large delay and low efficiency: switching threads in software takes a relatively long time; in particular, stopping the current thread and saving its breakpoint information requires recording much information, so execution time is long and efficiency is low. 2) Complex software operation: in particular, saving the breakpoint information of the current thread involves the instruction type, pointer position, address, instruction length and other information that must be recorded in registers and the like, making the software operation very complex.
Disclosure of Invention
To address at least the technical problems described in the Background, the invention provides a GPU multithread scheduling management system.
A GPU multithread scheduling management system comprises a thread scheduling module, a thread configuration module, a software configuration module and a breakpoint storage module; the thread scheduling module is configured in the GPU chip and comprises a first thread scheduling module and a second thread scheduling module;
the thread configuration module is connected with the first thread scheduling module, and the software configuration module and the breakpoint storage module are respectively connected with the second thread scheduling module.
Optionally, the thread configuration module is configured to map the threads of the user to the threads of the GPU chip in a configuration mapping manner, and initialize the corresponding threads according to the thread numbers scheduled by the first thread scheduling module;
the first thread scheduling module is used for scheduling among a plurality of threads, sending the scheduled thread numbers to the thread configuration module, and simultaneously monitoring the state of the thread which wins scheduling and performing thread switching.
Optionally, the thread configuration module maps the thread of the user to the thread of the GPU chip by configuring RingBuffer.
Optionally, the content configured by the thread configuration module includes, but is not limited to, the thread's valid flag, space size, address, and scheduling priority; the scheduling modes of the first thread scheduling module include, but are not limited to, polling and absolute priority, and the scheduling mode is configured by the thread configuration module.
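For illustration only, the per-thread configuration content described above can be sketched in Python; the class and field names below are assumptions of this description, not register names defined by the invention:

```python
from dataclasses import dataclass
from enum import Enum

class SchedulingMode(Enum):
    POLLING = "polling"              # Round Robin among valid threads
    ABSOLUTE_PRIORITY = "absolute"   # always pick the highest-priority valid thread

@dataclass
class ThreadConfig:
    thread_id: int
    valid: int = 0        # default 0: invalid thread, work not started
    space_size: int = 0   # size of the thread's buffer space in bytes
    address: int = 0      # base address of the thread's buffer
    priority: int = 7     # scheduling priority; 0 = highest (assumed convention)

# Example mirroring thread 0 of Embodiment 1: 1 GByte at 0x1000000, priority 0
t0 = ThreadConfig(thread_id=0, valid=1, space_size=1 << 30,
                  address=0x1000000, priority=0)
```

In this sketch, software would set `valid = 1` to make a thread schedulable and set it back to 0 to stop or delete the thread, matching the valid-flag behavior described in the text.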
Optionally, the first thread scheduling module is specifically configured to monitor the state and running time of the thread that wins scheduling, stop the thread's work once the running time reaches the time slice configured by the thread configuration module, and send a stop instruction to the thread configuration module to trigger it to retain the information and data of the thread's work.
Optionally, the software configuration module is used for initializing and configuring threads;
the second thread scheduling module is used for scheduling among a plurality of threads and switching among the threads according to the breakpoint information of the corresponding threads sent by the breakpoint storage module;
the breakpoint storage module is used for recording the current scheduled thread number, storing the breakpoint information of the thread, and returning the breakpoint information of the corresponding thread to the second thread scheduling module.
Optionally, the scheduling mode and time slices of the second thread scheduling module are configured by the software configuration module.
Optionally, the second thread scheduling module is further configured to send a read request to the breakpoint storage module and, when the breakpoint storage module returns that the target thread is a breakpoint thread, jump directly to the breakpoint position and start processing data from that position.
Optionally, the second thread scheduling module is further configured to control the timer to start counting when the scheduled thread starts working, stop the thread's work when the count reaches the working time slice, and send a stop instruction to the software configuration module.
optionally, the second thread scheduling module is further configured to write current information into the breakpoint storage module when it is determined that the current thread is not executed; if the thread has been completed, then there is no need to write the current information to the breakpoint storage module.
Compared with the prior art, the invention has the following advantages:
1) High efficiency: multithread scheduling is carried out by the GPU chip itself, which greatly reduces scheduling time; at the same time, the GPU chip automatically monitors the currently working thread and completes thread switching, reducing the workload of users and software and significantly improving the efficiency of the system's multithreaded instruction execution.
2) Simple software operation: once the GPU chip and the software are connected, the whole scheduling process is handed to the GPU chip, which provides a reporting mechanism; the software only needs to be configured, which simplifies the operational complexity on the Host.
3) Low delay and high efficiency: during thread switching, the GPU chip actively stores breakpoint information and automatically judges whether a thread is a breakpoint thread, greatly shortening the thread switching time and significantly improving the efficiency of the system's multithreaded instruction execution.
4) Simple software operation: in the prior art, the amount of information a thread must store is large, the software has to record much of it, and operation is complex; here, breakpoint storage and switching judgment are carried out on the GPU chip, and the software only needs to be configured, which greatly simplifies the operational complexity on the Host.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic structural diagram of a GPU multithreading scheduling management system according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a portion of a GPU multithreading schedule management system according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of a thread configuration module according to an embodiment of the present invention.
FIG. 4 is a schematic diagram illustrating a first thread scheduling module according to an embodiment of the present invention.
FIG. 5 is a flow chart of a method for managing GPU multithreading scheduling in accordance with an embodiment of the present invention.
FIG. 6 is a schematic diagram of another portion of a GPU multithreading schedule management system in accordance with the present invention.
FIG. 7 is a schematic diagram of a second thread scheduling module according to the present invention.
Fig. 8 is a schematic diagram of the structure of the breakpoint memory module of the present invention.
Fig. 9 is a schematic diagram of the software configuration module in the present invention.
FIG. 10 is a flow chart of a GPU multithreading method according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terms first, second, third, etc. or module a, module B, module C and the like in the description and in the claims, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order, and it is to be understood that the specific order or sequence may be interchanged if permitted to implement embodiments of the invention described herein in other than those illustrated or described.
In the following description, reference numerals indicating steps such as S110, S120, … …, etc. do not necessarily indicate that the steps are performed in this order, and the order of the steps may be interchanged or performed simultaneously as allowed.
The term "comprising" as used in the description and claims should not be interpreted as being limited to what is listed thereafter; it does not exclude other elements or steps. Thus, it should be interpreted as specifying the presence of the stated features, integers, steps or components as referred to, but does not preclude the presence or addition of one or more other features, integers, steps or components, or groups thereof. Thus, the expression "a device comprising means a and B" should not be limited to a device consisting of only components a and B.
Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Thus, appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment, but may be. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments, as would be apparent to one of ordinary skill in the art from this disclosure.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. If there is a discrepancy, the meaning described in the present specification or the meaning obtained from the content described in the present specification is used. In addition, the terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
Referring to fig. 1, the embodiment of the invention discloses a GPU multithreading scheduling management system, which comprises a thread scheduling module, a thread configuration module, a software configuration module and a breakpoint storage module; the thread scheduling module is configured in the GPU chip and comprises a first thread scheduling module and a second thread scheduling module;
the thread configuration module is connected with the first thread scheduling module, and the software configuration module and the breakpoint storage module are respectively connected with the second thread scheduling module.
The invention constructs the thread scheduling module, which comprises a first thread scheduling module and a second thread scheduling module, and can be respectively used for realizing multi-thread scheduling and switching of the GPU.
Optionally, referring to fig. 2, a thread configuration module is configured to map a thread of a user to a thread of a GPU chip in a configuration mapping manner, and initialize a corresponding thread according to a thread number scheduled by the first thread scheduling module;
the first thread scheduling module is used for scheduling among a plurality of threads, sending the scheduled thread numbers to the thread configuration module, and simultaneously monitoring the state of the thread which wins scheduling and performing thread switching.
The invention adopts a scheduling mechanism in which the software and the hardware of the GPU chip cooperate: the software (via the thread configuration module) is responsible only for the initialization and configuration of each thread, and the GPU chip (via the first thread scheduling module) carries out scheduling according to that configuration, greatly reducing the time required for multithread scheduling. Meanwhile, the GPU chip monitors the state and running time of each thread and automatically switches threads according to the configured time slices and other settings, then reports the result to the software, improving the efficiency of multithread switching.
It should be noted that, the first thread scheduling module and the second thread scheduling module in the present invention may be integrated into a single scheduling module, which is not limited in particular.
Optionally, referring to fig. 3, the thread configuration module maps the thread of the user to the thread of the GPU chip by configuring RingBuffer.
Optionally, the thread configuration module configuration content includes, but is not limited to, thread valid flag, space size, address, scheduling priority of the thread; referring to FIG. 4, the scheduling modes of the first thread scheduling module include, but are not limited to, polling, absolute priority, and scheduling is configured by the thread configuration module.
The GPU chip supports N threads in hardware, each of which can be configured independently, including but not limited to the thread's valid flag, space size, address, scheduling priority, and other information. The default valid flag is 0, indicating that the current thread is invalid and has not started working. All of the above information can be configured by software through registers.
And the software builds a user thread according to the user demand. Meanwhile, the software maps the user thread to the thread of the GPU chip in a manner of configuring RingBuffer. And configuring registers such as the space size, the address, the scheduling priority and the like of the threads according to the demands of users.
The software configures the active flag for that thread to 1, indicating that the current thread is an active thread, which can be scheduled. If the software wants to stop or delete this thread, the valid flag can be directly configured to 0.
Optionally, the first thread scheduling module is specifically configured to monitor the state and running time of the thread that wins scheduling, stop the thread's work once the running time reaches the time slice configured by the thread configuration module, and send a stop instruction to the thread configuration module to trigger it to retain the information, data, and the like of the thread's work.
The first thread scheduling module is responsible for scheduling among multiple threads; the scheduling mode supports, but is not limited to, polling (Round Robin), absolute priority (Strict Priority), and the like, and is configured by software. The working time of a thread is called a time slice and is also configured by software.
The first thread scheduling module selects threads according to a scheduling mode: 1) Polling: the scheduling module selects 1 thread (default scheduling from thread 0) in the current effective threads; 2) Absolute priority: the scheduling module selects a thread with highest scheduling priority from the current effective threads.
When a thread wins the schedule, the thread starts to work, sends data according to the instruction of the user, calculates the data, and the like. At the same time, the first thread scheduling module controls the timer to start counting, the timer starts counting from 0, and the first thread scheduling module feeds back the thread number which wins scheduling to the software. When the timer reaches the software configured time slice, the scheduling module stops the thread. And sends a stop instruction to the software for retaining the information, data, etc. of the current thread operation.
After one round of thread scheduling completes, the first thread scheduling module reselects one thread among the valid threads. Taking the scheduling mode configured as polling as an example: if the last thread that won scheduling was thread 0, scheduling starts from thread 1; when thread 1 starts working, the scheduling module starts timing again, with the timer starting from 0; and so on. When the scheduled thread is the last valid thread N, the scheduling module resumes scheduling from thread 0. Taking the scheduling mode configured as absolute priority as an example: the scheduling module selects the thread with the highest scheduling priority among the current valid threads; if thread 0 has the highest priority, thread 0 will always be selected until the processing of thread 0's data is fully completed.
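The two selection rules described here, polling and absolute priority, can be sketched as follows; the function and attribute names are illustrative assumptions of this description, and a lower priority value is assumed to mean higher priority:

```python
class T:
    """Minimal stand-in for a configured thread (attribute names are assumptions)."""
    def __init__(self, thread_id, valid=1, priority=7, has_data=True):
        self.thread_id, self.valid = thread_id, valid
        self.priority, self.has_data = priority, has_data

def select_thread(threads, mode, last_winner=None):
    """Pick the next thread number according to the configured scheduling mode."""
    # A thread is schedulable only when its valid flag is 1 and it has data
    runnable = [t for t in threads if t.valid == 1 and t.has_data]
    if not runnable:
        return None
    if mode == "polling":
        # Round Robin: continue after the last winner; default from thread 0,
        # wrapping back to the first runnable thread after the last one
        ids = sorted(t.thread_id for t in runnable)
        start = 0 if last_winner is None else last_winner + 1
        return next((tid for tid in ids if tid >= start), ids[0])
    if mode == "absolute":
        # Absolute (strict) priority: lowest value = highest priority (assumed)
        return min(runnable, key=lambda t: t.priority).thread_id
    raise ValueError(f"unknown scheduling mode: {mode!r}")

print(select_thread([T(0), T(1), T(2)], "polling", last_winner=0))  # -> 1
```

The wrap-around branch mirrors the behavior described in the text: once the last valid thread N has been scheduled, polling resumes from thread 0.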
Referring to fig. 5, the embodiment of the invention also discloses a GPU multithreading scheduling method, which comprises the following steps:
s1, initializing a GPU chip: the thread configuration module maps threads to the GPU chip by configuring thread register information of the GPU chip according to the requirements of a user;
the registers include information such as thread valid flags, space size, addresses, scheduling priority, etc.
S2, the first thread scheduling module starts multithread scheduling: the first thread scheduling module reads the current state of each thread; when there is at least one schedulable thread, a target thread is selected according to the scheduling mode (a thread is considered schedulable when its valid flag is 1 and it has data; otherwise it cannot participate in scheduling); otherwise, the module keeps checking until a schedulable thread exists;
s3, the first thread scheduling module feeds back the scheduled thread number to the thread configuration module, and meanwhile, the first thread scheduling module controls a timer to start timing (from 0);
s4, the thread configuration module records the thread number and configures related information;
s5, the thread starts to work, and corresponding operation is executed according to the requirement of a user;
for example, reading data from an address, calculating data, writing an address after calculation, and the like.
S6, after the timer reaches the time slice configured by the thread configuration module, the first thread scheduling module ends the thread's work and feeds back the end information to the thread configuration module;
and S7, the first thread scheduling module repeats the flow of the steps S2-S6 and reschedules a new thread.
In addition, the following detailed description is provided.
Embodiment 1:
and initializing the GPU chip. The software maps the threads to the GPU chip by configuring the thread register information of the GPU chip according to the requirements of the user. The registers include information such as thread valid flags, space size, addresses, scheduling priority, etc. There are currently 3 threads, threads 0, 1, 2. Thread 0 valid flag is 1, space size is 1GByte, address is 0x1000000, priority is 0 (high); thread 1 valid flag is 1, space size is 2GByte, address is 0x2000000, priority is 4. Thread 2 valid flag is 1, space size 256MByte, address 0x8000000, priority 7 (low).
The multi-threaded scheduling is started. The first thread scheduling module begins operation.
The state of each thread is read. Since the valid flags of threads 0, 1 and 2 are all 1 and the threads have data, they are all considered schedulable.
It is then determined which thread to schedule. The current user has configured the scheduling mode as polling, so thread 0 is selected from threads 0, 1, 2. The time slice is configured as 1 ms.
Thread 0 wins scheduling, and thread number 0 is reported to the software. At the same time, the scheduling module starts timing from 0.
The software configures the relevant information according to thread number 0.
Thread 0 begins to work. According to the requirement of the user, the thread 0 needs to read 2 groups of 256byte data from the system side to the GPU and perform multiplication calculation. And transmitting the calculated data to the system.
When the timer reaches 1ms, the scheduling module ends the work of the thread 0 and feeds back the end information to the software. If the current thread 0 calculation has been completed, the software will reassign a new task to thread 0 and wait for the next dispatch. If the calculation of the current thread 0 is not completed, the software records the data which is calculated currently and waits for the next scheduling to resume the calculation.
The scheduling system will continually repeat the above-mentioned initialization process and reschedule a new thread.
Embodiment 2:
and initializing the GPU chip. The software maps the threads to the GPU chip by configuring the thread register information of the GPU chip according to the requirements of the user. The registers include information such as thread valid flags, space size, addresses, etc. There are currently 3 threads, threads 0, 8, 15. Thread 0 valid flag is 1, space size is 1GByte, address is 0x1000000, priority is 7 (low); thread 8 valid flag is 1, space size is 2GByte, address is 0x2000000, priority is 4. Thread 15 valid flag is 1, space size 256MByte, address 0x8000000, priority 0 (high).
The multithreading is started and the first thread scheduling module begins to operate.
The state of each thread is read. Since the valid flags of threads 0, 8 and 15 are all 1 and the threads have data, they are all considered schedulable.
It is then determined which thread to schedule. The current user has configured the scheduling mode as absolute priority, so thread 15 is selected from threads 0, 8, 15. The time slice is configured as 100 ms.
Thread 15 wins scheduling, and thread number 15 is reported to the software. At the same time, the scheduling module starts timing from 0.
The software configures the relevant information according to thread number 15.
Thread 15 begins to operate. According to the user's needs, the thread 15 needs to read 1024 data from the system side into the GPU and perform addition calculation. And transmitting the calculated data to the system.
When the timer reaches 100ms, the scheduling module ends the operation of the thread 15 and feeds back the end information to the software. If the computation of the current thread 15 has been completed, the software will reassign a new task to the thread 15 and wait for the next dispatch. If the calculation of the current thread 15 is not completed, the software records the data which is calculated currently, and waits for the next scheduling to resume the calculation.
The scheduling system will continually repeat the above-mentioned initialization process and reschedule a new thread.
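For this embodiment's absolute-priority selection, a minimal sketch follows; the mapping-based representation of the thread registers is an assumption, with a lower value taken to mean higher priority (so priority 0 wins):

```python
def pick_absolute_priority(threads):
    """Pick the valid thread with the highest scheduling priority.
    `threads` maps thread number -> (valid_flag, priority); a lower
    priority value is assumed to mean higher priority (0 = highest)."""
    runnable = {tid: prio for tid, (valid, prio) in threads.items() if valid == 1}
    return min(runnable, key=runnable.get) if runnable else None

# Embodiment 2: threads 0, 8, 15 with priorities 7, 4, 0 -> thread 15 wins
threads = {0: (1, 7), 8: (1, 4), 15: (1, 0)}
print(pick_absolute_priority(threads))  # -> 15
```

As the text notes for absolute priority, thread 15 would keep winning every round until its data is fully processed or its valid flag is cleared.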
Optionally, referring to fig. 6, the software configuration module is configured to initialize and configure threads;
the second thread scheduling module is used for scheduling among a plurality of threads and switching among the threads according to the breakpoint information of the corresponding threads sent by the breakpoint storage module;
the breakpoint storage module is used for recording the current scheduled thread number, storing the breakpoint information of the thread, and returning the breakpoint information of the corresponding thread to the second thread scheduling module.
When a thread is switched, the GPU chip actively stores the thread's breakpoint information, recording the not-yet-executed thread information in the breakpoint storage module. This greatly simplifies the software's operational complexity and reduces the time and space the software spends on record keeping. Second, when a thread is scheduled, the GPU chip judges whether the thread has a breakpoint. If it does, the GPU chip automatically jumps to the position where the thread last executed and continues executing on the original data. Thus the thread switching time is greatly shortened, and the efficiency of the system's multithreaded instruction execution is significantly improved.
Optionally, the scheduling mode and time slices of the second thread scheduling module are configured by the software configuration module.
Optionally, the scheduling mode is polling.
Optionally, the second thread scheduling module is further configured to send a read request to the breakpoint storage module, and when the breakpoint storage module returns that the target thread is a breakpoint thread, directly jump to a breakpoint position, and start working to process data from the breakpoint position of the thread.
Optionally, the second thread scheduling module is further configured to control the timer to start counting (counting from 0) when the scheduled thread starts working, stop the working of the thread when the counting reaches the working time slice, and send a stop instruction to the software configuration module.
Optionally, the second thread scheduling module is further configured to write the current information into the breakpoint storage module when it determines that the current thread has not finished executing; if the thread has finished, the current information need not be written into the breakpoint storage module.
Optionally, the current information includes, but is not limited to, the thread type, the size of the remaining space, and the address of the current breakpoint.
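A possible layout for one such breakpoint record, with field names that are assumptions based only on the information the text says is stored:

```python
# Illustrative layout of one breakpoint record; the field names are
# assumptions based on the information the text says is stored.
from dataclasses import dataclass

@dataclass
class BreakpointRecord:
    thread_id: int
    thread_type: str       # e.g. "matrix_add" or "matrix_mul"
    remaining_space: int   # size of remaining space, in bytes
    breakpoint_addr: int   # address at which execution stopped
    valid: bool = True     # breakpoint valid flag (True = resumable)

rec = BreakpointRecord(7, "matrix_add", 1 << 30, 0x5000000)
```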
As shown in fig. 7, the second thread scheduling module is responsible for scheduling among the plurality of threads, with the scheduling mode and the working time slices configured by the software configuration module.
After the second thread scheduling module starts working, it first selects a thread according to the scheduling mode. Taking polling as an example: the scheduling module selects one thread from the currently active threads, starting from thread 0 by default. Then, when a thread wins the schedule, that thread begins working; the second thread scheduling module sends a read request to the breakpoint storage module using the winning thread's number and sends that number to the software configuration module.
If the breakpoint storage module finds that the thread is a breakpoint thread, the thread jumps directly to the breakpoint position and processes data from there. For example, it continues reading data from the last address and continues the calculation based on the previous data. If the thread is not a breakpoint thread, the software performs initialization, sends data according to the user's instructions, computes the data, and so on. At the same time, the timer of the second thread scheduling module begins counting from 0.
Then, when the timer reaches the software-configured time slice, the second thread scheduling module stops the thread and sends a stop instruction to the software configuration module. It also judges whether the current thread has finished executing: if not, it writes the current information into the breakpoint storage module and sets that thread number's breakpoint valid flag to 1; if the thread has finished, the current information need not be written, and the thread's breakpoint valid flag defaults to 0.
Finally, the second thread scheduling module reselects one thread among the active threads. Taking polling as an example: if the last thread that won the schedule was thread 0, scheduling starts from thread 1. When thread 1 starts working, the scheduling module starts its timer from 0, and so on. When the scheduled thread is the last active thread N, the second thread scheduling module resumes scheduling from thread 0.
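The polling selection with wrap-around from the last active thread back to thread 0 can be sketched as follows; `next_thread` is an illustrative helper, not an API from the patent:

```python
# Minimal round-robin (polling) selection over active threads: after the
# last active thread, scheduling wraps back to the lowest-numbered one.
def next_thread(active, last):
    """Pick the next active thread number after `last` (round robin).

    `active` is the set of schedulable thread numbers; returns None if empty.
    """
    if not active:
        return None
    candidates = sorted(active)
    for t in candidates:
        if t > last:
            return t
    return candidates[0]   # wrap around to the lowest active thread

# thread 0 wins first, then 1, then 3, then back to 0, and so on
order = []
last = -1
for _ in range(5):
    last = next_thread({0, 1, 3}, last)
    order.append(last)
```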
As shown in fig. 8, the breakpoint storage module is responsible for recording the currently scheduled thread number, writing the thread's breakpoint information into the storage unit, and returning the thread's breakpoint information in response to a query carrying the switched thread's number. After scheduling completes, the breakpoint storage module queries whether the thread is a breakpoint thread according to the thread number sent by the second thread scheduling module. If the thread is a breakpoint thread, the breakpoint storage module returns breakpoint information including the thread type, the size of the remaining space, and the address of the current breakpoint. If the thread is not a breakpoint thread, an invalid flag is returned.
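The query behaviour can be sketched as a small store that returns either the saved record or an invalid flag; class and method names here are illustrative, not from the patent:

```python
# A sketch of the query behaviour described above: return the stored
# record for a breakpoint thread, or an invalid flag otherwise.
# Class and method names are illustrative, not from the patent.
class BreakpointStore:
    def __init__(self):
        self._records = {}   # thread number -> breakpoint info

    def write(self, tid, info):
        self._records[tid] = info       # thread switched out unfinished

    def clear(self, tid):
        self._records.pop(tid, None)    # thread finished: flag drops to 0

    def query(self, tid):
        """Return (valid, info); valid=False models the invalid flag."""
        info = self._records.get(tid)
        return (info is not None), info

store = BreakpointStore()
store.write(7, {"type": "matrix_add", "addr": 0x5000000, "remaining": 1 << 30})
valid7, info7 = store.query(7)    # breakpoint thread: record returned
valid15, _ = store.query(15)      # not a breakpoint thread: invalid flag
```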
Referring to fig. 9: first, the GPU chip supports N threads in hardware. Each thread can be configured independently, including but not limited to the thread's valid flag, space size, address, and scheduling priority. The valid flag defaults to 0, indicating that the thread is invalid and has not started working. All of the above information can be configured by the software configuration module through registers. Second, the software configuration module builds a user thread according to the user's requirements and maps it onto a GPU chip thread by configuring a RingBuffer, setting the thread's space size, address, scheduling priority, and other registers as the user requires. Finally, the software sets the thread's valid flag to 1, indicating that the thread is active and can be scheduled. To stop or delete the thread, the software simply sets the valid flag back to 0.
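A rough illustration of this register-based configuration flow; the field names and N = 16 are assumptions, not from the patent:

```python
# Hypothetical register image for one hardware thread, mirroring the
# fields the text lists (valid flag, space size, address, priority).
def make_thread_regs():
    return {"valid": 0, "space": 0, "addr": 0, "priority": 0}

threads = [make_thread_regs() for _ in range(16)]   # chip supports N threads

# Software maps a user thread onto hardware thread 7 and activates it.
threads[7].update(space=4 << 30, addr=0x1000000, priority=1)
threads[7]["valid"] = 1     # valid flag 1: thread is active and schedulable
activated = threads[7]["valid"] == 1

# Stopping or deleting the thread is just clearing the flag again.
threads[7]["valid"] = 0
```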
Referring to fig. 10, the embodiment of the invention discloses a GPU multithreading switching method, which comprises the following steps:
s1, initializing a GPU chip: the software configuration module maps threads to the GPU chip by configuring thread register information of the GPU chip according to the requirements of a user. The registers include information such as thread valid flags, space size, addresses, scheduling priority, etc.
S2, starting multithreading scheduling: and the second thread scheduling module reads the scheduling mode and the current state of each thread, and schedules the target thread from the effective threads.
It is determined whether there are threads that can be scheduled. If at least one thread can be scheduled, the second thread scheduling module selects a thread according to the scheduling mode; otherwise, the scheduler keeps checking until a schedulable thread appears.
And S3, the thread number of the target thread is transmitted to the software configuration module, while the second thread scheduling module controls a timer to start counting from 0.
S4, the second thread scheduling module sends a read request to a breakpoint storage module according to the thread number of the target thread and receives breakpoint information returned by the breakpoint storage module; if the breakpoint information indicates that the target thread is a breakpoint thread, the second thread scheduling module directly schedules the target thread to a breakpoint position, and starts working to process data from the breakpoint position (for example, continuously reading data from a last address and continuously calculating based on the last data); if the breakpoint information indicates that the target thread is a new thread, the software configuration module initializes the target thread, and the target thread begins to operate (e.g., reads data from an address, calculates data, writes an address after calculation, etc.) after the software configuration module has been initialized.
S5, when the timer of the second thread scheduling module reaches the software-configured time slice, the second thread scheduling module judges whether the current thread has finished executing; if the target thread has not finished, its current information is written into the breakpoint storage module; if the target thread has finished, the current information need not be written to the breakpoint storage module.
And S6, after the breakpoint information (including the thread number and the current information) is stored, the second thread scheduling module ends the target thread and feeds the end information back to the software configuration module.
And S7, the thread scheduling system repeats the flow of the steps S2-S6 so as to realize the switching of the new target thread.
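Steps S2–S6 above can be sketched as a minimal, purely illustrative simulation; the work model, item counts, and all names are assumptions, not the patent's implementation:

```python
# Illustrative sketch of steps S2-S6: pick a valid thread, restore its
# breakpoint if any, run one time slice, and save a breakpoint if the
# work is unfinished.
def run_one_slice(threads, breakpoints, last, slice_items):
    """threads: tid -> {'valid': 0/1, 'left': items remaining}."""
    active = sorted(t for t, r in threads.items() if r["valid"])
    if not active:
        return last                          # S2: nothing schedulable yet
    tid = next((t for t in active if t > last), active[0])   # polling order
    left = breakpoints.pop(tid, threads[tid]["left"])        # S4: resume/init
    left -= min(left, slice_items)           # one time slice of work
    if left > 0:
        breakpoints[tid] = left              # S5: unfinished -> save breakpoint
    else:
        threads[tid]["valid"] = 0            # S6: finished -> thread ends
    threads[tid]["left"] = left
    return tid                               # S7: caller repeats

threads = {7: {"valid": 1, "left": 5}, 15: {"valid": 1, "left": 2}}
breakpoints, last = {}, -1
trace = []
for _ in range(5):
    last = run_one_slice(threads, breakpoints, last, 2)
    trace.append(last)
```

The trace alternates between the two threads, with thread 7 resuming from its saved breakpoint until both finish, mirroring the S2–S6 cycle.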
Embodiments based on the above technical solutions are further provided below.
Embodiment 1:
Initialize the GPU chip. The software maps threads to the GPU chip by configuring the GPU chip's thread register information according to the user's requirements. The registers include the thread valid flag, space size, address, instruction type, and other information. There are currently 2 threads, threads 7 and 15. Thread 7's valid flag is 1, space size 4 GByte, address 0x1000000, priority 1 (high), instruction type matrix addition; thread 15's valid flag is 1, space size 1 GByte, address 0xa0000000, priority 2 (low), instruction type matrix multiplication. In the current breakpoint storage module, the breakpoint valid flags of all threads are 0.
The multi-threaded scheduling is started. The second thread scheduling module begins operation.
The state of each thread is read. The valid flags of threads 7 and 15 are found to be 1, so both are considered schedulable threads.
It is determined whether there are threads that can be scheduled. The user has configured the scheduling mode as priority, so thread 7 is selected from threads 7 and 15. The time slice is configured to 2 ms.
Thread 7 is dispatched and the information in the breakpoint storage module is read. The breakpoint valid flag of thread 7 is 0, indicating that it is a new thread; the scheduler notifies the software of thread number 7. At the same time, the scheduling module starts timing from 0.
The software configures the relevant information according to thread number 7.
Thread 7 begins to operate. According to the user's needs, thread 7 reads 2 sets of 1-MByte data from the system side into the GPU, performs matrix addition, and transmits the calculated data back to the system.
When the timer reaches 2 ms, the scheduling module ends thread 7's operation. The calculation of thread 7 is not complete, so the GPU writes the currently read address 0x5000000, the instruction type, the remaining space, the number of items already calculated, and other information into the breakpoint storage module, sets the breakpoint valid indication to 1, and waits for the next schedule to resume the calculation.
When the breakpoint information has been stored, the scheduling module ends the work of thread 7 and feeds the end information back to the software.
The scheduling system then repeats the procedure from step 2, rescheduling a new thread.
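The priority-based selection in this embodiment (thread 7 wins over thread 15 because priority 1 beats priority 2) can be sketched in a few lines; the data layout is illustrative, not from the patent:

```python
# Priority scheduling as used in this embodiment: the valid thread with
# the lowest priority number (1 = high) wins the schedule.
threads = {7: {"valid": 1, "priority": 1}, 15: {"valid": 1, "priority": 2}}
winner = min((t for t, regs in threads.items() if regs["valid"]),
             key=lambda t: threads[t]["priority"])
# winner is thread 7, matching the selection in the embodiment
```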
Embodiment 2:
There are currently 2 threads, threads 7 and 15. Thread 7's valid flag is 1, space size 4 GByte, address 0x1000000, priority 1 (high), instruction type matrix addition; thread 15's valid flag is 1, space size 1 GByte, address 0xa0000000, priority 2 (low), instruction type matrix multiplication. In the current breakpoint storage module, thread 7's breakpoint valid flag is 1, the read address is 0x5000000, the instruction type is matrix multiplication, the remaining space is 1 GByte, and the number of items already calculated is recorded. Thread 15's breakpoint valid flag is 0.
The multi-threaded scheduling is started. The second thread scheduling module begins operation.
The state of each thread is read. The valid flags of threads 7 and 15 are found to be 1, so both are considered schedulable threads.
It is determined whether there are threads that can be scheduled. The user has configured the scheduling mode as priority, so thread 7 is selected from threads 7 and 15. The time slice is configured to 2 ms.
Thread 7 is dispatched and the information in the breakpoint storage module is read. The breakpoint valid flag of thread 7 is 1, indicating that execution did not complete previously. The thread automatically jumps to the current address 0x5000000 and continues execution based on the other stored information, such as the instruction type (matrix multiplication), the 1 GByte of remaining space, and the number of items already calculated. At the same time, the scheduling module starts timing from 0.
Thread 7 begins to operate. According to the user's needs, thread 7 continues reading data from address 0x5000000 into the GPU, performs matrix addition, and transmits the calculated data back to the system.
When the timer reaches 2 ms, the scheduling module ends thread 7's operation. At this point the calculation of thread 7 is complete, so the GPU writes thread 7's breakpoint valid indication in the breakpoint storage module to 0, indicating that the thread has finished executing and no breakpoint information remains.
When the breakpoint information has been cleared, the scheduling module ends the work of thread 7 and feeds the end information back to the software.
The scheduling system then repeats the procedure from step 2, rescheduling a new thread.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the above embodiments, but may include many other equivalent embodiments without departing from the spirit of the invention, which fall within the scope of the invention.

Claims (7)

1. A GPU multithreading scheduling management system is characterized in that: the system comprises a thread scheduling module, a thread configuration module, a software configuration module and a breakpoint storage module; the thread scheduling module is configured in the GPU chip and comprises a first thread scheduling module and a second thread scheduling module;
the thread configuration module is connected with the first thread scheduling module, and the software configuration module and the breakpoint storage module are respectively connected with the second thread scheduling module;
the thread configuration module is used for mapping the threads of the user to the threads of the GPU chip in a configuration mapping mode, and initializing the corresponding threads according to the thread numbers scheduled by the first thread scheduling module;
the first thread scheduling module is used for scheduling among a plurality of threads, sending the scheduled thread numbers to the thread configuration module, and simultaneously monitoring the state of the thread which wins scheduling and performing thread switching;
the thread configuration module maps the threads of the user to the threads of the GPU chip in a manner of configuring RingBuffer;
the software configuration module is used for initializing and configuring threads;
the second thread scheduling module is used for scheduling among a plurality of threads and switching among the threads according to the breakpoint information of the corresponding threads sent by the breakpoint storage module;
the breakpoint storage module is used for recording the current scheduled thread number, storing the breakpoint information of the thread, and returning the breakpoint information of the corresponding thread to the second thread scheduling module.
2. A GPU multithreaded dispatch management system according to claim 1, wherein: the thread configuration module's configuration content includes, but is not limited to, the thread's valid flag, space size, address, and scheduling priority; the scheduling mode of the first thread scheduling module includes, but is not limited to, polling and absolute priority, and the scheduling mode is configured by the thread configuration module.
3. A GPU multithreaded dispatch management system according to claim 2, wherein: the first thread scheduling module is specifically configured to monitor and count the state of the thread winning the schedule, stop the work of the thread after the count reaches the time slice configured by the thread configuration module, and send a stop instruction to the thread configuration module to trigger the thread configuration module to retain the information and data of the work of the thread.
4. A GPU multithreaded dispatch management system according to claim 1, wherein: the scheduling mode and time slices of the second thread scheduling module are configured by the software configuration module.
5. A GPU multithreaded dispatch management system according to claim 4, wherein: the second thread scheduling module is further configured to send a read request to the breakpoint storage module, and when the breakpoint storage module returns that the target thread is a breakpoint thread, directly jump to a breakpoint position, and start working to process data from the breakpoint position of the thread.
6. A GPU multithreaded dispatch management system according to claim 5, wherein: the second thread scheduling module is further used for controlling the timer to start timing when the scheduled thread starts working, stopping the working of the thread when the timing reaches a working time slice, and sending a stopping instruction to the software configuration module.
7. A GPU multithreaded dispatch management system according to claim 6, wherein: the second thread scheduling module is further used for writing current information into the breakpoint storage module when judging that the current thread is not executed to be completed; if the thread has been completed, then there is no need to write the current information to the breakpoint storage module.
CN202311694615.4A 2023-12-12 2023-12-12 GPU multithread scheduling management system Active CN117389712B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311694615.4A CN117389712B (en) 2023-12-12 2023-12-12 GPU multithread scheduling management system


Publications (2)

Publication Number Publication Date
CN117389712A CN117389712A (en) 2024-01-12
CN117389712B true CN117389712B (en) 2024-03-12

Family

ID=89465204

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311694615.4A Active CN117389712B (en) 2023-12-12 2023-12-12 GPU multithread scheduling management system

Country Status (1)

Country Link
CN (1) CN117389712B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102648449A (en) * 2009-09-29 2012-08-22 辉达公司 Trap handler architecture for a parallel processing unit
CN104375804A (en) * 2013-08-13 2015-02-25 三星电子株式会社 Multiple threads execution processor and operating method thereof
CN104699461A (en) * 2013-12-10 2015-06-10 Arm有限公司 Configuring thread scheduling on a multi-threaded data processing apparatus
CN113946445A (en) * 2021-10-15 2022-01-18 杭州国芯科技股份有限公司 Multithreading module based on ASIC and multithreading control method
CN114942831A (en) * 2022-03-31 2022-08-26 上海阵量智能科技有限公司 Processor, chip, electronic device and data processing method
CN115617499A (en) * 2022-12-20 2023-01-17 深流微智能科技(深圳)有限公司 System and method for GPU multi-core hyper-threading technology
CN115756859A (en) * 2022-11-25 2023-03-07 昆易电子科技(上海)有限公司 Method, device and system for resolving radar point cloud and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11782720B2 (en) * 2020-11-16 2023-10-10 Ronald Chi-Chun Hui Processor architecture with micro-threading control by hardware-accelerated kernel thread


Also Published As

Publication number Publication date
CN117389712A (en) 2024-01-12

Similar Documents

Publication Publication Date Title
EP0491342B1 (en) Multiprocessing system and method of controlling the carrying out of tasks in a multiprocessing system
US7290261B2 (en) Method and logical apparatus for rename register reallocation in a simultaneous multi-threaded (SMT) processor
US7979680B2 (en) Multi-threaded parallel processor methods and apparatus
JP5678135B2 (en) A mechanism for scheduling threads on an OS isolation sequencer without operating system intervention
US5339415A (en) Dual level scheduling of processes to multiple parallel regions of a multi-threaded program on a tightly coupled multiprocessor computer system
US6931641B1 (en) Controller for multiple instruction thread processors
US6931639B1 (en) Method for implementing a variable-partitioned queue for simultaneous multithreaded processors
US6944850B2 (en) Hop method for stepping parallel hardware threads
JP2561801B2 (en) Method and system for managing process scheduling
US20060130062A1 (en) Scheduling threads in a multi-threaded computer
US20040172631A1 (en) Concurrent-multitasking processor
EP1031924A2 (en) Computer executing multiple operating system
JPH0760415B2 (en) Multitasking data processing system
JP5309703B2 (en) Shared memory control circuit, control method, and control program
JP2007200288A (en) System and method for grouping execution threads
US20050066149A1 (en) Method and system for multithreaded processing using errands
US8042116B2 (en) Task switching based on the execution control information held in register groups
CN117389712B (en) GPU multithread scheduling management system
KR20010036644A (en) Real-time control system for digital signal processor
CN111158875B (en) Multi-module-based multi-task processing method, device and system
US20020087844A1 (en) Apparatus and method for concealing switch latency
US7603673B2 (en) Method and system for reducing context switch times
EP0544822B1 (en) Dual level scheduling of processes
WO2002046887A2 (en) Concurrent-multitasking processor
JP2009515280A (en) Centralized interrupt controller

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant