WO2023283767A1 - A task scheduling method and device - Google Patents

A task scheduling method and device (一种任务调度方法和装置)

Info

Publication number
WO2023283767A1
Authority
WO
WIPO (PCT)
Prior art keywords
task, priority, task flow, sub, flows
Prior art date
Application number
PCT/CN2021/105779
Other languages
English (en)
French (fr)
Inventor
朱湘毅
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to PCT/CN2021/105779 (WO2023283767A1)
Priority to EP21949564.5A (EP4357920A1)
Priority to CN202180044331.3A (CN115812197A)
Publication of WO2023283767A1
Priority to US18/408,906 (US20240143393A1)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/48: Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806: Task transfer initiation or dispatching
    • G06F 9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881: Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F 9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/30003: Arrangements for executing specific machine instructions
    • G06F 9/30007: Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F 9/3001: Arithmetic instructions
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • the present application relates to the field of artificial intelligence (AI), in particular to a task scheduling method and device.
  • the neural network model has the characteristics of intensive matrix and vector calculations, which place very high demands on computing power.
  • ordinary central processing units (CPUs) cannot meet this demand, so dedicated accelerators are needed to perform inference acceleration, such as graphics processing units (GPUs) or customized embedded neural network processing units (NPUs).
  • the application (Application, APP) side needs to split the AI model into task flows and send them to the accelerator, where each task flow includes one or more AI tasks.
  • AI tasks related to obstacle detection and lane line detection require highly real-time scheduling, while AI tasks related to functions such as occupant monitoring and in-car entertainment generally do not require highly real-time scheduling.
  • the present application provides a task scheduling method and device, which are used to meet the scheduling requirements of different AI tasks and improve the utilization of accelerator computing resources.
  • in a first aspect, a task scheduling method is provided, which can be applied to any device, chip, or integrated circuit with computing capability.
  • the method includes: the accelerator receives a plurality of first task flows, where the priority type of the plurality of first task flows is a first priority, and the first priority includes a first sub-priority and a second sub-priority; the accelerator assigns a time slice to each first task flow according to the sub-priority of each of the plurality of first task flows, where the first sub-priority is higher than the second sub-priority, and the time slice assigned to a first task flow of the first sub-priority is greater than the time slice assigned to a first task flow of the second sub-priority; and the accelerator schedules the AI tasks of all the first task flows in the plurality of first task flows based on time-slice rotation.
  • in the above method, the accelerator schedules each first task flow in the plurality of first task flows in turn based on time slices, so first task flows with different sub-priorities occupy the accelerator in rotation. Because the accelerator processes tasks very quickly, as long as each time slice is appropriately sized, the first task flows of different sub-priorities appear to be scheduled in parallel, which achieves parallel scheduling of task flows with different sub-priorities within the first priority.
  • at the same time, because a task flow with a higher sub-priority is assigned a longer time slice, that task flow occupies more of the accelerator's running time, thereby realizing priority control.
  • in addition, the scheduling mechanism is simple, has a small computational overhead, and is suitable for hardware ("hardened") implementation, for example in accelerators such as NPUs and GPUs.
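The time-slice rotation described above can be sketched as a small simulation. The slice sizes, class names, and the one-unit cost per AI task are illustrative assumptions, not taken from the patent; only the rule "higher sub-priority gets a larger slice and is visited first" comes from the text.

```python
from collections import deque

# Hypothetical time-slice budgets per sub-priority (arbitrary units);
# the patent only requires: higher sub-priority => larger slice.
SLICE_BY_SUB_PRIORITY = {1: 4, 2: 2}  # sub-priority 1 is higher than 2

class TaskFlow:
    def __init__(self, name, sub_priority, tasks):
        self.name = name
        self.sub_priority = sub_priority
        self.tasks = deque(tasks)          # AI tasks, executed serially
        self.slice_left = SLICE_BY_SUB_PRIORITY[sub_priority]

def round_robin(flows):
    """One round: each flow runs until its slice is used up or it is empty."""
    order = []
    for flow in sorted(flows, key=lambda f: f.sub_priority):
        while flow.slice_left > 0 and flow.tasks:
            order.append(flow.tasks.popleft())
            flow.slice_left -= 1           # assume each AI task costs 1 unit
    return order

flows = [
    TaskFlow("entertainment", 2, ["E1", "E2", "E3"]),
    TaskFlow("occupant_monitoring", 1, ["M1", "M2", "M3"]),
]
print(round_robin(flows))  # → ['M1', 'M2', 'M3', 'E1', 'E2']
```

In the sketch, the flow with the higher sub-priority is served first and given a larger budget, so it occupies more of each round, which is the priority-control effect described above.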
  • only the first sub-priority and the second sub-priority are described here; more sub-priorities may be included in actual applications, which is not limited in this application.
  • in a possible implementation, the AI tasks may be scheduled starting from the first task flows with the highest sub-priority. For example, taking the first sub-priority as the highest: the AI tasks of the first task flows of the first sub-priority are scheduled first; when the time slice used by a first task flow of the first sub-priority is greater than or equal to its assigned time slice, or when all the AI tasks of the first task flows of the first sub-priority have been scheduled, the AI tasks of the first task flows of the second sub-priority are scheduled; similarly, when the time slice used by a first task flow of the second sub-priority is greater than or equal to its assigned time slice, or when all the AI tasks of the first task flows of the second sub-priority have been scheduled, scheduling continues with the AI tasks of the first task flows of the third sub-priority.
  • an AI task can be dispatched to one operation logic unit of the accelerator for processing, or to multiple operation logic units, which is not limited in the present application.
  • optionally, each AI task may carry indication information, which indicates how many operation logic units the AI task needs to be dispatched to for processing.
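A minimal sketch of how such indication information might be consumed by the controller; the field and function names are hypothetical, not from the patent:

```python
from dataclasses import dataclass

@dataclass
class AITask:
    task_id: str
    num_logic_units: int = 1    # indication information: units requested

def dispatch(task, available_units):
    """Return the subset of operation logic units assigned to the task."""
    if task.num_logic_units > len(available_units):
        raise RuntimeError("not enough operation logic units free")
    return available_units[:task.num_logic_units]

print(dispatch(AITask("conv1", num_logic_units=2), ["alu0", "alu1", "alu2"]))
# → ['alu0', 'alu1']
```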
  • optionally, after the accelerator completes a round of scheduling, it can also reallocate time slices.
  • for example, when the time slice used by each first task flow in at least one first-type task flow is greater than or equal to its assigned time slice, a time slice is reallocated to each of those first task flows according to its sub-priority; if any first task flow used more time than its assigned time slice in the last round of scheduling, the excess is deducted from the time slice newly allocated to that task flow.
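The reallocation rule can be sketched as follows. The budgets and names are illustrative assumptions; the point taken from the text is that a flow which overran its slice in the last round (e.g. because a long AI task could not be split) has the excess deducted from its new allocation.

```python
SLICE_BY_SUB_PRIORITY = {1: 4, 2: 2}  # hypothetical budgets; 1 is higher

class Flow:
    def __init__(self, sub_priority, used):
        self.sub_priority = sub_priority
        self.used = used        # time actually consumed in the last round

def reallocate(flows):
    """New round: fresh slice per sub-priority, minus last round's overuse."""
    for f in flows:
        allotted = SLICE_BY_SUB_PRIORITY[f.sub_priority]
        overuse = max(0, f.used - allotted)   # time used beyond the allocation
        f.slice_left = allotted - overuse     # deduct the excess
        f.used = 0                            # reset the usage counter

f1, f2 = Flow(1, used=6), Flow(2, used=2)  # f1 overran its 4-unit slice by 2
reallocate([f1, f2])
print(f1.slice_left, f2.slice_left)  # → 2 2
```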
  • when the accelerator receives task flows of multiple priorities at the same time, it can schedule the high-priority task flows first.
  • for example, if the received task flows include multiple first task flows and at least one second task flow, where the priority type of the at least one second task flow is a second priority higher than the first priority, then before scheduling the AI tasks of all the first task flows based on time-slice rotation, the accelerator schedules the AI tasks in the at least one second task flow; after all the AI tasks in the at least one second task flow have been scheduled, the AI tasks of all the first task flows are scheduled based on time-slice rotation.
  • in this way, high-priority task flows are scheduled by the accelerator before low-priority task flows, which better ensures the real-time scheduling of high-priority task flows.
  • optionally, a high-priority task flow can preempt the time that a low-priority task flow occupies on the accelerator's operation logic units.
  • for example, in the process of scheduling the AI tasks of all the first task flows based on time-slice rotation, if at least one second task flow is received while a second AI task of the first task flows is being scheduled, then after the second AI task finishes executing, or after the data blocks of the second AI task that have already been dispatched to the operation logic units finish executing, the execution of the multiple first task flows is suspended and the scheduling of the AI tasks in the at least one second task flow begins; here, the priority type of the at least one second task flow is the second priority, which is higher than the first priority.
  • in this way, a high-priority task flow can preempt the time a low-priority task flow occupies on the accelerator's operation logic units, which better guarantees the real-time scheduling of the high-priority task flow.
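The preemption behaviour can be sketched as a simulation. The names and the arrival model are hypothetical; the rule taken from the text is that the low-priority task already running is allowed to finish (or its dispatched data block finishes) before the high-priority flow takes over, and low-priority scheduling resumes afterwards.

```python
from collections import deque

def schedule(low, high, arrival_index):
    """low/high: task lists; the high-priority flow arrives after
    `arrival_index` low-priority tasks have been dispatched."""
    low, executed = deque(low), []
    while low:
        executed.append(low.popleft())        # current task runs to completion
        if len(executed) == arrival_index:    # high-priority flow arrives now
            executed.extend(high)             # preempt: run all high tasks
            high = []
    return executed

print(schedule(["L1", "L2", "L3"], ["H1", "H2"], arrival_index=1))
# → ['L1', 'H1', 'H2', 'L2', 'L3']
```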
  • the embodiments of the present application can be applied to any scenario where AI model inference is required.
  • for example, the above-mentioned first task flow is a task flow of a first AI inference model, where the first AI inference model corresponds to the occupant monitoring function or the entertainment function in an automatic driving system; the above-mentioned second task flow is a task flow of a second AI inference model, where the second AI inference model corresponds to the obstacle detection function, the lane line detection function, or the driver detection function in the automatic driving system.
  • in a second aspect, a task scheduling device is provided, which includes modules/units for performing the method described in the first aspect or any possible implementation of the first aspect.
  • for example, the device may include: a transceiver module configured to receive a plurality of first task flows, where the priority type of the plurality of first task flows is a first priority, and the first priority includes a first sub-priority and a second sub-priority; and a processing module configured to allocate a time slice to each first task flow according to the sub-priority of each of the plurality of first task flows, where the first sub-priority is higher than the second sub-priority and the time slice allocated to a first task flow of the first sub-priority is greater than the time slice allocated to a first task flow of the second sub-priority, and to schedule the AI tasks of all the first task flows in the plurality of first task flows based on time-slice rotation.
  • in a third aspect, an accelerator is provided, which includes a programmable logic circuit and/or program instructions; when the accelerator runs, it implements the method described in the first aspect or any possible implementation of the first aspect.
  • in a fourth aspect, a computer-readable storage medium is provided, which is used to store instructions; when the instructions are executed, the method described in the first aspect or any possible implementation of the first aspect is implemented.
  • in a fifth aspect, a computer program product including instructions is provided; when the computer program product runs on a computer, the computer is caused to execute the method described in the first aspect or any possible implementation of the first aspect.
  • FIG. 1 is a schematic diagram of an application scenario applicable to an embodiment of the present application;
  • FIG. 2 is a schematic structural diagram of a computing device provided by an embodiment of the present application;
  • FIG. 3 is a flowchart of a task scheduling method provided in an embodiment of the present application;
  • FIG. 4 is a schematic diagram of the relationship between AI tasks and blocks;
  • FIG. 5 is a specific example of the task scheduling method of the present application;
  • FIG. 6 is a schematic structural diagram of a task scheduling device provided by an embodiment of the present application.
  • the task scheduling method provided in the embodiments of the present application is applicable to any scenario where AI model inference is required, including but not limited to the following: intelligent control (such as automatic driving or assisted driving), machine vision, fingerprint recognition, face recognition, retina recognition, iris recognition, palmprint recognition, automatic planning, intelligent search, theorem proving, games, automatic programming, robotics, language and image understanding, genetic programming, and so on.
  • FIG. 1 is a schematic diagram of an application scenario applicable to this embodiment of the present application.
  • AI model reasoning is required when vehicles implement related functions such as obstacle detection, lane line detection, driver monitoring, or entertainment.
  • for example, the vehicle can collect images of the lane, input the collected images into the AI model corresponding to the lane line detection function, and output the lane line detection result through that model; for another example, the vehicle can collect the driver's voice signal, input the voice signal into the AI model corresponding to the driver monitoring function, and output the voice recognition result through that model.
  • the task scheduling method provided by the embodiments of the present application can be executed by a computing device; the computing device can be set in a cloud data center or in a front-end smart device to respond to user needs, and this application does not restrict this.
  • the computing device can be deployed in the vehicle (such as an on-board computer), or it can be deployed in the cloud server of the vehicle.
  • computing device in the embodiment of the present application may be any device or chip or integrated circuit capable of computing.
  • computing devices include but are not limited to one or more of a general-purpose central processing unit (CPU), a graphics processing unit (GPU), a neural network processing unit (NPU), a field programmable gate array (FPGA), and the like.
  • FIG. 2 shows a schematic structural diagram of a possible computing device.
  • the computing device includes processors and accelerators.
  • the processor (such as a CPU) is usually a very-large-scale integrated circuit; it is the computing core and control unit of a computer, and its function is mainly to interpret computer instructions and process data in computer software.
  • the CPU is the core component in a computer responsible for reading instructions, decoding them, and executing them.
  • the central processing unit mainly includes two parts, namely the controller and the arithmetic unit; it also includes a cache and the data and control buses that connect these components.
  • the CPU is the core hardware unit that controls and allocates all the hardware resources of the computer (such as memory, input and output units) and performs general-purpose operations.
  • the CPU is the computing and control core of a computer. The operations of all software layers in the computer system will eventually be mapped to CPU operations through the instruction set.
  • the CPU can independently handle complex logical operations and different data types, but when a large amount of data of a uniform type needs to be processed (such as a neural network model with intensive matrix and vector calculations), its computing power cannot meet the demand.
  • accelerators can be used to process large amounts of data of a uniform type (such as neural network models with intensive matrix and vector calculations); an accelerator can serve as an auxiliary processor of the CPU, sharing part of the CPU's work to speed up the rate at which data processing results are obtained. Common accelerators include the NPU and the GPU.
  • the GPU, also known as the display core, visual processor, or display chip, is a microprocessor that specializes in image- and graphics-related computation on personal computers, workstations, game consoles, and some mobile devices (such as tablets and smartphones).
  • the GPU reduces the graphics card's dependence on the CPU and performs some of the original CPU work.
  • the core technologies adopted by the GPU include hardware T&L (geometric transformation and lighting processing), cubic environment texture mapping and vertex blending, texture compression and bump mapping, a dual-texture four-pixel 256-bit rendering engine, and so on; hardware T&L technology can be said to be the hallmark of the GPU.
  • the NPU adopts a "data-driven parallel computing" architecture, and is particularly good at processing massive multimedia data such as videos and images.
  • the NPU is specially designed for the artificial intelligence of the Internet of Things. It is used to accelerate the operation of the neural network and solve the problem of low efficiency of the traditional chip in the operation of the neural network.
  • the NPU can process some data by itself, and distribute the received diversified data to other units for processing.
  • FIG. 2 shows three APPs, namely APP1, APP2, and APP3, but it is not limited thereto.
  • one APP can correspond to one or more AI models, and one AI model can also correspond to one or more APPs.
  • the APP in the processor can call the driver of the accelerator (such as the NPU driver) to drive the accelerator to run, and deliver (or load) the AI model to the accelerator.
  • the AI model on the APP side generally has a computation graph structure, so the APP needs to convert the computation graph structure into an execution sequence structure executable by the accelerator, and then deliver the AI model to the accelerator in the form of one or more execution sequences.
  • An AI model can correspond to one or more execution sequences, and each execution sequence can include multiple execution units.
  • an execution unit may also be referred to as an execution task or an AI task.
  • Each execution sequence can be composed of multiple AI tasks, and the AI tasks of the same execution sequence are executed serially on the accelerator, so the execution sequence can also be called a task flow.
  • each SQE shown in FIG. 2 is an AI task (i.e., an execution unit). It should be understood that FIG. 2 only takes as an example that AI model 1 corresponding to APP1 has 1 task flow, AI model 2 corresponding to APP2 has 2 task flows, and AI model 3 corresponding to APP3 has 3 task flows; actual deployments are not limited to this.
  • when the APP loads the AI model to the accelerator, it can load all task flows (or all AI tasks) to the accelerator for execution at one time, or it can load them in multiple batches, loading only part of the AI model's task flows (or some of its AI tasks) each time. Either way, from the accelerator's perspective there are AI tasks in task flows waiting to be executed (or execution units in execution sequences waiting to be executed).
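The execution-sequence structure described above might be modelled as follows. Field names are illustrative, not from the patent; the example reproduces the three task flows of model 3 from FIG. 2.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SQE:                      # one execution unit / AI task
    task_id: str
    payload: bytes = b""        # e.g. arguments for the operation logic units

@dataclass
class ExecutionSequence:        # a.k.a. task flow; its SQEs execute serially
    sqes: List[SQE] = field(default_factory=list)

@dataclass
class AIModel:                  # delivered as one or more execution sequences
    name: str
    sequences: List[ExecutionSequence] = field(default_factory=list)

# Model 3 from FIG. 2: three task flows
model3 = AIModel("model3", [
    ExecutionSequence([SQE("SQE41"), SQE("SQE42")]),
    ExecutionSequence([SQE("SQE51"), SQE("SQE52")]),
    ExecutionSequence([SQE("SQE61"), SQE("SQE62")]),
])
print(len(model3.sequences))  # → 3
```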
  • the accelerator may include a controller and operation logic units.
  • the controller is used for receiving the AI task issued by the processor (or receiving the task flow issued by the processor), and assigning the AI task issued by the processor to the operation logic unit for execution.
  • the process that the controller assigns the AI task to the operation logic unit may also be referred to as the process that the controller dispatches the AI task to the operation logic unit.
  • the operation logic unit is used to execute (or run) the AI task scheduled by the controller and report the execution result (or operation result or calculation result) to the controller.
  • the controller is also used to report the calculation results of the AI model to the processor. It should be understood that when the AI model is executed, the APP of the processor can obtain a complete calculation result only after all the AI tasks of the AI model are executed.
  • one processor may correspond to one accelerator, or may correspond to multiple accelerators, which is not limited in this application.
  • the processor and the accelerator can be implemented as separate chips, or can be integrated into one chip, which is not limited in this application.
  • FIG. 3 is a flow chart of a task scheduling method provided in the embodiment of the present application. Taking the method applied to the computing device shown in FIG. 2 as an example, the method includes the following steps.
  • the accelerator receives multiple first task flows, where the priority type of the multiple first task flows is a first priority, and the first priority includes at least two sub-priorities.
  • the controller of the accelerator may receive multiple first task flows from the processor, and each first task flow includes multiple AI tasks.
  • the plurality of first task flows may correspond to the same AI model. Taking FIG. 2 as an example, APP3 in the processor may send the task flows of model 3 corresponding to APP3 to the controller of the accelerator, where model 3 includes three task flows: {SQE41, SQE42, ...}, {SQE51, SQE52, ...}, {SQE61, SQE62, ...}.
  • the plurality of first task flows may also correspond to multiple different AI models. Taking FIG. 2 as an example, APP1 and APP2 in the processor may send the task flow of model 1 corresponding to APP1 and the task flows of model 2 corresponding to APP2 to the controller of the accelerator, where model 1 includes one task flow: {SQE11, SQE12, ...}, and model 2 includes two task flows: {SQE21, SQE22, ...}, {SQE31, SQE32, ...}.
  • the AI tasks in each first task flow are AI tasks to be scheduled. During scheduling, the AI tasks in each first task flow are scheduled in a serial manner.
  • the priority of the task flow may be used to represent the real-time requirement of the task flow for scheduling, or to represent the real-time requirement of the task flow for the output of the calculation result (referring to the calculation result of the entire task flow).
  • the number of priority types can be set according to actual needs, and this application does not make a limitation.
  • the following mainly takes the example that the higher the real-time requirement of the task flow on scheduling, the higher the corresponding priority.
  • for example, task flows related to functions such as obstacle detection and lane line detection must complete their detection tasks within an expected time to obtain detection results, so their real-time scheduling requirements are high (for example: the time from the accelerator receiving the task flow to the accelerator outputting the task flow's calculation result must not exceed a first time threshold, or the time for the accelerator to execute the task flow must not exceed the first time threshold); the AI tasks in such task flows need to be scheduled with the highest priority.
  • task flows related to functions such as driver monitoring (for example, driver fatigue detection and distraction detection) do not have the highest real-time scheduling requirements (for example: the time from the accelerator receiving the task flow to the accelerator outputting the task flow's calculation result may exceed the first time threshold but must not exceed a second time threshold, or the time for the accelerator to execute the task flow may exceed the first time threshold but must not exceed the second time threshold, where the second time threshold is greater than the first time threshold); the AI tasks in such task flows can be scheduled with a higher priority.
  • task flows related to functions such as occupant monitoring and in-car entertainment have no real-time requirement (for example: the time from the accelerator receiving the task flow to the accelerator outputting the task flow's calculation result may exceed the second time threshold, or the time for the accelerator to execute the task flow may exceed the second time threshold); the AI tasks in such task flows can be scheduled with a lower priority.
  • the priorities corresponding to the task flows related to each function can be set according to the needs of users.
  • the first priority is the priority corresponding to first task flows whose real-time scheduling requirements are not high (or not demanding, or average, or relatively low), or whose requirements on the real-time output of calculation results (referring to the calculation result of the entire first task flow) are not high (or average, or low).
  • for example, a first task flow corresponding to the first priority does not require the accelerator to meet a first preset time range from receiving the task flow to outputting its calculation result, or does not require the accelerator's execution of the task flow to meet the first preset time range; alternatively, it requires the accelerator to meet a second preset time range from receiving the task flow to outputting its calculation result, or requires the accelerator's execution of the task flow to meet the second preset time range.
  • for example, the first task flow corresponding to the first priority can be a task flow related to functions such as occupant monitoring and in-car entertainment. This type of task flow has no real-time scheduling requirement; specifically, for example, the time from the accelerator receiving such a task flow to the accelerator outputting its calculation result is not required to be less than the second time threshold (i.e., it may exceed the second time threshold), or the time for the accelerator to execute such a task flow is not required to be less than the second time threshold (i.e., it may exceed the second time threshold).
  • the task flows delivered by the processor to the accelerator include at least the task flows of the first priority (that is, the multiple first task flows), but may also include task flows of other priorities.
  • for example, the task flows delivered by the processor to the accelerator may also include a second task flow of a second priority, where the second priority is higher than the first priority.
  • the second priority can be the priority corresponding to second task flows that have certain (or higher, or not low) requirements on real-time scheduling, or on the real-time output of calculation results (referring to the calculation result of the entire second task flow). The second task flow is, for example, a task flow related to the driver monitoring function. In short, the second priority imposes higher real-time scheduling requirements than the first priority does.
  • for example, the first task flow corresponding to the first priority can be a task flow related to occupant monitoring and entertainment functions in the vehicle, while functions with real-time requirements beyond those of occupant monitoring and entertainment, such as the driver monitoring function, obstacle detection, or lane line detection, correspond to the second priority.
  • the above description takes priorities including the first priority and the second priority as an example; in practice, the number of priority levels can be flexibly set according to requirements, for example, to more than two.
  • for example, the priorities may also include a third priority, where the third priority is higher than the second priority and the second priority is higher than the first priority (first priority < second priority < third priority).
  • in this case, the second priority is the priority corresponding to second task flows that have certain (or high, or not low) requirements on real-time scheduling, or on the real-time output of calculation results (referring to the calculation result of the entire second task flow); the second task flow is, for example, a task flow related to the driver monitoring function.
  • the third priority is the priority corresponding to third task flows that have high (or the highest) requirements on real-time scheduling, or on the real-time output of calculation results (referring to the calculation result of the entire third task flow); the third task flow is, for example, a task flow related to functions such as obstacle detection or lane line detection.
  • this document mainly takes the priority including the first priority and the second priority as an example.
  • within each priority, multiple sub-priorities can be further divided; the sub-priorities within a priority are used to more finely characterize the different real-time scheduling requirements of different task flows under that priority.
  • for example, the first priority may include at least two sub-priorities, and the sub-priorities of the first task flows more finely characterize the real-time scheduling requirements of different first task flows. A first task flow with a higher real-time scheduling requirement may correspond to a higher sub-priority, or to a lower sub-priority, which is not limited in this application.
  • the following mainly takes the example that the higher the real-time requirement of the task flow for scheduling, the higher the corresponding sub-priority.
  • for example, the task flows corresponding to the first priority are task flows related to functions such as occupant monitoring and in-vehicle entertainment; the task flow related to the occupant monitoring function has the first sub-priority, the task flow related to the entertainment function has the second sub-priority, and the second sub-priority is higher than the first sub-priority. It should be understood that this is only an example and not a limitation.
  • the AI model may also have a priority.
  • the priority of the AI model is used to represent the real-time requirement of the AI model for scheduling, or to represent the real-time requirement of the AI model for the output of the calculation result (referring to the calculation result of the entire AI model).
  • the priority of the task flow corresponding to the AI model should match the priority of the AI model, where the matching situations include but are not limited to the following two:
  • the priority of the task flow corresponding to the AI model is consistent with the priority of the AI model.
  • for example, the priority of model I is the first sub-priority in the first priority, and the priority of the task flow corresponding to model I is also the first sub-priority in the first priority.
  • the priority of the AI model includes the priority of the task flow corresponding to the AI model. For example, the priority of Model II is the first priority; of the two task flows corresponding to Model II, one has the first sub-priority of the first priority, and the other has the second sub-priority of the first priority.
  • the accelerator can use the priority of the AI model to override the priority of the task flow, thereby making the priority of the task flow match the priority of the AI model.
  • for example, the priority of Model III is the first priority (which includes the first sub-priority and the second sub-priority), and Model III corresponds to three task flows: the first task flow has the first sub-priority of the first priority, the second task flow has the second sub-priority of the first priority, and the third task flow has the first sub-priority of the second priority. The priority of the third task flow (the first sub-priority of the second priority) does not match the priority of Model III (the first priority), so the priority of Model III is used to override the priority of the third task flow; the overridden priority of the third task flow is any sub-priority of the first priority (such as the first sub-priority or the second sub-priority of the first priority).
  • the accelerator allocates a time slice to each first task flow in multiple first task flows according to the sub-priority of each first task flow.
  • the time slice allocated to the first task flow may specifically be the time allocated by the controller of the accelerator to the first task flow to occupy the operation logic unit, or the maximum time that the first task flow is allowed to be scheduled.
  • the controller of the accelerator schedules the AI task of the first task flow to the operation logic unit within the time slice corresponding to the first task flow, and the operation logic unit runs the scheduled AI task and outputs the calculation corresponding to the scheduled AI task result.
  • the first task flow with the higher sub-priority is allocated a longer time slice (or, alternatively, the first task flow with the higher sub-priority is allocated a shorter time slice, which is not limited in this application).
  • the accelerator schedules the AI tasks of all the first task flows in the multiple first task flows based on time slice round-robin.
  • scheduling refers to a process in which the controller of the accelerator allocates AI tasks to the operation logic unit of the accelerator.
  • the operation logic unit runs (or calculates, or processes) the AI task assigned to itself, and can output the operation result (or calculation result, or processing result) corresponding to the AI task.
  • scheduling the AI tasks in a certain task flow can also be described as scheduling the task flow; for example, "scheduling the AI tasks of all the first task flows in multiple first task flows" can also be described as "scheduling all first task flows in multiple first task flows".
  • time slice round-robin means that multiple task flows take turns occupying the operation logic units, and the time each task flow occupies the operation logic units is determined by that task flow's allocated time slice.
  • the accelerator schedules the AI tasks of all the first task flows in the multiple first task flows based on time slice round-robin, including: the controller of the accelerator dispatches the AI tasks in each first task flow to the operation logic units to run in turn according to the time slice corresponding to each first task flow (that is, the allocated time slice), where the time each first task flow occupies the operation logic units is determined by that first task flow's allocated time slice.
  • Example 1, taking the first priority including two sub-priorities (the first sub-priority and the second sub-priority) as an example: the accelerator first schedules the AI tasks of the first task flow of the first sub-priority among the multiple first task flows; when the allocated time slice of the first task flow of the first sub-priority is exhausted, or all the AI tasks of the first task flow of the first sub-priority have been scheduled, the AI tasks of the first task flow of the second sub-priority are scheduled; when the allocated time slice of the first task flow of the second sub-priority is exhausted, or all the AI tasks of the first task flow of the second sub-priority have been scheduled, one round of scheduling is completed.
  • alternatively, the accelerator first schedules the AI tasks of the first task flow of the second sub-priority among the multiple first task flows; when the allocated time slice of the first task flow of the second sub-priority is exhausted, or all its AI tasks have been scheduled, the AI tasks of the first task flow of the first sub-priority are scheduled; when the allocated time slice of the first task flow of the first sub-priority is exhausted, or all its AI tasks have been scheduled, one round of scheduling is completed.
  • the “exhaustion” of the allocated time slice for a certain task flow described in the embodiments of the present application means that the time slice actually used by the task flow is greater than or equal to the allocated time slice for the task flow.
  • for example, that the allocated time slice of the first task flow of the first sub-priority is exhausted means that the time slice actually used by the first task flow of the first sub-priority is greater than or equal to the time slice allocated to it. (the allocated time slice of a task flow - the time slice actually used by the task flow) can also be used to represent the remaining time slice of the task flow. If the allocated time slice of a certain task flow is exhausted, the remaining time slice of the task flow can be 0 or negative.
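  • the exhaustion rule and the remaining-time-slice formula above can be sketched as two small helpers (an illustrative sketch only; the function names and microsecond units are assumptions, not part of the embodiment):

```python
def remaining_slice(allocated_us: int, used_us: int) -> int:
    """Remaining time slice = allocated slice - time actually used.

    The result may be 0 or negative once the slice is exhausted."""
    return allocated_us - used_us


def is_exhausted(allocated_us: int, used_us: int) -> bool:
    """A task flow's slice is exhausted when the time actually used is
    greater than or equal to the allocated time slice."""
    return used_us >= allocated_us
```

for example, a flow allocated 1000us that actually ran for 1200us has a remaining slice of -200us and is exhausted.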
  • Example 2, taking the first priority including three sub-priorities (the first sub-priority, the second sub-priority, and the third sub-priority) as an example: the accelerator first schedules the AI tasks of the first task flow of the first sub-priority; when the allocated time slice of the first task flow of the first sub-priority is exhausted, or all its AI tasks have been scheduled, the AI tasks of the first task flow of the second sub-priority are scheduled; when the allocated time slice of the first task flow of the second sub-priority is exhausted, or all its AI tasks have been scheduled, the AI tasks of the first task flow of the third sub-priority are scheduled; when the allocated time slice of the first task flow of the third sub-priority is exhausted, or all its AI tasks have been scheduled, one round of scheduling is completed.
  • the above scheduling order can also be exchanged, which is not limited in this application.
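  • one round of the scheduling described in Examples 1 and 2 can be sketched as a loop over sub-priorities (an illustrative sketch only; the flow representation, names, and the high-to-low visiting order are assumptions):

```python
def one_round(flows_by_subprio, slice_us):
    """One round of time-slice round-robin.

    flows_by_subprio: sub-priority -> list of (task_name, cost_us) tasks.
    slice_us: sub-priority -> allocated time slice in microseconds.
    Visits sub-priorities from high to low; within each, schedules tasks
    until the slice is exhausted or the flow runs out of tasks.
    Returns the order in which tasks were scheduled."""
    order = []
    for prio in sorted(flows_by_subprio, reverse=True):  # higher sub-priority first
        budget = slice_us[prio]
        tasks = flows_by_subprio[prio]
        while tasks and budget > 0:
            name, cost_us = tasks.pop(0)
            order.append(name)
            budget -= cost_us  # deduct the task's run time from the slice
    return order
```

for example, with a high-sub-priority flow holding tasks a1 and a2 (300us each, 500us slice) and a low one holding b1 (200us, 400us slice), the round schedules a1, a2 (the slice goes negative only after a2 starts), then b1.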
  • if there are at least two first task flows with the same sub-priority, for example at least two first task flows whose sub-priority is the first sub-priority, then the time slice allocated to each of the at least two first task flows is determined according to the first sub-priority; that is, the lengths of the time slices allocated to the at least two first task flows are the same.
  • each first task flow in the at least two first task flows can be scheduled simultaneously (for example, if the accelerator has multiple operation logic units, the AI tasks of different first task flows among the at least two first task flows can be scheduled to different operation logic units), or the different first task flows among the at least two first task flows can be scheduled one after another, which is not limited in this application. But no matter which method is used, only after the time slice of each of the at least two first task flows corresponding to the first sub-priority is exhausted, or all its AI tasks have been scheduled, can the first task flow of another sub-priority be scheduled.
  • different first task flows among the plurality of first task flows corresponding to the first priority may have different sub-priorities, and the accelerator schedules each of the plurality of first task flows based on time slice round-robin so that the first task flows of different sub-priorities occupy the accelerator in turn. Since the processing speed of the accelerator is very fast, as long as the interval between time slices is appropriate, the first task flows of different sub-priorities appear to be scheduled in parallel, achieving the technical effect of parallel scheduling of task flows of different sub-priorities within the first priority (also called the ordinary priority).
  • the above scheduling mechanism is simple, requires little computation, and is suitable for implementation in hardware, for example in accelerators such as NPUs.
  • the accelerator may start from the first task flow with the highest sub-priority, and sequentially schedule the AI tasks of each of the multiple first task flows based on time slice rotation.
  • for example, the accelerator first schedules the AI tasks of the first task flow of the first sub-priority among the multiple first task flows; when the allocated time slice of the first task flow of the first sub-priority is exhausted, or all its AI tasks have been scheduled, the AI tasks of the first task flow of the second sub-priority are scheduled; when the allocated time slice of the first task flow of the second sub-priority is exhausted, or all its AI tasks have been scheduled, one round of scheduling is completed.
  • for another example, the first priority includes the first sub-priority, the second sub-priority, and the third sub-priority, where the third sub-priority is higher than the second sub-priority, and the second sub-priority is higher than the first sub-priority.
  • the accelerator first schedules the AI tasks of the first task flow of the first sub-priority among the multiple first task flows; when the allocated time slice of the first task flow of the first sub-priority is exhausted, or all its AI tasks have been scheduled, the AI tasks of the first task flow of the second sub-priority are scheduled; when the allocated time slice of the first task flow of the second sub-priority is exhausted, or all its AI tasks have been scheduled, the AI tasks of the first task flow of the third sub-priority are scheduled; when the allocated time slice of the first task flow of the third sub-priority is exhausted, or all its AI tasks have been scheduled, one round of scheduling is completed.
  • the above scheduling order can also be exchanged, which is not limited in this application.
  • where the third sub-priority is higher than the second sub-priority, and the second sub-priority is higher than the first sub-priority.
  • when the accelerator schedules each task flow, it can schedule a single AI task to run on one operation logic unit, or schedule a single AI task to run on multiple operation logic units, which is not limited in this application.
  • the accelerator schedules a single AI task to run on multiple operation logic units, specifically including: the accelerator divides a single AI task into multiple blocks (a block can also be called a unit or a data block), and dispatches the multiple blocks to multiple operation logic units to run, with one operation logic unit executing one block.
  • each AI task can carry indication information used to describe the number of blocks that the AI task can be divided into, that is, used to indicate how many blocks the accelerator needs to divide the AI task into when scheduling it (or in other words, that the AI task needs to be dispatched to multiple operation logic units to run).
  • the first AI task may be the first AI task in the first task flow.
  • FIG. 4 is a schematic diagram of the relationship between AI tasks and blocks.
  • after each operation logic unit receives a Block ID, it finds the corresponding block to execute; that is, operation logic unit A executes block0, and operation logic unit B executes block1.
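  • the split of one AI task into blocks dispatched to several operation logic units can be sketched as follows (an illustrative sketch only; Block ID format and the round-robin unit assignment are assumptions, since the real mapping is hardware-specific):

```python
def dispatch_blocks(task_name: str, block_count: int, unit_count: int):
    """Give each block of a task a Block ID and map it to an operation
    logic unit round-robin; one unit executes one block at a time.

    Returns a list of (block_id, unit_index) pairs."""
    return [(f"{task_name}-block{i}", i % unit_count) for i in range(block_count)]
```

for the two-block case of FIG. 4, `dispatch_blocks("task1", 2, 2)` yields `[("task1-block0", 0), ("task1-block1", 1)]`, i.e. unit 0 (A) executes block0 and unit 1 (B) executes block1.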
  • after one round of scheduling is completed, the controller of the accelerator can reallocate time slices and start a new round of scheduling based on the reallocated time slices. For example, according to the sub-priority of each first task flow in the at least one first-type task flow, it reallocates a time slice for each first task flow in the at least one first-type task flow, and starts a new round of scheduling based on the new time slices.
  • the controller needs to reallocate a time slice for each of the multiple task flows. Specifically, the controller of the accelerator reallocates a time slice for each first task flow in the plurality of first-type task flows according to the sub-priority of each first task flow in the plurality of first-type task flows.
  • if any first task flow used more time slice in the last round of scheduling than its allocated time slice, the excess used in the last round is deducted from the time slice that should be reallocated to that first task flow (that is, the time slice reallocated according to the sub-priority), and the time slice after deduction is used as the time slice actually allocated to that first task flow.
  • the controller of the accelerator can reallocate time slices only for part of the task flows, or it can reallocate time slices for all of the multiple first task flows, which is not limited in this application.
  • that is, the time slice that any first task flow overused in the previous round of scheduling is deducted from the time slice that should be reallocated to it (that is, the time slice reallocated according to the sub-priority), and the time slice after deduction is used as the time slice actually allocated to that first task flow.
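  • the deduction rule above can be sketched as follows (an illustrative sketch only; `base_us`, meaning the slice the sub-priority would normally grant, is an assumed name):

```python
def realloc_slice(base_us: int, allocated_last_us: int, used_last_us: int) -> int:
    """New slice = slice due by sub-priority, minus any overuse from the
    last round; flows that stayed within budget lose nothing."""
    overuse = max(0, used_last_us - allocated_last_us)
    return base_us - overuse
```

for example, a flow allocated 1000us that actually used 1200us is allocated 1000 - 200 = 800us in the next round, while a flow that used only 900us gets the full 1000us again.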
  • time slice allocation in the first round of scheduling: the controller allocates a time slice to each first task flow according to the time slice ratio of each first task flow corresponding to the first priority, where the first task flow with a high sub-priority gets a long time slice and the first task flow with a low sub-priority gets a short time slice;
  • the controller schedules the first task flows sequentially in order of sub-priority from high to low; that is, the first task flow with a high sub-priority is scheduled before the first task flow with a low sub-priority.
  • when the controller schedules a certain first task flow and that first task flow still has time slice remaining, it dispatches the AI tasks in the first task flow to one or more operation logic units for execution, and one AI task can be divided into one or more blocks to run in parallel; each operation logic unit responds to the controller after executing a block (that is, sends the calculation result to the controller), and the controller calculates the running time of the block on the operation logic unit from the time the block was issued and the time the corresponding response was received, and deducts this running time from the remaining time slice of the first task flow; if the first task flow still has time slice remaining, the controller continues to dispatch the AI tasks of the first task flow to the operation logic units; if all the AI tasks of the first task flow have been scheduled, or the remaining time slice of the first task flow is 0 or negative, the controller schedules other first task flows that still have time slices;
  • for example, the accelerator receives four first task streams from the processor: stream1 (sub-priority WRR0), stream2 (sub-priority WRR1), stream3 (sub-priority WRR1), and stream4 (sub-priority WRR3), and the controller allocates time slices of 1000us, 800us, 800us, and 400us to stream1, stream2, stream3, and stream4 according to the above time slice ratio.
  • stream1 {task11, task12, task13, ...}
  • stream2 {task21, task22, task23, ...}
  • stream3 {task31, task32, task33, ...}
  • stream4 {task41, task42, task43, ...}.
  • the controller dispatches task11 of stream1 to the operation logic unit for execution, and the time for the operation logic unit to execute task11 is 400us;
  • after the controller deducts 400us from stream1's time slice, 600us remains, and it continues to schedule task12;
  • the time for the operation logic unit to execute task12 is 500us; after the controller deducts 500us from stream1's time slice, 100us remains, and it continues to schedule task13;
  • the time for the operation logic unit to execute task13 is 300us; after the controller deducts 300us from stream1's time slice, the time slice is -200us, so stream1 cannot continue to be scheduled, and stream2 is scheduled instead;
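  • the stream1 walkthrough above is plain budget arithmetic and can be sketched as follows (an illustrative sketch reproducing the example's numbers; the function name is assumed):

```python
def run_until_exhausted(slice_us, task_costs_us):
    """Deduct each task's run time in order; a task may still start while
    the remaining slice is positive, so the slice can end up negative.

    Returns (number of tasks run, remaining slice in us)."""
    remaining = slice_us
    ran = 0
    for cost in task_costs_us:
        if remaining <= 0:  # slice exhausted: stop scheduling this flow
            break
        remaining -= cost
        ran += 1
    return ran, remaining
```

with stream1's numbers, `run_until_exhausted(1000, [400, 500, 300])` runs all three tasks and ends at -200us, matching the example: after task13 the slice is negative and the controller switches to stream2.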
  • the above mainly introduces the scheduling method of task flows corresponding to different sub-priorities in the first priority, but in practical applications, there may be multiple task flows with different priorities in the accelerator at the same time.
  • the scheduling method of task flows corresponding to different priorities will be introduced in detail below.
  • the accelerator can schedule the high-priority task flows first, and then schedule the low-priority task flows after all the high-priority task flows have been scheduled.
  • the controller in the accelerator receives several task flows, and the several task flows include the multiple first task flows described in S301 and also include at least one second task flow, where the priority type of the at least one second task flow is the second priority, the second priority includes at least one sub-priority, and the second priority is higher than the first priority.
  • before scheduling the AI tasks of all the first task flows in the plurality of first task flows based on time slice round-robin, the controller in the accelerator needs to schedule the AI tasks in the at least one second task flow; after all the AI tasks in the at least one second task flow have been scheduled, the AI tasks of all the first task flows in the multiple first task flows are scheduled based on time slice round-robin.
  • the scheduling process includes: the controller first schedules the third task flow; after all the AI tasks in the third task flow have been scheduled, it then schedules the second task flow; after all the AI tasks in the second task flow have been scheduled, it then schedules the first task flow.
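  • this "drain the higher priority completely before the lower one" order can be sketched as follows (an illustrative sketch only; the data layout, with larger keys meaning higher priority, is an assumption):

```python
def drain_by_priority(flows_by_priority):
    """Strict-priority draining: schedule every AI task of the highest
    priority first, then the next, and so on; lower priorities wait
    until all higher-priority flows are fully scheduled.

    flows_by_priority: priority level -> list of flows (each a task list).
    Returns the overall scheduling order."""
    order = []
    for prio in sorted(flows_by_priority, reverse=True):  # highest first
        for flow in flows_by_priority[prio]:
            order.extend(flow)  # all AI tasks of this flow
    return order
```

for example, with a third-priority flow ["t1"], a second-priority flow ["s1"], and a first-priority flow ["f1"], the resulting order is t1, then s1, then f1, matching the text.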
  • any priority other than the first priority may not be further divided into sub-priorities, or any priority other than the first priority may contain only one sub-priority; in that case, the lengths of the time slices corresponding to the task flows in any such other priority are the same, and the task flows in any such other priority use the AI accelerator evenly.
  • the method for scheduling task flows corresponding to any other priority based on time slices can be regarded as a special case of the method for scheduling task flows corresponding to the first priority based on time slices.
  • the task flow related to the driver monitoring function corresponds to the second priority.
  • the time slice ratio of the task flow related to the driver fatigue detection function to the task flow related to the driver distraction detection function is 1:1; for example, the time slice allocated to each task flow related to the driver fatigue detection function and to each task flow related to the driver distraction detection function is 1100us.
  • task flows related to functions such as obstacle detection and lane line detection correspond to the third priority.
  • the time slice ratio of each task flow related to the obstacle detection function and each task flow related to the lane line detection function is 1:1, for example, the allocated time slice is 1200us.
  • sub-priorities may be further divided.
  • for the scheduling method of the task flows corresponding to any other priority, reference may be made to the scheduling method of the task flows corresponding to the first priority.
  • take the second priority as an example: different second task flows among the multiple second task flows corresponding to the second priority may have different sub-priorities, and the accelerator schedules each second task flow among the multiple second task flows based on time slice round-robin, so that second task flows with different sub-priorities can occupy the accelerator in turn.
  • the allocation method of the time slices of the second task flows corresponding to each sub-priority in the second priority can refer to the allocation method of the time slices of the first task flows corresponding to each sub-priority in the first priority above, and is not repeated here.
  • a high-priority task flow can preempt the time a low-priority task flow occupies the operation logic units of the accelerator (referred to as "a high-priority task flow preempts a low-priority task flow", or "high priority preempts low priority").
  • the specific implementation includes: if the controller of the accelerator receives a high-priority task flow while it is scheduling a low-priority task flow, it can suspend the scheduling of the low-priority task flow and schedule the high-priority task flow instead.
  • for example, while the controller of the accelerator is scheduling the AI tasks of all the first task flows among the multiple first task flows based on time slice round-robin, if the controller of the accelerator receives at least one second task flow, it suspends scheduling the AI tasks in the multiple first task flows and starts scheduling the AI tasks in the at least one second task flow; where the priority type of the at least one second task flow is the second priority, the second priority includes at least one sub-priority, and the second priority is higher than the first priority.
  • similarly, if the controller of the accelerator receives a third task flow, it suspends scheduling the first task flow or the second task flow and starts scheduling the third task flow; where the priority type of the third task flow is the third priority, and the third priority is higher than the second priority and the first priority.
  • because the high-priority task flow can preempt the time the low-priority task flow occupies the operation logic units of the accelerator, the real-time scheduling of the high-priority task flow can be better guaranteed.
  • when a high-priority task flow preempts a low-priority task flow, the preemption occurs at the boundary of an AI task (that is, after one AI task finishes executing and before another AI task starts executing) or at the boundary of a block (that is, after one block finishes executing and before another block starts executing). If the controller of the accelerator receives a higher-priority task flow while an AI task, or any block of an AI task, is executing, it must wait for that AI task or block to finish executing before allowing the preemption process.
  • for example, while the controller of the accelerator is scheduling the AI tasks of all the first task flows in the multiple first task flows based on time slice round-robin, if at least one second task flow is received while a second AI task in the multiple first task flows is being scheduled, then after the second AI task finishes executing, or after the blocks of the second AI task that have already been dispatched to the operation logic units finish executing, the scheduling of the AI tasks in the multiple first task flows is suspended and the scheduling of the AI tasks in the at least one second task flow begins.
  • the second AI task may be any AI task in any task flow.
  • for example, the number of blocks of an AI task in the first task flow is 4, and the controller has sent 2 of the blocks of the AI task to the operation logic units for execution. If at this point the controller receives a higher-priority task flow, the controller only needs to wait for the 2 delivered blocks to finish executing before switching to schedule the higher-priority task flow, without delivering the remaining 2 blocks.
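  • the block-boundary preemption rule in the 4-block example can be sketched as follows (an illustrative sketch only; the function name is an assumption, and the real decision is made by the hardware controller):

```python
def blocks_to_finish_before_preempt(total_blocks: int, issued_blocks: int):
    """On arrival of a higher-priority flow: blocks already issued to the
    operation logic units must run to completion, while the remaining
    blocks are withheld until after the preemption.

    Returns (blocks to wait for, blocks withheld)."""
    finish_first = issued_blocks              # wait for these to complete
    withheld = total_blocks - issued_blocks   # not delivered now
    return finish_first, withheld
```

with the example's numbers, `blocks_to_finish_before_preempt(4, 2)` gives `(2, 2)`: wait for the 2 delivered blocks, withhold the other 2, then switch to the higher-priority flow.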
  • although AI tasks are mainly matrix operations and vector operations, which are computationally intensive, they are also input/output (I/O) intensive and require large amounts of data to be read from and written to memory.
  • for example, in one clock cycle an accelerator can multiply two 16*16 FP32 matrices, and it needs to read 2*16*16*4 bytes of data and write 16*16*4 bytes of data, which is far more than the amount of data read and written when a common CPU performs an FP32 multiplication. Therefore, there are a large number of registers, cache memories (Cache) and buffer registers (Buffer) in the AI operation logic unit, which are used to cache the calculation data that needs to be read from or written to memory.
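  • the data volumes quoted above can be checked with a small calculation (an illustrative sketch; FP32 occupies 4 bytes, and the function name is assumed):

```python
def matmul_io_bytes(n: int, dtype_bytes: int = 4):
    """Bytes moved for one n*n by n*n matrix multiply:
    read both operand matrices, write one result matrix."""
    read = 2 * n * n * dtype_bytes   # two input matrices
    write = n * n * dtype_bytes      # one output matrix
    return read, write
```

for the 16*16 FP32 case, `matmul_io_bytes(16)` gives 2048 bytes read (2*16*16*4) and 1024 bytes written (16*16*4), matching the example.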
  • when the AI accelerator executes an AI task, it reads data from memory into the Cache or Buffer in parallel and flushes the data in the Cache or Buffer back to memory. All Cache and Buffer are exclusive to that AI task, and both loading and flushing are controlled by the AI task itself. When an operation logic unit executes the next AI task, that AI task likewise controls and manages the use of the Cache or Buffer space itself.
  • if the controller receives a new task flow during a round of scheduling, the controller can allocate a time slice to the new task flow, where the time slice that the new task flow should be allocated is equal to the time slice that a task flow of the same sub-priority as the new task flow should be allocated in the current round (that is, without considering any time slice that needs to be deducted for overuse in the previous round).
  • for example, the time slices allocated by the controller to stream1, stream2, stream3, and stream4 in the first round of the scheduling cycle are 1000us, 800us, 800us, and 400us respectively. If the controller receives a new task stream stream5 (sub-priority WRR1) during the first round of the scheduling cycle, then the time slice allocated to the new task stream stream5 should be the same as that of stream2 (sub-priority WRR1); that is, the time slice allocated to task stream stream5 is 800us.
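  • the stream5 rule reduces to a lookup of the undeducted base slice for the new flow's sub-priority (an illustrative sketch only; names and the dict layout are assumptions):

```python
def slice_for_new_flow(sub_priority: str, base_slices_us: dict) -> int:
    """A flow arriving mid-round gets the current round's base slice for
    its sub-priority, ignoring any previous-round overuse deductions
    applied to existing flows."""
    return base_slices_us[sub_priority]
```

with base slices {"WRR0": 1000, "WRR1": 800, "WRR3": 400}, a new stream5 at WRR1 is allocated 800us, the same as stream2.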
  • if a scheduled task flow encounters a blocking scenario (for example, the scheduled AI model has multiple task flows, and one task flow must wait for another task flow to be scheduled before it can itself be scheduled), the controller can wait until the blocked task flow satisfies the execution condition (for example, after the other task flow has been scheduled) before executing that task flow again. If the task flow does not satisfy the execution condition within a scheduling period, then after the time slices of the other task flows are exhausted or their AI tasks have all been scheduled, the next round of time slice allocation starts, and the remaining time slice of the blocked task flow is not carried into the next round of time slice allocation. If the remaining time slices of the blocked task flow in each scheduling cycle were carried into the next round of time slice allocation, the blocked task flow would accumulate a large number of time slices, and when it is finally scheduled, it would seriously delay the execution of the other task flows.
  • if all the currently delivered AI tasks of a scheduled task flow have been scheduled but its time slice is not yet exhausted, the controller can schedule other task flows first and wait for subsequent AI tasks of that task flow to arrive before continuing it. If no subsequent AI task of this task flow arrives within a scheduling period, then after the time slices of the other task flows are exhausted or their AI tasks have all been scheduled, the next round of time slice allocation starts, and the remaining time slice is not carried into the next round of time slice allocation. If the remaining time slices of the task flow in each scheduling cycle were carried into the next round of time slice allocation, the task flow would accumulate a large number of time slices, and when its AI tasks arrive and are scheduled, it would seriously delay the execution of the other task flows.
  • if the running time of a third AI task exceeds a preset duration, the controller may suspend the operation logic unit that is executing the third AI task from executing it, and re-execute the scheduling process, that is, reschedule the third AI task to an operation logic unit for execution.
  • the third AI task may be any AI task in the accelerator.
  • generally, the running time of an AI model is at the millisecond level, and an AI model generally consists of dozens or hundreds of AI tasks; therefore, under normal circumstances, the execution time of most AI tasks should be much less than 1ms. The preset duration can therefore be set to 1ms, 2ms, 3ms, or the like (this is only an example and not a limitation).
  • in this way, when an operation logic unit runs an AI task abnormally, the controller of the accelerator can reschedule the AI task to an operation logic unit for execution, thereby resolving the abnormality, so that the operation logic unit can output the correct execution result of the AI task sooner.
  • alternatively, the controller can immediately switch to dispatching newly arrived AI tasks to the operation logic units for execution, and re-execute the third AI task after the other newly arrived AI tasks have been executed, thereby ensuring the real-time scheduling of the other AI tasks.
  • the accelerator can resolve exceptions in a timely manner while ensuring the real-time scheduling of other AI tasks.
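  • the timeout-based rescheduling decision above can be sketched as follows (an illustrative sketch only; the preset duration of 2ms and the function name are assumptions within the range the text suggests):

```python
def should_reschedule(elapsed_ms: float, preset_ms: float = 2.0) -> bool:
    """Flag an AI task for rescheduling if it has run longer than the
    preset duration; under normal circumstances most AI tasks are
    expected to finish in well under 1 ms."""
    return elapsed_ms > preset_ms
```

for example, a task that has run 3.5ms is flagged for rescheduling, while one at 0.4ms keeps running.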
  • FIG. 5 shows a specific example of the task scheduling method of the present application.
  • there are four priorities: WRR, SP0, SP1, and SP2, where the priority relationship is WRR < SP2 < SP1 < SP0.
  • the WRR further includes five sub-priorities: WRR0, WRR1, WRR2, WRR3, and WRR4, where the sub-priority relationship is WRR0>WRR1>WRR2>WRR3>WRR4.
  • the time slice allocation relationship is: SP0 > SP1 > SP2 > WRR0 > WRR1 > WRR2 > WRR3 > WRR4.
  • the controller first schedules each task flow corresponding to SP0 based on time slice round-robin; after all the task flows corresponding to SP0 have been scheduled, it schedules each task flow corresponding to SP1 based on time slice round-robin; after all the task flows corresponding to SP0 and SP1 have been scheduled, it schedules each task flow corresponding to SP2 based on time slice round-robin; after all the task flows corresponding to SP0, SP1, and SP2 have been scheduled, it schedules each task flow corresponding to WRR based on time slice round-robin.
  • high-priority task flows can preempt low-priority task flows. For example, SP1 can be preempted by SP0, SP2 can be preempted by SP0/SP1, and WRR can be preempted by SP0/SP1/SP2.
  • the APP in the processor can call the accelerator through the API interface to execute the method provided by the embodiment of the present application.
  • the embodiments of the present application further provide a task scheduling device, which includes corresponding hardware structures and/or software modules for performing the foregoing functions.
  • the task scheduling apparatus 600 includes:
  • the transceiver module 601 is configured to receive a plurality of first task flows, the priority type of the plurality of first task flows is a first priority, and the first priority includes a first sub-priority and a second sub-priority;
  • the processing module 602 is configured to allocate a time slice to each first task flow according to the sub-priority of each first task flow among the multiple first task flows, wherein the first sub-priority is higher than the second sub-priority and the allocated time slice of the first task flow corresponding to the first sub-priority is greater than the allocated time slice of the first task flow corresponding to the second sub-priority; and to schedule the AI tasks of all the first task flows among the multiple first task flows based on time slice round-robin.
  • when the processing module 602 schedules the AI tasks of all the first task flows among the multiple first task flows based on time slice round-robin, it is specifically configured to: schedule the AI tasks of the first task flow of the first sub-priority; and when the time slice used by the first task flow of the first sub-priority is greater than or equal to its allocated time slice, or when all AI tasks of the first task flow of the first sub-priority have been scheduled, schedule the AI tasks of the first task flow of the second sub-priority.
  • the processing module 602 may also be configured to: when the AI tasks of at least one first-type task flow among the multiple first task flows remain unscheduled, and the time slice used by each first task flow in the at least one first-type task flow is greater than or equal to its allocated time slice, reallocate a time slice to each first task flow in the at least one first-type task flow according to its sub-priority; wherein, if any first task flow in the at least one first-type task flow used more than its allocated time slice in the last round of scheduling, the time slice it overused in the last round is deducted from its reallocated time slice.
  • the transceiver module 601 is specifically configured to: receive several task flows, wherein the several task flows include multiple first task flows and at least one second task flow, and the priority type of at least one second task flow is the second priority level, the second priority is higher than the first priority.
  • the processing module 602 may also be configured to: before scheduling the AI tasks of all the first task flows in the plurality of first task flows based on time slice round-robin, perform AI tasks in at least one second task flow Scheduling: after all the AI tasks in at least one second task flow are scheduled, the AI tasks of all the first task flows in the multiple first task flows are scheduled based on time slice rotation.
  • the processing module 602 can also be used to: in the process of scheduling the AI tasks of all the first task flows among the multiple first task flows based on time slice round-robin, if at least one second task flow is received while a second AI task among the multiple first task flows is being scheduled, suspend the scheduling of the AI tasks of the multiple first task flows after the second AI task is executed, or after the data blocks of the second AI task that have already been dispatched to the operation logic units are executed, and start scheduling the AI tasks of the at least one second task flow; the priority type of the at least one second task flow is the second priority, which is higher than the first priority.
  • the first task flow is a task flow of the first AI inference model, and the first AI inference model corresponds to the occupant monitoring function or the entertainment function in the automatic driving system;
  • the second task flow is a task flow of the second AI inference model, and the second AI inference model corresponds to the obstacle detection function, the lane line detection function, or the driver detection function in the automatic driving system.
  • embodiments of the present application further provide an accelerator, including programmable logic circuits and/or program instructions, configured to execute the methods described in the above method embodiments when the accelerator is running.
  • the embodiments of the present application also provide a computer-readable storage medium, which is used to store instructions, and when the instructions are executed, the methods described in the above method embodiments are implemented.
  • the embodiments of the present application also provide a computer program product containing instructions, and when the computer program product is run on a computer, it causes the computer to execute the methods described in the above method embodiments.
  • the embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • these computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, and the instruction means realizes the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Traffic Control Systems (AREA)
  • Multi Processors (AREA)

Abstract

The present application discloses a task scheduling method and apparatus. An accelerator receives multiple first task flows whose priority type is a first priority, the first priority including a first sub-priority and a second sub-priority; allocates a time slice to each first task flow according to its sub-priority, where the first sub-priority is higher than the second sub-priority and the time slice allocated to a first task flow of the first sub-priority is larger than that allocated to a first task flow of the second sub-priority; and schedules the AI tasks of all the first task flows based on time slice round-robin. In this way, first task flows of different sub-priorities occupy the accelerator in turn, a first task flow of a higher sub-priority occupies the accelerator for a longer running time, the scheduling requirements of different AI tasks are balanced, and the utilization of the accelerator's computing resources is improved.

Description

Task Scheduling Method and Apparatus — Technical Field

This application relates to the field of artificial intelligence (AI), and in particular to a task scheduling method and apparatus.

Background

With the development of computer and network technology, more and more application scenarios require neural network models. For example, in an automatic driving system, a large number of scenarios use AI models for inference. An AI model is essentially a deep neural network model; neural network models are matrix- and vector-computation intensive and place high demands on the computing power of the system. An ordinary central processing unit (CPU) generally cannot meet the computing power requirement of a neural network model, so a dedicated accelerator is needed to accelerate inference, such as a graphics processing unit (GPU) or a custom-designed embedded neural network processing unit (NPU). Usually, due to interface limitations, the application (APP) side needs to split an AI model into task flows and deliver them to the accelerator, where each task flow includes one or more AI tasks.

Once an accelerator is selected, its computing power is fixed, while the scheduling requirements of different AI tasks may differ. For example, in an automatic driving scenario, AI tasks related to functions such as obstacle detection and lane line detection require highly real-time scheduling, whereas AI tasks related to functions such as in-vehicle occupant monitoring and entertainment generally do not.

How to balance the scheduling requirements of different AI tasks and improve the utilization of the accelerator's computing resources is a technical problem that urgently needs to be solved.

Summary

This application provides a task scheduling method and apparatus for balancing the scheduling requirements of different AI tasks and improving the utilization of an accelerator's computing resources.

In a first aspect, a task scheduling method is provided. The method can be applied to any device, chip, or integrated circuit with computing capability. Taking the method applied to an accelerator as an example, the method includes: the accelerator receives multiple first task flows whose priority type is a first priority, the first priority including a first sub-priority and a second sub-priority; allocates a time slice to each first task flow according to the sub-priority of each of the multiple first task flows, where the first sub-priority is higher than the second sub-priority and the time slice allocated to a first task flow of the first sub-priority is larger than that allocated to a first task flow of the second sub-priority; and schedules the artificial intelligence (AI) tasks of all the first task flows among the multiple first task flows based on time slice round-robin.

Since different first task flows of the first priority can have different sub-priorities, and the accelerator schedules each of the multiple first task flows based on time slice round-robin, first task flows of different sub-priorities can occupy the accelerator in turn. Because the accelerator processes quickly, as long as the interval between time slices is appropriate, first task flows of different sub-priorities appear to be scheduled in parallel, achieving parallel scheduling of task flows of different sub-priorities within the first priority. Meanwhile, since a task flow of a higher sub-priority is allocated a longer time slice, it is guaranteed to occupy the accelerator for more running time, thereby achieving priority control. Moreover, the scheduling mechanism is simple, requires little computation, and is suitable for hardware implementation, for example in accelerators such as NPUs and GPUs.

It should be understood that the first priority is described herein as including a first sub-priority and a second sub-priority; in practice it may include more sub-priorities, which is not limited in this application.

In a possible implementation, when the AI tasks of all the first task flows are scheduled based on time slice round-robin, scheduling may start from the first task flow of the highest sub-priority. For example, taking the first sub-priority as the highest: the AI tasks of the first task flow of the first sub-priority are scheduled first; when the time slice used by the first task flow of the first sub-priority is greater than or equal to its allocated time slice, or when all its AI tasks have been scheduled, the AI tasks of the first task flow of the second sub-priority are scheduled. If there is a sub-priority lower than the second sub-priority, for example a third sub-priority, then when the time slice used by the first task flow of the second sub-priority is greater than or equal to its allocated time slice, or when all its AI tasks have been scheduled, the AI tasks of the first task flow of the third sub-priority are scheduled.

In this way, parallel scheduling of the task flows of the sub-priorities within the first priority is achieved while ensuring, as far as possible, that task flows of higher sub-priorities are scheduled first.
In the embodiments of this application, an AI task can be dispatched to one operation logic unit of the accelerator for processing, or to multiple operation logic units, which is not limited in this application.

In a possible implementation, each AI task can carry indication information indicating how many operation logic units the AI task needs to be dispatched to. For example: the number N of data blocks corresponding to a first AI task is determined according to first indication information carried by the first AI task in a first task flow, where N is a positive integer; if N = 1, the first AI task is dispatched to one operation logic unit for processing; if N > 1, the first AI task is split into N data blocks, and the N data blocks are dispatched to N different operation logic units for processing.

In this way, a single AI task can be dispatched to multiple operation logic units, improving the parallelism of AI task scheduling.

In the embodiments of this application, the accelerator can also reallocate time slices after completing a round of scheduling.

In a possible implementation, when the AI tasks of at least one first-type task flow among the multiple first task flows remain unscheduled, and the time slice used by each first task flow in the at least one first-type task flow is greater than or equal to its allocated time slice, a time slice is reallocated to each first task flow in the at least one first-type task flow according to its sub-priority; where, if any first task flow in the at least one first-type task flow used more than its allocated time slice in the last round of scheduling, the time slice it overused in the last round is deducted from its reallocated time slice.

In this way, the proportion of total time slices actually used by each first task flow over multiple rounds of scheduling matches its sub-priority, further improving the precision of priority control.

In a possible implementation, if the accelerator receives task flows of multiple priorities at the same time, the task flows of the higher priority can be scheduled first. For example, if several task flows are received, including the multiple first task flows and at least one second task flow whose priority type is a second priority higher than the first priority, then before the AI tasks of all the first task flows are scheduled based on time slice round-robin, the AI tasks of the at least one second task flow are scheduled; after all AI tasks of the at least one second task flow have been scheduled, the AI tasks of all the first task flows are scheduled based on time slice round-robin.

In this way, high-priority task flows are scheduled by the accelerator before low-priority task flows, better guaranteeing the real-time scheduling of high-priority task flows.

In a possible implementation, a high-priority task flow can preempt the time for which a low-priority task flow occupies the operation logic units of the accelerator. For example, in the process of scheduling the AI tasks of all the first task flows based on time slice round-robin, if at least one second task flow is received while a second AI task among the multiple first task flows is being scheduled, then after the second AI task is executed, or after the data blocks of the second AI task that have already been dispatched to the operation logic units are executed, the scheduling of the AI tasks of the multiple first task flows is suspended and the scheduling of the AI tasks of the at least one second task flow begins; the priority type of the at least one second task flow is the second priority, which is higher than the first priority.

In this way, a high-priority task flow can preempt the time for which a low-priority task flow occupies the operation logic units of the accelerator, better guaranteeing the real-time scheduling of high-priority task flows.

The embodiments of this application can be applied to any scenario that requires AI model inference.

In a possible implementation, they are applied to an automatic driving scenario. For example: the first task flow is a task flow of a first AI inference model corresponding to the in-vehicle occupant monitoring function or the entertainment function in an automatic driving system; the second task flow is a task flow of a second AI inference model corresponding to the obstacle detection function, the lane line detection function, or the driver detection function in the automatic driving system.

In this way, the real-time requirements of different functions in the automatic driving system are guaranteed, improving the intelligence of automatic driving and the user experience.
In a second aspect, a task scheduling apparatus is provided, which includes modules/units for performing the method of the first aspect or any possible implementation of the first aspect.

For example, the apparatus may include a transceiver module configured to receive multiple first task flows whose priority type is a first priority, the first priority including a first sub-priority and a second sub-priority; and a processing module configured to allocate a time slice to each first task flow according to the sub-priority of each of the multiple first task flows, where the first sub-priority is higher than the second sub-priority and the time slice allocated to the first task flow of the first sub-priority is larger than that allocated to the first task flow of the second sub-priority, and to schedule the AI tasks of all the first task flows based on time slice round-robin.

For the specific implementation of the functions performed by the above modules, refer to the corresponding implementations in the first aspect; details are not repeated here.

In a third aspect, an accelerator is provided, including programmable logic circuits and/or program instructions, configured to implement the method of the first aspect or any possible implementation of the first aspect when the accelerator runs.

In a fourth aspect, a computer-readable storage medium is provided for storing instructions; when the instructions are executed, the method of the first aspect or any possible implementation of the first aspect is implemented.

In a fifth aspect, a computer program product containing instructions is provided; when the computer program product runs on a computer, the computer is caused to perform the method of the first aspect or any possible implementation of the first aspect.

For the beneficial effects of the second to fifth aspects, refer to the technical effects achievable by the corresponding implementations in the first aspect; details are not repeated here.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of an application scenario to which an embodiment of this application applies;

FIG. 2 is a schematic structural diagram of a computing device according to an embodiment of this application;

FIG. 3 is a flowchart of a task scheduling method according to an embodiment of this application;

FIG. 4 is a schematic diagram of the relationship between AI tasks and blocks;

FIG. 5 is a specific example of the task scheduling method of this application;

FIG. 6 is a schematic structural diagram of a task scheduling apparatus according to an embodiment of this application.

Detailed Description
The task scheduling method provided in the embodiments of this application applies to any scenario that requires AI model inference, including but not limited to: intelligent control (such as automatic driving and assisted driving), machine vision, fingerprint recognition, face recognition, retina recognition, iris recognition, palmprint recognition, automatic planning, intelligent search, theorem proving, game playing, automatic programming, robotics, language and image understanding, and genetic programming.

For example, referring to FIG. 1, a schematic diagram of an application scenario to which an embodiment of this application applies: in an automatic driving scenario, a vehicle needs AI model inference to implement functions such as obstacle detection, lane line detection, driver monitoring, and entertainment. For example, the vehicle can capture lane images and input them into the AI model corresponding to the lane line detection function, which outputs the lane line detection result; or the vehicle can capture a voice signal from the driver and input it into the AI model corresponding to the driver monitoring function, which outputs the speech recognition result.

Further, the task scheduling method provided in the embodiments of this application can be performed by a computing device, which can be deployed in a cloud data center or in a front-end intelligent device to respond to user needs; this application does not limit this. Taking the automatic driving scenario shown in FIG. 1 as an example, the computing device can be deployed in the vehicle (such as an on-board computer) or in a cloud server of the vehicle.

It should be understood that the computing device in the embodiments of this application can be any device, chip, or integrated circuit with computing capability, including but not limited to one or more of a general-purpose central processing unit (CPU), a graphics processing unit (GPU), a neural network processing unit (NPU), a field programmable gate array (FPGA), and so on.

For example, FIG. 2 shows a schematic structural diagram of a possible computing device, which includes a processor and an accelerator.

The processor (for example, a CPU) is usually a very-large-scale integrated circuit and is the computation core and control unit of a computer. Its main function is to interpret computer instructions and process data in computer software. The CPU is the core component in a computer responsible for fetching, decoding, and executing instructions. It mainly includes two parts, a controller and an arithmetic unit, as well as caches and the data and control buses that connect them. In a computer architecture, the CPU is the core hardware unit that controls and allocates all hardware resources of the computer (such as the memory and input/output units) and performs general-purpose computation. The CPU is the computation and control core of the computer; the operations of all software layers in a computer system are ultimately mapped to CPU operations through the instruction set.

A CPU can work alone to handle complex logic operations and different data types, but when a large amount of data of a uniform type needs to be processed (for example, a neural network model with intensive matrix and vector computation), its computing power can hardly meet the demand.

An accelerator can be used to process large amounts of data of a uniform type (for example, a neural network model with intensive matrix and vector computation). It can serve as a co-processor of the CPU, sharing part of the CPU's work to speed up obtaining data processing results. Common accelerators include NPUs and GPUs.

A GPU, also called a display core, visual processor, or display chip, is a microprocessor dedicated to image and graphics computation on personal computers, workstations, game consoles, and some mobile devices (such as tablets and smartphones). A GPU reduces the graphics card's dependence on the CPU and takes over part of the CPU's original work, especially in 3D graphics processing. Core technologies used by GPUs include hardware transform and lighting (T&L), cubic environment mapping and vertex blending, texture compression and bump mapping, and dual-texture four-pixel 256-bit rendering engines; hardware T&L can be said to be the hallmark of the GPU.

An NPU adopts a "data-driven parallel computing" architecture and is particularly good at processing massive multimedia data such as video and images. NPUs are designed for IoT artificial intelligence to accelerate neural network computation and solve the inefficiency of traditional chips on neural network workloads. An NPU can process some data by itself and distribute the diverse data it receives to other units for processing.

One or more applications (APPs) run on the processor. It should be understood that FIG. 2 shows three APPs, namely APP1, APP2, and APP3, but the number is not limited to this.

In general, one APP can correspond to one or more AI models, and one AI model can also correspond to one or more APPs. After an APP completes initialization, the APP in the processor can call the accelerator's driver (such as an NPU driver) to drive the accelerator and deliver (or load) AI models to the accelerator.

It should be understood that an AI model on the APP side is generally a computation graph structure, so the APP needs to convert the computation graph into an execution sequence structure executable by the accelerator, and then deliver the AI model to the accelerator in the form of one or more execution sequences. One AI model can correspond to one or more execution sequences, and each execution sequence can include multiple execution units. Herein, an execution unit can also be called an execution task or an AI task. Each execution sequence can consist of multiple AI tasks, and the AI tasks of the same execution sequence are executed serially on the accelerator, so an execution sequence can also be called a task flow. For example, each SQE shown in FIG. 2 is an AI task (that is, an execution unit). It should be understood that FIG. 2 merely takes as an example that AI model 1 corresponding to APP1 has two task flows, AI model 2 corresponding to APP2 has one task flow, and AI model 3 corresponding to APP3 has three task flows; the reality is not limited to this.

In specific implementations, when loading an AI model to the accelerator, an APP can load all task flows (or all AI tasks) to the accelerator at once, or load them in several batches, each batch containing only part of the task flows (or part of the AI tasks) of the AI model. Either way, what the accelerator sees is that the AI tasks in the task flows need to be executed (that is, the execution units in the execution sequences need to be executed).

The accelerator can include a controller and operation logic units. The controller receives the AI tasks delivered by the processor (that is, receives the task flows delivered by the processor) and allocates them to the operation logic units for execution; the process of the controller allocating AI tasks to the operation logic units can also be called the controller dispatching (scheduling) AI tasks to the operation logic units. An operation logic unit executes (runs) the AI tasks dispatched by the controller and reports the execution results (running or computation results) to the controller. The controller also reports the computation results of the AI model to the processor. It should be understood that when an AI model is executed, the APP on the processor obtains the complete computation result only after all AI tasks of the AI model have been executed.

In specific implementations, one processor can correspond to one accelerator or to multiple accelerators; this application does not limit this.

In specific implementations, the processor and the accelerator can be implemented as separate chips or integrated in one chip; this application does not limit this.
Referring to FIG. 3, a flowchart of a task scheduling method according to an embodiment of this application, and taking the method applied to the computing device shown in FIG. 2 as an example, the method includes:

S301. The accelerator receives multiple first task flows, where the priority type of the multiple first task flows is a first priority, and the first priority includes at least two sub-priorities.

Specifically, the controller of the accelerator can receive multiple first task flows from the processor, each including multiple AI tasks. The multiple first task flows can correspond to the same AI model. Taking FIG. 2 as an example, APP3 in the processor can deliver the task flows of model 3 corresponding to APP3 to the controller of the accelerator, where model 3 includes three task flows: {SQE41, SQE42, ...}, {SQE51, SQE52, ...}, and {SQE61, SQE62, ...}. The multiple first task flows can also correspond to multiple different AI models. Taking FIG. 2 as an example, APP1 in the processor can deliver the task flow of model 1 corresponding to APP1 and the task flows of model 2 corresponding to APP2 to the controller of the accelerator, where model 1 includes one task flow, {SQE11, SQE12, ...}, and model 2 includes two task flows, {SQE21, SQE22, ...} and {SQE31, SQE32, ...}. It should be understood that the AI tasks in each first task flow are AI tasks to be scheduled, and the AI tasks within each first task flow are scheduled serially.

In the embodiments of this application, the priority of a task flow can be used to characterize the task flow's real-time requirement for scheduling, or its real-time requirement for outputting the computation result (that is, the computation result of the entire task flow). A task flow with a higher real-time scheduling requirement can correspond to a higher priority, or alternatively a task flow with a lower real-time scheduling requirement can correspond to a higher priority; this application does not limit the mapping. In addition, the number of priority types can be set according to actual needs and is not limited in this application.

For ease of description, the following mainly takes as an example that a task flow with a higher real-time scheduling requirement has a higher priority.

For example, in the automatic driving scenario shown in FIG. 1: task flows related to functions such as obstacle detection and lane line detection must complete the detection task and obtain results within the expected time, so their real-time scheduling requirement is high (for example: the time from the accelerator receiving the task flow to outputting its computation result must not exceed a first time threshold, or the time the accelerator takes to execute the task flow must not exceed the first time threshold), and the AI tasks in such task flows need to be scheduled with the highest priority.

For example, in the automatic driving scenario shown in FIG. 1: task flows related to driver monitoring functions (such as driver fatigue detection and distraction detection) have moderate real-time scheduling requirements (for example: the time from the accelerator receiving the task flow to outputting its computation result may exceed the first time threshold but must not exceed a second time threshold, where the second time threshold is larger than the first), and the AI tasks in such task flows can be scheduled with a relatively high priority.

For example, in the automatic driving scenario shown in FIG. 1: task flows related to functions such as in-vehicle occupant monitoring and entertainment do not require real-time performance (for example: the time from the accelerator receiving the task flow to outputting its computation result, or the accelerator's execution time, may exceed the second time threshold), and their AI tasks can be scheduled with a lower priority.

Of course, the above are merely examples rather than limitations; in specific implementations, the priorities of the task flows related to each function can be set according to user needs.

In the embodiments of this application, the first priority is the priority of first task flows with no (or a low or ordinary) real-time scheduling requirement, or of task flows with a low (or ordinary) real-time requirement for outputting the computation result (that is, the computation result of the entire first task flow).

In a possible implementation: a first task flow of the first priority does not require the time from the accelerator receiving the first task flow to outputting the computation result to fall within a first preset time range, or does not require the accelerator's execution time for the first task flow to fall within the first preset time range; alternatively, it requires the time from receiving the first task flow to outputting the computation result, or the accelerator's execution time for the first task flow, to fall within a second preset time range.

For example, in the automatic driving scenario shown in FIG. 1, the first task flows of the first priority can be task flows related to functions such as in-vehicle occupant monitoring and entertainment, which do not require real-time scheduling; specifically, the time from the accelerator receiving such a task flow to outputting its computation result, or the accelerator's execution time for such a task flow, is not required to be less than the second time threshold and may exceed it.

It should be understood that this is merely an example rather than a specific limitation; in practice, the functions corresponding to the task flows of the first priority can be other functions.

It should be noted that, in the embodiments of this application, the task flows delivered by the processor to the accelerator include at least task flows of the first priority (that is, the multiple first task flows), but may also include task flows of other priorities.
For example, in addition to the first task flows of the first priority, the task flows delivered by the processor to the accelerator can include second task flows of a second priority, the second priority being higher than the first priority. The second priority can be the priority of second task flows with a certain (relatively high) real-time scheduling requirement, or with a certain real-time requirement for outputting the computation result (that is, the computation result of the entire second task flow). A second task flow is, for example, a task flow related to the driver monitoring function. In short, the real-time scheduling requirement of the second priority is higher than that of the first priority.

For example, in the automatic driving scenario shown in FIG. 1, the first task flows of the first priority can be task flows related to the in-vehicle occupant monitoring and entertainment functions, while task flows related to functions with higher real-time requirements than in-vehicle occupant monitoring and entertainment, such as driver monitoring, obstacle detection, or lane line detection, correspond to the second priority.

It should be noted that the above correspondence between task flows and priorities is also an example rather than a limitation; in practical applications it can be set according to actual needs, which is not limited in this application.

It should be further noted that the above takes priorities including the first priority and the second priority as an example, but in practical applications the number of priority levels can be set flexibly and may be greater than two. For example, in addition to the first priority, there can be a second priority and a third priority, where the third priority is higher than the second priority and the second priority is higher than the first priority (first priority < second priority < third priority). The second priority is the priority of second task flows with a certain (relatively high) real-time scheduling requirement, or with a certain real-time requirement for outputting the computation result (that is, the computation result of the entire second task flow); a second task flow is, for example, a task flow related to the driver monitoring function. The third priority is the priority of third task flows with a very high (or the highest) real-time scheduling requirement, or with a very high (or the highest) real-time requirement for outputting the computation result (that is, the computation result of the entire third task flow); a third task flow is, for example, a task flow related to functions such as obstacle detection or lane line detection.

For ease of description, this document mainly takes priorities including the first priority and the second priority as an example. In the embodiments of this application, any priority can be further divided into multiple sub-priorities, which more finely characterize the different real-time scheduling requirements of different task flows within that priority. For example, the first priority can include at least two sub-priorities, and the sub-priority of a first task flow more finely characterizes the real-time scheduling requirements of different first task flows. A first task flow with a higher real-time scheduling requirement can correspond to a higher sub-priority or to a lower one; this application does not limit this.

For ease of description, the following mainly takes as an example that a task flow with a higher real-time scheduling requirement has a higher sub-priority.

For example, in the automatic driving scenario shown in FIG. 1, the task flows of the first priority are task flows related to functions such as in-vehicle occupant monitoring and entertainment, where the task flows related to the in-vehicle occupant monitoring function have a first sub-priority and the task flows related to the entertainment function have a second sub-priority, the second sub-priority being higher than the first sub-priority. It should be understood that this is merely an example rather than a limitation.

In addition, in the embodiments of this application, an AI model can also have a priority, which characterizes the AI model's real-time requirement for scheduling, or its real-time requirement for outputting the computation result (that is, the computation result of the entire AI model).

It should be understood that the priority of the task flows corresponding to an AI model should match the priority of the AI model, where the matching cases include but are not limited to the following two:

Case 1: the priorities of the task flows corresponding to the AI model are the same as the priority of the AI model. For example, the priority of model I is the first sub-priority of the first priority, and the task flows corresponding to model I all have the first sub-priority of the first priority.

Case 2: the priority of the AI model encompasses the priorities of the task flows corresponding to the AI model. For example, the priority of model II is the first priority, and model II corresponds to two task flows, one with the first sub-priority of the first priority and the other with the second sub-priority of the first priority.

If the accelerator detects that the priority of an AI model does not match the priority of one of its task flows, the accelerator can overwrite the priority of that task flow with the priority of the AI model, so that the priority of the task flow matches the priority of the AI model.

For example, the priority of model III is the first priority (the first priority including a first sub-priority and a second sub-priority), and model III corresponds to three task flows, where the first task flow has the first sub-priority of the first priority, the second task flow has the second sub-priority of the first priority, and the third task flow has the first sub-priority of the second priority. The priority of the third task flow (the first sub-priority of the second priority) does not match the priority of model III (the first priority), so the priority of model III overwrites the priority of the third task flow; after overwriting, the third task flow has any sub-priority of the first priority (such as the first sub-priority or the second sub-priority of the first priority).
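The overwrite rule in the model III example can be sketched as follows. The numeric encoding of priorities as (priority, sub-priority) tuples and the choice of default sub-priority after overwriting are assumptions for illustration; the text allows any sub-priority of the model's priority.

```python
def reconcile(model_priority, stream_priorities, default_sub=1):
    """stream_priorities is a list of (priority, sub_priority) tuples.
    A stream whose priority differs from the model's is overwritten with
    the model's priority; the sub-priority it then receives (here the
    first sub-priority) is an arbitrary illustrative choice."""
    return [(p, s) if p == model_priority else (model_priority, default_sub)
            for p, s in stream_priorities]

# Model III has priority 1; its third stream was tagged with priority 2:
print(reconcile(1, [(1, 1), (1, 2), (2, 1)]))
# [(1, 1), (1, 2), (1, 1)]
```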
S302. The accelerator allocates a time slice to each first task flow according to the sub-priority of each of the multiple first task flows.

The time slice allocated to a first task flow can specifically be a period of time for occupying the operation logic units that the controller of the accelerator allocates to the first task flow, in other words the maximum time for which the first task flow is allowed to be scheduled. Within the time slice corresponding to each first task flow, the controller of the accelerator dispatches the AI tasks of that first task flow to the operation logic units, and the operation logic units run the dispatched AI tasks and output the corresponding computation results.

If a higher sub-priority of a first task flow means a higher real-time scheduling requirement, then a first task flow of a higher sub-priority is allocated a longer time slice; alternatively, if a lower sub-priority means a higher real-time scheduling requirement, then a first task flow of a lower sub-priority is allocated a longer time slice. This application does not specifically limit the correspondence between priority level and time slice length.

For ease of description, the following mainly takes as an example that a first task flow of a higher sub-priority has a higher real-time scheduling requirement and is allocated a longer time slice.
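A minimal sketch of this allocation rule follows. The 5:4:3:2:1 weight ratio for sub-priorities WRR0 to WRR4 and the 200 us base unit are taken from the worked stream1..stream4 example later in this description; they are illustrative, not mandated.

```python
# Illustrative weights per sub-priority; the 5:4:3:2:1 ratio yields the
# 1000/800/800/400 us slices of the stream1..stream4 example below.
WEIGHTS = {"WRR0": 5, "WRR1": 4, "WRR2": 3, "WRR3": 2, "WRR4": 1}

def allocate_slices(streams, unit_us=200):
    """streams maps a task-flow name to its sub-priority; a higher
    sub-priority (larger weight) gets a proportionally longer slice."""
    return {name: WEIGHTS[sub] * unit_us for name, sub in streams.items()}

print(allocate_slices({"stream1": "WRR0", "stream2": "WRR1",
                       "stream3": "WRR1", "stream4": "WRR3"}))
# {'stream1': 1000, 'stream2': 800, 'stream3': 800, 'stream4': 400}
```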
S303. The accelerator schedules the AI tasks of all the first task flows among the multiple first task flows based on time slice round-robin.

It should be understood that, herein, scheduling refers to the process in which the controller of the accelerator allocates AI tasks to the operation logic units of the accelerator. An operation logic unit runs (computes, or processes) the AI tasks allocated to it and can output the corresponding operation results (computation or processing results). Scheduling the AI tasks in a task flow can also be described as scheduling the task flow; for example, "scheduling the AI tasks of all the first task flows among the multiple first task flows" can also be described as "scheduling all the first task flows among the multiple first task flows".

Time slice round-robin means that multiple task flows occupy the operation logic units in turn, where the time each task flow occupies the operation logic units is determined by its allocated time slice.

Correspondingly, the accelerator scheduling the AI tasks of all the first task flows based on time slice round-robin includes: the controller of the accelerator dispatches the AI tasks of each first task flow to the operation logic units in turn according to the time slice corresponding to each first task flow (that is, its allocated time slice), where the time each first task flow occupies the operation logic units is determined by its allocated time slice.

This is illustrated below with two examples:

Example 1, taking the first priority including two sub-priorities (a first sub-priority and a second sub-priority) as an example: the accelerator first schedules the AI tasks of the first task flows of the first sub-priority; when the time slices allocated to the first task flows of the first sub-priority are used up, or when all their AI tasks have been scheduled, it schedules the AI tasks of the first task flows of the second sub-priority; when the time slices allocated to the first task flows of the second sub-priority are used up, or when all their AI tasks have been scheduled, one round of scheduling is complete. Of course, in specific implementations the above scheduling order can also be swapped, that is, the accelerator first schedules the first task flows of the second sub-priority and then those of the first sub-priority.

It should be noted that a task flow's allocated time slice being "used up", as described in the embodiments of this application, means that the time slice actually used by the task flow is greater than or equal to its allocated time slice. For example, the allocated time slice of a first task flow of the first sub-priority being used up means that the time slice it actually used is greater than or equal to its allocated time slice. In addition, (allocated time slice − actually used time slice) can be used to represent a first task flow's remaining time slice; if a task flow's allocated time slice is used up, its remaining time slice can be 0 or negative.

Example 2, taking the first priority including three sub-priorities (a first, a second, and a third sub-priority) as an example: the accelerator first schedules the AI tasks of the first task flows of the first sub-priority; when their allocated time slices are used up, or when all their AI tasks have been scheduled, it schedules the AI tasks of the first task flows of the second sub-priority; when those time slices are used up, or when those AI tasks have been scheduled, it schedules the AI tasks of the first task flows of the third sub-priority; when those time slices are used up, or when those AI tasks have been scheduled, one round of scheduling is complete. Of course, in specific implementations the scheduling order can also be swapped; this application does not limit this.

It should be understood that the above two examples are merely illustrative rather than limiting; in practical applications, the number of sub-priorities included in the first priority is not limited to two or three.

Further, in specific implementations, if at least two first task flows have the same sub-priority, for example both have the first sub-priority, then the time slice allocated to each of them is determined according to the first sub-priority, that is, the first task flows among the at least two are allocated time slices of the same length. When scheduling the at least two first task flows, the accelerator can schedule them simultaneously (for example, if the accelerator has multiple operation logic units, AI tasks of different first task flows among the at least two can be dispatched in parallel to different operation logic units) or schedule them one after another; this application does not limit this. Either way, the first task flows of another sub-priority can be scheduled only after each first task flow of the first sub-priority has used up its allocated time slice or had all its AI tasks scheduled.
Based on the above, in the technical solutions provided by the embodiments of this application, different first task flows of the first priority can have different sub-priorities, and the accelerator schedules each of the multiple first task flows based on time slice round-robin, so that first task flows of different sub-priorities occupy the accelerator in turn. Because the accelerator processes quickly, as long as the interval between time slices is appropriate, first task flows of different sub-priorities appear to be scheduled in parallel, achieving parallel scheduling of task flows of different sub-priorities within the first priority (which may also be called the ordinary priority). Meanwhile, since a task flow of a higher sub-priority is allocated a longer time slice, it is guaranteed to occupy the accelerator for more running time, thereby achieving priority control. Moreover, the above scheduling mechanism is simple, requires little computation, and is suitable for hardware implementation, for example in accelerators such as NPUs.

Optionally, the accelerator can start from the first task flow of the highest sub-priority and schedule the AI tasks of each of the multiple first task flows in turn based on time slice round-robin.

Taking the first priority including a first sub-priority and a second sub-priority, with the first sub-priority higher than the second sub-priority, as an example: the accelerator first schedules the AI tasks of the first task flows of the first sub-priority; when their allocated time slices are used up, or when all their AI tasks have been scheduled, it schedules the AI tasks of the first task flows of the second sub-priority; when those time slices are used up, or when those AI tasks have been scheduled, one round of scheduling is complete.

Taking the first priority including a first, a second, and a third sub-priority, with the first sub-priority higher than the second sub-priority and the second sub-priority higher than the third sub-priority, as an example: the accelerator first schedules the AI tasks of the first task flows of the first sub-priority; when their allocated time slices are used up, or when all their AI tasks have been scheduled, it schedules the AI tasks of the first task flows of the second sub-priority; when those time slices are used up, or when those AI tasks have been scheduled, it schedules the AI tasks of the first task flows of the third sub-priority; when those time slices are used up, or when those AI tasks have been scheduled, one round of scheduling is complete.

It should be understood that the above two examples are merely illustrative rather than limiting; in practical applications, the number of sub-priorities is not limited to two or three.

Through this implementation, parallel scheduling of the task flows of the sub-priorities within the first priority is achieved while ensuring, as far as possible, that task flows of higher sub-priorities are scheduled first.
Optionally, when scheduling each task flow, the accelerator can dispatch a single AI task to one operation logic unit or to multiple operation logic units; this application does not limit this.

In a possible implementation, the accelerator dispatching a single AI task to multiple operation logic units specifically includes: the accelerator splits the single AI task into multiple blocks (also called units or data blocks) and dispatches the blocks to multiple operation logic units, where one operation logic unit executes one block.

In specific implementations, each AI task can carry indication information describing the number of blocks into which the AI task can be split, that is, indicating how many blocks the accelerator needs to split the AI task into when scheduling it (in other words, how many operation logic units the AI task needs to be dispatched to). For example, N can denote the number of blocks into which an AI task can be split, where N is a positive integer; in the special case N = 1, the AI task itself is one block.

Taking the scheduling of the AI tasks of the first task flows of the first sub-priority among the multiple first task flows as an example, the scheduling process can include: determining the number N of blocks corresponding to a first AI task according to first indication information carried by the first AI task in a first task flow, where N is a positive integer; if N = 1, dispatching the first AI task to one operation logic unit for processing; if N > 1, splitting the first AI task into N blocks and dispatching the N blocks to N different operation logic units for processing. The first AI task can be the first AI task in the first task flow.

As a specific example, see FIG. 4, a schematic diagram of the relationship between AI tasks and blocks. As shown in FIG. 4, task flow (stream) 1 is scheduled serially in the accelerator. The information of the first task of stream 1 (the task identified as 1) includes indication information that the number of blocks is 2, so when the controller schedules task 1, it splits task 1 into two blocks and configures the identifiers (IDs) of the split blocks to the corresponding operation logic units: block ID = 0 is allocated to operation logic unit A and block ID = 1 to operation logic unit B. After receiving its block ID, each operation logic unit finds the corresponding block and executes it, that is, operation logic unit A executes block 0 and operation logic unit B executes block 1.

Through this implementation, a single AI task can be dispatched to multiple operation logic units, improving the parallelism of AI task scheduling.
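The dispatch decision above can be sketched as follows; the unit names and the simple block-i-to-unit-i assignment are hypothetical illustration, not the patent's required mapping.

```python
def dispatch_blocks(n_blocks, units):
    """n_blocks is the N carried in the task's indication information;
    units is the list of available operation logic units. Returns a
    mapping unit -> block ID, one block per unit."""
    if n_blocks == 1:
        return {units[0]: 0}          # the whole task is a single block
    assert n_blocks <= len(units), "need one operation logic unit per block"
    return {units[i]: i for i in range(n_blocks)}

# The FIG. 4 case: task 1 of stream 1 carries N = 2, so block 0 goes to
# unit A and block 1 to unit B.
print(dispatch_blocks(2, ["A", "B"]))   # {'A': 0, 'B': 1}
```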
Optionally, when the AI tasks of at least one first-type task flow among the multiple first task flows remain unscheduled, and the time slice used by each first task flow in the at least one first-type task flow is greater than or equal to its allocated time slice (that is, after one round of scheduling is complete), the controller of the accelerator can reallocate time slices and start a new round of scheduling based on the reallocated time slices. For example, it reallocates a time slice to each first task flow in the at least one first-type task flow according to the sub-priority of each such first task flow and starts a new round of scheduling based on the new time slices.

In one possible case, if none of the multiple first task flows had all its AI tasks scheduled in the previous round, that is, the at least one first-type task flow is the multiple first task flows, the controller needs to reallocate a time slice to each of the multiple task flows. Specifically, the controller of the accelerator reallocates a time slice to each first task flow in the multiple first-type task flows according to its sub-priority. If any first task flow used more than its allocated time slice in the previous round of scheduling, the time slice it overused in the previous round is deducted from the time slice that should be reallocated to it (that is, the time slice reallocated according to its sub-priority), and the result after deduction is the time slice actually allocated to that task flow.

In another possible case, if some of the multiple first task flows had all their AI tasks scheduled in the previous round (whether or not their time slices were used up), while the others did not (with their time slices used up), the controller of the accelerator can reallocate time slices only to the latter task flows, or to all of the multiple first task flows; this application does not limit this. Likewise, if any first task flow used more than its allocated time slice in the previous round, the overused time slice is deducted from the time slice that should be reallocated to it (that is, the time slice reallocated according to its sub-priority), and the result after deduction is the time slice actually allocated to that task flow.

For example, taking two rounds of scheduling:

1) Time slice allocation for the first round: the controller allocates one time slice to each first task flow according to the time slice ratio of each first task flow of the first priority, where a first task flow of a higher sub-priority gets a longer time slice and one of a lower sub-priority gets a shorter time slice;

2) First-round scheduling: the controller schedules the first task flows in descending order of sub-priority, that is, first task flows of higher sub-priorities are scheduled first and those of lower sub-priorities later. When the controller reaches a first task flow that still has time slice, it dispatches the AI tasks of that task flow to one or more operation logic units, where one AI task can be split into one or more blocks running in parallel. After executing a block, each operation logic unit responds to the controller (that is, sends the computation result to the controller); the controller calculates the running time of the block from the time the block was dispatched and the time the corresponding response was received, and deducts this running time from the task flow's remaining time slice. If the task flow still has time slice left, the controller continues dispatching its AI tasks to the operation logic units; if all its AI tasks have been scheduled or its remaining time slice is 0 or negative, the controller schedules other first task flows that still have time slice;

3) When the time slices of all first task flows are used up, the first round of scheduling ends and time slice allocation for the second round begins. In the second-round allocation, if any first task flow's remaining time slice from the previous (first) round is negative, the overused time slice must be deducted (that is, the time slice actually allocated to the task flow in the second round = the time slice that should be allocated according to priority − the time slice the task flow overused in the previous round). Then the second round of scheduling proceeds; for details, refer to the first round, which is not repeated here.
An example with specific time slice parameters follows:

Suppose the accelerator receives four first task flows from the processor: stream1 (sub-priority WRR0), stream2 (sub-priority WRR1), stream3 (sub-priority WRR1), and stream4 (sub-priority WRR3), where the sub-priority order is WRR0 > WRR1 > WRR2 > WRR3, and the time slice ratio of the task flows of the sub-priorities is WRR0 : WRR1 : WRR2 : WRR3 : WRR4 = 5 : 4 : 3 : 2 : 1. In each time slice allocation period, the controller allocates time slices of 1000 us, 800 us, 800 us, and 400 us to stream1, stream2, stream3, and stream4 respectively according to this ratio. stream1 = {task11, task12, task13, ...}, stream2 = {task21, task22, task23, ...}, stream3 = {task31, task32, task33, ...}, stream4 = {task41, task42, task43, ...}. After allocating time slices to the streams, the controller schedules them in turn:

The controller dispatches task11 of stream1 to an operation logic unit, which takes 400 us to execute task11;

After deducting 400 us from stream1's time slice, 600 us remains, and the controller continues scheduling task12;

The operation logic unit takes 500 us to execute task12; after deducting 500 us from stream1's time slice, 100 us remains, and the controller continues scheduling task13;

The operation logic unit takes 300 us to execute task13; after deducting 300 us, stream1's time slice is −200 us, so stream1 can no longer be scheduled and the controller turns to stream2;

After the operation logic units finish task21 and task22 of stream2, its remaining time slice is −100 us, and the controller schedules stream3;

After stream3 finishes task31, task32, and task33, its remaining time slice is −100 us, and the controller schedules stream4;

After stream4 finishes task41 and task42, its remaining time slice is −100 us;

At this point, the time slices of all streams are used up, so the controller reallocates time slices to the streams and starts a new round of scheduling. Since the controller must deduct the time slices overused in the previous round, stream1 is actually allocated 1000 − 200 = 800 us, stream2 is actually allocated 800 − 100 = 700 us, stream3 is actually allocated 800 − 100 = 700 us, and stream4 is actually allocated 400 − 100 = 300 us.

Through this implementation, the proportion of total time slices actually used by each first task flow over multiple rounds of scheduling matches its sub-priority, further improving the precision of priority control.
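The two-round example above can be reproduced with a small simulation. The individual durations of task21/task22, task31 to task33, and task41/task42 are assumptions chosen so that each stream ends the round at the remaining slice stated in the text (only stream1's task durations are given there).

```python
def run_round(alloc_us, tasks_us):
    """Deduct each task's measured run time from its stream's slice; a
    stream keeps dispatching while its remaining slice is positive, so
    the last task may drive the remainder negative (overuse)."""
    remaining = dict(alloc_us)
    for stream, durations in tasks_us.items():
        for d in durations:
            if remaining[stream] <= 0:
                break                     # slice used up: switch streams
            remaining[stream] -= d
    return remaining

alloc1 = {"stream1": 1000, "stream2": 800, "stream3": 800, "stream4": 400}
tasks = {"stream1": [400, 500, 300],      # task11..task13 (from the text)
         "stream2": [450, 450],           # assumed: ends round at -100 us
         "stream3": [300, 300, 300],      # assumed: ends round at -100 us
         "stream4": [250, 250]}           # assumed: ends round at -100 us
rem = run_round(alloc1, tasks)
# Overused time is deducted from the next round's allocation:
alloc2 = {s: alloc1[s] + min(rem[s], 0) for s in alloc1}
print(alloc2)
# {'stream1': 800, 'stream2': 700, 'stream3': 700, 'stream4': 300}
```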
The above mainly introduced the scheduling of task flows of different sub-priorities within the first priority, but in practical applications the accelerator can simultaneously hold task flows of multiple different priorities. The scheduling of task flows of different priorities is described in detail below.

Optionally, if the controller in the accelerator receives task flows of multiple different priorities, the accelerator can schedule the high-priority task flows first and schedule the low-priority task flows after the high-priority ones are done.

For example, in S301, the controller in the accelerator receives several task flows, including the multiple first task flows described in S301 and at least one second task flow, where the priority type of the at least one second task flow is a second priority, the second priority includes at least one sub-priority, and the second priority is higher than the first priority. Then, before scheduling the AI tasks of all the first task flows based on time slice round-robin, the controller in the accelerator first schedules the AI tasks of the at least one second task flow; after all AI tasks of the at least one second task flow have been scheduled, it schedules the AI tasks of all the first task flows based on time slice round-robin.

It should be understood that the above takes task flows of two priorities as an example; in practical applications there can be more priorities, which the embodiments of this application do not limit. For example, besides the first priority and the second priority there can be a third priority, whose task flows are third task flows and which is higher than the second priority. The scheduling process then includes: the controller first schedules the third task flows; after all AI tasks of the third task flows have been scheduled, it schedules the second task flows; and after all AI tasks of the second task flows have been scheduled, it schedules the first task flows.

In a possible implementation, any priority other than the first priority may not be further divided into sub-priorities, or in other words contains only one sub-priority, so that the task flows of that priority are allocated time slices of the same length and share the AI accelerator equally. In this case, time-slice-based scheduling of the task flows of that priority can be regarded as a special case of time-slice-based scheduling of the task flows of the first priority.

For example, in the automatic driving scenario shown in FIG. 1, the task flows related to the driver monitoring function correspond to the second priority, where the time slice ratio of the task flows related to driver fatigue detection and those related to driver distraction detection is 1 : 1; for example, each task flow related to driver fatigue detection and each related to driver distraction detection is allocated a time slice of 1100 us.

For example, in the automatic driving scenario shown in FIG. 1, the task flows related to functions such as obstacle detection and lane line detection correspond to the third priority, where the time slice ratio of each task flow related to obstacle detection to each related to lane line detection is 1 : 1; for example, each is allocated a time slice of 1200 us.

It should be understood that the above are merely examples rather than limitations; the priorities of the functions in an actual automatic driving system can be divided in other ways.

In another possible implementation, any priority other than the first priority (such as the second or third priority) can be further divided into sub-priorities, and the scheduling of its task flows can refer to the scheduling of the task flows of the first priority. Taking the second priority as an example: different second task flows of the second priority can have different sub-priorities, and the accelerator schedules each of the multiple second task flows based on time slice round-robin, so that second task flows of different sub-priorities occupy the accelerator in turn. For the allocation of time slices to the second task flows of the sub-priorities within the second priority, refer to the allocation of time slices to the first task flows of the sub-priorities within the first priority above; details are not repeated here. After all task flows of the second priority have been scheduled, the accelerator continues to schedule the task flows of the first priority.

Through this implementation, high-priority task flows are scheduled by the accelerator before low-priority task flows, better guaranteeing the real-time scheduling of high-priority task flows.
Optionally, a high-priority task flow can preempt the time for which a low-priority task flow occupies the operation logic units of the accelerator (referred to for short as "a high-priority task flow preempting a low-priority task flow", or "high priority preempting low priority"). Specifically, if the controller of the accelerator receives a high-priority task flow while scheduling a low-priority task flow, it can suspend the scheduling of the low-priority task flow and turn to scheduling the high-priority task flow.

For example, in the process of scheduling the AI tasks of all the first task flows based on time slice round-robin, if the controller of the accelerator receives at least one second task flow, it suspends the scheduling of the AI tasks of the multiple first task flows and starts scheduling the AI tasks of the at least one second task flow, where the priority type of the at least one second task flow is the second priority, the second priority includes at least one sub-priority, and the second priority is higher than the first priority.

For example, while scheduling a first task flow or a second task flow, if the controller of the accelerator receives a third task flow, it suspends the scheduling of the first or second task flow and starts scheduling the third task flow, where the priority type of the third task flow is a third priority higher than the second priority and the first priority.

Through this implementation, a high-priority task flow can preempt the time for which a low-priority task flow occupies the operation logic units of the accelerator, better guaranteeing the real-time scheduling of high-priority task flows.

Optionally, the moment at which a high-priority task flow preempts a low-priority one occurs at an AI task boundary (that is, after one AI task finishes executing and before another starts) or at a block boundary (that is, after one block finishes executing and before another starts). If the controller of the accelerator receives a higher-priority task flow during the execution of an AI task, or of any block of an AI task, it must wait for that AI task or block to finish before the preemption is allowed to proceed.

For example, in the process of scheduling the AI tasks of all the first task flows based on time slice round-robin, if at least one second task flow is received while a second AI task among the multiple first task flows is being scheduled, then after the second AI task is executed, or after the blocks of the second AI task that have already been dispatched to the operation logic units are executed, the scheduling of the AI tasks of the multiple first task flows is suspended and the scheduling of the AI tasks of the at least one second task flow begins. The second AI task can be any AI task in any task flow.

As a specific example, suppose an AI task of a first task flow has 4 blocks and the controller has already dispatched 2 of them to the operation logic units when it receives a higher-priority task flow. The controller then only needs to wait for the 2 dispatched blocks to complete before switching to scheduling the higher-priority task flow, without dispatching the remaining 2 blocks.
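The boundary rule in this example can be sketched as an ordering decision; the task and block names are hypothetical.

```python
def order_after_preemption(n_blocks, n_dispatched, high_prio_tasks):
    """When a higher-priority flow arrives mid-task, the blocks already
    dispatched (n_dispatched of n_blocks) run to completion, the
    high-priority tasks run next, and the withheld blocks follow."""
    inflight = [("block", i) for i in range(n_dispatched)]
    preempting = [("hi", t) for t in high_prio_tasks]
    withheld = [("block", i) for i in range(n_dispatched, n_blocks)]
    return inflight + preempting + withheld

# 4-block task, 2 blocks already on the units when a high-priority task
# arrives: only the 2 in-flight blocks finish before the switch.
print(order_after_preemption(4, 2, ["taskH"]))
# [('block', 0), ('block', 1), ('hi', 'taskH'), ('block', 2), ('block', 3)]
```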
AI tasks are mainly matrix and vector operations; they are compute-intensive and also input/output (I/O) intensive, needing to read and write large amounts of data from memory. For example, in one clock cycle a certain accelerator can multiply two 16×16 FP32 matrices, which requires reading 2×16×16×4 bytes of data and writing 16×16×4 bytes of data, far more than an ordinary CPU reads and writes for an FP32 multiplication. Therefore an AI operation logic unit contains many registers, caches, and buffers for caching the computation data to be read from or written to memory. When executing an AI task, the AI accelerator reads data from memory into the cache or buffer and flushes data from the cache or buffer to memory in parallel; all caches and buffers are exclusive to that AI task, and loading and flushing are controlled by the AI task itself. When an operation logic unit executes the next AI task, that task likewise controls and manages the use of the cache or buffer space itself.

There are many registers and caches in the operation logic units of the accelerator. When the controller needs to switch AI tasks, if it paused the operation logic unit, backed up the various registers and caches inside it, and then switched another AI task onto the unit for execution (including restoring that task's previously backed-up register and cache data if it had been interrupted before), as a CPU does when switching processes, the storage and performance overhead of backing up and restoring the registers and caches would be too large.

This implementation takes into account precisely that the execution time of most AI tasks is very short (for example, for AI tasks related to automatic driving, an AI model runs at the millisecond (ms) level and generally consists of dozens or hundreds of AI tasks, so under normal circumstances each AI task's execution time is far less than 1 ms). By making high-priority task flows preempt low-priority ones at AI task boundaries or block boundaries, preemption is achieved without interrupting AI tasks and without backing up and restoring registers and caches, reducing the storage and performance overhead of the accelerator.

Optionally, if the controller receives a new task flow within one scheduling period (that is, during one round of scheduling), the controller can allocate a time slice to the new task flow, where the time slice that should be allocated to the new task flow equals the time slice that should be allocated in the current round to a task flow of the same priority as the new task flow (that is, without considering the deduction of time slices overused in the previous round). For example, continuing the scheduling of stream1 (sub-priority WRR0), stream2 (sub-priority WRR1), stream3 (sub-priority WRR1), and stream4 (sub-priority WRR3) above, the controller allocates 1000 us, 800 us, 800 us, and 400 us to stream1 through stream4 in the first scheduling period; if the controller receives a new task flow stream5 (sub-priority WRR1) during the first scheduling period, the time slice allocated to stream5 should be the same as that of stream2 (sub-priority WRR1), namely 800 us.

Optionally, if a scheduled task flow becomes blocked (for example, the scheduled AI model has multiple task flows and one task flow can be scheduled only after another task flow is scheduled), the controller can wait until the blocked task flow satisfies its execution condition (for example, after the other task flow is scheduled) before executing it. If the task flow does not satisfy its execution condition for an entire scheduling period, then after the time slices of the other task flows are used up or their AI tasks are scheduled, the next round of time slice allocation begins, and the task flow's remaining time slice from that scheduling period is not carried into the next round of allocation. If a blocked task flow carried its remaining time slice from every scheduling period into the next round, it would accumulate a large number of time slices, and when it was finally scheduled it would severely delay the execution of the other task flows.

Through this implementation, the real-time scheduling of AI tasks can be better guaranteed.

Optionally, if all AI tasks of a task flow that have reached the accelerator have been scheduled and executed and no subsequent AI tasks have arrived at the accelerator yet, the controller can schedule other task flows first and continue executing the task flow after its subsequent AI tasks arrive. If no subsequent AI task of the task flow arrives for an entire scheduling period, then after the time slices of the other task flows are used up or their AI tasks are scheduled, the next round of time slice allocation begins, and the task flow's remaining time slice from that scheduling period is not carried into the next round of allocation. If the task flow carried its remaining time slice from every scheduling period into the next round, it would accumulate a large number of time slices, and when its AI tasks arrived and it was scheduled, it would severely delay the execution of the other task flows.

Through this implementation, the real-time scheduling of AI tasks can be better guaranteed.
Optionally, while the controller of the accelerator is scheduling a third AI task, if the accelerator's execution time for the third AI task exceeds a preset duration, the controller can stop the operation logic unit that is executing the third AI task and perform the scheduling again, that is, re-dispatch the third AI task to an operation logic unit for execution. The third AI task can be any AI task in the accelerator.

For example, in an automatic driving scenario, an AI model runs at the ms level and generally consists of dozens or hundreds of AI tasks, so under normal circumstances the execution time of most AI tasks should be far less than 1 ms. The preset duration can therefore be set to 1 ms, 2 ms, 3 ms, etc. (an example rather than a limitation). Once the execution time of an AI task exceeds the preset duration, the AI task's execution has proved abnormal, so the controller of the accelerator can re-dispatch the AI task to an operation logic unit for execution to resolve the abnormality, enabling the operation logic unit to output the correct execution result of the AI task faster.

If, just when the third AI task becomes abnormal, new AI tasks of a higher priority (such as the second priority) reach the accelerator (including a new task flow reaching the accelerator, or an existing task flow having new AI tasks reach the accelerator), the controller can immediately turn to dispatching the newly arrived AI tasks to the operation logic units, and re-execute the third AI task after the other newly arrived AI tasks are executed, thereby ensuring the real-time scheduling of the other AI tasks.

Through this implementation, the accelerator can resolve abnormalities in time while ensuring the real-time scheduling of other AI tasks.

It should be understood that the embodiments provided above in this application can be combined with each other to achieve different technical effects.

Referring to FIG. 5, a specific example of the task scheduling method of this application: in the example shown in FIG. 5, the controller divides task flow priorities into four levels, namely WRR (which can correspond to the first priority above), SP0, SP1, and SP2, where the priority order is WRR < SP2 < SP1 < SP0. WRR further includes five sub-priorities, WRR0, WRR1, WRR2, WRR3, and WRR4, where the sub-priority order is WRR0 > WRR1 > WRR2 > WRR3 > WRR4. Correspondingly, the time slice allocation relationship is SP0 > SP1 > SP2 > WRR0 > WRR1 > WRR2 > WRR3 > WRR4. The controller first schedules the task flows of SP0 based on time slice round-robin; after all SP0 task flows are scheduled, it schedules the task flows of SP1 based on time slice round-robin; after all SP0 and SP1 task flows are scheduled, it schedules the task flows of SP2 based on time slice round-robin; after all SP0, SP1, and SP2 task flows are scheduled, it schedules the task flows of WRR based on time slice round-robin. During scheduling, high-priority task flows can preempt low-priority ones; for example, SP1 can be preempted midway by SP0, SP2 can be preempted midway by SP0/SP1, and WRR can be preempted midway by SP0/SP1/SP2.
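The FIG. 5 ordering can be sketched as a selection loop; the queue contents and task flow names are hypothetical.

```python
LEVELS = ["SP0", "SP1", "SP2", "WRR"]   # high to low, as in FIG. 5

def pick_next(queues):
    """queues maps a level to its list of runnable task flows. Levels are
    served in strict priority order; within a level the list is rotated,
    standing in for time-slice round-robin among its flows."""
    for level in LEVELS:
        q = queues.get(level)
        if q:
            flow = q.pop(0)
            q.append(flow)              # rotate: round-robin in the level
            return level, flow
    return None

queues = {"SP0": [], "SP1": ["lane_detect"], "WRR": ["media_app"]}
print(pick_next(queues))   # ('SP1', 'lane_detect') -- SP1 runs before WRR
```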
In specific implementations, an APP in the processor can call the accelerator through API interfaces to perform the method provided by the embodiments of this application.

Several possible API interfaces are listed below:

1) CreateModelWithPriority(Model_t *pModel, int priority): creates an AI model (model) object; the input is the model's priority and the output is the model object;

2) CreateStreamWithPriority(Stream_t *pStream, int priority): creates a task flow (stream) object; the input is the stream's priority and the output is the stream object. If the stream is added to a model, the model's priority overwrites the stream's priority.

3) GetModelPriority(Model_t model, int *priority): queries the priority of a model; the input is the model object and the model's priority is returned.

4) GetStreamPriority(Stream_t stream, int *priority): queries the priority of a stream; the input is the stream object and the stream's priority is returned.

It should be understood that the above API interfaces are merely examples rather than limitations.
It can be understood that, to realize the functions in the above embodiments, the embodiments of this application further provide a task scheduling apparatus, which includes corresponding hardware structures and/or software modules for performing each of those functions. Those skilled in the art should readily appreciate that, in combination with the units and method steps of the examples described in the embodiments disclosed in this application, this application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application scenario and design constraints of the technical solution.

For example, referring to FIG. 6, the task scheduling apparatus 600 includes:

a transceiver module 601, configured to receive multiple first task flows, where the priority type of the multiple first task flows is a first priority, and the first priority includes a first sub-priority and a second sub-priority;

a processing module 602, configured to allocate a time slice to each first task flow according to the sub-priority of each of the multiple first task flows, where the first sub-priority is higher than the second sub-priority and the time slice allocated to the first task flow corresponding to the first sub-priority is larger than that allocated to the first task flow corresponding to the second sub-priority; and to schedule the AI tasks of all the first task flows among the multiple first task flows based on time slice round-robin.

Optionally, when scheduling the AI tasks of all the first task flows among the multiple first task flows based on time slice round-robin, the processing module 602 is specifically configured to: schedule the AI tasks of the first task flow of the first sub-priority; and when the time slice used by the first task flow of the first sub-priority is greater than or equal to its allocated time slice, or when the AI tasks of the first task flow of the first sub-priority have been scheduled, schedule the AI tasks of the first task flow of the second sub-priority.

Optionally, when scheduling the AI tasks of the first task flow of the first sub-priority, the processing module 602 is specifically configured to: determine the number N of data blocks corresponding to a first AI task according to first indication information carried by the first AI task in the first task flow, where N is a positive integer; if N = 1, dispatch the first AI task to one operation logic unit for processing; if N > 1, split the first AI task into N data blocks and dispatch the N data blocks to N different operation logic units for processing.

Optionally, the processing module 602 may also be configured to: when the AI tasks of at least one first-type task flow among the multiple first task flows remain unscheduled, and the time slice used by each first task flow in the at least one first-type task flow is greater than or equal to its allocated time slice, reallocate a time slice to each first task flow in the at least one first-type task flow according to its sub-priority; where, if any first task flow in the at least one first-type task flow used more than its allocated time slice in the previous round of scheduling, the time slice it overused in the previous round is deducted from its reallocated time slice.

Optionally, the transceiver module 601 is specifically configured to receive several task flows, including the multiple first task flows and at least one second task flow, where the priority type of the at least one second task flow is a second priority higher than the first priority. Correspondingly, the processing module 602 may also be configured to: before scheduling the AI tasks of all the first task flows based on time slice round-robin, schedule the AI tasks of the at least one second task flow; after all AI tasks of the at least one second task flow have been scheduled, schedule the AI tasks of all the first task flows based on time slice round-robin.

Optionally, the processing module 602 may also be configured to: in the process of scheduling the AI tasks of all the first task flows based on time slice round-robin, if it determines that the transceiver module 601 receives at least one second task flow while a second AI task among the multiple first task flows is being scheduled, then after the second AI task is executed, or after the data blocks of the second AI task that have already been dispatched to the operation logic units are executed, suspend the scheduling of the AI tasks of the multiple first task flows and start scheduling the AI tasks of the at least one second task flow, where the priority type of the at least one second task flow is the second priority, which is higher than the first priority.

Optionally, the first task flow is a task flow of a first AI inference model corresponding to the in-vehicle occupant monitoring function or the entertainment function in an automatic driving system; the second task flow is a task flow of a second AI inference model corresponding to the obstacle detection function, the lane line detection function, or the driver detection function in the automatic driving system.

It should be understood that all relevant content of the steps involved in the above method embodiments can be cited in the functional descriptions of the corresponding functional modules; details are not repeated here.
以下介绍该任务调度装置的几种可能的产品形态:
基于同一技术构思,本申请实施例还提供一种加速器,包括可编程逻辑电路和/或程序指令,当所述加速器运行时,用于执行上述方法实施例中所描述的方法。
基于同一技术构思,本申请实施例还提供一种计算机可读存储介质,该可读存储介质用于存储指令,当所述指令被执行时,使上述方法实施例中所描述的方法被实现。
基于同一技术构思,本申请实施例还提供一种包含指令的计算机程序产品,当该计算机程序产品在计算机上运行时,使得计算机执行上述方法实施例中所描述的方法。
A person skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
This application is described with reference to flowcharts and/or block diagrams of the method, the device (system), and the computer program product according to this application. It should be understood that each procedure and/or block in the flowcharts and/or block diagrams, and any combination of procedures and/or blocks in the flowcharts and/or block diagrams, may be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or the other programmable data processing device produce an apparatus for implementing the functions specified in one or more procedures of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more procedures of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps is performed on the computer or the other programmable device to produce computer-implemented processing, and the instructions executed on the computer or the other programmable device provide steps for implementing the functions specified in one or more procedures of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, a person skilled in the art can make various modifications and variations to this application without departing from the protection scope of this application. Thus, if these modifications and variations of this application fall within the scope of the claims of this application and their technical equivalents, this application is also intended to include these modifications and variations.

Claims (17)

  1. A task scheduling method, characterized in that the method comprises:
    receiving a plurality of first task flows, wherein a priority type of the plurality of first task flows is a first priority, and the first priority comprises a first sub-priority and a second sub-priority;
    allocating a time slice to each first task flow based on the sub-priority of each of the plurality of first task flows, wherein the first sub-priority is higher than the second sub-priority, and the time slice allocated to a first task flow of the first sub-priority is larger than the time slice allocated to a first task flow of the second sub-priority; and
    scheduling, in a round-robin time-slice manner, artificial intelligence (AI) tasks of all of the plurality of first task flows.
  2. The method according to claim 1, characterized in that the scheduling, in a round-robin time-slice manner, AI tasks of all of the plurality of first task flows comprises:
    scheduling AI tasks of the first task flow of the first sub-priority; and
    when the time slice used by the first task flow of the first sub-priority is greater than or equal to its allocated time slice, or when all AI tasks of the first task flow of the first sub-priority have been scheduled, scheduling AI tasks of the first task flow of the second sub-priority.
  3. The method according to claim 2, characterized in that the scheduling AI tasks of the first task flow of the first sub-priority comprises:
    determining, based on first indication information carried in a first AI task in the first task flow, a quantity N of data blocks corresponding to the first AI task, wherein N is a positive integer;
    if N = 1, dispatching the first AI task to one arithmetic logic unit for processing; and
    if N > 1, splitting the first AI task into N data blocks, and dispatching the N split data blocks to N different arithmetic logic units for processing.
  4. The method according to claim 2 or 3, characterized in that the method further comprises:
    when AI tasks of at least one remaining first-type task flow among the plurality of first task flows have not all been scheduled, and the time slice used by each first task flow in the at least one first-type task flow is greater than or equal to its allocated time slice, re-allocating a time slice to each first task flow in the at least one first-type task flow based on its sub-priority;
    wherein, if any first task flow in the at least one first-type task flow used more than its allocated time slice in the previous scheduling round, the excess time used in the previous round is deducted from the time slice newly allocated to that first task flow.
  5. The method according to any one of claims 1 to 4, characterized in that
    the receiving a plurality of first task flows comprises:
    receiving several task flows, wherein the several task flows comprise the plurality of first task flows and at least one second task flow, a priority type of the at least one second task flow is a second priority, and the second priority is higher than the first priority; and
    the method further comprises:
    before the scheduling, in a round-robin time-slice manner, AI tasks of all of the plurality of first task flows, scheduling AI tasks in the at least one second task flow; and, after all AI tasks in the at least one second task flow have been scheduled, scheduling, in the round-robin time-slice manner, AI tasks of all of the plurality of first task flows.
  6. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
    in the process of scheduling, in a round-robin time-slice manner, AI tasks of all of the plurality of first task flows, if at least one second task flow is received while a second AI task in the plurality of first task flows is being scheduled, then, after the second AI task finishes executing, or after data blocks of the second AI task that have already been dispatched to arithmetic logic units finish executing, pausing the scheduling of AI tasks in the plurality of first task flows and starting to schedule AI tasks in the at least one second task flow;
    wherein a priority type of the at least one second task flow is a second priority, and the second priority is higher than the first priority.
  7. The method according to claim 5 or 6, characterized in that the first task flow is a task flow of a first AI inference model, and the first AI inference model corresponds to an in-cabin occupant monitoring function or an entertainment function in an autonomous driving system; and
    the second task flow is a task flow of a second AI inference model, and the second AI inference model corresponds to an obstacle detection function, a lane-line detection function, or a driver detection function in the autonomous driving system.
  8. A task scheduling apparatus, characterized in that the apparatus comprises:
    a transceiver module, configured to receive a plurality of first task flows, wherein a priority type of the plurality of first task flows is a first priority, and the first priority comprises a first sub-priority and a second sub-priority; and
    a processing module, configured to: allocate a time slice to each first task flow based on the sub-priority of each of the plurality of first task flows, wherein the first sub-priority is higher than the second sub-priority, and the time slice allocated to a first task flow of the first sub-priority is larger than the time slice allocated to a first task flow of the second sub-priority; and schedule, in a round-robin time-slice manner, AI tasks of all of the plurality of first task flows.
  9. The apparatus according to claim 8, characterized in that, when scheduling, in a round-robin time-slice manner, AI tasks of all of the plurality of first task flows, the processing module is specifically configured to:
    schedule AI tasks of the first task flow of the first sub-priority; and
    when the time slice used by the first task flow of the first sub-priority is greater than or equal to its allocated time slice, or when all AI tasks of the first task flow of the first sub-priority have been scheduled, schedule AI tasks of the first task flow of the second sub-priority.
  10. The apparatus according to claim 9, characterized in that, when scheduling AI tasks of the first task flow of the first sub-priority, the processing module is specifically configured to:
    determine, based on first indication information carried in a first AI task in the first task flow, a quantity N of data blocks corresponding to the first AI task, wherein N is a positive integer;
    if N = 1, dispatch the first AI task to one arithmetic logic unit for processing; and
    if N > 1, split the first AI task into N data blocks, and dispatch the N split data blocks to N different arithmetic logic units for processing.
  11. The apparatus according to claim 9 or 10, characterized in that the processing module is further configured to:
    when AI tasks of at least one remaining first-type task flow among the plurality of first task flows have not all been scheduled, and the time slice used by each first task flow in the at least one first-type task flow is greater than or equal to its allocated time slice, re-allocate a time slice to each first task flow in the at least one first-type task flow based on its sub-priority;
    wherein, if any first task flow in the at least one first-type task flow used more than its allocated time slice in the previous scheduling round, the excess time used in the previous round is deducted from the time slice newly allocated to that first task flow.
  12. The apparatus according to any one of claims 8 to 11, characterized in that
    the transceiver module is specifically configured to receive several task flows, wherein the several task flows comprise the plurality of first task flows and at least one second task flow, a priority type of the at least one second task flow is a second priority, and the second priority is higher than the first priority; and
    the processing module is further configured to:
    before scheduling, in a round-robin time-slice manner, AI tasks of all of the plurality of first task flows, schedule AI tasks in the at least one second task flow; and, after all AI tasks in the at least one second task flow have been scheduled, schedule, in the round-robin time-slice manner, AI tasks of all of the plurality of first task flows.
  13. The apparatus according to any one of claims 8 to 11, characterized in that the processing module is further configured to:
    in the process of scheduling, in a round-robin time-slice manner, AI tasks of all of the plurality of first task flows, if it is determined, while a second AI task in the plurality of first task flows is being scheduled, that the transceiver module has received at least one second task flow, then, after the second AI task finishes executing, or after data blocks of the second AI task that have already been dispatched to arithmetic logic units finish executing, pause the scheduling of AI tasks in the plurality of first task flows and start to schedule AI tasks in the at least one second task flow;
    wherein a priority type of the at least one second task flow is a second priority, and the second priority is higher than the first priority.
  14. The apparatus according to claim 12 or 13, characterized in that the first task flow is a task flow of a first AI inference model, and the first AI inference model corresponds to an in-cabin occupant monitoring function or an entertainment function in an autonomous driving system; and
    the second task flow is a task flow of a second AI inference model, and the second AI inference model corresponds to an obstacle detection function, a lane-line detection function, or a driver detection function in the autonomous driving system.
  15. An accelerator, characterized by comprising a programmable logic circuit and/or program instructions, wherein, when the accelerator runs, the accelerator is configured to implement the method according to any one of claims 1 to 7.
  16. A computer-readable storage medium, characterized in that the readable storage medium is configured to store instructions, and when the instructions are executed, the method according to any one of claims 1 to 7 is implemented.
  17. A computer program product comprising instructions, characterized in that, when the computer program product runs on a computer, the computer is caused to perform the method according to any one of claims 1 to 7.
PCT/CN2021/105779 2021-07-12 2021-07-12 一种任务调度方法和装置 WO2023283767A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
PCT/CN2021/105779 WO2023283767A1 (zh) 2021-07-12 2021-07-12 一种任务调度方法和装置
EP21949564.5A EP4357920A1 (en) 2021-07-12 2021-07-12 Task scheduling method and apparatus
CN202180044331.3A CN115812197A (zh) 2021-07-12 2021-07-12 一种任务调度方法和装置
US18/408,906 US20240143393A1 (en) 2021-07-12 2024-01-10 Task scheduling method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/105779 WO2023283767A1 (zh) 2021-07-12 2021-07-12 一种任务调度方法和装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/408,906 Continuation US20240143393A1 (en) 2021-07-12 2024-01-10 Task scheduling method and apparatus

Publications (1)

Publication Number Publication Date
WO2023283767A1 true WO2023283767A1 (zh) 2023-01-19

Family

ID=84919829

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/105779 WO2023283767A1 (zh) 2021-07-12 2021-07-12 一种任务调度方法和装置

Country Status (4)

Country Link
US (1) US20240143393A1 (zh)
EP (1) EP4357920A1 (zh)
CN (1) CN115812197A (zh)
WO (1) WO2023283767A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103399798A (zh) * 2013-07-29 2013-11-20 深圳市汇川控制技术有限公司 Plc的多任务控制方法和装置
CN104571042A (zh) * 2014-12-31 2015-04-29 深圳市进林科技有限公司 智能汽车的整车控制方法及整车控制器
CN108920261A (zh) * 2018-05-23 2018-11-30 中国航天***科学与工程研究院 一种适于大规模并行数据处理任务的两级自适应调度方法
CN111447152A (zh) * 2020-03-16 2020-07-24 Oppo广东移动通信有限公司 子流资源调度方法、装置、终端设备和存储介质


Also Published As

Publication number Publication date
EP4357920A1 (en) 2024-04-24
CN115812197A (zh) 2023-03-17
US20240143393A1 (en) 2024-05-02

Similar Documents

Publication Publication Date Title
US9135077B2 (en) GPU compute optimization via wavefront reforming
CN111736987B (zh) 一种基于gpu空间资源共享的任务调度方法
US8918784B1 (en) Providing service quality levels through CPU scheduling
JP5722327B2 (ja) Gpuワークのハードウエアベースでのスケジューリング
CN110796588A (zh) 同时计算和图形调度
US8963933B2 (en) Method for urgency-based preemption of a process
CN112465129A (zh) 片内异构人工智能处理器
US8743131B2 (en) Course grain command buffer
US9104491B2 (en) Batch scheduler management of speculative and non-speculative tasks based on conditions of tasks and compute resources
US20200167191A1 (en) Laxity-aware, dynamic priority variation at a processor
US8933942B2 (en) Partitioning resources of a processor
WO2020227582A2 (en) Method and apparatus for scheduling matrix operations in digital processing systems
CN111597044A (zh) 任务调度方法、装置、存储介质及电子设备
CN112925616A (zh) 任务分配方法、装置、存储介质及电子设备
US20220207643A1 (en) Implementing heterogenous wavefronts on a graphics processing unit (gpu)
WO2023283767A1 (zh) 一种任务调度方法和装置
CN116724294A (zh) 一种任务分配方法及装置
Elliott et al. Gpusync: Architecture-aware management of gpus for predictable multi-gpu real-time systems
EP4254184A1 (en) Processing engine mapping for time-space partitioned processing systems
US9760969B2 (en) Graphic processing system and method thereof
Sun et al. Real-time scheduling upon a host-centric acceleration architecture with data offloading
US9632834B2 (en) Assigning priorities to computational work streams by mapping desired execution priorities to device priorities
WO2023230909A1 (zh) 调度方法及相关装置
US20230305887A1 (en) Processing engine scheduling for time-space partitioned processing systems
US20230111174A1 (en) Systems and methods for regulating memory utilization for coprocessors

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21949564; Country of ref document: EP; Kind code of ref document: A1)
WWE Wipo information: entry into national phase (Ref document number: 2021949564; Country of ref document: EP)
ENP Entry into the national phase (Ref document number: 2021949564; Country of ref document: EP; Effective date: 20240121)
NENP Non-entry into the national phase (Ref country code: DE)