WO2014101561A1 - Method and device for implementing multi-application parallel processing on single processor - Google Patents

Method and device for implementing multi-application parallel processing on single processor Download PDF

Info

Publication number
WO2014101561A1
WO2014101561A1 (application PCT/CN2013/085906)
Authority
WO
WIPO (PCT)
Prior art keywords
thread
hardware
ithread
parallel
processing
Prior art date
Application number
PCT/CN2013/085906
Other languages
French (fr)
Chinese (zh)
Inventor
梅思行
Original Assignee
深圳中微电科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳中微电科技有限公司 filed Critical 深圳中微电科技有限公司
Publication of WO2014101561A1 publication Critical patent/WO2014101561A1/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3851Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming

Definitions

  • the present invention relates to the field of processors, and more particularly to a method and apparatus for implementing multi-application parallel processing on a single processor.
  • GPU: Graphic Processing Unit
  • GPGPU: General Purpose Graphic Processing Unit
  • these two different types of operations are mutually exclusive when run on the GPU and cannot run at the same time: the GPU cannot execute a graphics shading program and a non-shading compute program simultaneously, which hurts the efficiency of both types of operations whenever both are needed. In that case, getting better graphics results from better GPU graphics rendering requires increasing the size and capability of the parallel processing array inside the GPU, which adds to the complexity and cost of the system.
  • the technical problem to be solved by the present invention is, in view of the drawback that two or more different types of applications cannot run at the same time, to provide a method and apparatus for implementing multi-application parallel processing on a single processor that allow two or more different types of applications to run simultaneously.
  • the technical solution adopted by the present invention to solve this problem is to construct a method for implementing multi-application parallel processing on a single processor, the single processor being provided with a plurality of parallel processing units, the method comprising the following steps:
  • step B) separately determining whether each of the plurality of applications is an image rendering application; if so, performing step D); otherwise, judging it to be a computing application and performing step C);
  • the method further includes the following steps:
  • step E) further includes: configuring the task to run on an idle processing unit controlled by the hardware controller.
  • the homogeneous parallel programming APIs include pthread and openMP, and the threads they process are pthread threads;
  • the heterogeneous parallel programming APIs include openCL, and the threads it processes are GPGPU threads;
  • the GPU driver includes openGL, and the threads it processes are openGL threads.
  • step D) further includes:
  • the hardware thread control unit forms the ithread call instructions into a program queue according to their receive time, then calls and prepares the ithreads;
  • the ithreads run, in their queue order in the hardware thread control unit, in the idle multi-way parallel hardware thread slots of the processing units controlled by the hardware manager; an ithread is a hardware thread, and ithreads include threads requiring hardware acceleration in an image engine, a DSP, or/and a general-purpose image processor.
  • step D) further includes:
  • step D01) determining whether there is a valid, unexecuted hardware thread in the hardware thread control unit; if yes, executing step D02); otherwise, performing step D03);
  • step D02) removing the currently idle multi-way parallel hardware thread time slot from the system thread management unit, disabling that slot's thread timer interrupt, and placing the idle slot under the control of the hardware thread control unit; step D03) waiting, and returning the idle status of the hardware thread slots to the system thread management unit.
  • step D) further includes the following steps:
  • when an ithread finishes executing, or enters a wait for an event before its execution can continue, the ithread exits the hardware thread slot it was running in and enables that slot's thread timer interrupt;
  • the hardware thread control unit detects whether the valid state of an ithread in its run queue has been cleared; if so, it clears the ithread; otherwise, the ithread is maintained.
  • the present invention also relates to an apparatus for carrying out the above method, wherein a plurality of parallel processing units are disposed in the single processor, the apparatus comprising:
  • an application allocation unit, for respectively assigning multiple tasks to multiple processing units operating as SMP cores;
  • an application determining unit, configured to respectively determine whether the tasks are image rendering applications; a computing application processing unit, configured to process at least one thread generated by a task using a homogeneous parallel programming API and to configure the processed threads to run on processing units acting as SMP cores; a graphics acceleration operation unit, configured to have a task generate at least one shader thread to the hardware thread control unit and to start rendering, via the GPU driver, on the processing units controlled by the hardware manager; the processing units controlled by the hardware management unit are acquired by the hardware management unit from the system.
  • a GPGPU thread processing unit, for judging whether a processing unit controlled by the hardware controller is idle while tasks remain in the task queue; if so, the heterogeneous parallel programming API is used to process at least one thread generated by the task, which is configured to run on the idle processing unit controlled by the hardware controller.
  • the graphics acceleration processing unit further includes:
  • a call instruction generation module, for causing the thread to generate its own ithread call instruction to the hardware thread control unit;
  • a queue forming module, configured to cause the hardware thread control unit to form the ithread calls into a queue according to their receive time, and to call and prepare the ithreads;
  • a thread allocation module, for causing the ithreads to run, in their queue order in the hardware thread control unit, in the idle multi-way parallel hardware thread time slots of the processing units controlled by the hardware manager;
  • a thread interrupt module, for causing an ithread, when it finishes executing or enters a wait for an event before its execution can continue, to exit the hardware thread slot it was running in and to enable that slot's thread timer interrupt;
  • a thread clearing module, configured to cause the hardware thread control unit to detect whether the valid state of an ithread in its run queue has been cleared; if yes, clearing the ithread; otherwise, maintaining the ithread;
  • the ithread is a hardware thread, and ithreads include threads requiring hardware acceleration in an image engine, a DSP, or/and a general-purpose image processor.
  • implementing the method and apparatus for multi-application parallel processing on a single processor of the present invention has the following beneficial effects: different types of applications are processed uniformly, applications related to graphics processing are allocated to run on the processing units controlled by the hardware thread control unit, and, at the same time, the processing units not controlled by that unit continue to process the threads generated by computing applications allocated by the system's thread control unit, so that two or more different types of applications can be processed simultaneously on one processor.
  • FIG. 1 is a flow chart of a method in an embodiment of a method for implementing multi-application parallel processing on a single processor of the present invention
  • FIG. 2 is a flow chart of the hardware thread control unit of the embodiment implementing a GPU application
  • Figure 3 is a schematic structural view of the device in the embodiment
  • FIG. 4 is a schematic structural diagram of a processor of the embodiment.
  • the method includes the following steps:
  • Step S11 multiple applications form a queue, and are ready to be distributed to run on multiple parallel processing units:
  • multiple applications are respectively formed into task queues and ready to be distributed to multiple parallel processing units to run in parallel, for the next step.
  • the processor involved is a single processor having a plurality of parallel processing units, called a unified processing unit (UPU); the specific structure of its core portion is shown in FIG. 4.
  • the processor includes a plurality of parallel processing units, which may serve as symmetric multi-processing (SMP) homogeneous parallel processors, or as heterogeneous parallel processors for a GPU's programmable unified shading or for GPGPU.
  • in this way, load balancing can be achieved: the system forms the various applications that currently need processing into a task queue and prepares to configure these tasks to run in parallel on the parallel processing units; the tasks may be of only one type or of multiple types.
  • initially, all the processing units are configured as SMP cores, and when a corresponding task is found, an idle SMP core is taken over by the corresponding mechanism.
  • the hardware thread controller then begins to control some SMP cores (processing units) as hardware thread slots.
  • Step S12, determining whether the task is a graphics processing task: in this step, it is determined whether the task to be processed is a graphics processing task; if yes, step S14 is executed; otherwise, step S13 is performed. Basically, a graphics processing task is usually realized through graphics rendering.
  • step S12 is executed once for each task, and each execution routes the task to be processed by step S13 or step S14.
  • Step S13, processing the task using the homogeneous parallel programming API and assigning the resulting threads to parallel processing units to run:
  • since it has been determined in step S12 that the task is not graphics processing, the task should accordingly be a computing application, such as CPU data processing or control in the traditional sense.
  • the application is processed using a homogeneous parallel programming API, at least one thread is obtained, and the threads are allocated to run on the processing units acting as SMP cores; when the application completes, data or results are returned as for an ordinary application.
  • the homogeneous parallel programming API may be pthread or openMP, so the threads obtained from an application it processes are pthread threads, and these threads run on the SMP cores described above.
  • in this case, the OS allocates threads directly to the parallel multi-hardware-thread processing time slots. This is done through the thread run queue, not through the THDC (the hardware thread management unit); these threads run as CPU threads and can be observed and controlled by the OS (as can the time slots running them). Threads created by the traditional pthread API go to the OS's run queue, and the OS allocates them from that queue directly to the parallel multi-hardware-thread processing time slots described above. At this point, these multi-hardware-thread processing slots behave like the "cores" in SMP.
  • Step S14, obtaining at least one thread from the task, the hardware thread control unit obtaining control of processing units from the system:
  • the GPU driver then starts rendering; first, in this step, a graphics rendering thread is obtained from the task, and the thread is assigned to the hardware thread control unit.
  • the hardware thread control unit requests control of one or more processing units from the system, so that those processing units come under the control of the hardware thread control unit and operate as one or more hardware processing time slots.
  • a thread allocated to the hardware thread control unit in this way can also be called an ithread, which is a hardware thread.
  • Step S15, the hardware thread control unit runs the thread on the processing unit: in this step, the thread obtained in step S14 is allocated to a processing unit under the control of the hardware thread control unit. It should be noted that in this embodiment steps S14 and S15 are performed continuously; in this sense, they may also be combined into one step. More details on steps S14 and S15 are given later.
  • step S13 and steps S14, S15 are performed in parallel (provided, of course, that both task types exist). For example, a previous application that is graphics processing is assigned through steps S14 and S15 to the hardware thread control unit and is running, while the current application, a CPU or GPGPU program, is allocated to the SMP cores through step S13.
  • because step S13 and steps S14, S15 are parallel, two different types of applications run simultaneously on a single processor without interfering with each other.
  • the method further includes the following step: determining whether a processing unit controlled by the hardware controller is idle while the task (GPU or heterogeneous GPGPU) processing queue still has tasks pending; if yes, the task is assigned to run on that processing unit controlled by the hardware manager; otherwise, this step is exited.
  • in this step, at least one thread generated by the computing application is processed using a heterogeneous parallel programming API and configured to run on an idle processing unit controlled by the hardware controller.
  • an ithread runs on the THDC (the hardware thread management unit) as described above through a user API. Initially, the caller is usually in kernel mode (administrator mode), and creating an ithread creates a thread into the THDC command queue. In general, THDC threads have a higher priority than OS threads.
  • ithreads can be created either by a driver running on a processor in kernel mode or directly by an application running on a processor in user mode.
  • in the former case, ithreads are created directly into the THDC, and once uploaded these threads run as embedded programs with no system intervention; in the latter case, an ithread is built through the kernel: a virtual pthread runs in the run queue, and that pthread then creates the real ithread into the THDC. This extra step creates only one record in the OS, whose TLB exception handler can handle TLB exceptions in user mode; the ithread is generated as a co-processing thread in a parallel multi-hardware-thread processing time slot of the MVP.
  • when the kernel's scheduler wants to allocate any ready thread in its run queue, as an operating system thread, to a parallel multi-hardware-thread processing time slot (typically, when a thread processing time slot is idle), it always checks first, through the traditional scheduling mechanism, whether there is a ready thread in the THDC. If a prepared thread in the THDC is waiting, the system's scheduler withdraws from the original hardware thread processing time slot and no longer places any new system thread (CPU thread) into it. The important point is that the system scheduler closes that slot's timer interrupt before exiting, allowing the ithread to get full control of the thread processing time slot without timer interrupts; the timer interrupt is enabled again only when the ithread exits.
  • the THDC then takes the idle hardware thread time slot and uses it to run a prepared ithread. When an ithread completes, or must wait for some event before continuing to run, the ithread exits the corresponding hardware thread processing time slot; when the valid state of an ithread is cleared, the ithread exits the THDC. A CPU thread yields to a prepared ithread that is discovered when the CPU thread is ready to run and the system scheduler checks the THDC state.
  • all ithreads are eventually created into the THDC, regardless of whether they were created in kernel mode or in user mode.
  • Figure 2 illustrates, from the perspective of a parallel hardware thread time slot, how the slot is allocated to the CPU thread control unit or to the THDC; the flow includes the following steps:
  • Step S201, timer interrupt: in this step, the hardware thread time slot takes a timer interrupt. As described above, a timer interrupt is executed when the system starts running or when a thread running in the slot exits. That is to say, on a timer interrupt, the hardware thread slot under the control of the CPU system is ready to start running a new thread.
  • Step S202, is a thread waiting in the run queue? If yes, go to step S203; otherwise, jump to step S205. In this step, the run queue refers to the run queue (usually the task queue) in the system scheduler.
  • Step S203, restoring the environment:
  • the usual thread context restore is executed, that is, the running environment, configuration, and parameter settings of the thread are restored into the thread.
  • the thread in this step is a CPU thread.
  • Step S204, running the waiting thread: in this step, the thread runs in the hardware thread time slot; when the thread completes or exits, the process returns to step S201.
  • Step S205, is an ithread waiting in the THDC? If yes, step S206 is performed; otherwise, the process goes to step S209.
  • Step S206, the thread slot is removed from the system: in this step, since it was determined in step S205 that there are valid hardware threads waiting to run in the THDC, the idle hardware thread time slot (interrupted by the timer) is to be controlled by the THDC so that it runs these waiting hardware threads. To achieve this, the thread time slot must first be removed from the control of the system and its control handed to the THDC. So in this step, the hardware time slot is removed from the system.
  • Step S207, disabling the timer interrupt: in this step, as the hardware thread time slot is removed from the system, the timer interrupt of that slot is turned off, so that no timer interrupt occurs in the slot while the hardware thread is running.
  • Step S208, the time slot exits: in this step, the hardware thread time slot exits the system.
  • Step S209, CPU-idle thread: this step occurs when no hardware thread is waiting to run in the THDC, that is, the whole system has neither a traditional CPU thread waiting nor a hardware thread waiting to run; in this case, the hardware thread time slot calls a CPU-idle thread, indicating that no new thread needs to be processed, and returns to step S201.
  • Step S210, THDC upload: in this step, the THDC calls the hardware thread program, processes the called hardware thread to obtain an executable, and uploads the executable into the hardware thread time slot.
  • Step S211, ithread runs: the ithread (i.e., the hardware thread) runs in the hardware thread slot described above.
  • Step S212, is the thread waiting? Determine whether the ithread is waiting; if yes, return to step S211; otherwise, execute step S213.
  • Step S213, the time slot exits: in this step, the hardware thread time slot exits the THDC.
  • Step S214, enabling the timer interrupt: in this step, the timer interrupt of the hardware thread slot is enabled, and the process returns to step S201. Specifically, because the hardware thread has finished running, the hardware thread slot exits the THDC and enables the timer interrupt; that is, the time slot is moved back to the system.
  • an apparatus for implementing the foregoing method includes: an application allocating unit 31, an application determining unit 32, a computing application processing unit 33, a graphics acceleration computing unit 34, and a GPGPU thread processing unit 35;
  • the application allocating unit 31 is configured to respectively allocate a plurality of tasks to a plurality of processing units that are SMP cores;
  • the application determining unit 32 is configured to respectively determine whether the plurality of tasks are image rendering tasks;
  • the computing application processing unit 33 is configured to process at least one thread generated by the task using a homogeneous parallel programming API (e.g., pthread or openMP), and to configure the processed threads to run on processing units acting as SMP cores;
  • the graphics acceleration computing unit 34 is used to have a graphics rendering task generate at least one shading thread to the hardware thread control unit, and to start rendering, via the GPU driver, on the processing units controlled by the hardware manager;
  • the processing units controlled by the hardware management unit are acquired by the hardware management unit from the system; the GPGPU thread processing unit 35 judges whether a processing unit controlled by the hardware controller is idle while tasks remain queued and, if so, uses the heterogeneous parallel programming API to process at least one thread generated by the task and runs it on the idle unit.
  • the graphics acceleration processing unit 34 further includes: a call instruction generation module 341, a queue formation module 342, a thread allocation module 343, a thread interruption module 344, and a thread clearing module 345. The call instruction generation module 341 is configured to cause the thread to generate its own ithread call instruction to the hardware thread control unit; the queue formation module 342 is configured to cause the hardware thread control unit to form the ithread call instructions into a program queue according to their receive time, and to call and prepare the ithreads; the thread allocation module 343 is configured to cause the ithreads to run, in their queue order in the hardware thread control unit, in the idle multi-way parallel hardware thread slots of the processing units controlled by the hardware manager; the thread interrupt module 344 is configured so that when an ithread finishes executing or enters a wait for an event before its execution can continue, the ithread exits the hardware thread slot it was running in and enables that slot's thread timer interrupt; the thread clearing module 345 is used to cause the hardware thread control unit to detect whether the valid state of an ithread in its run queue has been cleared; if yes, the ithread is cleared; otherwise, the ithread is maintained.
  • the processor includes a plurality of parallel processor hardware cores (i.e., processing units, labeled 601, 602, 603, 604 in FIG. 4), a system thread management unit 61 for managing system threads in the processor and assigning those threads to the processor hardware cores, and a hardware thread management unit 62 for receiving and managing hardware threads generated during operation, allocating the hardware threads to idle processor hardware cores, and running them in coprocessor thread mode (the harmony scheduler in FIG. 4, that is, the THDC described above); the hardware thread management unit 62 is coupled to each of the plurality of parallel processor cores (labeled 601, 602, 603, 604 in FIG. 4). It is worth mentioning that the four cores shown in FIG. 4 are exemplary; there may actually be 2, 3, 4, 6, or more.
  • the hardware thread management unit 62 receives ithread calls over first data lines 621 connected to it (these first data lines 621 are labeled as ithread calls in FIG. 4); the hardware thread management unit 62 also sends each called and prepared thread via second data lines 622 (also labeled thread_launch in FIG. 4) to the plurality of processor hardware cores to run; the hardware thread management unit also transfers the state of a called thread to the system thread control unit via a third data line 623.
  • the plurality of processor hardware cores are also connected to the system thread control unit 61 through respective fourth data lines 63; the fourth data lines 63 are labeled as pthread/ithread_user_calls in FIG. 4, and each hardware core has a fourth data line connected to the system thread control unit 61.
  • the plurality of processor hardware cores and the system thread control unit 61 are also connected by respective timer interrupt request signal lines for transmitting the hardware cores' timer interrupt signals; each hardware core has its own timer interrupt request signal line.
  • these signal lines are respectively labeled timer0_intr, timer1_intr, timer2_intr, and timer3_intr.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

The present invention relates to a method for implementing multi-application parallel processing, such as GPU and GPGPU applications, on a single processor. The method comprises the following steps: respectively forming multiple applications into queues and preparing to allocate them to multiple parallel processing units to run in parallel; respectively determining whether each of the multiple applications is an image rendering application; if yes, the application generates at least one shading thread to a hardware thread control unit and starts rendering, via a GPU driver, on a processing unit controlled by a hardware manager; otherwise, a homogeneous parallel programming API is used to process the at least one thread generated by the application, and the processed thread is configured to run on a processing unit used as an SMP core. The present invention further relates to a device for implementing the method. Implementing the method and device for multi-application parallel processing on a single processor of the present invention has the following beneficial effect: two or more different types of applications can be processed simultaneously on one processor.

Description

Method and device for implementing multi-application parallel processing on a single processor

Technical field

The present invention relates to the field of processors, and more particularly to a method and apparatus for implementing multi-application parallel processing on a single processor.

Background art

Traditionally, a GPU (graphic processing unit) with GPGPU (general purpose graphic processing unit) capability can handle typical GPU rendering operations through a programmable shader, and can also handle non-rendering operations through a heterogeneous parallel programming API. However, these two different types of operations are mutually exclusive when run on the GPU and cannot run at the same time: the GPU cannot execute a graphics shading program and a non-shading compute program simultaneously, which hurts the efficiency of both types of operations whenever both are needed. In that case, getting better graphics results from better GPU graphics rendering requires increasing the size and capability of the parallel processing array inside the GPU, which adds to the complexity and cost of the system.
发明内容 Summary of the invention
本发明要解决的技术问题在于,针对现有技术的上述两种或多种不同类型 的应用不能同时运行的缺陷,提供一种能够使两种或多种不同类型的应用同时 运行的单个处理器上实现多应用并行处理的方法及装置。  The technical problem to be solved by the present invention is to provide a single processor capable of simultaneously running two or more different types of applications, in view of the drawback that the above two or more different types of applications cannot be operated simultaneously. A method and apparatus for implementing multi-application parallel processing.
本发明解决其技术问题所采用的技术方案是:构造一种在单个处理器上实 现多应用并行处理的方法, 所述单个处理器中设置有多个并行的处理单元, 所 述方法包括如下步骤:  The technical solution adopted by the present invention to solve the technical problem is to construct a method for implementing multi-application parallel processing on a single processor, wherein the single processor is provided with a plurality of parallel processing units, and the method includes the following steps :
A )分别将多个应用形成任务队列并准备分配到所述多个并行的处理 单元并行运行;  A) respectively forming a plurality of applications into a task queue and preparing to be distributed to the plurality of parallel processing units to run in parallel;
B )分别判断所述多个应用是否图像渲染应用, 如是, 执行步骤 D ); 否则, 判断为计算应用执行步骤 C );  B) separately determining whether the plurality of applications are image rendering applications, and if so, performing step D); otherwise, determining that the computing application performs step C);
C )使用同质并行编程 API处理所述应用产生的至少一个线程, 并将 所述处理过的线程配置到作为 SMP核的处理单元上运行;  C) processing at least one thread generated by the application using a homogeneous parallel programming API, and configuring the processed thread to run on a processing unit that is an SMP core;
D ) 由所述应用产生至少一个着色线程到硬件线程控制单元, 并通过 GPU驱动在所述硬件管理器控制的处理单元上开始渲染; 所述硬件管理单元控制的处理单元由所述硬件管理单元向***取得。  D) generating, by the application, at least one shading thread to the hardware thread control unit, and starting rendering, via the GPU driver, on the processing units controlled by the hardware manager; the processing units controlled by the hardware management unit are obtained by the hardware management unit from the system.
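作为示意, 上述步骤 A )至 D )的分派逻辑可以用如下 Python 代码草图表示; 其中的函数名与数据结构均为示意性假设, 并非本发明的具体实现。  As an illustration, the dispatch logic of steps A) through D) above can be sketched in Python as follows; the function and data-structure names here are illustrative assumptions, not the actual implementation of the invention.

```python
def dispatch(tasks, num_smp_cores):
    """Sketch of steps A)-D): queue tasks, classify each one, and send
    rendering work to the hardware thread controller (THDC) while
    compute work goes to SMP cores. All names are illustrative."""
    smp_runqueues = [[] for _ in range(num_smp_cores)]  # step C) targets
    thdc_queue = []                                     # step D) targets
    for i, task in enumerate(tasks):                    # step A): task queue
        if task["is_render"]:                           # step B): classify
            thdc_queue.append(task["name"])             # step D): shading thread -> THDC
        else:
            # step C): homogeneous-API thread runs on an SMP core (round-robin here)
            smp_runqueues[i % num_smp_cores].append(task["name"])
    return smp_runqueues, thdc_queue
```

例如, 对一个渲染任务和一个计算任务调用 dispatch 时, 渲染任务进入 THDC 队列, 计算任务进入某个 SMP 运行队列。  For example, when dispatch is called on one rendering task and one computing task, the rendering task enters the THDC queue while the computing task enters an SMP run queue.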
更进一步地, 还包括如下步骤:  Further, the method further includes the following steps:
E )判断是否硬件控制器控制的处理单元空闲而任务处理队列中还有 任务待处理,如是,将所述任务分配到所述硬件管理器控制的处理单元上运行; 否则, 退出本步骤。  E) determining whether the processing unit controlled by the hardware controller is idle and there are still tasks to be processed in the task processing queue, and if so, assigning the task to the processing unit controlled by the hardware manager; otherwise, exiting this step.
更进一步地, 所述步骤 E ) 中进一步包括: 使用异质并行编程 API处理所述任务产生的至少一个线程, 并将其配置到所述硬件控制器控制的空闲处理单元上运行。  Further, the step E) further includes: processing at least one thread generated by the task using a heterogeneous parallel programming API, and configuring it to run on an idle processing unit controlled by the hardware controller.
更进一步地, 所述同质并行编程 API包括 pthread和 openMP, 经过其处理的线程为 pthread线程; 所述异质并行编程 API包括 openCL, 经过其处理的线程为 GPGPU线程; 所述 GPU驱动包括 openGL, 经过其处理的线程为 openGL线程。  Further, the homogeneous parallel programming API includes pthread and openMP, and threads processed by it are pthread threads; the heterogeneous parallel programming API includes openCL, and threads processed by it are GPGPU threads; the GPU driver includes openGL, and threads processed by it are openGL threads.
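作为示意, 下面用 Python 的 threading 模块(在多数平台上封装了 pthread)勾画同质并行编程 API 的用法; 该代码仅为假设性示例, 并非 pthread 或 openMP 的实际接口。  As an illustration, the following sketch uses Python's threading module (which wraps pthreads on most platforms) to show the flavor of a homogeneous parallel programming API; it is a hypothetical example, not the actual pthread or openMP interface.

```python
import threading

def parallel_sum(data, num_threads=4):
    """A compute (non-rendering) application split into homogeneous
    parallel threads, in the spirit of the pthread API (sketch only)."""
    chunk = max(1, len(data) // num_threads)
    partial = [0] * num_threads

    def worker(idx):
        start = idx * chunk
        stop = len(data) if idx == num_threads - 1 else min(len(data), start + chunk)
        partial[idx] = sum(data[start:stop])

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()   # analogous to pthread_create
    for t in threads:
        t.join()    # analogous to pthread_join
    return sum(partial)
```

每个 worker 线程对应一个运行在 SMP 核上的 pthread 线程; 主线程通过 join 等待全部结果。  Each worker thread corresponds to a pthread-style thread running on an SMP core; the main thread waits for all results via join.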
更进一步地, 所述步骤 D )进一步包括:  Further, the step D) further includes:
D1 )所述线程产生属于其自身的 ithread调用指令到硬件线程控制单 元;  D1) the thread generates an ithread call instruction belonging to itself to the hardware thread control unit;
D2 )所述硬件线程控制单元将所述 ithread的调用指令按照接收时间 形成其程序队列, 调用并准备所述 ithread;  D2) the hardware thread control unit forms the program queue of the ithread according to the receiving time, calls and prepares the ithread;
D3 )所述 ithread按照其在所述硬件线程控制单元中的队列顺序依次在所述硬件管理器控制的处理单元的、 空闲的多路并行硬件线程时隙中运行; 其中, 所述 ithread为硬件线程, 所述 ithread包括图像引擎、 DSP或/和通用图像处理器中要求硬件加速的线程。  D3) the ithreads run, in their queue order in the hardware thread control unit, in the idle multi-way parallel hardware thread slots of the processing units controlled by the hardware manager; wherein the ithread is a hardware thread, and the ithreads include threads in an image engine, a DSP, or/and a general-purpose image processor that require hardware acceleration.
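步骤 D1 )至 D3 )中的硬件线程控制单元可以用如下 Python 草图示意; 类名与数据结构均为假设, 仅用于说明按接收时间排队、 按队列顺序进入空闲时隙的行为。  The hardware thread control unit of steps D1) through D3) can be sketched in Python as follows; the class name and data structures are assumptions, used only to illustrate queuing by arrival time and launching into idle slots in queue order.

```python
from collections import deque

class HardwareThreadController:
    """Sketch of the hardware thread control unit (THDC): ithread call
    instructions are queued in order of arrival (step D2) and launched
    into idle parallel hardware thread slots in that order (step D3)."""

    def __init__(self, num_slots):
        self.queue = deque()              # program queue, ordered by arrival time
        self.slots = [None] * num_slots   # parallel hardware thread slots

    def enqueue(self, ithread):
        # step D1): a shading thread issues its own ithread call instruction
        self.queue.append(ithread)

    def launch(self):
        # step D3): fill idle slots strictly in queue order
        for i in range(len(self.slots)):
            if self.slots[i] is None and self.queue:
                self.slots[i] = self.queue.popleft()
```

当时隙数少于排队的 ithread 数时, 多出的 ithread 留在队列中等待下一个空闲时隙。  When there are fewer slots than queued ithreads, the surplus ithreads remain queued, waiting for the next idle slot.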
更进一步地, 所述步骤 D )进一步包括:  Further, the step D) further includes:
D01 )判断所述硬件线程控制单元中是否有有效且未执行完的硬件线 程, 如有, 执行步骤 D02 ); 否则, 执行步骤 D03 );  D01) determining whether there is a valid and unexecuted hardware thread in the hardware thread control unit, if yes, executing step D02); otherwise, performing step D03);
D02 )将当前空闲的多路并行硬件线程时隙由***线程管理单元中移除, 禁止该并行硬件线程时隙的线程定时器中断, 并将该空闲的多路并行硬件线程时隙配置给所述硬件线程控制单元控制;  D02) removing the currently idle multi-way parallel hardware thread slot from the system thread management unit, disabling that slot's thread timer interrupt, and assigning the idle multi-way parallel hardware thread slot to the control of the hardware thread control unit; D03 )等待并返回该并行硬件线程时隙空闲的信息到***线程管理单元。  D03) waiting, and returning the information that the parallel hardware thread slot is idle to the system thread management unit.
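步骤 D01 )至 D03 )的判断逻辑可以用如下 Python 草图示意; 其中以字典表示的时隙结构为假设。  The decision logic of steps D01) through D03) can be sketched in Python as follows; the dict-based slot structure is an assumption.

```python
def on_slot_idle(thdc_has_ready_ithread, slot):
    """Sketch of steps D01)-D03): when a parallel hardware thread slot
    becomes idle, it is either handed to the THDC (with its thread timer
    interrupt disabled) or reported back to the system thread management
    unit. The slot is modeled as a plain dict; names are illustrative."""
    if thdc_has_ready_ithread:            # D01): a valid, unfinished ithread waits
        slot["owner"] = "THDC"            # D02): slot removed from the system...
        slot["timer_interrupt"] = False   # ...and its timer interrupt disabled
    else:
        slot["owner"] = "system"          # D03): slot reported idle to the
        slot["timer_interrupt"] = True    #       system thread management unit
    return slot
```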
更进一步地, 所述步骤 D )还包括如下步骤:  Further, the step D) further includes the following steps:
当所述 ithread执行完毕或进入等待使其继续执行的事件发生时, 所 述 ithread退出其运行的硬件线程时隙并使能该时隙的线程计时中断;  When the ithread executes or enters an event waiting for its execution to continue, the ithread exits its running hardware thread slot and enables the thread timing interrupt of the slot;
所述硬件线程控制单元检测其运行队列中的 ithread的有效状态是否 被清除, 如是, 清除所述 ithread; 否则, 保持所述 ithread。  The hardware thread control unit detects whether the valid state of the ithread in its run queue is cleared, and if so, clears the ithread; otherwise, the ithread is maintained.
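上述 ithread 退出与清除的过程可以用如下 Python 草图示意; 数据结构均为假设。  The ithread exit and clearing process above can be sketched in Python as follows; the data structures are assumptions.

```python
def ithread_exit(slot, thdc_run_queue, ithread):
    """Sketch of the ithread exit path: when an ithread finishes (or must
    wait for an event before continuing), it vacates its hardware thread
    slot and the slot's thread timer interrupt is re-enabled; the THDC
    then drops the ithread from its run queue only if the ithread's valid
    flag has been cleared."""
    slot["busy"] = False
    slot["timer_interrupt"] = True      # slot is handed back to the system
    if not ithread["valid"]:            # valid state cleared?
        thdc_run_queue.remove(ithread)  # THDC clears the ithread
    return slot
```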
本发明还涉及一种实行上述方法的装置,所述单个处理器中设置有多个并 行的处理单元, 所述装置包括:  The present invention also relates to an apparatus for carrying out the above method, wherein a plurality of parallel processing units are disposed in the single processor, the apparatus comprising:
应用分配单元:用于分别将多个任务分配到作为 SMP核的多个处理单 元运行;  Application allocation unit: for respectively assigning multiple tasks to multiple processing units operating as SMP cores;
应用判断单元: 用于分别判断所述多个任务是否为图像渲染应用;
计算应用处理单元: 用于使用同质并行编程 API处理所述任务产生的至少一个线程, 并将所述处理过的线程配置到作为 SMP核的处理单元上运行;
图形加速运算单元: 用于由所述任务产生至少一个着色线程到硬件线程控制单元, 并通过 GPU驱动在所述硬件管理器控制的处理单元上开始渲染; 所述硬件管理单元控制的处理单元由所述硬件管理单元向***取得。
Application determining unit: configured to determine, for each of the plurality of tasks, whether it is an image rendering application;
Computing application processing unit: configured to process at least one thread generated by the task using a homogeneous parallel programming API, and to configure the processed threads to run on processing units serving as SMP cores;
Graphics acceleration operation unit: configured to generate at least one shading thread from the task to the hardware thread control unit, and to start rendering, via the GPU driver, on the processing units controlled by the hardware manager; the processing units controlled by the hardware management unit are obtained by the hardware management unit from the system.
更进一步地, 还包括:  Further, it also includes:
GPGPU线程处理单元:用于判断是否硬件控制器控制的处理单元空闲 而任务队列中还有任务待处理, 如是, 使用异质并行编程 API 处理所述任务 产生的至少一个线程,并将其配置到所述硬件控制器控制的空闲处理单元上运 行。  GPGPU thread processing unit: for judging whether the processing unit controlled by the hardware controller is idle and there is still a task to be processed in the task queue, and if so, using the heterogeneous parallel programming API to process at least one thread generated by the task and configuring it to The idle controller unit controlled by the hardware controller operates.
更进一步地, 所述图形加速处理单元进一步包括:  Further, the graphics acceleration processing unit further includes:
调用指令产生模块: 用于使所述线程产生属于其自身的 ithread调 用指令到硬件线程控制单元;  Calling the instruction generation module: for causing the thread to generate an ithread call instruction belonging to itself to the hardware thread control unit;
队列形成模块: 用于使所述硬件线程控制单元将所述 ithread的调 用指令按照接收时间形成其程序队列, 调用并准备所述 ithread;  a queue forming module: configured to cause the hardware thread control unit to form a call queue of the ithread according to a receiving time, and invoke and prepare the ithread;
线程分配模块: 用于使所述 ithread按照其在所述硬件线程控制单 元中的队列顺序依次在所述硬件管理器控制的处理单元的、空闲的多路并行硬 件线程时隙中运行; Thread allocation module: for causing the ithread to follow its control thread in the hardware thread The queue order in the element is sequentially run in the idle multi-way parallel hardware thread time slot of the processing unit controlled by the hardware manager;
线程中断模块: 用于当所述 ithread执行完毕或进入等待使其继 续执行的事件发生时, 所述 ithread退出其运行的硬件线程时隙并使能该时隙 的线程计时中断;  Thread interrupt module: for when the ithread executes or enters an event waiting for its execution to continue, the ithread exits its running hardware thread slot and enables the thread timing interrupt of the slot;
线程清除模块: 用于使所述硬件线程控制单元检测其运行队列中的 ithread的有效状态是否被清除, 如是, 清除所述 ithread; 否则, 保持所述 ithread。  Thread clearing module: configured to cause the hardware thread control unit to detect whether the valid state of an ithread in its run queue has been cleared; if so, the ithread is cleared; otherwise, the ithread is kept.
其中, 所述 ithread为硬件线程, 所述 ithread包括图像引擎、 DSP或/和通用图像处理器中要求硬件加速的线程。  The ithread is a hardware thread, and the ithreads include threads in an image engine, a DSP, or/and a general-purpose image processor that require hardware acceleration.
实施本发明的单个处理器上实现多应用并行处理的方法及装置, 具有以下有益效果: 由于将不同类型的应用统一处理、 且将与图形处理相关的应用分配到硬件线程控制单元控制的处理单元中运行, 同时, 未被上述硬件线程控制单元控制的处理单元仍然在处理由***的线程控制单元分配的计算应用所产生的线程, 所以能够在一个处理器上同时处理两种或多种不同类型的应用。  Implementing the method and apparatus for multi-application parallel processing on a single processor of the present invention has the following beneficial effects: different types of applications are handled uniformly, and applications related to graphics processing are assigned to run on processing units controlled by the hardware thread control unit, while the processing units not controlled by the hardware thread control unit still process the threads generated by the computing applications assigned by the system's thread control unit; two or more different types of applications can therefore be processed simultaneously on one processor.
附图说明  DRAWINGS
图 1 是本发明单个处理器上实现多应用并行处理的方法实施例中方法流 程图;  1 is a flow chart of a method in an embodiment of a method for implementing multi-application parallel processing on a single processor of the present invention;
图 2是所述实施例硬件线程控制单元实行 GPU应用的流程图;  2 is a flow chart of the hardware thread control unit of the embodiment implementing a GPU application;
图 3是所述实施例中装置结构示意图;  Figure 3 is a schematic structural view of the device in the embodiment;
图 4是所述实施例处理器的结构示意图。  4 is a schematic structural diagram of a processor of the embodiment.
具体实施方式 detailed description
下面将结合附图对本发明实施例作进一步说明。  The embodiments of the present invention will be further described below in conjunction with the accompanying drawings.
如图 1所示,在本发明的单个处理器上实现多应用并行处理的方法实施例 中, 其方法包括如下步骤:  As shown in FIG. 1, in an embodiment of a method for implementing multi-application parallel processing on a single processor of the present invention, the method includes the following steps:
步骤 S11 多个应用形成队列, 并准备分配到多个并行处理单元上运行: 在本步骤中,分别将多个应用形成任务队列并准备分配到多个并行的处理单元 并行运行, 为下一步的并行处理做好准备。 在本实施中, 其涉及的处理器是一  Step S11, multiple applications form a queue, and are ready to be distributed to run on multiple parallel processing units: In this step, multiple applications are respectively formed into task queues and ready to be distributed to multiple parallel processing units to run in parallel, for the next step. Prepare for parallel processing. In this implementation, the processor involved is one
个具有多个并行处理单元的处理器, 称之为统一处理器( unified processing unit, UPU ), 该处理器的核心部分的具体结构如图 4所示。 该处理器包括的多个并行处理单元可以是对称多任务处理器( symmetrical-multi-processing, SMP )、 同质( homogeneous )并行处理器、 GPU 的可编程统一浓淡处理器( gpu's programmable unified shading )或 GPGPU的异质( heterogeneous )并行处理器。 这些类型的处理器或处理器核都可以在同一时间运行且相互之间不会产生影响。 同时, 还可以通过对这些处理器的配置来达到负载平衡, 例如, 将空闲的处理单元配置为某种类型的处理器以加快这种类型的处理速度。 这种配置是通过***的线程控制单元和硬件线程控制单元来实现的。 这些结构请参见图 4。 在本步骤中, ***将当前的各种需要处理的应用形成任务队列, 准备将这些任务分别配置到上述并行的处理单元中分别并行地运行。 这些任务可能只有一种类型, 也可能有多种类型。 在本实施例中的一种情况下, ***开始工作时, 例如, 上电时, 将上述处理单元全部配置为 SMP核, 在发现相应的任务时, 空闲的 SMP核才由相应的机构控制, 例如, 发现图形处理应用时, 硬件线程控制器才开始控制一些 SMP核(处理单元), 将其作为硬件线程时隙。  A processor having a plurality of parallel processing units is called a unified processing unit (UPU); the specific structure of the core portion of this processor is shown in FIG. 4. The plurality of parallel processing units included in the processor may be symmetrical multi-processing (SMP) cores, homogeneous parallel processors, a GPU's programmable unified shading processors, or heterogeneous GPGPU parallel processors. These types of processors or processor cores can all run at the same time without affecting each other. At the same time, load balancing can be achieved by configuring these processing units; for example, an idle processing unit can be configured as a certain type of processor to speed up that type of processing. This configuration is implemented by the system's thread control unit and the hardware thread control unit. See FIG. 4 for these structures. In this step, the system forms the various applications currently awaiting processing into a task queue and prepares to assign these tasks to the parallel processing units to run in parallel. These tasks may be of only one type, or of several types.
In one case in this embodiment, when the system starts to work, for example, when the power is turned on, all the processing units are configured as SMP cores, and when the corresponding task is found, the idle SMP core is controlled by the corresponding mechanism. For example, when a graphics processing application is discovered, the hardware thread controller begins to control some SMP cores (processing units) as hardware thread slots.
步骤 S12 判断是否图形处理任务: 在本步骤中, 判断当前要处理的任务 是否为图形处理任务, 如是, 执行步骤 S14; 否则, 执行步骤 S13; 基本上来 讲, 上述图形处理任务通常是通过图形渲染来达到的。 在本实施例中, 值得一 提的是, 步骤 S12在每处理一个任务就执行一次,每次执行都会使得该任务被 步骤 S13或步骤 S14处理。  Step S12: determining whether the graphics processing task is: in this step, determining whether the task to be processed is a graphics processing task, if yes, executing step S14; otherwise, performing step S13; basically, the above graphics processing task is usually through graphics rendering To achieve. In the present embodiment, it is worth mentioning that step S12 is executed once for each task, and each execution causes the task to be processed by step S13 or step S14.
步骤 S13 使用同质并行编程 API处理该任务, 并将得到的线程分配到并 行处理单元运行: 在本步骤中, 由于在步骤 S12中已经判断该任务不是图形处 理, 所以, 相应地, 该任务应该是计算应用, 例如传统意义上的 CPU数据处 理、 控制等等, 在本步骤中, 使用同质并行编程 API处理该应用, 得到至少 一个线程, 并将该线程分配到上述作为 SMP核心运行的处理单元中运行, 当 该应用完成后, 按照一般的应用返回数据或结果即可。 在本实施例中, 上述同 质并行编程 API可以是 pthread, 也可以是 openMP, 这样, 经过其处理的应用 得到的线程就是 pthread线程,这些线程运行在上述 SMP核上。在现有技术中,  Step S13 processes the task using the homogeneous parallel programming API, and assigns the obtained thread to the parallel processing unit operation: In this step, since it has been determined in step S12 that the task is not graphics processing, accordingly, the task should Is a computing application, such as CPU data processing, control, etc. in the traditional sense. In this step, the application is processed using a homogeneous parallel programming API, at least one thread is obtained, and the thread is allocated to the above-mentioned processing as an SMP core. Run in the unit, when the application is completed, return data or results according to the general application. In this embodiment, the above-mentioned homogeneous parallel programming API may be pthread or openMP, so that the thread obtained by the application processed by it is a pthread thread, and these threads run on the above SMP core. In the prior art,
最初 OS直接分配线程到并行的多硬件线程处理时隙, 这个动作通过线程运行队列实现, 并不通过 THDC ( the hardware thread controller, 即硬件线程管理单元); 这些线程作为 CPU的线程运行且对于 OS而言是可以观察和控制的(也包括运行这些线程的时隙); 其中, 通过传统的 pthread API创建的线程进入 OS 的运行队列。 这些特殊的线程在队列中被 OS直接分配到上述并行的多硬件线程处理时隙中。 此时, 这些多硬件线程处理时隙与 SMP中的 "内核" 相似。  Initially, the OS directly allocates threads to the parallel multi-hardware-thread processing slots. This action is implemented through the thread run queue, not through the THDC (the hardware thread controller); these threads run as CPU threads and, for the OS, are observable and controllable (as are the slots running them). Threads created through the traditional pthread API enter the OS run queue, and these threads are allocated by the OS directly from the queue into the parallel multi-hardware-thread processing slots described above. At this point, these multi-hardware-thread processing slots are similar to the "cores" in SMP.
步骤 S14通过该任务得到至少一个线程, 硬件线程控制单元由***得到 处理单元的控制权: 在本步骤中, 由于判断该任务涉及图形处理, 需要进行图 形渲染, 所以, 对该任务使用 openGL通过典型的 GPU驱动开始进行渲染; 首先, 在本步骤中, 需要由该任务得到图形渲染线程, 并将该线程分配到硬件 线程控制单元。据此,硬件线程控制单元向***要求一个或多个处理单元的控 制权,使得这些处理单元处于硬件线程控制单元的控制下,且作为一个或多个 硬件处理时隙运行。 上述分配到硬件线程控制单元的线程也可以称为 ithread, 即硬件线程。  Step S14 obtains at least one thread through the task, and the hardware thread control unit obtains control of the processing unit by the system: In this step, since it is determined that the task involves graphics processing, graphics rendering is required, so the openGL is used for the task. The GPU driver starts rendering; first, in this step, the graphics rendering thread needs to be obtained by the task, and the thread is assigned to the hardware thread control unit. Accordingly, the hardware thread control unit requires control of one or more processing units to the system such that the processing units are under the control of the hardware thread control unit and operate as one or more hardware processing time slots. The above thread allocated to the hardware thread control unit can also be called ithread, which is a hardware thread.
步骤 S15硬件线程控制单元在上述处理单元上运行上述线程: 在本步骤 中,将上述步骤 S14中得到线程分配到硬件线程控制单元控制下的处理单元上 运行。 值得一提的是, 在本实施例中, 上述步骤 S14和步骤 S15是连续执行 的, 从这个意义上来讲, 也可以将步骤 S14和步骤 S15合并为一个步骤; 关 于步骤 S14和步骤 S15的具体细节, 稍后会有较为详细的描述。 同时, 对于 多个应用而言, 上述步骤 S13和步骤 S14、 S15之间是并行执行的 (当然是在 两种类型均存在的情况下) , 例如, 上一应用是图形处理, 通过步骤 S14、 步 骤 S15分配到硬件线程控制单元且正在运行, 而当前应用是 CPU或 GPGPU 程序, 通过步骤 S13分配到 SMP核运行, 则对于当前而言, 上述步骤 S13和 步骤 S14、 步骤 S15是并行的, 也就是两种不同类型的应用在单个处理器上同 时运行, 且互不干扰。  Step S15: The hardware thread control unit runs the thread on the processing unit: In this step, the thread obtained in the above step S14 is allocated to the processing unit under the control of the hardware thread control unit. It should be noted that, in this embodiment, the above steps S14 and S15 are continuously performed. In this sense, step S14 and step S15 may also be combined into one step; and specific to step S14 and step S15 Details, there will be a more detailed description later. At the same time, for a plurality of applications, the above step S13 and the steps S14, S15 are performed in parallel (of course, in the case where both types exist), for example, the last application is graphics processing, by step S14, Step S15 is assigned to the hardware thread control unit and is running, and the current application is a CPU or GPGPU program, and is allocated to the SMP core operation through step S13. For the present, step S13 and step S14 and step S15 are parallel, It is two different types of applications that run simultaneously on a single processor without interfering with each other.
在本实施例中,上述方法还包括如下步骤: 判断是否硬件控制器控制的处理单元空闲而同时任务( GPU或异质 GPGPU )处理队列中还有任务待处理, 如是, 将该任务分配到上述硬件管理器控制的处理单元上运行; 否则, 退出本步骤。 在本步骤中, 使用异质并行编程 API 处理所述计算应用产生的至少一个线程, 并将其配置到所述硬件控制器控制的空闲处理单元上运行。  In this embodiment, the method further includes the following step: determining whether a processing unit controlled by the hardware controller is idle while the task (GPU or heterogeneous GPGPU) processing queue still holds tasks to be processed; if so, assigning that task to run on a processing unit controlled by the hardware manager; otherwise, exiting this step. In this step, at least one thread generated by the computing application is processed using the heterogeneous parallel programming API and configured to run on an idle processing unit controlled by the hardware controller.
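该负载平衡步骤可以用如下 Python 草图示意; 函数名与数据结构均为假设。  This load-balancing step can be sketched in Python as follows; the function name and data structures are assumptions.

```python
def fill_idle_hw_slots(idle_slots, pending_gpgpu_tasks):
    """Sketch of the load-balancing step: if processing units controlled
    by the hardware controller are idle while the task queue still holds
    work, that work is processed with the heterogeneous API (OpenCL-style)
    and assigned to the idle units."""
    assignments = []
    while idle_slots and pending_gpgpu_tasks:
        slot = idle_slots.pop(0)
        task = pending_gpgpu_tasks.pop(0)
        assignments.append((task, slot))   # GPGPU thread runs on the idle unit
    return assignments
```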
在本实施例中, ithread通过一个用户 API在 THDC ( the hardware thread controller, 即上面所述的硬件线程管理单元)上运行线程。 开始时, 通常处于 内核模式(管理员模式) , 当 ithread创建线程时, 创建线程到 THDC的命令 队列。 通常, THDC较 OS的线程具有较高的优先级。  In this embodiment, the ithread runs the thread on the THDC (the hardware thread management unit) as described above through a user API. Initially, it is usually in kernel mode (admin mode), and when ithread creates a thread, it creates a thread to the THDC command queue. In general, THDC has a higher priority than OS threads.
Ithread 的产生能够由运行在内核模式的处理器上的一个驱动程序或直接 由一个运行在用户模式的处理器上的应用程序实现。 在前一种情况下, ithread 将直接被创建到 THDC, 且当其上载时, 这些线程作为一个没有***干涉的嵌 入式程序运行; 在后一种情况下, ithread将通过一个被建立在内核的运行队列 中的虚拟 pthread, 然后该 pthread运行并创建一个真正的 ithread到 THDC; 这 个额外的动作仅建立一个记录在 OS中,为其 TLB异常处理程序可以处理 TLB 异常,这些异常是在用户模式下 ithread在 MVP的并行多硬件线程处理时隙上 作为协处理线程运行时产生的。  The generation of Ithreads can be implemented by a driver running on a kernel-mode processor or directly by an application running on a user-mode processor. In the former case, ithreads will be created directly into THDC, and when they are uploaded, these threads will run as an embedded program with no system intervention; in the latter case, ithread will be built through the kernel. Run the virtual pthread in the queue, then the pthread runs and creates a real ithread to THDC; this extra action creates only one record in the OS, and its TLB exception handler can handle TLB exceptions in user mode Ithread is generated as a co-processing thread on MVP's parallel multi-hardware thread processing time slot.
在内核的调度程序要将其运行队列中的任何一个准备就绪的线程作为操 作***线程分配到上述并行多硬件线程处理时隙中运行时(通常的情况下, 意 味着线程处理时隙出现空闲) , 总要检查 THDC 中是否有准备就绪的线程, 通过传统的调度机制, 如果 THDC 中有准备好的线程在等待, ***的调度程 序将退出原先的硬件线程处理时隙,不再放入任何新的***线程( CPU线程 )。 重要的一点是, ***调度程序在退出之前, 将关闭 (该时隙) 的定时器中断, 允许 ithread在没有定时器中断的情况下拿到该线程处理时隙的全部控制权。 并且该定时器中断只能在 ithread 退出时使能。 ***调度程序退出后, THDC 将得到空闲的硬件线程时隙, 并将其用于运行准备好的 ithread; 当一个 ithread 完成或等待使其继续运行的任何事件时, 该 ithread将退出相应的硬件线程处 理时隙; 当一个 ithread的有效状态被清除时, 该 ithread将退出 THDC。 一个 CPU线程将服从于当其准备开始运行并由***调度程序检查 THDC状态时发 现的准备好的 ithread线程。  The kernel's scheduler wants to allocate any of its ready threads in the run queue as operating system threads to the parallel multi-hardware thread processing time slots (typically, the thread processing time slots are idle). Always check whether there is a ready thread in the THDC. Through the traditional scheduling mechanism, if the prepared thread in the THDC is waiting, the system's scheduler will exit the original hardware thread to process the time slot and no longer put any new ones. System thread (CPU thread). The important point is that the system scheduler will close the timer interrupt (the time slot) before exiting, allowing the ithread to get full control of the thread's processing time slot without a timer interrupt. And the timer interrupt can only be enabled when ithread exits. After the system scheduler exits, the THDC will get the idle hardware thread time slot and use it to run the prepared ithread; when an ithread completes or waits for any events to continue running, the ithread will exit the corresponding hardware thread. Processing time slots; when an active state of an ithread is cleared, the ithread will exit THDC. A CPU thread will be subject to the prepared ithread thread that is discovered when it is ready to run and is checked by the system scheduler for the THDC state.
所有的 ithread线程最终创建到 THDC中, 不管其是在内核模式下创建的 还是在用户模式下创建的。 All ithread threads are eventually created into THDC, regardless of whether they were created in kernel mode. Still created in user mode.
图 2从一个并行硬件线程时隙的角度示出了该并行硬件线程时隙在分配 到 CPU线程控制单元或 THDC的情况, 其包括如下步骤:  Figure 2 illustrates the parallel hardware thread time slot from being allocated to the CPU thread control unit or THDC from the perspective of a parallel hardware thread time slot, which includes the following steps:
步骤 S201 定时器中断: 在本步骤中, 该硬件线程时隙出现定时器中断, 正如上面的描述所记载的一样,硬件线程时隙在***开始运行时或在其上运行 的线程已经运行完成或退出时, 均会执行定时器中断。 也就是说, 定时器中断 时 CPU***控制下的硬件线程时隙接收一个新线程开始运行的开始。  Step S201: Timer interrupt: In this step, the hardware thread time slot has a timer interrupt. As described in the above description, the hardware thread time slot is already running when the system starts running or the thread running on it or When exiting, a timer interrupt is executed. That is to say, when the timer is interrupted, the hardware thread slot under the control of the CPU system receives the start of a new thread starting operation.
步骤 S202运行队列中有线程在等待? 如是, 执行步骤 S203; 否则, 跳 转执行步骤 S205; 在本步骤中, 运行队列指的是***调度程序中的运行队列 (通常是任务队列)。  Step S202 Is there a thread waiting in the run queue? If yes, go to step S203; otherwise, jump to step S205; in this step, the run queue refers to the run queue (usually the task queue) in the system scheduler.
步骤 S203 环境重存:本步骤中执行的是通常的线程运行时都会执行的线 程的环境重存(context restore ),也就是将该线程的运行环境、 配置、 设定的 参数等等重新存储在制定的区域内,便于该线程在运行时调用; 本步骤中的线 程是 CPU线程。  Step S203: Re-storing the environment: In this step, the environment restore of the thread that is executed by the normal thread is executed, that is, the running environment, configuration, setting parameters, and the like of the thread are re-stored in the thread. Within the defined area, it is convenient for the thread to be called at runtime; the thread in this step is a CPU thread.
步骤 S204运行等待的线程: 在本步骤中, 在该硬件线程时隙运行上述线 程; 当该线程运行完成或退出时, 返回步骤 S201;  Step S204 runs the waiting thread: In this step, the thread is run in the hardware thread time slot; when the thread runs or exits, the process returns to step S201;
步骤 S205 THDC中有 ithread在等待? 如是执行步骤 S206; 否则, 跳转 到步骤 S209;  Step S205 Is there an ithread waiting in the THDC? If the step S206 is performed; otherwise, the process goes to step S209;
步骤 S206 线程时隙由***中移除: 在本步骤中, 由于在上述步骤 S205 中已经判断 THDC中存在有效的线程(这些线程均为硬件线程 ), 且这些线程 正在等待运行,于是将该空闲的(经过定时器中断的)硬件线程时隙交由 THDC 控制并运行这些等待的硬件线程, 为实现这一目的, 首先要做的就是将该线程 时隙由***的控制中移除; 再将其控制权交给 THDC。 所以在本步骤中, 将硬 件时隙由***移除。  Step S206: The thread slot is removed from the system: In this step, since it is determined in the above step S205 that there are valid threads in the THDC (these threads are hardware threads), and the threads are waiting to run, then the idle The hardware thread time slots (interrupted by the timer) are controlled by the THDC and run these waiting hardware threads. To achieve this, the first thing to do is to remove the thread time slot from the control of the system; Its control is given to THDC. So in this step, the hardware time slot is removed by the system.
步骤 S207 禁止定时器中断: 在本步骤中, 当将该硬件线程时隙由***中 移除时, 关闭该硬件线程的定时器中断,使得该线程时隙在运行上述硬件线程 的过程中不会发生定时器中断。  Step S207: prohibiting the timer interrupt: In this step, when the hardware thread time slot is removed from the system, the timer interrupt of the hardware thread is turned off, so that the thread time slot does not run during the running of the hardware thread. A timer interrupt has occurred.
步骤 S208 时隙退出: 在本步骤中, 上述硬件线程时隙退出***;  Step S208: The time slot exits: In this step, the hardware thread time slot exits the system;
步骤 S209 CPU-idle线程: 本步骤是在上述 THDC中不存在等待运行的硬件线程的情况下出现的, 也就是说整个***既没有传统的 CPU线程等待、 也没有硬件线程等待运行, 在此情况下, 该硬件线程时隙调用 CPU-idle线程, 表示当前没有新的线程需要处理, 并返回步骤 S201;  Step S209 CPU-idle thread: This step occurs when there is no hardware thread waiting to run in the above THDC, that is, the whole system has neither a traditional CPU thread waiting nor a hardware thread waiting to run. In this case, the hardware thread slot calls the CPU-idle thread, indicating that no new thread needs to be processed, and returns to step S201;
步骤 S210 THDC上载: 在本步骤中, THDC调用硬件线程程序, 将调用 的硬件线程处理后得到可执行文件,并将得到的可执行文件上载到上述硬件线 程时隙中。  Step S210 THDC upload: In this step, the THDC calls the hardware thread program, processes the called hardware thread to obtain an executable file, and uploads the obtained executable file into the above hardware thread time slot.
步骤 S211 ithread运行: ithread线程(即硬件线程)在上述硬件线程时隙 中运行。  Step S211 ithread operation: The ithread thread (ie, the hardware thread) runs in the above hardware thread slot.
步骤 S212 线程等待? 判断是否出现 ithread线程等待的情况, 如是, 返 回步骤 S211; 否则, 执行步骤 S213;  Step S212 Thread waiting? Determining whether the ithread thread waits, if yes, returning to step S211; otherwise, executing step S213;
步骤 S213 时隙退出: 在本步骤中, 上述硬件线程时隙退出 THDC;  Step S213, the time slot exits: In this step, the hardware thread time slot exits the THDC;
步骤 S214使能定时器中断: 在本步骤中, 使能该硬件线程时隙的定时器 中断, 并返回步骤 S201; 具体来讲, 在本步骤中, 上述硬件线程时隙由于硬 件线程已经运行完成, 所以该硬件线程时隙退出 THDC, 并使能定时器中断; 也就是将该时隙移回***。  Step S214 enables the timer interrupt: In this step, the timer interrupt of the hardware thread slot is enabled, and returns to step S201; specifically, in this step, the hardware thread slot is completed because the hardware thread has been run. So the hardware thread slot exits THDC and enables the timer interrupt; that is, the time slot is moved back to the system.
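图 2 所示的单个并行硬件线程时隙在一次定时器中断周期内的选择逻辑, 可以用如下 Python 草图概括; 返回值的结构为假设。  The selection logic of a single parallel hardware thread slot within one timer-interrupt cycle, as shown in Figure 2, can be summarized in the following Python sketch; the structure of the return value is an assumption.

```python
def slot_timer_cycle(os_run_queue, thdc_queue):
    """One timer-interrupt cycle of a parallel hardware thread slot,
    following the order of Figure 2 (S201-S214): the OS run queue is
    checked first (S202), then the THDC (S205), otherwise the CPU-idle
    thread runs (S209)."""
    if os_run_queue:                                   # S202: waiting CPU thread?
        thread = os_run_queue.pop(0)                   # S203: context restore
        return {"runs": thread, "owner": "system",
                "timer_interrupt": True}               # S204: run the thread
    if thdc_queue:                                     # S205: waiting ithread?
        ithread = thdc_queue.pop(0)                    # S210: THDC uploads it
        # S206-S208: the slot leaves the system; its timer interrupt is disabled
        return {"runs": ithread, "owner": "THDC",
                "timer_interrupt": False}              # S211: ithread runs
    return {"runs": "cpu-idle", "owner": "system",
            "timer_interrupt": True}                   # S209: CPU-idle thread
```

当 ithread 运行完毕时, 对应图 2 的步骤 S213 与 S214: 时隙退出 THDC 并重新使能定时器中断, 回到 S201。  When the ithread finishes, steps S213 and S214 of Figure 2 apply: the slot exits the THDC, re-enables its timer interrupt, and returns to S201.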
在本实施例中, 还涉及一种实现上述方法的装置, 该装置包括: 应用分配 单元 31、 应用判断单元 32、 计算应用处理单元 33、 图形加速运算单元 34和 GPGPU线程处理单元 35; 其中, 应用分配单元 31用于分别将多个任务分配 到作为 SMP核的多个处理单元运行;应用判断单元 32用于分别判断上述多个 任务是否图像渲染任务;计算应用处理单元 33用于使用同质并行编程 API(例 如, pthread或 openMP )处理所述任务产生的至少一个个线程, 并将这些处 理过的线程配置到作为 SMP核运行的处理单元上运行; 图形加速运算单元 34 用于由上述涉及图形渲染的任务产生至少一个着色线程到硬件线程控制单元, 并通过 GPU驱动在上述硬件管理器控制的处理单元上开始渲染; 所述硬件管 理单元控制的处理单元由所述硬件管理单元向***取得; GPGPU线程处理单 元 35用于判断是否硬件控制器控制的处理单元空闲而任务队列中还有任务待 处理,如是,使用异质并行编程 API处理所述计算应用产生的至少一个线程,  In this embodiment, the device further includes an apparatus for implementing the foregoing method, where the apparatus includes: an application allocating unit 31, an application determining unit 32, a computing application processing unit 33, a graphics acceleration computing unit 34, and a GPGPU thread processing unit 35; The application allocating unit 31 is configured to respectively allocate a plurality of tasks to a plurality of processing units that are SMP cores; the application determining unit 32 is configured to respectively determine whether the plurality of tasks are image rendering tasks; and the computing application processing unit 33 is configured to use the homogeneous a parallel programming API (eg, pthread or openMP) processes at least one thread generated by the task, and configures the processed threads to run on a processing unit running as an SMP core; graphics acceleration computing unit 34 is used to The task of graphics rendering generates at least one shading thread to the hardware thread control unit, and starts rendering on the processing unit controlled by the hardware manager by the GPU driver; the processing unit controlled by the hardware management unit is obtained by the hardware management unit to the system GPGPU thread processing unit 35 is used to determine Whether the processing unit controlled by the hardware controller is idle and there is still a task to be processed in the task queue, and if so, at least one thread generated by the computing application is 
processed using a heterogeneous parallel programming API,
并将其配置到所述硬件控制器控制的空闲处理单元上运行。 and configured to run on an idle processing unit controlled by the hardware controller.
在本实施例中,图形加速处理单元 34进一步包括:调用指令产生模块 341、 队列形成模块 342、 线程分配模块 343、 线程中断模块 344以及线程清除模块 345; 其中, 调用指令产生模块 341用于使所述线程产生属于其自身的 ithread 调用指令到硬件线程控制单元;队列形成模块 342用于使所述硬件线程控制单 元将所述 ithread的调用指令按照接收时间形成其程序队列, 调用并准备所述 ithread;线程分配模块 343用于使所述 ithread按照其在所述硬件线程控制单元 中的队列顺序依次在所述硬件管理器控制的处理单元的、空闲的多路并行硬件 线程时隙中运行; 线程中断模块 344用于当所述 ithread执行完毕或进入等待 使其继续执行的事件发生时, 所述 ithread退出其运行的硬件线程时隙并使能 该时隙的线程计时中断;线程清除模块 345用于使所述硬件线程控制单元检测 其运行队列中的 ithread的有效状态是否被清除, 如是, 清除所述 ithread; 否 则, 保持所述 ithread。 在本实施例中, 上述 ithread为硬件线程, ithread包括 图像引擎、 DSP或 /和通用图像处理器中要求硬件加速的线程。  In this embodiment, the graphics acceleration processing unit 34 further includes: a call instruction generation module 341, a queue formation module 342, a thread allocation module 343, a thread interruption module 344, and a thread clearing module 345; wherein the call instruction generating module 341 is configured to enable The thread generates its own ithread call instruction to the hardware thread control unit; the queue formation module 342 is configured to cause the hardware thread control unit to form the ithread call instruction according to the receiving time to form its program queue, call and prepare the An ithread; thread allocation module 343 is configured to cause the ithread to run in an idle multi-way parallel hardware thread slot of the processing unit controlled by the hardware manager in sequence according to its queue order in the hardware thread control unit; The thread interrupt module 344 is configured to: when the ithread is executed or enters an event waiting for its execution to continue, the ithread exits its running hardware thread slot and enables the thread timing interrupt of the slot; the thread clearing module 345 Used to cause the hardware thread control unit to detect its operation Whether the valid state of the ithread in the queue is cleared, and if so, clearing the ithread; otherwise, maintaining the ithread. 
In this embodiment, the above ithread is a hardware thread, and the ithread includes a thread that requires hardware acceleration in the image engine, DSP, or/and general image processor.
在本实施例中, 还涉及一种 UPU处理器, 请参见图 4, 该处理器包括多 个并行的、 用于运行线程的处理器硬件内核(即处理单元, 在图 4 中标记为 601、 602、 603、 604 ), 用于管理所述处理器中***线程并将这些线程分配到 所述处理器硬件内核中运行的***线程管理单元 61 , 还包括用于接收并管理 运行中产生的硬件线程、将所述硬件线程分配到空闲的处理器硬件内核上、并 以协处理器线程方式运行的硬件线程管理单元 62(图 4中的 harmony scheduler, 也就是上面记载的 THCD );硬件线程管理单元 62分别与所述多个并行的处理 器内核(在图 4中标记为 601、 602、 603、 604 )连接。 值得一提的是, 在图 4 中示出 4个内核是示例性的, 实际中可能是 2、 3、 4或 6个或更多。  In this embodiment, a UPU processor is also involved. Referring to FIG. 4, the processor includes a plurality of parallel processor hardware cores (ie, processing units, which are labeled as 601 in FIG. 4). 602, 603, 604), a system thread management unit 61 for managing system threads in the processor and assigning the threads to the processor hardware core, and for receiving and managing hardware generated during operation a thread, a hardware thread management unit 62 that allocates the hardware thread to an idle processor hardware core and runs in a coprocessor thread mode (harmony scheduler in FIG. 4, that is, THCD described above); hardware thread management Unit 62 is coupled to the plurality of parallel processor cores (labeled 601, 602, 603, 604 in Figure 4), respectively. It is worth mentioning that the four cores shown in Figure 4 are exemplary, and may actually be 2, 3, 4 or 6 or more.
In this embodiment, the hardware thread management unit 62 receives the ithread calls of the processor hardware cores over first data lines 621; each processor hardware core has a first data line 621 connected to the hardware thread management unit 62, and in FIG. 4 these first data lines 621 are labeled ithread_calls. The hardware thread management unit 62 also sends the called and ready threads over second data lines 622 (labeled thread_launch in FIG. 4) to run on the plurality of processor hardware cores, and it transfers the state of the called threads to the system thread control unit over a third data line 623.
In this embodiment, the plurality of processor hardware cores are each further connected to the system thread control unit 61 via a respective fourth data line 63; the fourth data lines 63 are labeled pthread/ithread_user_calls in FIG. 4, and each hardware core has one fourth data line connected to the system thread control unit 61. The plurality of processor hardware cores and the system thread control unit 61 are also connected via timer-interrupt request signal lines that carry the timer interrupt signal of each hardware core; each hardware core has one timer-interrupt request signal line connected to the system thread control unit 61, and in FIG. 4 these signal lines are labeled timer0_intr, timer1_intr, timer2_intr and timer3_intr, respectively. The foregoing, however, should not be construed as limiting the scope of the present invention. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present invention, all of which fall within the scope of protection of the present invention. Therefore, the protection scope of the present invention shall be subject to the appended claims.
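The division of labour between the system thread management unit 61 and the hardware thread management unit 62 can be illustrated with a toy core-ownership model. The class name, the four-core count taken from FIG. 4, and the `owner`/`timer_irq` fields are assumptions made for illustration only:

```python
class UPUCoreModel:
    """Toy ownership model for the four cores 601-604 of FIG. 4: each core is
    either an SMP core under the system thread management unit 61 or lent to
    the hardware thread management unit 62 to run hardware threads."""
    def __init__(self, num_cores=4):
        self.owner = ["system"] * num_cores     # all cores start under unit 61
        self.timer_irq = [True] * num_cores     # timer0_intr .. timer3_intr enabled

    def lend_to_hardware(self, core):
        # unit 61 releases an idle core: its thread timer interrupt is disabled
        # and the core runs ithreads in coprocessor-thread mode under unit 62
        self.owner[core] = "hardware"
        self.timer_irq[core] = False

    def return_to_system(self, core):
        # unit 62 hands the core back, and its timer interrupt is re-enabled
        self.owner[core] = "system"
        self.timer_irq[core] = True

upu = UPUCoreModel()
upu.lend_to_hardware(2)                 # core 603 now serves hardware threads
print(upu.owner)        # → ['system', 'system', 'hardware', 'system']
print(upu.timer_irq)    # → [True, True, False, True]
```

The key design point being modeled is that a core lent to the hardware thread manager has its per-core timer interrupt disabled, so the system scheduler cannot preempt the hardware thread until the core is returned.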

Claims

1. A method for implementing multi-application parallel processing on a single processor, characterized in that the single processor is provided with a plurality of parallel processing units, and the method comprises the following steps:
A) forming the multiple applications into task queues, respectively, and preparing to assign them to the plurality of parallel processing units to run in parallel;
B) determining, for each of the multiple applications, whether it is an image rendering application; if so, executing step D); otherwise, treating it as a computing application and executing step C);
C) using a homogeneous parallel programming API to process at least one thread generated by the application, and configuring the processed thread to run on a processing unit serving as an SMP core;
D) generating, by the application, at least one shading thread to the hardware thread control unit, and starting rendering through the GPU driver on the processing units controlled by the hardware manager, the processing units controlled by the hardware management unit being obtained from the system by the hardware management unit.
2. The method for implementing multi-application parallel processing on a single processor according to claim 1, characterized by further comprising the following step:
E) determining whether a processing unit controlled by the hardware controller is idle while tasks remain in the task processing queue; if so, assigning the task to run on the processing unit controlled by the hardware manager; otherwise, exiting this step.
3. The method for implementing multi-application parallel processing on a single processor according to claim 2, characterized in that step E) further comprises: using a heterogeneous parallel programming API to process at least one thread generated by the task, and configuring it to run on an idle processing unit controlled by the hardware controller.
4. The method for implementing multi-application parallel processing on a single processor according to claim 3, characterized in that the homogeneous parallel programming API includes pthread and OpenMP, and the threads it processes are pthread threads; the heterogeneous parallel programming API includes OpenCL, and the threads it processes are GPGPU threads; and the GPU driver includes OpenGL, and the threads it processes are OpenGL threads.
5. The method for implementing multi-application parallel processing on a single processor according to claim 4, characterized in that step D) further comprises:
D1) generating, by the thread, its own ithread call instruction to the hardware thread control unit;
D2) arranging, by the hardware thread control unit, the ithread call instructions into a program queue in order of reception, and calling and preparing the ithreads;
D3) running the ithreads, in their queue order in the hardware thread control unit, in turn in the idle multi-way parallel hardware thread slots of the processing units controlled by the hardware manager;
wherein the ithread is a hardware thread, and the ithreads include the threads of the image engine, the DSP and/or the general-purpose image processor that require hardware acceleration.
6. The method for implementing multi-application parallel processing on a single processor according to claim 5, characterized in that step D) further comprises:
D01) determining whether the hardware thread control unit holds a valid, unfinished hardware thread; if so, executing step D02); otherwise, executing step D03);
D02) removing the currently idle multi-way parallel hardware thread slot from the system thread management unit, disabling the thread timer interrupt of that parallel hardware thread slot, and placing the idle multi-way parallel hardware thread slot under the control of the hardware thread control unit;
D03) waiting, and returning to the system thread management unit the information that the parallel hardware thread slot is idle.
7. The method for implementing multi-application parallel processing on a single processor according to claim 5, characterized in that step D) further comprises the following steps:
when an ithread has finished executing or must wait for an event before it can continue, the ithread exits the hardware thread slot in which it is running and enables the thread timer interrupt of that slot; and
the hardware thread control unit checks whether the valid state of an ithread in its run queue has been cleared; if so, the ithread is cleared, otherwise the ithread is retained.
8. An apparatus for implementing the method for multi-application parallel processing on a single processor according to claim 1, characterized in that the single processor is provided with a plurality of parallel processing units, and the apparatus comprises:
an application allocation unit, configured to assign multiple tasks to run on a plurality of processing units serving as SMP cores;
an application judgment unit, configured to determine, for each of the multiple tasks, whether it is an image rendering application;
a computing application processing unit, configured to use a homogeneous parallel programming API to process at least one thread generated by the task, and to configure the processed thread to run on a processing unit serving as an SMP core; and
a graphics acceleration operation unit, configured to generate, from the task, at least one shading thread to the hardware thread control unit, and to start rendering through the GPU driver on the processing units controlled by the hardware manager, the processing units controlled by the hardware management unit being obtained from the system by the hardware management unit.
9. The apparatus according to claim 8, characterized by further comprising:
a GPGPU thread processing unit, configured to determine whether a processing unit controlled by the hardware controller is idle while tasks remain in the task queue and, if so, to use a heterogeneous parallel programming API to process at least one thread generated by the task and to configure it to run on an idle processing unit controlled by the hardware controller.
10. The apparatus according to claim 9, characterized in that the graphics acceleration processing unit further comprises:
a call-instruction generation module, configured to cause the thread to issue its own ithread call instruction to the hardware thread control unit;
a queue formation module, configured to cause the hardware thread control unit to arrange the ithread call instructions into a program queue in order of reception, and to call and prepare the ithreads;
a thread allocation module, configured to cause the ithreads, in their queue order in the hardware thread control unit, to run in turn in the idle multi-way parallel hardware thread slots of the processing units controlled by the hardware manager;
a thread interrupt module, configured such that, when an ithread has finished executing or must wait for an event before it can continue, the ithread exits the hardware thread slot in which it is running and enables the thread timer interrupt of that slot; and
a thread clearing module, configured to cause the hardware thread control unit to check whether the valid state of an ithread in its run queue has been cleared and, if so, to clear the ithread, otherwise to retain the ithread;
wherein the ithread is a hardware thread, and the ithreads include the threads of the image engine, the DSP and/or the general-purpose image processor that require hardware acceleration.
PCT/CN2013/085906 2012-12-26 2013-10-24 Method and device for implementing multi-application parallel processing on single processor WO2014101561A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210578341.8 2012-12-26
CN201210578341.8A CN103064657B (en) 2012-12-26 2012-12-26 Realize the method and device applying parallel processing on single processor more

Publications (1)

Publication Number Publication Date
WO2014101561A1 true WO2014101561A1 (en) 2014-07-03

Family

ID=48107295

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/085906 WO2014101561A1 (en) 2012-12-26 2013-10-24 Method and device for implementing multi-application parallel processing on single processor

Country Status (2)

Country Link
CN (1) CN103064657B (en)
WO (1) WO2014101561A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220269244A1 (en) * 2021-02-23 2022-08-25 Changxin Memory Technologies, Inc. Control method, apparatus, system, device and medium for production equipment

Families Citing this family (11)

Publication number Priority date Publication date Assignee Title
CN102750132B (en) * 2012-06-13 2015-02-11 深圳中微电科技有限公司 Thread control and call method for multithreading virtual assembly line processor, and processor
CN103064657B (en) * 2012-12-26 2016-09-28 深圳中微电科技有限公司 Realize the method and device applying parallel processing on single processor more
CN104866295B (en) * 2014-02-25 2018-03-06 华为技术有限公司 The design method and device of OpenCL runtime system frameworks
CN103995746A (en) * 2014-04-24 2014-08-20 深圳中微电科技有限公司 Method of realizing graphic processing in harmonic processor and harmonic processor
CN104714850B (en) * 2015-03-02 2016-03-30 心医国际数字医疗***(大连)有限公司 A kind of isomery based on OPENCL calculates equalization methods jointly
CN104793996A (en) * 2015-04-29 2015-07-22 中芯睿智(北京)微电子科技有限公司 Task scheduling method and device of parallel computing equipment
CN105653243B (en) * 2015-12-23 2018-03-30 北京大学 The task distributing method that a kind of graphics processing unit Multi-task Concurrency performs
US20180033114A1 (en) * 2016-07-26 2018-02-01 Mediatek Inc. Graphics Pipeline That Supports Multiple Concurrent Processes
CN107436760A (en) * 2017-06-30 2017-12-05 百度在线网络技术(北京)有限公司 Multiwindow rendering intent and device
CN107423014A (en) * 2017-06-30 2017-12-01 百度在线网络技术(北京)有限公司 Multiwindow rendering intent and device
CN109614230B (en) * 2018-12-03 2021-08-17 联想(北京)有限公司 Resource virtualization method and device and electronic equipment

Citations (4)

Publication number Priority date Publication date Assignee Title
CN102147722A (en) * 2011-04-08 2011-08-10 深圳中微电科技有限公司 Multithreading processor realizing functions of central processing unit and graphics processor and method
US20120185671A1 (en) * 2011-01-14 2012-07-19 Qualcomm Incorporated Computational resource pipelining in general purpose graphics processing unit
CN102750132A (en) * 2012-06-13 2012-10-24 深圳中微电科技有限公司 Thread control and call method for multithreading virtual assembly line processor, and processor
CN103064657A (en) * 2012-12-26 2013-04-24 深圳中微电科技有限公司 Method and device for achieving multi-application parallel processing on single processors

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US7812844B2 (en) * 2004-01-28 2010-10-12 Lucid Information Technology, Ltd. PC-based computing system employing a silicon chip having a routing unit and a control unit for parallelizing multiple GPU-driven pipeline cores according to the object division mode of parallel operation during the running of a graphics application
US8854381B2 (en) * 2009-09-03 2014-10-07 Advanced Micro Devices, Inc. Processing unit that enables asynchronous task dispatch
CN101706741B (en) * 2009-12-11 2012-10-24 中国人民解放军国防科学技术大学 Method for partitioning dynamic tasks of CPU and GPU based on load balance

Also Published As

Publication number Publication date
CN103064657B (en) 2016-09-28
CN103064657A (en) 2013-04-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13866865

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 02.11.2015)

122 Ep: pct application non-entry in european phase

Ref document number: 13866865

Country of ref document: EP

Kind code of ref document: A1