JP2008090546A

JP2008090546A - Multiprocessor system

Info

Publication number: JP2008090546A
Application number: JP2006269777A
Authority: JP
Inventors: Tokuji Shono; 篤司庄野; Hidenori Matsuzaki; 秀則松崎; Tatsuya Mori; 達矢森; Shigehiro Asano; 滋博浅野
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2006-09-29
Filing date: 2006-09-29
Publication date: 2008-04-17

Abstract

PROBLEM TO BE SOLVED: To provide a multiprocessor system that allocates tasks to processor cores dynamically and efficiently according to the operating state at each point in time.

SOLUTION: A multiprocessor core 5 is provided with a core A, which has a mechanism for improving data processing performance and a performance monitor that gathers usage information on hardware resources that are in use or have been used for data processing, and cores B and C, each of which has a mechanism that uses the same processing method as the mechanism of core A but has lower performance, and an IPC monitor that measures the IPC value during data processing. A scheduler auxiliary part 6 feeds to the core A any task that is to be executed for the first time or is to be reallocated on the basis of the IPC value, and, for a task that has been executed before and needs no reallocation on the basis of the IPC value, selects one of the cores A-C by referring to the usage information of the task.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a heterogeneous multiprocessor system, and more particularly to a multiprocessor system that assigns tasks to processor cores.

Various mechanisms have been proposed to speed up processors, such as cache, branch prediction, superscalar, out-of-order, and SIMD mechanisms. Adopting these mechanisms improves instruction-level parallelism, avoids penalties caused by various stalls, and makes effective use of data-level parallelism, and processor throughput has improved as a result. While these mechanisms raise processing capability, they cost implementation area and power consumption, so the two are in a trade-off relationship. Whether these mechanisms actually speed up processing depends on the software, and in some cases no speedup is obtained at all.

As a means of improving the computing capability of a system, multiprocessor schemes that operate a plurality of such processors in parallel have also been proposed. In recent years, process miniaturization has made possible multicore processors, in which a plurality of processor cores are mounted on one chip. With a multicore processor, a plurality of tasks, which are independent processing units of software, can be executed in parallel within one chip.

Some multicore processors are composed of a plurality of different types of processor cores; these are called heterogeneous multicore processors. The processor cores making up a heterogeneous multicore processor include general-purpose processor cores, DSP cores, and dedicated hardware processing engines. A multicore processor composed of two different general-purpose processor cores, such as the CELL processor, is also called a heterogeneous multicore processor.

In a heterogeneous multicore processor, a plurality of different types of processor cores are provided, and efficient processing is achieved by using, for each task, the processor core best suited to it. For example, the CELL processor has a multicore configuration with eight processor cores (SPE) suited to media processing and one processor core (PPE) suited to OS and similar processing.
“10.2 The Design and Implementation of a First-Generation CELL Processor” D. Pham et al., 2005 IEEE International Solid-State Circuits Conference (ISSCC)

In a heterogeneous multicore processor, task assignment, that is, deciding which task is executed on which processor core, is important. In existing heterogeneous multicore processors, the processor core on which each task should run is determined statically in advance by the software developer or by tools.

However, static analysis cannot always make the best choice for questions such as “when there are two types of processor cores that differ only in cache capacity, to which should a task be assigned?” or “when there are processor cores with and without an out-of-order mechanism, to which should a task be assigned?” In other words, depending on the types of processor cores that make up the multicore processor, static task assignment may not yield an optimal solution.

Moreover, as process miniaturization increases the number of processor cores that can be mounted on one chip, and with it the number of core types, assigning tasks statically becomes even more difficult.

On the other hand, if tasks are assigned dynamically, the hardware resources each task requires (cache size, branch prediction accuracy, SIMD functions, and so on) may change from what they were at the start of the job, as the job formed by the task group progresses and the input data changes. Naive dynamic task assignment may therefore end up degrading efficiency.

An object of the present invention is therefore to provide a multiprocessor system that, in a heterogeneous multicore processor, assigns tasks to processor cores dynamically and efficiently each time, according to the execution status.

The multiprocessor system of the present invention comprises: a multiprocessor core including a first processor core, which has a first processing mechanism that improves data processing performance and a performance monitor that collects usage information on hardware resources that are being used or have been used in data processing, and a second processor core, which has a second processing mechanism that uses the same processing method as the first processing mechanism but has lower processing performance, and an IPC monitor that measures the IPC value during data processing; and scheduling means that, when application software having a plurality of tasks including identical tasks is executed, supplies to the first processor core any task that is executed for the first time or that needs reassignment according to the measurement result of the IPC monitor, and, for any task that has been executed before and does not need reassignment according to the measurement result of the IPC monitor, selects one processor core from the multiprocessor core to process the task by referring to the hardware resource usage information previously collected for that task by the performance monitor, and supplies the task to the selected processor core.

The multiprocessor system of the present invention also comprises: a multiprocessor core including a first processor core, which has a plurality of mutually different processing mechanisms for improving data processing performance and a performance monitor that collects usage information on hardware resources that are being used or have been used in data processing, and a second processor core, which has at least one processing mechanism whose processing performance is less than that obtained by all of the plurality of processing mechanisms and no greater than that of each of the plurality of processing mechanisms, and an IPC monitor that measures the IPC value during data processing; and scheduling means that, when application software having a plurality of tasks including identical tasks is executed, supplies to the first processor core any task that is executed for the first time or that needs reassignment according to the measurement result of the IPC monitor, and, for any task that has been executed before and does not need reassignment according to the measurement result of the IPC monitor, selects one processor core from the multiprocessor core to process the task by referring to the hardware resource usage information previously collected for that task by the performance monitor, and supplies the task to the selected processor core.

In the multiprocessor of the present invention, there are at least four mutually different processing mechanisms, first through fourth, for improving data processing performance. The multiprocessor comprises: a multiprocessor core including a first processor core, which has the first processing mechanism, the second processing mechanism, and a first performance monitor that collects usage information on hardware resources that are being used or have been used in data processing, a second processor core, which has the third processing mechanism, the fourth processing mechanism, and a second performance monitor that collects usage information on hardware resources that are being used or have been used in data processing, and a third processor core, which has the first and third processing mechanisms and an IPC monitor that measures the IPC value during data processing; and scheduling means that, when software having a plurality of tasks including identical tasks is executed, supplies to the first processor core and the second processor core any task that is executed for the first time or that needs reassignment according to the measurement result of the IPC monitor, and, for any task that has been executed before and does not need reassignment according to the measurement result of the IPC monitor, selects one processor core from the multiprocessor core to process the task by referring to the hardware resource usage information previously collected for that task by the first performance monitor and the second performance monitor, and supplies the task to the selected processor core.

According to the present invention, in a heterogeneous multicore processor, tasks can be assigned to processor cores dynamically and efficiently each time, according to the execution status.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 shows the overall configuration of the system according to this embodiment. The system comprises a processor device 1, a main storage device 2, a disk device 3, and an external input/output device 4, which are connected via a system bus. The processor device 1 includes a plurality of processor core units 5 and a scheduler auxiliary unit 6 (the processor device 1 is described in detail later). The external input/output device 4 is connected to input and output devices (not shown) such as a keyboard, a mouse, and a display.

The disk device 3 stores the software to be executed on this system, including an operating system (OS) and application programs (application 1 and application 2).

An application program consists of one or more tasks, which are fine-grained units of execution. In the example of FIG. 1, application 1 consists of three tasks (tasks 1, 2, and 3) and application 2 consists of two tasks (tasks 4 and 5). An application program is executed by executing its constituent tasks as appropriate. In executing application 1, for example, each task is not necessarily executed just once; the same task may be executed multiple times and, in some cases, concurrently. This embodiment is described on the assumption that each task is a unit of execution called a thread, but this is not a limitation: a task may be any unit of software that is assigned to a processor core unit 5 by scheduling, and units such as processes are also covered.

The OS runs on one of the processor core units 5 and manages the entire system. The OS includes a scheduler, which schedules tasks in cooperation with the scheduler auxiliary unit 6.

When the user instructs the OS, via the external input/output device 4, to execute an application program, the OS scheduler notifies the scheduler auxiliary unit 6, as needed, of the tasks to execute from among the tasks that make up the application program. The scheduler assigns each task to a processor core unit 5 that can execute it, and that processor core unit 5 processes the assigned task, advancing execution of the application program. If execution of another application program is requested while the first is running, the scheduler adds the tasks that make up the other application program to the scheduling targets as needed, so that the application programs execute concurrently.

Next, FIG. 2 shows the overall configuration of the processor device 1.

The processor device 1 here is a multiprocessor with N+1 processor core units 5 (core A, ..., core N, and core Z), which are interconnected via an internal bus.

Core Z is a processor core unit 5 reserved in advance for running the OS. The remaining processor core units 5, core A through core N, each have a plurality of processing mechanisms. A processing mechanism here means a processing function intended to speed up the processor, for example a cache mechanism, a branch prediction mechanism, a superscalar mechanism, an out-of-order mechanism, or a SIMD mechanism. In other words, the processor device 1 is a heterogeneous multicore processor in which each processor core unit 5 is composed of a plurality of different processing mechanisms.

Core A has functional blocks whose performance equals or exceeds that of each processing mechanism in the other cores B to N. Core A also has a performance monitor device (PM device) that collects usage information on core A's hardware resources while a task is running or when it has run. The other cores B to N each have an IPC monitor device that measures the IPC (Instructions Per Cycle) value. The IPC value is described later.

Cores B to N, for their part, each have processing mechanisms whose processing performance is less than that obtained by all of the processing mechanisms of core A and no greater than that of each of those processing mechanisms.

The processor device 1 also includes the scheduler auxiliary unit 6. When an application program having a plurality of tasks, including repeated executions of the same task, is executed, the scheduler auxiliary unit 6 decides which processor core unit (one of cores A to N) executes each task. A task that is executed for the first time or that needs reassignment is always supplied to core A. When re-executing a task that has been executed before and does not need reassignment, the scheduler auxiliary unit 6 refers to the hardware resource usage information previously collected for that task by the performance monitor device, selects one processor core unit (one of cores A to N) to process it, and supplies the task to the selected processor core unit.

The processor device 1 also includes a system bus I/F unit 7, an interface that connects the internal bus to the system bus.

Next, FIG. 3 shows an outline of the overall operation of the processor device 1 described above.

First, in response to an application program execution request from the user, the OS on core Z supplies the tasks of the application program to the scheduler auxiliary unit 6 one after another in execution order. The scheduler auxiliary unit 6 temporarily holds the supplied tasks and takes them out in execution order (S11). It determines whether the task taken out is being executed for the first time or needs reassignment (S12). If so, the task is supplied to core A (S13). When execution of the task ends, the scheduler auxiliary unit 6 receives the hardware resource usage information (PM information) that the performance monitor device (PM device) collected for that task (S14), and holds the usage information in association with information identifying the task (S15).

Otherwise, that is, when the task has been executed before and does not need reassignment, the scheduler auxiliary unit 6 refers to the hardware resource usage information previously collected for the task by the performance monitor device (PM device), selects one of the processor core units 5 (cores A to Z) to execute the task, and supplies the task to the selected processor core unit 5 (S16). When execution of the task ends, the scheduler auxiliary unit 6 receives the IPC value from the PM device or IPC monitor device of the processor core that executed it (S17). Based on the IPC value, it checks whether the task needs reassignment and, if so, records that the task must be reassigned the next time it is assigned (S18).

The scheduler auxiliary unit 6 repeats the processing from step S12 onward, taking out the next task (S20), until no supplied and temporarily held tasks remain (S19). When no tasks remain, execution of the application program is complete.
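The flow of steps S11 to S20 can be sketched in software as follows. This is a minimal model for illustration only: the text describes a hardware unit, and the class name `SchedulerAssist` and the callbacks `execute`, `select_core`, and `ipc_is_low` are hypothetical. In particular, the criterion by which the IPC value triggers reassignment is not specified at this point, so it is left to the caller.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SchedulerAssist:
    """Hypothetical software model of the S11-S20 flow (not the hardware design)."""
    pm_info: dict = field(default_factory=dict)       # task id -> PM information from core A
    needs_reassign: set = field(default_factory=set)  # tasks flagged in S18

    def run(self, tasks, execute, select_core, ipc_is_low):
        queue = deque(tasks)                  # S11: hold the tasks in execution order
        trace = []
        while queue:                          # S19: repeat until no tasks remain
            task = queue.popleft()            # S11 / S20: take out the next task
            first_time = task not in self.pm_info
            if first_time or task in self.needs_reassign:   # S12
                core = "A"                                  # S13: always core A
                self.pm_info[task] = execute(core, task)    # S14, S15: keep PM information
                self.needs_reassign.discard(task)
            else:
                core = select_core(self.pm_info[task])      # S16: choose from stored PM info
                result = execute(core, task)
                if ipc_is_low(core, result["ipc"]):         # S17, S18: flag for next time
                    self.needs_reassign.add(task)
            trace.append((task, core))
        return trace


if __name__ == "__main__":
    # Assumed toy behaviour: fixed IPC per core, and core B is always selected.
    ipc_by_core = {"A": 1.8, "B": 0.4, "C": 1.1}
    assist = SchedulerAssist()
    print(assist.run(
        ["t1", "t1", "t1", "t1"],
        execute=lambda core, task: {"ipc": ipc_by_core[core]},
        select_core=lambda info: "B",
        ipc_is_low=lambda core, ipc: ipc < 0.5,
    ))
```

In this toy run the same task alternates between core A and core B, because every run on core B reports a low IPC and sends the next run back to core A for fresh PM information.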

If, during this execution, the user requests execution of another application that contains a task also contained in the running application, that task can use the usage information collected when the task was executed for the running application.

According to the embodiment described above, in a heterogeneous multiprocessor, when a task is re-executed, a processor core unit suited to that task can be selected to execute it, and when the IPC value drops during execution of the task, the core to assign can be determined again. Tasks can therefore be executed efficiently.

Next, examples that describe this embodiment in more detail are given.

(First Example)
In the first example, the processor device 1 is assumed to have four processor core units 5. FIG. 4 shows an example of the processing mechanisms of cores A to C, that is, the four processor core units 5 excluding core Z, which runs the OS.

Core A has the following processing mechanisms: a branch prediction mechanism (Branch prediction), an out-of-order mechanism (out of order), three identical pipeline mechanisms (Processing pipe 1 to 3), and a 512 KB secondary cache mechanism (L2: 512KB). Core A also has a performance monitor device (PM device) that monitors the usage of core A's hardware resources. Core B has one pipeline mechanism identical to those of core A and a 256 KB secondary cache mechanism, half the capacity of core A's. Core C has a branch prediction mechanism identical to that of core A, two pipeline mechanisms identical to those of core A, and a 128 KB secondary cache mechanism, one quarter the capacity of core A's. Core B and core C are thus functional subsets of core A. Core Z is treated here as a processor core dedicated to the OS, and its description is omitted. Cores A, B, and C are all assumed to be able to execute object code built on the same ISA (the instruction format expressed as a set of binary opcodes).
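The configuration of FIG. 4 as described here can be written down as a small table, which also makes the "subset of core A" relationship checkable. The record type `CoreConfig` and the helper `is_subset` are illustrative names, not part of the source; only the feature values are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreConfig:
    name: str
    branch_prediction: bool
    out_of_order: bool
    pipelines: int
    l2_kb: int
    monitor: str  # "PM" (performance monitor) or "IPC" (IPC monitor)


# Values as stated for the first example.
CORE_A = CoreConfig("A", branch_prediction=True, out_of_order=True, pipelines=3, l2_kb=512, monitor="PM")
CORE_B = CoreConfig("B", branch_prediction=False, out_of_order=False, pipelines=1, l2_kb=256, monitor="IPC")
CORE_C = CoreConfig("C", branch_prediction=True, out_of_order=False, pipelines=2, l2_kb=128, monitor="IPC")


def is_subset(small: CoreConfig, big: CoreConfig) -> bool:
    """True if every processing mechanism of `small` is matched or exceeded by `big`."""
    return (small.branch_prediction <= big.branch_prediction
            and small.out_of_order <= big.out_of_order
            and small.pipelines <= big.pipelines
            and small.l2_kb <= big.l2_kb)
```

Note that cores B and C are not subsets of each other: core C has more pipelines and a branch predictor, while core B has the larger secondary cache.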

Next, the performance monitor device (PM device) of core A is described.

The PM device collects the usage status of hardware resources during the execution of one task on core A, generates several data items by calculation and other means, and outputs them to the scheduler auxiliary unit 6 as usage information (PM information). The PM information could contain a variety of data; in this embodiment, as shown in FIG. 5, it consists of the following items associated with a task ID (TID=6): cache performance degradation rate, branch prediction effectiveness, IPC value, out-of-order effectiveness, and execution time.

Each item and how it is generated are described below.

“Cache performance degradation rate”: A value that measures how much speedup the secondary cache mechanism with a 512 KB cache size provided and indicates how badly performance would suffer if the cache size were changed (reduced). The PM device measures the hit count and the miss count for each cache entry. From these, the “number of accesses that hit at 512 KB but would become misses” is multiplied by the “cache miss penalty cycle count” and divided by the “total number of cycles required for task processing”, giving the adverse effect on performance for each cache size.

The “number of accesses that hit at 512 KB but would become misses” is obtained as follows: (1) count the hits and misses for each cache entry; (2) compare the entries that would map to the same entry if the cache size were changed, and find the one with the most hits; (3) among the entries that would map to the same entry, sum the hit counts of all entries other than the one with the most hits, and multiply that sum by “word size ÷ cache line size”, taking the result as the predicted number of original hits that would become misses after the cache size change; (4) finally, take the total of these values.
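Steps (1) to (4) and the degradation-rate ratio can be sketched as below. The sketch assumes a direct-mapped cache in which old entry `i` maps to new entry `i % new_entries` after the size change; the source does not state the mapping, and the function and parameter names are illustrative.

```python
def predicted_lost_hits(hits_per_entry, new_entries, word_size, line_size):
    """Predict how many former hits become misses when the cache shrinks.

    hits_per_entry: hit count measured for each entry of the full-size cache (step 1).
    new_entries:    number of entries in the smaller cache.
    Assumption: old entry i collides with every old entry j where j % new_entries == i % new_entries.
    """
    groups = {}
    for index, hits in enumerate(hits_per_entry):
        groups.setdefault(index % new_entries, []).append(hits)   # step 2: colliding entries

    lost = 0.0
    for group in groups.values():
        # Step 3: keep the entry with the most hits; hits of the others are lost,
        # scaled by word size / cache line size.
        lost += (sum(group) - max(group)) * word_size / line_size
    return lost                                                    # step 4: total over all groups


def cache_degradation_rate(lost_hits, miss_penalty_cycles, total_cycles):
    """Lost hits times the miss penalty, as a fraction of the task's total cycles."""
    return lost_hits * miss_penalty_cycles / total_cycles
```

For example, with hit counts `[100, 20, 5, 60]` halved to two entries, a 4-byte word, and a 32-byte line, the predicted loss is (5 + 20) × 4 / 32 = 3.125 hits; with a 100-cycle miss penalty and 10,000 task cycles the degradation rate is 0.03125. These numbers are made up for illustration.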

“Branch prediction effectiveness”: A value that measures how much speedup the branch prediction mechanism provided and indicates its effectiveness. Using the performance-counter events “branch taken” and “branch prediction hit”, which existing PM devices also support, the “number of times a branch was taken and the prediction hit” is multiplied by the “branch miss penalty cycle count”, a constant determined uniquely by the processor, and divided by the “total number of cycles required for task processing”, which is the processing time the task itself requires, excluding delays caused by synchronization with other tasks.
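Written as a formula, with symbols introduced here only for illustration ($N_{\mathrm{hit}}$ for the number of taken branches that were predicted correctly, $C_{\mathrm{pen}}$ for the branch miss penalty in cycles, and $C_{\mathrm{task}}$ for the task's total cycles excluding synchronization delays), the definition reads:

```latex
E_{\mathrm{bp}} = \frac{N_{\mathrm{hit}} \times C_{\mathrm{pen}}}{C_{\mathrm{task}}},
\qquad
\text{e.g. } N_{\mathrm{hit}} = 500,\; C_{\mathrm{pen}} = 4,\; C_{\mathrm{task}} = 10000
\;\Rightarrow\; E_{\mathrm{bp}} = \frac{500 \times 4}{10000} = 0.2
```

The numbers in the example are invented; they only show that the value is the fraction of the task's cycles that correct predictions saved.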

“IPC value”: The measured average number of instructions processed per cycle, which indicates the number of pipelines required. It is obtained by dividing the “number of executed instructions”, a performance-counter event also supported by existing performance monitor devices, by the “total number of cycles required for task processing” described above.

“Out-of-order effectiveness”: A value that measures how much instruction reordering the out-of-order mechanism achieved and indicates its effectiveness. It is obtained by dividing the “number of instructions issued ahead of a preceding instruction” by the “number of executed instructions”.

“Execution time”: The measured number of cycles taken to execute the task. The unit here is cycles.

The “cache performance degradation rate”, “branch prediction effectiveness”, “IPC value”, “out-of-order effectiveness”, and “task execution time” obtained by the PM device as described above are supplied to the scheduler auxiliary unit 6.
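As a rough illustration of the record the PM device hands to the scheduler auxiliary unit 6, the sketch below derives the ratio-based items from raw counter values. The record layout, the counter names, and the use of a single cycle count for both the ratios and the execution time are simplifying assumptions, not details given in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PMInfo:
    tid: int
    cache_degradation: dict       # assumed layout: cache size in KB -> degradation rate
    branch_effectiveness: float
    ipc: float
    ooo_effectiveness: float
    exec_cycles: int


def build_pm_info(tid, counters, branch_penalty_cycles, cache_degradation):
    """Build PM information from raw counters (counter names are hypothetical)."""
    cycles = counters["task_cycles"]          # total cycles required for task processing
    instructions = counters["instructions"]   # number of executed instructions
    return PMInfo(
        tid=tid,
        cache_degradation=cache_degradation,
        # (taken branches with a prediction hit) x (branch miss penalty) / (task cycles)
        branch_effectiveness=counters["taken_and_predicted"] * branch_penalty_cycles / cycles,
        # executed instructions / task cycles
        ipc=instructions / cycles,
        # instructions issued ahead of a preceding instruction / executed instructions
        ooo_effectiveness=counters["issued_ahead"] / instructions,
        exec_cycles=cycles,
    )
```

With made-up counters of 20,000 instructions, 10,000 cycles, 500 correctly predicted taken branches, 3,000 instructions issued ahead, and a 4-cycle branch penalty, this gives a branch prediction effectiveness of 0.2, an IPC of 2.0, and an out-of-order effectiveness of 0.15.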

The IPC monitor devices provided in cores B and C measure only the IPC value out of the items described above for the PM device.

次に、スケジューラ補助部６内部の詳細について説明する。図６は、スケジューラ補助部６の各内部ブロックおよびそれらの関係を図示したものである。 Next, details of the scheduler auxiliary unit 6 will be described. FIG. 6 illustrates the internal blocks of the scheduler auxiliary unit 6 and their relationship.

The scheduler auxiliary unit 6 consists of four tables realized mainly by register files, namely the task queue 21, the core management table 22, the task information table 24, and the core information table 23, and of two execution units realized by hardware circuits, namely the task management unit 11 and the core selection unit 12. The tables are described first. In each table, N/A means "none".

タスクキュー２１は、各プロセッサコア部５で実行するタスクの状態を管理するものである。ある状態のタスクキュー２１の一例を図７に示す。タスクキュー２１は有限個のエントリー（本実施例では１０個）で構成され、各エントリは、TID、T#、status、dependency、parameter、orderという項目を持つ。TIDは、現在スケジューラ補助部６内で管理されているタスクのユニークな内部ID、T#はTIDに割り当てられたタスクの開始アドレスごとに固有のID、StatusはTIDで示されるタスクの状態、dependencyはこのタスクが実行可能となるために実行終了していなくてはならないタスクのTIDのリスト、parameterはこのタスクを実行する際に用いるパラメータ、orderはこのタスクキューへの投入された順序を保持する項目である。なお本実施例ではT#はタスクの開始アドレスごとに固有のIDであるとしたが、実際には開始アドレスが同じでも状況によって動作パターンが異なる場合には異なるIDを与える方式も考えられる。 The task queue 21 manages the state of tasks executed by each processor core unit 5. An example of the task queue 21 in a certain state is shown in FIG. The task queue 21 includes a finite number of entries (10 in this embodiment), and each entry has items of TID, T #, status, dependency, parameter, and order. TID is a unique internal ID of the task currently managed in the scheduler auxiliary unit 6, T # is a unique ID for each task start address assigned to TID, Status is the task status indicated by TID, dependency Is a list of TIDs of tasks that must be completed in order for this task to be executable, parameter is a parameter used to execute this task, and order holds the order in which this task was submitted It is an item. In this embodiment, T # is a unique ID for each start address of a task. However, in practice, a method of giving a different ID when the operation pattern differs depending on the situation even if the start address is the same may be considered.

The status item can take five states, empty, wait, ready, run, and finish, and task management is realized by transitions among them as shown in FIG. 8. When the scheduler submits a new task, the task is registered in one of the TIDs whose state is empty; if the submitted task has preceding tasks on which it depends, its state is set to wait, and otherwise to ready. A task in the wait state becomes ready once all of its preceding tasks have finished. A task in the ready state is a candidate for assignment to a core; its state changes to run when its execution is assigned to a core, and to finish when that execution ends. Finally, when the end of the task has been reported to the scheduler, the state returns to empty and the entry can accept a new task again.
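The state transitions described for the status item can be written as a small transition table. The following is only a sketch; the event names (`register_with_dependency` and so on) are labels invented here for the triggers the text describes and do not appear in the embodiment.

```python
from enum import Enum


class Status(Enum):
    EMPTY = "empty"
    WAIT = "wait"
    READY = "ready"
    RUN = "run"
    FINISH = "finish"


# (current state, event) -> next state. Event names are hypothetical labels.
TRANSITIONS = {
    (Status.EMPTY, "register_with_dependency"): Status.WAIT,
    (Status.EMPTY, "register_without_dependency"): Status.READY,
    (Status.WAIT, "all_preceding_tasks_finished"): Status.READY,
    (Status.READY, "assigned_to_core"): Status.RUN,
    (Status.RUN, "execution_ended"): Status.FINISH,
    (Status.FINISH, "scheduler_notified"): Status.EMPTY,
}


def next_status(current: Status, event: str) -> Status:
    """Return the state after `event`, rejecting transitions the text does not describe."""
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(f"no transition from {current.value} on {event}") from None
```

Keeping the transitions in one table makes it easy to check that every entry returns to empty only after the scheduler has been notified.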

Next, the core management table 22 stores the current state of each processor core unit 5. FIG. 9 shows an example of the core management table 22 in a certain state. The table has one entry for each core of the processor device 1. Each entry has four items, CID, C#, status, and running TID, which indicate, respectively, the unique internal ID within the processor device 1, the core type, the core state, and the TID being executed. The core states are busy, idle, and reserved, which mean, respectively, that the core is executing a task, is waiting for a task to be assigned, and is excluded from task assignment.

Next, the core information table 23 describes the characteristics of each type of core mounted on the processor device 1 and is used as the basis for core selection. FIG. 10 shows an example of the core information table. The characteristics are the L2 cache size (L2 cache size), the presence of a branch predictor (branch prediction available), the number of instruction execution pipelines (pipeline number), and whether out-of-order execution is possible (OOO available). An item that expresses the presence of a function holds Yes if the core has the function and No if it does not; any other item holds the quantity of the processing mechanism it refers to. The core information table 23 is fixed for each core (A to C) and is never rewritten. Core Z is reserved here for running the OS and is therefore excluded from task allocation, so the table has no entry for core Z.

次に、タスク情報テーブル２４は、タスクを各プロセッサコア部５で実行した場合の適切さの度合いを示したものである。ある状態のタスク情報テーブルの一例を図１１に示す。 Next, the task information table 24 indicates the degree of appropriateness when the task is executed by each processor core unit 5. An example of the task information table in a certain state is shown in FIG.

The task information table 24 consists of the following items: Score, which indicates how well the task identified by T# can be executed on each type of core (Score A, Score B, and Score C are the suitability for core A, core B, and core C; the maximum value is 10 and a larger value means higher suitability); execution time, which holds the execution time (in cycles) when the task was executed on core A; start address, which is the execution start address of the task; a threshold calculated from the IPC value that the scheduler auxiliary unit 6 received at the end of the first execution after the task was assigned to a newly selected optimal core; the IPC value that the scheduler auxiliary unit 6 received at the end of execution on each core; and re-execution flag, which indicates whether, when the IPC value has fallen below the threshold, the task identified by T# should be executed again on core A to measure how effectively the hardware resources are used. The threshold may be a value fixed statically in advance for each core (for example, 80%) or a value that varies dynamically with the number of cores, the number of tasks, and the like. The task information table 24 shown here uses a static value of 80% common to all cores.

In the same figure, for example, the entry for T# 1 has a re-execution flag of 0 because its IPC value has not fallen below the threshold, whereas the entry for T# 2 has a re-execution flag of 1 because its IPC value fell below the threshold, and its scores and execution time have been cleared (N/A). Every T# of a task registered in the task queue 21 has an entry in the task information table 24; an entry whose score fields are N/A indicates a task whose suitability for each core type has not yet been examined. The Score values are obtained by the score calculation of the core selection unit 12, which is described in detail later.

コア選択部１２は、プロセッサコア部５からのタスク終了通知を受け取り、タスクキュー２１、コア管理テーブル２２、コア情報テーブル２３を参照しながらタスク情報テーブル２４の更新を行うものである。図１２に、タスク情報テーブルの更新のフローを示し、以下に説明する。 The core selection unit 12 receives a task end notification from the processor core unit 5 and updates the task information table 24 while referring to the task queue 21, the core management table 22, and the core information table 23. FIG. 12 shows a task information table update flow, which will be described below.

タスクが終了すると、プロセッサコア部５は、内部バスを介してスケジューラ補助部６に終了通知を送信する。スケジューラ補助部６内ではコア選択部１２がこの通知を受け取る（Ｓ２１）。終了通知には終了したタスクのTID、通知元のプロセッサコア部５のCID、タスク実行に要した時間、そしてコアＡで実行された場合にはPMデータも含まれる。コア選択部１２は、通知されたTIDとCIDをもとに、タスクキュー２１とコア管理テーブル２２を参照し、そのTIDのT#、および実行されていたプロセッサコア部５のC#を見つけておく。 When the task ends, the processor core unit 5 transmits an end notification to the scheduler auxiliary unit 6 via the internal bus. In the scheduler auxiliary unit 6, the core selection unit 12 receives this notification (S21). The completion notification includes the TID of the completed task, the CID of the processor core unit 5 that is the notification source, the time required for task execution, and the PM data when executed in the core A. The core selection unit 12 refers to the task queue 21 and the core management table 22 based on the notified TID and CID, and finds the T # of the TID and the C # of the processor core unit 5 that has been executed. .

次に、コア選択部１２は、ステップＳ２１において求めたT#について、タスク情報テーブル２４を参照して既にコアタイプ毎のスコアが計算済みであるかどうかを判定する（Ｓ２２）。スコア項目がN/Aの場合はまだスコアが計算されていないと判断して、ステップＳ２３へ処理を進める。一方、スコアに既にある値があれば、ステップＳ２６へ処理を進める。 Next, the core selection unit 12 determines whether or not the score for each core type has already been calculated for the T # obtained in step S21 with reference to the task information table 24 (S22). If the score item is N / A, it is determined that the score has not yet been calculated, and the process proceeds to step S23. On the other hand, if there is already a value in the score, the process proceeds to step S26.

コア選択部１２は、Ｓ２１において求めたC#から、このタスクがコアＡで実行されたのかどうかの判定を行う（Ｓ２３）。コアＡで実行されたものである場合にはステップＳ２４へ処理を進める。そうでなかった場合には処理を終了する。 The core selection unit 12 determines whether or not this task has been executed by the core A from C # obtained in S21 (S23). If it is executed by the core A, the process proceeds to step S24. If not, the process ends.

コア選択部１２は、終了通知の一部として送信されてきたＰＭ情報をもとにして、このタスクが対応するT#の、各コアタイプに対するスコアを計算する（Ｓ２４）。コア選択部１２は、Ｓ２４で計算した各コアタイプに対するスコア値をタスク情報テーブル２４の該当する項目に記録する。またタスクの実行時間をexecution time項目に記録し（Ｓ２５）、終了する。 The core selection unit 12 calculates a score for each core type of T # corresponding to this task based on the PM information transmitted as a part of the end notification (S24). The core selection unit 12 records the score value for each core type calculated in S24 in the corresponding item of the task information table 24. Also, the task execution time is recorded in the execution time item (S25), and the process ends.

If the determination in step S22 is No, that is, a score has already been recorded, the core selection unit 12 uses the T# and C# obtained in step S21 to look up, in the task information table 24, the score of this task for the processor core unit 5 on which it ran (S26), and proceeds to S27 only if that score is 10. Otherwise the process ends. S27 is performed only for a score of 10 because a core with a score of 10 was judged to be the optimal core for this task, so comparing the execution time on that core with the execution time on core A makes it possible to re-verify that judgment. Conversely, the execution time on a core with a score below 10 is hard to compare with the execution time on core A, so this embodiment does not perform the re-verification of S27 in that case.

The core selection unit 12 compares the current execution time of this task with the execution time on core A registered in the task information table 24 (S27). To tolerate some error, the comparison may instead be made against the registered execution time with a fixed value added to it (or multiplied into it); the fixed value can be set externally. If the current execution time exceeds the execution time registered in the task information table 24, the core selection unit 12 clears the information on this task in the task information table 24 to N/A (S28). As a result of step S28, the optimal processor core unit 5 is selected anew the next time the same task is executed.

一方、比較の結果、今回のタスクの実行時間がタスク情報テーブル２４中に登録されていた実行時間を上回わらなかった場合には、図１３の処理に進む。 On the other hand, if the execution time of the current task does not exceed the execution time registered in the task information table 24 as a result of the comparison, the process proceeds to the process of FIG.

Next, the core selection unit 12 refers to the task information table 24 to determine whether this is the first time the IPC value has been measured (step S271). If it is the first measurement, the process proceeds to step S275; otherwise it proceeds to step S272.

The core selection unit 12 writes the IPC value into the task information table 24, using the PM information or the IPC value sent from the processor core unit 5 (step S272), and compares the written IPC value with the threshold (step S273). If the IPC value is above the threshold, the process ends. If it is below the threshold, the core selection unit 12 sets the re-execution flag in the task information table 24 and clears the score for each core and the execution time to N/A (step S274), and the process ends. The next task with the same T# is thereby assigned to core A again, so that the optimal core can be reselected.

一方、ステップＳ２７１で、ＩＰＣ値の計測が初めての場合、タスク情報テーブル２４に閾値とＩＰＣ値を登録して（ステップＳ２７５）処理が終了する。 On the other hand, if the IPC value is measured for the first time in step S271, the threshold value and the IPC value are registered in the task information table 24 (step S275), and the process ends.
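Steps S271 to S275 can be sketched as a single update function. This is an illustration under several assumptions that the text does not state: the threshold is taken to be a fixed ratio (0.8, matching the static 80% mentioned for the task information table 24) of the first measured IPC value, "first measurement" is detected by an unset threshold, `None` stands for N/A, and an IPC value exactly equal to the threshold is treated as not having fallen below it.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TaskInfo:
    """One T# entry of the task information table; None stands for N/A."""
    score: Optional[Dict[str, int]] = None
    execution_time: Optional[int] = None
    threshold: Optional[float] = None
    ipc: Optional[float] = None
    re_execution: bool = False


def record_ipc(info: TaskInfo, measured_ipc: float, ratio: float = 0.8) -> None:
    if info.threshold is None:                 # S271: first measurement?
        info.threshold = ratio * measured_ipc  # S275 (threshold rule is an assumption)
        info.ipc = measured_ipc
        return
    info.ipc = measured_ipc                    # S272: write the IPC value
    if measured_ipc < info.threshold:          # S273: compare with the threshold
        info.re_execution = True               # S274: request re-execution on core A
        info.score = None                      #        and clear scores ...
        info.execution_time = None             #        ... and execution time to N/A
```

After the flag is set, the task has no recorded scores, so the next instance with the same T# is treated like an unscored task and sent to core A.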

ここで、タスク情報テーブル２４内に記録するスコアの計算方法について、以下に一例を示す。 Here, an example of the calculation method of the score recorded in the task information table 24 is shown below.

コア選択部１２は、ＰＭ情報を評価するための閾値テーブルを備えている。閾値テーブルの一例を図１４に示す。閾値テーブルを用いたスコア計算方法は次のように行われる。 The core selection unit 12 includes a threshold value table for evaluating PM information. An example of the threshold table is shown in FIG. The score calculation method using the threshold value table is performed as follows.

First, the threshold table and the PM information are consulted to decide whether the hardware resources of each processor core unit 5 satisfy the conditions for executing the task without delay. Specifically, an item whose PM data value is below the threshold is judged not to satisfy the condition (×), and an item whose PM data value is at or above the threshold is judged to satisfy it (○). FIG. 15 shows an example of the result.

Next, a score is calculated for each hardware resource of each processor core unit 5. A resource judged in the previous step not to satisfy the condition for executing the task without delay (×) scores 0 points. A resource judged to satisfy it (○) is scored further according to how much of it is needed. Conceptually, a core that meets the requirement with the minimum necessary hardware resources scores 1 point, and a core with more hardware resources than necessary is penalized and scores less than 1 point. More concretely, a hardware resource expressed as Yes or No scores 0.5 points when the core has it although it was not needed, and a hardware resource expressed as a quantity scores the quantity that was needed divided by the quantity the core actually has. The results are, for example, the four leftmost of the six items in FIG. 16.

Next, for each processor core, the values calculated for the individual hardware resources are summed. The result is, for example, the fifth of the six items from the left in FIG. 16, "Intermediate score (SUM)".

Finally, the core with the largest of the sums obtained above is given 10 points. For each of the other processor cores, the intermediate value is multiplied by 2.5 and rounded up to an integer, and the result is the final score. The result is, for example, the rightmost of the six items in FIG. 16, "Final Score".

以上のようにして、タスク情報テーブル２４内に記録するためのスコアを求める。 As described above, the score for recording in the task information table 24 is obtained.
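The score calculation above can be sketched in a few functions. The function names are introduced here for illustration, the ○/× judgment is passed in as the `satisfied` argument because the threshold table of FIG. 14 is not reproduced in the text, and the handling of a Yes/No resource that is absent and not needed (scored as 1 point, as the minimum necessary configuration) is an assumption.

```python
import math
from typing import Dict


def boolean_resource_score(satisfied: bool, needed: bool, present: bool) -> float:
    """Score for a Yes/No resource such as a branch predictor."""
    if not satisfied:
        return 0.0      # condition for delay-free execution not met (x)
    if present and not needed:
        return 0.5      # the core has the resource although it was not needed
    return 1.0          # requirement met with no excess (assumed for the remaining cases)


def quantity_resource_score(satisfied: bool, needed: float, owned: float) -> float:
    """Score for a resource expressed as a quantity, such as the pipeline count."""
    if not satisfied or owned == 0:
        return 0.0
    return needed / owned   # 1.0 when the core has exactly what was needed


def final_scores(intermediate: Dict[str, float]) -> Dict[str, int]:
    """Core with the largest sum gets 10; the others get ceil(sum * 2.5)."""
    best = max(intermediate.values())
    return {core: 10 if total == best else math.ceil(total * 2.5)
            for core, total in intermediate.items()}
```

For hypothetical intermediate sums of 2.5, 4.0, and 1.5 for cores A, B, and C, core B receives 10 points, core A receives 7 (6.25 rounded up), and core C receives 4 (3.75 rounded up). The factor 2.5 maps a perfect sum over four resources to 10.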

図６の説明に戻る。 Returning to the description of FIG.

The task management unit 11 communicates with core Z, on which the OS runs, notifies the target processor core unit 5 of a task execution assignment, and receives the execution end notification from the processor core unit 5 to which the task was assigned.

FIG. 17 shows the inside of the task management unit 11; the area enclosed by the broken line is the task management unit 11. It comprises a task queue management unit 31 that updates the task queue 21, a task assignment determination unit 32 that decides which task to assign to a processor core unit 5, a task execution management unit 33 that manages the execution of the assigned task on the processor core unit 5, and a core management table management unit 34 that updates the core management table 22. The task queue management unit 31 and the task execution management unit 33 can communicate with each processor core unit 5 via the internal bus.

Next, the operation of the task management unit 11 is described with reference to the flowchart in FIG. 18. There are three flows: "registration of a new task", "assignment of a task to a processor core", and "end of task execution". They run independently of one another except where they access shared tables. The mutual exclusion on shared-table accesses is shown by the broken-line arrows in the figure: processing stages joined by a broken-line arrow are executed exclusively of each other.

まず、新規タスクの登録について説明する。 First, registration of a new task will be described.

タスクキュー管理部３１は、内部バスを介してスケジューラからの、新規タスクの実行の要求を受理する（Ｓ３１）。 The task queue management unit 31 accepts a request to execute a new task from the scheduler via the internal bus (S31).

The task queue management unit 31 refers to the task information table 24 and derives the T# from the start address of the task requested by the scheduler. If the start address is already registered in the task information table 24, the corresponding T# is used as the T# of the new task. If not, a new T# entry is created in the task information table 24, the start address is recorded in its start address item, and that T# is used for the task (S32).

The task queue management unit 31 registers the new task in a free entry of the task queue 21 (an entry whose state is empty). It fills in the items of that entry from the T# obtained in step S32 and the dependency and parameter information contained in the scheduler's request (S33), and sets the order item so that the new task comes after the existing tasks. If dependency is not empty, status is set to wait; otherwise it is set to ready.

タスクキュー管理部３１は、新規タスクを登録したTIDを内部バスを介してスケジューラに返す（Ｓ３４）。 The task queue management unit 31 returns the TID registered with the new task to the scheduler via the internal bus (S34).
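The registration flow S32 to S34 can be sketched as follows. The data layout is an assumption made for illustration: the task queue is a list whose index serves as the TID, the task information table is a dictionary keyed by T#, new T# values are numbered sequentially, and a full queue raises an error (the text does not say what happens when no empty entry exists).

```python
from typing import Any, Dict, List


def register_task(queue: List[Dict[str, Any]], task_info: Dict[int, Dict[str, Any]],
                  start_address: int, dependency: List[int], parameter: Any) -> int:
    # S32: find the T# for this start address, or create a new entry.
    t_num = next((t for t, info in task_info.items()
                  if info["start address"] == start_address), None)
    if t_num is None:
        t_num = max(task_info, default=0) + 1      # numbering scheme is assumed
        task_info[t_num] = {"start address": start_address}

    # S33: fill the first empty entry; the new task is ordered after all others.
    free = [i for i, e in enumerate(queue) if e["status"] == "empty"]
    if not free:
        raise RuntimeError("task queue is full")   # behaviour assumed
    tid = free[0]
    order = 1 + max((e["order"] for e in queue if e["status"] != "empty"), default=0)
    queue[tid] = {"T#": t_num, "status": "wait" if dependency else "ready",
                  "dependency": list(dependency), "parameter": parameter,
                  "order": order}
    return tid                                     # S34: TID returned to the scheduler
```

Two tasks with the same start address share one T#, so a score learned for the first instance is reused for the second.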

次に、タスクのプロセッサコア部５への割当て時について説明する。 Next, a description will be given of when tasks are assigned to the processor core unit 5.

The task assignment determination unit 32 refers to the task queue 21, the task information table 24, and the core management table 22, decides which task should newly be given execution and which processor core unit 5 it should be assigned to, and notifies the task execution management unit 33 (S41). The notification contains the TID of the task to be assigned, its execution start address and execution parameters, and the CID of the destination processor core unit 5. The decision procedure of the task assignment determination unit 32 is described in detail later.

Based on the notified information, the task execution management unit 33 requests, via the internal bus, the processor core unit 5 indicated by the CID to execute the task indicated by the TID. Specifically, it looks up the task queue 21 with the received TID, reads the corresponding T# and parameter, and sends them to the processor core unit 5 indicated by the CID as a task execution request. The task execution management unit 33 also stores the pair of CID and TID of the task being executed (S42).

タスク実行管理部３３は、コア管理テーブル管理部３４に対して実行開始フラグとともにCID、TIDを送信する。コア管理テーブル管理部３４は、この情報を元にコア管理テーブルの更新を行う。具体的にはCIDが示すエントリのstatus項目をbusyとし、running TID項目にTIDを登録する（Ｓ４３）。 The task execution management unit 33 transmits the CID and TID together with the execution start flag to the core management table management unit 34. The core management table management unit 34 updates the core management table based on this information. Specifically, the status item of the entry indicated by the CID is set to busy, and the TID is registered in the running TID item (S43).

タスク実行管理部３３は、タスクキュー管理部３１に対して実行開始フラグとともにTIDを送信する。タスクキュー管理部３１はこの情報を元にタスクキューの更新を行う。具体的にはTIDが示すエントリのstatus項目をrunとする（Ｓ４４）。 The task execution management unit 33 transmits the TID to the task queue management unit 31 together with the execution start flag. The task queue management unit 31 updates the task queue based on this information. Specifically, the status item of the entry indicated by the TID is set to run (S44).

ステップＳ４１に、戻って次のタスク割り当てを行う。 Returning to step S41, the next task assignment is performed.

次に、タスクの実行終了について説明する。 Next, task execution termination will be described.

When the processor core unit 5 executing a task notifies the scheduler auxiliary unit 6 of the end of the task via the internal bus, the task execution management unit 33 receives the notification. The notified information includes the ID (CID) identifying the processor core unit 5 that finished (S51).

タスク実行管理部３３は、コア管理テーブル管理部３４に対して終了フラグとともにCIDを送信する。コア管理テーブル管理部３４は、この情報を元にコア管理テーブル２２の更新を行う。具体的にはCIDが示すエントリのstatus項目をidleとし、running TID項目をN/Aとする（Ｓ５２）。 The task execution management unit 33 transmits the CID along with the end flag to the core management table management unit 34. The core management table management unit 34 updates the core management table 22 based on this information. Specifically, the status item of the entry indicated by the CID is set to idle, and the running TID item is set to N / A (S52).

タスク実行管理部３３はタスクキュー管理部３１に対して終了フラグとともにTIDを送信する。タスクキュー管理部３１は、この情報を元にタスクキュー２１の更新を行う。具体的にはTIDが示すエントリのstatus項目をfinishとし、さらに他のTIDエントリのdependency項目からこのTIDを取り除く（Ｓ５３）。 The task execution management unit 33 transmits the TID to the task queue management unit 31 together with the end flag. The task queue management unit 31 updates the task queue 21 based on this information. Specifically, the status item of the entry indicated by the TID is set to finish, and this TID is removed from the dependency item of another TID entry (S53).

タスク実行管理部３３は、内部バスを介してスケジューラに対してこのタスクの終了を通知する。通知情報には実行を終了したTIDが含まれる。さらに通知終了後にタスクキュー２１の更新を行う。具体的にはTIDが示すエントリのstatus項目をemptyとし、T#、parameter、orderの各項目をN/Aとする。さらにタスクキュー２１内の各エントリのorder値のうちこのタスクのorder値よりも大きいものを全て１デクリメントする（Ｓ５４）。 The task execution management unit 33 notifies the scheduler of the end of the task via the internal bus. The notification information includes the TID that has finished execution. Further, the task queue 21 is updated after the notification is completed. Specifically, the status item of the entry indicated by TID is set to empty, and the T #, parameter, and order items are set to N / A. Further, all of the order values of each entry in the task queue 21 that are larger than the order value of this task are decremented by 1 (S54).
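The task queue updates of steps S53 and S54 can be sketched as two functions. The list-of-dictionaries layout and the function names are assumptions for illustration; promoting a waiting task to ready once its dependency list becomes empty follows the state transitions described for the task queue 21.

```python
from typing import Any, Dict, List


def finish_task(queue: List[Dict[str, Any]], tid: int) -> None:
    """S53: mark the task finished and remove it from other entries' dependency."""
    queue[tid]["status"] = "finish"
    for entry in queue:
        if tid in entry["dependency"]:
            entry["dependency"].remove(tid)
            # A waiting task whose preceding tasks have all finished becomes ready.
            if entry["status"] == "wait" and not entry["dependency"]:
                entry["status"] = "ready"


def release_entry(queue: List[Dict[str, Any]], tid: int) -> None:
    """S54: after notifying the scheduler, free the entry and close the gap in order."""
    done_order = queue[tid]["order"]
    queue[tid].update({"status": "empty", "T#": None,
                       "parameter": None, "order": None})   # None stands for N/A
    for entry in queue:
        if entry["order"] is not None and entry["order"] > done_order:
            entry["order"] -= 1
```

Decrementing the larger order values keeps the order items contiguous, so the oldest remaining task always has the smallest value.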

以上のように、タスク管理部１１は動作する。 As described above, the task management unit 11 operates.

次に、タスク割り当てを行うタスク割り当て決定部３２の詳細な動作について図１９を用いて説明する。タスク割り当て決定部３２は、スケジューラ補助部６内のタスクキュー２１、タスク情報テーブル２４、コア管理テーブル２２の３つのテーブルを参照して、プロセッサコア部５へ実行させるタスクと、その割り当て先のプロセッサコア部５を決定する機能を有するものである。 Next, the detailed operation of the task assignment determination unit 32 that performs task assignment will be described with reference to FIG. The task allocation determination unit 32 refers to the three tables of the task queue 21, the task information table 24, and the core management table 22 in the scheduler auxiliary unit 6, and the task to be executed by the processor core unit 5 and the processor of the allocation destination This has a function of determining the core unit 5.

First, the task assignment determination unit 32 generates the per-core-type assignability table (S61). FIG. 20 shows an example. The per-core-type assignability table is an intermediate table that can be generated from the core management table 22. It has one entry per core type (C#) and shows whether a new task can be assigned (status) and, if so, to which CID (allocatable CID). The status item is idle only if at least one core of that type (C#) has status idle in the core management table 22, and busy otherwise. The allocatable CID item has a value only when status is idle; the value is the smallest CID whose status is idle among the CIDs of that C# in the core management table 22.

次に、コアタイプ毎割り当て可否テーブルにstatusがidleであるコアがあるか否かを判断し（Ｓ６２）、あれば、次に、割り当て候補TIDテーブルを作成する（Ｓ６３）。 Next, it is determined whether or not there is a core whose status is idle in the assignability table for each core type (S62). If there is, then an assignment candidate TID table is created (S63).

割り当て候補TIDテーブルの一例を図２１に示す。割り当て候補TIDテーブルは、タスクキュー２１から生成可能な中間テーブルであり、割り当て可能なTIDごとにそのT#とorderのみを抽出したテーブルである。タスクキュー２１においてstatusがreadyであるようなTIDのみを抽出し、そのT#とorderを抜き出すことで生成できる。 An example of the allocation candidate TID table is shown in FIG. The assignment candidate TID table is an intermediate table that can be generated from the task queue 21, and is a table in which only T # and order are extracted for each assignable TID. It can be generated by extracting only the TID whose status is ready in the task queue 21 and extracting its T # and order.

次に、割り当て候補TIDテーブルに割り当て可能なTIDがあるか否かを判断し（Ｓ６４）、あれば、次に、コア状態を反映したタスク毎スコアテーブルを作成する（Ｓ６５）。 Next, it is determined whether there is an assignable TID in the assignment candidate TID table (S64). If there is, then a task-specific score table reflecting the core state is created (S65).

FIG. 22 shows an example of the per-task score table reflecting the core state. This is an intermediate table that can be generated from the per-core-type assignability table described above and the task information table 24; it is the table in which the score values for core types that cannot currently be assigned are masked to 0. Starting from the task information table 24, the score for a core type is kept as it is if the per-core-type assignability information shows that the type can be assigned, and is rewritten to 0 if it cannot. In addition, an entry named "other" is added to stand for all tasks whose scores are not registered in the task information table 24 (including tasks whose re-execution flag is set in the task information table). Its score is 10 for core A and 0 for the other cores, and the same masking is then applied to give its score for each core type.

Next, the executable task score table is generated (S66). FIG. 23 shows an example of the executable task score table. The executable task score table is an intermediate table that can be generated from the previously generated per-task score table reflecting the core state and the assignment candidate TID table. It has one entry per assignable task, with the fields TID, T#, maximum score, order, and C#. Of these, C# and the maximum score are computed from the per-task score table reflecting the core state using the corresponding T#: they give the core type (C#) that yields the highest score and the score obtained when the task is assigned to it. T# and order are copied unchanged from the values held for the corresponding TID in the assignment candidate TID table.

Once the four intermediate tables have been generated, the task assignment determination unit 32 determines which task to assign for execution (S67). Specifically, it judges that the most appropriate choice is to assign the task indicated by the TID with the largest maximum score to a processor core unit 5 of the core type indicated by the corresponding C#. If several tasks share the same maximum score, the TID with the smallest order value is selected.
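Steps S66 and S67 can be illustrated with the sketch below. The row layout and function names are assumptions for the example, and the case in which every score is masked to 0 is not handled because this passage does not describe it.

```python
def build_executable_table(candidates, masked_scores):
    """One row per assignable TID with its best core type and best score."""
    rows = []
    for tid, cand in candidates.items():
        # Tasks without a registered score fall back to the "other" entry.
        scores = masked_scores.get(cand["t_num"], masked_scores["other"])
        core_type, best = max(scores.items(), key=lambda kv: kv[1])
        rows.append({"tid": tid, "t_num": cand["t_num"], "max_score": best,
                     "order": cand["order"], "core_type": core_type})
    return rows


def select_task(rows):
    """Largest maximum score wins; ties go to the smallest order value."""
    return min(rows, key=lambda r: (-r["max_score"], r["order"]))


# Hypothetical intermediate tables.
candidates = {11: {"t_num": 1, "order": 0}, 12: {"t_num": 4, "order": 1}}
masked = {1: {"A": 3, "B": 0, "C": 5}, "other": {"A": 10, "B": 0, "C": 0}}
rows = build_executable_table(candidates, masked)
chosen = select_task(rows)
```

Here TID 12 has no registered score, so it takes the "other" entry and is chosen for core type A with a score of 10.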

Next, the task assignment determination unit 32 selects the processor core unit 5 that is to execute the selected TID. This is done by using the C# given in the executable task score table to look up the CID field of the corresponding entry in the per-core assignability table (S68).

Further, the task assignment determination unit 32 determines the task's execution start address by looking up the task information table 24 using the T# given in the executable task score table, and the task's execution parameters by looking up the task queue using the TID (S69). It then notifies the task execution management unit 33 of this information (TID, CID, execution start address, parameters) (S70). In this example, the task indicated by TID = 6 (start address = 0x10000, execution parameter = parameter6) is assigned to the processor core unit 5 indicated by CID = 2.

If nothing is found in step S62 or step S64, interval processing is performed (S71), and processing then resumes from step S61. During interval processing, the tables in the scheduler auxiliary unit 6 may be updated, for example when a new task is submitted or a task ends.

As described above, according to this embodiment, when a task is re-executed on a heterogeneous multiprocessor, a processor core unit suited to executing that task can be selected to execute it. In addition, because the core to assign is decided again when the IPC value drops during execution of the task, tasks can be executed efficiently.

Next, a second embodiment will be described.

(Second embodiment)
The first embodiment dealt with a processor device 1 having three types of processor core units 5 (core A, core B, and core C) in which core A has the functions of every other type of core. The present embodiment shows an example that is also applicable to a processor device 1 in which no such absolute core A exists. Since the second embodiment overlaps with the first embodiment in many respects, the description focuses on the differences.

The second embodiment assumes that the processor device 1 has five processor core units 5. FIG. 24 shows an example of the processing mechanisms of cores A to D, that is, the five processor core units 5 excluding core Z, which executes the OS.

As the figure shows, in terms of the number of instruction pipelines, the branch predictor, and the out-of-order mechanism, cores B, C, and D are subsets of core A, while in terms of L2 cache size, cores A, B, and C are subsets of core D.

Accordingly, core D is equipped with a performance monitor device (PM) in addition to core A. Cores B and C are each equipped with an IPC monitor device.

Next, the internals of the scheduler auxiliary unit 6' are described with reference to FIG. 25. As the figure shows, it differs from the first embodiment in that a PM data buffer 25 has been added. Although not directly visible in the figure, the task information table 24', the task management unit 11', and the core selection unit 12' also require partial changes (extensions).

For a given task (T#), PM information is reported from core A and from core D at different times, so the PM data buffer 25 stores it temporarily until the PM information from both has arrived. Once the PM information from both is available, the core selection unit 12' calculates the score of that task (T#) for each core type; when the score calculation is complete, the entry for that task (T#) is deleted from the PM data buffer.

As shown in FIG. 26, the task information table 24' has an additional "To be run" field, which holds a list of the types (C#) of processor core unit 5 on which the task must be executed in order to calculate its score. Each C# value registered here is removed from the list when the task finishes on the processor core 5 that it indicates, and the field becoming N/A indicates that the score has been calculated. In this example, the task with T# = 3 has been executed only on core A, the task with T# = 6 has been executed only on core D, and the other tasks (1, 4, and 5) have been executed on both core A and core D.

The core selection unit 12' operates according to the flow shown in FIGS. 27 and 28. Steps that are the same as in the operation flow of the core selection unit 12 of the first embodiment (FIGS. 12 and 13) carry the same step numbers, modified steps are marked with a prime ('), and newly added steps are numbered in the 100s.

First, steps S21 and S22 are the same as in the first embodiment.

If the result of step S22 is Yes, the core selection unit 12' then determines, from the C# and T# obtained in S21, whether this task was executed on core A or core D (S23'). Specifically, if C# is listed in the "To be run" field of the entry indicated by T# in the task information table 24', it determines that the task was executed on core A or core D. If step S23' determines that the task was not executed on core A or core D, this operation flow ends; if it determines that it was, processing proceeds to step S101.

The core selection unit 12' registers the PM information sent as part of the end notification in the PM data buffer 25 (S101). If an entry for the corresponding T# already exists in the PM data buffer, the new data is added to that entry; otherwise a new entry is created, the PM data is recorded in the corresponding fields, and fields for which no PM data exists are left as N/A. For the execution time field, if a value already exists, it is overwritten only when the value indicated by the PM data is smaller. The core selection unit also removes the C# registered in the corresponding "To be run" field of the task information table.

The core selection unit 12' determines whether the "To be run" field of the entry indicated by T# in the task information table 24', as referenced in step S23', listed any core type other than C# (S102). If no core type other than C# was listed, processing proceeds to step S24'; otherwise the flow ends.

Next, the core selection unit 12' calculates, from the PM data recorded in the PM data buffer 25, the score for each core type of the T# to which this task corresponds (S24').

The score values calculated by the core selection unit 12' for each core type are then recorded in the corresponding fields of the task information table 24'. The execution time recorded in the PM data buffer is also recorded in the execution time field of the task information table 24' (S25').

Next, the core selection unit 12' deletes the entry corresponding to T# from the PM data buffer 25 (S103) and ends processing.
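The path through steps S23', S101, S102, S24', S25', and S103 can be sketched as below. This is a simplified illustration under stated assumptions: the check of step S22 is omitted, `compute_scores` is a hypothetical placeholder for the score calculation, and the dictionary shapes are invented for the example.

```python
def on_task_end(t_num, core_type, pm_data, exec_time,
                to_be_run, pm_buffer, compute_scores):
    """Return (scores, execution time) once all measurement runs are in, else None."""
    pending = to_be_run.get(t_num, [])
    if core_type not in pending:            # S23': not a measurement run
        return None
    # S101: append to the existing buffer entry or create a new one.
    entry = pm_buffer.setdefault(t_num, {"pm": {}, "exec_time": None})
    entry["pm"].update(pm_data)
    if entry["exec_time"] is None or exec_time < entry["exec_time"]:
        entry["exec_time"] = exec_time      # keep the smaller execution time
    pending.remove(core_type)               # drop C# from "To be run"
    if pending:                             # S102: other core types still listed
        return None
    scores = compute_scores(entry["pm"])    # S24'
    result = (scores, entry["exec_time"])   # S25' would store these in table 24'
    del pm_buffer[t_num]                    # S103
    return result


# Hypothetical run: T# 3 must still be measured on core types A and D.
to_be_run = {3: ["A", "D"]}
pm_buffer = {}
first = on_task_end(3, "A", {"l2_miss": 5}, 100, to_be_run, pm_buffer, sorted)
second = on_task_end(3, "D", {"branch_miss": 2}, 80, to_be_run, pm_buffer, sorted)
```

In this run `sorted` merely stands in for a real scoring function; the first call only buffers data, and the second call completes the measurement and clears the buffer entry.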

If, on the other hand, the result of step S22 is No, processing proceeds to step S26, and the steps up to step S28 are the same as in the first embodiment. After step S28, the core selection unit 12' re-registers, in the "To be run" field of the entry corresponding to T# in the task information table 24', the core types of all processor core units 5 that have a PM device (S104). As a result, this task will be measured again.

Next, the task management unit 11' will be described.

The task management unit 11' has the same hardware configuration as in the first embodiment, but step S32 in the processing flow shown in FIG. 18 and step S65 in the task assignment determination flow shown in FIG. 19 are different.

The modified processing corresponding to step S32 is as follows.

"The task queue management unit 11' refers to the task information table 24' and obtains the T# from the start address of the task requested by the (OS) scheduler. If the start address of the task is already registered in the task information table 24', that T# is used as the T# of the new task; if it is not yet registered, a new T# entry is created in the task information table 24', the start address is registered in its start address field, and that entry's T# is used as the T# of the task. For the entry indicated by T#, the C# values of the core types corresponding to core A and core D (A and D in this embodiment) are registered in the "To be run" field."

The modified processing corresponding to step S65 is as follows.

"The per-task score table reflecting the core state is a table that can be generated from the per-core-type assignability table described above and the task information table 24'; it is the table in which the score values for core types that currently cannot accept a task are masked with 0. Starting from the task information table 24', the per-core-type assignability information is consulted: if a core type is assignable, its score value is left unchanged, and if it is not assignable, its score is rewritten to 0. For a task with no score registered in the task information table 24', the task information table 24' is consulted, the score is set to 10 only for the big cores on which the task has not yet been executed (those listed in the "To be run" field) and to 0 for all others, and the same masking is then applied to obtain the score for each core type." As a result of this change, the per-task score table reflecting the core state no longer has an "other" entry; instead, as shown in FIG. 29, it has an entry for every T# in the task information table.

According to the second embodiment described above, the invention can also be applied to a processor device 1 in which no absolute core A exists. Moreover, even in such a processor device, the scores of a task for all the cores in the processor device can be determined with a minimum number of executions.

In the embodiments described above, the PM device was described as sending PM information together with the task end notification. However, the PM device may also send PM information together with the TID at some point without the task ending, so that only the score calculation in steps S24 and S24' and the update of the task information tables 24 and 24' in steps S25 and S25' are performed independently. In that case, however, the execution time field, which holds the task's execution time, is either not updated or is updated with the maximum value that can be registered.

Also, the PM device in the embodiments described above was described as collecting the execution status of a task from the start to the end of its execution; in this case it needs a function for sending the PM information being collected, together with the TID, to the scheduler auxiliary unit 6, 6' before the task finishes executing. Possible triggers for sending include transmission at fixed intervals driven by a timer, or transmission when some item of the PM information exceeds a set threshold. A method in which the scheduler auxiliary unit 6, 6' actively requests the PM device to send the PM information collected so far may also be used.

In the embodiments described above, the core to which a task is assigned is changed when the IPC (processing efficiency) falls below a threshold. The invention is not limited to this, however: the core to which the task is assigned may of course be changed in the same way when "IPC × frequency" (processing performance) falls below a threshold, or when "IPC × frequency / power consumption per unit time" (power consumption) falls below a threshold.

In such cases, the operating frequency and the power consumption per unit time of each core may be added to the core information table 23 of FIG. 7. The resulting core information table 23' is shown in FIG. 30.

As shown in FIG. 31, regarding the relationship between the frequency and power consumption of a processor core, when the voltage is variable the voltage must be raised in order to raise the operating frequency, and, as illustrated, power consumption increases far more than when the voltage is constant. Since the frequency of a processor core can generally be changed, each time the frequency setting is changed, the frequency currently set for that processor core and the power consumption determined from FIG. 31 are stored in the core information table 23'.

With this arrangement, when an IPC value is received, the core selection unit 12' can calculate "IPC × frequency" or "IPC × frequency / power consumption per unit time" by referring to the core information table 23'.
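As a rough sketch of this lookup, the function below derives the three candidate metrics from an IPC value and a per-core record. The field names `freq_hz` and `power_w`, the sample values, and the threshold are assumptions for illustration; the power-normalized metric uses the ratio form "IPC × frequency / power consumption per unit time" under which these metrics are introduced.

```python
def reassignment_metrics(cid, ipc, core_info):
    """Compute the candidate metrics for one core from its core information entry."""
    info = core_info[cid]
    performance = ipc * info["freq_hz"]          # IPC x frequency
    return {
        "ipc": ipc,                              # processing efficiency
        "performance": performance,              # processing performance
        "per_power": performance / info["power_w"],
    }


def needs_reassignment(value, threshold):
    """A task is a reassignment candidate when the chosen metric is below the threshold."""
    return value < threshold


# Hypothetical core information entry for CID 2.
core_info = {2: {"freq_hz": 1.0e9, "power_w": 2.0}}
metrics = reassignment_metrics(2, 0.5, core_info)
```

Which of the three values is compared against the threshold is a design choice; the same comparison function serves all of them.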

In the embodiments described above, every processor core unit 5 was assumed to be able to execute object code composed of the same ISA (the instruction format expressed as a set of binary opcodes). The invention is also applicable, however, when the processor core units can execute only object code composed of partially or mutually different types of ISA. In this case, for example, object code corresponding to the task is prepared for each ISA; when the processor core unit 5 to which the task is to be assigned has been determined, that processor core unit 5 is notified of the address at which the object code for its type is stored, and it obtains the object code from that address. Alternatively, this can be achieved by dynamically performing binary translation to generate object code executable on the destination core.
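The first approach, keeping one object code image per ISA and reporting the matching address, might look like the following sketch. The table layouts, ISA names, task name, and addresses are all hypothetical.

```python
def object_code_address(task_name, core_type, isa_of_core, code_table):
    """Return the address of the object code built for the chosen core's ISA."""
    isa = isa_of_core[core_type]
    return code_table[task_name][isa]


# Hypothetical mapping of core types to ISAs and of tasks to per-ISA addresses.
isa_of_core = {"A": "isa_x", "B": "isa_x", "C": "isa_y"}
code_table = {"decode": {"isa_x": 0x10000, "isa_y": 0x24000}}

address = object_code_address("decode", "C", isa_of_core, code_table)
```

Core types that share an ISA resolve to the same image, so only one copy per ISA needs to be stored.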

Also, although every processor core unit 5 of the embodiments described above was assumed to be able to execute object code composed of the same ISA, cores B and C may be configured to execute only part of the object code composed of core A's ISA.

In this case, of course, because the executable object code is restricted, the assignment of tasks to core B and core C is also restricted.

The scheduler auxiliary units 6 and 6' of the embodiments described above have been described as being implemented in hardware, but some or all of their functional blocks may be implemented in software. When only some of them are implemented in software, each table shown in the embodiments must be readable and writable from the processor core unit that executes this software.

Furthermore, by making the task information tables 24 and 24' of the embodiments described above directly readable and writable by the OS or application software, it becomes possible, for example, to save the task information table 24, 24' to the disk device 3 before the processor device 1 is powered off and, when the processor device 1 is later powered on, to register the saved task information table 24, 24' in the task information table 24, 24' on the scheduler auxiliary unit 6, 6'. In addition, by providing each piece of application software with a task information table 24, 24' prepared in advance and registering it in the task information table 24, 24' of the scheduler auxiliary unit 6, 6' before execution, efficient processing can be achieved from the very first execution of that application software without measuring task characteristics.
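If the table is exposed to software in this way, saving and restoring it could be as simple as the sketch below. JSON is used here only as a convenient assumption for the on-disk format, and the table layout is hypothetical; the description does not specify either.

```python
import json


def save_task_info(table, path):
    """Write the task information table to disk before power-off."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(table, f)


def load_task_info(path):
    """Read the table back at power-on."""
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    # JSON object keys are always strings, so restore the integer T# keys.
    return {int(t_num): entry for t_num, entry in raw.items()}
```

The same loader could be used to register a table shipped with an application before its first run.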

A diagram showing the overall configuration of the system according to this embodiment.
A diagram showing the overall configuration of the processor device 1.
A diagram showing the overall schematic operation of the processor device 1.
A diagram showing an example of the processing mechanisms of cores A to C.
A diagram showing an example of PM information.
A functional block diagram of the scheduler auxiliary unit 6.
A diagram showing an example of the task queue 21 in a certain state.
A diagram showing task state transitions.
A diagram showing an example of the core management table 22 in a certain state.
A diagram showing an example of the core information table 23 in a certain state.
A diagram showing an example of the task information table 24 in a certain state.
Part of a flowchart showing the update flow of the task information table 24.
Part of a flowchart showing the update flow of the task information table 24.
A diagram showing an example of the threshold table.
A diagram showing an example of the results of comparison with each threshold.
A diagram showing an example of the score calculation results.
A functional block diagram of the internals of the task management unit 11.
A diagram showing the schematic flow of the operation of the task management unit 11.
A diagram showing the schematic flow of the detailed operation of the task assignment determination unit 32.
A diagram showing an example of the per-core-type assignability table.
A diagram showing an example of the assignment candidate TID table.
A diagram showing an example of the per-task score table reflecting the core state.
A diagram showing an example of the executable task score table.
A diagram showing an example of the processing mechanisms of cores A to D.
A functional block diagram of the internals of the scheduler auxiliary unit 6'.
A diagram showing an example of the task information table 24'.
Part of a flowchart showing the update flow of the task information table 24'.
Part of a flowchart showing the update flow of the task information table 24'.
A diagram showing an example of the per-task score table reflecting the core state.
A diagram showing an example of the core information table 23' in a certain state.
A table showing the relationship between the frequency and the power consumption of the processor core 5.

Explanation of symbols

1 … Processor device
2 … Main storage device
3 … Disk device
4 … External input/output device
5 … Processor core unit
6, 6' … Scheduler auxiliary unit
7 … System bus I/F
11, 11' … Task management unit
12, 12' … Core selection unit
21 … Task queue
22 … Core management table
23 … Core information table
24, 24' … Task information table
25 … PM data buffer

Claims

A multiprocessor system comprising: a multiprocessor core including a first processor core, which has a first processing mechanism that improves the processing performance of data processing and a performance monitor that collects usage information on hardware resources that are being used or have been used in data processing, and a second processor core, which has a second processing mechanism that uses the same processing scheme as the first processing mechanism but has lower processing performance, and an IPC monitor that measures the IPC value during data processing; and
scheduling means which, during execution of application software comprising a plurality of tasks that include identical tasks, supplies to the first processor core any task that is being executed for the first time or that requires reassignment according to the measurement result of the IPC monitor, and which, for any task that has been executed before and does not require reassignment according to the measurement result of the IPC monitor, selects from the multiprocessor core one processor core to process the task by referring to the hardware resource usage information previously collected for that task by the performance monitor, and supplies the task to the selected processor core.

A multiprocessor system comprising: a multiprocessor core including a first processor core, which has a plurality of mutually different processing mechanisms for improving the processing performance of data processing and a performance monitor that collects usage information on hardware resources that are being used or have been used in data processing, and a second processor core, which has at least one processing mechanism whose processing performance is less than that obtained with all of the plurality of processing mechanisms and no greater than that of each of the plurality of processing mechanisms, and an IPC monitor that measures the IPC value during data processing; and
scheduling means which, during execution of application software comprising a plurality of tasks that include identical tasks, supplies to the first processor core any task that is being executed for the first time or that requires reassignment according to the measurement result of the IPC monitor, and which, for any task that has been executed before and does not require reassignment according to the measurement result of the IPC monitor, selects from the multiprocessor core one processor core to process the task by referring to the hardware resource usage information previously collected for that task by the performance monitor, and supplies the task to the selected processor core.

A multiprocessor system in which there are at least four mutually different processing mechanisms, first to fourth, for improving the processing performance of data processing, the system comprising: a multiprocessor core including a first processor core having the first processing mechanism, the second processing mechanism, and a first performance monitor that collects usage information on hardware resources that are being used or have been used in data processing; a second processor core having the third processing mechanism, the fourth processing mechanism, and a second performance monitor that collects usage information on hardware resources that are being used or have been used in data processing; and a third processor core having the first and third processing mechanisms and an IPC monitor that measures the IPC value during data processing; and
scheduling means which, during execution of software comprising a plurality of tasks that include identical tasks, supplies to the first processor core and the second processor core any task that is being executed for the first time or that requires reassignment according to the measurement result of the IPC monitor, and which, for any task that has been executed before and does not require reassignment according to the measurement result of the IPC monitor, selects from the multiprocessor core one processor core to process the task by referring to the hardware resource usage information previously collected for that task by the first performance monitor and the second performance monitor, and supplies the task to the selected processor core.

前記マルチプロセッサコアは、前記複数の処理機構の全てによって得られる処理性能未満となり、且つ、該複数の処理機構のそれぞれの処理性能以下となる、少なくとも一つ以上の処理機構と、データ処理された際のＩＰＣ値を計測するＩＰＣモニタとを有する第４のプロセッサコアを備えたとことを特徴とする請求項１乃至請求項３のいずれかに記載のマルチプロセッサシステム。 The multiprocessor system according to any one of claims 1 to 3, wherein the multiprocessor core further comprises a fourth processor core having at least one processing mechanism whose processing performance is less than that obtained with all of the plurality of processing mechanisms and equal to or lower than that of each of the plurality of processing mechanisms, and an IPC monitor that measures an IPC value when data is processed.

前記ＩＰＣモニタでの計測結果により再割り当てが必要か否かを判断する際に、前記計測結果とその再割り当て先のコアの動作周波数とを掛け算した値で判断するようにしたことを特徴とする、請求項１乃至請求項４のいずれかに記載のマルチプロセッサシステム。 The multiprocessor system according to any one of claims 1 to 4, wherein, when determining whether or not reassignment is necessary based on the measurement result of the IPC monitor, the determination is made using a value obtained by multiplying the measurement result by the operating frequency of the core that is the reassignment destination.
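As a rough illustration of this criterion, the product of the IPC value and the clock frequency gives instructions per second. In the Python sketch below, the function names, the `threshold` parameter, and the direction of the comparison are assumptions; the claim only states that the product is the value used for the decision.

```python
def throughput_metric(ipc, dest_freq_hz):
    """IPC times the destination core's clock: instructions per second."""
    return ipc * dest_freq_hz


def needs_reassignment(ipc, dest_freq_hz, threshold):
    """Assumed decision rule: reassign when the metric falls below a
    hypothetical threshold (in instructions per second)."""
    return throughput_metric(ipc, dest_freq_hz) < threshold
```

Weighting the IPC value by frequency lets cores that run at different clock rates be compared on the same scale, which a raw IPC comparison would not allow.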

前記ＩＰＣモニタでの計測結果により再割り当てが必要か否かを判断する際に、前記計測結果とその再割り当て先のコアの動作周波数とを掛け算した値に単位時間当たりの消費電力を除した値で判断するようにしたことを特徴とする、請求項１乃至請求項４のいずれかに記載のマルチプロセッサシステム。 The multiprocessor system according to any one of claims 1 to 4, wherein, when determining whether or not reassignment is necessary based on the measurement result of the IPC monitor, the determination is made using a value obtained by multiplying the measurement result by the operating frequency of the core that is the reassignment destination and dividing the product by the power consumption per unit time.
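The power-normalized value in this claim works out to instructions per joule: (instructions per cycle × cycles per second) ÷ (joules per second). A minimal Python sketch follows; the function name `efficiency_metric` and the choice of watts as the power unit are assumptions for illustration.

```python
def efficiency_metric(ipc, dest_freq_hz, power_watts):
    """IPC x frequency divided by power per unit time.

    With frequency in Hz and power in watts, the result is in
    instructions per joule.
    """
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return ipc * dest_freq_hz / power_watts
```

For example, with made-up figures, a core sustaining an IPC of 2.0 at 1 GHz while drawing 4 W scores 5e8 instructions per joule, whereas a core sustaining an IPC of 1.0 at 1 GHz while drawing 1 W scores 1e9, so the slower core is the more efficient choice under this measure.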

前記第２のプロセッサコアが実行可能な命令セットは、前記第１のプロセッサコアが実行可能な命令セットの少なくとも一部であることを特徴とする請求項１乃至請求項６のいずれかに記載のマルチプロセッサシステム。 The multiprocessor system according to any one of claims 1 to 6, wherein the instruction set executable by the second processor core is at least part of the instruction set executable by the first processor core.

前記第１のプロセッサコアは第１の命令セットで実行可能であり、前記第２のプロセッサコアは前記第１の命令セットとは異なる第２の命令セットで実行可能であることを特徴とする請求項１乃至請求項６のいずれかに記載のマルチプロセッサシステム。 The multiprocessor system according to any one of claims 1 to 6, wherein the first processor core executes a first instruction set and the second processor core executes a second instruction set different from the first instruction set.

前記パフォーマンスモニタで収集された該タスクの前記ハードウェア資源の利用情報を、外部へ出力可であって、出力したが入用情報を外部より入力可としたことを特徴とする請求項１乃至請求項８のいずれかに記載のマルチプロセッサシステム。 The multiprocessor system according to any one of claims 1 to 8, wherein the usage information of the hardware resources of the task collected by the performance monitor can be output to the outside, and the usage information thus output can be input from the outside.
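One way to picture the output and re-input of usage information is a simple serialization round trip. The Python sketch below is an assumption throughout: the claim does not name a format, so JSON is used here only for illustration, and the task identifier and counter names (`cache_hits`, `branch_mispredictions`) are hypothetical.

```python
import json


def export_usage_info(usage_by_task):
    """Serialize per-task usage counters for output to the outside."""
    return json.dumps(usage_by_task, sort_keys=True)


def import_usage_info(text):
    """Restore usage counters previously produced by export_usage_info."""
    return json.loads(text)


# Hypothetical counters collected for one task.
collected = {"task_a": {"cache_hits": 1200, "branch_mispredictions": 35}}
restored = import_usage_info(export_usage_info(collected))
```

Re-importing previously collected usage information would let the scheduler treat a task as already profiled on a later run, without executing it again on the cores that carry a performance monitor.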