WO2023238276A1

WO2023238276A1 - Information processing device and information processing method

Info

Publication number: WO2023238276A1
Application number: PCT/JP2022/023104
Authority: WO
Inventors: 克久小笠原; 涼太北川
Original assignee: 三菱電機株式会社
Priority date: 2022-06-08
Filing date: 2022-06-08
Publication date: 2023-12-14

Abstract

A performance-power optimization unit (1100) is provided with a performance-power optimization program (1110), roofline model data (1101) that indicates the operational intensity and the performance per unit time of computer hardware (1300), and an application performance definition table (1102) that includes operational intensity information for an application. The performance-power optimization program (1110) is provided with a scheduling information acquisition unit (1111) that acquires scheduling information for the application with respect to a computing core (1310) of the computer hardware (1300), and a roofline optimization calculation unit (1112) that performs optimization calculation of the roofline of the computer hardware (1300).

Description

Information processing device and information processing method

The present application relates to an information processing device and an information processing method that control power consumption in accordance with arithmetic processing performance.

An automatic control system is generally a system in which multiple functions cooperate and are integrated to perform recognition, judgment, and control. For example, an automated driving system consists of an automated driving control unit, which generates optimal control parameters from the surrounding conditions, and an engine control unit, a brake control unit, and a steering control unit, which implement engine control, brake control, and steering control of the vehicle, respectively. As the level of autonomy (for example, the automated driving level) rises, the automatic control system requires more computing performance.

To meet the processing performance that the system requires, some system configurations are equipped with a system on a chip (SoC) that has multiple high-performance computing cores and a large-capacity main storage device. However, as computing cores become faster and move to multi-core and many-core designs, and as main storage capacity grows, the power consumption and heat generation of the system increase. To address this, some arithmetic processing devices provide dynamic voltage and frequency scaling (DVFS). DVFS is a power saving mechanism that reduces power consumption by changing the operating frequency and operating voltage of a computing core. However, it is not easy to optimize power consumption in real time without affecting the computing performance of applications and other arithmetic processing. Furthermore, the optimization control must take into account applications that are executed in parallel on a system configuration with multiple computing cores, as well as applications in multiple container environments.

To address these issues, Patent Document 1 discloses a method for determining VM/container and data placement in a hyperconverged infrastructure (HCI) environment. Patent Document 2 discloses a device that reduces power consumption by changing the processor frequency and instruction issue width based on memory access information that can be obtained inside the processor.

Patent Document 1: JP 2020-52730 A
Patent Document 2: International Publication No. WO 2008/120274

Patent Document 1 describes a method that manages the usage status of shared computer resources and, based on that status, determines the destination node for a new virtual machine, container, or storage volume so that the upper limit of the destination node's computer resources is not exceeded. It does not address optimizing the performance and power of an information processing device in accordance with the arithmetic processing of an application. Patent Document 2 describes a device that reduces processor power based on memory access information inside the processor. It does not address highly accurate performance-power optimization based on information about the arithmetic processing of an application, nor does it address applications that are processed in parallel on multiple computing cores. In addition, the main storage device is not included among the targets of performance-power optimization.

Even if the technologies disclosed in Patent Document 1 and Patent Document 2 are combined, there remains the problem that they cannot achieve real-time performance-power optimization of an information processing device that adapts to the arithmetic processing of an application executed in parallel on multiple computing cores without impeding processing performance.

The present application was made in view of these problems. Its object is to enable real-time optimization control of the performance and power of an information processing device that adapts to the arithmetic processing of an application and does not impede processing performance. A further object is to enable optimization control of the performance and power of an information processing device that adapts to the execution of an application processed in parallel on multiple computing cores. The multiple computing cores are not limited to cores of the same type; the present application also covers a heterogeneous information processing device that has multiple computing cores with different processing methods.

The information processing device disclosed in the present application includes a computing core equipped with a power saving mechanism and a main storage device. Its performance-power optimization unit, which performs processing for optimizing computing performance and power consumption, includes: roofline model data indicating the operational intensity and the computing performance per unit time of the computer; an application performance definition table containing operational intensity information for an application; a scheduling information acquisition unit that acquires scheduling information for the application with respect to the computing cores of the computer; and a roofline optimization calculation unit for the computer.

The information processing device of the present application enables highly accurate performance-power optimization of the information processing device (computing cores and main storage device) that adapts to the operational intensity of an application processed in parallel on multiple computing cores. By working together with the scheduling information of the application, it provides an information processing device capable of real-time performance-power optimization that does not impede the processing performance of the application. The multiple computing cores are not limited to cores of the same type; a heterogeneous configuration with multiple computing cores that use different processing methods is also supported.

FIG. 1 is a block diagram showing the configuration of the information processing device according to Embodiment 1.
FIG. 2 is a diagram showing an example of roofline model data in the information processing device according to Embodiment 1.
FIG. 3 is a table diagram showing an example of the contents of the application performance definition table in the information processing device according to Embodiment 1.
FIGS. 4A and 4B are diagrams showing examples of information acquired from the scheduling information acquisition unit in the information processing device according to Embodiment 1.
FIG. 5 is a diagram showing the operation flow of the performance-power optimization unit in the information processing device according to Embodiment 1.
FIG. 6 is a diagram showing the operation flow of the roofline optimization calculation unit in the information processing device according to Embodiment 1.
Three diagrams show examples of roofline optimization calculation by the roofline optimization calculation unit in the information processing device according to Embodiment 1 when the operational intensity of an application intersects the slope portion of the roofline model data.
Three diagrams show examples of roofline optimization calculation by the roofline optimization calculation unit in the information processing device according to Embodiment 1 when the operational intensity of an application intersects the roof portion of the roofline model data.
A diagram shows the operation flow of the roofline control unit in the information processing device according to Embodiment 1.
A diagram shows an example of information acquired from the scheduling information acquisition unit in the information processing device according to Embodiment 2.
A diagram shows the operation flow of the roofline optimization calculation unit in the information processing device according to Embodiment 2.
A diagram shows the operation flow of the control performance calculation processing included in the roofline optimization calculation unit in the information processing device according to Embodiment 2.
Four diagrams show examples of roofline optimization calculation by the roofline optimization calculation unit in the information processing device according to Embodiment 2.
A diagram shows the operation flow of the execution thread acquisition processing in the roofline control unit of the information processing device according to Embodiment 2.

Embodiment 1.
FIG. 1 is a block diagram showing the configuration of an information processing device 1000 according to Embodiment 1.
The information processing device 1000 includes at least a performance-power optimization unit 1100, system software 1200, computer hardware 1300, and an application 1400.

The performance-power optimization unit 1100 performs the calculations for optimizing and controlling the performance and power of the computer hardware 1300 in accordance with the arithmetic processing of the application 1400.
The system software 1200 at least allocates the application 1400 to the computer hardware 1300 and executes it, acquires the execution state of the computer hardware 1300, and controls the performance and power of the computer hardware 1300.

The application 1400 runs using the resources of the computer hardware 1300 allocated by the system software 1200. The application 1400 may be an application 1610 that runs within a container execution environment 1600 that a container runtime 1500 provides together with the system software 1200.

The performance-power optimization unit 1100 includes roofline model data 1101, an application performance definition table 1102, and a performance-power optimization program 1110.

The roofline model data 1101 is data representing the relationship between operational intensity and computing performance per unit time in the computer hardware 1300. Operational intensity is the amount of computation per unit of data, for example per byte. Computing performance is the amount of computation per unit time, for example per second.
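The relationship captured by the roofline model data can be written compactly. The following is a sketch in the notation of the standard roofline model; the symbols $I$, $P_{\mathrm{peak}}$, and $B$ are introduced here for illustration and do not appear in the original text.

```latex
P(I) = \min\left(P_{\mathrm{peak}},\; B \cdot I\right)
```

Here $I$ is the operational intensity (operations per byte), $P_{\mathrm{peak}}$ is the peak computing performance of the core (operations per second), and $B$ is the memory bandwidth (bytes per second). The term $B \cdot I$ is the slope portion of the roofline and $P_{\mathrm{peak}}$ is the roof portion.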

The application performance definition table 1102 describes the information needed to perform the optimization calculation for performance-power control of the computer hardware 1300 when the application 1400, or the application 1610 running within the container execution environment 1600, is executed on the computer hardware 1300.

The performance-power optimization program 1110 includes a scheduling information acquisition unit 1111, a roofline optimization calculation unit 1112, and a roofline control setting unit 1113.

The scheduling information acquisition unit 1111 acquires information on how the application 1400 and the application 1610 running within the container execution environment 1600 are allocated to the computer hardware 1300.

The roofline optimization calculation unit 1112 performs the optimization calculation for performance-power control of the computer hardware 1300 when the application 1400, or the application 1610 running within the container execution environment 1600, is executed on the computer hardware 1300.

The roofline control setting unit 1113 applies the performance-power control settings for the computer hardware 1300 to the system software 1200, based on the optimization calculation performed by the roofline optimization calculation unit 1112.

The system software 1200 includes a scheduler 1201, a roofline control unit 1202, a computing core performance-power control unit 1203, and a memory bandwidth power control unit 1204.

The scheduler 1201 controls the allocation of the application 1400 and the application 1610 running within the container execution environment 1600 to the computer hardware 1300, and controls their execution.

The roofline control unit 1202 controls the roofline model that represents the relationship between operational intensity and computing performance per unit time in the computer hardware 1300.

The computing core performance-power control unit 1203 controls the processing performance and power of the computing cores provided in the computer hardware 1300.

The memory bandwidth power control unit 1204 controls the bandwidth performance and power of the main storage device provided in the computer hardware 1300.

The computer hardware 1300 includes at least one or more computing cores 1310 and a main storage device 1320.
Each computing core 1310 includes at least an arithmetic unit that performs arithmetic processing and a power saving mechanism that controls the power consumption of the computing core 1310. The computing cores 1310 are not limited to cores of the same type; the information processing device may have a heterogeneous configuration with multiple computing cores that use different processing methods.
The main storage device 1320 stores data for arithmetic processing, and the computing core 1310 loads data from and stores data to the main storage device 1320. The controller (not shown) of the main storage device 1320 includes a power saving mechanism that controls the power consumption of the main storage device 1320. The main storage device 1320 may include a cache memory (not shown) provided between it and the computing core 1310.

FIG. 2 is a diagram showing an example of the roofline model data 1101 of Embodiment 1.

The roofline model data 1101 consists of at least roofline model data 2000 for each computing core 1310 provided in the computer hardware 1300.
For example, FIG. 2 expresses the roofline model data 2000 as a graph in which the horizontal axis is operational intensity and the vertical axis is computing performance. The data also includes the maximum value A and minimum value B of the controllable computing performance of the computing core 1310, and the maximum value C and minimum value D of the controllable bandwidth performance of the main storage device 1320.

The roofline model data 2000 defines the upper limit of computing performance for a given operational intensity, for the controllable computing performance of the computing core 1310 and the controllable bandwidth performance of the main storage device 1320.
Details of the roofline model are described in, for example, Samuel Williams, Andrew Waterman and David Patterson, "Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore" (2009).
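As an illustration of how the per-core roofline model data 2000 could be held in memory, the following is a minimal Python sketch. The class name, field names, units, and numeric values are assumptions made for this example; the source only states that the data contains the values A, B, C, and D and defines an upper limit of performance for each operational intensity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RooflineModelData:
    """Hypothetical layout for the per-core roofline model data 2000."""
    perf_max: float  # A: maximum controllable computing performance [ops/s]
    perf_min: float  # B: minimum controllable computing performance [ops/s]
    bw_max: float    # C: maximum controllable bandwidth performance [bytes/s]
    bw_min: float    # D: minimum controllable bandwidth performance [bytes/s]

    def upper_bound(self, intensity: float) -> float:
        """Performance limit at the maximum settings A and C."""
        return min(self.perf_max, self.bw_max * intensity)

    def lower_bound(self, intensity: float) -> float:
        """Performance limit at the minimum settings B and D."""
        return min(self.perf_min, self.bw_min * intensity)


# Illustrative values only; they are not taken from the source.
core0 = RooflineModelData(perf_max=100e9, perf_min=10e9, bw_max=20e9, bw_min=2e9)
print(core0.upper_bound(2.0))   # bandwidth-limited region (slope portion)
print(core0.upper_bound(10.0))  # compute-limited region (roof portion)
```

Keeping both the maximum and minimum settings in one record makes it easy to evaluate the whole controllable range for a core at a given operational intensity.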

FIG. 3 is a table diagram showing an example of the contents of the application performance definition table 1102 of Embodiment 1.

The application performance definition table 1102 includes a container identifier 3000, an application identifier 3001, an execution thread identifier 3002 that indicates an execution thread (hereinafter also simply called an execution thread), an operational intensity 3003, and a memory transfer execution efficiency 3004.

The container identifier 3000 holds an identifier that indicates the container execution environment 1600. If the application 1400 does not run within the container execution environment 1600, the container identifier 3000 is set to an identifier that indicates "invalid".

The application identifier 3001 lists, for each container identifier 3000, the identifiers of the applications 1610 that run within the container execution environment 1600. For an application 1400 that does not run within the container execution environment 1600, an identifier is listed for each application 1400.
For example, FIG. 3 shows that two applications 1610, listed under the application identifier 3001 as App1 and App2, run within the container execution environment 1600 listed as C1 under the container identifier 3000. The container identifier 3000 for App3 is invalid, which shows that App3 is not an application that runs within the container execution environment 1600.
The execution thread identifier 3002 lists the identifiers of the execution threads of each application. An application normally has one or more execution threads.
The operational intensity 3003 holds information indicating the operational intensity of each execution thread of the application.
The memory transfer execution efficiency 3004 holds the execution efficiency of data transfer between the computing core 1310 and the main storage device 1320 for each execution thread of the application.
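The table described above could be represented as follows. This is a minimal Python sketch: the identifiers C1, App1, App2, and App3 come from the FIG. 3 example, while the thread identifiers, the numeric values, the use of `None` for the "invalid" container identifier, and the assumption that the efficiency is a fraction between 0 and 1 are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AppPerfEntry:
    """One row of a hypothetical application performance definition table."""
    container_id: Optional[str]  # 3000; None stands for the "invalid" identifier
    app_id: str                  # 3001
    thread_id: str               # 3002
    intensity: float             # 3003, operations per byte (assumed unit)
    mem_efficiency: float        # 3004, assumed to be a fraction in (0, 1]


# Thread identifiers and numbers below are invented for illustration.
TABLE: List[AppPerfEntry] = [
    AppPerfEntry("C1", "App1", "Th1", intensity=0.5, mem_efficiency=0.8),
    AppPerfEntry("C1", "App2", "Th1", intensity=4.0, mem_efficiency=0.6),
    AppPerfEntry(None, "App3", "Th1", intensity=16.0, mem_efficiency=0.9),
]


def lookup(table: List[AppPerfEntry], app_id: str, thread_id: str) -> AppPerfEntry:
    """Return the row for one execution thread of one application."""
    for entry in table:
        if entry.app_id == app_id and entry.thread_id == thread_id:
            return entry
    raise KeyError((app_id, thread_id))


def apps_in_container(table: List[AppPerfEntry], container_id: str) -> List[str]:
    """List the applications that run inside the given container."""
    return sorted({e.app_id for e in table if e.container_id == container_id})


print(apps_in_container(TABLE, "C1"))
print(lookup(TABLE, "App3", "Th1").container_id is None)
```

Keying the lookup on the pair of application and thread identifiers avoids assuming that thread identifiers are unique across applications, which the source does not state.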

FIGS. 4A and 4B are diagrams showing examples of the scheduling information acquired by the scheduling information acquisition unit 1111 in Embodiment 1.
The scheduling information includes at least, for each of computing core 0 and computing core 1 that make up the computing cores 1310, the timings T1, T2, and T3 at which application execution threads are allocated to the computing core 1310, the allocation time intervals T1a, T2a, and T3a, the scheduling period P, and information on time T.

FIG. 4A illustrates a case in which application execution threads are never executed at the same time on computing core 0 and computing core 1. It shows the scheduling information of the application execution threads for computing core 0 and computing core 1, and the threads executed on the two cores never overlap in time.

FIG. 4B illustrates a case in which the computing cores 1310 consist of a single computing core 0. This also covers, for example, the case in which the other computing cores are in a low-power sleep state and only computing core 0 is operating. In this case as well, only one thread is executed on the computing cores at any given time.
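The condition shared by FIG. 4A and FIG. 4B, that no two execution threads run at the same time, can be checked mechanically. The following Python sketch assumes a simple slot record (core, thread, allocation timing, allocation interval); the record layout, the names, and the sample timings are hypothetical and are not taken from the figures.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List


@dataclass(frozen=True)
class Slot:
    """One allocation of an execution thread to a computing core."""
    core: int
    thread_id: str
    start: float     # allocation timing (T1, T2, T3, ...)
    duration: float  # allocation time interval (T1a, T2a, T3a, ...)

    @property
    def end(self) -> float:
        return self.start + self.duration


def has_concurrent_threads(slots: List[Slot]) -> bool:
    """Return True if slots on different cores overlap in time."""
    return any(
        a.core != b.core and a.start < b.end and b.start < a.end
        for a, b in combinations(slots, 2)
    )


# A FIG. 4A style schedule: two cores, but never active simultaneously.
serial = [
    Slot(core=0, thread_id="Th1", start=0.0, duration=2.0),
    Slot(core=1, thread_id="Th2", start=2.0, duration=3.0),
    Slot(core=0, thread_id="Th3", start=5.0, duration=1.0),
]
# A schedule in which both cores are active during the interval [1, 2).
parallel = serial[:1] + [Slot(core=1, thread_id="Th2", start=1.0, duration=3.0)]

print(has_concurrent_threads(serial))
print(has_concurrent_threads(parallel))
```

Intervals are treated as half-open, so a thread that starts exactly when another one ends does not count as overlapping.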

FIG. 5 is a diagram showing the operation flow of the performance-power optimization unit 1100 of Embodiment 1.

In step S5000, the performance-power optimization unit 1100 starts the performance-power optimization process.
In step S5001, the performance-power optimization unit 1100 uses the scheduling information acquisition unit 1111 to acquire the scheduling information of the application execution threads. Examples of the scheduling information are shown in FIGS. 4A and 4B described above.

In step S5002, the performance-power optimization unit 1100 uses the roofline optimization calculation unit 1112 to perform the optimization calculation of the performance and power of the computer hardware 1300 in accordance with the arithmetic processing of the execution threads of the application 1400.

In step S5003, the performance-power optimization unit 1100 uses the roofline control setting unit 1113 to apply the performance-power control settings for the computer hardware 1300, based on the optimization calculation in step S5002.
In step S5004, the performance-power optimization unit 1100 ends the performance-power optimization process.
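The flow of steps S5000 to S5004 is a three-stage pipeline: acquire, calculate, apply. The following Python sketch shows that shape only. The function and parameter names are hypothetical, and the three callables stand in for the scheduling information acquisition unit 1111, the roofline optimization calculation unit 1112, and the roofline control setting unit 1113, whose actual interfaces the source does not specify.

```python
from typing import Any, Callable


def run_performance_power_optimization(
    acquire_schedule: Callable[[], Any],
    calculate_settings: Callable[[Any], Any],
    apply_settings: Callable[[Any], None],
) -> Any:
    """One pass of the flow in FIG. 5 (steps S5000 to S5004)."""
    schedule = acquire_schedule()            # S5001: get scheduling information
    settings = calculate_settings(schedule)  # S5002: roofline optimization
    apply_settings(settings)                 # S5003: hand settings to the system
    return settings                          # S5004: end of the process


# Stand-in callables used only to exercise the flow.
applied = []
result = run_performance_power_optimization(
    acquire_schedule=lambda: ["Th1", "Th2"],
    calculate_settings=lambda threads: {t: "max" for t in threads},
    apply_settings=applied.append,
)
print(result)
```

Passing the three stages in as callables keeps the sketch independent of how each unit is actually implemented.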

FIG. 6 is a diagram showing the operation flow of the roofline optimization calculation unit 1112 of Embodiment 1.

In step S6000, the roofline optimization calculation unit 1112 starts the optimization calculation process.
In step S6001, the roofline optimization calculation unit 1112 repeats steps S6001 to S6013 for each execution thread of the application. The execution threads are those listed in the scheduling information acquired by the scheduling information acquisition unit 1111 in step S5001 of FIG. 5.

In step S6002, the roofline optimization calculation unit 1112 acquires, from the application performance definition table 1102, the operational intensity 3003 that corresponds to the execution thread identifier 3002 in FIG. 3.

In step S6003, for the computing core 1310 on which the execution thread runs, the roofline optimization calculation unit 1112 calculates the intersection (first intersection) between the operational intensity 3003 acquired in step S6002 and the roofline model data 2000 (see FIG. 2) formed by the maximum value A of the controllable computing performance of the computing core 1310 and the maximum value C of the controllable bandwidth performance of the main storage device 1320. If the first intersection lies on the slope portion of the roofline model data 2000 (intersection A1), the process proceeds to step S6004. If the first intersection lies on the roof portion of the roofline model data 2000 (intersection B1), the process proceeds to step S6009.
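The branch in step S6003 can be sketched as follows, assuming the standard roofline reading in which the slope portion is the line "bandwidth times operational intensity" and the roof portion is the constant A. The function name and the sample numbers are hypothetical. The source does not say which branch applies when the intersection falls exactly on the ridge point; this sketch treats that case as the roof portion, which is an assumption.

```python
from typing import Tuple


def first_intersection(perf_max: float, bw_max: float, intensity: float) -> Tuple[str, float]:
    """Classify where a thread's operational intensity meets the roofline.

    perf_max is the value A, bw_max is the value C, and intensity is the
    operational intensity 3003 of the execution thread.
    """
    slope_perf = bw_max * intensity  # performance allowed by memory bandwidth
    if slope_perf < perf_max:
        return "slope", slope_perf   # intersection A1: continue with S6004
    return "roof", perf_max          # intersection B1: continue with S6009


# Illustrative numbers: A = 100 Gops/s, C = 20 GB/s, ridge point at 5 ops/byte.
print(first_intersection(100e9, 20e9, 2.0))
print(first_intersection(100e9, 20e9, 10.0))
```

With these numbers, any thread with an operational intensity below 5 operations per byte is limited by memory bandwidth, and any thread at or above it is limited by the computing core.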

In step S6004, the roofline optimization calculation unit 1112 calculates the roof that indicates the computing performance value P1 at the intersection A1 on the slope part of the roofline model data 2000.

In step S6005, the roofline optimization calculation unit 1112 obtains, from the application performance definition table 1102, the memory transfer execution efficiency 3004 corresponding to the execution thread identifier 3002.

In step S6006, the roofline optimization calculation unit 1112 calculates a bandwidth performance by multiplying the maximum value C of the controllable bandwidth performance of the main storage device 1320 by the memory transfer execution efficiency 3004 obtained in step S6005. Note that the illustrated graph of the roofline model data 2000 is a log-log graph, in which the bandwidth performance is represented by the slope part of the roofline model data 2000. The unit then calculates the intersection A2 between the new slope part given by the calculated bandwidth performance and the operational intensity 3003 obtained in step S6002.

In step S6007, the roofline optimization calculation unit 1112 calculates the roof that indicates the computing performance value P2 at the intersection A2 obtained in step S6006.

In step S6008, the roofline optimization calculation unit 1112 sets the control bandwidth performance of the main storage device 1320 to the maximum value C, sets the theoretical value of the control performance of the computing core 1310 to the computing performance value P1 of the roof calculated in step S6004, and sets the lower limit value to the computing performance value P2 of the roof calculated in step S6007. After step S6008, the process proceeds to step S6013.
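The slope-part branch (steps S6003 to S6008) can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes the usual roofline relation in which attainable performance on the slope part equals bandwidth times operational intensity, and it assumes the memory transfer execution efficiency 3004 is a fraction in (0, 1]. The names `slope_case`, `SlopeResult`, and all parameter names are hypothetical, and treating the ridge point itself as part of the roof is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SlopeResult:
    memory_bandwidth: float  # control bandwidth performance of main memory (= C)
    core_theoretical: float  # theoretical value of core control performance (P1)
    core_lower_limit: float  # lower limit of core control performance (P2)


def slope_case(peak_compute: float, peak_bandwidth: float,
               intensity: float, efficiency: float) -> SlopeResult:
    """Hypothetical sketch of steps S6003-S6008 (intersection on the slope part)."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency is assumed to be a fraction in (0, 1]")
    # Ridge point of the roofline: intensities below it hit the slope part.
    ridge = peak_compute / peak_bandwidth
    if intensity >= ridge:
        raise ValueError("intersection lies on the roof part (steps S6009 onward)")
    p1 = peak_bandwidth * intensity               # S6004: performance at A1
    p2 = peak_bandwidth * efficiency * intensity  # S6006-S6007: performance at A2
    return SlopeResult(peak_bandwidth, p1, p2)    # S6008
```

With A = 100, C = 10, an operational intensity of 4 and an efficiency of 0.5 (all made-up numbers), the sketch yields P1 = 40 and P2 = 20, so the core can be throttled to somewhere between 20 and 40 while the memory stays at its maximum bandwidth.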

In step S6009, the roofline optimization calculation unit 1112 translates the slope part of the illustrated roofline model data 2000 in parallel and calculates a new slope passing through the intersection B1 on the roof part of the roofline model data 2000, together with the slope value of that new slope (referred to as the theoretical bandwidth value).
In step S6010, as in step S6005, the roofline optimization calculation unit 1112 obtains, from the application performance definition table 1102, the memory transfer execution efficiency 3004 corresponding to the execution thread identifier 3002.

In step S6011, the roofline optimization calculation unit 1112 calculates a bandwidth performance value (referred to as the execution bandwidth value) by multiplying the theoretical bandwidth value calculated in step S6009 by the reciprocal of the memory transfer execution efficiency 3004 obtained in step S6010. However, if the calculated execution bandwidth value exceeds the maximum value C of the controllable bandwidth performance of the main storage device 1320, the execution bandwidth value may be set to the maximum value C.

In step S6012, the roofline optimization calculation unit 1112 sets the control bandwidth performance of the main storage device 1320 to the execution bandwidth value calculated in step S6011, and sets the control performance of the computing core 1310 to the maximum value A of the controllable computing performance.
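The roof-part branch (steps S6009 to S6012) can be sketched in the same way. This is an illustrative reading only: it assumes that the slope through the intersection B1 corresponds to a bandwidth of A divided by the operational intensity, and that the memory transfer execution efficiency 3004 is a fraction in (0, 1]. The names `roof_case` and `RoofResult` are hypothetical.

```python
from typing import NamedTuple


class RoofResult(NamedTuple):
    memory_bandwidth: float  # control bandwidth performance of main memory
    core_performance: float  # control performance of the core (= A)


def roof_case(peak_compute: float, peak_bandwidth: float,
              intensity: float, efficiency: float) -> RoofResult:
    """Hypothetical sketch of steps S6009-S6012 (intersection on the roof part)."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency is assumed to be a fraction in (0, 1]")
    if intensity < peak_compute / peak_bandwidth:
        raise ValueError("intersection lies on the slope part (steps S6004 onward)")
    # S6009: bandwidth of the slope shifted to pass through B1 = (intensity, A).
    theoretical_bw = peak_compute / intensity
    # S6011: divide by the efficiency, then clamp to the maximum value C.
    execution_bw = min(theoretical_bw / efficiency, peak_bandwidth)
    # S6012: memory runs at the execution bandwidth, the core at its maximum A.
    return RoofResult(execution_bw, peak_compute)
```

For example, with A = 100, C = 10, an operational intensity of 40 and an efficiency of 0.5, the theoretical bandwidth value is 2.5 and the execution bandwidth value is 5, half of the maximum. With an intensity of 12.5 the unclamped value would be 16, so the clamp returns C = 10.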

In step S6013, the roofline optimization calculation unit 1112 determines whether processing has finished for the execution threads of the application. If it has finished, the process proceeds to step S6014. If not, the process returns to step S6001.
In step S6014, the roofline optimization calculation unit 1112 ends the optimization calculation process.

FIGS. 7A, 7B, and 7C are diagrams showing an example of the roofline optimization calculation performed by the roofline optimization calculation unit 1112 of Embodiment 1 when the operational intensity of the application intersects the slope part of the roofline model data 2000 in FIG. 2.

The roofline model 7000 shown in FIG. 7A illustrates the case in which, in step S6003 of FIG. 6, the intersection between the roofline model data, defined for the computing core 1310 on which the execution thread runs by the maximum value A of the controllable computing performance of the computing core 1310 and the maximum value C of the controllable bandwidth performance of the main storage device 1320, and the operational intensity 3003 obtained in step S6002 is calculated, and the intersection A1 lies on the slope part of the roofline model data.

The roofline model 7010 shown in FIG. 7B illustrates step S6004 of FIG. 6, including the intersection A1 and a dashed arrow 7011 corresponding to the process of calculating the computing performance value P1 at the intersection A1.

The roofline model 7020 shown in FIG. 7C includes a dashed arrow 7021, which corresponds to the processing in step S6006 of FIG. 6 comprising the calculation of the bandwidth performance and the calculation of the intersection A2, and an arrow 7022, which corresponds to the process of calculating the computing performance value P2 at the intersection A2.

FIGS. 8A, 8B, and 8C are diagrams showing an example of the roofline optimization calculation performed by the roofline optimization calculation unit 1112 of Embodiment 1 when the operational intensity of the application intersects the roof part of the roofline model data 2000 in FIG. 2.

The roofline model 8000 shown in FIG. 8A illustrates the case in which, in step S6003 of FIG. 6, the intersection between the roofline model data, defined for the computing core 1310 on which the execution thread runs by the maximum value A of the controllable computing performance of the computing core 1310 and the maximum value C of the controllable bandwidth performance of the main storage device 1320, and the operational intensity 3003 obtained in step S6002 is calculated, and the intersection B1 lies on the roof part of the roofline model data.

The roofline model 8010 shown in FIG. 8B illustrates the state after the theoretical bandwidth value has been calculated in step S6009 of FIG. 6. It includes a dashed arrow 8011 corresponding to the process of calculating the theoretical bandwidth value.

The roofline model 8020 shown in FIG. 8C illustrates the state after the execution bandwidth value has been calculated in step S6011 of FIG. 6. It includes an arrow 8021 corresponding to the process of calculating the execution bandwidth value.

FIG. 9 is a diagram showing the operation flow of the roofline control unit 1202 of Embodiment 1.
As illustrated in FIGS. 4A and 4B, for example, the roofline control unit 1202 may be called and executed by the scheduler 1201 immediately before the timing T1 at which an execution thread is assigned to a computing core and executed. It may also be executed earlier than T1 by a time margin that allows for the processing time of the roofline control unit 1202.

In step S9000, the roofline control unit 1202 starts the roofline control process.
In step S9001, the roofline control unit 1202 obtains, from the scheduler 1201, the execution thread of the application to be executed.

In step S9002, the roofline control unit 1202 applies, via the memory bandwidth power control unit 1204, the control bandwidth performance of the main storage device 1320 that corresponds to the execution thread, as calculated by the roofline optimization calculation unit 1112 and already set by the roofline control setting unit 1113, and performs the corresponding power control of the main storage device 1320.

In step S9003, the roofline control unit 1202 checks what has been calculated by the roofline optimization calculation unit 1112 and already set by the roofline control setting unit 1113 for the computing core 1310 on which the execution thread runs. If a control performance value has been set, the process proceeds to step S9004. If a lower limit value or a theoretical value of the control performance has been set, the process proceeds to step S9005.

In step S9004, the roofline control unit 1202 sets the control performance of the computing core 1310 on which the execution thread runs to the control performance value obtained in step S9003, and performs power control of the computing core 1310 according to that value via the computing core performance power control unit 1203. For the performance and power control of the computing core 1310, dynamic voltage and frequency scaling (DVFS), for example, may be used. After step S9004, the process proceeds to step S9006.

In step S9005, the roofline control unit 1202 sets the control performance of the computing core 1310 on which the execution thread runs to the theoretical value or the lower limit value of the control performance obtained in step S9003, and performs power control of the computing core 1310 according to that value via the computing core performance power control unit 1203. If a lower limit value of the control performance is available in step S9003, the lower limit value may be used.

After step S9004 or step S9005, the process proceeds to step S9006, in which the roofline control unit 1202 ends the roofline control process.
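The branch in steps S9003 to S9005 amounts to choosing one target value for the core from the settings that were stored in advance. The sketch below shows one possible encoding of that choice. The `CoreSetting` record, the `select_core_target` function, and the `prefer_lower_limit` flag are hypothetical, and how the chosen value is then handed to the computing core performance power control unit 1203 (for example through DVFS) is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CoreSetting:
    """Hypothetical record of what was set for one execution thread's core."""
    value: Optional[float] = None        # single control performance value
    theoretical: Optional[float] = None  # theoretical value (slope case)
    lower_limit: Optional[float] = None  # lower limit value (slope case)


def select_core_target(setting: CoreSetting,
                       prefer_lower_limit: bool = True) -> float:
    """Pick the control performance to apply (sketch of S9003-S9005)."""
    if setting.value is not None:
        return setting.value  # S9003 -> S9004
    # S9003 -> S9005: a theoretical value and/or a lower limit was set.
    # The description allows using the lower limit when one is present.
    if prefer_lower_limit and setting.lower_limit is not None:
        return setting.lower_limit
    if setting.theoretical is not None:
        return setting.theoretical
    if setting.lower_limit is not None:
        return setting.lower_limit
    raise ValueError("no control performance has been set for this core")
```

Choosing the lower limit saves the most power; passing `prefer_lower_limit=False` keeps the core at the theoretical value instead.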

Embodiment 2.
FIG. 10 is a diagram showing an example of the information acquired from the scheduling information acquisition unit 1111 of Embodiment 2.
The difference from FIG. 4 is that execution threads of the application run at the same time on different computing cores.
For example, FIG. 10 shows scheduling information for application execution threads on computing core 0 and computing core 1, in which the execution threads run by computing core 0 and computing core 1 overlap in time.

FIG. 11 is a diagram showing the operation flow of the roofline optimization calculation unit of Embodiment 2.
The difference from FIG. 6 is that the flow handles execution threads of the application that run at the same time on different computing cores.
In step S11000, the roofline optimization calculation unit 1112 starts the optimization calculation process.
In step S11001, the roofline optimization calculation unit 1112 repeats steps S11001 to S11012 for each set of application execution threads whose execution overlaps in time across the computing cores 1310.

In step S11002, the roofline optimization calculation unit 1112 obtains, from the application performance definition table 1102, the operational intensity 3003 of each execution thread in the set.

In step S11003, the roofline optimization calculation unit 1112 calculates the bandwidth performance of the main storage device 1320 required by each execution thread. The required bandwidth performance may be calculated, for example, as follows. For the computing core 1310 on which the execution thread runs, the intersection between the roofline model data 2000, defined by the maximum value A of the controllable computing performance of the computing core 1310 and the maximum value C of the controllable bandwidth performance of the main storage device 1320, and the operational intensity 3003 obtained in step S11002 is calculated. If the intersection lies on the slope part of the roofline model data 2000, the slope value of that part may be taken as the bandwidth performance value required by the execution thread. If the intersection lies on the roof part of the roofline model data 2000, the slope part of the roofline model data 2000 is translated in parallel, a new slope passing through the intersection on the roof part is calculated, its slope value is multiplied by the reciprocal of the memory transfer execution efficiency 3004 of the execution thread, and the result may be taken as the bandwidth performance value required by the execution thread. However, if this value exceeds the maximum value C of the controllable bandwidth performance of the main storage device 1320, the bandwidth performance value required by the execution thread may be set to the maximum value C.
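The per-thread bandwidth requirement of step S11003 can be sketched as a single function. This is an illustrative reading under stated assumptions: the slope value of the unshifted roofline is taken to be the maximum bandwidth C, the shifted slope through the roof intersection is taken to be A divided by the operational intensity, and the efficiency is assumed to be a fraction in (0, 1]. The name `required_bandwidth` is hypothetical.

```python
def required_bandwidth(peak_compute: float, peak_bandwidth: float,
                       intensity: float, efficiency: float) -> float:
    """Bandwidth a thread requires from main memory (sketch of step S11003)."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency is assumed to be a fraction in (0, 1]")
    if intensity < peak_compute / peak_bandwidth:
        # Slope part: the thread is memory-bound and asks for the full slope.
        return peak_bandwidth
    # Roof part: shift the slope through the roof intersection, correct for
    # the transfer efficiency, and clamp to the maximum value C.
    return min((peak_compute / intensity) / efficiency, peak_bandwidth)
```

With A = 100 and C = 10, a thread with intensity 4 requires the full bandwidth 10, while a thread with intensity 40 and efficiency 0.5 requires 5.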

In step S11004, the roofline optimization calculation unit 1112 determines whether the total of the bandwidth performance values of the main storage device 1320 required by the execution threads, as calculated in step S11003, is larger than the maximum value C of the controllable bandwidth performance of the main storage device 1320. If it is larger, the process proceeds to step S11005. If it is not larger, the process proceeds to step S11006.

In step S11005, the roofline optimization calculation unit 1112 calculates the bandwidth share of each execution thread relative to the total of the bandwidth performance values of the main storage device 1320 required by the execution threads, as calculated in step S11003. For example, the bandwidth share of an execution thread may be calculated as the ratio of the bandwidth performance required by that execution thread to the total bandwidth performance required by all execution threads calculated in step S11003.

In step S11006, the roofline optimization calculation unit 1112 repeats steps S11006 to S11008 for each execution thread.
In step S11007, the roofline optimization calculation unit 1112 calculates the control performance of the computing core 1310 on which the execution thread runs.

In step S11008, the roofline optimization calculation unit 1112 determines whether processing has finished for the execution threads. If it has finished, the process proceeds to step S11009. If not, the process returns to step S11006.

In step S11009, the same determination as in step S11004 is made. If the total of the bandwidth performance values of the main storage device 1320 required by the execution threads, as calculated in step S11003, is larger than the maximum value C of the controllable bandwidth performance of the main storage device 1320, the process proceeds to step S11010. If it is not larger, the process proceeds to step S11011.

In step S11010, the roofline optimization calculation unit 1112 sets the control bandwidth performance of the main storage device 1320 to the maximum value C.
In step S11011, the roofline optimization calculation unit 1112 sets the control bandwidth performance of the main storage device 1320 to the total of the bandwidth performance values of the main storage device 1320 required by the execution threads, as calculated in step S11003.

In step S11012, the roofline optimization calculation unit 1112 determines whether processing has finished for the sets of application execution threads whose execution overlaps in time across the computing cores 1310. If it has finished, the process proceeds to step S11013. If not, the process returns to step S11001.
In step S11013, the roofline optimization calculation unit 1112 ends the optimization calculation process.
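The memory-side decisions of this flow (steps S11004, S11005 and S11009 to S11011) can be sketched as follows, taking the per-thread bandwidth requirements from step S11003 as input. The function name `plan_memory_bandwidth` and the use of string thread identifiers are hypothetical. Returning no shares when the total fits within C mirrors the flow, which skips step S11005 in that case.

```python
from typing import Dict, Optional, Tuple


def plan_memory_bandwidth(
    requests: Dict[str, float], peak_bandwidth: float
) -> Tuple[float, Optional[Dict[str, float]]]:
    """Return (control bandwidth, per-thread shares or None).

    Sketch of steps S11004-S11005 and S11009-S11011 for one set of
    execution threads that overlap in time.
    """
    total = sum(requests.values())
    if total > peak_bandwidth:
        # S11005: each thread's share of the total requested bandwidth.
        shares = {tid: req / total for tid, req in requests.items()}
        # S11010: the memory cannot deliver more than its maximum C.
        return peak_bandwidth, shares
    # S11011: the memory only needs to deliver the total that was requested.
    return total, None
```

For instance, two threads requesting 6 and 2 on a memory whose maximum is 4 receive shares of 0.75 and 0.25 and the memory is set to 4, whereas requests of 3 and 2 against a maximum of 10 simply set the memory to 5.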

FIG. 12 is a diagram showing the operation flow of the control performance calculation process included in the roofline optimization calculation unit of Embodiment 2.
In step S12000, the roofline optimization calculation unit 1112 starts the control performance calculation process for the computing core 1310.
In step S12001, the roofline optimization calculation unit 1112 obtains, from the application performance definition table 1102, the memory transfer execution efficiency 3004 corresponding to the execution thread identifier 3002.

In step S12002, for the computing core 1310 on which the execution thread runs, the roofline optimization calculation unit 1112 calculates the intersection between the roofline model data 2000, defined by the maximum value A of the controllable computing performance of the computing core 1310 and the maximum value C of the controllable bandwidth performance of the main storage device 1320, and the operational intensity 3003 obtained in step S11002. If the intersection lies on the slope part of the roofline model data 2000 (intersection A1a), the process proceeds to step S12003. If it lies on the roof part of the roofline model data 2000 (intersection B1a), the process proceeds to step S12008.

In step S12003, the roofline optimization calculation unit 1112 calculates the roof that indicates the computing performance value P1a at the intersection A1a on the slope part of the roofline model data 2000.

In step S12004, the roofline optimization calculation unit 1112 calculates a bandwidth performance value by multiplying the maximum value C of the controllable bandwidth performance of the main storage device 1320 by the bandwidth share of the execution thread calculated in step S11005 of FIG. 11 and by the memory transfer execution efficiency 3004 of the execution thread obtained in step S12001. The unit then calculates the intersection A2a between the slope part of the roofline model data 2000 that corresponds to the calculated bandwidth performance value and the operational intensity 3003 obtained in step S11002.

In step S12005, the roofline optimization calculation unit 1112 calculates the roof that indicates the computing performance value P2a at the intersection A2a obtained in step S12004.
In step S12006, the roofline optimization calculation unit 1112 sets the theoretical value of the control performance of the computing core 1310 to the computing performance value P1a of the roof calculated in step S12003.

In step S12007, the roofline optimization calculation unit 1112 sets the lower limit value of the control performance of the computing core 1310 to the computing performance value P2a of the roof calculated in step S12005. After step S12007, the process proceeds to step S12014.

In step S12008, as in step S11004 of FIG. 11, the roofline optimization calculation unit 1112 determines whether the total of the bandwidth performance values of the main storage device 1320 required by the execution threads is larger than the maximum value C of the controllable bandwidth performance of the main storage device 1320. If it is larger, the process proceeds to step S12009. If it is not larger, the process proceeds to step S12013.

In step S12009, the roofline optimization calculation unit 1112 calculates a bandwidth performance value by multiplying the maximum value C of the controllable bandwidth performance of the main storage device 1320 by the bandwidth share of the execution thread calculated in step S11005 of FIG. 11. The unit then calculates the intersection B2a between the slope part of the roofline model data 2000 that corresponds to the calculated bandwidth performance value and the operational intensity 3003 obtained in step S11002.

In step S12010, the roofline optimization calculation unit 1112 calculates the roof that indicates the computing performance value Q2a at the intersection B2a calculated in step S12009.
In step S12011, the roofline optimization calculation unit 1112 determines whether the computing performance value Q2a calculated in step S12010 is smaller than the maximum value A of the controllable computing performance of the computing core 1310. If it is smaller, the process proceeds to step S12012. If it is not smaller, the process proceeds to step S12013.

In step S12012, the roofline optimization calculation unit 1112 sets the lower limit value of the control performance of the computing core 1310 to the computing performance value Q2a of the roof calculated in step S12010. After step S12012, the process proceeds to step S12014.

In step S12013, the roofline optimization calculation unit 1112 sets the control performance value of the computing core 1310 to the maximum value A of the controllable computing performance of the computing core 1310, which is taken as the computing performance value Q1a. After step S12013, the process proceeds to step S12014.
In step S12014, the roofline optimization calculation unit 1112 ends the control performance calculation process for the computing core 1310.
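The whole of FIG. 12 can be condensed into one function that returns which control performance settings are produced for a core. The sketch below is a hedged reading of steps S12002 to S12013: it assumes performance on a slope equals bandwidth times operational intensity, treats the ridge point as part of the roof, and takes the bandwidth share and the total requested bandwidth as inputs computed elsewhere. The function name, the dictionary keys, and the default share of 1.0 are hypothetical.

```python
from typing import Dict


def core_control_performance(
    peak_compute: float, peak_bandwidth: float, intensity: float,
    efficiency: float, total_request: float, share: float = 1.0,
) -> Dict[str, float]:
    """Sketch of FIG. 12: control performance for one thread's core.

    `share` is the thread's bandwidth share from step S11005; the default
    of 1.0 (no contention) is an assumption of this sketch.
    """
    if intensity < peak_compute / peak_bandwidth:              # S12002: slope part
        p1a = peak_bandwidth * intensity                       # S12003
        p2a = peak_bandwidth * share * efficiency * intensity  # S12004-S12005
        return {"theoretical": p1a, "lower_limit": p2a}        # S12006-S12007
    if total_request > peak_bandwidth:                         # S12008: contention
        q2a = peak_bandwidth * share * intensity               # S12009-S12010
        if q2a < peak_compute:                                 # S12011
            return {"lower_limit": q2a}                        # S12012
    return {"value": peak_compute}                             # S12013
```

As an example with A = 100 and C = 10: a roof-bound thread with intensity 20 that receives a 0.25 share under contention can only reach 50, so its core gets a lower limit of 50 rather than the maximum 100.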

FIGS. 13A, 13B, 13C, and 13D are diagrams showing examples of the roofline optimization calculation performed by the roofline optimization calculation unit 1112 of Embodiment 2.

The roofline model 13000 shown in FIG. 13A relates to the process in step S11003 of FIG. 11 of calculating the bandwidth performance value of the main storage device 1320 required by an execution thread. It illustrates the case in which the intersection A1a, which is the second intersection between the roofline model data 2000, defined for the computing core 1310 on which the execution thread runs by the maximum value A of the controllable computing performance of the computing core 1310 and the maximum value C of the controllable bandwidth performance of the main storage device 1320, and the operational intensity 3003 obtained in step S11002, lies on the slope part of the roofline model data 2000.

The roofline model 13010 shown in FIG. 13B relates to the same process in step S11003 of FIG. 11. It illustrates the case in which the intersection B1a between the roofline model data 2000, defined for the computing core 1310 on which the execution thread runs by the maximum value A of the controllable computing performance of the computing core 1310 and the maximum value C of the controllable bandwidth performance of the main storage device 1320, and the operational intensity 3003 obtained in step S11002 lies on the roof part of the roofline model data 2000. It also includes a dashed arrow 13011 corresponding to the process of calculating the bandwidth performance value required by the execution thread.

The roofline model 13020 shown in FIG. 13C includes a dashed arrow 13021 corresponding to the calculation, in step S12003 of FIG. 12, of the roof indicating the computing performance value P1a at the intersection A1a; a dashed arrow 13022 corresponding to the calculation of the intersection A2a in step S12004; and a dashed arrow 13023 corresponding to the calculation, in step S12005, of the roof indicating the computing performance value P2a at the intersection A2a.

The roofline model 13030 shown in FIG. 13D includes a dashed arrow 13031 corresponding to the calculation of the intersection B2a in step S12009 of FIG. 12.

FIG. 14 is a diagram showing the operation flow of the execution thread acquisition process in the roofline control unit 1202 of Embodiment 2.
The difference from FIG. 9 is that the execution thread acquisition in step S9001 is adapted to execution threads of the application that run at the same time on the computing cores 1310. The operation flow of the roofline control unit 1202 is that of FIG. 9 with step S9001 replaced by steps S14000 to S14006 shown in FIG. 14.

In step S14000, the roofline control unit 1202 starts the execution thread acquisition process.

In step S14001, the roofline control unit 1202 determines whether it is running in response to a roofline control interrupt. If it was triggered by an interrupt, the process proceeds to step S14005. If not, the process proceeds to step S14002. The roofline control interrupt is the one issued in step S14004.

In step S14002, the roofline control unit 1202 acquires from the scheduler 1201 the application execution thread that is assigned to and executed on its own computing core 1310.

In step S14003, the roofline control unit 1202 determines, from the scheduler 1201, whether any execution thread is running on a computing core 1310 other than its own. If there is such a thread, the process proceeds to step S14004. If there is none, the process proceeds to step S14006.

In step S14004, the roofline control unit 1202 issues, to the computing cores 1310 other than its own, an interrupt for executing the roofline control unit.

In step S14005, the roofline control unit 1202 acquires the set of execution threads running on the respective computing cores 1310. Through the processing of step S14005, each computing core 1310 obtains the set of application execution threads that are running concurrently.
In step S14006, the roofline control unit 1202 ends the execution thread acquisition process.
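
The branch structure of steps S14000 to S14006 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: `Scheduler`, `acquire_execution_threads`, and the `send_interrupt` callback are hypothetical names, and the sketch assumes that the core issuing the interrupt in step S14004 then continues to step S14005, which the flow description above does not state explicitly.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Scheduler:
    """Hypothetical stand-in for the scheduler 1201: core id -> running thread id."""
    running: Dict[int, Optional[str]] = field(default_factory=dict)

    def thread_for(self, core: int) -> Optional[str]:
        return self.running.get(core)

    def busy_cores_other_than(self, core: int) -> List[int]:
        return [c for c, t in self.running.items() if c != core and t is not None]

    def running_set(self) -> Dict[int, str]:
        return {c: t for c, t in self.running.items() if t is not None}


def acquire_execution_threads(
    core: int,
    scheduler: Scheduler,
    by_interrupt: bool,
    send_interrupt: Callable[[int], None],
) -> Dict[int, str]:
    """Return the execution threads this core should plan for (S14000 to S14006)."""
    # S14001: was this invocation triggered by a roofline control interrupt?
    if not by_interrupt:
        # S14002: thread assigned to this core.
        own = scheduler.thread_for(core)
        # S14003: is anything running on the other cores?
        others = scheduler.busy_cores_other_than(core)
        if not others:
            # S14006: nothing runs concurrently, so only the own thread matters.
            return {core: own} if own is not None else {}
        # S14004: ask the other busy cores to run their roofline control too.
        for other in others:
            send_interrupt(other)
    # S14005: set of threads running concurrently across the cores.
    return scheduler.running_set()
```

A core entered through the interrupt skips straight to step S14005, so every participating core ends up with the same set of concurrently running threads without issuing further interrupts.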

After step S14006 is completed, steps S9002 to S9006 in FIG. 9 are carried out. In steps S9002 to S9006 of FIG. 9, control is based on two values calculated for each set of application execution threads that run concurrently on the computing cores 1310, as illustrated in FIG. 11: the control bandwidth performance of the main storage device 1320, and the control performance of the computing core 1310 on which each execution thread runs. Based on these values, it suffices to apply the control bandwidth performance of the main storage device 1320 together with the corresponding power control of the main storage device 1320, and the control performance of the computing core 1310 on which each execution thread runs together with the corresponding power control of that computing core 1310. The control bandwidth performance and the power control of the main storage device 1320 need only be applied by any one of the computing cores 1310. The control performance of the computing core 1310 on which each execution thread runs, and the corresponding power control of that computing core 1310, are applied by each computing core 1310 individually.

Although this application describes various exemplary embodiments and examples, the features, aspects, and functions described in one or more of the embodiments are not limited in their application to a particular embodiment; they can be applied to the embodiments alone or in various combinations.
Accordingly, countless variations that are not illustrated are envisioned within the scope of the technology disclosed in this specification. These include, for example, cases in which at least one component is modified, added, or omitted, and cases in which at least one component is extracted and combined with components of another embodiment.

1000 Information processing device, 1100 Performance-power optimization unit, 1101 Roofline model data, 1102 Application performance definition table, 1110 Performance-power optimization program, 1111 Scheduling information acquisition unit, 1112 Roofline optimization calculation unit, 1113 Roofline control setting unit, 1200 System software, 1201 Scheduler, 1202 Roofline control unit, 1203 Computing core performance-power control unit, 1204 Memory bandwidth power control unit, 1300 Computer hardware, 1310 Computing core, 1320 Main storage device, 1400 Application

Claims

An information processing device comprising:
computer hardware including a computing core and a main storage device equipped with a power saving mechanism;
system software that runs on the computer hardware; and
an application that runs on the system software and in a container execution environment of the system software,
the information processing device further comprising a performance-power optimization unit that optimizes the computing performance and power consumption of the information processing device,
wherein the performance-power optimization unit includes a performance-power optimization program, roofline model data indicating the operational intensity and the computing performance per unit time of the computer hardware, and an application performance definition table including operational intensity information of the application, and
wherein the performance-power optimization program includes a scheduling information acquisition unit that acquires scheduling information of the application with respect to the computing core of the computer hardware, and a roofline optimization calculation unit that performs an optimization calculation of the roofline of the computer hardware.
The information processing device according to claim 1, wherein the performance-power optimization program includes a roofline control setting unit that, based on the roofline optimization calculation, makes performance-power control settings for the computer hardware in the system software.
The information processing device according to claim 1, wherein the system software includes:
a scheduler that assigns one or more of the applications to one or more computing cores and executes them;
a roofline control unit that controls the roofline of the computer hardware based on setting information of the roofline control settings;
a computing core performance-power control unit that controls the performance and power of the computing core; and
a memory bandwidth power control unit that controls the bandwidth and power of the main storage device.
The information processing device according to claim 1, wherein the roofline model data includes roofline model data for each computing core.
The information processing device according to claim 1, wherein the roofline model data for each computing core includes:
a maximum computing performance value, which is the roofline at which computing performance is at its maximum;
a minimum computing performance value, which is the roofline at which computing performance is at its minimum;
a maximum memory bandwidth value, which is the slope at which memory bandwidth is at its maximum; and
a minimum memory bandwidth value, which is the slope at which memory bandwidth is at its minimum.
The information processing device according to claim 1, wherein the application performance definition table includes:
a container identifier indicating the container execution environment;
an application identifier indicating an application that runs in the container execution environment;
an execution thread identifier indicating an execution thread of the application;
the operational intensity of the execution thread of the application; and
the memory transfer execution efficiency of the execution thread of the application.
An information processing method comprising:
a scheduling information acquisition step of acquiring scheduling information of an execution thread of an application with respect to a computing core of computer hardware that includes the computing core and a main storage device equipped with a power saving mechanism;
a roofline optimization calculation step of performing an optimization calculation of the roofline of the computer hardware; and
a roofline control setting step of making performance-power control settings for the computer hardware in system software that runs on the computer hardware.
The information processing method according to claim 7, wherein the scheduling information acquisition step includes a step of acquiring information that includes the activation state of all the computing cores of the computer hardware and whether the execution threads of the application are executed concurrently on a plurality of the computing cores.
The information processing method according to claim 7, wherein the roofline optimization calculation step includes, in a case where the execution threads of the application are not executed concurrently on a plurality of the computing cores:
a step of acquiring, for each execution thread of the application, an operational intensity from an application performance definition table that includes operational intensity information of the application; and
a step of obtaining a first intersection between the operational intensity and roofline model data formed from the maximum controllable computing performance of the computing core and the maximum controllable bandwidth performance of the main storage device,
wherein, when the first intersection lies on the slope portion of the roofline model data, the roofline optimization calculation step includes:
a step of taking the first intersection as intersection A1 and calculating a roof indicating the computing performance value P1 at intersection A1;
a step of acquiring the memory transfer execution efficiency of the execution thread of the application from the application performance definition table;
a step of calculating a bandwidth performance by multiplying the maximum controllable bandwidth performance of the main storage device by the memory transfer execution efficiency, and calculating intersection A2 between the operational intensity and the slope portion of the roofline model data given by that bandwidth performance;
a step of calculating a roof indicating the computing performance value P2 at intersection A2; and
a step of setting the control bandwidth performance of the main storage device to the maximum value, the theoretical value of the control performance of the computing core to the computing performance value P1, and the lower limit of the control performance of the computing core to the computing performance value P2, and
wherein, when the first intersection lies on the roof of the roofline model data, the roofline optimization calculation step includes:
a step of taking the first intersection as intersection B1, translating, on the log-log graph of the roofline model data, the slope portion indicating the maximum controllable bandwidth performance of the main storage device to obtain a new slope portion passing through intersection B1, and calculating the slope value that is the theoretical bandwidth value of that slope;
a step of calculating the slope value that is the execution bandwidth value by multiplying the calculated slope value by the reciprocal of the memory transfer execution efficiency; and
a step of setting the control bandwidth performance of the main storage device to the execution bandwidth value and the control performance of the computing core to the maximum controllable computing performance.
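
The two branches of this calculation reduce to a few lines of arithmetic, sketched below. This is a minimal illustration under stated assumptions, not the implementation of the application: the function and field names are hypothetical; operational intensity is assumed to be in operations per byte, so that intensity times bandwidth gives performance; the ridge point, where intensity times maximum bandwidth exactly equals maximum performance, is assumed to fall in the roof branch; and the execution bandwidth is not clamped to the maximum controllable bandwidth, because the text does not say whether it should be.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SingleThreadPlan:
    memory_bandwidth: float            # control bandwidth of the main storage device
    core_performance: Optional[float]  # fixed control performance (roof branch)
    core_theoretical: Optional[float]  # P1 (slope branch)
    core_lower_limit: Optional[float]  # P2 (slope branch)


def optimize_single_thread(
    intensity: float,         # operational intensity of the thread (assumed ops/byte)
    efficiency: float,        # memory transfer execution efficiency, 0 < e <= 1
    peak_performance: float,  # maximum controllable computing performance
    peak_bandwidth: float,    # maximum controllable bandwidth performance
) -> SingleThreadPlan:
    if intensity <= 0 or not 0 < efficiency <= 1:
        raise ValueError("intensity must be > 0 and efficiency in (0, 1]")

    # First intersection: where the intensity meets the maximum-bandwidth slope.
    p1 = intensity * peak_bandwidth
    if p1 < peak_performance:
        # Slope branch (intersection A1): the thread is bandwidth-limited.
        # Intersection A2 lies on the slope scaled by the transfer efficiency.
        p2 = intensity * peak_bandwidth * efficiency
        return SingleThreadPlan(peak_bandwidth, None, p1, p2)

    # Roof branch (intersection B1): shift the slope until it passes through
    # (intensity, peak_performance); its slope is the theoretical bandwidth.
    theoretical_bandwidth = peak_performance / intensity
    # Dividing by the efficiency gives the bandwidth that must be provisioned.
    execution_bandwidth = theoretical_bandwidth / efficiency
    return SingleThreadPlan(execution_bandwidth, peak_performance, None, None)
```

For example, with a maximum performance of 100, a maximum bandwidth of 10, and an efficiency of 0.5, an intensity of 2 falls on the slope (P1 = 20, P2 = 10), while an intensity of 20 falls on the roof and yields an execution bandwidth of 10.
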
The information processing method according to claim 7, wherein the roofline control setting step includes, in a case where the execution threads of the application are not executed concurrently on a plurality of the computing cores:
a step of acquiring the execution thread of the application that is assigned to and executed on the computing core; and
a step of controlling the main storage device to the control bandwidth performance corresponding to the execution thread and controlling the power of the main storage device according to the control bandwidth performance,
and further includes:
when a control performance value of the computing core exists, a step of controlling the control performance of the computing core to the control performance value and a step of controlling the computing core power according to the control performance; and
when a theoretical value of the control performance of the computing core exists, a step of controlling the control performance of the computing core to the theoretical value and a step of controlling the computing core power according to the control performance,
or,
when a lower limit of the control performance of the computing core exists, a step of controlling the control performance of the computing core to the lower limit and a step of controlling the computing core power according to the control performance.
The information processing method according to claim 7, wherein the roofline optimization calculation step includes, in a case where the execution threads of the application are executed concurrently on a plurality of the computing cores, for each set of execution threads of the application executed concurrently on the plurality of computing cores:
a step of acquiring the operational intensity of each execution thread in the set from an application performance definition table that includes operational intensity information of the application, and a step of calculating the bandwidth performance required by each execution thread;
when the total of the calculated bandwidth performance required by the execution threads is greater than the maximum controllable bandwidth performance of the main storage device, a step of calculating the bandwidth share of each execution thread of the application relative to the total, and a step of setting the control bandwidth performance of the main storage device to the maximum value;
when the total is not greater than the maximum controllable bandwidth performance of the main storage device, a step of setting the control bandwidth performance of the main storage device to the total of the calculated bandwidth performance required by the execution threads; and
a computing core control performance calculation step of calculating the control performance of the computing core for each execution thread of the application.
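
The bandwidth decision for one set of concurrently running threads can be sketched as follows. This is an illustrative sketch only: `plan_memory_bandwidth` and its parameters are hypothetical names, and because this passage does not state how each thread's required bandwidth is derived, the sketch takes those per-thread demands as input.

```python
from typing import Dict, Optional, Tuple


def plan_memory_bandwidth(
    demands: Dict[str, float],  # hypothetical input: thread id -> required bandwidth
    peak_bandwidth: float,      # maximum controllable bandwidth performance
) -> Tuple[float, Optional[Dict[str, float]]]:
    """Return (control bandwidth of main storage, per-thread shares or None)."""
    total = sum(demands.values())
    if total > peak_bandwidth:
        # Oversubscribed: run the memory at its maximum and record each
        # thread's share of the total demand.
        shares = {thread: demand / total for thread, demand in demands.items()}
        return peak_bandwidth, shares
    # Not oversubscribed: provision exactly the total demand; no shares needed.
    return total, None
```

Returning `None` for the shares in the second branch mirrors the text, which calculates bandwidth shares only when the total demand exceeds the maximum.
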
The information processing method according to claim 7, wherein, in a case where the execution threads of the application are executed concurrently on a plurality of the computing cores, a computing core control performance calculation process that calculates the control performance of the computing core for each execution thread of the application includes, for each execution thread of the application:
a step of acquiring the memory transfer execution efficiency of the execution thread from an application performance definition table that includes operational intensity information of the application; and
a step of obtaining a second intersection between the operational intensity of the execution thread and roofline model data formed from the maximum controllable computing performance of the computing core on which the execution thread runs and the maximum controllable bandwidth performance of the main storage device,
wherein, when the second intersection lies on the slope portion of the roofline model data, the process includes:
a step of taking the second intersection as intersection A1a and calculating a roof indicating the computing performance value P1a at intersection A1a;
a step of calculating a bandwidth performance value by multiplying the maximum controllable bandwidth performance of the main storage device by the calculated bandwidth share of the execution thread and the acquired memory transfer execution efficiency;
a step of calculating intersection A2a between the acquired operational intensity of the execution thread and the slope portion of the roofline model data indicating the calculated bandwidth performance value;
a step of calculating a roof indicating the computing performance value P2a at intersection A2a; and
a step of setting the theoretical value of the control performance of the computing core to the computing performance value P1a and the lower limit of the control performance of the computing core to the computing performance value P2a,
wherein, when the second intersection lies on the roof of the roofline model data, the second intersection is taken as intersection B1a and, when the total of the calculated bandwidth performance of the main storage device required by the execution threads is greater than the maximum controllable bandwidth performance of the main storage device, the process includes:
a step of calculating a bandwidth performance value by multiplying the maximum controllable bandwidth performance of the main storage device by the calculated bandwidth share of the execution thread;
a step of calculating intersection B2a between the acquired operational intensity of the execution thread and the slope portion of the roofline model data indicating the calculated bandwidth performance value;
a step of calculating a roof indicating the computing performance value Q2a at intersection B2a; and
a step of setting the lower limit of the control performance of the computing core to the computing performance value Q2a when the calculated computing performance value Q2a is smaller than the maximum controllable computing performance of the computing core, and
wherein, when the total of the calculated bandwidth performance of the main storage device required by the execution threads is smaller than the maximum controllable bandwidth performance of the main storage device, or when the calculated computing performance value Q2a is greater than the maximum controllable computing performance of the computing core, the process includes a step of setting the control performance of the computing core to the maximum controllable computing performance.
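
The per-thread decision described here can be sketched as follows. This is a hedged illustration, not the implementation of the application: `CorePlan` and `plan_core_performance` are hypothetical names; the bandwidth share and the oversubscription flag are assumed to come from the preceding bandwidth calculation and are passed in by the caller; operational intensity is assumed to be in operations per byte; and the equality cases that the text leaves open (total demand equal to the maximum bandwidth, or Q2a equal to the maximum performance) are assumed to fall in the maximum-performance branch. The text also does not say which share applies on the slope branch when bandwidth is not oversubscribed, so that choice is left to the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CorePlan:
    performance: Optional[float] = None  # fixed control performance of the core
    theoretical: Optional[float] = None  # P1a
    lower_limit: Optional[float] = None  # P2a or Q2a


def plan_core_performance(
    intensity: float,         # operational intensity of the thread (assumed ops/byte)
    efficiency: float,        # memory transfer execution efficiency
    share: float,             # this thread's share of the total bandwidth demand
    oversubscribed: bool,     # True if total demand exceeds the maximum bandwidth
    peak_performance: float,  # maximum controllable performance of the thread's core
    peak_bandwidth: float,    # maximum controllable bandwidth of main storage
) -> CorePlan:
    # Second intersection: intensity against the maximum-bandwidth slope.
    p1a = intensity * peak_bandwidth
    if p1a < peak_performance:
        # Slope branch (A1a): A2a lies on the slope scaled by share and efficiency.
        p2a = intensity * peak_bandwidth * share * efficiency
        return CorePlan(theoretical=p1a, lower_limit=p2a)

    # Roof branch (B1a).
    if oversubscribed:
        # B2a lies on the slope scaled by the share only (no efficiency factor).
        q2a = intensity * peak_bandwidth * share
        if q2a < peak_performance:
            return CorePlan(lower_limit=q2a)
    # Enough bandwidth, or Q2a not below the maximum: run the core at its maximum.
    return CorePlan(performance=peak_performance)
```

With a maximum performance of 100 and a maximum bandwidth of 10, an oversubscribed thread with intensity 20 and a share of 0.25 gets a lower limit of Q2a = 50, whereas the same thread with a share of 0.75 (Q2a = 150) is run at the maximum performance of 100.
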
The information processing method according to claim 7, wherein the execution thread information acquisition process of the roofline control process includes, in a case where the execution threads of the application are executed concurrently on a plurality of the computing cores:
a step of determining whether the process is running in response to a roofline control interrupt and, when it is, a step of acquiring the set of execution threads running on the respective computing cores;
when the process is not running in response to a roofline control interrupt, a step of acquiring the execution thread of the application that is assigned to and executed on its own computing core, and a step of determining whether any execution thread is running on a computing core other than its own; and
when an execution thread is running on another computing core, a step of issuing, to the computing cores other than its own, an interrupt for executing the roofline control, and a step of acquiring the set of execution threads running on the respective computing cores.