WO2023238276A1 - Information processing device and information processing method - Google Patents


Info

Publication number
WO2023238276A1
Authority
WO
WIPO (PCT)
Prior art keywords
performance
calculation
roofline
value
application
Application number
PCT/JP2022/023104
Other languages
French (fr)
Japanese (ja)
Inventor
克久 小笠原
涼太 北川
Original Assignee
三菱電機株式会社 (Mitsubishi Electric Corporation)
Application filed by 三菱電機株式会社 (Mitsubishi Electric Corporation)
Priority to PCT/JP2022/023104
Publication of WO2023238276A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • FIG. 1 is a block diagram showing the configuration of an information processing device according to Embodiment 1.
  • FIG. 2 is a diagram illustrating an example of roofline model data in the information processing apparatus according to the first embodiment.
  • FIG. 3 is a table diagram showing an example of contents of an application performance definition table in the information processing apparatus according to the first embodiment.
  • FIG. 4A is a diagram illustrating an example of information acquired from the scheduling information acquisition unit in the information processing apparatus according to the first embodiment.
  • FIG. 4B is a diagram illustrating an example of information acquired from the scheduling information acquisition unit in the information processing apparatus according to the first embodiment.
  • FIG. 3 is a diagram showing an operation flow of a performance power optimization unit in the information processing device according to the first embodiment.
  • FIGS. 7A to 7C are diagrams illustrating an example of the roofline optimization calculation performed by the roofline optimization calculation unit in the information processing device according to the first embodiment when the operational intensity of an application intersects the slope portion of the roofline model data.
  • FIG. 8A is a diagram illustrating an example of the roofline optimization calculation performed by the roofline optimization calculation unit in the information processing device according to the first embodiment when the operational intensity of an application intersects the roof portion of the roofline model data.
  • the scheduling information acquisition unit 1111 acquires allocation information of the application 1400 and the application 1610 running in the container execution environment 1600 to the computer hardware 1300.
  • the roofline optimization calculation unit 1112 performs optimization calculations for performance power control of the computer hardware 1300 when executing the application 1400 and the application 1610 running in the container execution environment 1600 on the computer hardware 1300.
  • the roofline control unit 1202 controls a roofline model representing the relationship between calculation intensity and calculation performance per unit time in the computer hardware 1300.
  • Computer hardware 1300 includes at least one or more arithmetic cores 1310 and a main storage device 1320.
  • The arithmetic core 1310 includes at least an arithmetic unit that performs arithmetic processing and a power saving mechanism that controls the power consumption of the arithmetic core 1310.
  • the calculation core 1310 may be configured as a heterogeneous information processing device having not only calculation cores of the same type but also a plurality of calculation cores with different processing methods.
  • The main storage device 1320 stores data for arithmetic processing, and the calculation core 1310 loads data from and stores data to the main storage device 1320.
  • the controller (not shown) of the main storage device 1320 includes a power saving mechanism that controls power consumption of the main storage device 1320.
  • the main storage device 1320 may include a cache memory (not shown) provided between the main storage device 1320 and the arithmetic core 1310.
  • FIG. 2 is a diagram showing an example of roofline model data 1101 according to the first embodiment.
  • Roofline model data 1101 is composed of at least roofline model data 2000 for each calculation core 1310 provided in computer hardware 1300.
  • The roofline model data 2000 is expressed as a graph in which the horizontal axis represents operational intensity and the vertical axis represents calculation performance. It also includes the maximum value A and minimum value B of the controllable calculation performance of the calculation core 1310, and the maximum value C and minimum value D of the controllable bandwidth performance of the main storage device 1320.
  • The roofline model data 2000 defines the upper limit of calculation performance as a function of operational intensity, given the controllable calculation performance of the calculation core 1310 and the controllable bandwidth performance of the main storage device 1320. Details of the roofline model are described, for example, in Samuel Williams, Andrew Waterman and David Patterson, "Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore" (2009).
  • The application performance definition table 1102 includes a container identifier 3000, an application identifier 3001, an execution thread identifier 3002 indicating an execution thread, an operational intensity 3003, and a memory transfer execution efficiency 3004.
  • For each container identifier 3000, the application identifier 3001 holds an identifier of each application 1610 that operates within the container execution environment 1600. If an application 1400 does not operate within the container execution environment 1600, an identifier is written for each application 1400. In the example of FIG. 3, the applications 1610 operating within the container execution environment 1600 identified by container identifier C1 have the two application identifiers App1 and App2. The container identifier 3000 for App3 is invalid, so App3 is not an application that operates within the container execution environment 1600.
  • the execution thread identifier 3002 describes an identifier indicating an execution thread for each application.
  • FIG. 4B is an example in which the calculation core 1310 is one calculation core 0.
  • Only calculation core 0 may operate while the other calculation cores are in a low-power sleep state. In this case as well, only one thread is executed on the calculation core at any given time.
  • In step S6008, the roofline optimization calculation unit 1112 sets the control bandwidth performance of the main storage device 1320 to the maximum value C, sets the theoretical value of the control performance of the calculation core 1310 to the roof calculation performance value P1 calculated in step S6004, and sets its lower limit value to the roof calculation performance value P2 calculated in step S6007. After step S6008, the process advances to step S6013.
  • In step S6009, the roofline optimization calculation unit 1112 translates the slope part of the roofline model data 2000 so that it passes through the intersection B1 on the roof part of the roofline model data 2000, and calculates the slope value of this new slope (taken as the theoretical bandwidth value). In step S6010, as in step S6005, the roofline optimization calculation unit 1112 obtains the memory transfer execution efficiency 3004 corresponding to the execution thread identifier 3002 from the application performance definition table 1102.
  • In step S6012, the roofline optimization calculation unit 1112 sets the control bandwidth performance of the main storage device 1320 to the execution bandwidth value calculated in step S6011, and sets the control performance of the calculation core 1310 to the maximum value A of the controllable calculation performance.
  • FIG. 7A, FIG. 7B, and FIG. 7C show the roofline optimization calculation by the roofline optimization calculation unit 1112 of Embodiment 1 when the operational intensity of the application intersects the slope part of the roofline model data 2000 in FIG. 2.
  • The roofline model 8000 shown in FIG. 8A shows an example in which, in step S6003 shown in FIG. 6, the intersection between the roofline model data, consisting of the maximum value A of the controllable calculation performance and the maximum value C of the controllable bandwidth performance, and the operational intensity 3003 acquired in step S6002 is calculated, and the intersection B1 lies on the roof part of the roofline model data.
  • In step S9000, the roofline control unit 1202 starts the roofline control process.
  • In step S9001, the roofline control unit 1202 obtains the execution thread of the application to be executed from the scheduler 1201.
  • In step S9003, the roofline control unit 1202 acquires the control performance value that the roofline optimization calculation unit 1112 calculated for the arithmetic core 1310 on which the execution thread operates. If a control performance value has already been set by the roofline control setting unit 1113, the process advances to step S9004.
  • the control performance value calculated by the roofline optimization calculation unit 1112 and set by the roofline control setting unit 1113 is set as the lower limit value or theoretical value of the control performance.
  • In step S9004, the roofline control unit 1202 sets the control performance of the arithmetic core 1310 on which the execution thread operates to the control performance value acquired in step S9003, and controls the power of the arithmetic core 1310 according to that value via the arithmetic core performance power control unit 1203.
  • DVFS: dynamic voltage and frequency scaling (dynamic voltage frequency control)
  • In step S9005, the roofline control unit 1202 sets the control performance of the arithmetic core 1310 on which the execution thread operates to the theoretical value or lower limit value of the control performance acquired in step S9003, and controls the power of the arithmetic core 1310 accordingly via the arithmetic core performance power control unit 1203. If a lower limit value of the control performance was acquired in step S9003, the lower limit value may be used.
  • After step S9004 or step S9005, the process proceeds to step S9006, in which the roofline control unit 1202 ends the roofline control process.
  • FIG. 10 is a diagram illustrating an example of information acquired from the scheduling information acquisition unit 1111 according to the second embodiment. It differs from FIG. 4 in that application execution threads are executed at the same time on different calculation cores.
  • The example in FIG. 10 shows the scheduling information of application execution threads for calculation core 0 and calculation core 1, in which the execution threads on calculation core 0 and calculation core 1 overlap in time.
  • FIG. 11 is a diagram showing the operation flow of the roofline optimization calculation unit according to the second embodiment. It differs from FIG. 6 in that application execution threads are executed at the same time on different calculation cores.
  • The roofline optimization calculation unit 1112 starts the optimization calculation process.
  • the roofline optimization calculation unit 1112 repeats steps S11001 to S11012 for each set of execution threads of applications that overlap in execution at the same time in each calculation core 1310.
  • In step S11002, the roofline optimization calculation unit 1112 obtains the operational intensity 3003 of each execution thread in the set from the application performance definition table 1102.
  • The roofline optimization calculation unit 1112 calculates the bandwidth performance of the main storage device 1320 required by each execution thread.
  • One way to calculate the bandwidth performance required by an execution thread is as follows: for the calculation core 1310 on which the execution thread operates, calculate the intersection between the roofline model data 2000, composed of the maximum value A of the controllable calculation performance of the calculation core 1310 and the maximum value C of the controllable bandwidth performance of the main storage device 1320, and the operational intensity 3003 acquired in step S11002. If the intersection is on the slope part of the roofline model data 2000, the slope value may be used as the bandwidth performance value required by the execution thread.
  • Furthermore, the slope value may be multiplied by the reciprocal of the memory transfer execution efficiency 3004 of the execution thread, and the result used as the bandwidth performance value required by the execution thread.
  • If the resulting value exceeds the maximum value C of the controllable bandwidth performance of the main storage device 1320, the bandwidth performance value requested by the execution thread may be set to the maximum value C.
  • In step S11004, the roofline optimization calculation unit 1112 determines whether the total of the bandwidth performance values of the main storage device 1320 required by the execution threads, calculated in step S11003, is larger than the maximum value C of the controllable bandwidth performance of the main storage device 1320. If it is larger, the process advances to step S11005. If it is smaller, the process advances to step S11006.
  • In step S11006, the roofline optimization calculation unit 1112 repeats steps S11006 to S11008 for each execution thread.
  • In step S11007, the roofline optimization calculation unit 1112 calculates the control performance of the arithmetic core 1310 on which the execution thread operates.
  • In step S11009, the same determination as in step S11004 is made. If the total of the bandwidth performance values of the main storage device 1320 requested by the execution threads, calculated in step S11003, is larger than the maximum value C of the controllable bandwidth performance of the main storage device 1320, the process advances to step S11010. If it is smaller, the process advances to step S11011.
  • In step S11010, the roofline optimization calculation unit 1112 sets the control bandwidth performance of the main storage device 1320 to the maximum value C.
  • In step S11011, the roofline optimization calculation unit 1112 sets the control bandwidth performance of the main storage device 1320 to the total of the bandwidth performance values of the main storage device 1320 required by the execution threads, calculated in step S11003.
  • FIG. 12 is a diagram illustrating an operational flow of control performance calculation processing included in the roofline optimization calculation section of the second embodiment.
  • the roofline optimization calculation unit 1112 starts a control performance calculation process for the calculation core 1310.
  • the roofline optimization calculation unit 1112 obtains the memory transfer execution efficiency 3004 corresponding to the execution thread identifier 3002 from the application performance definition table 1102.
  • In step S12002, for the arithmetic core 1310 on which the execution thread operates, the roofline optimization calculation unit 1112 calculates the intersection between the roofline model data 2000, composed of the maximum value A of the controllable arithmetic performance of the arithmetic core 1310 and the maximum value C of the controllable bandwidth performance of the main storage device 1320, and the operational intensity 3003 acquired in step S11002. If the intersection A1a is on the slope of the roofline model data 2000, the process advances to step S12003. If the intersection B1a is on the roof of the roofline model data 2000, the process advances to step S12008.
  • In step S12003, the roofline optimization calculation unit 1112 calculates a roof indicating the calculation performance value P1a of the intersection A1a on the slope part of the roofline model data 2000.
  • In step S12005, the roofline optimization calculation unit 1112 calculates a roof indicating the calculation performance value P2a of the intersection A2a obtained in step S12004.
  • In step S12006, the roofline optimization calculation unit 1112 sets the theoretical value of the control performance of the calculation core 1310 to the roof calculation performance value P1a calculated in step S12003.
  • In step S12007, the roofline optimization calculation unit 1112 sets the lower limit value of the control performance of the calculation core 1310 to the roof calculation performance value P2a calculated in step S12005. After step S12007, the process advances to step S12014.
  • In step S12008, the roofline optimization calculation unit 1112 determines whether the total bandwidth performance of the main storage device 1320 requested by the execution threads is larger than the maximum value C of the controllable bandwidth performance of the main storage device 1320. If it is larger, the process advances to step S12009. If it is smaller, the process advances to step S12013.
  • In step S12009, the roofline optimization calculation unit 1112 calculates the bandwidth performance value for the execution thread by multiplying the maximum value C of the controllable bandwidth performance of the main storage device 1320 by the bandwidth ratio of the execution thread calculated in step S11005 shown in FIG. 11.
  • It then calculates the intersection B2a between the slope part of the roofline model data 2000 whose slope is this bandwidth performance value and the operational intensity 3003 acquired in step S11002.
  • In step S12010, the roofline optimization calculation unit 1112 calculates a roof indicating the calculation performance value Q2a of the intersection B2a calculated in step S12009.
  • In step S12011, the roofline optimization calculation unit 1112 determines whether the calculation performance value Q2a calculated in step S12010 is smaller than the maximum value A of the controllable calculation performance of the calculation core 1310. If it is smaller, the process advances to step S12012. If it is larger, the process advances to step S12013.
  • In step S12012, the roofline optimization calculation unit 1112 sets the lower limit value of the control performance of the calculation core 1310 to the roof calculation performance value Q2a calculated in step S12010. After step S12012, the process advances to step S12014.
  • In step S12013, the roofline optimization calculation unit 1112 sets the control performance value of the calculation core 1310 to the maximum value A of the controllable calculation performance of the calculation core 1310, as the calculation performance value Q1a.
  • In step S12014, the roofline optimization calculation unit 1112 ends the control performance calculation process of the calculation core 1310.
  • FIG. 13A, FIG. 13B, FIG. 13C, and FIG. 13D are diagrams showing examples of roofline optimization calculations by the roofline optimization calculation unit 1112 of the second embodiment.
  • The roofline model 13000 shown in FIG. 13A illustrates a case in which, in the process of calculating the bandwidth performance value of the main storage device 1320 requested by the execution thread in step S11003 shown in FIG. 11, the intersection A1a between the roofline model data 2000 (composed of the maximum value A of the controllable calculation performance of the calculation core 1310 on which the execution thread operates and the maximum value C of the controllable bandwidth performance of the main storage device 1320) and the operational intensity 3003 acquired in step S11002 lies on the slope part of the roofline model data 2000.
  • The roofline model 13010 shown in FIG. 13B illustrates a case in which, in the process of calculating the bandwidth performance value of the main storage device 1320 requested by the execution thread in step S11003 shown in FIG. 11, the intersection B1a between the roofline model data 2000 (composed of the maximum value A of the controllable calculation performance of the calculation core 1310 on which the execution thread operates and the maximum value C of the controllable bandwidth performance of the main storage device 1320) and the operational intensity 3003 obtained in step S11002 lies on the roof part of the roofline model data 2000.
  • The example includes a dashed arrow 13011 corresponding to the calculation of the bandwidth performance value requested by the execution thread.
  • The roofline model 13020 shown in FIG. 13C includes a broken-line arrow 13021 corresponding to the calculation, in step S12003 shown in FIG. 12, of the roof indicating the calculation performance value P1a of the intersection A1a.
  • The example also includes broken-line arrows 13022 and 13023 corresponding to the calculation, in step S12005, of the roof indicating the calculation performance value P2a of the intersection A2a.
  • The roofline model 13030 shown in FIG. 13D includes a broken-line arrow 13031 corresponding to the calculation of the intersection B2a in step S12009 shown in FIG. 12.
  • FIG. 14 is a diagram showing an operational flow of execution thread acquisition processing in the roofline control unit 1202 according to the second embodiment.
  • The difference from FIG. 9 is that the execution thread acquisition in step S9001 is adapted to application execution threads being executed at the same time on the respective arithmetic cores 1310.
  • the operation flow of the roofline control unit 1202 is such that step S9001 in FIG. 9 is replaced with steps S14000 to S14006 shown in FIG. 14.
  • step S14002 the roofline control unit 1202 acquires from the scheduler 1201 the execution thread of the application to be assigned to its own arithmetic core 1310 for execution.
  • After the processing in step S14006 is completed, steps S9002 to S9006 in FIG. 9 are performed. In steps S9002 to S9006, it suffices to perform, based on the control bandwidth performance of the main storage device 1320 calculated in the flow of FIG. 11 for each set of application execution threads executed at the same time on the arithmetic cores 1310, and on the control performance of the arithmetic core 1310 on which each execution thread operates, the power control of the main storage device 1320 corresponding to its control bandwidth performance and the power control of each arithmetic core 1310 corresponding to its control performance.
  • The control of the bandwidth performance of the main storage device 1320 and the corresponding power control of the main storage device 1320 may be performed by any one of the arithmetic cores 1310. The control performance of the arithmetic core 1310 on which each execution thread operates, and the corresponding power control of that arithmetic core 1310, may be handled by each arithmetic core 1310 itself.
  • Reference signs: 1000 Information processing device, 1100 Performance power optimization unit, 1101 Roofline model data, 1102 Application performance definition table, 1110 Performance power optimization program, 1111 Scheduling information acquisition unit, 1112 Roofline optimization calculation unit, 1113 Roofline control setting unit, 1200 System software, 1201 Scheduler, 1202 Roofline control unit, 1203 Computing core performance power control unit, 1204 Memory bandwidth power control unit, 1300 Computer hardware, 1310 Computing core, 1320 Main storage, 1400 Application
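
The Embodiment 2 steps listed above (per-thread bandwidth in S11003, the comparison with C in S11004 and S11009, the memory setting in S11010 and S11011, and the roof branch of FIG. 12) can be sketched for one set of simultaneously running threads as follows. This is a hedged illustration, not the applicant's implementation. The names are hypothetical. The bandwidth ratio of step S11005 is not defined in this text, so it is assumed here to be each thread's share of the total requested bandwidth. The required bandwidth in the roof case is assumed to be A / I by analogy with step S6009. The slope-case lower limit P2a is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Thread:
    thread_id: str     # execution thread identifier 3002
    intensity: float   # operational intensity 3003 (FLOP/byte)
    efficiency: float  # memory transfer execution efficiency 3004, 0 < e <= 1


def required_bandwidth(t: Thread, peak_perf: float, peak_bw: float) -> float:
    """Bandwidth of the main storage device requested by one thread (cf. S11003)."""
    if peak_bw * t.intensity < peak_perf:
        bw = peak_bw                   # intersection A1a on the slope: use the slope value C
    else:
        bw = peak_perf / t.intensity   # ASSUMED: slope shifted through B1a = (I, A)
    bw /= t.efficiency                 # correction by 1 / memory transfer efficiency
    return min(bw, peak_bw)            # a single thread never requests more than C


def optimize_thread_set(threads: List[Thread], peak_perf: float,
                        peak_bw: float) -> Tuple[float, Dict[str, float]]:
    """Sketch of S11003 to S11011 plus the roof branch of FIG. 12 for one thread set.

    Returns (memory bandwidth setting, {thread_id: core performance setting}).
    """
    req = {t.thread_id: required_bandwidth(t, peak_perf, peak_bw) for t in threads}
    total = sum(req.values())
    oversubscribed = total > peak_bw                     # S11004 / S11009
    mem_setting = peak_bw if oversubscribed else total   # S11010 / S11011

    core_setting: Dict[str, float] = {}
    for t in threads:
        if peak_bw * t.intensity < peak_perf:
            # Slope case: theoretical value P1a = C * I (lower limit P2a not modelled).
            core_setting[t.thread_id] = peak_bw * t.intensity
        elif oversubscribed:
            # ASSUMED S11005 ratio: this thread's share of the total requested bandwidth.
            share = peak_bw * req[t.thread_id] / total   # S12009: C times the ratio
            q2a = share * t.intensity                    # S12010: roof at B2a
            core_setting[t.thread_id] = min(q2a, peak_perf)  # S12011 to S12013
        else:
            core_setting[t.thread_id] = peak_perf        # S12013: Q1a = A
    return mem_setting, core_setting
```

For instance, with A = 100 and C = 20, threads t1 (intensity 10, efficiency 1.0) and t3 (intensity 2, efficiency 1.0) request 10 and 20 bandwidth units. The total of 30 exceeds C, so memory is set to C = 20, t1's core is lowered to about 66.7, and t3 keeps its slope value of 40.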

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

A performance-power optimization unit (1100) is provided with a performance-power optimization program (1110), roofline model data (1101) that indicates performance per unit time and operational intensity of computer hardware (1300), and an application performance definition table (1102) that includes operational intensity information for an application. The performance-power optimization program (1110) is provided with a scheduling information acquisition unit (1111) that acquires scheduling information for the application with respect to a computing core (1310) of the computer hardware (1300), and a roofline optimization calculation unit (1112) that performs optimization calculation of the roofline of the computer hardware (1300).
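
The roofline model data and application performance definition table named above, together with the two single-thread cases of Embodiment 1 (steps S6004 to S6012), can be sketched in Python. This is an illustration under stated assumptions, not the applicant's implementation. The class and function names are hypothetical, and performance and bandwidth are plain floats in arbitrary but consistent units. The lower limit P2 of step S6007 is not modelled because its derivation is not given in this text. How step S6011 turns the theoretical bandwidth into an execution bandwidth (divide by the memory transfer execution efficiency, cap at C) is an assumption borrowed from the Embodiment 2 description.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Roofline:
    """Roofline of one arithmetic core paired with the main storage device.

    peak_perf is the maximum controllable calculation performance A;
    peak_bw is the maximum controllable bandwidth performance C.
    """
    peak_perf: float
    peak_bw: float

    def ridge_point(self) -> float:
        # Operational intensity at which the slope (C * I) meets the roof (A).
        return self.peak_perf / self.peak_bw

    def attainable(self, intensity: float) -> float:
        # Roofline bound: performance is capped by the roof or by the slope.
        return min(self.peak_perf, self.peak_bw * intensity)

    def on_slope(self, intensity: float) -> bool:
        # True when the intersection with the roofline lies on the slope part.
        return self.peak_bw * intensity < self.peak_perf


@dataclass(frozen=True)
class AppPerfEntry:
    """One row shaped like the application performance definition table 1102."""
    container_id: Optional[str]  # container identifier 3000 (None if not containerized)
    app_id: str                  # application identifier 3001
    thread_id: str               # execution thread identifier 3002
    intensity: float             # operational intensity 3003 (FLOP/byte)
    efficiency: float            # memory transfer execution efficiency 3004, 0 < e <= 1


def single_thread_settings(rf: Roofline, row: AppPerfEntry) -> Tuple[float, float]:
    """Return (memory bandwidth setting, core performance setting) for one thread.

    Slope case: bandwidth stays at C; the core's theoretical value is P1 = C * I.
    Roof case: the slope is shifted through B1 = (I, A), giving A / I; dividing by
    the efficiency and capping at C is an ASSUMPTION about step S6011. Core stays at A.
    """
    if rf.on_slope(row.intensity):
        return rf.peak_bw, rf.peak_bw * row.intensity
    theoretical_bw = rf.peak_perf / row.intensity
    execution_bw = min(theoretical_bw / row.efficiency, rf.peak_bw)
    return execution_bw, rf.peak_perf
```

For example, with A = 100 and C = 20 (ridge point at an intensity of 5), a thread with intensity 2 falls on the slope and gets (20.0, 40.0). A thread with intensity 20 and efficiency 0.5 falls on the roof and gets (10.0, 100.0): the memory bandwidth is lowered while the core stays at A.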

Description

Information processing device and information processing method
The present application relates to an information processing device and an information processing method that control power consumption in accordance with arithmetic processing performance.
An automatic control system is generally a system in which multiple functions cooperate and are integrated to perform recognition, judgment, and control. For example, an automated driving system consists of an automated driving control unit that generates optimal control parameters from the surrounding conditions, together with an engine control unit, a brake control unit, and a steering control unit that implement the vehicle's engine control, brake control, and steering control, respectively. As the level of autonomy (e.g., the level of automated driving) rises, automatic control systems require more computing performance.
To meet the processing performance required by such systems, some system configurations are equipped with a system on a chip (SoC) that has multiple high-performance arithmetic cores and a large-capacity main storage device. However, as arithmetic cores become more powerful and move to multi-core/many-core designs, and as main storage capacity grows, the power consumption and heat generation of the system increase. To counter this, there are arithmetic processing devices equipped with dynamic voltage and frequency scaling (DVFS). DVFS is a power-saving mechanism that reduces power consumption by changing the operating frequency and operating voltage of the arithmetic cores. However, it is not easy to optimize power consumption in real time without affecting the calculation performance of arithmetic processing such as applications. Furthermore, optimization control is required that takes into account applications executed in parallel in a system configuration with multiple arithmetic cores, or applications in multiple container environments.
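
As a rough illustration of how a DVFS-style power-saving mechanism trades frequency for power, the sketch below picks, from a hypothetical table of frequency/voltage operating points, the cheapest one that still meets a performance target. The operating points, the linear performance-versus-frequency model, and the textbook dynamic-power estimate P = C_eff * V^2 * f are all assumptions for illustration, not values or methods taken from this application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    freq_mhz: float  # core clock frequency
    volt: float      # supply voltage required at that frequency (hypothetical)


def dynamic_power(op: OperatingPoint, capacitance: float = 1.0) -> float:
    # Textbook CMOS dynamic-power estimate P = C_eff * V^2 * f (arbitrary units).
    return capacitance * op.volt ** 2 * op.freq_mhz


def pick_operating_point(points, required_perf: float, perf_per_mhz: float) -> OperatingPoint:
    """Return the lowest-power operating point that still delivers required_perf.

    Assumes performance grows linearly with frequency (perf = perf_per_mhz * f),
    which is only a rough model; if no point is fast enough, the fastest is returned.
    """
    feasible = [p for p in points if p.freq_mhz * perf_per_mhz >= required_perf]
    if not feasible:
        return max(points, key=lambda p: p.freq_mhz)
    return min(feasible, key=dynamic_power)
```

With the points (600 MHz, 0.7 V), (1200 MHz, 0.85 V), and (1800 MHz, 1.0 V) and 0.05 performance units per MHz, a target of 40 selects 1200 MHz and a target of 100 falls back to 1800 MHz. In a roofline-driven scheme, the target could be the control performance value, theoretical value, or lower limit handed to the core's power control; that mapping is an assumption here.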
To address these issues, Patent Document 1 discloses a method for determining VM/container and data placement in a hyperconverged infrastructure (HCI) environment. Patent Document 2 discloses a device that reduces power consumption by changing the processor frequency and instruction issue width based on memory access information obtainable inside the processor.
Patent Document 1: JP2020-52730A
Patent Document 2: International Publication No. 2008/120274
Patent Document 1 manages the usage status of shared computer resources and, based on that usage, determines the destination node for a new virtual machine, container, or storage volume so that the computer resource limit of the destination node is not exceeded. It does not target optimizing the performance and power of an information processing device in a way adapted to the arithmetic processing of an application. Patent Document 2 is a device that reduces processor power based on memory access information inside the processor. It does not target high-accuracy performance and power optimization based on information about the arithmetic processing of an application. It also does not target applications that run in parallel on multiple arithmetic cores, and it does not include the main storage device among the targets of performance and power optimization.
Even combining the technologies disclosed in Patent Documents 1 and 2, it was not possible to optimize the performance and power of an information processing device in real time for the arithmetic processing of applications that run in parallel on multiple arithmetic cores, without hindering processing performance.
The present application was made in view of these problems. It aims to enable real-time optimization control of the performance and power of an information processing device that adapts to the arithmetic processing of applications and does not hinder processing performance. A further aim is to enable such optimization control when applications are processed in parallel by multiple arithmetic cores. The arithmetic cores may all be of the same type. Alternatively, the information processing device may have a heterogeneous configuration with multiple arithmetic cores that use different processing methods.
The information processing device disclosed in the present application includes arithmetic cores equipped with a power-saving mechanism, a main storage device, and a performance power optimization unit that optimizes arithmetic performance and power consumption. The performance power optimization unit includes the following elements:
- roofline model data indicating the relationship between the operational intensity of the computer and its arithmetic performance per unit time;
- an application performance definition table containing the operational intensity of each application;
- a scheduling information acquisition unit that acquires the scheduling of applications on the arithmetic cores of the computer;
- a roofline optimization calculation unit for the computer.
The information processing device of the present application enables high-accuracy performance and power optimization of the information processing device (arithmetic cores and main storage device), adapted to the operational intensity of applications processed in parallel by multiple arithmetic cores. By linking with application scheduling information, it provides real-time performance and power optimization that does not hinder application processing performance. The arithmetic cores need not be of the same type; a heterogeneous configuration with multiple arithmetic cores that use different processing methods is also supported.
FIG. 1 is a block diagram showing the configuration of an information processing device according to Embodiment 1.
FIG. 2 is a diagram showing an example of roofline model data in the information processing device according to Embodiment 1.
FIG. 3 is a table diagram showing an example of the contents of an application performance definition table in the information processing device according to Embodiment 1.
FIGS. 4A and 4B are diagrams each showing an example of information acquired from a scheduling information acquisition unit in the information processing device according to Embodiment 1.
FIG. 5 is a diagram showing the operation flow of a performance power optimization unit in the information processing device according to Embodiment 1.
FIG. 6 is a diagram showing the operation flow of a roofline optimization calculation unit in the information processing device according to Embodiment 1.
FIGS. 7A, 7B, and 7C are diagrams showing examples of roofline optimization calculation by the roofline optimization calculation unit according to Embodiment 1 when the operational intensity of an application intersects the sloped portion of the roofline model data.
The remaining drawings show the following, in order:
- three examples of roofline optimization calculation by the roofline optimization calculation unit according to Embodiment 1 when the operational intensity of an application intersects the roof portion of the roofline model data;
- the operation flow of a roofline control unit according to Embodiment 1;
- an example of information acquired from the scheduling information acquisition unit according to Embodiment 2;
- the operation flow of the roofline optimization calculation unit according to Embodiment 2;
- the operation flow of the control performance calculation process included in the roofline optimization calculation unit according to Embodiment 2;
- four examples of roofline optimization calculation by the roofline optimization calculation unit according to Embodiment 2;
- the operation flow of the execution thread acquisition process in the roofline control unit according to Embodiment 2.
Embodiment 1.
FIG. 1 is a block diagram showing the configuration of an information processing device 1000 according to Embodiment 1.
The information processing device 1000 includes at least a performance power optimization unit 1100, system software 1200, computer hardware 1300, and an application 1400.
The performance power optimization unit 1100 performs calculations for optimizing and controlling the performance and power of the computer hardware 1300 to suit the arithmetic processing of the application 1400.
The system software 1200 performs at least the following functions:
- it allocates the application 1400 to the computer hardware 1300 and executes it;
- it acquires the execution state of the computer hardware 1300;
- it controls the performance and power of the computer hardware 1300.
The application 1400 runs using the resources of the computer hardware 1300 allocated by the system software 1200. The application 1400 may also be an application 1610 that runs within a container execution environment 1600 prepared by the container runtime 1500 together with the system software 1200.
The performance power optimization unit 1100 includes roofline model data 1101, an application performance definition table 1102, and a performance power optimization program 1110.
The roofline model data 1101 represents the relationship between operational intensity and arithmetic performance per unit time in the computer hardware 1300. Operational intensity is the amount of computation per unit of data size, for example per byte. Arithmetic performance is the amount of computation per unit time, for example per second.
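The sketch below computes these two quantities. It assumes that an operation count, a byte count, and an elapsed time are available from some measurement. The function names and numbers are hypothetical and are not part of the disclosure.

```python
def operational_intensity(ops: float, bytes_moved: float) -> float:
    """Operations per byte of data transferred (e.g. FLOP/byte)."""
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    return ops / bytes_moved


def arithmetic_performance(ops: float, seconds: float) -> float:
    """Operations per second (e.g. FLOP/s)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return ops / seconds


# Hypothetical thread: 2e9 operations on 4e9 bytes of data in 0.5 s.
intensity = operational_intensity(2e9, 4e9)
performance = arithmetic_performance(2e9, 0.5)
print(intensity, performance)
```

The resulting pair (intensity, performance) is one point on the horizontal and vertical axes of a roofline graph.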
The application performance definition table 1102 describes the information needed to compute the optimal performance and power control of the computer hardware 1300. This applies when the application 1400, or the application 1610 running in the container execution environment 1600, is executed on the computer hardware 1300.
The performance power optimization program 1110 includes a scheduling information acquisition unit 1111, a roofline optimization calculation unit 1112, and a roofline control setting unit 1113.
The scheduling information acquisition unit 1111 acquires information on how the application 1400, and the application 1610 running in the container execution environment 1600, are allocated to the computer hardware 1300.
The roofline optimization calculation unit 1112 performs the optimization calculation for performance and power control of the computer hardware 1300. This calculation covers the execution of the application 1400, and of the application 1610 running in the container execution environment 1600, on the computer hardware 1300.
Based on the optimization calculation by the roofline optimization calculation unit 1112, the roofline control setting unit 1113 configures the performance and power control of the computer hardware 1300 in the system software 1200.
The system software 1200 includes a scheduler 1201, a roofline control unit 1202, an arithmetic core performance power control unit 1203, and a memory bandwidth power control unit 1204.
The scheduler 1201 controls the allocation and execution of the application 1400, and of the application 1610 running within the container execution environment 1600, on the computer hardware 1300.
The roofline control unit 1202 controls the roofline model representing the relationship between operational intensity and arithmetic performance per unit time in the computer hardware 1300.
The arithmetic core performance power control unit 1203 controls the processing performance and power of the arithmetic cores in the computer hardware 1300.
The memory bandwidth power control unit 1204 controls the bandwidth performance and power of the main storage device in the computer hardware 1300.
The computer hardware 1300 includes at least one arithmetic core 1310 and a main storage device 1320.
The arithmetic core 1310 includes at least an arithmetic unit that performs arithmetic processing and a power-saving mechanism that controls the power consumption of the arithmetic core 1310. The arithmetic cores 1310 may all be of the same type. Alternatively, the information processing device may have a heterogeneous configuration with multiple arithmetic cores that use different processing methods.
The main storage device 1320 stores data used in arithmetic processing, and the arithmetic core 1310 loads data from and stores data to the main storage device 1320. A controller (not shown) of the main storage device 1320 includes a power-saving mechanism that controls the power consumption of the main storage device 1320. The main storage device 1320 may include a cache memory (not shown) located between the main storage device and the arithmetic core 1310.
FIG. 2 is a diagram showing an example of the roofline model data 1101 according to Embodiment 1.
The roofline model data 1101 consists of at least roofline model data 2000 for each arithmetic core 1310 in the computer hardware 1300.
In the example of FIG. 2, the roofline model data 2000 is represented as a graph with operational intensity on the horizontal axis and arithmetic performance on the vertical axis. It also includes the following values:
- the maximum value A and minimum value B of the controllable arithmetic performance of the arithmetic core 1310;
- the maximum value C and minimum value D of the controllable bandwidth performance of the main storage device 1320.
Given the controllable arithmetic performance of the arithmetic core 1310 and the controllable bandwidth performance of the main storage device 1320, the roofline model data 2000 defines the upper limit of arithmetic performance as a function of operational intensity.
Details of the roofline model are described, for example, in Samuel Williams, Andrew Waterman, and David Patterson, "Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore" (2009).
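The bound encoded by the roofline model data 2000 can be sketched as follows. The sketch assumes linear units (for example GFLOP/s and GB/s) and treats an intensity exactly at the ridge point as lying on the roof. The class and field names are hypothetical labels for the values A, B, C, and D. B and D are carried along but are not used by the bound itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RooflineModel:
    """Per-core roofline data (hypothetical names for A, B, C, D)."""
    perf_max: float  # A: max controllable arithmetic performance
    perf_min: float  # B: min controllable arithmetic performance
    bw_max: float    # C: max controllable memory bandwidth
    bw_min: float    # D: min controllable memory bandwidth

    def ridge_point(self) -> float:
        """Intensity at which the sloped portion meets the roof: A / C."""
        return self.perf_max / self.bw_max

    def attainable(self, intensity: float) -> float:
        """Upper limit of arithmetic performance: min(A, C * I)."""
        return min(self.perf_max, self.bw_max * intensity)

    def region(self, intensity: float) -> str:
        """'slope' when memory-bound, 'roof' when compute-bound."""
        return "slope" if intensity < self.ridge_point() else "roof"


# Hypothetical core: A = 100 GFLOP/s, B = 10 GFLOP/s, C = 50 GB/s, D = 5 GB/s.
model = RooflineModel(perf_max=100.0, perf_min=10.0, bw_max=50.0, bw_min=5.0)
print(model.ridge_point(), model.attainable(0.5), model.region(0.5))
print(model.attainable(5.0), model.region(5.0))
```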
FIG. 3 is a table diagram showing an example of the contents of the application performance definition table 1102 according to Embodiment 1.
The application performance definition table 1102 includes the following fields:
- a container identifier 3000;
- an application identifier 3001;
- an execution thread identifier 3002 indicating an execution thread (hereinafter also simply "execution thread");
- an operational intensity 3003;
- a memory transfer execution efficiency 3004.
The container identifier 3000 holds an identifier of the container execution environment 1600. If the application 1400 does not run within a container execution environment 1600, the container identifier 3000 is set to an identifier indicating "invalid".
For each container identifier 3000, the application identifier 3001 holds identifiers of the applications 1610 running within the container execution environment 1600. For an application 1400 that does not run within a container execution environment 1600, an identifier is recorded for each such application 1400.
In the example of FIG. 3, the application identifier 3001 shows that two applications 1610, App1 and App2, run within the container execution environment 1600 identified by container identifier C1. The container identifier 3000 for App3 is invalid, indicating that App3 does not run within a container execution environment 1600.
The execution thread identifier 3002 holds identifiers of the execution threads of each application. An application normally has one or more execution threads.
The operational intensity 3003 holds the operational intensity of each execution thread of the application.
The memory transfer execution efficiency 3004 holds, for each execution thread of the application, the execution efficiency of data transfer between the arithmetic core 1310 and the main storage device 1320.
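The sketch below shows one way to hold the table of FIG. 3 in memory. The rows reproduce only the container/application relationships stated above: App1 and App2 are in container C1, and App3 is outside any container. The thread identifiers, intensities, and efficiencies are made-up placeholders. `None` stands in for the invalid container identifier, and the efficiency is assumed to lie in (0, 1].

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AppPerfEntry:
    container_id: Optional[str]  # container identifier 3000; None = invalid
    app_id: str                  # application identifier 3001
    thread_id: str               # execution thread identifier 3002
    intensity: float             # operational intensity 3003
    mem_efficiency: float        # memory transfer execution efficiency 3004


# Hypothetical rows; only the container/application pairing follows FIG. 3.
TABLE: List[AppPerfEntry] = [
    AppPerfEntry("C1", "App1", "Th1", 0.5, 0.5),
    AppPerfEntry("C1", "App2", "Th2", 5.0, 0.5),
    AppPerfEntry(None, "App3", "Th3", 2.5, 0.5),
]


def index_by_thread(table: List[AppPerfEntry]) -> Dict[str, AppPerfEntry]:
    """Build a lookup from execution thread identifier to its row."""
    return {entry.thread_id: entry for entry in table}


def apps_in_container(table: List[AppPerfEntry], container_id: str) -> List[str]:
    """Distinct application identifiers in one container, in table order."""
    seen: List[str] = []
    for entry in table:
        if entry.container_id == container_id and entry.app_id not in seen:
            seen.append(entry.app_id)
    return seen


by_thread = index_by_thread(TABLE)
print(by_thread["Th2"].intensity, apps_in_container(TABLE, "C1"))
```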
FIGS. 4A and 4B are diagrams showing examples of scheduling information acquired from the scheduling information acquisition unit 1111 in Embodiment 1.
For each arithmetic core (core 0, core 1) that makes up the arithmetic core 1310, the scheduling information includes at least the following:
- the timings T1, T2, T3 at which application execution threads are allocated to the core;
- the allocation time intervals T1a, T2a, T3a;
- the scheduling period P;
- the time T.
FIG. 4A shows an example in which the execution threads of an application never run on core 0 and core 1 at the same time. It illustrates the scheduling information of application execution threads for core 0 and core 1, and the threads executed on the two cores never overlap in time.
FIG. 4B shows an example in which the arithmetic core 1310 consists of a single core 0. This also covers the case in which the other cores are in a low-power sleep state and only core 0 is operating. Here too, only one thread runs on the arithmetic cores at any given time.
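The scheduling information can be modelled as a list of per-core allocation slots. The sketch below uses hypothetical names and times. It checks the property illustrated by FIG. 4A, namely that no two slots on different cores overlap in time. The scheduling period P and the time T are not modelled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Slot:
    core: int        # arithmetic core index (core 0, core 1, ...)
    thread_id: str   # execution thread allocated to the core
    start: float     # allocation timing, e.g. T1
    length: float    # allocation time interval, e.g. T1a

    @property
    def end(self) -> float:
        return self.start + self.length


def overlaps_across_cores(slots: List[Slot]) -> bool:
    """True if slots on different cores are active at the same time."""
    for i, a in enumerate(slots):
        for b in slots[i + 1:]:
            if a.core != b.core and a.start < b.end and b.start < a.end:
                return True
    return False


# Hypothetical FIG. 4A-style schedule: threads alternate between cores.
schedule = [Slot(0, "Th1", 0.0, 2.0), Slot(1, "Th2", 2.0, 3.0), Slot(0, "Th3", 5.0, 1.0)]
print(overlaps_across_cores(schedule))
print(overlaps_across_cores(schedule + [Slot(1, "Th4", 5.5, 1.0)]))
```

Half-open intervals are used, so a slot that starts exactly when another ends does not count as overlapping.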
FIG. 5 is a diagram showing the operation flow of the performance power optimization unit 1100 of Embodiment 1.
In step S5000, the performance power optimization unit 1100 starts the performance power optimization process.
In step S5001, the performance power optimization unit 1100 acquires the scheduling information of the application execution threads through the scheduling information acquisition unit 1111. Examples of scheduling information are shown in FIGS. 4A and 4B.
In step S5002, the performance power optimization unit 1100 uses the roofline optimization calculation unit 1112 to calculate the optimal performance and power of the computer hardware 1300 for the arithmetic processing of the execution threads of the application 1400.
In step S5003, the performance power optimization unit 1100 uses the roofline control setting unit 1113 to configure the performance and power control of the computer hardware 1300, based on the optimization calculation in step S5002.
In step S5004, the performance power optimization unit 1100 ends the performance power optimization process.
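The flow of FIG. 5 is a three-stage pipeline. The following is a minimal sketch that assumes each unit can be represented as a plain callable. The parameter names and the dictionary-based settings type are hypothetical.

```python
from typing import Callable, Dict, List

Schedule = List[str]                    # hypothetical: scheduled execution thread IDs
Settings = Dict[str, Dict[str, float]]  # hypothetical: per-thread control values


def optimize_performance_power(
    acquire_schedule: Callable[[], Schedule],          # scheduling information acquisition unit 1111
    compute_settings: Callable[[Schedule], Settings],  # roofline optimization calculation unit 1112
    apply_settings: Callable[[Settings], None],        # roofline control setting unit 1113
) -> Settings:
    # S5000: start of the performance power optimization process
    schedule = acquire_schedule()          # S5001
    settings = compute_settings(schedule)  # S5002
    apply_settings(settings)               # S5003
    return settings                        # S5004: end


applied: List[Settings] = []
result = optimize_performance_power(
    lambda: ["Th1"],
    lambda threads: {t: {"core_perf": 25.0, "mem_bw": 50.0} for t in threads},
    applied.append,
)
print(result == applied[0])
```

Passing the units in as callables keeps the sketch independent of any particular scheduler or power-control interface.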
FIG. 6 is a diagram showing the operation flow of the roofline optimization calculation unit 1112 of Embodiment 1.
In step S6000, the roofline optimization calculation unit 1112 starts the optimization calculation process.
In step S6001, the roofline optimization calculation unit 1112 repeats steps S6001 to S6013 for each application execution thread. These are the execution threads listed in the scheduling information acquired by the scheduling information acquisition unit 1111 in step S5001 of FIG. 5.
In step S6002, the roofline optimization calculation unit 1112 obtains, from the application performance definition table 1102, the operational intensity 3003 corresponding to the execution thread identifier 3002 in FIG. 3.
In step S6003, the roofline optimization calculation unit 1112 calculates the intersection (first intersection) between two lines. The first is the roofline model data 2000 (see FIG. 2) for the arithmetic core 1310 on which the execution thread runs, formed by the maximum value A of the controllable arithmetic performance of the arithmetic core 1310 and the maximum value C of the controllable bandwidth performance of the main storage device 1320. The second is the operational intensity 3003 obtained in step S6002. If the first intersection lies on the sloped portion of the roofline model data 2000 (intersection A1), the process proceeds to step S6004. If it lies on the roof portion (intersection B1), the process proceeds to step S6009.
In step S6004, the roofline optimization calculation unit 1112 calculates the roof corresponding to the arithmetic performance value P1 at intersection A1 on the sloped portion of the roofline model data 2000.
In step S6005, the roofline optimization calculation unit 1112 obtains, from the application performance definition table 1102, the memory transfer execution efficiency 3004 corresponding to the execution thread identifier 3002.
In step S6006, the roofline optimization calculation unit 1112 calculates a bandwidth performance equal to the maximum value C of the controllable bandwidth performance of the main storage device 1320 multiplied by the memory transfer execution efficiency 3004 obtained in step S6005. The illustrated graph of the roofline model data 2000 is a log-log graph, and bandwidth performance appears as the sloped portion of the roofline model data 2000. The unit then calculates the intersection A2 between the new sloped portion given by this bandwidth performance and the operational intensity 3003 obtained in step S6002.
In step S6007, the roofline optimization calculation unit 1112 calculates the roof corresponding to the arithmetic performance value P2 at the intersection A2 obtained in step S6006.
In step S6008, the roofline optimization calculation unit 1112 sets the following control values:
- the control bandwidth performance of the main storage device 1320 is set to the maximum value C;
- the theoretical value of the control performance of the arithmetic core 1310 is set to the roof performance value P1 calculated in step S6004;
- the lower limit of that control performance is set to the roof performance value P2 calculated in step S6007.
The process then proceeds to step S6013.
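Write I for the operational intensity 3003 and η for the memory transfer execution efficiency 3004, and read the log-log graph in linear units. The sloped-portion branch (steps S6003 to S6008) then reduces to the expressions below. These closed forms are an interpretation of the steps above rather than formulas stated in the text. For example, with hypothetical values A = 100, C = 50, I = 0.5, and η = 0.5, they give P1 = 25 and P2 = 12.5.

```latex
\begin{aligned}
&\text{Sloped portion: } I < A/C \\
&P_1 = C\,I \qquad \text{(roof through } A_1\text{, step S6004)} \\
&P_2 = \eta\,C\,I \qquad \text{(roof through } A_2\text{, steps S6006, S6007)} \\
&B_{\mathrm{mem}} = C, \qquad P_2 \le P_{\mathrm{core}} \le P_1 \qquad \text{(step S6008)}
\end{aligned}
```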
In step S6009, the roofline optimization calculation unit 1112 translates the sloped portion of the illustrated roofline model data 2000 in parallel, so that a new slope passes through intersection B1 on the roof portion of the roofline model data 2000. It then calculates the gradient value of that slope, which is taken as the theoretical bandwidth value.
In step S6010, as in step S6005, the roofline optimization calculation unit 1112 obtains the memory transfer execution efficiency 3004 corresponding to the execution thread identifier 3002 from the application performance definition table 1102.
In step S6011, the roofline optimization calculation unit 1112 calculates a bandwidth performance value, referred to as the execution bandwidth value. It is equal to the theoretical bandwidth value calculated in step S6009 multiplied by the reciprocal of the memory transfer execution efficiency 3004 obtained in step S6010. If the calculated execution bandwidth value exceeds the maximum value C of the controllable bandwidth performance of the main storage device 1320, the execution bandwidth value may be set to C.
In step S6012, the roofline optimization calculation unit 1112 sets the control bandwidth performance of the main storage device 1320 to the execution bandwidth value calculated in step S6011. It sets the control performance of the arithmetic core 1310 to the maximum value A of the controllable arithmetic performance.
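Using the same notation, the roof-portion branch (steps S6009 to S6012) can be summarized as shown below. The shifted slope passes through B1 = (I, A), so its bandwidth is A/I. Dividing by η, that is, multiplying by its reciprocal, gives the execution bandwidth. Here the optional cap at C is assumed to be applied. With hypothetical values A = 100, C = 50, and η = 0.5:
- an intensity of I = 5 gives a theoretical bandwidth of 20 and an execution bandwidth of 40;
- an intensity of I = 2.5 gives a theoretical bandwidth of 40 and an execution bandwidth of 80, which is capped to 50.

```latex
\begin{aligned}
&\text{Roof portion: } I \ge A/C \\
&B_{\mathrm{th}} = \frac{A}{I} \qquad \text{(slope through } B_1\text{, step S6009)} \\
&B_{\mathrm{exec}} = \min\!\left(C,\ \frac{B_{\mathrm{th}}}{\eta}\right) \qquad \text{(step S6011)} \\
&B_{\mathrm{mem}} = B_{\mathrm{exec}}, \qquad P_{\mathrm{core}} = A \qquad \text{(step S6012)}
\end{aligned}
```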
In step S6013, the roofline optimization calculation unit 1112 determines whether all execution threads of the application have been processed. If so, the process advances to step S6014; otherwise, it returns to step S6001.
In step S6014, the roofline optimization calculation unit 1112 ends the optimization calculation process.
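The per-thread branch of FIG. 6 can be sketched in Python as follows. This is a minimal, non-authoritative sketch that assumes the standard roofline relation: at arithmetic intensity I, the slope portion has height C × I and the roof portion has height A. Steps S6005 to S6007 are described outside this excerpt, so the lower limit P2 is assumed here to be C × e × I (e being the memory transfer execution efficiency 3004), mirroring step S12004 of Embodiment 2. The names `ControlSetting` and `optimize_thread` are hypothetical, and a thread sitting exactly at the ridge point (C × I = A) is treated as roof-bound, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ControlSetting:
    """Hypothetical container for the values set per execution thread."""
    mem_bandwidth: float                      # control bandwidth of main storage device 1320
    core_perf: Optional[float] = None         # single control value (S6012)
    core_theoretical: Optional[float] = None  # theoretical value (S6008)
    core_lower_limit: Optional[float] = None  # lower limit (S6008)


def optimize_thread(A: float, C: float, intensity: float, efficiency: float) -> ControlSetting:
    """Sketch of the FIG. 6 branch for one execution thread.

    A: maximum controllable computing performance of computing core 1310
    C: maximum controllable bandwidth performance of main storage device 1320
    intensity: arithmetic intensity 3003 of the thread
    efficiency: memory transfer execution efficiency 3004, assumed to lie in (0, 1]
    """
    if intensity <= 0 or not (0 < efficiency <= 1):
        raise ValueError("intensity must be > 0 and efficiency must be in (0, 1]")

    slope_height = C * intensity  # performance the slope portion allows at this intensity
    if slope_height < A:
        # Intersection A1 lies on the slope portion (memory-bound thread).
        p1 = slope_height                 # S6004: roof through A1
        p2 = C * efficiency * intensity   # S6006-S6007 (assumed form)
        return ControlSetting(mem_bandwidth=C, core_theoretical=p1, core_lower_limit=p2)  # S6008

    # Intersection B1 lies on the roof portion (compute-bound thread).
    theoretical_bw = A / intensity                 # S6009: slope through B1 = (intensity, A)
    exec_bw = min(theoretical_bw / efficiency, C)  # S6011, capped at C
    return ControlSetting(mem_bandwidth=exec_bw, core_perf=A)  # S6012
```

For example, with A = 100 and C = 20, a thread with I = 2 and e = 0.5 lands on the slope and receives a theoretical value of 40 and a lower limit of 20, while the memory bandwidth stays at C.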

FIGS. 7A, 7B, and 7C are diagrams showing an example of the roofline optimization calculation performed by the roofline optimization calculation unit 1112 of Embodiment 1 when the arithmetic intensity of the application intersects the slope portion of the roofline model data 2000 in FIG. 2.
The roofline model 7000 shown in FIG. 7A illustrates step S6003 of FIG. 6: for the computing core 1310 on which the execution thread runs, the intersection is calculated between the arithmetic intensity 3003 acquired in step S6002 and the roofline model data formed by the maximum controllable computing performance A of the computing core 1310 and the maximum controllable bandwidth performance C of the main storage device 1320, and the intersection A1 lies on the slope portion of the roofline model data.
The roofline model 7010 shown in FIG. 7B corresponds to step S6004 of FIG. 6 and shows the intersection A1 together with a dashed arrow 7011 representing the calculation of the computing performance value P1 at the intersection A1.
The roofline model 7020 shown in FIG. 7C shows a dashed arrow 7021 corresponding to the processing in step S6006 of FIG. 6, which includes calculating the bandwidth performance and the intersection A2, and an arrow 7022 corresponding to the calculation of the computing performance value P2 at the intersection A2.
FIGS. 8A, 8B, and 8C are diagrams showing an example of the roofline optimization calculation performed by the roofline optimization calculation unit 1112 of Embodiment 1 when the arithmetic intensity of the application intersects the roof portion of the roofline model data 2000 in FIG. 2.
The roofline model 8000 shown in FIG. 8A illustrates step S6003 of FIG. 6: for the computing core 1310 on which the execution thread runs, the intersection is calculated between the arithmetic intensity 3003 acquired in step S6002 and the roofline model data formed by the maximum controllable computing performance A of the computing core 1310 and the maximum controllable bandwidth performance C of the main storage device 1320, and the intersection B1 lies on the roof portion of the roofline model data.
The roofline model 8010 shown in FIG. 8B illustrates the state after the theoretical bandwidth value has been calculated in step S6009 of FIG. 6, including a dashed arrow 8011 representing that calculation.
The roofline model 8020 shown in FIG. 8C illustrates the state after the execution bandwidth value has been calculated in step S6011 of FIG. 6, including an arrow 8021 representing that calculation.
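The geometry of FIGS. 7 and 8 follows from the usual roofline relation between attainable performance P and arithmetic intensity I. The derivation below is a sketch using the symbols A and C from the text, with e for the memory transfer execution efficiency 3004 and β for a slope gradient; the form of P2 is an assumption, because steps S6006 and S6007 are described outside this excerpt.

```latex
\begin{aligned}
P(I) &= \min(A,\ C I) \\
\text{slope case } (C I < A):\quad P_1 &= C I, \qquad P_2 = e\,C I \quad \text{(assumed form)} \\
\text{roof case } (C I \ge A):\quad \beta_{\text{theo}} &= \frac{A}{I}
  \quad \text{(gradient of the line } P = \beta I \text{ through } B_1 = (I, A)\text{)} \\
\beta_{\text{exec}} &= \min\!\left(\frac{\beta_{\text{theo}}}{e},\ C\right)
\end{aligned}
```

Because e ≤ 1, dividing by e raises the provisioned bandwidth above the theoretical value, which presumably compensates for imperfect memory transfer efficiency so that the compute-bound thread can still reach the roof A.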
FIG. 9 is a diagram showing the operation flow of the roofline control unit 1202 according to Embodiment 1.
As illustrated in FIGS. 4A and 4B, the roofline control unit 1202 may, for example, be called and executed by the scheduler 1201 immediately before the timing T1 at which an execution thread is assigned to a computing core and executed. It may also be executed earlier by a time margin that accounts for the processing time of the roofline control unit 1202.
In step S9000, the roofline control unit 1202 starts the roofline control process.
In step S9001, the roofline control unit 1202 obtains, from the scheduler 1201, the execution thread of the application to be executed.
In step S9002, the roofline control unit 1202 applies the control bandwidth performance of the main storage device 1320 for the execution thread, as calculated by the roofline optimization calculation unit 1112 and set by the roofline control setting unit 1113, and performs the corresponding power control of the main storage device 1320 via the memory bandwidth power control unit 1204.
In step S9003, the roofline control unit 1202 checks the control performance value that the roofline optimization calculation unit 1112 calculated and the roofline control setting unit 1113 set for the computing core 1310 on which the execution thread runs. If a single control performance value is set (as in step S6012), the process advances to step S9004. If a theoretical value or a lower limit of the control performance is set (as in step S6008), the process advances to step S9005.
In step S9004, the roofline control unit 1202 sets the control performance of the computing core 1310 on which the execution thread runs to the control performance value acquired in step S9003, and performs the corresponding power control of the computing core 1310 via the computing core performance power control unit 1203. The performance and power of the computing core 1310 may be controlled, for example, by Dynamic Voltage and Frequency Scaling (DVFS). After step S9004, the process advances to step S9006.
In step S9005, the roofline control unit 1202 sets the control performance of the computing core 1310 on which the execution thread runs to the theoretical value or lower limit of the control performance acquired in step S9003, and performs the corresponding power control of the computing core 1310 via the computing core performance power control unit 1203. If a lower limit of the control performance was acquired in step S9003, the lower limit may be used.
After step S9004 or step S9005, the process advances to step S9006, in which the roofline control unit 1202 ends the roofline control process.
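Steps S9003 to S9005 amount to choosing one target performance for the computing core and handing it to the power control unit. The Python sketch below shows that choice and, as a hypothetical DVFS back end, maps the target onto the lowest frequency in an assumed operating-point table whose performance meets it. The linear performance-per-MHz model, the frequency table, and all function names are assumptions for illustration; the text only states that DVFS may be used. Step S9002 (applying the memory bandwidth) is not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ControlSetting:
    """Hypothetical per-thread values recorded by the roofline control setting unit 1113."""
    mem_bandwidth: float
    core_perf: Optional[float] = None
    core_theoretical: Optional[float] = None
    core_lower_limit: Optional[float] = None


def core_target(setting: ControlSetting) -> float:
    """S9003-S9005: choose the computing-core performance to apply."""
    if setting.core_perf is not None:         # S9004: single control value
        return setting.core_perf
    if setting.core_lower_limit is not None:  # S9005: lower limit is preferred when present
        return setting.core_lower_limit
    if setting.core_theoretical is not None:  # S9005: otherwise the theoretical value
        return setting.core_theoretical
    raise ValueError("no control performance value set for this thread")


def pick_dvfs_frequency(target_perf: float, freqs_mhz: Sequence[int], perf_per_mhz: float) -> int:
    """Hypothetical DVFS mapping: lowest frequency whose performance meets the target.

    Assumes performance scales linearly with frequency (perf = f * perf_per_mhz).
    """
    for f in sorted(freqs_mhz):
        if f * perf_per_mhz >= target_perf:
            return f
    return max(freqs_mhz)  # target above the table: run at the highest operating point
```

Choosing the lowest sufficient frequency is one simple policy for saving power; a real governor may also weigh voltage steps and transition latency.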
Embodiment 2.
FIG. 10 is a diagram illustrating an example of the information acquired by the scheduling information acquisition unit 1111 according to Embodiment 2.
The difference from FIG. 4 is that execution threads of the application run concurrently on multiple computing cores.
For example, FIG. 10 shows scheduling information of application execution threads for computing core 0 and computing core 1, in which the execution threads running on computing core 0 and computing core 1 overlap in time.
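The sets of overlapping threads used in Embodiment 2 can be derived from scheduling information like that of FIG. 10 by cutting the timeline at every slot boundary. The Python sketch below does this with a simple sweep; the `Slot` record and its fields are hypothetical stand-ins for whatever the scheduling information acquisition unit 1111 actually provides.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Slot(NamedTuple):
    """Hypothetical scheduling record: one thread running on one core."""
    thread: str   # execution thread identifier
    core: int     # computing core number
    start: float  # start time
    end: float    # end time (exclusive)


def concurrent_sets(slots: List[Slot]) -> List[Tuple[float, float, FrozenSet[str]]]:
    """Return (start, end, threads) intervals during which the set of running threads is constant."""
    bounds = sorted({t for s in slots for t in (s.start, s.end)})
    result: List[Tuple[float, float, FrozenSet[str]]] = []
    for lo, hi in zip(bounds, bounds[1:]):
        # No boundary falls strictly inside (lo, hi), so each slot covers all or none of it.
        active = frozenset(s.thread for s in slots if s.start < hi and s.end > lo)
        if not active:
            continue
        if result and result[-1][1] == lo and result[-1][2] == active:
            result[-1] = (result[-1][0], hi, active)  # extend the previous interval
        else:
            result.append((lo, hi, active))
    return result
```

With T1 on core 0 over [0, 4), T2 on core 0 over [4, 8), and T3 on core 1 over [2, 6), the overlapping sets are {T1, T3} during [2, 4) and {T2, T3} during [4, 6); intervals holding a single thread correspond to the non-overlapping case of Embodiment 1.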
FIG. 11 is a diagram showing the operation flow of the roofline optimization calculation unit according to Embodiment 2.
The difference from FIG. 6 is that the flow handles execution threads of the application that run concurrently on multiple computing cores.
In step S11000, the roofline optimization calculation unit 1112 starts the optimization calculation process.
In step S11001, the roofline optimization calculation unit 1112 repeats steps S11001 to S11012 for each set of application execution threads whose execution overlaps in time across the computing cores 1310.
In step S11002, the roofline optimization calculation unit 1112 obtains, from the application performance definition table 1102, the arithmetic intensity 3003 of each execution thread in the set.
In step S11003, the roofline optimization calculation unit 1112 calculates the bandwidth performance of the main storage device 1320 required by each execution thread. For example, the required bandwidth performance can be calculated as follows. For the computing core 1310 on which the execution thread runs, calculate the intersection between the arithmetic intensity 3003 acquired in step S11002 and the roofline model data 2000 formed by the maximum controllable computing performance A of the computing core 1310 and the maximum controllable bandwidth performance C of the main storage device 1320. If the intersection lies on the slope portion of the roofline model data 2000, the slope value may be used as the bandwidth performance value required by the execution thread. If the intersection lies on the roof portion, translate the slope portion of the roofline model data 2000 in parallel so that it passes through that intersection, calculate the gradient of the new slope, and multiply it by the reciprocal of the memory transfer execution efficiency 3004 of the execution thread; the result may be used as the bandwidth performance value required by the execution thread. However, if this value exceeds the maximum controllable bandwidth performance C of the main storage device 1320, the bandwidth performance value required by the execution thread may be set to the maximum value C.
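The per-thread bandwidth requirement of step S11003 can be sketched as below, under the same roofline assumptions as Embodiment 1 (slope height C × I, roof height A). The function name is hypothetical, and treating a thread exactly at the ridge point (C × I = A) as roof-bound is an assumption, since the text does not settle that case.

```python
def required_bandwidth(A: float, C: float, intensity: float, efficiency: float) -> float:
    """Bandwidth of main storage device 1320 requested by one execution thread (S11003 sketch)."""
    if intensity <= 0 or not (0 < efficiency <= 1):
        raise ValueError("intensity must be > 0 and efficiency must be in (0, 1]")
    if C * intensity < A:
        # Intersection on the slope portion: the slope value C itself is the requirement.
        return C
    # Intersection on the roof portion: gradient of the slope through (intensity, A),
    # scaled by 1/efficiency, then capped at the maximum controllable bandwidth C.
    return min(A / intensity / efficiency, C)
```

Note that every memory-bound thread requests the full C in this scheme, so two or more overlapping threads that include a memory-bound one always over-subscribe the main storage device.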
In step S11004, the roofline optimization calculation unit 1112 determines whether the total bandwidth performance of the main storage device 1320 required by the execution threads, as calculated in step S11003, is greater than the maximum controllable bandwidth performance C of the main storage device 1320. If it is greater, the process advances to step S11005; if it is smaller, the process advances to step S11006.
In step S11005, the roofline optimization calculation unit 1112 calculates, for each execution thread, its bandwidth ratio relative to the total bandwidth performance of the main storage device 1320 required by the execution threads as calculated in step S11003. For example, the bandwidth ratio of each execution thread may be calculated as the ratio of the bandwidth performance required by that thread to the total bandwidth performance required by all the execution threads.
In step S11006, the roofline optimization calculation unit 1112 repeats steps S11006 to S11008 for each execution thread.
In step S11007, the roofline optimization calculation unit 1112 calculates the control performance of the computing core 1310 on which the execution thread runs.
In step S11008, the roofline optimization calculation unit 1112 determines whether all execution threads have been processed. If so, the process advances to step S11009; otherwise, it returns to step S11006.
In step S11009, the same determination as in step S11004 is made. If the total bandwidth performance of the main storage device 1320 required by the execution threads, as calculated in step S11003, is greater than the maximum controllable bandwidth performance C of the main storage device 1320, the process advances to step S11010; if it is smaller, the process advances to step S11011.
In step S11010, the roofline optimization calculation unit 1112 sets the control bandwidth performance of the main storage device 1320 to the maximum value C.
In step S11011, the roofline optimization calculation unit 1112 sets the control bandwidth performance of the main storage device 1320 to the total bandwidth performance required by the execution threads, as calculated in step S11003.
In step S11012, the roofline optimization calculation unit 1112 determines whether all sets of application execution threads whose execution overlaps in time across the computing cores 1310 have been processed. If so, the process advances to step S11013; otherwise, it returns to step S11001.
In step S11013, the roofline optimization calculation unit 1112 ends the optimization calculation process.
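For one set of overlapping threads, the bandwidth-related parts of FIG. 11 (steps S11004, S11005, and S11009 to S11011) reduce to comparing the summed requirement against C. The Python helper below is a hypothetical sketch; for simplicity it computes the bandwidth ratios unconditionally, whereas the flow computes them only when the total exceeds C. In this sketch that difference is harmless: a memory-bound thread alone requests C, so a set whose total stays within C and still needs a ratio can only be a lone thread, whose ratio is 1.

```python
from typing import List, Sequence, Tuple


def allocate_memory_bandwidth(demands: Sequence[float], C: float) -> Tuple[float, List[float]]:
    """Return (control bandwidth of main storage device 1320, per-thread bandwidth ratios).

    demands: bandwidth requested by each thread in the set (step S11003)
    C: maximum controllable bandwidth performance of main storage device 1320
    """
    total = sum(demands)
    if total <= 0:
        raise ValueError("total demand must be positive")
    ratios = [d / total for d in demands]  # S11005: each thread's share of the total demand
    if total > C:                          # S11004 / S11009
        return C, ratios                   # S11010: saturate at C
    return total, ratios                   # S11011: provide exactly the total demand
```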
FIG. 12 is a diagram showing the operation flow of the control performance calculation processing included in the roofline optimization calculation unit of Embodiment 2.
In step S12000, the roofline optimization calculation unit 1112 starts the control performance calculation process for the computing core 1310.
In step S12001, the roofline optimization calculation unit 1112 obtains, from the application performance definition table 1102, the memory transfer execution efficiency 3004 corresponding to the execution thread identifier 3002.
In step S12002, for the computing core 1310 on which the execution thread runs, the roofline optimization calculation unit 1112 calculates the intersection between the arithmetic intensity 3003 acquired in step S11002 and the roofline model data 2000 formed by the maximum controllable computing performance A of the computing core 1310 and the maximum controllable bandwidth performance C of the main storage device 1320. If the intersection (A1a) lies on the slope portion of the roofline model data 2000, the process advances to step S12003. If the intersection (B1a) lies on the roof portion of the roofline model data 2000, the process advances to step S12008.
In step S12003, the roofline optimization calculation unit 1112 calculates the roof indicating the computing performance value P1a at the intersection A1a on the slope portion of the roofline model data 2000.
In step S12004, the roofline optimization calculation unit 1112 calculates a bandwidth performance value by multiplying the maximum controllable bandwidth performance C of the main storage device 1320 by the bandwidth ratio of the execution thread calculated in step S11005 of FIG. 11 and by the memory transfer execution efficiency 3004 of the execution thread obtained in step S12001. It then calculates the intersection A2a between the slope portion of the roofline model data 2000 having this bandwidth performance value and the arithmetic intensity 3003 acquired in step S11002.
In step S12005, the roofline optimization calculation unit 1112 calculates the roof indicating the computing performance value P2a at the intersection A2a obtained in step S12004.
In step S12006, the roofline optimization calculation unit 1112 sets the theoretical value of the control performance of the computing core 1310 to the roof computing performance value P1a calculated in step S12003.
In step S12007, the roofline optimization calculation unit 1112 sets the lower limit of the control performance of the computing core 1310 to the roof computing performance value P2a calculated in step S12005. After step S12007, the process advances to step S12014.
In step S12008, as in step S11004 of FIG. 11, the roofline optimization calculation unit 1112 determines whether the total bandwidth performance of the main storage device 1320 required by the execution threads is greater than the maximum controllable bandwidth performance C of the main storage device 1320. If it is greater, the process advances to step S12009; if it is smaller, the process advances to step S12013.
In step S12009, the roofline optimization calculation unit 1112 calculates a bandwidth performance value by multiplying the maximum controllable bandwidth performance C of the main storage device 1320 by the bandwidth ratio of the execution thread calculated in step S11005 of FIG. 11. It then calculates the intersection B2a between the slope portion of the roofline model data 2000 having this bandwidth performance value and the arithmetic intensity 3003 acquired in step S11002.
In step S12010, the roofline optimization calculation unit 1112 calculates the roof indicating the computing performance value Q2a at the intersection B2a calculated in step S12009.
In step S12011, the roofline optimization calculation unit 1112 determines whether the computing performance value Q2a calculated in step S12010 is smaller than the maximum controllable computing performance A of the computing core 1310. If it is smaller, the process advances to step S12012; if it is larger, the process advances to step S12013.
In step S12012, the roofline optimization calculation unit 1112 sets the lower limit of the control performance of the computing core 1310 to the roof computing performance value Q2a calculated in step S12010. After step S12012, the process advances to step S12014.
In step S12013, the roofline optimization calculation unit 1112 sets the control performance value of the computing core 1310 to the maximum controllable computing performance A of the computing core 1310, which is denoted as the computing performance value Q1a. After step S12013, the process advances to step S12014.
In step S12014, the roofline optimization calculation unit 1112 ends the control performance calculation process for the computing core 1310.
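The branch structure of FIG. 12 can be sketched for a single thread as follows. This is a minimal illustration assuming the same roofline form as above; `core_control` and its parameters are hypothetical names, `ratio` is the bandwidth ratio from step S11005, and `total_demand` is the summed requirement from step S11003. The returned dictionary keys are stand-ins for the single control value, theoretical value, and lower limit that the roofline control setting unit 1113 would record.

```python
from typing import Dict


def core_control(A: float, C: float, intensity: float, efficiency: float,
                 ratio: float, total_demand: float) -> Dict[str, float]:
    """Sketch of steps S12002 to S12013 for one execution thread."""
    slope_height = C * intensity
    if slope_height < A:
        # S12002: intersection A1a on the slope portion.
        p1a = slope_height                          # S12003
        p2a = C * ratio * efficiency * intensity    # S12004-S12005: shared, derated slope
        return {"theoretical": p1a, "lower_limit": p2a}  # S12006-S12007
    # Intersection B1a on the roof portion.
    if total_demand > C:                            # S12008
        q2a = C * ratio * intensity                 # S12009-S12010: shared slope only
        if q2a < A:                                 # S12011
            return {"lower_limit": q2a}             # S12012
    return {"control": A}                           # S12013: Q1a = A
```

A compute-bound thread is lowered below A only when the main storage device is over-subscribed and its bandwidth share is too small to keep the core busy at A.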
FIGS. 13A, 13B, 13C, and 13D are diagrams showing examples of the roofline optimization calculation performed by the roofline optimization calculation unit 1112 of Embodiment 2.
The roofline model 13000 shown in FIG. 13A illustrates part of the processing in step S11003 of FIG. 11 that calculates the bandwidth performance value of the main storage device 1320 required by an execution thread. It shows the case where, for the computing core 1310 on which the execution thread runs, the intersection A1a (the second intersection) between the arithmetic intensity 3003 acquired in step S11002 and the roofline model data 2000 formed by the maximum controllable computing performance A of the computing core 1310 and the maximum controllable bandwidth performance C of the main storage device 1320 lies on the slope portion of the roofline model data 2000.
The roofline model 13010 shown in FIG. 13B illustrates the same processing in step S11003 for the case where the intersection B1a between the arithmetic intensity 3003 acquired in step S11002 and the roofline model data 2000 lies on the roof portion of the roofline model data 2000. It also shows a dashed arrow 13011 representing the calculation of the bandwidth performance value required by the execution thread.
The roofline model 13020 shown in FIG. 13C shows a dashed arrow 13021 representing the calculation of the roof indicating the computing performance value P1a at the intersection A1a in step S12003 of FIG. 12, a dashed arrow 13022 representing the calculation of the intersection A2a in step S12004, and a dashed arrow 13023 representing the calculation of the roof indicating the computing performance value P2a at the intersection A2a in step S12005.
The roofline model 13030 shown in FIG. 13D shows a dashed arrow 13031 representing the calculation of the intersection B2a in step S12009 of FIG. 12.
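As a worked illustration of FIGS. 11 to 13, take hypothetical values A = 100 and C = 100 (for example GFLOP/s and GB/s) and two overlapping threads: X with I = 0.5 and e = 0.5, and Y with I = 4 and e = 0.5. These numbers are invented for illustration and do not come from the application; β denotes a requested bandwidth and r a bandwidth ratio.

```latex
\begin{aligned}
&\text{S11003:}\quad X:\ C I = 50 < A \Rightarrow \beta_X = C = 100; \qquad
  Y:\ C I = 400 \ge A \Rightarrow \beta_Y = \min\!\left(\tfrac{A}{I e},\ C\right) = \min(50,\ 100) = 50 \\
&\text{S11004, S11005, S11010:}\quad \beta_X + \beta_Y = 150 > C
  \Rightarrow r_X = \tfrac{2}{3},\ r_Y = \tfrac{1}{3},\ \text{memory control bandwidth} = C = 100 \\
&\text{S12003 to S12007 (X):}\quad P_{1a} = C I = 50, \qquad
  P_{2a} = C\, r_X\, e\, I = 100 \cdot \tfrac{2}{3} \cdot 0.5 \cdot 0.5 = \tfrac{50}{3} \approx 16.7 \\
&\text{S12009 to S12013 (Y):}\quad Q_{2a} = C\, r_Y\, I = 100 \cdot \tfrac{1}{3} \cdot 4 = \tfrac{400}{3} \approx 133.3 \ge A
  \Rightarrow Q_{1a} = A = 100
\end{aligned}
```

In this example the memory-bound thread X receives a lower limit of about 16.7 against a theoretical value of 50, while the compute-bound thread Y keeps its core at A.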
FIG. 14 is a diagram showing the operation flow of the execution thread acquisition processing in the roofline control unit 1202 according to Embodiment 2.
The difference from FIG. 9 is that the execution thread acquisition in step S9001 is adapted to application execution threads that run concurrently on multiple computing cores 1310. In the operation flow of the roofline control unit 1202, step S9001 of FIG. 9 is replaced with steps S14000 to S14006 shown in FIG. 14.
In step S14000, the roofline control unit 1202 starts the execution thread acquisition process.
In step S14001, the roofline control unit 1202 determines whether it is running in response to a roofline control interrupt. If so, the process advances to step S14005; if not, the process advances to step S14002. The roofline control interrupt is issued in step S14004.
In step S14002, the roofline control unit 1202 obtains, from the scheduler 1201, the application execution thread to be assigned to and executed on its own computing core 1310.
In step S14003, the roofline control unit 1202 determines, based on information from the scheduler 1201, whether any execution thread is running on a computing core 1310 other than its own. If so, the process advances to step S14004; if not, the process advances to step S14006.
In step S14004, the roofline control unit 1202 issues an interrupt to the other computing cores 1310 so that they execute the roofline control unit.
In step S14005, the roofline control unit 1202 obtains the set of execution threads running on the computing cores 1310. Through this step, each computing core 1310 obtains the set of application execution threads that execute concurrently.
In step S14006, the roofline control unit 1202 ends the execution thread acquisition process.
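The branching of FIG. 14 can be simulated in a few lines. In the Python sketch below, `Scheduler.running` (a map from core number to the thread it runs, assumed to already include the thread about to be dispatched) and the `send_interrupt` callback are hypothetical stand-ins for the scheduler 1201 and the inter-core interrupt; real system software would use its own mechanisms.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set


@dataclass
class Scheduler:
    """Hypothetical view of scheduler 1201: thread assigned to each core (None if idle)."""
    running: Dict[int, Optional[str]] = field(default_factory=dict)


def acquire_threads(core: int, sched: Scheduler, by_interrupt: bool,
                    send_interrupt: Callable[[int], None]) -> Set[str]:
    """Sketch of steps S14000 to S14006 on one computing core."""
    if not by_interrupt:                                 # S14001
        own = sched.running.get(core)                    # S14002
        others = [c for c, t in sched.running.items()
                  if c != core and t is not None]        # S14003
        if not others:
            return {own} if own is not None else set()  # S14006: this core only
        for c in others:                                 # S14004
            send_interrupt(c)
    # S14005: the set of threads currently running on all cores
    return {t for t in sched.running.values() if t is not None}
```

A core woken by the interrupt re-enters with `by_interrupt=True` and goes straight to S14005, so it does not send interrupts back.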
After step S14006, steps S9002 to S9006 of FIG. 9 are performed. In these steps, the control bandwidth performance of the main storage device 1320 and the control performance of the computing core 1310 on which each execution thread runs, both calculated as illustrated in FIG. 11 for each set of concurrently executing application execution threads, are used to set the control bandwidth performance of the main storage device 1320 and perform the corresponding power control of the main storage device 1320, and to set the control performance of each computing core 1310 and perform the corresponding power control of that computing core 1310. The control bandwidth performance and power control of the main storage device 1320 need only be performed by one of the computing cores 1310, whereas the control performance and corresponding power control of each computing core 1310 on which an execution thread runs may be performed by that computing core 1310 itself.
Although this application describes various exemplary embodiments and examples, the features, aspects, and functions described in one or more embodiments are not limited to application to a particular embodiment and can be applied to the embodiments alone or in various combinations.
Accordingly, countless variations not illustrated are envisioned within the scope of the technology disclosed in this specification. These include, for example, cases where at least one component is modified, added, or omitted, and cases where at least one component is extracted and combined with components of another embodiment.
1000 Information processing device, 1100 Performance power optimization unit, 1101 Roofline model data, 1102 Application performance definition table, 1110 Performance power optimization program, 1111 Scheduling information acquisition unit, 1112 Roofline optimization calculation unit, 1113 Roofline control setting unit, 1200 System software, 1201 Scheduler, 1202 Roofline control unit, 1203 Computing core performance power control unit, 1204 Memory bandwidth power control unit, 1300 Computer hardware, 1310 Computing core, 1320 Main storage device, 1400 Application

Claims (13)

  1.  An information processing device including:
    computer hardware including an arithmetic core and a main storage device equipped with a power-saving mechanism;
    system software that runs on the computer hardware; and
    an application that runs on the system software and in a container execution environment of the system software,
    the information processing device comprising a performance power optimization unit that optimizes the calculation performance and power consumption of the information processing device, wherein
    the performance power optimization unit includes a performance power optimization program; roofline model data indicating the calculation intensity and the calculation performance per unit time of the computer hardware; and an application performance definition table including calculation intensity information of the application, and
    the performance power optimization program includes a scheduling information acquisition unit that acquires scheduling information of the application for the arithmetic core of the computer hardware, and a roofline optimization calculation unit that performs a roofline optimization calculation for the computer hardware.
  2.  The information processing device according to claim 1, wherein the performance power optimization program includes a roofline control setting unit that, based on the roofline optimization calculation, configures the system software with control settings for the performance and power of the computer hardware.
  3.  The information processing device according to claim 1, wherein the system software includes:
    a scheduler that assigns one or more of the applications to one or more arithmetic cores and executes them;
    a roofline control unit that controls the roofline of the computer hardware based on the setting information of the roofline control settings;
    an arithmetic core performance power control unit that controls the performance and power of the arithmetic core; and
    a memory bandwidth power control unit that controls the bandwidth and power of the main storage device.
  4.  The information processing device according to claim 1, wherein the roofline model data includes roofline model data for each of the arithmetic cores.
  5.  The information processing device according to claim 1, wherein the roofline model data includes, in the roofline model data for each arithmetic core:
    a maximum calculation performance value, which is the roofline indicating the maximum calculation performance;
    a minimum calculation performance value, which is the roofline indicating the minimum calculation performance;
    a maximum memory bandwidth value, which is the slope indicating the maximum memory bandwidth; and
    a minimum memory bandwidth value, which is the slope indicating the minimum memory bandwidth.
  6.  The information processing device according to claim 1, wherein the application performance definition table includes:
    a container identifier indicating the container execution environment;
    an application identifier indicating an application running in the container execution environment;
    an execution thread identifier indicating an execution thread of the application;
    the calculation intensity of the execution thread of the application; and
    the memory transfer execution efficiency of the execution thread of the application.
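The data named in claims 4 to 6 can be pictured as two small record types. The following is a minimal sketch under stated assumptions, not the applicant's data format. The class and field names (`CoreRoofline`, `ThreadEntry`, `perf_max`, `bw_max`, and so on) are hypothetical, and so are the units (GFLOP/s, GB/s, FLOP/byte). The `attainable` helper applies the standard roofline bound, min(roof, bandwidth × intensity), to the maximum settings.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class CoreRoofline:
    """Roofline model data for one arithmetic core (claims 4 and 5).

    Hypothetical field names; units are assumed to be GFLOP/s and GB/s.
    """
    core_id: int
    perf_max: float  # roofline at the maximum calculation performance
    perf_min: float  # roofline at the minimum calculation performance
    bw_max: float    # slope at the maximum memory bandwidth
    bw_min: float    # slope at the minimum memory bandwidth

    def attainable(self, intensity: float) -> float:
        """Standard roofline bound on the maximum settings: min(roof, slope * I)."""
        return min(self.perf_max, self.bw_max * intensity)


@dataclass(frozen=True)
class ThreadEntry:
    """One row of the application performance definition table (claim 6)."""
    container_id: str
    application_id: str
    thread_id: str
    intensity: float            # calculation intensity, FLOP/byte (assumed unit)
    transfer_efficiency: float  # memory transfer execution efficiency, 0 < e <= 1


Key = Tuple[str, str, str]


def build_table(rows: Iterable[ThreadEntry]) -> Dict[Key, ThreadEntry]:
    """Index table rows by (container, application, thread) identifiers."""
    return {(r.container_id, r.application_id, r.thread_id): r for r in rows}
```

For example, `CoreRoofline(0, 100.0, 20.0, 50.0, 10.0).attainable(1.0)` evaluates to `50.0`, which is bandwidth-bound. At an intensity of `4.0` it evaluates to `100.0`, which is compute-bound.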
  7.  An information processing method comprising:
    a scheduling information acquisition step of acquiring scheduling information of execution threads of an application for the arithmetic cores of computer hardware that includes arithmetic cores and a main storage device equipped with a power-saving mechanism;
    a roofline optimization calculation step of performing a roofline optimization calculation for the computer hardware; and
    a roofline control setting step of configuring system software running on the computer hardware with control settings for the performance and power of the computer hardware.
  8.  The information processing method according to claim 7, wherein the scheduling information acquisition step includes a step of acquiring information including the activation state of all the arithmetic cores of the computer hardware and whether execution threads of the application are being executed on a plurality of the arithmetic cores at the same time.
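The information acquired in claim 8 can be derived from a snapshot of the scheduler. The following is a minimal sketch under one assumption: "executed at the same time" is taken to mean that two or more active cores currently run an execution thread. The function and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class SchedulingInfo:
    active_cores: FrozenSet[int]  # cores whose activation state is "active"
    running: Dict[int, str]       # active core id -> execution thread running on it
    simultaneous: bool            # True if threads run on two or more cores at once


def acquire_scheduling_info(active: Dict[int, bool],
                            assigned: Dict[int, Optional[str]]) -> SchedulingInfo:
    """Collect the information named in claim 8 from a scheduler snapshot.

    active:   core id -> activation state (hypothetical input format)
    assigned: core id -> execution thread id, or None if the core is idle
    """
    active_cores = frozenset(core for core, on in active.items() if on)
    # Ignore threads recorded on inactive cores.
    running = {core: thread for core, thread in assigned.items()
               if core in active_cores and thread is not None}
    return SchedulingInfo(active_cores, running, len(running) > 1)
```

The `simultaneous` flag selects between the single-thread path (claims 9 and 10) and the simultaneous-execution path (claims 11 to 13).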
  9.  The information processing method according to claim 7, wherein the roofline optimization calculation step includes, when execution threads of the application are not executed on a plurality of the arithmetic cores at the same time:
    a step of acquiring, for each execution thread of the application, its calculation intensity from an application performance definition table containing calculation intensity information of the application; and
    a step of finding a first intersection between the calculation intensity and roofline model data composed of the maximum controllable calculation performance of the arithmetic core and the maximum controllable bandwidth performance of a main storage device;
    the roofline optimization calculation step further including, when the first intersection lies on the slope portion of the roofline model data:
    a step of setting the first intersection as intersection A1 and calculating the roof indicating the calculation performance value P1 at intersection A1;
    a step of acquiring the memory transfer execution efficiency of the execution thread of the application from the application performance definition table;
    a step of calculating the bandwidth performance obtained by multiplying the maximum controllable bandwidth performance of the main storage device by the memory transfer execution efficiency, and calculating the intersection A2 between the calculation intensity and the slope portion of the roofline model data based on that bandwidth performance;
    a step of calculating the roof indicating the calculation performance value P2 at intersection A2; and
    a step of setting the control bandwidth performance of the main storage device to the maximum value, the theoretical value of the control performance of the arithmetic core to the calculation performance value P1, and the lower limit of the control performance of the arithmetic core to the calculation performance value P2;
    and further including, when the first intersection lies on the roof of the roofline model data:
    a step of setting the first intersection as intersection B1; translating in parallel, on the log-log graph of the roofline model data, the slope portion indicating the maximum controllable bandwidth performance of the main storage device; and calculating a new slope portion passing through intersection B1 and the slope value that is the theoretical bandwidth value of that slope;
    a step of calculating a slope value that is the execution bandwidth value, obtained by multiplying the calculated slope value by the reciprocal of the memory transfer execution efficiency; and
    a step of setting the control bandwidth performance of the main storage device to the execution bandwidth value, and the control performance of the arithmetic core to the maximum controllable calculation performance.
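The two branches of claim 9 reduce to simple arithmetic on the roofline P = min(P_max, BW × I). The following is a minimal sketch of that reading; the names `optimize_single` and `ControlSetting` are hypothetical. The sketch makes two further assumptions of its own:

- An intensity exactly at the ridge point, P_max / BW_max, is treated as lying on the slope.
- On a log-log plot every slope line P = BW × I has gradient 1, so the "parallel translation through B1" only changes BW, giving a theoretical bandwidth of P_max / I.

The claim states no upper bound on the execution bandwidth value, so the sketch does not clamp it to BW_max.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ControlSetting:
    mem_bw: float                                  # control bandwidth of the main storage device
    core_perf: Optional[float] = None              # control performance value of the core
    core_perf_theoretical: Optional[float] = None  # theoretical value (P1)
    core_perf_lower: Optional[float] = None        # lower limit (P2)


def optimize_single(intensity: float, efficiency: float,
                    perf_max: float, bw_max: float) -> ControlSetting:
    """Roofline optimization for a thread with no simultaneous execution (claim 9)."""
    ridge = perf_max / bw_max  # intensity at which the slope meets the roof
    if intensity <= ridge:
        # First intersection lies on the slope portion: intersection A1.
        p1 = bw_max * intensity               # roof through A1
        # A2: same intensity on the slope scaled by the transfer efficiency.
        p2 = bw_max * efficiency * intensity  # roof through A2
        return ControlSetting(mem_bw=bw_max, core_perf_theoretical=p1,
                              core_perf_lower=p2)
    # First intersection lies on the roof: B1 = (intensity, perf_max).
    # Translating the slope line through B1 gives the theoretical bandwidth.
    bw_theoretical = perf_max / intensity
    bw_exec = bw_theoretical * (1.0 / efficiency)  # times the reciprocal of efficiency
    return ControlSetting(mem_bw=bw_exec, core_perf=perf_max)
```

With P_max = 100 and BW_max = 50, a thread of intensity 1 and efficiency 0.5 gets memory bandwidth 50, a theoretical core performance of 50, and a lower limit of 25. A thread of intensity 5 gets a memory bandwidth of 40 and a core performance of 100.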
  10.  The information processing method according to claim 7, wherein the roofline control setting step includes, when execution threads of the application are not executed on a plurality of the arithmetic cores at the same time:
    a step of acquiring the execution thread of the application to be assigned to and executed on the arithmetic core; and
    a step of controlling the main storage device to the control bandwidth performance corresponding to the execution thread, and controlling the power of the main storage device according to that control bandwidth performance;
    the roofline control setting step further including, when a control performance value of the arithmetic core exists, a step of controlling the control performance of the arithmetic core to that control performance value and a step of controlling the arithmetic core power according to that control performance;
    when a theoretical value of the control performance of the arithmetic core exists, a step of controlling the control performance of the arithmetic core to the theoretical value and a step of controlling the arithmetic core power according to that control performance,
    or,
    when a lower limit of the control performance of the arithmetic core exists, a step of controlling the control performance of the arithmetic core to the lower limit and a step of controlling the arithmetic core power according to that control performance.
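The control setting step of claim 10 can be sketched as a dispatch over whichever value the optimization produced. This is a hypothetical illustration under stated assumptions, not the claimed implementation:

- The `actuate` callback stands in for the memory bandwidth power control unit 1204 and the arithmetic core performance power control unit 1203.
- The action names (`"mem_bw"`, `"mem_power_for_bw"`, `"core_perf"`, `"core_power_for_perf"`) are invented.
- The claim leaves open how to choose between the theoretical value and the lower limit ("or"), so the `use_lower_limit` flag is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ControlSetting:
    mem_bw: float
    core_perf: Optional[float] = None
    core_perf_theoretical: Optional[float] = None
    core_perf_lower: Optional[float] = None


def apply_setting(core_id: int, s: ControlSetting,
                  actuate: Callable[[str, int, float], None],
                  use_lower_limit: bool = False) -> None:
    """Apply one thread's roofline control setting (claim 10, no simultaneous execution)."""
    # Main storage device: bandwidth first, then the power matching that bandwidth.
    actuate("mem_bw", core_id, s.mem_bw)
    actuate("mem_power_for_bw", core_id, s.mem_bw)
    # Arithmetic core: pick a single performance target from the setting.
    if s.core_perf is not None:
        target = s.core_perf
    elif use_lower_limit and s.core_perf_lower is not None:
        target = s.core_perf_lower
    elif s.core_perf_theoretical is not None:
        target = s.core_perf_theoretical
    else:
        target = s.core_perf_lower
    if target is not None:
        actuate("core_perf", core_id, target)
        actuate("core_power_for_perf", core_id, target)
```

Passing a callback keeps the sketch testable without hardware. A real system would translate these calls into frequency and bandwidth controls of its own platform.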
  11.  The information processing method according to claim 7, wherein the roofline optimization calculation step includes, when execution threads of the application are executed on a plurality of the arithmetic cores at the same time:
    for each set of execution threads of the application executed at the same time on the plurality of arithmetic cores, a step of acquiring the calculation intensity of each execution thread in the set from an application performance definition table containing calculation intensity information of the application, and a step of calculating the bandwidth performance required by each execution thread;
    when the calculated sum of the bandwidth performance required by the execution threads is greater than the maximum controllable bandwidth performance of a main storage device, a step of calculating the bandwidth ratio of each execution thread of the application relative to that sum, and a step of setting the control bandwidth performance of the main storage device to the maximum value;
    when the calculated sum is not greater than the maximum controllable bandwidth performance of the main storage device, a step of setting the control bandwidth performance of the main storage device to the calculated sum of the bandwidth performance required by the execution threads; and
    an arithmetic core control performance calculation step of calculating the control performance of the arithmetic core for each execution thread of the application.
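The bandwidth step of claim 11 is a proportional-share allocation. The claim does not say how each thread's required bandwidth is computed. This sketch **assumes** it is the bandwidth that lets the thread's core reach its roof, peak performance divided by calculation intensity; treat that formula, and the function name `allocate_bandwidth`, as hypothetical.

```python
from typing import Dict, Tuple


def allocate_bandwidth(threads: Dict[str, Tuple[float, float]],
                       bw_max: float) -> Tuple[float, Dict[str, float], Dict[str, float]]:
    """Memory bandwidth step of claim 11 for one set of simultaneous threads.

    threads maps thread id -> (calculation intensity, peak performance of its core).
    Returns (memory control bandwidth, required bandwidth per thread,
    bandwidth ratio per thread); the ratios are computed only when oversubscribed.
    """
    # ASSUMPTION: required bandwidth = core peak / intensity (not stated in the claim).
    required = {t: peak / intensity for t, (intensity, peak) in threads.items()}
    total = sum(required.values())
    if total > bw_max:
        # Oversubscribed: run memory at its maximum and record each thread's share.
        ratio = {t: bw / total for t, bw in required.items()}
        return bw_max, required, ratio
    # Enough bandwidth: control memory to exactly the summed requirement.
    return total, required, {}
```

With BW_max = 50, two threads requiring 100 and 25 (sum 125) are oversubscribed and receive ratios of 0.8 and 0.2. Threads requiring 25 and 12.5 fit, so memory is controlled to 37.5.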
  12.  The information processing method according to claim 7, wherein an arithmetic core control performance calculation process includes the steps below. This process calculates the control performance of the arithmetic core for each execution thread of the application when execution threads of the application are executed on a plurality of the arithmetic cores at the same time.
    For each execution thread of the application, a step of acquiring the memory transfer execution efficiency of the execution thread from an application performance definition table containing calculation intensity information of the application, and a step of finding a second intersection between the calculation intensity of the execution thread and roofline model data composed of the maximum controllable calculation performance of the arithmetic core on which the execution thread runs and the maximum controllable bandwidth performance of a main storage device;
    when the second intersection lies on the slope portion of the roofline model data, a step of setting the second intersection as intersection A1a and calculating the roof indicating the calculation performance value P1a at intersection A1a; a step of calculating a bandwidth performance value by multiplying the maximum controllable bandwidth performance of the main storage device by the calculated bandwidth ratio of the execution thread and the acquired memory transfer execution efficiency; a step of calculating the intersection A2a between the acquired calculation intensity of the execution thread and the slope portion of the roofline model data indicating the calculated bandwidth performance value; a step of calculating the roof indicating the calculation performance value P2a at intersection A2a; and a step of setting the theoretical value of the control performance of the arithmetic core to the calculation performance value P1a and the lower limit of the control performance of the arithmetic core to the calculation performance value P2a;
    when the second intersection lies on the roof of the roofline model data, a step of setting the second intersection as intersection B1a. Then, if the calculated sum of the main storage device bandwidth performance required by the execution threads is greater than the maximum controllable bandwidth performance of the main storage device:
    a step of calculating a bandwidth performance value by multiplying the maximum controllable bandwidth performance of the main storage device by the calculated bandwidth ratio of the execution thread;
    a step of calculating the intersection B2a between the acquired calculation intensity of the execution thread and the slope portion of the roofline model data indicating the calculated bandwidth performance value;
    a step of calculating the roof indicating the calculation performance value Q2a at intersection B2a; and
    when the calculated calculation performance value Q2a is smaller than the maximum controllable calculation performance of the arithmetic core, a step of setting the lower limit of the control performance of the arithmetic core to the calculation performance value Q2a; and
    when the calculated sum of the bandwidth performance of the main storage device required by the execution threads is smaller than the maximum controllable bandwidth performance of the main storage device, or when the calculated calculation performance value Q2a is greater than the maximum controllable calculation performance of the arithmetic core, a step of setting the control performance of the arithmetic core to the maximum controllable calculation performance.
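Claim 12 applies the claim 9 geometry per thread, with the slope scaled by the thread's bandwidth share. The following is a minimal sketch of one reading of it, with these assumptions:

- The function name and parameters are hypothetical.
- `ratio` is the thread's share from the claim 11 step; the claim defines it only in the oversubscribed case, so the sketch assumes callers pass 1.0 otherwise.
- A Q2a exactly equal to the core maximum falls through to the maximum setting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CoreControl:
    core_perf: Optional[float] = None              # control performance value
    core_perf_theoretical: Optional[float] = None  # theoretical value (P1a)
    core_perf_lower: Optional[float] = None        # lower limit (P2a or Q2a)


def core_control_simultaneous(intensity: float, efficiency: float, ratio: float,
                              oversubscribed: bool, perf_max: float,
                              bw_max: float) -> CoreControl:
    """Per-thread core control performance under simultaneous execution (claim 12)."""
    ridge = perf_max / bw_max
    if intensity <= ridge:
        # Second intersection on the slope portion: A1a.
        p1a = bw_max * intensity
        # Slope reduced by the thread's bandwidth share and transfer efficiency: A2a.
        p2a = bw_max * ratio * efficiency * intensity
        return CoreControl(core_perf_theoretical=p1a, core_perf_lower=p2a)
    # Second intersection on the roof: B1a.
    if oversubscribed:
        q2a = bw_max * ratio * intensity  # roof through B2a on the shared slope
        if q2a < perf_max:
            return CoreControl(core_perf_lower=q2a)
    # Bandwidth is not the limiter: run the core at its maximum.
    return CoreControl(core_perf=perf_max)
```

With P_max = 100 and BW_max = 50, consider a thread of intensity 1 with a 0.8 share and efficiency 0.5. It gets P1a = 50 and P2a = 20. A thread of intensity 4 with a 0.2 share in an oversubscribed set gets a lower limit of Q2a = 40.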
  13.  The information processing method according to claim 7, wherein the execution thread information acquisition process of the roofline control process includes the following steps when execution threads of the application are executed on a plurality of the arithmetic cores at the same time:
    a step of determining whether the process is running in response to a roofline control interrupt and, when it is, a step of acquiring the set of execution threads being executed on the respective arithmetic cores;
    when the process is not running in response to a roofline control interrupt, a step of acquiring the execution thread of the application to be assigned to and executed on the arithmetic core running the process, and a step of determining whether an execution thread is being executed on any arithmetic core other than that core; and
    when an execution thread is being executed on another arithmetic core, a step of issuing, to the arithmetic cores other than the core running the process, an interrupt for executing the roofline control, and a step of acquiring the set of execution threads being executed on the respective arithmetic cores.
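The interrupt handshake in claim 13 can be simulated without real hardware. This is a toy model under stated assumptions, not the claimed implementation:

- The class `RooflineCores` is hypothetical, and the `interrupts` list stands in for inter-processor interrupts.
- The sketch assumes interrupts go only to the other cores that are currently running a thread.
- The claim does not state the no-other-threads case; there the sketch simply returns the core's own thread.

```python
from typing import Dict, List, Optional, Tuple


class RooflineCores:
    """Toy model of the execution-thread information acquisition in claim 13."""

    def __init__(self, running: Dict[int, Optional[str]]):
        self.running = dict(running)                # core id -> thread, or None if idle
        self.interrupts: List[Tuple[int, int]] = []  # (sender core, receiver core)

    def thread_set(self) -> Dict[int, str]:
        """Set of execution threads currently running on the cores."""
        return {c: t for c, t in self.running.items() if t is not None}

    def acquire(self, core: int, from_interrupt: bool,
                next_thread: Optional[str] = None) -> Dict[int, str]:
        if from_interrupt:
            # Triggered by another core's roofline-control interrupt.
            return self.thread_set()
        # Normal dispatch: record the thread assigned to this core.
        self.running[core] = next_thread
        others = [c for c, t in self.running.items() if c != core and t is not None]
        for c in others:
            # Ask each busy core to rerun its roofline control.
            self.interrupts.append((core, c))
        return self.thread_set()
```

For example, if core 1 is already running `t2` and core 0 dispatches `t1`, core 0 sends one interrupt to core 1. Both cores then see the thread set `{0: "t1", 1: "t2"}`.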
Priority Applications (1), Applications Claiming Priority (1), Family Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/023104 2022-06-08 2022-06-08 Information processing device and information processing method (WO2023238276A1)

Family ID: 89117742

Publications (1)

Publication Number Publication Date
WO2023238276A1 2023-12-14

Country Status (1)

WO (1): WO2023238276A1

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015001807A * 2013-06-14 2015-01-05 DENSO Corporation Parallelization compilation method, parallelization compiler, parallelization compilation device, and on-vehicle device
JP2016511880A * 2013-02-05 2016-04-21 Qualcomm Incorporated System and method for controlling central processing unit power with guaranteed transient deadlines
WO2021250737A1 * 2020-06-08 2021-12-16 Mitsubishi Electric Corporation Information processing system and information processing system control method

Legal Events

121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 22945782; country of ref document: EP; kind code of ref document: A1.