JP2002358293A

JP2002358293A - System, method and program for load distribution at run-time

Info

Publication number: JP2002358293A
Application number: JP2001165177A
Authority: JP
Inventors: Kenichiro Matsuura; 健一郎松浦
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2001-05-31
Filing date: 2001-05-31
Publication date: 2002-12-13

Abstract

PROBLEM TO BE SOLVED: To shorten the processing time of a data parallel program running on a distributed memory computer system that combines multiple processing elements (PEs) of different types and performance, and whose load conditions change while the program runs. SOLUTION: A monitoring program conversion unit 103 converts a source program 101 into a converted program 106 that can collect monitoring information 112, including the calculation time each PE needs to execute the processing targeted for load distribution. An optimum load distribution determination unit 113 uses the monitoring information 112 to calculate the data division widths that are optimal at program run time, and a load redistribution execution unit 114 redistributes the load based on the calculated widths. The calculation of the optimum widths and the redistribution of the load are repeated as often as necessary.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a distributed memory computer system, and more particularly to a runtime load distribution system that dynamically changes, while a data parallel program is running, the width of the data placed on each processor element (PE) of the distributed memory computer system.

[0002]

2. Description of the Related Art

A distributed memory computer system is a computer system in which a plurality of PEs, each with its own local memory, are connected by an interconnection network. Data is normally distributed across the local memories of the PEs.

[0003] A compiler that generates a parallel program for a distributed memory computer system from a sequential source program divides the data (arrays) appearing in the source program and places the pieces in the local memories of the PEs.

[0004] Parallel processing based on dividing data and placing it on a plurality of PEs is called data parallel processing. A program that is parallelized by dividing and placing data is called a data parallel program, and a programming language that parallelizes programs by dividing and placing data is called a data parallel language.

[0005] There are various ways to divide data; one of the best known is BLOCK division. BLOCK division splits the dimension being divided into as many pieces as there are PEs and places contiguous elements on each PE, dividing them as evenly as possible.

[0006] FIG. 9 shows an example of BLOCK division. Consider a one-dimensional array A with 1000 elements, as in FIG. 9(1). The notation A(m:n) denotes the m-th through n-th elements of array A; for example, A(1:1000) denotes the 1st through 1000th elements. FIG. 9(2) shows array A after BLOCK division across four PEs, PE1 through PE4. The 1000-element array is divided into four fragments of 250 elements each: A(1:250) is placed on PE1, A(251:500) on PE2, A(501:750) on PE3, and A(751:1000) on PE4.
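The index ranges in the FIG. 9 example can be reproduced with a short Python sketch. It assumes one common convention for BLOCK division, blocks of ceil(N/P) elements, so when N is not a multiple of P the last PE receives fewer elements; the function name block_ranges is hypothetical.

```python
import math


def block_ranges(n_elems, n_pes):
    """Return 1-based inclusive (lo, hi) index ranges for BLOCK division.

    Assumes blocks of ceil(n_elems / n_pes) elements; a trailing PE may
    receive fewer elements (or none) when the division is not exact.
    """
    size = math.ceil(n_elems / n_pes)
    ranges = []
    for k in range(n_pes):
        lo = k * size + 1
        hi = min((k + 1) * size, n_elems)
        ranges.append((lo, hi))  # lo > hi would mean the PE holds no elements
    return ranges


print(block_ranges(1000, 4))
# [(1, 250), (251, 500), (501, 750), (751, 1000)]
```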

[0007] BLOCK division splits data into nearly equal widths, but data can also be split into unequal widths. The official extended specification of the data parallel language High Performance Fortran (HPF) (High Performance Fortran 2.0 Official Manual, High Performance Fortran Forum, Springer-Verlag Tokyo (1999)) calls such a division GEN_BLOCK division.

[0008] Like BLOCK division, GEN_BLOCK division splits the dimension being divided into as many pieces as there are PEs and places contiguous elements on each PE. Unlike BLOCK division, however, GEN_BLOCK division allows the width of each piece to be chosen arbitrarily. GEN_BLOCK division can therefore split data into unequal widths, or into equal widths as BLOCK division does.

[0009] FIG. 10 shows an example of GEN_BLOCK division. Consider a one-dimensional array A with 1000 elements, as in FIG. 10(1). FIG. 10(2) shows array A after GEN_BLOCK division across four PEs, PE1 through PE4. The 1000-element array is divided into four parts: 300 elements (A(1:300)) are placed on PE1, 200 elements (A(301:500)) on PE2, 400 elements (A(501:900)) on PE3, and 100 elements (A(901:1000)) on PE4.
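Because GEN_BLOCK places contiguous pieces in PE order, each PE's index range follows from a running total of the widths. The Python sketch below (the name gen_block_ranges is hypothetical) reproduces the FIG. 10 layout from the widths 300, 200, 400 and 100, and also shows that equal widths give the same result as BLOCK division.

```python
from itertools import accumulate


def gen_block_ranges(widths):
    """Return 1-based inclusive (lo, hi) ranges for GEN_BLOCK division.

    widths[k] is the number of elements assigned to PE k+1; the pieces are
    contiguous and laid out in PE order.
    """
    ends = list(accumulate(widths))   # running totals, e.g. 300, 500, 900, 1000
    starts = [0] + ends[:-1]          # elements already placed on earlier PEs
    return [(s + 1, e) for s, e in zip(starts, ends)]


print(gen_block_ranges([300, 200, 400, 100]))
# [(1, 300), (301, 500), (501, 900), (901, 1000)]
print(gen_block_ranges([250, 250, 250, 250]))
# [(1, 250), (251, 500), (501, 750), (751, 1000)]
```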

[0010] FIG. 11 shows an example of specifying the GEN_BLOCK division widths in an HPF program; it is an HPF program that performs the GEN_BLOCK division of FIG. 10. To perform GEN_BLOCK division, an integer array with as many elements as there are PEs is prepared; in FIG. 11 this is the array MAP. Next, the division width for each PE is set in MAP using, for example, a Fortran DATA statement. In FIG. 11, the first element MAP(1) holds the division width for PE1, MAP(2) for PE2, MAP(3) for PE3, and MAP(4) for PE4. Finally, the HPF DISTRIBUTE directive is used to divide array A by GEN_BLOCK. The notation "GEN_BLOCK(MAP)" means "GEN_BLOCK division using array MAP as the division widths".
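A GEN_BLOCK(MAP) distribution also determines which PE owns any given element. The hypothetical helper owner_of below is a Python sketch, not HPF code and not part of the patent, that converts a 1-based global index into a PE number and a local index; the Python list map_widths stands in for the MAP array of FIG. 11.

```python
from bisect import bisect_left
from itertools import accumulate


def owner_of(global_index, map_widths):
    """Map a 1-based global index to (pe, local_index) under GEN_BLOCK(MAP).

    map_widths[0] plays the role of MAP(1), the width for PE1, and so on.
    PE numbers and local indices are 1-based; zero widths are allowed.
    """
    ends = list(accumulate(map_widths))   # last global index held by each PE
    if not 1 <= global_index <= ends[-1]:
        raise IndexError("global index outside the distributed dimension")
    pe = bisect_left(ends, global_index)  # first PE whose range reaches the index
    start = ends[pe - 1] if pe > 0 else 0
    return pe + 1, global_index - start


MAP = [300, 200, 400, 100]   # widths from the FIG. 10 example
print(owner_of(301, MAP))    # (2, 1)
print(owner_of(1000, MAP))   # (4, 100)
```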

[0011] Each PE is responsible for processing the fragment of the divided data that is placed in its own local memory, and all PEs proceed in parallel, which shortens the overall processing time. This works because the processing time of many scientific and engineering programs that handle huge amounts of data is proportional to the size of the data. Dividing the data into fragments makes the processing time of each fragment shorter than that of the original data, so processing the fragments in parallel shortens the overall processing time.

[0012] To reduce processing time in data parallel processing, the processing times of the PEs that process the data fragments in parallel must be made as equal as possible. Appropriately distributing processing among a plurality of PEs operating in parallel in this way is called load distribution.

[0013] If the calculation speed ratio between the PEs for a given program is known before the program runs, load distribution can be achieved with GEN_BLOCK division, albeit manually: the data is divided by GEN_BLOCK with division widths proportional to the calculation speed of each PE, and each piece is placed on its PE.

[0014] If the calculation speed ratio between the PEs is not known before the program runs, or if it changes while the program runs, appropriate load distribution with GEN_BLOCK division requires calculating the speed ratio from information collected during execution and adjusting the data division widths accordingly. No conventional technique performs such load distribution automatically, and it is difficult to do by hand.

[0015] Various conventional techniques exist for load distribution, but none of them distributes the load of a data parallel program by automatically adjusting the data division widths based on the calculation speed ratio between PEs.

[0016] For example, the "load distribution method in a distributed memory computer" (Japanese Patent Laid-Open No. Hei 5-73515) runs, on a distributed memory parallel computer, a parallel program written with a parallel programming system that provides operating bodies, which generate parallel execution units and handle communication and synchronization between them, together with a virtual shared space in which those operating bodies run. It records the execution history and, based on that history, distributes the processes running in parallel among the processors. This technique balances the load of processes running in parallel, which differs from load distribution by adjusting data division widths in a data parallel program.

[0017] As another example, the "optimal distributed arrangement method of objects in a distributed computer environment" (Japanese Patent Application No. Hei 11-73108) places distributed objects on the most suitable computers based on the connections between the distributed computers, the dependencies between objects, and the calculation load of each computer. This technique balances the load of distributed object processing, which again differs from load distribution by adjusting data division widths in a data parallel program.

[0018]

PROBLEMS TO BE SOLVED BY THE INVENTION

Conventional load distribution methods using GEN_BLOCK division have the following problems.

[0019] The first problem is that effective load distribution is not possible for a distributed memory computer system that combines PEs of different types and performance. The calculation speed ratio between such PEs depends on how well each PE's type and performance suit the characteristics of the program, so it is not known before the program runs. When the ratio is not known in advance, the conventional method cannot distribute the load appropriately.

[0020] The second problem is that effective load distribution is not possible for a computer system whose load conditions change while a program runs. The calculation speed ratio between PEs for a given program changes from moment to moment according to the load imposed by other programs, the operating system, and so on. When the ratio changes during execution, the conventional method cannot distribute the load appropriately.

[0021]

OBJECTS OF THE INVENTION

An object of the present invention is therefore to provide a runtime load distribution system capable of appropriately distributing load in a distributed memory computer system that combines a plurality of PEs of different types and performance.

[0022] Another object of the present invention is to provide a runtime load distribution system capable of appropriately distributing load in a distributed memory computer system whose load conditions change while a program runs.

[0023]

MEANS FOR SOLVING THE PROBLEMS

The runtime load distribution system of the present invention has a monitoring program conversion unit (103 in FIG. 1) that transforms a data parallel program so that monitoring information can be collected; an optimum load distribution determination unit (113 in FIG. 1) that uses the monitoring information (112 in FIG. 1) to calculate the optimal data division widths while the program runs; and a load redistribution execution unit (114 in FIG. 1) that redistributes the load based on the calculated data division widths.

[0024] With this configuration, the processing times of the PEs become equal, achieving the first object of the present invention.

[0025] In addition, in the runtime load distribution system of the present invention, the optimum load distribution determination unit (113 in FIG. 1) monitors changes in execution time and, as needed, calculates the optimal data division widths multiple times. Each time the widths are calculated, the load redistribution execution unit (114 in FIG. 1) rearranges the data according to them.

[0026] With this configuration, when the load conditions of a PE change, the data is rearranged so that the processing times of the PEs become equal again, achieving the second object of the present invention.

[0027]

EMBODIMENTS OF THE INVENTION

Embodiments of the present invention will now be described in detail with reference to the drawings.

[0028] A first embodiment of the present invention will be described in detail with reference to FIG. 1. The runtime load distribution system 102 according to the first embodiment includes a monitoring program conversion unit 103 and a runtime load distribution control unit 111. A recording medium K is also connected to the runtime load distribution system 102. The recording medium K is a floppy disk, hard disk, CD-ROM, semiconductor memory, or other recording medium on which a program is recorded that causes a computer to function as the runtime load distribution system 102. This program is read by the computer constituting the runtime load distribution system 102 and, by controlling that computer's operation, implements the monitoring program conversion unit 103 and the runtime load distribution control unit 111 on it. In this embodiment, the runtime load distribution system 102 is implemented on a computer separate from the computer system 109, but it may instead be implemented on the computer system 109.

[0029] The monitoring program conversion unit 103 includes a program analysis unit 104 and a program conversion unit 105.

[0030] The program analysis unit 104 takes the source program 101 as input, analyzes the array declarations, HPF directives, load distribution control directives, and other constructs it contains, and outputs the analysis information to the program conversion unit 105. The source program 101 is a source-level data parallel program whose load is to be distributed. It contains directives for controlling load distribution, which are called load distribution control directives.

[0031] The program conversion unit 105 transforms the source program 101 based on the analysis information from the program analysis unit 104 and generates a converted program 106 that can be connected to the runtime load distribution control unit 111.

[0032] The parallelization processing unit 107 takes the converted program 106 as input, analyzes its data division information and communication information, and parallelizes the converted program 106.

[0033] The executable program generation unit 108 takes as input the converted program 106 parallelized by the parallelization processing unit 107 and generates an executable program 110 for the computer system 109, which is the environment in which the program runs.

[0034] The computer system 109 is a computer system in which a plurality of PEs, each with its own local memory, are connected by an interconnection network.

[0035] The runtime load distribution control unit 111 includes monitoring information 112, an optimum load distribution determination unit 113, a load redistribution execution unit 114, load distribution information 115, and a calculation time measurement unit 116.

[0036] The monitoring information 112 is information collected while the executable program 110 runs. It includes, among other things, the calculation time of each PE for the executable program 110.

[0037] The calculation time measurement unit 116 measures the calculation time of each PE of the computer system 109 and records it as monitoring information 112.

[0038] The optimum load distribution determination unit 113 takes the monitoring information 112 and the load distribution information 115 as input, calculates the optimal data division widths, estimates the communication time from the amount of data that must be moved for load redistribution (data rearrangement), and decides whether to perform the redistribution.

[0039] The load redistribution execution unit 114 rearranges the data based on the decision of the optimum load distribution determination unit 113 and updates the load distribution information 115. The load distribution information 115 represents the state of load distribution while the executable program 110 runs; it includes, for example, the data division width of each PE in the executable program 110.

[0040] Next, the configuration of the computer system 109 will be described with reference to FIG. 2.

[0041] The computer system 109 has a structure in which processor elements (PE1 to PEn) 21-1 to 21-n, each combining a processor with a local memory, are connected by an interconnection network 22. Data is distributed across the local memories of the PEs.

[0042]

DESCRIPTION OF OPERATION

Next, the operation of this embodiment will be described in detail.

[0043] First, the program analysis unit 104 analyzes the array declarations, HPF directives, load distribution control directives, and other constructs of the source program 101 and passes the analysis information to the program conversion unit 105. The program conversion unit 105 transforms the source program 101 based on that information and generates the converted program 106, which can be connected to the runtime load distribution control unit 111.

[0044] The parallelization processing unit 107 then parallelizes the converted program 106, and the executable program generation unit 108 generates the executable program 110.

[0045] When the executable program 110 is started on the computer system 109, the calculation time measurement unit 116 in the runtime load distribution control unit 111 measures the calculation time of each PE of the computer system 109 and records it as monitoring information 112. The calculation time is the time each PE spends computing a processing step R in the program. The processing R whose calculation time is measured is specified by a load distribution control directive in the source program 101.
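The measurement itself can be pictured as a timer wrapped around one execution of processing R on each PE. The sketch below is a single-process Python illustration under stated assumptions: timed_region is a hypothetical name, a dictionary stands in for monitoring information 112, and the hook is an explicit call rather than code inserted by the program conversion unit. It times only the computation, not any synchronization that follows R.

```python
import time
from collections import defaultdict


def timed_region(pe, monitoring, region, *args):
    """Run one execution of processing R on PE `pe` and record its calculation time.

    `monitoring` stands in for monitoring information 112: a mapping from PE
    number to the list of measured times in seconds. Only the call to
    `region` is timed, so waiting at a later synchronization is excluded.
    """
    start = time.perf_counter()
    result = region(*args)
    monitoring[pe].append(time.perf_counter() - start)
    return result


monitoring = defaultdict(list)
timed_region(1, monitoring, sum, range(100_000))
print(len(monitoring[1]))   # 1 measurement recorded for PE1
```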

[0046] Meanwhile, the optimum load distribution determination unit 113 and the load redistribution execution unit 114 perform the processing shown in the flowchart of FIG. 3. Steps 31 to 35 are performed by the optimum load distribution determination unit 113, and steps 36 and 37 by the load redistribution execution unit 114.

[0047] In step 31, the calculation speed ratio between the PEs of the computer system 109 is predicted from the calculation time of each PE in the monitoring information 112.

[0048] The purpose of the optimum load distribution determination unit 113 is to determine, from the monitoring information 112 on processing R, the load distribution that minimizes the calculation time the next time processing R is executed. It is also assumed that all PEs synchronize each time processing R is executed once.

[0049] Let n be the number of PEs and k = 1, 2, ..., n the PE number, and denote each PE by PEk. Let Tk be the calculation time of PEk and Sk its calculation speed.

[0050] The calculation speed Sk of PEk can be regarded as inversely proportional to its calculation time Tk. The calculation speed ratio between the PEs is therefore given by equation (1):

[0051]

$$S_1 : S_2 : \cdots : S_k : \cdots : S_n = \frac{1}{T_1} : \frac{1}{T_2} : \cdots : \frac{1}{T_k} : \cdots : \frac{1}{T_n} \qquad (1)$$
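Equation (1) amounts to taking the reciprocal of each measured time. A minimal Python sketch, with the hypothetical name speed_ratio, assuming every measured time is positive:

```python
def speed_ratio(calc_times):
    """Relative calculation speeds S_k = 1 / T_k, following equation (1).

    calc_times[k] is the measured time T_{k+1} that PE k+1 spent on
    processing R; all times are assumed to be positive.
    """
    if any(t <= 0 for t in calc_times):
        raise ValueError("calculation times must be positive")
    return [1.0 / t for t in calc_times]


# PE2 finished R twice as fast as PE1, and PE3 took twice as long as PE1.
print(speed_ratio([2.0, 1.0, 4.0]))   # [0.5, 1.0, 0.25]
```

Only the ratio between the values matters, so any common scale factor could be applied without changing the widths derived from them.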

[0052] In step 32, the optimal data division widths are determined from the calculation speed ratio between the PEs obtained in step 31. The optimal division widths are those proportional to the calculation speed ratio between the PEs. Dividing the data with such widths and distributing the pieces to the PEs equalizes the processing time of each PE and shortens the overall processing time.

[0053] Let Ssum denote the sum S1 + S2 + ... + Sn of the calculation speeds Sk over k = 1, 2, ..., n, and let W be the width of the data before division, that is, the number of elements along the divided dimension of the data (array) being divided.

[0054] Using Sk, Ssum, and W, the optimal data division width Mk for PEk can be expressed by equation (2):

[0055]

$$M_k = W \times \frac{S_k}{S_{\mathrm{sum}}} \qquad (2)$$
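Equation (2) generally yields non-integer widths, while a GEN_BLOCK distribution needs whole elements that add up to W. The patent does not say how to round; the Python sketch below (optimal_widths is a hypothetical name) assumes the largest-remainder method, which keeps every width within one element of its exact value and preserves the total.

```python
import math


def optimal_widths(calc_times, total_width):
    """Division widths M_k proportional to speed (equation (2)), as integers.

    Rounding by the largest-remainder method is an assumption of this
    sketch: widths are floored, and the leftover elements go to the PEs
    with the largest fractional parts, so the widths sum to total_width (W).
    """
    speeds = [1.0 / t for t in calc_times]            # S_k from equation (1)
    s_sum = sum(speeds)                               # S_sum
    exact = [total_width * s / s_sum for s in speeds]
    widths = [math.floor(x) for x in exact]
    leftover = total_width - sum(widths)
    order = sorted(range(len(exact)), key=lambda k: exact[k] - widths[k], reverse=True)
    for k in order[:leftover]:
        widths[k] += 1
    return widths


print(optimal_widths([2.0, 1.0, 4.0], 1000))   # [286, 571, 143]
```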

[0056] In step 33, the reduction in calculation time that would be achieved by changing the current data division widths to the optimal widths through load redistribution is predicted from the optimal widths obtained by equation (2), the load distribution information 115, and the monitoring information 112.

[0057] Let Lk be the division width of the data currently (before load redistribution) placed on PEk, and let Fk be the predicted calculation time of PEk after load redistribution. Using Tk, Mk, and Lk, Fk is given by equation (3):

[0058]

$$F_k = T_k \times \frac{M_k}{L_k} \qquad (3)$$

[0059] Let Max(Tk) denote the maximum of Tk over k = 1, 2, ..., n, and Max(Fk) the maximum of Fk. Because all PEs synchronize after each execution of processing R, one execution of R takes as long as the slowest PE. The reduction Tcalc in the calculation time of processing R achievable by load redistribution is therefore given by equation (4):

[0060]

$$T_{\mathrm{calc}} = \max_{1 \le k \le n} T_k - \max_{1 \le k \le n} F_k \qquad (4)$$
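Equations (3) and (4) can be evaluated together. The Python sketch below (predicted_gain is a hypothetical name) assumes, as equation (3) does, that each PE's time for R scales linearly with the width of data it holds, and that every PE currently holds at least one element so that no Lk is zero.

```python
def predicted_gain(calc_times, current_widths, new_widths):
    """Predicted reduction T_calc in the time of processing R (equations (3), (4)).

    F_k = T_k * M_k / L_k assumes time grows linearly with data width; every
    L_k must be non-zero. Since all PEs synchronize after R, one execution
    of R lasts as long as the slowest PE, hence max() on both sides.
    """
    predicted = [t * m / l for t, m, l in zip(calc_times, new_widths, current_widths)]
    return max(calc_times) - max(predicted)


# Four PEs holding 300 elements each; PE3 and PE4 ran R half as fast.
gain = predicted_gain([1.0, 1.0, 2.0, 2.0], [300] * 4, [400, 400, 200, 200])
print(round(gain, 3))   # 0.667
```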

[0061] In step 34, the time required for the data communication needed by load redistribution is predicted from the load distribution information 115, the optimal data division widths obtained in step 32, and the monitoring information 112.

[0062] Load redistribution changes the data division width assigned to PEk from Lk to Mk. When the division width is Lk, PEk holds elements LLk+1 through LHk along the divided dimension; when it is Mk, PEk holds elements MLk+1 through MHk. LLk, LHk, MLk, and MHk are given by equations (5-1) to (5-4), where sum(range : expression) denotes the sum of the expression over the specified range; for example, sum(x=1,k : Lx) denotes L1 + L2 + ... + Lk.

[0063]

$$LL_k = \sum_{x=1}^{k-1} L_x \qquad (5\text{-}1)$$
$$LH_k = \sum_{x=1}^{k} L_x \qquad (5\text{-}2)$$
$$ML_k = \sum_{x=1}^{k-1} M_x \qquad (5\text{-}3)$$
$$MH_k = \sum_{x=1}^{k} M_x \qquad (5\text{-}4)$$

In equations (5-1) and (5-3), LLk = 0 and MLk = 0 when k = 1.

[0064] Using LLk, LHk, MLk, and MHk, the width Dij, along the divided dimension, of the data transferred from PEi to PEj by load redistribution can be expressed by equations (6-1) and (6-2). Dij is the size of the overlap between the elements PEi holds before redistribution and the elements PEj must hold afterwards. In these equations, Min(LHi, MHj) denotes the smaller of LHi and MHj, and Max(LLi, MLj) denotes the larger of LLi and MLj. When there are n PEs, 1 ≤ i ≤ n and 1 ≤ j ≤ n, and the optimum load distribution determination unit 113 evaluates equations (6-1) and (6-2) for every combination of i and j.

[0065]
$$D_{ij} = \min(LH_i, MH_j) - \max(LL_i, ML_j) \quad \text{if } i \neq j \text{ and } LL_i < MH_j \text{ and } ML_j < LH_i \tag{6-1}$$
$$D_{ij} = 0 \quad \text{if } i = j \text{ or } LL_i \geq MH_j \text{ or } ML_j \geq LH_i \tag{6-2}$$
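Equations (6-1) and (6-2) measure the overlap between PEi's current block and PEj's new block. A minimal Python sketch follows; `transfer_widths` is a hypothetical name, PEs are 0-based list indices, and the ranges are those of the worked example in FIG. 6.

```python
def transfer_widths(LL, LH, ML, MH):
    """Return D where D[i][j] is the width sent from PE i to PE j (0-based)."""
    n = len(LL)
    D = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # (6-1): PE i's old block overlaps PE j's new block and i != j
            if i != j and LL[i] < MH[j] and ML[j] < LH[i]:
                D[i][j] = min(LH[i], MH[j]) - max(LL[i], ML[j])
            # (6-2): otherwise nothing is sent, so D[i][j] stays 0
    return D

# Ranges of the worked example (FIG. 6)
LL, LH = [0, 250, 500, 750], [250, 500, 750, 1000]
ML, MH = [0, 300, 500, 900], [300, 500, 900, 1000]
D = transfer_widths(LL, LH, ML, MH)
print(D[1][0], D[3][2])  # 50 150  (PE2 -> PE1, PE4 -> PE3)
```

The diagonal is always zero because data that stays on the same PE needs no transfer.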

[0066] Let Uij be the time required to transfer data of width Dij along the divided dimension from PEi to PEj. Uij can be determined in several ways. For example, the communication speed between PEs can be measured before the program is executed. Uij can then be determined from that communication speed and the data width Dij.

[0067] Let Tcomm be the time required for data communication among the n PEs during load redistribution. Tcomm can be determined in several ways, depending on the properties of the interconnection network between PEs and on the communication method. Equation (7) below gives the predicted value of Tcomm when all inter-PE communications are performed sequentially. If communications between several independent pairs of PEs can proceed in parallel, Tcomm is smaller than the value given by equation (7). Equation (7) is therefore the longest predicted communication time.

[0068]
$$T_{\mathrm{comm}} = \sum_{j=1}^{n} \sum_{i=1}^{n} U_{ij} \tag{7}$$
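Equation (7) simply adds up every Uij. For illustration only, the sketch below assumes a linear cost model in which Uij is the width Dij multiplied by a per-element transfer time. The source leaves the model for Uij open, and `cost_per_element` is a hypothetical parameter. With the constant 0.5 used in the worked example, the sketch reproduces Tcomm = 100.

```python
def predict_tcomm(D, cost_per_element):
    """Equation (7): sequential-communication estimate, sum over j and i of Uij.

    Assumes Uij = D[i][j] * cost_per_element(i, j), i.e. time is linear in width.
    """
    n = len(D)
    return sum(D[i][j] * cost_per_element(i, j)
               for j in range(n) for i in range(n))

D = [[0, 0, 0, 0],
     [50, 0, 0, 0],    # PE2 -> PE1: width 50
     [0, 0, 0, 0],
     [0, 0, 150, 0]]   # PE4 -> PE3: width 150
print(predict_tcomm(D, lambda i, j: 0.5))  # 100.0
```

A model with a fixed start-up latency per message would add a constant term for every nonzero Dij. That is an extension, not something the source specifies.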

[0069] In step 35, the reduction in calculation time predicted in step 33 is compared with the communication time for load redistribution predicted in step 34. If the reduction in calculation time exceeds the communication time required for load redistribution, the process proceeds to step 36. If the reduction falls below that communication time, the processing of the optimum load distribution determination unit 113 ends and the load redistribution execution unit 114 performs no processing.

[0070] The reduction in calculation time predicted in step 33 is Tcalc in equation (4). The communication time required for load redistribution predicted in step 34 is Tcomm in equation (7). In step 35, the process proceeds to step 36 if Tcalc > Tcomm.

[0071] Steps 36 and 37 are performed by the load redistribution execution unit 114.

[0072] In step 36, two things are determined from the load distribution information 115 and the optimum data division widths obtained in step 32: the range of data that must be transferred for load redistribution, and the communication schedule.

[0073] The range of data that must be transferred during load redistribution can be determined using equations (6-1) and (6-2) from step 34. For the communication schedule, a method that completes the data transfer in the shortest time is selected according to the properties of the interconnection network between PEs and the communication method.
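Step 36 needs the actual element ranges to send, not only the widths Dij. Under equations (6-1) and (6-2), PEi sends PEj the elements from max(LLi, MLj) + 1 to min(LHi, MHj). The sketch below lists those ranges. It then groups the transfers into rounds in which no PE sends or receives more than once. This greedy grouping is only one hypothetical schedule, for a network where each PE handles one send and one receive at a time. It is not the schedule prescribed by the source, and both function names are illustrative.

```python
def transfer_plan(LL, LH, ML, MH):
    """List (src, dst, first, last) with 1-based PE numbers and element indices."""
    n = len(LL)
    plan = []
    for i in range(n):
        for j in range(n):
            if i != j and LL[i] < MH[j] and ML[j] < LH[i]:
                first = max(LL[i], ML[j]) + 1
                last = min(LH[i], MH[j])
                plan.append((i + 1, j + 1, first, last))
    return plan

def greedy_rounds(plan):
    """Hypothetical schedule: each PE sends at most once and receives at most once per round."""
    rounds = []  # each entry: (transfers, senders, receivers)
    for t in plan:
        src, dst = t[0], t[1]
        for transfers, senders, receivers in rounds:
            if src not in senders and dst not in receivers:
                transfers.append(t)
                senders.add(src)
                receivers.add(dst)
                break
        else:  # no existing round can take this transfer
            rounds.append(([t], {src}, {dst}))
    return [transfers for transfers, _, _ in rounds]

plan = transfer_plan([0, 250, 500, 750], [250, 500, 750, 1000],
                     [0, 300, 500, 900], [300, 500, 900, 1000])
print(plan)                 # [(2, 1, 251, 300), (4, 3, 751, 900)]
print(greedy_rounds(plan))  # [[(2, 1, 251, 300), (4, 3, 751, 900)]]
```

In the worked example the two transfers involve disjoint PEs, so they fit in a single round.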

[0074] In step 37, the data transfer is executed by inter-PE communication, based on the range of data to be transferred and the communication schedule determined in step 36.

[0075] Next, the operation of this embodiment will be described using a specific example.

[0076] Consider the HPF program shown in FIG. 4, in which an array A appears.

[0077] Statement (1) in FIG. 4 declares an array A having 1000 elements.

[0078] Statement (2) in FIG. 4 is an HPF directive that declares a processor array P having four elements.

[0079] Statement (3) in FIG. 4 is an example of a load distribution control directive that distributes the array A onto the processor array P with the division widths held in MAP.

[0080] In the outer loop (4) of FIG. 4, from "DO T=1,5000" to "END DO", all PEs are synchronized at each iteration.

[0081] "!DYN$ BALANCE MAP" in (5) of FIG. 4 and "!DYN$ END BALANCE" in (6) of FIG. 4 are examples of load distribution control directives. They delimit the range to which runtime load balancing is applied, that is, the processing subject to load balancing.

[0082] The inner loop (7) of FIG. 4, from "DO I=1,1000" to "END DO", is processed in parallel across the PEs. The array A is distributed over the PEs according to the division widths held in MAP. Each PE processes the fragment of array A placed in its own local memory.

[0083] The operation for the program of FIG. 4 will now be described with reference to FIGS. 1 and 3.

[0084] The program analysis unit 104 analyzes the array declarations, HPF directives, load distribution control directives, and other statements in the program of FIG. 4. It passes the resulting analysis information to the program conversion unit 105. Based on this information, the program conversion unit 105 transforms the program of FIG. 4 and generates a converted program 106 that can be connected to the runtime load distribution control unit 111. FIG. 5 shows an example of the converted program 106 for the program of FIG. 4.

[0085] The converted program 106 shown in FIG. 5 is made connectable to the runtime load distribution control unit 111 by the call statements (4)' and (5)'. The program conversion unit 105 converts the load distribution control directives that mark the start and end positions of the processing subject to load balancing into the call statements (4)' and (5)' shown in FIG. 5, respectively. Using the analysis information from the program analysis unit 104, it recognizes that statements (5) and (6) of FIG. 4 are such directives. It therefore converts statements (5) and (6) of FIG. 4 into the call statements (4)' and (5)' of FIG. 5, respectively.

[0086] The program conversion unit 105 can also insert three items based on the analysis information of the program analysis unit 104: a declaration of an array MAP that holds the data division widths, a process that sets the initial values of MAP, and a process that performs the initial division of the data. In this example, the program analysis unit 104 passes analysis information for the array declaration (1), the HPF directive (2), and the load distribution control directive (3). From this information, the program conversion unit 105 inserts into the converted program 106 the declaration (1)' of the array MAP, the initial value setting (2)', and the initial data division (3)', as shown in FIG. 5.

[0087] The k-th element of the array MAP in FIG. 5 is the division width of the array A placed in PEk. Each of the four elements of MAP is initialized to 250, which is the 1000 elements of array A along the divided dimension divided equally among the 4 elements of the processor array. The elements of MAP are therefore 250, 250, 250, 250.
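The initial MAP values are an even split of the divided dimension. The sketch below is a minimal illustration under one assumption: when the element count is not divisible by the number of PEs, the leftover elements go to the first PEs. The source only needs the divisible case (1000 / 4 = 250), and HPF's own BLOCK rule may place remainders differently. `even_widths` is a hypothetical name.

```python
def even_widths(n_elems, n_pe):
    """Split n_elems into n_pe widths that differ by at most one element."""
    base, rest = divmod(n_elems, n_pe)
    # the first `rest` PEs take one extra element
    return [base + 1 if k < rest else base for k in range(n_pe)]

print(even_widths(1000, 4))  # [250, 250, 250, 250]
print(even_widths(10, 4))    # [3, 3, 2, 2]
```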

[0088] The converted program 106 generated as described above is converted into an executable program 110 by the parallelization processing unit 107 and the executable program generation unit 108. For an HPF program, the executable program 110 can be obtained by using an HPF compiler as the parallelization processing unit 107 and a Fortran compiler as the executable program generation unit 108.

[0089] In the first iteration of the outer loop (4) of FIG. 4, the division widths Lk of array A placed in PEk are 250, 250, 250, 250, as shown in FIG. 6. Each PE processes its 250 elements of array A in parallel.

[0090] At the start and end of each execution of the inner loop (7), each PE calls the runtime load distribution control unit 111 according to the call statements (4)' and (5)'. With each call, the PE sends the runtime load distribution control unit 111 its own PE number. It also sends information indicating whether the call is made at the start of execution (an execution-start call) or at the end of execution (an execution-end call).

[0091] The calculation time measuring unit 116 in the runtime load distribution control unit 111 measures the calculation time of the inner loop (7) on each PE from the first execution-start call and the first execution-end call received from that PE. It records the result in the monitoring information 112. Specifically, when the first execution-start call arrives from a PEk, the unit records the time. When the first execution-end call later arrives from the same PEk, it subtracts the recorded time from the current time to obtain the calculation time of the inner loop (7) on PEk. This example uses the first pair of calls, but the calculation time may instead be measured from the second or a later pair. Once the calculation times of all the PEs in the computer system 109 have been recorded in the monitoring information 112, the calculation time measuring unit 116 activates the optimum load distribution determination unit 113.
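The measurement in [0091] can be sketched as a small stateful object. Everything here is a hypothetical stand-in: the class name, the callback used to "activate" unit 113, and the dictionary playing the role of monitoring information 112. Timestamps default to `time.perf_counter()` but can be passed explicitly, as in the simulated run below.

```python
import time

class CalcTimeMeasurer:
    """Hypothetical stand-in for unit 116: times the first start/end call pair per PE."""

    def __init__(self, n_pe, on_all_recorded):
        self.n_pe = n_pe
        self.on_all_recorded = on_all_recorded  # e.g. activates unit 113
        self.start_time = {}
        self.calc_time = {}                     # plays the role of monitoring info 112

    def call(self, pe, is_start, now=None):
        now = time.perf_counter() if now is None else now
        if is_start:
            self.start_time.setdefault(pe, now)  # keep only the first start call
        elif pe in self.start_time and pe not in self.calc_time:
            self.calc_time[pe] = now - self.start_time[pe]
            if len(self.calc_time) == self.n_pe:
                self.on_all_recorded(dict(self.calc_time))

# Simulated calls with explicit timestamps (PE numbers 1..4)
results = []
m = CalcTimeMeasurer(4, results.append)
for pe, t in [(1, 100), (2, 150), (3, 75), (4, 300)]:
    m.call(pe, True, now=0)
    m.call(pe, False, now=t)
print(results)  # [{1: 100, 2: 150, 3: 75, 4: 300}]
```

Later calls are ignored once a PE's first pair is recorded. To measure from a later pair, as [0091] permits, only the `setdefault` and the `not in self.calc_time` guards would need to change.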

[0092] The optimum load distribution determination unit 113 then obtains the calculation speed ratio between the PEs from the calculation time of each PE recorded in the monitoring information 112 (step 31 in FIG. 3). Suppose the calculation times Tk of the inner loop (7) on PE1, PE2, PE3, and PE4 are 100, 150, 75, and 300, as in FIG. 6. Because each PE's speed is inversely proportional to its calculation time, the speed ratio Sk is then 3:2:4:1.
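The step from calculation times to the ratio 3:2:4:1 can be checked exactly with rational arithmetic. This sketch assumes, consistent with FIG. 6, that speed is proportional to 1 / Tk. It also assumes integer timings, as in the example; measured timings would be floating point, and the ratio could simply be left as normalized 1/Tk values. It needs Python 3.9 or later for `math.lcm` and multi-argument `math.gcd`.

```python
from fractions import Fraction
from math import gcd, lcm

def speed_ratio(calc_times):
    """Smallest integer ratio proportional to 1 / Tk (integer times assumed)."""
    speeds = [Fraction(1, t) for t in calc_times]   # speed inversely proportional to Tk
    scale = lcm(*(s.denominator for s in speeds))    # clear the denominators
    ints = [int(s * scale) for s in speeds]
    g = gcd(*ints)
    return [v // g for v in ints]

print(speed_ratio([100, 150, 75, 300]))  # [3, 2, 4, 1]
```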

[0093] Next, in step 32, the optimum load distribution determination unit 113 obtains the optimum data division widths. By equation (2) above, the optimum division widths Mk for PE1, PE2, PE3, and PE4 are 300, 200, 400, and 100, as shown in FIG. 6.
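Equation (2) is not part of this excerpt. The FIG. 6 values are consistent with giving each PE a share of the 1000 elements proportional to Sk, and the sketch below assumes that proportional rule. The largest-remainder rounding step is our own addition: it keeps integer widths summing to the element count when the division is not exact. `proportional_widths` is a hypothetical name.

```python
def proportional_widths(n_elems, ratio):
    """Widths proportional to integer ratio Sk, rounded so they sum to n_elems."""
    total = sum(ratio)
    parts = [divmod(n_elems * s, total) for s in ratio]  # (floor, remainder)
    widths = [q for q, _ in parts]
    leftover = n_elems - sum(widths)
    # give the remaining elements to the PEs with the largest remainders
    by_remainder = sorted(range(len(ratio)), key=lambda k: parts[k][1], reverse=True)
    for k in by_remainder[:leftover]:
        widths[k] += 1
    return widths

print(proportional_widths(1000, [3, 2, 4, 1]))  # [300, 200, 400, 100]
print(proportional_widths(10, [1, 1, 1]))       # [4, 3, 3]
```

With an equal ratio, the same function reduces to the even BLOCK split used for the initial MAP values.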

[0094] In the next step 33, the reduction Tcalc in calculation time is predicted. By equation (3) above, the predicted calculation times Fk on PE1, PE2, PE3, and PE4 after the data are rearranged are 120, 120, 120, and 120, as shown in FIG. 6. By equation (4) above, the reduction is Tcalc = Max(Tk) − Max(Fk) = 300 − 120 = 180.
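Equation (3) is likewise outside this excerpt. The FIG. 6 values are consistent with assuming that each PE's time scales linearly with its width, Fk = Tk × Mk / Lk, and the sketch below makes that assumption explicit. The Tcalc formula is the one stated in [0094]. `predict_tcalc` is a hypothetical name.

```python
def predict_tcalc(T, L, M):
    """Return (F, Tcalc), assuming Fk = Tk * Mk / Lk (time linear in width)."""
    F = [t * m / l for t, l, m in zip(T, L, M)]
    # the slowest PE determines the time of the parallel loop
    return F, max(T) - max(F)

F, tcalc = predict_tcalc([100, 150, 75, 300], [250] * 4, [300, 200, 400, 100])
print(F, tcalc)  # [120.0, 120.0, 120.0, 120.0] 180.0
```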

[0095] In step 34, the communication time required for load redistribution (data rearrangement) is predicted. First, the ranges LLk, LHk, MLk, and MHk of the data placed in PEk are obtained from equations (5-1) to (5-4), as shown in FIG. 6.

[0096] Next, the width Dij of the data to be transferred from PEi to PEj is obtained from equations (6-1) and (6-2), as shown in FIG. 7. FIG. 7 indicates that data of width 50 is transferred from PE2 to PE1 and data of width 150 from PE4 to PE3. FIG. 8 illustrates these data transfers.

[0097] Assume the time Uij required to transfer data of width Dij from PEi to PEj is Dij × 0.5 for every pair i, j. By equation (7), the communication time required for load redistribution is then Tcomm = sum(j=1,n : sum(i=1,n : Uij)) = 50 × 0.5 + 150 × 0.5 = 100.

[0098] In step 35, it is determined whether the reduction Tcalc in calculation time exceeds the time Tcomm required for load redistribution. In this example, Tcalc = 180 and Tcomm = 100, so Tcalc > Tcomm. The load redistribution execution unit 114 therefore executes steps 36 and 37, and the data are rearranged.

[0099]

OTHER EMBODIMENTS OF THE INVENTION
Next, a second embodiment of the present invention will be described.

[0100] In the first embodiment described above, the initial division of the data (array) subject to runtime load balancing is a BLOCK division, which places fragments of as nearly equal width as possible on each PE. In the second embodiment, when an approximate value of the calculation speed ratio between the PEs is known in advance, the data are instead initially divided in proportions based on that ratio. The initial data division is determined by the program conversion unit 105. Suppose, for example, that the source program 101 is the one shown in FIG. 4 and the approximate calculation speed ratio of the four PEs is 3:2:4:1. The data widths initially assigned to the PEs are then set to 300, 200, 400, 100. A converted program 106 is generated in which "DATA MAP/300,200,400,100/" replaces the "DATA MAP/250,250,250,250/" shown in (2)' of FIG. 5. The approximate calculation speed ratio between the PEs is input to the program conversion unit 105 from, for example, an input device such as a keyboard (not shown), or from a recording medium such as a magnetic disk or CD-ROM.
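The second embodiment changes only the initial values written into the DATA statement (2)'. The tiny sketch below illustrates that substitution. `initial_map_statement` is a hypothetical helper, not part of the described program conversion unit 105. It assumes that any rounding leftover goes to the last PE, a simplification; for 3:2:4:1 over 1000 elements the division is exact.

```python
def initial_map_statement(n_elems, approx_ratio):
    """Build the DATA line initializing MAP from an approximate speed ratio."""
    total = sum(approx_ratio)
    widths = [n_elems * s // total for s in approx_ratio]
    # hand any rounding leftover to the last PE so every element is still covered
    widths[-1] += n_elems - sum(widths)
    return "DATA MAP/" + ",".join(map(str, widths)) + "/"

print(initial_map_statement(1000, [3, 2, 4, 1]))  # DATA MAP/300,200,400,100/
print(initial_map_statement(1000, [1, 1, 1, 1]))  # DATA MAP/250,250,250,250/
```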

[0101] In general, an initial division determined from the calculation speed ratio is expected to be closer than a simple BLOCK division to the optimum division determined from the calculation times measured at run time. Compared with the first embodiment, the second embodiment can therefore reduce the amount of data communication associated with load redistribution.

[0102] Next, a third embodiment of the present invention will be described.

[0103] In the first embodiment, the optimum load distribution determination unit 113 considers the processing R subject to load redistribution. It causes load redistribution to be executed when redistribution would reduce the calculation time of one execution of R by more than the time required for the accompanying data communication. For the program of FIG. 4, used to describe the operation of the first embodiment, the processing R is the inner loop (7). That is, load redistribution is executed when the reduction in the calculation time of one execution of the inner loop (7) exceeds the time required for the associated data communication.

[0104] The third embodiment applies when the processing R subject to load balancing is executed multiple times over the entire execution of the program. In that case, the optimum load distribution determination unit 113 compares the reduction in calculation time over those multiple executions of R with the time required for the data communication accompanying load redistribution.

[0105] In the program of FIG. 4, the inner loop (7) inside the outer loop (4) is executed 5000 times over the entire execution of the program. If the calculation speed ratio between the PEs for the inner loop (7) stays nearly constant throughout the program's execution, load redistribution shortens the calculation time of every subsequent execution of the inner loop (7). Suppose load redistribution is performed after the inner loop (7) has been executed once, and let Tcalc be the reduction in calculation time per execution relative to execution with the initial division. The total reduction over the remaining 4999 executions is then ideally Tcalc × 4999.

[0106] To obtain the total reduction, the optimum load distribution determination unit 113 must know the number of repetitions of the inner loop (7). In this embodiment, it obtains that number as follows. When generating the converted program 106, the program conversion unit 105 inserts a recording process that writes the number of repetitions of the inner loop (7) to the monitoring information 112. This process runs as part of the program. The optimum load distribution determination unit 113 then reads the repetition count of the inner loop (7) from the monitoring information 112.

[0107] Let Tcomm be the time required for the data communication accompanying load redistribution. Equation (8) then gives Tall, the reduction in total processing time (calculation plus communication) over the entire execution of the program, relative to execution with the initial division.

[0108]
$$T_{\mathrm{all}} = T_{\mathrm{calc}} \times 4999 - T_{\mathrm{comm}} \tag{8}$$

[0109] In the third embodiment, load redistribution is executed when Tall > 0. In general, consider a program that repeats the processing R subject to load redistribution c times, with load redistribution performed after d executions of R. Load redistribution is executed if Tall given by equation (9) is greater than 0.

[0110]
$$T_{\mathrm{all}} = T_{\mathrm{calc}} \times (c - d) - T_{\mathrm{comm}} \tag{9}$$
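Equation (9) turns the decision into a single sign test. In the sketch below, the function names are hypothetical. The worked numbers combine Tcalc = 180 and Tcomm = 100 from the first embodiment with the 5000 iterations of FIG. 4. That combination is our illustration, not an example given in the source.

```python
def total_gain(tcalc, tcomm, c, d):
    """Equation (9): net saving when R runs c times and rebalancing follows run d."""
    return tcalc * (c - d) - tcomm

def should_rebalance(tcalc, tcomm, c, d):
    return total_gain(tcalc, tcomm, c, d) > 0

# Tcalc = 180, Tcomm = 100, inner loop (7) run 5000 times, rebalanced after the first run
print(total_gain(180, 100, 5000, 1))        # 899720
print(should_rebalance(180, 100, 5000, 1))  # True
print(should_rebalance(10, 100, 5, 1))      # False (10 * 4 - 100 < 0)
```

With c − d = 1, the test reduces to the first embodiment's condition Tcalc > Tcomm.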

[0111] In the first embodiment, load redistribution is executed when, for a single execution of the processing R subject to load redistribution, the reduction in calculation time exceeds the time required for data communication. In the third embodiment, load redistribution is executed when the total reduction in the calculation time of R over the entire execution of the program exceeds the time required for data communication. When R is executed multiple times, the third embodiment can therefore shorten the combined calculation and communication time over the whole program execution compared with the first embodiment.

[0112] Next, a fourth embodiment of the present invention will be described.

[0113] In the first embodiment, the optimum load distribution determination unit 113 and the load redistribution execution unit 114 perform the processing shown in the flowchart of FIG. 3 only once, for example when the processing R subject to load balancing is first executed.

[0114] In the fourth embodiment, by contrast, the optimum load distribution determination unit 113 and the load redistribution execution unit 114 perform the processing shown in the flowchart of FIG. 3 each time the state of the computer system 109 changes so that the calculation speed ratio between the PEs changes. Examples of such state changes include the termination of another executable program running on the computer system 109 and the start of a new executable program.

[0115] In the fourth embodiment, the calculation time measuring unit 116 measures, for each PE, the calculation time required to execute the processing R each time the PE has executed R a predetermined number of times (once or more). It records each measurement in the monitoring information 112. The optimum load distribution determination unit 113 continuously monitors the monitoring information 112. When new calculation times are recorded, it computes for each PE the difference between the previously recorded calculation time and the newly recorded one. If the difference exceeds a predetermined threshold for even one PE, the processing shown in the flowchart of FIG. 3 is started.

Step 34 of FIG. 3 uses the time Uij required to transfer data of width Dij along the divided dimension from PEi to PEj. In this embodiment, the communication speed between PEs is predicted from the inter-PE communication time and communication volume recorded in the monitoring information 112, and Uij is predicted from this predicted speed and the width Dij of the data to be transferred. The inter-PE communication times and volumes are measured, for example, by the load redistribution execution unit 114 when it redistributes the load, and are recorded in the monitoring information 112. Before the first redistribution, however, Uij is determined as in the first embodiment.
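The trigger and the Uij prediction in [0115] can be sketched as two small functions. Both names are hypothetical. The sketch assumes that "difference" means the absolute change in a PE's calculation time and that transfer speed is volume divided by time from a previous measurement; the source does not fix either detail.

```python
def needs_rebalance(prev_times, new_times, threshold):
    """True if any PE's calculation time changed by more than `threshold`."""
    return any(abs(new - old) > threshold
               for old, new in zip(prev_times, new_times))

def predict_uij(measured_time, measured_volume, width):
    """Predict Uij from a past transfer: speed = volume / time, Uij = width / speed."""
    speed = measured_volume / measured_time
    return width / speed

prev = [100, 150, 75, 300]
print(needs_rebalance(prev, [105, 145, 80, 295], 50))  # False: all changes are 5
print(needs_rebalance(prev, [100, 150, 75, 180], 50))  # True: PE4 changed by 120
print(predict_uij(2.0, 400, 150))                      # 0.75
```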

[0116] In the fourth embodiment, the state of the computer system may change during program execution and alter the calculation speed ratio between the PEs. Because load redistribution is executed after each such change, an appropriate load distribution can be maintained.

[0117]

EFFECTS OF THE INVENTION
The first effect is that appropriate load distribution can be achieved on computer systems for which it is difficult to know the calculation speed ratio of the PEs before the program runs. This applies in particular to distributed memory computer systems that combine multiple PEs of different types and performance. The effect is obtained because the optimum load distribution determination unit calculates data division widths from the PE calculation speed ratio, using monitoring information measured during program execution. The load redistribution execution unit then redistributes the load (rearranges the data) according to the calculated widths.

[0118] The second effect is that appropriate load distribution can be achieved on computer systems whose load conditions change during program execution. While the program runs, the optimum load distribution determination unit monitors changes in the execution time of the processing subject to load balancing. When necessary, it determines new optimum data division widths and the load redistribution execution unit redistributes the load, and this can be repeated multiple times.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing the configuration of the first, second, third, and fourth embodiments of the present invention.

FIG. 2 is a block diagram showing an example configuration of a distributed memory computer system on which the present invention is executed.

FIG. 3 is a flowchart showing an example of the processing performed by the optimum load distribution determination unit 113 and the load redistribution execution unit 114 of the present invention.

FIG. 4 is a diagram showing an example of a source program 101 used to explain the operation of the first embodiment.

FIG. 5 is a diagram showing an example of the converted program 106 generated from the source program 101 of FIG. 4.

FIG. 6 is a diagram showing the parameters that the optimum load distribution determination unit 113 uses in its calculations, in the description of the operation of the first embodiment.

FIG. 7 is a diagram showing the widths of inter-PE data communication calculated by the optimum load distribution determination unit 113, in the description of the operation of the first embodiment.

FIG. 8 is a conceptual diagram illustrating the inter-PE data communication calculated by the optimum load distribution determination unit, in the description of the operation of the first embodiment.

FIG. 9 is an example for explaining the concept of BLOCK division.

FIG. 10 is an example for explaining the concept of GEN_BLOCK division.

FIG. 11 is an example of an HPF program using GEN_BLOCK division.

EXPLANATION OF REFERENCE NUMERALS

101 ... Source program
102 ... Runtime load distribution system
103 ... Monitoring program conversion unit
104 ... Program analysis unit
105 ... Program conversion unit
106 ... Converted program
107 ... Parallelization processing unit
108 ... Executable program generation unit
109 ... Computer system
110 ... Executable program
111 ... Runtime load distribution control unit
112 ... Monitoring information
113 ... Optimum load distribution determination unit
114 ... Load redistribution execution unit
115 ... Load distribution information
116 ... Calculation time measuring unit
21-1 to 21-n ... Processor elements (PE)
22 ... Interconnection network
K ... Recording medium

Claims

[Claims]

1. A runtime load distribution system which, during execution of a data parallel program, obtains an optimum data division width for dividing and arranging data among a plurality of processor elements executing the data parallel program, based on monitoring information of each of the plurality of processor elements, and relocates the data based on the obtained data division width.

2. A runtime load distribution system comprising: a monitoring program conversion unit that modifies a data parallel program so that monitoring information can be collected; an optimum load distribution determining unit that, during execution of the modified data parallel program, calculates an optimum data division width for dividing and arranging data among a plurality of processor elements executing the modified data parallel program, based on monitoring information of each of the plurality of processor elements; and a load redistribution execution unit that relocates the data based on the data division width calculated by the optimum load distribution determining unit.

3. The runtime load distribution system according to claim 1, wherein the monitoring information of each processor element is the computation time that the processor element required to execute the processing specified by a load distribution control directive in the data parallel program, and whether to relocate the data is determined based on (a) the reduction in computation time for the processing specified by the load distribution control directive that relocating the data would achieve and (b) the communication time required to relocate the data.

4. The runtime load distribution system according to claim 2, further comprising a computation time measurement unit that, during execution of the modified data parallel program, collects as the monitoring information of each processor element the computation time that the processor element required to execute the processing specified by a load distribution control directive in the data parallel program, wherein the optimum load distribution determining unit is configured to determine whether to relocate the data based on (a) the reduction in computation time for the processing specified by the load distribution control directive that relocating the data would achieve and (b) the communication time required to relocate the data.

5. The runtime load distribution system according to claim 1 or 3, wherein the data width of the data initially divided among the processor elements is set according to an approximate computation speed ratio between the processor elements that is known in advance.

6. The runtime load distribution system according to claim 2 or 4, wherein the monitoring program conversion unit is configured to determine the data width of the data initially divided among the processor elements according to an approximate computation speed ratio between the processor elements that is known in advance.

7. The runtime load distribution system according to claim 1, wherein, when processing subject to load distribution is executed a plurality of times, the system predicts the reduction in computation time for that processing over the entire execution of the data parallel program that relocating the data would achieve, and determines whether to relocate the data based on the predicted reduction and the communication time required to relocate the data.

8. The runtime load distribution system according to claim 2, wherein, when processing subject to load distribution is executed a plurality of times, the optimum load distribution determining unit is configured to predict the reduction in computation time for that processing over the entire execution of the data parallel program that relocating the data would achieve, and to determine whether to relocate the data based on the predicted reduction and the communication time required to relocate the data.

9. The runtime load distribution system according to claim 7 or 8, wherein the processing subject to load distribution is processing specified by a load distribution control directive in the data parallel program, and the monitoring information of each processor element is the computation time that the processor element required to execute the processing specified by the load distribution control directive.

10. The runtime load distribution system according to claim 1, wherein changes in execution time are monitored and, as necessary, the data division width is determined and the data is relocated a plurality of times.

11. The runtime load distribution system according to claim 2, wherein the optimum load distribution determining unit is configured to monitor changes in execution time and, as necessary, determine the data division width a plurality of times, and the load redistribution execution unit is configured to relocate the data according to the determined data division width each time the optimum load distribution determining unit determines a data division width.

12. A runtime load distribution method comprising: modifying a data parallel program so that monitoring information can be collected; during execution of the modified data parallel program, calculating an optimum data division width for dividing and arranging data among a plurality of processor elements executing the modified data parallel program, based on monitoring information of each of the plurality of processor elements; and relocating the data based on the calculated data division width.

13. The runtime load distribution method according to claim 12, wherein the monitoring information of each processor element is the computation time that the processor element required to execute the processing specified by a load distribution control directive in the data parallel program, and whether to relocate the data is determined based on (a) the reduction in computation time for the processing specified by the load distribution control directive that relocating the data would achieve and (b) the communication time required to relocate the data.

14. The runtime load distribution method according to claim 12 or 13, wherein the data width of the data initially divided among the processor elements is set according to an approximate computation speed ratio between the processor elements that is known in advance.

15. The runtime load distribution method according to claim 12, wherein, when processing subject to load distribution is executed a plurality of times, the reduction in computation time for that processing over the entire execution of the data parallel program that relocating the data would achieve is predicted, and whether to relocate the data is determined based on the predicted reduction and the communication time required to relocate the data.

16. The runtime load distribution method according to claim 15, wherein the processing subject to load distribution is processing specified by a load distribution control directive in the data parallel program, and the monitoring information of each processor element is the computation time that the processor element required to execute the processing specified by the load distribution control directive.

17. The runtime load distribution method according to claim 12, wherein changes in execution time are monitored, the data division width is determined a plurality of times as necessary, and the data is relocated according to the determined data division width each time a data division width is determined.

18. A program causing a computer to execute: a monitoring program conversion process of modifying a data parallel program so that monitoring information can be collected; an optimum load distribution determining process of, during execution of the modified data parallel program, calculating an optimum data division width for dividing and arranging data among a plurality of processor elements executing the modified data parallel program, based on monitoring information of each of the plurality of processor elements; and a load redistribution execution process of relocating the data based on the data division width calculated in the optimum load distribution determining process.

19. The program according to claim 18, wherein the monitoring information of each processor element is the computation time that the processor element required to execute the processing specified by a load distribution control directive in the data parallel program, and the optimum load distribution determining process determines whether to relocate the data based on (a) the reduction in computation time for the processing specified by the load distribution control directive that relocating the data would achieve and (b) the communication time required to relocate the data.

20. The program according to claim 18 or 19, wherein the monitoring program conversion process determines the data width of the data initially divided among the processor elements according to an approximate computation speed ratio between the processor elements that is known in advance.

21. The program according to claim 18, wherein, when processing subject to load distribution is executed a plurality of times, the optimum load distribution determining process predicts the reduction in computation time for that processing over the entire execution of the data parallel program that relocating the data would achieve, and determines whether to relocate the data based on the predicted reduction and the communication time required to relocate the data.

22. The program according to claim 18, wherein the optimum load distribution determining process monitors changes in execution time and, as necessary, determines the data division width a plurality of times, and the load redistribution execution process relocates the data according to the determined data division width each time a data division width is determined in the optimum load distribution determining process.
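The claims describe the relocation criterion only in words. Data is relocated when the computation time it would save outweighs the communication time the relocation itself costs. Per claims 7, 15 and 21, the saving is counted over every remaining execution of the load-distributed processing.

This section does not reproduce the formula that the optimum load distribution determining unit 113 actually uses (the parameters of FIG. 6). The following Python sketch therefore rests on several assumptions, not on the patent's own method:

- Each PE's speed is its current data width divided by its measured computation time.
- The optimum widths are proportional to those speeds, so all PEs are predicted to finish together.
- Data is held in contiguous GEN_BLOCK-style blocks.
- Communication time is the number of elements that change owner multiplied by a fixed per-element cost.

All function and parameter names are hypothetical.

```python
def proportional_widths(total: int, speeds: list[float]) -> list[int]:
    """Split `total` elements into per-PE widths proportional to `speeds`.

    Largest-remainder rounding keeps the widths summing exactly to `total`.
    """
    if any(s <= 0 for s in speeds):
        raise ValueError("speeds must be positive")
    s_sum = sum(speeds)
    exact = [total * s / s_sum for s in speeds]
    widths = [int(x) for x in exact]  # floor, since all values are non-negative
    leftover = total - sum(widths)
    # Hand the leftover elements to the PEs with the largest fractional parts.
    by_fraction = sorted(range(len(speeds)),
                         key=lambda i: exact[i] - widths[i], reverse=True)
    for i in by_fraction[:leftover]:
        widths[i] += 1
    return widths


def block_bounds(widths: list[int]) -> list[tuple[int, int]]:
    """Half-open global index range [start, end) owned by each PE."""
    bounds, start = [], 0
    for w in widths:
        bounds.append((start, start + w))
        start += w
    return bounds


def moved_elements(old: list[int], new: list[int]) -> int:
    """Count elements whose owning PE changes between two block layouts."""
    kept = 0
    for (a0, a1), (b0, b1) in zip(block_bounds(old), block_bounds(new)):
        kept += max(0, min(a1, b1) - max(a0, b0))  # overlap stays in place
    return sum(old) - kept


def decide_redistribution(widths: list[int], times: list[float],
                          remaining_runs: int, cost_per_element: float):
    """Return (widths_to_use, predicted_saving, communication_time).

    Assumed model: relocate only if the saving over the remaining runs
    exceeds the (aggregate) communication time of the relocation.
    """
    if any(w <= 0 for w in widths) or any(t <= 0 for t in times):
        raise ValueError("widths and measured times must be positive")
    # Measured speed of each PE, in elements per unit time.
    speeds = [w / t for w, t in zip(widths, times)]
    candidate = proportional_widths(sum(widths), speeds)
    # The slowest PE determines the time of one execution of the processing.
    current = max(times)
    predicted = max(w / s for w, s in zip(candidate, speeds))
    saving = (current - predicted) * remaining_runs
    comm = moved_elements(widths, candidate) * cost_per_element
    if saving > comm:
        return candidate, saving, comm
    return list(widths), saving, comm


if __name__ == "__main__":
    # Four PEs start from an even BLOCK split; PEs 2 and 3 turn out slower.
    new, saving, comm = decide_redistribution([100] * 4, [1.0, 1.0, 2.0, 4.0],
                                              remaining_runs=10,
                                              cost_per_element=0.01)
    print(new)  # [146, 145, 73, 36]
```

In this sketch, `proportional_widths` also covers the initial division of claims 5, 6, 14 and 20 when it is called with the approximate speed ratio known in advance. Calling `decide_redistribution` again whenever new measurements arrive mirrors the repeated determination of claims 10, 11, 17 and 22. A real implementation would probably model communication per PE pair rather than as one aggregate cost.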