JP2001005792A - Method for deciding paralleled loop

Info

Publication number: JP2001005792A
Application number: JP11174972A
Authority: JP
Inventors: Makoto Sato; 真琴佐藤
Original assignee: Hitachi Ltd; Real World Computing Partnership
Current assignee: Hitachi Ltd; Real World Computing Partnership
Priority date: 1999-06-22
Filing date: 1999-06-22
Publication date: 2001-01-12

Abstract

PROBLEM TO BE SOLVED: To reduce the number of combinations of parallelized-loop candidates and to permit flexible loop partitioning across multiple loop nests without resorting to approximation. SOLUTION: A reference pattern detection processing unit 104 detects, for each parallelizable loop, the data reference pattern of array data in the loop with respect to the loop control variable. A loop class graph creation processing unit 105 gathers loops having the same data reference pattern into one class. A parallelized loop determination processing unit 106 determines, for each loop in a loop class, the range assigned to each processor from the execution range of the loop, evaluates the execution time when each loop is parallelized according to that range, and determines the optimal parallelized loop.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a parallelizing tool or parallelizing compiler that takes a source program as input and outputs either a program containing directives for partitioning loops for a parallel computer or parallelized object code. In particular, it relates to a parallelized loop determination method suited to efficiently selecting the optimal candidate when loops are parallelized.

[0002]

[Prior Art] Conventional parallelizing tools and parallelizing compilers that take a source program as input and output, mainly for shared-memory parallel computers, either a program containing directives for partitioning loops or parallelized object code have determined the parallelized loop by looking only within a single loop nest, as discussed in, for example, Constantine D. Polychronopoulos, David J. Kuck, and David A. Padua, "Utilizing Multidimensional Loop Parallelism on Large-Scale Parallel Processor Systems," IEEE Transactions on Computers, vol. 38, no. 9, September 1989, pp. 1285-1296 (first prior art).

[0003] In addition, as discussed in D. J. Palermo and P. Banerjee, "Automatic selection of dynamic partitioning schemes for distributed-memory multicomputers," in C.-H. Huang et al. (Eds.), Proceedings of the 8th Annual Workshop on Languages and Compilers for Parallel Computing, Columbus, Ohio, August 1995, Lecture Notes in Computer Science 1033, Springer Verlag, pp. 392-406, the parallelized loop has been determined by solving the data partitioning problem approximately (second prior art).

[0004]

[Problems to Be Solved by the Invention] The problem with the first prior art is that it parallelizes each loop nest in isolation, without considering loop parallelization across multiple loop nests. Consequently, when data written by one processor in a preceding loop nest is read by another processor in a subsequent loop nest, inter-processor synchronization is required between the loop nests, and program execution is slowed by the time that synchronization takes.

[0005] The second prior art, for its part, does not always obtain the best solution, because it solves the problem approximately and because its data partitioning patterns are fixed, leaving no flexibility in how loops are partitioned. As a result, program execution is slowed.

[0006] An object of the present invention is to solve these problems of the prior art by providing a parallelized loop determination method that makes it possible to determine the optimal parallel loops in a short analysis time while taking loop parallelization across multiple loop nests into account, to determine the optimal parallel loops in a short analysis time without using approximation, and to determine the optimal parallel loops by partitioning loops more flexibly.

[0007]

[Means for Solving the Problems] To achieve the above object, the parallelized loop determination method of the present invention is a method by which a parallelizing tool or parallelizing compiler determines the loops to parallelize, the tool or compiler taking a source program as input, detecting parallelizable loops, and outputting either a program containing directives that partition the detected parallelizable loops for a parallel computer or parallelized object code. The method is characterized in that: a reference pattern detection processing unit detects, for each parallelizable loop, the data reference pattern of the data in the loop with respect to the loop control variable; a loop class graph creation processing unit places loops having the same data reference pattern in one class and creates a loop class graph representing the control flow relationships among those classes; and a parallelized loop determination processing unit uses the loop class graph to determine, for every loop in a loop class, the range of data that each processor is responsible for computing, evaluates the execution time when each loop is parallelized according to that assignment, and selects the optimal parallelized loop from the resulting evaluation values.

[0008]

[Embodiments of the Invention] Embodiments of the present invention are described below in detail with reference to the drawings. FIG. 1 is a block diagram showing a configuration example of a system that executes the parallelized loop determination method of the present invention, and FIG. 16 is a block diagram showing a hardware configuration example of the system in FIG. 1.

[0009] In FIG. 16, 1 is a display device, such as a CRT (Cathode Ray Tube), that displays characters and images; 2 is an input device, such as a keyboard and mouse, that accepts instructions from the operator; 3 is an external storage device, such as an HDD (Hard Disk Drive), that stores large volumes of data and programs; 4 is an information processing device that has a CPU (Central Processing Unit) and main memory and performs stored-program computer processing; 5 is an optical disk serving as a recording medium on which the programs and data for the processing procedure of the present invention are recorded; and 6 is a drive that, on instructions from the information processing device 4, reads the data and programs on the optical disk 5 that are to be stored in the external storage device 3.

[0010] By loading into main memory the data and programs from the optical disk 5 that have been stored in the external storage device 3, the information processing device 4 constitutes the parallelizing compiler 100, which consists of the processing units shown in FIG. 1.

[0011] In FIG. 1, 100 is the parallelizing compiler that performs the processing according to the present invention, and 110 is the source program to be parallelized. The parallelizing compiler 100 consists of a syntax analysis processing unit 101, a data dependence analysis processing unit 102, a parallelism analysis processing unit 103, a reference pattern detection processing unit 104, a loop class graph creation processing unit 105, a parallelized loop determination processing unit 106, a loop parallelization processing unit 107, and a code generation processing unit 108.

[0012] The syntax analysis processing unit 101 takes the source program 110 as input and outputs a dictionary 120 and an intermediate language 130. The data dependence analysis processing unit 102 takes the dictionary 120 and the intermediate language 130 as input, analyzes data dependences, and also outputs a loop table 140.

[0013] The parallelism analysis processing unit 103 takes as input the dictionary 120, the intermediate language 130, and the data dependence analysis information from the data dependence analysis processing unit 102, determines whether each loop of the program can be parallelized, and outputs the result to the loop table 140. The processing of the syntax analysis processing unit 101, the data dependence analysis processing unit 102, and the parallelism analysis processing unit 103 described so far is based on the prior art.

[0014] The reference pattern detection processing unit 104 takes as input information from the dictionary 120, the intermediate language 130, and the loop table 140. For the subscripts of each array reference appearing in a loop that the parallelism analysis processing unit 103 has judged parallelizable, it detects the pattern in which the loop control variables of the loops enclosing that array reference appear, and outputs the result to a reference pattern table 150.

[0015] The loop class graph creation processing unit 105 takes as input information from the loop table 140 and the reference pattern table 150, classifies the loops into the loop classes described later, and outputs a loop class graph 160 in which each loop class is one node.

[0016] The parallelized loop determination processing unit 106 takes as input information from the intermediate language 130, the loop table 140, and the loop class graph 160. By treating the loops contained in a loop class as if they were a single loop, it reduces the number of combinations of parallel loop candidates. It then evaluates the program execution time for each of the remaining combinations, selects the combination of parallel loops evaluated as having the shortest execution time, and outputs the result to the loop table 140. The processing of the reference pattern detection processing unit 104, the loop class graph creation processing unit 105, and the parallelized loop determination processing unit 106 described above is the subject of the present invention.

[0017] The loop parallelization processing unit 107, based on the prior art, takes as input information from the dictionary 120, the intermediate language 130, and the loop table 140, applies the parallelizing transformation to the loops selected for parallelization, and reflects the result in the dictionary 120 and the intermediate language 130. The code generation processing unit 108, likewise based on the prior art, takes the dictionary 120 and the intermediate language 130 as input and outputs a parallelized program 180 at the source program level or the object code level.

[0018] The loop class graph creation processing unit 105 consists of a subdivision table creation processing unit 1051, a loop class detection processing unit 1052, and an edge connection processing unit 1053. The subdivision table creation processing unit 1051 takes the loop table 140 as input, creates a table for each innermost loop in a loop nest, connects those tables according to the control flow among the loops that contain the innermost loops, and outputs the result as a subdivided loop table 170.

[0019] The loop class detection processing unit 1052 takes as input information from the loop table 140, the reference pattern table 150, and the subdivided loop table 170, attaches loop tables that are in an equivalence relation to a loop class table, which is a node of the loop class graph 160, and outputs the loop class graph 160 in a state where its nodes are not yet connected by edges.

[0020] The edge connection processing unit 1053 takes as input information from the loop table 140, the subdivided loop table 170, and the loop class graph 160, and outputs the loop class graph 160 with its nodes connected by edges.

[0021] A specific example of the operation of the parallelizing compiler 100 configured in this way is described with reference to FIGS. 2 to 15. FIG. 7 is an explanatory diagram showing a specific example of the source program in FIG. 1. In FIG. 7, line 700 is a declaration, in Fortran syntax, of the arrays a, b, and c and the scalar variables x and y.

[0022] The part enclosed by lines 701 and 712 is a loop whose loop control variable is k. Hereinafter, a loop is identified by the line number of its first line, so this loop is called loop 701. The part enclosed by lines 702 and 706 is loop 702, whose loop control variable is j, and the part enclosed by lines 703 and 705 is loop 703, whose loop control variable is i. Likewise, the part enclosed by lines 707 and 711 is loop 707, whose loop control variable is i, and the part enclosed by lines 708 and 710 is loop 708, whose loop control variable is j.

[0023] The individual processes within the parallelizing compiler 100 are described below. The syntax analysis processing unit 101 takes the source program 110 as input and generates the dictionary 120 and the intermediate language 130. Because the intermediate language 130 corresponds to the source program 110, the following description uses the source program 110 of FIG. 7 as a source-program-style representation of the intermediate language 130. Details of the dictionary 120 are omitted.

[0024] The detailed operation of the data dependence analysis processing unit 102 and the parallelism analysis processing unit 103 is explained in Hans Zima and Barbara Chapman, "Supercompilers for Parallel and Vector Computers," Addison-Wesley, 1991, and is therefore not described here.

[0025] The reference pattern detection processing unit 104 takes as input the dictionary 120, the intermediate language 130, and the loop table 140. For the subscripts of each array reference appearing in a loop that the parallelism analysis processing unit 103 has judged parallelizable, it detects the pattern in which the loop control variables of the loops enclosing that array reference appear, and outputs the result to the reference pattern table 150.

[0026] An example of the processing performed by the reference pattern detection processing unit 104 is described below. FIG. 2 is an explanatory diagram showing an example of the processing operation of the reference pattern detection processing unit in FIG. 1, and FIG. 8 is an explanatory diagram showing a configuration example of the reference pattern table in FIG. 1. The example in FIG. 8 is the reference pattern table 150 obtained when the reference pattern detection processing unit 104 of FIG. 1 applies the processing of FIG. 2 to the input program 110 of FIG. 7.

[0027] In FIG. 8, the reference pattern tables 800, 810, 820, 830, and 840 are the reference pattern tables for array a in loops 701, 702, 703, 707, and 708 of FIG. 7, respectively. The fields of such a reference pattern table are explained using table 800 as an example.

[0028] Fields 801, 803, and 805 store values representing the characteristics of the subscripts in the first, second, and third dimensions of array a, respectively. A subscript characteristic takes one of three values: "lin", "inv", or "oth". They indicate, respectively, that the subscript is a linear expression of the loop control variable of the current loop (lin), that the subscript is invariant in the current loop (inv), and that neither of these holds (oth).

[0029] Fields 802, 804, and 806 store the subscript itself for the first, second, and third dimensions of array a, respectively, when the corresponding characteristic value is "lin" or "inv". When the value is "oth", the field is always NULL.
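
As a minimal sketch of the table layout described in paragraphs [0028] and [0029], each dimension of an array reference can be modelled as a (characteristic, subscript) pair. The Python names used here (`Kind`, `DimEntry`, `make_entry`) are illustrative assumptions and do not come from the patent; `None` stands in for NULL.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    LIN = "lin"  # subscript is a linear expression of the current loop's control variable
    INV = "inv"  # subscript is invariant in the current loop
    OTH = "oth"  # neither of the above


@dataclass(frozen=True)
class DimEntry:
    kind: Kind                # e.g. field 801, 803, or 805
    subscript: Optional[str]  # e.g. field 802, 804, or 806; None (NULL) for "oth"


def make_entry(kind: Kind, subscript: str) -> DimEntry:
    # The subscript is kept only for "lin" and "inv"; "oth" always stores NULL.
    return DimEntry(kind, None if kind is Kind.OTH else subscript)
```

Storing the subscript only for "lin" and "inv" mirrors the rule that the subscript field is always NULL for "oth".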

[0030] The following describes the result of applying the processing of the reference pattern detection processing unit 104 to array a in two loops of the input program 110 of FIG. 7, loop 703 and loop 702. First, the processing is applied to array a in loop 703.

[0031] When the processing of step 200 in FIG. 2 is applied to the first dimension of array a, the first-dimension subscript i is a linear expression of the loop control variable of the current loop 703. This corresponds to case (1) of step 200, so the value "lin" and the subscript "i" are set for the first dimension of reference pattern table 820 in FIG. 8. These correspond to fields 821 and 822, respectively.

[0032] Similarly, when the processing of step 200 in FIG. 2 is applied to the second dimension of array a, the second-dimension subscript j is invariant with respect to the current loop 703. This corresponds to case (2) of step 200, so the value "inv" and the subscript "j" are set for the second dimension of reference pattern table 820 in FIG. 8. These correspond to fields 823 and 824, respectively.

[0033] Similarly, when the processing of step 200 in FIG. 2 is applied to the third dimension of array a, the third-dimension subscript k is invariant with respect to the current loop 703. This corresponds to case (2) of step 200, so the value "inv" and the subscript "k" are set for the third dimension of reference pattern table 820 in FIG. 8. These correspond to fields 825 and 826, respectively.

[0034] Next, the processing of the reference pattern detection processing unit 104 shown in FIG. 2 is applied to array a in loop 702 of the source program 110 shown in FIG. 7. When the processing of step 200 in FIG. 2 is applied to the first dimension of array a, the first-dimension subscript i takes the values 1 through 30 during a single iteration of the current loop 702, as loop 703 shows.

[0035] The subscript is therefore neither a linear expression of the loop control variable of the current loop 702 nor loop-invariant. This corresponds to case (3) of step 200, so the value "oth" is set for the first dimension of reference pattern table 810 in FIG. 8. The first dimension corresponds to fields 811 and 812; because no subscript is set, field 812 is NULL.

[0036] Next, when the processing of step 200 in FIG. 2 is applied to the second dimension of array a, the second-dimension subscript j is a linear expression of the loop control variable of the current loop 702. This corresponds to case (1) of step 200, so the value "lin" and the subscript "j" are set for the second dimension of reference pattern table 810 in FIG. 8. These correspond to fields 813 and 814, respectively.

[0037] Similarly, when the processing of step 200 in FIG. 2 is applied to the third dimension of array a, the third-dimension subscript k is invariant with respect to the current loop 702. This corresponds to case (2) of step 200, so the value "inv" and the subscript "k" are set for the third dimension of reference pattern table 810 in FIG. 8. These correspond to fields 815 and 816, respectively. The processing is exactly the same for the other loops.
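
The worked example for loops 703 and 702 can be reproduced with a short sketch. It rests on a loud simplifying assumption: every subscript is a bare variable name, as in the reference a(i, j, k) discussed above, so "linear in the current loop variable" reduces to a name comparison. A real implementation would need symbolic analysis of general subscript expressions. The function names and the `inner_vars` parameter are hypothetical.

```python
def classify(subscript, current_var, inner_vars):
    """Classify one bare-variable subscript relative to the current loop.

    Assumption: subscripts are single variable names; inner_vars holds the
    control variables of loops nested inside the current loop.
    """
    if subscript == current_var:
        return ("lin", subscript)  # case (1): linear in the current loop variable
    if subscript in inner_vars:
        return ("oth", None)       # case (3): varies within one iteration
    return ("inv", subscript)      # case (2): invariant in the current loop


def reference_pattern(subscripts, current_var, inner_vars):
    # One (characteristic, subscript) pair per array dimension.
    return [classify(s, current_var, inner_vars) for s in subscripts]


# Loop 703: control variable i, no loops nested inside it.
table_820 = reference_pattern(["i", "j", "k"], "i", set())
# Loop 702: control variable j, encloses loop 703 over i.
table_810 = reference_pattern(["i", "j", "k"], "j", {"i"})
```

Under these assumptions, `table_820` and `table_810` hold the same entries that the text assigns to fields 821 through 826 and 811 through 816.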

[0038] In this way, for the subscripts of each array reference appearing in a parallelizable loop, the reference pattern detection processing unit 104 detects the pattern in which the loop control variables of the loops enclosing that array reference appear, and outputs the result to the reference pattern table 150.

[0039] Next, the operation of the loop class graph creation processing unit 105 of FIG. 1 is described. The loop class graph creation processing unit 105 consists of the subdivision table creation processing unit 1051, the loop class detection processing unit 1052, and the edge connection processing unit 1053. The subdivision table creation processing unit 1051 takes the loop table 140 as input, creates a table for each innermost loop in a loop nest, connects those tables according to the control flow among the loops that contain the innermost loops, and outputs the result as the subdivided loop table 170.

[0040] The processing of the subdivision table creation processing unit 1051 is described below with reference to FIGS. 3 and 9. FIG. 3 is a flowchart showing an example of the processing operation of the subdivision table creation processing unit in FIG. 1, and FIG. 9 is an explanatory diagram showing a configuration example of the loop table in FIG. 1. In FIG. 9, 900, 920, 930, 940, and 950 are the loop tables corresponding to loops 702, 703, 707, 701, and 708, respectively. Of these, only loop table 900 is shown with its fields in detail.

[0041] The fields of a loop table are explained below using loop table 900. Field 901 is a pointer to the loop table for the loop immediately enclosing the current loop. In FIG. 7, the loop immediately enclosing loop 702 is loop 701, so field 901 holds a pointer to loop table 940, which corresponds to loop 701.

[0042] Field 902 is a pointer to the loop table for the first of the loops nested immediately inside the current loop. In FIG. 7, the only loop immediately inside loop 702 is loop 703, so field 902 holds a pointer to loop table 920, which corresponds to loop 703.

[0043] Field 903 is a pointer to the loop table for the first loop that appears after the current loop in the direction of the program exit. In FIG. 7, the first loop that appears after loop 702 toward the program exit is loop 707, so field 903 holds a pointer to loop table 930, which corresponds to loop 707.

[0044] Field 904 points to the dictionary entry for the loop control variable. In FIG. 7, the loop control variable of loop 702 is j, so a pointer to the dictionary entry for j is stored. Field 905 points to the subdivided loop table described later. Because the subdivision table creation processing unit 1051 has not yet run, NULL is stored.

[0045] Fields 906, 907, and 908 hold the lower bound, upper bound, and stride of the loop, respectively. Fields 909, 910, and 911 hold the lower bound, upper bound, and stride of the template described later. Because these have not yet been set, NULL is stored.

[0046] Field 912 is a flag indicating whether the loop is parallelizable. Loop 702 in FIG. 7 is judged parallelizable by the parallelism analysis processing unit 103 of FIG. 1, so this value is TRUE.

[0047] Field 913 is a parallelization flag indicating whether the loop is actually to be parallelized. This decision is made later by the parallelized loop determination processing unit 106, so NULL is stored.

【００４８】次に、このようなループテーブル１４０を
入力して図１における細分ループテーブル１７０を作成
する細分テーブル作成処理部１０５１の処理動作を図３
を用いて説明する。まず、ステップ３００での処理によ
り、全ての最内側ループに対して細分ループテーブルを
作成し、それらを最内側ループに接続する。この動作を
図１０に示す。Next, the processing operation of the subdivision table creation processing unit 1051, which takes such a loop table 140 as input and creates the subdivision loop table 170 in FIG. 1, will be described with reference to FIG. 3. First, the processing in step 300 creates a subdivision loop table for every innermost loop and connects each one to its innermost loop. This operation is shown in FIG. 10.

【００４９】図１０は、図１における細分ループテーブ
ルとループテーブルとの関連を示す説明図である。本例
は、図１における細分テーブル作成処理部１０５１の出
力となる細分ループテーブル１７０を、入力となるルー
プテーブル１４０とともに示したものである。FIG. 10 is an explanatory diagram showing the relationship between the subdivision loop table and the loop table in FIG. In this example, a subdivision loop table 170 that is an output of the subdivision table creation processing unit 1051 in FIG. 1 is shown together with a loop table 140 that is an input.

【００５０】本図１０において、ループテーブル（９４
０，９００，９２０，９３０，９５０）間の実線の右向
きの矢印と実線の下向きの矢印は、各々、現在のループ
より一つ内側にあるループ群の内の、先頭ループに対応
するループテーブルを指すポインタと、現在のループか
らプログラムの出口方向に向かって最初に現れるループ
に対応するループテーブルを指すポインタを表わす。In FIG. 10, the solid right-pointing arrows and the solid downward arrows between the loop tables (940, 900, 920, 930, 950) represent, respectively, a pointer to the loop table corresponding to the first loop among the loops one level inside the current loop, and a pointer to the loop table corresponding to the loop that first appears from the current loop toward the exit of the program.

【００５１】１０１１，１０１２は各々、最内側ループ
７０３，７０８に対応する細分ループテーブルである。
最内側ループ７０３，７０８の各々に対応するループテ
ーブルは、９２０，９５０なので、図３におけるステッ
プ３００での処理により、細分ループテーブル１０１
１，１０１２は各々、ループテーブル９２０，９５０に
接続される。ポインタ１０３１，１０３２は各々、この
接続関係を表わしたものである。Reference numerals 1011 and 1012 denote the subdivision loop tables corresponding to the innermost loops 703 and 708, respectively. Since the loop tables corresponding to the innermost loops 703 and 708 are 920 and 950, respectively, the processing in step 300 in FIG. 3 connects the subdivision loop tables 1011 and 1012 to the loop tables 920 and 950, respectively. The pointers 1031 and 1032 each represent this connection.

【００５２】１０２１，１０２３は、細分ループテーブ
ル１０１１，１０１２の入口リストを模式的に表わした
ものである。同様にして、１０２２，１０２４は、細分
ループテーブル１０１１，１０１２の出口リストを模式
的に表わしたものである。これらの入口リストや出口リ
ストには、ループテーブルまたはループ類テーブルへの
ポインタが接続される。図１０は何も接続されていない
状態を表わしている。Reference numerals 1021 and 1023 schematically show entry lists of the subdivision loop tables 1011 and 1012. Similarly, reference numerals 1022 and 1024 schematically show exit lists of the subdivided loop tables 1011 and 1012. A pointer to a loop table or a loop type table is connected to these entry lists and exit lists. FIG. 10 shows a state where nothing is connected.

【００５３】次に、図３におけるステップ３０１での処
理により、細分ループテーブル１０１１，１０１２の組
に対して、各々が接続する最内側ループテーブルは９２
０，９５０であり、これらのループを共通に含むループ
の中で最も内側にあるループ（図７におけるループ７０
１）を検出する。これはループテーブル９２０，９５０
における９０１に相当するフィールドを、共通なループ
テーブル９４０に到達するまでたどることによって得ら
れる。Next, in step 301 in FIG. 3, for the pair of subdivision loop tables 1011 and 1012, the innermost loop tables to which they are connected are 920 and 950, and the innermost loop among those that contain both of these loops (loop 701 in FIG. 7) is detected. It is obtained by following the field corresponding to field 901 in the loop tables 920 and 950 until the common loop table 940 is reached.

【００５４】次に、図３におけるステップ３０２での処
理により、ステップ３０１の処理で得たループ７０１内
で、ループ７０１より１つだけ内側で、ループ７０３，
７０８を含むループとして、各々、ループ７０２，７０
７を得る。これはステップ３０１での処理において、ル
ープテーブル９２０，９５０におけるフィールド９０１
に相当するフィールドを、共通なループテーブル９４０
に到達するまでたどる際に、ループテーブル９４０の直
前にたどったループテーブルを記録しておくことによっ
て得られる。Next, in step 302 in FIG. 3, within the loop 701 obtained in step 301, the loops 702 and 707 are obtained as the loops that are exactly one level inside the loop 701 and contain the loops 703 and 708, respectively. They are obtained by recording, while following the field corresponding to field 901 in the loop tables 920 and 950 toward the common loop table 940 in step 301, the loop table visited immediately before the loop table 940.

【００５５】そして最後に、ステップ３０３の処理によ
り、ステップ３０２の処理で得たループ７０２からルー
プ７０７の方向に制御フローが存在するので、細分ルー
プテーブル１０１１から細分ループテーブル１０１２の
方向に、向きの付いたエッジで細分ループテーブル１０
１１と細分ループテーブル１０１２を接続する。図１０
における細分ループテーブル１０１１から細分ループテ
ーブル１０１２への実線の矢印はこのエッジを表わす。Finally, in step 303, because control flows from the loop 702 to the loop 707 obtained in step 302, the subdivision loop tables 1011 and 1012 are connected by a directed edge from the subdivision loop table 1011 to the subdivision loop table 1012. The solid arrow from the subdivision loop table 1011 to the subdivision loop table 1012 in FIG. 10 represents this edge.
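
Steps 301 and 302 amount to walking up the chain of enclosing loops from each innermost loop. The following is a minimal sketch of that walk, not the patent's implementation: the `LoopTable` class, its `parent` attribute (standing in for the field corresponding to field 901, assumed here to link a loop table to the table of its enclosing loop), and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(eq=False)
class LoopTable:
    """Hypothetical loop table; `parent` stands in for field 901 (assumed)."""
    loop_id: int
    parent: Optional["LoopTable"] = None


def chain_to_root(table: LoopTable) -> List[LoopTable]:
    """Follow `parent` links: [table, enclosing loop, ..., outermost loop]."""
    chain = []
    node: Optional[LoopTable] = table
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def common_loop(
    a: LoopTable, b: LoopTable
) -> Tuple[Optional[LoopTable], Optional[LoopTable], Optional[LoopTable]]:
    """Return (innermost loop enclosing both, loop visited just before it on
    a's chain, loop visited just before it on b's chain)."""
    chain_a = chain_to_root(a)
    chain_b = chain_to_root(b)
    for i, node in enumerate(chain_a):          # innermost candidates first
        for j, other in enumerate(chain_b):
            if node is other:
                prev_a = chain_a[i - 1] if i > 0 else None
                prev_b = chain_b[j - 1] if j > 0 else None
                return node, prev_a, prev_b
    return None, None, None


# Nesting as described in the text: 701 contains 702 and 707,
# 702 contains 703, and 707 contains 708.
t940 = LoopTable(701)
t900 = LoopTable(702, parent=t940)
t920 = LoopTable(703, parent=t900)
t930 = LoopTable(707, parent=t940)
t950 = LoopTable(708, parent=t930)

common, inner_a, inner_b = common_loop(t920, t950)
print(common.loop_id, inner_a.loop_id, inner_b.loop_id)
```

Recording the table visited just before the common one is what gives step 302 its result (loops 702 and 707) without a second traversal.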

【００５６】図１におけるループ類検出処理部１０５２
は、ループテーブル１４０、参照パターンテーブル１５
０および細分ループテーブル１７０を入力して同値な関
係にあるループテーブルをループ類グラフ１６０のノー
ドであるループ類テーブルに接続して、ノード間がエッ
ジで未接続な状態の主要なノードだけから成るループ類
グラフ１６０を出力する。以下、このループ類検出処理
部１０５２の処理動作を、図４および図１１を用いて説
明する。The loop type detection processing unit 1052 in FIG. 1 takes as input the loop table 140, the reference pattern table 150, and the subdivision loop table 170, connects loop tables that are in an equivalence relation to a loop type table, which is a node of the loop type graph 160, and outputs a loop type graph 160 consisting only of the main nodes, with no edges yet connecting the nodes. The processing operation of the loop type detection processing unit 1052 will be described below with reference to FIGS. 4 and 11.

【００５７】図４は、図１におけるループ類検出処理部
の処理動作例を示すフローチャートであり、図１１は、
図１におけるループテーブルと細分ループテーブルおよ
びループ類テーブル間の関係を表わした説明図である。
図１１において、１１０１〜１１０３はループ類を表わ
すループ類テーブルである。また、１１１１〜１１１３
は、ループ類テーブルにループテーブルを接続するため
のリストを模式的に表わしたものである。FIG. 4 is a flowchart showing an example of the processing operation of the loop type detection processing unit in FIG. 1, and FIG. 11 is an explanatory diagram showing the relationship among the loop tables, the subdivision loop tables, and the loop type tables in FIG. 1. In FIG. 11, reference numerals 1101 to 1103 denote loop type tables, each representing a loop type. Reference numerals 1111 to 1113 schematically show the lists used to connect loop tables to the loop type tables.

【００５８】これらのリストには、ループテーブルへの
ポインタが格納される。図中の実線の矢印は、矢印の始
点となるループ類テーブル中のあるフィールドに、矢印
の終点となるループテーブルへのポインタが格納されて
いることを示す。尚、ループ類テーブル間の矢印は、ル
ープ類グラフのエッジとは異なる。These lists store pointers to the loop table. The solid arrow in the drawing indicates that a pointer to the loop table that is the end point of the arrow is stored in a certain field in the loop type table that is the start point of the arrow. The arrows between the loop type tables are different from the edges of the loop type graph.

【００５９】図４において、図１におけるループ類検出
処理部１０５２は、まずステップ４００での処理によ
り、開始点をプログラム入口とする。次に、ステップ４
０１での処理により、未処理ループがあるか否かを判別
し、ここでは未処理ループがあるのでステップ４０２の
処理に移る。ステップ４０２では、開始点であるプログ
ラム入口に最も近い未処理ループとして、図７における
ループ７０１を選択する。In FIG. 4, the loop type detection processing unit 1052 in FIG. 1 first sets the starting point to the program entry in step 400. Next, step 401 determines whether any unprocessed loop remains; since one remains here, processing moves to step 402. In step 402, the loop 701 in FIG. 7 is selected as the unprocessed loop closest to the starting point, the program entry.

【００６０】そして、ステップ４０３での処理により、
ループ７０１内の最初の最内側ループ７０３に対応する
図１１における細分ループテーブル１０１１の入口リス
ト１０２１に、ループ７０１に対応するループテーブル
９４０を接続する。図１１中の、入口リスト１０２１を
始点とし、ループテーブル９４０を終点とする点線の矢
印は、入口リスト１０２１にループテーブル９４０への
ポインタが格納されたことを示す。Then, in step 403, the loop table 940 corresponding to the loop 701 is connected to the entry list 1021 of the subdivision loop table 1011 in FIG. 11, which corresponds to the first innermost loop 703 in the loop 701. In FIG. 11, the dotted arrow starting from the entry list 1021 and ending at the loop table 940 indicates that a pointer to the loop table 940 has been stored in the entry list 1021.

【００６１】次にステップ４０４での処理により、図１
１におけるループ類テーブル１１０１が作成され、図７
のループ７０１に対応するループテーブル９４０がルー
プ類テーブル１１０１に接続される。図１１中の、リス
ト１１１１を始点とし、ループテーブル９４０を終点と
する実線の矢印は、リスト１１１１にループテーブル９
４０へのポインタが格納されたことを示す。ステップ４
０５での処理において、ループ７０１の後続ループネス
トはないので、ステップ４０９の処理に移る。Next, in step 404, the loop type table 1101 in FIG. 11 is created, and the loop table 940 corresponding to the loop 701 in FIG. 7 is connected to the loop type table 1101. In FIG. 11, the solid arrow starting from the list 1111 and ending at the loop table 940 indicates that a pointer to the loop table 940 has been stored in the list 1111. In step 405, the loop 701 has no subsequent loop nest, so processing moves to step 409.

【００６２】このステップ４０９での処理により、開始
点をプログラム入口とする。さらにステップ４１０での
処理により、ループ７０１内の最後の最内側ループ７０
８に対応する細分ループテーブル１０１２の出口リスト
１０２４に、ループ７０１に対応するループテーブル９
４０を接続する。図１１中の、入口リスト１０２４を始
点とし、ループテーブル９４０を終点とする点線の矢印
は、入口リスト１０２４にループテーブル９４０へのポ
インタが格納されたことを示す。In step 409, the starting point is set to the program entry. Then, in step 410, the loop table 940 corresponding to the loop 701 is connected to the exit list 1024 of the subdivision loop table 1012, which corresponds to the last innermost loop 708 in the loop 701. In FIG. 11, the dotted arrow starting from the exit list 1024 and ending at the loop table 940 indicates that a pointer to the loop table 940 has been stored in the exit list 1024.

【００６３】再び、ステップ４０１での処理により、未
処理ループがあるのでステップ４０２の処理に移る。こ
のステップ４０２での処理により、開始点であるプログ
ラム入口に最も近い未処理ループとして、図７のループ
７０２を選択する。Returning to step 401, an unprocessed loop remains, so processing moves to step 402. In step 402, the loop 702 in FIG. 7 is selected as the unprocessed loop closest to the starting point, the program entry.

【００６４】そしてステップ４０３での処理により、ル
ープ７０２内の最初の最内側ループ７０３に対応する細
分ループテーブル１０１１の入口リスト１０２１に、図
７のループ７０２に対応するループテーブル９００を接
続する。図１１中の、入口リスト１０２１を始点とし、
ループテーブル９００を終点とする点線の矢印は、入口
リスト１０２１にループテーブル９００へのポインタが
格納されたことを示す。In step 403, the loop table 900 corresponding to the loop 702 in FIG. 7 is connected to the entry list 1021 of the subdivision loop table 1011, which corresponds to the first innermost loop 703 in the loop 702. In FIG. 11, the dotted arrow starting from the entry list 1021 and ending at the loop table 900 indicates that a pointer to the loop table 900 has been stored in the entry list 1021.

【００６５】さらにステップ４０４での処理により、ル
ープ類テーブル１１０２が作成され、図７のループ７０
２に対応するループテーブル９００がループ類テーブル
１１０２に接続される。図１１中の、リスト１１１２を
始点とし、ループテーブル９００を終点とする実線の矢
印は、リスト１１１２にループテーブル９００へのポイ
ンタが格納されたことを示す。Further, in step 404, the loop type table 1102 is created, and the loop table 900 corresponding to the loop 702 in FIG. 7 is connected to the loop type table 1102. In FIG. 11, the solid arrow starting from the list 1112 and ending at the loop table 900 indicates that a pointer to the loop table 900 has been stored in the list 1112.

【００６６】またステップ４０５の処理では、図７にお
いてループ７０２の後続ループネスト７０７が存在する
ので、ステップ４０６の処理に移る。このステップ４０
６での処理により、図７のループ７０２と後続ループネ
スト７０７中のループとで、同値なものがあるか否かを
調べる。In step 405, the subsequent loop nest 707 of the loop 702 exists in FIG. 7, so processing moves to step 406. Step 406 checks whether any loop in the subsequent loop nest 707 is equivalent to the loop 702 in FIG. 7.

【００６７】本例では、２つのループL1, L2が同値であ
るとは、以下の（１）〜（６）の条件を満足する時と定
める。（１）L1とL2は包含関係にない。（２）L1とL2は共に並列化可能。In this example, two loops L1 and L2 are defined to be equivalent when they satisfy all of the following conditions (1) to (6). (1) Neither L1 nor L2 contains the other. (2) Both L1 and L2 are parallelizable.

【００６８】（３）L1とL2を並列化する時のループ繰り
返し範囲の分割方法は同じである。（４）L1とL2のストライドは等しい。（５）L1とL2の上限値と下限値の差を、各々、LU, LLと
する時、LU, LLは共に定数で、LU, LLの絶対値はある一
定値以下。(3) The loop iteration range is divided in the same way when L1 and L2 are parallelized. (4) L1 and L2 have the same stride. (5) Let LU be the difference between the upper limits of L1 and L2, and LL the difference between their lower limits; LU and LL are both constants, and their absolute values do not exceed a certain fixed value.

【００６９】（６）L1とL2の両方に現われる全ての配列
に対して、L1内の全ての参照点で、ある次元の添字は同
じで、かつ、L1のループ制御変数の１次式であり、L2内
の全ての参照点で、ある次元の添字は同じで、かつ、L2
のループ制御変数の１次式であり、（a）の添字のルー
プ制御変数をL2のループ制御変数に置換したものと、
（ｂ）の添字との差は定数で、その絶対値はある一定値
以下。(6) For every array that appears in both L1 and L2: (a) at all reference points in L1, the subscript of a certain dimension is the same and is a linear expression of the loop control variable of L1; (b) at all reference points in L2, the subscript of a certain dimension is the same and is a linear expression of the loop control variable of L2; and the difference between the subscript of (a), with its loop control variable replaced by that of L2, and the subscript of (b) is a constant whose absolute value does not exceed a certain fixed value.

【００７０】尚、上記各条件中の一定値を、本例では
「１」と定める。また、入力プログラム中のループは全
て並列化可能で、並列化する時のループ繰り返し範囲の
分割方法は、全てブロック分割であるとする。The constant value in each of the above conditions is defined as "1" in this example. Further, it is assumed that all loops in the input program can be parallelized, and the method of dividing the loop repetition range at the time of parallelization is block division.
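
Conditions (1) to (6) can be read as a single predicate over two loops. The sketch below is one simplified reading under stated assumptions, not the patent's implementation: the `Loop` record and its attribute names are hypothetical, each array is assumed to have already been reduced to one subscript of the form coefficient × v + constant per loop (so the "same subscript at all reference points" part of condition (6) is taken as given), and the dimension numbers and the bounds of loops 707 and 708 in the example are illustrative guesses rather than values read from FIG. 7.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Tuple

LIMIT = 1  # the "certain fixed value"; this example sets it to 1


@dataclass
class Loop:
    """Hypothetical summary of one loop, for illustration only."""
    name: str
    lower: int
    upper: int
    stride: int = 1
    parallelizable: bool = True
    partition: str = "block"
    enclosing: FrozenSet[str] = frozenset()  # names of loops containing this one
    # array name -> (dimension, coefficient, constant): the subscript in that
    # dimension is coefficient * v + constant, v being the loop control variable
    subscripts: Dict[str, Tuple[int, int, int]] = field(default_factory=dict)


def equivalent(l1: Loop, l2: Loop, limit: int = LIMIT) -> bool:
    # (1) neither loop contains the other
    if l1.name in l2.enclosing or l2.name in l1.enclosing:
        return False
    # (2) both loops are parallelizable
    if not (l1.parallelizable and l2.parallelizable):
        return False
    # (3) same way of dividing the iteration range
    if l1.partition != l2.partition:
        return False
    # (4) same stride
    if l1.stride != l2.stride:
        return False
    # (5) upper limits and lower limits differ by at most `limit`
    if abs(l1.upper - l2.upper) > limit or abs(l1.lower - l2.lower) > limit:
        return False
    # (6) for shared arrays, same dimension and subscripts differing by a
    #     constant of at most `limit` (equal coefficients make it constant)
    for array in l1.subscripts.keys() & l2.subscripts.keys():
        dim1, coef1, const1 = l1.subscripts[array]
        dim2, coef2, const2 = l2.subscripts[array]
        if dim1 != dim2 or coef1 != coef2 or abs(const1 - const2) > limit:
            return False
    return True


loop_702 = Loop("702", 1, 30, enclosing=frozenset({"701"}),
                subscripts={"a": (2, 1, 0)})
loop_707 = Loop("707", 2, 30, enclosing=frozenset({"701"}),
                subscripts={"a": (1, 1, -1)})
loop_708 = Loop("708", 2, 29, enclosing=frozenset({"701", "707"}),
                subscripts={"a": (2, 1, -1)})

print(equivalent(loop_702, loop_708))
print(equivalent(loop_707, loop_708))
```

With these illustrative values the first call reports the two loops as equivalent, and the second fails condition (1) because loop 707 contains loop 708.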

【００７１】この時、図７におけるループ７０２と同値
な、後続ループネスト７０７中のループはループ７０８
であることがわかる。よって、図４におけるステップ４
０６での処理においてはYesとなり、ステップ４０７の
処理へ移る。In this case, the loop in the subsequent loop nest 707 that is equivalent to the loop 702 in FIG. 7 is found to be the loop 708. The result of step 406 in FIG. 4 is therefore Yes, and processing moves to step 407.

【００７２】ステップ４０７での処理により、図７のル
ープ７０８に対応するループテーブル９５０を、ループ
７０２に対応するループテーブル９００と同じループ類
テーブル１１１２に接続し、ループ７０８をループＡと
する。図１１中の、リスト１１１２を始点とし、ループ
テーブル９５０を終点とする実線の矢印は、リスト１１
１２にループテーブル９５０へのポインタが格納された
ことを示す。In step 407, the loop table 950 corresponding to the loop 708 in FIG. 7 is connected, through the list 1112, to the same loop type table as the loop table 900 corresponding to the loop 702, and the loop 708 is taken as loop A. In FIG. 11, the solid arrow starting from the list 1112 and ending at the loop table 950 indicates that a pointer to the loop table 950 has been stored in the list 1112.

【００７３】次に、ステップ４０５の処理に戻る。ここ
では、図７におけるループ７０８の後続ループネストは
ないので、ステップ４０９の処理に移る。このステップ
４０９での処理により、開始点をプログラム入口とす
る。Next, processing returns to step 405. Here, the loop 708 in FIG. 7 has no subsequent loop nest, so processing moves to step 409. In step 409, the starting point is set to the program entry.

【００７４】そしてステップ４１０での処理により、図
７におけるループ７０８内の最後の最内側ループ７０８
に対応する細分ループテーブル１０１２の出口リスト１
０２４に、ループ７０８に対応するループテーブル９５
０を接続する。図１１中の、入口リスト１０２４を始点
とし、ループテーブル９５０を終点とする点線の矢印
は、入口リスト１０２４にループテーブル９５０へのポ
インタが格納されたことを示す。Then, in step 410, the loop table 950 corresponding to the loop 708 is connected to the exit list 1024 of the subdivision loop table 1012, which corresponds to the last innermost loop 708 in the loop 708 in FIG. 7. In FIG. 11, the dotted arrow starting from the exit list 1024 and ending at the loop table 950 indicates that a pointer to the loop table 950 has been stored in the exit list 1024.

【００７５】再び、ステップ４０１での処理において、
未処理ループがあるのでステップ４０２の処理に移る。
このステップ４０２での処理により、開始点であるプロ
グラム入口に最も近い未処理ループとして、図７におけ
るループ７０３を選択する。Returning again to step 401, an unprocessed loop remains, so processing moves to step 402. In step 402, the loop 703 in FIG. 7 is selected as the unprocessed loop closest to the starting point, the program entry.

【００７６】次のステップ４０３での処理により、図７
におけるループ７０３内の最初の最内側ループ７０３に
対応する細分ループテーブル１０１１の入口リスト１０
２１に、ループ７０３に対応するループテーブル９２０
を接続する。図１１中の、入口リスト１０２１を始点と
し、ループテーブル９２０を終点とする点線の矢印は、
入口リスト１０２１にループテーブル９２０へのポイン
タが格納されたことを示す。In the next step 403, the loop table 920 corresponding to the loop 703 is connected to the entry list 1021 of the subdivision loop table 1011, which corresponds to the first innermost loop 703 in the loop 703 in FIG. 7. In FIG. 11, the dotted arrow starting from the entry list 1021 and ending at the loop table 920 indicates that a pointer to the loop table 920 has been stored in the entry list 1021.

【００７７】さらにステップ４０４での処理により、ル
ープ類テーブル１１０３が作成され、図７のループ７０
３に対応するループテーブル９２０がループ類テーブル
１１０３に接続される。図１１中の、リスト１１１３を
始点とし、ループテーブル９２０を終点とする実線の矢
印は、リスト１１１３にループテーブル９２０へのポイ
ンタが格納されたことを示す。Further, in step 404, the loop type table 1103 is created, and the loop table 920 corresponding to the loop 703 in FIG. 7 is connected to the loop type table 1103. In FIG. 11, the solid arrow starting from the list 1113 and ending at the loop table 920 indicates that a pointer to the loop table 920 has been stored in the list 1113.

【００７８】次のステップ４０５での処理においては、
図７のループ７０３の後続ループネスト７０７が存在す
るので、ステップ４０６の処理に移る。このステップ４
０６での処理により、図７におけるループ７０３と後続
ループネスト７０７中のループとで、同値なものがある
か否かを調べる。In the next step 405, the subsequent loop nest 707 of the loop 703 in FIG. 7 exists, so processing moves to step 406. Step 406 checks whether any loop in the subsequent loop nest 707 is equivalent to the loop 703 in FIG. 7.

【００７９】このループ７０３と同値な、後続ループネ
スト７０７中のループはループ７０７であることがわか
る。よって、ステップ４０６での処理結果はYesとな
り、ステップ４０７の処理へ移る。It can be seen that the loop in the subsequent loop nest 707 which is equivalent to the loop 703 is the loop 707. Therefore, the processing result in step 406 is Yes, and the process proceeds to step 407.

【００８０】ステップ４０７での処理により、図７のル
ープ７０７に対応するループテーブル９３０を、ループ
７０３に対応するループテーブル９２０と同じループ類
テーブル１１１３に接続し、ループ７０７をループＡと
する。図１１中の、リスト１１１３を始点とし、ループ
テーブル９３０を終点とする実線の矢印は、リスト１１
１３にループテーブル９３０へのポインタが格納された
ことを示す。In step 407, the loop table 930 corresponding to the loop 707 in FIG. 7 is connected, through the list 1113, to the same loop type table as the loop table 920 corresponding to the loop 703, and the loop 707 is taken as loop A. In FIG. 11, the solid arrow starting from the list 1113 and ending at the loop table 930 indicates that a pointer to the loop table 930 has been stored in the list 1113.

【００８１】次に、ステップ４０５の処理に戻る。この
ステップ４０５での処理においては、図７におけるルー
プ７０７の後続ループネストはないので、ステップ４０
９の処理に移る。Next, processing returns to step 405. In step 405, the loop 707 in FIG. 7 has no subsequent loop nest, so processing moves to step 409.

【００８２】ステップ４０９での処理により、開始点を
プログラム入口とする。そしてステップ４１０での処理
により、図７におけるループ７０７内の最後の最内側ル
ープ７０８に対応する細分ループテーブル１０１２の出
口リスト１０２４に、ループ７０７に対応するループテ
ーブル９３０を接続する。図１１中の、入口リスト１０
２４を始点とし、ループテーブル９３０を終点とする点
線の矢印は、入口リスト１０２４にループテーブル９３
０へのポインタが格納されたことを示す。In step 409, the starting point is set to the program entry. Then, in step 410, the loop table 930 corresponding to the loop 707 is connected to the exit list 1024 of the subdivision loop table 1012, which corresponds to the last innermost loop 708 in the loop 707 in FIG. 7. In FIG. 11, the dotted arrow starting from the exit list 1024 and ending at the loop table 930 indicates that a pointer to the loop table 930 has been stored in the exit list 1024.

【００８３】再び、ステップ４０１の処理に移り、未処
理ループはないので処理を終了する。尚、以上の処理で
得られたループ類テーブル１１０１，１１０２，１１０
３が、主要なノードだけからなるループ類グラフ１６０
を構成する。Processing returns once more to step 401, and since no unprocessed loop remains, the processing ends. The loop type tables 1101, 1102, and 1103 obtained by the above processing constitute the loop type graph 160 consisting only of the main nodes.
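
The walk through steps 400 to 410 groups loops into loop types by repeatedly taking the first unprocessed loop and chaining on later loops that are equivalent to it. The sketch below is a simplified illustration under stated assumptions: loops are given as a flat list in program order, the restriction to "subsequent loop nests" is approximated by looking only at later entries in that list, the bookkeeping of entry and exit lists is omitted, and the function names and the hard-coded equivalence pairs are hypothetical.

```python
from typing import Callable, List, Sequence


def group_into_loop_types(
    loops: Sequence[int], equivalent: Callable[[int, int], bool]
) -> List[List[int]]:
    """Group loops (listed in program order) into loop types."""
    groups: List[List[int]] = []
    assigned = set()
    for index, loop in enumerate(loops):
        if loop in assigned:
            continue                      # already placed in a loop type
        members = [loop]                  # a new loop type table is created
        assigned.add(loop)
        current = loop                    # plays the role of "loop A"
        for candidate in loops[index + 1:]:
            if candidate not in assigned and equivalent(current, candidate):
                members.append(candidate)
                assigned.add(candidate)
                current = candidate       # continue the search from the new loop A
        groups.append(members)
    return groups


# Equivalences found in the text: 702 with 708, and 703 with 707.
EQUIVALENT_PAIRS = {frozenset({702, 708}), frozenset({703, 707})}


def is_equivalent(x: int, y: int) -> bool:
    return frozenset({x, y}) in EQUIVALENT_PAIRS


loop_types = group_into_loop_types([701, 702, 703, 707, 708], is_equivalent)
print(loop_types)
```

The three resulting groups correspond to the loop type tables 1101, 1102, and 1103 in the description.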

【００８４】図１におけるエッジ接続処理部１０５３
は、このようなループテーブル１４０、細分ループテー
ブル１７０、主要なノードだけからなるループ類グラフ
１６０を入力してループ類グラフのノード間がエッジで
接続されたループ類グラフ１６０を出力する。以下、エ
ッジ接続処理部１０５３の処理動作を、図５および図１
１〜図１３を用いて説明する。The edge connection processing unit 1053 in FIG. 1 takes as input such a loop table 140, the subdivision loop table 170, and the loop type graph 160 consisting only of the main nodes, and outputs a loop type graph 160 in which the nodes are connected by edges. The processing operation of the edge connection processing unit 1053 will be described below with reference to FIG. 5 and FIGS. 11 to 13.

【００８５】図５は、図１におけるエッジ接続処理部の
処理動作例を示す説明図であり、図１２は、図５におけ
るステップ５００の処理を行なった結果の細分テーブル
とループ類テーブルとの関係を示す説明図、図１３は、
図１におけるループ類グラフの構成例を示す説明図であ
る。FIG. 5 is an explanatory diagram showing an example of the processing operation of the edge connection processing unit in FIG. 1; FIG. 12 is an explanatory diagram showing the relationship between the subdivision loop tables and the loop type tables after the processing of step 500 in FIG. 5; and FIG. 13 is an explanatory diagram showing a configuration example of the loop type graph in FIG. 1.

【００８６】図５のステップ５００での処理により、図
１１における全てのループ類テーブル１１０１，１１０
２，１１０３と、それらのループ類テーブルに接続され
たループテーブル９４０，９００，９２０，９３０，９
５０に対して、以上のループテーブルが細分ループテー
ブルに接続されているか調べる。In step 500 of FIG. 5, for all the loop type tables 1101, 1102, and 1103 in FIG. 11 and the loop tables 940, 900, 920, 930, and 950 connected to them, it is checked whether each of these loop tables is connected to a subdivision loop table.

【００８７】図１１の細分ループテーブル１０１１の入
口リスト１０２１と、細分ループテーブル１０１２の出
口リスト１０２４から、全てのループテーブルがいずれ
かの細分ループテーブルに接続されていることが分か
る。From the entry list 1021 of the subdivision loop table 1011 and the exit list 1024 of the subdivision loop table 1012 in FIG. 11, it can be seen that all the loop tables are connected to one of the subdivision loop tables.

【００８８】この場合、細分ループテーブル１０１１，
１０１２から各ループテーブル９００，９２０，９３
０，９４０，９５０へのポインタを、そのループテーブ
ルが接続されたループ類テーブル１１０１〜１１０３へ
のポインタに付け替える。例えば、細分ループテーブル
１０１１からループテーブル９４０へのポインタを、細
分ループテーブル１０１１からループテーブル９４０が
接続されたループ類テーブル１１０１へのポインタに付
け替えるのは、以下のように行なう。In this case, the pointers from the subdivision loop tables 1011 and 1012 to the loop tables 900, 920, 930, 940, and 950 are replaced with pointers to the loop type tables 1101 to 1103 to which those loop tables are connected. For example, the pointer from the subdivision loop table 1011 to the loop table 940 is replaced with a pointer from the subdivision loop table 1011 to the loop type table 1101, to which the loop table 940 is connected, as follows.

【００８９】まず、ループ類テーブル１１０１のリスト
１１１１からループテーブル９４０を見つけ、ループテ
ーブル９４０から、その２つの最内側ループのループテ
ーブル９２０、９５０を見つけ、それらが指す細分ルー
プテーブル１０１１，１０１２を、ポインタ１０３１，
１０３２より見つける。First, the loop table 940 is found from the list 1111 of the loop type table 1101; from the loop table 940, the loop tables 920 and 950 of its two innermost loops are found; and the subdivision loop tables 1011 and 1012 that they point to are found through the pointers 1031 and 1032.

【００９０】細分ループテーブル１０１１の入口リスト
１０２１にループテーブル９４０が接続されているの
で、これをループ類テーブル１１０１へのポインタに付
け替える。同様にして、細分ループテーブル１０１２の
出口リスト１０２４にループテーブル９４０が接続され
ているので、これをループ類テーブル１１０１へのポイ
ンタに付け替える。Since the loop table 940 is connected to the entry list 1021 of the subdivision loop table 1011, it is replaced with a pointer to the loop type table 1101. Similarly, since the loop table 940 is connected to the exit list 1024 of the subdivided loop table 1012, this is replaced with a pointer to the loop type table 1101.

【００９１】このようなステップ５００での処理を行な
った結果の細分ループテーブルとループ類テーブルとの
関係を表わしたものが図１２である。この図１２におい
て、細分ループテーブル１０１１の入口リスト１０２１
から、ループ類テーブル１１０１への点線の矢印と、細
分ループテーブル１０１２の出口リスト１０２４から、
ループ類テーブル１１０１への点線の矢印が、ポインタ
が付け替えられた結果を表わしている。他のループテー
ブル、ループ類テーブルについても同様である。この結
果、図１２中の点線の矢印が示すポインタの付け替え結
果が得られる。FIG. 12 shows the relationship between the subdivision loop tables and the loop type tables after the processing in step 500. In FIG. 12, the dotted arrow from the entry list 1021 of the subdivision loop table 1011 to the loop type table 1101 and the dotted arrow from the exit list 1024 of the subdivision loop table 1012 to the loop type table 1101 show the result of replacing the pointers. The same applies to the other loop tables and loop type tables. As a result, the pointer replacements indicated by the dotted arrows in FIG. 12 are obtained.
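
The pointer replacement of step 500 can be pictured as mapping every loop table in an entry or exit list to the loop type table it belongs to. The sketch below is only an illustration under stated assumptions: tables are represented by their reference numerals, the `retarget` helper and the `TYPE_OF` mapping are hypothetical, and dropping duplicate targets is a choice made here rather than something the description specifies.

```python
from typing import Dict, List


def retarget(pointer_list: List[int], type_of: Dict[int, int]) -> List[int]:
    """Replace loop-table pointers by pointers to their loop type tables."""
    result: List[int] = []
    for loop_table in pointer_list:
        loop_type = type_of[loop_table]
        if loop_type not in result:       # assumed: keep one pointer per loop type
            result.append(loop_type)
    return result


# Loop table -> loop type table, as built up in the walk through FIG. 11.
TYPE_OF = {940: 1101, 900: 1102, 950: 1102, 920: 1103, 930: 1103}

entry_list_1021 = [940, 900, 920]   # loop tables stored by step 403
exit_list_1024 = [940, 950, 930]    # loop tables stored by step 410

new_entry_list = retarget(entry_list_1021, TYPE_OF)
new_exit_list = retarget(exit_list_1024, TYPE_OF)
print(new_entry_list, new_exit_list)
```

After the replacement both lists refer to the loop type tables 1101, 1102, and 1103, which is the state FIG. 12 is described as showing.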

【００９２】次に、図５におけるステップ５０１での処
理に移るが、その処理の説明の前に、図１３を説明す
る。図１３は、図１におけるループ類グラフ１６０を表
わしたものであり、ノード１３０１は入口ノード、ノー
ド１３０２は出口ノードを表わし、ループ類テーブル１
１０１，１１０２，１１０３は、一般のノードを表わ
す。以上のノード同士を接続する点線の矢印は、ループ
類グラフのノードを接続するエッジを表わす。Next, processing moves to step 501 in FIG. 5, but FIG. 13 is described first. FIG. 13 shows the loop type graph 160 in FIG. 1: the node 1301 is the entry node, the node 1302 is the exit node, and the loop type tables 1101, 1102, and 1103 are ordinary nodes. The dotted arrows connecting these nodes represent the edges of the loop type graph.

【００９３】図５のステップ５０１での処理により、図
１３における入口ノード１３０１を作成し、この入口ノ
ード１３０１から、最初の細分ループテーブル１０１１
の入口リスト１０２１に接続されたループ類テーブル１
１０１，１１０２，１１０３に向かうエッジで、両者を
接続する。図１３中、入口ノード１３０１からノード１
１０１，１１０２，１１０３に向かう点線の矢印がその
エッジを示す。In step 501 of FIG. 5, the entry node 1301 in FIG. 13 is created, and it is connected to the loop type tables 1101, 1102, and 1103, which are connected to the entry list 1021 of the first subdivision loop table 1011, by edges directed from the entry node 1301 to those loop type tables. In FIG. 13, the dotted arrows from the entry node 1301 to the nodes 1101, 1102, and 1103 show these edges.

【００９４】次のステップ５０２での処理により、図１
３における出口ノード１３０２を作成し、最後の細分ル
ープテーブル１０１２の出口リスト１０２４に接続され
たループ類テーブル１１０１，１１０２，１１０３か
ら、出口ノード１３０２に向かうエッジで、両者を接続
する。図１３中、ノード１１０１，１１０２，１１０３
から出口ノード１３０２に向かう点線の矢印がそのエッ
ジを示す。In the next step 502, the exit node 1302 in FIG. 13 is created, and the loop type tables 1101, 1102, and 1103, which are connected to the exit list 1024 of the last subdivision loop table 1012, are connected to it by edges directed from those loop type tables to the exit node 1302. In FIG. 13, the dotted arrows from the nodes 1101, 1102, and 1103 to the exit node 1302 show these edges.

【００９５】次のステップ５０３での処理においては、
隣接する細分テーブル１０１１，１０１２に対して、プ
ログラム入口に近い細分ループテーブル１０１１の出口
リスト１０２２と、プログラム出口に近い細分ループテ
ーブル１０１２の入口リスト１０２３に接続されたルー
プ類テーブルはないので、何も処理を行なわない。以上
のようにして、図１３のループ類グラフ１６０を得る。In the next step 503, for the adjacent subdivision loop tables 1011 and 1012, no loop type table is connected to the exit list 1022 of the subdivision loop table 1011, which is nearer the program entry, or to the entry list 1023 of the subdivision loop table 1012, which is nearer the program exit, so nothing is done. The loop type graph 160 of FIG. 13 is obtained in this way.
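
The loop type graph of FIG. 13, as built by steps 501 to 503, is small enough to write down directly. The sketch below represents it as an adjacency mapping and enumerates the entry-to-exit paths; the dictionary layout, the node labels, and the `all_paths` helper are hypothetical, and the enumeration assumes the graph has no cycles.

```python
from typing import Dict, List


def all_paths(graph: Dict[str, List[str]], start: str, goal: str) -> List[List[str]]:
    """Enumerate every path from start to goal in an acyclic directed graph."""
    if start == goal:
        return [[goal]]
    paths: List[List[str]] = []
    for successor in graph.get(start, []):
        for rest in all_paths(graph, successor, goal):
            paths.append([start] + rest)
    return paths


# Edges created by step 501 (from the entry node) and step 502 (to the exit
# node); step 503 adds nothing in this example.
LOOP_TYPE_GRAPH = {
    "entry 1301": ["1101", "1102", "1103"],
    "1101": ["exit 1302"],
    "1102": ["exit 1302"],
    "1103": ["exit 1302"],
    "exit 1302": [],
}

paths = all_paths(LOOP_TYPE_GRAPH, "entry 1301", "exit 1302")
for path in paths:
    print(" -> ".join(path))
```

Each path passes through exactly one loop type, so each path corresponds to one candidate choice of loops to parallelize.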

【００９６】図１における並列化ループ決定処理部１０
６は、中間語１３０、ループテーブル１４０およびルー
プ類グラフ１６０を入力して、同じループ類に含まれる
ループは１つのループであるかのように扱うことによっ
て、並列ループ候補の組合わせ数を削減し、削減された
各組合わせに対してプログラムの実行時間を評価し、最
も実行時間が短いと評価された時の並列ループの組合わ
せを選択して、その結果をループテーブル１４０に出力
する。The parallelization loop determination processing unit 106 in FIG. 1 takes as input the intermediate language 130, the loop table 140, and the loop type graph 160. By treating the loops contained in the same loop type as if they were a single loop, it reduces the number of combinations of parallel loop candidates, evaluates the execution time of the program for each remaining combination, selects the combination of parallel loops evaluated to have the shortest execution time, and outputs the result to the loop table 140.

【００９７】以下、このような並列化ループ決定処理部
１０６の処理を説明する。図６は、図１における並列化
ループ決定処理部の処理動作例を示すフローチャートで
ある。まずステップ６００での処理により、ループ類テ
ーブルに接続された各ループとそのループ内で参照され
る配列に対して、そのループのループ制御変数を含む次
元の参照範囲を添字より計算し、ループ類テーブルに接
続された全ループに対し、その参照範囲の和集合を計算
し、上記配列が上記和集合の範囲を参照するようなルー
プ範囲を、各ループのループテーブルに設定する。Hereinafter, the processing of the parallelization loop determination processing unit 106 will be described. FIG. 6 is a flowchart illustrating an example of a processing operation of the parallelization loop determination processing unit in FIG. First, by the processing in step 600, for each loop connected to the loop table and the array referred to in the loop, the reference range of the dimension including the loop control variable of the loop is calculated from the subscript, The union of the reference ranges is calculated for all the loops connected to the table, and a loop range in which the array refers to the union range is set in the loop table of each loop.

【００９８】以下、ループ内で参照される配列として配
列aのみを考える。図１１においてループ類テーブル１
１０１に接続されたループはループ７０１だけであり、
この場合、ステップ６００での処理で設定されるループ
範囲は元のループ範囲と同じなので、その上限値「２」
と、下限値「１」と、ストライド「１」を、図９における
ループテーブル９４０のフィールド９０９，９１０，９
１１に設定する。In the following, only the array a is considered as an array referenced in the loops. In FIG. 11, the only loop connected to the loop type table 1101 is the loop 701. In this case, the loop range set in step 600 is the same as the original loop range, so its upper limit "2", lower limit "1", and stride "1" are set in the fields 909, 910, and 911 of the loop table 940 in FIG. 9.

【００９９】また、図１１においてループ類テーブル１
１０２に接続されたループはループ７０２とループ７０
８である。図７に示すループ７０２における、配列a
の、ループ７０２のループ制御変数ｊを含む次元の参照
範囲は、行７０４での配列aのｊを含む添字がｊである
ことから、「１〜３０」までである。In FIG. 11, the loops connected to the loop type table 1102 are the loop 702 and the loop 708. In the loop 702 shown in FIG. 7, the reference range of the array a in the dimension containing the loop control variable j of the loop 702 is "1 to 30", because the subscript of the array a containing j on line 704 is j.

【０１００】一方、ループ７０８における、配列aの、
ループ７０８のループ制御変数ｊを含む次元の参照範囲
は、行７０９での配列aのｊを含む添字がｊ−1であるこ
とから、「１〜２８」までである。よって、両方の参照
範囲の和集合は「１〜３０」までである。In the loop 708, on the other hand, the reference range of the array a in the dimension containing the loop control variable j of the loop 708 is "1 to 28", because the subscript of the array a containing j on line 709 is j-1. The union of the two reference ranges is therefore "1 to 30".

【０１０１】そして，ループ７０２において、配列aが
上記和集合の範囲（「１〜３０」）を参照するようなル
ープ範囲は、行７０４での配列aのｊを含む添字がｊで
あることから、「１〜３０」までである。よって、その
上限値「３０」と、下限値「１」と、ストライド「１」
を、図９におけるループテーブル９００のフィールド９
０９，９１０，９１１に設定する。In the loop 702, the loop range for which the array a references the range of the union ("1 to 30") is "1 to 30", because the subscript of the array a containing j on line 704 is j. Its upper limit "30", lower limit "1", and stride "1" are therefore set in the fields 909, 910, and 911 of the loop table 900 in FIG. 9.

【０１０２】一方、ループ７０８において、配列aが上
記和集合の範囲（「１〜３０」）を参照するようなルー
プ範囲は、行７０９での配列aのｊを含む添字がｊ−1で
あることから、「２〜３１」までである。よって、その
上限値「３１」と、下限値「２」と、ストライド「１」
を、図９におけるループテーブル９５０のフィールド９
０９，９１０，９１１に対応するフィールドに設定す
る。In the loop 708, on the other hand, the loop range for which the array a references the range of the union ("1 to 30") is "2 to 31", because the subscript of the array a containing j on line 709 is j-1. Its upper limit "31", lower limit "2", and stride "1" are therefore set in the fields of the loop table 950 in FIG. 9 that correspond to the fields 909, 910, and 911.
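
The arithmetic of step 600 for the loops 702 and 708 can be checked with a few lines of code. The sketch below assumes subscripts of the form v + offset with stride 1, approximates the union of reference ranges by its enclosing interval, and takes the bounds of loop 708 (2 to 29) as inferred from the stated reference range "1 to 28" for subscript j-1; the function name and data layout are hypothetical.

```python
from typing import Dict, Tuple

Range = Tuple[int, int]


def template_ranges(loops: Dict[str, Tuple[int, int, int]]) -> Dict[str, Range]:
    """loops maps a loop name to (lower, upper, offset), where the array
    subscript is v + offset for loop control variable v (assumed form)."""
    # Range of array elements referenced by each loop.
    referenced = {
        name: (lower + offset, upper + offset)
        for name, (lower, upper, offset) in loops.items()
    }
    # Union of the referenced ranges, taken here as the enclosing interval.
    union_lower = min(lower for lower, _ in referenced.values())
    union_upper = max(upper for _, upper in referenced.values())
    # Loop range each loop would need in order to reference the whole union.
    return {
        name: (union_lower - offset, union_upper - offset)
        for name, (_, _, offset) in loops.items()
    }


ranges = template_ranges({"702": (1, 30, 0), "708": (2, 29, -1)})
print(ranges)
```

The result reproduces the values in the description: "1 to 30" for the loop 702 and "2 to 31" for the loop 708.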

【０１０３】また、図１１におけるループ類テーブル１
１０３に接続されたループはループ７０３と７０７であ
る。図７のループ７０３における、配列aの、ループ７
０３のループ制御変数iを含む次元の参照範囲は、行７
０４での配列aのiを含む添字がiであることから、「１
〜３０」である。The loops connected to the loop type table 1103 in FIG. 11 are the loops 703 and 707. In the loop 703 in FIG. 7, the reference range of the array a in the dimension containing the loop control variable i of the loop 703 is "1 to 30", because the subscript of the array a containing i on line 704 is i.

【０１０４】一方、図７のループ７０７における、配列
aの、ループ７０７のループ制御変数iを含む次元の参照
範囲は、行７０９での配列aのiを含む添字がi−1である
ことから、「１〜２９」である。よって、両方の参照範
囲の和集合は「１〜３０」である。In the loop 707 in FIG. 7, on the other hand, the reference range of the array a in the dimension containing the loop control variable i of the loop 707 is "1 to 29", because the subscript of the array a containing i on line 709 is i-1. The union of the two reference ranges is therefore "1 to 30".

【０１０５】そして、ループ７０３において、配列aが
上記和集合の範囲（「１〜３０」）を参照するようなル
ープ範囲は、行７０４での配列aのiを含む添字がiであ
ることから、「１〜３０」である。よって、その上限値
「３０」と、下限値「１」と、ストライド「１」を、図９
のループテーブル９２０の、フィールド９０９，９１
０，９１１に対応するフィールドに設定する。In the loop 703, the loop range for which the array a references the range of the union ("1 to 30") is "1 to 30", because the subscript of the array a containing i on line 704 is i. Its upper limit "30", lower limit "1", and stride "1" are therefore set in the fields of the loop table 920 in FIG. 9 that correspond to the fields 909, 910, and 911.

【０１０６】一方、ループ７０７において、配列aが上
記和集合の範囲１から３０までを参照するようなループ
範囲は、行７０９での配列aのiを含む添字がiであるこ
とから、「１〜３０」である。よって、その上限値「３
０」と、下限値「１」と、ストライド「１」を、図９のル
ープテーブル９３０の、フィールド９０９，９１０，９
１１に対応するフィールドに設定する。In the loop 707, on the other hand, the loop range for which the array a references the range of the union, 1 to 30, is "1 to 30", because the subscript of the array a containing i on line 709 is i. Its upper limit "30", lower limit "1", and stride "1" are therefore set in the fields of the loop table 930 in FIG. 9 that correspond to the fields 909, 910, and 911.

【０１０７】次に図６のステップ６０１での処理におい
ては、図１３に示すループ類グラフの入口ノード１３０
１から出口ノード１３０２までは、ループ類テーブル１
１０１，１１０２，１１０３を通る３つのパスがあり、
各パス上には一つのループ類のみ存在するので、以下に
示すように、それらのループ類の予測実行時間を評価
し、その値が最小になるパスを選択すれば良い。Next, in step 601 of FIG. 6, there are three paths from the entry node 1301 to the exit node 1302 of the loop type graph shown in FIG. 13, passing through the loop type tables 1101, 1102, and 1103, respectively. Since each path contains only one loop type, it suffices to evaluate the predicted execution time of each loop type as described below and select the path for which that value is smallest.

【０１０８】ここではループの並列化方法として、ルー
プの繰り返し範囲を等分割することとし、プロセッサ数
は「３」として説明する。まず、図１３におけるループ
類テーブル１１０１に接続されたループ７０１を並列化
する。Here, the loops are parallelized by dividing the loop iteration range into equal parts, and the number of processors is assumed to be 3. First, loop 701, which is connected to loop type table 1101 in FIG. 13, is parallelized.

【０１０９】すなわち、図６におけるステップ６０１で
の処理（１）により、ステップ６００の処理で設定され
た各ループの上限値「２」と下限値「１」を持つ仮想的な
ループの並列化によるループ繰り返し範囲の分割は、
「１」から「１」、「２」から「２」までとなり、３台目の
プロセッサにはループ範囲は割り当てられない。That is, by process (1) of step 601 in FIG. 6, parallelizing the virtual loop that has the upper bound "2" and lower bound "1" set for each loop in step 600 divides the loop iteration range into "1 to 1" and "2 to 2"; no loop range is assigned to the third processor.

【０１１０】また、ステップ６０１での処理（２）によ
り、上記分割されたループ繰り返し範囲と、図７のルー
プ７０１の元の繰返し範囲の積集合は、「１」から
「１」、「２」から「２」、空集合となり、これをループ
７０１を並列化した時のループ範囲とする。By process (2) of step 601, the intersections of the divided loop iteration ranges with the original iteration range of loop 701 in FIG. 7 are "1 to 1", "2 to 2", and the empty set; these are taken as the loop ranges when loop 701 is parallelized.
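
Processes (1) and (2) can be sketched as two small functions. This is an illustration under stated assumptions: the block size is taken as the ceiling of the iteration count divided by the processor count, an empty assignment is represented by `None`, the original range of loop 701 is assumed to be 1..2, and all names are hypothetical.

```python
import math


def divide_range(lower, upper, num_procs):
    """Process (1): split [lower, upper] into equal blocks, one per processor.

    Returns a list of inclusive (lo, hi) tuples; None marks a processor
    that receives no iterations.
    """
    count = upper - lower + 1
    chunk = math.ceil(count / num_procs)
    parts = []
    for p in range(num_procs):
        lo = lower + chunk * p
        hi = min(lower + chunk * (p + 1) - 1, upper)
        parts.append((lo, hi) if lo <= hi else None)
    return parts


def intersect(part, original):
    """Process (2): intersection of one block with the loop's original range."""
    if part is None:
        return None
    lo = max(part[0], original[0])
    hi = min(part[1], original[1])
    return (lo, hi) if lo <= hi else None


virtual_parts = divide_range(1, 2, 3)
loop_701_parts = [intersect(part, (1, 2)) for part in virtual_parts]
```

For the virtual loop 1..2 on three processors, both lists are `[(1, 1), (2, 2), None]`, which corresponds to "1 to 1", "2 to 2", and the empty set.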

【０１１１】次に、図１３におけるループ類テーブル１
１０２に接続されたループ７０２，７０８を並列化す
る。図６におけるステップ６０１での処理（１）によ
り、図７のループ７０２に対しては、ステップ６００の
処理で設定された各ループの上限値「３０」と下限値
「１」を持つ仮想的なループの並列化によるループ繰り
返し範囲の分割は、「１〜１０」、「１１〜２０」、「２
１〜３０」となる。Next, loops 702 and 708, which are connected to loop type table 1102 in FIG. 13, are parallelized. By process (1) of step 601 in FIG. 6, for loop 702 in FIG. 7, parallelizing the virtual loop that has the upper bound "30" and lower bound "1" set for each loop in step 600 divides the loop iteration range into "1 to 10", "11 to 20", and "21 to 30".

【０１１２】また，図７のループ７０８に対しては，ス
テップ６００の処理で設定された各ループの上限値「３
１」と下限値「２」を持つ仮想的なループの並列化による
ループ繰り返し範囲の分割は、「２〜１１」、「１２〜
２１」、「２２〜３１」となる。尚、これらはコンパイラ
の実際の処理では、プロセッサ番号をp（p=0,1,2）とす
ると、各々、10*p+1から10*(p+1)、10*p+2から10*(p+1)
+1と表現される。For loop 708 in FIG. 7, parallelizing the virtual loop that has the upper bound "31" and lower bound "2" set for each loop in step 600 divides the loop iteration range into "2 to 11", "12 to 21", and "22 to 31". In the compiler's actual processing, with the processor number denoted p (p = 0, 1, 2), these ranges are expressed as 10*p+1 to 10*(p+1) and 10*p+2 to 10*(p+1)+1, respectively.

【０１１３】ステップ６０１での処理（２）により、上
記分割されたループ繰り返し範囲と、図７におけるルー
プ７０１の元の繰返し範囲の積集合は、ループ７０２に
対しては、「１〜１０」、「１１〜２０」、「２１〜３
０」までとなり、ループ７０８に対しては、「２〜１
１」、「１２〜２１」、「２２〜２９」までとなる。尚、
これらはコンパイラの実際の処理では、プロセッサ番号
をp（p=0,1,2）とすると、各々、min(1, 10*p+1)からma
x(30, 10*(p+1))、min(2, 10*p+2)からmax(29, 10*(p+
1)+1)と表現される。By process (2) of step 601, the intersections of the divided loop iteration ranges with the original iteration range of loop 701 in FIG. 7 are "1 to 10", "11 to 20", and "21 to 30" for loop 702, and "2 to 11", "12 to 21", and "22 to 29" for loop 708. In the compiler's actual processing, with the processor number denoted p (p = 0, 1, 2), these ranges are expressed as min(1, 10*p+1) to max(30, 10*(p+1)) and min(2, 10*p+2) to max(29, 10*(p+1)+1), respectively.
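
The per-processor bounds can also be written as a closed form in the processor number p. Read literally, min on the lower bound and max on the upper bound would widen every block to the whole original range, so the sketch below assumes the intended operation is the usual clamp: the larger of the two lower bounds and the smaller of the two upper bounds. That reading reproduces the ranges listed above. The function name and the original ranges 1..30 and 2..29 are assumptions inferred from those listed results.

```python
def bounds_for_processor(p, first, chunk, orig_lower, orig_upper):
    """Inclusive loop bounds for processor p.

    The unclamped block is [chunk*p + first, chunk*(p+1) + first - 1];
    it is then clamped to the original range [orig_lower, orig_upper].
    """
    lo = max(orig_lower, chunk * p + first)
    hi = min(orig_upper, chunk * (p + 1) + first - 1)
    return (lo, hi)


# Loop 702: blocks 10*p+1 .. 10*(p+1), original range assumed to be 1..30.
loop_702 = [bounds_for_processor(p, 1, 10, 1, 30) for p in range(3)]
# Loop 708: blocks 10*p+2 .. 10*(p+1)+1, original range assumed to be 2..29.
loop_708 = [bounds_for_processor(p, 2, 10, 2, 29) for p in range(3)]
```

Here `loop_702` evaluates to the blocks 1..10, 11..20, 21..30 and `loop_708` to 2..11, 12..21, 22..29.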

【０１１４】以上の２つの範囲を各々、ループ７０２，
７０８を並列化した時のループ範囲とする。ループ類テ
ーブル１１０３に接続されたループ７０３，７０７につ
いても同様に、並列化できる。These two sets of ranges are taken as the loop ranges when loops 702 and 708, respectively, are parallelized. Loops 703 and 707, which are connected to loop type table 1103, can be parallelized in the same way.

【０１１５】以上の並列化により、ループ類テーブル１
１０２に接続されたループ７０２，７０８を並列化する
時が、最も予測実行時間が短いとして、図６における残
りの処理を説明する。The remaining processing in FIG. 6 is described on the assumption that, of the parallelizations above, parallelizing loops 702 and 708 connected to loop type table 1102 gives the shortest predicted execution time.

【０１１６】図６におけるステップ６０２での処理によ
り、選択されたループ７０２，７０８に対するループテ
ーブル９００，９５０の並列化フラグ９１３に対応する
フィールドをＴＲＵＥにする。By the processing of step 602 in FIG. 6, the field corresponding to parallelization flag 913 in loop tables 900 and 950, which belong to the selected loops 702 and 708, is set to TRUE.

【０１１７】図１におけるループ並列化処理部１０７
は、辞書１２０、中間語１３０およびループテーブル１
４０を入力して並列化が決定されたループを並列化変換
して、その結果を辞書１２０、中間語１３０に反映す
る。尚、このループ並列化処理部１０７の処理に関して
は従来の技術であり省略する。The loop parallelization processing unit 107 in FIG. 1 takes the dictionary 120, the intermediate language 130, and the loop table 140 as input, applies the parallelizing transformation to the loops selected for parallelization, and reflects the result in the dictionary 120 and the intermediate language 130. The processing of the loop parallelization processing unit 107 is conventional and its description is omitted.

【０１１８】図１におけるコード生成処理部１０８は、
辞書１２０、中間語１３０を入力してソースプログラム
レベルまたはオブジェクトコードレベルの並列化プログ
ラム１８０を出力する。このコード生成処理部１０８の
処理に関しても従来の技術であり省略する。The code generation processing unit 108 in FIG. 1 takes the dictionary 120 and the intermediate language 130 as input and outputs a parallelized program 180 at the source program level or the object code level. The processing of the code generation processing unit 108 is also conventional and its description is omitted.

【０１１９】このような図１における並列化コンパイラ
１００の処理で得られた並列化プログラム１８０の具体
例を図１４で示す。図１４は、図１における並列化プロ
グラムの具体例を示す説明図である。本例は、ソースプ
ログラムスタイルで書いたものであり、行１４０１のfo
rk関数呼出し、行１４１４のjoin関数呼出しで囲まれた
部分の行１４０２〜行１４１３までが各プロセッサで並
列に実行される。FIG. 14 shows a specific example of the parallelized program 180 obtained by the processing of the parallelizing compiler 100 in FIG. 1. FIG. 14 is an explanatory diagram showing a specific example of the parallelized program in FIG. 1. The example is written in source program style; lines 1402 to 1413, which are enclosed by the fork function call in line 1401 and the join function call in line 1414, are executed in parallel by each processor.

【０１２０】行１４０３と行１４０９中のループ範囲が
図１の並列化ループ決定処理部１０６の図６に示す処理
で決定された、各々、図７におけるループ７０２と７０
８のループ繰り返し範囲である。図１の並列化ループ決
定処理部１０６でのループ範囲の分割方法から、ループ
ネスト１４０３とループネスト１４０８内で参照される
配列の、プロセッサごとの参照範囲が両ループネストで
等しいので、両ループネスト間にはプロセッサ間同期は
生成されない。The loop ranges in lines 1403 and 1409 are the loop iteration ranges of loops 702 and 708 in FIG. 7, respectively, as determined by the processing shown in FIG. 6 of the parallelized loop determination processing unit 106 in FIG. 1. Because of the way the parallelized loop determination processing unit 106 divides the loop ranges, the per-processor reference range of the arrays referenced in loop nest 1403 and loop nest 1408 is the same in both loop nests, so no inter-processor synchronization is generated between the two loop nests.
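
The fork/join structure and the absence of a barrier between the two loop nests can be illustrated with Python threads. This is a sketch and not the program of FIG. 14: the loop bodies are hypothetical (the first nest writes a[i], the second reads a[i-1]) and were chosen so that each worker reads only elements it wrote itself. The per-processor ranges are the ones given in the text for loops 702 and 708.

```python
import threading

NUM_PROCS = 3
N = 32
a = [0] * N
b = [0] * N

# Per-processor inclusive ranges for loops 702 and 708, as listed in the text.
RANGES_702 = [(1, 10), (11, 20), (21, 30)]
RANGES_708 = [(2, 11), (12, 21), (22, 29)]


def worker(p):
    # First loop nest (hypothetical body): writes a[i].
    lo, hi = RANGES_702[p]
    for i in range(lo, hi + 1):
        a[i] = 2 * i
    # Second loop nest (hypothetical body): reads a[i-1]. Every element read
    # here was written by this same worker above, so no barrier is needed
    # between the two nests.
    lo, hi = RANGES_708[p]
    for i in range(lo, hi + 1):
        b[i] = a[i - 1] + 1


def run_parallel():
    threads = [threading.Thread(target=worker, args=(p,)) for p in range(NUM_PROCS)]
    for t in threads:  # "fork"
        t.start()
    for t in threads:  # "join"
        t.join()


run_parallel()
```

Worker 2, for example, writes a[21..30] in the first nest and reads a[21..28] in the second, so its reads never depend on another worker's writes.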

【０１２１】図１５は、図１における並列化コンパイラ
を実装する並列計算機システムの構成例を示すブロック
図である。本図１５において、１５０１は共有メモリ、
１５０２は論理プロセッサエレメント、１５０３は制御
用ネットワーク、１５０４は入出力用論理プロセッサエ
レメント、１５０５は入出力用コンソールまたはワーク
ステーションを表す。FIG. 15 is a block diagram showing a configuration example of a parallel computer system on which the parallelizing compiler of FIG. 1 is implemented. In FIG. 15, 1501 denotes a shared memory, 1502 a logical processor element, 1503 a control network, 1504 an input/output logical processor element, and 1505 an input/output console or workstation.

【０１２２】図１に示した並列化コンパイラ１００は、
入出力用コンソールまたはワークステーション１５０５
において実行され、並列ソースプログラムまたは並列オ
ブジェクトプログラムに変換される。並列ソースプログ
ラムは、さらに、論理プロセッサエレメント１５０２向
けのコンパイラにより並列オブジェクトプログラムに変
換される。The parallelizing compiler 100 shown in FIG. 1 is executed on the input/output console or workstation 1505, where the source program is converted into a parallel source program or a parallel object program. A parallel source program is further converted into a parallel object program by a compiler for the logical processor elements 1502.

【０１２３】また、並列オブジェクトプログラムはリン
カによりロードモジュールに変換され、入出力用論理プ
ロセッサエレメント１５０４を通じて共有メモリ１５０
１にロードされ、各論理プロセッサエレメント１５０２
により実行される。この論理プロセッサエレメント１５
０２の起動、終了などの制御は制御用ネットワーク１５
０３を通じて行われる。The parallel object program is converted into a load module by a linker, loaded into the shared memory 1501 through the input/output logical processor element 1504, and executed by the logical processor elements 1502. Control such as starting and stopping the logical processor elements 1502 is performed through the control network 1503.

【０１２４】以上、図１〜図１６を用いて説明したよう
に、本例の並列化コンパイラでは、配列データの参照パ
ターンが同じループ群を一つの同値なループとして扱う
ことが可能となる。これによって、ループネストにまた
がった並列化ループ候補の組合せ数を削減でき、短い解
析時間で複数のループネストにまたがったループ並列化
を考慮した最適な並列ループが決定できる。As described above with reference to FIGS. 1 to 16, the parallelizing compiler of this example can treat a group of loops whose array data reference patterns are the same as a single equivalent loop. This reduces the number of combinations of parallelized loop candidates spanning loop nests, so that an optimal parallel loop that takes loop parallelization across multiple loop nests into account can be determined in a short analysis time.
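
The grouping of loops into loop types can be sketched as a dictionary keyed by reference pattern. The pattern labels "P1" to "P3" are hypothetical placeholders for whatever the reference pattern detection produces; the membership (701 alone, 702 with 708, 703 with 707) follows the example in the text.

```python
from collections import defaultdict


def group_into_loop_types(pattern_by_loop):
    """Group loop numbers that share the same reference pattern label."""
    groups = defaultdict(list)
    for loop, pattern in pattern_by_loop.items():
        groups[pattern].append(loop)
    return dict(groups)


# Hypothetical pattern labels for the five parallelizable loops of the example.
patterns = {701: "P1", 702: "P2", 708: "P2", 703: "P3", 707: "P3"}
loop_types = group_into_loop_types(patterns)
```

Five loops collapse into three loop types here, so only three candidates remain to be evaluated instead of combinations of individual loops.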

【０１２５】また、ループネストにまたがった並列化ル
ープ候補の組合せ数を削減でき、短い解析時間で近似を
用いずに最適な並列ループが決定できる。また、ループ
類中のループに対して、各プロセッサが分担して計算す
るデータの分担範囲を、データの宣言範囲によらずに各
ループでの配列データの参照範囲から決定できるので、
ループ分割をより柔軟に行なうことができ、最適な並列
ループが決定できる。尚、本発明は、図１〜図１６を用
いて説明した実施例に限定されるものではなく、その要
旨を逸脱しない範囲において種々変更可能である。In addition, because the number of combinations of parallelized loop candidates spanning loop nests is reduced, an optimal parallel loop can be determined in a short analysis time without resorting to approximation. Furthermore, for the loops in a loop type, the range of data that each processor is responsible for computing can be determined from the reference range of the array data in each loop, independently of the declared range of the data, so loop division can be performed more flexibly and an optimal parallel loop can be determined. The present invention is not limited to the embodiment described with reference to FIGS. 1 to 16 and can be modified in various ways without departing from its gist.

【０１２６】[0126]

【発明の効果】本発明によれば、複数のループネストに
またがったループ並列化を考慮し、近似を用いずに、か
つ、ループ分割をより柔軟に行なって、最適な並列ルー
プの決定を行なうことが可能である。According to the present invention, an optimal parallel loop can be determined while taking loop parallelization across a plurality of loop nests into account, without using approximation, and with more flexible loop division.

【図面の簡単な説明】[Brief description of the drawings]

【図１】本発明の並列化ループ決定方法を実行するシス
テムの構成例を示すブロック図である。FIG. 1 is a block diagram illustrating a configuration example of a system that executes a parallelized loop determination method according to the present invention.

【図２】図１における参照パターン検出処理部の処理動
作例を示す説明図である。FIG. 2 is an explanatory diagram showing a processing operation example of a reference pattern detection processing unit in FIG. 1;

【図３】図１における細分テーブル作成処理部の処理動
作例を示すフローチャートである。FIG. 3 is a flowchart illustrating an example of a processing operation of a subdivision table creation processing unit in FIG. 1;

【図４】図１におけるループ類検出処理部の処理動作例
を示すフローチャートである。FIG. 4 is a flowchart illustrating an example of the processing operation of the loop type detection processing unit in FIG. 1.

【図５】図１におけるエッジ接続処理部の処理動作例を
示す説明図である。FIG. 5 is an explanatory diagram illustrating an example of a processing operation of an edge connection processing unit in FIG. 1;

【図６】図１における並列化ループ決定処理部の処理動
作例を示すフローチャートである。FIG. 6 is a flowchart illustrating an example of a processing operation of a parallelization loop determination processing unit in FIG. 1;

【図７】図１におけるソースプログラムの具体例を示す
説明図である。FIG. 7 is an explanatory diagram showing a specific example of a source program in FIG. 1;

【図８】図１における参照パターンテーブルの構成例を
示す説明図である。FIG. 8 is an explanatory diagram showing a configuration example of a reference pattern table in FIG. 1;

【図９】図１におけるループテーブルの構成例を示す説
明図である。FIG. 9 is an explanatory diagram showing a configuration example of a loop table in FIG. 1;

【図１０】図１における細分ループテーブルとループテ
ーブルとの関連を示す説明図である。FIG. 10 is an explanatory diagram showing the relationship between the subdivision loop table and the loop table in FIG. 1;

【図１１】図１におけるループテーブルと細分ループテ
ーブルおよびループ類テーブル間の関係を表わした説明
図である。FIG. 11 is an explanatory diagram showing a relationship among a loop table, a subdivision loop table, and a loop type table in FIG. 1;

【図１２】図５におけるステップ５００の処理を行なっ
た結果の細分テーブルとループ類テーブルとの関係を示
す説明図である。FIG. 12 is an explanatory diagram showing a relationship between a subdivision table and a loop type table as a result of performing the processing of step 500 in FIG. 5;

【図１３】図１におけるループ類グラフの構成例を示す
説明図である。FIG. 13 is an explanatory diagram showing a configuration example of a loop type graph in FIG. 1;

【図１４】図１における並列化プログラムの具体例を示
す説明図である。FIG. 14 is an explanatory diagram showing a specific example of the parallelized program in FIG. 1;

【図１５】図１における並列化コンパイラを実装する並
列計算機システムの構成例を示すブロック図である。FIG. 15 is a block diagram showing a configuration example of a parallel computer system implementing the parallelizing compiler in FIG. 1;

【図１６】図１におけるシステムのハードウェア構成例
を示すブロック図である。FIG. 16 is a block diagram illustrating an example of a hardware configuration of the system in FIG. 1;

【符号の説明】[Explanation of symbols]

１：表示装置、２：入力装置、３：外部記憶装置、４：
情報処理装置、５：光ディスク、６：駆動装置、１０
０：並列化コンパイラ、１０１：構文解析処理部、１０
２：データ依存解析処理部、１０３：並列性解析処理
部、１０４：参照パターン検出処理部、１０５：ループ
類グラフ作成処理部、１０６：並列化ループ決定処理
部、１０７：ループ並列化処理部、１０８：コード生成
処理部、１１０：ソースプログラム、１２０：辞書、１
３０：中間語、１４０：ループテーブル、１５０：参照
パターンテーブル、１６０：ループ類グラフ、１７０：
細分ループテーブル、１８０：並列化プログラム、１０
５１：細分ループテーブル作成処理部、１０５２：ルー
プ類検出処理部、１０５３：エッジ接続処理部、８０
０，８１０，８２０，８３０，８４０：参照パターンテ
ーブル、８０１〜８０５，８１１〜８１６，８２１〜８
２６，８３１〜８３６，８４１〜８４６：フィールド、
９００，９２０，９３０，９４０，９５０：ループテー
ブル、９０１〜９１３：フィールド、１０１１，１０１
２：細分ループテーブル、１０２１，１０２３：入口リ
スト、１０２２，１０２４：出口リスト、１０３１，１
０３２：ポインタ、１１０１〜１１０３：ループ類テー
ブル、１１１１〜１１１３：リスト、１３０１：入口ノ
ード、１３０２：出口ノード、１５０１：共有メモリ、
１５０２：論理プロセッサエレメント、１５０３：制御
用ネットワーク、１５０４：入出力論理プロセッサエレ
メント、１５０５：入出力用コンソールまたはワークス
テーション。1: display device, 2: input device, 3: external storage device, 4: information processing device, 5: optical disk, 6: drive device, 100: parallelizing compiler, 101: syntax analysis processing unit, 102: data dependence analysis processing unit, 103: parallelism analysis processing unit, 104: reference pattern detection processing unit, 105: loop type graph creation processing unit, 106: parallelized loop determination processing unit, 107: loop parallelization processing unit, 108: code generation processing unit, 110: source program, 120: dictionary, 130: intermediate language, 140: loop table, 150: reference pattern table, 160: loop type graph, 170: subdivision loop table, 180: parallelized program, 1051: subdivision loop table creation processing unit, 1052: loop type detection processing unit, 1053: edge connection processing unit, 800, 810, 820, 830, 840: reference pattern table, 801 to 805, 811 to 816, 821 to 826, 831 to 836, 841 to 846: field, 900, 920, 930, 940, 950: loop table, 901 to 913: field, 1011, 1012: subdivision loop table, 1021, 1023: entry list, 1022, 1024: exit list, 1031, 1032: pointer, 1101 to 1103: loop type table, 1111 to 1113: list, 1301: entry node, 1302: exit node, 1501: shared memory, 1502: logical processor element, 1503: control network, 1504: input/output logical processor element, 1505: input/output console or workstation.

Claims

【特許請求の範囲】[Claims]

【請求項１】ソースプログラムを入力して、並列化可
能なループを検出し、該並列化可能なループを並列計算
機向けに分割する指示文を含むプログラムもしくは並列
化されたオブジェクトコードを出力するシステムによる
上記並列化可能なループの決定方法であって、上記並列
化可能ループ中に現れる各配列参照の添字に、該配列参
照を囲むループのループ制御変数がどのようなパターン
で現れるかを検出するステップと、所定の条件に基づき
上記パターンが同類とされる複数のループを一つの類に
まとめてループ類を生成するステップと、上記ループ類
中の各ループ範囲に基づき該ループ類中の各ループを並
列化する場合の各々のループ範囲を求め、該ループ範囲
に従って各ループ類の各ループを並列化した場合の実行
時間を評価して、並列化するのに最適なループ類を特定
するステップとを有し、上記最適なループ類中の各ルー
プを上記並列化可能なループとして選択することを特徴
とする並列化ループ決定方法。1. A parallelized loop determination method for determining parallelizable loops in a system that takes a source program as input, detects parallelizable loops, and outputs either a program containing directives for dividing the parallelizable loops for a parallel computer or parallelized object code, the method comprising: a step of detecting in what pattern the loop control variables of the loops enclosing each array reference appear in the subscripts of each array reference appearing in the parallelizable loops; a step of generating loop types by grouping, into one type, a plurality of loops whose patterns are regarded as similar on the basis of a predetermined condition; and a step of obtaining, on the basis of the loop ranges in a loop type, the loop range of each loop in that loop type when the loop is parallelized, evaluating the execution time when each loop of each loop type is parallelized according to those loop ranges, and identifying the loop type that is optimal for parallelization; wherein each loop in the optimal loop type is selected as the parallelizable loop.