JPS62169272A

JPS62169272A - Unrolling processing system for vector arithmetic string loop

Info

Publication number: JPS62169272A
Application number: JP61011577A
Authority: JP
Inventors: Masaki Aoki; 正樹青木; Morie Sagawa; 佐川　守江
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1986-01-22
Filing date: 1986-01-22
Publication date: 1987-07-25
Also published as: JPH054712B2

Abstract

PURPOSE: To improve the execution performance of a program by determining, after vectorization, the data dependences in the outer loop that affect the vector operation sequence and, according to the result, reducing the iteration count of the outer loop to 1/N and expanding the vector operation sequence N-fold. CONSTITUTION: When an inner loop of a nested loop in the program being compiled has been vectorized, a data dependence analysis unit 15 analyzes the data dependences relevant to unrolling the vectorized operation sequence in the outer loop. Based on the analysis result, an unrolling condition determination unit 16 decides whether unrolling is possible. When it is, an unrolling processing unit 17 reduces the iteration count of the outer loop to 1/N (N is an integer of 2 or more) and expands the vector operation sequence N-fold.

Description

[Detailed Description of the Invention]

[Summary]

When compiling a program subject to automatic vectorization, the data dependences in the outer loop that affect the vectorized operation sequence are determined. According to the result, the iteration count of the outer loop is reduced to 1/N and the vector operation sequence is expanded N-fold, which improves the execution performance of the compiled program.

[Industrial Field of Application]

The present invention relates to a processing system for compiling programs that are executed by a data processing device having a vector computer, and more particularly to a vector operation sequence loop unrolling processing system that unrolls a vector operation sequence in a loop.

[Prior Art]

For example, to run a program written in a language such as FORTRAN on a vector computer, compilers are used that automatically generate vector operation sequences for the arrays and other data in DO loops. Raising the vectorization ratio of the object code that such a compiler generates is regarded as an important issue for improving execution performance on a vector computer. However, to make the most effective use of the vector computer as a hardware resource, the vector operation sequence produced by vectorization must also be scheduled optimally.

Optimal scheduling here means ordering the vector operation sequence so that the density of data flowing through the vector computer's load/store pipeline, add pipeline, multiply pipeline, and so on is increased and the time spent waiting to execute is reduced.

To achieve this optimal scheduling, users have conventionally tuned programs by hand at the source level, working on the scalar form of the code.

[Problems to Be Solved by the Invention]

However, when a user tunes a source program by hand, the following problems arise.

(1) A source program that has been tuned for vector execution in its scalar form may run more slowly on a general-purpose computer that has no vector processing capability.

(2) Tuning requires a great deal of effort and time.

(3) The descriptive clarity of the source program is impaired.

(4) The user's tuning may degrade performance and have the opposite of the intended effect.

To solve the above problems, the present invention aims to provide a system that automatically generates optimized object code from a source program by loop unrolling the vector operation sequence.

[Means for Solving the Problems]

FIG. 1 is a block diagram of an example of the basic configuration of the present invention.

In FIG. 1, 10 is a source program written in a high-level language; 11 is a processing device comprising a CPU, memory, and so on; 12 is a compiler that translates the source program 10 into machine-language object code; 13 is a program input unit; 14 is a vectorization processing unit; 15 is a data dependence analysis unit; 16 is an unrolling condition determination unit; 17 is an unrolling processing unit; 18 is an object generation unit; and 19 is an object program consisting of the machine code sequence corresponding to the source program 10.

The program input unit 13 reads the source statements to be processed from the source program 10. Parsing this input program produces an intermediate text. The compiler 12 has an automatic vectorization function: the vectorization processing unit 14 reads the intermediate text, detects the parts that can be vectorized, and generates a vector operation sequence.

When the inner loop of a nested loop has been vectorized by the vectorization processing unit 14, the data dependence analysis unit 15 analyzes the data dependences of the vectorized operation sequence in the outer loop.

Based on the analysis result from the data dependence analysis unit 15, the unrolling condition determination unit 16 decides whether unrolling is possible by searching a table in which, for each kind of data dependence, information on whether unrolling is permitted has been registered in advance. Loop unrolling here means reducing the iteration count of the outer loop to 1/N (N is an integer of 2 or more) and expanding the vector operation sequence N-fold.

When the unrolling condition determination unit 16 has determined that unrolling is possible, the unrolling processing unit 17 expands the vector operation sequence and performs loop unrolling. The iteration count of the outer loop is reduced to 1/N; when this leaves a remainder, the instruction sequence for the remaining vector operations is appended outside the loop.

The vectorized and unrolled intermediate text is further optimized by other means as necessary. The object generation unit 18 finally generates the object program 19.

[Operation]

The operation of the present invention is described below, taking loop unrolling of a FORTRAN program as an example.

For example, the doubly nested loop

      DO 10 J=1,100
      DO 10 I=1,10000
      A(I,J)=B(I,J)+C(I,J)*D(I,J)
   10 CONTINUE

is vectorized over its inner loop by the vectorization processing unit 14 as follows.

      DO 10 J=1,100
      A(*,J)=B(*,J)+C(*,J)*D(*,J)
   10 CONTINUE

Here, the "*" in the array subscripts is a vector parameter that takes the values 1 to 10000; the vector length is 10000.

The unrolling processing unit 17 then performs loop unrolling on this as follows.

      DO 10 J=1,100,2
      A(*,J)=B(*,J)+C(*,J)*D(*,J)
      A(*,J+1)=B(*,J+1)+C(*,J+1)*D(*,J+1)
   10 CONTINUE

That is, doubling the increment of the loop control variable halves the iteration count of the outer loop, and the vector operation sequence inside the loop is expanded into two copies.
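The two-fold expansion above can be mimicked in ordinary Python as a rough sketch. The function names are illustrative and not from the patent, each column stands for one vector (the "*" dimension), and the sketch assumes an even iteration count so that no remainder handling is needed.

```python
def vec_fma(b, c, d):
    """One vector operation: element-wise b + c * d over a whole column."""
    return [bi + ci * di for bi, ci, di in zip(b, c, d)]


def run_original(B, C, D):
    # One vector operation per outer iteration J.
    return [vec_fma(B[j], C[j], D[j]) for j in range(len(B))]


def run_unrolled_by_two(B, C, D):
    n = len(B)
    assert n % 2 == 0, "this sketch assumes an even iteration count"
    A = [None] * n
    for j in range(0, n, 2):  # increment doubled, so n/2 iterations
        A[j] = vec_fma(B[j], C[j], D[j])                  # copy for J
        A[j + 1] = vec_fma(B[j + 1], C[j + 1], D[j + 1])  # copy for J+1
    return A
```

Both functions produce the same columns; only the shape of the outer loop differs, which is what gives a scheduler two independent vector operations per iteration to interleave.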

The same applies to three-fold expansion and above. Because each of the expanded vector operations is processed separately by the pipelines of the vector computer, the pipelines can be kept more densely occupied, and pipeline scheduling is optimized.

In addition, in a case such as the following, the vector text can be optimized by common subexpression optimization across vector operations. For example, suppose the vector operation sequence after vectorization is

      DO 10 J=1,100
      A(*,J+1)=B(*,J)+A(*,J)
   10 CONTINUE

Here, as in the previous example, the "*" in the array subscripts is a vector parameter that takes the values 1 to 10000.

Unrolled two-fold, this becomes

      DO 10 J=1,100,2
      A(*,J+1)=B(*,J)+A(*,J)        ... (1)
      A(*,J+2)=B(*,J+1)+A(*,J+1)    ... (2)
   10 CONTINUE

The second term on the right-hand side of vector operation (2) has the same value as the left-hand side of vector operation (1). When the vector computer executes vector operation (1), A(*,J+1) is left in a vector register, so A(*,J+1) does not need to be loaded when the next vector operation (2) is executed. This enables faster execution and allows the vector text to be optimized.
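The saving described above can be illustrated with a small Python model that counts how often a column of A is loaded. This is only a sketch: the "vector register" is modeled as a local variable, the function names are hypothetical, and an even iteration count is assumed.

```python
def run_rolled(A, B):
    """A(*,J+1) = B(*,J) + A(*,J), loading A(*,J) on every iteration."""
    A = [col[:] for col in A]
    loads = 0
    for j in range(len(B)):
        a_j = A[j]
        loads += 1  # load A(*,J)
        A[j + 1] = [b + a for b, a in zip(B[j], a_j)]
    return A, loads


def run_unrolled_with_reuse(A, B):
    """Two-fold unrolled form: statement (2) reuses the result of (1)."""
    A = [col[:] for col in A]
    n = len(B)
    assert n % 2 == 0 and len(A) == n + 1
    loads = 0
    for j in range(0, n, 2):
        a_j = A[j]
        loads += 1  # load A(*,J)
        reg = [b + a for b, a in zip(B[j], a_j)]  # (1): result stays in the "register"
        A[j + 1] = reg
        # (2): A(*,J+1) is taken from the register instead of being reloaded
        A[j + 2] = [b + a for b, a in zip(B[j + 1], reg)]
    return A, loads
```

In this model the unrolled form yields the same array contents with half as many loads of A.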

[Embodiment]

FIG. 2 is a diagram explaining the processing of one embodiment of the present invention, FIG. 3 is an example of the unrolling feasibility table, and FIG. 4 is a diagram explaining the relationship between data dependence values and the unroll factor.

The loop unrolling process according to the present invention is performed, for example, as shown in FIG. 2. This process is invoked when a vectorized operation sequence exists in the loop being processed. The step numbers (1) to (10) in the following description correspond to the numbers (1) to (10) shown in FIG. 2.

(1) Using the data dependence values as the key, search an unrolling feasibility table such as that shown in FIG. 3. The data dependence values and the unrolling feasibility table are described in detail later.

(2) From the result of searching the unrolling feasibility table, decide whether unrolling is possible. If it is not, proceed to the next optimization process without performing the unrolling optimization.

(3) Check the other unrolling conditions as well. These include, for example, that the loop iteration count is 2 or more (when it is explicitly known), that the loop has a single exit, and that the vector length does not change within the loop. Whether unrolling would actually improve execution efficiency is also checked. If any of these conditions is not satisfied, proceed to the next optimization process.

(4) For loop unrolling, reduce the iteration count of the outer loop to 1/N. To keep the explanation simple, the case N=2 is described below.

(5) Double the vector operation sequence. That is, for the original vector operation sequence, generate and append a copy in which the values of the array subscript expressions have been advanced.

(6) Determine whether the loop iteration count is a constant. If it is not, pass control to step (9).

(7) Determine whether the original loop iteration count is even or odd. If it is even, proceed to the next optimization process; if it is odd, execute step (8).

(8) Append the part of the vector operation sequence that the original loop executes last outside the new loop, then proceed to the next optimization process.

(9) When the loop iteration count is a variable, generate and append text that determines the iteration count dynamically.

(10) In accordance with that determination, append outside the loop the vector operations for the remainder left when the iteration count is halved. Then proceed to the next optimization process.
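The control flow of steps (1) to (10) can be summarized in a short Python sketch. It is not the compiler's actual implementation: the function name, the string results, and the way the table verdict and the other conditions are passed in as arguments are all assumptions made for illustration.

```python
def plan_unrolling(table_allows, other_conditions_ok, trip_count, n=2):
    """Return a label for what the unrolling pass would do.

    trip_count is an int when the iteration count is known at compile
    time, or None when it is a variable.
    """
    if not table_allows:          # steps (1)-(2): feasibility table says no
        return "skip"
    if not other_conditions_ok:   # step (3): single exit, fixed vector length, ...
        return "skip"
    if trip_count is not None and trip_count < 2:
        return "skip"             # step (3): known count must be 2 or more
    # Steps (4)-(5): iteration count becomes 1/n, body is replicated n times.
    if trip_count is None:        # step (6) -> steps (9) and (10)
        return "unroll with run-time remainder test"
    if trip_count % n == 0:       # step (7): no remainder
        return "unroll"
    return "unroll with remainder appended after loop"  # step (8)
```

For instance, a known count of 4 gives a plain unroll, a known count of 5 appends the last iteration after the loop, and an unknown count yields the run-time test.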

When a vector operation sequence is loop unrolled, the unrolling may change the order in which the vector computer executes definitions of and references to arrays in a way that was not intended, so that correct results are no longer obtained. In the present invention, therefore, data dependence values such as the following are computed in advance and used to decide whether unrolling is possible.

A data dependence value can be thought of as expressing the relative relationship between the values of the subscript expressions of two array references that occur one after the other in a loop. For example, when the array reference that appears first is A(I) and the one that appears later is A(I+2), the control variable I is common to both, so setting I=0 gives the data dependence value as follows.

      (0)-(0+2)=-2

The kinds of data dependence value are, for example, as follows.

(Symbol)         (Meaning)
φ                No overlap (no data dependence).
+                A forward data dependence exists.
-                A backward data dependence exists.
*                The control variable does not appear.
?                The data dependence is unknown.
+OR-             A forward data dependence exists (scalar and vector).
0                The same location is accessed.
Positive value   How far the access is offset in the forward direction.
Negative value   How far the access is offset in the backward direction.
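The numeric dependence value from the A(I) and A(I+2) example can be computed as in the following Python sketch. It assumes that both subscripts have the form I + offset with a shared control variable I, so that setting I=0 leaves only the offsets; the function names are illustrative.

```python
def dependence_value(offset_first, offset_later):
    """Subscript of the earlier reference minus that of the later one, with I = 0."""
    return offset_first - offset_later


def direction(value):
    """Map a numeric dependence value to the direction named in the list above."""
    if value == 0:
        return "same location"
    return "forward" if value > 0 else "backward"
```

With A(I) first and A(I+2) later, `dependence_value(0, 2)` gives -2, an offset of two in the backward direction.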

Whether unrolling is possible is determined by data dependence values such as those above. For this purpose, an unrolling feasibility table such as that shown in FIG. 3 is used.

In the unrolling feasibility table shown in FIG. 3, ○ means that unrolling is possible, × means that unrolling is not possible, and △ means that feasibility depends on the value. The vertical axis gives the data dependence value for the first dimension, and the horizontal axis gives the data dependence value for the second dimension.

      DO 10 J=1,N
      DO 10 I=1,M
      A(I,J)=...
   10 CONTINUE

In a case like this, I is the first dimension and J is the second dimension.

In FIG. 3, the cases marked × are treated as not unrollable, because unrolling would create a data dependence that did not previously exist. In the cases marked △, feasibility is determined by the data dependence value and the unroll factor, as shown in FIG. 4. For example, when the data dependence value is ±2, unrolling is possible with two-fold expansion (that is, N=2) but not with three-fold expansion or more (N≥3).
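A table lookup of this kind might be sketched in Python as follows. The contents of FIG. 3 and FIG. 4 are not reproduced in the text, so the table entries here are placeholders only, and the rule used for △ (unrolling allowed while N does not exceed the absolute dependence value) is an assumption chosen to agree with the single ±2 example above.

```python
CAN, CANNOT, DEPENDS = "○", "×", "△"

# Placeholder entries only; the real table is the one shown in FIG. 3.
# Keys are (first-dimension symbol, second-dimension symbol).
HYPOTHETICAL_TABLE = {
    ("φ", "φ"): CAN,
    ("?", "?"): CANNOT,
    ("0", "value"): DEPENDS,
}


def can_unroll(dim1, dim2, n, value=None):
    """Decide feasibility of n-fold unrolling for one pair of dependence symbols."""
    # Pairs missing from the placeholder table are treated conservatively.
    verdict = HYPOTHETICAL_TABLE.get((dim1, dim2), CANNOT)
    if verdict == DEPENDS:
        # Assumed rule: feasible only while n <= |dependence value|.
        return value is not None and n <= abs(value)
    return verdict == CAN
```

Under these assumptions a dependence value of ±2 permits `n=2` and rejects `n=3`, matching the example in the text.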

Next, concrete examples of loop unrolling are shown using FORTRAN programs.

(a) When the loop iteration count is explicitly known and is even

[Before loop unrolling]

      DO 10 J=1,4
      A(*,J)=B(*,J)+C(*,J)
   10 CONTINUE

[After loop unrolling]

      DO 10 J=1,4,2
      A(*,J)=B(*,J)+C(*,J)
      A(*,J+1)=B(*,J+1)+C(*,J+1)
   10 CONTINUE

(b) When the iteration count is odd

[Before loop unrolling]

      DO 10 J=1,5
      A(*,J)=B(*,J)+C(*,J)
   10 CONTINUE

[After loop unrolling]

      DO 10 J=1,3,2
      A(*,J)=B(*,J)+C(*,J)
      A(*,J+1)=B(*,J+1)+C(*,J+1)
   10 CONTINUE
      A(*,5)=B(*,5)+C(*,5)

The vector operation for J=5 is appended at the end.

(c) When the iteration count is unknown

[Before loop unrolling]

      DO 10 J=1,N
      A(*,J)=B(*,J)+C(*,J)
   10 CONTINUE

[After loop unrolling]

      IF (N.EQ.1) GOTO 20
      DO 10 J=1,N-1,2
      A(*,J)=B(*,J)+C(*,J)
      A(*,J+1)=B(*,J+1)+C(*,J+1)
   10 CONTINUE
      IF (MOD(N,2).EQ.0) GOTO 30
   20 CONTINUE
      A(*,N)=B(*,N)+C(*,N)
   30 CONTINUE

In the above embodiment the unroll factor is 2, but higher factors are also possible; for example, when the loop iteration count is explicitly 3, a three-fold expansion may be used. A so-called optimization control line may also be provided so that the user can specify the unroll factor from outside. In that case, the user writes an optimization control line such as the following in the source program.

*VOCL LOOP,UNROL(4)

Here, *VOCL indicates that this line is an optimization control line. LOOP indicates that the optimization applies to a loop. UNROL(4) directs that a four-fold expansion be performed. With four-fold expansion the result is, for example, as follows.

[Before loop unrolling]

*VOCL LOOP,UNROL(4)
      DO 10 J=1,N
      A(*,J)=B(*,J)+C(*,J)
   10 CONTINUE

[After loop unrolling]

      IF (N.LT.4) GOTO 20
      DO 10 J=1,N-3,4
      A(*,J)=B(*,J)+C(*,J)
      A(*,J+1)=B(*,J+1)+C(*,J+1)
      A(*,J+2)=B(*,J+2)+C(*,J+2)
      A(*,J+3)=B(*,J+3)+C(*,J+3)
   10 CONTINUE
   20 M=MOD(N,4)
      IF (M.EQ.0) GOTO 50
      IF (M.EQ.1) GOTO 40
      IF (M.EQ.2) GOTO 30
      A(*,N-2)=B(*,N-2)+C(*,N-2)
   30 A(*,N-1)=B(*,N-1)+C(*,N-1)
   40 A(*,N)=B(*,N)+C(*,N)
   50 CONTINUE

In this example, the optimization control line specified by the user causes unrolling to be carried out as a four-fold expansion. Because the loop is controlled by the variable N and the iteration count is therefore unknown at compile time, text that tests the iteration count is generated and appended after the loop.
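The shape of the generated code, a main loop stepping by the unroll factor followed by a run-time remainder, can be sketched in Python for an arbitrary factor. This is an illustration under simplifying assumptions rather than the patent's generated text: the names are hypothetical, indices are 0-based, and a small inner loop stands in for the replicated statements.

```python
def vec_add(b, c):
    """One vector operation: element-wise b + c over a whole column."""
    return [x + y for x, y in zip(b, c)]


def run_rolled(B, C):
    return [vec_add(B[j], C[j]) for j in range(len(B))]


def run_unrolled(B, C, factor=4):
    n = len(B)                   # iteration count, known only at run time
    A = [None] * n
    main = n - n % factor        # iterations covered by the unrolled loop
    for j in range(0, main, factor):
        for k in range(factor):  # stands in for `factor` replicated statements
            A[j + k] = vec_add(B[j + k], C[j + k])
    for j in range(main, n):     # remainder: the last MOD(N, factor) columns
        A[j] = vec_add(B[j], C[j])
    return A
```

For any iteration count, including counts smaller than the factor, the unrolled version gives the same result as the rolled one, which is the property the appended remainder code is meant to preserve.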

[Effects of the Invention]

As described above, according to the present invention, loop unrolling of vector operation sequences is performed automatically on the basis of the data dependences, which makes it possible to optimize pipeline scheduling. Optimization of the vector text also becomes possible. Execution performance is therefore improved, and the time users spend on tuning is reduced. In addition, the descriptive clarity of source programs such as FORTRAN programs is preserved, and source-level compatibility with general-purpose computers is maintained.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of an example of the basic configuration of the present invention, FIG. 2 is a diagram explaining the processing of one embodiment of the present invention, FIG. 3 is an example of the unrolling feasibility table, and FIG. 4 is a diagram explaining the relationship between data dependence values and the unroll factor. In the figures, 10 is the source program, 11 the processing device, 12 the compiler, 13 the program input unit, 14 the vectorization processing unit, 15 the data dependence analysis unit, 16 the unrolling condition determination unit, 17 the unrolling processing unit, 18 the object generation unit, and 19 the object program.

Claims

[Claims]

In a data processing system having a compilation function that performs automatic vectorization, a vector operation sequence loop unrolling processing system characterized by comprising: a data dependence analysis unit (15) that, when an inner loop of a nested loop in a program being compiled has been vectorized, analyzes the data dependences relevant to unrolling the vectorized operation sequence in the outer loop; an unrolling condition determination unit (16) that determines whether unrolling is possible, at least in accordance with the analysis result of the data dependence analysis unit (15); and an unrolling processing unit (17) that, when the unrolling condition determination unit (16) has determined that unrolling is possible, reduces the iteration count of the outer loop to 1/N (N is an integer of 2 or more) and expands the vector operation sequence N-fold.