JPH103425A

JPH103425A - Instruction supply device

Info

Publication number: JPH103425A
Application number: JP8154035A
Authority: JP
Inventors: Atsushi Kawai; 淳河井
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1996-06-14
Filing date: 1996-06-14
Publication date: 1998-01-06

Abstract

PROBLEM TO BE SOLVED: To eliminate modifications to the sequentially executed parts of a program, such as the insertion of NOP codes, by storing instructions in an instruction buffer as they are executed and then executing the parallel-executable instructions in parallel. SOLUTION: When the required instruction codes are not stored in an instruction buffer 24, execution unit 1 sequentially executes the instructions read from an instruction memory 21, and these instructions are also stored in a block of the buffer 24 that can hold at most n instructions executable in parallel. When the buffer 24 is searched by instruction address and a block matching that address can be read, the buffer 24 reads out the string of at most n consecutive parallel-executable instructions starting at that address and supplies them to execution units 1 to n, which execute them in parallel. When the required instruction code string is not stored in the buffer 24, the instructions are executed sequentially and, at the same time, a new instruction string is stored.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an instruction supply device in a parallel computer, and more particularly to an instruction supply device in a VLIW (Very Long Instruction Word) parallel computer.

[0002]

[Prior Art] There is a technique of issuing multiple instructions per clock cycle. It allows the instruction execution rate to exceed the clock rate.

[0003] In a VLIW (Very Long Instruction Word) parallel computer, the compiler takes responsibility for gathering instructions that can truly be issued simultaneously into a single very long instruction word, so the hardware never has to decide whether instructions can be executed concurrently.

[0004] An example of this kind of parallel computer is described in Hennessy & Patterson, "Computer Architecture: A Quantitative Approach to Design, Implementation, and Evaluation" (Nikkei BP, 1992, pp. 312-323).

[0005] FIG. 7 shows the configuration of a conventional VLIW parallel computer. In this figure, the VLIW parallel computer consists of an instruction memory 11, execution units 12 (execution units 1 to n), and a data memory 13.

[0006] The instruction memory 11 supplies an instruction to each of execution units 1 to n in every instruction execution cycle. The instruction length read from the instruction memory 11 at one time is therefore n times the instruction length given to each execution unit 12. The instruction length given to each execution unit is about 32 bits, as in an ordinary general-purpose computer.

[0007] Consequently, the instruction length read from the instruction memory 11 is 32 × n bits, which is very long. This is why the machine is called a VLIW parallel computer. Each execution unit executes the instruction supplied from the instruction memory 11 in every instruction execution cycle. The data memory 13 is configured so that all execution units can access it simultaneously and any execution unit can reference memory.

[0008]

[Problems to Be Solved by the Invention] In such a conventional VLIW parallel computer, each execution unit executes a separate instruction in every instruction execution cycle, which has the advantage that program portions involving complex control can be executed.

[0009] However, because all execution units execute instructions, a problem arises in ordinary program execution, which inherently mixes sequential and parallel processing in arbitrary combinations. Program portions that execute sequentially, or portions whose parallelism is not high enough to use all n execution units of the VLIW parallel computer at once, must still be advanced by parallel execution.

[0010] As a result, NOP codes (instruction codes that direct that nothing be executed) must be inserted as the instruction codes for execution units that do not execute an instruction. This increases the amount of instruction code making up the whole program. Furthermore, because such program modification depends on the configuration of the individual parallel computer, the modified program can be executed, or achieves its maximum performance, only on a specific computer.
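To make the code-size cost of NOP padding concrete, the following minimal Python sketch counts the bits and NOP slots of a fixed-width VLIW encoding. The function name `vliw_code_bits`, the 4-unit machine, and the per-word instruction counts are hypothetical values chosen for illustration; only the slot width of about 32 bits comes from the text.

```python
def vliw_code_bits(useful_per_word, n_units, instr_bits=32):
    """Return (total_bits, nop_slots) for a fixed-width VLIW encoding.

    useful_per_word: real instructions in each long word (0..n_units each)
    n_units:         number of execution units, i.e. slots per long word
    instr_bits:      bits per slot (about 32 in the text)
    """
    for k in useful_per_word:
        if not 0 <= k <= n_units:
            raise ValueError("each word holds between 0 and n_units instructions")
    words = len(useful_per_word)
    total_bits = words * n_units * instr_bits              # every slot is encoded
    nop_slots = words * n_units - sum(useful_per_word)     # slots padded with NOP
    return total_bits, nop_slots


# Hypothetical schedule: five long words, three of them purely sequential.
schedule = [1, 1, 4, 1, 2]
total_bits, nop_slots = vliw_code_bits(schedule, n_units=4)
sequential_bits = sum(schedule) * 32   # same instructions as a sequential program
```

For this made-up schedule the fixed-width encoding takes 640 bits and 11 of its 20 slots hold NOPs, whereas the same nine instructions occupy 288 bits as a sequential program.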

[0011] An object of the present invention is to provide an instruction supply device that makes modifications to the sequentially executed portions of a program, such as NOP code insertion, unnecessary, allows programs to be executed with ideal efficiency, and reduces program size.

[0012]

[Means for Solving the Problems] The instruction supply device according to the present invention supplies instructions to a parallel computer having an instruction memory, an instruction address control unit, and a plurality of instruction execution units. It comprises an instruction buffer, an instruction buffer read control unit, an instruction buffer write control unit, a plurality of instruction selectors, and a NOP (constant generation circuit).

When the instruction code to be referenced is not stored in the instruction buffer, a single execution unit sequentially executes the instructions read from the instruction memory. At the same time, each instruction is stored in the instruction buffer, which holds one or more instruction blocks, each able to store up to as many instruction codes as there are instruction execution units. The block used is an empty block or the least recently referenced block. Thereafter, in parallel with the sequential execution, the device checks which instruction codes can be executed simultaneously and stores instructions that can be executed in parallel, up to a number equal to the number of instruction execution units.

When the instruction code to be referenced is stored in the instruction buffer, the instruction buffer is searched by instruction address. After confirming that a block matching the instruction address exists and that its contents are valid, the device simultaneously reads out up to as many instruction codes as there are instruction execution units, together with the number of instructions that can be executed in parallel. The read instruction codes are supplied to that number of instruction execution units, which execute them in parallel, while a NOP code indicating that no instruction is to be executed is automatically supplied to the instruction execution units that cannot take part in the parallel execution.

The device thus takes as input a program written for a sequential-execution computer, extracts from it the instruction code sequences that can be executed in parallel, and stores them in the instruction buffer. When such an instruction code sequence is executed repeatedly, from the second time onward the instruction buffer supplies the maximum number of instruction codes that can be executed in parallel, and they are executed in parallel.

[0013] The parallel computer may be one in which a plurality of instruction execution units simultaneously perform mutually independent processing, and it may be a VLIW (Very Long Instruction Word) parallel computer.

[0014]

[Embodiments of the Invention] The instruction supply device according to the present invention can be applied to an instruction supply device in a parallel computer.

[0015] FIG. 1 shows the overall configuration of a VLIW parallel computer equipped with an instruction supply device according to an embodiment of the present invention.

[0016] In FIG. 1, the VLIW parallel computer equipped with the instruction supply device comprises a parallel computer 20, made up of an instruction memory 21, an instruction address control unit 22, and execution units 23 (execution units 1 to n), and in addition an instruction supply device 30, made up of an instruction buffer 24, an instruction buffer read control unit 25, an instruction buffer write control unit 26, instruction selectors 27 (instruction selectors 1 to n), a NOP (constant generation circuit) 28, and an inverter 29.

[0017] FIG. 2 shows the configuration of the instruction buffer 24.

[0018] In FIG. 2, the instruction buffer 24 consists of m blocks, block 1 to block m. Each block consists of the following fields: V, instruction count, instruction address, LRU, and instructions 1 to n.

[0019] V in FIG. 2 stands for the Valid flag and indicates whether the contents of the block are valid. The instruction count indicates how many of the instructions stored in instructions 1 to n, counting from instruction 1, are valid. That many instructions can be executed in parallel by execution units 1 to n.

[0020] The instruction address, shown hatched in FIG. 2, is the address in the instruction memory of the instruction code stored in instruction 1 of the block. When a value matching an externally supplied instruction address is stored in this instruction address storage section, the block holds the instruction code to be referenced.
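The block layout of FIG. 2 can be pictured as a small record type. The following Python sketch is only an illustration of the fields named in the text. The class name `Block`, the function `lookup`, the string instruction encoding, and the example addresses are assumptions, and a linear scan stands in for the parallel tag match that the hardware performs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    valid: bool = False             # V flag: block contents are valid
    count: int = 0                  # valid instructions, counted from slot 1
    address: Optional[int] = None   # instruction-memory address of slot 1
    lru: bool = False               # LRU flag: True means referenced recently
    slots: List[str] = field(default_factory=list)  # instructions 1..n


def lookup(blocks: List[Block], address: int) -> Optional[int]:
    """Return the index of the block whose address tag matches, else None.

    Only the tag is compared here; the V flag is checked separately by the caller.
    """
    for i, block in enumerate(blocks):
        if block.address == address:
            return i
    return None


# Hypothetical buffer with m = 4 blocks of n = 4 slots; one block is filled.
blocks = [Block() for _ in range(4)]
blocks[2] = Block(valid=True, count=2, address=0x100, lru=True,
                  slots=["ADD", "SUB", "", ""])
hit = lookup(blocks, 0x100)    # index of the matching block
miss = lookup(blocks, 0x104)   # no block starts at this address
```

Only the first `count` entries of `slots` are meaningful, which mirrors the role of the instruction count field.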

[0021] LRU stands for the Least Recently Used flag. When this flag is 0, the instructions stored in the block have not been referenced recently. Such a block is a candidate for replacement when a new instruction code sequence is stored in the instruction buffer.

[0022] The instruction address storage section and the LRU section are built from content-addressable memory, which outputs the required data when the supplied content matches the content stored there.

[0023] When a block matching the input instruction address exists, the instruction address storage section asserts the match signal (logic "1") and outputs the V flag, instruction count, and instructions 1 to n of that block.

[0024] The LRU section sets the LRU flag of a block to 1 when the block is read. On a write, it outputs the lowest block number whose V flag is 0. When all V flags are 1, it outputs the lowest block number whose LRU flag is 0 and at the same time sets the V flag of that block to 0. When all V flags and all LRU flags are 1, it outputs the highest-numbered block, that is, block m, and at the same time sets the V flag of that block to 0. On a write (a block update), only the LRU flag of the updated block becomes 1, and the LRU flags of all other blocks become 0.
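The replacement rule just described is a three-case priority selection, which the following Python sketch restates. The function names `select_victim` and `mark_written` and the use of 0-based list indices are assumptions made for illustration; the hardware obtains the same result from content-addressable memory rather than from loops.

```python
def select_victim(v_flags, lru_flags):
    """Return the 0-based index of the block to overwrite.

    Mirrors the three cases in the text and clears the V flag of a
    valid block that is chosen for replacement.
    """
    m = len(v_flags)
    # Case 1: lowest-numbered block whose V flag is 0.
    for i in range(m):
        if not v_flags[i]:
            return i
    # Case 2: all blocks valid; lowest-numbered block whose LRU flag is 0.
    for i in range(m):
        if not lru_flags[i]:
            v_flags[i] = 0
            return i
    # Case 3: all valid and all recently used; take the last block (block m).
    v_flags[m - 1] = 0
    return m - 1


def mark_written(v_flags, lru_flags, index):
    """Finish a block update: set V, and leave only this block's LRU flag at 1."""
    v_flags[index] = 1
    for j in range(len(lru_flags)):
        lru_flags[j] = 1 if j == index else 0


# Hypothetical state: every block valid, blocks 2 and 4 not recently used.
v = [1, 1, 1, 1]
lru = [1, 0, 1, 0]
victim = select_victim(v, lru)   # 1, i.e. block 2 in the text's numbering
mark_written(v, lru, victim)
```

Because a write clears every other LRU flag, this is a one-bit approximation of least-recently-used ordering rather than a full LRU history.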

[0025] FIG. 3 shows the format of the instruction codes executed by the parallel computer 20.

[0026] In FIG. 3, there are two kinds of instructions, operation instructions and branch instructions, each with its own format. An operation instruction consists of an operation specifier, a source 1 operand, a source 2 operand, and a destination operand. A branch instruction consists of a branch specifier and a branch target address.

[0027] FIG. 4 is a circuit diagram showing the configuration of the instruction buffer write control unit 26 in the parallel computer 20.

[0028] In FIG. 4, the instruction buffer write control unit 26 consists of an instruction register 31, a branch code (constant generation) 32, a comparator 33 (comparator 1), an instruction counter 34, a "1" constant generator 35, an AND circuit 36 (AND1), an ANDcount (register) 37, AND circuits 38 (AND2 to ANDn), flip-flops with enable input 39 (FF1 to FFn), operand queue registers 40 (operand queues 1 to n-1), comparators 41 (comparators S11 to S1n-1), comparators 42 (comparators S21 to S2n-1), an OR circuit 43 (ORall), and an inverter 44.

[0029] FIG. 5 is a circuit diagram showing the configuration of the instruction buffer read control unit 25 in the parallel computer 20.

[0030] In FIG. 5, the instruction buffer read control unit 25 consists of an AND circuit 51 (ANDv), AND circuits 52 (AND1 to ANDn), and a decoder 53.

[0031] The decoder 53 asserts (sets to logic "1") a number of output signals equal to its input value. For example, when the input value is 3, it asserts only outputs 1 to 3 of the n output signals.
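The behavior of the decoder 53 can be written as a one-line function. This is a minimal Python sketch; the name `decode` and the list-of-bits representation are assumptions made for illustration.

```python
def decode(count, n):
    """Return n output bits in which the first `count` bits are 1.

    Models a decoder that asserts as many outputs as its input value,
    starting from output 1.
    """
    if not 0 <= count <= n:
        raise ValueError("count must be between 0 and n")
    return [1 if i < count else 0 for i in range(n)]


outputs = decode(3, 5)   # [1, 1, 1, 0, 0]
```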

[0032] FIG. 6 is a circuit diagram showing the configuration of the instruction address control unit 22 in the parallel computer 20.

[0033] In FIG. 6, the instruction address control unit 22 consists solely of an instruction address counter 61.

[0034] The operation of the parallel computer 20 equipped with the instruction supply device 30 configured as above is described below.

[0035] The following describes how the parallel computer 20 equipped with the instruction supply device 30 shown in FIG. 1 executes instructions while referencing the instruction buffer 24.

[0036] Before program execution starts, the instruction buffer 24 is assumed to have been initialized. That is, the V flags of all blocks in the instruction buffer 24 are 0.

[0037] In this state, instruction codes are supplied from the instruction memory 21 rather than from the instruction buffer 24 and are executed sequentially by execution unit 1. In parallel with instruction execution, the instruction codes are written one after another into block 1 of the instruction buffer 24, starting at instruction 1 and proceeding toward instruction n.

[0038] For the instruction stored at the head of a block, that is, the instruction written to instruction 1, its instruction address is also stored in the instruction address storage section (FIG. 2).

[0039] The instruction buffer write control unit 26 checks whether a newly written instruction can be executed in parallel with the instruction sequence already stored in the instruction buffer 24. An instruction cannot be executed in parallel if it uses the result of an instruction already stored in the same block as its source operand 1 or source operand 2, if it immediately follows a branch instruction, or if it is beyond the nth instruction, that is, it would exceed one block (n instructions) of the instruction buffer 24.

[0040] When the instruction buffer write control unit 26 detects such an instruction code that cannot be executed in parallel, it suppresses the storing of that instruction code.

[0041] At the same time, it stores the number of instructions stored in the block so far in the instruction count field and writes 1 to the LRU and V flags. It also sets to 0 the LRU flags of all blocks of the instruction buffer 24 other than that block, that is, other than block 1.

[0042] These operations complete the writing of one block of the instruction buffer 24. The instruction judged not to be executable in parallel is then handled as the instruction to be stored at the head of the next block, block 2, and processing continues in the same way as for block 1.

[0043] By repeating this operation, instruction sequences that can be executed in parallel are stored in successive blocks, from block 1 through block m.
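The block-filling procedure described above can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the circuit of FIG. 4: the names `Instr`, `can_join`, and `pack`, the register names, and the example trace are hypothetical; a branch is assumed to be stored as the last instruction of its block; and the final, partially filled block is flushed at the end of the trace for convenience.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    op: str
    src1: Optional[str] = None   # source 1 operand
    src2: Optional[str] = None   # source 2 operand
    dest: Optional[str] = None   # destination operand
    is_branch: bool = False


def can_join(block: List[Instr], new: Instr, n: int) -> bool:
    """Check the three conditions under which `new` cannot join `block`."""
    if len(block) >= n:                      # block already holds n instructions
        return False
    if block and block[-1].is_branch:        # immediately follows a branch
        return False
    dests = {i.dest for i in block if i.dest is not None}
    # reads a result produced by an instruction already in the block
    return new.src1 not in dests and new.src2 not in dests


def pack(trace: List[Tuple[int, Instr]], n: int):
    """Split an executed trace of (address, instruction) pairs into blocks.

    Returns a list of (start_address, instructions) pairs.
    """
    blocks, current, start = [], [], None
    for address, instr in trace:
        if current and not can_join(current, instr, n):
            blocks.append((start, current))  # close the block
            current = []
        if not current:
            start = address                  # tag of the instruction in slot 1
        current.append(instr)
    if current:
        blocks.append((start, current))
    return blocks


trace = [
    (0, Instr("ADD", "r2", "r3", "r1")),
    (1, Instr("SUB", "r5", "r6", "r4")),
    (2, Instr("MUL", "r1", "r4", "r7")),     # reads r1 and r4: starts a new block
    (3, Instr("BR", is_branch=True)),
    (8, Instr("ADD", "r9", "r10", "r8")),    # follows a branch: starts a new block
]
blocks = pack(trace, n=4)
shape = [(start, len(instrs)) for start, instrs in blocks]
```

For this made-up trace, `shape` is `[(0, 2), (2, 2), (8, 1)]`. Like the text, the check considers only reads of earlier results within the same block.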

[0044] As shown in the configuration diagram of FIG. 4, the instruction buffer write control unit 26 first stores the instruction code read from the instruction memory 21 in the instruction register 31. It then examines the operation specifier part of the instruction code held in the instruction register 31 (the operation specifier for an operation instruction, the branch specifier for a branch instruction).

[0045] When this content is a value indicating a branch instruction, the check performed by the branch code (constant generation) 32 and comparator 1 results in the branch signal being asserted.

[0046] As a result, the F (Finish) signal, which is the output of the OR circuit 43 (ORall), is asserted. In the instruction buffer 24 of FIG. 2, the F signal sets the V flag of the block to 1 and causes the instruction count supplied by the instruction buffer write control unit 26 to be written. It also sets the LRU flag of that block to 1 and the LRU flags of all other blocks to 0. The instruction address field of the block retains the instruction address written when the first instruction code was stored in instruction 1.

[0047] Similarly, the contents of the source 1 operand and the source 2 operand of the instruction code held in the instruction register 31 are compared with the destination operands of the parallel-executable instruction sequence already stored in the same block of the instruction buffer 24. In other words, the circuit checks for conflicts caused by data references between instructions that would be executed simultaneously. The operand queues 40 (operand queues 1 to n), comparators 41 (comparators S11 to S1n), and comparators 42 (comparators S21 to S2n) are used for this purpose.

[0048] If the check reveals even one conflict, the output of the OR circuit 43 (ORall), that is, the F signal, is asserted.

[0049] Thereafter, the write completion processing for the block is performed in the same way as when a branch instruction is detected.

[0050] The AND circuits 38 (AND2 to ANDn) and flip-flops 39 (FF1 to FFn) are used to specify the range of the instruction code sequence subject to this check.

[0051] Each time an instruction code is stored in the block, the 1 output of FF1 to FFn propagates in the direction from FF1 toward FFn. That is, FF1 to FFn form a shift register. Among the comparators 41 (comparators S11 to S1n) and comparators 42 (comparators S21 to S2n), the match check is performed only in those comparators that receive a 1 from FF1 to FFn.

[0052] The AND circuits 38 (AND2 to ANDn) implement the logic for two cases: shifting the contents of FF1 to FFn, and resetting all of FF1 to FFn to 0 when the F signal is asserted.

[0053] The inverted outputs of FF1 to FFn are supplied to the instruction buffer 24 as write enable signals 1 to n. On the rising edge of each of these signals, the instruction code held in the instruction register 31 is written to the corresponding instruction code storage location (instructions 1 to m) of the instruction buffer 24. On the rising edge of write enable signal 1, the instruction address input from the instruction address control unit 22 is written to the instruction address storage section (FIG. 2) of the block.

[0054] The instruction counter 34, "1" constant generator 35, AND circuit 36 (AND1), and ANDcount (register) 37 shown in the instruction buffer write control unit 26 of FIG. 4 count the number of instructions written to the block, that is, the number of instructions that can be executed in parallel.

[0055] The instruction count output by ANDcount 37 is written to the instruction count field of the instruction buffer 24 on the rising edge of the F signal.

[0056] Returning to FIGS. 1 and 2, the instruction address is input not only to the instruction memory 21 but also to the instruction buffer 24. The instruction buffer 24 checks whether a block matching the input instruction address exists. This is done by searching the instruction address storage section of each block of the instruction buffer 24. The instruction address storage section is built from content-addressable memory and searches for a block whose instruction address storage section holds content matching the externally input instruction address.

[0057] If a matching block exists, the hit flag is asserted and the V flag, instruction count, and instructions 1 to n of that block are output. At the same time, the LRU flag of that block is set to 1. If no matching block exists, the hit flag is not asserted. In that case, the instruction code read from the instruction memory 21 is selected by instruction selector 1 and executed by execution unit 1. At the same time, the instruction code is stored in the instruction buffer 24. The block in which the new parallel-executable instruction sequence is to be stored is given by the tag section of the instruction buffer 24 as the LRU block number. The LRU section of the instruction buffer 24 is content-addressable memory and, on a write, outputs the lowest block number whose V flag is 0.

[0058] When all V flags are 1, it outputs the lowest block number whose LRU flag is 0 and at the same time sets the V flag of that block to 0. When all V flags and all LRU flags are 1, it outputs the highest-numbered block, that is, block m, and at the same time sets the V flag of that block to 0.

[0059] On a write (a block update), only the LRU flag of the updated block becomes 1, and the LRU flags of all other blocks become 0.

[0060] This method makes it possible to use the blocks in the instruction buffer 24 with maximum efficiency.

[0061] Needless to say, an LRU mechanism based on existing technology may be used in place of the LRU section operation described in this embodiment.

[0062] The instruction buffer read control unit 25 shown in FIG. 5 generates instruction selection signals 1 to n from the V flag, the hit flag, and the instruction count read from the tag section of the instruction buffer 24.

[0063] When the AND circuit 51 (ANDv) finds that both the V flag and the hit flag are 1, the instruction buffer 24 is judged to have hit, and one input of each of the AND circuits 52 (AND1 to ANDn) is asserted. The instruction count is input to the decoder 53.

[0064] The decoder 53 asserts (sets to logic "1") a number of output signals equal to its input value. For example, when the input value is 3, it asserts only outputs 1 to 3 of the n output signals. Accordingly, as many 1 outputs as the instruction count assert the other inputs of the AND circuits, counting from AND1. As a result, among the outputs of AND1 to ANDn, as many instruction selection signals as there are instructions stored in the hit block of the instruction buffer 24 are asserted, counting from instruction selection signal 1.
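The generation of the instruction selection signals combines one AND of the V and hit flags with a per-slot AND against the decoder outputs. The following Python sketch models that combinational logic. The function name `select_signals` and the 0/1 integer representation are assumptions made for illustration.

```python
def select_signals(v_flag, hit_flag, count, n):
    """Return instruction selection signals 1..n as a list of 0/1 values."""
    buffer_hit = v_flag & hit_flag                        # AND circuit 51 (ANDv)
    decoded = [1 if i < count else 0 for i in range(n)]   # decoder 53
    return [buffer_hit & d for d in decoded]              # AND circuits 52 (AND1..ANDn)


signals = select_signals(v_flag=1, hit_flag=1, count=3, n=4)   # [1, 1, 1, 0]
```

If either flag is 0, every signal is 0 regardless of the instruction count, which is the miss case handled by the instruction selectors.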

[0065] Instruction selection signals 1 to n are input to instruction selectors 1 to n, respectively, in the parallel computer 20 shown in FIG. 1. An instruction selector whose instruction selection signal is asserted selects the corresponding one of instructions 1 to n read from the instruction buffer 24 and supplies it to the corresponding one of execution units 1 to n. An instruction selector whose instruction selection signal is not asserted even though the instruction buffer 24 has hit supplies the instruction code corresponding to NOP (No Operation: an instruction that performs no processing) to the execution unit connected to it. An execution unit given a NOP does not execute an instruction.

[0066] If the instruction buffer 24 does not hit, the instruction selection signals 1 to n are all negated (set to logic "0"). In this case, the instruction selector 1 selects the instruction code read from the instruction memory 21 and gives it to the execution unit 1. The instruction selectors 2 to n select NOP, which is given to the execution units 2 to n.

[0067] In this case, the parallel computer 20 executes only the single instruction read from the instruction memory 21, in the execution unit 1. That is, it performs sequential execution.
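The selector behaviour of paragraphs [0065] to [0067] can be summarised with a small model. This is a sketch under assumptions: the function name `select_instructions` and the string constant `NOP` are hypothetical, and it assumes, as the text implies, that selector 1 falls back to the instruction memory whenever its selection signal is negated while selectors 2 to n fall back to NOP.

```python
from typing import List, Optional, Sequence

NOP = "NOP"  # stands in for the constant produced by the NOP generator 28


def select_instructions(selection_signals: Sequence[int],
                        buffer_instructions: Sequence[Optional[str]],
                        memory_instruction: str) -> List[Optional[str]]:
    """Return the instruction given to each execution unit 1..n."""
    issued: List[Optional[str]] = []
    for i, (sel, instr) in enumerate(zip(selection_signals,
                                         buffer_instructions)):
        if sel:
            # Asserted: forward the instruction read from the buffer.
            issued.append(instr)
        elif i == 0:
            # Selector 1, negated: take the instruction from memory.
            issued.append(memory_instruction)
        else:
            # Selectors 2..n, negated: the unit receives a NOP.
            issued.append(NOP)
    return issued


if __name__ == "__main__":
    # Buffer hit with two parallel-executable instructions.
    print(select_instructions([1, 1, 0, 0], ["add", "sub", None, None], "add"))
    # Buffer miss: sequential execution on execution unit 1 only.
    print(select_instructions([0, 0, 0, 0], [None] * 4, "load"))
```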

[0068] As shown in FIG. 6, the instruction address control unit 22 consists solely of an instruction address counter 61. A branch instruction is executed in only one of the execution units 1 to n of the parallel computer 20 shown in FIG. 1.

[0069] The execution results are output to the instruction address control unit 22 as branch instruction results 1 to n. Each branch instruction result includes a branch signal and a branch address. While the branch signal is negated (logic "0"), the instruction address counter 61 counts on every clock and outputs sequential instruction addresses.

[0070] When the branch signal is asserted, the branch address is newly written into the counter as its initial value. The branch address is thereby output as the instruction address to the instruction memory 21 and the instruction buffer 24.
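The counter behaviour of paragraphs [0068] to [0070] might be modelled as below. This is a minimal sketch, not the circuit of FIG. 6: the class name `InstructionAddressCounter`, the method `clock`, and the `step` parameter are assumptions introduced for illustration, since the text only states that the counter counts on every clock.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class InstructionAddressCounter:
    """Software model of instruction address counter 61."""
    address: int = 0

    def clock(self, branch_results: Iterable[Tuple[bool, int]],
              step: int = 1) -> int:
        """Advance one clock and return the next instruction address.

        branch_results holds (branch_signal, branch_address) pairs, one
        per execution unit. At most one unit executes a branch, so the
        first asserted signal decides the new address.
        """
        for branch_signal, branch_address in branch_results:
            if branch_signal:
                # Branch taken: load the branch address as the new value.
                self.address = branch_address
                return self.address
        # No branch: keep counting sequentially (step is an assumption).
        self.address += step
        return self.address


if __name__ == "__main__":
    counter = InstructionAddressCounter(address=10)
    print(counter.clock([(False, 0), (False, 0)]))   # sequential count
    print(counter.clock([(False, 0), (True, 100)]))  # branch to address 100
```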

[0071] As described above, the VLIW type parallel computer 20 according to the present embodiment adds the instruction supply device 30 to a parallel computer 20 composed of the instruction memory 21, the instruction address control unit 22, and the execution unit 23 (execution units 1 to n). The instruction supply device 30 is composed of the instruction buffer 24, the instruction buffer read control unit 25, the instruction buffer write control unit 26, the instruction selector 27 (instruction selectors 1 to n), the NOP (constant generation) 28, and the inverter 29. When no instruction code is stored in the instruction buffer 24, the execution unit 1 sequentially executes the instructions read from the instruction memory 21, and these instructions are also stored in a block of the instruction buffer 24 that can hold up to n parallel-executable instructions. The instruction buffer 24 is searched by instruction address. When a block matching the instruction address can be read, a sequence of up to n consecutive parallel-executable instructions starting at the given instruction address is read and given to the execution units 1 to n, which execute them in parallel. When the required instruction code sequence is not stored in the instruction buffer 24, instructions are executed sequentially while a new instruction sequence containing the maximum number of parallel-executable instructions is stored in the least recently used block. Consequently, given a program written for an ordinary sequential-execution computer, the device extracts parallel-executable instruction sequences from it and stores them in the instruction buffer 24, and when such a sequence is executed repeatedly, it can be executed in parallel from the second time onward.
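The lookup and replacement policy summarised in paragraph [0071] can be illustrated with a small software model. This is a sketch under stated assumptions, not the hardware of FIG. 2: the class `InstructionBuffer` and its methods are hypothetical, the tag comparison and V flag are abstracted as dictionary membership, and the check for parallel executability performed by the write control unit is not modelled.

```python
from collections import OrderedDict
from typing import List, Optional, Sequence


class InstructionBuffer:
    """Blocks keyed by start address, replaced least recently used first."""

    def __init__(self, num_blocks: int, n: int) -> None:
        self.num_blocks = num_blocks  # number of blocks in the buffer
        self.n = n                    # maximum instructions per block
        self.blocks: "OrderedDict[int, List[str]]" = OrderedDict()

    def lookup(self, address: int) -> Optional[List[str]]:
        """Return the block starting at `address`, or None on a miss."""
        block = self.blocks.get(address)
        if block is not None:
            # A hit marks the block as most recently used.
            self.blocks.move_to_end(address)
        return block

    def store(self, address: int, instructions: Sequence[str]) -> None:
        """Store up to n instructions, evicting the LRU block if full."""
        if address not in self.blocks and len(self.blocks) >= self.num_blocks:
            self.blocks.popitem(last=False)  # drop least recently used
        self.blocks[address] = list(instructions[: self.n])
        self.blocks.move_to_end(address)


if __name__ == "__main__":
    buf = InstructionBuffer(num_blocks=2, n=4)
    print(buf.lookup(0))          # first pass: miss, execute sequentially
    buf.store(0, ["add", "sub"])  # block filled during sequential execution
    print(buf.lookup(0))          # second pass: hit, execute in parallel
```

In this model a loop body misses on its first iteration and hits on later ones, which mirrors the statement that parallel execution is possible from the second time onward.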

[0072] Therefore, in the parallel computer 20 provided with the instruction supply device 30 according to the present embodiment, the parallel-executable parts and the sequential parts of a program are distinguished inside the computer. Modifications to the sequential parts of the program, such as NOP code insertion, which are a problem in conventional VLIW type parallel computers, are therefore no longer necessary at all. As a result, the program can be executed on any computer, and when it is executed on the parallel computer 20 provided with the instruction supply device 30, it can be executed with nearly ideal efficiency.

[0073] The device is particularly effective for executing ordinary programs that make heavy use of loop processing. In addition, because a program written for a sequential computer is executed in parallel as it is, there is the advantage that the program size never grows for the sake of parallel processing, which is a problem in conventional VLIW type parallel computers.

[0074] Furthermore, the instruction supply device 30 according to the present embodiment can easily be installed in a parallel computer having a plurality of ordinary instruction execution units. In the execution of a general program, it can automatically extract the program portions made up of the maximum number of parallel-executable instructions inherent in that program and execute them in parallel.

[0075] Although the above embodiment shows an example of application to a VLIW type parallel computer, the instruction length imposes no particular restriction, and the invention is applicable to any parallel computer in which a plurality of instruction execution units perform independent processing simultaneously. Needless to say, the instruction supply device may also be part of a circuit incorporated in a computer or the like.

[0076] Needless to say, the number, types, and connections of the buffers, registers, comparators, and other elements constituting the control units are not limited to those of the above embodiment.

[0077]

[Effects of the Invention] The instruction supply device according to the present invention supplies instructions to a parallel computer having an instruction memory, an instruction address control unit, and a plurality of instruction execution units, and comprises an instruction buffer, an instruction buffer read control unit, an instruction buffer write control unit, a plurality of instruction selectors, and a NOP. When the instruction code to be referenced is not stored in the instruction buffer, a single execution unit sequentially executes the instructions read from the instruction memory, and at the same time the instructions are stored in an empty block, or in the least recently referenced block, among the one or more instruction blocks provided in the instruction buffer, each of which can hold up to as many instruction codes as there are instruction execution units. Thereafter, in parallel with sequential instruction execution, the device checks which instruction codes can be executed simultaneously and stores parallel-executable instructions, up to a number of instruction codes equal to the number of instruction execution units. When the instruction code to be referenced is stored in the instruction buffer, the instruction buffer is searched by instruction address. After confirming that a block matching the instruction address exists and that the contents of the block are valid, the device simultaneously reads up to as many instruction codes as there are instruction execution units, together with the number of parallel-executable instructions. It supplies the read instruction codes to as many instruction execution units as can execute instructions in parallel, causing them to execute in parallel, and automatically supplies a NOP code, indicating that no instruction is to be executed, to the instruction execution units that cannot take part in parallel execution. The device takes as input a program to be executed on a sequential-execution computer, extracts parallel-executable instruction code sequences from the input program, and stores them in the instruction buffer. When such an instruction code sequence is executed repeatedly, from the second time onward the maximum number of parallel-executable instruction codes is supplied from the instruction buffer and executed in parallel. As a result, modifications to the sequential parts of the program, such as NOP code insertion, become unnecessary, the program can be executed with ideal efficiency, and the program size can be reduced.

[Brief description of the drawings]

FIG. 1 is a diagram showing the overall configuration of a VLIW type parallel computer including an instruction supply device according to an embodiment to which the present invention is applied.

FIG. 2 is a diagram showing the configuration of the instruction buffer of the instruction supply device.

FIG. 3 is a diagram showing the format of an instruction code executed by the parallel computer having the instruction supply device.

FIG. 4 is a circuit diagram showing the configuration of the instruction buffer write control unit of the instruction supply device.

FIG. 5 is a circuit diagram showing the configuration of the instruction buffer read control unit of the instruction supply device.

FIG. 6 is a circuit diagram showing the configuration of the instruction address control unit of the instruction supply device.

FIG. 7 is a diagram showing the overall configuration of a VLIW type parallel computer including a conventional instruction supply device.

[Explanation of symbols]

20: parallel computer; 21: instruction memory; 22: instruction address control unit; 23: execution unit (execution units 1 to n); 24: instruction buffer; 25: instruction buffer read control unit; 26: instruction buffer write control unit; 27: instruction selector (instruction selectors 1 to n); 28: NOP (constant generation circuit); 29: inverter; 30: instruction supply device

Claims

[Claims]

[Claim 1] An instruction supply device for supplying instructions to a parallel computer having an instruction memory, an instruction address control unit, and a plurality of instruction execution units,
the device comprising an instruction buffer, an instruction buffer read control unit, an instruction buffer write control unit, a plurality of instruction selectors, and a NOP (constant generation circuit), wherein:
when the instruction code to be referenced is not stored in the instruction buffer, a single execution unit sequentially executes the instructions read from the instruction memory, and at the same time the instructions are stored in an empty block, or in the least recently referenced block, among the one or more instruction blocks provided in the instruction buffer, each of which can hold up to as many instruction codes as there are instruction execution units;
thereafter, in parallel with sequential instruction execution, the instruction codes that can be executed simultaneously are checked, and parallel-executable instructions are stored, up to a number of instruction codes equal to the number of the instruction execution units;
when the instruction code to be referenced is stored in the instruction buffer, the instruction buffer is searched by instruction address, and after it is confirmed that a block matching the instruction address exists and that the contents of the block are valid, up to as many instruction codes as there are instruction execution units, together with the number of parallel-executable instructions, are read simultaneously, the read instruction codes are supplied to as many instruction execution units as can execute instructions in parallel so that the instructions are executed in parallel, and a NOP code indicating that no instruction is to be executed is automatically supplied to the instruction execution units that cannot take part in parallel execution; and
a program to be executed on a sequential-execution computer is input, parallel-executable instruction code sequences are extracted from the input program and stored in the instruction buffer, and when such an instruction code sequence is executed repeatedly, from the second time onward the maximum number of parallel-executable instruction codes is supplied from the instruction buffer and executed in parallel.

[Claim 2] The instruction supply device according to claim 1, wherein the parallel computer is a parallel computer in which a plurality of instruction execution units simultaneously perform mutually independent processing.

[Claim 3] The instruction supply device according to claim 1, wherein the parallel computer is a VLIW (Very Long Instruction Word) type parallel computer.