JPH0222417B2


Info

Publication number: JPH0222417B2
Application number: JP3130281A
Authority: JP
Inventors: Koichiro Omoda; Shigeo Nagashima; Shunichi Torii; Yasuhiro Inagami
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1981-03-06
Filing date: 1981-03-06
Publication date: 1990-05-18
Also published as: JPS57146376A

Description

DETAILED DESCRIPTION OF THE INVENTION: The present invention relates to an apparatus that performs operations on vector data at high speed.

(Prior Art) As science and technology advance, speeding up scientific and engineering computation has become an important issue. To this end, vector processors have been developed that accelerate vector operations by pipeline processing.

The conventional procedure by which such a vector processor handles a DO loop containing a conditional statement, as shown in FIG. 1, is explained with reference to FIGS. 2 and 3. In the DO loop of FIG. 1, the following is repeated for each of the N elements: if the logical operation of equation (1) finds a match for an element, the operation of equation (2) is performed for that element; if no match is found, the operation of equation (3) is performed instead. FIG. 2 shows how the data change when the number of elements N is 5 and suitable values are assumed for operands A(1 to 5) and B(1 to 5), and FIG. 3 shows the processing divided into Steps 1 to 4.
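FIG. 1 is not reproduced here, but the later description of the three vector instructions indicates that equation (1) compares A(I) with B(I), equation (2) computes C(I) = D(I) + E(I), and equation (3) computes F(I) = G(I) + H(I). Under that reading, a minimal scalar sketch of the loop, useful as a reference for the vector procedures below, might look like this (the function name `do_loop` is illustrative, not from the patent):

```python
def do_loop(a, b, d, e, g, h, c, f):
    """Scalar reference for the conditional DO loop of FIG. 1 (reconstructed).

    Assumption: (1) is A(I) == B(I), (2) is C(I) = D(I) + E(I),
    (3) is F(I) = G(I) + H(I), as implied later in the description.
    """
    for i in range(len(a)):
        if a[i] == b[i]:          # equation (1): element-wise comparison
            c[i] = d[i] + e[i]    # equation (2): executed on a match
        else:
            f[i] = g[i] + h[i]    # equation (3): executed on a mismatch
    return c, f
```

With hypothetical inputs A = [1, 2, 3, 4, 5] and B = [1, 0, 3, 4, 0], the match pattern is 1, 0, 1, 1, 0, the same mask as in the example of FIG. 2.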

Step 1: The corresponding elements of operands A(1 to 5) and B(1 to 5) are compared to create mask M(1 to 5). If the two elements are equal, the corresponding mask bit is set to 1; otherwise it is set to 0. In this example, the values of mask M(1 to 5) are therefore 1, 0, 1, 1, 0, respectively.

The processing of Step 1 corresponds to equation (1) of FIG. 1 and is performed on all five elements.

Step 2: This step performs the operation of equation (2) in FIG. 1. It must not be applied unconditionally to all elements, however, but only to the elements whose comparison in Step 1 produced a match. A masked addition is therefore used, as follows: the values of mask M(1 to 5) are examined, the operation of equation (2) is executed only for elements whose mask bit is 1, and the operation is suppressed for elements whose mask bit is 0. In this example, only the first, third, and fourth elements, shown hatched in the figure, are operated on.

Step 3: The value of each element of mask M(1 to 5) is inverted, and the result is taken as mask M′(1 to 5). In this example, the values of M′(1 to 5) are 0, 1, 0, 0, 1, respectively.

Step 4: This step performs the operation of equation (3) in FIG. 1. Again, it must not be applied unconditionally to all elements, but only to the elements whose comparison in Step 1 produced a mismatch. A masked addition with the same operation control as in Step 2 is therefore used, this time with mask M′. In this example, only the second and fifth elements, which were not operated on in Step 2, are operated on.
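Steps 1 to 4 can be modelled element by element as below. This is a sketch of the conventional method only; the helper names are hypothetical, and the inputs in the usage note are hypothetical values chosen to reproduce the mask 1, 0, 1, 1, 0 of the example. Note the separate inversion pass in Step 3.

```python
def compare_eq(a, b):
    """Step 1: build mask M, 1 where A(i) == B(i), else 0."""
    return [1 if x == y else 0 for x, y in zip(a, b)]


def masked_add(x, y, mask, out):
    """Conventional masked add: compute only where the mask bit is 1."""
    for i, m in enumerate(mask):
        if m == 1:
            out[i] = x[i] + y[i]
    return out


def invert(mask):
    """Step 3: the extra mask-inversion pass the conventional method needs."""
    return [1 - m for m in mask]


def conventional(a, b, d, e, g, h):
    n = len(a)
    c, f = [0] * n, [0] * n
    m = compare_eq(a, b)          # Step 1
    masked_add(d, e, m, c)        # Step 2: uses M
    m_bar = invert(m)             # Step 3: costs another N clocks
    masked_add(g, h, m_bar, f)    # Step 4: uses M'
    return m, m_bar, c, f
```

For example, `conventional([1, 2, 3, 4, 5], [1, 0, 3, 4, 0], ...)` yields M = [1, 0, 1, 1, 0] and M′ = [0, 1, 0, 0, 1].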

As described above, a conventional masked operation executes the operation only for elements whose mask bit is 1 and suppresses it for elements whose mask bit is 0. The prior art therefore has the problem that the extra mask-inversion processing of Step 3 is required before Step 4 of FIG. 3 can be performed.

In general, when operating on vectors of N elements, if each element is processed in one clock period, Steps 1, 2, and 4 each require N clock periods, and Step 3 requires another N clock periods. In a masked operation, therefore, an extra amount of time equal to one third of the time needed for the actual operations (3N clock periods) is spent on mask inversion, which hinders speeding up masked operations.
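Under the same one-element-per-clock assumption, and ignoring pipeline start-up latency, the clock count works out as follows:

```latex
T_{\mathrm{conv}} = \underbrace{N}_{\text{Step 1}} + \underbrace{N}_{\text{Step 2}} + \underbrace{N}_{\text{Step 3}} + \underbrace{N}_{\text{Step 4}} = 4N,
\qquad
\frac{T_{\text{Step 3}}}{T_{\text{Steps 1, 2, 4}}} = \frac{N}{3N} = \frac{1}{3}
```

For example, with N = 64 the conventional sequence takes 256 clocks, of which 64 are spent only on inverting the mask.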

(Object of the Invention) An object of the present invention is to provide a vector arithmetic processing apparatus that can execute a plurality of masked operations using a plurality of kinds of mask information, without executing instructions that change the mask information.

(General Description of the Invention) To this end, the apparatus according to the present invention is provided with decoding means, which decodes a vector instruction and, when the instruction is one for a masked operation, outputs signals specifying the operation and the mask information required by the instruction; and means for generating a plurality of kinds of mask information, which selectively outputs, sequentially one element at a time, the mask information specified by the mask-information designation signal supplied from the decoding means. This makes it unnecessary to change the mask information by executing instructions.
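The two means described above can be sketched as a decoder that maps an opcode to an (operation, mask kind) pair, plus a mask source that emits one bit per element according to that kind. This is a minimal model under stated assumptions: the opcode names in `OPCODES` are hypothetical (the patent gives no concrete encodings), and the complement is formed as bits are read rather than by rewriting the stored mask.

```python
from enum import Enum


class MaskKind(Enum):
    NONE = 0        # unmasked: every element is processed
    POSITIVE = 1    # process element i only where M(i) == 1
    COMPLEMENT = 2  # process element i only where M(i) == 0


# Hypothetical opcode table; the patent does not give concrete opcodes.
OPCODES = {
    "VCMP_EQ": ("compare", MaskKind.NONE),
    "VADD": ("add", MaskKind.NONE),
    "VADD_M": ("add", MaskKind.POSITIVE),
    "VADD_MC": ("add", MaskKind.COMPLEMENT),
}


def decode(opcode):
    """Return the (operation, mask kind) signals for a vector opcode."""
    return OPCODES[opcode]


def mask_stream(kind, mask_register, n):
    """Yield one mask bit per element, selected by the decoded mask kind.

    The mask register itself is never rewritten.
    """
    for i in range(n):
        if kind is MaskKind.NONE:
            yield 1
        elif kind is MaskKind.POSITIVE:
            yield mask_register[i]
        else:
            yield 1 - mask_register[i]
```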

(Embodiment) The present invention will now be described in detail with reference to an embodiment. FIG. 5 shows one embodiment of the present invention.

This embodiment is obtained by applying the present invention to the array processor built into the M180 or M200H computers developed by the applicant. Parts of the apparatus not directly related to the present invention have the same configuration as in the conventional array processor mentioned above, so they are described only briefly. For details of that array processor, see the manual "M180/M200H Built-in Array Processor".

In FIG. 5, 101 is a main storage device; 102 is a storage control device; 103 is an instruction control device; 104 is an arithmetic control device; 105 is a vector instruction control circuit; 106 and 107 are prefetch buffers, each able to hold 64 data items; 108 and 118 to 121 are AND circuits; 109 is a 1-bit mask register; 110 is a store register; 111 is an arithmetic unit that performs addition, subtraction, and comparison operations; 111-1 to 111-5 are the operation stages of the arithmetic unit; 111-6 to 111-10 form a 5-bit shift register that shifts the mask one bit at a time; 112 is a 64-bit mask register; 113 is a selection circuit; 114 is a mask instruction identification circuit; 115 is an inversion circuit; 116 and 117 are NOT circuits; and 122 is an OR circuit.

The instruction control device 103 sequentially reads instructions from the main storage device 101 through the storage control device 102 and signal line 202, and decodes them. When an instruction is a non-vector instruction, the instruction control device 103 transfers data read from the main storage device 101 through the storage control device 102, together with operation control information (for example, the type of operation), to the arithmetic control device 104 over signal line 203, and instructs it to start. The arithmetic control device 104 performs the operation specified by the instruction control device 103 on the input data and writes the result to the main storage device 101 through signal line 201 and the storage control device 102.

When the instruction decoded by the instruction control device 103 is a vector instruction, it is executed by the vector instruction control circuit 105 and the mask instruction identification circuit 114.

Taking the computation of FIG. 1 as an example, the execution of vector instructions according to the present invention is described below with reference to FIGS. 4 and 6.

(1) Processing of Step 5

Executing equations (2) and (3) of FIG. 1 uses three instructions: a first instruction that compares vectors A and B to create a mask; a second instruction that obtains vector C by adding vectors D and E under this mask; and a third instruction that obtains vector F by adding vectors G and H under the same mask.

Like a conventional mask-creation instruction, the first instruction compares vectors A and B and creates a mask M that has the value 1 for elements where a match is detected and 0 for all other elements. If mask M has already been created, the first instruction need not be executed.

The addition of equation (2) needs to be performed only on those elements of vectors D and E whose element numbers match those of the mutually equal elements of vectors A and B. The second instruction therefore, like a conventional masked addition instruction, instructs that vectors D and E be added only for elements whose corresponding bit in mask M is 1. Hereinafter, an instruction that instructs an operation only on elements whose corresponding bit in mask M is 1 is called an operation instruction with positive mask.

The addition of equation (3), on the other hand, needs to be performed only on those elements of vectors G and H whose element numbers match those of the unequal elements of vectors A and B; that is, only on elements of G and H whose corresponding bit in mask M is 0. In the conventional method, equation (3) was therefore computed by executing an instruction that inverts mask M and then adding only those elements of G and H whose corresponding bit in the inverted mask is 1, so two instructions had to be executed. In the present invention, the third instruction alone performs the operation of equation (3): it is defined to instruct that addition be performed only on elements of vectors G and H whose corresponding bit in mask M is 0. Hereinafter, an instruction that instructs an operation only on elements whose corresponding bit in mask M is 0 is called an operation instruction with complementary mask.

The second and third instructions both instruct a masked addition, but they differ in the condition under which the mask is used; in other words, they use different kinds of mask. In the present invention, these instructions are therefore given different instruction codes (opcodes), and a different mask is used depending on the instruction code.
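The same computation with the three instructions of the invention can be sketched as follows. Here the positive/complementary distinction, which the patent encodes as two distinct opcodes, is modelled as a `complement` flag on a single hypothetical `vadd_masked` function; mask M is created once and never rewritten.

```python
def vcompare_eq(a, b):
    """First instruction: mask M(i) = 1 where A(i) == B(i), else 0."""
    return [1 if x == y else 0 for x, y in zip(a, b)]


def vadd_masked(x, y, out, mask, complement=False):
    """Second/third instruction: masked add.

    complement=False models an operation instruction with positive mask
    (execute where M(i) == 1); complement=True models one with
    complementary mask (execute where M(i) == 0). Both read the same M.
    """
    want = 0 if complement else 1
    for i, m in enumerate(mask):
        if m == want:
            out[i] = x[i] + y[i]
    return out


def proposed(a, b, d, e, g, h):
    n = len(a)
    c, f = [0] * n, [0] * n
    m = vcompare_eq(a, b)                     # first instruction
    vadd_masked(d, e, c, m)                   # second: positive mask
    vadd_masked(g, h, f, m, complement=True)  # third: complementary mask
    return m, c, f
```

Compared with the conventional four steps, the inversion pass disappears; the result vectors C and F are the same.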

When the instruction control device 103 decodes the first vector instruction and recognizes it as a mask-creation instruction, it transfers to the vector instruction control circuit 105 over line 204 the information specified by the instruction: the start addresses of operands A and B (the addresses of elements A(1) and B(1) in the main storage device 101); the increment values of operands A and B (the address spacing in the main storage device 101 between A(i) and A(i+1), and between B(i) and B(i+1)); the number of elements to be processed, N (5 in this example); and mask-creation instruction information. At the same time, it sends a signal instructing the start of vector instruction processing based on this information. In parallel, the instruction control device 103 transfers the instruction code to the mask instruction identification circuit 114 over signal line 204.

The vector instruction control circuit 105 first transfers the address of operand A(1) and a read request to the storage control device 102 over signal lines 206 and 207, respectively. The addresses of operands A(2 to 5) and their read requests are then transferred in turn. The storage control device 102 reads the data from the main storage device 101 in the order in which the requests were accepted, one element per cycle after a fixed number of cycles, and immediately transfers them, one element per cycle, to the prefetch buffer 106 over signal line 218. The addresses of operands A(1 to 5) are computed by the vector instruction control circuit 105 as follows. The address of A(1) is the start address of operand A transferred from the instruction control device 103; the address of A(2) is obtained by adding the increment value of operand A, also transferred from device 103, to the address of A(1); and the address of A(3) is obtained by adding the increment value again to the address of A(2). In general, the address of A(i) is obtained by adding the increment value to the address of A(i-1) (where i = 2, 3, 4, ...). Operand B is read in the same way, in parallel with operand A: its addresses and read requests are transferred to device 102 over signal lines 208 and 209, and the read data are stored in the prefetch buffer 107 over signal line 219.
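The address computation just described (start address for element 1, then one increment added per element) can be sketched as below. The `VectorOperand` record and the sample address values are hypothetical illustrations of the start-address and increment fields passed from the instruction control device.

```python
from dataclasses import dataclass


@dataclass
class VectorOperand:
    start_address: int  # main-storage address of element 1
    increment: int      # address distance between element i and i+1


def element_addresses(op, n):
    """Generate addresses of elements 1..n by repeated addition.

    addr(1) = start address; addr(i) = addr(i-1) + increment.
    """
    addrs = []
    addr = op.start_address
    for _ in range(n):
        addrs.append(addr)
        addr += op.increment
    return addrs
```

For instance, a hypothetical operand starting at 0x1000 with an increment of 8 yields addresses 0x1000, 0x1008, 0x1010, 0x1018, 0x1020 for N = 5.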

In this way, operands A and B, which are needed for the operation of Step 5, are written into the prefetch buffers 106 and 107, respectively.

The series of processes from the decoding of a vector instruction by the instruction control device 103 to the start of the operation (that is, until the first-element data A(1) and B(1) are read out of the prefetch buffers 106 and 107) is called preprocessing; as shown in FIG. 6, preprocessing is assumed to take P cycles. In cycle P, when the reading of operands A and B has finished, the vector instruction control circuit 105 sends the prefetch buffers 106 and 107 a signal instructing them to read out data and a signal specifying the number of elements to be read; the signal lines for this are not shown. In response, the prefetch buffers 106 and 107 read out A(1 to 5) and B(1 to 5) in synchronization with each other, one element per cycle, over five consecutive cycles starting from cycle P+1, and transfer them to the arithmetic unit 111 over signal lines 220 and 221, respectively.

The arithmetic unit 111 is assumed to be divided into, for example, five stages (operation stages 111-1 to 111-5). In general, a pipelined arithmetic unit divides the operation specified by an instruction into a number of unit processes, and each stage executes one of them. Elements A(1) and B(1) are processed by stage 111-1 in cycle P+2, advance to the next stage every cycle, and are processed by the last stage 111-5 in cycle P+6. The result of comparing A(1) and B(1) (1 if they are equal, 0 otherwise) is then sent over signal line 222 to the 64-bit mask register 112. Based on the comparison-operation instruction information transferred from the instruction control device 103, the vector instruction control circuit 105 sends a comparison-operation instruction to the arithmetic unit 111 over signal line 212 from cycle P+1, just before the operation starts, until cycle P+11, when the operation ends.

The remaining data A(2 to 5) and B(2 to 5) are processed one cycle apart, as shown in FIG. 6, and the comparison results for the second to fifth elements are transferred to the mask register 112 in cycles P+8 to P+11. In cycle P+6, when the first result is produced, the vector instruction control circuit 105 sends the mask register 112 a signal instructing it to store data and a signal specifying the number of elements to be stored. In response, the mask register 112 stores the data on line 222 from cycle P+7 to cycle P+11, in order, starting from the first bit position. If the values of A and B are as assumed in FIG. 2, the values 1, 0, 1, 1, 0 are written into bit positions 1 to 5 of the mask register 112 (these values are circled in FIG. 6).
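The cycle numbers above follow a simple pattern, sketched below for the five-stage pipeline assumed in the text: element k is read from the prefetch buffers in cycle P+k, enters stage 111-1 one cycle later, leaves stage 111-5 four cycles after that, and is written into mask register 112 in the following cycle. The function name is hypothetical.

```python
STAGES = 5  # operation stages 111-1 .. 111-5, as assumed in the text


def compare_timeline(p, n):
    """Return (k, stage-1 cycle, last-stage cycle, mask-write cycle) per element.

    Element k (1-based) is read out in cycle p+k, enters stage 1 in p+1+k,
    reaches the last stage STAGES-1 cycles later, and is stored in the
    mask register one cycle after that.
    """
    rows = []
    for k in range(1, n + 1):
        stage1 = p + 1 + k
        last_stage = stage1 + STAGES - 1
        mask_write = last_stage + 1
        rows.append((k, stage1, last_stage, mask_write))
    return rows
```

With P = 0 this gives (1, 2, 6, 7) for the first element and (5, 6, 10, 11) for the fifth, matching P+2, P+6, P+7 and the final store in cycle P+11.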

The operation in each of the operation stages 111-1 to 111-5 is performed when the bit of the mask shift register 111-6 to 111-10 corresponding to that stage is 1, and is not performed when it is 0. The contents of the mask shift register 111-6 to 111-10 are shifted every cycle, so that, in step with the progress of the operation on an element, the mask bit for that element is supplied to successive operation stages. That is, in synchronization with the reading of data from the prefetch buffers 106 and 107, the OR circuit 122 outputs the mask one bit at a time, and the bits are transferred to the arithmetic unit 111 over signal line 228. As the data move through the operation stages 111-1 to 111-5, the mask bits transferred to the arithmetic unit 111 are shifted through, and set in, the mask shift register 111-6 to 111-10 in synchronization. In this instruction, the comparison of A(1 to 5) and B(1 to 5) is performed on all elements, so the mask output from the OR circuit 122 must be 1 for all five elements. This is controlled as follows.

The mask instruction identification circuit 114 decodes the instruction code transferred from the instruction control device 103 over signal line 204 and, until the operation has finished executing, outputs the result on the positive-mask-operation indication line 226 and the complementary-mask-operation indication line 227: it outputs 1 on line 226 for a positive-mask operation and 1 on line 227 for a complementary-mask operation. Because this instruction is neither a positive-mask nor a complementary-mask operation, 0 is output on both lines 226 and 227. The value on line 226 is input to the AND circuit 119; since it is 0, 0 is output on signal line 230. Likewise, the value on line 227 is input to the AND circuit 120; since it is 0, 0 is output on signal line 231. The 0 values on lines 226 and 227 are also inverted to 1 by the NOT circuits 116 and 117, respectively, so signal line 229, the output of the AND circuit 118, is 1. This 1 on line 229 is input to the AND circuit 121 together with the 1 on signal line 223, so signal line 232 is always 1, and through the OR circuit 122 the value 1 is output on signal line 228 throughout the operation. Consequently, during the operation, a continuous stream of 1s is shifted into and out of the shift register 111-6 to 111-10, and every element is processed in every operation stage.
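The gate network feeding line 228 can be written as a Boolean function of one mask bit per cycle. This sketch follows the signal names in the text; two points are assumptions: signal 223 is modelled as constantly 1 during execution (as the text describes), and AND circuit 120 is assumed to receive the mask bit through inversion circuit 115, a path this section does not describe explicitly.

```python
def mask_bit_to_pipeline(sig226, sig227, mask_bit, sig223=1):
    """Return the bit placed on signal line 228 for one element.

    sig226: 1 for an operation with positive mask
    sig227: 1 for an operation with complementary mask
    mask_bit: bit read from mask register 112 via selector 113 (line 225)
    sig223: assumed to be 1 while the operation executes
    """
    line230 = sig226 & mask_bit               # AND circuit 119
    line231 = sig227 & (1 - mask_bit)         # AND circuit 120 (inverter 115 assumed)
    line229 = (1 - sig226) & (1 - sig227)     # NOT 116, NOT 117, AND 118
    line232 = line229 & sig223                # AND circuit 121
    return line230 | line231 | line232        # OR circuit 122 -> line 228
```

For an unmasked instruction (both indication lines 0), the output is 1 regardless of the mask bit, which is exactly the all-ones behaviour described for the comparison instruction.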

In cycle P+11, when the last data item is set in the mask register 112, the vector instruction control circuit 105 notifies the instruction control device 103 over signal line 205 that processing has finished. The instruction control device 103 then starts decoding the next vector instruction in the following cycle, P+12.

The number of data storage elements of the mask register 112 and the prefetch buffers 106 and 107 is assumed to be 64, but this value has no particular significance. If the number of elements to be processed exceeds 64, the processing can be divided as necessary.
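The patent only says the processing "can be divided as necessary". One common way to do this is strip-mining: splitting the vector into chunks no longer than the register capacity. The sketch below assumes that technique; it is not taken from the patent.

```python
VLEN = 64  # capacity of mask register 112 and prefetch buffers 106/107


def strip_mine(n, vlen=VLEN):
    """Split an n-element vector operation into chunks of at most vlen.

    Returns (first_index, length) pairs with 0-based first indices.
    """
    return [(start, min(vlen, n - start)) for start in range(0, n, vlen)]
```

For N = 150 this gives three passes of 64, 64, and 22 elements.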

(2) Processing of Step 6

When the instruction control device 103 decodes the second vector instruction described above and recognizes it as an addition instruction, it transfers to the vector instruction control circuit 105 over line 204 the start addresses and increment values of operands C, D, and E specified by the instruction, the number of elements to be processed N (5 in this embodiment), and addition-operation instruction information, and at the same time instructs the start of vector instruction processing based on this information. In parallel, the instruction control device 103 also transfers the instruction code to the mask instruction identification circuit 114 over signal line 204.

The vector instruction control circuit 105 then reads operands D and E from the main storage device 101, in the same way as described above, using the addresses, increment values, and number of elements N transferred from device 103, and stores them in the prefetch buffers 106 and 107, respectively.

This preprocessing takes P cycles, after which D(1 to 5) and E(1 to 5) are read out of the prefetch buffers 106 and 107, respectively, over five consecutive cycles starting from cycle 2P+12, and transferred to the arithmetic unit 111. During cycles 2P+12 to 2P+16, in synchronization with this data readout, the vector instruction control circuit 105 controls the selection circuit 113 so that the mask set in the mask register 112 in Step 5 is read out sequentially for the number of elements to be processed, namely five bits (M(1 to 5)). The bits read out are transferred through signal line 225, the AND circuit 119, signal line 230, the OR circuit 122, and signal line 228 to position 111-6 of the shift register in the arithmetic unit 111, and are shifted every cycle from 111-6 toward 111-10. This transfer takes place as follows.

Because this instruction is an operation with positive mask, the mask instruction identification circuit 114 outputs 1 on the positive-mask-operation indication line 226 and 0 on the complementary-mask-operation indication line 227 while the operation executes. The AND circuit 119 therefore passes the value on signal line 225 (that is, the values M(1 to 5) of the mask register 112 read out through the selection circuit 113), which is transferred to the arithmetic unit 111 through signal line 230, the OR circuit 122, and signal line 228. The AND circuits 120 and 121 always output 0. The data D(1 to 5) and E(1 to 5) transferred to the arithmetic unit 111 are processed in turn in the operation stages 111-1 to 111-5, but whether each of these operations is executed is controlled, as described above, by the values in the shift register 111-6 to 111-10, which shifts in synchronization with the data input.

As shown in FIG. 6, the data of the first, third, and fourth elements reach operation stage 111-1 in cycles 2P+13, 2P+15, and 2P+16, respectively; because a mask bit of 1 is set in the corresponding position 111-6 of the mask shift register for each of them, the operation is permitted. These mask bits are then set in the next shift-register position in turn, in synchronization with the data transfer, permitting the operation in the corresponding operation stage. For the second and fifth elements, on the other hand, the corresponding mask bits are 0, so their operations are suppressed.
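The lockstep movement of data and mask bits can be illustrated with a small cycle-level model. This is a simplification under stated assumptions: each slot of `pipe` stands for one operation stage together with its mask shift-register bit, one element enters per cycle, and the sum is delivered from the last stage only when the accompanying mask bit is 1. Cycle 1 of the model corresponds to cycle 2P+13 in FIG. 6.

```python
def run_masked_pipeline(x, y, mask_bits, stages=5):
    """Cycle-level sketch of a pipelined adder gated by a mask shift register.

    pipe[s] holds (element index, mask bit) for stage s; both shift one
    stage per cycle, like stages 111-1..111-5 and register 111-6..111-10.
    Returns the sums of enabled elements and a log of
    (cycle, 1-based element) pairs for which stage 1 actually worked.
    """
    n = len(x)
    pipe = [None] * stages
    results = {}
    stage1_work = []
    for cycle in range(1, n + stages + 1):
        leaving = pipe[-1]
        if leaving is not None and leaving[1] == 1:
            k = leaving[0]
            results[k] = x[k] + y[k]     # last stage delivers the sum
        pipe = [None] + pipe[:-1]        # data and mask bits shift together
        if cycle <= n:
            k = cycle - 1
            pipe[0] = (k, mask_bits[k])
            if mask_bits[k] == 1:
                stage1_work.append((cycle, k + 1))
    return results, stage1_work
```

With mask 1, 0, 1, 1, 0, stage 1 works on elements 1, 3, and 4 in model cycles 1, 3, and 4 (that is, 2P+13, 2P+15, 2P+16), and elements 2 and 5 are suppressed.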

The arithmetic unit 111 performs addition because the vector instruction control circuit 105, based on the addition-operation instruction information transferred from device 103, transfers an addition instruction over signal line 212 before the operation starts.

In cycle 2P+17, the addition result C(1) of the first element is obtained at operation stage 111-5 and output to store register 110 via signal line 216. In cycle 2P+18, vector instruction control circuit 105 sends a signal instructing store register 110 to latch the value on line 216, so the addition result is set in store register 110 in cycle 2P+18. In synchronization with this, and under the control of vector instruction control circuit 105, the mask value M(1) corresponding to the first element is set in register 109 from shift register 111-10 via signal line 217. At the same time as store register 110 and register 109 are set, vector instruction control circuit 105 sends a write request to AND circuit 108, driving signal line 211 to 1. This value is ANDed with the value 1 held in register 109, which reaches AND circuit 108 via signal line 214; signal line 213 therefore becomes 1, and a write request is sent to storage control device 102. In parallel with this write request, the address of operand C(1) is transferred from vector instruction control circuit 105 to device 102 via signal line 210, and the write data is transferred from store register 110 to device 102. Under the control of device 102, the sum C(1) of operands D(1) and E(1) is then stored in main storage device 101.

As with the first element, each time an addition result C(2 to 5) for the second through fifth elements is obtained, during cycles 2P+19 to 2P+22, the write request is sent from vector instruction control circuit 105 to AND circuit 108, the address to device 102, the write data from store register 110 to device 102, and the mask from register 109 to AND circuit 108.

The write request to device 102 is output by AND circuit 108. For the second and fifth elements, however, the mask value is 0, so no write request is sent to device 102 and writing to device 101 is suppressed.
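The store path reduces to a per-element AND of the write request and the delayed mask bit. The sketch below assumes that vector instruction control circuit 105 issues a write request for every element and, for simplicity, computes all five sums up front, whereas the hardware also suppresses the operations themselves for masked-off elements. The list `C` stands in for operand C in main storage 101; the function name and data values are hypothetical.

```python
def masked_store(c, results, mask):
    """Write results[i] into c[i] only where write request AND mask bit is 1."""
    write_request = 1  # issued by circuit 105 for every element (signal line 211)
    written = []
    for i, (value, m) in enumerate(zip(results, mask)):
        if write_request & m:       # AND circuit 108 -> signal line 213
            c[i] = value            # device 102 stores into main storage 101
            written.append(i + 1)   # 1-based element number
    return written


C = [0.0] * 5
D = [1.0, 2.0, 3.0, 4.0, 5.0]
E = [10.0, 20.0, 30.0, 40.0, 50.0]
M = [1, 0, 1, 1, 0]
sums = [d + e for d, e in zip(D, E)]
print(masked_store(C, sums, M))  # [1, 3, 4]
print(C)                          # [11.0, 0.0, 33.0, 44.0, 0.0]
```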

The addresses of operand C are generated from the start address and increment value of operand C, in the same way as for operand A.
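Assuming a constant-stride layout, the address of element i would be the start address plus (i - 1) times the increment. The start address 0x1000 and increment 8 below are hypothetical values, and the helper name is illustrative only.

```python
def element_addresses(start: int, increment: int, count: int) -> list:
    """Return the addresses of elements 1..count, assuming a constant stride."""
    return [start + (i - 1) * increment for i in range(1, count + 1)]


# Hypothetical operand C: start address 0x1000, increment 8, five elements.
addrs = element_addresses(0x1000, 8, 5)
print([hex(a) for a in addrs])  # ['0x1000', '0x1008', '0x1010', '0x1018', '0x1020']
```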

In cycle 2P+22, vector instruction control circuit 105 notifies instruction control device 103 via signal line 205 that processing is complete, and instruction control device 103 begins decoding the next vector instruction in the following cycle, 2P+23.

(3) Processing of step 7

When instruction control device 103 decodes the third instruction described above and determines that it is an addition instruction, it transfers to vector instruction control circuit 105, via line 204, the start addresses and increment values of operands F, G, and H specified by the instruction, the number of elements to process (5), and the addition instruction information, and at the same time instructs it to start vector instruction processing based on this information. In parallel, device 103 also transfers the instruction code to mask instruction identification circuit 114 via signal line 204. Circuit 105 and device 102 then read operands G and H from device 101 and store them in prefetch buffers 106 and 107, respectively.

This preprocessing takes P cycles. Then, starting in cycle 3P+23, G(1 to 5) and H(1 to 5) are read sequentially from prefetch buffers 106 and 107, respectively, and transferred to arithmetic unit 111. As with the second instruction, in synchronization with this data read, masks M(1 to 5) set in mask register 112 are read sequentially by selection circuit 113 and transferred to arithmetic unit 111 via signal line 228. This transfer is performed as follows.

Since this instruction is an operation with a complementary mask, mask instruction identification circuit 114 outputs 0 on positive-mask operation instruction signal line 226 and 1 on complementary-mask operation instruction signal line 227 while the operation executes. Consequently, the output of inverter circuit 115 on line 224, whose input is the output of selection circuit 113, is output from AND circuit 120 onto line 231 and transferred to mask shift register 111-6 of arithmetic unit 111 via OR circuit 122 and signal line 228.

AND circuits 119 and 121 output 0. As a result, the inverted values 0, 1, 0, 0, 1 of masks M(1 to 5) in mask register 112 are shifted sequentially into mask shift register 111-6.
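The inversion used by the complementary-mask instruction splits the elements between the two instructions. The short check below uses the mask values M(1 to 5) = 1, 0, 1, 1, 0 from step 1 and shows that every element is enabled by exactly one of the two adds; the variable names are illustrative.

```python
M = [1, 0, 1, 1, 0]                 # mask register 112 (from step 1)
M_inverted = [1 - m for m in M]     # inverter 115 output shifted into 111-6
positive_elems = [i + 1 for i, m in enumerate(M) if m]
complement_elems = [i + 1 for i, m in enumerate(M_inverted) if m]
print(M_inverted)        # [0, 1, 0, 0, 1]
print(positive_elems)    # [1, 3, 4]
print(complement_elems)  # [2, 5]

# Every element is enabled by exactly one of the two instructions.
assert sorted(positive_elems + complement_elems) == list(range(1, len(M) + 1))
```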

Data G(1 to 5) and H(1 to 5), transferred in sequence to arithmetic unit 111, are added in operation stages 111-1 to 111-5, and, as before, whether each operation stage executes is controlled by the value in the corresponding shift register 111-6 to 111-10. The addition results and mask values of the first through fifth elements are set in store register 110 and mask register 109, respectively, during cycles 3P+29 to 3P+33. The addition result in store register 110 is transferred to device 102 via signal line 215, and the mask in mask register 109 is transferred to AND circuit 108 via signal line 214. In synchronization with this, circuit 105 transfers the address of operand F for the corresponding element to device 102 via signal line 210, and the write request to AND circuit 108 via signal line 211. The write request to device 102 is passed from AND circuit 108 via signal line 213, but because the value in mask register 109 is 0 for the first, third, and fourth elements, their write requests to device 102 are suppressed. In the end, the result for the second element is written to device 101 in cycle 3P+30 and that of the fifth element in cycle 3P+33, completing the processing.
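The cycle numbers quoted for the second and third instructions can be reproduced with simple arithmetic. This sketch assumes that element k is read one cycle before it enters stage 111-1, passes through the five stages, and is latched into store register 110 on the next cycle. The read start 2P+12 for the second instruction is inferred from FIG. 6 (element 1 enters stage 111-1 in cycle 2P+13) and is not stated in this section. The `Cycle` class and helper names are hypothetical.

```python
from dataclasses import dataclass

PIPELINE_STAGES = 5  # operation stages 111-1 .. 111-5


@dataclass(frozen=True)
class Cycle:
    """Cycle number of the form p*P + c, with P the preprocessing time in cycles."""
    p: int
    c: int

    def __add__(self, k: int) -> "Cycle":
        return Cycle(self.p, self.c + k)

    def __str__(self) -> str:
        return f"{self.p}P+{self.c}"


def store_cycle(read_start: Cycle, k: int) -> Cycle:
    """Cycle in which element k (1-based) is set in store register 110 (assumed model)."""
    # read at read_start + (k - 1), stage 111-1 one cycle later,
    # stage 111-5 four cycles after that, store register on the next cycle
    return read_start + (k + PIPELINE_STAGES)


def after_preprocessing(decode_start: Cycle) -> Cycle:
    """Data reading starts P cycles after decoding begins."""
    return Cycle(decode_start.p + 1, decode_start.c)


read2 = Cycle(2, 12)                       # second instruction (C = D + E), inferred
stores2 = [store_cycle(read2, k) for k in range(1, 6)]
decode3 = stores2[-1] + 1                  # completion reported in the last store cycle
read3 = after_preprocessing(decode3)
stores3 = [store_cycle(read3, k) for k in range(1, 6)]
inverted_mask = [0, 1, 0, 0, 1]
writes3 = [cyc for cyc, m in zip(stores3, inverted_mask) if m]

print([str(c) for c in stores2])  # ['2P+18', '2P+19', '2P+20', '2P+21', '2P+22']
print(decode3, read3)             # 2P+23 3P+23
print([str(c) for c in stores3])  # ['3P+29', '3P+30', '3P+31', '3P+32', '3P+33']
print([str(c) for c in writes3])  # ['3P+30', '3P+33']
```

Under these assumptions the model reproduces every cycle number given in the text, including the hand-off from 2P+22 to decoding at 2P+23.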

As described above, in the present invention, when several instructions that require the same operation each need different mask information, each instruction is given a different instruction code, and means that generate different mask information in response to that instruction code are used. The instruction that changes the mask information in the conventional method is therefore unnecessary, and processing that uses multiple masks, such as the processing shown in FIG. 1, can be sped up.
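The overall effect can be summarized in software terms: the mask is built once, and both adds reuse it, one directly and one inverted, with no intervening mask-update instruction. The sketch assumes the FIG. 1 loop body is `C(I) = D(I) + E(I)` when `A(I) = B(I)` and `F(I) = G(I) + H(I)` otherwise, as implied by the operands used in the steps above. The function names and data are hypothetical.

```python
def masked_add(dst, x, y, mask, complement=False):
    """One masked vector add: dst[i] = x[i] + y[i] where the selected mask bit is 1.

    complement=False models the positive-mask instruction code,
    complement=True the complementary-mask code (mask inverted on the fly).
    """
    for i, m in enumerate(mask):
        enabled = (1 - m) if complement else m
        if enabled:
            dst[i] = x[i] + y[i]


def fig1_masked(A, B, C, D, E, F, G, H):
    M = [1 if a == b else 0 for a, b in zip(A, B)]  # mask built once
    masked_add(C, D, E, M)                           # C = D + E under M
    masked_add(F, G, H, M, complement=True)          # F = G + H under NOT M; M never rewritten
    return M


def fig1_scalar(A, B, C, D, E, F, G, H):
    """Element-by-element reference loop (assumed form of the FIG. 1 DO loop)."""
    for i in range(len(A)):
        if A[i] == B[i]:
            C[i] = D[i] + E[i]
        else:
            F[i] = G[i] + H[i]


A, B = [1, 2, 3, 4, 5], [1, 0, 3, 4, 0]
D, E = [1, 2, 3, 4, 5], [10, 20, 30, 40, 50]
G, H = [100, 200, 300, 400, 500], [1, 1, 1, 1, 1]
C1, F1 = [0] * 5, [0] * 5
C2, F2 = [0] * 5, [0] * 5
M = fig1_masked(A, B, C1, D, E, F1, G, H)
fig1_scalar(A, B, C2, D, E, F2, G, H)
print(M)       # [1, 0, 1, 1, 0]
print(C1, F1)  # [11, 0, 33, 44, 0] [0, 201, 0, 0, 501]
```

The masked version matches the scalar reference loop element for element.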

[Brief Description of the Drawings]

FIG. 1 shows an example of a vector operation that requires masked vector operations; FIG. 2 shows the processing procedure for the operation of FIG. 1 according to the conventional method; FIG. 3 is a flowchart of the processing of FIG. 2; FIG. 4 is a processing flowchart of the operation of FIG. 1 according to the present invention; FIG. 5 is a schematic block diagram of a masked vector arithmetic processing device according to the present invention; and FIG. 6 is a timing chart of the operation of the device of FIG. 5 when it performs the processing of FIG. 1.

112: mask register; 111-6 to 111-10: mask shift registers; 111-1 to 111-5: operation stages.

Claims

1. A masked vector arithmetic processing device comprising: means for holding first vector data consisting of elements representing mask information; mask vector supply means for switching between outputting the first vector data as is and outputting second vector data consisting of elements whose values are the inverted values of the elements of the first vector data; and arithmetic means that receives the first or second vector data output from the mask vector supply means together with third vector data to be operated on, and that switches, for each element of the third vector data, whether or not to perform a valid operation according to the value of the corresponding element of the first or second vector data.

2. The masked vector arithmetic processing device according to claim 1, wherein the mask vector supply means performs the switching in response to the type of instruction requesting that the arithmetic means operate on the third vector data.

3. The masked vector arithmetic processing device according to claim 1 or 2, wherein the arithmetic means comprises a plurality of operation stages for operating on the third vector data in a pipelined manner, and a shift register, having a number of stages corresponding to the number of operation stages, for sequentially transferring each element of the first or second vector data, the output of the shift register being output as a signal indicating whether the operation results of the plurality of operation stages are valid.

4. A data processing device comprising: storage means for storing a plurality of vector data; pipeline arithmetic means; means for reading first vector data to be operated on from the storage means and supplying its elements sequentially to the pipeline arithmetic means; means for holding second vector data consisting of elements representing mask information; mask vector supply means for switching between outputting the second vector data as is and outputting third vector data consisting of elements whose values are the inverted values of the elements of the second vector data; delay means that receives the second or third vector data output from the mask vector supply means and outputs it, delayed by a time corresponding to the operation time of the pipeline arithmetic means, as a signal indicating whether the output of the pipeline arithmetic means is valid; and writing means for controlling, for each element of the vector data representing the operation result output from the pipeline arithmetic means, whether that element is written to the storage means according to the value of the corresponding output signal of the delay means.