JP2602852B2

JP2602852B2 - Modularized parallel computer

Info

Publication number: JP2602852B2
Application number: JP62282399A
Authority: JP
Inventors: 文夫高橋; 巖原田; 幸夫長岡
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1987-11-09
Filing date: 1987-11-09
Publication date: 1997-04-23
Anticipated expiration: 2012-04-23
Also published as: JPH01123354A

Description

[Detailed Description of the Invention]

[Industrial Field of Application] The present invention relates to a parallel computer, and more particularly to the modularization of a lattice-coupled parallel computer suited to obtaining numerical solutions of partial differential equations by parallel processing.

[Prior Art] Computers that process data in parallel using a plurality of processors have been developed. A parallel processing computer particularly suited to solving partial differential equations was proposed in ACM Transactions on Computer Systems, vol. 1, no. 3, August 1983, pp. 195-221. The present applicant has also filed Japanese Patent Application No. 59-273061 (Japanese Patent Application Laid-Open No. 61-151773), entitled "Parallel Processing Computer." Both are lattice-coupled MIMD (multiple instruction stream, multiple data stream) computers.

In the former, adjacent processors are connected through shared memory. Data is transferred between adjacent processors by having the sending processor store the data in the memory it shares with the receiving processor, after which the receiving processor reads the stored data. Data can be transferred to non-adjacent processors by relaying it through intermediate processors.

The latter is an invention that minimizes the delay caused by relaying data through intermediate processors. FIFO (First In, First Out) memories are chained together to form a data transfer bus for interprocessor transfer, which shortens the transfer time between distant processors.

In these computers, computing speed can easily be increased by adding processors. Systems of several tens of processors are currently in operation, and systems are expected to grow to hundreds or even tens of thousands of processors in the future. However, as the number of processors increases, so does the number of data lines between processors, and system reliability decreases as a result. It is therefore desirable to modularize groups of processors so as to improve the reliability of the data lines that connect the modules.

[Problems to Be Solved by the Invention] The prior art described above, however, does not give sufficient consideration to modularization: as the number of processors placed in one module increases, so does the number of data lines connected to that module. FIG. 3 shows an example in which the prior art is modularized with M × N processors P_11, P_12, ..., P_MN forming one module. As shown in FIG. 3, the number of bidirectional data lines connected to one module, summed over the four directions, is 2 × (M + N). For example, when the 16 processors of an M = N = 4 array form one module, a total of 16 data lines are connected to that module, so modularization cannot be expected to improve reliability much. Connecting many data lines to one module also causes packaging problems, such as preventing the module from being mounted compactly.
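The 2 × (M + N) figure can be checked with a few lines of Python. This is a minimal sketch that assumes, as in FIG. 3, one bidirectional line for every boundary processor in each of the four outward directions (a corner processor counts once for each direction it faces); the function name is illustrative only.

```python
def prior_art_module_lines(m: int, n: int) -> int:
    """Bidirectional data lines attached to one M x N module under the
    prior-art modularization of FIG. 3 (one line per boundary processor
    per outward direction)."""
    top_and_bottom = 2 * n   # N boundary processors on the top and bottom sides
    left_and_right = 2 * m   # M boundary processors on the left and right sides
    return top_and_bottom + left_and_right


if __name__ == "__main__":
    for size in (2, 4, 8, 16):
        # Prints 8, 16, 32 and 64 lines respectively.
        print(f"{size} x {size} module: {prior_art_module_lines(size, size)} lines")
```

The count grows linearly with the edge length of the module, which is the growth the invention aims to avoid.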

Connecting modules with a single bidirectional data line is easy to conceive of, but no rational method is known for extracting the relevant data from each processor, ordering it, and sending it over such a line.

An object of the present invention is to provide a modular parallel computer that minimizes the number of connecting data lines.

[Means for Solving the Problems] The present invention arose from the following observation. When processors coupled in a two-dimensional lattice are modularized, it is not necessary to connect every pair of processors that are logically adjacent across modules. If only specific processors are physically connected between modules, then in the parallel numerical analysis of partial differential equations the most important operation, data transfer to adjacent processors, can still be carried out by relaying the data through intermediate processors, and the number of data lines between modules is reduced accordingly.

The present invention is a modular parallel computer in which a plurality of processors coupled in a two-dimensional lattice form a module. The processors located on one side of the periphery of a module are coupled to one another through communication memories. The data signal output from the module by the processor at one end of that side is input to the processor at the opposite end of the similarly coupled processors located on one side of the periphery of an adjacent module. As a result, processors that are logically adjacent across adjacent modules are connected at a distance from each other equal to the number of coupled processors on one side of the module's periphery.

In other words, the present invention physically connects, between modules, processors that are not logically adjacent but are located diagonally to each other. When M × N processors form one module, every pair of processors that are logically adjacent across modules is therefore M (or N) positions apart in the physical connection, and data can be sent to all logically adjacent processors simultaneously by relaying it through M − 1 (or N − 1) processors.
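The distance property can be illustrated with a small model of two facing sides. The text does not spell out the exact wiring, so the sketch below assumes that side A is chained from column 1 to column N, that A's last processor feeds the first processor of the facing side B (the opposite end), and that B's last processor feeds back into A's first processor, so the two sides form one unidirectional ring of 2N processors. The names `build_boundary_ring` and `hops` are hypothetical.

```python
def build_boundary_ring(n: int) -> list[tuple[str, int]]:
    """Physical order of the assumed unidirectional ring formed by two
    facing sides with n processors each (module A, then module B)."""
    side_a = [("A", col) for col in range(1, n + 1)]
    side_b = [("B", col) for col in range(1, n + 1)]
    # Assumed wiring: A's last processor feeds the opposite (first) end of
    # B's side, and B's last processor feeds back into A's first processor.
    return side_a + side_b


def hops(ring: list[tuple[str, int]], src: tuple[str, int], dst: tuple[str, int]) -> int:
    """Forward hops from src to dst along the unidirectional ring."""
    return (ring.index(dst) - ring.index(src)) % len(ring)


if __name__ == "__main__":
    n = 4
    ring = build_boundary_ring(n)
    for col in range(1, n + 1):
        # Logically adjacent pair: same column, opposite modules.
        print(col, hops(ring, ("A", col), ("B", col)), hops(ring, ("B", col), ("A", col)))
```

Under this assumption every logically adjacent pair comes out exactly N hops apart in both directions, so a single uniform shift of N positions, relayed through N − 1 intermediate processors, serves all pairs at once.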

[Operation] When a partial differential equation is processed in parallel by assigning each subregion to a processor, data is transferred simultaneously between logically adjacent processors. Processors that are logically adjacent but located in different modules are coupled through communication memories: when one side of the module's periphery holds N processors, logically adjacent processors in neighboring modules are connected N positions apart. Data can be sent simultaneously to the logically adjacent processors of the neighboring module by having all processors on that side write their outgoing data into the communication memories at once and then shifting the data forward N positions.

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a configuration diagram of one module of a modular parallel computer according to an embodiment of the present invention. In FIG. 1, reference numeral 1 denotes a module, 2 a processor, 3 a transfer control circuit, 4 an intra-module transfer circuit, and 5 a communication memory, for which a FIFO (First In, First Out) memory is used. The module 1 consists of M × N processors 2 (3 × 3 in the figure) coupled in a two-dimensional lattice by the intra-module transfer circuits 4. To transfer data to and from the logically adjacent processors of the neighboring modules (not shown) above, below, left, and right in the figure, communication memories 5 are provided for the processors 2 on the periphery of the module 1. The communication memories 5 along each side of the periphery are linked by a unidirectional data line 50, and the data lines and synchronization signals at both ends are brought out of the module 1 as (101, 102, 103), (111, 112, 113), (121, 122, 123), and (131, 132, 133). Each processor 2 is connected to its communication memories 5 through a transfer control circuit 3 by the data signals 23 and 30. For data transfer between processors within the module, the processors 2 are also connected to an intra-module transfer circuit 4 for each row and each column.

In the embodiment of FIG. 1, the processors 2 along one side of the module's periphery are coupled using the scheme disclosed in the applicant's earlier Japanese Patent Application No. 59-273061 (Japanese Patent Application Laid-Open No. 61-151773). This coupling allows high-speed data transfer even between distant processors and is therefore best suited to the present invention. Alternatively, the shared-memory coupling described in ACM Transactions on Computer Systems, vol. 1, no. 3, August 1983, pp. 195-221, may be used, although transfers then require more time and more steps.

FIG. 2 is a configuration diagram of a plurality of modules 1 of this embodiment connected together, showing in particular how the processors 2 on the periphery of each module 1 are coupled. The data signals 121 and 122 are connected to the data signals 101 and 102 of the module above, and likewise the data signals 131 and 132 are connected to the data signals 111 and 112 of the module to the right. With this coupling, when one side of a module's periphery has N processors 2, processors that are logically adjacent across neighboring modules (for example, P_M1 and P′_11, P_M2 and P′_12, and P_MN and P′_1N in FIG. 2) are always N positions apart in the physical connection. Similarly, when another side of the module's periphery has M processors, processors that are logically adjacent across neighboring modules (for example, P_11 and P″_1N, P_21 and P″_2N, and P_M1 and P″_MN in FIG. 2) are always M positions apart.

FIG. 4 shows the configuration of the transfer control circuit 3. The transfer control circuit 3 comprises a synchronization circuit 35, a decoder 36, a counter circuit 37, a pulse generation circuit 38, and a gate 39. The pulse generation circuit 38 outputs a write pulse 380 and a read pulse 381, which are ORed with the write signal 360 and the read signal 361 from the decoder 36 to form the write signal 30b and the read signal 30c. These are connected, respectively, to the write terminal of the FIFO communication memory 5 on the right and the read terminal of the FIFO communication memory 5 on the left. The data line 30a is connected to the data line 50 that links the two communication memories 5, and through the gate 39 to the data signal 23a of the processor 2.

When the processor 2 outputs data to a communication memory 5, it places the data on the data signal 23a and an output signal on the address signal 23b; the write signal 30b is issued, the gate 39 opens, and the data is stored in the communication memory 5 on the right. When the processor 2 inputs data from a communication memory 5, it places an input signal on the address signal 23b; the read signal 30c is issued, the gate 39 opens, and data from the communication memory 5 on the left is placed on the data signal 23a and delivered to the processor 2.

To transfer data from one communication memory 5 to the next, the processor specifies to the counter circuit 37 how many write pulses 380 and read pulses 381 are to be issued. The decoder 36 decodes the address signal 23b and selects a write to the counter circuit 37, and the pulse count is written into the counter circuit 37 as data via the data signal 23a. The counter circuit 37 counts the pulses down and periodically sends a signal 370 to the pulse generation circuit 38. The pulse generation circuit 38 first issues a read pulse 381, which sends the read signal 30c to the left communication memory 5 so that it places data on the data line 50, and then issues a write pulse 380, which sends the write signal 30b to the right communication memory 5 so that it takes in the data output by the left communication memory 5. By choosing the pulse count written to the counter circuit 37 appropriately, data can be transferred between logically adjacent processors in different modules, as shown in FIG. 2. For example, when one side of the periphery has N processors and n data items are to be transferred between each pair of processors, each processor first writes its n data items into the communication memory 5 and then writes n(N − 1) into the counter circuit 37 as the pulse count. This carries out the transfer to the logically adjacent processors in the different module.
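The pulse count n(N − 1) can be checked with a small simulation. This is a sketch of the data movement, not of the circuit itself. It assumes that processor k writes into the FIFO on its right (FIFO k) and reads from the FIFO on its left (FIFO k − 1), that one pulse moves one word across every FIFO-to-FIFO link simultaneously, and that the two facing sides form a single unidirectional ring of 2N processors. The function name `ring_transfer` is hypothetical.

```python
from collections import deque


def ring_transfer(outgoing: list[list[int]], pulses: int) -> list[list[int]]:
    """Simulate counter-driven FIFO shifting around one boundary ring.

    outgoing[k] is the data processor k writes into its right-hand FIFO
    (FIFO k). Each pulse models one read pulse 381 / write pulse 380 pair
    at every transfer control circuit: circuit k takes one word from its
    left FIFO (k - 1) and appends it to its right FIFO (k), all at once.
    Returns the contents of each processor's left FIFO afterwards, i.e.
    what processor k would read in step (3).
    """
    size = len(outgoing)
    fifos = [deque(words) for words in outgoing]
    for _ in range(pulses):
        heads = [fifo.popleft() for fifo in fifos]     # read pulses
        for k in range(size):
            fifos[k].append(heads[(k - 1) % size])     # write pulses
    return [list(fifos[(k - 1) % size]) for k in range(size)]


if __name__ == "__main__":
    N, n = 4, 3                    # N processors per side, n words each
    ring_size = 2 * N              # two facing sides form one ring (assumption)
    outgoing = [[100 * k + w for w in range(n)] for k in range(ring_size)]
    received = ring_transfer(outgoing, pulses=n * (N - 1))
    for k in range(ring_size):
        # Processor k's data arrives at the processor N positions ahead.
        assert received[(k + N) % ring_size] == outgoing[k]
    print("n(N-1) pulses deliver every block N positions ahead")
```

After n pulses each FIFO holds, in order, the block its left neighbour started with, so N − 1 such rounds move every block to the FIFO read by the processor N positions ahead. With a count of 0 the same model hands each processor its immediate left neighbour's data, matching the count used for transfers inside a module.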

FIG. 5 shows the configuration of the intra-module transfer circuit 4 for one row or one column, together with its processors 2. (In FIG. 5, reference numerals shared with FIG. 4 denote components of the same construction, not the very components shown in FIG. 4.) The connection between the communication memories 5 and the processors 2 is the same as for transfers out of the module. This keeps the processor 2's control scheme for the communication memories 5 uniform; another coupling scheme may also be used. However, whereas the transfer circuits to other modules circulate data in one direction only, the intra-module transfer circuit 4 is made bidirectional by two unidirectional transfer circuits 4a and 4b.

A method of parallel computation using the modular parallel computer of the present invention is described below. The two-dimensional diffusion equation

$$\frac{\partial \phi}{\partial t} = \frac{\partial^2 \phi}{\partial x^2} + \frac{\partial^2 \phi}{\partial y^2}$$

(t: time, x: position in the row direction, y: position in the column direction, φ: the unknown variable) is discretized in time and space, and the resulting difference equation

$$\phi_{i,j}^{(\nu+1)} = \lambda\,\phi_{i-1,j}^{(\nu)} + \lambda\,\phi_{i+1,j}^{(\nu)} + \lambda\,\phi_{i,j-1}^{(\nu)} + \lambda\,\phi_{i,j+1}^{(\nu)} + (1-4\lambda)\,\phi_{i,j}^{(\nu)}$$

is solved under appropriate boundary conditions. Here

$$\lambda = \frac{\Delta t}{\Delta x^2} = \frac{\Delta t}{\Delta y^2},$$

where Δt is the time step and Δx and Δy are the grid spacings.
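A serial reference version of one time step makes the stencil concrete. This is a minimal sketch in plain Python using the standard five-point explicit (FTCS) update, in which each of the four neighbours is weighted by λ and the centre point by 1 − 4λ. It assumes fixed (Dirichlet) boundary values, which is only one possible choice of "appropriate boundary conditions", and the function name is illustrative. For this explicit scheme on a square grid the usual stability requirement is λ ≤ 1/4.

```python
def diffusion_step(phi: list[list[float]], lam: float) -> list[list[float]]:
    """Advance phi by one explicit time step of the five-point scheme.

    Interior points use
        (1 - 4*lam) * phi[i][j]
        + lam * (phi[i-1][j] + phi[i+1][j] + phi[i][j-1] + phi[i][j+1]);
    the outermost rows and columns are boundary values and are left unchanged.
    """
    rows, cols = len(phi), len(phi[0])
    new = [row[:] for row in phi]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = (1 - 4 * lam) * phi[i][j] + lam * (
                phi[i - 1][j] + phi[i + 1][j] + phi[i][j - 1] + phi[i][j + 1]
            )
    return new


if __name__ == "__main__":
    grid = [[0.0] * 5 for _ in range(5)]
    grid[2][2] = 1.0                               # unit spike in the centre
    after = diffusion_step(grid, lam=0.125)        # lam <= 1/4 for stability
    print(after[2][2], after[1][2], after[2][1])   # 0.5 0.125 0.125
```

Each step needs only the previous values at a point and its four neighbours, which is why the parallel version only has to exchange the edge values of each subregion.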

The parallel computation is performed on a modular parallel computer in which a plurality of modules 1 are connected as shown in FIG. 2. First, the computational domain is divided, and the resulting subregions are assigned to the processors 2. As a result, a given subregion (I₁ ≤ i ≤ I₂, J₁ ≤ j ≤ J₂) is computed, for example, by processor P_p1p2 in module M_m1m2, where m1m2 is the index of the module 1 in the two-dimensional lattice of modules and p1p2 is the index of the processor 2 within the module. As the difference equation shows, the variable φ_ij^(ν+1) at lattice point ij can be computed from the values at the previous time level ν at lattice point ij and at its adjacent lattice points. FIG. 6 shows the flow of the parallel computation, which proceeds as follows.

(1) Each processor sets the initial value φ_ij^(0) of φ_ij.

(2) Each processor writes the data to be sent to the processors above, below, left, and right into the communication memory provided for each direction. For a transfer to a processor in the same module, the count of counter 37 is set to 0. For a transfer to the logically adjacent processor in a different module, the count of counter 37 is set so that the data is forwarded N processors ahead. For example, when a side of the module's periphery has N processors and n data items are sent, the count is set to n(N − 1).

(3) Each processor reads, from its communication memories, the data sent by the processors above, below, left, and right.

(4) Each processor computes φ^(ν+1) from φ^(ν) using the difference equation above.

(5) Advance the time step.

(6) When the end time is reached, terminate the computation.
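Steps (1) to (6) can be simulated in a single Python process to check that exchanging only edge values between neighbouring subregions reproduces the serial result. This is a sketch under stated assumptions: the interior is split into equal blocks, boundary values are fixed, the communication-memory transfers of steps (2) and (3) are modelled as direct copies into one-cell halos rather than FIFO shifts, and indices are 0-based rather than the 1-based numbering of the text. The names `locate`, `subregion`, `step`, and `parallel_run` are hypothetical.

```python
def locate(pr: int, pc: int, M: int, N: int):
    """Module index (m1, m2) and in-module processor index (p1, p2) for the
    processor at global lattice position (pr, pc), with M x N per module."""
    return (pr // M, pc // N), (pr % M, pc % N)


def subregion(pr: int, pc: int, BR: int, BC: int):
    """Grid-point bounds (I1, I2, J1, J2) owned by processor (pr, pc)."""
    return pr * BR + 1, pr * BR + BR, pc * BC + 1, pc * BC + BC


def step(phi, lam):
    """Five-point explicit update of interior points; outer ring unchanged."""
    new = [row[:] for row in phi]
    for i in range(1, len(phi) - 1):
        for j in range(1, len(phi[0]) - 1):
            new[i][j] = (1 - 4 * lam) * phi[i][j] + lam * (
                phi[i - 1][j] + phi[i + 1][j] + phi[i][j - 1] + phi[i][j + 1])
    return new


def parallel_run(phi0, PR, PC, steps, lam):
    """Run the FIG. 6 loop on a PR x PC lattice of simulated processors."""
    BR, BC = (len(phi0) - 2) // PR, (len(phi0[0]) - 2) // PC
    # (1) Each processor takes its block plus a one-cell halo.
    blocks = {(pr, pc): [row[pc * BC: pc * BC + BC + 2]
                         for row in phi0[pr * BR: pr * BR + BR + 2]]
              for pr in range(PR) for pc in range(PC)}
    for _ in range(steps):                       # (5)/(6) time loop
        # (2)/(3) Copy neighbours' edge values into each halo; only halo
        # cells are written, so the order of the processors does not matter.
        for (pr, pc), b in blocks.items():
            if pr > 0:
                b[0][1:BC + 1] = blocks[(pr - 1, pc)][BR][1:BC + 1]
            if pr < PR - 1:
                b[BR + 1][1:BC + 1] = blocks[(pr + 1, pc)][1][1:BC + 1]
            for i in range(1, BR + 1):
                if pc > 0:
                    b[i][0] = blocks[(pr, pc - 1)][i][BC]
                if pc < PC - 1:
                    b[i][BC + 1] = blocks[(pr, pc + 1)][i][1]
        # (4) Every processor updates its own interior.
        blocks = {key: step(b, lam) for key, b in blocks.items()}
    # Gather the interiors back into one global array.
    out = [row[:] for row in phi0]
    for (pr, pc), b in blocks.items():
        for i in range(1, BR + 1):
            out[pr * BR + i][pc * BC + 1: pc * BC + BC + 1] = b[i][1:BC + 1]
    return out


if __name__ == "__main__":
    M = N = 2                       # processors per module (assumed)
    PR = PC = 4                     # 2 x 2 modules -> 4 x 4 processors
    BR = BC = 2                     # grid points per processor side
    size = PR * BR + 2              # interior plus fixed boundary ring
    phi0 = [[0.0] * size for _ in range(size)]
    phi0[4][5] = 1.0
    serial = phi0
    for _ in range(10):
        serial = step(serial, 0.125)
    parallel = parallel_run(phi0, PR, PC, steps=10, lam=0.125)
    assert all(abs(a - b) < 1e-12 for ra, rb in zip(serial, parallel) for a, b in zip(ra, rb))
    print(locate(3, 1, M, N), subregion(3, 1, BR, BC))   # ((1, 0), (1, 1)) (7, 8, 3, 4)
```

Because each processor sees exactly the neighbour values the serial update would use, the two results agree; only the edge rows and columns of each subregion ever cross a processor boundary.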

As described above, the modular parallel computer of the present invention can obtain numerical solutions of partial differential equations such as the diffusion equation by parallel processing with a plurality of processors in a plurality of modules. Moreover, because adjacent modules are connected by at most two unidirectional data signals, the loss of reliability caused by connections is avoided, and more processors can be coupled to improve the capability of the system as a whole.

Furthermore, if advances in packaging technology make it possible to add modules simply by connecting data lines, users could, for example, build a system sized to the problem at hand from modules of 4 × 4 processors each, making this a highly scalable parallel computer.

[Effects of the Invention] According to the present invention, processors that are logically adjacent but located in different modules are coupled so that they are separated from each other by the number of processors on one side of the module's periphery. The transfer procedure therefore does not become complicated, and the data lines between modules can be limited to two unidirectional lines.

[Brief Description of the Drawings]

FIG. 1 is a configuration diagram of one module of a modular parallel computer according to an embodiment of the present invention; FIG. 2 is a configuration diagram of the modular parallel computer according to the embodiment; FIG. 3 is a conceptual diagram of the modularization of a conventional lattice-coupled parallel computer; FIG. 4 is a configuration diagram of the transfer control circuit 3 in FIG. 1; FIG. 5 is a configuration diagram of the intra-module transfer circuit 4 in FIG. 1; and FIG. 6 is a flowchart of the parallel computation of a partial differential equation by the modular parallel computer of the present invention.

1 ... module, 2 ... processor, 3 ... transfer control circuit, 4 ... intra-module transfer circuit, 5 ... communication memory, 35 ... synchronization circuit, 36 ... decoder, 37 ... counter circuit, 38 ... pulse generation circuit.

Claims

(57) [Claims]

1. A modular parallel computer in which a plurality of processors coupled in a two-dimensional lattice form a module, characterized in that the processors located on one side of the periphery of a module are coupled to one another through communication memories; the data signal output from the module by the processor at one end of that side is input to the processor at the opposite end of the similarly coupled processors located on one side of the periphery of an adjacent module; and processors that are logically adjacent across adjacent modules are thereby connected at a distance from each other equal to the number of coupled processors on one side of the module's periphery.