JP2004005572A

JP2004005572A - Multiprocessor system, data processing method, data processing system, computer program, and semiconductor device

Info

Publication number: JP2004005572A
Application number: JP2003106871A
Authority: JP
Inventors: Nobuo Sasaki; 佐々木　伸夫
Original assignee: Sony Computer Entertainment Inc
Current assignee: Sony Interactive Entertainment Inc
Priority date: 2000-09-27
Filing date: 2003-04-10
Publication date: 2004-01-08
Anticipated expiration: 2021-09-21
Also published as: JP3872034B2

Abstract

PROBLEM TO BE SOLVED: To realize high-speed data processing by a multiprocessor system.

SOLUTION: The multiprocessor system comprises a plurality of cell processors 20, each of which manages at least one of a plurality of objects distributed in a predetermined virtual space and generates position data indicating the position of that object in the virtual space, and a BCMC 10 that can acquire the object position data from all of the cell processors 20 and broadcasts each piece of acquired position data to all of the cell processors 20. Each of the cell processors 20 fetches the position data broadcast from the BCMC 10, determines whether the object whose position is indicated by the fetched position data lies within the range over which the objects under its own management are distributed, and, if so, determines whether that object is at a position where it collides with an object under its own management.

COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、複数のデータ処理手段によりデータ処理を行うデータ処理システム、例えばマルチプロセッサシステム及びデータ処理方法に関する。
【０００２】
【発明の背景】
高度情報化社会が進み、コンピュータ等のデータ処理装置によるデータ処理量は増大する傾向にある。また、データ処理の内容も複雑化、高度化している。従来、ＣＰＵ（Ｃｅｎｔｒａｌ　Ｐｒｏｃｅｓｓｉｎｇ　Ｕｎｉｔ）などのプロセッサの高性能化や、複数のプロセッサによるマルチプロセッサ化により、データ処理装置全体の処理能力の向上を図っている。
しかし、近年、要求されるデータ処理能力の増大のスピードは、プロセッサの高性能化のスピードを凌駕するまでになっている。プロセッサの高性能化は、その開発期間が長いこともあり一朝一夕に行えるものではない。
一方、例えばマルチプロセッサによるデータ処理能力は、使用するプロセッサの数や、その処理方法により決まり、個々のプロセッサの高性能化への依存度が小さい。そのために、データ処理装置の処理能力を向上させるための有効な手段の一つとなっている。
【０００３】
マルチプロセッサによるデータ処理方法を、一つのプロセッサがデータ処理時に必要とするデータの範囲により分類すると、以下のようになる。
（１）データ処理を行うプロセッサが、隣接して接続されるプロセッサにより処理されたデータのみを使用する
このような制御は、セル・オートマトン、画像フィルタ、布や波の運動の計算、曲面からのポリゴン生成の計算等に向いている。
（２）データ処理を行うプロセッサが、複数のプロセッサのうちの一部のプロセッサにより処理されたデータのみを使用する
このような制御は、多対多の衝突判定等に向いている。
【０００４】
上記の（１）の場合のデータ処理は、従来の並列プロセッサによって、効率よく実現可能である。しかし、（２）のデータ処理は、並列プロセッサ間の通信速度によりシステム全体の処理速度が制限されてしまい、各プロセッサの処理速度を十分に発揮できない。例えば、すべてのプロセッサ間をクロスバー接続することにより、（２）のデータ処理を高速に行うことも可能であるが、この場合、必要なハードウェアが膨大になり、現実的ではない。
【０００５】
本発明の課題は、例えば上記の（２）のデータ処理を従来よりも効率よく行うことのできるデータ処理システム及びデータ処理方法を提供することにある。
【０００６】
【課題を解決するための手段】
上記課題を解決するため、本発明のマルチプロセッサシステムは、所定の仮想空間内に分布する複数のオブジェクトの少なくとも一つを管理して、当該オブジェクトについて、前記仮想空間内における位置を表す位置データを生成する複数のプロセッサと、すべての前記プロセッサから前記オブジェクトの位置データを取得可能であり、取得した位置データを一つずつすべてのプロセッサにブロードキャストするコントローラと、を備えており、前記複数のプロセッサの各々は、前記コントローラからブロードキャストされた位置データを取り込んで、自分の管理下にあるオブジェクトが分布する範囲内に、取り込んだ位置データで位置が表されるオブジェクトがあるか否かを判定し、範囲内にある場合に、当該オブジェクトが自分の管理下にあるオブジェクトと衝突する位置にあるか否かを判定するようになっている。
【０００７】
前記複数のプロセッサは、自分の管理するオブジェクトについて、前記仮想空間内における速度を表す速度データを生成するようになっていてもよい。この場合、前記コントローラが、すべての前記プロセッサから前記位置データ及び前記速度データを取得可能であり、取得した位置データ及び速度データを一組ずつすべてのプロセッサにブロードキャストすれば、前記プロセッサは、ブロードキャストされた位置データで位置が表されるオブジェクトが自分の管理下にあるオブジェクトと衝突位置にある場合に、ブロードキャストされたオブジェクトの前記速度データにより、衝突による衝撃の強さを定量的に表す衝突強度データ及び衝突によるオブジェクトへの影響を表すデータを生成することができるようになる。
また、前記複数のプロセッサに、各々を識別するための識別データを割り当て、前記衝突強度データを生成したプロセッサからは衝突強度データをそのプロセッサの識別データと共に取り込み、前記衝突強度データを生成していないプロセッサからは衝突強度データの値よりも小さい値をそのプロセッサの衝突強度データとして識別データと共に取り込んで、最も大きい衝突強度データを生成したプロセッサを特定し、特定したプロセッサの識別データを前記コントローラに送る最大値検出機構をさらに備えるようにしてもよい。これにより前記コントローラは、前記最大値検出機構から送られた識別データにより表されるプロセッサから、衝突強度データ及び衝突によるオブジェクトへの影響を表すデータを取得することができるようになる。
【０００８】
また、このようなマルチプロセッサシステムにおいて、前記プロセッサは、例えば、位置データがブロードキャストされたオブジェクトと自分の管理下にあるオブジェクトとの距離を算出することにより、当該オブジェクトが自分の管理下にあるオブジェクトと衝突位置にあるか否かを判定するようになっている。
【０００９】
本発明のデータ処理方法は、所定の仮想空間内に分布する複数のオブジェクトをクラスタ単位に分けて、それぞれが異なる一つのクラスタを管理する複数のデータ処理手段と、すべての前記オブジェクトの前記仮想空間内における位置及び速度を管理するとともに、前記複数のデータ処理手段に、前記オブジェクトの位置を表す位置データ及び速度を表す速度データをブロードキャストする制御手段と、を有する装置又はシステムにおいて実行される方法であって、前記制御手段が、すべてのオブジェクトの前記位置データ及び前記速度データを、オブジェクトが属するクラスタを表すクラスタデータとともにすべてのデータ処理手段にブロードキャストする段階と、前記複数のデータ処理手段の各々が、自分の管理するオブジェクトについての位置データ及び速度データを、前記クラスタデータに基づいて取り込み、取り込んだ位置データ及び速度データから、新しい位置データ及び速度データを生成する段階と、前記制御手段が、前記複数のデータ処理手段の各々から、各オブジェクトの新しい位置データ及び速度データを取り込むとともに、取り込んだ位置データを一つずつすべてのデータ処理手段にブロードキャストする段階と、前記複数のデータ処理手段の各々が、前記制御手段からブロードキャストされた新しい位置データを取り込んで、自分の管理下にあるオブジェクトが分布する範囲内に当該新しい位置データで表されるオブジェクトがあるか否かを判定し、範囲内にある場合に、当該オブジェクトが自分の管理下にあるオブジェクトと衝突する位置にあるか否かを判定する段階と、をこの順序で実行する。
【００１０】
本発明のデータ処理システムは、所定の仮想空間内に分布する複数のオブジェクトの、前記仮想空間内における位置を表す位置データを管理する制御手段との間で双方向通信を行うシステムであって、前記仮想空間内に分布する複数のオブジェクトの少なくとも一つを管理して、当該オブジェクトについて、前記仮想空間内における位置を表す位置データを生成するとともに、生成した位置データを前記制御手段に送る手段と、前記制御手段から一つずつブロードキャストされる位置データを取り込んで、自分の管理下にあるオブジェクトが分布する範囲内に、取り込んだ位置データで位置が表されるオブジェクトがあるか否かを判定し、範囲内にある場合に、当該オブジェクトが自分の管理下にあるオブジェクトと衝突する位置にあるか否かを判定する手段と、を備えている。
【００１１】
本発明が提供するコンピュータプログラムは、所定の仮想空間内に分布する複数のオブジェクトの、前記仮想空間内における位置を表す位置データを管理する制御手段との間で双方向通信を行う、コンピュータ搭載の装置に於いて、前記コンピュータに以下の機能を形成させるためのコンピュータプログラムであり、本発明が提供する半導体デバイスは、所定の仮想空間内に分布する複数のオブジェクトの、前記仮想空間内における位置を表す位置データを管理する制御手段との間で双方向通信を行う、コンピュータ搭載の装置に組み込まれることにより、前記コンピュータに以下の機能を形成させる半導体デバイスである。
（１）前記仮想空間内に分布する複数のオブジェクトの少なくとも一つを管理して、当該オブジェクトについて、前記仮想空間内における位置を表す位置データを生成するとともに、生成した位置データを前記制御手段に送る手段、
（２）前記制御手段から一つずつブロードキャストされる位置データを取り込んで、自分の管理下にあるオブジェクトが分布する範囲内に、取り込んだ位置データで位置が表されるオブジェクトがあるか否かを判定し、範囲内にある場合に、当該オブジェクトが自分の管理下にあるオブジェクトと衝突する位置にあるか否かを判定する手段。
【００１２】
【発明の実施の形態】
以下に、本発明をデータ処理システムの一例となるマルチプロセッサシステムに適用した場合の実施の形態を説明する。
【００１３】
＜全体構成＞
図１は、マルチプロセッサシステムの構成例を示した図である。このマルチプロセッサシステム１は、データ処理及びデータ記録及び読み出しのための制御手段であるブロードキャストメモリコントローラ（以下、「ＢＣＭＣ（Ｂｒｏａｄｃａｓｔ　Ｍｅｍｏｒｙ　Ｃｏｎｔｒｏｌｌｅｒ）」という。）１０と、各々データ処理手段の一例となる複数のセルプロセッサ２０と、データ処理のための所要の機能を種々形成するための複数のＷＴＡ（Ｗｉｎｎｅｒ　Ｔａｋｅ　Ａｌｌ）・総和回路３０と、を含んで構成されている。
ＢＣＭＣ１０とすべてのセルプロセッサ２０とは、ブロードキャストチャネル（一斉送出可能な通信チャネル）により接続されている。
【００１４】
このマルチプロセッサシステム１は、各セルプロセッサ２０によるデータ処理結果の一例となる状態変数値をＢＣＭＣ１０で管理し、ＢＣＭＣ１０からすべてのセルプロセッサ２０の状態変数値を、参照用数値の一例としてブロードキャストにより送出するものである。これにより、各セルプロセッサ２０は、高速に他のセルプロセッサ２０において発生した状態変数値を参照可能とする。
【００１５】
ブロードキャストチャネルは、ＢＣＭＣ１０と複数のセルプロセッサ２０との間の伝送経路であって、アドレスの受け渡しに使用されるアドレスバスと、状態変数値などのデータの受け渡しに使用されるデータバスとを含んで構成される。アドレスには、個々のセルプロセッサ２０を特定するためのセルアドレスと、すべてのセルプロセッサ２０を対象とするブロードキャストアドレスとがある。
セルアドレスは、メモリ上のアドレス（物理アドレス又は論理アドレス）に対応しており、セルプロセッサ２０からの状態変数値は、常に、当該セルプロセッサ２０を示すセルアドレスに対応するアドレスに記憶されるようになっている。各セルプロセッサ２０には、各々を識別するための識別情報として、ＩＤ（ｉｄｅｎｔｉｆｉｃａｔｉｏｎ）が付されている。セルアドレスは、このＩＤにも対応するようになっている。これにより、状態変数値がどのセルプロセッサ２０から出力されたのかを、セルアドレスによって特定することができる。
【００１６】
ＷＴＡ・総和回路３０は、図１に示すように接続される。即ち、ＷＴＡ・総和回路３０は、セルプロセッサ２０側を一段目としてピラミッド状に接続される。一段目のＷＴＡ・総和回路３０の入力端には２つのセルプロセッサ２０が接続され、出力端は二段目のＷＴＡ・総和回路３０の入力端に接続される。
二段目以降は、入力端の各々に下位の段の２つのＷＴＡ・総和回路３０の出力端が接続され、出力端に上位の段のＷＴＡ・総和回路３０の入力端が接続される。最上段のＷＴＡ・総和回路３０は、入力端に下段の２つのＷＴＡ・総和回路３０の出力端が接続され、出力端はＢＣＭＣ１０に接続される。
【００１７】
なお、図示の接続形態の他に、ＷＴＡ・総和回路３０をカスケードに接続しても、本発明を実施することが可能である。この場合、一段目のＷＴＡ・総和回路３０の入力端には２つのセルプロセッサ２０を接続し、出力端を上位の段の入力端に接続する。二段目以降のＷＴＡ・総和回路３０の入力端には、下位の段のＷＴＡ・総和回路３０の出力端とセルプロセッサ２０が接続され、出力端は上位の段の入力端に接続される。最上段のＷＴＡ・総和回路３０は、入力端に下位の段のＷＴＡ・総和回路３０の出力端とセルプロセッサ２０とが接続され、出力端はＢＣＭＣ１０に接続される。
【００１８】
次に、ＢＣＭＣ１０、セルプロセッサ２０、ＷＴＡ・総和回路３０のそれぞれについて詳細に説明する。
【００１９】
＜ＢＣＭＣ＞
ＢＣＭＣ１０は、ブロードキャストチャネルによりすべてのセルプロセッサ２０にデータをブロードキャストするとともに、各セルプロセッサ２０からの状態変数値を取り込んで保持する。図２にＢＣＭＣ１０の構成例を示す。
ＢＣＭＣ１０は、マルチプロセッサシステム１全体の動作を制御するＣＰＵコア１０１と、ＳＲＡＭ（Ｓｔａｔｉｃ　Ｒａｎｄｏｍ　Ａｃｃｅｓｓ　Ｍｅｍｏｒｙ）などの書き換え可能なメインメモリ１０２と、ＤＭＡＣ（Ｄｉｒｅｃｔ　Ｍｅｍｏｒｙ　Ａｃｃｅｓｓ　Ｃｏｎｔｒｏｌｌｅｒ）１０３とがバスＢ１で接続されて構成される。ＣＰＵコア１０１は、メインメモリ１０２と協働し、所定のコンピュータプログラムを読み込んで実行することにより、本発明の特徴的なデータ処理を行うための機能を形成するコンピュータ搭載の半導体デバイスである。メインメモリ１０２は、システム全体の共有メモリとして使用されるようになっている。
バスＢ１には、最上段のＷＴＡ・総和回路３０の出力端及びハードディスクや可搬性メディア等の外部メモリも接続される。
【００２０】
ＣＰＵコア１０１は、起動時に上記の外部メモリから起動プログラムを読み込み、その起動プログラムを実行してオペレーティングシステムを動作させる。また、データ処理に必要となる種々のデータを上記の外部メモリから読み出し、これをメインメモリ１０２に展開する。メインメモリ１０２には、各セルプロセッサ２０の状態変数値などのデータも記憶されるようにする。状態変数値は、当該状態変数値を算出したセルプロセッサ２０のセルアドレスに応じたメインメモリ１０２のアドレスに記憶される。
ＣＰＵコア１０１は、また、メインメモリ１０２から読み出したデータに基づいて、各セルプロセッサ２０に対してブロードキャストするブロードキャストデータを生成する。ブロードキャストデータは、例えば、状態変数値と当該状態変数値を算出したセルプロセッサ２０を示すセルアドレスとの組からなるペア（組）データである。ペアデータは、１組又は複数組生成される。
【００２１】
ＤＭＡＣ１０３は、メインメモリ１０２と各セルプロセッサ２０との間のダイレクトメモリアクセス転送制御を行う半導体デバイスである。例えば、各セルプロセッサ２０に対しては、ブロードキャストチャネルを介して、ブロードキャストデータをブロードキャストする。また、各セルプロセッサ２０のデータ処理結果を個別に取得して、メインメモリ１０２に書き込む。
【００２２】
＜セルプロセッサ＞
各セルプロセッサ２０は、ブロードキャストデータの中から必要となるデータを取捨選択してデータ処理を行い、データ処理の終了時に、その旨をＷＴＡ・総和回路３０へ報告する。データ処理結果である状態変数値を、ＢＣＭＣ１０からの指示により、ＢＣＭＣ１０へ送出する。各セルプロセッサ２０間は、図示しない共有メモリを介してリング接続される。各セルプロセッサ２０は、データ処理を同期的なクロックで行ってもよく、各々異なるクロックで行ってもよい。図３にセルプロセッサ２０の構成例を示す。
セルプロセッサ２０は、セルＣＰＵ２０１と、入力バッファ２０２と、出力バッファ２０３と、ＷＴＡバッファ２０４と、プログラムコントローラ２０５と、命令メモリ２０６と、データメモリ２０７と、を含んで構成される。
【００２３】
セルＣＰＵ２０１は、プログラマブルな浮動小数点演算器を備えたプロセッサであり、セルプロセッサ２０内の動作を制御して、データ処理を行うものである。セルＣＰＵ２０１は、ＢＣＭＣ１０からブロードキャストされたブロードキャストデータを入力バッファ２０２を介して取得し、ペアデータのセルアドレスにより自己が行うべき処理に必要なデータか否かを判断し、必要であればデータメモリ２０７の対応するアドレスに状態変数値を書き込む。また、データメモリ２０７から状態変数値を読み出してデータ処理を行い、データ処理結果を出力バッファ２０３に書き込み、ＷＴＡ・総和回路３０にデータ処理の終了を示すデータを送る。
【００２４】
入力バッファ２０２は、ＢＣＭＣ１０からブロードキャストされたブロードキャストデータを保持するものである。保持されたブロードキャストデータは、セルＣＰＵ２０１からの要求により、セルＣＰＵ２０１へ送られる。
出力バッファ２０３は、セルＣＰＵ２０１の状態変数値を保持するものである。保持された状態変数値は、ＢＣＭＣ１０からの要求により、ＢＣＭＣ１０へ送信される。
入力バッファ２０２及び出力バッファ２０３は、この他に制御用のデータ等の送受を行ってもよい。
ＷＴＡバッファ２０４は、セルＣＰＵ２０１によるデータ処理の終了時に、セルＣＰＵ２０１からデータ処理の終了を示すデータを受信して、これをＷＴＡ・総和回路３０へ送信することにより、データ処理の終了をＷＴＡ・総和回路３０に報告するものである。データ処理の終了を示す終了データには、例えば、自セルプロセッサ２０のＩＤと、出力バッファ２０３に保存された状態変数値がＢＣＭＣ１０へ読み取られるときの優先度を決める優先度データとが含まれる。
【００２５】
プログラムコントローラ２０５は、セルプロセッサ２０の動作を規定するプログラムをＢＣＭＣ１０から取り込むものである。セルプロセッサ２０の動作を規定するプログラムには、セルプロセッサ２０で実行されるデータ処理のためのプログラムや、当該セルプロセッサ２０で処理に必要なデータを決めるデータ選択プログラム、処理結果がＢＣＭＣ１０へ読み取られるときの優先度を決める優先度決定プログラムなどがある。
命令メモリ２０６は、プログラムコントローラ２０５により取り込んだプログラムを保存するものである。保存したプログラムは、必要に応じてセルＣＰＵ２０１に読み込まれる。
【００２６】
データメモリ２０７は、セルプロセッサ２０において処理されるデータを保存するものである。セルＣＰＵ２０１により必要と判断されたブロードキャストデータが書き込まれる。ブロードキャストデータは、セルアドレスに応じたアドレスに保存される。
また、本実施形態ではデータメモリ２０７の一部は共有メモリを介して隣接するセルプロセッサ２０に繋がっており、１サイクル毎に隣接するセルプロセッサ２０とデータの送受が可能となっている。
【００２７】
＜ＷＴＡ・総和回路＞
複数のＷＴＡ・総和回路３０は、各セルプロセッサ２０から送られるデータ処理の終了を示すデータにより、ＢＣＭＣ１０がセルプロセッサ２０から状態変数値を取り込む順序を決めてＢＣＭＣ１０へ報告する。
図４にＷＴＡ・総和回路３０の構成例を示す。
各ＷＴＡ・総和回路３０は、２つの入力レジスタＡ、Ｂ（以下、第１入力レジスタ３０１、第２入力レジスタ３０２）と、切換器３０３と、比較器３０４と、加算器３０５と、出力レジスタ３０６と、を含んで構成される。
【００２８】
第１入力レジスタ３０１及び第２入力レジスタ３０２は、それぞれ整数レジスタ及び浮動小数点レジスタを備えている。整数レジスタには、例えばセルプロセッサ２０から送られるデータ処理の終了を示す終了データのうち、ＩＤが書き込まれ、浮動小数点レジスタには、例えば優先度データが書き込まれる。
切換器３０３は、比較器３０４及び加算器３０５のいずれか一方を活性化する。具体的には、動作モードに従って一方のみを使用可能とする。動作モードは、例えばＢＣＭＣ１０からの指示により決められる。動作モードについては後述する。
比較器３０４は、第１入力レジスタ３０１及び第２入力レジスタ３０２の各々の浮動小数点レジスタが保持する浮動小数点値の比較を行い、大きい方（又は小さい方）の値と、それに付随する整数とを、出力レジスタ３０６へ書き込む。
加算器３０５は、第１入力レジスタ３０１及び第２入力レジスタ３０２の各々の浮動小数点レジスタが保持する浮動小数点値の和を算出し、算出結果を出力レジスタ３０６へ書き込む。
出力レジスタ３０６は、第１入力レジスタ３０１及び第２入力レジスタ３０２とほぼ同じに構成される。つまり、整数レジスタ及び浮動小数点レジスタを備えている。整数レジスタにはＩＤが書き込まれ、浮動小数点レジスタには優先度データが書き込まれるようになっている。
【００２９】
ＷＴＡ・総和回路３０は、以下に説明する３つの動作モードをもつ。
【００３０】
・最大値（ＷＴＡ）モード：
切換器３０３により、比較器３０４が活性化される。比較器３０４は、第１入力レジスタ３０１及び第２入力レジスタ３０２の各々の浮動小数点レジスタが保持する浮動小数点値Ａ、Ｂの比較を行い、大きい方（又は小さい方）の値と、それに付随する整数値を出力レジスタ３０６に書き込む。出力レジスタ３０６への書き込みが終了すると、第１入力レジスタ３０１及び第２入力レジスタ３０２をクリアする。出力レジスタ３０６の内容は、上位の段のＷＴＡ・総和回路３０の入力レジスタに書き込まれる。このとき、書き込み先の入力レジスタがクリアされていないときは、書き込みがストールして、そのサイクルでは書き込みを行わず、次のサイクルで書き込むようにする。
【００３１】
・加算モード：
切換器３０３により、加算器３０５が活性化される。加算器３０５により、第１入力レジスタ３０１及び第２入力レジスタ３０２の各々の浮動小数点レジスタが保持する浮動小数点値Ａ、Ｂの和を算出し、算出結果を出力レジスタ３０６に書き込む。出力レジスタ３０６の内容は、上位の段のＷＴＡ・総和回路３０の入力レジスタに書き込まれる。
【００３２】
・近似ソートモード：
切換器３０３により、比較器３０４が活性化される。比較器３０４は、第１入力レジスタ３０１及び第２入力レジスタ３０２の各々の浮動小数点レジスタが保持する浮動小数点値Ａ、Ｂの比較を行い、大きい方（又は小さい方）の値と、それに付随する整数値とを出力レジスタ３０６に書き込む。
その後、出力レジスタ３０６に書き込まれた値を保持していた入力レジスタのみをクリアし、出力レジスタ３０６の内容を、上位の段のＷＴＡ・総和回路３０の入力レジスタに書き込む。書き込み先の入力レジスタがクリアされていない場合は、書き込みがストールし、そのサイクルでは書き込みを行わない。ただし、下位の段のＷＴＡ・総和回路３０の出力レジスタ３０６からの書き込み動作は行われる。
近似ソートモードにより、ＢＣＭＣ１０がＷＴＡ・総和回路３０の最上段の出力レジスタ３０６から受け取るデータが、浮動小数点が大きい順或いは小さい順にソートされた（並び替えられた）ものとなる。
【００３３】
なお、各モードに入る前には、すべてのＷＴＡ・総和回路３０の第１入力レジスタ３０１、第２入力レジスタ３０２及び出力レジスタ３０６がクリアされる。
【００３４】
各モードを切替えて使用することにより、複数のＷＴＡ・総和回路３０全体として、上記のソートのための機構（ソート機構）及び／又は総和回路として機能する。つまり、近似ソートモードで動作するときは、ソート機構を実現するものとなり、加算モードで動作するときは、総和回路を実現するものとなる。
【００３５】
最大値モード、近似ソートモードで動作するＷＴＡ・総和回路３０は、次に示すようにして実現してもよい。
すなわち、セルプロセッサ２０と同数の入力レジスタと、切換器と、比較器と、加算器と、出力レジスタとを含んでＷＴＡ・総和回路が構成される。
入力レジスタがセルプロセッサ２０の数と同じだけ用意されており、それぞれが、第１レジスタ３０１、第２レジスタ３０２と同様に、整数レジスタ及び浮動小数点レジスタを備える。比較器は、すべての入力レジスタの浮動小数点レジスタが保持する浮動小数点値の比較を行う。加算器は、すべての入力レジスタの浮動小数点レジスタが保持する浮動小数点値の和を算出する。
出力レジスタは、図４のＷＴＡ・総和回路３０の出力レジスタと同様である。
【００３６】
比較器により、各入力レジスタの浮動小数点レジスタが保持する優先度データを比較して、優先度の高い順に、付随するＩＤを順次出力レジスタに書き込む。これにより、ＩＤを、優先度の高い順序でＢＣＭＣ１０へ送ることができる。
加算器により、各浮動小数点レジスタが保持するデータを加算して、その総和を求めることができる。
このようなＷＴＡ・総和回路は、図１に示すような接続形態をとらなくとも、一つで、本発明におけるソート機構、総和回路として機能する。
【００３７】
＜データ処理方法＞
本実施形態におけるマルチプロセッサシステム１は、以下のように動作することにより、所要のデータ処理を実行する。図５は、このマルチプロセッサシステム１において実行される処理の流れを示すフローチャートである。
【００３８】
ＢＣＭＣ１０のメインメモリ１０２には、すべてのセルプロセッサ２０の状態変数値の初期値が予め記憶される。
ＢＣＭＣ１０は、このセルプロセッサ２０の状態変数値とセルプロセッサ２０を示すセルアドレスとからなるペアデータにより、ブロードキャストデータを作成する（ステップＳ１０１）。そして、作成したブロードキャストデータを、すべてのセルプロセッサ２０へブロードキャストする（ステップＳ１０２）。
各セルプロセッサ２０は、ブロードキャストデータを、入力バッファ２０２に取り込む。セルＣＰＵ２０１は、命令メモリ２０６に記憶されたデータ選択プログラムにより、入力バッファ２０２が保持するブロードキャストデータのセルアドレスを調べて、自セルプロセッサ２０が行うデータ処理に要する状態変数値があるか否かを確認する（ステップＳ１０３）。自らが行うデータ処理に要する状態変数値が無い場合、セルプロセッサ２０は、処理動作を終了する（ステップＳ１０３：無）。自らが行うデータ処理に要する状態変数値が有る場合は（ステップＳ１０３：有）、該当する状態変数値を、この状態変数値とペアデータを組むセルアドレスに対応するデータメモリ２０７上のアドレスへ上書きする（ステップＳ１０４）。
以上により、ＢＣＭＣ１０から各セルプロセッサ２０へのデータのブロードキャストが終了する。
【００３９】
ブロードキャストが終了すると、各セルプロセッサ２０は、命令メモリ２０６に記憶されたデータ処理のプログラムにより、データメモリ２０７に記録された状態変数値をデータ処理して新たな状態変数値を生成する。新たな状態変数値は、データメモリ２０７に書き込まれるとともに、出力バッファ２０３にも書き込まれる（ステップＳ１０５）。新たな状態変数値は、データメモリ２０７上の、自らのセルアドレスに対応するアドレスに、上書きされる。
データ処理が終了すると、セルＣＰＵ２０１は、ＷＴＡバッファ２０４を介して１段目のＷＴＡ・総和回路３０の入力レジスタへＩＤと優先度データとを含む終了データを送信して、データ処理の終了を報告する（ステップＳ１０６）。優先度データは、データ処理の前又は後に、所定の優先度決定プログラムによって生成される。
【００４０】
１段目のＷＴＡ・総和回路３０は、各セルプロセッサ２０から送られる終了データのうち、ＩＤを入力レジスタの整数レジスタへ、優先度データを浮動小数点レジスタでそれぞれ保持する。ここで、ＷＴＡ・総和回路３０は近似ソートモードで動作する。そのために、切換器３０３は、比較器３０４を活性化する。
ＷＴＡ・総和回路３０の第１入力レジスタ３０１及び第２入力レジスタの整数レジスタは、各々異なるセルプロセッサ２０から送られたＩＤを保持する。また、各々の浮動小数点レジスタは、ＩＤに付随した優先度データを保持する。比較器３０４は、第１入力レジスタ３０１及び第２入力レジスタ３０２の浮動小数点レジスタからそれぞれ優先度データを読み出し、優先度を比較する。比較の結果、優先度が高い方の優先度データ及びそれに付随したＩＤを、出力レジスタ３０６の浮動小数点レジスタ及び整数レジスタへ書き込む。出力レジスタ３０６へ内容が書き込まれた入力レジスタは、その内容がクリアされる。出力レジスタ３０６へ書き込まれたＩＤ及び優先度データは、上位の段のＷＴＡ・総和回路３０の入力レジスタへ書き込まれる。
このような処理を各段のＷＴＡ・総和回路３０で行う。最上段のＷＴＡ・総和回路３０は、出力レジスタ３０６の整数レジスタに書き込まれたＩＤをＢＣＭＣ１０へ送る。
以上のような処理により、ＷＴＡ・総和回路３０全体としては、ＩＤを、優先度の高い順序でＢＣＭＣ１０へ送ることとなる（ステップＳ１０７）。
【００４１】
ＢＣＭＣ１０は、ＷＴＡ・総和回路３０から送られるＩＤに該当するセルプロセッサ２０の出力バッファ２０３から、データ処理された状態変数値を取得する。取得した状態変数値は、ＢＣＭＣ１０内のメインメモリ１０２上の、処理を行ったセルプロセッサ２０を示すセルアドレスに対応するアドレスに上書きされる（ステップＳ１０８）。
以上で、状態変数値の処理動作の１サイクルが終了する。
【００４２】
ＢＣＭＣ１０が、各セルプロセッサ２０からデータ処理結果を取得し、これによりブロードキャストデータを生成する。
各セルプロセッサ２０は、ブロードキャストデータから自分に必要となるデータのみを取捨選択してデータ処理を行う。このブロードキャストデータを用いてデータ処理を行うことにより、他のすべてのセルプロセッサ２０により処理されたデータを利用する処理が可能となる。また、ブロードキャストデータを、各セルプロセッサ２０からのデータ処理結果とこのデータ処理結果を生成したセルプロセッサ２０を示すセルアドレスとからなるペアデータにより作成することにより、特定のセルプロセッサ２０のデータ処理結果のみを用いる処理が可能となる。さらに、隣接するセルプロセッサ２０間は共有メモリを介して接続されているので、従来と同様に、隣接するセルプロセッサ２０間の処理も可能である。
各セルプロセッサ２０が、メインメモリ１０２に、直接、自セルプロセッサ２０で必要とするデータを取り込みに行くことがなく、ブロードキャストデータから必要となるデータを選択して、各セルプロセッサ２０内にデータを保持して処理を行うので、データの競合が起こらずに高速処理が可能となる。
【００４３】
［実施例１］
次に、上記のマルチプロセッサシステム１の実施例を具体的に説明する。
この実施例では、あるセルプロセッサ２０とそれに隣接する他のセルプロセッサ２０により処理されたデータのみを使用する場合の例を、図６を参照して説明する。
図６において、「○」はセルプロセッサを表しており、網掛された「○」がデータ処理を行うセルプロセッサ、「●」が必要とされるデータを保持するセルプロセッサである。
ｎ×ｎ（ｎは２以上の自然数）の格子の各格子点についてのデータ（格子点データ）に対して、次のようなフィルタ計算を連続的に実行する場合を考える。
X_{i,j} = (X_{i-1,j} + X_{i+1,j} + X_{i,j-1} + X_{i,j+1}) / 4
ｉ：格子点の行番号、ｊ：格子点の列番号
【００４４】
ＢＣＭＣ１０は、格子点データを行又は列でグループ化したブロードキャストデータとして、ｎ個のセルプロセッサ２０にブロードキャストする。
図８は、格子点データをグループ化した例示図であり、「○」で示される格子点データを５個ずつグループ化してある。一つのグループ化した格子点データが、一つのセルプロセッサ２０で処理される。
セルプロセッサ２０では、ブロードキャストデータから必要とするグループ化された格子点データをデータメモリ２０７に保存する。データメモリ２０７から、格子点データを順次読み出してデータ処理する。
【００４５】
共有メモリを介して接続されるセルプロセッサ２０との間では、共有メモリを用いてデータ転送を行う。共有メモリへのデータの書込動作を１サイクルとすると、セルプロセッサ２０間のグループ化されたデータの転送は、２ｎサイクルで行うことができる。
各セルプロセッサ２０を同期的に動作させ、共有メモリへの書き込みと演算とをパイプライン処理のように同時に実行することにより、セルプロセッサ２０間の通信と演算を同時に行うことができる。
【００４６】
次のブロードキャストデータは、グループ化された格子点データのデータ処理が終了する度に、ＢＣＭＣ１０によりブロードキャストされる。セルプロセッサ２０は、ブロードキャストされるデータのｉ、ｊにより、必要なデータか否かを判断する。
ブロードキャストデータをグループ化することにより行又は列方向のデータを処理可能であり、共有データを介してデータ転送することにより列又は行方向のデータ処理が可能となる。
【００４７】
［実施例２］
この実施例では、すべてのセルプロセッサ２０のうち、一部のセルプロセッサ２０により処理されたデータのみを使用する場合の例を、図７を参照して説明する。図７において、「○」はセルプロセッサを表しており、網掛された「○」がデータ処理を行うセルプロセッサ、「●」が必要とされるデータを保持するセルプロセッサである。このようなマルチプロセッサシステムは、ホップフィールドの連想記憶器の実現に有用である。
各セルプロセッサ２０は、データ処理結果である状態変数値とその状態変数値の重要度を表す重み係数とを保持するものとする。また、セルプロセッサ２０には、番号が付されており、ＢＣＭＣ１０は、番号順にセルプロセッサ２０から状態変数値を取り込む。
ＢＣＭＣ１０は、すべてのセルプロセッサ２０から取り込んだ状態変数値をブロードキャストデータとしてブロードキャストする。各セルプロセッサ２０は、ブロードキャストデータから必要な状態変数値のみを選択して重み係数との積和演算を行い、状態変数値を更新する。必要な状態変数値が、ブロードキャストデータに含まれるすべての状態変数値の場合、すべてのプロセッサにより処理されたデータを使用する処理に該当することとなる。
【００４８】
［実施例３］
次に、パターンマッチング計算処理の例を説明する。
ここでは、入力データの特徴に最も類似するデータを保持するセルプロセッサ２０を特定する処理を行う。この処理は、以下のようにして行う。
各セルプロセッサ２０は、予め比較対象となるテンプレートデータを保持する。
ＢＣＭＣ１０は、入力データをすべてのセルプロセッサ２０にブロードキャストする。各セルプロセッサ２０は、自らが保持するテンプレートデータの特徴と入力データの特徴との差分値を算出する。差分値は、ＩＤとともにＷＴＡ・総和回路３０へ送られる。
ＷＴＡ・総和回路３０は、最大値モードで動作する。入力レジスタの整数レジスタはＩＤを保持し、浮動小数点レジスタは差分値を保持する。差分値を比較器３０４により比較して、小さい方の差分値とそれに付随するＩＤを出力レジスタ３０６へ送る。これをＷＴＡ・総和回路３０全体で行い、最も小さい差分値とそれに付随するＩＤを求める。このＩＤ及び差分値をＢＣＭＣ１０へ送る。
ＢＣＭＣ１０は、ＩＤによりセルプロセッサ２０を特定する。これにより、入力データの特徴に最も類似するテンプレートデータ及び入力データの特徴と最も類似するテンプレートデータとの差分値も検出できる。
【００４９】
［実施例４］
次に、画像処理等の際に用いられる、動くオブジェクトの衝突判定アルゴリズムの処理例について説明する。「衝突判定アルゴリズム」は、ある空間内に存在するｎ個のオブジェクト（物体）が互いに他のオブジェクトと衝突するかどうか、衝突する場合はどの程度の強度かを判定するアルゴリズムである。
ｎ個のオブジェクトの空間分布には偏りがあり、ｍ個のクラスタに分かれているとする。ここでは、例えば、１個のオブジェクトが、他の（ｎ−１）個のオブジェクトのいずれと最も強く衝突するかについて判定するものとする。
図９は、このような空間内のオブジェクトの例示図であり、「○」で表されるオブジェクトを矩形で囲んで１クラスタとしており、図９ではオブジェクトが５個のクラスタに分けられている。オブジェクトを示すデータは、ＢＣＭＣ１０からブロードキャストされ、クラスタ毎にセルプロセッサ２０に取り込まれる。セルプロセッサ２０は、取り込んだ１つのクラスタに含まれるオブジェクトに関する空間内での位置、運動についての処理を行う。
図９の例では、セルプロセッサＡ〜Ｅにより５個のクラスタに分けられたオブジェクトに関する処理が行われる。
図１０により、衝突判定アルゴリズムの処理の流れを説明する。
【００５０】
ＢＣＭＣ１０は、オブジェクトの位置や速度のデータを含むオブジェクトデータと、当該オブジェクトが属するクラスタを示すクラスタデータとを含むブロードキャストデータを生成し、すべてのセルプロセッサ２０にブロードキャストする（ステップＳ２０１）。各セルプロセッサ２０は、ブロードキャストデータから、オブジェクトデータをクラスタデータに基づいて取捨選択して取り込む。
オブジェクトデータを取り込んだセルプロセッサ２０は、オブジェクトの現在の位置データと速度データとから、単位時間後の新しい位置データを算出する。新しい位置データから、新しいバウンディングボックスの値を得る（ステップＳ２０２）。バウンディングボックスとは、例えば、図９における、オブジェクトを囲む矩形である。バウンディングボックスの値とは、例えば、バウンディングボックスの頂点の座標である。
ＢＣＭＣ１０は、オブジェクトの新しい位置データを各セルプロセッサ２０から取り込んで位置データを更新する（ステップＳ２０３）。
【００５１】
次に、ＢＣＭＣ１０は、取得した新しい位置データ等を含むオブジェクトデータを一つずつ全セルプロセッサ２０にブロードキャストする（ステップＳ２０４）。つまり、衝突判定の対象となる１個のオブジェクト（以下、「判定対象オブジェクト」という）の位置を表す位置データを全セルプロセッサ２０に送る。
各セルプロセッサ２０では、まず、ステップＳ２０２で計算したバウンディングボックスを用いて、判定対象オブジェクトが衝突する可能性があるか否かを判断する（ステップＳ２０５）。具体的には、判定対象オブジェクトの位置がバウンディングボックス内にあるか否かを判断する。
衝突する可能性がある場合、つまり、判定対象オブジェクトがバウンディングボックス内にある場合は（ステップＳ２０５：Ｙ）、そのセルプロセッサ２０で処理される、バウンディングボックス内の各オブジェクトとの距離計算を順次行い（ステップＳ２０６）、衝突の判定を行う（ステップＳ２０７）。判定対象オブジェクトがバウンディングボックス内のいずれかのオブジェクトと衝突する場合には（ステップＳ２０７：Ｙ）、その衝突による衝撃の強さを定量的に表すデータ（衝突強度データ）、衝突による判定対象オブジェクトへの影響を表すデータ等を含む衝突データを生成する（ステップＳ２０８）。また、セルプロセッサ２０は、生成した衝突データのうち衝突強度データを、そのＩＤとともにＷＴＡ・総和回路３０に送る（ステップＳ２０９）。
【００５２】
判定対象オブジェクトがバウンディングボックス外にある場合（ステップＳ２０５：Ｎ）、または距離計算の結果、衝突しないと判定した場合（ステップＳ２０７：Ｎ）、各セルプロセッサ２０は、ＷＴＡ・総和回路３０に、例えば「−１．０」を、衝突強度データとして送る（ステップＳ２１０）。
ＷＴＡ・総和回路３０は最大値モードで動作する。ＷＴＡ・総和回路３０は、セルプロセッサ２０から送られる衝突強度データを比較して、最も衝突による衝撃の強さが大きいことを表す衝突強度データを検出して（ステップＳ２１１）、検出した衝突強度データを生成したセルプロセッサ２０を特定する。そして特定したセルプロセッサ２０を表すＩＤをＢＣＭＣ１０へ送る。
ＢＣＭＣ１０は、ＷＴＡ・総和回路３０の最上段から送られたＩＤにより表されるセルプロセッサ２０から衝突データを取得する（ステップＳ２１２）。ステップＳ２０４以降の処理をすべてのオブジェクトについて行うことにより、空間内のすべてのオブジェクト間の衝突判定が行われる。
【００５３】
［実施例５］
次に、ＷＴＡ・総和回路３０の加算器３０５を用いる場合の例を説明する。
各セルプロセッサ２０は、データ処理結果をＷＴＡ・総和回路３０へ入力する。ＷＴＡ・総和回路３０では、加算器３０５によりデータ処理結果を加算し、最終的に、すべてのセルプロセッサ２０のデータ処理結果の総和を得る。このようにして、ＷＴＡ・総和回路３０により高速にデータ処理結果の総和を得ることが可能である。
データ処理結果の総和は、ＢＣＭＣ１０に送られて、各セルプロセッサ２０にブロードキャストにより、高速に送信可能である。データ処理結果の総和は、例えば、ニューロなどの最適化計算において、正規化計算に用いられる。
【００５４】
以上の説明において、ＢＣＭＣ１０とＷＴＡ・総和回路３０とは各々独立したものとしたが、ＢＣＭＣ１０にＷＴＡ・総和回路３０を組み込んだ一つのブロックとして、コントローラを構成してもよい。
【００５５】
なお、以上の説明は、データ処理手段がセルプロセッサ２０であり、制御手段がコントローラ（ＢＣＭＣ１０）である場合の例であるが、本発明の構成要素は、このような例に限定されるものではない。
例えば複数のデータ処理端末を広域ネットワークを介して双方向通信が可能な形態で接続し、そのうちの一つ又は複数のデータ処理端末を制御手段、他の複数のデータ処理端末をデータ処理手段として動作させ、制御手段に、複数のデータ処理手段の一部又は全部より受け取ったデータ処理結果及び少なくとも一つのデータ処理手段によるデータ処理に用いるデータを含むブロードキャストデータをブロードキャストする機能をもたせ、複数のデータ処理手段の各々に、制御手段によりブロードキャストされたブロードキャストデータから自らが行うデータ処理に必要なデータのみを取捨選択してデータ処理を行うとともに、その処理結果を制御手段に送出させる機能をもたせるようにしてもよい。
【００５６】
また、複数のデータ処理手段として、予め定めた識別情報（例えば上述した識別データ）によりそれを特定できる汎用のデータ処理端末を用い、これらの汎用のデータ処理端末と双方向通信可能なサーバ、あるいはＣＰＵ及びメモリを内蔵した半導体デバイスを搭載した装置をのみをもってデータ処理システムを構成するようにしてもよい。
この場合のサーバ又は装置は、その内部のＣＰＵが所定のコンピュータプログラムを読み込んで実行することにより、サーバ本体又は装置内に、少なくとも一つのデータ処理手段としてのデータ処理端末を特定するとともに特定したデータ処理端末の識別情報とそのデータ処理端末宛のデータ処理用データとを含むブロードキャストデータを生成する機能と、複数のデータ処理端末の一部又は全部から当該データ処理端末で行われたデータの処理結果を取得する機能と、受け取った処理結果をブロードキャストデータに含め、当該ブロードキャストデータを複数のデータ処理端末の各々にブロードキャストする機能とを形成するものである。
【００５７】
【発明の効果】
以上のような本発明により、複数のデータ処理手段を用いる場合のデータ処理手段間のデータ処理を効率的に行えるようになる。
【図面の簡単な説明】
【図１】本発明を適用したマルチプロセッサシステムの構成例を示した図。
【図２】ＢＣＭＣの構成図。
【図３】セルプロセッサの構成図。
【図４】ＷＴＡ・総和回路の構成図。
【図５】本実施形態によるマルチプロセッサシステムの処理の流れを示すフローチャート。
【図６】隣接するプロセッサのデータ処理結果を使用する概念図。
【図７】一部のプロセッサのデータ処理結果を使用する概念図。
【図８】格子点データをグループ化した例示図。
【図９】オブジェクトをクラスタに分けた場合の例示図。
【図１０】衝突判定アルゴリズムの処理の流れを示すフローチャート。
【符号の説明】
１０　ＢＣＭＣ
１０１　ＣＰＵコア
１０２　メインメモリ
１０３　ＤＭＡＣ
２０　セルプロセッサ
２０１　セルＣＰＵ
２０２　入力バッファ
２０３　出力バッファ
２０４　ＷＴＡバッファ
２０５　プログラムコントローラ
２０６　命令メモリ
２０７　データメモリ
３０　ＷＴＡ・総和回路
３０１　第１入力レジスタ
３０２　第２入力レジスタ
３０３　切換器
３０４　比較器
３０５　加算器
３０６　出力レジスタ

[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a data processing system that performs data processing with a plurality of data processing means, for example a multiprocessor system, and to a data processing method.
[0002]
BACKGROUND OF THE INVENTION
As the advanced information society progresses, the amount of data processed by data processing devices such as computers tends to increase. The content of data processing is also becoming more complex and sophisticated. Conventionally, the processing capability of a data processing device as a whole has been improved by raising the performance of processors such as CPUs (Central Processing Units) and by combining a plurality of processors into a multiprocessor.
In recent years, however, the required data processing capacity has been growing faster than processor performance has been improving. Raising processor performance cannot be achieved overnight, partly because of the long development period involved.
On the other hand, the data processing capacity of a multiprocessor, for example, is determined by the number of processors used and by the processing method, and depends little on improvements in the performance of the individual processors. Multiprocessing is therefore one effective means of improving the processing capacity of a data processing device.
[0003]
Data processing methods using a multiprocessor can be classified as follows according to the range of data that one processor requires when it processes data.
(1) The processor performing the data processing uses only data processed by the processors connected adjacent to it.
This type of control is suited to cellular automata, image filters, calculations of cloth and wave motion, calculations for generating polygons from curved surfaces, and the like.
(2) The processor performing the data processing uses only data processed by some of the plurality of processors.
This type of control is suited to many-to-many collision determination and the like.
[0004]
The data processing of case (1) above can be realized efficiently by a conventional parallel processor. In the data processing of case (2), however, the processing speed of the entire system is limited by the communication speed between the parallel processors, so the processing speed of each processor cannot be fully exploited. For example, the data processing of case (2) could be performed at high speed by connecting all of the processors through a crossbar, but the amount of hardware required would be enormous, which is not practical.
[0005]
An object of the present invention is to provide a data processing system and a data processing method capable of performing, for example, the data processing of case (2) above more efficiently than before.
[0006]
[Means for Solving the Problems]
In order to solve the above problem, the multiprocessor system of the present invention comprises: a plurality of processors, each of which manages at least one of a plurality of objects distributed in a predetermined virtual space and generates, for that object, position data representing its position in the virtual space; and a controller that can acquire the position data of the objects from all of the processors and broadcasts the acquired position data, one piece at a time, to all of the processors. Each of the plurality of processors fetches the position data broadcast from the controller, determines whether the object whose position is represented by the fetched position data lies within the range over which the objects under its own management are distributed, and, if it lies within that range, determines whether that object is at a position where it collides with an object under its own management.
[0007]
The plurality of processors may also generate, for the objects they manage, velocity data representing velocity in the virtual space. In this case, if the controller can acquire the position data and the velocity data from all of the processors and broadcasts the acquired position data and velocity data, one set at a time, to all of the processors, then, when the object whose position is represented by the broadcast position data is at a collision position with an object under its own management, a processor can use the velocity data of the broadcast object to generate collision intensity data, which quantitatively represents the strength of the impact of the collision, and data representing the effect of the collision on the object.
Further, identification data for identifying each processor may be assigned to the plurality of processors, and a maximum value detection mechanism may be provided. This mechanism takes in the collision intensity data, together with the processor's identification data, from each processor that generated collision intensity data; takes in, from each processor that did not generate collision intensity data, a value smaller than any collision intensity data value as that processor's collision intensity data, together with its identification data; identifies the processor that generated the largest collision intensity data; and sends the identification data of the identified processor to the controller. The controller can then acquire the collision intensity data and the data representing the effect of the collision on the object from the processor indicated by the identification data sent from the maximum value detection mechanism.
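The maximum value detection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `detect_strongest`, the use of `None` for "no collision intensity data generated", and the report format are hypothetical, and the sentinel value -1.0 is the example value given later in the description for the fourth embodiment.

```python
from typing import Iterable, Optional, Tuple

# Sentinel standing in for "a value smaller than any collision intensity".
# -1.0 is the example value used in the embodiment; any value below every
# real intensity would serve the same purpose.
NO_COLLISION = -1.0


def detect_strongest(
    reports: Iterable[Tuple[int, Optional[float]]]
) -> Tuple[int, float]:
    """Return (processor_id, intensity) for the largest reported intensity.

    Each report is (processor_id, intensity). An intensity of None means
    the processor generated no collision intensity data (hypothetical
    convention for this sketch).
    """
    best_id, best_value = -1, float("-inf")
    for proc_id, intensity in reports:
        value = NO_COLLISION if intensity is None else intensity
        if value > best_value:
            best_id, best_value = proc_id, value
    return best_id, best_value


# Processors 1 and 2 detected a collision; processors 0 and 3 did not.
reports = [(0, None), (1, 3.5), (2, 0.8), (3, None)]
winner = detect_strongest(reports)  # ID and intensity sent on to the controller
```

Because every processor reports something, the mechanism always yields an ID; a result equal to the sentinel tells the controller that no processor detected a collision.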
[0008]
In such a multiprocessor system, the processor determines whether the broadcast object is at a collision position with an object under its own management by, for example, calculating the distance between the object whose position data was broadcast and the object under its own management.
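The two-stage test that each processor performs (first a range check, then a distance calculation) might look like the following minimal sketch. The class name `CellProcessor`, the two-dimensional coordinates, the axis-aligned range, and the fixed collision radius are all assumptions made for illustration; the source does not specify them.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # 2-D positions are an assumption of this sketch


@dataclass
class CellProcessor:
    objects: List[Point]   # positions of the objects this processor manages
    radius: float = 0.5    # assumed collision radius of every object

    def bounding_box(self) -> Tuple[Point, Point]:
        """Axis-aligned range over which the managed objects are distributed."""
        xs = [p[0] for p in self.objects]
        ys = [p[1] for p in self.objects]
        return (min(xs), min(ys)), (max(xs), max(ys))

    def collides(self, broadcast: Point) -> bool:
        (x0, y0), (x1, y1) = self.bounding_box()
        # Stage 1: skip the distance calculations when the broadcast object
        # lies outside the range of the managed objects.
        if not (x0 <= broadcast[0] <= x1 and y0 <= broadcast[1] <= y1):
            return False
        # Stage 2: compare the distance to each managed object with the
        # assumed collision distance (two radii).
        return any(
            math.dist(broadcast, p) < 2 * self.radius for p in self.objects
        )


cell = CellProcessor(objects=[(0.0, 0.0), (4.0, 4.0)])
near = cell.collides((0.5, 0.5))    # inside the range and close to (0, 0)
inside = cell.collides((2.0, 2.0))  # inside the range but far from both
far = cell.collides((10.0, 10.0))   # rejected by the range check alone
```

The range check is cheap and lets most processors discard a broadcast position without computing any distances, which is the point of performing it first.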
[0009]
The data processing method of the present invention is executed in an apparatus or system having a plurality of data processing means and control means. The plurality of objects distributed in a predetermined virtual space are divided into clusters, and each data processing means manages a different one of the clusters. The control means manages the positions and velocities of all of the objects in the virtual space and broadcasts, to the plurality of data processing means, position data representing the positions of the objects and velocity data representing their velocities. The method executes the following steps in this order: a step in which the control means broadcasts the position data and velocity data of all of the objects to all of the data processing means, together with cluster data representing the cluster to which each object belongs; a step in which each of the plurality of data processing means fetches, on the basis of the cluster data, the position data and velocity data of the objects it manages, and generates new position data and velocity data from the fetched data; a step in which the control means fetches the new position data and velocity data of each object from each of the plurality of data processing means and broadcasts the fetched position data, one piece at a time, to all of the data processing means; and a step in which each of the plurality of data processing means fetches the new position data broadcast from the control means, determines whether the object represented by that new position data lies within the range over which the objects under its own management are distributed, and, if it lies within that range, determines whether that object is at a position where it collides with an object under its own management.
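The four steps of the method can be traced with a small sequential simulation. This is a sketch under stated assumptions, not the claimed implementation: the object table, the unit time step `DT`, the collision distance `HIT_DISTANCE`, the two-dimensional coordinates, and the function name `run_cycle` are all hypothetical, and the parallel data processing means are modeled by an ordinary loop over clusters.

```python
import math
from typing import Dict, List, Tuple

Vec = Tuple[float, float]

# Controller state: object id -> (cluster id, position, velocity).
# All values are made up for illustration.
objects: Dict[int, Tuple[int, Vec, Vec]] = {
    0: (0, (0.0, 0.0), (1.0, 0.0)),
    1: (0, (0.0, 2.0), (0.0, 0.0)),
    2: (1, (3.0, -0.5), (-2.0, 0.0)),
    3: (1, (1.5, 1.0), (0.0, 0.0)),
}
DT = 1.0            # assumed unit time step
HIT_DISTANCE = 1.0  # assumed distance below which two objects collide


def run_cycle(table: Dict[int, Tuple[int, Vec, Vec]]):
    # Step 1: the control means broadcasts every object's position and
    # velocity together with its cluster data.
    broadcast = [(oid, c, p, v) for oid, (c, p, v) in table.items()]

    # Step 2: each data processing means keeps the objects of its own
    # cluster and generates new position data from position and velocity.
    clusters: Dict[int, Dict[int, Vec]] = {}
    for oid, c, (px, py), (vx, vy) in broadcast:
        clusters.setdefault(c, {})[oid] = (px + vx * DT, py + vy * DT)

    # Step 3: the control means collects the new positions and broadcasts
    # them one at a time.
    new_positions = {
        oid: pos for members in clusters.values() for oid, pos in members.items()
    }
    hits: List[Tuple[int, int]] = []
    for oid, pos in new_positions.items():
        # Step 4: every data processing means checks its range first and
        # computes distances only when the position lies inside it.
        for members in clusters.values():
            xs = [p[0] for p in members.values()]
            ys = [p[1] for p in members.values()]
            in_range = min(xs) <= pos[0] <= max(xs) and min(ys) <= pos[1] <= max(ys)
            if not in_range:
                continue
            for other, other_pos in members.items():
                if other != oid and math.dist(pos, other_pos) < HIT_DISTANCE:
                    hits.append((oid, other))
    return new_positions, hits


new_positions, hits = run_cycle(objects)
```

In this run, object 0 moves to (1.0, 0.0), which lies inside the range of cluster 1 and within the assumed collision distance of object 2, so the pair (0, 2) is reported.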
[0010]
The data processing system of the present invention is a system that performs two-way communication with control means that manages position data representing the positions, in a predetermined virtual space, of a plurality of objects distributed in that virtual space. The system comprises: means for managing at least one of the plurality of objects distributed in the virtual space, generating, for that object, position data representing its position in the virtual space, and sending the generated position data to the control means; and means for fetching the position data broadcast one piece at a time from the control means, determining whether the object whose position is represented by the fetched position data lies within the range over which the objects under its own management are distributed, and, if it lies within that range, determining whether that object is at a position where it collides with an object under its own management.
[0011]
The computer program provided by the present invention is a computer program for causing a computer to form the following functions in a computer-equipped apparatus that performs two-way communication with control means that manages position data representing the positions, in a predetermined virtual space, of a plurality of objects distributed in that virtual space. The semiconductor device provided by the present invention is a semiconductor device that, when incorporated in a computer-equipped apparatus that performs such two-way communication with the control means, causes the computer to form the following functions.
(1) Means for managing at least one of the plurality of objects distributed in the virtual space, generating, for that object, position data representing its position in the virtual space, and sending the generated position data to the control means.
(2) Means for fetching the position data broadcast one piece at a time from the control means, determining whether the object whose position is represented by the fetched position data lies within the range over which the objects under its own management are distributed, and, if it lies within that range, determining whether that object is at a position where it collides with an object under its own management.
[0012]
BEST MODE FOR CARRYING OUT THE INVENTION
An embodiment in which the present invention is applied to a multiprocessor system as an example of a data processing system will be described below.
[0013]
<Overall configuration>
FIG. 1 is a diagram illustrating a configuration example of a multiprocessor system. The multiprocessor system 1 comprises a broadcast memory controller (hereinafter referred to as a "BCMC (Broadcast Memory Controller)") 10, which is control means for data processing and for recording and reading data; a plurality of cell processors 20, each of which is an example of data processing means; and a plurality of WTA (Winner Take All) / sum circuits 30 for forming various functions required for data processing.
The BCMC 10 and all the cell processors 20 are connected by a broadcast channel (a communication channel capable of simultaneous transmission).
[0014]
In the multiprocessor system 1, the BCMC 10 manages state variable values, which are an example of the data processing results of the cell processors 20, and broadcasts the state variable values of all the cell processors 20 as an example of reference values. Each cell processor 20 can thereby refer at high speed to the state variable values generated in the other cell processors 20.
[0015]
The broadcast channel is a transmission path between the BCMC 10 and the plurality of cell processors 20, and comprises an address bus used for passing addresses and a data bus used for passing data such as state variable values. The addresses include cell addresses, each specifying an individual cell processor 20, and a broadcast address that targets all the cell processors 20.
A cell address corresponds to an address in memory (a physical address or a logical address), and a state variable value from a cell processor 20 is always stored at the address corresponding to the cell address that indicates that cell processor 20. Each cell processor 20 is assigned an ID (identification) as identification information for distinguishing it from the others, and the cell address also corresponds to this ID. The cell address therefore identifies the cell processor 20 from which a state variable value was output.
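One way to picture the correspondence between cell address, memory address, and ID is the sketch below. The source gives no concrete encoding, so the base address `BASE`, the word size `WORD`, the linear mapping, the choice to use the ID itself as the cell address, and the `Controller` class are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

BASE = 0x1000  # hypothetical start of the state-variable area in memory
WORD = 4       # hypothetical size of one state variable in bytes


def memory_address(cell_id: int) -> int:
    """Memory address that corresponds to a cell processor's cell address."""
    return BASE + cell_id * WORD


def cell_id_of(address: int) -> int:
    """Recover the cell processor ID from a memory address."""
    return (address - BASE) // WORD


class Controller:
    """Toy stand-in for the controller's memory of state variable values."""

    def __init__(self, n_cells: int) -> None:
        self.memory: Dict[int, float] = {
            memory_address(i): 0.0 for i in range(n_cells)
        }

    def store(self, cell_id: int, value: float) -> None:
        # A value from a given cell processor always lands at the same address.
        self.memory[memory_address(cell_id)] = value

    def broadcast_data(self) -> List[Tuple[int, float]]:
        # Pairs of (cell address, state variable value), in address order.
        return [(cell_id_of(a), v) for a, v in sorted(self.memory.items())]


bcmc = Controller(n_cells=3)
bcmc.store(1, 2.5)
pairs = bcmc.broadcast_data()
```

Because the mapping is fixed, a receiver can tell which cell processor produced a value from the address alone, without any extra tag in the data.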
[0016]
The WTA / sum circuits 30 are connected as shown in FIG. 1. That is, the WTA / sum circuits 30 are connected in a pyramid shape, with the cell processor 20 side as the first stage. Two cell processors 20 are connected to the input terminals of each first-stage WTA / sum circuit 30, and its output terminal is connected to an input terminal of a second-stage WTA / sum circuit 30.
In the second and subsequent stages, the input terminals of each WTA / sum circuit 30 are connected to the output terminals of two lower-stage WTA / sum circuits 30, and its output terminal is connected to an input terminal of an upper-stage WTA / sum circuit 30. The input terminals of the uppermost WTA / sum circuit 30 are connected to the output terminals of two lower-stage WTA / sum circuits 30, and its output terminal is connected to the BCMC 10.
[0017]
The present invention can also be implemented by connecting the WTA / sum circuits 30 in cascade, as an alternative to the connection form shown in the figure. In this case, two cell processors 20 are connected to the input terminals of the first-stage WTA / sum circuit 30, and its output terminal is connected to an input terminal of the next stage. The input terminals of each WTA / sum circuit 30 in the second and subsequent stages are connected to the output terminal of the lower-stage WTA / sum circuit 30 and to a cell processor 20, and its output terminal is connected to an input terminal of the next stage. The input terminals of the uppermost WTA / sum circuit 30 are connected to the output terminal of the lower-stage WTA / sum circuit 30 and to a cell processor 20, and its output terminal is connected to the BCMC 10.
[0018]
Next, each of the BCMC 10, the cell processor 20, and the WTA / sum circuit 30 will be described in detail.
[0019]
<BCMC>
The BCMC 10 broadcasts data to all the cell processors 20 through a broadcast channel, and acquires and holds the state variable values from each cell processor 20. FIG. 2 shows a configuration example of the BCMC 10.
The BCMC 10 is composed of a CPU core 101 that controls the overall operation of the multiprocessor system 1, a rewritable main memory 102 such as an SRAM (Static Random Access Memory), and a DMAC (Direct Memory Access Controller) 103, which are connected by a bus B1. The CPU core 101 is a semiconductor device incorporating a computer; in cooperation with the main memory 102, it reads and executes a predetermined computer program to provide the functions for the data processing characteristic of the present invention. The main memory 102 is used as a shared memory for the entire system.
The output terminal of the uppermost WTA / sum circuit 30 and an external memory such as a hard disk or a portable medium are also connected to the bus B1.
[0020]
At startup, the CPU core 101 reads a startup program from the external memory and executes it to start the operating system. It also reads the various data necessary for data processing from the external memory and loads them into the main memory 102. The main memory 102 also stores data such as the state variable value of each cell processor 20. Each state variable value is stored at the address in the main memory 102 that corresponds to the cell address of the cell processor 20 that calculated it.
The CPU core 101 also generates broadcast data to be broadcast to each cell processor 20 based on the data read from the main memory 102. The broadcast data is, for example, pair data that is a set of a state variable value and a cell address indicating the cell processor 20 that has calculated the state variable value. One or more sets of pair data are generated.
[0021]
The DMAC 103 is a semiconductor device that performs direct memory access transfer control between the main memory 102 and each cell processor 20. For example, broadcast data is broadcast to each cell processor 20 via a broadcast channel. In addition, data processing results of each cell processor 20 are individually obtained and written to the main memory 102.
[0022]
<Cell processor>
Each cell processor 20 performs data processing by selecting necessary data from the broadcast data, and reports this to the WTA / sum circuit 30 when the data processing is completed. The state variable value, which is the data processing result, is sent to the BCMC 10 according to an instruction from the BCMC 10. Each of the cell processors 20 is ring-connected via a shared memory (not shown). Each cell processor 20 may perform data processing with a synchronous clock, or may each perform data processing with a different clock. FIG. 3 shows a configuration example of the cell processor 20.
The cell processor 20 includes a cell CPU 201, an input buffer 202, an output buffer 203, a WTA buffer 204, a program controller 205, an instruction memory 206, and a data memory 207.
[0023]
The cell CPU 201 is a processor having a programmable floating-point arithmetic unit, and controls the operation of the cell processor 20 to perform data processing. The cell CPU 201 acquires the broadcast data broadcast from the BCMC 10 via the input buffer 202, determines from the cell address of each item of pair data whether the data is necessary for the processing it is to perform, and, if so, writes the state variable value to the corresponding address in the data memory 207. It also reads state variable values from the data memory 207 and performs data processing, writes the data processing result to the output buffer 203, and sends data indicating the end of the data processing to the WTA / sum circuit 30.
[0024]
The input buffer 202 holds broadcast data broadcast from the BCMC 10. The held broadcast data is sent to the cell CPU 201 in response to a request from the cell CPU 201.
The output buffer 203 holds the state variable value of the cell CPU 201. The held state variable value is transmitted to the BCMC 10 in response to a request from the BCMC 10.
The input buffer 202 and the output buffer 203 may also transmit and receive control data and the like.
When the cell CPU 201 finishes data processing, the WTA buffer 204 receives the data indicating the end of the data processing from the cell CPU 201 and transmits it to the WTA / sum circuit 30, thereby reporting the end of the data processing to the WTA / sum circuit 30. The end data indicating the end of the data processing includes, for example, the ID of the cell processor 20 itself and priority data for determining the priority with which the BCMC 10 reads the state variable value stored in the output buffer 203.
[0025]
The program controller 205 loads programs defining the operation of the cell processor 20 from the BCMC 10. These programs include a data processing program executed by the cell processor 20, a data selection program that determines which data the cell processor 20 needs for its processing, and a priority determination program that determines the priority with which the processing result is read out to the BCMC 10.
The instruction memory 206 stores a program fetched by the program controller 205. The stored program is read into the cell CPU 201 as needed.
[0026]
The data memory 207 stores data processed in the cell processor 20. Broadcast data determined to be necessary by the cell CPU 201 is written. The broadcast data is stored at an address corresponding to the cell address.
In this embodiment, a part of the data memory 207 is connected to the adjacent cell processor 20 via the shared memory, and data can be transmitted and received to and from the adjacent cell processor 20 every cycle.
[0027]
<WTA / sum circuit>
The plurality of WTA / sum circuits 30 determine the order in which the BCMC 10 takes in the state variable values from the cell processor 20 based on the data indicating the end of the data processing sent from each cell processor 20, and report the determined order to the BCMC 10.
FIG. 4 shows a configuration example of the WTA / sum circuit 30.
Each WTA / sum circuit 30 comprises two input registers A and B (hereinafter, a first input register 301 and a second input register 302), a switch 303, a comparator 304, an adder 305, and an output register 306.
[0028]
The first input register 301 and the second input register 302 include an integer register and a floating point register, respectively. In the integer register, for example, ID of the end data indicating the end of the data processing sent from the cell processor 20 is written, and in the floating-point register, for example, priority data is written.
The switch 303 activates one of the comparator 304 and the adder 305. Specifically, only one of them can be used according to the operation mode. The operation mode is determined by an instruction from the BCMC 10, for example. The operation mode will be described later.
The comparator 304 compares the floating-point values held by the floating-point registers of the first input register 301 and the second input register 302, and writes the larger (or smaller) value and its associated integer value to the output register 306.
The adder 305 calculates the sum of the floating-point values held by the floating-point registers of the first input register 301 and the second input register 302, and writes the calculation result to the output register 306.
The output register 306 has substantially the same configuration as the first input register 301 and the second input register 302. That is, it has an integer register and a floating point register. The ID is written in the integer register, and the priority data is written in the floating-point register.
[0029]
The WTA / sum circuit 30 has three operation modes described below.
[0030]
・ Maximum value (WTA) mode:
The switch 303 activates the comparator 304. The comparator 304 compares the floating-point values A and B held by the floating-point registers of the first input register 301 and the second input register 302, respectively, and writes the larger (or smaller) value and its associated integer value to the output register 306. When the writing to the output register 306 is completed, the first input register 301 and the second input register 302 are cleared. The contents of the output register 306 are written to an input register of the WTA / sum circuit 30 in the upper stage. At this time, if the input register to be written to has not been cleared, the write stalls: it is not performed in that cycle but in the next cycle.
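As a minimal sketch of the maximum value mode, the following Python code reduces (floating-point value, ID) pairs pairwise, stage by stage, as a pyramid of comparators would. The name `wta_reduce`, the tuple layout, and the restriction to a power-of-two number of inputs are assumptions made for illustration; register clearing and write stalls are not modeled.

```python
from typing import List, Tuple

Entry = Tuple[float, int]  # (floating-point value, integer ID)


def wta_reduce(entries: List[Entry], pick_larger: bool = True) -> Entry:
    """Pairwise tournament over the entries, one comparator per pair per stage."""
    if not entries or len(entries) & (len(entries) - 1):
        raise ValueError("this sketch expects a power-of-two number of entries")
    choose = max if pick_larger else min
    stage = list(entries)
    while len(stage) > 1:
        # Each comparator forwards the winning value together with its ID.
        stage = [choose(stage[i], stage[i + 1], key=lambda e: e[0])
                 for i in range(0, len(stage), 2)]
    return stage[0]


winner = wta_reduce([(0.3, 0), (0.9, 1), (0.5, 2), (0.7, 3)])
```

Passing `pick_larger=False` corresponds to the "(or smaller)" variant of the comparison.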
[0031]
・ Addition mode:
Switch 303 activates adder 305. The adder 305 calculates the sum of the floating point values A and B held by the floating point registers of the first input register 301 and the second input register 302, and writes the calculation result to the output register 306. The contents of the output register 306 are written to the input register of the WTA / sum circuit 30 in the upper stage.
[0032]
・ Approximate sort mode:
The switch 303 activates the comparator 304. The comparator 304 compares the floating-point values A and B held by the floating-point registers of the first input register 301 and the second input register 302, respectively, and writes the larger (or smaller) value and its associated integer value to the output register 306.
After that, only the input register that held the value written to the output register 306 is cleared, and the contents of the output register 306 are written to an input register of the WTA / sum circuit 30 in the upper stage. If the destination input register has not been cleared, the write stalls and is not performed in that cycle. Writes from the output register 306 of the lower-stage WTA / sum circuit 30, however, are still performed.
In the approximate sort mode, the data that the BCMC 10 receives from the output register 306 of the uppermost WTA / sum circuit 30 is sorted (rearranged) in ascending or descending order of the floating-point values.
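The register behavior in the approximate sort mode can be pictured with a small cycle-by-cycle simulation. The Python sketch below is illustrative only: the `Node` class, the two-phase cycle (transfer, then compare), the rule that a circuit with a single occupied input register forwards that value, and the simultaneous loading of all first-stage registers are assumptions that the text does not specify.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Entry = Tuple[float, int]  # (priority, cell processor ID)


@dataclass
class Node:
    inputs: List[Optional[Entry]] = field(default_factory=lambda: [None, None])
    output: Optional[Entry] = None
    parent: Optional["Node"] = None
    slot: int = 0  # which input register of the parent this node writes to


def approximate_sort(entries: List[Entry]) -> List[Entry]:
    """Return the stream of entries that reaches the controller from the top stage."""
    n = len(entries)
    assert n >= 2 and n & (n - 1) == 0, "sketch expects a power-of-two count"
    stage = [Node(inputs=[entries[i], entries[i + 1]]) for i in range(0, n, 2)]
    nodes = list(stage)
    while len(stage) > 1:  # build the pyramid, lower stages first
        upper = []
        for i in range(0, len(stage), 2):
            p = Node()
            stage[i].parent, stage[i].slot = p, 0
            stage[i + 1].parent, stage[i + 1].slot = p, 1
            upper.append(p)
        nodes.extend(upper)
        stage = upper
    received: List[Entry] = []
    while len(received) < n:
        # Transfer phase: move each output register upward, top stage first.
        for node in reversed(nodes):
            if node.output is None:
                continue
            if node.parent is None:
                received.append(node.output)
                node.output = None
            elif node.parent.inputs[node.slot] is None:
                node.parent.inputs[node.slot] = node.output
                node.output = None
            # otherwise the write stalls until a later cycle
        # Compare phase: forward the larger input and clear only that register.
        for node in nodes:
            if node.output is not None:
                continue
            held = [(e, k) for k, e in enumerate(node.inputs) if e is not None]
            if held:
                entry, k = max(held)
                node.output = entry
                node.inputs[k] = None
    return received


order = [cid for _, cid in approximate_sort([(0.3, 0), (0.9, 1), (0.5, 2), (0.7, 3)])]
```

In this sketch a circuit never waits for an empty input register to be filled, which is one plausible reading of why the mode is called approximate.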
[0033]
Before entering each mode, the first input register 301, the second input register 302, and the output register 306 of all the WTA / sum circuits 30 are cleared.
[0034]
By switching and using each mode, the plurality of WTA / sum circuits 30 as a whole function as a mechanism for sorting (sort mechanism) and / or a sum circuit. That is, when operating in the approximate sort mode, a sorting mechanism is realized, and when operating in the addition mode, a summation circuit is realized.
[0035]
The WTA / sum circuit 30 operating in the maximum value mode and the approximate sort mode may be realized as follows.
That is, a WTA / sum circuit is configured with as many input registers as there are cell processors 20, together with a switch, a comparator, an adder, and an output register.
Each of the input registers, which are equal in number to the cell processors 20, includes an integer register and a floating-point register, like the first input register 301 and the second input register 302. The comparator compares the floating-point values held by the floating-point registers of all the input registers. The adder calculates the sum of the floating-point values held by the floating-point registers of all the input registers.
The output register is the same as the output register of the WTA / sum circuit 30 in FIG. 4.
[0036]
The comparator compares the priority data held by the floating-point registers of the input registers, and sequentially writes the associated IDs to the output registers in descending order of priority. As a result, the IDs can be sent to the BCMC 10 in the order of priority.
The adder can add the data held in each floating-point register and calculate the sum.
Such a WTA / sum circuit can function by itself as the sort mechanism and the sum circuit of the present invention, without taking the connection form shown in FIG. 1.
[0037]
<Data processing method>
The multiprocessor system 1 according to the present embodiment performs required data processing by operating as follows. FIG. 5 is a flowchart showing the flow of processing executed in the multiprocessor system 1.
[0038]
The initial values of the state variable values of all the cell processors 20 are stored in the main memory 102 of the BCMC 10 in advance.
The BCMC 10 creates broadcast data based on the pair data including the state variable value of the cell processor 20 and the cell address indicating the cell processor 20 (step S101). Then, the created broadcast data is broadcast to all the cell processors 20 (step S102).
Each cell processor 20 fetches the broadcast data into the input buffer 202. Using the data selection program stored in the instruction memory 206, the cell CPU 201 checks the cell addresses of the broadcast data held in the input buffer 202 and confirms whether the data contains a state variable value required for the data processing performed by its own cell processor 20 (step S103). If there is no such state variable value, the cell processor 20 ends the processing operation (step S103: No). If there is one (step S103: Yes), the state variable value is overwritten at the address in the data memory 207 corresponding to the cell address that forms the pair data with it (step S104).
Thus, the broadcast of data from the BCMC 10 to each cell processor 20 ends.
[0039]
When the broadcast ends, each cell processor 20 performs data processing on the state variable value recorded in the data memory 207 according to the data processing program stored in the instruction memory 206 to generate a new state variable value. The new state variable value is written into the data memory 207 and also into the output buffer 203 (step S105). The new state variable value is overwritten on the address corresponding to its own cell address on the data memory 207.
When the data processing ends, the cell CPU 201 transmits end data including the ID and the priority data to an input register of the first-stage WTA / sum circuit 30 via the WTA buffer 204, thereby reporting the end of the data processing (step S106). The priority data is generated by a predetermined priority determination program before or after the data processing.
[0040]
The first stage WTA / sum circuit 30 holds the ID of the end data sent from each cell processor 20 in the integer register of the input register and the priority data in the floating point register. Here, the WTA / sum circuit 30 operates in the approximate sort mode. For that purpose, the switch 303 activates the comparator 304.
The integer registers of the first input register 301 and the second input register 302 of the WTA / sum circuit 30 hold IDs sent from different cell processors 20. Each floating-point register holds the priority data associated with the ID. The comparator 304 reads the priority data from the floating-point registers of the first input register 301 and the second input register 302 and compares the priorities. As a result of the comparison, the higher priority data and its associated ID are written to the floating-point register and the integer register of the output register 306. The input register whose contents have been written to the output register 306 is cleared. The ID and priority data written to the output register 306 are then written to an input register of the WTA / sum circuit 30 in the upper stage.
Such processing is performed by the WTA / sum circuit 30 at each stage. The uppermost WTA / sum circuit 30 sends the ID written in the integer register of the output register 306 to the BCMC 10.
Through the above processing, the WTA / sum circuits 30 send the IDs to the BCMC 10 in descending order of priority (step S107).
[0041]
The BCMC 10 acquires the data-processed state variable value from the output buffer 203 of the cell processor 20 corresponding to the ID sent from the WTA / sum circuit 30. The acquired state variable value is overwritten on the main memory 102 in the BCMC 10 at the address corresponding to the cell address indicating the cell processor 20 that has performed the processing (step S108).
Thus, one cycle of the processing operation of the state variable value is completed.
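One cycle of steps S101 to S108 can be sketched in Python as follows. This is only an illustration under stated assumptions: the `CellProcessor` class, the `needed` list standing in for the data selection program, the mean used as placeholder data processing, the use of the new value as priority, and the exact sort standing in for the WTA / sum circuits are all hypothetical.

```python
from typing import Dict, List, Tuple

PairData = Tuple[int, float]  # (cell address, state variable value)


class CellProcessor:
    def __init__(self, cell_id: int, needed: List[int]) -> None:
        self.cell_id = cell_id
        self.needed = list(needed)  # hypothetical data selection rule
        self.data_memory: Dict[int, float] = {}
        self.output_buffer = 0.0

    def receive(self, broadcast: List[PairData]) -> None:
        # S103-S104: keep only the values whose cell address is needed.
        for address, value in broadcast:
            if address in self.needed:
                self.data_memory[address] = value

    def process(self) -> Tuple[float, int]:
        # S105: placeholder processing, the mean of the selected values.
        values = [self.data_memory[a] for a in self.needed]
        new_value = sum(values) / len(values)
        self.data_memory[self.cell_id] = new_value
        self.output_buffer = new_value
        # S106: end data as (priority, ID); the priority rule is an assumption.
        return (new_value, self.cell_id)


def run_cycle(main_memory: Dict[int, float], cells: List[CellProcessor]) -> List[int]:
    broadcast = sorted(main_memory.items())        # S101: build pair data
    for cell in cells:                             # S102: broadcast to every cell
        cell.receive(broadcast)
    end_data = [cell.process() for cell in cells]  # S105-S106
    # S107: an exact sort stands in for the WTA / sum circuits here.
    order = [cid for _, cid in sorted(end_data, reverse=True)]
    by_id = {cell.cell_id: cell for cell in cells}
    for cid in order:                              # S108: collect in priority order
        main_memory[cid] = by_id[cid].output_buffer
    return order


main_memory = {0: 1.0, 1: 2.0, 2: 6.0}
cells = [CellProcessor(0, [1, 2]), CellProcessor(1, [0, 2]), CellProcessor(2, [0, 1])]
order = run_cycle(main_memory, cells)
```

Because each cell copies what it needs out of the broadcast, no cell reads the controller's main memory directly during processing.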
[0042]
The BCMC 10 obtains a data processing result from each cell processor 20, and thereby generates broadcast data.
Each cell processor 20 performs data processing by selecting only the data it needs from the broadcast data. Performing data processing with the broadcast data makes it possible to use data processed by all the other cell processors 20. In addition, because the broadcast data is composed of pair data, each consisting of a data processing result from a cell processor 20 and the cell address indicating the cell processor 20 that generated the result, it is also possible to perform processing that uses only the data processing result of a specific cell processor 20. Further, since adjacent cell processors 20 are connected via the shared memory, processing between adjacent cell processors 20 is possible as in the related art.
Each cell processor 20 does not go directly into the main memory 102 to fetch the data required by its own cell processor 20, but selects the required data from the broadcast data and stores the data in each cell processor 20. Since processing is performed while holding the data, high-speed processing can be performed without data conflict.
[0043]
[Example 1]
Next, an embodiment of the multiprocessor system 1 will be specifically described.
In this embodiment, an example in which a cell processor 20 uses only the data processed by itself and by the other cell processors 20 adjacent to it will be described with reference to FIG. 6.
In FIG. 6, “○” indicates a cell processor, and shaded “○” indicates a cell processor that performs data processing, and “●” indicates a cell processor that holds required data.
It is assumed that the following filter calculation is continuously performed on the data (grid point data) for each grid point of an n × n grid (n is a natural number of 2 or more).
X_{i,j} = (X_{i-1,j} + X_{i+1,j} + X_{i,j-1} + X_{i,j+1}) / 4
i: row number of the grid point, j: column number of the grid point
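The filter can be sketched in Python as follows. The function name `filter_step` is hypothetical, and two details the text leaves open are filled in as assumptions: boundary points are left unchanged, and each pass reads only the values from before the pass.

```python
from typing import List

Grid = List[List[float]]


def filter_step(grid: Grid) -> Grid:
    """One pass of X[i][j] = mean of the four neighbours, interior points only."""
    n = len(grid)
    new = [row[:] for row in grid]  # boundary values are copied unchanged (assumption)
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            new[i][j] = (grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1]) / 4
    return new


grid = [[0.0, 4.0, 0.0],
        [8.0, 1.0, 0.0],
        [0.0, 4.0, 0.0]]
result = filter_step(grid)
```

Calling `filter_step` repeatedly on its own output corresponds to performing the calculation continuously.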
[0044]
The BCMC 10 groups the grid point data by row or by column and broadcasts the grouped data to the n cell processors 20 as broadcast data.
FIG. 8 is an exemplary diagram of grouped grid point data, in which the grid point data is grouped in sets of five. One group of grid point data is processed by one cell processor 20.
The cell processor 20 stores the required grid point data from the broadcast data in the data memory 207. The grid point data is sequentially read from the data memory 207 and subjected to data processing.
[0045]
Data is transferred to and from the adjacent cell processor 20 through the shared memory that connects them. If one write of data to the shared memory takes one cycle, the grouped data can be transferred between the cell processors 20 in 2n cycles.
By operating each cell processor 20 synchronously and simultaneously executing writing to the shared memory and operation as in pipeline processing, communication and operation between the cell processors 20 can be performed simultaneously.
[0046]
The next broadcast data is broadcast by the BCMC 10 each time the data processing of the grouped grid point data ends. The cell processor 20 determines whether it is necessary data based on i and j of the data to be broadcast.
Grouping the broadcast data makes it possible to process data in the row (or column) direction, and transferring data via the shared memory makes it possible to process data in the column (or row) direction.
[0047]
[Example 2]
In this embodiment, an example in which only the data processed by some of the cell processors 20 among all the cell processors 20 is used will be described with reference to FIG. 7. In FIG. 7, “○” indicates a cell processor, shaded “○” indicates a cell processor that performs data processing, and “●” indicates a cell processor that holds required data. Such a multiprocessor system is useful for implementing a Hopfield associative memory.
Each cell processor 20 holds a state variable value as a data processing result and a weight coefficient indicating the importance of the state variable value. The cell processor 20 is assigned a number, and the BCMC 10 fetches the state variable value from the cell processor 20 in the order of the number.
The BCMC 10 broadcasts the state variable values fetched from all the cell processors 20 as broadcast data. Each cell processor 20 selects only a necessary state variable value from the broadcast data, performs a product-sum operation with the weight coefficient, and updates the state variable value. When the required state variable values are all the state variable values included in the broadcast data, the processing corresponds to the processing using the data processed by all the processors.
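As a rough illustration, the update performed by one cell processor can be written in Python as below. The function name `update_state` and the dictionary layout are hypothetical, and the sign threshold applied after the product-sum is an assumption borrowed from typical Hopfield networks; the text itself only specifies the product-sum operation.

```python
from typing import Dict


def update_state(broadcast: Dict[int, float], weights: Dict[int, float]) -> float:
    """Product-sum over the state values this cell needs, then a sign threshold."""
    # Only the cell addresses that appear in `weights` are selected from the broadcast.
    total = sum(weights[j] * broadcast[j] for j in weights)
    return 1.0 if total >= 0.0 else -1.0  # threshold is an assumption


states = {0: 1.0, 1: -1.0, 2: 1.0}       # broadcast state variable values
weights_of_cell_0 = {1: -1.0, 2: 1.0}    # cell 0 needs cells 1 and 2 only
new_state_0 = update_state(states, weights_of_cell_0)
```

A cell whose weight table covers every cell address uses all the broadcast values, which matches the case described above in which all processors' data is used.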
[0048]
[Example 3]
Next, an example of the pattern matching calculation process will be described.
Here, a process of specifying the cell processor 20 that holds data most similar to the characteristics of the input data is performed. This processing is performed as follows.
Each cell processor 20 holds template data to be compared in advance.
BCMC 10 broadcasts input data to all cell processors 20. Each cell processor 20 calculates a difference value between the feature of the template data held by itself and the feature of the input data. The difference value is sent to the WTA / sum circuit 30 together with the ID.
The WTA / sum circuit 30 operates in the maximum value mode. The integer register of the input register holds the ID, and the floating point register holds the difference value. The difference value is compared by the comparator 304, and the smaller difference value and the ID associated therewith are sent to the output register 306. This is performed by the entire WTA / sum circuit 30, and the smallest difference value and the ID associated therewith are obtained. The ID and the difference value are sent to the BCMC 10.
The BCMC 10 specifies the cell processor 20 based on the ID. This makes it possible to detect the template data most similar to the features of the input data, together with the difference value between that template data and the input data.
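The matching flow can be sketched in Python as follows. The names `difference` and `best_match` are hypothetical, and the sum of squared differences is an assumed measure, since the text does not state how the difference value is computed. The built-in `min` stands in for the comparison performed by the WTA / sum circuits.

```python
from typing import List, Sequence, Tuple


def difference(template: Sequence[float], features: Sequence[float]) -> float:
    """Sum of squared differences; the actual measure is not given in the text."""
    return sum((t - f) ** 2 for t, f in zip(template, features))


def best_match(templates: List[Sequence[float]],
               features: Sequence[float]) -> Tuple[float, int]:
    # Each cell processor reports (difference value, ID); the smallest pair wins.
    reports = [(difference(t, features), cell_id)
               for cell_id, t in enumerate(templates)]
    return min(reports)


templates = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]  # one template per cell processor
value, cell_id = best_match(templates, [1.0, 2.0])
```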
[0049]
[Example 4]
Next, a description will be given of a processing example of a collision determination algorithm for moving objects, as used in image processing or the like. The “collision determination algorithm” is an algorithm that determines whether n objects existing in a certain space collide with each other and, if so, how strong the collision is.
It is assumed that the spatial distribution of n objects has a bias and is divided into m clusters. Here, for example, it is determined which one of the (n-1) objects collides most strongly with one object.
FIG. 9 is a view showing an example of objects in such a space. The objects, each represented by “○”, that are enclosed by the same rectangle form one cluster. In FIG. 9, the objects are divided into five clusters. The data indicating the objects is broadcast from the BCMC 10 and taken into the cell processors 20 cluster by cluster. Each cell processor 20 performs processing concerning the position and movement in space of the objects included in the one cluster it has taken in.
In the example of FIG. 9, the processes related to the objects divided into five clusters are performed by the cell processors A to E.
With reference to FIG. 10, the flow of processing of the collision determination algorithm will be described.
[0050]
The BCMC 10 generates broadcast data including object data including data on the position and speed of an object and cluster data indicating a cluster to which the object belongs, and broadcasts the broadcast data to all cell processors 20 (step S201). Each cell processor 20 selects and takes in the object data from the broadcast data based on the cluster data.
The cell processor 20 that has taken in the object data calculates new position data after a unit time from the current position data and the velocity data of each object. A new bounding box value is then obtained from the new position data (step S202). The bounding box is, for example, the rectangle surrounding the objects in FIG. 9. The value of the bounding box is, for example, the coordinates of the vertices of the bounding box.
The BCMC 10 fetches new position data of the object from each cell processor 20 and updates the position data (Step S203).
[0051]
Next, the BCMC 10 broadcasts the obtained object data, including the new position data, to all the cell processors 20 one object at a time (step S204). That is, position data representing the position of one object to be subjected to collision determination (hereinafter referred to as the “determination target object”) is sent to all the cell processors 20.
First, each cell processor 20 determines whether there is a possibility that the determination target object will collide using the bounding box calculated in step S202 (step S205). Specifically, it is determined whether or not the position of the determination target object is within the bounding box.
If there is a possibility of collision, that is, if the determination target object is within the bounding box (step S205: Y), the cell processor 20 sequentially calculates the distance between the determination target object and each object in the bounding box that it manages (step S206) and determines whether a collision occurs (step S207). When the determination target object collides with any object in the bounding box (step S207: Y), the cell processor 20 generates collision data that includes data quantitatively representing the strength of the impact due to the collision (collision intensity data) and data representing the influence of the collision (step S208). The cell processor 20 then sends the collision intensity data of the generated collision data, together with its ID, to the WTA / sum circuit 30 (step S209).
[0052]
When the determination target object is outside the bounding box (step S205: N), or when the distance calculation shows that no collision occurs (step S207: N), the cell processor 20 sends, for example, "-1.0" to the WTA / sum circuit 30 as the collision intensity data (step S210).
The WTA / sum circuit 30 operates in the maximum value mode. The WTA / sum circuits 30 compare the collision intensity data sent from the cell processors 20, detect the collision intensity data indicating the strongest impact due to collision (step S211), and specify the cell processor 20 that sent the detected collision intensity data. An ID representing the specified cell processor 20 is then sent to the BCMC 10.
The BCMC 10 obtains collision data from the cell processor 20 represented by the ID sent from the uppermost stage of the WTA / sum circuit 30 (step S212). By performing the processing from step S204 on all the objects, a collision determination between all the objects in the space is performed.
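The determination for one target object (steps S205 to S211) can be sketched in Python as below. The sketch is illustrative only: objects are assumed to be circles of equal radius in two dimensions, the overlap depth is used as a stand-in for the collision intensity, and the function names are hypothetical. The built-in `max` stands in for the WTA / sum circuits in maximum value mode.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def bounding_box(points: List[Point]) -> Tuple[float, float, float, float]:
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def collision_strength(target: Point, cluster: List[Point], radius: float) -> float:
    """Intensity reported by one cell processor, or -1.0 if nothing collides."""
    x0, y0, x1, y1 = bounding_box(cluster)
    if not (x0 <= target[0] <= x1 and y0 <= target[1] <= y1):
        return -1.0                           # S205: N, then S210
    best = -1.0
    for obj in cluster:                       # S206: distance to each managed object
        d = math.dist(target, obj)
        # S207: equal-radius circles overlap (assumption); d == 0 skips the target itself.
        if 0.0 < d < 2 * radius:
            best = max(best, 2 * radius - d)  # S208: overlap depth as intensity (assumption)
    return best


def strongest_collision(target: Point, clusters: Dict[int, List[Point]],
                        radius: float) -> Tuple[float, int]:
    # S209-S211: every cell reports (intensity, ID); the maximum is kept.
    return max((collision_strength(target, pts, radius), cid)
               for cid, pts in clusters.items())


clusters = {0: [(0.0, 0.0), (2.0, 2.0)], 1: [(10.0, 10.0), (12.0, 12.0)]}
strength, cell_id = strongest_collision((1.5, 1.5), clusters, radius=0.5)
```

The bounding box test lets most cell processors answer with -1.0 without calculating any distances.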
[0053]
[Example 5]
Next, an example in which the adder 305 of the WTA / sum circuit 30 is used will be described.
Each cell processor 20 inputs the data processing result to the WTA / sum circuit 30. In the WTA / sum circuit 30, the data processing results are added by the adder 305, and finally the sum of the data processing results of all the cell processors 20 is obtained. In this way, the WTA / sum circuit 30 can obtain the sum of the data processing results at high speed.
The sum of the data processing results is sent to the BCMC 10 and can be transmitted to each cell processor 20 at high speed by broadcasting. The sum of the data processing results is used, for example, for normalization in an optimization calculation such as one performed with a neural network.
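A minimal Python sketch of the pairwise summation and a subsequent normalization follows. The name `tree_sum`, the zero padding of odd stages, and the division of each result by the total are assumptions chosen for illustration.

```python
from typing import List


def tree_sum(values: List[float]) -> float:
    """Add values pairwise, stage by stage, as a pyramid of adders would."""
    if not values:
        raise ValueError("no values to add")
    stage = list(values)
    while len(stage) > 1:
        if len(stage) % 2:
            stage.append(0.0)  # pad an odd stage with an empty (zero) register
        stage = [stage[i] + stage[i + 1] for i in range(0, len(stage), 2)]
    return stage[0]


results = [1.0, 2.0, 3.0, 2.0]             # one data processing result per cell
total = tree_sum(results)                  # the sum that reaches the BCMC
normalized = [r / total for r in results]  # each cell normalizes after the broadcast
```

With N inputs the pairwise arrangement needs about log2(N) adder stages rather than N - 1 sequential additions.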
[0054]
In the above description, the BCMC 10 and the WTA / sum circuit 30 are independent of each other. However, the controller may be configured as one block in which the WTA / sum circuit 30 is incorporated in the BCMC 10.
[0055]
Although the above description is an example in which the data processing means is the cell processor 20 and the control means is the controller (BCMC 10), the components of the present invention are not limited to such an example.
For example, a plurality of data processing terminals may be connected via a wide area network so as to be capable of two-way communication, with one or more of the data processing terminals operating as the control means and the other data processing terminals operating as the data processing means. The control means is given a function of broadcasting broadcast data that includes the data processing results received from some or all of the data processing means and the data used for data processing by at least one data processing means. Each data processing means may be given a function of selecting, from the broadcast data broadcast by the control means, only the data necessary for the data processing it performs and performing that data processing, and a function of transmitting the processing result to the control means.
[0056]
Alternatively, general-purpose data processing terminals that can be specified by predetermined identification information (for example, the above-described identification data) may be used as the plurality of data processing means, and the data processing system may be configured with only a server capable of two-way communication with these general-purpose data processing terminals, or with only an apparatus equipped with a semiconductor device having a built-in CPU and memory.
In this case, the CPU of the server or apparatus reads and executes a predetermined computer program, thereby forming, in the server or apparatus, a function of specifying at least one data processing terminal serving as data processing means and generating broadcast data that includes the identification information of the specified data processing terminal and the data for data processing addressed to that terminal; a function of receiving, from some or all of the plurality of data processing terminals, the results of the data processing performed by those terminals; and a function of including the received processing results in the broadcast data and broadcasting the broadcast data to each of the plurality of data processing terminals.
[0057]
[Effect of the Invention]
As described above, according to the present invention, when a plurality of data processing means are used, the data processing carried out among those data processing means can be performed efficiently.
[Brief description of the drawings]
FIG. 1 is a diagram showing a configuration example of a multiprocessor system to which the present invention is applied.
FIG. 2 is a configuration diagram of a BCMC.
FIG. 3 is a configuration diagram of a cell processor.
FIG. 4 is a configuration diagram of a WTA / sum circuit.
FIG. 5 is a flowchart showing the flow of processing in the multiprocessor system according to the embodiment.
FIG. 6 is a conceptual diagram of the case in which the data processing results of adjacent processors are used.
FIG. 7 is a conceptual diagram of the case in which the data processing results of some of the processors are used.
FIG. 8 is a diagram showing an example in which grid point data is grouped.
FIG. 9 is a diagram showing an example in which objects are divided into clusters.
FIG. 10 is a flowchart showing the flow of processing of a collision determination algorithm.
[Explanation of symbols]
10 BCMC
101 CPU core
102 main memory
103 DMAC
20 cell processor
201 cell CPU
202 input buffer
203 output buffer
204 WTA buffer
205 program controller
206 instruction memory
207 data memory
30 WTA / sum circuit
301 first input register
302 second input register
303 switch
304 comparator
305 adder
306 output register

Claims

A multiprocessor system comprising:
a plurality of processors, each of which manages at least one of a plurality of objects distributed in a predetermined virtual space and generates, for that object, position data representing its position in the virtual space; and
a controller that can acquire the position data of the objects from all of the processors and that broadcasts the acquired position data, one item at a time, to all of the processors,
wherein each of the plurality of processors takes in the position data broadcast from the controller, determines whether the object whose position is represented by the taken-in position data lies within the range over which the objects under its own management are distributed, and, if it lies within that range, determines whether that object is at a position where it collides with an object under its own management.
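The two-stage determination performed by each processor (first a range check, then a collision check) might be sketched as follows. The axis-aligned bounding box used as the "range", the optional `margin`, and the caller-supplied `collides` predicate are assumptions for illustration; the claim does not fix any of them.

```python
def bounding_box(positions):
    """Axis-aligned box enclosing the positions of the objects one processor manages."""
    lo = tuple(min(axis) for axis in zip(*positions))
    hi = tuple(max(axis) for axis in zip(*positions))
    return lo, hi


def in_range(box, p, margin=0.0):
    """Stage 1: is position p inside the (optionally widened) distribution range?"""
    lo, hi = box
    return all(l - margin <= c <= h + margin for l, c, h in zip(lo, p, hi))


def check_broadcast(own_positions, p, collides, margin=0.0):
    """Stage 2 runs only if stage 1 passes.

    Returns the managed positions that collide with the broadcast position p.
    `collides(q, p)` is a hypothetical predicate supplied by the caller.
    """
    if not own_positions or not in_range(bounding_box(own_positions), p, margin):
        return []  # out of range: the detailed collision test is skipped
    return [q for q in own_positions if collides(q, p)]


# Example predicate (assumption): objects collide when every coordinate
# differs by at most 0.5.
def near(q, p):
    return max(abs(a - b) for a, b in zip(q, p)) <= 0.5
```

The point of the first stage is that a processor can discard most broadcast positions with a few comparisons and run the more expensive per-object test only for positions that fall inside its own range.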

The multiprocessor system according to claim 1, wherein
each of the plurality of processors generates, for the object it manages, velocity data representing the velocity of the object in the virtual space,
the controller can acquire the position data and the velocity data from all of the processors and broadcasts the acquired position data and velocity data, one set at a time, to all of the processors, and
each processor, when the object whose position is represented by the broadcast position data is at a collision position with an object under its own management, generates, from the broadcast velocity data of that object, collision strength data that quantitatively represents the strength of the impact of the collision and data representing the effect of the collision on the object.
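The claim does not state how the collision strength or the effect on the object is computed from the velocity data. The following Python sketch therefore rests entirely on assumptions: strength is taken to be the magnitude of the relative velocity, and the effect is taken to be the post-collision velocity of two equal-mass objects that stick together. Both function names are hypothetical.

```python
import math


def collision_strength(v_own, v_other):
    """Assumed measure of impact strength: magnitude of the relative velocity."""
    return math.dist(v_own, v_other)


def collision_effect(v_own, v_other):
    """Assumed effect on the managed object: its new velocity after a
    perfectly inelastic collision between two objects of equal mass."""
    return tuple((a + b) / 2 for a, b in zip(v_own, v_other))
```

For example, with `v_own = (3, 0, 0)` and a broadcast velocity of `(0, 4, 0)`, this measure gives a strength of 5.0.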

The multiprocessor system according to claim 2, wherein
identification data for identifying each processor is assigned to each of the plurality of processors,
the system further comprises a maximum value detection mechanism that takes in, from each processor that has generated collision strength data, the collision strength data together with the identification data of that processor, takes in, from each processor that has not generated collision strength data, a value smaller than the values of the collision strength data as that processor's collision strength data together with its identification data, specifies the processor that generated the largest collision strength data, and sends the identification data of the specified processor to the controller, and
the controller acquires the collision strength data and the data representing the effect of the collision on the object from the processor represented by the identification data sent from the maximum value detection mechanism.
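The behavior of the maximum value detection mechanism can be illustrated with a short Python sketch. It assumes that real collision strengths are non-negative, so that `-1.0` can serve as the "smaller value" reported by processors that generated no collision strength data; the sentinel, the `(id, strength)` tuple format and the function name are assumptions.

```python
NO_COLLISION = -1.0  # assumed sentinel, smaller than any real (non-negative) strength


def winner_take_all(reports):
    """Return the identification data of the processor with the largest
    collision strength, or None if no processor reported a collision.

    reports: iterable of (processor_id, strength), where strength is None
    for a processor that generated no collision strength data.
    """
    best_id, best = None, NO_COLLISION
    for processor_id, strength in reports:
        value = NO_COLLISION if strength is None else strength
        if value > best:  # on a tie, the earlier processor is kept
            best_id, best = processor_id, value
    return best_id
```

Only the winning identification data is sent onward, so the controller needs to fetch the collision strength data and the effect data from a single processor rather than polling all of them.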

The multiprocessor system according to claim 1, wherein each processor determines whether the object whose position data has been broadcast is at a collision position with an object under its own management by calculating the distance between the two objects.
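A distance-based determination of this kind could look like the following Python sketch. Treating the objects as spheres with radii `r_own` and `r_other` is an assumption made for the example; the claim only states that a distance is calculated.

```python
import math


def at_collision_position(p_own, p_other, r_own, r_other):
    """Assume spherical objects: they collide when the distance between
    their centers does not exceed the sum of their radii."""
    return math.dist(p_own, p_other) <= r_own + r_other
```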

A data processing method executed in an apparatus or system having a plurality of data processing means, among which a plurality of objects distributed in a predetermined virtual space are divided into clusters so that each data processing means manages a different cluster, and control means that manages the positions and velocities of all of the objects in the virtual space and broadcasts position data representing the positions of the objects and velocity data representing their velocities to the plurality of data processing means, the method comprising, in the following order:
a step in which the control means broadcasts the position data and the velocity data of all of the objects to all of the data processing means together with cluster data representing the cluster to which each object belongs;
a step in which each of the plurality of data processing means takes in, on the basis of the cluster data, the position data and velocity data of the objects it manages and generates new position data and velocity data from the taken-in position data and velocity data;
a step in which the control means takes in the new position data and velocity data of each object from each of the plurality of data processing means and broadcasts the taken-in position data, one item at a time, to all of the data processing means; and
a step in which each of the plurality of data processing means takes in the new position data broadcast from the control means, determines whether the object represented by the new position data lies within the range over which the objects under its own management are distributed, and, if it lies within that range, determines whether that object is at a position where it collides with an object under its own management.
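The four steps of the method can be traced in a compact, single-process Python sketch. Several things here are assumptions rather than claim content: the explicit Euler update `pos + vel * dt`, the spherical interaction `radius`, the bounding box widened by that radius as the "range", and the function name `step`. In the claimed system the per-cluster work would run on separate data processing means; here it is simply looped over.

```python
import math


def step(objects, dt, radius):
    """objects: {obj_id: (cluster, pos, vel)}. Returns (new_objects, collisions)."""
    clusters = sorted({c for c, _, _ in objects.values()})

    # Steps 1-2: everything is broadcast with cluster data; each processor
    # keeps only its own cluster's objects and generates new positions.
    managed = {
        c: {
            oid: (tuple(p + v * dt for p, v in zip(pos, vel)), vel)
            for oid, (oc, pos, vel) in objects.items() if oc == c
        }
        for c in clusters
    }

    # Step 3: the control means takes in the new data from every processor.
    new_objects = {
        oid: (c, pos, vel)
        for c, objs in managed.items() for oid, (pos, vel) in objs.items()
    }

    # Steps 3-4: new positions are rebroadcast one at a time; each processor
    # checks its range first and runs the distance test only if that passes.
    collisions = set()
    for oid, (_, pos, _) in sorted(new_objects.items()):
        for c in clusters:
            own = {k: p for k, (p, _) in managed[c].items() if k != oid}
            if not own:
                continue
            dims = range(len(pos))
            lo = [min(p[i] for p in own.values()) - radius for i in dims]
            hi = [max(p[i] for p in own.values()) + radius for i in dims]
            if not all(l <= x <= h for l, x, h in zip(lo, pos, hi)):
                continue  # outside this processor's range
            for k, p in own.items():
                if math.dist(p, pos) <= radius:
                    collisions.add(frozenset((oid, k)))
    return new_objects, collisions
```

With two objects in different clusters moving toward each other and a third far away, one call to `step` reports the single colliding pair and leaves the distant object untested by the processors whose range it does not enter.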

A data processing system that performs two-way communication with control means that manages position data representing the positions, in a predetermined virtual space, of a plurality of objects distributed in the virtual space, the data processing system comprising:
means for managing at least one of the plurality of objects distributed in the virtual space, generating, for that object, position data representing its position in the virtual space, and sending the generated position data to the control means; and
means for taking in the position data broadcast one item at a time from the control means, determining whether the object whose position is represented by the taken-in position data lies within the range over which the objects under its own management are distributed, and, if it lies within that range, determining whether that object is at a position where it collides with an object under its own management.

A computer program for causing a computer to form the following functions in a computer-equipped device that performs two-way communication with control means that manages position data representing the positions, in a predetermined virtual space, of a plurality of objects distributed in the virtual space:
(1) means for managing at least one of the plurality of objects distributed in the virtual space, generating, for that object, position data representing its position in the virtual space, and sending the generated position data to the control means; and
(2) means for taking in the position data broadcast one item at a time from the control means, determining whether the object whose position is represented by the taken-in position data lies within the range over which the objects under its own management are distributed, and, if it lies within that range, determining whether that object is at a position where it collides with an object under its own management.

A semiconductor device that, by being incorporated in a computer-equipped device that performs two-way communication with control means that manages position data representing the positions, in a predetermined virtual space, of a plurality of objects distributed in the virtual space, causes the computer to form the following functions:
(1) means for managing at least one of the plurality of objects distributed in the virtual space, generating, for that object, position data representing its position in the virtual space, and sending the generated position data to the control means; and
(2) means for taking in the position data broadcast one item at a time from the control means, determining whether the object whose position is represented by the taken-in position data lies within the range over which the objects under its own management are distributed, and, if it lies within that range, determining whether that object is at a position where it collides with an object under its own management.