JP3434735B2

JP3434735B2 - Information processing system and fault handling method used for it

Info

Publication number: JP3434735B2
Application number: JP17616699A
Authority: JP
Inventors: 睦雄進藤
Original assignee: NEC Computertechno Ltd
Current assignee: NEC Computertechno Ltd
Priority date: 1999-06-23
Filing date: 1999-06-23
Publication date: 2003-08-11
Anticipated expiration: 2019-06-23
Also published as: JP2001007893A

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to an information processing system and a fault handling method used for it, and more particularly to a fault handling method for a cluster-configured information processing system that performs inter-node communication through shared memory via a crossbar switch.

[0002]

[Description of the Related Art] Conventionally, in this type of information processing system, a plurality of systems are combined and handled as a single system so that the system as a whole can keep operating even if a fault occurs in part of it. Such a cluster system can raise redundancy to improve fault tolerance, and can also improve overall performance.

[0003] In a cluster system, multiplexing is done in large units, namely whole information processing apparatuses, and each system usually runs independent processes. When a fault occurs, only the affected node is detached, and the processes, transactions, and so on that were running on it are either re-executed or continued on another node.

[0004] The primary purpose of building such a cluster is, in systems where reliability matters above all, to allow processing to continue on another server in the cluster (called a node) in place of the server that failed, should any problem occur.

[0005] In conventional information processing apparatuses, such a cluster is configured with one information processing apparatus per node, and the communication paths between nodes are formed by a communication network, typically Ethernet.

[0006] In recent years, however, there are also cluster systems in which a distributed shared memory type information processing apparatus is logically partitioned, each node being a logical distributed node made up of processors, memory, and IO (input/output) devices, and the communication paths between nodes being formed by a distributed shared memory network, so that inter-node communication is extremely fast.

[0007] Here, a distributed shared memory system is one in which the memory area of each node is connected by a network so that it can also be accessed from other nodes. With this scheme, data written to the distributed shared memory is transferred to the other nodes almost instantly, which makes it easier to speed up distributed processing and to ensure real-time responsiveness.

[0008] One example of this type of distributed shared memory system is the information processing apparatus described in Japanese Patent Laid-Open No. 8-314875, in which the distributed shared memories are connected by a distributed shared memory network.

[0009] FIG. 5 is a block diagram showing the configuration of a conventional information processing system. In FIG. 5, 6a to 6d are distributed nodes, each a controller, computer, or the like forming the system; 8a to 8d are CPUs; 7a to 7d are main memories; and 9a to 9d are distributed shared memories. Each distributed node 6a to 6d consists of its CPU 8a to 8d, main memory 7a to 7d, and distributed shared memory 9a to 9d.

[0010] Further, 91a to 91d are shared data on the distributed shared memories 9a to 9d that is shared among the distributed nodes 6a to 6d, and 92a to 92d are state monitoring tables on the distributed shared memories 9a to 9d in which distributed system management data is stored.

[0011] Within the CPU 8a, 81a, 82a, and 83a are application tasks of the CPU 8a; 84a is a task execution control unit that controls execution of the application tasks 81a, 82a, and 83a; 85a is a state monitoring unit that records the state of its own node in the state monitoring table 92a and detects abnormalities in the other distributed nodes 6b to 6d by referring to the state monitoring table 92a; and 86a is a failure handling unit that, when a failure is detected in another distributed node 6b to 6d, requests the task execution control unit 84a to perform whatever processing of the application tasks 81a, 82a, and 83a becomes necessary. The CPUs 8b to 8d of the other distributed nodes 6b to 6d have the same configuration as the CPU 8a.

[0012] Reference numeral 300 denotes a distributed shared memory network that directly connects the distributed shared memories 9a to 9d of the distributed nodes 6a to 6d. Any change to the contents of a distributed shared memory 9a to 9d is communicated over the distributed shared memory network 300 to the distributed shared memories of the other distributed nodes, independently of the operation of the distributed nodes 6a to 6d themselves, and is reflected in their contents.

[0013] Next, the operation of the above information processing system is described. The distributed nodes 6a to 6d are peers and operate identically apart from their reference numerals, so only the operation of the distributed node 6a is described below; the description of the other distributed nodes 6b to 6d is omitted.

[0014] When the CPU 8a of the distributed node 6a writes to the distributed shared memory 9a, the address and contents are transferred via the distributed shared memory network 300 to the distributed shared memories 9b to 9d of the other distributed nodes 6b to 6d, and the same contents are written at the same address. That is, the distributed shared memories 9a to 9d of the distributed nodes 6a to 6d are equivalent apart from communication delay.
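The replication behaviour of this prior-art arrangement can be sketched as follows. This is a minimal toy model, not the prior-art implementation: the class name `ReplicatedSharedMemory`, the word-array representation, and the synchronous loop standing in for the distributed shared memory network 300 are all assumptions made for illustration.

```python
class ReplicatedSharedMemory:
    """Toy model of the prior-art distributed shared memory (hypothetical names)."""

    def __init__(self, node_ids, size):
        # One replica of the shared region per node, e.g. 9a to 9d.
        self.replicas = {node_id: [0] * size for node_id in node_ids}

    def write(self, node_id, address, value):
        # The writing node updates its own replica ...
        self.replicas[node_id][address] = value
        # ... and the network copies the same address and contents to every
        # other replica. The copy is modelled as an immediate loop, so the
        # communication delay mentioned in the text is not represented.
        for other_id, memory in self.replicas.items():
            if other_id != node_id:
                memory[address] = value

    def read(self, node_id, address):
        return self.replicas[node_id][address]


dsm = ReplicatedSharedMemory(["6a", "6b", "6c", "6d"], size=8)
dsm.write("6a", 3, 0xAB)
values = [dsm.read(node_id, 3) for node_id in ("6a", "6b", "6c", "6d")]
```

Because every write is copied unconditionally in this model, a corrupted value written at one node would reach all replicas in the same way as a valid one.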

[0015] The CPU 8a executes the application tasks 81a, 82a, and 83a, and each task proceeds by accessing local data in the main memory 7a and the shared data 91a in the distributed shared memory 9a. The task execution control unit 84a controls which of the application tasks 81a, 82a, and 83a is executed.

[0016]

[Problems to Be Solved by the Invention] In the conventional information processing system described above, when the nodes are tightly coupled, as in a distributed shared memory network, and an uncorrectable fault occurs at one node, the data in which the uncorrectable fault was detected flows unchanged to other nodes precisely because the coupling is tight. The uncorrectable fault is then detected at several or all of the nodes, so the fault propagates to multiple nodes and the result is not a highly reliable system.

[0017] Moreover, because data written to the distributed shared memory of one node is written, address and contents, to the other nodes almost instantly via the distributed shared memory network, a time lag arises relative to the notification from the fault detection and notification path, which is provided separately from the normal processing path. By the time a node receives the fault notification, it may already have consumed the faulty data. In the worst case, this can lead to data corruption.

[0018] Accordingly, an object of the present invention is to solve the above problems and to provide an information processing system, and a fault handling method used for it, that does not propagate a fault in one node to other nodes, that prevents improper operation such as data corruption at other nodes caused by a fault in the local node, and that makes it possible to build a highly reliable cluster system while maintaining high-speed communication between nodes.

[0019]

[Means for Solving the Problems] An information processing system according to the present invention is a cluster-configured information processing system that performs communication among a plurality of nodes using shared memory, wherein each of the plurality of nodes comprises: checking means for checking whether data received through communication among the plurality of nodes is invalid data; and suppression means for, when an uncorrectable fault occurs in data communicated among the plurality of nodes, replacing the faulty data with data that the checking means will detect as invalid, thereby suppressing propagation of the faulty data to the receiving node.

[0020] Another information processing system according to the present invention is a cluster-configured information processing system in which communication among a plurality of nodes through shared memory via a crossbar switch is performed by a node writing communication data to its own shared memory and another node reading that communication data from that shared memory, wherein each of the plurality of nodes comprises: checking means for checking whether data received through communication among the plurality of nodes is invalid data; and suppression means for, when an uncorrectable fault occurs in data communicated among the plurality of nodes, replacing the faulty data with data that the checking means will detect as invalid, thereby suppressing propagation of the faulty data to the receiving node.

[0021] A fault handling method for an information processing system according to the present invention is a fault handling method for a cluster-configured information processing system that performs communication among a plurality of nodes using shared memory, wherein, at each of the plurality of nodes, when an uncorrectable fault occurs in data communicated among the plurality of nodes, the faulty data is replaced with data that will be detected as invalid, thereby suppressing propagation of the faulty data to the receiving node, and data received through communication among the plurality of nodes is checked as to whether it is that invalid data.

[0022] Another fault handling method for an information processing system according to the present invention is a fault handling method for a cluster-configured information processing system in which communication among a plurality of nodes through shared memory via a crossbar switch is performed by a node writing communication data to its own shared memory and another node reading that communication data from that shared memory, wherein, at each of the plurality of nodes, when an uncorrectable fault occurs in data communicated among the plurality of nodes, the faulty data is replaced with data that will be detected as invalid, thereby suppressing propagation of the faulty data to the receiving node, and data received through communication among the plurality of nodes is checked as to whether it is that invalid data.

[0023] That is, the information processing system of the present invention is a cluster-configured system that performs inter-node communication through shared memory via a crossbar switch. It combines two means: a means that keeps faulty data from propagating to the receiving node even when an uncorrectable fault occurs on the sending side in data communicated between nodes, and a means that checks whether data received through inter-node communication is invalid. Together these make it possible to prevent both adverse effects on the receiving node, such as the node going down, and improper operation, such as data corruption caused by invalid data.

[0024] Specifically, in the cluster-configured information processing system of the present invention, communication between nodes is controlled by software called a cluster driver. For example, communication between the first node and the third node is achieved either by the cluster driver of the first node writing communication data to the shared memory of its own node and the cluster driver of the third node reading the shared memory of the first node, or by the cluster driver of the third node writing communication data to the shared memory of its own node and the cluster driver of the first node reading the shared memory of the third node.
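The write-locally, read-remotely pattern can be sketched as below. This is an illustrative model only: the `Node` and `Crossbar` classes, the dictionary used as a shared memory area, and the slot names are hypothetical. The owner-only write rule enforced in `write` is taken from the description of the embodiment, which permits writes only by the node that owns a shared memory and reads by every node.

```python
class Node:
    """A cluster node with a shared memory area (hypothetical model)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.shared = {}  # shared memory area, modelled as slot -> data


class Crossbar:
    """Stand-in for the crossbar switch that routes accesses between nodes."""

    def __init__(self, nodes):
        self.nodes = {node.node_id: node for node in nodes}

    def write(self, requester_id, owner_id, slot, data):
        # Only the owning node may write its shared memory.
        if requester_id != owner_id:
            raise PermissionError("remote writes to shared memory are not allowed")
        self.nodes[owner_id].shared[slot] = data

    def read(self, requester_id, owner_id, slot):
        # Any node may read any node's shared memory.
        return self.nodes[owner_id].shared.get(slot)


xbar = Crossbar([Node(1), Node(2), Node(3), Node(4)])

# Node 1 -> node 3: node 1 writes locally, node 3 reads node 1's memory.
xbar.write(1, 1, "to_node_3", b"hello")
received_by_3 = xbar.read(3, 1, "to_node_3")

# Node 3 -> node 1: the same pattern in the opposite direction.
xbar.write(3, 3, "to_node_1", b"reply")
received_by_1 = xbar.read(1, 3, "to_node_1")
```

In this pull model the receiver initiates every transfer, so the sending node's hardware sits on the path of each read and has the opportunity to intercept faulty data before it leaves the node.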

[0025] Inter-node communication using shared memory connected by a crossbar switch is also characterized by being orders of magnitude faster than inter-node communication over a network such as Ethernet.

[0026] When communication data is sent from the first node to the third node, the cluster driver of the third node issues a read request for the shared memory area of the first node, and the memory control unit of the first node reads the data from the shared memory area in its memory. If, inside the first node, a 2-bit error that cannot be corrected by the ECC (Error-Correcting Code) is detected in the read data, the first node returns to the third node, for all of the requested data from the point at which the 2-bit error was detected onward, a fixed value that is not in a 2-bit error state (for example, data in which all bits other than the ECC are "0").

[0027] This prevents data carrying an uncorrectable 2-bit error from propagating to other nodes and, as a result, prevents the fault itself from propagating to other nodes.

[0028] From the third node's point of view, however, garbled data has been returned from the first node, and if the third node proceeds using this data, new faults such as data corruption will follow.

[0029] Therefore, the cluster driver that controls communication within the cluster always appends checksum data when it writes communication data to the shared memory of its own node, and always checks for data errors using the checksum when it reads communication data from the shared memory of another node.

[0030] That is, when communication data is sent from the first node to the third node, the cluster driver of the first node computes checksum data when writing the communication data to the shared memory of its own node and appends it to the written data.

[0031] The cluster driver of the third node reads the communication data it is to receive from the shared memory of the first node and then verifies the validity of the data using the checksum. At this point, on a node to which the arbitrary fixed value free of 2-bit errors was returned, the cluster driver detects a checksum error and can thereby determine that an abnormality has occurred at the first node and that the received data is invalid. This prevents data corruption and similar problems caused by using invalid data.
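The interplay between the fixed-value substitution and the checksum can be sketched as follows. The patent does not name a checksum algorithm or a message layout, so everything here is an assumption for illustration: a seeded 32-bit additive sum, a 4-byte big-endian trailer, and the helper names `add_checksum`, `verify_checksum`, and `substitute_from`.

```python
import struct

SEED = 0xA5A5A5A5  # assumed non-zero seed so an all-zero message fails the check


def checksum(payload: bytes) -> int:
    # Simple 32-bit additive checksum (illustrative choice only).
    return (SEED + sum(payload)) & 0xFFFFFFFF


def add_checksum(payload: bytes) -> bytes:
    # Sender side: append the checksum before writing to local shared memory.
    return payload + struct.pack(">I", checksum(payload))


def verify_checksum(message: bytes) -> bool:
    # Receiver side: recompute and compare after reading remote shared memory.
    if len(message) < 4:
        return False
    payload, stored = message[:-4], struct.unpack(">I", message[-4:])[0]
    return checksum(payload) == stored


def substitute_from(message: bytes, error_offset: int) -> bytes:
    # Models the sending node's hardware replacing everything from the point
    # of the uncorrectable error onward with the fixed value 0.
    return message[:error_offset] + bytes(len(message) - error_offset)


message = add_checksum(b"cluster heartbeat")
clean_ok = verify_checksum(message)
zeroed_ok = verify_checksum(substitute_from(message, 5))
fully_zeroed_ok = verify_checksum(substitute_from(message, 0))
```

The non-zero seed is a design choice of this sketch rather than something stated in the source. With a plain unseeded additive sum, a message replaced entirely by zeros would carry a zero checksum over a zero payload and pass verification, so the substituted fixed value and the checksum scheme need to be chosen together.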

[0032]

[Embodiments of the Invention] Next, an embodiment of the present invention is described with reference to the drawings. FIG. 1 is a block diagram showing the configuration of an information processing system according to an embodiment of the present invention. In FIG. 1, the information processing system consists of four nodes, namely a first node 1, a second node 2, a third node 3, and a fourth node 4, and a crossbar switch 5 connecting the nodes 1 to 4.

[0033] In the information processing system according to this embodiment, the first node 1, second node 2, third node 3, and fourth node 4 are combined into a cluster configuration handled as a single system, so that the system as a whole can keep operating even if a fault occurs in part of it.

[0034] Communication within the cluster is realized through the shared memory of each node 1 to 4 (the shared memory space 16b of the memory unit 16) and the crossbar switch 5. Writing to a shared memory is permitted only to the node that owns it, whereas reading from a shared memory is permitted to every node.

[0035] In this embodiment the interface between the nodes 1 to 4 uses an electrical medium and a crossbar switch as the connection scheme, but the medium may be optical, electrical, or radio, and the connection scheme may be bus, ring, star, or wireless.

[0036] Each of the nodes 1 to 4 is either an information processing apparatus capable of operating independently or a logical unit having the processor, memory, and input/output unit needed for an information processing apparatus. In this embodiment, a node consists of a card called a cell, which carries processors, memory, and an input/output unit. The first node 1, second node 2, third node 3, and fourth node 4 all have the same configuration (only the detailed configuration of the first node 1 is shown), so the first node 1 is described below as an example.

[0037] The first node 1 consists of a plurality of MPUs (microprocessor units) 11-1 to 11-n, an IO (input/output) control unit 13, a system control unit 14, a memory control unit 15, a memory unit 16, and a processor bus 110.

[0038] The MPUs 11-1 to 11-n interpret and execute program instructions. The IO control unit 13 has under it the input/output devices that the first node 1 needs in order to operate as an information processing apparatus, such as a LAN (Local Area Network), file devices, and a keyboard, and controls access to those devices.

[0039] The system control unit 14 is connected to the memory control unit 15 and the IO control unit 13 by ECC-protected interfaces, issues operation requests to the memory control unit 15 and the IO control unit 13 in response to instructions from the MPUs 11-1 to 11-n, and is connected to the crossbar switch 5 by an ECC-protected interface.

[0040] The memory control unit 15 controls access to the memory unit 16 and is connected to it by an ECC (Error-Correcting Code) protected interface. The memory unit 16 consists of a private memory space 16a and a shared memory space 16b, and stores program instructions and data to which ECC is added so that 1-bit data errors can be corrected. The processor bus 110 is ECC-protected and connects the MPUs 11-1 to 11-n to the system control unit 14.

[0041] The system control unit 14 contains an extended ECC circuit 14a, which provides ECC-based 1-bit error detection, 1-bit error correction, and 2-bit error detection, and which, on detecting a 2-bit error, replaces the data being transferred to the other nodes 2 to 4 with the fixed value "0" plus ECC.
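The combination of single-error correction, double-error detection, and fixed-value substitution can be illustrated with a small software model. The source does not state the ECC code or word width used by the hardware, so the sketch below assumes a toy (8,4) extended Hamming code purely for illustration; the function names `encode`, `decode`, and `extended_ecc` are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (list of bits)."""
    code = [0] * 8  # index 0: overall parity, indices 1..7: Hamming positions
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    for pos in range(1, 8):
        code[0] ^= code[pos]
    return code


def decode(code):
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    syndrome = 0
    overall = 0
    for pos in range(8):
        if code[pos]:
            syndrome ^= pos  # position 0 contributes nothing to the syndrome
            overall ^= 1
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Single-bit error: the syndrome is the failing position
        # (0 means the overall parity bit itself was flipped).
        code = list(code)
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: 2-bit error.
        return None, "uncorrectable"
    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return nibble, status


def extended_ecc(code, to_other_node):
    """Model of the extended behaviour: substitute '0' + valid ECC on 2-bit errors."""
    nibble, status = decode(code)
    if status == "uncorrectable":
        # Data bound for another node is replaced; local data is passed as is.
        return (encode(0) if to_other_node else list(code)), status
    return encode(nibble), status


word = encode(0b1011)
one_bit = list(word)
one_bit[5] ^= 1
two_bit = list(word)
two_bit[2] ^= 1
two_bit[6] ^= 1
```

The substituted word `encode(0)` is itself a valid codeword, so downstream ECC checkers on the crossbar path see no error; only an end-to-end check such as the cluster driver's checksum can tell that the data was replaced.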

[0042] The IO control unit 13, the memory control unit 15, and the crossbar switch 5 also contain ECC circuits (not shown) providing ECC-based 1-bit error detection, 1-bit error correction, and 2-bit error detection.

[0043] In this embodiment the fixed value is "0" plus ECC, but any value will do as long as the ECC does not detect a 2-bit error in it. Dedicated software called cluster drivers 17 and 31, which manage inter-node communication and the states of other nodes, runs on the first node 1 and the third node 3. Although not shown, the same dedicated software also runs on the second node 2 and the fourth node 4.

[0044] The cluster drivers 17 and 31 each have two functions: a sum appending function 17a, 31a, which always computes a data-check sum and appends it to outgoing data written to the shared memory of its own node for delivery to other nodes, and a sum checking function 17b, 31b, which always checks the data-check sum of incoming data read from the shared memory of another node.

[0045] The information processing system of this embodiment may additionally be provided with a device called a service processor, which provides services including power control for booting and shutting down the apparatus, collection of fault information when a fault occurs, shutdown of the faulty node, fault notification to other nodes, and post-fault processing. In that configuration, each node also notifies the service processor when it detects an uncorrectable or correctable fault, and each node receives fault notifications about other nodes through its interface with the service processor.

[0046] The extended ECC circuit 14a may also be provided in the crossbar switch 5 at its interface to each of the nodes 1 to 4, so that uncorrectable faults occurring on the interface between a node 1 to 4 and the crossbar switch are likewise not propagated to other nodes.

[0047] Further, a dedicated circuit for generating and checking the sum of inter-node communication data may be provided in the system control unit 14 of each node 1 to 4 (the system control units of the nodes 2 to 4 are not shown), reducing the processing load on the cluster drivers 17 and 31.

[0048] FIG. 2 is a block diagram showing an example configuration of the extended ECC circuit 14a of FIG. 1. In FIG. 2, the extended ECC circuit 14a consists of an input register 20 for ECC-protected data, an ECC error detection circuit 21, a CRCT circuit 22, an error holding register 23, an OR gate 24, an AND gate 25, and a selector 26.

[0049] The ECC error detection circuit 21 receives the output of the input register 20, detects 1-bit and 2-bit errors, and outputs a detection signal for each. The CRCT circuit 22 receives the output of the input register 20 and, when there is a 1-bit error, generates the corrected data.

[0050] The error holding register 23 is set by the 2-bit error signal 201, which indicates that the ECC error detection circuit 21 has detected a 2-bit error, together with the falling edge of the clock, and is reset when the data transfer request completes.

[0051] The OR gate 24 takes two inputs, the 2-bit error signal 201 and the value of the error holding register 23, and outputs a value indicating a 2-bit error. The AND gate 25 takes two inputs, the node notification signal 202, which comes from a data transfer destination determination circuit (not shown) and indicates whether the destination is another node or the local node, and the output of the OR gate 24 indicating a 2-bit error, and outputs a signal to the selector 26.

[0052] The selector 26 receives the output of the AND gate 25 and selects, as the output data of the extended ECC circuit 14a, either data in which all bits other than the ECC are "0" or the output of the CRCT circuit 22.
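The selection logic formed by the error holding register 23, OR gate 24, AND gate 25, and selector 26 can be modelled in a few lines. This is a behavioural sketch under simplifying assumptions: one call to `clock` stands for one data beat, the falling-edge timing of the register is reduced to "update at the end of the beat", the ECC bits of the substituted word are not modelled, and the class and method names are hypothetical.

```python
FIXED_VALUE = 0  # data bits all "0"; the accompanying ECC is not modelled here


class ExtendedEccDatapath:
    """Behavioural model of register 23, gates 24 and 25, and selector 26."""

    def __init__(self):
        self.error_hold = False  # error holding register 23

    def clock(self, crct_output, two_bit_error, to_other_node):
        # OR gate 24: 2-bit error signal 201 OR the held error.
        error_seen = two_bit_error or self.error_hold
        # AND gate 25: substitute only when node notification signal 202
        # says the data is bound for another node.
        substitute = error_seen and to_other_node
        # Register 23 latches the error so later beats are also replaced.
        if two_bit_error:
            self.error_hold = True
        # Selector 26: fixed value or the output of the CRCT circuit 22.
        return FIXED_VALUE if substitute else crct_output

    def transfer_complete(self):
        # Completion of the data transfer request resets register 23.
        self.error_hold = False


circuit = ExtendedEccDatapath()
beats = [(0x11, False), (0x22, True), (0x33, False)]
remote = [circuit.clock(data, err, to_other_node=True) for data, err in beats]
circuit.transfer_complete()
after_reset = circuit.clock(0x44, False, to_other_node=True)
```

Because the OR gate combines the live error signal with the held value, the beat that contains the 2-bit error is already replaced, and every following beat of the same transfer is replaced as well until the register is reset.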

[0053] In this embodiment, when a 2-bit error is detected, the node notification signal 202 and the AND gate 25 are used to decide whether the data is destined for another node, and only then is the erroneous data changed to the arbitrary fixed value. The circuit may instead be configured to replace all 2-bit error data with the fixed value, in which case the node notification signal 202 and the AND gate 25 are unnecessary.

【００５４】FIG. 3 is a timing chart showing the operation of the information processing system according to an embodiment of the present invention, and FIG. 4 is a timing chart showing the operation of the extended ECC circuit 14a shown in FIG. 2. The operation of the information processing system according to an embodiment of the present invention is described below with reference to FIGS. 1 to 4.

【００５５】In the information processing system according to an embodiment of the present invention, part of the memory unit 16 of each of the nodes 1 to 4 is set up as a shared memory space 16b that other nodes can read but cannot write. Inter-node communication is realized by having the cluster drivers 17 and 31 access this shared memory space 16b.
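The access rule for the shared memory space 16b (readable but not writable from other nodes) can be illustrated with a small model. This is a sketch under stated assumptions: the class `NodeMemory`, its method names and the use of `PermissionError` are hypothetical and are not taken from the specification.

```python
class NodeMemory:
    """Toy model of memory unit 16: private space 16a plus shared space 16b."""

    def __init__(self, private_words: int, shared_words: int) -> None:
        self.private = [0] * private_words   # 16a: used by the own node only
        self.shared = [0] * shared_words     # 16b: other nodes may read it

    def local_write_shared(self, offset: int, value: int) -> None:
        # The own node writes communication data into its shared space.
        self.shared[offset] = value

    def remote_read(self, offset: int) -> int:
        # Another node reads the communication data.
        return self.shared[offset]

    def remote_write(self, offset: int, value: int) -> None:
        # Writes from other nodes are rejected in this model.
        raise PermissionError("shared space 16b is read-only for other nodes")
```

A node therefore publishes data by writing to its own shared space, and its peers obtain the data only by reading.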

【００５６】Here, the case where the third node 3 communicates with the first node 1 is described, specifically the case where the third node 3 reads data in the shared memory space 16b of the first node 1.

【００５７】Referring to FIG. 3, the cluster driver 31 of the third node reads the communication area prepared in the shared memory space 16b of the first node 1 (see 41 in FIG. 3). The read by the cluster driver 31 is issued from the MPU (not shown) to the system control unit (not shown) as a read request to the shared memory space 16b of the first node 1.

【００５８】The system control unit recognizes that the read request is addressed to another node (in this case, the first node 1) and issues the read request to the crossbar switch 5 (see 42 in FIG. 3).

【００５９】The crossbar switch 5 in turn recognizes that this read request is addressed to the first node 1 and issues the read request to the system control unit 14 of the first node 1 (see 43 in FIG. 3).

【００６０】On receiving the read request from the crossbar switch 5, the system control unit 14 of the first node 1 reads the shared memory space 16b via the memory control unit 15 (see 44 and 45 in FIG. 3).

【００６１】Data corresponding to the read request is read from the memory unit 16 (see 46 in FIG. 3), and the data read from the shared memory space 16b is returned to the system control unit 14 via the memory control unit 15 (see 47 in FIG. 3). In this embodiment, the data read from the memory unit 16 is assumed to be in an uncorrectable 2-bit error state.

【００６２】The system control unit 14 checks for data errors using the ECC while receiving the read data from the shared memory space 16b, and this check detects a 2-bit error (see 48 in FIG. 3).

【００６３】When it detects the 2-bit error, the system control unit 14 also recognizes that the erroneous data is to be returned to another node. From the point at which the 2-bit error is detected, it replaces the data to be returned with a fixed "0" value + ECC, in which the 2-bit error has been corrected, and returns that data to the crossbar switch 5.

【００６４】At this time, the first node 1 uses a failure notification signal line (not shown) to report the failure to a service processor, which collects information from inside the device (not shown) at the time of the failure and performs post-processing after the failure.

【００６５】The data in which the 2-bit error has been corrected and replaced with the fixed "0" value + ECC passes through the crossbar switch 5 (see 49 in FIG. 3) and is returned to the MPU via the system control unit of the third node 3 (see 50 in FIG. 3).

【００６６】At this point, the read request to the communication area issued earlier by the cluster driver 31 of the third node 3 completes. On completion of this read request, the cluster driver 31 checks the read data using the sum (see 51 in FIG. 3).

【００６７】In this check, a sum mismatch occurs because the data has been changed to "0" partway through. The cluster driver 31 thereby recognizes that a failure has occurred in the first node 1 and the received data is invalid, and that node-down processing is required so that the first node 1 is not accessed from then on. The cluster driver 31 therefore discards the read data (see 52 in FIG. 3) and performs down processing for the first node 1 (see 51 in FIG. 3).
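The sum check performed by the cluster driver can be sketched as follows. The specification does not state how the sum is computed, so everything here is an assumption made for illustration: a 32-bit additive sum, a hypothetical non-zero seed `SUM_SEED`, and the helper names `add_sum`, `check_sum` and `zero_from`. The seed is chosen so that a message consisting entirely of zeros does not pass the check by accident.

```python
SUM_SEED = 0x5A5A5A5A   # hypothetical non-zero seed (assumption, not from the source)
MASK = 0xFFFFFFFF       # keep the sum within 32 bits


def compute_sum(words):
    """Additive 32-bit sum over the payload words, starting from SUM_SEED."""
    total = SUM_SEED
    for w in words:
        total = (total + w) & MASK
    return total


def add_sum(payload):
    """Sender side: append the sum to the communication data."""
    return list(payload) + [compute_sum(payload)]


def check_sum(message):
    """Receiver side: True when the stored sum matches the payload."""
    *payload, stored = message
    return compute_sum(payload) == stored


def zero_from(message, index):
    """Model the sending node replacing every word from `index` on with 0."""
    return message[:index] + [0] * (len(message) - index)
```

With this sketch, a message built by `add_sum` passes `check_sum`, while the same message after `zero_from` fails it, which is the condition on which the receiving driver would discard the data and start node-down processing.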

【００６８】Next, the operation by which the system control unit 14 of the first node 1 checks for data errors using the ECC is described with reference to FIGS. 2 and 4. Referring to FIG. 4, the input register 20 receives data containing a 2-bit error from the control unit that is the data source (in this case, the memory control unit 15) and stores it at time T0 on the falling edge of the clock. At the same time, it begins outputting the data containing the 2-bit error to the ECC error detection circuit 21 and the CRCT circuit 22.

【００６９】Because the data stored in the input register 20 is sent to another node via the crossbar switch 5, the node notification signal 202 has the value "1", indicating another node.

【００７０】At time T1, the ECC error detection circuit 21 detects the 2-bit error and outputs "1", indicating a 2-bit error, on the 2-bit error detection signal 201. Because the node notification signal 202 is "1" at this time, indicating another node, the output of the OR gate 24 and the output of the AND gate 25 both become "1", instructing the selector 26 to output "0" + ECC data as the output data of the extended ECC circuit 14a.

【００７１】At time T2, the selector 26 selects the "0" + ECC data as its output and continues to output it as the output data of the extended ECC circuit 14a until time T3, when the next data is latched into the input register 20.

【００７２】At time T3, the error holding register 23 latches the value "1" output on the 2-bit error detection signal 201 and thereafter continues to output "1" until the data transfer request completes. As a result, the outputs of the OR gate 24 and the AND gate 25 remain "1", and the selector 26 keeps the output data of the extended ECC circuit 14a at "0" + ECC until the data transfer request completes.
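The behaviour over a whole data transfer (times T0 to T3 and beyond) can be approximated by a beat-by-beat simulation. This is a simplified sketch, not a cycle-accurate model of FIG. 4: the function `simulate_transfer` and its arguments are hypothetical, each list element stands for one data beat, and `zero_word` stands in for the "0" + ECC word.

```python
def simulate_transfer(words, error_flags, dest_is_other_node=True, zero_word=0):
    """Return the words leaving the extended ECC circuit for one transfer.

    words[i]       -- CRCT-side data for beat i
    error_flags[i] -- True when a 2-bit error is detected on beat i
    """
    error_held = False   # error holding register 23, cleared for each transfer
    out = []
    for word, two_bit_error in zip(words, error_flags):
        or_24 = two_bit_error or error_held          # OR gate 24
        and_25 = or_24 and dest_is_other_node        # AND gate 25
        out.append(zero_word if and_25 else word)    # selector 26
        # Register 23 latches the error and holds it until the transfer ends.
        error_held = error_held or two_bit_error
    return out
```

For example, with an error flagged on the second beat of a four-beat transfer to another node, the first beat is passed through and every later beat is replaced by `zero_word`, matching the description that the replacement continues until the data transfer request completes.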

【００７３】In this way, by replacing the uncorrectable 2-bit error data of the own node with data whose data portion is "0" and whose ECC matches it, the 2-bit error can be eliminated, so the failure of the own node is not propagated to other nodes.

【００７４】In addition, by attaching to the inter-node communication data a sum that verifies the validity of the data, and checking the data against the sum on reception, improper operation in other nodes, such as data corruption caused by a failure of the own node, can be prevented.

【００７５】Furthermore, an inter-node communication scheme based on distributed shared memory has the advantage of fast data transfer and the disadvantage that uncorrectable failures, such as 2-bit memory errors, propagate easily. By combining the replacement of 2-bit error data with the validity check of inter-node communication data, the advantage is kept and only the disadvantage is removed, so a highly reliable cluster system can be built while high-speed communication between nodes is maintained.

【００７６】[0076]

[Effect of the Invention] As described above, according to the present invention, in an information processing system having a cluster configuration in which communication between a plurality of nodes is performed using a shared memory, when an uncorrectable failure occurs in data communicated between the nodes, propagation of the faulty data to the node on the data receiving side is suppressed, and data received in communication between the nodes is checked for being invalid. As a result, a failure of the own node is not propagated to other nodes, improper operation in other nodes such as data corruption caused by the failure of the own node is prevented, and a highly reliable cluster system can be built while high-speed communication between nodes is maintained.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of an information processing system according to an embodiment of the present invention.

FIG. 2 is a block diagram showing a configuration example of the extended ECC circuit of FIG. 1.

FIG. 3 is a timing chart showing the operation of the information processing system according to an embodiment of the present invention.

FIG. 4 is a timing chart showing the operation of the extended ECC circuit shown in FIG. 2.

FIG. 5 is a block diagram showing the configuration of a conventional information processing system.

[Explanation of Symbols]

1 First node
2 Second node
3 Third node
4 Fourth node
5 Crossbar switch
11-1 to 11-n MPU
13 IO control unit
14 System control unit
14a Extended ECC circuit
15 Memory control unit
16 Memory unit
16a Private memory space
16b Shared memory space
17, 31 Cluster driver
17a, 31a Sum addition function
17b, 31b Sum check function
20 Input register
21 ECC error detection circuit
22 CRCT circuit
23 Error holding register
24 OR gate
25 AND gate
26 Selector

Continuation of front page (58) Fields searched (Int. Cl.7, DB name): H04L 29/14, G06F 13/00 353, G06F 15/177 678

Claims

(57) [Claims]

[Claim 1] An information processing system having a cluster configuration in which communication between a plurality of nodes is performed using a shared memory, wherein each of the plurality of nodes comprises: checking means for checking whether data received in communication between the plurality of nodes is invalid data; and suppression means for, when an uncorrectable failure occurs in data communicated between the plurality of nodes, replacing the faulty data with data that the checking means detects as the invalid data, thereby suppressing propagation of the faulty data to the node on the data receiving side.

[Claim 2] The information processing system according to claim 1, wherein, when the uncorrectable failure occurs, the suppression means replaces the faulty data with a preset fixed value that the checking means detects as the invalid data, and outputs the result.

[Claim 3] The information processing system according to claim 1 or claim 2, wherein the checking means includes means for adding checksum data when communication data is written to the shared memory, and means for detecting an error in the data by means of the checksum when communication data is read from the shared memory of another node.

[Claim 4] An information processing system having a cluster configuration in which communication between a plurality of nodes by a shared memory via a crossbar switch is performed by a node writing communication data to its own shared memory and another node reading that communication data from the shared memory, wherein each of the plurality of nodes comprises: checking means for checking whether data received in communication between the plurality of nodes is invalid data; and suppression means for, when an uncorrectable failure occurs in data communicated between the plurality of nodes, replacing the faulty data with data that the checking means detects as the invalid data, thereby suppressing propagation of the faulty data to the node on the data receiving side.

[Claim 5] The information processing system according to claim 4, wherein, when the uncorrectable failure occurs, the suppression means replaces the faulty data with a preset fixed value that the checking means detects as the invalid data, and outputs the result.

[Claim 6] The information processing system according to claim 4 or claim 5, wherein the checking means includes means for adding checksum data when communication data is written to the shared memory, and means for detecting an error in the data by means of the checksum when communication data is read from the shared memory of another node.

[Claim 7] A failure processing method for an information processing system having a cluster configuration in which communication between a plurality of nodes is performed using a shared memory, wherein, in each of the plurality of nodes, when an uncorrectable failure occurs in data communicated between the plurality of nodes, the faulty data is replaced with data that is detected as invalid data, thereby suppressing propagation of the faulty data to the node on the data receiving side, and data received in communication between the plurality of nodes is checked as to whether it is the invalid data.

[Claim 8] The failure processing method for an information processing system according to claim 7, wherein, when the uncorrectable failure occurs, propagation of the faulty data to the node on the data receiving side is suppressed by replacing the faulty data with a preset fixed value that is detected as the invalid data and outputting the result.

[Claim 9] The failure processing method for an information processing system according to claim 7 or claim 8, wherein the check for the invalid data is performed by adding checksum data when communication data is written to the shared memory and detecting an error in the data by means of the checksum when communication data is read from the shared memory of another node.

[Claim 10] A failure processing method for an information processing system having a cluster configuration in which communication between a plurality of nodes by a shared memory via a crossbar switch is performed by a node writing communication data to its own shared memory and another node reading that communication data from the shared memory, wherein, in each of the plurality of nodes, when an uncorrectable failure occurs in data communicated between the plurality of nodes, the faulty data is replaced with data that is detected as invalid data, thereby suppressing propagation of the faulty data to the node on the data receiving side, and data received in communication between the plurality of nodes is checked as to whether it is the invalid data.

[Claim 11] The failure processing method for an information processing system according to claim 10, wherein, when the uncorrectable failure occurs, propagation of the faulty data to the node on the data receiving side is suppressed by replacing the faulty data with a preset fixed value that is detected as the invalid data and outputting the result.

[Claim 12] The failure processing method for an information processing system according to claim 10 or claim 11, wherein the check for the invalid data is performed by adding checksum data when communication data is written to the shared memory and detecting an error in the data by means of the checksum when communication data is read from the shared memory of another node.