JP2000010949A

JP2000010949A - Relay type decentralized health check control system and method

Info

Publication number: JP2000010949A
Application number: JP10173178A
Authority: JP
Inventors: Takeo Sakakibara; 健夫榊原
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1998-06-19
Filing date: 1998-06-19
Publication date: 2000-01-14

Abstract

PROBLEM TO BE SOLVED: To prevent the load on the monitoring computer from increasing as the number of computers connected to a network grows. This is done by distributing the transmission of health check control messages among the computers, having each computer check only one other machine, and sending only notification messages to the monitoring master. SOLUTION: The transmission of health check control messages is distributed among the computers connected to the network, each computer checks only one machine, and only notification messages such as fault notices are sent to the monitoring master. For example, computers 100-1 to 100-5 are connected to the network 101, and the computer 100-1 serves as the monitoring master A. A computer that sends a startup notification is instructed to start a health check on the computer that sent a startup notification immediately before it. The monitoring master A itself performs a health check on the computer that most recently sent a startup notification.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a fault monitoring system for a network to which a plurality of computers are connected. More particularly, it relates to a relay-type distributed health check control system and method that reduce the load on the monitoring computer and make effective use of the network.

[0002]

2. Description of the Related Art Conventionally, in a distributed system or similar system in which a plurality of computers are connected to a network, faults are monitored by health checks that determine whether a communication destination device is able to communicate. A monitoring server is provided, and it sends a health check control message to each computer on the network, one computer at a time.

[0003] Consequently, as the number of computers connected to the network increases, the number of health check targets increases accordingly, and so does the processing load on the monitoring server.

[0004] In addition, because messages in this conventional scheme converge on the monitoring server, network traffic also concentrates there, and communication performance deteriorates as the number of computers connected to the network increases.

[0005]

PROBLEMS TO BE SOLVED BY THE INVENTION As described above, the conventional network system has the following problems.

[0006] The first problem is that, as the number of computers connected to the network increases, the number of health check targets grows, the processing load on the monitoring server increases, and the diagnosis time becomes longer.

[0007] The second problem is that, because messages are concentrated at the monitoring server, the transfer performance of the network decreases as the number of computers connected to the network increases.

[0008] Accordingly, the present invention has been made in view of the above problems. Its object is to provide a health check control system and method in which the load on the monitoring computer does not increase even as the number of computers connected to the network grows, and in which degradation of network performance is avoided.

[0009]

MEANS FOR SOLVING THE PROBLEM To achieve the above object, according to the present invention, at least one of a plurality of computers connected on a network serves as a monitoring master, and each of the other computers sends a startup notification to the monitoring master when it starts. The monitoring master instructs the first computer to send a startup notification to perform a health check toward the monitoring master, and the monitoring master itself performs a health check toward the first computer. The monitoring master then instructs the second computer to send a startup notification to perform a health check toward the first computer, and switches its own health check to the second computer. In the same manner, in the order in which startup notifications are received, the monitoring master instructs the n-th computer to perform a health check toward the (n−1)-th computer and performs switching control so that it performs its own health check toward the n-th computer.
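The join procedure above amounts to a pointer update on a "who checks whom" map. The following is a minimal Python sketch, assuming the master's bookkeeping is a plain dictionary `check_target` (checker → computer it checks); the class and method names are hypothetical and not taken from the patent. Each new computer is told to check the computer the master is currently checking (the previous joiner, or the master itself for the first joiner), and the master then switches to the new computer.

```python
class MonitoringMaster:
    """Hypothetical model of the monitoring master's startup handling."""

    def __init__(self, name: str) -> None:
        self.name = name
        # check_target[x] is the computer that x sends health checks to.
        self.check_target: dict[str, str] = {}

    def on_startup(self, computer: str) -> str:
        """Handle a startup notification; return the target the new computer is told to check."""
        # The master currently checks the most recent joiner; before anyone joins, use the master itself.
        target = self.check_target.get(self.name, self.name)
        self.check_target[computer] = target      # health check start instruction to the new computer
        self.check_target[self.name] = computer   # the master switches to the newest computer
        return target


master = MonitoringMaster("A")
for computer in ["B", "C", "D", "E"]:
    master.on_startup(computer)
print(sorted(master.check_target.items()))
# [('A', 'E'), ('B', 'A'), ('C', 'B'), ('D', 'C'), ('E', 'D')]
```

The final map is the relay chain described for FIG. 3: A checks E, E checks D, and so on back to A.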

[0010]

DESCRIPTION OF THE PREFERRED EMBODIMENTS Embodiments of the present invention are described below. In a preferred embodiment, at least one of a plurality of computers connected on a network serves as the monitoring master (A in FIG. 1), and each of the other computers (B to E in FIG. 1) sends a startup notification to the monitoring master when it starts. The monitoring master instructs the first computer to send a startup notification (B in FIG. 1) to perform a health check toward the monitoring master, and the monitoring master performs a health check toward that first computer (B in FIG. 1). The monitoring master then instructs the second computer to send a startup notification (C in FIG. 2) to perform a health check toward the first computer (B in FIG. 1), and switches its own health check to the second computer (C in FIG. 2). In the same manner, the monitoring master instructs the n-th computer that sends a startup notification to perform a health check toward the (n−1)-th computer, and performs switching control so that it performs its own health check toward the n-th computer.

[0011] The monitoring master has a management table (102 in FIG. 1) that records the health check destination of each computer and the status of each checked computer.

[0012] Health checks are performed at regular intervals to detect communication failures among the computers on a network. Because each computer performs a health check on another computer, the processing load on the monitoring server does not grow with the number of computers making up the LAN (local area network) or WAN (wide area network), as it does in the conventional scheme in which only a specific monitoring server performs health checks. Nor do health check control messages concentrate at a specific location.
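The load argument can be made concrete by counting how many health check requests each computer sends per check interval. The sketch below is a simplified model, assuming one request per (checker, target) pair per interval and ignoring replies and notification messages; the function names are illustrative only.

```python
from collections import Counter


def centralized_pairs(nodes: list[str]) -> list[tuple[str, str]]:
    """Conventional scheme: the monitoring server (nodes[0]) checks every other computer."""
    monitor, *others = nodes
    return [(monitor, other) for other in others]


def relay_pairs(nodes: list[str]) -> list[tuple[str, str]]:
    """Relay scheme: each computer checks the one that started before it; the monitor checks the last."""
    pairs = [(nodes[i], nodes[i - 1]) for i in range(1, len(nodes))]
    pairs.append((nodes[0], nodes[-1]))
    return pairs


def max_sent_by_one_computer(pairs: list[tuple[str, str]]) -> int:
    """Largest number of health check requests any single computer sends per interval."""
    return max(Counter(checker for checker, _ in pairs).values())


nodes = [f"PC{i}" for i in range(1000)]
print(max_sent_by_one_computer(centralized_pairs(nodes)))  # 999
print(max_sent_by_one_computer(relay_pairs(nodes)))        # 1
```

In this model the total number of requests grows roughly linearly with the number of computers in both schemes (999 versus 1000 here); what changes is that no single computer's share grows.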

[0013] In the embodiment of the present invention, the system may also be configured as follows. When a failure occurs in one of the computers, the computer (D in FIG. 4) that has been performing a health check on the failed computer (C in FIG. 4) detects the failure and sends a network disconnection notification to the monitoring master. On receiving the disconnection notification, the monitoring master looks up the computer that the failed computer (C in FIG. 4) had been health-checking. It then issues a health check destination change instruction to the computer that sent the disconnection notification (D in FIG. 4), so that this computer performs a health check on that destination in place of the failed computer.

[0014] In the embodiment of the present invention, the system may also be configured as follows. The monitoring master (A in FIG. 6) transfers the management table (102 in FIG. 6), which records the health check destination of each computer and the status of each checked computer, to the computer (B in FIG. 6) that performs a health check on the monitoring master itself. Whenever the management table is updated on the monitoring master, the master notifies that computer (B in FIG. 6) of the update. When the monitoring master fails, the computer that has been health-checking it (B in FIG. 6) refers to the management table taken over from the monitoring master and starts a health check on the computer that the monitoring master had been checking (E in FIG. 7). It also sends a monitoring master change notification to that computer (E in FIG. 7). Each computer on the network relays the monitoring master change notification to its own health check destination, and from then on the computer (B in FIG. 7) takes over the role of the monitoring master.

[0015] According to the present invention, health check monitoring can be performed without degrading the fault detection performance of the monitor, even when the number of computers constituting the network increases.

[0016]

EXAMPLES To describe the above embodiment of the present invention in more detail, examples of the invention are described below with reference to the drawings. FIGS. 1 to 8 are diagrams for explaining one example of the present invention.

[0017] Referring to FIG. 1, a plurality of computers 100-1 to 100-5 are connected to a network 101. A predetermined one of them (100-1 in the figure) serves as the monitoring master A that performs monitoring.

[0018] The other computers B to E connected to the network each send a startup notification to the monitoring master A when they start.

[0019] First, when computer B sends startup notification 1, the monitoring master A starts a health check toward computer B and issues health check start instruction 2 to computer B, so that computer B starts a health check toward the monitoring master A. The monitoring master A then records the status of computer B in the management hash table 102 provided in the monitoring master A. The management hash table stores each computer name, its hash value, the hash value of its check destination, failure information, and the like. Referring to FIG. 1, the management hash table 102 shows that the check destination of computer A is computer B (hash value 8) and that the check destination of computer B is computer A (hash value 3).
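A minimal Python model of such a management hash table is sketched below, assuming entries are keyed by the hash of the computer name and store the name, the hash of the check destination, and a failure flag. The patent does not name the hash function that yields 8 for B and 3 for A, so `zlib.crc32` is used here purely as a stand-in; it will not reproduce those values, and hash collisions are not handled. All class and function names are hypothetical.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


def name_hash(name: str) -> int:
    # Stand-in hash function (assumption); the patent does not specify one.
    return zlib.crc32(name.encode("utf-8"))


@dataclass
class Entry:
    name: str
    hash_value: int
    check_target_hash: Optional[int] = None  # hash of the computer this one checks
    failed: bool = False                     # failure information ("×" in FIG. 4)


class ManagementTable:
    def __init__(self) -> None:
        self.entries: dict[int, Entry] = {}

    def register(self, name: str) -> Entry:
        h = name_hash(name)
        return self.entries.setdefault(h, Entry(name, h))

    def set_check_target(self, checker: str, target: str) -> None:
        self.register(checker).check_target_hash = self.register(target).hash_value

    def check_target_of(self, name: str) -> Optional[str]:
        h = self.entries[name_hash(name)].check_target_hash
        return None if h is None else self.entries[h].name


table = ManagementTable()
table.set_check_target("A", "B")  # state of FIG. 1: A checks B ...
table.set_check_target("B", "A")  # ... and B checks A
print(table.check_target_of("A"), table.check_target_of("B"))  # B A
```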

[0020] Next, as shown in FIG. 2, when computer C sends startup notification 4, the monitoring master A sends instruction 5 to computer C to start a health check toward computer B, and switches its own health check to computer C. Referring to FIG. 2, the management hash table shows that the check destination of computer A has switched to computer C and that the check destination of computer C is computer B. The check destination of computer B remains computer A.

[0021] In the same manner, each computer that sends a startup notification is instructed to start a health check toward the computer that sent a startup notification immediately before it, and the monitoring master A itself performs a health check on the computer that most recently sent a startup notification. As a result, during normal operation a relay-type health check is performed, as shown schematically in FIG. 3. Referring to FIG. 3, the management hash table 102 shows that the check destination of computer A has switched to computer E and that the check destination of computer E is computer D.
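One property implied by FIG. 3 is that following the check destinations from the monitoring master visits every computer exactly once and leads back to the master. The sketch below tests that property on a plain dictionary `check_target` (checker → computer it checks); the function is an illustrative consistency check under that assumed representation, not a step described in the patent.

```python
def is_single_relay_ring(check_target: dict[str, str], master: str) -> bool:
    """True if following check targets from `master` visits every computer once and returns to `master`."""
    seen: set[str] = set()
    node = master
    while node not in seen:
        seen.add(node)
        nxt = check_target.get(node)
        if nxt is None:        # a computer with no check destination breaks the chain
            return False
        node = nxt
    return node == master and seen == set(check_target)


fig3 = {"A": "E", "E": "D", "D": "C", "C": "B", "B": "A"}
print(is_single_relay_ring(fig3, "A"))  # True
```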

[0022] Next, the operation when a failure occurs in a computer on the network (computer C in FIG. 4), as shown in FIG. 4, is described.

[0023] First, computer D, which has been performing a health check on computer C, detects the failure and sends network disconnection notification 6 to the monitoring master A.

[0024] On receiving this notification, the monitoring master A recognizes that computer C is in a failed state and, referring to the management hash table, sets computer C to the failed state (indicated by "×" in the management hash table 102 of FIG. 4). At the same time, it looks up the health check destination that computer C had been checking. In this case computer C had been checking computer B, so the monitoring master A issues health check destination change instruction 7 to computer D, which sent the disconnection notification, so that computer D performs its health check on computer B from then on.
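The master's reaction to a disconnection notification is a single splice of the chain. A minimal sketch, assuming the table is modeled as a hypothetical `check_target` dictionary (checker → computer it checks) plus a `failed` set standing in for the "×" marks; the function name is illustrative.

```python
def on_disconnection_notice(check_target: dict[str, str], failed: set[str],
                            reporter: str, failed_computer: str) -> str:
    """Handle a network disconnection notification sent by `reporter` about `failed_computer`.

    Returns the new check destination given to the reporter
    (the health check destination change instruction).
    """
    failed.add(failed_computer)                 # mark the computer as failed in the table
    new_target = check_target[failed_computer]  # whom the failed computer had been checking
    check_target[reporter] = new_target         # the reporter now checks that computer instead
    return new_target


ring = {"A": "E", "E": "D", "D": "C", "C": "B", "B": "A"}
failed: set[str] = set()
print(on_disconnection_notice(ring, failed, reporter="D", failed_computer="C"))  # B
```

After the call, the live chain is A → E → D → B → A, as in FIG. 4. In this sketch the reporter may also be the master itself, for the case where the master detects the failure of the computer it checks.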

[0025] Thereafter, as shown in FIG. 5, when computer C recovers, computer C first sends startup completion notification 8 to the monitoring master A. The monitoring master A updates the management hash table 102 and issues instruction 9 to computer C to perform a health check on computer E, which the monitoring master A is currently checking. The monitoring master A then switches its own health check destination from computer E to computer C, which has restarted after recovering from the failure. This is the same operation as when a new computer starts up.
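Because recovery is handled like a fresh startup, the update is the same pointer swap as a join, plus clearing the failure mark. A minimal sketch, assuming the table is a hypothetical `check_target` dictionary (checker → computer it checks) and failure marks are kept in a `failed` set; the function name is illustrative.

```python
def on_recovery(check_target: dict[str, str], failed: set[str],
                master: str, recovered: str) -> str:
    """Handle a startup completion notification from a computer that has recovered."""
    failed.discard(recovered)              # clear the failure mark
    target = check_target[master]          # the computer the master is currently checking (E)
    check_target[recovered] = target       # instruction 9: the recovered computer checks it
    check_target[master] = recovered       # the master switches to the recovered computer
    return target


# State after FIG. 4: C failed and D was redirected to B.
ring = {"A": "E", "E": "D", "D": "B", "C": "B", "B": "A"}
failed = {"C"}
print(on_recovery(ring, failed, master="A", recovered="C"))  # E
# The chain is now A -> C -> E -> D -> B -> A.
```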

[0026] Next, operation that takes into account a failure of the monitoring master A itself is described.

[0027] As shown in FIG. 6, the monitoring master A transfers the management hash table 102 described above (10 in FIG. 6) to computer B, which is performing a health check on the monitoring master A; that is, to the computer that sent the first startup notification to the monitoring master A. Computer B thus also holds the management hash table. Each time the monitoring master A updates the management hash table, it notifies computer B of the update, which keeps the two copies consistent.
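The replication scheme is a full copy followed by incremental updates. The sketch below models it in-process, assuming synchronous, loss-free delivery of each update notice; message transport, ordering, and retries are not specified in the patent and are not modeled. Class names are hypothetical, and the table is simplified to a checker → check destination map.

```python
from typing import Optional


class TableReplica:
    """Copy of the management table held by the computer that checks the master (B)."""

    def __init__(self, snapshot: dict[str, str]) -> None:
        self.table = dict(snapshot)          # initial table transfer (10 in FIG. 6)

    def apply_update(self, checker: str, target: str) -> None:
        self.table[checker] = target


class ReplicatingMaster:
    def __init__(self) -> None:
        self.table: dict[str, str] = {}
        self.replica: Optional[TableReplica] = None

    def attach_replica(self) -> TableReplica:
        self.replica = TableReplica(self.table)
        return self.replica

    def update(self, checker: str, target: str) -> None:
        self.table[checker] = target
        if self.replica is not None:         # notify the replica of every change
            self.replica.apply_update(checker, target)


master = ReplicatingMaster()
master.update("B", "A")
master.update("A", "B")
backup = master.attach_replica()            # B becomes the holder of the copy
master.update("C", "B")
master.update("A", "C")
print(backup.table == master.table)         # True
```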

[0028] When the monitoring master A fails, as shown in FIG. 7, computer B, which is performing a health check on the monitoring master A, detects the failure of the monitoring master A. From then on, computer B plays the role of the monitoring master.

[0029] First, computer B refers to the management hash table taken over from the monitoring master A (11 in FIG. 7), starts a health check on computer E, which the monitoring master A had been checking, and sends monitoring master change notification 12 to computer E. This monitoring master change notification is treated as a notification addressed to all computers: each computer on the network relays the monitoring master change notification 13 to its own health check destination.
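The takeover and the relayed change notice can be sketched on a hypothetical `check_target` dictionary (checker → computer it checks). The sketch assumes the new master already holds an up-to-date copy of the table and that each computer forwards the notice once to its own check destination; the relay stops when the notice returns to the new master. The guard against a broken chain is an addition for safety, not part of the described procedure.

```python
def take_over_master(check_target: dict[str, str], old_master: str, new_master: str) -> list[str]:
    """`new_master` has detected that `old_master` failed.

    Returns the computers that receive the relayed monitoring master change notification, in order.
    The old master's entry is left in place here; a real table would mark it as failed.
    """
    first = check_target[old_master]   # computer the old master was checking (E)
    if first == new_master:            # no other computers remain to be notified
        return []
    check_target[new_master] = first   # the new master starts checking it
    received: list[str] = []
    node = first
    for _ in range(len(check_target)):
        if node == new_master:         # the notice has come back around the chain
            break
        received.append(node)          # this computer learns the new master ...
        node = check_target[node]      # ... and forwards the notice to its check destination
    else:
        raise RuntimeError("change notification did not return to the new master")
    return received


ring = {"A": "E", "E": "D", "D": "C", "C": "B", "B": "A"}
print(take_over_master(ring, old_master="A", new_master="B"))  # ['E', 'D', 'C']
```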

[0030] Computer B also identifies, from the management hash table, the computer that is performing a health check on computer B itself (computer C in FIG. 7) and transfers the management hash table to it. From then on, each time the table is updated, computer B notifies computer C of the update, and computer C updates its management hash table accordingly. In this way, even if the monitoring master fails, state management is handed over from one computer to the next.
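Finding the computer that should receive the table copy is a reverse lookup in the check-destination map. A small sketch, assuming a hypothetical `check_target` dictionary (checker → computer it checks) and a set of computers known to have failed, so that a stale entry for the old master is skipped; the function name is illustrative.

```python
from typing import Optional


def find_table_holder(check_target: dict[str, str], master: str,
                      failed: frozenset[str] = frozenset()) -> Optional[str]:
    """Return the live computer whose check destination is `master`, or None if there is none."""
    for computer, target in check_target.items():
        if target == master and computer != master and computer not in failed:
            return computer
    return None


# State after the takeover in FIG. 7: B is master and checks E; A has failed.
state = {"A": "E", "B": "E", "E": "D", "D": "C", "C": "B"}
print(find_table_holder(state, "B", failed=frozenset({"A"})))  # C
```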

[0031] A computer that was not connected to the network when the monitoring master A failed may not be aware of the change of monitoring master. When such a computer starts, its startup notification may therefore fail to reach the new monitoring master.

[0032] Therefore, in one example of the present invention, such a computer first sends a startup completion notification to the monitoring master A. If that fails, it sends a broadcast notification to find the computer that has taken over the monitoring master role, and then receives a monitoring master change notification and a health check destination instruction from that computer.
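The fallback for a computer that missed the change can be sketched with the network operations passed in as functions. Both `send` and `broadcast` are hypothetical stand-ins for whatever transport is used (for example, UDP unicast and broadcast), and the message strings are invented for illustration; the sketch only shows the order of attempts described above.

```python
from typing import Callable, Optional


def announce_startup(me: str, last_known_master: str,
                     send: Callable[[str, str], bool],
                     broadcast: Callable[[str], Optional[str]]) -> Optional[str]:
    """Send a startup completion notification; return the computer acting as master, or None.

    send(dest, message) returns True if `dest` accepted the message.
    broadcast(message) returns the name of the computer that answers as master, if any.
    """
    if send(last_known_master, f"startup-complete {me}"):
        return last_known_master
    # The master we knew about did not answer: ask all computers who has taken over.
    return broadcast(f"who-is-master {me}")


# Simulated network in which A has failed and B has taken over.
alive_master = "B"
result = announce_startup(
    "F", "A",
    send=lambda dest, msg: dest == alive_master,
    broadcast=lambda msg: alive_master,
)
print(result)  # B
```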

[0033] Next, when the monitoring master A recovers, as shown in FIG. 8, it first sends monitoring master recovery notification 14 to computer B, which had been holding a copy of the management table before the failure.

[0034] The monitoring master A receives the latest management hash table from computer B (15) and issues monitoring master change notification 16. It then instructs computer B to change its health check destination to the monitoring master A, and the monitoring master A itself starts a health check on the computer that computer B had been checking.
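The hand-back described in [0034] is again a two-pointer update, applied to the latest table received from computer B. A minimal sketch, assuming that table is represented as a hypothetical `check_target` dictionary (checker → computer it checks) and that the change notification itself is relayed as in [0029]; the function name is illustrative.

```python
def restore_master(check_target: dict[str, str], recovered_master: str, interim_master: str) -> str:
    """The recovered master takes its role back from the interim master.

    `check_target` is the latest table received from the interim master (15 in FIG. 8).
    Returns the computer the recovered master starts checking.
    """
    interim_target = check_target[interim_master]     # the computer B had been checking (E)
    check_target[interim_master] = recovered_master   # change instruction: B now checks A
    check_target[recovered_master] = interim_target   # A takes over B's former check destination
    return interim_target


table_from_b = {"B": "E", "E": "D", "D": "C", "C": "B", "A": "E"}  # A's entry is stale
print(restore_master(table_from_b, recovered_master="A", interim_master="B"))  # E
```

Applied to the state after the takeover in FIG. 7, this restores the chain of FIG. 3 (A checks E, E checks D, D checks C, C checks B, B checks A).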

[0035] If computer B does not have the management hash table, the monitoring master A next sends a broadcast notification to find the computer that has taken over the monitoring master role, and then performs the same operation as described above.

[0036]

EFFECTS OF THE INVENTION As described above, according to the present invention, the transmission of health check control messages is distributed among the computers connected to the network, each computer checks only one machine, and only notification messages such as fault reports are sent to the monitoring master. This keeps the monitoring master's processing load from growing and limits the impact on network load while health check monitoring is performed.

[Brief description of the drawings]

FIG. 1 is a diagram for explaining an embodiment of the present invention.

FIG. 2 is a diagram for explaining an embodiment of the present invention.

FIG. 3 is a diagram for explaining an embodiment of the present invention.

FIG. 4 is a diagram for explaining an embodiment of the present invention.

FIG. 5 is a diagram for explaining an embodiment of the present invention.

FIG. 6 is a diagram for explaining an embodiment of the present invention.

FIG. 7 is a diagram for explaining an embodiment of the present invention.

FIG. 8 is a diagram for explaining an embodiment of the present invention.

[Explanation of symbols]

1, 4, 8: Startup notification
2, 5: Health check start instruction
6: Disconnection notification
7, 9: Health check destination change instruction
11: Management table reference/update
12, 13, 16: Server change notification
14: Recovery notification
15: Management table transfer
A, E: Computers
100-1 to 100-5: Computers (A to E)
101: Network
102: Management hash table

Claims

[Claims]

1. A health check control method, wherein at least one of a plurality of computers connected on a network serves as a monitoring master, and each of the other computers sends a startup notification to the monitoring master when it starts; the monitoring master instructs the first computer that has sent the startup notification to perform a health check toward the monitoring master, and the monitoring master performs a health check toward the first computer; the monitoring master instructs the second computer that next sends a startup notification to perform a health check toward the first computer, and the monitoring master switches to performing a health check toward the second computer; and, in the same manner, in the order in which startup notifications are sent, the monitoring master instructs the n-th computer to perform a health check toward the (n−1)-th computer and performs switching control so that the monitoring master performs a health check toward the n-th computer.

2. The health check control method according to claim 1, wherein the monitoring master records and manages, in a management table, the health check destination of each computer and the status of each checked computer.

3. The health check control method according to claim 1, wherein, when a failure occurs in one of the plurality of computers, the computer that has been performing a health check on the failed computer detects the failure and sends a network disconnection notification to the monitoring master; and the monitoring master that has received the disconnection notification looks up the computer that the failed computer had been health-checking and issues, to the computer that sent the disconnection notification, a health check destination change instruction to perform a health check on that computer in place of the failed computer.

4. The health check control method according to claim 3, wherein, when the failed computer recovers from the failure, that computer (referred to as the "recovered computer") first sends a startup completion notification to the monitoring master; the monitoring master instructs the recovered computer to perform a health check on the computer that the monitoring master is currently health-checking; and the monitoring master switches its health check destination from that computer to the recovered computer.

5. The health check control method according to claim 1, wherein the monitoring master transfers a management table, which records the health check destination of each computer and the status of each checked computer, to the computer performing a health check on the monitoring master itself (referred to as the "first computer"); whenever the management table is updated on the monitoring master, the monitoring master notifies the first computer of the update information, and the first computer updates the management table on its own device with the update information; when the monitoring master fails, the first computer, which is performing a health check on the monitoring master, refers to the management table taken over from the monitoring master, starts a health check on the computer that the monitoring master had been health-checking, and sends a monitoring master change notification to that computer; each computer on the network relays the monitoring master change notification to its own health check destination; and thereafter the first computer takes over the role of the monitoring master.

6. The health check control method according to claim 5, wherein the computer that has taken over as monitoring master identifies, from the management table, the computer that is performing a health check on itself and transfers the management table to that computer; and thereafter, each time the management table is updated, the new monitoring master notifies that computer of the update information, and that computer updates its own management table accordingly.

7. The health check control method according to claim 6, wherein, when a computer attempts to send a startup completion notification to the computer that has taken over as monitoring master and the attempt fails, that computer sends a broadcast notification to find the computer that has taken over as monitoring master, and receives a monitoring master change notification and a health check destination instruction from the computer that has taken over as monitoring master.

8. The health check control method according to claim 5, wherein, when the failed monitoring master recovers (this monitoring master is referred to as the "recovered monitoring master"), it sends a monitoring master recovery notification to the computer that has taken over as monitoring master, receives the latest management table from that computer, issues a monitoring master change notification, and instructs that computer to perform a health check on the recovered monitoring master; and the recovered monitoring master itself starts a health check on the computer that the computer which had taken over as monitoring master had been health-checking.

9. The health check control method according to claim 1, wherein, if the computer that had taken over as monitoring master does not have the management table, a broadcast notification is sent to find the computer that has taken over as monitoring master, after which the recovered monitoring master obtains the management table from that computer, instructs it to change its health check, and starts the health check.

10. A health check control system for a system in which a plurality of computers are connected to a network, wherein at least one of the plurality of computers serves as a monitoring master and each of the other computers sends a startup notification to the monitoring master when it starts up, the system comprising: a management table, held by the monitoring master, for recording and managing the health check destination of each computer and the status of each check destination computer; and means by which the monitoring master instructs the first computer that sent a startup notification to perform a health check toward the monitoring master while the monitoring master itself performs a health check toward the first computer; then instructs the second computer that sent a startup notification to perform a health check toward the first computer while switching its own health check to the second computer; and likewise, in the order in which startup notifications are received, instructs the n-th computer to perform a health check toward the (n-1)-th computer while switching its own health check to the n-th computer.
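The startup ordering in claim 10 builds a ring in which each computer is checked by exactly one other. The following is a minimal sketch under stated assumptions. The class `MonitoringMaster`, its `on_startup` method, and the dictionary form of the management table are hypothetical. The status column of the table and the actual network messages are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MonitoringMaster:
    """Hypothetical model of the monitoring master described in claim 10."""

    name: str
    # Management table: computer name -> name of the computer it health-checks.
    table: dict[str, str] = field(default_factory=dict)
    # Computer the monitoring master itself is currently health-checking.
    target: str | None = None

    def on_startup(self, computer: str) -> str:
        """Handle a startup notification and return the check destination assigned to `computer`."""
        # The first computer checks the master; the n-th checks the (n-1)-th,
        # which is exactly the computer the master was checking until now.
        destination = self.target if self.target is not None else self.name
        self.table[computer] = destination
        # The master switches its own health check to the newest computer.
        self.target = computer
        self.table[self.name] = computer
        return destination


if __name__ == "__main__":
    master = MonitoringMaster("A")
    for name in ["B", "C", "D"]:
        print(name, "checks", master.on_startup(name))
    print("A checks", master.target)
```

Starting B, C and D in that order yields the following assignments:

- A checks D.
- D checks C.
- C checks B.
- B checks A.

The master therefore sends only one health check message per interval, however many computers join.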

11. The health check control system according to claim 10, further comprising means by which, when a failure occurs in one of the plurality of computers, the computer that has been health-checking the failed computer detects the failure and sends a network disconnection notification to the monitoring master, and the monitoring master, on receiving the disconnection notification, looks up the computer that the failed computer had been health-checking and issues a health check destination change instruction to the computer that sent the disconnection notification, so that it health-checks that computer in place of the failed computer.
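The rerouting in claim 11 amounts to splicing the failed computer out of the ring. Below is a minimal sketch, assuming the same dictionary form of the management table: computer name mapped to the computer it health-checks. The function name `handle_disconnect` is hypothetical. The sketch deletes the failed entry rather than recording a status for it, which is a simplification of the table described in claim 10.

```python
from __future__ import annotations


def handle_disconnect(table: dict[str, str], reporter: str, failed: str) -> str:
    """Apply a network disconnection notification sent by `reporter` about `failed`.

    `table` is a hypothetical form of the management table: computer name ->
    name of the computer it health-checks. Returns the new check destination
    that the monitoring master instructs `reporter` to use.
    """
    if table.get(reporter) != failed:
        raise ValueError(f"{reporter} was not health-checking {failed}")
    # Look up the computer the failed computer had been health-checking,
    # and have the reporter check it in place of the failed computer.
    successor = table[failed]
    table[reporter] = successor
    # Simplification: drop the failed computer instead of recording its status.
    del table[failed]
    return successor


if __name__ == "__main__":
    ring = {"A": "D", "D": "C", "C": "B", "B": "A"}  # A is the monitoring master
    print(handle_disconnect(ring, reporter="D", failed="C"))  # D now checks B
    print(ring)
```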

12. The health check control system according to claim 10, wherein the monitoring master comprises means for transferring the management table in advance to the computer that performs a health check toward the monitoring master itself (referred to as the "first computer") and, whenever the management table is updated on the monitoring master, notifying the first computer of the update information; the first computer updates its own management table based on the update information; when the monitoring master fails, the first computer, which health-checks the monitoring master, refers to the management table inherited from the monitoring master, starts a health check toward the computer that the monitoring master had been health-checking, and sends a monitoring master change notification to that computer; each computer on the network relays the monitoring master change notification to its own health check destination computer; and thereafter the first computer takes over the role of the monitoring master.
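The failover in claim 12 can be sketched as operations on the first computer's replicated table. This is a minimal illustration under stated assumptions, not the patent's implementation. The functions `apply_update` and `take_over` are hypothetical, and so is the dictionary form of the table. The relay is simulated by walking the ring rather than by sending messages.

```python
from __future__ import annotations


def apply_update(replica: dict[str, str], computer: str, destination: str) -> None:
    """Apply one update notification from the monitoring master to the first computer's copy."""
    replica[computer] = destination


def take_over(replica: dict[str, str], old_master: str) -> tuple[str, str, list[str]]:
    """Fail over from `old_master` using the first computer's replicated table.

    Returns (new_master, new_target, relay_path): the computer that takes over,
    the computer it now health-checks, and the computers that receive the
    monitoring master change notification, in relay order.
    """
    # The first computer is the one whose check destination is the old master.
    new_master = next(c for c, dst in replica.items() if dst == old_master)
    # It inherits the old master's health check destination.
    new_target = replica.pop(old_master)
    replica[new_master] = new_target
    # Each recipient forwards the notification to its own check destination
    # until it arrives back at the new master.
    relay_path: list[str] = []
    node = new_target
    while node != new_master:
        relay_path.append(node)
        node = replica[node]
    return new_master, new_target, relay_path


if __name__ == "__main__":
    # Copy transferred to B (which checks master A), then kept current by updates.
    replica = {"B": "A", "C": "B", "A": "C"}
    apply_update(replica, "D", "C")  # D started and checks C
    apply_update(replica, "A", "D")  # master A switched to checking D
    print(take_over(replica, "A"))
```

Because the replica is updated on every table change, the first computer can take over without querying any other computer. The change notification then reaches every surviving computer by following the same ring used for health checks.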