JP2009003491A - Server switching method in cluster system

Info

Publication number: JP2009003491A
Application number: JP2007160846A
Authority: JP
Inventors: Go Takebayashi; 剛武林; Takahiro Ohira; 崇博大平
Original assignee: Hitachi Ltd; Hitachi Information and Control Systems Inc; Hitachi Information and Control Solutions Ltd
Current assignee: Hitachi Ltd; Hitachi Information and Control Systems Inc; Hitachi Information and Control Solutions Ltd
Priority date: 2007-06-19
Filing date: 2007-06-19
Publication date: 2009-01-08

Abstract

PROBLEM TO BE SOLVED: To provide a server switching method in a cluster system that avoids a state in which a plurality of MAC addresses correspond to the IP address provided by the cluster system, and that prevents communication failures in the network, even when a server in standby mode makes an erroneous judgment and performs a switching operation because of, for example, an abnormality in the inter-server heartbeat transmission path.

SOLUTION: When a server 1201 in standby mode in a cluster system detects that heartbeats from a server 1101 in execution mode have stopped, the standby server 1201 transmits to the execution-mode server 1101 an ARP packet that associates the IP address of a client with a MAC address that does not exist on the network. This leaves the server 1101 unable to communicate, so that one server is isolated from the network.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a server switching method in a cluster system, and in particular to a server switching method that can prevent the system failure that occurs when the servers constituting a cluster cannot correctly recognize each other's state, so that multiple servers enter execution mode and the same address exists more than once on the network.

A cluster system consists of two servers, each running an operating system, cluster software, and applications. The servers make exclusive use of a common IP address (hereinafter, the virtual IP address), so that the system appears on the network as a single virtual server.

In this system, one of the two servers is in execution mode and enables the virtual IP address used for network communication between client computers and the cluster system. When the execution-mode server stops, the other server, which is in standby mode, takes over by enabling the same virtual IP address. At that time it broadcasts, to the client computers on the same network, an ARP (Address Resolution Protocol) packet that associates the enabled IP address with its own MAC (Media Access Control) address, which updates the clients' ARP tables (tables that map IP addresses to MAC addresses). The client computers can therefore resume communication without being aware that the server holding the virtual IP address has changed.
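As an illustration of the announcement described above, the following minimal Python sketch builds (but does not send) the bytes of a broadcast ARP frame that maps a virtual IP address to the new server's MAC address. The function names, the example addresses, and the choice of an ARP reply (opcode 2) in gratuitous form are assumptions; the source only states that an ARP packet is broadcast.

```python
import socket
import struct

BROADCAST_MAC = bytes.fromhex("ffffffffffff")


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_arp_announcement(virtual_ip: str, own_mac: str) -> bytes:
    """Build a broadcast Ethernet/ARP frame mapping virtual_ip to own_mac."""
    mac = mac_to_bytes(own_mac)
    ip = socket.inet_aton(virtual_ip)
    # Ethernet header: destination, source, EtherType 0x0806 (ARP)
    ethernet = BROADCAST_MAC + mac + struct.pack("!H", 0x0806)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                  # hardware type: Ethernet
        0x0800,             # protocol type: IPv4
        6, 4,               # hardware and protocol address lengths
        2,                  # opcode 2 = reply (assumed; see lead-in)
        mac, ip,            # sender MAC / IP: the mapping being announced
        BROADCAST_MAC, ip,  # target MAC / IP in gratuitous form
    )
    return ethernet + arp


# Hypothetical example values (documentation IP range, locally administered MAC)
frame = build_arp_announcement("192.0.2.10", "02:00:00:00:00:0b")
print(len(frame), frame.hex())
```

Actually transmitting such a frame requires a raw socket and administrator privileges, which is why the sketch stops at constructing the bytes.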

One way for a cluster system to detect that the other server has stopped is to provide a transmission path between the servers, exchange messages called heartbeats over it, and conclude that the other server has stopped when no heartbeat has been received for a fixed time.
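The timeout rule above can be sketched as a small class. This is an illustrative sketch only: the class name, method names, and the practice of passing the current time in explicitly (to keep the example deterministic) are assumptions, not part of the source.

```python
class HeartbeatMonitor:
    """Tracks when the last heartbeat arrived from the other server."""

    def __init__(self, timeout_s: float, now: float = 0.0) -> None:
        self.timeout_s = timeout_s
        self.last_seen = now

    def on_heartbeat(self, now: float) -> None:
        """Record the arrival time of a heartbeat."""
        self.last_seen = now

    def peer_presumed_stopped(self, now: float) -> bool:
        """True when no heartbeat has arrived for longer than the timeout."""
        return now - self.last_seen > self.timeout_s


monitor = HeartbeatMonitor(timeout_s=3.0, now=0.0)
monitor.on_heartbeat(now=2.5)
print(monitor.peer_presumed_stopped(now=5.0))  # within the timeout
print(monitor.peer_presumed_stopped(now=6.0))  # timeout exceeded
```

A `True` result only means that heartbeats stopped arriving. It cannot distinguish a stopped server from a broken transmission path or an overloaded sender.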

However, if the transmission path between the servers becomes abnormal, or if the execution-mode server is temporarily under high load so that heartbeat transmission cannot run or is delayed, heartbeats stop even though the server has not stopped. The standby server then wrongly concludes that the other server has stopped and switches to execution mode. Both servers are then in execution mode and multiple MAC addresses are associated with the same virtual IP address on the network, so that failures such as lost messages or inability to communicate occur when client computers communicate with the servers of the cluster system.

For this reason, as described in [Patent Document 1], there is a method in which a shared disk for sharing cluster system data is connected separately from the servers of the cluster system, and exclusive control based on ownership of the shared disk ensures that only one server is in execution mode even when an erroneous judgment is made. A similar method is described in [Patent Document 2].

There is also a method, described in [Patent Document 3], in which a special device is connected to each server and is used to forcibly stop the other server when heartbeat reception is interrupted.

[Patent Document 1] Japanese Patent No. 3573092
[Patent Document 2] JP 2006-253900 A
[Patent Document 3] JP 2006-11992 A

In the method described above, which detects a server stop from an interruption in heartbeat reception, an abnormality in the transmission path between the servers or a high load on the execution-mode server can cause a server to wrongly detect that the other has stopped. When the standby server then performs the switching operation, multiple MAC addresses corresponding to the same virtual IP address exist on the network. As a result, communication failures occur between client computers and the servers of the cluster system.

Furthermore, using a shared disk as in [Patent Document 1] presupposes that multiple servers are connected to the shared disk, so the approach cannot be applied to systems that do not use one. In addition, when a shared disk is used, a failure in that shared component stops the entire system.

The technology described in [Patent Document 2] presupposes that the servers of the cluster system are fitted with dedicated equipment to solve the problem, which raises the system price and prevents the system from being built from general-purpose products such as personal computers.

An object of the present invention is to provide a server switching method in a cluster system that avoids a state in which multiple MAC addresses correspond to the IP address provided by the cluster system, and thereby prevents communication failures in the network, even when the standby server makes an erroneous judgment and performs a switching operation because of an abnormality in the inter-server heartbeat transmission path or a temporary high load on the execution-mode server.

In the present invention, when both servers have entered execution mode, one of the servers is logically disconnected from the network by making communication that uses the virtual IP address enabled on it impossible. Only one MAC address then corresponds to the virtual IP address provided by the cluster system on the network, so communication between client computers and the cluster system proceeds normally.

According to the present invention, in an inexpensive cluster configuration that uses no special equipment, even when the standby server wrongly judges that the other server has stopped because of an abnormality in the heartbeat transmission path or a temporary high load on the execution-mode server, one server is disconnected from the network, and server switching can be performed without communication failures between the servers of the cluster system and the clients.

An embodiment of the present invention will be described with reference to FIGS. 1 to 5. FIG. 1 is a configuration diagram of the cluster system of this embodiment.

As shown in FIG. 1, a server A 1101, a server B 1201, and a client computer 1301 are connected via a network 1402. Server A 1101 and server B 1201, which constitute the cluster system, are connected to each other by a transmission path 1401.

Server A 1101 and server B 1201 each have a cluster control mechanism (1103, 1202), a network communication mechanism (1105, 1204), and an application 1102. The cluster control mechanisms 1103 and 1202 contain ARP control units 1104 and 1203, respectively. The network communication mechanisms 1105 and 1204 contain ARP tables 1106 and 1205, respectively, together with the virtual IP address 1107, MAC address A 1108, and MAC address B 1206. Here, the virtual IP address is also called the upper address, and the MAC address the lower address.

The client computer 1301 has an application 1302 and a network communication mechanism 1303. The network communication mechanism 1303 contains an ARP table 1304, a client IP address 1305, and a MAC address 1306.

Whether the application 1102 performs processing is controlled according to the operation mode held by the cluster control mechanism 1103. The example in FIG. 1 shows the case in which server A 1101 performs the application processing.

Server A 1101, in execution mode, has enabled the virtual IP address 1107 through which its application 1102 and the application 1302 of the client computer 1301 communicate. The MAC address corresponding to the virtual IP address is MAC address A 1108 of the execution-mode server A 1101, and the network communication mechanism 1303 of the client computer 1301 holds this association as an entry in its ARP table 1304. As shown in client ARP entry (1) 1307, the entry consists of the virtual IP address 1107 and MAC address A 1108.

The network communication mechanism 1105 of the execution-mode server A 1101 likewise holds the association between the client's IP address and MAC address as an entry in its ARP table 1106. As shown in server A ARP entry (1) 1109, the entry consists of the client IP address 1305 and MAC address C 1306.

Communication on the network 1402 relies on these associations between IP addresses and MAC addresses.

In the configuration shown in FIG. 1, when server A 1101 stops, the cluster control mechanism 1202 of server B 1201 detects the stop and arranges for the application to perform its processing on server B 1201.

Stoppage of the other server is detected from an interruption in heartbeat reception. The cluster control mechanisms 1103 and 1202 of the two servers exchange messages at a fixed period over the heartbeat transmission path 1401, and each judges that the other server has stopped when it receives nothing for a fixed period. When server B 1201, in standby mode, judges that server A 1101, in execution mode, has stopped, it moves from standby mode to execution mode.

In the configuration shown in FIG. 1, if the transmission path 1401 becomes abnormal, or if the cluster control mechanism 1103 cannot send heartbeats because the processing load of the application 1102 on the execution-mode server A 1101 has risen, the standby-mode cluster control mechanism 1202 wrongly judges that the execution-mode server A 1101 has stopped.

When server B 1201 moves from standby mode to execution mode, its ARP control unit 1203 sends to server A 1101, which it has judged to be stopped (hereinafter, the old execution server), an ARP reply packet (hereinafter, a fake ARP) that associates the IP address of the client computer 1301 connected to the network 1402 with a MAC address that does not exist on the network.

If the old execution server 1101 had not actually stopped, receiving the fake ARP causes an entry in the ARP table 1106 of its network communication mechanism 1105 to be updated or added. As shown in server A ARP entry (2) 1110, the entry then consists of the client IP address 1305 and MAC address X, where X is the example used in FIG. 1 for a MAC address that does not exist on the network. Traffic from the old execution server 1101 to the client computer 1301 is therefore addressed to a nonexistent MAC address, which leaves the old execution server 1101 unable to communicate and disconnects it from the network.
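The effect of the fake ARP on the old execution server's ARP table can be illustrated with a toy model. This is a sketch under stated assumptions: all addresses, the `ArpTable` class, and the `HOSTS_ON_NETWORK` set are hypothetical stand-ins, and a real operating system's ARP cache has more elaborate update rules.

```python
# Hypothetical example addresses; none of these values come from the source.
CLIENT_IP = "192.0.2.50"
CLIENT_MAC = "02:00:00:00:00:0c"       # stands in for MAC address C 1306
NONEXISTENT_MAC = "02:00:00:ff:ff:fe"  # stands in for MAC address X

# MAC addresses that some host on the segment actually owns
HOSTS_ON_NETWORK = {CLIENT_MAC, "02:00:00:00:00:0a", "02:00:00:00:00:0b"}


class ArpTable:
    """A toy IP-to-MAC table; a newer ARP reply overwrites the older entry."""

    def __init__(self) -> None:
        self.entries = {}

    def on_arp_reply(self, ip: str, mac: str) -> None:
        self.entries[ip] = mac

    def can_reach(self, ip: str) -> bool:
        """True only if the IP resolves to a MAC that some host owns."""
        return self.entries.get(ip) in HOSTS_ON_NETWORK


server_a_table = ArpTable()
server_a_table.on_arp_reply(CLIENT_IP, CLIENT_MAC)       # ARP entry (1) 1109
reachable_before = server_a_table.can_reach(CLIENT_IP)

server_a_table.on_arp_reply(CLIENT_IP, NONEXISTENT_MAC)  # ARP entry (2) 1110
reachable_after = server_a_table.can_reach(CLIENT_IP)

print(reachable_before, reachable_after)
```

After the second update, frames for the client are addressed to a MAC that no host owns, which is the logical disconnection the description relies on.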

ARP table entries have a holding time and are erased once the configured time has elapsed. The ARP control unit 1203 of server B 1201, which is now in execution mode (hereinafter, the new execution server), therefore prevents the entry from being erased by sending the fake ARP to the network communication mechanism 1105 of the old execution server 1101 at intervals shorter than that time. Because the holding time of ARP entries differs between operating systems, the interval between fake ARP transmissions is configurable.
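The constraint on the transmission interval can be written down as a small helper. The source only requires that the interval be shorter than the ARP entry holding time and that it be configurable; the function name and the use of a safety factor of 0.5 are assumptions made for this sketch.

```python
def fake_arp_interval(holding_time_s: float, safety_factor: float = 0.5) -> float:
    """Return a resend interval strictly shorter than the ARP holding time."""
    if holding_time_s <= 0:
        raise ValueError("holding_time_s must be positive")
    if not 0 < safety_factor < 1:
        raise ValueError("safety_factor must be between 0 and 1 (exclusive)")
    return holding_time_s * safety_factor


# Example: an assumed holding time of 60 seconds on the old execution server
interval = fake_arp_interval(60.0)
print(interval)
```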

Transmission of the fake ARP continues until a heartbeat from the cluster control mechanism 1103 of the old execution server 1101 is received again and its state can be confirmed.

After this, the ARP control unit 1203 of the new execution server 1201 broadcasts on the network an ARP packet that associates the virtual IP address 1107 being enabled with its own MAC address 1206, which updates the ARP table 1304 of the client computer 1301. As shown in client ARP entry (2) 1308, the entry then consists of the virtual IP address 1107 and MAC address B 1206.

When a client that has no ARP table entry for the virtual IP address starts a new communication, it broadcasts an ARP request packet for the virtual IP address on the network 1402. The network communication mechanism 1105 of the old execution server 1101 may answer with an ARP reply, in which case the fake-ARP entry 1110 is updated to an entry with a MAC address that does exist, and the old execution server returns to a state in which it can communicate.

To prevent this, the ARP control unit 1203 of the new execution server 1201 monitors ARP request packets on the network 1402. When it detects an ARP request for the virtual IP address, it corrects the situation by resending the fake ARP to the network communication mechanism 1105 of the old execution server 1101 and by sending, several times, an ARP reply carrying the MAC address 1206 of the new execution server 1201 to the client that issued the ARP request.

In this way, even when the standby server wrongly judges that the other server has stopped, a state in which multiple MAC addresses are associated with the virtual IP address on the network is prevented.

FIG. 2 is a flowchart of the cluster control mechanism, FIG. 3 is a flowchart of the takeover processing in the cluster control mechanism, FIG. 4 is a flowchart of the fake ARP transmission processing in the cluster control mechanism, and FIG. 5 is a flowchart of the ARP monitoring processing in the cluster control mechanism.

FIG. 2 is a flowchart showing the processing of the cluster control mechanism. In step S1, the initial operation mode of the own server is decided: if the other server is not in execution mode, the own server enters execution mode, and if the other server is in execution mode, the own server enters standby mode.

In step S2, the decided operation mode is checked. If the mode is execution, the virtual IP address is enabled in step S3 and the application is started in step S4.

In step S5, at regular intervals, a heartbeat is sent to the other server and reception of heartbeats from the other server is checked. In step S6, it is determined whether heartbeat reception has been interrupted for a fixed period. If an interruption is detected, the other server is judged to have stopped.

In step S7, the current operation mode of the own server is checked. If it is standby mode, the takeover processing for entering execution mode is executed in step S8.

In step S9, based on the heartbeat reception check of step S5, it is determined whether heartbeat reception from the other server, which had been judged stopped, has resumed.

If resumption of heartbeat reception from the other server is detected, the processing started during takeover is stopped: fake ARP transmission is stopped in step S10, and the ARP monitoring processing is stopped in step S11.
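The flow of FIG. 2 (steps S1 to S11) can be summarized as a small state machine. This is a minimal sketch: the class and attribute names are assumptions, the flag `fencing_active` is an illustrative label for "fake ARP transmission and ARP monitoring are running", and heartbeat detection itself is reduced to two boolean inputs.

```python
from enum import Enum


class Mode(Enum):
    EXECUTION = "execution"
    STANDBY = "standby"


class ClusterControl:
    """Toy model of the cluster control flow in FIG. 2."""

    def __init__(self, peer_in_execution: bool) -> None:
        # S1: decide the initial mode from the other server's mode
        self.mode = Mode.STANDBY if peer_in_execution else Mode.EXECUTION
        self.fencing_active = False
        self.log = []
        # S2 to S4: an execution-mode server enables the virtual IP and the app
        if self.mode is Mode.EXECUTION:
            self.log += ["S3 enable virtual IP", "S4 start application"]

    def tick(self, heartbeat_lost: bool, heartbeat_resumed: bool) -> None:
        """One pass through S5 to S11; the inputs stand in for S5/S6/S9."""
        # S6 to S8: a standby server that loses heartbeats takes over
        if heartbeat_lost and self.mode is Mode.STANDBY:
            self.log.append("S8 takeover")
            self.mode = Mode.EXECUTION
            self.fencing_active = True
        # S9 to S11: when heartbeats resume, stop what takeover started
        if heartbeat_resumed and self.fencing_active:
            self.log += ["S10 stop fake ARP", "S11 stop ARP monitoring"]
            self.fencing_active = False


server_b = ClusterControl(peer_in_execution=True)
server_b.tick(heartbeat_lost=True, heartbeat_resumed=False)
server_b.tick(heartbeat_lost=False, heartbeat_resumed=True)
print(server_b.mode, server_b.log)
```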

FIG. 3 is a flowchart of the takeover processing performed when the standby server enters execution mode. In step S12, the processing that sends the fake ARP to the server judged stopped in step S6 is started, and in step S13 the virtual IP address is enabled.

After the ARP monitoring processing is started in step S14, the application is started in step S15, and the operation mode of the own server is updated to execution mode in step S16.
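The ordering of the takeover steps in FIG. 3 matters: the fake ARP is started before the virtual IP address is enabled. The following sketch only records that ordering; the action names and the callback-dictionary design are assumptions, and the callbacks are placeholders for the real operations.

```python
from typing import Callable, Dict, List

# Step labels from FIG. 3 paired with hypothetical action names
TAKEOVER_ORDER = [
    ("S12", "start_fake_arp"),
    ("S13", "enable_virtual_ip"),
    ("S14", "start_arp_monitoring"),
    ("S15", "start_application"),
    ("S16", "set_mode_execution"),
]


def take_over(actions: Dict[str, Callable[[], None]]) -> List[str]:
    """Run the takeover actions in the order of FIG. 3 and return the steps run."""
    done = []
    for step, name in TAKEOVER_ORDER:
        actions[name]()
        done.append(step)
    return done


# Placeholder callbacks that only record which action ran
calls: List[str] = []
actions = {name: (lambda n=name: calls.append(n)) for _, name in TAKEOVER_ORDER}
steps = take_over(actions)
print(steps, calls)
```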

FIG. 4 is a flowchart of the fake ARP transmission processing. This processing is started in step S12 of FIG. 3 and stopped in step S10 of FIG. 2.

In step S17, the configured interval at which the fake ARP is sent to the other server is read. In step S18, the IP addresses of the clients that are connected to the network and communicate with the servers of the cluster system are acquired.

In step S19, an ARP reply packet that associates the client IP address acquired in step S18 with a MAC address that does not exist on the network is generated and sent to the other server. Step S20 provides the wait interval before the fake ARP is sent to the other server again.
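The transmission loop of FIG. 4 can be sketched with a stoppable wait. This is an illustrative sketch: the function name, the `send` callback (which stands in for building and transmitting the fake ARP for one client IP), and the use of `threading.Event` as the stop signal for step S10 are all assumptions.

```python
import threading
from typing import Callable, Sequence


def run_fake_arp_sender(
    send: Callable[[str], None],
    client_ips: Sequence[str],   # S18: client IP addresses, acquired by the caller
    interval_s: float,           # S17: configured send interval, read by the caller
    stop: threading.Event,       # set by S10 in FIG. 2 to end the loop
) -> None:
    """Repeat S19 and S20 until the stop event is set."""
    while not stop.is_set():
        for ip in client_ips:
            send(ip)             # S19: send the fake ARP for this client IP
        stop.wait(interval_s)    # S20: wait, but wake at once if stopped


# Demonstration with a recording callback that stops after three sends
sent = []
stop = threading.Event()


def record_send(ip: str) -> None:
    sent.append(ip)
    if len(sent) == 3:
        stop.set()


run_fake_arp_sender(record_send, ["192.0.2.50"], 0.01, stop)
print(sent)
```

In practice such a loop would run in its own thread so that heartbeat processing continues; using `Event.wait` rather than `time.sleep` lets the stop request take effect without waiting out the interval.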

FIG. 5 is a flowchart of the ARP monitoring processing. This processing is started in step S14 of FIG. 3 and stopped in step S11 of FIG. 2.

In step S21, ARP packets flowing on the network are monitored. When an ARP packet is detected, step S22 determines whether it is an ARP request for the virtual IP address. If it is, in step S23 a fake ARP that associates the IP address of the ARP request's sender with a MAC address that does not exist on the network is sent to the other server. In step S24, an ARP reply packet that gives the own server's MAC address as the MAC address for the virtual IP address is sent. These transmissions are repeated several times so that the ARP tables are updated again after the other server has sent its own ARP reply.
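The decision logic of FIG. 5 can be sketched as a pure function over a captured frame. This is a sketch under stated assumptions: the function names, the action tuples, the repeat count of 3, and the example addresses are hypothetical, and capturing frames from the network (which needs a raw socket) is left out.

```python
import socket
import struct


def parse_arp(frame: bytes):
    """Return (opcode, sender_ip, target_ip) for an Ethernet/IPv4 ARP frame, else None."""
    if len(frame) < 42 or frame[12:14] != b"\x08\x06":  # EtherType must be ARP
        return None
    opcode = struct.unpack("!H", frame[20:22])[0]
    sender_ip = socket.inet_ntoa(frame[28:32])
    target_ip = socket.inet_ntoa(frame[38:42])
    return opcode, sender_ip, target_ip


def on_frame(frame: bytes, virtual_ip: str, repeat: int = 3):
    """Return the list of transmissions to perform for one monitored frame."""
    parsed = parse_arp(frame)                            # S21
    if parsed is None:
        return []
    opcode, sender_ip, target_ip = parsed
    if opcode != 1 or target_ip != virtual_ip:           # S22: request for the VIP?
        return []
    actions = []
    for _ in range(repeat):                              # sent several times
        actions.append(("fake_arp_to_old_server", sender_ip))  # S23
        actions.append(("reply_with_own_mac", sender_ip))      # S24
    return actions


# Hypothetical ARP request from client 192.0.2.50 asking for 192.0.2.10
request = (
    bytes.fromhex("ffffffffffff") + bytes.fromhex("02000000000c") + b"\x08\x06"
    + struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    + bytes.fromhex("02000000000c") + socket.inet_aton("192.0.2.50")
    + bytes(6) + socket.inet_aton("192.0.2.10")
)
print(on_frame(request, "192.0.2.10"))
```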

FIG. 1 is a configuration diagram of a cluster system according to an embodiment of the present invention.
FIG. 2 is a flowchart of the network transmission and reception processing of the node stop monitoring mechanism.
FIG. 3 is a flowchart of the takeover processing in the cluster control mechanism.
FIG. 4 is a flowchart of the fake ARP transmission processing in the cluster control mechanism.
FIG. 5 is a flowchart of the ARP monitoring processing in the cluster control mechanism.

Explanation of symbols

1101 Server A
1102 Application
1103, 1202 Cluster control mechanism
1104, 1203 ARP control unit
1105, 1204 Network communication mechanism
1201 Server B
1301 Client computer
1401 Transmission path
1402 Network

Claims

A server switching method in a cluster system in which a server in execution mode and a server in standby mode are connected by a transmission path for mutual monitoring, each monitors the other for stoppage by checking heartbeats, and one of the servers communicates with other devices over a network using an address, wherein, when the server in standby mode confirms that heartbeats from the server in execution mode have been interrupted, the server in standby mode transmits to the server in execution mode a spoofed address that does not exist on the network, thereby making communication unusable and disconnecting that server from the network.

The server switching method in a cluster system according to claim 1, wherein the address consists of an upper address and a lower address, the servers constituting the cluster make exclusive use of the same upper address, and the spoofed address is set as the lower address.

The server switching method in a cluster system according to claim 2, wherein the upper address is an IP address, the lower address is a MAC address, and the spoofed address is transmitted in an ARP reply packet.