JP7056868B2 - System, computer, system control method and program

Info

Publication number: JP7056868B2
Application number: JP2017242586A
Authority: JP
Inventors: 直人鈴木
Original assignee: NEC Communication Systems Ltd
Current assignee: NEC Communication Systems Ltd
Priority date: 2017-12-19
Filing date: 2017-12-19
Publication date: 2022-04-19
Anticipated expiration: 2037-12-19
Also published as: JP2019109735A

Description

The present invention relates to a system, a computer, a system control method, and a program.

Patent Documents 1 to 3 disclose fault-tolerant systems that use a virtual machine (VM) built on a physical machine (PM).

In a fault-tolerant system, the computers (physical machines) are usually configured redundantly. As disclosed in Patent Documents 1 to 3, the virtual machines that provide a service may also be configured redundantly. During normal operation of a fault-tolerant system, operating information is replicated from the active system to the standby system. This replication is often performed asynchronously. Consequently, when a failure occurs, the new active system (the former standby system) must be restarted (rebooted) after switching from the active system to the standby system.

Japanese Unexamined Patent Publication No. 2014-139706
Japanese Unexamined Patent Publication No. 2014-102724
Japanese Unexamined Patent Publication No. 2011-060055

The disclosures of the above prior art documents are incorporated herein by reference. The following analysis was made by the present inventors.

As described above, in a fault-tolerant system, the information of the active system must be replicated to the standby system when the systems are switched, and the new active system must then be restarted. The service delay caused by this replication and restart is a problem.

An object of the present invention is to provide a system, a computer, a system control method, and a program that contribute to rapid switching from an active system to a standby system.

According to a first aspect of the present invention or disclosure, there is provided a system including a first physical machine on which a first virtual machine runs and a second physical machine on which a second virtual machine runs. When the first virtual machine is the active system and the second virtual machine is the standby system, the first physical machine transmits information on resource changes caused by the operation of the first virtual machine to the second physical machine as resource difference information. The second physical machine reflects the resource difference information in the resources of the second virtual machine and, when system switching is required, returns the second virtual machine in which the resource difference information has been reflected to a state in which standby is released. When the first physical machine generates the resource difference information and a resource change then occurs before the generated resource difference information is transmitted to the second physical machine, so that the change is not included in the current resource difference information, the first physical machine reflects that change in the next resource difference information.

According to a second aspect of the present invention or disclosure, there is provided a computer on which an active virtual machine runs and which transmits information on resource changes caused by the operation of the virtual machine, as resource difference information, to another computer constituting a fault-tolerant system. When the computer generates the resource difference information and a resource change then occurs before the generated resource difference information is transmitted to the other computer, so that the change is not included in the current resource difference information, the computer reflects that change in the next resource difference information.

According to a third aspect of the present invention or disclosure, there is provided a computer that acquires resource difference information, which is transmitted by an active computer included in a fault-tolerant system and relates to resource changes caused by the operation of a virtual machine running on the active computer, reflects the resource difference information in the resources of a virtual machine of its own device, and, when system switching is required, returns the virtual machine in which the resource difference information has been reflected to a state in which standby is released. When the active computer generates the resource difference information and a resource change then occurs before the generated resource difference information is transmitted to the standby computer, so that the change is not included in the current resource difference information, the active computer carries the change over to the difference information to be reflected next time, and the standby computer acquires the next resource difference information reflecting that change.

According to a fourth aspect of the present invention or disclosure, there is provided a system control method for a system including a first physical machine on which a first virtual machine runs and a second physical machine on which a second virtual machine runs. With the first virtual machine as the active system and the second virtual machine as the standby system, the method includes: a step of transmitting information on resource changes caused by the operation of the first virtual machine to the second physical machine as resource difference information; a step of reflecting the resource difference information in the resources of the second virtual machine; a step of returning, when system switching is required, the second virtual machine in which the resource difference information has been reflected to a state in which standby is released; and a step in which, when the first physical machine generates the resource difference information and a resource change then occurs before the generated resource difference information is transmitted to the second physical machine, so that the change is not included in the current resource difference information, the change is reflected in the next resource difference information.
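The steps of this method, including the carry-over of a change that arrives after a difference has been generated, can be illustrated with a minimal sketch. The class names, method names, and the dictionary representation of resources are assumptions made for illustration only and do not appear in the specification.

```python
from dataclasses import dataclass, field


@dataclass
class ActiveSide:
    """Hypothetical active-side tracker of resource changes."""
    pending: dict = field(default_factory=dict)

    def record_change(self, key, value):
        # Every change made while the active VM runs is queued for the next difference.
        self.pending[key] = value

    def generate_diff(self):
        # Freeze what has accumulated so far. Changes recorded after this point
        # go into a fresh batch and are therefore carried over to the next difference.
        diff, self.pending = self.pending, {}
        return diff


@dataclass
class StandbySide:
    """Hypothetical standby-side holder of the mirrored resources."""
    resources: dict = field(default_factory=dict)
    running: bool = False

    def apply_diff(self, diff):
        self.resources.update(diff)

    def resume(self):
        # Release standby when system switching is required.
        self.running = True


active, standby = ActiveSide(), StandbySide()
active.record_change("reg.pc", 0x1000)
diff1 = active.generate_diff()
active.record_change("mem.page7", b"late")  # occurs after generation, before transmission
standby.apply_diff(diff1)                   # current difference does not contain the late change
diff2 = active.generate_diff()              # next difference picks it up
standby.apply_diff(diff2)
standby.resume()
```

Swapping the pending batch at generation time is one simple way to guarantee that no change is lost between generation and transmission; other bookkeeping schemes would satisfy the same requirement.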

According to a fifth aspect of the present invention or disclosure, there is provided a program that causes a computer on which an active virtual machine is running to execute a process of running the active virtual machine and a process of transmitting information on resource changes caused by the operation of the virtual machine, as resource difference information, to another computer constituting a fault-tolerant system. When the resource difference information is generated and a resource change then occurs before the generated resource difference information is transmitted to the other computer, so that the change is not included in the current resource difference information, the program reflects that change in the next resource difference information.
This program can be recorded on a computer-readable storage medium. The storage medium may be non-transient, such as a semiconductor memory, a hard disk, a magnetic recording medium, or an optical recording medium. The present invention can also be embodied as a computer program product.

According to each aspect of the present invention or disclosure, there are provided a system, a computer, a system control method, and a program that contribute to rapid switching from an active system to a standby system.

A diagram for explaining the outline of one embodiment.
A diagram showing an example of the schematic configuration of the fault-tolerant system according to the first embodiment.
A diagram for explaining the operation of synchronizing the states of the active system and the standby system.
A diagram for explaining system switching caused by a software failure in the active virtual machine.
A diagram for explaining system switching caused by a software failure in the active physical machine.
A diagram for explaining system switching caused by a hardware failure in the active physical machine.
A diagram for explaining system switching caused by a transmission path failure or the like.
A diagram for explaining system switching when a maintenance operator issues a system switching command.
A sequence diagram showing an example of the synchronization operation of the fault-tolerant system according to the first embodiment.
A sequence diagram showing an example of the system switching operation of the fault-tolerant system according to the first embodiment.
A diagram showing an example of the hardware configuration of a physical machine.

First, an outline of one embodiment will be described. The drawing reference numerals in this outline are attached to the elements for convenience, as an example to aid understanding, and the description of this outline is not intended to impose any limitation. The connection lines between blocks in each figure include both bidirectional and unidirectional lines. A one-way arrow schematically shows the main flow of a signal (data) and does not exclude bidirectionality.

The system according to one embodiment includes a first physical machine 102 on which a first virtual machine 101 runs and a second physical machine 112 on which a second virtual machine 111 runs (see FIG. 1). In this system, the first virtual machine 101 is the active system and the second virtual machine 111 is the standby system. In this case, the first physical machine 102 transmits information on resource changes caused by the operation of the first virtual machine 101 to the second physical machine 112 as resource difference information. The second physical machine 112 reflects the resource difference information in the resources of the second virtual machine 111 and, when system switching is required, resumes the second virtual machine 111 in which the resource difference information has been reflected.

The system shown in FIG. 1 realizes a synchronous operation scheme for computers using the first virtual machine 101 and the second virtual machine 111. Specifically, it adopts a fully duplexed configuration in which the first virtual machine 101 and the second virtual machine 111, mounted on physically different computers (physical machines), keep the information on the various resources of the virtual machine synchronized with the paired physical machine. While maintaining synchronization, the system of FIG. 1 switches to the synchronized second physical machine 112, which is paired with the first physical machine 102, when triggered by an event (for example, a software failure) on the running first physical machine 102, and thereby continues operation. As a result, switching from the active system to the standby system is executed quickly.

Specific embodiments will be described in more detail below with reference to the drawings. In each embodiment, the same components are denoted by the same reference numerals, and redundant description of them is omitted.

[First Embodiment]
The first embodiment will be described in more detail with reference to the drawings.

FIG. 2 is a diagram showing an example of the schematic configuration of the fault-tolerant system according to the first embodiment. Referring to FIG. 2, the fault-tolerant system includes a plurality of physical machines (computers), namely the physical machine 10-1 and the physical machine 10-2. The physical machine 10-1 and the physical machine 10-2 are separate devices (hardware). Each of them is a computer configured to be capable of running a virtual machine.

In FIG. 2, the physical machine 10-1 is the active computer, whereas the physical machine 10-2 is the standby computer. That is, in FIG. 2, the service is provided by the active physical machine 10-1.

When some failure occurs in the physical machine 10-1, system switching takes place and the physical machine 10-2 becomes the active computer.

Although FIG. 2 shows two physical machines, this is not intended to limit the number of physical machines included in the fault-tolerant system. For example, there may be a plurality of standby physical machines. In that case, the active physical machine may designate the physical machine that is to become the new active system, or the new active physical machine may be determined through cooperative operation among the plurality of standby physical machines.

The active physical machine 10-1 and the standby physical machine 10-2 are connected by a transmission line 20, which provides a communication path between them. The transmission line 20 may be, for example, a network line such as the Internet or a dedicated line. Thus, in the fault-tolerant system shown in FIG. 2, the active system and the standby system are installed at different sites.

The active physical machine 10-1 and the standby physical machine 10-2 have the same functions. Therefore, in the following description, when there is no particular reason to distinguish between the active physical machine 10-1 and the standby physical machine 10-2, they are simply referred to as "physical machine 10". Similarly, other components are referred to collectively by the number that precedes the hyphen (-).

As shown in FIG. 2, various processing modules are implemented on the physical machine 10. Specifically, the physical machine 10-1 includes a communication unit 11-1, a virtual machine management unit 12-1, a system switching processing unit 13-1, and a physical machine failure detection unit 14-1.

Similarly, the physical machine 10-2 includes a communication unit 11-2, a virtual machine management unit 12-2, a system switching processing unit 13-2, and a physical machine failure detection unit 14-2. These processing modules are described later.

Further, as shown in FIG. 2, a virtual machine 15-1 is created on the physical machine 10-1, and a virtual machine 15-2 is created on the physical machine 10-2. An OS (Operating System) and application (APP) software run on each of the virtual machine 15-1 and the virtual machine 15-2. The application provides a predetermined service. That is, as the application runs on the virtual machine, processes (tasks) related to providing the service are created, executed, terminated, and so on.

The virtual machine 15-1 also implements a processing module (application) serving as the virtual machine failure detection unit 16-1. Similarly, the virtual machine 15-2 implements the virtual machine failure detection unit 16-2. Details of the virtual machine failure detection units 16-1 and 16-2 are described later.

As described with reference to FIG. 2, the fault-tolerant system according to the first embodiment has a redundant configuration of physical computers. That is, to avoid system outages as far as possible, the operational system is configured with hardware failures of the physical machines in mind: the virtual machines to be synchronized are placed on physically different hardware, one on each physical machine. Further, when the aim is to withstand severe disasters such as tsunamis or major earthquakes, the site where the active system is installed and the site where the standby system is installed are separated by a predetermined distance.

The operation of the fault-tolerant system according to the first embodiment is outlined as follows. Here, the case where the virtual machine 15-1 is the active system and the virtual machine 15-2 is the standby system is described. During normal operation, the physical machine 10-1 transmits information on resource changes caused by the operation of the virtual machine 15-1 to the physical machine 10-2 as resource difference information. The physical machine 10-2 reflects the acquired resource difference information in the resources of the virtual machine 15-2. Further, when system switching is required in the fault-tolerant system, the physical machine 10-2 resumes the virtual machine 15-2 in which the resource difference information has been reflected.

The processing configuration (processing modules) of the physical machine and the virtual machine is described in detail below.

The communication unit 11 is a means for controlling communication with other devices (other physical machines 10). When the communication unit 11 acquires data from a processing module (for example, the system switching processing unit 13), it transmits the data (packets) to the other physical machine. When the communication unit 11 acquires data from another physical machine 10, it distributes the data to the appropriate processing module.
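The two roles of the communication unit 11 described above, sending data handed over by a local module and distributing received data to a module, might be sketched as follows. The message format, the `register` method, and the message kind `switch_request` are hypothetical names chosen for this illustration; the specification does not define them.

```python
from collections import defaultdict


class CommunicationUnit:
    """Hypothetical stand-in for communication unit 11."""

    def __init__(self, send_fn):
        self._send_fn = send_fn             # callable that puts a message on the link
        self._handlers = defaultdict(list)  # message kind -> local processing modules

    def register(self, kind, handler):
        # A processing module subscribes to one kind of incoming data.
        self._handlers[kind].append(handler)

    def send(self, kind, payload):
        # Outbound: a local module hands data over for transmission.
        self._send_fn({"kind": kind, "payload": payload})

    def receive(self, message):
        # Inbound: distribute the data to every module registered for its kind.
        for handler in self._handlers[message["kind"]]:
            handler(message["payload"])


# Two units wired back to back, standing in for transmission line 20.
standby_inbox = []
standby_comm = CommunicationUnit(send_fn=lambda message: None)
standby_comm.register("switch_request", standby_inbox.append)
active_comm = CommunicationUnit(send_fn=standby_comm.receive)
active_comm.send("switch_request", {"reason": "vm_software_failure"})
```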

The virtual machine management unit 12 is a means for managing the virtual machine 15. For example, the virtual machine management unit 12 allocates hardware resources (CPU (Central Processing Unit), memory, I/O (Input/Output), and the like) to the virtual machine 15 and creates the virtual machine 15. The virtual machine management unit 12 also resumes or stops (places on standby) the created virtual machine as necessary. In this way, the virtual machine management unit 12 also serves as a control means for the created virtual machine 15.

Further, the active virtual machine management unit 12 (the virtual machine management unit 12-1 in FIG. 2) collects information on the resources of the virtual machine 15. More specifically, the virtual machine management unit 12-1 collects the various kinds of resource information that change as the application, the OS, and so on of the virtual machine 15-1 operate. The collected resource information is transmitted to the standby physical machine (the physical machine 10-2 in FIG. 2) via the communication unit 11 and the transmission line 20. The resource information collected by the virtual machine management unit 12 includes register information, a memory map, and the like.

The virtual machine management unit 12 can collect resource information by using the snapshot feature of the hypervisor function, which is a general-purpose function provided with the OS. The trigger for the virtual machine management unit 12 to collect resource information is an event that changes the resources used by the virtual machine 15-1. Specifically, the start, suspension, and termination of a task are examples of such events.

The physical machine failure detection unit 14 is a means for detecting failures that occur in its own device (the physical machine 10). Specifically, the physical machine failure detection unit 14 detects failures of software, including the OS, running on the physical machine 10 and hardware failures of the physical machine 10.

The virtual machine failure detection unit 16 is a means for detecting failures that occur in the virtual machine 15 operating as the active system. Specifically, the virtual machine failure detection unit 16 detects failures of the software (OS, applications) running on the active virtual machine 15. The standby virtual machine 15 is in a standby state. Because the standby virtual machine 15 is not running (it is stopped), the virtual machine failure detection unit 16 included in it does not operate until the virtual machine is switched to the active state. In the example of FIG. 2, the active virtual machine failure detection unit 16-1 operates, while the standby virtual machine failure detection unit 16-2 does not.

Information on failures detected by the two failure detection units of the active system (the physical machine failure detection unit 14 and the virtual machine failure detection unit 16) is reported to the system switching processing unit 13.

When at least one of the virtual machine failure detection unit 16 and the physical machine failure detection unit 14 detects a failure, the system switching processing unit 13 transmits a "system switching request" to the standby physical machine 10. Specifically, on receiving a notification that a failure has been detected from at least one of the two failure detection units, the system switching processing unit 13 transmits the "system switching request" to the standby physical machine 10 via the communication unit 11 and the like.

The system switching processing unit 13 also instructs the virtual machine management unit 12 to stop the virtual machine 15. That is, when transmitting the system switching request to the standby physical machine 10, the system switching processing unit 13 instructs the active virtual machine management unit 12 to stop (place on standby) the virtual machine 15 that is operating as the active system.
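The active-side behavior described above, in which a report from either failure detection unit leads to a system switching request and a stop instruction for the active virtual machine, can be sketched as follows. The class and method names are hypothetical, and the suppression of a second request after the first one is an assumption of this sketch rather than something the text specifies.

```python
class VmManagerStub:
    """Hypothetical stand-in for virtual machine management unit 12."""

    def __init__(self):
        self.vm_state = "running"

    def stop_vm(self):
        # Place the active virtual machine on standby.
        self.vm_state = "standby"


class SwitchingProcessor:
    """Hypothetical stand-in for system switching processing unit 13 (active side)."""

    def __init__(self, send_to_standby, vm_manager):
        self._send = send_to_standby
        self._vm_manager = vm_manager
        self._requested = False

    def notify_failure(self, detector, detail):
        # A report from either detector (physical or virtual) is enough to trigger switching.
        if self._requested:
            return  # assumption: a later report does not send a second request
        self._requested = True
        self._send({"type": "system_switching_request",
                    "detector": detector, "detail": detail})
        self._vm_manager.stop_vm()


sent = []
manager = VmManagerStub()
processor = SwitchingProcessor(sent.append, manager)
processor.notify_failure("virtual_machine", "application crash")
processor.notify_failure("physical_machine", "late duplicate report")
```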

The description above mainly concerns the active physical machine 10-1. Next, the standby physical machine 10-2 is described. Here, the operations of the standby virtual machine management unit 12-2 and system switching processing unit 13-2 are explained.

As described above, the resource information collected in the active system is transmitted to the standby physical machine 10-2. The standby virtual machine management unit 12-2 reflects the resource information transmitted from the active system in the resources of the waiting virtual machine 15-2. For example, consider the case where the virtual machine management unit 12-2 acquires information on the memory map of the virtual machine 15-1 as "resource information". In this case, the virtual machine management unit 12-2 rewrites the memory area allocated to the virtual machine 15-2 so that the memory area used by the virtual machine 15-2 becomes identical to the acquired memory map.
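The memory map example above can be illustrated with a small sketch in which received pages overwrite the corresponding region of the standby virtual machine's memory. The page size and the representation of the memory map as a dictionary from page index to page contents are assumptions for illustration; the specification does not define the format.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


def apply_memory_map(vm_memory: bytearray, memory_map: dict) -> None:
    """Overwrite pages of the standby VM's memory with the received contents.

    memory_map maps a page index to that page's bytes (a hypothetical format).
    """
    for page_index, page_bytes in memory_map.items():
        if len(page_bytes) != PAGE_SIZE:
            raise ValueError("page %d has unexpected size" % page_index)
        start = page_index * PAGE_SIZE
        if start + PAGE_SIZE > len(vm_memory):
            raise ValueError("page %d is outside the VM's memory" % page_index)
        vm_memory[start:start + PAGE_SIZE] = page_bytes


active_memory = bytearray(4 * PAGE_SIZE)
standby_memory = bytearray(4 * PAGE_SIZE)

# The active VM modifies page 1; only that page is sent to the standby side.
active_memory[PAGE_SIZE:2 * PAGE_SIZE] = b"\x01" * PAGE_SIZE
received = {1: bytes(active_memory[PAGE_SIZE:2 * PAGE_SIZE])}
apply_memory_map(standby_memory, received)
```

After the call, the standby memory matches the active memory byte for byte, which is the synchronized state the passage describes.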

In this way, the virtual machine management unit 12-2 reflects the resource information acquired from the active virtual machine management unit 12-1 in the resources of the virtual machine 15-2. Through this reflection processing, the active virtual machine 15-1 and the standby virtual machine 15-2 are kept in a synchronized state.

That is, in the standby physical machine 10-2 and virtual machine 15-2, the difference information on the various resources received from the active physical machine 10-1 is immediately reflected in the virtual machine 15-2 as resource difference information for the standby virtual machine 15-2. As a result, the resources of the active system and the standby system remain synchronized. The difference information received from the active physical machine 10-1 contains information on at least one round of differences.

As described above, when a failure occurs in the active physical machine 10-1, a "system switching request" is transmitted to the standby physical machine 10-2. On receiving this request, the system switching processing unit 13-2 of the physical machine 10-2 switches from standby to active. Specifically, the system switching processing unit 13-2 reassigns the IP (Internet Protocol) address of the waiting (stopped) virtual machine 15-2. It then instructs the virtual machine management unit 12-2 to resume the waiting virtual machine 15-2 (restart its operation; cancel the standby).

[Explanation of state synchronization]
Next, the operation of synchronizing the states of the active system and the standby system is described with reference to FIG. 3.

In FIG. 3, it is assumed that a plurality of tasks (processes) 30-1 to 30-n (n is a positive integer; the same applies hereinafter) are running on the active virtual machine 15-1. In each task 30, events occur such as termination of the task (process), release of memory, writing of data to storage, and suspension of the task (process). Each of these events calls a predetermined function. In FIG. 3, these function call events are shown as hollow circles.

When an event occurs (when a function is called), the virtual machine management unit 12-1 acquires from the virtual machine 15-1 information on the various resources (CPU register information, memory information, I/O information, etc.) at the time of the event. Each time it acquires resource information, the virtual machine management unit 12-1 keeps that information in memory.

The difference information is acquired using the snapshot function of the hypervisor.

The virtual machine management unit 12-1 accumulates the resource difference information each time an event occurs. When the size of the accumulated resource difference information reaches a predetermined (fixed) amount, the virtual machine management unit 12-1 sets the information acquisition state to "saturated".

The information acquisition state is status information managed by the virtual machine management unit 12-1 and indicates the operating state of the virtual machine management unit 12-1 with respect to the resource difference information. When the information acquisition state becomes "saturated", the virtual machine management unit 12-1 transmits the accumulated resource difference information to the standby physical machine 10-2. In FIG. 3, resource difference information 40-1 is transmitted from the active system to the standby system at time T1.

The standby virtual machine management unit 12-2 applies the resource difference information 40-1 to the resources of the virtual machine 15-2, thereby synchronizing the active virtual machine 15-1 and the standby virtual machine 15-2.

After the information acquisition state becomes "saturated", an event that changes the resources of the virtual machine 15-1 may occur while the resource difference information is being prepared for transmission (that is, before the generated resource difference information has been sent to the standby system). For example, as shown in FIG. 3, event 41-1 occurs, and then event 41-2 occurs (time T3) while resource difference information 40-2 is being generated (time T2 in FIG. 3).

In such a case, the virtual machine management unit 12-1 sets the information acquisition state to "overflow" when it recognizes event 41-2, which is not reflected in the resource difference information 40-2. The virtual machine management unit 12-1 then transmits the resource difference information 40-2 to the standby physical machine 10-2 (time T4).

The standby virtual machine 15-2 acquires the resource difference information 40-2, and that information is applied to the virtual machine 15-2. As a result, the active and standby virtual machines 15 are synchronized.

The resource change caused by event 41-2 is collected as resource difference information 40-3. When event 41-3, which occurs after event 41-2, brings the accumulated resource difference information up to the predetermined amount, the information acquisition state is set to "saturated". The resource difference information that includes the change caused by event 41-2, which was not included in the resource difference information 40-2, is then generated as resource difference information 40-3 and transmitted to the standby physical machine 10-2 (time T5).

The virtual machine management unit 12-2 in the standby physical machine 10-2 acquires the resource difference information 40-3 and applies it to the resources of the virtual machine 15-2. This completes the synchronization of the active virtual machine 15-1 and the standby virtual machine 15-2.

There may be periods in which no event occurs (for example, the period from time T6 to T7 in FIG. 3). When no event occurs, the virtual machine management unit 12-1 has no need to notify the standby system of resource difference information. When no event has occurred and there is no resource difference information to send, the information acquisition state is set to "depleted". While the state is "depleted", no synchronization between the active and standby systems is performed.

There may also be periods in which the accumulated resource difference information does not reach the predetermined amount (for example, the period from time T8 to T9 in FIG. 3). For example, in FIG. 3, the occurrence of event 41-4 causes the virtual machine management unit 12-1 to start collecting (generating) resource difference information 40-4. When a fixed time has elapsed since event 41-5, the last event to occur during this collection, a timer set by the virtual machine management unit 12-1 times out (time T9).

In this case, the information acquisition state is set to "insufficient", and the virtual machine management unit 12-1 transmits the resource difference information 40-4 to the standby physical machine 10-2.

The standby virtual machine management unit 12-2 acquires the resource difference information 40-4 and applies it to the resources of the virtual machine 15-2. As a result, the active virtual machine 15-1 and the standby virtual machine 15-2 are synchronized.

In this way, if the predetermined amount of resource difference information has not accumulated when a predetermined period has elapsed since the last event, the resource difference information held at that point is transmitted to the standby system.
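The four information acquisition states can be sketched as a small state machine. This is an illustrative Python sketch under several assumptions: the class and method names are hypothetical, the state the source calls 「湧出」 is named `OVERFLOW` here, the size of a difference is taken to be its byte length, and time is passed in explicitly rather than read from a real timer. The source does not say what the state is while differences are accumulating below the threshold, so the sketch simply leaves the previous state in place.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class AcquisitionState(Enum):
    DEPLETED = "depleted"          # no events, nothing to send
    INSUFFICIENT = "insufficient"  # timer expired below the threshold
    SATURATED = "saturated"        # threshold reached, batch ready to send
    OVERFLOW = "overflow"          # event arrived while a batch awaits sending


@dataclass
class DiffCollector:
    threshold: int                 # assumed "predetermined amount" in bytes
    timeout: float                 # assumed "predetermined period" in seconds
    state: AcquisitionState = AcquisitionState.DEPLETED
    pending: List[bytes] = field(default_factory=list)
    sealed: Optional[List[bytes]] = None   # batch waiting to be transmitted
    last_event_at: float = 0.0

    def _seal_if_full(self) -> None:
        size = sum(len(d) for d in self.pending)
        if self.sealed is None and size >= self.threshold:
            self.sealed, self.pending = self.pending, []
            self.state = AcquisitionState.SATURATED

    def on_event(self, diff: bytes, now: float) -> None:
        """Record the resource difference produced by one event."""
        self.last_event_at = now
        self.pending.append(diff)
        if self.sealed is not None:
            # The event is not part of the sealed batch; it goes to the next.
            self.state = AcquisitionState.OVERFLOW
        else:
            self._seal_if_full()

    def poll(self, now: float) -> Optional[List[bytes]]:
        """Return the batch to transmit to the standby system, or None."""
        if self.sealed is not None:
            batch, self.sealed = self.sealed, None
            self._seal_if_full()
            return batch
        if not self.pending:
            self.state = AcquisitionState.DEPLETED
            return None
        if now - self.last_event_at >= self.timeout:
            self.state = AcquisitionState.INSUFFICIENT
            batch, self.pending = self.pending, []
            return batch
        return None
```

In this sketch an event that arrives while a sealed batch is waiting is carried into the next batch, which mirrors how event 41-2 ends up in resource difference information 40-3 rather than 40-2.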

[Explanation of system switching]
Next, the operations involved in system switching are described with reference to the drawings.

In the first embodiment, switching between the active system and the standby system is assumed to be triggered by, for example, the following five situations.
(1) A software failure of the virtual machine 15 on the active physical machine 10.
(2) A failure of software installed on the active physical machine 10.
(3) A failure of the hardware constituting the physical machine 10.
(4) A failure of the transmission line 20 laid between the physical machines 10.
(5) Switching by a command entered by maintenance personnel.

Each situation is described below. System switching is performed by the system switching processing unit 13. After the IP address of the standby virtual machine 15 is reassigned, operation resumes from the resource difference information of one batch earlier (the immediately preceding information). As a result, software failures that occurred in the old active system are not carried over, and the service continues to run.

[(1) System switching due to a software failure in the active virtual machine]
FIG. 4 is a diagram for explaining system switching due to a software failure that occurs in the active virtual machine. With reference to FIG. 4, the operation when a software failure 51 occurs in a task running on the active virtual machine 15-1 is described.

In this case, the active virtual machine failure detection unit 16-1 detects the software failure 51 and executes switching processing (switching detection process 200 in FIG. 4). The virtual machine failure detection unit 16-1 notifies the system switching processing unit 13-1 of the occurrence of the software failure 51. The system switching processing unit 13-1 executes the system switching process on the active side (system switching process 201). Specifically, the system switching processing unit 13-1 instructs the virtual machine management unit 12-1 to stop the virtual machine 15-1.

The system switching processing unit 13-1 also transmits a "system switching request" to the standby physical machine 10-2. The standby physical machine 10-2 receives the request at the system switching processing unit 13-2. Reception of the request by the system switching processing unit 13-2 starts the switching process on the standby side (switching process start 202 in FIG. 4).

The system switching processing unit 13-2 executes the switching process on the standby side (system switching process 203). Specifically, it instructs the virtual machine management unit 12-2 to reassign the IP address of the standby virtual machine 15-2 and to resume (start operating) the virtual machine 15-2 from the most recently applied resource difference information (resource difference information 40-3 in FIG. 4). As a result, the virtual machine 15-2 continues system operation from the synchronized state that existed before the failure (continuous operation 204).
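The standby-side steps just described (reassign the IP address, then resume the virtual machine from the most recently applied resource difference information) can be sketched as follows. This is a minimal Python illustration under stated assumptions: `StandbyVM`, `handle_switch_request`, the dictionary-based resources, and the documentation-range IP addresses are all hypothetical and do not correspond to any real hypervisor API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StandbyVM:
    ip: str
    running: bool = False                      # waiting (stopped) by default
    resources: Dict[str, bytes] = field(default_factory=dict)
    last_applied: Optional[str] = None         # label of the last applied batch

    def apply_diff(self, diff_id: str, diff: Dict[str, bytes]) -> None:
        """Apply one batch of resource differences while still on standby."""
        self.resources.update(diff)
        self.last_applied = diff_id


def handle_switch_request(vm: StandbyVM, service_ip: str) -> StandbyVM:
    """Standby-side switchover: take over the service IP, then resume.

    No reboot is modeled: the VM resumes from whatever state the last
    applied batch left in `vm.resources`.
    """
    vm.ip = service_ip      # step 1: reassign the IP address
    vm.running = True       # step 2: resume from the synchronized state
    return vm


# Usage: the standby VM has already applied batch "40-3" when the request arrives.
vm = StandbyVM(ip="192.0.2.20")
vm.apply_diff("40-3", {"page0": b"\x01"})
handle_switch_request(vm, service_ip="192.0.2.10")
```

Because the differences were applied while the machine was waiting, the switchover itself touches only the address and the run state, which is what keeps it short.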

[(2) System switching due to a software failure in the active physical machine]
FIG. 5 is a diagram for explaining system switching due to a software failure that occurs in the active physical machine. With reference to FIG. 5, the operation when a software failure 52 occurs in software running on the active physical machine 10-1 is described.

In this case, the active physical machine failure detection unit 14-1 detects the software failure 52 and executes switching processing (switching detection process 300 in FIG. 5). The physical machine failure detection unit 14-1 notifies the system switching processing unit 13-1 of the occurrence of the software failure 52. The system switching processing unit 13-1 executes the system switching process on the active side (system switching process 301). Specifically, the system switching processing unit 13-1 instructs the virtual machine management unit 12-1 to stop the virtual machine 15-1.

The system switching processing unit 13-1 also transmits a "system switching request" to the standby physical machine 10-2. The standby physical machine 10-2 receives the request at the system switching processing unit 13-2. Reception of the request by the system switching processing unit 13-2 starts the switching process on the standby side (switching process start 302 in FIG. 5).

The system switching processing unit 13-2 executes the switching process on the standby side (system switching process 303). Specifically, it instructs the virtual machine management unit 12-2 to reassign the IP address of the standby virtual machine 15-2 and to start the virtual machine 15-2 from the most recently applied resource difference information (resource difference information 40-3 in FIG. 5). As a result, the virtual machine 15-2 continues system operation from the synchronized state that existed before the failure (continuous operation 304).

[(3) System switching due to a hardware failure in the active physical machine]
FIG. 6 is a diagram for explaining system switching due to a hardware failure that occurs in the active physical machine. With reference to FIG. 6, the operation when a hardware failure 53 occurs in the hardware of the active physical machine 10-1 is described.

In this case, the active physical machine failure detection unit 14-1 detects the hardware failure 53 and executes switching processing (switching detection process 400 in FIG. 6). The physical machine failure detection unit 14-1 notifies the system switching processing unit 13-1 of the occurrence of the hardware failure 53. The system switching processing unit 13-1 executes the system switching process on the active side (system switching process 401). Specifically, the system switching processing unit 13-1 instructs the virtual machine management unit 12-1 to stop the virtual machine 15-1.

The virtual machine management unit 12-1 attempts to stop the virtual machine 15-1 as far as possible. That is, although the hardware failure 53 may make it impossible to stop the virtual machine 15-1, the virtual machine management unit 12-1 makes a best-effort attempt to stop it.

The system switching processing unit 13-1 also transmits a "system switching request" to the standby physical machine 10-2. The standby physical machine 10-2 receives the request at the system switching processing unit 13-2. Reception of the request by the system switching processing unit 13-2 starts the switching process on the standby side (switching process start 402 in FIG. 6).

The system switching processing unit 13-2 executes the switching process on the standby side (system switching process 403). Specifically, it instructs the virtual machine management unit 12-2 to reassign the IP address of the standby virtual machine 15-2 and to start the virtual machine 15-2 from the most recently applied resource difference information (resource difference information 40-3 in FIG. 6). As a result, the virtual machine 15-2 continues system operation from the synchronized state that existed before the failure (continuous operation 404).

When system switching is caused by a hardware failure, synchronization between the active and standby systems is stopped. More specifically, the active virtual machine management unit 12 does not transmit resource difference information to the standby system.

[(4) System switching due to a transmission path failure (or a physical machine freeze)]
FIG. 7 is a diagram for explaining system switching due to a transmission path failure (or a physical machine freeze).

Synchronized operation of the active and standby systems may become impossible because of some cause arising in the active physical machine 10-1 or a failure on the transmission line 20 (see FIG. 7). Specifically, synchronized operation is treated as impossible when no "resource difference information" is transmitted from the active system to the standby system even after a predetermined period has elapsed. In this case, the standby communication unit 11-2 notifies the physical machine failure detection unit 14-2 that no resource difference information has been received for the predetermined period.

On receiving this notification, the physical machine failure detection unit 14-2 recognizes that a failure has occurred in the transmission line 20 or elsewhere, and instructs the system switching processing unit 13-2 to start the system switching process.

The system switching processing unit 13-2 resumes the virtual machine 15-2 based on the most recently acquired resource difference information and continues providing the service.

Alternatively, the physical machine failure detection unit 14 may transmit a liveness check signal to the other physical machine 10 and thereby detect a failure (freeze) of the other physical machine 10 or a failure of the transmission line 20. In this case, the physical machine 10 that detects the failure can initiate system switching on its own. The system switching process after the failure is detected can be the same as that described with reference to FIG. 4 and the other figures, so its description is omitted.

Since a failure of the transmission line 20 or the like can also be regarded as a kind of hardware failure, synchronized operation between the old active system and the new active system is stopped.

In this way, when a failure occurs in the transmission line 20 or the like connecting the active physical machine 10-1 and the standby physical machine 10-2 itself, the standby physical machine 10-2 resumes the virtual machine 15-2 to which the resource difference information has been applied.
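The standby-side detection described for this case amounts to a receive timer. The following Python sketch shows one way such a timer might look; the class name `SyncWatchdog`, its fields, and the use of explicit timestamps in seconds are assumptions for illustration, not details taken from the source.

```python
from dataclasses import dataclass


@dataclass
class SyncWatchdog:
    limit: float          # assumed "predetermined period", in seconds
    last_received: float  # time at which the last message arrived

    def on_message(self, now: float) -> None:
        """Record arrival of resource difference information
        (or, in the alternative described above, a liveness reply)."""
        self.last_received = now

    def failure_suspected(self, now: float) -> bool:
        """True once nothing has arrived for longer than the limit."""
        return now - self.last_received > self.limit


# Usage: the communication unit feeds the watchdog; the failure detection
# unit polls it and, on True, would ask for system switching to start.
watchdog = SyncWatchdog(limit=5.0, last_received=0.0)
watchdog.on_message(now=2.0)
suspected = watchdog.failure_suspected(now=8.5)   # 6.5 s of silence
```

Feeding the same timer with liveness replies corresponds to the alternative in which the failure detection unit probes the other physical machine.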

[(5) System switching by command]
FIG. 8 is a diagram for explaining system switching when maintenance personnel (an administrator) enter a system switching command. In this case, the active system switching processing unit 13-1 recognizes that the system switching command 54 has been entered. As a result, the system switching detection process 500 starts.

The system switching processing unit 13-1 executes the system switching process on the active side (system switching process 501). Specifically, the system switching processing unit 13-1 instructs the virtual machine management unit 12-1 to stop the virtual machine 15-1.

The system switching processing unit 13-1 also transmits a "system switching request" to the standby physical machine 10-2. The standby physical machine 10-2 receives the request at the system switching processing unit 13-2. Reception of the request by the system switching processing unit 13-2 starts the switching process on the standby side (switching process start 502 in FIG. 8).

The system switching processing unit 13-2 executes the switching process on the standby side (system switching process 503). Specifically, it instructs the virtual machine management unit 12-2 to reassign the IP address of the standby virtual machine 15-2 and to start the virtual machine 15-2 from the most recently applied resource difference information (resource difference information 40-3 in FIG. 8). As a result, the virtual machine 15-2 continues system operation from the synchronized state that existed before the switchover (continuous operation 504).

In this way, the active system switching processing unit 13-1 may transmit a system switching request to the standby physical machine 10-2 in response to a system switching command entered from outside.

[Outline of operation]
The operation of the fault-tolerant system according to the first embodiment is summarized in FIGS. 9 and 10. First, the synchronization operation in the fault-tolerant system is described with reference to FIG. 9. Then, the system switching operation is described with reference to FIG. 10.

When a process on the active virtual machine 15 runs, an event accompanied by a change in resources occurs (step S01 in FIG. 9).

The active physical machine 10 retrieves snapshot information (step S02 in FIG. 9).

The active physical machine 10 determines whether a predetermined condition on the resource difference information is satisfied (step S03 in FIG. 9). For example, it checks whether the accumulated resource difference information has reached the predetermined amount, or whether a predetermined time has elapsed since collection of the resource difference information began.

If the predetermined condition is satisfied (step S03 in FIG. 9, Yes branch), the resource difference information is transmitted to the standby system.

If the predetermined condition is not satisfied (step S03 in FIG. 9, No branch), the processing from step S01 onward is repeated.

The standby physical machine 10 that has received the resource difference information applies it to the resources of the standby virtual machine 15 (step S11 in FIG. 9).

When a failure occurs, the active physical machine 10 stops the running virtual machine 15 (step S21 in FIG. 10). At that time, the active physical machine 10 transmits a "system switching request" to the standby system (step S22 in FIG. 10).

The standby physical machine 10 that has received the system switching request resumes the virtual machine 15 to which the latest resource information has been applied, and continues providing the service (step S31 in FIG. 10).

[Hardware configuration]
The hardware configuration of the physical machine 10 according to the first embodiment is described below.

FIG. 11 is a diagram showing an example of the hardware configuration of the physical machine 10. The physical machine 10 is a so-called information processing device (computer) and has the configuration illustrated in FIG. 11. For example, the physical machine 10 includes a CPU (Central Processing Unit) 61, a memory 62, an input/output interface 63, and a NIC (Network Interface Card) 64 serving as communication means, all interconnected by an internal bus.

The configuration shown in FIG. 11 is not intended to limit the hardware configuration of the physical machine 10. The physical machine 10 may include hardware that is not shown. Likewise, the number of CPUs and other components in the physical machine 10 is not limited to the example of FIG. 11; for example, the physical machine 10 may include a plurality of CPUs.

The memory 62 is a RAM (Random Access Memory), a ROM (Read Only Memory), or an auxiliary storage device (a hard disk or the like).

The input/output interface 63 serves as the interface to a display device and an input device (not shown). The display device is, for example, a liquid crystal display. The input device is, for example, a device that accepts user operations, such as a keyboard or a mouse.

The functions of the physical machine 10 are realized by the processing modules described above. These processing modules are realized, for example, by the CPU 61 executing a program stored in the memory 62. The program can be downloaded over a network or updated using a storage medium on which the program is stored. The processing modules may also be realized by a semiconductor chip. In other words, it suffices that the functions of the processing modules are realized by software executing on some form of hardware.

以上のように、第１の実施形態に係るフォールトトレラントシステムは、稼働系と待機系の同期運転を実現する。さらに、システムに各種障害が発生した場合、瞬時に稼働系と待機系の切り替えが行われる。その結果、稼働系から待機系への切り替え時間が短縮される。このように、第１の実施形態では、稼働系と待機系の切り替え時間が短く、また、システムの再開（待機系の再起動）を伴わないため、運用中のサービスに与える影響は存在しない。 As described above, the fault-tolerant system according to the first embodiment realizes synchronous operation of the operating system and the standby system. Furthermore, when various failures occur in the system, the active system and the standby system are switched instantly. As a result, the switching time from the active system to the standby system is shortened. As described above, in the first embodiment, the switching time between the active system and the standby system is short, and the system is not restarted (restart of the standby system), so that there is no influence on the service in operation.

また、通常のシステムでは、稼働系において各種リソース情報（例えば、メモリに関する情報）を一定周期等で収集しておく必要がある。そのため、当該情報収集動作に伴う負荷（通常動作から見た場合の無駄な負荷）によりリソース不足が発生する可能性がある。このようなリソース不足が発生し、且つ、不安定な状態で稼働系から待機系へ切り替えが行われると、システム管理者等にとって意図しない障害が発生し得る。即ち、各種リソース情報の収集動作が、迅速なサービス継続に影響を与え、最善な状態に復元できずサービス継続性が低下する問題がある。対して、第１の実施形態では、各種リソース情報はソフトウェアのプロセス、タスク、Ｉ／Ｏ状態等が安定状態で収集されるため、稼働系と待機系の切り替え後も安定した処理の継続運転が可能となる。つまり、第１の実施形態に係るフォールトトレラントシステムでは、稼働系と待機系の同期手法によりソフトウェアのプロセス、タスク等が安定状態（走行終了、中断状態等）の場合に、各種リソースの差分情報を待機系の物理マシン１０に送信する。 Further, in an ordinary system, various kinds of resource information (for example, information on memory) must be collected in the active system at regular intervals or the like. The load of this collection (wasted load from the viewpoint of normal operation) can therefore cause a resource shortage. If such a shortage occurs and the system is switched from the active system to the standby system in an unstable state, a failure unintended by the system administrator or others may occur. That is, the collection of resource information interferes with prompt continuation of service, the system cannot be restored to its best state, and service continuity degrades. In contrast, in the first embodiment, resource information is collected while software processes, tasks, I/O states, and the like are stable, so stable processing can continue even after the active system and the standby system are switched. In other words, in the fault-tolerant system according to the first embodiment, the synchronization method between the active system and the standby system transmits difference information on the various resources to the standby physical machine 10 when software processes, tasks, and the like are in a stable state (finished running, suspended, etc.).
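
The stable-state difference transfer described above can be sketched roughly as follows. This is a minimal illustration, not the implementation of the embodiment: the names `ActiveSide`, `StandbySide`, `compute_diff`, and the `stable` flag are hypothetical, resources are modeled as a plain dictionary, and removal of resources is not handled. A change made after a difference has been generated is not lost, because the next difference is computed against the snapshot that was last diffed.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical model: resource name -> value (e.g. a counter or a page checksum).
Resources = Dict[str, int]


def compute_diff(baseline: Resources, current: Resources) -> Resources:
    """Return only the entries of `current` that differ from `baseline`.

    Simplification: removed keys are not reported.
    """
    return {k: v for k, v in current.items() if baseline.get(k) != v}


@dataclass
class ActiveSide:
    resources: Resources = field(default_factory=dict)
    last_diffed: Resources = field(default_factory=dict)
    stable: bool = True  # assumed flag: tasks have finished running or are suspended

    def make_diff(self) -> Optional[Resources]:
        """Generate resource difference information only in a stable state."""
        if not self.stable:
            return None  # defer collection until the software is stable again
        diff = compute_diff(self.last_diffed, self.resources)
        # Advance the baseline to the snapshot that was just diffed, so a
        # change made after this point shows up in the next difference.
        self.last_diffed = dict(self.resources)
        return diff


@dataclass
class StandbySide:
    resources: Resources = field(default_factory=dict)

    def apply(self, diff: Resources) -> None:
        """Reflect received difference information in the standby resources."""
        self.resources.update(diff)


if __name__ == "__main__":
    active = ActiveSide(resources={"mem": 1, "io": 0})
    standby = StandbySide()
    first = active.make_diff()
    active.resources["mem"] = 2      # change occurs before `first` is sent
    standby.apply(first)             # standby still sees mem == 1
    standby.apply(active.make_diff())  # next difference carries mem == 2
    print(standby.resources == active.resources)
```

Diffing against the last generated snapshot rather than against the state at send time is one simple way to obtain the carry-over behavior; other bookkeeping schemes (for example, dirty flags per resource) would serve equally well.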

さらに、通常のフォールトトレラントシステムでは、仮想マシンの実装は同一の物理マシン（ハードウェア）上の運用となることが多く、局地激甚な災害（例えば、火災等）発生時はシステム運用が不可となる場合がある。対して、第１の実施形態では、仮想マシン１５の実装は異なる物理マシン（ハードウェア）１０上に実装されることを前提とするため、局地激甚における災害発生時においても継続的なシステム運用が可能となる。 Furthermore, in an ordinary fault-tolerant system, the virtual machines are often implemented and operated on the same physical machine (hardware), so the system may become inoperable when a severe localized disaster (for example, a fire) occurs. In contrast, the first embodiment assumes that the virtual machines 15 are implemented on different physical machines (hardware) 10, so continuous system operation is possible even when a severe localized disaster occurs.

上記の説明により、本発明の産業上の利用可能性は明らかであるが、本発明は、サービス稼働無停止システムの提供、高可用性が要求されるシステムの構築、障害発生時のサービス提供の継続、障害の原因調査、システムのバックアップ、災害等の激甚対応としてＢＣＰ（Business Continuity Plan）システム向け構成等に好適に適用可能である。 Although the industrial applicability of the present invention is clear from the above description, the present invention is suitably applicable to providing non-stop service systems, building systems that require high availability, continuing service provision when a failure occurs, investigating the causes of failures, backing up systems, and configurations for BCP (Business Continuity Plan) systems as a response to severe events such as disasters.

例えば、障害の原因調査に関し、同期運転が行える利点を活用できる。具体的には、待機系の物理マシン上の仮想マシンを、解析用の物理マシンに複製し、当該複製した仮想マシンを解析することで、運用中サービスを停止させること無く、安全に障害解析が行える。 For example, the advantage of synchronous operation can be exploited in investigating the cause of a failure. Specifically, by replicating the virtual machine on the standby physical machine to a physical machine for analysis and analyzing the replicated virtual machine, failure analysis can be performed safely without stopping the service in operation.

上記の実施形態の一部又は全部は、以下の付記のようにも記載され得るが、以下には限られない。
［付記１］
上述の第１の視点に係るシステムのとおりである。
［付記２］
前記第１の物理マシンは、
前記稼働系として動作する第１の仮想マシンに生じる障害を検出する、仮想マシン障害検出部と、
前記仮想マシン障害検出部が障害を検出すると、前記第２の物理マシンに向けて系切り替え要求を送信する系切り替え処理部と、
を備える、好ましくは付記１のシステム。
［付記３］
前記第１の物理マシンは、
自装置に生じる障害を検出する、物理マシン障害検出部をさらに備え、
前記系切り替え処理部は、前記仮想マシン障害検出部及び前記物理マシン障害検出部のすくなくとも一方が障害を検出すると、前記第２の物理マシンに向けて前記系切り替え要求を送信する、好ましくは付記２のシステム。
［付記４］
前記第１の物理マシンは、
前記第１の仮想マシンを管理する、仮想マシン管理部をさらに備え、
前記系切り替え処理部は、前記系切り替え要求を前記第２の物理マシンに向けて送信する際に、前記仮想マシン管理部に対して前記稼働系として動作している第１の仮想マシンの停止を指示する、好ましくは付記２又は３のシステム。
［付記５］
前記系切り替え処理部は、外部から系切り替えコマンドが投入されたことに応じて、前記第２の物理マシンに向けて前記系切り替え要求を送信する、好ましくは付記２乃至４のいずれか一に記載のシステム。
［付記６］
前記第２の物理マシンは、前記第１の物理マシンと自装置の間を接続する伝送路に障害が発生した場合に、前記リソース差分情報が反映された第２の仮想マシンを復帰させる、好ましくは付記１乃至５のいずれか一に記載のシステム。
［付記７］
上述の第２の視点に係る計算機のとおりである。
［付記８］
上述の第３の視点に係る計算機のとおりである。
［付記９］
上述の第４の視点に係るシステム制御方法のとおりである。
［付記１０］
上述の第５の視点に係るプログラムのとおりである。
なお、付記７～１０の形態は、付記１の形態と同様に、付記２の形態～付記６の形態に展開することが可能である。
Some or all of the above embodiments may also be described as in the following appendices, but are not limited to them.
[Appendix 1]
The system according to the first viewpoint described above.
[Appendix 2]
The first physical machine comprises:
a virtual machine failure detection unit that detects a failure occurring in the first virtual machine operating as the active system; and
a system switching processing unit that transmits a system switching request to the second physical machine when the virtual machine failure detection unit detects a failure;
preferably the system of Appendix 1.
[Appendix 3]
The first physical machine further comprises
a physical machine failure detection unit that detects a failure occurring in its own device, and
the system switching processing unit transmits the system switching request to the second physical machine when at least one of the virtual machine failure detection unit and the physical machine failure detection unit detects a failure; preferably the system of Appendix 2.
[Appendix 4]
The first physical machine further comprises
a virtual machine management unit that manages the first virtual machine, and
the system switching processing unit, when transmitting the system switching request to the second physical machine, instructs the virtual machine management unit to stop the first virtual machine operating as the active system; preferably the system of Appendix 2 or 3.
[Appendix 5]
The system switching processing unit transmits the system switching request to the second physical machine in response to a system switching command input from outside; preferably the system according to any one of Appendices 2 to 4.
[Appendix 6]
The second physical machine resumes the second virtual machine in which the resource difference information has been reflected when a failure occurs in the transmission line connecting the first physical machine and its own device; preferably the system according to any one of Appendices 1 to 5.
[Appendix 7]
The computer according to the second viewpoint described above.
[Appendix 8]
The computer according to the third viewpoint described above.
[Appendix 9]
The system control method according to the fourth viewpoint described above.
[Appendix 10]
The program according to the fifth viewpoint described above.
Note that the forms of Appendices 7 to 10 can be expanded into the forms of Appendices 2 to 6 in the same manner as the form of Appendix 1.
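
The switching conditions listed in Appendices 2 to 6 can be illustrated with a small sketch. Everything here is hypothetical naming for illustration only (`Trigger`, `ActiveMachine`, `StandbyMachine`, `handle`); the actual detection units, the virtual machine management unit, and the transmission line are reduced to simple method calls and boolean flags.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Trigger(Enum):
    """Events that cause the active side to request system switching."""
    VM_FAILURE = auto()        # reported by the virtual machine failure detection unit
    PM_FAILURE = auto()        # reported by the physical machine failure detection unit
    EXTERNAL_COMMAND = auto()  # system switching command input from outside


@dataclass
class StandbyMachine:
    vm_running: bool = False  # False while the second virtual machine is on standby

    def on_switch_request(self) -> None:
        """Resume the standby VM when a system switching request arrives."""
        self.vm_running = True

    def on_link_failure(self) -> None:
        """Resume the standby VM when the transmission line to the peer fails."""
        self.vm_running = True


@dataclass
class ActiveMachine:
    peer: StandbyMachine
    vm_running: bool = True

    def handle(self, trigger: Trigger) -> bool:
        """Stop the active VM and send a switching request; return True if switched."""
        if not self.vm_running:
            return False  # already switched; nothing to do
        # Corresponds to instructing the VM management unit to stop the active VM.
        self.vm_running = False
        # Corresponds to transmitting the system switching request to the peer.
        self.peer.on_switch_request()
        return True


if __name__ == "__main__":
    standby = StandbyMachine()
    active = ActiveMachine(peer=standby)
    active.handle(Trigger.VM_FAILURE)
    print(active.vm_running, standby.vm_running)
```

In this sketch all three triggers take the same path, which mirrors the appendices: the source of the trigger differs, but the resulting switching request is the same.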

なお、引用した上記の特許文献等の各開示は、本書に引用をもって繰り込むものとする。本発明の全開示（請求の範囲を含む）の枠内において、さらにその基本的技術思想に基づいて、実施形態ないし実施例の変更・調整が可能である。また、本発明の全開示の枠内において種々の開示要素（各請求項の各要素、各実施形態ないし実施例の各要素、各図面の各要素等を含む）の多様な組み合わせ、ないし、選択が可能である。すなわち、本発明は、請求の範囲を含む全開示、技術的思想にしたがって当業者であればなし得るであろう各種変形、修正を含むことは勿論である。特に、本書に記載した数値範囲については、当該範囲内に含まれる任意の数値ないし小範囲が、別段の記載のない場合でも具体的に記載されているものと解釈されるべきである。 Each disclosure of the patent documents and other literature cited above is incorporated into this document by reference. Within the framework of the entire disclosure of the present invention (including the claims), and based on its basic technical idea, the embodiments and examples can be changed or adjusted. Further, within the framework of the entire disclosure of the present invention, various combinations or selections of the disclosed elements (including each element of each claim, each element of each embodiment or example, each element of each drawing, and so on) are possible. That is, the present invention naturally includes the variations and modifications that a person skilled in the art could make in accordance with the entire disclosure, including the claims, and the technical idea. In particular, for any numerical range described in this document, any numerical value or sub-range falling within that range should be construed as being specifically described even where not otherwise stated.

１０、１０－１、１０－２、１０２、１１２物理マシン
１１、１１－１、１１－２通信部
１２、１２－１、１２－２仮想マシン管理部
１３、１３－１、１３－２系切り替え処理部
１４、１４－１、１４－２物理マシン障害検出部
１５、１５－１、１５－２、１０１、１１１仮想マシン
１６、１６－１、１６－２仮想マシン障害検出部
２０伝送路
３０－１～３０－ｎタスク
４０－１～４０－４リソース差分情報
４１－１～４１－５イベント
５１ソフトウェア障害（仮想マシン）
５２ソフトウェア障害（物理マシン）
５３ハードウェア障害
５４系切り替えコマンド
６１ＣＰＵ
６２メモリ
６３入出力インターフェイス
６４ＮＩＣ
２００、３００、４００、５００切替検出処理
２０１、３０１、４０１、５０１系切り替え処理（稼働系）
２０２、３０２、４０２、５０２切り替え処理
２０３、３０３、４０３、５０３系切り替え処理（待機系）
２０４、３０４、４０４、５０４継続稼働処理
10, 10-1, 10-2, 102, 112 Physical machine
11, 11-1, 11-2 Communication unit
12, 12-1, 12-2 Virtual machine management unit
13, 13-1, 13-2 System switching processing unit
14, 14-1, 14-2 Physical machine failure detection unit
15, 15-1, 15-2, 101, 111 Virtual machine
16, 16-1, 16-2 Virtual machine failure detection unit
20 Transmission line
30-1 to 30-n Task
40-1 to 40-4 Resource difference information
41-1 to 41-5 Event
51 Software failure (virtual machine)
52 Software failure (physical machine)
53 Hardware failure
54 System switching command
61 CPU
62 Memory
63 Input/output interface
64 NIC
200, 300, 400, 500 Switching detection process
201, 301, 401, 501 System switching process (active system)
202, 302, 402, 502 Switching process
203, 303, 403, 503 System switching process (standby system)
204, 304, 404, 504 Continuous operation process

Claims

第１の仮想マシンが稼働する、第１の物理マシンと、
第２の仮想マシンが稼働する、第２の物理マシンと、
を含み、
前記第１の仮想マシンを稼働系とし、前記第２の仮想マシンを待機系とする場合に、
前記第１の物理マシンは、前記第１の仮想マシンが動作することにより生じるリソースの変化に関する情報をリソース差分情報として、前記第２の物理マシンに送信し、
前記第２の物理マシンは、前記リソース差分情報を前記第２の仮想マシンのリソースに反映すると共に、系の切り替えが必要な場合に、前記リソース差分情報が反映された第２の仮想マシンを、待機を解除された状態に復帰させ、
前記第１の物理マシンは、前記リソース差分情報を生成し、生成したリソース差分情報を前記第２の物理マシンに送信する前にリソースの変化が生じた結果、そのリソースの変化が今回のリソース差分情報に含まれなかった場合に次回のリソース差分情報にそのリソースの変化を反映する、システム。
A system comprising:
a first physical machine on which a first virtual machine runs; and
a second physical machine on which a second virtual machine runs,
wherein, when the first virtual machine is the active system and the second virtual machine is the standby system,
the first physical machine transmits information on changes in resources caused by operation of the first virtual machine to the second physical machine as resource difference information,
the second physical machine reflects the resource difference information in the resources of the second virtual machine and, when system switching is necessary, returns the second virtual machine in which the resource difference information has been reflected to a state released from standby, and
the first physical machine generates the resource difference information and, when a resource change occurs before the generated resource difference information is transmitted to the second physical machine and is consequently not included in the current resource difference information, reflects that resource change in the next resource difference information.

前記第１の物理マシンは、
前記稼働系として動作する第１の仮想マシンに生じる障害を検出する、仮想マシン障害検出部と、
前記仮想マシン障害検出部が障害を検出すると、前記第２の物理マシンに向けて系切り替え要求を送信する系切り替え処理部と、
を備える、請求項１のシステム。
The system of claim 1, wherein the first physical machine comprises:
a virtual machine failure detection unit that detects a failure occurring in the first virtual machine operating as the active system; and
a system switching processing unit that transmits a system switching request to the second physical machine when the virtual machine failure detection unit detects a failure.

前記第１の物理マシンは、
自装置に生じる障害を検出する、物理マシン障害検出部をさらに備え、
前記系切り替え処理部は、前記仮想マシン障害検出部及び前記物理マシン障害検出部のすくなくとも一方が障害を検出すると、前記第２の物理マシンに向けて前記系切り替え要求を送信する、請求項２のシステム。
The system of claim 2, wherein the first physical machine further comprises
a physical machine failure detection unit that detects a failure occurring in its own device, and
the system switching processing unit transmits the system switching request to the second physical machine when at least one of the virtual machine failure detection unit and the physical machine failure detection unit detects a failure.

前記第１の物理マシンは、
前記第１の仮想マシンを管理する、仮想マシン管理部をさらに備え、
前記系切り替え処理部は、前記系切り替え要求を前記第２の物理マシンに向けて送信する際に、前記仮想マシン管理部に対して前記稼働系として動作している第１の仮想マシンの停止を指示する、請求項２又は３のシステム。
The system of claim 2 or 3, wherein the first physical machine further comprises
a virtual machine management unit that manages the first virtual machine, and
the system switching processing unit, when transmitting the system switching request to the second physical machine, instructs the virtual machine management unit to stop the first virtual machine operating as the active system.

前記系切り替え処理部は、外部から系切り替えコマンドが投入されたことに応じて、前記第２の物理マシンに向けて前記系切り替え要求を送信する、請求項２乃至４のいずれか一項に記載のシステム。 The system according to any one of claims 2 to 4, wherein the system switching processing unit transmits the system switching request to the second physical machine in response to a system switching command input from outside.

前記第２の物理マシンは、前記第１の物理マシンと自装置の間を接続する伝送路に障害が発生した場合に、前記リソース差分情報が反映された第２の仮想マシンを、待機を解除された状態に復帰させる、請求項１乃至５のいずれか一項に記載のシステム。 The system according to any one of claims 1 to 5, wherein, when a failure occurs in the transmission line connecting the first physical machine and its own device, the second physical machine returns the second virtual machine in which the resource difference information has been reflected to a state released from standby.

稼働系の仮想マシンが稼働し、
前記仮想マシンが動作することにより生じるリソースの変化に関する情報をリソース差分情報として、フォールトトレラントシステムを構成する他の計算機に送信する、計算機であって、
前記計算機は、前記リソース差分情報を生成し、生成したリソース差分情報を前記他の計算機に送信する前にリソースの変化が生じた結果、そのリソースの変化が今回のリソース差分情報に含まれなかった場合に次回のリソース差分情報にそのリソースの変化を反映する、計算機。
A computer on which an active-system virtual machine runs and
which transmits information on changes in resources caused by operation of the virtual machine, as resource difference information, to another computer constituting a fault-tolerant system,
wherein the computer generates the resource difference information and, when a resource change occurs before the generated resource difference information is transmitted to the other computer and is consequently not included in the current resource difference information, reflects that resource change in the next resource difference information.

フォールトトレラントシステムに含まれる、稼働系の計算機が送信する情報であって、前記稼働系の計算機にて稼働する仮想マシンが動作することにより生じるリソースの変化に関するリソース差分情報を取得し、
前記リソース差分情報を自装置の仮想マシンのリソースに反映すると共に、系の切り替えが必要な場合に、前記リソース差分情報が反映された仮想マシンを、待機を解除された状態に復帰させる、計算機であって、
前記稼働系の計算機は、前記リソース差分情報を生成し、生成したリソース差分情報を待機系の前記計算機に送信する前にリソースの変化が生じた結果、そのリソースの変化が今回のリソース差分情報に含まれなかった場合に次回のリソース差分情報にそのリソースの変化を反映し、待機系の前記計算機は、そのリソースの変化を反映した次回のリソース差分情報を取得する、計算機。
A computer that acquires resource difference information, which is information transmitted by an active-system computer included in a fault-tolerant system and which relates to changes in resources caused by operation of a virtual machine running on the active-system computer, and
that reflects the resource difference information in the resources of a virtual machine of its own device and, when system switching is necessary, returns the virtual machine in which the resource difference information has been reflected to a state released from standby,
wherein the active-system computer generates the resource difference information and, when a resource change occurs before the generated resource difference information is transmitted to the computer, which is the standby system, and is consequently not included in the current resource difference information, reflects that resource change in the next resource difference information, and the computer, which is the standby system, acquires the next resource difference information in which that resource change is reflected.

第１の仮想マシンが稼働する、第１の物理マシンと、
第２の仮想マシンが稼働する、第２の物理マシンと、
を含むシステムにおいて、
前記第１の仮想マシンを稼働系とし、前記第２の仮想マシンを待機系とする場合に、前記第１の仮想マシンが動作することにより生じるリソースの変化に関する情報をリソース差分情報として、前記第２の物理マシンに送信するステップと、
前記リソース差分情報を前記第２の仮想マシンのリソースに反映するステップと、
系の切り替えが必要な場合に、前記リソース差分情報が反映された第２の仮想マシンを、待機を解除された状態に復帰させるステップと、
前記第１の物理マシンが、前記リソース差分情報を生成し、生成したリソース差分情報を前記第２の物理マシンに送信する前にリソースの変化が生じた結果、そのリソースの変化が今回のリソース差分情報に含まれなかった場合に次回のリソース差分情報にそのリソースの変化を反映するステップと、
を含むシステム制御方法。
A system control method for a system including
a first physical machine on which a first virtual machine runs, and
a second physical machine on which a second virtual machine runs,
the method comprising:
a step of transmitting, when the first virtual machine is the active system and the second virtual machine is the standby system, information on changes in resources caused by operation of the first virtual machine to the second physical machine as resource difference information;
a step of reflecting the resource difference information in the resources of the second virtual machine;
a step of returning, when system switching is necessary, the second virtual machine in which the resource difference information has been reflected to a state released from standby; and
a step in which the first physical machine generates the resource difference information and, when a resource change occurs before the generated resource difference information is transmitted to the second physical machine and is consequently not included in the current resource difference information, reflects that resource change in the next resource difference information.

稼働系の仮想マシンを稼働する処理と、
前記仮想マシンが動作することにより生じるリソースの変化に関する情報をリソース差分情報として、フォールトトレラントシステムを構成する他の計算機に送信する処理と、を稼働系の仮想マシンが動作しているコンピュータに実行させるプログラムであって、
前記リソース差分情報を生成し、生成したリソース差分情報を前記他の計算機に送信する前にリソースの変化が生じた結果、そのリソースの変化が今回のリソース差分情報に含まれなかった場合に次回のリソース差分情報にそのリソースの変化を反映する、プログラム。
A program that causes a computer on which an active-system virtual machine is operating to execute:
a process of running the active-system virtual machine; and
a process of transmitting information on changes in resources caused by operation of the virtual machine, as resource difference information, to another computer constituting a fault-tolerant system,
wherein the program generates the resource difference information and, when a resource change occurs before the generated resource difference information is transmitted to the other computer and is consequently not included in the current resource difference information, reflects that resource change in the next resource difference information.