JPH08235133A

JPH08235133A - Multiprocessing system

Info

Publication number: JPH08235133A
Application number: JP7040659A
Authority: JP
Inventors: Tetsuo Hasegawa; 哲夫長谷川
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1995-02-28
Filing date: 1995-02-28
Publication date: 1996-09-13

Abstract

PURPOSE: To provide a multiprocessing system that prevents all computers from going down even when a computer fails because of a hardware fault or a program bug. CONSTITUTION: The system has (a) means that classifies a plurality of computers 1-4 into computers 1 and 2 of a preceding system and computers 3 and 4 of a following system, causes the preceding-system computers 1 and 2 to start executing a specific application process group, and causes the following-system computers 3 and 4 to start executing the same application process group at a time delayed from the execution start of computers 1 and 2 by a period satisfying a specific condition; (b) means that decides whether computers 1 and 2 of the preceding system went down because of a hardware fault; and (c) means that, when the deciding means decides that computers 1 and 2 went down for a reason other than a hardware fault, removes the application process that computers 1 and 2 were executing when they went down from the application process group of computers 3 and 4 and makes computers 3 and 4 of the following system operate as the preceding system.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a multiprocessing system in which a plurality of computers execute, in parallel, the same processing or processing having the same function.

[0002]

[Prior Art] A system that processes fluctuating data and conveys the results to control a target system, such as an industrial system like a chemical or steel plant, a traffic control system, or an electric power system like a nuclear power plant, is required to control that target safely at all times, under any circumstances, and to reliably accomplish the mission assigned to it.

[0003] To meet this requirement, multiprocessing systems in which a plurality of computers execute the same processing, or processing having the same function, in parallel have long been used. Multiprocessing systems come in the various types described below.

[0004] (1) Executing the same application process group on a plurality of computers

In a multiprocessing system in which a plurality of computers execute the same application process group, even if one computer goes down because of some failure, the other computers can continue processing, so an interruption of processing is avoided. This type of multiprocessing system exploits the fact that the probability of hardware failures occurring in several computers simultaneously is far lower than the probability of a hardware failure occurring in any one computer.
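
As a rough illustration of this reasoning, suppose (an assumption not stated in the source) that each of $n$ computers suffers a hardware failure independently with probability $p$ during a given interval. The probability that all of them fail in that interval is then:

```latex
P(\text{all } n \text{ computers fail}) = p^{n},
\qquad \text{e.g. } p = 10^{-3},\ n = 2 \;\Rightarrow\; p^{n} = 10^{-6} \ll p .
```

The estimate holds only while the failures are independent; the values of $p$ and $n$ are illustrative, not figures from the source.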

[0005] However, there is no guarantee that the program of an application process is free of defects. When an application process containing a program bug is executed, the computer fails, and the same failure occurs on every computer. Consequently, processing in the system is interrupted even when parallel multiprocessing is used.

[0006] In small computers and in control computers that emphasize fast response, protection mechanisms such as those of the OS (operating system) that manages the computer's operation are weak, so a program bug in an application process is likely to cause a failure. Consequently, even with parallel multiprocessing, a program bug in just some of the application processes can interrupt the processing of the entire system.

[0007] (2) Executing application processes in parallel on a plurality of computers according to multiple versions of a program having the same function

In a multiprocessing system in which a plurality of computers execute, in parallel, application processes based on multiple versions of a program structure having the same function, even if one computer goes down because of some failure, or suspends subsequent processing to avoid adversely affecting the outside when its internal state becomes inconsistent, the other computers can continue processing, so an interruption of processing is avoided.

[0008] This type of multiprocessing system exploits two facts. First, the probability of hardware failures occurring in several computers simultaneously is far lower than the probability of a hardware failure occurring in any one computer. Second, even if a program bug in one version of the program for an application process causes a failure, the other versions having the same function are likely not to contain that bug and therefore not to fail.

[0009] In this type of parallel processing system, the processing of the entire system remains uninterrupted only on condition that at least one of the multiple versions of the program having the same function does not fail.

[0010] However, a so-called safe-version program, written defensively so that failures are unlikely, often takes extra execution time. As a result, when processing is executed according to such a safe-version program, the processing of the system as a whole becomes slower.

[0011] A safe-version program takes extra execution time because abnormality-detection processing to prevent failures must be added throughout the program. For search processing, a typical time-consuming operation, various techniques have been devised that shorten processing time by adding processing that avoids unnecessary searches. In general, however, speeding up requires extra processing, and that extra processing makes program bugs more likely.

[0012] (3) Multiplexing all programs running on a plurality of computers by the N-version programming method

Program multiplexing is by nature a method in which a program is duplicated and executed on a plurality of computers. If the program contains a bug, then even with multiplexing the common bug can halt computation or stop some system functions, making it impossible to accomplish the mission assigned to the system.

[0013] One method of avoiding this situation is the "N-version programming method." It resembles method (2), but is characterized in that the multiple versions of the program achieving the same function are written by separate designers, using different procedures, in environments isolated from one another. The different programs written in this way, all performing the same function, are executed in parallel on a plurality of computers, and among the resulting outputs the one on which a majority agree is selected as the correct output.
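
The majority selection described above can be sketched as follows. This is a minimal illustration, not the method of any particular system: the function name `majority_vote` and the three toy versions are hypothetical, and the sketch assumes outputs can be compared for exact equality.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the output produced by a strict majority of versions, else None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    # A strict majority means more than half of all versions agree.
    return value if count > len(outputs) / 2 else None


# Three hypothetical versions of the same function; version_c contains a bug.
def version_a(x: int) -> int:
    return x * 2


def version_b(x: int) -> int:
    return x + x


def version_c(x: int) -> int:
    return x * 2 + 1  # deliberately wrong


result = majority_vote([v(21) for v in (version_a, version_b, version_c)])
print(result)  # 42: two of the three versions agree
```

When no value reaches a strict majority, the sketch returns `None` rather than guessing; how a real system should react in that case is outside what the text specifies.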

[0014] Because the N-version programming method has multiple versions of each program module written by designers isolated from one another, program development cost and maintenance cost both rise as the number of program modules grows. For example, when three teams independently develop programs with the same function using different procedures, developing programs that guarantee the same level of quality as before requires three times the development staff, and the cost of maintaining and managing three versions inevitably increases as well. Furthermore, when three versions of a program run in parallel and their results are compared, system performance is determined by the slowest of the programs, so processing inevitably becomes slower.

[0015] As described above, system (1) has the advantage that processing can continue even if one of the computers goes down because of a hardware failure, but has the problem that all computers go down when the application process program contains a bug. System (2) has the advantage that processing can continue according to another, safe-version program even when one of the computers goes down because of a hardware failure or one version of the program contains a bug, but has the problem that processing becomes much slower when it is executed according to the safe-version program. System (3) has the advantage that processing can continue according to another version of the program even when one of the computers goes down because of a hardware failure or one version of the program contains a bug, but has the problems that processing becomes slower and that program development, maintenance, and management costs increase.

[0016]

[Problems to Be Solved by the Invention] As described above, conventional multiprocessing systems cannot satisfy two desirable requirements together: that high-speed processing continue even when one of the computers goes down because of a hardware failure, and that high-speed processing continue even when one of the application process programs contains a bug. They therefore have the problem that, when such failures occur, they can operate only at an inefficient processing speed.

[0017] Accordingly, an object of the present invention is to provide a multiprocessing system that prevents all computers from going down because of a hardware failure or a program bug and, in particular, continues operating at an efficient processing speed even when a failure has occurred.

[0018]

[Means for Solving the Problems] To achieve the above object, the multiprocessing system according to the first invention comprises: a plurality of computers; means for classifying these computers into preceding-system computers and following-system computers, causing the preceding-system computers to start executing a specific application process group, and causing the following-system computers to start executing that application process group at a time delayed from the execution start of the preceding-system computers by a period satisfying a predetermined condition; determining means for determining, when a preceding-system computer goes down, whether the cause is a hardware failure; and means for, when the determining means determines that the preceding-system computer went down for a cause other than a hardware failure, removing the application process that the preceding-system computer was executing when it went down from the application process group of the following-system computers and operating the following-system computers as the preceding system.
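
The delayed start can be pictured with a small sketch. It assumes, purely for illustration, that the delay is counted in whole time slices and that each process in the group occupies one slice; the text itself says only that the delay is "a period satisfying a predetermined condition." The names `running_process`, `schedule`, and `LAG` are hypothetical.

```python
from typing import List, Optional


def running_process(schedule: List[str], slice_no: int, lag: int = 0) -> Optional[str]:
    """Process a group executes in slice `slice_no` if it started `lag` slices late."""
    index = slice_no - lag
    return schedule[index] if 0 <= index < len(schedule) else None


schedule = ["A", "B", "C", "D"]  # hypothetical process group, one process per slice
LAG = 2                          # hypothetical delay of the following system

# In slice 2 the preceding system runs "C" while the following system runs "A".
ahead = running_process(schedule, 2)
behind = running_process(schedule, 2, LAG)
print(ahead, behind)  # C A
```

Because the following system trails by `LAG` slices, a process that brings down the preceding system has not yet been reached by the following system at that moment, which is what makes removing it possible.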

[0019] To achieve the above object, the multiprocessing system according to the second invention comprises: a plurality of computers; means for classifying these computers into preceding-system computers and following-system computers, causing the preceding-system computers to start executing a specific application process group, and causing the following-system computers to start executing that application process group at a time delayed from the execution start of the preceding-system computers by a period satisfying a predetermined condition; determining means for determining, when a preceding-system computer goes down, whether the cause is a hardware failure; and means for, when the determining means determines that the preceding-system computer went down for a cause other than a hardware failure, removing the application process that the preceding-system computer was executing when it went down from the application process group of the following-system computers, reconfiguring the computers constituting the following system into a new preceding system and a new following system, and starting execution of the remaining application processes from the new preceding-system computers.

[0020] To achieve the above object, the multiprocessing system according to the third invention comprises: a plurality of computers; means for classifying these computers into preceding-system computers and following-system computers, causing the preceding-system computers to start executing a specific application process group, and causing the following-system computers to start executing that application process group at a time delayed from the execution start of the preceding-system computers by a period satisfying a predetermined condition; determining means for determining, when a preceding-system computer goes down, whether the cause is a hardware failure; means for, when the determining means determines that the preceding-system computer went down for a cause other than a hardware failure, removing the application process that the preceding-system computer was executing when it went down from the application process group of the following-system computers, reconfiguring the computers constituting the following system into a new preceding system and a new following system, and starting execution of the remaining application processes from the new preceding-system computers; and means for, when the determining means determines that the preceding-system computer went down because of a hardware failure, reconfiguring the operating computers into a new preceding system and a new following system.

[0021] To achieve the above object, the multiprocessing system according to the fourth invention comprises: a plurality of computers holding multiple versions of application process programs having the same function; means for classifying these computers into preceding-system computers and following-system computers, causing the preceding-system computers to start executing a specific application process group, each process according to a specific version of its program, and causing the following-system computers to start executing that application process group, each process according to the specific version or a different version of its program, at a time delayed from the execution start of the preceding-system computers by a period satisfying a predetermined condition; determining means for determining, when a preceding-system computer goes down, whether the cause is a hardware failure; and means for, when the determining means determines that the preceding-system computer went down for a cause other than a hardware failure, causing the following-system computers to execute the application process that the preceding-system computer was executing when it went down according to a version of the program different from the specific version.

[0022] To achieve the above object, the multiprocessing system according to the fifth invention comprises: a plurality of computers holding multiple versions of application process programs having the same function; means for classifying these computers into preceding-system computers and following-system computers, causing the preceding-system computers to start executing a specific application process group, each process according to a specific version of its program, and causing the following-system computers to start executing that application process group, each process according to the specific version or a different version of its program, at a time delayed from the execution start of the preceding-system computers by a period satisfying a predetermined condition; determining means for determining, when a preceding-system computer goes down, whether the cause is a hardware failure; and means for, when the determining means determines that the preceding-system computer went down for a cause other than a hardware failure, reconfiguring the computers constituting the following system into a new preceding system and a new following system and starting execution of the remaining application processes from the new preceding-system computers, with the application process that the preceding-system computer was executing when it went down run according to a version of the program different from the specific version.

[0023] To achieve the above object, the multiprocessing system according to the sixth invention comprises: a plurality of computers holding multiple versions of application process programs having the same function; means for classifying these computers into preceding-system computers and following-system computers, causing the preceding-system computers to start executing a specific application process group, each process according to a specific version of its program, and causing the following-system computers to start executing that application process group, each process according to the specific version or a different version of its program, at a time delayed from the execution start of the preceding-system computers by a period satisfying a predetermined condition; determining means for determining, when a preceding-system computer goes down, whether the cause is a hardware failure; means for, when the determining means determines that the preceding-system computer went down for a cause other than a hardware failure, reconfiguring the computers constituting the following system into a new preceding system and a new following system and starting execution of the remaining application processes from the new preceding-system computers, with the application process that the preceding-system computer was executing when it went down run according to a version of the program different from the specific version; and means for, when the determining means determines that the preceding-system computer went down because of a hardware failure, reconfiguring the operating computers into a new preceding system and a new following system. Preferably, the determining means includes means for determining, when all of the preceding-system computers have gone down, that they went down for a cause other than a hardware failure.

[0024]

[Operation] In the multiprocessing system according to the first invention, the plurality of computers are classified into a preceding system and a following system; the preceding-system computers start executing a specific application process group, and the following-system computers start executing the same group at a time delayed from the preceding system by a period satisfying a predetermined condition. When a preceding-system computer goes down, the determining means determines whether the cause is a hardware failure. If it determines that the preceding-system computer went down for a cause other than a hardware failure, the application process that the preceding-system computer was executing when it went down is removed from the application process group of the following-system computers, and the following-system computers are operated as the preceding system.
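
The takeover step of the first invention might be sketched as below. This is an illustration under stated assumptions, not the patented implementation: the `Group` class, the `fail_over` function, and the process names are hypothetical, and the hardware-fault branch simply leaves the preceding system in place.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Group:
    """A set of computers that execute the same application process group."""
    computers: List[int]
    processes: List[str] = field(default_factory=list)


def fail_over(preceding: Group, following: Group,
              faulty_process: str, hardware_fault: bool) -> Group:
    """Return the group that acts as the preceding system after a down event."""
    if hardware_fault:
        # Assumption of this sketch: the program is not suspect, so nothing
        # is removed and the preceding system stays in its role.
        return preceding
    # Non-hardware cause: drop the process that was running at the time of
    # the down from the following system's group, then promote that system.
    following.processes = [p for p in following.processes if p != faulty_process]
    return following


preceding = Group(computers=[1, 2], processes=["A", "B", "C"])
following = Group(computers=[3, 4], processes=["A", "B", "C"])
leader = fail_over(preceding, following, faulty_process="B", hardware_fault=False)
print(leader.computers, leader.processes)  # [3, 4] ['A', 'C']
```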

[0025] The following-system computers therefore continue with an application process group that no longer includes the process that caused the down, which prevents the following-system computers from going down as well. Moreover, because the following system processes at the same speed as the preceding system, efficient processing can continue. In addition, if the preceding system is made up of a plurality of computers, then even when the determining means determines that the down was caused by a hardware failure, the healthy computers remaining in the preceding system can continue processing, so processing speed does not drop.

[0026] In the multiprocessing system according to the second invention, when the determining means determines that a preceding-system computer went down for a cause other than a hardware failure, the application process that computer was executing when it went down is removed from the application process group of the following-system computers, the computers constituting the following system are reconfigured into a new preceding system and a new following system, and execution of the remaining application processes starts from the new preceding-system computers. As in the first invention, this prevents the application process that caused the down from bringing down the following-system computers too. Because the computers are reconfigured into a new preceding system and a new following system, the system is also prepared for the next down that may arise from a cause other than a hardware failure. Here too, if the preceding system (new preceding system) is made up of a plurality of computers, then even when the determining means determines that a down was caused by a hardware failure, the healthy computers in the preceding system (new preceding system) can continue processing, so processing speed does not drop.
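
A minimal sketch of this reconfiguration follows. The split rule (the first half of the surviving computers leads) is an assumption made for illustration; the text does not say how the following system is divided. The function names are hypothetical.

```python
from typing import List, Tuple


def split_groups(computers: List[int]) -> Tuple[List[int], List[int]]:
    """Split computers into (new preceding system, new following system)."""
    half = (len(computers) + 1) // 2  # assumption: the larger half leads
    return computers[:half], computers[half:]


def reconfigure_after_bug(following: List[int], processes: List[str],
                          faulty: str) -> Tuple[List[int], List[int], List[str]]:
    """Drop the faulty process and regroup the former following system."""
    remaining = [p for p in processes if p != faulty]
    new_preceding, new_following = split_groups(following)
    # The new preceding system starts on `remaining` first; the new following
    # system would start the same list after the configured delay.
    return new_preceding, new_following, remaining


print(reconfigure_after_bug([3, 4], ["A", "B", "C"], "B"))
# ([3], [4], ['A', 'C'])
```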

[0027] The multiprocessing system according to the third invention has, in addition to the configuration of the second invention, means for reconfiguring the operating computers into a new preceding system and a new following system when the determining means determines that a preceding-system computer went down because of a hardware failure. The number of computers in the new preceding system can therefore always be kept at two or more, which prepares the system for the next down that a hardware failure may cause.
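
The hardware-fault branch could be sketched like this. The policy of keeping two computers in the new preceding system whenever at least three remain is an assumption chosen to match the stated goal of keeping that system plural; the function name and the returned dictionary layout are hypothetical.

```python
from typing import Dict, List


def regroup_after_hardware_fault(preceding: List[int], following: List[int],
                                 failed: int) -> Dict[str, List[int]]:
    """Regroup all still-operating computers after one computer fails."""
    operating = [c for c in preceding + following if c != failed]
    # Assumption: lead with two computers when possible, but always leave
    # at least one computer for the new following system if one exists.
    lead = min(2, max(1, len(operating) - 1))
    return {"preceding": operating[:lead], "following": operating[lead:]}


print(regroup_after_hardware_fault([1, 2], [3, 4], failed=1))
# {'preceding': [2, 3], 'following': [4]}
```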

[0028] The multiprocessing system according to the fourth invention has means for, when the determining means determines that a preceding-system computer went down for a cause other than a hardware failure, causing the following-system computers to execute the application process that the preceding-system computer was executing when it went down according to a version of the program different from the version then in use. Processing can therefore continue, as in the multiprocessing systems according to the first to third inventions. In addition, if the preceding system is made up of a plurality of computers, then even when the determining means determines that the down was caused by a hardware failure, the healthy computers in the preceding system can continue processing.
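
The version switch might look like the following sketch. The version labels `"fast"` and `"safe"`, the dictionaries, and the function name `switch_version` are hypothetical, and picking the first alternative version is an assumption; the text requires only that a different version be used.

```python
from typing import Dict, List


def switch_version(versions: Dict[str, List[str]], current: Dict[str, str],
                   faulty_process: str) -> Dict[str, str]:
    """Return the version assignment the following system should use."""
    alternatives = [v for v in versions[faulty_process]
                    if v != current[faulty_process]]
    if not alternatives:
        raise LookupError(f"no alternative version for {faulty_process!r}")
    updated = dict(current)  # leave the caller's assignment untouched
    updated[faulty_process] = alternatives[0]
    return updated


versions = {"A": ["fast", "safe"], "B": ["fast", "safe"]}
current = {"A": "fast", "B": "fast"}
print(switch_version(versions, current, "B"))  # {'A': 'fast', 'B': 'safe'}
```

Only the process that was running at the time of the down changes version; the others keep the version they already use.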

[0029] The multiprocessing system according to the fifth invention has means for, when the determining means determines that a preceding-system computer went down for a cause other than a hardware failure, reconfiguring the computers constituting the following system into a new preceding system and a new following system and starting execution of the remaining application processes from the new preceding-system computers, with the application process that the preceding-system computer was executing when it went down run according to a version of the program different from the version then in use. Processing can therefore continue, as in the fourth invention. Because the computers are reconfigured into a new preceding system and a new following system, the system is also prepared for the next down that may arise from a cause other than a hardware failure. Here too, if the preceding system (new preceding system) is made up of a plurality of computers, then even when the determining means determines that a down was caused by a hardware failure, the healthy computers in the preceding system (new preceding system) can continue processing, so processing speed does not drop.

[0030] In the multiprocessing system according to the sixth invention, in addition to the configuration of the multiprocessing system according to the fifth invention, means is provided for reconfiguring the operating computers into a new preceding system and a new follow-up system when the determination means determines that a preceding-system computer has gone down due to a hardware failure. The number of computers in the new preceding system can therefore always be kept at more than one, and the system is prepared for a subsequent down due to a hardware failure.

[0031] A hardware failure can normally be detected by referring to the diagnosis result of a hardware self-diagnosis function that each computer has, for example one based on timeout detection. However, if the determination means is one that determines that the down was caused by something other than a hardware failure when all of the preceding-system computers have gone down, the determination is possible even in a system made up of computers that have no hardware diagnosis function.

[0032]

[Embodiment] An embodiment is described below with reference to the drawings. FIG. 1 shows a block diagram of a multiprocessing system according to one embodiment of the present invention.

[0033] This multiprocessing system broadly consists of computers 1 to 4, a shared memory 5, and a bus 6 that connects the shared memory 5 to each of the computers 1 to 4. The computers 1 to 4 respectively include arithmetic units 11, 21, 31, 41, local memories 14, 24, 34, 44, and timers 15, 25, 35, 45 that generate time slices and notify the respective arithmetic units. In this example, the time slice interval is set to the same value on every computer.

[0034] Each of the local memories 14, 24, 34, 44 holds a running process queue 12, 22, 32, 42, an execution delay process queue 13, 23, 33, 43, a high-speed version program storage area 16, 26, 36, 46 that stores the high-speed version programs of the application processes, and a safe version program storage area 17, 27, 37, 47 that stores the safe version programs of the application processes. A safe version program has the same function as the corresponding high-speed version program but is written in a safer way that makes failures less likely; in this example it requires twice the execution time of the high-speed version program.

[0035] The shared memory 5 holds an execution state table 51, an execution end process table 52, and a failure process table 53. It is connected to all the computers by the bus 6, and the tables are accessed from all the computers.
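
The memory layout described in paragraphs [0033] to [0035] can be pictured with a minimal Python sketch. The class names, field names, the row layout of the execution state table (role, status, time slice count, current process) and the `build_system` helper are assumptions made for illustration; the text only names the queues, storage areas and tables.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class LocalMemory:
    """Per-computer local memory (14, 24, 34, 44). Field names are illustrative."""
    running_queue: deque = field(default_factory=deque)      # running process queue 12/22/32/42
    delay_queue: deque = field(default_factory=deque)        # execution delay process queue 13/23/33/43
    high_speed_programs: dict = field(default_factory=dict)  # storage area 16/26/36/46
    safe_programs: dict = field(default_factory=dict)        # storage area 17/27/37/47


@dataclass
class SharedMemory:
    """Shared memory 5. The row layout of table 51 is an assumption."""
    execution_state: dict = field(default_factory=dict)  # execution state table 51: site -> row
    finished: set = field(default_factory=set)           # execution end process table 52
    failed: set = field(default_factory=set)             # failure process table 53


def build_system(sites=(1, 2, 3, 4), expected_preceding=2, processes=("P1", "P2", "P3")):
    """Create the initial state: empty queues and tables, time slice count 0."""
    shared = SharedMemory()
    local = {}
    for index, site in enumerate(sorted(sites)):
        role = "preceding" if index < expected_preceding else "follow-up"
        shared.execution_state[site] = {
            "role": role, "status": "operating", "time_slice": 0, "current": None,
        }
        local[site] = LocalMemory(
            high_speed_programs={p: f"{p}-high" for p in processes},  # placeholder addresses
            safe_programs={p: f"{p}-safe" for p in processes},        # placeholder addresses
        )
    return shared, local
```

Calling `build_system()` with its defaults gives four computers, the two lowest site numbers classified as the preceding system, which mirrors the initial classification used in the embodiment.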

[0036] Next, the operation of the multiprocessing system configured as described above is explained with reference, as appropriate, to the flowcharts shown in FIGS. 2 to 4. In the multiprocessing system according to this embodiment, the four computers 1 to 4 operate divided into a preceding system and a follow-up system. Specifically, the "expected number of preceding-system computers" is two; initially, computers 1 and 2 are classified into the preceding system and computers 3 and 4 into the follow-up system. The initial state of every queue and table is empty, with nothing registered, and the time slice count starts from 0. Further, the high-speed version program storage areas 16, 26, 36, 46 each store the high-speed version programs of application processes P1 to P3, and the safe version program storage areas 17, 27, 37, 47 each store safe version programs having the same functions as the high-speed versions of P1 to P3. In this embodiment, the end of execution of an application process is used as the designated condition (the condition that informs the later stage that no failure occurred).

[0037] The description here takes the following case as an example: if the high-speed version programs of application processes P1 to P3 had no bugs, P1 to P3 would be generated and executed at the timing shown in FIG. 5; in practice, however, computer 2 goes down in time slice 1 due to a hardware failure, and in addition the high-speed version program of P3 contains a bug.

[0038] First, assume that the contents of the execution state table 51 are in the state shown in FIG. 6, and that in this state process P1 is submitted to all of the computers 1 to 4. When process P1 is submitted, computer 1 follows the flowchart shown in FIG. 2: it first confirms that the generated process is not in the failure process table 53 (S11), confirms that it is itself a preceding-system computer (S13), and places the generated process P1 in the running process queue 12 with the program address of the process set to the address of the high-speed version program of P1 stored in the high-speed version program storage area 16 (S14). Computer 2 operates in the same way.

[0039] At the same time, computer 3, also following the flowchart shown in FIG. 2, confirms that the generated process P1 is not in the failure process table 53 (S11), confirms that it is itself a follow-up-system computer and that P1 is not in the execution end process table 52 either (S13), and places the generated process P1 in the execution delay process queue 33 (S15). Computer 4 operates in the same way.
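
The process-generation flow of FIG. 2, as far as paragraphs [0038] and [0039] describe it, can be sketched as follows. The function name, the tuple representation of a queued process and the return strings are illustrative. The branch taken when S11 finds the process in the failure process table is not described in this passage; queuing the safe version there is an assumption.

```python
from collections import deque


def on_process_generated(name, is_preceding, failed, finished, running_queue, delay_queue):
    """Queue a newly generated process following steps S11 to S15 of FIG. 2 (sketch)."""
    if name in failed:
        # S11 "yes" branch: ASSUMPTION, not described in this passage.
        running_queue.append((name, "safe"))
        return "running/safe"
    if is_preceding or name in finished:
        # S13 -> S14: a preceding-system computer, or a process that a preceding
        # computer already finished, runs the high-speed version right away.
        running_queue.append((name, "high"))
        return "running/high"
    # S13 -> S15: a follow-up computer holds the process back.
    delay_queue.append(name)
    return "delayed"
```

With empty tables, `on_process_generated("P1", True, set(), set(), deque(), deque())` takes the S14 path, while the same call with `is_preceding=False` takes the S15 path, matching the behavior of computers 1 and 3 above.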

[0040] As a result, the contents of the queues of computers 1 to 4 are as shown in FIGS. 7(a) and 7(b). At this point the other queues are still empty. In the figures used in the following description, among the application processes stored in the running process queues 12, 22, 32, 42, a process whose program address is the address of a high-speed version program is written Pn (high), and a process whose program address is the address of a safe version program is written Pn (safe), where n = 1 to 3.

[0041] <Start of time slice 1>
A time slice now occurs. Computer 1 follows the flowchart shown in FIG. 3: it confirms that no process is currently executing and does nothing (S21), increments its own time slice count in the execution state table 51 by 1 to "1" (S22), confirms that it is a preceding-system computer (S23), takes process P1 out of the running process queue 12, registers it in the execution state table 51 as the currently executing process, and starts process P1 (S31). Because the program address of this P1 is that of the high-speed version program, P1 is executed according to the high-speed version program. Computer 2 operates in the same way.

[0042] At the same time, computer 3 follows the flowchart shown in FIG. 3 and operates in the same way as computer 1 up to step S22. Because it is a follow-up-system computer (S23), it then confirms that it is not the case that every "operating" preceding-system computer has a time slice number lagging its own "1" by 2 or more (at this point both preceding-system computers 1 and 2 are at "0" or "1") (S24). It confirms that the expected number of preceding-system computers (2 in this embodiment) minus the number of preceding-system computers whose time slice number does not lag its own by 2 or more (2 at this point) is 0 (S26), finds that nothing is in the execution end process table 52 (S29), and, because nothing is in the running process queue 32, does nothing (S31). Computer 4 operates in the same way.

[0043] As a result, the contents of the execution state table 51 and the queues are as shown in FIGS. 8(a) to 8(c). During time slice 1, process P1 generates process P2 on computer 1. Computer 1 then follows the flowchart shown in FIG. 2: it confirms that the generated process P2 is not in the failure process table 53 (S11) and, because it is a preceding-system computer (S13), places the generated process P2 in the running process queue 12 with the program address of the process set to the address of the high-speed version program of P2 stored in the high-speed version program storage area 16 (S14). The resulting state is shown in FIG. 8(d).

[0044] Meanwhile, assume that computer 2 goes down during time slice 1 due to a hardware failure.
<Start of time slice 2>
Next, when a time slice occurs again, computer 1 follows the flowchart shown in FIG. 3: it places the currently executing process P1 in the running process queue 12, which is in the state shown in FIG. 8(d) (S21), increments its own time slice count in the execution state table 51 by 1 to "2" (S22), confirms that it is a preceding-system computer (S23), takes P2 out of the running process queue 12, registers it in the execution state table 51 as the currently executing process, and starts it (S31). Because the program address of this P2 is that of the high-speed version, P2 is executed according to the high-speed version program.
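
The time-slice handling of a preceding-system computer (S21, S22, S23, S31 of FIG. 3) amounts to a round-robin step. The sketch below is one way to express it; the function name and the dictionary row with `time_slice` and `current` keys are assumptions, and the follow-up-system branch (S24 to S29) is deliberately left out.

```python
from collections import deque


def start_time_slice_preceding(row, running_queue):
    """One time-slice interrupt on a preceding-system computer (sketch of FIG. 3)."""
    if row["current"] is not None:
        # S21: the interrupted process goes to the back of the running queue.
        running_queue.append(row["current"])
    row["time_slice"] += 1  # S22: advance own time slice count in table 51
    # S23 is the preceding/follow-up test; this sketch covers the preceding path only.
    # S31: take the next process, register it as current, and start it.
    row["current"] = running_queue.popleft() if running_queue else None
    return row["current"]
```

Starting from a queue holding only P1, the first call starts P1; if P2 is then queued, the second call puts P1 back behind P2 and starts P2, which is the order described for computer 1.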

[0045] At this time computer 2 is down and does nothing. Computers 3 and 4 carry out the same procedure as at time slice 1. As a result, the contents of the execution state table 51 and the queues are as shown in FIGS. 9(a) to 9(c).

[0046] During time slice 2, process P2 generates process P3 on computer 1. Computer 1 then follows the flowchart shown in FIG. 2: it confirms that the generated process P3 is not in the failure process table 53 (S11) and, because it is a preceding-system computer (S13), places the generated process P3 in the running process queue 12 with the program address of the process set to the address of the high-speed version program of P3 stored in the high-speed version program storage area 16 (S14). The resulting state is shown in FIG. 9(d).

[0047] Further, during time slice 2 the processing of P2 ends on computer 1. Computer 1 then follows the flowchart shown in FIG. 4: it confirms that it is a preceding-system computer (S41) and enters the finished process P2 in the execution end process table 52. As a result, the execution end process table 52 becomes as shown in FIG. 9(e).
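
The process-end flow of FIG. 4 is described here only for a preceding-system computer. A minimal sketch, assuming the execution end process table behaves like a set (so that registering an already registered process changes nothing) and assuming that a follow-up-system computer registers nothing, which this passage does not state:

```python
def on_process_finished(name, is_preceding, finished):
    """Record a finished process in the execution end process table 52 (sketch of FIG. 4)."""
    if is_preceding:
        # S41 "yes": register the finished process; a set makes this idempotent.
        finished.add(name)
    # S41 "no": ASSUMPTION, the follow-up branch is not described in this passage.
    return finished
```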

[0048] <Start of time slice 3>
Next, when a time slice occurs again, computer 1 carries out the procedure according to the flowchart shown in FIG. 3, as at the start of time slice 2, and starts P1.

[0049] At the same time, computer 3 follows the flowchart shown in FIG. 3: it confirms that no process is currently executing and does nothing (S21), increments its own time slice count in the execution state table 51 by 1 to "3" (S22), and confirms that it is a follow-up-system computer (S23). It confirms that it is not the case that every "operating" preceding-system computer has a time slice number lagging its own time slice number "3" by 2 or more (computer 1 is at "3") (S24). It then confirms that its own site number 3 is the k-th smallest among the follow-up-system computers, where k is the expected number of preceding-system computers (2) minus the number of preceding-system computers whose time slice number does not lag its own by 2 or more (1 at this point), that is, k = 1, and that its site number is not the maximum site number (4 in this embodiment) (S26). It marks as "down" the preceding-system computer 2, which is "operating" in the execution state table 51 but whose time slice number lags its own by 2 or more, moves process P1 from the execution delay process queue 33 to the running process queue 32 with its address set to the address of the high-speed version program of P1 stored in the high-speed version program storage area 36 (S27), and classifies itself into the preceding system (S28). It confirms that process P2, which is in the execution end process table 52, is not in the execution delay process queue 33 (S29), takes P1 out of the running process queue 32, registers it in the execution state table 51 as the currently executing process, and starts it (S31). Because the program address of this P1 is that of the high-speed version program, P1 is executed according to the high-speed version program.

[0050] At the same time, computer 4, like computer 3, executes (S21) to (S24) according to the flowchart shown in FIG. 3, but confirms that its own site number 4 is the largest among the follow-up-system computers (S26), confirms that process P2, which is in the execution end process table 52, is not in the execution delay process queue 43 (S29), and, because the running process queue 42 is empty, starts nothing (S31).
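
The decisions that a follow-up-system computer makes at S24 and S26 can be condensed into one function. This is a sketch under stated assumptions: the `Entry` record, the string results and the use of the largest key of the table as the "maximum site number" are illustrative, and the additional S26 case in which every other computer is already down is not modeled.

```python
from dataclasses import dataclass

LAG = 2  # a computer counts as stalled when it lags by 2 or more time slices


@dataclass
class Entry:
    role: str        # "preceding" or "follow-up"
    status: str      # "operating" or "down"
    time_slice: int


def classify(table, me, expected_preceding):
    """Return 'software_down' (go to S25), 'promote' (S27/S28) or 'stay' for site `me`."""
    mine = table[me].time_slice
    preceding = [e for e in table.values()
                 if e.role == "preceding" and e.status == "operating"]
    lagging = [e for e in preceding if mine - e.time_slice >= LAG]
    # S24: every operating preceding computer has stalled, so the cause is
    # judged to be something other than a hardware failure.
    if preceding and len(lagging) == len(preceding):
        return "software_down"
    # S26: k = expected preceding count minus preceding computers still keeping pace.
    k = expected_preceding - (len(preceding) - len(lagging))
    followers = sorted(site for site, e in table.items()
                       if e.role == "follow-up" and e.status == "operating")
    if 1 <= k <= len(followers) and followers[k - 1] == me and me != max(table):
        return "promote"
    return "stay"
```

With computer 1 at time slice 3 and computer 2 stuck at 1, the function yields `"promote"` for site 3 and `"stay"` for site 4, which is the outcome described for time slice 3.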

[0051] As a result, the contents of the execution state table 51 and the queues are as shown in FIGS. 10(a) to 10(e). During time slice 3, the processing of P1 ends on computer 1. Computer 1 then follows the flowchart shown in FIG. 4: it confirms that it is a preceding-system computer (S41) and enters the finished process P1 in the execution end process table 52. As a result, the execution end process table 52 becomes as shown in FIG. 10(f).

[0052] Meanwhile, during time slice 3, process P1 generates process P2 on computer 3. Computer 3 then follows the flowchart shown in FIG. 2: it confirms that the generated process P2 is not in the failure process table 53 (S11) and, because it is now a preceding-system computer (S13), places the generated process P2 in the running process queue 32 with the program address of the process set to the address of the high-speed version program of P2 stored in the high-speed version program storage area 36 (S14). As a result, the running process queue 32 of computer 3 becomes as shown in FIG. 10(g).

[0053] <Start of time slice 4>
When a time slice occurs again, computer 1 carries out the procedure according to the flowchart shown in FIG. 3, as at the start of time slice 3, and starts P3.

[0054] At the same time, computer 3 follows the flowchart shown in FIG. 3, goes through the same procedure as computer 1 at the start of time slice 2, and starts P2. Also at the same time, computer 4 follows the flowchart shown in FIG. 3: it confirms that no process is currently executing and does nothing (S21), increments its own time slice count in the execution state table 51 by 1 to "4" (S22), confirms that it is a follow-up-system computer (S23), confirms that it is not the case that every "operating" preceding-system computer has a time slice number lagging its own time slice number "4" by 2 or more (S24), and confirms that its own site number is the largest among the follow-up-system computers (S26). It then moves process P1, which is in the execution end process table 52 and also in the execution delay process queue 43, from the execution delay process queue 43 to the running process queue 42 with its address set to the address of the high-speed version program of P1 stored in the high-speed version program storage area 46 (S29), takes P1 out of the running process queue 42, registers it in the execution state table 51 as the currently executing process, and starts it (S31). Because the program address of this P1 is that of the high-speed version program, P1 is executed according to the high-speed version program.

[0055] As a result, the contents of the execution state table 51 and the queues are as shown in FIGS. 11(a) to 11(d). Assume that during time slice 4 computer 1 goes down because of a bug in the high-speed version program of process P3, which computer 1 is executing.

[0056] Meanwhile, during time slice 4, process P2 generates process P3 on computer 3. Computer 3 then follows the flowchart shown in FIG. 2: it confirms that the generated process P3 is not in the failure process table 53 (S11), confirms that it is a preceding-system computer (S13), and places the generated process P3 in the running process queue 32 with the program address of the process set to the address of the high-speed version program of P3 stored in the high-speed version program storage area 36 (S14). As a result, the running process queue 32 of computer 3 becomes as shown in FIG. 11(e).

[0057] Further, during time slice 4, process P1 generates process P2 on computer 4. Computer 4 then follows the flowchart shown in FIG. 2: it confirms that the generated process P2 is not in the failure process table 53 (S11) and, although it is a follow-up-system computer, finds that P2 is in the execution end process table 52 (S13), so it places the generated process P2 in the running process queue 42 with the program address of the process set to the address of the high-speed version program of P2 stored in the high-speed version program storage area 46 (S14). As a result, the running process queue 42 of computer 4 becomes as shown in FIG. 11(f).

[0058] Thereafter, during time slice 4, the processing of P2 ends on computer 3. Computer 3 then follows the flowchart shown in FIG. 4: it confirms that it is a preceding-system computer (S41) and enters the finished process P2 in the execution end process table 52. Because P2 is already in the execution end process table 52, the table remains as shown in FIG. 10(f).

[0059] <Start of time slice 5>
When a time slice occurs again, computer 3 follows the flowchart shown in FIG. 3, carries out the same procedure as computer 1 at the start of time slice 3, and starts P1.

[0060] At the same time, computer 4 follows the flowchart shown in FIG. 3, carries out the same procedure as it did at the start of time slice 4, and starts P2. As a result, the contents of the execution state table and the queues are as shown in FIGS. 12(a) to 12(c).

[0061] During time slice 5, process P1 ends on computer 3, and computer 3, following the flowchart shown in FIG. 4, carries out the same procedure as when process P1 ended on computer 1 during time slice 3. However, because P1 is already in the execution end process table 52, the table does not change.

[0062] Meanwhile, during time slice 5, process P2 generates process P3 on computer 4. Computer 4 then follows the flowchart shown in FIG. 2: it confirms that the generated process P3 is not in the failure process table 53 (S11), confirms that it is a follow-up-system computer and that P3 is not in the execution end process table 52 (S13), and places the generated process P3 in the execution delay process queue 43 (S15). As a result, the execution delay process queue 43 of computer 4 becomes as shown in FIG. 12(d).

[0063] <Time slice 6>
Computer 3 proceeds in the same way as computer 1 at the start of time slice 4 and starts process P3.

[0064] At the same time, computer 4 follows the flowchart shown in FIG. 3, carries out the same procedure as it did at the start of time slice 4, and starts P1. As a result, the contents of the execution state table 51 and the queues are as shown in FIGS. 13(a) to 13(d).

[0065] During time slice 6, computer 3 goes down because of the bug in the high-speed version program of process P3. Meanwhile, during time slice 6, process P1 ends on computer 4, and computer 4, following the flowchart shown in FIG. 4, carries out the same procedure as when process P1 ended on computer 1 during time slice 3. However, because P1 is already in the execution end process table 52, the table does not change.

[0066] <Time slice 7>
Computer 4, as in time slice 6, carries out (S21) to (S29) according to the flowchart shown in FIG. 3, but because the running process queue 42 is empty, it starts no process (S31).

[0067] As a result, the contents of the execution state table and the queues are as shown in FIGS. 14(a) to 14(c).
<Time slice 8>
Computer 4 follows the flowchart shown in FIG. 3: it confirms that no process is currently executing (S21), increments its own time slice count in the execution state table 51 by 1 to "8" (S22), confirms that it is a follow-up-system computer (S23), and confirms that every "operating" preceding-system computer has a time slice number lagging its own by 2 or more (as shown in FIG. 14(a), computer 1 is at "4" and computer 3 is at "6") (S24). It marks computers 1 and 3 as "down", adds process P3, which is recorded as executing on them, to the failure process table 53, and, because process P3 is in its own execution delay process queue 43 as shown in FIG. 14(c), moves P3 to the running process queue 42 with its program address set to the address of the safe version program of P3 stored in the safe version program storage area 47 (S25). It then confirms that all computers other than itself are "down" (S26), confirms that the preceding-system computers whose time slice numbers lag its own by 2 or more are already "down" and that the execution delay process queue 43 is empty (S27), and classifies itself into the preceding system (S28). After that, it confirms that neither P1 nor P2, which are in the execution end process table 52, is in the execution delay process queue 43 (now empty) (S29), takes process P3 out of the running process queue 42, registers it in the execution state table 51 as the currently executing process, and starts it (S31). Because the program address of this P3 is that of the safe version, P3 is executed according to the safe version program.
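
Step S25, as described for time slice 8, can be sketched as below. The function name, the dictionary row layout (`role`, `status`, `time_slice`, `current`) and the tuple used to tag a queued process with its program version are assumptions for illustration; the reclassification in S26 to S28 is not included.

```python
def handle_software_down(me, state_table, failed, delay_queue, running_queue, lag=2):
    """Apply S25: mark stalled preceding computers down and switch their process to the safe version."""
    mine = state_table[me]["time_slice"]
    for row in state_table.values():
        stalled = (row["role"] == "preceding" and row["status"] == "operating"
                   and mine - row["time_slice"] >= lag)
        if stalled:
            row["status"] = "down"
            if row["current"] is not None:
                # The process that was executing when the computer stalled is suspect.
                failed.add(row["current"])
    # Move suspect processes out of the own delay queue, tagged to run the safe version.
    for name in list(delay_queue):
        if name in failed:
            delay_queue.remove(name)
            running_queue.append((name, "safe"))
```

Applied to the state at the start of time slice 8 (computers 1 and 3 stalled while executing P3, and P3 waiting in the execution delay process queue of computer 4), this leaves P3 in the failure process table and queues P3 for execution under the safe version.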

[0068] Thereafter, computer 4 executes process P3 according to the safe version program. Execution according to the safe version program takes twice as long as execution according to the high-speed version program, so the execution completes without incident during time slice 9.

[0069] In this way, in the system according to this embodiment, the four computers 1 to 4 start operation classified into preceding-system computers 1 and 2 and follow-up-system computers 3 and 4. When, for example, computer 2, which is executing process P1, goes down in time slice 1 due to a hardware failure, computer 3, which was initially classified into the follow-up system, is incorporated into the preceding system and executes process P1 according to the high-speed version program. When, for example, computer 1 goes down during time slice 4 and computer 3 goes down during time slice 6 because of the bug in the high-speed version program of process P3, computer 4, which was initially classified into the follow-up system, becomes the preceding system and executes process P3, this time according to the safe version program. FIG. 16 lists the processes that each computer executed in each time slice.

[0070] As the above operation shows, in the system according to this embodiment, when a computer goes down because of a hardware failure, the processes can continue to be executed according to the high-speed version program with no loss of efficiency. When a computer goes down because of a program bug, a following-system computer can take over the processing of the preceding-system computer according to the safe version program. Computation can therefore continue without loss of efficiency and without halting process execution.
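The takeover rule described in this paragraph can be sketched as follows. This is only an illustrative sketch: the names `DownCause`, `select_version` and `take_over` are hypothetical and do not appear in the specification, the cause of the down is assumed to have been determined elsewhere, and promoting the lowest-numbered following-system computer is an assumption.

```python
from enum import Enum


class DownCause(Enum):
    HARDWARE = "hardware failure"
    PROGRAM_BUG = "program bug"


def select_version(cause: DownCause) -> str:
    """Pick the program version the successor uses for the interrupted process."""
    # Hardware failure: the program itself is not suspect, so keep the
    # high-speed version. Program bug: fall back to the safe version.
    return "high-speed" if cause is DownCause.HARDWARE else "safe"


def take_over(process: str, cause: DownCause, following: list[int]) -> dict:
    """Promote one following-system computer and hand it the interrupted process."""
    if not following:
        raise RuntimeError("no following-system computer left to take over")
    # Assumption (hypothetical policy): promote the lowest-numbered computer.
    successor = min(following)
    return {
        "computer": successor,
        "process": process,
        "version": select_version(cause),
    }
```

Under these assumptions, `take_over("P1", DownCause.HARDWARE, [3, 4])` selects computer 3 with the high-speed version, and `take_over("P3", DownCause.PROGRAM_BUG, [4])` selects computer 4 with the safe version, which mirrors the two cases of the embodiment.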

[0071] For comparison, FIG. 17 lists the processes that each computer executes in each time slice when a process that was being executed on another computer at the moment it went down is thereafter executed on a following-system computer according to the safe version program regardless of the cause of the down. As the comparison shows, the present embodiment completes all processes sooner, without halting processing.

[0072] The present invention is not limited to the embodiment described above. In that embodiment, a computer down caused by a program bug is detected purely in software. The same operation can also be obtained by, for example, referring to the diagnosis result of a hardware diagnostic device attached to each computer and making the following comparison in step S24 of FIG. 3: "Is there a preceding-system computer marked 'running' for which no hardware failure is recorded and whose time slice number is 2 or more behind that of the own computer?"
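The modified step S24 comparison can be expressed as a simple predicate. The sketch below is an assumption-laden illustration, not the patented implementation: the `Entry` record, its field names, and the function `stalled_preceding` are hypothetical stand-ins for rows of the execution state table 51 combined with the hardware diagnosis result.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    site: int                  # site number of the computer
    system: str                # "preceding" or "following"
    status: str                # "running" or "down"
    time_slice: int            # last time slice number the computer reported
    hw_failure_recorded: bool  # result read from the hardware diagnostic device


def stalled_preceding(own_time_slice: int, table: list[Entry]) -> list[Entry]:
    """Return preceding-system computers that look down for a non-hardware cause."""
    return [
        e for e in table
        if e.system == "preceding"
        and e.status == "running"               # still marked as running
        and not e.hw_failure_recorded           # no hardware failure recorded
        and own_time_slice - e.time_slice >= 2  # 2 or more time slices behind
    ]
```

A computer that is still marked "running" but has fallen two or more time slices behind, with no hardware failure on record, is the case this comparison treats as a down with a non-hardware cause.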

[0073] In the embodiment described above, when the number of preceding-system computers decreases, the preceding and following systems are reconfigured so that the preceding system always has a plurality of computers (two in the embodiment). When one preceding-system computer is sufficient, step S26 of FIG. 3 need only make the comparison "Have all preceding-system computers become 'down', and is the site number of the own computer the smallest in the following system?"
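The alternative step S26 comparison for a single-computer preceding system can be sketched as below. The function name `should_promote_self` and the two dictionaries are hypothetical. Ignoring following-system computers that are themselves down when looking for the smallest site number is an assumption that the text does not state.

```python
def should_promote_self(own_site: int,
                        system: dict[int, str],
                        status: dict[int, str]) -> bool:
    """Decide whether the own computer should become the preceding system.

    system maps site number -> "preceding" or "following";
    status maps site number -> "running" or "down".
    """
    preceding = [s for s, role in system.items() if role == "preceding"]
    # Assumption: only following-system computers that are still running compete.
    following = [s for s, role in system.items()
                 if role == "following" and status[s] == "running"]
    all_preceding_down = all(status[s] == "down" for s in preceding)
    return (all_preceding_down
            and own_site in following
            and own_site == min(following))
```

Because every following-system computer evaluates the same comparison against the same shared table, only the one with the smallest site number promotes itself.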

[0074] In the embodiment described above, a high-speed version program and a safe version program are prepared for each application process in the application process group, but it is also possible to use only the high-speed version program.

【００７５】[0075]

[Effects of the Invention] As described above, according to the present invention, when a computer goes down because of a hardware failure, the processes can continue to be executed according to the high-speed version program with no loss of efficiency. When a computer goes down because of a program bug, a following-system computer can take over the processing of the preceding-system computer according to the high-speed version or safe version program. Processing can therefore continue without loss of efficiency and without halting process execution.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a multiprocessing system according to an embodiment of the present invention.

FIG. 2 is a flowchart showing the operation of the system when an application process is generated.

FIG. 3 is a flowchart showing the operation of the system at the start of a time slice.

FIG. 4 is a flowchart showing the operation of the system when an application process ends.

FIG. 5 is a diagram showing an expected execution example when application processes are executed on the system.

FIG. 6 is a diagram showing the contents of the execution state table in the initial state of the system.

FIG. 7 is a diagram showing the contents of each queue when processes are submitted to the system.

FIG. 8 is a diagram showing the contents of the execution state table and each queue after the start-of-time-slice processing for time slice 1.

FIG. 9 is a diagram showing the contents of the execution state table and each queue after the start-of-time-slice processing for time slice 2.

FIG. 10 is a diagram showing the contents of the execution state table and each queue after the start-of-time-slice processing for time slice 3.

FIG. 11 is a diagram showing the contents of the execution state table and each queue after the start-of-time-slice processing for time slice 4.

FIG. 12 is a diagram showing the contents of the execution state table and each queue after the start-of-time-slice processing for time slice 5.

FIG. 13 is a diagram showing the contents of the execution state table and each queue after the start-of-time-slice processing for time slice 6.

FIG. 14 is a diagram showing the contents of the execution state table and each queue after the start-of-time-slice processing for time slice 7.

FIG. 15 is a diagram showing the contents of the execution state table and each queue during and after the start-of-time-slice processing for time slice 8.

FIG. 16 is a diagram showing a list of the processes executed by each computer in the system.

FIG. 17 is a diagram showing a list of the processes executed by each computer when the system switches to the safe version whenever a computer goes down.

[Explanation of Symbols]

1, 2, 3, 4: computer
5: shared memory
6: bus
11, 21, 31, 41: arithmetic unit
12, 22, 32, 42: executing process queue
13, 23, 33, 43: execution delayed process queue
14, 24, 34, 44: local memory
15, 25, 35, 45: timer
16, 26, 36, 46: high-speed version program storage area
17, 27, 37, 47: safe version program storage area
51: execution state table
52: execution end process table
53: failure process table

Claims

【特許請求の範囲】[Claims]

1. A multiprocessing system comprising: a plurality of computers; means for classifying the computers into preceding-system computers and following-system computers, causing the preceding-system computers to start executing a specific application process group, and causing the following-system computers to start executing the specific application process group at a point in time delayed from the execution start time of the preceding-system computers by a period satisfying a predetermined condition; determination means for determining, when a preceding-system computer goes down, whether or not the down was caused by a hardware failure; and means for, when the determination means determines that the preceding-system computer went down for a cause other than a hardware failure, removing the application process that the preceding-system computer was executing when it went down from the application process group of the following-system computers and operating the following-system computers as the preceding system.

2. A multiprocessing system comprising: a plurality of computers; means for classifying the computers into preceding-system computers and following-system computers, causing the preceding-system computers to start executing a specific application process group, and causing the following-system computers to start executing the specific application process group at a point in time delayed from the execution start time of the preceding-system computers by a period satisfying a predetermined condition; determination means for determining, when a preceding-system computer goes down, whether or not the down was caused by a hardware failure; and means for, when the determination means determines that the preceding-system computer went down for a cause other than a hardware failure, removing the application process that the preceding-system computer was executing when it went down from the application process group of the following-system computers, reconfiguring the computers constituting the following system into a new preceding system and a new following system, and causing execution of the remaining application processes to start from the computers of the new preceding system.

3. A multiprocessing system comprising: a plurality of computers; means for classifying the computers into preceding-system computers and following-system computers, causing the preceding-system computers to start executing a specific application process group, and causing the following-system computers to start executing the specific application process group at a point in time delayed from the execution start time of the preceding-system computers by a period satisfying a predetermined condition; determination means for determining, when a preceding-system computer goes down, whether or not the down was caused by a hardware failure; means for, when the determination means determines that the preceding-system computer went down for a cause other than a hardware failure, removing the application process that the preceding-system computer was executing when it went down from the application process group of the following-system computers, reconfiguring the computers constituting the following system into a new preceding system and a new following system, and causing execution of the remaining application processes to start from the computers of the new preceding system; and means for, when the determination means determines that the preceding-system computer went down because of a hardware failure, reconfiguring the running computers into a new preceding system and a new following system.

4. A multiprocessing system comprising: a plurality of computers each holding a plurality of versions of application process programs having the same function; means for classifying the computers into preceding-system computers and following-system computers, causing the preceding-system computers to start executing a specific application process group, each process according to a specific version of its program, and causing the following-system computers to start executing the specific application process group, each process according to the specific version or a different version of its application process program, at a point in time delayed from the execution start time of the preceding-system computers by a period satisfying a predetermined condition; determination means for determining, when a preceding-system computer goes down, whether or not the down was caused by a hardware failure; and means for, when the determination means determines that the preceding-system computer went down for a cause other than a hardware failure, causing the following-system computers to execute, according to a program of a version different from the specific version, the application process that the preceding-system computer was executing when it went down.

5. A multiprocessing system comprising: a plurality of computers each holding a plurality of versions of application process programs having the same function; means for classifying the computers into preceding-system computers and following-system computers, causing the preceding-system computers to start executing a specific application process group, each process according to a specific version of its program, and causing the following-system computers to start executing the specific application process group, each process according to the specific version or a different version of its application process program, at a point in time delayed from the execution start time of the preceding-system computers by a period satisfying a predetermined condition; determination means for determining, when a preceding-system computer goes down, whether or not the down was caused by a hardware failure; and means for, when the determination means determines that the preceding-system computer went down for a cause other than a hardware failure, reconfiguring the computers constituting the following system into a new preceding system and a new following system, and causing execution of the remaining application processes to start from the computers of the new preceding system, with the application process that the preceding-system computer was executing when it went down being executed according to a program of a version different from the specific version.

6. A multiprocessing system comprising: a plurality of computers each holding a plurality of versions of application process programs having the same function; means for classifying the computers into preceding-system computers and following-system computers, causing the preceding-system computers to start executing a specific application process group, each process according to a specific version of its program, and causing the following-system computers to start executing the specific application process group, each process according to the specific version or a different version of its application process program, at a point in time delayed from the execution start time of the preceding-system computers by a period satisfying a predetermined condition; determination means for determining, when a preceding-system computer goes down, whether or not the down was caused by a hardware failure; means for, when the determination means determines that the preceding-system computer went down for a cause other than a hardware failure, reconfiguring the computers constituting the following system into a new preceding system and a new following system, and causing execution of the remaining application processes to start from the computers of the new preceding system, with the application process that the preceding-system computer was executing when it went down being executed according to a program of a version different from the specific version; and means for, when the determination means determines that the preceding-system computer went down because of a hardware failure, reconfiguring the running computers into a new preceding system and a new following system.

7. The multiprocessing system according to any one of claims 1 to 6, wherein the determination means comprises means for determining, when all of the preceding-system computers have gone down, that they went down for a cause other than a hardware failure.