JPH0259955A

JPH0259955A - Method for supervising operation of multiprocessor system

Info

Publication number: JPH0259955A
Application number: JP63211982A
Authority: JP
Inventors: Satoru Ozaki; 覚尾崎
Original assignee: Fuji Electric Co Ltd
Current assignee: Fuji Electric Co Ltd
Priority date: 1988-08-26
Filing date: 1988-08-26
Publication date: 1990-02-28
Anticipated expiration: 2010-05-01
Also published as: JPH0740253B2

Abstract

PURPOSE: To build a mutual monitoring arrangement immediately, without increasing the monitoring load, by linking the processors that monitor one another's operation into a ring and adding processors to the ring in order of arrival. CONSTITUTION: A shared memory 15 holds, for each of processors 11-14, an area storing monitoring link information, which indicates the processor that the given processor monitors, and monitored link information, which indicates the processor that monitors the given processor. When the start timing of processors 11-14 differs, each processor newly joining the mutual monitoring arrangement rearranges the monitoring and monitored relations among processors 11-14 so that it monitors the processor that joined immediately before it. When an abnormality is detected in some of the processors, only the processors judged abnormal are excluded, and the ring is re-formed from the remaining processors. The mutual monitoring arrangement is thus always maintained for every healthy processor.

Description

Detailed Description of the Invention

(Industrial Field of Application) The present invention relates to an operation monitoring method for a multiprocessor system in which a plurality of microprocessors (hereinafter simply "processors") exchange data via a shared memory, the method allowing these processors to monitor one another.

(Prior Art) One conventional operation monitoring method of this kind has the processors exchange data with one another according to a predetermined procedure, and judges a processor abnormal when it stops carrying out that procedure.

For example, consider the multiprocessor system shown in FIG. 10, which comprises a plurality of processors 21, 22, and 23, a shared bus 24, a shared memory 25, and a bus arbitration circuit 26, and take the case where processor 21 monitors the operation of processor 22. At regular intervals, processor 22 updates the data A at a predetermined address in the shared memory 25 by applying the operation

(A) + 1 → A

Processor 21, for its part, reads data A at regular intervals and confirms that it differs from the previous value.

If a fault occurs in processor 22 and it enters a so-called runaway state, data A is no longer updated (data A stops changing), so processor 21, which monitors data A, can detect the abnormality of processor 22. Operation monitoring in the multiprocessor system is achieved by having processors that were paired in advance for this purpose perform this check on one another.

(Problem to Be Solved by the Invention) In the conventional scheme described above, the monitoring-side and monitored-side relationships are fixed in advance. If one of the processors paired for mutual monitoring, say processor 22, runs away, the healthy processor 21 can detect the runaway. At the same time, however, no processor is left to monitor any other healthy processor, say processor 23, that the runaway processor 22 had been monitoring, so the reliability of processor 23 falls as well.

To guard against this, one could always use a single processor as the monitoring-side processor for a plurality of the processors making up the system, or one could have a single processor monitor the operation of all the other processors making up the system. The former is effective in maintaining reliability when some processors fail, but it does nothing to remedy the problems, described below, that arise from variation in when monitoring starts. In the latter case, in a system of n processors a single processor must monitor all of the other n - 1 processors, so the load on the monitoring-side processor grows in proportion to the size of the system.

Furthermore, when the processors making up the system start up (that is, begin the mutual monitoring processing described above) at different times, as happens immediately after a power-on reset, a monitored-side processor may not yet have begun mutual monitoring when its monitoring-side processor starts monitoring, and the monitoring-side processor may then temporarily judge the monitored-side processor abnormal. One way to prevent such false detection is to detect that the monitored-side processor has not yet started and to suspend monitoring until it does. This, however, requires additional processing to determine that the monitored-side processor has not started and to refrain from monitoring in that case, which makes the software correspondingly more complex.

Conversely, if the monitoring-side processor has not yet begun mutual monitoring when the monitored-side processor begins, nothing monitors the monitored-side processor until the monitoring-side processor starts, and reliability suffers. In addition, with the conventional approach of fixing in advance which processors monitor which, a new combination must be specified every time the system configuration changes through the addition or removal of a processor, which increases the software design burden.

The present invention was proposed to solve the above problems. Its object is to provide an operation monitoring method for a multiprocessor system that does not increase the load on the monitoring-side processor, that prevents false detection and loss of reliability caused by timing differences immediately after start-up, and that allows a mutual monitoring arrangement to be established at once, without software changes, even when the system configuration changes.

(Means for Solving the Problem) To achieve the above object, the present invention is characterized as follows. In a link information storage area provided in the shared memory of the multiprocessor system, an area is provided for each processor to store monitoring link information, which indicates the monitored-side processor that the processor monitors, and monitored link information, which indicates the monitoring-side processor that monitors the processor. An area is also provided to store final link information, which indicates the processors that enter the mutual monitoring arrangement one after another following system start-up. Each time a processor of the system is started, that processor updates the monitoring link information and monitored link information so that it monitors the processor indicated by the final link information, thereby building the mutual monitoring arrangement, and updates the final link information. When a monitoring-side processor detects an abnormality in the processor indicated by its own monitoring link information, it updates the monitoring link information and monitored link information so that the processor indicated by the abnormal processor's monitoring link information becomes its new monitoring target, thereby rebuilding the mutual monitoring arrangement without the abnormal processor. If the abnormal processor is the processor indicated by the final link information, information indicating the monitoring-side processor is stored in the final link information.

(Operation) The present invention rests on the observation that the pairing of monitoring-side and monitored-side processors for operation monitoring need not match the relationships among the processors in the functions the system as a whole is to perform. In the present invention, the combination of processors that monitor one another is therefore decoupled from the processor relationships of the system configuration and arranged as a ring, and at system start-up processors are assigned so as to enlarge the ring on a first-come, first-served basis. Processors thus enter mutual monitoring as soon as they are able to monitor and be monitored, regardless of differences in their start-up timing.

That is, when the start-up timing of the processors differs, the monitoring and monitored relationships are rearranged so that each processor newly joining the mutual monitoring arrangement monitors the processor that joined immediately before it.

Furthermore, when an abnormality is detected in some processor, only the processor judged abnormal is excluded from the existing mutual monitoring arrangement, and a new combination is rebuilt so that the remaining processors form a ring that closes the gap left by the excluded processor. A mutual monitoring arrangement is thus always maintained for the healthy processors.

(Embodiment) An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 shows a multiprocessor system in which a plurality of processors perform, for example, distributed processing. In the figure, 11, 12, 13, and 14 are processors, 15 is a shared memory, 24 is a shared bus, and 26 is a bus arbitration circuit. Processors 11 to 14 exchange data via the shared memory 15. So that signals do not collide or cross on the shared bus 24 when several processors access the shared memory 15 at the same time, the bus arbitration circuit 26 grants the right to access the shared memory 15 to exactly one processor according to a predetermined priority order, and no other processor can access the shared memory 15 until that processor finishes its access (relinquishes the access right).

In this embodiment, a link information storage area 15a as shown in FIG. 2 is provided in the shared memory 15. In this area, the final link information indicates the processor that joined the multiprocessor system last. The monitoring link information, provided for each of processors 11 to 14, indicates the processor (the monitored-side processor) that the given processor monitors when it acts as the monitoring side. The monitored link information indicates which processor (the monitoring-side processor) monitors the given processor when it acts as the monitored side. The final link information, monitoring link information, and monitored link information each consist of a processor code, for example a number identifying one of processors 11 to 14.
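The layout of the link information storage area 15a can be pictured as a small data structure. The Python sketch below is illustrative only: the class and field names (`LinkArea`, `final_link`, `monitors`, `monitored_by`) are hypothetical, and an ordinary in-process object stands in for the shared memory 15.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LinkArea:
    """Hypothetical model of link information storage area 15a."""

    # Final link information: code of the processor that joined last.
    final_link: Optional[int] = None
    # Monitoring link information: processor code -> code of the processor it monitors.
    monitors: Dict[int, int] = field(default_factory=dict)
    # Monitored link information: processor code -> code of the processor monitoring it.
    monitored_by: Dict[int, int] = field(default_factory=dict)


# Illustrative contents: processors #12 and #14 monitor each other,
# and #14 was the last to join.
area = LinkArea(
    final_link=14,
    monitors={12: 14, 14: 12},
    monitored_by={12: 14, 14: 12},
)
```

Keeping both directions of each link means a processor can find either neighbour in the ring with a single lookup, which is what the start-up and reconfiguration steps rely on.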

Next, the processing at start-up of this multiprocessor system is described with reference to FIG. 3.

First, as part of the initialization program run immediately after system power-on, the processor assigned to initialize the shared memory 15 determines whether initialization is required (step S1 in FIG. 3) and, if so, performs the prescribed initialization (step S2). It then sets its own processor code in the final link information of the link information storage area 15a, and also in both its own monitoring link information and its own monitored link information in that area (step S3). After that, it notifies the other processors that they may use the shared memory 15, for example by setting a use-permission flag in a predetermined area of the shared memory 15 (step S4).

It then starts normal processing, which includes the mutual monitoring operation described later.

A processor that does not initialize the shared memory 15 confirms that the prescribed initialization has finished (step S1), then confirms that use of the shared memory 15 has been permitted (step S6), and proceeds to step S7.

In step S7, the processor performs the following:

(1) It sets the processor code specified by the final link information in its own monitoring link information in the link information storage area 15a.
(2) It sets, in its own monitored link information, the processor code currently set in the monitored link information of the processor specified by the final link information.
(3) It sets its own processor code in the monitored link information of the processor specified by the final link information.
(4) It sets its own processor code in the final link information.
(5) It sets its own processor code in the monitoring link information of the processor that it set in its own monitored link information (the processor whose code was obtained in (2)).

It then starts normal processing, including the mutual monitoring operation (step S5).
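Steps S3 and S7 amount to creating a ring of one and then inserting each newcomer next to the previous newcomer. The following is a minimal Python sketch of that reading. The function names and the dictionary layout are hypothetical, and it assumes the five updates of step S7 run without interleaving with another joining processor, which the text does not spell out.

```python
def new_area():
    """Empty stand-in for link information storage area 15a (hypothetical layout)."""
    return {"final": None, "monitors": {}, "monitored_by": {}}


def init_ring(area, me):
    """Step S3: the initializing processor forms a ring containing only itself."""
    area["final"] = me
    area["monitors"][me] = me
    area["monitored_by"][me] = me


def join_ring(area, me):
    """Step S7, sub-steps (1) to (5): insert `me` next to the last joiner."""
    last = area["final"]
    area["monitors"][me] = last                             # (1) monitor the last joiner
    area["monitored_by"][me] = area["monitored_by"][last]   # (2) inherit its former monitor
    area["monitored_by"][last] = me                         # (3) last joiner is now monitored by me
    area["final"] = me                                      # (4) I am now the last joiner
    area["monitors"][area["monitored_by"][me]] = me         # (5) the former monitor now monitors me


area = new_area()
init_ring(area, 12)   # processor #12 initializes the shared memory
join_ring(area, 14)   # processor #14 starts next
```

After these two calls, #12 and #14 monitor each other and the final link information holds #14. The same `join_ring` call works unchanged for any later processor, which is the point of deciding the pairing by order of arrival.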

Thus, for example, if processor 12 initializes the shared memory 15 and processors 14 and 11 then build up the mutual monitoring arrangement in that order, the link information in the link information storage area 15a changes in the order of FIG. 4(a) → (b) → (c). If processor 13 is then newly added, the specific processing performed by processor 13 is as follows. In the figure, the processor codes of processors 11 to 14 are denoted #11, #12, #13, and #14, respectively.

(1) It sets the processor code specified by the final link information (#11) in the monitoring link information assigned to processor 13.
(2) It sets, in the monitored link information of processor 13, the processor code (#12) set in the monitored link information assigned to processor 11, the processor specified by the final link information.
(3) It sets its own processor code (#13) in the monitored link information assigned to processor 11, the processor specified by the final link information.
(4) It sets its own processor code (#13) in the final link information.
(5) It sets its own processor code (#13) in the monitoring link information assigned to processor 12, the processor it set in its own monitored link information.

As a result, the link information in the link information storage area 15a ends up as shown in FIG. 4(d). The monitoring and monitored relationships among the processors at each stage (FIGS. 4(a) to (d)) are as shown in FIGS. 5(a) to (d). In FIG. 5, the tail of each arrow between processors is the monitoring side and the head is the monitored side. These relationships follow from the monitoring link information and monitored link information of each processor at the corresponding stage of FIG. 4. Processors drawn as solid circles have started, and those drawn as broken circles have not.
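The relationship between the link tables and the ring of arrows can be checked mechanically. In the Python sketch below, the two dictionaries hold the final state as derived from steps (1) to (5) in the text (the figures themselves are not reproduced here), and the helper names `ring_order` and `links_consistent` are hypothetical.

```python
# Link state after processors 12, 14, 11, 13 have joined in that order,
# derived from the steps described in the text.
monitors = {11: 14, 12: 13, 13: 11, 14: 12}       # who each processor monitors
monitored_by = {11: 13, 12: 14, 13: 12, 14: 11}   # who monitors each processor


def ring_order(monitors, start):
    """Follow the monitoring links from `start` until the walk returns to it."""
    order, current = [start], monitors[start]
    while current != start:
        if len(order) > len(monitors):
            raise ValueError("monitoring links do not form a ring")
        order.append(current)
        current = monitors[current]
    return order


def links_consistent(monitors, monitored_by):
    """Every arrow watcher -> target must be mirrored in the monitored table."""
    return all(monitored_by[target] == watcher
               for watcher, target in monitors.items())


order = ring_order(monitors, 12)
consistent = links_consistent(monitors, monitored_by)
```

Here `order` lists the processors in the direction of the arrows starting from #12, and a ring covering every started processor is what guarantees that each one has exactly one monitor.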

Next, the mutual monitoring operation among the processors based on this link information is described. FIGS. 7 and 8 are flowcharts showing the processing of the monitored-side and monitoring-side processors, respectively. As shown in FIG. 7, the monitored-side processor increments by 1, at regular intervals, the operation monitoring data provided in advance for each processor in the shared memory 15 (step S101).

Consequently, as long as the monitored-side processor is operating normally, its operation monitoring data keeps changing from one interval to the next.

In FIG. 8, the monitoring-side processor, concurrently with the processing of the flowchart of FIG. 7, reads from the shared memory 15 at regular intervals the operation monitoring data of the processor specified by its own monitoring link information in the link information storage area 15a (step S201), and confirms that the data keeps changing (step S202). If the data has not changed after a predetermined period, the monitoring-side processor regards that processor as faulty, notifies the other processors of the fault (step S203), and performs the system reconfiguration processing described below (step S204) to exclude the faulty processor from the mutual monitoring arrangement.
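The counter-based check of FIGS. 7 and 8 can be simulated without real timers. The Python sketch below is an illustration under stated assumptions: `Watcher`, `heartbeat`, and the `limit` of three unchanged reads are hypothetical (the text only says "a predetermined period"), and a dictionary stands in for the operation monitoring data in the shared memory 15.

```python
class Watcher:
    """Monitoring-side check (steps S201 and S202), driven by explicit ticks."""

    def __init__(self, limit):
        self.limit = limit      # unchanged reads tolerated before declaring a fault (assumed)
        self.previous = None    # value seen on the last read that showed a change
        self.unchanged = 0      # consecutive reads without a change

    def check(self, value):
        """Return True when the monitored processor is judged abnormal."""
        if value == self.previous:
            self.unchanged += 1
        else:
            self.unchanged = 0
            self.previous = value
        return self.unchanged >= self.limit


counters = {13: 0}  # operation monitoring data, one word per processor


def heartbeat(code):
    """Monitored-side step S101: increment own operation monitoring data by 1."""
    counters[code] += 1


watcher = Watcher(limit=3)
results = []
for tick in range(6):
    if tick < 2:            # processor #13 runs for two periods, then hangs
        heartbeat(13)
    results.append(watcher.check(counters[13]))
```

In this run the fault is reported from the third consecutive unchanged read onward. Tolerating a few unchanged reads is a design choice of the sketch that allows for the two sides running on slightly different periods.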

FIG. 9 is a flowchart of the system reconfiguration processing that excludes a faulty processor from the mutual monitoring arrangement when the mutual monitoring operation described above detects a fault in another processor. It corresponds in substance to step S204 of FIG. 8. The monitoring-side processor takes the processor code set in the monitoring link information of the processor specified by its own monitoring link information (that is, the abnormal processor, which must be removed from the mutual monitoring framework) and sets it in its own monitoring link information (step S301). It also sets its own processor code in the monitored link information of the processor identified by that code (step S302).

In other words, the monitoring-side processor stops monitoring the abnormal processor and instead monitors the processor that the abnormal processor had been monitoring, and that newly monitored processor has the monitoring-side processor's code recorded as the processor that monitors it.

After that, if the final link information in the link information storage area 15a indicates the processor judged abnormal (step S303), the monitoring-side processor sets its own processor code as the new final link information (step S304).

Take, for example, the case where, in the mutual monitoring combination shown in FIG. 5(d), processor 12 detects an abnormality in processor 13 and excludes processor 13 from the mutual monitoring arrangement. The processing is as follows:

(1) Processor 12 detects the abnormality of processor 13 because the operation monitoring data of the processor specified by its own monitoring link information (processor 13) has not changed for a certain time.
(2) It notifies the other processors that it has detected an abnormality in processor 13, using, for example, a fault detection flag provided in the shared memory 15.
(3) It sets the processor code (#11) held in the monitoring link information of processor 13 in the link information storage area 15a in its own monitoring link information, and sets its own processor code (#12) in the monitored link information of processor 11, the processor identified by that code.
(4) Because the final link information (#13) corresponds to processor 13, which was judged abnormal, it sets its own processor code (#12) as the new final link information.

As a result, the mutual monitoring relationships of the reconfigured system change to those shown in FIG. 6. If processor 13, having been judged abnormal, later returns to a normal state, it is incorporated into the mutual monitoring arrangement by the same procedure as immediately after power-on, and the system returns to the state of FIG. 5(d).

The same reconfiguration processing applies between any other monitoring-side and monitored-side pair. For example, if processor 11 detects an abnormality in processor 14 in FIG. 5(d), processor 14 is removed from the system and processor 11 thereafter monitors processor 12. In this case the final link information (#13) does not change.
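The reconfiguration steps S301 to S304 and the two cases just described can be sketched in Python as follows. The names `exclude_failed` and `state_all_joined` and the dictionary layout are hypothetical, and the starting state is the one derived from the join steps in the text rather than read from the figures. The sketch leaves the failed processor's own entries untouched, since the text does not say they are cleared.

```python
def state_all_joined():
    """Link state with processors 12, 14, 11, 13 joined in that order (derived from the text)."""
    return {
        "final": 13,
        "monitors": {11: 14, 12: 13, 13: 11, 14: 12},
        "monitored_by": {11: 13, 12: 14, 13: 12, 14: 11},
    }


def exclude_failed(area, me):
    """Steps S301 to S304, run by monitoring-side processor `me`."""
    failed = area["monitors"][me]            # the processor judged abnormal
    successor = area["monitors"][failed]     # the processor it had been monitoring
    area["monitors"][me] = successor         # S301: take over its monitoring target
    area["monitored_by"][successor] = me     # S302: record myself as the new monitor
    if area["final"] == failed:              # S303: was the failed one the last joiner?
        area["final"] = me                   # S304: if so, I take its place
    return failed


# Case 1: processor 12 detects an abnormality in processor 13.
case1 = state_all_joined()
removed1 = exclude_failed(case1, 12)

# Case 2: processor 11 detects an abnormality in processor 14.
case2 = state_all_joined()
removed2 = exclude_failed(case2, 11)
```

In the first case processor 12 ends up monitoring processor 11 and becomes the final link. In the second, processor 11 ends up monitoring processor 12 and the final link information stays at #13, matching the description above.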

The above embodiment concerns a system of four processors, but the present invention is applicable generally to any multiprocessor system consisting of a plurality of processors.

(Effects of the Invention) As described above, according to the present invention, even when a fault occurs in some of the processors making up a multiprocessor system, the processor judged abnormal is excluded and the combination used for mutual monitoring via the shared memory is rebuilt. The healthy processors can therefore always continue mutual monitoring whether or not a fault has occurred, which raises the reliability of the system. Moreover, because no single processor monitors several or many processors, the load on each monitoring-side processor remains small.

Also, even when it is difficult to synchronize the operation of the individual processors, as immediately after power-on, processors are incorporated into the mutual monitoring framework one by one as they become ready, so there are no gaps in mutual monitoring. The software required is also very simple, so the program can be kept simple.

Furthermore, because the combination of processors used for mutual monitoring is not fixed but is decided on a first-come, first-served basis, a new combination is built each time processors are added or removed, even when the system configuration (the number of processors) changes, so the mutual monitoring software is unaffected by the number of processors in the system. For example, the same software can run whether only one processor in the system has started or all of them have, which makes it possible to respond flexibly to system changes. (In the case of FIG. 5(a), for example, the processor monitors its own operation. This causes no practical problem as long as the processor is operating normally. Indeed, if the software of FIG. 7 and that of FIG. 8, that is, the monitored-side and monitoring-side processing, run at different program levels, then because of differences in execution period and the like, the program level running the software of FIG. 8 in effect monitors the program level running the software of FIG. 7.)

[Brief Description of the Drawings]

FIGS. 1 to 9 show an embodiment of the present invention. FIG. 1 is a configuration diagram of the multiprocessor system. FIG. 2 is an explanatory diagram of the link information storage area. FIG. 3 is a flowchart showing the processing at start-up of the multiprocessor system. FIGS. 4(a), (b), (c), and (d) are explanatory diagrams showing changes in the link information. FIGS. 5(a), (b), (c), and (d) and FIG. 6 are explanatory diagrams showing the mutual monitoring relationships among the processors. FIG. 7 is a flowchart showing the processing of the monitored-side processor. FIG. 8 is a flowchart showing the processing of the monitoring-side processor. FIG. 9 is a flowchart showing the system reconfiguration processing. FIG. 10 is a configuration diagram of a multiprocessor system for explaining the conventional example.

11 to 14: processors
15: shared memory
15a: link information storage area
24: shared bus
26: bus arbitration circuit

Claims

[Claims] An operation monitoring method for a multiprocessor system comprising a plurality of processors and a memory shared by them, in which the processors monitor one another, characterized in that:

in a link information storage area provided in the shared memory, an area is provided for each processor to store monitoring link information indicating the monitored-side processor that the processor monitors and monitored link information indicating the monitoring-side processor that monitors the processor, and an area is provided to store final link information indicating the processors that enter the mutual monitoring arrangement one after another following start-up of the system;

each time a processor of the system is started, that processor updates the monitoring link information and the monitored link information so that it monitors the processor indicated by the final link information, thereby building the mutual monitoring arrangement, and updates the final link information; and

when a monitoring-side processor detects an abnormality in the processor indicated by its own monitoring link information, it updates the monitoring link information and the monitored link information so that the processor indicated by the monitoring link information of the abnormal processor becomes the new monitoring target of the monitoring-side processor, thereby rebuilding the mutual monitoring arrangement without the abnormal processor, and, if the abnormal processor is the processor indicated by the final link information, stores information indicating the monitoring-side processor in the final link information.