JPH06175944A

JPH06175944A - Network monitoring method

Info

Publication number: JPH06175944A
Application number: JP4324562A
Authority: JP
Inventors: Tomotsugu Saitou; 友嗣斉藤
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1992-12-04
Filing date: 1992-12-04
Publication date: 1994-06-24

Abstract

PURPOSE: To shorten the time until an abnormality is detected and to eliminate the bottleneck caused by management-data traffic flowing through a network, in a method in which management software (a manager) monitors all managed objects (agents) of the network. CONSTITUTION: The manager designates, for each agent, the name or address of the partner agent to which the agent's own survival notification data is to be sent, the time interval of that transmission, the name or address of the partner agent from which survival notification data is to be received, and the allowable time interval for that reception. The sending and receiving of survival notification data thus logically forms one or more closed loops within a LAN. Each agent sends its own survival notification data as designated, monitors the survival notification data from the other agent as designated, and reports to the manager when that data is not received within the allowable time.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a method by which management software (a manager) monitors all managed objects (agents) of a network as part of network management.

[0002]

[Prior Art] FIG. 5 is an explanatory diagram showing an example of a conventional monitoring method. Groups of agents, each joined by a LAN (Local Area Network), are connected by relay devices to form a large network. The manager resides in one of the LANs.

[0003] Conventionally, polling has been used as the method by which the manager monitors the agents. The manager sends a response request to each agent in turn at fixed time intervals. An agent that receives the request immediately returns a response to the manager. If no response arrives within a fixed time, the manager judges that the agent has lost its connection to the network for some reason, for example a shutdown, a system crash, or a network failure, and thereby detects the abnormal state.

[0004] In the figure, lines with arrows indicate logical exchanges of data, not physical connections. With this method, the following two problems arise as the number of agents to be managed increases.

[0005]
(1) The interval at which a response request is sent to any one agent grows in proportion to the number of agents, so the time until an abnormal state is detected becomes longer.
(2) The amount of management data (response requests and responses) flowing through the network increases. In particular, when a plurality of LANs are connected via relay devices, more data passes through the relay devices and the lines between them, which degrades the processing capacity of the relay devices and, in turn, of the system.

[0006] For example, in FIG. 5, if polling is performed at 1-second intervals and there are 100 agents, it takes up to 100 seconds from the occurrence of an abnormality to its detection. In addition, agent A5 and agent B1 are relay devices that join LAN-A and LAN-B. Communication data between agents B1 to Bn in LAN-B and the manager passes through the relay devices, where it causes a traffic bottleneck.
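The 100-second figure follows from a single multiplication, sketched below. The function name `polling_cycle_seconds` is illustrative only and does not appear in the patent; the sketch assumes the manager sends exactly one response request per interval and visits the agents in a fixed order.

```python
def polling_cycle_seconds(num_agents: int, interval_s: float) -> float:
    """Time for the manager to visit every agent once.

    With one response request sent per interval, an agent that fails
    just after being polled is not polled again until the manager has
    gone around all the other agents, so this is also the worst-case
    delay before the failure can be noticed.
    """
    return num_agents * interval_s


if __name__ == "__main__":
    # The example from the text: 100 agents polled at 1-second intervals.
    print(polling_cycle_seconds(100, 1.0))
```

The delay grows linearly with the number of agents, which is the first of the two problems listed above.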

[0007]

[Problem to Be Solved by the Invention] An object of the present invention is to provide a method that shortens the time until an abnormality is detected and eliminates the bottleneck caused by management-data traffic flowing over the network, even as the scale of the network grows.

[0008]

[Means for Solving the Problem] FIG. 1 is a conceptual diagram showing the principle of the method of the present invention. In an information processing system made up of a plurality of devices joined by a LAN, the manager that manages the whole system designates, for each of the other agents, the name or address of the partner agent to which the agent should send its own survival notification data, the time interval at which to send it, the name or address of the partner agent from which it should receive survival report data, and the allowable time interval for that reception, such that the sending and receiving of survival notification data logically forms one or more closed loops within the LAN. Each agent sends its own survival report data as designated, monitors the survival notification data from the other agent as designated, and, when that data is not received within the allowable time, reports the fact to the manager.

[0009] The send-receive closed loop for survival notification data need not be a single loop, nor need it include every agent.

[0010]

[Operation] Because each agent monitors one other agent in parallel, the monitoring time (the time until an abnormality is detected) is constant regardless of the number of agents.

[0011] Of the management data, the survival notification data only needs to be sent one way, so for the same monitoring time the traffic is about 1/2 that of the conventional request-response method. The only other management data are the initialization data sent to each agent at initialization and a few messages, including confirmation, exchanged when an abnormality occurs.

[0012] With the conventional method, the management-data traffic passing through a relay device in each monitoring cycle is twice the number of agents beyond the relay device. With the present invention, the only such traffic is the initialization data sent at initialization to the agents beyond the relay device and a few messages, including confirmation, when an abnormality occurs; during normal operation it is zero.
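The traffic comparison in the two preceding paragraphs can be written out as a small calculation. This is a sketch under stated assumptions: it counts messages rather than bytes, it covers normal operation only (no initialization or abnormality messages), and it assumes every closed loop stays inside a single LAN. The function names are illustrative and not taken from the patent.

```python
def polling_messages_per_cycle(num_agents: int) -> int:
    """Conventional method: one response request plus one response per agent."""
    return 2 * num_agents


def ring_messages_per_cycle(num_agents: int) -> int:
    """Closed-loop method: one one-way survival notification per agent."""
    return num_agents


def relay_messages_per_cycle(agents_beyond_relay: int, scheme: str) -> int:
    """Management messages crossing a relay device in one monitoring cycle."""
    if scheme == "polling":
        # Each request and each response for a remote agent crosses the relay.
        return 2 * agents_beyond_relay
    if scheme == "ring":
        # Assumption: loops are closed within each LAN, so notifications
        # never cross the relay during normal operation.
        return 0
    raise ValueError(f"unknown scheme: {scheme!r}")


if __name__ == "__main__":
    n = 100
    print(ring_messages_per_cycle(n) / polling_messages_per_cycle(n))
    print(relay_messages_per_cycle(40, "polling"),
          relay_messages_per_cycle(40, "ring"))
```

The ratio of the two per-cycle counts is 1/2 for any number of agents, which matches the "about 1/2" stated in the text.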

[0013]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 2 is an explanatory diagram of the embodiment. The network in this example consists of two LANs, LAN-A and LAN-B, joined by relay devices. LAN-A contains the manager and agents A1 to Am, and LAN-B contains agents B1 to Bn; the two LANs are joined over a long-distance communication line via agent A5 and agent B1.

[0014] At initialization, the manager ascertains the state of each agent by a known method and, based on that information, determines an appropriate closed-loop configuration. Agents with a special role in the network, for example relay devices and devices that collect statistical information, are handled separately and monitored directly, as in the conventional method.

[0015] Next, the manager notifies each agent of the four items of information needed to form the closed loop: the address of the partner agent to which the agent should send its own survival notification data, the time interval at which to send it, the address of the partner agent from which it should receive survival report data, and the allowable time interval for that reception.
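One way to picture the four items and how they produce a closed loop is the minimal sketch below. The patent does not prescribe how the manager orders the agents or represents the assignment; the `LoopAssignment` type, the `build_closed_loop` function, and the choice of a simple ring in list order are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LoopAssignment:
    """The four items the manager sends to one agent (names are hypothetical)."""
    send_to: str               # partner that receives this agent's notifications
    send_interval_s: float     # how often to send them
    receive_from: str          # partner whose notifications this agent monitors
    allowed_interval_s: float  # longest tolerated gap between received notifications


def build_closed_loop(agents: List[str],
                      send_interval_s: float,
                      allowed_interval_s: float) -> Dict[str, LoopAssignment]:
    """Arrange the agents in a ring: each sends to the next, watches the previous."""
    if len(agents) < 2:
        raise ValueError("a closed loop needs at least two agents")
    n = len(agents)
    return {
        agent: LoopAssignment(
            send_to=agents[(i + 1) % n],
            send_interval_s=send_interval_s,
            receive_from=agents[(i - 1) % n],
            allowed_interval_s=allowed_interval_s,
        )
        for i, agent in enumerate(agents)
    }


if __name__ == "__main__":
    for name, item in build_closed_loop(["A1", "A2", "A3"], 1.0, 3.0).items():
        print(name, item)
```

Because every agent's `send_to` partner lists that agent as its `receive_from`, each agent is watched by exactly one other agent, and the manager itself takes no part in the routine exchange.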

[0016] Each agent in a closed loop sends its own survival notification data to the designated agent at the designated time interval. In parallel, it monitors the survival report data from the other designated agent and, if that data does not arrive within the prescribed time, sends abnormality report data to the manager. Sending its own survival notification data and monitoring the survival notification data from the other agent are carried out in parallel, with no direct relationship between the two.

[0017] The manager examines the state of the agent in question by the conventional method (a direct response request) and, if it judges that the agent cannot continue to operate normally, removes that agent from the closed loop and rebuilds the loop.
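The manager's reaction to a report might be sketched as follows. The patent states only that the manager checks the agent directly and, if necessary, removes it and rebuilds the loop; representing the loop as an ordered list and passing the direct check in as a `probe` callback are assumptions of this sketch, and `handle_abnormality_report` is a hypothetical name.

```python
from typing import Callable, List


def handle_abnormality_report(loop_order: List[str],
                              suspect: str,
                              probe: Callable[[str], bool]) -> List[str]:
    """Return the loop order to use after a report about `suspect`.

    `probe` stands in for the conventional direct response request:
    it returns True when the suspect answers and can keep operating.
    """
    if suspect not in loop_order:
        raise ValueError(f"{suspect!r} is not in the loop")
    if probe(suspect):
        # The agent is judged healthy, so the loop is left as it was.
        return list(loop_order)
    # The agent cannot continue: drop it. Its former neighbours become
    # adjacent and would need fresh designations from the manager.
    return [agent for agent in loop_order if agent != suspect]


if __name__ == "__main__":
    loop = ["A1", "A2", "A3", "A4"]
    print(handle_abnormality_report(loop, "A2", lambda agent: False))
```

In this arrangement only the two neighbours of the removed agent change partners, so the rebuild costs a few messages rather than a full re-initialization.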

[0018] FIG. 3 shows a flowchart of the above processing. In the figure, timer A defines the time interval at which survival notification data is sent, and timer B defines the allowable time until survival notification data is received from the other agent. Accordingly, survival notification data is sent each time a timer A interrupt occurs, and timer B is reset each time survival notification data is received from the other agent. If no survival notification data arrives from the other agent within the prescribed time, a timer B interrupt occurs, from which it can be judged that an abnormality has occurred.
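The interplay of timer A and timer B can be sketched without real interrupts by passing the current time in explicitly. The class below is an illustration, not the flowchart of FIG. 3: the name `AgentTimers`, the event strings, and the decision to re-arm timer B after it expires (so that one silence produces one report per allowed interval) are assumptions.

```python
from typing import List


class AgentTimers:
    """Deadline-based stand-in for timer A (send) and timer B (receive)."""

    def __init__(self, send_interval_s: float, allowed_interval_s: float,
                 now_s: float = 0.0) -> None:
        self.send_interval_s = send_interval_s
        self.allowed_interval_s = allowed_interval_s
        self.next_send_s = now_s + send_interval_s            # timer A deadline
        self.receive_deadline_s = now_s + allowed_interval_s  # timer B deadline

    def on_notification_received(self, now_s: float) -> None:
        """Reset timer B whenever the monitored agent's notification arrives."""
        self.receive_deadline_s = now_s + self.allowed_interval_s

    def poll(self, now_s: float) -> List[str]:
        """Return the actions that have come due by `now_s`."""
        events: List[str] = []
        if now_s >= self.next_send_s:
            # Timer A fired: send our own survival notification.
            events.append("send_notification")
            self.next_send_s += self.send_interval_s
        if now_s >= self.receive_deadline_s:
            # Timer B fired: the monitored agent has been silent too long.
            events.append("report_abnormality")
            self.receive_deadline_s = now_s + self.allowed_interval_s  # assumed re-arm
        return events


if __name__ == "__main__":
    timers = AgentTimers(send_interval_s=1.0, allowed_interval_s=3.0)
    timers.on_notification_received(1.5)
    for t in (1.0, 2.0, 3.0, 4.0, 4.5):
        print(t, timers.poll(t))
```

The two deadlines are updated independently, which mirrors the statement that sending and monitoring proceed in parallel with no direct relationship between them.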

[0019] FIG. 4 shows an example of survival notification data and an example of the data reported to the manager when an abnormality is detected. FIG. 4(A) is an example of survival notification data. At a minimum, the data consists of a destination address, a source address (the sender's own address), and an identification code that distinguishes a survival notification from other data. Transmission and reception control codes and the like are included as needed, but they are not directly related to the present invention.

[0020] FIG. 4(B) is an example of abnormality report data. A sequence number and an error code as judged by the detecting agent, the state of the hardware on which the detecting agent resides, and the like are appended to the survival notification data to aid the manager's judgment.
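The two message layouts described for FIG. 4 could be modelled as the record types below. Only the field list comes from the text; the type names, the field types, and the numeric identification codes `0x01` and `0x02` are hypothetical, and no wire encoding is implied.

```python
from dataclasses import dataclass

# Hypothetical identification codes; the patent does not give values.
SURVIVAL_NOTIFICATION_CODE = 0x01
ABNORMALITY_REPORT_CODE = 0x02


@dataclass(frozen=True)
class SurvivalNotification:
    """FIG. 4(A): the minimum content of a survival notification."""
    destination: str   # address of the designated partner agent
    source: str        # the sender's own address
    code: int = SURVIVAL_NOTIFICATION_CODE


@dataclass(frozen=True)
class AbnormalityReport:
    """FIG. 4(B): notification fields plus details for the manager."""
    destination: str       # address of the manager
    source: str            # address of the agent that detected the silence
    sequence_number: int
    error_code: int
    hardware_status: str   # state of the hardware hosting the detecting agent
    code: int = ABNORMALITY_REPORT_CODE


if __name__ == "__main__":
    print(SurvivalNotification(destination="A2", source="A1"))
    print(AbnormalityReport(destination="manager", source="A2",
                            sequence_number=1, error_code=7,
                            hardware_status="normal"))
```

Since the manager designated which partner each agent monitors, the report's source address is enough for it to work out which agent has gone silent.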

[0021]

[Effects of the Invention] As described above, according to the present invention, the time until an abnormality is detected is constant regardless of the number of agents, so abnormalities can be detected quickly. In addition, management-data traffic over the network as a whole can be reduced to about 1/2, and in particular the management-data traffic passing through relay devices can be eliminated during normal operation, removing the traffic bottleneck.

[Brief Description of the Drawings]

FIG. 1 is a conceptual diagram showing the principle of the method of the present invention.

FIG. 2 is an explanatory diagram of an embodiment of the present invention.

FIG. 3 is a processing flowchart of the embodiment.

FIG. 4 shows the survival notification data and abnormality report data of the embodiment.

FIG. 5 is a conceptual explanatory diagram of a conventional monitoring method.

Claims

[Claims]

1. A network monitoring method for an information processing system comprising a plurality of devices joined by a LAN, characterized in that: a manager that manages the whole system designates, for each of the other agents, the name or address of the partner agent to which the agent should send its own survival notification data, the time interval at which to send it, the name or address of the partner agent from which it should receive survival report data, and the allowable time interval for that reception, such that the sending and receiving of survival notification data logically forms one or more closed loops within the LAN; and each agent sends its own survival report data as designated, monitors the survival report data from the other agent as designated, and, when that data is not received within the allowable time, reports the fact to the manager.