JP2004145536A

JP2004145536A - Management system

Info

Publication number: JP2004145536A
Application number: JP2002308458A
Authority: JP
Inventors: Hirokazu Ikeda; 池田　博和; Toshiaki Hirata; 平田　俊明
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2002-10-23
Filing date: 2002-10-23
Publication date: 2004-05-20

Abstract

PROBLEM TO BE SOLVED: To automatically detect performance deterioration phenomena occurring in a network system, identify their causes, and inform the system manager of countermeasures.
SOLUTION: This management system 100, which integrally manages the operating state and performance of a network system serving as a monitoring target system 10, holds the following as operation management information: operation information 400 collected from the monitoring target system; performance deterioration conditions 310, which define, for each performance deterioration phenomenon, the range of operation-information values under which performance deterioration occurs; and the performance deterioration factor 300 corresponding to each performance deterioration condition. A performance deterioration factor analysis part 130 detects a performance deterioration phenomenon by comparing the operation information with the performance deterioration conditions, and identifies the corresponding factor from the performance deterioration factors.
COPYRIGHT: (C)2004,JPO

Description

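[0001]
[Technical Field of the Invention]
The present invention relates to a management system and a management method for operating and managing a network system stably and with high quality, and in particular to a management system and a management method that make it possible, when a monitored network system suffers a failure or performance degradation, to identify its cause and a countermeasure.
[0002]
[Prior Art]
In recent years, with the growth of IT, network systems have come to operate in a wide variety of industries and environments. At the same time, network systems are required to be highly functional and of high quality. Against this background, operating and managing network systems has become costly, and efficient operation management techniques are needed.
Above all, it is important to find the cause of performance degradation in a network system as quickly as possible and to remedy it quickly. Several methods have been proposed for streamlining the work from identifying a failure to handling it.
[0003]
For example, Patent Document 1 describes a technique that streamlines failure handling by reporting the failure location and a countermeasure derived from detected failure information. In this technique, failure types, combinations of failures, a prescribed number of occurrences, and an effective occurrence time are registered in advance in an information storage area, and when detected failure information matches these conditions, the corresponding handling method is output.
[0004]
Patent Document 2 describes a technique that, for multiple pieces of failure information, determines the impact priority and the affected failure type of each, and uses these together with configuration information to identify which piece of failure information is the root-cause "causal failure".
[0005]
[Patent Document 1]
JP-A-9-288594
[0006]
[Patent Document 2]
JP-A-10-303897
[0007]
[Problems to Be Solved by the Invention]
Both of the prior-art techniques above share the problem that they cannot identify the cause of performance degradation from failure information. A failure event, such as a server going down, is a clear-cut phenomenon, so it is relatively easy to identify from the combination of detected failures. A performance-degradation symptom, by contrast, is usually detected as an ambiguous combination of operating states, none of which is itself a failure. Detecting performance degradation from combinations of failure information would therefore require the failure thresholds to be set appropriately. Because such a threshold differs from one performance-degradation symptom to another, failure information can only go so far.
[0008]
In network system management it is also important to prevent performance degradation before it occurs. However, even a skilled administrator finds it hard to tell what symptom is about to occur in the first place, and which items should be improved to prevent it.
[0009]
An object of the present invention is to solve the above problems of the prior art by providing a management system and a management method that detect a performance degradation event in a network system before or after it occurs, and identify its cause and a countermeasure.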
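[0010]
[Means for Solving the Problems]
According to the present invention, this object is achieved by a management system that manages the performance and operating state of a network system, comprising: an operation information storage unit that stores operation information collected from the network system as the monitored system; a performance degradation condition storage unit that defines, as the conditions under which performance degradation occurs, a range of operation-information values for each performance degradation event; a performance degradation factor storage unit that defines the performance degradation factor corresponding to each performance degradation condition stored in the performance degradation condition storage unit; and performance degradation factor analysis means that identifies a performance degradation condition by comparing the data in the operation information storage unit with the data in the performance degradation condition storage unit, and identifies the corresponding performance degradation factor from the performance degradation factor storage unit.
[0011]
The object is also achieved by a management system that manages the performance and operating state of a network system, comprising: an operation information storage unit that stores operation information collected from the network system as the monitored system; a configuration information storage unit that defines the components of the monitored system and the relationships among them; a performance degradation condition storage unit that defines, as the conditions under which performance degradation occurs, a range of operation-information values for each performance degradation event; a performance degradation factor storage unit that defines the performance degradation factor corresponding to each performance degradation condition stored in the performance degradation condition storage unit; and performance degradation factor analysis means that selects the relevant operation information from the operation information storage unit on the basis of the configuration information in the configuration information storage unit, identifies a performance degradation condition by comparing that operation information with the data in the performance degradation condition storage unit, and identifies the corresponding performance degradation factor from the performance degradation factor storage unit.
[0012]
In the above, the performance degradation factor analysis means comprises: a reach rate analysis unit that, for a value not falling within the range of operation-information values defined in the performance degradation condition storage unit, estimates its ratio to that range; an approximation analysis unit that approximates the change over time of the ratio obtained by the reach rate analysis unit with a straight line or a curve; and performance degradation prediction notification means that estimates the time at which the ratio reaches 100% by extrapolating the straight line or curve, and notifies the outside when that time is expected to fall within a preset period.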
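[0013]
[Embodiments of the Invention]
Embodiments of the network system management system and management method according to the present invention are described in detail below with reference to the drawings.
[0014]
FIG. 1 is a block diagram showing an example configuration of a network system management system according to an embodiment of the present invention. In FIG. 1, 10 is the monitored system, 11 is a component, 100 is the management system, 110 is the operation information collection unit, 120 is the operation management information storage unit, 130 is the performance degradation factor analysis unit, 140 is the input/output control unit, 150 is the input device, 160 is the output device, 300 is the operation know-how information, 310 is the performance degradation conditions, 330 is the performance degradation factors, 350 is the countermeasures, 400 is the operation information, and 500 is the configuration information.
[0015]
The management system 100 according to this embodiment collects operation information from the monitored system 10 and analyzes its performance. The monitored system 10 here is a network system, and each component 11 of that network system is configured so that the operation information collection unit 110 can collect operation information, such as operating state and performance, from it. A component 11 may be a logical unit such as a service as well as hardware or a program, and some components operate in relation to one another.
[0016]
In addition to the operation information collection unit 110, the management system 100 comprises an operation management information storage unit 120 that stores operation information and other data, a performance degradation factor analysis unit 130 that analyzes performance degradation events occurring in the monitored system 10, and an input/output control unit 140 that handles screen input/output, report output, and the like. The performance degradation factor analysis unit 130 analyzes the information stored in the operation management information storage unit 120 to identify a performance degradation event, and passes its cause and countermeasure to the input/output control unit 140. The operation management information storage unit 120 stores the operation information 400, the configuration information 500 defining the system configuration of the monitored system 10, and the operation know-how information 300, which formalizes the system administrator's operation management know-how. The operation know-how information 300 is divided into three categories: performance degradation conditions 310, performance degradation factors 330, and countermeasures 350.
[0017]
FIG. 2 illustrates an example system configuration of the monitored system 10 and the contents of the configuration information 500 that stores that configuration. In FIG. 2, 15 is an internal network (subnet A), 16 is an external network, Usr1 is a probe, FW1 is a firewall, LB1 is a load balancer, PC1 to PC5 are general-purpose computers, Web1 to Web3 are Web servers, AP4 is an AP server, and DB5 is a DB server.
[0018]
The monitored system 10 shown in FIG. 2(a) is an example of a Web system; its physical configuration is shown schematically. In the monitored system 10, three Web servers Web1 to Web3, running on PC1 to PC3, are load-balanced by the load balancer LB1, and there is also an AP server AP4 running on PC4 and a DB server DB5 running on PC5. PC1 to PC5 and the load balancer LB1 are housed in the internal network 15 (subnet A), and the firewall FW1 sits between the internal network 15 and an external network 16 such as the Internet. Usr1, connected to the external network 16, is a probe that periodically monitors the responsiveness of the Web site as seen from the user side and accumulates the results as operation information.
[0019]
As shown in FIG. 2(b), the configuration information 500 for the monitored system 10 of FIG. 2(a) defines, for each component 510 of the monitored system 10, the components 530 with which it has a direct dependency relationship and its component type 550. The directly dependent components 530 define the relationships between components, and the component type 550 is the functional classification of the component. Both are used when comparing against the performance degradation conditions 310 of the operation know-how information 300, which is metadata. The detailed structure of the configuration information 500 is only shown in FIG. 2(b) and is not described further; the components 510 include every component of the network system serving as the monitored system described with reference to FIG. 2(a).
[0020]
FIG. 3 illustrates an example of the contents of the operation know-how information 300 when a Web system is the monitored system. In the example of FIG. 3, three symptoms, A360, B370, and C380, are listed as performance degradation events that have occurred in Web systems in the past or are likely to occur.
[0021]
The performance degradation conditions 310 define ranges by quantitatively setting thresholds on the operation information of each component. If the operation information actually collected from the components falls within a combination of these ranges, the symptom corresponding to that combination can be identified.
The performance degradation factors 330 store the factor corresponding to each symptom. Although not shown in the example of FIG. 3, one symptom may correspond to multiple factors. The countermeasures 350 store the countermeasure corresponding to each factor; as in the example of symptom A360, one factor may correspond to multiple countermeasures.
[0022]
The component types 320 in the performance degradation conditions 310 of the operation know-how information 300 are not the actual components 11 of the monitored system 10. For example, "FW" in symptom A360 does not refer to a specific firewall in the monitored system 10 but to firewalls in general. The operation know-how information 300 is therefore metadata independent of the system configuration and need not be defined separately for each system.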
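[0023]
FIG. 4 illustrates an example of the operation information 400 collected from the monitored system 10. The example shows part of the operation information collected every five minutes by the operation information collection unit 110.
[0024]
The operation information 400 consists of each component 410 and its operation information 420. In the illustrated example, the operation information of component Usr1 is the Web response time and the access success rate; that of component LB1 is the number of requests and the line utilization; that of PC1 to PC3 is the CPU utilization; and that of AP4 and DB5 is the number of connections.
[0025]
In the example of FIG. 4, at 9:00 on July 1, the Web response time, an item of operation information for component Usr1, is 3 seconds. In principle, all operation information 420 should be collected at each time point, but the timestamps of the operation information actually collected may not coincide. This can be handled by storing information collected within a certain time window as belonging to the same time point. Also, the collected operation information need not be accumulated over a period as shown in FIG. 4: once a complete set of operation information for the latest time point is available, the performance degradation factor analysis can be run on it and the result stored, after which the operation information used for the analysis may be deleted.
[0026]
FIG. 5 illustrates the operation information collection sequence 620 performed by the operation information collection unit 110, and the analysis sequence 640, in which the performance degradation factor analysis is triggered by the user managing the system specifying analysis conditions.
[0027]
In the operation information collection sequence 620, the operation information collection unit 110 periodically collects operation information from the monitored system 10 and stores it in the operation information 400 of the operation management information storage unit 120. The operation information collection sequence 620 basically runs independently of the other processing sequences. In the analysis sequence 640, the user, a system administrator, first selects the time range to be analyzed and the target system using the input device 150, such as a terminal. The specified conditions are passed to the performance degradation factor analysis unit 130 via the input/output control unit 140. Following these conditions, the performance degradation factor analysis unit 130 retrieves the configuration information 500, the operation information 400, and the operation know-how information 300 from the operation management information storage unit 120. It then performs the collation 610, comparing the operation information with the operation know-how information on the basis of the configuration information. If any data in the target time range matches a performance degradation condition 310 of the operation know-how information 300, the performance degradation factor analysis unit 130 outputs the corresponding factor 330 and countermeasure 350 as the result to the output device 160 via the input/output control unit 140.
[0028]
FIG. 6 illustrates the automatic monitoring sequence 660 used when the performance degradation analysis is performed automatically.
[0029]
The sequence of FIG. 6 presupposes that the operation information collection sequence 620 described with reference to FIG. 5 is running. The input/output control unit 140 periodically instructs the performance degradation factor analysis unit 130 to run the analysis. As in the analysis sequence 640 of FIG. 5, the performance degradation factor analysis unit 130 retrieves each item of information from the operation management information storage unit 120, performs the collation 610 against the symptoms in the same way, and outputs the result to the input/output control unit 140. If the collation result contains any symptom, the input/output control unit 140 issues a warning via the output device 160, for example by sending an e-mail to the system administrator or displaying an alarm on a terminal. Even when there is no particular abnormality, the operating status can be reported periodically to the user acting as administrator.
[0030]
In FIGS. 5 and 6 above, the results of the performance degradation factor analysis were described as being reported to the user acting as administrator. The present invention also allows users who receive the service of the network system 10 via the external network 16 to access the input/output control unit 140 of the management system and receive the results of the analysis.
[0031]
FIG. 7 is a flowchart of the collation 610 against the symptoms performed by the performance degradation factor analysis unit 130 within the sequences described with reference to FIGS. 5 and 6. The collation is performed for every time point in the specified time range. For the operation information at each time point, every symptom is compared with its performance degradation conditions to check whether it matches. For each time point that matches a symptom, the symptom and the time point are recorded. After all symptoms and time points have been compared, the matching time points and symptoms, together with the corresponding factors and countermeasures, are output as the result.
[0032]
(1) When the process starts, the relevant configuration information is first identified, and the relevant operation information is compared with the performance degradation conditions of the symptom (steps 700 to 702).
[0033]
(2) Based on the comparison in step 702, it is determined whether the conditions are met. If they are, the match count for that symptom is incremented, and the process returns to step 701 and repeats (steps 703 and 707).
[0034]
(3) If the conditions are not met in step 703, it is determined whether the comparison has been completed for all symptoms and for the whole target time range; if not, the process returns to step 701 and repeats (steps 704 and 705).
[0035]
(4) If steps 704 and 705 find that the comparison has been completed for all symptoms and for the whole target time range, the corresponding factors and countermeasures are output for the matched symptoms, and the process ends (step 706).
[0036]
As preprocessing before the performance degradation factor analysis described above, the performance degradation factor analysis unit 130 can perform a correlation analysis that statistically quantifies the correlation between the items of operation information to be analyzed and excludes weakly correlated operation information from the analysis. This speeds up the performance degradation factor analysis.
[0037]
As a detailed example of the collation, the comparison of the operation information 400 of FIG. 4 at 15:00 on July 2 is described below, using the operation know-how information 300 of FIG. 3 and the configuration information 500 of FIG. 2(b). The symptom being collated is symptom B.
[0038]
First, the component types LB, PC, and Usr defined in the component types 320 of symptom B in the performance degradation conditions of the operation know-how information 300 are compared with the configuration information 500, and the components having those types are extracted. This yields LB1, PC1, PC2, PC3, PC4, PC5, and Usr1. However, because the performance degradation conditions of symptom B require the PCs to be connected to the LB, only PC1, PC2, and PC3 remain as PCs, based on the directly dependent components 530. Next, the operation information of the extracted components is compared with the performance degradation conditions 310.
[0039]
In this case, in the collected operation information 400, LB1's request count is 120 and its line utilization is 40%, which satisfy the performance degradation conditions 310. The maximum deviation of the CPU utilization of PC1, PC2, and PC3 is 33.3, and the Web response time of Usr1 is 10 seconds, which also satisfy the performance degradation conditions 310. Symptom B was therefore occurring in the monitored system 10 at 15:00 on July 2; that is, the conditions were met.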
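[0040]
FIG. 8 shows an example of the analysis results displayed on the output device 160, via a GUI or the like, as a result of the analysis sequence 640 of the performance degradation factor analysis unit 130. The content output by the analysis is described below.
[0041]
The symptom trend 800 in FIG. 8 displays, for each symptom, the change over time of the match ratio per unit time. In the operation information 400 of FIG. 4, operation information is collected every five minutes, so when the display is per hour, the hatched area reaches its maximum, i.e. 100%, when the symptom is matched 12 times. If a symptom is matched within the displayed time range, the factors and countermeasures 840 for that symptom are displayed. The factors and countermeasures 840 are obtained by looking up, in the operation know-how information 300, the factors 330 and countermeasures 350 corresponding to the symptom. The names of the components identified during the symptom match can also be embedded in the displayed text. In the example of factors and countermeasures 840 in FIG. 8, symptom B was matched over the three days from July 2 to July 4, so the factors and countermeasures corresponding to symptom B are extracted from the factors 330 and countermeasures 350 and displayed. Likewise, symptom C was matched over the two days from July 3 to July 4, so the factors and countermeasures corresponding to symptom C are extracted from the factors 330 and countermeasures 350 and displayed.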
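The per-hour match ratio plotted in the symptom trend 800 can be sketched as follows. This is a minimal illustration, assuming the collation results are available as `(timestamp, symptom)` pairs and that samples arrive every five minutes, so 12 matches in one hour count as 100%; the function name `hourly_match_ratio` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime

SAMPLES_PER_HOUR = 12  # 60 min / 5-min collection interval, as in FIG. 4


def hourly_match_ratio(matches: list[tuple[datetime, str]]) -> dict[tuple[str, datetime], float]:
    """Return the fraction of samples per (symptom, hour) that matched the symptom."""
    counts: dict[tuple[str, datetime], int] = defaultdict(int)
    for ts, symptom in matches:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        counts[(symptom, hour)] += 1
    # Cap at 1.0 in case more than 12 samples land in the same hour.
    return {key: min(n / SAMPLES_PER_HOUR, 1.0) for key, n in counts.items()}
```

An hour in which every five-minute sample matched yields 1.0 (the full hatched area); three matches in an hour yield 0.25.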
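[0042]
FIG. 9 is a conceptual diagram of an example in which the onset of symptom B370 is predicted one day before it begins to occur with 100% probability. A method of predicting a symptom before it occurs in the monitored system is described next.
[0043]
In FIG. 9, the vertical axis 850 is the reach rate of symptom B370, which can be expressed as the average of the ratios of the operation information to the thresholds set in the performance degradation conditions 310. For example, if LB1's request count is 75, its ratio is 75%, and if Usr1's Web response time is 2 seconds, its ratio is 25% (implying thresholds of 100 requests and 8 seconds); if these two were the performance degradation conditions, the reach rate of the symptom would be their average, 50%. In other words, once the reach rate exceeds 100%, the symptom occurs.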
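A minimal sketch of the reach-rate calculation, assuming each condition is an upper threshold and the reach rate is the plain mean of value/threshold ratios, as in the example above. The 100-request and 8-second thresholds are the ones implied by that example; the metric keys and function name are hypothetical.

```python
def reach_rate(observed: dict[str, float], thresholds: dict[str, float]) -> float:
    """Average ratio of each observed metric to its degradation threshold.

    1.0 (100%) means the metrics have, on average, reached their thresholds.
    """
    ratios = [observed[name] / limit for name, limit in thresholds.items()]
    return sum(ratios) / len(ratios)


# Example from the text: 75 of 100 requests (75%) and 2 s of 8 s (25%).
symptom_b_thresholds = {"LB1.requests": 100.0, "Usr1.web_response_s": 8.0}
print(reach_rate({"LB1.requests": 75.0, "Usr1.web_response_s": 2.0}, symptom_b_thresholds))
# prints 0.5
```

A plain mean lets one metric far past its threshold mask another far below it; the text does not say whether ratios are capped, so no cap is applied here.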
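The horizontal axis 855 is time, and FIG. 9 as a whole shows the change over time 860 of the reach rate of symptom B. The change over time 860 can be obtained as an approximating straight line (or curve) 865 by regression analysis or a similar method. For example, to raise an alarm one day before the symptom occurs, the approximating line 865 is extrapolated from the present 875 into the future, and it is determined whether the time at which it reaches 100% falls within one day of the present 875. In the example of FIG. 9, the line reaches 100% exactly one day later, so an automatic notification such as an alarm can be issued.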
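[0044]
FIG. 10 is a flowchart of the advance prediction described above. This processing is performed by the performance degradation factor analysis unit 130.
[0045]
Computing the reach rate for every time point produces widely scattered values, so in the processing of FIG. 10 the target time range is divided into multiple time windows and an averaged reach rate is computed for each window. Once reach rates have been obtained for the entire time range, an approximating straight line is computed, and extrapolation determines whether the threshold will be reached within the set time. If a reach is predicted for any symptom, the user is notified.
[0046]
(1) When the process starts, the target time range is first divided into suitable time windows, the relevant configuration information is identified, the relevant operation information is compared with the performance degradation conditions of the symptom, and the reach rate of the symptom is calculated from the comparison results (steps 720 to 724).
[0047]
(2) It is determined whether the comparison between the operation information and the performance degradation conditions has been completed for every symptom and every time window; if any remains, the process returns to step 722 and repeats (steps 725 to 727).
[0048]
(3) If steps 725 to 727 find that all comparisons have been completed, the relationship between time and reach rate is linearly approximated, and it is determined whether the threshold will be reached within the set time, that is, in the example of FIG. 9, whether the reach rate will reach 100% within one day (steps 728 and 729).
[0049]
(4) If step 729 determines that the threshold will be reached within the set time, the user acting as system administrator is notified, and the process returns to step 728 and repeats (step 731).
[0050]
(5) If step 729 determines that the threshold will not be reached within the set time, it is determined whether the approximation has been completed for all matched symptoms; if not, the process returns to step 728 and repeats, and if so, the process ends (steps 730 and 740).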
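[0051]
Each process of the embodiment described above can be implemented as a processing program, which can be provided stored on a recording medium such as an HD, DAT, FD, MO, DVD-ROM, or CD-ROM.
[0052]
[Effects of the Invention]
As described above, according to the present invention, performance degradation events and their causes and countermeasures can be identified from know-how information about performance degradation events together with the system's configuration definition information and operation information, and performance degradation can be predicted by estimating how performance changes over time. This supports performance analysis of network systems.
[Brief Description of the Drawings]
[FIG. 1] A block diagram showing an example configuration of a network system management system according to an embodiment of the present invention.
[FIG. 2] A diagram illustrating an example system configuration of the monitored system and the contents of the configuration information that stores that configuration.
[FIG. 3] A diagram illustrating an example of the contents of the operation know-how information when a Web system is the monitored system.
[FIG. 4] A diagram illustrating an example of operation information collected from the monitored system.
[FIG. 5] A diagram illustrating the operation information collection sequence performed by the operation information collection unit, and the analysis sequence in which performance degradation factor analysis is triggered by the managing user specifying analysis conditions.
[FIG. 6] A diagram illustrating the automatic monitoring sequence used when the performance degradation analysis is performed automatically.
[FIG. 7] A flowchart of the collation against symptoms performed by the performance degradation factor analysis unit within the sequences of FIGS. 5 and 6.
[FIG. 8] A diagram showing an example of analysis results displayed on the output device, via a GUI or the like, as a result of the analysis sequence of the performance degradation factor analysis unit.
[FIG. 9] A conceptual diagram of an example in which the onset of symptom B is predicted one day before it begins to occur with 100% probability.
[FIG. 10] A flowchart of the advance prediction processing described above.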
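[Explanation of Reference Numerals]
10 Monitored system
11 Component
15 Internal network (subnet A)
16 External network
100 Management system
110 Operation information collection unit
120 Operation management information storage unit
130 Performance degradation factor analysis unit
140 Input/output control unit
150 Input device
160 Output device
300 Operation know-how information
310 Conditions
330 Factors
350 Countermeasures
400 Operation information
500 Configuration information
Usr1 Probe
FW1 Firewall
LB1 Load balancer
PC1 to PC5 General-purpose computers
Web1 to Web3 Web servers
AP4 AP server
DB5 DB server
[0001]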
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a management system and a management method for operating and managing a network system stably and with high quality, and in particular to a management system and a management method that make it possible, when a failure or performance degradation occurs in a monitored network system, to identify its cause and a countermeasure.
[0002]
[Prior art]
In recent years, with the growth of IT, network systems have come to operate in a wide variety of industries and environments. At the same time, network systems are required to be highly functional and of high quality. Against this background, operating and managing network systems has become costly, and efficient operation management techniques are needed.
Above all, it is important to find the cause of performance degradation in a network system as early as possible and to remedy it quickly. Several methods have been proposed for streamlining the work from identifying a fault to handling it.
[0003]
For example, Patent Document 1 discloses a technique that streamlines troubleshooting by reporting the fault location and a countermeasure derived from detected fault information. In this technique, a failure type, a combination of failures, a prescribed number of occurrences, and an effective occurrence time are registered in advance in an information storage area, and when detected failure information matches these conditions, the corresponding countermeasure is output.
[0004]
Patent Document 2 discloses a technique that, for multiple pieces of fault information, determines the impact priority and the affected fault type of each, and uses these together with configuration information to identify which piece of fault information represents the root-cause "causal fault".
[0005]
[Patent Document 1]
JP-A-9-288594
[0006]
[Patent Document 2]
JP-A-10-303897
[0007]
[Problems to be solved by the invention]
Both of the prior-art techniques above share the problem that they cannot identify the cause of performance degradation from failure information. A failure event, such as a server going down, is a clear-cut phenomenon, so it is relatively easy to identify from the combination of detected failures. A performance degradation symptom, by contrast, is usually detected as an ambiguous combination of operating states, none of which is itself a failure. Detecting performance degradation from combinations of failure information would therefore require the failure thresholds to be set appropriately. Because such a threshold differs from one performance degradation symptom to another, failure information can only go so far.
[0008]
It is also important in network system management to prevent performance degradation before it occurs. However, even a skilled administrator finds it hard to tell what symptom is about to occur in the first place, and which items should be improved to prevent it.
[0009]
SUMMARY OF THE INVENTION
An object of the present invention is to solve the above problems of the prior art by providing a management system and a management method that detect a performance degradation event in a network system before or after it occurs, and identify its cause and a countermeasure.
[0010]
[Means for Solving the Problems]
According to the present invention, this object is achieved by a management system that manages the performance and operating state of a network system, comprising: an operation information storage unit that stores operation information collected from the network system as the monitored system; a performance degradation condition storage unit that defines, as the conditions under which performance degradation occurs, a range of operation-information values for each performance degradation event; a performance degradation factor storage unit that defines the performance degradation factor corresponding to each performance degradation condition stored in the performance degradation condition storage unit; and a performance degradation factor analysis unit that identifies a performance degradation condition by comparing the data in the operation information storage unit with the data in the performance degradation condition storage unit, and identifies the corresponding performance degradation factor from the performance degradation factor storage unit.
[0011]
The object is also achieved by a management system that manages the performance and operating state of a network system, comprising: an operation information storage unit that stores operation information collected from the network system as the monitored system; a configuration information storage unit that defines the components of the monitored system and the relationships among them; a performance degradation condition storage unit that defines, as the conditions under which performance degradation occurs, a range of operation-information values for each performance degradation event; a performance degradation factor storage unit that defines the performance degradation factor corresponding to each performance degradation condition stored in the performance degradation condition storage unit; and performance degradation factor analysis means that selects the target operation information from the operation information storage unit based on the configuration information in the configuration information storage unit, identifies a performance degradation condition by comparing that operation information with the data in the performance degradation condition storage unit, and identifies the corresponding performance degradation factor from the performance degradation factor storage unit.
[0012]
In the above, the performance degradation factor analysis means comprises: a reach rate analysis unit that, for a value not falling within the range of operation-information values defined in the performance degradation condition storage unit, estimates its ratio to that range; an approximation analysis unit that approximates the change over time of the ratio obtained by the reach rate analysis unit with a straight line or a curve; and a performance degradation prediction notification unit that estimates the time at which the ratio reaches 100% by extrapolating the straight line or curve, and notifies the outside when that time is expected to fall within a preset period.
[0013]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments of a management system and a management method for a network system according to the present invention will be described in detail with reference to the drawings.
[0014]
FIG. 1 is a block diagram showing an example configuration of a network system management system according to an embodiment of the present invention. In FIG. 1, 10 is the monitored system, 11 is a component, 100 is the management system, 110 is the operation information collection unit, 120 is the operation management information storage unit, 130 is the performance degradation factor analysis unit, 140 is the input/output control unit, 150 is the input device, 160 is the output device, 300 is the operation know-how information, 310 is the performance degradation conditions, 330 is the performance degradation factors, 350 is the countermeasures, 400 is the operation information, and 500 is the configuration information.
[0015]
The management system 100 according to this embodiment collects operation information from the monitored system 10 and analyzes its performance. The monitored system 10 here is a network system, and each component 11 of that network system is configured so that the operation information collection unit 110 can collect operation information, such as operating status and performance, from it. A component 11 can be a logical unit such as a service as well as hardware or a program, and some components operate in relation to one another.
[0016]
In addition to the operation information collection unit 110, the management system 100 includes an operation management information storage unit 120 that stores operation information and other data, a performance degradation factor analysis unit 130 that analyzes performance degradation events occurring in the monitored system 10, and an input/output control unit 140 that handles screen input/output, report output, and the like. The performance degradation factor analysis unit 130 analyzes the information stored in the operation management information storage unit 120 to identify a performance degradation event, and passes its cause and countermeasure to the input/output control unit 140. The operation management information storage unit 120 stores the operation information 400, the configuration information 500 that defines the system configuration of the monitored system 10, and the operation know-how information 300, which formalizes the system administrator's operation management know-how. The operation know-how information 300 is divided into three categories: performance degradation conditions 310, performance degradation factors 330, and countermeasures 350.
[0017]
FIG. 2 illustrates an example system configuration of the monitored system 10 and the contents of the configuration information 500 that stores that configuration. In FIG. 2, 15 is an internal network (subnet A), 16 is an external network, Usr1 is a probe, FW1 is a firewall, LB1 is a load balancer, PC1 to PC5 are general-purpose computers, Web1 to Web3 are Web servers, AP4 is an AP server, and DB5 is a DB server.
[0018]
The monitored system 10 shown in FIG. 2A is an example of a Web system; its physical configuration is shown schematically. In the monitored system 10, three Web servers Web1 to Web3, running on PC1 to PC3, are load-balanced by the load balancer LB1, and there is also an AP server AP4 running on PC4 and a DB server DB5 running on PC5. PC1 to PC5 and the load balancer LB1 are housed in the internal network 15 (subnet A), and the firewall FW1 is placed between the internal network 15 and an external network 16 such as the Internet. Usr1, connected to the external network 16, is a probe that periodically monitors the responsiveness of the Web site as seen from the user side and accumulates the results as operation information.
[0019]
As shown in FIG. 2B, the configuration information 500 of the monitored system 10 of FIG. 2A defines, for each component 510 of the monitored system 10, the components 530 with which it has a direct dependency relationship and its component type 550. The directly dependent components 530 define the relationships between components, and the component type 550 is the functional classification of the component. Both are used when comparing against the performance degradation conditions 310 of the operation know-how information 300, which is metadata. The detailed structure of the configuration information 500 is only shown in FIG. 2B and is not described further; the components 510 include every component of the network system serving as the monitored system described with reference to FIG. 2A.
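To make the role of the configuration information 500 concrete, the sketch below models each component 510 with its component type 550 and its directly dependent components 530, and resolves a type-level name such as "PC" into concrete components. It is a minimal sketch under assumptions: FIG. 2B is not reproduced in the text, so the dependency edges (LB1 distributing to PC1 to PC3) are a hypothetical reading of FIG. 2A, the Web, AP, and DB servers are omitted, and the class and function names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str                  # component 510, e.g. "PC1"
    kind: str                  # component type 550, e.g. "PC"
    depends_on: set[str] = field(default_factory=set)  # directly related components 530


# Hypothetical encoding of FIG. 2A; edges assumed from the load-balancing description.
CONFIG = {
    c.name: c
    for c in [
        Component("Usr1", "Usr"),
        Component("FW1", "FW"),
        Component("LB1", "LB", {"PC1", "PC2", "PC3"}),
        Component("PC1", "PC"), Component("PC2", "PC"), Component("PC3", "PC"),
        Component("PC4", "PC"), Component("PC5", "PC"),
    ]
}


def components_of_type(config: dict[str, Component], kind: str) -> list[str]:
    """Map a type-level name from the know-how metadata to concrete components."""
    return sorted(name for name, c in config.items() if c.kind == kind)


def connected(config: dict[str, Component], a: str, b: str) -> bool:
    """True if a and b have a direct dependency in either direction."""
    return b in config[a].depends_on or a in config[b].depends_on
```

Keeping types and dependencies in the configuration rather than in the know-how is what lets the same symptom definitions apply to any system.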
[0020]
FIG. 3 illustrates an example of the contents of the operation know-how information 300 when a Web system is the monitored system. In the example of FIG. 3, three symptoms, A360, B370, and C380, are listed as performance degradation events that have occurred in Web systems in the past or are likely to occur.
[0021]
The performance degradation conditions 310 define ranges by quantitatively setting thresholds on the operation information of each component. If the operation information actually collected from the components falls within a combination of these ranges, the symptom corresponding to that combination can be identified.
The performance degradation factors 330 store the factors corresponding to each symptom. Although not shown in the example of FIG. 3, one symptom may correspond to multiple factors. The countermeasures 350 store the countermeasures corresponding to each factor; as in the example of symptom A360, one factor may correspond to multiple countermeasures.
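A minimal data model for the three categories of operation know-how information 300 might look like the sketch below: each symptom holds type-level range conditions 310, one or more factors 330, and for each factor one or more countermeasures 350. The metric names, ranges, and factor/countermeasure text are hypothetical placeholders, since the contents of FIG. 3 are not given in the text, and for brevity the values are keyed by component type rather than resolved to concrete components.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    kind: str          # component type 320, e.g. "LB" (type-level, not "LB1")
    metric: str        # e.g. "requests"
    low: float = float("-inf")
    high: float = float("inf")

    def holds(self, value: float) -> bool:
        return self.low <= value <= self.high


@dataclass
class Symptom:
    name: str
    conditions: list[Condition]
    factors: dict[str, list[str]]   # factor 330 -> countermeasures 350


def symptom_matches(symptom: Symptom, values: dict[tuple[str, str], float]) -> bool:
    """All conditions must hold; values are keyed by (component type, metric)."""
    return all(
        (c.kind, c.metric) in values and c.holds(values[(c.kind, c.metric)])
        for c in symptom.conditions
    )


# Hypothetical example; thresholds and wording are placeholders.
example_symptom = Symptom(
    "example",
    [Condition("LB", "requests", low=100), Condition("Usr", "web_response_s", low=8)],
    {"hypothetical factor": ["hypothetical countermeasure 1", "hypothetical countermeasure 2"]},
)
```

Requiring every condition to hold mirrors the "combination of ranges" wording; a missing metric is treated as a non-match rather than an error.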
[0022]
The component types 320 in the performance degradation conditions 310 of the operation know-how information 300 are not the actual components 11 of the monitored system 10. For example, "FW" in symptom A360 does not refer to a specific firewall in the monitored system 10 but to firewalls in general. The operation know-how information 300 is therefore metadata independent of the system configuration and need not be defined for each individual system.
[0023]
FIG. 4 illustrates an example of the operation information 400 collected from the monitored system 10. The example shows part of the operation information collected every five minutes by the operation information collection unit 110.
[0024]
The operation information 400 consists of each component 410 and its operation information 420. In the illustrated example, the operation information of component Usr1 is the Web response time and the access success rate; that of component LB1 is the number of requests and the line utilization; that of PC1 to PC3 is the CPU utilization; and that of AP4 and DB5 is the number of connections.
[0025]
In the example shown in FIG. 4, at 9:00 on July 1, the Web response time, an item of operation information for component Usr1, is 3 seconds. In principle, all operation information 420 should be collected at each time point, but the timestamps of the operation information actually collected may not coincide. This can be handled by storing information collected within a certain time window as belonging to the same time point. Also, the collected operation information need not be accumulated over a period as shown in FIG. 4: once a complete set of operation information for the latest time point is available, the performance degradation factor analysis can be performed on it and the result stored, after which the operation information used for the analysis may be deleted.
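One way to store samples whose timestamps differ slightly "as the same time point", as described above, is to snap each timestamp to the start of a fixed window. The sketch assumes a 5-minute window matching the FIG. 4 collection interval; the record layout `(timestamp, component, metric, value)` and the rule that a later sample in the same window overwrites an earlier one are assumptions.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)


def snap(ts: datetime, window: timedelta = WINDOW) -> datetime:
    """Round a timestamp down to the start of its collection window."""
    epoch = datetime(1970, 1, 1, tzinfo=ts.tzinfo)
    offset = (ts - epoch) // window      # whole windows since the epoch
    return epoch + offset * window


def align(records: list[tuple[datetime, str, str, float]]
          ) -> dict[datetime, dict[tuple[str, str], float]]:
    """Group raw samples into one row per window, keyed by (component, metric)."""
    table: dict[datetime, dict[tuple[str, str], float]] = {}
    for ts, component, metric, value in records:
        table.setdefault(snap(ts), {})[(component, metric)] = value
    return table
```

A sample taken at 9:00:12 and one taken at 9:01:40 both land in the 9:00 row, while a sample at 9:06 starts the 9:05 row.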
[0026]
FIG. 5 illustrates the operation information collection sequence 620 performed by the operation information collection unit 110, and the analysis sequence 640, in which analysis of performance degradation factors is triggered by the user managing the system specifying analysis conditions.
[0027]
In the operation information collection sequence 620, the operation information collection unit 110 periodically collects operation information from the monitored system 10 and stores it in the operation information 400 of the operation management information storage unit 120. The operation information collection sequence 620 basically runs independently of the other processing sequences. In the analysis sequence 640, the user, who is a system administrator, first selects the time range and the system to be analyzed using the input device 150, such as a terminal. The specified conditions are passed to the performance degradation factor analysis unit 130 via the input/output control unit 140. Following those conditions, the performance degradation factor analysis unit 130 acquires the configuration information 500, the operation information 400, and the operation know-how information 300 from the operation management information storage unit 120. It then performs the collation 610, comparing the operation information with the operation know-how information based on the configuration information. If any data in the target time range matches a performance degradation condition 310 of the operation know-how information 300, the performance degradation factor analysis unit 130 outputs the corresponding factor 330 and countermeasure 350 as the result to the output device 160 via the input/output control unit 140.
[0028]
FIG. 6 illustrates the automatic monitoring sequence 660 used when the performance degradation analysis is performed automatically.
[0029]
The sequence illustrated in FIG. 6 presupposes that the operation information collection sequence 620 described with reference to FIG. 5 is running. The input/output control unit 140 periodically instructs the performance degradation factor analysis unit 130 to execute the analysis. As in the analysis sequence 640 of FIG. 5, the performance degradation factor analysis unit 130 acquires each piece of information from the operation management information storage unit 120, performs the collation 610 against the symptoms in the same way, and outputs the result to the input/output control unit 140. If the collation result includes any symptom, the input/output control unit 140 issues a warning via the output device 160, such as sending an e-mail to the system administrator or displaying an alarm on a terminal. Even when there is no particular abnormality, the operating status can be reported periodically to the user acting as administrator.
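The automatic monitoring sequence can be sketched as a loop that runs the analysis on a fixed period and routes each outcome either to a warning channel or to a routine report. This sketch assumes the analysis and notification steps are passed in as callables; `analyze`, `warn`, and `report` are hypothetical names, and a real deployment would more likely use a scheduler plus the e-mail or terminal alarms described above.

```python
import time
from typing import Callable


def monitor(analyze: Callable[[], list[str]], warn: Callable[[list[str]], None],
            report: Callable[[], None], period_s: float, cycles: int) -> None:
    """Run the analysis every period_s seconds for a fixed number of cycles."""
    for i in range(cycles):
        symptoms = analyze()          # collation 610 against the know-how information
        if symptoms:
            warn(symptoms)            # e.g. e-mail or on-screen alarm
        else:
            report()                  # optional routine status report
        if i < cycles - 1:
            time.sleep(period_s)
```

Injecting the callables keeps the loop independent of how collation and notification are implemented, which also makes it easy to test without a live system.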
[0030]
In FIGS. 5 and 6 above, the results of the performance degradation factor analysis were described as being reported to the user acting as administrator. However, the present invention also allows users who receive the service of the network system 10 via the external network 16 to access the input/output control unit 140 of the management system and receive the results of the performance degradation analysis.
[0031]
FIG. 7 is a flowchart of the collation 610 against the symptoms performed by the performance degradation factor analysis unit 130 within the sequences described with reference to FIGS. 5 and 6. The collation is performed for every time point in the specified time range. For the operation information at each time point, every symptom is compared with its performance degradation conditions to check whether it matches. For each time point that matches a symptom, the symptom and the time point are recorded. After all symptoms and time points have been compared, the matching time points and symptoms, together with the corresponding factors and countermeasures, are output as the result.
[0032]
(1) When this process is started, first, the corresponding configuration information is specified, and the corresponding operation information is compared with the performance deterioration condition of the symptom (steps 700 to 702).
[0033]
(2) As a result of the comparison in step 702, it is determined whether or not the condition is satisfied. If the condition is satisfied, the number of the relevant symptoms is counted, and the process returns to step 701 to repeat the process (step 703). , 707).
[0034]
(3) If the conditions are not satisfied in the determination in step 703, it is determined whether the comparison for all the symptoms has been completed or not, and whether the comparison for all the target time ranges has been completed. If not, the process returns to step 701 and repeats the process (steps 704 and 705).
[0035]
(4) If steps 704 and 705 determine that the comparison has been completed for all symptoms and for the entire target time range, the factors and countermeasures corresponding to the matched symptoms are output, and the process ends (step 706).
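The collation flow of FIG. 7 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Symptom` record, the snapshot dictionary keyed by hypothetical `component.metric` names, and the example thresholds are all invented for the sketch. It follows the flow described above: an outer loop over times, an inner loop over symptoms, a record of each match, and finally the factor and countermeasure of every symptom that matched at least once.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical layout: one snapshot of operation information per sampling
# time, keyed by "component.metric" names.
Snapshot = Dict[str, float]


@dataclass
class Symptom:
    name: str
    condition: Callable[[Snapshot], bool]  # performance deterioration condition
    factor: str                            # text of factor 330
    countermeasure: str                    # text of countermeasure 350


def match_symptoms(
    history: Dict[str, Snapshot], symptoms: List[Symptom]
) -> Tuple[Dict[str, List[str]], List[Tuple[str, str, str]]]:
    """Compare every symptom with the operation information at every time."""
    matched: Dict[str, List[str]] = {s.name: [] for s in symptoms}
    for time in sorted(history):                    # every time in the range
        for symptom in symptoms:                    # every symptom
            if symptom.condition(history[time]):
                matched[symptom.name].append(time)  # record symptom and time
    # After all comparisons, report factor/countermeasure of matched symptoms.
    report = [(s.name, s.factor, s.countermeasure)
              for s in symptoms if matched[s.name]]
    return matched, report


# Usage with made-up values; the thresholds are illustrative only.
history = {
    "07-02 14:55": {"LB1.requests": 80, "Usr1.response_s": 3},
    "07-02 15:00": {"LB1.requests": 120, "Usr1.response_s": 10},
}
symptom_b = Symptom(
    "B",
    lambda s: s["LB1.requests"] >= 100 and s["Usr1.response_s"] >= 8,
    "<factor 330 for B>",
    "<countermeasure 350 for B>",
)
matched, report = match_symptoms(history, [symptom_b])
```

Here only the 15:00 snapshot satisfies the hypothetical condition, so symptom B is reported once with its factor and countermeasure.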
[0036]
As a preprocessing step before the performance degradation factor analysis described above, the performance degradation factor analysis unit 130 can perform a correlation analysis that statistically quantifies the correlation between the items of operation information to be analyzed and excludes items with low correlation from the analysis target. This speeds up the performance degradation factor analysis.
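The correlation pre-filter can be illustrated with a small sketch. The text does not say which correlation measure is used or what each metric is correlated against; purely as an assumption, this sketch uses the Pearson coefficient of each metric against a chosen reference metric (for example the Web response time measured by the probe), with a hypothetical cut-off `min_abs_r`.

```python
from math import sqrt
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series cannot be correlated with anything
    return cov / sqrt(vx * vy)


def filter_by_correlation(
    series: Dict[str, List[float]], reference: str, min_abs_r: float = 0.5
) -> List[str]:
    """Keep metrics whose |r| against the reference metric is at least min_abs_r."""
    ref = series[reference]
    return [name for name, values in series.items()
            if name != reference and abs(pearson(values, ref)) >= min_abs_r]


# Made-up series sampled at the same five times.
series = {
    "Usr1.response_s": [2, 4, 6, 8, 10],
    "LB1.requests": [60, 80, 100, 110, 120],
    "DB5.disk_pct": [30, 10, 30, 10, 30],
}
kept = filter_by_correlation(series, "Usr1.response_s")
```

In this made-up data the request count tracks the response time closely and is kept, while the oscillating disk metric has zero correlation and is dropped from the analysis.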
[0037]
As a detailed example of the collation processing described above, the method of comparing the operation information 400 shown in FIG. 4 at 15:00 on July 2 is described here, using the operation know-how information 300 shown in FIG. 3 and the configuration information 500 shown in FIG. 2B. The example collates the operation information with symptom B.
[0038]
First, the component types LB, PC, and Usr defined in the component type 320 of symptom B in the operation know-how information 300 are compared with the configuration information 500, and the components of those types are extracted. This yields LB1, PC1, PC2, PC3, PC4, PC5, and Usr1. However, because the performance deterioration condition of symptom B requires the PCs to be connected to the LB, only PC1, PC2, and PC3, which appear among the directly dependent components 530, remain as PCs. Next, the operation information of the extracted components is compared with the performance deterioration condition 310.
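The two-stage component selection for symptom B can be sketched as follows. The dictionary layout of the configuration information, the `direct` field standing in for the directly dependent components 530, and every connection other than LB1 to PC1 through PC3 (which the text implies) are assumptions made for illustration.

```python
from typing import Dict, List, Set

# Hypothetical excerpt of configuration information 500:
# component -> component type and its directly dependent components (530).
CONFIG: Dict[str, dict] = {
    "Usr1": {"type": "Usr", "direct": ["FW1"]},
    "FW1": {"type": "FW", "direct": ["LB1"]},
    "LB1": {"type": "LB", "direct": ["PC1", "PC2", "PC3"]},
    "PC1": {"type": "PC", "direct": []},
    "PC2": {"type": "PC", "direct": []},
    "PC3": {"type": "PC", "direct": []},
    "PC4": {"type": "PC", "direct": []},
    "PC5": {"type": "PC", "direct": []},
}


def extract_by_type(config: Dict[str, dict], types: Set[str]) -> List[str]:
    """Components whose type appears in component type 320 of the symptom."""
    return [name for name, c in config.items() if c["type"] in types]


def keep_connected(config: Dict[str, dict], names: List[str],
                   parent_type: str, child_type: str) -> List[str]:
    """Drop child_type components not directly dependent on a parent_type one."""
    allowed = {dep for n in names if config[n]["type"] == parent_type
               for dep in config[n]["direct"]}
    return [n for n in names
            if config[n]["type"] != child_type or n in allowed]


candidates = extract_by_type(CONFIG, {"LB", "PC", "Usr"})
components = keep_connected(CONFIG, candidates, "LB", "PC")
```

The first call reproduces the seven candidates listed above; the second keeps only the PCs that LB1 lists as directly dependent.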
[0039]
In this case, the collected operation information 400 shows 120 requests and a line usage rate of 40% for LB1, which satisfy the performance deterioration condition 310. The maximum deviation of the CPU usage rates of PC1, PC2, and PC3 is 33.3, and the Web response time of Usr1 is 10 seconds, which also satisfy the performance deterioration condition 310. It can therefore be concluded that symptom B occurred in the monitoring target system 10 at 15:00 on July 2, that is, that the operation information matched symptom B.
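The check above can be written as a comparison of each observed value with a lower bound. The observed values come from the text, but the bounds of condition 310 are not given: the request and response-time bounds are inferred from the arrival-rate example of FIG. 9 (75 requests giving 75%, 2 seconds giving 25%), and the line-usage and CPU-deviation bounds are hypothetical placeholders.

```python
from typing import Dict, List, Tuple

# Operation information at 15:00 on July 2, as given in the text.
observed: Dict[str, float] = {
    "LB1.requests": 120,
    "LB1.line_usage_pct": 40,
    "PC1-PC3.cpu_max_deviation": 33.3,
    "Usr1.web_response_s": 10,
}

# Hypothetical lower bounds for symptom B's condition 310. The request and
# response-time bounds are inferred; the other two are placeholders.
symptom_b_bounds: Dict[str, float] = {
    "LB1.requests": 100,
    "LB1.line_usage_pct": 30,
    "PC1-PC3.cpu_max_deviation": 30,
    "Usr1.web_response_s": 8,
}


def check_condition(obs: Dict[str, float],
                    bounds: Dict[str, float]) -> Tuple[bool, List[str]]:
    """Return (all satisfied?, metrics that fell short of their bound)."""
    unmet = [m for m, lower in bounds.items() if obs[m] < lower]
    return not unmet, unmet


matched, unmet = check_condition(observed, symptom_b_bounds)
```

Returning the list of unmet metrics, rather than a bare boolean, makes it easy to explain why a symptom did not match.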
[0040]
FIG. 8 is a diagram illustrating an example of an analysis result displayed on the output device 160 through a GUI or the like as a result of the analysis sequence 640 performed by the performance deterioration factor analysis unit 130. The contents of this output are described below.
[0041]
The symptom transition 800 in FIG. 8 displays, for each symptom, how the matching ratio changes over time. In the operation information 400 of FIG. 4, operation information is collected every 5 minutes, so when the display is aggregated per hour, the hatched area reaches its maximum of 100% when the symptom matches all 12 times in that hour. If a symptom matches within the displayed time range, the factor and countermeasure 840 for that symptom are displayed. The factor and countermeasure 840 are obtained by looking up the factor 330 and countermeasure 350 corresponding to the symptom in the operation know-how information 300, and the names of the components identified during matching can be embedded in the displayed text. In the example of the factor and countermeasure 840 in FIG. 8, symptom B matches over the three days from July 2 to July 4, so the factor and countermeasure corresponding to symptom B are extracted from the factor 330 and countermeasure 350 and displayed. Likewise, symptom C matches over the two days from July 3 to July 4, so the factor and countermeasure corresponding to symptom C are extracted from the factor 330 and countermeasure 350 and displayed.
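The hourly matching ratio in the symptom transition 800 is the number of matching samples in an hour divided by the 12 samples that 5-minute collection produces. A minimal sketch, assuming a hypothetical "MM-DD HH:MM" time format for the recorded match times:

```python
from collections import defaultdict
from typing import Dict, List

SAMPLES_PER_HOUR = 60 // 5  # operation information is collected every 5 minutes


def hourly_match_ratio(match_times: List[str]) -> Dict[str, float]:
    """Percentage of the 5-minute samples in each hour at which a symptom matched.

    Times are assumed to be "MM-DD HH:MM" strings; the first 8 characters
    ("MM-DD HH") identify the hour.
    """
    counts: Dict[str, int] = defaultdict(int)
    for t in match_times:
        counts[t[:8]] += 1
    return {hour: 100.0 * n / SAMPLES_PER_HOUR for hour, n in sorted(counts.items())}


# Made-up matches: all 12 samples of 15:00-15:55, then 3 samples after 16:00.
times = [f"07-02 15:{m:02d}" for m in range(0, 60, 5)]
times += ["07-02 16:00", "07-02 16:05", "07-02 16:10"]
ratios = hourly_match_ratio(times)
```

With this data the 15:00 hour is fully hatched (100%) and the 16:00 hour is a quarter hatched (25%).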
[0042]
FIG. 9 is a conceptual diagram illustrating an example in which symptom B 370 is predicted one day before it occurs, that is, one day before its arrival rate reaches 100%. The method of predicting a symptom in advance, before it occurs, is described below.
[0043]
In FIG. 9, the vertical axis 850 is the arrival rate of symptom B 370, which can be expressed as the average of the ratios of the observed values to the threshold values defined in the performance deterioration condition 310. For example, if LB1 receives 75 requests, that ratio is 75%, and if the Web response time of Usr1 is 2 seconds, that ratio is 25%. If these two are the performance deterioration conditions, the arrival rate of the symptom is their average, 50%. In other words, the symptom occurs when the arrival rate reaches or exceeds 100%.
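The arrival-rate calculation in this example can be written directly. The thresholds of 100 requests and 8 seconds are inferred from the stated percentages, assuming each ratio is the observed value divided by its threshold; the source does not list them explicitly, and the sketch assumes every condition is of the "at or above threshold" kind.

```python
from typing import Dict


def arrival_rate(observed: Dict[str, float], thresholds: Dict[str, float]) -> float:
    """Average, in percent, of each observed value divided by its threshold.

    Assumes every condition is of the "value at or above threshold" kind.
    """
    ratios = [100.0 * observed[m] / thresholds[m] for m in thresholds]
    return sum(ratios) / len(ratios)


# Thresholds inferred from the example (75 requests -> 75%, 2 s -> 25%).
thresholds = {"LB1.requests": 100, "Usr1.web_response_s": 8}
rate = arrival_rate({"LB1.requests": 75, "Usr1.web_response_s": 2}, thresholds)
```

This reproduces the 50% of the example; at exactly the thresholds the function returns 100%.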
The horizontal axis 855 is time, so FIG. 9 as a whole shows the time change 860 of the arrival rate of symptom B. The time change 860 can be approximated by a straight line (or curve) 865 using regression analysis or the like. For example, to issue an alarm one day before the symptom occurs, the approximate line 865 is extrapolated from the current time 875 into the future, and it is determined whether the time at which it reaches 100% falls within one day of the current time 875. In the example of FIG. 9, the arrival rate reaches 100% exactly one day later, so an automatic notification such as an alarm can be issued.
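The regression-and-extrapolation test can be sketched with an ordinary least-squares straight line. The text only says "regression analysis or the like", so the choice of a linear least-squares fit, the day-based time unit, and the sample values are assumptions.

```python
from typing import List, Optional, Tuple


def fit_line(ts: List[float], rates: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of rate = a + b * t; returns (a, b)."""
    n = len(ts)
    mt, mr = sum(ts) / n, sum(rates) / n
    sxx = sum((t - mt) ** 2 for t in ts)
    sxy = sum((t - mt) * (r - mr) for t, r in zip(ts, rates))
    b = sxy / sxx
    return mr - b * mt, b


def time_to_reach(ts: List[float], rates: List[float],
                  target: float = 100.0) -> Optional[float]:
    """Extrapolate the fitted line to the time at which it reaches target."""
    a, b = fit_line(ts, rates)
    if b <= 0:
        return None  # flat or falling trend: never reaches the target
    return (target - a) / b


def should_alarm(ts: List[float], rates: List[float],
                 now: float, horizon: float) -> bool:
    """True if the arrival rate is predicted to reach 100% by now + horizon."""
    t_hit = time_to_reach(ts, rates)
    return t_hit is not None and t_hit <= now + horizon


# Made-up daily arrival rates (%) for days 0..3; "now" is day 3.
days = [0.0, 1.0, 2.0, 3.0]
rates = [80.0, 85.0, 90.0, 95.0]
alarm = should_alarm(days, rates, now=3.0, horizon=1.0)
```

As in FIG. 9, the fitted line reaches 100% exactly one day after "now", so the one-day alarm fires.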
[0044]
FIG. 10 is a flowchart for explaining the processing operation of the above-described advance prediction, which will be described next. This processing operation is performed by the performance deterioration factor analysis unit 130.
[0045]
If the arrival rate were computed for every individual time, the values would fluctuate widely. In the processing shown in FIG. 10, the target time range is therefore divided into several time widths, and the arrival rate averaged over each time width is obtained. Once arrival rates have been obtained for the entire time range, an approximate straight line is fitted, and extrapolation is used to determine whether the threshold will be reached within the set time. When any symptom is predicted to be reached, the user is notified.
[0046]
(1) When this processing starts, the target time range is first divided into appropriate time widths, the relevant configuration information is identified, the corresponding operation information is compared with the performance deterioration condition of the symptom, and the arrival rate of the symptom is calculated from the comparison result (steps 720 to 724).
[0047]
(2) It is determined whether the comparison between the operation information and the performance deterioration conditions has been completed for all target times within each time width, for all time widths, and for all symptoms. If not, the process returns to step 722 and repeats (steps 725 to 727).
[0048]
(3) If steps 725 to 727 determine that all comparisons have been completed, the relationship between time and arrival rate is approximated by a straight line, and it is determined whether the threshold will be reached within the set time; in the example described with reference to FIG. 9, this means determining whether the arrival rate will reach 100% within one day (steps 728 and 729).
[0049]
(4) If step 729 determines that the threshold will be reached within the set time, the user acting as system administrator is notified, and the process returns to step 728 and repeats (step 731).
[0050]
(5) If step 729 determines that the threshold will not be reached within the set time, it is determined whether the approximation has been completed for all relevant symptoms. If not, the process returns to step 728 and repeats; if so, the process ends (steps 730 and 740).
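The prediction flow of FIG. 10 can be summarized in a compact sketch: average the arrival rate over each time width, fit a line per symptom, extrapolate, and notify. The half-day sampling, one-day width, `notify` callback, and sample values are hypothetical, and the comparison of operation information with the conditions (steps 722 to 724) is assumed to have already produced the arrival-rate samples. Unlike the flowchart, the sketch simply continues with the next symptom after notifying.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[float, float]  # (time in days, arrival rate in %)


def average_per_width(samples: List[Sample], width: float) -> List[Sample]:
    """Average time and arrival rate over consecutive time widths."""
    groups: Dict[int, List[Sample]] = {}
    for t, r in samples:
        groups.setdefault(int(t // width), []).append((t, r))
    return [
        (sum(t for t, _ in g) / len(g), sum(r for _, r in g) / len(g))
        for _, g in sorted(groups.items())
    ]


def fit_line(points: List[Sample]) -> Tuple[float, float]:
    """Least-squares line rate = a + b * t through the averaged points."""
    n = len(points)
    mt = sum(t for t, _ in points) / n
    mr = sum(r for _, r in points) / n
    sxx = sum((t - mt) ** 2 for t, _ in points)
    sxy = sum((t - mt) * (r - mr) for t, r in points)
    b = sxy / sxx
    return mr - b * mt, b


def predict_symptoms(rates_by_symptom: Dict[str, List[Sample]], width: float,
                     now: float, horizon: float,
                     notify: Callable[[str], None]) -> List[str]:
    """For each symptom, warn if 100% is predicted within now + horizon."""
    warned = []
    for name, samples in rates_by_symptom.items():
        points = average_per_width(samples, width)
        if len(points) < 2:
            continue  # too few time widths to fit a line
        a, b = fit_line(points)
        if b > 0 and (100.0 - a) / b <= now + horizon:
            notify(f"symptom {name}: arrival rate expected to reach 100%")
            warned.append(name)
    return warned


# Made-up half-daily arrival rates; symptom C stays flat.
rates_by_symptom = {
    "B": [(0.0, 78), (0.5, 82), (1.0, 83), (1.5, 87),
          (2.0, 88), (2.5, 92), (3.0, 93), (3.5, 97)],
    "C": [(0.0, 40), (1.0, 40), (2.0, 40), (3.0, 40)],
}
messages: List[str] = []
warned = predict_symptoms(rates_by_symptom, width=1.0, now=3.5,
                          horizon=1.0, notify=messages.append)
```

Averaging per width smooths the half-daily jitter in symptom B before fitting; the flat symptom C never triggers a notification.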
[0051]
Each process in the embodiment of the present invention described above can be implemented as a processing program, and the program can be provided stored on a recording medium such as an HD, DAT, FD, MO, DVD-ROM, or CD-ROM.
[0052]
【Effects of the Invention】
As described above, according to the present invention, a performance deterioration event and its factor and countermeasure are identified from know-how information on performance deterioration events, system configuration definition information, and operation information. In addition, by estimating how performance changes over time, performance deterioration can be predicted. The invention thus supports performance analysis of network systems.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration example of a management system for a network system according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a system configuration example of a monitoring target system and contents of configuration information storing the configuration.
FIG. 3 is a diagram illustrating an example of the contents of operation know-how information for a monitoring target system that is a Web system.
FIG. 4 is a diagram illustrating an example of operation information collected from the monitoring target system.
FIG. 5 is a diagram illustrating the operation information collection sequence performed by the operation information collection unit, and the analysis sequence in which performance deterioration factors are analyzed when a user who manages the system specifies analysis conditions.
FIG. 6 is a diagram illustrating the automatic monitoring sequence used when performance degradation analysis is performed automatically.
FIG. 7 is a flowchart illustrating the processing operation of comparison with symptoms in the performance deterioration factor analysis unit, included in the sequences described with reference to FIGS. 5 and 6.
FIG. 8 is a diagram illustrating an example of an analysis result displayed on an output device via a GUI or the like as a result of an analysis sequence in the performance deterioration factor analysis unit.
FIG. 9 is a conceptual diagram illustrating an example in which symptom B is predicted one day before it occurs, that is, one day before its arrival rate reaches 100%.
FIG. 10 is a flowchart illustrating a processing operation of the above-described advance prediction.
[Explanation of symbols]
10 Monitoring target system
11 Component
15 Internal network (subnet A)
16 External network
100 Management system
110 Operation information collection unit
120 Operation management information storage unit
130 Performance deterioration factor analysis unit
140 Input/output control unit
150 Input device
160 Output device
300 Operation know-how information
310 Condition
330 Factor
350 Countermeasure
400 Operation information
500 Configuration information
Usr1 Probe
FW1 Firewall
LB1 Load balancer
PC1 to PC5 General-purpose computers
Web1 to Web3 Web servers
AP4 AP server
DB5 DB server

Claims

A management system that manages the performance and operating state of a network system, comprising: an operation information storage unit that stores operation information collected from the network system as a monitoring target system; a performance deterioration condition storage unit that defines, for each performance deterioration event, a range of values of the operation information as a condition under which performance deterioration occurs; a performance deterioration factor storage unit that defines a performance deterioration factor corresponding to each performance deterioration condition stored in the performance deterioration condition storage unit; and performance deterioration factor analysis means that identifies a performance deterioration condition by comparing data in the operation information storage unit with data in the performance deterioration condition storage unit, and identifies the corresponding performance deterioration factor from the performance deterioration factor storage unit.

A management system that manages the performance and operating state of a network system, comprising: an operation information storage unit that stores operation information collected from the network system as a monitoring target system; a configuration information storage unit that defines the configuration of the monitoring target system and the relationships between its components; a performance deterioration condition storage unit that defines, for each performance deterioration event, a range of values of the operation information as a condition under which performance deterioration occurs; a performance deterioration factor storage unit that defines a performance deterioration factor corresponding to each performance deterioration condition stored in the performance deterioration condition storage unit; and performance deterioration factor analysis means that identifies the target operation information in the operation information storage unit based on the configuration information in the configuration information storage unit, identifies a performance deterioration condition by comparing that operation information with data in the performance deterioration condition storage unit, and identifies the corresponding performance deterioration factor from the performance deterioration factor storage unit.

The management system according to claim 1 or 2, further comprising a countermeasure storage unit that defines countermeasures corresponding to the performance deterioration factors, wherein the performance deterioration factor analysis means identifies the countermeasure corresponding to the identified performance deterioration factor.

The management system according to claim 1, 2, or 3, further comprising automatic notification means that causes the performance deterioration factor analysis means to perform performance deterioration analysis periodically and, when a performance deterioration factor is identified, automatically issues a notification to the outside.

The management system according to claim 1, 2, or 3, wherein the performance deterioration factor analysis means comprises correlation analysis means that statistically quantifies the correlation between items of operation information to be analyzed and excludes operation information with low correlation from the analysis target.

The management system according to claim 1, 2, or 3, wherein the performance deterioration factor analysis means comprises: an arrival rate analysis unit that, for a value not falling within the range of values of the operation information defined in the performance deterioration condition storage unit, estimates the ratio of that value to the value range; an approximation analysis unit that approximates the change over time of the ratio obtained by the arrival rate analysis unit with a straight line or a curve; and performance deterioration prediction notification means that estimates the time at which the ratio will reach 100% by extrapolating the straight line or curve and, when the ratio is expected to reach 100% within a preset time, issues a notification to the outside.

A management method for managing the performance and operating state of a network system, comprising: collecting operation information from the network system as a monitoring target system; identifying a performance deterioration condition by comparing the operation information with performance deterioration conditions, each of which defines, for a performance deterioration event, a range of values of the operation information as a condition under which performance deterioration occurs; and performing performance deterioration analysis that identifies the performance deterioration factor corresponding to each identified performance deterioration condition.

A management method for managing the performance and operating state of a network system, comprising: collecting operation information from the network system as a monitoring target system; identifying the target operation information based on configuration information that defines the configuration of the monitoring target system and the relationships between its components; identifying a performance deterioration condition by comparing the identified operation information with performance deterioration conditions, each of which defines, for a performance deterioration event, a range of values of the operation information as a condition under which performance deterioration occurs; and performing performance deterioration analysis that identifies the performance deterioration factor corresponding to each identified performance deterioration condition.

The management method according to claim 7 or 8, wherein a countermeasure corresponding to the identified performance deterioration factor is identified based on countermeasures defined in correspondence with performance deterioration factors.

The management method according to claim 7, 8, or 9, wherein the performance deterioration analysis is performed periodically, and when a performance deterioration factor is identified, a notification is automatically issued to the outside.

The management method according to claim 7, 8, or 9, wherein the performance deterioration factor analysis includes a correlation analysis that statistically quantifies the correlation between items of operation information to be analyzed and excludes operation information with low correlation from the analysis target.

The management method according to claim 7, 8, or 9, wherein the performance deterioration factor analysis includes: an arrival rate analysis that, for a value not falling within the range of values of the operation information defined in the performance deterioration condition storage unit, estimates the ratio of that value to the value range; an approximation analysis that approximates the change over time of the ratio obtained by the arrival rate analysis with a straight line or a curve; and a performance deterioration prediction notification that estimates the time at which the ratio will reach 100% by extrapolating the straight line or curve and, when the ratio is expected to reach 100% within a preset time, issues a notification to the outside.

A management program for managing the performance and operating state of a network system, comprising: a step of collecting operation information from the network system as a monitoring target system; a step of identifying a performance deterioration condition by comparing the operation information with performance deterioration conditions, each of which defines, for a performance deterioration event, a range of values of the operation information as a condition under which performance deterioration occurs; and a step of identifying the performance deterioration factor corresponding to each identified performance deterioration condition.