JP5735326B2

JP5735326B2 - IT failure detection / retrieval device and program

Info

Publication number: JP5735326B2
Application number: JP2011076392A
Authority: JP
Inventors: 小林 宏至
Original assignee: Hitachi Solutions Ltd
Current assignee: Hitachi Solutions Ltd
Priority date: 2011-03-30
Filing date: 2011-03-30
Publication date: 2015-06-17
Anticipated expiration: 2031-03-30
Also published as: JP2012212228A

Description

The present invention relates to a device that detects IT failures and searches for similar past failures based on failure events generated by a monitoring server, and to a program that implements the device in software.

IT failures that seriously affect corporate management and society have occurred frequently in recent years. Their causes vary and include hardware failures and program bugs. Companies that provide social infrastructure in particular are expected, from the viewpoint of business continuity, to complete recovery within the target recovery time even when an IT failure occurs, and to minimize the impact on management and society.

To keep the damage from an IT failure from spreading, a quick and appropriate initial response matters most. There is therefore a need for technology that helps recovery staff respond appropriately, for example technology for searching for the countermeasures that were taken when a similar IT failure occurred in the past.

Conventional support technology manages failure information and the corresponding countermeasures in a database so that countermeasures suited to an IT failure can be searched for. Patent Document 1 describes a search technique that uses failure information and countermeasures as keys. Patent Document 2 describes a technique for updating failure information with the most recent IT failure.

Patent Document 1: Japanese Patent No. 3867868
Patent Document 2: Japanese Patent No. 3983138

Fisher, Douglas H., “Knowledge acquisition via incremental clustering”, Machine Learning 2, 139-172, 1987
Shohei Hido, Tsuyoshi Ide, Hisashi Kashima, Harunobu Kubo and Hirofumi Matsuzawa, “Unsupervised Change Analysis using Supervised Learning”, In Proc. 2008 Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD2008), Osaka, Japan, May, 2008

During operation of an IT system, changes that alter the attribute values of monitoring events sometimes occur. For example, when software is updated to a new version, the value of the attribute “process name” changes. Likewise, when a server machine is replaced, the value of the attribute “server name” changes.

In most cases, a change of this kind does not alter the cause of an IT failure itself. Even though the change is superficial, however, once the attribute values of monitoring events change, the continuity of the failure information is lost and past countermeasures can no longer be found by a search.

The present inventor therefore provides a mechanism that makes it possible to search for past IT failures similar to a newly acquired IT failure even when attribute values have changed. Specifically, monitoring events generated by a monitoring server that monitors for IT failures are acquired sequentially, and an IT failure event block consisting of one or more events that arose from a single cause is created. Next, feature information is derived from the attribute values that appear frequently in the events belonging to the IT failure event block. In addition, the content and time of occurrence of each change in the feature information of event blocks are recorded in a change table. The IT failure database is then searched on the basis of the feature information to find past IT failure event blocks similar to the newly acquired IT failure event block. In this search, the change table is consulted and the feature information used in the search processing is corrected according to the time range being searched.

According to the present invention, similar IT failure events that occurred in the past can be found even when the attribute values of monitoring events have changed for some reason.
Problems, configurations, and effects other than those described above will become clear from the following description of the embodiments.

A diagram showing a system configuration example of the IT failure detection/search system.
A diagram showing examples of monitoring events generated by the monitoring server.
A diagram showing a system configuration example of the IT failure detection/search computer.
A diagram showing specific examples of the IT failure event block table and the IT failure feature table held in the IT failure DB.
A diagram showing specific examples of the IT failure classification tree and the change table held in the IT failure DB.
A diagram showing an example of the setting screen of the IT failure detection/search program.
A diagram showing an example of the IT failure detection screen of the IT failure detection/search program.
A flowchart outlining the IT failure detection/search processing.
A flowchart explaining the increment processing of the IT failure classification tree.
A flowchart explaining the decrement processing of the IT failure classification tree.
A diagram explaining the increment processing of the IT failure classification tree.
A diagram explaining the decrement processing of the IT failure classification tree.
A diagram explaining the change detection and analysis processing.
A diagram explaining the change analysis processing.

Embodiments of the present invention are described below with reference to the drawings. The embodiments of the present invention are not limited to the examples described below, and various modifications are possible within the scope of its technical idea.

(System configuration of the IT failure detection/search system)
FIG. 1 shows a configuration example of an IT failure detection/search system that includes the IT failure detection/search computer 103. The system shown in FIG. 1 has a monitored server group 101; a monitoring server 102 that monitors the state of those computers and generates monitoring events based on what it observes; an IT failure detection/search computer 103 that analyzes the monitoring events generated by the monitoring server 102, detects IT failures, and searches for similar past IT failures; and an IT failure DB 104 that stores the information in the monitoring events that relates to IT failures.

Of these, the monitoring server 102 provides a function of monitoring the state of the monitored server group 101 (for example, whether a process running on a monitored server is alive) and generating monitoring events according to that state. The generated monitoring events are sent to the IT failure detection/search computer 103.

FIG. 2 shows examples of monitoring events generated by the monitoring server 102. A monitoring event consists of an event ID 201 that uniquely identifies the event, an occurrence date and time 202 indicating when the monitoring server 102 generated the monitoring event, and attributes 203 of the monitoring event.

The attributes of a monitoring event consist of a type 204 that indicates the severity of the monitoring event, such as “information” or “warning”; a source 205 that indicates the name of the process the event concerns; an event number 206 that uniquely identifies the state of the source 205; a user 207 that indicates the user who started the source 205; and a computer 208 that indicates the computer in the monitored server group 101 on which the source 205 is running.
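The monitoring event layout described above can be sketched as a small record type. This is only an illustration: the class name, the field names, and every value in the example instance are assumptions made for this sketch, since the contents of FIG. 2 are not reproduced in the text.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MonitoringEvent:
    """One monitoring event as described for FIG. 2 (field names are illustrative)."""
    event_id: int          # event ID 201
    occurred_at: datetime  # occurrence date and time 202
    event_type: str        # type 204, e.g. "information", "warning", "error"
    source: str            # source 205: name of the process the event concerns
    event_number: int      # event number 206: identifies the state of the source
    user: str              # user 207: who started the source
    computer: str          # computer 208: where the source is running


# Hypothetical values, not taken from FIG. 2.
example = MonitoringEvent(
    event_id=4,
    occurred_at=datetime(2010, 2, 3, 10, 0, 0),
    event_type="error",
    source="app.exe",
    event_number=1001,
    user="SYSTEM",
    computer="Server10",
)
```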

FIG. 3 shows a configuration example of the IT failure detection/search computer 103. The IT failure detection/search computer 103 consists of a computer main body 300, an input device 330, a display device 331, and a communication device 332. The communication device 332 communicates with the monitoring server 102 and the IT failure DB 104.

The computer main body 300 consists of a CPU 301 that performs data operations, a ROM 302, a RAM 310, a hard disk drive 320 that stores data, a CPU bus 307 that transfers data among these devices, and interfaces 303 to 306 that connect these devices to the CPU bus 307.

The RAM 310 provides at least (1) an execution area for the IT failure detection/search program 311 that the CPU 301 executes and (2) a work area 312 that stores data generated temporarily during computation. The storage area of the hard disk drive 320 provides at least (1) a program storage unit 321 that stores the IT failure detection/search program and (2) a data storage unit 322 that temporarily stores data acquired from the monitoring server 102 and the IT failure DB 104.

(Data structure of the database)
FIG. 4-1 shows a structure example of the IT failure event block table 400 and the IT failure feature table 410 held in the IT failure DB 104. FIG. 4-2 shows a structure example of the IT failure classification tree 420 and the change table 430 held in the IT failure DB 104.

The IT failure event block table 400 consists of an event block ID 401 that uniquely identifies an event block and the one or more IT failure events 402 contained in the event block.

Here, an event block is the set of one or more IT failure events that occur within a fixed time in association with a single IT failure. How event blocks are created is explained in the description of the IT failure detection/search process. An IT failure event is a monitoring event whose attribute “type” 204 has one of the values entered in the IT failure event type input unit 501 of the setting screen 500 (FIG. 5) of the IT failure detection/search program.

The IT failure feature table 410 consists of an IT failure ID 411 that uniquely identifies IT failures having the same features, the features 412 of the IT failure, and a list 413 of the IDs of the event blocks that have those features. How the features 412 are obtained is described later.

The IT failure classification tree 420 is created by classifying event blocks on the basis of the features 412. FIG. 4-2 shows the conceptual structure of the classification tree. Each node of the classification tree holds the IT failure event blocks classified under it as its elements. How the classification tree is built is described in detail later, in the explanation of the increment and decrement processing of the IT failure classification tree.

The change table 430 holds the content and the date of occurrence of each change in the features 412 of event blocks. It consists of a change ID 431 that uniquely identifies a change, a change occurrence date 432 indicating the date on which the change occurred, an attribute name 433 indicating which of the attributes making up the features 412 changed, a pre-change attribute value 434, and a post-change attribute value 435. For example, change ID “1” records that on “2010/02/03” the value of the attribute “computer 0” changed from “Server10” to “Server23”.

(Setting screen example)
FIGS. 5 and 6 show examples of the GUI screens of the IT failure detection/search program 311 that are displayed on the display device 331.

FIG. 5 shows the screen (setting screen 500) that is displayed first on the display device 331 as the initial screen of the IT failure detection/search program 311. The setting screen 500 is used, for example, when the user registers the events to be regarded as IT failures. The IT failure detection/search program 311 executes the IT failure detection processing and search processing described later in accordance with the settings made on the setting screen 500.

The setting screen 500 consists of an IT failure event type input unit 501, a maximum event time interval input unit 502, a change detection time window width input unit 503, a change amount threshold input unit 504, a change analysis time window width input unit 505, and a start button 506.
The IT failure event type input unit 501 is the field for entering the values of the attribute “type” 204 of events to be regarded as IT failure events. In FIG. 5, monitoring events labeled “error”, “fatal”, or “emergency” are treated as IT failure events.

The maximum event time interval input unit 502 is the field for entering the time range within which multiple observed IT failure events are regarded as having arisen from the same cause.

For example, if the time from the observation of one IT failure event to the observation of the next is within the time entered in the maximum event time interval input unit 502, those events are judged to have arisen from the same cause. The one or more events that arose from the same cause are treated as one event block. In FIG. 5 the unit of time is minutes, but it may be seconds, hours, days, or any other unit.

The change detection time window width input unit 503 is the field for specifying the time range over which changes are detected. In FIG. 5 the unit of time is days, but it may be seconds, minutes, hours, or any other unit.

The change amount threshold input unit 504 is the field for entering the threshold on the amount of change that is used to decide whether to run change analysis when a change is detected. Setting this threshold avoids false detection of changes caused by noise.

The change analysis time window width input unit 505 is the field for specifying the time range over which change analysis is performed. In FIG. 5 the unit of time is days, but it may be seconds, minutes, hours, or any other unit. The start button 506 is the button for instructing the IT failure detection/search program to run.
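The five inputs of the setting screen can be pictured as one settings record. The sketch below is illustrative only: the class and field names are assumptions, the failure types and the two-minute interval follow the values the description gives for FIG. 5, and the two window widths and the threshold are placeholders because the text does not state them.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class DetectionSettings:
    """Values read from the setting screen 500 (names are illustrative)."""
    failure_types: frozenset             # input unit 501
    max_event_interval: timedelta        # input unit 502
    change_detection_window: timedelta   # input unit 503
    change_threshold: float              # input unit 504
    change_analysis_window: timedelta    # input unit 505


settings = DetectionSettings(
    failure_types=frozenset({"error", "fatal", "emergency"}),
    max_event_interval=timedelta(minutes=2),
    change_detection_window=timedelta(days=30),  # placeholder, not from the source
    change_threshold=0.5,                        # placeholder, not from the source
    change_analysis_window=timedelta(days=7),    # placeholder, not from the source
)
```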

(IT failure detection screen example)
FIG. 6 shows a screen 600 that displays the IT failures detected by the IT failure detection/search program 311 and the similar IT failures it found. The screen 600 consists of an IT failure table 610 that lists the “detected IT failures”, a failure event table 620 that displays the attributes of the “failure events that occurred”, and a similar IT failure table 630 that displays the attributes of the “similar IT failures”.

The IT failure table 610 consists of an IT failure ID 611 that uniquely identifies a detected IT failure and a detection date and time 612 at which the IT failure was detected. The failure event table 620 is the display area that shows the attributes of the failure events that actually occurred for the IT failure selected in the IT failure table 610. The attribute information consists of event ID, occurrence date and time, type, source, event number, user, and computer.

The similar IT failure table 630 consists of an IT failure ID 631 that uniquely identifies an IT failure in units of event blocks, the attributes 632 of the individual monitoring events that make up an IT failure event block judged to be similar, and a similarity 633 to the detected IT failure.

(IT failure detection/search operation)
FIG. 7 outlines the IT failure detection/search process executed by the IT failure detection/search system.

(Step 700)
When the IT failure detection/search computer 103 detects a click on the start button 506 on the setting screen 500 (FIG. 5) of the IT failure detection/search program, it reads the values set through the setting screen 500 and starts the detection processing and search processing based on the IT failure detection/search program 311.

In doing so, the IT failure detection/search computer 103 reads the values of the attribute “type” 204 that identify IT failure events, as entered in the IT failure event type input unit 501; the time interval entered in the maximum event time interval input unit 502; the time range entered in the change detection time window width input unit 503; the threshold entered in the change amount threshold input unit 504; and the time width entered in the change analysis time window width input unit 505.

(Step 701)
The IT failure detection/search computer 103 receives monitoring events from the monitoring server 102 via the communication device 332. It selectively acquires the IT failure events among the monitoring events and discards the rest. Whether a monitoring event is an IT failure event is determined by whether the value of its attribute “type” 204 is among the values entered in the IT failure event type input unit 501. In this embodiment, the computer determines whether the value of the attribute “type” 204 is “error”, “fatal”, or “emergency”.

For the monitoring events shown in FIG. 2, the monitoring events whose event ID 201 is “4”, “5”, “6”, “10”, “11”, or “12” are determined to be IT failure events. All other monitoring events are ignored.
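The selection in step 701 amounts to a membership test on the “type” attribute. A minimal sketch follows, assuming events are represented as dictionaries; the function name and the sample events are made up for illustration and are not the rows of FIG. 2.

```python
# Types treated as IT failure events in this embodiment (input unit 501).
FAILURE_TYPES = frozenset({"error", "fatal", "emergency"})


def select_failure_events(events, failure_types=FAILURE_TYPES):
    """Keep only the events whose "type" is one of the configured failure types."""
    return [event for event in events if event["type"] in failure_types]


# Hypothetical sample events.
events = [
    {"event_id": 3, "type": "information"},
    {"event_id": 4, "type": "error"},
    {"event_id": 5, "type": "fatal"},
    {"event_id": 7, "type": "warning"},
]
failure_events = select_failure_events(events)
```

Events with any other type, such as “information” or “warning”, are dropped before event blocks are built.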

(Step 702)
The IT failure detection/search computer 103 generates IT failure event blocks (FIG. 4-1) from the acquired IT failure events and then calculates the features 412 (FIG. 4-1) of each IT failure event block. It also updates the IT failure event block table 400 and the IT failure feature table 410 of the IT failure DB 104 via the communication device 332.

Whether two IT failure events that occurred consecutively in time belong to the same event block is judged by whether the interval between their occurrence dates and times 202 is within the value entered in the maximum event time interval input unit 502 (FIG. 5).

If the interval between the two IT failure events is within the maximum event time interval, the two events are classified into the same event block. If the interval exceeds the maximum event time interval, they are classified into separate event blocks.

For example, for the monitoring events in FIG. 2, the interval between the occurrence dates and times 202 of the events whose event ID 201 is “5” and “6” is 1 minute 2 seconds. This is within the 2 minutes set in the maximum event time interval input unit 502 of the setting screen 500 (FIG. 5). These two IT failure events are therefore assigned to the same event block ID 401 (namely “1”) in the IT failure event block table 400 (FIG. 4-1).

By contrast, the interval between the occurrence dates and times 202 of the monitoring events whose event ID 201 is “6” and “10” is longer than the maximum event time interval. These two IT failure events are therefore assigned to different event blocks (namely “1” and “2”) in the IT failure event block table 400 (FIG. 4-1).
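The block-building rule can be sketched as a single pass over the failure events in time order. This is a minimal illustration, not the patented implementation: the function name is an assumption, the timestamps are made up (only the 1 minute 2 second gap between events 5 and 6 comes from the description), and treating an interval exactly equal to the maximum as “within” is also an assumption.

```python
from datetime import datetime, timedelta


def build_event_blocks(events, max_gap):
    """Group time-ordered failure events; a gap longer than max_gap starts a new block."""
    blocks = []
    last_time = None
    for event in sorted(events, key=lambda e: e["occurred_at"]):
        if last_time is not None and event["occurred_at"] - last_time <= max_gap:
            blocks[-1].append(event)   # same cause: extend the current block
        else:
            blocks.append([event])     # first event, or gap too long: new block
        last_time = event["occurred_at"]
    return blocks


# Hypothetical timestamps; events 5 and 6 are 1 minute 2 seconds apart.
events = [
    {"event_id": 4, "occurred_at": datetime(2010, 2, 3, 10, 0, 0)},
    {"event_id": 5, "occurred_at": datetime(2010, 2, 3, 10, 0, 30)},
    {"event_id": 6, "occurred_at": datetime(2010, 2, 3, 10, 1, 32)},
    {"event_id": 10, "occurred_at": datetime(2010, 2, 3, 10, 30, 0)},
]
blocks = build_event_blocks(events, max_gap=timedelta(minutes=2))
block_ids = [[e["event_id"] for e in block] for block in blocks]
```

Each gap is measured between consecutive events, so a block can span more than the maximum interval in total as long as no single gap exceeds it.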

Next, the IT failure detection/search computer 103 calculates the features 412 of the IT failure event block it created. To obtain the features 412, for each of the attributes “type”, “source”, “event number”, “user”, and “computer”, the computer counts how often each attribute value occurs among the one or more IT failure events contained in the same event block in the IT failure event block table 400, and takes the two most frequent values as the “features” of that attribute.

For example, in the IT failure event block table 400 (FIG. 4-1), the event block whose event block ID 401 is “1” contains three IT failure events, with event IDs “4”, “5”, and “6”. For the attribute “type”, the value “error” appears twice and “fatal” appears once.

Accordingly, in the IT failure feature table 410, for the attribute “type” of event block ID “1”, “error” is set in “type 0”, which represents the most frequent attribute value, and “fatal” is set in “type 1”, which represents the next most frequent attribute value.
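The feature calculation described above (the two most frequent values per attribute) can be sketched with a frequency counter. The function name, the dictionary representation, the key names such as `type0`, and the sample values other than the “error”/“fatal” counts are assumptions for illustration.

```python
from collections import Counter

ATTRIBUTES = ("type", "source", "event_number", "user", "computer")


def block_features(block, top_n=2):
    """Return the top_n most frequent values of each attribute, keyed like "type0", "type1"."""
    features = {}
    for attribute in ATTRIBUTES:
        counts = Counter(event[attribute] for event in block)
        for rank, (value, _count) in enumerate(counts.most_common(top_n)):
            features[f"{attribute}{rank}"] = value
    return features


# Hypothetical block: "error" appears twice and "fatal" once, as in the example above.
block = [
    {"type": "error", "source": "app.exe", "event_number": 1001, "user": "SYSTEM", "computer": "Server10"},
    {"type": "fatal", "source": "app.exe", "event_number": 1002, "user": "SYSTEM", "computer": "Server10"},
    {"type": "error", "source": "app.exe", "event_number": 1001, "user": "SYSTEM", "computer": "Server10"},
]
features = block_features(block)
```

When an attribute has only one distinct value in the block, as `source` does here, only the rank-0 key is produced; how the actual table represents a missing second value is not stated in the text.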

In the IT failure feature table 410, event blocks whose features match for every attribute are managed under a single IT failure ID 411. For example, event block ID “1” is managed under IT failure ID 411 “1”. The event block ID list 413 for that IT failure ID 411 also contains “15” and “30” in addition to event block ID “1”, which shows that these blocks have the same features.
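Grouping event blocks that share every feature under one IT failure ID can be sketched by using the full feature set as a dictionary key. The function name and the sample feature values are hypothetical; only the grouping of blocks 1, 15, and 30 mirrors the example in the text.

```python
def group_blocks_by_features(features_by_block_id):
    """Group event block IDs whose feature dictionaries are identical."""
    groups = {}
    for block_id, features in features_by_block_id.items():
        # A sorted tuple of (name, value) pairs is hashable and order-independent.
        signature = tuple(sorted(features.items()))
        groups.setdefault(signature, []).append(block_id)
    return list(groups.values())


# Hypothetical features: blocks 1, 15 and 30 match on every attribute.
shared = {"type0": "error", "type1": "fatal", "computer0": "Server10"}
other = {"type0": "emergency", "computer0": "Server05"}
id_lists = group_blocks_by_features({1: shared, 2: other, 15: dict(shared), 30: dict(shared)})
```

Each inner list corresponds to one event block ID list 413.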

Besides these attributes, the “features” may also use, for example, the average time interval between events from the occurrence date and time 202 of the first event in the event block to that of the last event, or attribute values that differ greatly from the average attribute value.

(Step 703)
The IT failure detection/search computer 103 refers to the IT failure feature table 410 of the IT failure DB 104 and obtains, from the event block ID list 413, the IDs of the event blocks that have the same features 412 as the IT failure event block created in step 702. In other words, it obtains the IT failure event blocks with similar features.

Here, the IT failure detection/search computer 103 refers to the change table 430 (FIG. 4-2) and, where necessary, searches the event block ID list 413 using the features 412 after correction. Correction is needed when the monitoring event contains an attribute value that changed in the past and the search targets IT failure events that occurred before that change.

For example, in FIG. 4-2, change ID “1” of the change table 430 records that on the change occurrence date “2010/02/03” the value of the attribute “computer 0” changed from “Server10” to “Server23”. Accordingly, if the value of “computer 0” in the features 412 of the IT failure event block created in step 702 is “Server23”, then when searching for IT failure event blocks that occurred before the change occurrence date “2010/02/03”, the IT failure detection/search computer 103 changes the value of “computer 0” in the features 412 to “Server10” and runs the search. When searching for IT failure event blocks that occurred on or after the change occurrence date “2010/02/03”, it uses the attribute value in the current features 412 (namely “Server23”) as is.
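The correction described above can be sketched as rewriting the current features into the form they would have had on an earlier date. This is an illustration under assumptions: the function and key names are invented here, the change table is held as a list of dictionaries, and a block dated exactly on the change occurrence date is treated as post-change.

```python
from datetime import date

# The single change record from the example (change ID "1").
CHANGE_TABLE = [
    {"change_id": 1, "changed_on": date(2010, 2, 3),
     "attribute": "computer0", "before": "Server10", "after": "Server23"},
]


def features_as_of(features, target_date, change_table=CHANGE_TABLE):
    """Rewrite current features to the values they would have had on target_date."""
    corrected = dict(features)
    # Undo the newest changes first so that chained renames unwind in order.
    for change in sorted(change_table, key=lambda c: c["changed_on"], reverse=True):
        predates_change = target_date < change["changed_on"]
        if predates_change and corrected.get(change["attribute"]) == change["after"]:
            corrected[change["attribute"]] = change["before"]
    return corrected


current = {"type0": "error", "computer0": "Server23"}
for_old_blocks = features_as_of(current, date(2010, 1, 15))
for_new_blocks = features_as_of(current, date(2010, 3, 1))
```

A search over a time range that straddles the change date would use the corrected features for the earlier part of the range and the current features for the later part.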

The IT failure feature table 410 stores the features of all IT failure event blocks generated up to the current time. Naturally, when the value of an attribute changes, features different from those before the change are generated.

Conventional failure search techniques hold no information about attribute values before they changed, so they can search for IT failure event blocks with similar features only by using the post-change features. As a result, even when an attribute value changed for a reason unrelated to the cause of the IT failure, information on the countermeasures taken before the change could not be found.

In the present embodiment, by contrast, the search results can include not only event blocks that share features with the IT failure event block acquired at the current time (that is, event blocks that share the post-change feature) but also event blocks that share the feature 412 as it was before the attribute value changed. If past countermeasures are stored in the database in association with past IT failure event blocks, appropriate countermeasures for the current IT failure event can be presented regardless of whether an attribute value has changed. This helps the staff engaged in recovery work carry out appropriate countermeasures efficiently.
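
The attribute substitution driven by the change table can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Change` record, its field names, and the function `feature_as_of` are hypothetical, and treating the change date itself as "after the change" is an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Change:
    """One row of the change table (field names are hypothetical)."""
    changed_on: date
    attribute: str
    old_value: str
    new_value: str


def feature_as_of(feature: dict, changes: list, as_of: date) -> dict:
    """Return a copy of `feature` rewritten to the values in use on `as_of`.

    Changes are undone newest first, so chained renames are handled.
    Assumption: blocks dated exactly on the change date use the new value.
    """
    adjusted = dict(feature)
    for change in sorted(changes, key=lambda c: c.changed_on, reverse=True):
        still_in_future = as_of < change.changed_on
        if still_in_future and adjusted.get(change.attribute) == change.new_value:
            adjusted[change.attribute] = change.old_value
    return adjusted


# Values taken from the FIG. 4-2 example in the text.
changes = [Change(date(2010, 2, 3), "computer0", "Server10", "Server23")]
current_feature = {"computer0": "Server23", "source0": "Process6"}
feature_for_old_blocks = feature_as_of(current_feature, changes, date(2010, 1, 15))
```

The search for blocks dated before 2010/02/03 would then use `feature_for_old_blocks`, while later blocks are searched with `current_feature` unchanged.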

(Step 704)
The IT failure detection/search computer 103 calculates the similarity between the IT failure event block created in step 702 and each IT failure event block acquired in step 703. The similarity between two event blocks (event block A and event block B) is calculated, for example, by the following equation.

[Equation not reproduced in the source text]

The change table 430 described above is consulted here as well: depending on whether a block falls before or after a change, the attribute values are substituted as appropriate before the similarity is calculated.

(Step 710)
The IT failure detection/search computer 103 displays the IT failure detection result on the IT failure detection screen 600 (FIG. 6) of the display device 331. The IT failure detection screen 600 shows a failure event table 620 and a similar IT failure table 630. The failure event table 620 shows the IT failure event block created in step 702. The similar IT failure table 630 shows the similar IT failure event blocks retrieved in step 703 together with their similarity 633.

In FIG. 6, the IT failure event block corresponding to IT failure ID “81” in the similar IT failure table 630 matches the IT failure event block created in step 702 in the attribute values of every attribute except the occurrence date and time. The similarity between the IT failure event block with IT failure ID “81” and the two event blocks displayed in the failure event table 620 is therefore 100%.

The similarity between the IT failure event block with IT failure ID “99” and the event block in the failure event table 620 is 87%. The IT failure event block belonging to IT failure ID “99” comprises 15 attributes in total (= 5 × 3). For the event blocks identified by event IDs “345” and “346”, the attribute values match for all attributes; for the event block identified by event ID “347”, the attribute values match for three attributes. The similarity is therefore 13/15 (5 + 5 + 3 = 13 matching attributes out of 15, or about 87%).
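
The similarity equation itself is not available in this text, so the following is only a sketch that is consistent with the worked example above (matching attribute values divided by the total number of attributes). The function name, the attribute names, and the pairing of events by position are assumptions made for illustration.

```python
def block_similarity(block_a, block_b, ignore=("occurred_at",)):
    """Fraction of attribute values in block_a that the paired event in block_b matches.

    Assumptions: events are paired by position, and the occurrence time
    is excluded from the comparison.
    """
    total = 0
    matched = 0
    for event_a, event_b in zip(block_a, block_b):
        for name, value in event_a.items():
            if name in ignore:
                continue
            total += 1
            if event_b.get(name) == value:
                matched += 1
    return matched / total if total else 0.0


def make_event(occurred_at, **overrides):
    """Build an event with five comparable attributes (names are hypothetical)."""
    event = {"occurred_at": occurred_at, "type": "Error", "source": "Process6",
             "computer": "Server2", "category": "Disk", "user": "SYSTEM"}
    event.update(overrides)
    return event


current_block = [make_event("2010-06-01 10:00"),
                 make_event("2010-06-01 10:01"),
                 make_event("2010-06-01 10:02")]
# The third past event differs in two of its five attributes.
past_block = [make_event("2010-03-01 09:00"),
              make_event("2010-03-01 09:01"),
              make_event("2010-03-01 09:02", source="Process9", user="admin")]
similarity = block_similarity(current_block, past_block)  # 13 of 15 attributes match
```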

(Step 705)
Next, the update function, that is, the maintenance function, of the IT failure classification tree is described. With conventional unsupervised clustering algorithms such as k-means, whenever the training data (the set of IT failure event blocks) changes through the deletion or addition of IT failure event blocks, the clusters, such as an IT failure classification tree, must be rebuilt from scratch. The computational cost of doing so is enormous, and it is difficult to track changes continuously relative to previously built clusters. The present embodiment therefore describes a technique that addresses these technical problems.

The IT failure detection/search computer 103 compares list A, the IT failure event blocks that occurred within a certain time range starting from the current time, with list B, the IT failure event blocks that make up the current IT failure classification tree 420. This comparison is executed concurrently with steps 703, 704, and 710. The time range to be compared is given by the value entered in the change detection time window width input unit 503 of the setting screen 500 (FIG. 5).

The IT failure detection/search computer 103 decrements from the IT failure classification tree each IT failure event block that is in list B but not in list A, and updates the IT failure classification tree 420. Conversely, it increments into the IT failure classification tree each IT failure event block that is in list A but not in list B, and updates the IT failure classification tree 420. The decrement and increment processes are described in detail later.
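
The comparison of list A and list B amounts to two set differences. A minimal sketch, assuming each IT failure event block is identified by a hashable ID (the function name and the integer IDs are hypothetical):

```python
def plan_tree_update(recent_block_ids, tree_block_ids):
    """Return (to_decrement, to_increment) for the classification tree.

    recent_block_ids: list A, blocks that occurred inside the time window.
    tree_block_ids:   list B, blocks currently held by the tree.
    """
    recent = set(recent_block_ids)
    in_tree = set(tree_block_ids)
    to_decrement = sorted(in_tree - recent)   # in B but not in A
    to_increment = sorted(recent - in_tree)   # in A but not in B
    return to_decrement, to_increment


to_decrement, to_increment = plan_tree_update([3, 4, 5], [1, 2, 3])
```

Blocks present in both lists need no tree operation, which is what keeps the update cheaper than rebuilding the tree.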

(Step 706)
The IT failure detection/search computer 103 calculates the amount of change that has occurred in the IT failure classification tree per predetermined unit of time, for example per day. The amount of change is calculated, for example, by the following equation.

[Equation not reproduced in the source text]

Here, D(IT failure classification tree A, IT failure classification tree B) denotes the edit distance between IT failure classification tree A and IT failure classification tree B. The edit distance is the minimum number of operations required to transform one tree into the other. For example, the increment process includes the edit operations shown in FIGS. 10(1) to 10(3); in each case, the edit distance is counted as 1. Likewise, the decrement process includes the edit operations shown in FIGS. 11(1) to 11(3); in each case, the edit distance is counted as 1.

The upper part of FIG. 12 shows a graph 1200 of the change amount of the IT failure classification tree 420, calculated on a daily basis, over time. The horizontal axis of the graph 1200 is time, and the vertical axis is the amount of change (%).

(Step 707)
The IT failure detection/search computer 103 compares the change amount calculated in step 706 with the threshold entered in the change amount threshold input unit 504 of the setting screen 500 (FIG. 5). If the change amount is equal to or greater than the threshold, the IT failure detection/search computer 103 proceeds to step 708. If the change amount is smaller than the threshold, it returns to step 701.

(Step 708)
When a change whose amount exceeds the threshold has occurred, the IT failure detection/search computer 103 analyzes it, for example using the technique disclosed in Non-Patent Document 2, and identifies which attribute names 433 constituting the feature 412 changed their attribute values, and how, between before and after the change. The identified result is reflected in the change table 430 of the IT failure DB 104.

An example of the change analysis process follows. Taking the day on which the change amount exceeded the threshold as the reference point, the IT failure detection/search computer 103 acquires from the IT failure event block table 400 (FIG. 4-1) the IT failure event blocks acquired within a predetermined period before and after that day. The time range for acquiring IT failure event blocks is the value set in the change analysis time window width input unit 505 of the setting screen 500 (FIG. 5). In FIG. 5, it is 30 days.

The acquisition of IT failure events for the change analysis process is described below using the graph 1210 shown in the lower part of FIG. 12. The horizontal axis of the graph 1210 is time, and the vertical axis is the frequency of IT failure occurrence.

As described above, the change amount 1202 of the IT failure classification tree 420 exceeds the threshold 1201 on “2010/05/20”. The IT failure detection/search computer 103 therefore acquires IT failure event blocks in the range covering the 30 days before (1211) and the 30 days after (1212) “2010/05/20”.
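
The threshold check of step 707 and the window selection described above can be sketched as follows. The function names and the daily change values are hypothetical; only the trigger date 2010/05/20 and the 30-day window come from the text.

```python
from datetime import date, timedelta


def first_day_at_or_above(daily_change, threshold):
    """Return the earliest day whose change amount is >= threshold, or None."""
    for day in sorted(daily_change):
        if daily_change[day] >= threshold:
            return day
    return None


def analysis_window(trigger_day, window_days):
    """Return (start, end) covering window_days before and after trigger_day."""
    span = timedelta(days=window_days)
    return trigger_day - span, trigger_day + span


# Illustrative change amounts in percent; not taken from FIG. 12.
daily_change = {
    date(2010, 5, 18): 4.0,
    date(2010, 5, 19): 6.5,
    date(2010, 5, 20): 31.0,
}
trigger = first_day_at_or_above(daily_change, threshold=20.0)
window_start, window_end = analysis_window(trigger, 30)
```

Blocks dated between `window_start` and `trigger` would receive the "before change" label and the remainder the "after change" label; how the trigger day itself is labeled is not stated in the text.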

Next, the IT failure detection/search computer 103 acquires the features 412 of the acquired IT failure event blocks from the IT failure feature table 410. It then creates, in the work area 312, a change analysis table 1300 (FIG. 13) in which IT failure event blocks from before the change are labeled “before change” and those from after the change are labeled “after change”.

Finally, the IT failure detection/search computer 103 refers to the features 1301 and labels 1302 of the change analysis table 1300 and builds a change analysis decision tree 1310 (FIG. 13) in the work area 312 using a supervised learning algorithm such as ID3 or C4.5. The change analysis decision tree 1310 expresses how the attribute values changed between before and after the change.

The change analysis decision tree 1310 shown in FIG. 13 indicates that, between before and after the change, the attribute value of the attribute name “Source 0” changed from “Process6” to “Process7” and the attribute value of the attribute name “Computer 0” changed from “Server2” to “Server8”.
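
ID3 chooses each split by information gain, so the attributes that changed show up as the ones whose values best separate the “before change” and “after change” labels. The sketch below computes only that split criterion, not a full decision tree; the row contents and the attribute name `type0` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Reduction in label entropy obtained by splitting rows on `attribute`."""
    labels_by_value = {}
    for row, label in zip(rows, labels):
        labels_by_value.setdefault(row[attribute], []).append(label)
    remainder = sum(len(subset) / len(labels) * entropy(subset)
                    for subset in labels_by_value.values())
    return entropy(labels) - remainder


# A toy change analysis table: features plus a before/after label.
rows = [
    {"source0": "Process6", "computer0": "Server2", "type0": "Error"},
    {"source0": "Process6", "computer0": "Server2", "type0": "Warning"},
    {"source0": "Process7", "computer0": "Server8", "type0": "Error"},
    {"source0": "Process7", "computer0": "Server8", "type0": "Warning"},
]
labels = ["before", "before", "after", "after"]
gains = {name: information_gain(rows, labels, name) for name in rows[0]}
```

In this toy table `source0` and `computer0` separate the labels perfectly while `type0` carries no information, mirroring the kind of result attributed to FIG. 13.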

(Increment process for the IT failure classification tree)
Next, the increment process for the IT failure classification tree 420 is described in detail with reference to FIG. 8. This process is the same as the increment process in the conventional conceptual clustering method COBWEB (Non-Patent Document 1). It is described below for comparison with the decrement process, which is specific to the embodiment. The increment process adds one new IT failure event block feature 412 to the IT failure classification tree 420 and thereby updates the tree.

(Step 800)
The IT failure detection/search computer 103 sets the variable “node” to the root node of the IT failure classification tree 420.

(Step 801)
The IT failure detection/search computer 103 adds the feature 412 of the IT failure event block being added to the node of the IT failure classification tree 420 held in the variable “node”.

(Step 802)
The IT failure detection/search computer 103 determines whether the node of the IT failure classification tree 420 held in the variable “node” is the root node and that root node has no child nodes. If so, it executes step 803; otherwise, it executes step 804.

(Step 803)
The IT failure detection/search computer 103 creates a new node of the IT failure classification tree 420 and adds the feature 412 of the IT failure event block to it. It then adds this new node as a child node of the root node.

(Step 804)
The IT failure detection/search computer 103 calculates the CU value (category utility) for the following four cases.
(1) The CU value when the IT failure event block is added to the elements of a suitable child node
(2) The CU value when a new node having the IT failure event block as an element is created and added as a child node
(3) The CU value when two suitable child nodes are merged
(4) The CU value when a suitable child node is split

The CU value (category utility) is given by the following equation.

[Equation not reproduced in the source text]

Here, n is the number of child nodes, C_k is the k-th child node, A_i is the i-th attribute of the feature 412, and V_{i,j} is the j-th attribute value of the i-th attribute.
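
Because the increment process is stated to be the same as in COBWEB, one plausible reading of the missing equation is the category utility as it is commonly defined for COBWEB, written here with the symbols defined above. This is offered as an assumption, not as a reproduction of the patent's own equation:

```latex
CU = \frac{1}{n} \sum_{k=1}^{n} P(C_k)
     \left[ \sum_{i} \sum_{j} P(A_i = V_{i,j} \mid C_k)^2
          - \sum_{i} \sum_{j} P(A_i = V_{i,j})^2 \right]
```

Under this reading, P(C_k) is the fraction of IT failure event blocks that fall in child node C_k, and the bracketed term measures how much more predictable the attribute values become once the child node is known.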

(Step 805)
The IT failure detection/search computer 103 compares the four CU values (category utility) calculated in step 804.
(1) Add
If the CU value when the IT failure event block is added to the elements of a suitable child node is the largest, step 806 is executed.
(2) New
If the CU value when a new node having the IT failure event block as an element is created and added as a child node is the largest, step 807 is executed.
(3) Merge
If the CU value when two suitable child nodes are merged is the largest, step 808 is executed.
(4) Split
If the CU value when a suitable child node is split is the largest, step 809 is executed.
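
Steps 804 and 805 can be sketched as computing a score for each candidate partition of the child nodes and dispatching on the largest. The category utility below follows the common COBWEB definition, which is an assumption because the patent's equation is not available in this text. The function names and example data are hypothetical, and only the "add" and "new" candidates are evaluated to keep the example short.

```python
from collections import Counter


def squared_value_probabilities(instances):
    """Sum over all attributes and values of P(attribute = value) ** 2."""
    total = len(instances)
    counts = Counter((name, value)
                     for instance in instances
                     for name, value in instance.items())
    return sum((n / total) ** 2 for n in counts.values())


def category_utility(partition):
    """Category utility of a partition: a list of non-empty clusters."""
    everything = [instance for cluster in partition for instance in cluster]
    baseline = squared_value_probabilities(everything)
    score = 0.0
    for cluster in partition:
        weight = len(cluster) / len(everything)
        score += weight * (squared_value_probabilities(cluster) - baseline)
    return score / len(partition)


def choose_step(cu_by_operation):
    """Map the operation with the largest CU value to its step number.

    Ties go to the first operation listed, which is an assumption.
    """
    steps = {"add": 806, "new": 807, "merge": 808, "split": 809}
    best = max(cu_by_operation, key=cu_by_operation.get)
    return steps[best]


child_a = [{"source0": "Process6", "computer0": "Server2"}]
child_b = [{"source0": "Process7", "computer0": "Server8"}]
new_block = {"source0": "Process6", "computer0": "Server2"}

cu_add = category_utility([child_a + [new_block], child_b])
cu_new = category_utility([child_a, [new_block], child_b])
next_step = choose_step({"add": cu_add, "new": cu_new})
```

Here the new block is identical to the member of `child_a`, so adding it to that child scores higher than opening a new node.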

(Step 806)
Among the child nodes of the node of the IT failure classification tree 420 held in the variable “node”, the IT failure detection/search computer 103 adds the IT failure event block to the child node that yields the largest CU value when the block is added to its elements.

The IT failure detection/search computer 103 then sets the variable “node” to the child node to which the IT failure event block was added.

(Step 807)
The IT failure detection/search computer 103 adds a new child node containing the IT failure event block to the node of the IT failure classification tree 420 held in the variable “node”.

FIG. 10(1) illustrates the addition of a new node. In the IT failure classification tree 1000 before the change, “node” has the child nodes “child_node_B” and “child_node_C”. Suppose that adding a new node containing the IT failure event block to “node” yields the largest CU value. In this case, a new node “new_node” having the IT failure event block as an element is created and added as a child node of “node”. This gives the IT failure classification tree 1001 with the new child node added.

(Step 808)
Among the child nodes of the node of the IT failure classification tree 420 held in the variable “node”, the IT failure detection/search computer 103 merges the two child nodes whose merger yields the largest CU value.

FIG. 10(2) illustrates the merging of nodes. In the IT failure classification tree 1010 before the change, “node” has the child nodes “child_node_A”, “child_node_B”, and “child_node_C”. Suppose that merging “child_node_A” and “child_node_B” yields the largest CU value. In this case, the IT failure detection/search computer 103 creates a new node “new_node”, sets “node” as its parent node, and sets “child_node_A” and “child_node_B” as its child nodes. This gives the IT failure classification tree 1011 in which “child_node_A” and “child_node_B” are merged.

(Step 809)
Among the child nodes of the node of the IT failure classification tree 420 held in the variable “node”, the IT failure detection/search computer 103 splits the child node whose split yields the largest CU value.

FIG. 10(3) illustrates the splitting of a node. In the IT failure classification tree 1020 before the change, “node” has the child nodes “child_node_A” and “child_node_B”, and “child_node_A” has the child nodes “grandchild_node_C” and “grandchild_node_D”. Suppose that splitting “child_node_A” yields the largest CU value. In this case, the IT failure detection/search computer 103 deletes “child_node_A” and makes “grandchild_node_C” and “grandchild_node_D” child nodes of “node”. This gives the IT failure classification tree 1021 in which “child_node_A” is split.
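
The structural effect of the merge and split operations of FIG. 10(2) and FIG. 10(3) can be sketched on a bare tree. The `Node` class and both helper functions are hypothetical, and the bookkeeping of the event blocks held by each node is left out so that only the re-parenting is shown.

```python
class Node:
    """A classification tree node reduced to its name and links."""

    def __init__(self, name, children=None):
        self.name = name
        self.parent = None
        self.children = []
        for child in children or []:
            self.add_child(child)

    def add_child(self, child):
        child.parent = self
        self.children.append(child)

    def remove_child(self, child):
        self.children.remove(child)
        child.parent = None


def merge_children(node, first, second, name="new_node"):
    """Place two children of `node` under a newly created intermediate node."""
    merged = Node(name)
    for child in (first, second):
        node.remove_child(child)
        merged.add_child(child)
    node.add_child(merged)
    return merged


def split_child(node, child):
    """Delete `child` and promote its children to be children of `node`."""
    node.remove_child(child)
    for grandchild in list(child.children):
        child.remove_child(grandchild)
        node.add_child(grandchild)


a, b, c = Node("child_node_A"), Node("child_node_B"), Node("child_node_C")
node = Node("node", [a, b, c])

merged = merge_children(node, a, b)
names_after_merge = [child.name for child in node.children]

split_child(node, merged)
names_after_split = [child.name for child in node.children]
```

Splitting the node produced by the merge restores the original set of children, which shows that the two operations are inverses of each other at the structural level.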

(Decrement process for the IT failure classification tree)
Next, the decrement process for the IT failure classification tree 420 is described in detail with reference to FIG. 9. The decrement process updates the IT failure classification tree 420 by deleting one IT failure event block from it.

(Step 900)
The IT failure detection/search computer 103 sets the variable “node” to the root node of the IT failure classification tree 420.

(Step 901)
The IT failure detection/search computer 103 deletes the IT failure event block to be deleted from the node of the IT failure classification tree 420 held in the variable “node”.

(Step 902)
The IT failure detection/search computer 103 determines whether, as a result of deleting the IT failure event block in step 901, the node of the IT failure classification tree 420 held in the variable “node” no longer has any elements. If no elements remain, it proceeds to step 903; if elements remain, it proceeds to step 904.

(Step 903)
In this case, “node” is empty. The IT failure detection/search computer 103 therefore deletes the node held in the variable “node” from the IT failure classification tree 420.

FIG. 11(1) illustrates the deletion of a node. In the IT failure classification tree 1100 before the change, the child nodes “node”, “node B”, and “node C” exist under “parent node”. In the IT failure classification tree 1101 after the change, “node” has been deleted.
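
Steps 900 to 903 can be sketched as a top-down removal that prunes any node left without elements. This is a simplified illustration: the `Node` class and the integer block IDs are hypothetical, and the CU-based choice between plain deletion, merging, and splitting in steps 904 to 908 is deliberately omitted.

```python
class Node:
    """A tree node holding the IDs of the event blocks it covers."""

    def __init__(self, elements, children=None):
        self.elements = set(elements)
        self.children = list(children or [])


def decrement(node, block_id):
    """Remove block_id from node and its descendants.

    Returns False when the node has no elements left, which tells the
    caller to drop it from the tree (as in step 903).
    """
    node.elements.discard(block_id)
    kept = []
    for child in node.children:
        if block_id in child.elements:
            if decrement(child, block_id):
                kept.append(child)   # child still has elements
        else:
            kept.append(child)       # child never held the block
    node.children = kept
    return bool(node.elements)


# Mirrors FIG. 11(1): the first child holds only the block being removed.
parent_node = Node({1, 2, 3}, [Node({1}), Node({2}), Node({3})])
parent_still_has_elements = decrement(parent_node, 1)
remaining_children = [sorted(child.elements) for child in parent_node.children]
```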

(Step 904)
The IT failure detection/search computer 103 calculates the CU value (category utility) for the following three cases.
(1) The CU value when the target event block is deleted from the child node that has the IT failure event block as an element
(2) The CU value when the target event block is deleted from the child node that has the IT failure event block as an element and that child node is merged with another child node. FIG. 11(2) shows an example of merging; details are given later.
(3) The CU value when the child node that has the IT failure event block as an element is split. FIG. 11(3) shows an example of splitting; details are given later.

(Step 905)
The IT failure detection/search computer 103 compares the CU values (category utility) calculated in step 904. If the CU value when the target event block is deleted from the child node that has the IT failure event block as an element is the largest, the IT failure detection/search computer 103 executes step 906. If the CU value when child nodes are merged is the largest, it executes step 907. If the CU value when the child node is split is the largest, it executes step 908.

(Step 906)
Among the child nodes of the node of the IT failure classification tree 420 held in the variable “node”, the IT failure detection/search computer 103 sets the variable “node” to the child node that contains the IT failure event block.

(Step 907)
Among the child nodes of the node of the IT failure classification tree 420 held in the variable “node”, the IT failure detection/search computer 103 merges the two child nodes whose merger yields the largest CU value.

FIG. 11(2) illustrates the merging of nodes. In the IT failure classification tree 1110 before the change, “node” has the child nodes “child_node_A”, “child_node_B”, and “child_node_C”. Suppose that “child_node_A” holds the IT failure event block to be deleted and that merging it with “child_node_B” yields the largest CU value. In this case, the IT failure detection/search computer 103 creates a new node “new_node”, sets “node” as its parent node, and sets “child_node_A” and “child_node_B” as its child nodes. This produces the IT failure classification tree 1111 in which “child_node_A” and “child_node_B” are merged. The IT failure event block elements of “new_node” are set to the combined elements of “child_node_A” and “child_node_B”.

(Step 908)
Among the child nodes of the node of the IT failure classification tree 420 held in the variable “node”, the IT failure detection/search computer 103 splits the node by deleting the child node that contains the target IT failure event block.

FIG. 11(3) illustrates the splitting of a node. In the IT failure classification tree 1120 before the change, “node” has the child nodes “child_node_A” and “child_node_B”, and “child_node_A” has the child nodes “grandchild_node_C” and “grandchild_node_D”. Suppose that splitting “child_node_A”, which contains the IT failure event block in question, yields the largest CU value. In this case, the IT failure detection/search computer 103 deletes “child_node_A” and makes “grandchild_node_C” and “grandchild_node_D” child nodes of “node”. This gives the IT failure classification tree 1121 in which “child_node_A” is split.

(Summary)
As described above, according to the present embodiment, similar IT failure event blocks that occurred in the past can be retrieved even when the attribute values of monitoring events change during operation of the IT system. As a result, the countermeasures taken when the same kind of IT failure occurred in the past can be provided effectively to the staff in charge of recovery.

In addition, in the present embodiment, the IT failure classification tree is not rebuilt from scratch for monitoring events that occur after a change to the IT system; instead, it can be corrected through the increment and decrement processes on its nodes. This reduces the amount of computation required to maintain the IT failure classification tree.

Furthermore, in the present embodiment, when the change amount of the IT failure classification tree exceeds the threshold, the attributes and attribute values that changed within a predetermined range before and after the time of the change are analyzed and a decision tree is created. This makes it possible to track structural changes that occur in the IT failure classification tree.

(Other embodiments)
The present invention is not limited to the embodiment described above and includes various modifications. For example, the embodiment above is described in detail to make the present invention easy to understand, and the invention is not necessarily limited to one that has every configuration described. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. Other configurations can also be added to, deleted from, or substituted for part of the configuration of each embodiment.

Some or all of the configurations, functions, processing units, processing means, and the like described above may be implemented as hardware, for example as an integrated circuit. They may also be implemented as software, with a processor interpreting and executing programs that realize the respective functions. Information such as the programs, tables, and files that realize each function can be stored in a storage device such as a memory, a hard disk, or an SSD (Solid State Drive), or on a storage medium such as an IC card, an SD card, or a DVD.

The control lines and information lines shown are those considered necessary for the explanation and do not represent all control lines and information lines required in a product. In practice, almost all components may be considered to be interconnected.

101 ... Monitoring target server group
102 ... Monitoring server
103 ... IT failure detection / retrieval computer
104 ... IT failure DB
311 ... IT failure detection / retrieval program
400 ... IT failure event block table
410 ... IT failure feature table
420 ... IT failure classification tree
430 ... Change table
500 ... Setting screen
501 ... IT failure event type input unit
502 ... Maximum event time interval input unit
503 ... Change detection time window width input unit
504 ... Change amount threshold input unit
505 ... Change analysis time window width input unit
1300 ... Change analysis table
1310 ... Change analysis decision tree

Claims

An IT failure detection / retrieval device comprising:
a first processing unit that sequentially acquires monitoring events generated by a monitoring server that monitors IT failures, and generates an IT failure event block composed of one or more events caused by a single cause;
a second processing unit that obtains feature information based on attribute values that appear frequently in the events belonging to the IT failure event block;
a third processing unit that records, in a change table, the content and time of occurrence of a change that has occurred in the feature information of an event block;
a fourth processing unit that searches an IT failure database based on the feature information to retrieve past IT failure event blocks similar to a newly acquired IT failure event block; and
a fifth processing unit that refers to the change table and corrects the content of the feature information used in the search processing according to the time range to be searched.

The IT failure detection / retrieval device according to claim 1, further comprising:
a sixth processing unit that updates an IT failure classification tree, which classifies IT failure event blocks, using the IT failure event block newly generated by the first processing unit;
a seventh processing unit that calculates an amount of change in the IT failure classification tree based on the number of edits made to the IT failure classification tree within a fixed period; and
an eighth processing unit that, when the amount of change is equal to or greater than a threshold, analyzes the change in the IT failure classification tree and identifies the change that has occurred in the attribute values of IT failure event blocks.

The IT failure detection / retrieval device according to claim 1, further comprising:
a ninth processing unit that displays, on a display unit, past IT failure event blocks similar to the newly acquired IT failure event block together with their dates and times of occurrence.

The IT failure detection / retrieval device according to claim 2, wherein
the sixth processing unit updates the IT failure classification tree in both increment processing and decrement processing of IT failure event blocks.

The IT failure detection / retrieval device according to claim 2, wherein,
for increment processing, the seventh processing unit calculates the amount of change from the presence or absence and the number of new node additions, node merges, and node splits applied to the IT failure classification tree.

The IT failure detection / retrieval device according to claim 2, wherein,
for decrement processing, the seventh processing unit calculates the amount of change from the presence or absence and the number of node deletions, node merges, and node splits applied to the IT failure classification tree.
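
The change-amount calculation recited for the seventh processing unit can be pictured as counting tree edit operations inside a time window. The Python sketch below is one possible reading: the operation names, the `(timestamp, mode, operation)` log layout, and the plain unweighted count are assumptions for illustration, not details taken from the claims.

```python
# Edit operations counted per update mode (names are hypothetical).
COUNTED_OPS = {
    "increment": {"add_node", "merge_nodes", "split_node"},
    "decrement": {"delete_node", "merge_nodes", "split_node"},
}

def change_amount(edit_log, window_start, window_end):
    """Count the tree edits that fall inside [window_start, window_end).

    edit_log: iterable of (timestamp, mode, operation) tuples, where mode is
    "increment" or "decrement" (hypothetical layout).
    """
    return sum(
        1
        for ts, mode, op in edit_log
        if window_start <= ts < window_end and op in COUNTED_OPS.get(mode, set())
    )

def change_detected(edit_log, window_start, window_end, threshold):
    """True when the change amount is equal to or greater than the threshold."""
    return change_amount(edit_log, window_start, window_end) >= threshold
```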

A program that causes a computer to execute:
a first process of sequentially acquiring monitoring events generated by a monitoring server that monitors IT failures, and generating an IT failure event block composed of one or more events caused by a single cause;
a second process of obtaining feature information based on attribute values that appear frequently in the events belonging to the IT failure event block;
a third process of recording, in a change table, the content and time of occurrence of a change that has occurred in the feature information of an event block;
a fourth process of searching an IT failure database based on the feature information to retrieve past IT failure event blocks similar to a newly acquired IT failure event block; and
a fifth process of referring to the change table and correcting the content of the feature information used in the search processing according to the time range to be searched.
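
The first and second processes recited above can be sketched as follows. This Python example assumes that events belonging to one cause are approximated by grouping events whose timestamps are no more than a maximum interval apart, and that the feature information is the set of attribute values appearing in at least a given fraction of a block's events. The function names, the event layout, and the `min_ratio` parameter are hypothetical.

```python
from collections import Counter

def build_event_blocks(events, max_gap):
    """Group events into blocks; a gap larger than max_gap starts a new block.

    events: list of (timestamp, {attribute: value}) pairs (hypothetical layout).
    """
    blocks = []
    for event in sorted(events, key=lambda e: e[0]):
        # Compare against the timestamp of the last event in the current block.
        if blocks and event[0] - blocks[-1][-1][0] <= max_gap:
            blocks[-1].append(event)
        else:
            blocks.append([event])
    return blocks

def frequent_attribute_values(block, min_ratio=0.5):
    """Return {attribute: value} for values present in at least min_ratio
    of the block's events."""
    counts = Counter(
        (name, value) for _, attrs in block for name, value in attrs.items()
    )
    needed = min_ratio * len(block)
    features = {}
    # Visit the most common pairs first so the dominant value per attribute wins.
    for (name, value), count in counts.most_common():
        if count >= needed and name not in features:
            features[name] = value
    return features
```

For example, three events from the same host within a few seconds of each other would form one block whose feature information includes that host name, while an event arriving much later would start a new block.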