JP2008027284A

JP2008027284A - Fault processing system, fault processing method, and fault processing device and program

Info

Publication number: JP2008027284A
Application number: JP2006200645A
Authority: JP
Inventors: Shinji Oga; 伸二大賀
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2006-07-24
Filing date: 2006-07-24
Publication date: 2008-02-07

Abstract

PROBLEM TO BE SOLVED: To eliminate the time and effort required for a user to determine and input the number of correctable errors to be tolerated before fault processing is performed.

SOLUTION: A failure processing device 30 counts the number of correctable errors that occur at a particular place in information communication systems 11 and 12 before an uncorrectable error occurs there, and sets a threshold on the basis of the counted value. After the particular place has been restored by replacement, repair, or the like, a system stop notifying means 360 notifies the information communication systems 11 and 12 of a system stop when the number of correctable errors occurring at the particular place matches the threshold.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a failure processing system, a failure processing method, a failure processing device, and a program, and more particularly to a failure processing system, a failure processing method, a failure processing device, and a program for an information communication system in which correctable errors and uncorrectable errors occur.

In maintaining and managing an information communication system, it is essential to take the occurrence of failures into account. In particular, when an information communication system must be maintained and managed over a long period such as 20 or 30 years, as in a PFI project (Private Finance Initiative: a new method in which the construction, maintenance, and operation of public facilities are carried out using private funds, management capabilities, and technical capabilities), how failures are handled becomes very important.

<Error types>
There are two types of failures in an information communication system: correctable errors and uncorrectable errors. Hereinafter, a correctable error may be abbreviated as CE (Correctable Error), and an uncorrectable error as UE (Uncorrectable Error).

A correctable error is an error for which, when an error detection circuit having an ECC (Error Checking and Correction) function in the information communication system detects the error, the erroneous data can be corrected; processing in the information communication system can therefore continue. Here, a case in which processing is executed correctly on re-execution by firmware or the like after an error occurs, so that processing can continue, is also treated as a correctable error.

An uncorrectable error is an error for which, when an error detection circuit having the ECC function in the information communication system detects the error, the erroneous data cannot be corrected and the data is not guaranteed; processing in the information communication system therefore cannot continue. As a result, the information communication system has no choice but to stop.

<Causes of errors>
Limiting the discussion to hardware, failures arise from the following causes. The most common is physical or electrical breakdown of the hardware. Others include manufacturing errors and design errors in electronic circuits and in the packages that house them. Design errors include errors in electrical circuit design as well as in logic circuit design. Examples of electrical circuit design errors are a miscalculated delay time and a mis-designed power supply circuit for the logic circuits.

<How errors occur>
When hardware is destroyed, the failure generally occurs persistently (as a fixed failure). With an electrical circuit design error, however, the failure is not necessarily persistent and often occurs irregularly.

When hardware is destroyed, the failure may be an uncorrectable error from the outset, or it may start as a correctable error. Hardware is not always destroyed all at once: often correctable errors occur several times at first, after which the failure spreads to peripheral circuits or becomes fixed, and it finally turns into an uncorrectable error.

<Recurrence of errors>
When hardware has suffered physical or electrical breakdown, repairing the damaged part stops the failure. When the failure is caused by a manufacturing error or a design error, however, it generally recurs even after measures such as replacing the hardware in question.

A logic circuit design error, in principle, produces a fixed failure, and the failure recurs even if the relevant part of the hardware is repaired or replaced.

When a design error other than a logic circuit design error, or a manufacturing error, has been introduced into the information communication system, it is expected that, depending on the nature of the error, correctable errors will often occur for a certain period and then turn into an uncorrectable error. This is presumed to happen frequently depending on the program execution environment and the power supply voltage fluctuations of the information communication system. Consequently, when execution resumes after a hardware repair (in reality nothing has been destroyed, so nothing can be repaired; replacing the hardware at the apparently damaged place leaves the cause in place), the program runs again in a similar environment and the phenomenon is presumed to recur. That is, an uncorrectable error occurs after a certain number of correctable errors.

If the occurrence of correctable errors can be detected before an uncorrectable error occurs, and a design change, workaround, or the like can be carried out, a serious failure can be prevented.

For example, suppose the wiring that connects the semiconductor elements making up the logic blocks (AND, OR, NAND, and so on) of a logic circuit is manufactured narrower than its design value, passes the manufacturing test, and is built into an information communication system. If an overcurrent happens to flow through a semiconductor element or the wiring while the system is actually in operation, one place is destroyed first and a correctable error occurs (the error detection/correction circuit (ECC) detects a 1-bit error in the data and corrects it).

This result is logged and reported to the user, but the information communication system does not stop. As time passes with that one place still damaged, correctable errors occur frequently.

As the semiconductor elements and wiring fatigue, the hardware damage spreads to the surrounding area and hardware destruction occurs at multiple places. When one particular error detection/correction circuit detects a multi-bit error in data, an uncorrectable error occurs and the information communication system stops. (The system does not go down when several error detection/correction circuits each detect a 1-bit error in data.)
However, if the error log is consulted and the hardware in question is replaced before the uncorrectable error occurs, that is, before the hardware damage spreads, no uncorrectable error occurs and the information communication system does not stop.
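
The distinction drawn above (a 1-bit error seen by one circuit is correctable, a multi-bit error seen by one circuit is uncorrectable, and several circuits each seeing a 1-bit error do not stop the system) can be illustrated with a small simulation. This is only a sketch: a real ECC circuit works from check bits and never sees the expected value, so comparing against a known-good word here is a simplifying assumption, and the names `ErrorType`, `classify`, and `system_must_stop` are hypothetical.

```python
from enum import Enum


class ErrorType(Enum):
    NONE = "none"
    CE = "correctable"      # 1-bit error in one circuit's data
    UE = "uncorrectable"    # multi-bit error in one circuit's data


def classify(observed: int, expected: int) -> ErrorType:
    """Classify one circuit's data word by the number of flipped bits.

    Assumption: the simulation knows the expected word; real ECC hardware
    derives the same information from check bits.
    """
    flipped = bin(observed ^ expected).count("1")
    if flipped == 0:
        return ErrorType.NONE
    return ErrorType.CE if flipped == 1 else ErrorType.UE


def system_must_stop(observations) -> bool:
    """True only if some single circuit sees a multi-bit error.

    `observations` is an iterable of (observed, expected) pairs, one per
    error detection/correction circuit.
    """
    return any(classify(o, e) is ErrorType.UE for o, e in observations)
```

Two circuits that each report a 1-bit error leave `system_must_stop` false, while a single circuit reporting two flipped bits makes it true.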

Next, consider replacing the hardware in question before an uncorrectable error occurs. When a correctable error occurs, the error information is recorded in the error log and an alarm is sent to the user (maintenance person) through a terminal; the user examines the error log and replaces the relevant package with a maintenance package. Hardware is replaced in units of packages, each carrying one or more LSIs.

A maintenance package is commonly manufactured in the same production lot as the package in use, so the two normally share the same manufacturing deviations. For example, if the wiring inside the LSI of the package in use was manufactured narrower than its design value, the maintenance package has the same manufacturing quality. The maintenance package is unused, however, so no hardware destruction has occurred in it yet. After correctable errors have occurred a considerable number of times, the hardware damage spreads to the surrounding area and finally becomes an uncorrectable error (because the package passed the manufacturing test, the damage does not spread until considerable time has passed and correctable errors have occurred a considerable number of times). Because the maintenance package and the package that was in use have the same manufacturing quality when unused, it is not unusual for the two to experience the same number of correctable errors before an uncorrectable error occurs.

<When correctable errors occur at multiple places>
Next, the processing of the information communication system when correctable errors occur at multiple places is described.

Error detection circuits are placed at many points in the information communication system to check the data of its main circuits. A failure generally occurs at one place in the information communication system and rarely occurs at several places simultaneously. It is likewise rare for correctable errors to occur at several places simultaneously. Even when they do, each piece of erroneous data is corrected individually, so no incorrect processing is performed; the failure may, however, spread and turn into an uncorrectable error.

An example of a conventional failure processing system is described in Patent Document 1. This conventional system performs failure processing when the number of correctable errors reaches a preset count within a fixed time. The user must decide, from past experience, what count is appropriate and input it into the failure processing system in advance.
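
For contrast with the invention, the conventional behaviour described here can be sketched as a counter over a time window with a user-supplied limit. The internals of the system in Patent Document 1 are not given in this text, so the sliding-window interpretation of "within a fixed time" and the names `FixedThresholdMonitor`, `user_threshold`, and `window_seconds` are assumptions made for illustration.

```python
from collections import deque


class FixedThresholdMonitor:
    """Conventional scheme (sketch): the user must choose the threshold."""

    def __init__(self, user_threshold: int, window_seconds: float):
        self.user_threshold = user_threshold    # entered by the user from experience
        self.window_seconds = window_seconds
        self._times = deque()                   # timestamps of recent correctable errors

    def report_ce(self, now: float) -> bool:
        """Record a correctable error; return True when failure processing is due."""
        self._times.append(now)
        # Drop errors that fall outside the window (assumed sliding window).
        while self._times and now - self._times[0] > self.window_seconds:
            self._times.popleft()
        return len(self._times) >= self.user_threshold
```

The point of the sketch is the constructor argument: nothing in this scheme derives `user_threshold` automatically.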

Patent Document 1: Japanese Patent Laid-Open No. 2-135533 (page 2, FIG. 2)

The conventional failure processing system described above has no means of automatically setting the number of correctable errors tolerated before failure processing is performed, so the user must take the trouble to determine and input it.

An object of the present invention is to provide a failure processing system, a failure processing method, a failure processing device, and a program that solve the conventional problem described above, namely that the user must take the trouble to determine and input the number of correctable errors tolerated before failure processing is performed.

A first failure processing system of the present invention comprises an information communication system and a failure processing device,
wherein the failure processing device has: means for counting the number of correctable errors that occur at a specific location in the information communication system before an uncorrectable error occurs there;
setting means for setting a threshold based on the count value; and
notification means for notifying the information communication system when, after the specific location has been repaired, the number of correctable errors occurring at the specific location matches the threshold.

In a second failure processing system of the present invention, in the first failure processing system, the setting means sets the threshold to the count value minus 1.
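
The counting, threshold setting, and notification described for the first and second systems can be sketched for a single location as follows. This is a minimal illustration rather than the claimed implementation: the class name `LearnedThresholdMonitor`, the `margin` parameter (1 in the second system), the choice to learn the threshold only once, and the rule that a non-positive result is ignored are all assumptions.

```python
from typing import Optional


class LearnedThresholdMonitor:
    """Learns the CE threshold for one location from the run that ended in a UE."""

    def __init__(self, margin: int = 1):
        self.margin = margin                    # second system: subtract 1 from the count
        self.ce_count = 0                       # CEs seen since the last UE / repair
        self.threshold: Optional[int] = None    # unknown until the first UE

    def on_uncorrectable_error(self) -> None:
        """Set the threshold from the count, then restart counting for the repaired part."""
        learned = self.ce_count - self.margin
        # Assumption: learn only once, and only if the result is a usable positive count.
        if self.threshold is None and learned > 0:
            self.threshold = learned
        self.ce_count = 0

    def on_correctable_error(self) -> bool:
        """Count a CE; return True when the information communication system should be notified."""
        self.ce_count += 1
        return self.threshold is not None and self.ce_count == self.threshold
```

If five correctable errors precede the first uncorrectable error, the threshold becomes 4, so after the repair the notification fires on the fourth correctable error, one before the count at which the previous part failed.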

In a third failure processing system of the present invention, in the first or second failure processing system, the notification is a system stop instruction to the information communication system.

In a fourth failure processing system of the present invention, in the first, second, or third failure processing system, the notification means issues a system stop warning to the information communication system when correctable errors occur at a plurality of locations in the information communication system.
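
The stop instruction of the third system and the stop warning of the fourth system can be combined into one decision function, sketched below. The text does not say which takes precedence when both conditions hold, so giving priority to the stop instruction is an assumption, as are the names `Action` and `decide` and the use of string location identifiers.

```python
from enum import Enum
from typing import Dict


class Action(Enum):
    NONE = 0
    WARN = 1    # system stop warning (CEs at a plurality of locations)
    STOP = 2    # system stop instruction (CE count matched the threshold)


def decide(ce_counts: Dict[str, int], thresholds: Dict[str, int]) -> Action:
    """Choose the notification for the current per-location CE counts.

    `thresholds` holds only locations whose threshold has already been set.
    """
    for location, count in ce_counts.items():
        if count > 0 and thresholds.get(location) == count:
            return Action.STOP      # assumed to take priority over a warning
    locations_with_ce = [loc for loc, count in ce_counts.items() if count > 0]
    return Action.WARN if len(locations_with_ce) >= 2 else Action.NONE
```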

In a fifth failure processing system of the present invention, in the first, second, third, or fourth failure processing system, the failure processing device has failure report generation means, table storage means, and penalty generation means;
the failure report generation means assigns each failure that has occurred in the information communication system a failure level according to its importance and reports it to the penalty generation means;
the table storage means stores a failure point calculation table that defines the correspondence between failure levels and failure points, and a penalty amount calculation table that defines the correspondence between failure point totals and penalty amounts; and
the penalty generation means calculates the penalty amount for the failures that have occurred, based on the failure report obtained from the failure report generation means and on the failure point calculation table and the penalty amount calculation table read from the table storage means.
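
The two-table lookup of the fifth system can be sketched as follows. The table contents are not given in this part of the text, so every level, point value, and amount below is invented for illustration, and treating the penalty amount calculation table as a set of bands keyed by minimum point total is likewise an assumption.

```python
# Failure point calculation table: failure level -> failure points (hypothetical values).
FAILURE_POINTS = {1: 1, 2: 5, 3: 20}

# Penalty amount calculation table: (minimum failure point total, penalty amount),
# in ascending order (hypothetical values).
PENALTY_BANDS = [(0, 0), (10, 100_000), (30, 500_000)]


def penalty_amount(failure_levels):
    """Return (failure point total, penalty amount) for the reported failure levels."""
    total = sum(FAILURE_POINTS[level] for level in failure_levels)
    amount = 0
    for minimum_total, band_amount in PENALTY_BANDS:
        if total >= minimum_total:
            amount = band_amount    # keep the highest band whose minimum is reached
    return total, amount
```

With these invented tables, two level-2 failures total 10 points and fall into the second band.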

A first failure processing method of the present invention comprises: a step in which a failure processing device counts the number of correctable errors that occur at a specific location in an information communication system before an uncorrectable error occurs there;
a step in which the failure processing device sets a threshold based on the count value; and
a step in which the failure processing device notifies the information communication system when, after the specific location has been repaired, the number of correctable errors occurring at the specific location matches the threshold.

In a second failure processing method of the present invention, in the first failure processing method, the threshold is set to the count value minus 1.

In a third failure processing method of the present invention, in the first or second failure processing method, the notification is a system stop instruction to the information communication system.

A fourth failure processing method of the present invention, in the first, second, or third failure processing method, comprises a step in which the failure processing device issues a system stop warning to the information communication system when correctable errors occur at a plurality of locations in the information communication system.

A fifth failure processing method of the present invention, in the first, second, third, or fourth failure processing method, comprises: a step in which the failure processing device generates a failure report by assigning each failure that has occurred in the information communication system a failure level according to its importance;
a step in which the failure processing device stores, in table storage means, a failure point calculation table that defines the correspondence between failure levels and failure points, and a penalty amount calculation table that defines the correspondence between failure point totals and penalty amounts; and
a step in which the failure processing device calculates the penalty amount for the failures that have occurred, based on the failure report and on the failure point calculation table and the penalty amount calculation table read from the table storage means.

A first failure processing device of the present invention is a failure processing device connected to an information communication system, and has:
means for counting the number of correctable errors that occur at a specific location in the information communication system before an uncorrectable error occurs there;
setting means for setting a threshold based on the count value; and
notification means for notifying the information communication system when, after the specific location has been repaired, the number of correctable errors occurring at the specific location matches the threshold.

In a second failure processing device of the present invention, in the first failure processing device, the setting means sets the threshold to the count value minus 1.

In a third failure processing device of the present invention, in the first or second failure processing device, the notification is a system stop instruction to the information communication system.

In a fourth failure processing device of the present invention, in the first, second, or third failure processing device, the notification means issues a system stop warning to the information communication system when correctable errors occur at a plurality of locations in the information communication system.

A fifth failure processing device of the present invention, in the first, second, third, or fourth failure processing device, has failure report generation means, table storage means, and penalty generation means;
the failure report generation means assigns each failure that has occurred in the information communication system a failure level according to its importance and reports it to the penalty generation means;
the table storage means stores a failure point calculation table that defines the correspondence between failure levels and failure points, and a penalty amount calculation table that defines the correspondence between failure point totals and penalty amounts; and
the penalty generation means calculates the penalty amount for the failures that have occurred, based on the failure report obtained from the failure report generation means and on the failure point calculation table and the penalty amount calculation table read from the table storage means.

A first program of the present invention causes a computer to execute: a process of counting the number of correctable errors that occur at a specific location in an information communication system before an uncorrectable error occurs there;
a process of setting a threshold based on the count value; and
a process of notifying the information communication system when, after the specific location has been repaired, the number of correctable errors occurring at the specific location matches the threshold.

In a second program of the present invention, in the first program, the threshold is set to the count value minus 1.

In a third program of the present invention, in the first or second program, the notification is a system stop instruction to the information communication system.

A fourth program of the present invention, in the first, second, or third program, causes the computer to execute a process of issuing a system stop warning to the information communication system when correctable errors occur at a plurality of locations in the information communication system.

A fifth program of the present invention, in the first, second, third, or fourth program, causes the computer to execute: a process of generating a failure report by assigning each failure that has occurred in the information communication system a failure level according to its importance;
a process of storing, in table storage means, a failure point calculation table that defines the correspondence between failure levels and failure points, and a penalty amount calculation table that defines the correspondence between failure point totals and penalty amounts; and
a process of calculating the penalty amount for the failures that have occurred, based on the failure report and on the failure point calculation table and the penalty amount calculation table read from the table storage means.

By storing, as the threshold for the number of correctable errors to be tolerated next time, a value obtained by subtracting a fixed number from the number of correctable errors that occurred before an uncorrectable error occurred, the present invention eliminates the need for the user to determine and input the tolerated number of correctable errors.

The best mode for carrying out the present invention is described in detail with reference to the drawings. FIG. 1 is a block diagram showing the overall configuration of a first embodiment of the present invention. Referring to FIG. 1, the first embodiment includes a target system 10, an information communication network 20, and a failure processing device 30.

The target system 10 is the computer system subject to failure processing and includes an information communication system 11 and an information communication system 12. The information communication systems 11 and 12 include error detection circuits 110 and 120, respectively. The error detection circuits 110 and 120 report the location at which a failure was detected and the type of failure (correctable error or uncorrectable error). The target system 10 contains two information communication systems in FIG. 1, but the number is not limited to two; it may be three or more, or one.

The information communication network 20 is a communication line that connects the target system 10 and the failure processing device 30.

The failure processing device 30 receives failure information reported from the information communication systems 11 and 12 and performs processing corresponding to the received failure information. It includes storage means 310, system stop notification means 360, and failure report generation means 370. The failure processing device 30 also includes write/read control means 321 and other elements, which are shown in detail in FIG. 2 and omitted from FIG. 1.

The storage means 310 stores the failure information reported from the information communication systems 11 and 12, which are the target systems. The storage means 310 uses memory whose contents are not lost on power-off or restart, for example an SRAM (Static Random Access Memory).

The system stop notification means 360 analyzes the failure information read from the storage means 310 and sends a system stop warning or a system stop instruction to the information communication systems 11 and 12, which are the target systems.

The failure report generation means 370 analyzes the importance and other attributes of a failure based on the contents read from the storage means 310 and determines a failure level according to the importance.

FIG. 2 is a detailed configuration diagram of the failure processing device 30. Referring to FIG. 2, the failure processing device 30 includes storage means 310, write/read control means 321, address selection means 322, an AND circuit 325, count circuits 331, 332, 334, and 335, selection means 341, 342, 343, 344, and 345, non-zero detection means 351, comparison means 352, an inverting circuit 353, detection means 354 and 355, system stop notification means 360, and failure report generation means 370.

The storage means 310 consists of a plurality of words, is divided into five segments, and stores failure information transmitted from the information communication systems 11 and 12. The failure information stored here consists of correctable errors and uncorrectable errors. Each word corresponds to a fault location in the information communication systems 11 and 12 (a location where the error detection circuits 110 and 120 detected a fault), so the location where a fault occurred can be identified from which word holds a given value. The five segments of the storage means 310 are the CE occurrence count section 311, the CE threshold section 312, the CE threshold valid bit 313, the cumulative CE count section 314, and the cumulative UE count section 315. They store, respectively, the number of correctable error occurrences, the threshold indicating the allowable number of correctable error occurrences, a bit indicating that the threshold is valid, the cumulative number of correctable error occurrences, and the cumulative number of uncorrectable error occurrences.
The CE occurrence count section 311 holds the number of correctable error occurrences until an uncorrectable error occurs. When an uncorrectable error occurs, the contents of the CE occurrence count section 311 are stored in the CE threshold section 312 and then cleared to 0.
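The layout of the storage means 310 can be pictured with a minimal Python sketch. The class name `Word`, its field names, and the helper `make_storage` are hypothetical names chosen for illustration; the source specifies only the five segments and the one-word-per-fault-location arrangement.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One word of the storage means 310, i.e. one fault location (illustrative names)."""
    ce_count: int = 0            # CE occurrence count section 311
    ce_threshold: int = 0        # CE threshold section 312
    ce_threshold_valid: int = 0  # CE threshold valid bit 313
    ce_total: int = 0            # cumulative CE count section 314
    ue_total: int = 0            # cumulative UE count section 315


def make_storage(num_locations: int) -> list:
    """Create one zero-initialized word per fault location."""
    return [Word() for _ in range(num_locations)]
```

Indexing the returned list by the fault location value mirrors the one-to-one correspondence between fault locations and words.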

In addition to the above, each of the five segments of the storage means 310 may also be configured to store the times at which correctable errors and uncorrectable errors occurred.

The write/read control means 321 controls writing to and reading from each segment of the storage means 310. Write instructions are issued as follows. When a correctable error is reported, it issues write instructions to the CE occurrence count section 311 and the cumulative CE count section 314. When an uncorrectable error is reported while the value of the CE occurrence count section 311 is non-zero and the CE threshold valid bit 313 is 0, it issues write instructions to the CE threshold section 312 and the CE threshold valid bit 313. When an uncorrectable error is reported, it issues a write instruction to the cumulative UE count section 315. When the output of the comparison means 352 is a predetermined value, it issues write instructions to the CE occurrence count section 311 and the CE threshold valid bit 313 (clearing them to 0). At initialization, it issues write instructions to the CE occurrence count section 311, the CE threshold valid bit 313, the cumulative CE count section 314, and the cumulative UE count section 315 (clearing them to 0).

The address selection means 322 selects a word of the storage means 310 according to the fault location information that the information communication systems 11 and 12 transmit when reporting a failure. The value of the fault location information directly designates a word of the storage means 310. That is, fault locations in the information communication systems 11 and 12 correspond one-to-one to words of the storage means 310. The address selection means 322 can also select all words when the whole of the storage means 310 is scanned.

The count circuit 331 adds 1 to the value stored in the CE occurrence count section 311. The count circuit 332 subtracts 1 from the value stored in the CE occurrence count section 311. The count circuit 334 adds 1 to the value stored in the cumulative CE count section 314. The count circuit 335 adds 1 to the value stored in the cumulative UE count section 315.

The selection means 341 selects either the count circuit 331 or 0. The selection means 342 selects either the count circuit 332 or 1. The selection means 343 selects either the AND circuit 325 or 0. The selection means 344 selects either the count circuit 334 or 0. The selection means 345 selects either the count circuit 335 or 0.

The comparison means 352 compares the value stored in the CE occurrence count section 311 with the value stored in the CE threshold section 312 and reports the result to the failure report generation means 370. If the two values match, it issues a system stop instruction signal to the information communication systems 11 and 12 via the system stop notification means 360.

The detection means 354 reads each word of the cumulative CE count section 314, detects whether more than one word contains a value of 1 or more, and reports the result to the failure report generation means 370. If more than one word contains a value of 1 or more, it issues a system stop warning signal to the information communication systems 11 and 12 via the system stop notification means 360.

The detection means 355 reads each word of the cumulative UE count section 315, detects whether more than one word contains a value of 1 or more, and reports the result to the failure report generation means 370.

The detection means 356 detects the value of a word of the CE threshold valid bit 313.

The failure report generation means 370 determines the severity of a failure from the outputs of the comparison means 352 and the detection means 354 and 355, and assigns a failure level.
A failure report (level 1) indicates that a correctable error has occurred at one location in the information communication system 11 or 12 of the target system.
A failure report (level 2) indicates that correctable errors have occurred at multiple locations in the information communication system 11 or 12 of the target system.
A failure report (level 3) indicates that the number of correctable error occurrences in the information communication system 11 or 12 of the target system has matched the value (threshold) stored in the CE threshold section 312.
A failure report (level 4) indicates that an uncorrectable error has occurred at one location in the information communication system 11 or 12 of the target system.
A failure report (level 5) indicates that uncorrectable errors have occurred at multiple locations in the information communication system 11 or 12 of the target system.
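The five levels can be summarized as a small classification function. This is only a sketch: the names `FailureLevel` and `classify` and the three input parameters are assumptions made for illustration, and the precedence of level 3 over levels 1 and 2 follows the order of the checks in the flowchart of FIG. 3 described later.

```python
from enum import IntEnum


class FailureLevel(IntEnum):
    CE_SINGLE_LOCATION = 1     # correctable error at one location
    CE_MULTIPLE_LOCATIONS = 2  # correctable errors at multiple locations
    CE_THRESHOLD_MATCHED = 3   # CE count matched the stored threshold
    UE_SINGLE_LOCATION = 4     # uncorrectable error at one location
    UE_MULTIPLE_LOCATIONS = 5  # uncorrectable errors at multiple locations


def classify(uncorrectable: bool, threshold_matched: bool,
             affected_locations: int) -> FailureLevel:
    """Map the comparator and detector outputs to a failure level.

    affected_locations is the number of words holding a value of 1 or more
    in the cumulative CE (or UE) count section.
    """
    if affected_locations < 1:
        raise ValueError("at least one location must have recorded an error")
    multiple = affected_locations > 1
    if uncorrectable:
        return (FailureLevel.UE_MULTIPLE_LOCATIONS if multiple
                else FailureLevel.UE_SINGLE_LOCATION)
    if threshold_matched:
        return FailureLevel.CE_THRESHOLD_MATCHED
    return (FailureLevel.CE_MULTIPLE_LOCATIONS if multiple
            else FailureLevel.CE_SINGLE_LOCATION)
```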

The operation of the failure processing device 30 according to the embodiment of the present invention can be carried out under computer program control. That is, a program recorded on a recording medium is loaded into the failure processing device 30, or a program is loaded into the failure processing device 30 from a network, and the operations described below are executed.

Next, the operation of the first embodiment is described. FIG. 3 is a flowchart showing the operation of the first embodiment.

First, before processing begins, the write/read control means 321 and the address selection means 322 write 0 to every word of every segment of the storage means 310.

The write/read control means 321 determines whether a correctable error has been reported in the error report from the information communication systems 11 and 12 (step S1). If no correctable error has been reported, it next determines whether an uncorrectable error has been reported in the error report from the information communication systems 11 and 12 (step S2). If no uncorrectable error has been reported either, the process ends without performing any processing.

If an uncorrectable error has been reported in the error report from the information communication systems 11 and 12 (Yes in step S2), the write/read control means 321 reads the word of the cumulative UE count section 315 of the storage means 310 designated by the address selection means 322, adds 1 using the count circuit 335, and writes the result back to that word via the selection means 345 (step S3).

Subsequently, the write/read control means 321 uses the AND circuit 325 to determine whether all of the following hold: an uncorrectable error has been reported in the error report from the information communication systems 11 and 12; the value read from the CE occurrence count section 311 of the storage means 310 is not 0, that is, the output of the non-zero detection means 351 is 1; and the value read from the CE threshold valid bit 313 of the storage means 310 is 0, that is, the output of the inverting circuit 353 is 1 (step S4). If the output of the AND circuit 325 is 1 in step S4, the write/read control means 321 sets the word of the CE threshold valid bit 313 designated by the address selection means 322 to 1 (step S5). The count circuit 332 then subtracts 1 from the value read from the word of the CE occurrence count section 311 designated by the address selection means 322, and the result is written via the selection means 342 to the word of the CE threshold section 312 designated by the address selection means 322 (step S6).

Next, at the direction of the address selection means 322, the write/read control means 321 reads all words of the cumulative UE count section 315 of the storage means 310 and supplies them to the detection means 355. The detection means 355 determines whether the words read from the cumulative UE count section 315 include words containing a value of 1 or more (step S7). If the detection means 355 finds exactly one word containing a value of 1 or more, the failure report generation means 370 generates a level 4 failure report (step S8). If the detection means 355 finds in step S7 that more than one word contains a value of 1 or more, the failure report generation means 370 generates a level 5 failure report (step S9).
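The uncorrectable-error branch (steps S3 to S9) can be sketched in Python as follows. The dictionary keys and the function names `new_word` and `handle_uncorrectable` are hypothetical, and the `margin` parameter stands for the value subtracted in step S6 (1 in this embodiment). Clearing the CE count after it is copied to the threshold is not a numbered flowchart step. It is taken from the description of the CE occurrence count section 311, and placing it inside the S4 condition is an assumption.

```python
def new_word():
    """One zero-initialized word of the storage means 310 (illustrative keys)."""
    return {"ce_count": 0, "ce_threshold": 0, "ce_threshold_valid": 0,
            "ce_total": 0, "ue_total": 0}


def handle_uncorrectable(storage, loc, margin=1):
    """Process a UE reported at word index `loc`; return the failure level (4 or 5)."""
    word = storage[loc]
    word["ue_total"] += 1                                          # S3
    if word["ce_count"] != 0 and word["ce_threshold_valid"] == 0:  # S4
        word["ce_threshold_valid"] = 1                             # S5
        word["ce_threshold"] = word["ce_count"] - margin           # S6
        # Assumption: clear the count here, per the description of section 311.
        word["ce_count"] = 0
    affected = sum(1 for w in storage if w["ue_total"] >= 1)       # S7
    return 4 if affected == 1 else 5                               # S8 / S9
```

With five correctable errors already counted at a location, a UE at that location yields a level 4 report and leaves a threshold of 4 for use after the location is repaired.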

The write/read control means 321 determines whether a correctable error has been reported in the error report from the information communication systems 11 and 12. If a correctable error has been reported (Yes in step S1), it reads the word of the CE occurrence count section 311 of the storage means 310 designated by the address selection means 322, adds 1 using the count circuit 331, and writes the result back to that word via the selection means 341 (step S11). It likewise reads the word of the cumulative CE count section 314 of the storage means 310 designated by the address selection means 322, adds 1 using the count circuit 334, and writes the result back to that word via the selection means 344 (step S12).

Next, the write/read control means 321 uses the detection means 356 to check the value of the word of the CE threshold valid bit 313 designated by the address selection means 322 (step S13). If that value is 1, the comparison means 352 compares the value in the word of the CE occurrence count section 311 designated by the address selection means 322 with the value in the word of the CE threshold section 312 designated by the address selection means 322 (step S14), and determines whether the two match (step S15). A word value of 1 in step S13 indicates that the location represented by that word has been restored by replacement, repair, or the like.

If the value of the word of the CE threshold valid bit 313 designated by the address selection means 322 is 0 (No in step S13), or if the value in the word of the CE occurrence count section 311 designated by the address selection means 322 does not match the value in the word of the CE threshold section 312 designated by the address selection means 322 (No in step S15), the write/read control means 321, at the direction of the address selection means 322, reads all words of the cumulative CE count section 314 of the storage means 310 and supplies them to the detection means 354. The detection means 354 determines whether more than one word contains a value of 1 or more (step S16). If the detection means 354 finds exactly one word containing a value of 1 or more, the failure report generation means 370 generates a level 1 failure report (step S17). If the detection means 354 finds more than one such word, it sends a system stop warning signal to the system stop notification means 360, the system stop notification means 360 issues the system stop warning signal to the information communication systems 11 and 12 (step S18), and the failure report generation means 370 generates a level 2 failure report (step S19).

If the value in the word of the CE occurrence count section 311 designated by the address selection means 322 matches the value in the word of the CE threshold section 312 designated by the address selection means 322 (Yes in step S15), the write/read control means 321 clears to 0 the word of the CE threshold valid bit 313 designated by the address selection means 322 (step S20) and clears to 0 the word of the CE occurrence count section 311 designated by the address selection means 322 (step S21).

Subsequently, the comparison means 352 sends a system stop instruction signal to the system stop notification means 360, the system stop notification means 360 issues the system stop instruction signal to the information communication systems 11 and 12 (step S22), and the failure report generation means 370 generates a level 3 failure report (step S23).
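The correctable-error branch (steps S11 to S23) can be sketched in the same way. The dictionary keys, the function names, and the signal strings `"stop_warning"` and `"stop_instruction"` are hypothetical labels for illustration and are not defined in the source.

```python
def new_word():
    """One zero-initialized word of the storage means 310 (illustrative keys)."""
    return {"ce_count": 0, "ce_threshold": 0, "ce_threshold_valid": 0,
            "ce_total": 0, "ue_total": 0}


def handle_correctable(storage, loc):
    """Process a CE reported at word index `loc`; return (failure level, signal)."""
    word = storage[loc]
    word["ce_count"] += 1                                   # S11
    word["ce_total"] += 1                                   # S12
    if (word["ce_threshold_valid"] == 1                     # S13
            and word["ce_count"] == word["ce_threshold"]):  # S14, S15
        word["ce_threshold_valid"] = 0                      # S20
        word["ce_count"] = 0                                # S21
        return 3, "stop_instruction"                        # S22, S23
    affected = sum(1 for w in storage if w["ce_total"] >= 1)  # S16
    if affected > 1:
        return 2, "stop_warning"                            # S18, S19
    return 1, None                                          # S17
```

The threshold comparison runs before the multi-location check, so a location that reaches its threshold produces a level 3 report even when other locations have also recorded correctable errors.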

Here, the system stop warning signal and the system stop instruction signal issued to the information communication systems 11 and 12 are explained. The system stop notification means 360 issues a system stop warning signal when correctable errors have occurred at multiple locations at the same time, and issues a system stop instruction signal just before the number of correctable error occurrences would exceed the threshold. The system stop instruction signal indicates a more serious failure than the system stop warning signal. How these two signals are handled, however, is left to the judgment of the information communication systems 11 and 12 that constitute the target system 10.

In the above description, the threshold is the number of correctable errors that occurred before the uncorrectable error, minus 1. Consequently, provided the number of correctable errors that precede an uncorrectable error stays the same, there is still room for one more correctable error at the time the system stop instruction signal is issued. The value subtracted need not be 1. It may be set to 0 or to another value of 1 or more as appropriate.

Also, in the above description the threshold is determined from a single count value, but it may instead be determined from a plurality of count values.
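Both variations, a configurable subtracted value and a threshold derived from several counts, can be sketched together. The source does not say how several counts would be combined. Taking the smallest observed count is an assumption made here because it is the most conservative choice, and the function name `threshold_from_counts` is hypothetical.

```python
def threshold_from_counts(counts, margin=1):
    """Derive a CE threshold from one or more pre-UE correctable-error counts.

    counts: CE counts observed before past uncorrectable errors.
    margin: value subtracted from the count (1 in the first embodiment).
    Assumption: with several counts, the smallest one is used.
    """
    if not counts:
        raise ValueError("at least one count is required")
    if margin < 0:
        raise ValueError("margin must not be negative")
    return min(counts) - margin
```

With a single count of 8 and the default margin, the threshold is 7, which leaves room for one more correctable error when the stop instruction is issued.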

According to the first embodiment, the value obtained by subtracting a fixed number from the number of correctable errors that occurred before an uncorrectable error is stored as the threshold for the next allowable number of correctable errors. This spares the user the effort of determining and entering that allowable number.

In addition, when correctable errors occur at multiple locations at the same time, a system stop warning signal is issued to the target system, which makes it possible to remove the fault before an uncorrectable error occurs.

Next, a second embodiment of the present invention is described. When a supplier delivers an information communication system to a user, the supplier generally also performs system maintenance and operation support in some form. As noted earlier, when an information communication system is delivered under a PFI project, for example, the supplier is commissioned by the user to provide maintenance and operation support for up to 30 years.

The outsourcing contract concluded between the supplier and the user specifies the content of the maintenance and operation support work, the amount paid to the supplier for performing it, the contract period, and the penalty that applies when the work cannot be performed as specified, namely how the penalty is calculated and its amount.

Turning to how the penalty is calculated, the user carries out monitoring (a means of confirming that appropriate and reliable service is being provided in accordance with the contract) and decides whether to pay the agreed fee or to demand a penalty. During monitoring, the user checks the performance, operability, failure history, state of documentation, and so on relating to the operation of the information communication system, but some items cannot be monitored accurately unless the supplier provides accurate information. Take failure reporting as an example: there may have been warning signs of a failure that could have been remedied by acting before the failure occurred, yet if the supplier neglects to act and the failure occurs and stops the system, the user suffers a large loss. Unless the supplier reports such a situation, it is very difficult for the user to assess it independently, and this has been a drawback of conventional practice. The second embodiment is one approach to resolving this drawback.

FIG. 4 is a block diagram showing the overall configuration of the second embodiment. Referring to FIG. 4, the second embodiment includes penalty generation means 380 and table storage means 390 in addition to the configuration of the first embodiment (FIG. 1). The table storage means 390 stores a failure point calculation table 391 and a penalty amount calculation table 392.

FIG. 5 is a block diagram of the penalty processing and shows how the failure report generation means 370 and the penalty generation means 380 are connected. As described for the first embodiment with reference to FIG. 2, the failure report generation means 370 determines the severity of a failure from the outputs of the comparison means 352 and the detection means 354 and 355, and assigns a failure level.
A failure report (level 1) indicates that a correctable error has occurred at one location in the information communication system 11 or 12 of the target system.
A failure report (level 2) indicates that correctable errors have occurred at multiple locations in the information communication system 11 or 12 of the target system.
A failure report (level 3) indicates that the number of correctable error occurrences in the information communication system 11 or 12 of the target system has matched the value (threshold) stored in the CE threshold section 312.
A failure report (level 4) indicates that an uncorrectable error has occurred at one location in the information communication system 11 or 12 of the target system.
A failure report (level 5) indicates that uncorrectable errors have occurred at multiple locations in the information communication system 11 or 12 of the target system.

The penalty generation means 380 receives the failure report level from the failure report generation means 370 and generates the penalty amount by referring to the tables read from the table storage means 390.

FIG. 6 is a flowchart showing the operation of calculating the penalty amount. The penalty generation means 380 calculates the failure point total from the failure report level received from the failure report generation means 370 and the failure point calculation table 391 read from the table storage means 390 (step A1). FIG. 7 shows the contents of the failure point calculation table 391 and the penalty amount calculation table 392 and the relationship between the two tables. The failure point calculation table 391 is used to calculate the total of the failure points from the content of the failure reports, and the points per occurrence (the points for a failure that occurs once) are determined in advance for each kind of failure report. For example, the points per occurrence are set to 0 for a failure report (level 1), 5 for a failure report (level 3), and 10 for a failure report (level 4). The value n in the number-of-occurrences column is sent from the failure report generation means 370, but it is in fact the value held in the cumulative CE count section 314 and the cumulative UE count section 315 of the storage means 310 in FIG. 2, that is, the cumulative number of correctable error occurrences and the cumulative number of uncorrectable error occurrences, respectively.
The penalty generation means 380 multiplies the value in the points-per-occurrence column by the value n in the number-of-occurrences column to obtain the value in the failure point subtotal column, sums all entries in the failure point subtotal column, and writes the result as the failure point total.
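Step A1 amounts to a weighted sum, sketched below. The names `POINTS_PER_OCCURRENCE` and `failure_point_total` are hypothetical. The points for levels 1, 3, and 4 are the ones stated above, while the values for level 2 (3 points) and level 5 (15 points) are inferred from the worked example later in this section, where two level 2 reports give a subtotal of 6 and one level 5 report gives 15.

```python
# Points per occurrence for each failure report level.
# Levels 1, 3, 4 are stated in the text; levels 2 and 5 are inferred
# from the subtotals in the worked example (6 / 2 = 3 and 15 / 1 = 15).
POINTS_PER_OCCURRENCE = {1: 0, 2: 3, 3: 5, 4: 10, 5: 15}


def failure_point_total(occurrences):
    """Sum points-per-occurrence times occurrence count over all levels.

    occurrences: mapping of failure report level -> number of occurrences n.
    """
    return sum(POINTS_PER_OCCURRENCE[level] * n
               for level, n in occurrences.items())
```

For instance, `failure_point_total({2: 2, 5: 1})` evaluates to 6 + 15 = 21.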

The penalty generation means 380 calculates the penalty amount from the calculated failure point total and the penalty amount calculation table 392 read from the table storage means 390 (step A2). The penalty calculation is illustrated here with reference to FIG. 7, using hypothetical failure reports.
If a failure report (level 3) occurs once, the failure point total is 5 and the penalty amount is 0 yen.
If a failure report (level 3) occurs twice, the failure point total is 10 and the penalty amount is 1 million yen.
If a failure report (level 2) occurs twice and a failure report (level 5) occurs once, the failure point subtotal for the failure report (level 2) is 6 and the failure point subtotal for the failure report (level 5) is 15, so the failure point total is 21 and the penalty amount is 5 million yen.
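Step A2 is a lookup from the failure point total to a penalty amount. The text gives only three data points from the penalty amount calculation table 392 (5 points: 0 yen, 10 points: 1 million yen, 21 points: 5 million yen), so the bracket boundaries below are hypothetical values chosen to be consistent with those three points. They are not the contents of FIG. 7.

```python
# (minimum failure point total, penalty in yen), highest bracket first.
# Hypothetical boundaries: only the results for totals of 5, 10, and 21
# are given in the text.
PENALTY_BRACKETS = [(20, 5_000_000), (10, 1_000_000), (0, 0)]


def penalty_amount(total_points):
    """Return the penalty in yen for a failure point total."""
    for minimum, yen in PENALTY_BRACKETS:
        if total_points >= minimum:
            return yen
    raise ValueError("total_points must not be negative")
```

A real table would be read from the table storage means 390 rather than hard-coded.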

Because the second embodiment has means for calculating the penalty amount from the level and number of failure reports, the user can calculate the penalty amount properly and easily.

FIG. 1 is a block diagram showing the overall configuration of the first embodiment.
FIG. 2 is a detailed configuration diagram of the failure processing device.
FIG. 3 is a flowchart showing the operation of the first embodiment.
FIG. 4 is a block diagram showing the overall configuration of the second embodiment.
FIG. 5 is a block diagram of the penalty processing of the second embodiment.
FIG. 6 is a flowchart showing the penalty amount calculation of the second embodiment.
FIG. 7 is a diagram showing the contents of the failure point calculation table and the penalty amount calculation table and the relationship between the two tables.

Explanation of symbols

10 Target system
11, 12 Information communication system
20 Information communication network
30 Failure processing device
310 Storage means
311 CE occurrence count section
312 CE threshold section
313 CE threshold valid bit
314 Cumulative CE count section
315 Cumulative UE count section
321 Write/read control means
322 Address selection means
325 AND circuit
331, 332, 334, 335 Count circuit
341, 342, 343, 344, 345 Selection means
351 Non-zero detection means
352 Comparison means
353 Inverting circuit
354, 355, 356 Detection means
360 System stop notification means
370 Failure report generation means
380 Penalty generation means
390 Table storage means
391 Failure point calculation table
392 Penalty amount calculation table

Claims

A failure processing system comprising an information communication system and a failure processing device, wherein
the failure processing device comprises: means for counting the number of correctable errors that occur before an uncorrectable error occurs at a specific location in the information communication system;
setting means for setting a threshold based on the count value; and
notification means for notifying the information communication system when, after the specific location has been repaired, the number of correctable errors occurring at the specific location matches the threshold.

The failure processing system according to claim 1, wherein the setting means sets, as the threshold, a value obtained by subtracting 1 from the count value.

The failure processing system according to claim 1 or 2, wherein the notification is a system stop instruction to the information communication system.

The failure processing system according to claim 1, 2, or 3, wherein the notification means issues a system stop warning to the information communication system when correctable errors occur at multiple locations in the information communication system.

The failure processing system according to claim 1, 2, 3, or 4, wherein:
the failure processing device comprises failure report generation means, table storage means, and penalty generation means;
the failure report generation means assigns, to each failure that has occurred in the information communication system, a failure level according to its importance and reports the failure to the penalty generation means;
the table storage means stores a failure point calculation table that defines the correspondence between failure levels and failure points, and a penalty amount calculation table that defines the correspondence between failure point totals and penalty amounts; and
the penalty generation means calculates a penalty amount corresponding to the failures that occurred, based on the failure report obtained from the failure report generation means and on the failure point calculation table and the penalty amount calculation table read from the table storage means.

A failure processing method comprising:
a step in which a failure processing device counts the number of correctable errors that occur at a specific location in an information communication system before an uncorrectable error occurs at that location;
a step in which the failure processing device sets a threshold based on the count value; and
a step in which the failure processing device notifies the information communication system when, after the specific location has been repaired, the number of correctable errors occurring at the specific location matches the threshold.

The failure processing method according to claim 6, wherein the threshold is set to a value obtained by subtracting 1 from the count value.

The failure processing method according to claim 6 or 7, wherein the notification is a system stop instruction to the information communication system.

The failure processing method according to claim 6, 7, or 8, further comprising a step in which the failure processing device issues a system stop warning to the information communication system when correctable errors occur at a plurality of locations in the information communication system.

The failure processing method according to claim 6, 7, 8, or 9, further comprising:
a step in which the failure processing device generates a failure report by assigning, to each failure that has occurred in the information communication system, a failure level according to its importance;
a step in which the failure processing device stores, in table storage means, a failure point calculation table that defines the correspondence between failure levels and failure points, and a penalty amount calculation table that defines the correspondence between failure point totals and penalty amounts; and
a step in which the failure processing device calculates a penalty amount corresponding to the failures that occurred, based on the failure report and on the failure point calculation table and the penalty amount calculation table read from the table storage means.

A failure processing device connected to an information communication system, comprising:
means for counting the number of correctable errors that occur at a specific location in the information communication system before an uncorrectable error occurs at that location;
setting means for setting a threshold based on the count value; and
notification means for notifying the information communication system when, after the specific location has been repaired, the number of correctable errors occurring at the specific location matches the threshold.

The failure processing device according to claim 11, wherein the setting means sets, as the threshold, a value obtained by subtracting 1 from the count value.

The failure processing device according to claim 11 or 12, wherein the notification is a system stop instruction to the information communication system.

The failure processing device according to claim 11, 12, or 13, wherein the notification means issues a system stop warning to the information communication system when correctable errors occur at a plurality of locations in the information communication system.

The failure processing device according to claim 11, 12, 13, or 14, further comprising failure report generation means, table storage means, and penalty generation means, wherein:
the failure report generation means assigns, to each failure that has occurred in the information communication system, a failure level according to its importance and reports the failure to the penalty generation means;
the table storage means stores a failure point calculation table that defines the correspondence between failure levels and failure points, and a penalty amount calculation table that defines the correspondence between failure point totals and penalty amounts; and
the penalty generation means calculates a penalty amount corresponding to the failures that occurred, based on the failure report obtained from the failure report generation means and on the failure point calculation table and the penalty amount calculation table read from the table storage means.

A program causing a computer to execute:
a process of counting the number of correctable errors that occur at a specific location in an information communication system before an uncorrectable error occurs at that location;
a process of setting a threshold based on the count value; and
a process of notifying the information communication system when, after the specific location has been repaired, the number of correctable errors occurring at the specific location matches the threshold.

The program according to claim 16, wherein the threshold is set to a value obtained by subtracting 1 from the count value.

The program according to claim 16 or 17, wherein the notification is a system stop instruction to the information communication system.

The program according to claim 16, 17, or 18, further causing the computer to execute a process of issuing a system stop warning to the information communication system when correctable errors occur at a plurality of locations in the information communication system.

The program according to claim 16, 17, 18, or 19, further causing the computer to execute:
a process of generating a failure report by assigning, to each failure that has occurred in the information communication system, a failure level according to its importance;
a process of storing, in table storage means, a failure point calculation table that defines the correspondence between failure levels and failure points, and a penalty amount calculation table that defines the correspondence between failure point totals and penalty amounts; and
a process of calculating a penalty amount corresponding to the failures that occurred, based on the failure report and on the failure point calculation table and the penalty amount calculation table read from the table storage means.