JP3399996B2 - Information processing system

Info

Publication number: JP3399996B2
Application number: JP32668892A
Authority: JP
Inventors: 満弘溝口; 寿男樋口
Original assignee: Hitachi Ltd; Hitachi Electronics Services Co Ltd
Current assignee: Hitachi Ltd; Hitachi Electronics Services Co Ltd
Priority date: 1992-12-07
Filing date: 1992-12-07
Publication date: 2003-04-28
Anticipated expiration: 2018-04-28
Also published as: JPH06175939A

Description

[Detailed Description of the Invention]

[0001] [Field of Industrial Application] The present invention relates to an information processing system, and in particular to a technique that is effective when applied to preventive maintenance, covering both equipment and portable storage media, in an external storage subsystem of a large-scale computer system or the like.

[0002] [Prior Art] For a large-scale computer system, or a magnetic tape subsystem operating under such a system, it is known to monitor and manage the occurrence of failures centrally at a remote maintenance center or the like via a communication line, thereby reducing the number of maintenance personnel and speeding up recovery from failures.

[0003] In the maintenance management of such a system, preventive maintenance may be performed, for example, by running a failure support program that periodically (once a month) collects failure information and outputs it in edited form.

[0004] As another technique, a diagnostic method for an information processing system disclosed in Japanese Unexamined Patent Application Publication No. Sho 62-210549 is known. In this technique, a table indicating the control relationships among the devices constituting the system is provided, and diagnosis is started in order from the higher-level devices toward the lower-level side; when a device is found to have no abnormality, diagnosis continues with the next lower-level device, so that diagnosis can reach the target device.

[0005] [Problems to Be Solved by the Invention] With the former prior art, however, a skilled engineer must additionally judge, from the output for each individual device, whether maintenance work is needed, which makes timely preventive maintenance difficult.

[0006] With the latter prior art, diagnosis at the device level is reasonably effective, but no consideration is given to preventive maintenance of the entire subsystem including portable storage media such as magnetic tapes.

[0007] An object of the present invention is to provide an information processing system capable of timely preventive maintenance of the entire system, including not only the devices but also the portable storage media.

[0008] Another object of the present invention is to provide an information processing system capable of performing detailed failure analysis and preventive maintenance, such as pointing out a defective portable storage medium or the failed part of an input/output device, quickly and accurately.

[0009] The above and other objects and novel features of the present invention will become apparent from the description in this specification and the accompanying drawings.

[0010] [Means for Solving the Problems] A representative aspect of the invention disclosed in this application is briefly outlined below.
[0011] The present invention is an information processing system comprising: an input/output control device with a built-in service processor; input/output devices that are connected to the input/output control device and that record information on, and reproduce it from, portable storage media, the information being exchanged with a host device via the input/output control device; and communication means connecting the service processor to a maintenance center. The system further comprises error information management means that monitors error information on the recording/reproducing operations of the input/output devices and accumulates it in a format that includes identification information of each input/output device, identification information of each portable storage medium, and time-course information, that is, information under threshold management for each type of error over a predetermined period measured from a given reference time; and control logic that, based on the error information accumulated in the error information management means, and triggered by the occurrence of an error exceeding a given threshold in one of the input/output devices during operation of the information processing system, determines whether an error exceeding that threshold also occurs during operation of input/output devices other than that one, thereby discriminating whether an individual input/output device or an individual portable storage medium is the cause of the error, and identifying the defective input/output device or portable storage medium.

[0012] [Operation] According to the information processing system of the present invention described above, the error information management means accumulates error information in a format that includes the identification information of each input/output device, the identification information of each portable storage medium, and the time-course information described above. Therefore, if, for example, the cumulative number of error bytes for each recording and each reproducing operation is kept as the time-course information for each input/output device and each portable storage medium, then in a magnetic tape subsystem or the like, when a data-check error occurs while any magnetic tape medium is being processed on any magnetic tape deck, the control logic can execute a determination algorithm that checks, for example, whether errors have occurred consecutively on different magnetic tape media on that deck, or whether that magnetic tape medium has caused an error on a different deck. This makes it possible to quickly and accurately distinguish failures caused by the magnetic tape deck (including head contamination) from failures caused by the magnetic tape medium, and to identify the failing deck or medium, so that timely preventive maintenance of the entire system, including not only the devices but also the portable storage media, can be realized.

[0013] [Embodiment] An information processing system according to one embodiment of the present invention is described in detail below with reference to the drawings.

[0014] FIG. 1 is a conceptual diagram showing an example of the configuration of a remote maintenance support system including the information processing system of this embodiment.

[0015] In this embodiment, a magnetic tape subsystem is described as an example of the information processing system.
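The determination described in the operation section (errors on consecutive, different media on one deck versus the same medium failing on a different deck) can be illustrated with a minimal Python sketch. The record layout, the names `Unload` and `classify`, and the idea of keeping clean unloads in the same history are assumptions made for illustration; the specification does not prescribe them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unload:
    """One processed volume on one deck (hypothetical record layout)."""
    deck: str    # identification of the input/output device (tape deck)
    vol: str     # identification of the portable medium (tape volume)
    error: bool  # True if a data-check error exceeded its threshold


def classify(history, deck, vol):
    """Classify a new threshold-exceeding error on (deck, vol).

    Returns "media", "deck", or "undetermined".
    """
    # The same volume already failed on a different deck: suspect the medium.
    if any(e.error and e.vol == vol and e.deck != deck for e in history):
        return "media"
    # The volume processed just before on this deck also failed and was a
    # different volume: consecutive errors on different media, suspect the deck.
    on_deck = [e for e in history if e.deck == deck]
    if on_deck and on_deck[-1].error and on_deck[-1].vol != vol:
        return "deck"
    # First evidence only: keep the judgment pending.
    return "undetermined"
```

For instance, with a history holding one failed unload of volume `V1` on deck `MTU0`, a new error for `V1` on `MTU1` points at the medium, while a new error for `V2` on `MTU0` points at the deck.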
[0016] The magnetic tape controller 1 includes a plurality of main processors 4 (MP0, MP1), a service processor 2 (SVP), and a floppy disk drive 3 that stores the control program of the service processor 2 and failure information consisting of the various tables described later.

[0017] A plurality of magnetic tape decks 5 (MTU), which use portable magnetic tapes (VOL) as storage media, are connected under the magnetic tape controller 1, and a host computer 9 is connected on the upper side. Based on commands from the host computer 9, the magnetic tape controller 1 controls the operation of the magnetic tape decks 5, thereby recording (WR) data exchanged with the host computer 9 on the magnetic tape and reproducing (RD) it.

[0018] The service processor 2 is connected to a maintenance center 8 via a communication line 6. In a configuration that, as in this embodiment, includes multiple combinations of a magnetic tape controller 1 and its subordinate magnetic tape decks 5, the magnetic tape controllers 1 are connected to one another via, for example, an RS422 interface 7.

[0019] The service processor 2 logs (monitors and records) failure information generated in the magnetic tape decks 5, the magnetic tape controller 1, and so on, analyzes it and applies threshold management using the control logic described later, and automatically reports information such as the failed part or failed medium to the maintenance center 8, thereby realizing preventive maintenance.

[0020] An example of the operation of the magnetic tape subsystem of this embodiment is described in detail below.

[0021] FIGS. 2 and 3 are flowcharts showing an example of the operation of the control logic for failure determination in this embodiment, and FIGS. 4 to 8 show examples of the tables used to manage, accumulate, and discriminate the various kinds of failure information.

[0022] FIG. 4 shows the estimated VOL failure isolation table 51, one of which is provided for each magnetic tape deck 5. It logs a VOL on which a permanent data check (unrecoverable error) or a recovery data check (recoverable error) count-over occurred and after which the next VOL was processed without error. The VOL failure judgment is held pending for a certain period, and the table is used to determine, from the events during that period, whether the cause was a failure of the magnetic tape deck 5 or a VOL failure.

[0023] FIG. 5 shows the judgment pending table 52, one of which is provided for each magnetic tape deck 5. It logs VOLs whose error judgment is on hold.

[0024] FIG. 6 shows the error history table 53, which is provided in common for all magnetic tape decks 5 under the magnetic tape controller 1. It is used mainly for VOL failure judgment.

[0025] FIG. 7 shows the NGVOL table 54, which is provided in common for all magnetic tape decks 5 under the magnetic tape controller 1. It logs VOLs judged defective by the VOL failure judgment.

[0026] FIG. 8 shows the 2VOLNG table 55, one of which is provided for each magnetic tape deck 5. When errors occur consecutively on two VOLs on a particular magnetic tape deck 5 and that deck is judged defective, the two VOLs are logged.

[0027] When a VOL is unloaded from any magnetic tape deck 5, the service processor 2 receives statistical information such as the WR byte count, WR block count, RD byte count, RD block count, and the number of recovery data checks in RD/WR (step 10), and determines whether the recovery data checks exceed the per-VOL threshold (step 11). If they do, processing moves to step 24 of FIG. 3, described later.

[0028] If the threshold is not exceeded, process 12 is executed. First, it is determined whether an error occurred in the processing of the immediately preceding VOL on that magnetic tape deck 5 (step 12a). If so, the preceding VOL itself is presumed to be the source of the error, and that VOL is registered from the judgment pending table 52 into the estimated VOL failure isolation table 51 (step 12b).
It is then determined whether the number of VOLs registered in the estimated VOL failure isolation table 51 has exceeded a threshold (step 12c); if it has, the magnetic tape deck 5 is reported to the maintenance center 8 as defective (step 12e).

[0029] If it has not, the judgment pending table 52 is cleared (step 12d), after which the error counts and processed byte counts are accumulated and the error rate is analyzed (step 13).

[0030] In this embodiment, each error count is accumulated in one-day units for one month (30 days). The accumulated values are monitored to see whether the 1-day threshold (step 14), the 7-day threshold (step 19), or the 30-day threshold (step 21) has been exceeded. If the corresponding exemption-condition checks (steps 15, 16, 20, and 22) are satisfied, the magnetic tape deck 5 is judged defective, a head-cleaning instruction for that deck is displayed (step 17), an automatic report that corrective action is required is sent to the maintenance center 8 (step 18), and the processing for one VOL unload ends (step 23).

[0031] FIG. 3 shows an example of the processing that discriminates between a VOL failure and an MTU failure when, at step 11, a permanent data check or a recovery data check has occurred on the magnetic tape deck 5.

[0032] First, when it is found that a permanent data check or a recovery data check threshold-over has occurred (step 24), the error history table 53, which registers VOLs on which the step 24 error previously occurred on any of the magnetic tape decks 5 under the magnetic tape controller 1, is checked for the same VOL (step 28). If the VOL is found, errors have occurred twice in succession on the same VOL, so the VOL is judged defective and registered in the NGVOL table 54 (step 29).

[0033] If the same VOL is not in the error history table 53 at step 28, the judgment pending table 52 of that magnetic tape deck 5 is checked for a VOL (step 30). If one is present, errors have occurred consecutively on different VOLs on the same magnetic tape deck 5, so the deck is judged defective and reported to the maintenance center 8 (step 31). Because it is also conceivable that errors caused by defects in the VOLs themselves happened to occur consecutively, the VOLs are also registered in the 2VOLNG table 55 (step 32) for reference in VOL failure judgment.

[0034] If there is no VOL in the judgment pending table 52, this error is the first to occur on that magnetic tape deck 5, so the VOL is registered in the error history table 53 (step 33) and in the judgment pending table 52 (step 34) for use in VOL/MTU failure discrimination when later errors occur.

[0035] In this embodiment, an immediate-report flag is provided that specifies whether the maintenance center 8 should be contacted each time an error occurs, regardless of the result of the VOL failure and MTU failure judgments (when the flag is ON, every error is reported to the maintenance center 8). Whether the immediate-report flag is ON is checked (step 25), and if it is, the maintenance center 8 is notified (step 27).

[0036] The estimated VOL failure isolation table 51, the judgment pending table 52, the error history table 53, the NGVOL table 54, and the 2VOLNG table 55 can each be referenced at any time from the maintenance center 8 or the service processor 2, enabling measures such as early removal of defective VOLs.

[0037] Although the invention made by the present inventors has been described specifically on the basis of an embodiment, the invention is not limited to that embodiment and can, needless to say, be modified in various ways without departing from its gist.

[0038] For example, the portable storage medium is not limited to magnetic tape; the invention is widely applicable to information processing systems that use general portable storage media such as optical disks.

[0039] [Effects of the Invention] The effects obtained by a representative aspect of the invention disclosed in this application are briefly described below.
[0040] According to the information processing system of the present invention, control logic is provided that, based on the error information held by the error information management means, can distinguish input/output device errors, including those with time-dependent causes such as head contamination, from errors caused by defective media. By constantly monitoring the failure status of the system, timely preventive maintenance can be realized, which improves the reliability and maintainability of the information processing system.

[0041] Further, because the occurrence of unrecoverable and recoverable errors is monitored and tracked for each input/output device and the error rate is managed against thresholds, it is possible to identify which part of which input/output device (for example, in units of the RD circuit unit or the WR circuit unit) has failed, which shortens the time required for failure analysis and preventive maintenance.

[0042] In addition, by monitoring and tracking failure tendencies for each individual portable storage medium, a defective portable storage medium is identified immediately, which also shortens the time required for failure analysis and preventive maintenance.
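The threshold management of error counts referred to above is described in the embodiment as daily counts kept for 30 days and checked against 1-day, 7-day, and 30-day thresholds. A minimal Python sketch of that bookkeeping follows. The class name `DailyErrorWindow`, its methods, and the default limit values are hypothetical, and the exemption-condition checks of the embodiment are deliberately not modeled.

```python
from collections import deque


class DailyErrorWindow:
    """Per-deck error counts in one-day buckets, kept for `days` days."""

    def __init__(self, limits=None, days=30):
        # Window length in days -> tolerated error count.
        # The default values are placeholders, not taken from the specification.
        self.limits = limits or {1: 5, 7: 20, 30: 50}
        # The right-most bucket is "today"; old buckets fall off the left.
        self.counts = deque([0], maxlen=days)

    def record(self, n=1):
        """Add n errors to today's bucket."""
        self.counts[-1] += n

    def next_day(self):
        """Open a new daily bucket."""
        self.counts.append(0)

    def exceeded(self):
        """Return the window lengths (in days) whose limit is exceeded."""
        recent = list(self.counts)
        return [w for w, limit in sorted(self.limits.items())
                if sum(recent[-w:]) > limit]
```

A caller would invoke `record` as errors are reported at unload time, `next_day` once per day, and treat a non-empty result from `exceeded` as the point at which the exemption checks and any head-cleaning indication or report would be considered.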

[Brief Description of the Drawings]
FIG. 1 is a conceptual diagram showing an example of the configuration of a remote maintenance support system including an information processing system according to one embodiment of the present invention.
FIG. 2 is a flowchart showing an example of the operation of the control logic for failure determination in the information processing system according to one embodiment of the present invention.
FIG. 3 is a flowchart showing an example of the operation of the control logic for failure determination in the information processing system according to one embodiment of the present invention.
FIG. 4 is a conceptual diagram showing an example of the configuration of the estimated VOL failure isolation table, which is an example of the error information management means.
FIG. 5 is a conceptual diagram showing an example of the configuration of the judgment pending table, which is an example of the error information management means.
FIG. 6 is a conceptual diagram showing an example of the configuration of the error history table, which is an example of the error information management means.
FIG. 7 is a conceptual diagram showing an example of the configuration of the NGVOL table, which is an example of the error information management means.
FIG. 8 is a conceptual diagram showing an example of the configuration of the 2VOLNG table, which is an example of the error information management means.

[Explanation of Reference Signs]
1 Magnetic tape controller (input/output control device)
2 Service processor
3 Floppy disk drive (error information management means)
4 Main processor
5 Magnetic tape deck (input/output device)
6 Communication line (communication means)
7 RS422 interface
8 Maintenance center
9 Host computer (host device)
51 Estimated VOL failure isolation table (error information management means)
52 Judgment pending table (error information management means)
53 Error history table (error information management means)
54 NGVOL table (error information management means)
55 2VOLNG table (error information management means)
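The tables 51 to 55 listed above, and the way the embodiment uses them in steps 12a to 12e and steps 28 to 34, could be modeled roughly as in the following Python sketch. All class, field, and function names are hypothetical, the `limit` value is a placeholder, and reporting to the maintenance center is reduced to a returned string. The actual table layouts are those of FIGS. 4 to 8 and are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class DeckTables:
    """Tables kept separately for each magnetic tape deck."""
    estimated_vol_ng: list = field(default_factory=list)  # table 51
    pending: list = field(default_factory=list)           # table 52
    two_vol_ng: list = field(default_factory=list)        # table 55


@dataclass
class Subsystem:
    """Tables shared by all decks under one controller."""
    decks: dict = field(default_factory=dict)
    error_history: set = field(default_factory=set)       # table 53
    ng_vol: set = field(default_factory=set)              # table 54


def on_threshold_over(sub, deck_id, vol):
    """FIG. 3 path: a data check exceeded its threshold for vol on deck_id."""
    deck = sub.decks.setdefault(deck_id, DeckTables())
    if vol in sub.error_history:          # step 28: same VOL failed before
        sub.ng_vol.add(vol)               # step 29
        return "VOL defective"
    if deck.pending:                      # step 30: previous VOL on deck failed
        # step 32: log the two consecutive VOLs for later reference
        deck.two_vol_ng.extend([deck.pending[-1], vol])
        return "MTU defective"            # step 31
    sub.error_history.add(vol)            # step 33
    deck.pending.append(vol)              # step 34
    return "pending"


def on_clean_unload(sub, deck_id, limit=3):
    """FIG. 2 process 12: a VOL was unloaded without exceeding the threshold."""
    deck = sub.decks.setdefault(deck_id, DeckTables())
    if deck.pending:                                # step 12a
        deck.estimated_vol_ng.extend(deck.pending)  # step 12b
        if len(deck.estimated_vol_ng) > limit:      # step 12c
            return "MTU defective"                  # step 12e
        deck.pending.clear()                        # step 12d
    return "ok"
```

Keeping tables 53 and 54 at the controller level is what lets an error on one deck be compared against errors the same VOL caused on other decks, while the per-deck tables capture what happened to consecutive VOLs on a single deck.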

Continuation of front page
(56) References: JP-A-52-80754 (JP, A); JP-A-61-166637 (JP, A); JP-A-58-115560 (JP, A); JP-A-60-147848 (JP, A); JP-A-61-60156 (JP, A)

Claims

(57) [Claims]
[Claim 1] An information processing system comprising: an input/output control device with a built-in service processor; input/output devices that are connected to the input/output control device and that perform recording/reproducing operations on portable storage media for information exchanged with a host device via the input/output control device; and communication means connecting the service processor to a maintenance center, the information processing system being characterized by further comprising:
error information management means that monitors error information on the recording/reproducing operations of the input/output devices and accumulates it in a format that includes identification information of each input/output device, identification information of each portable storage medium, and time-course information, which is information under threshold management for each type of error over a predetermined period measured from a given reference time; and
control logic that, based on the error information accumulated in the error information management means, and triggered by the occurrence of an error exceeding a given threshold in one of the input/output devices during operation of the information processing system, determines whether an error exceeding that threshold also occurs during operation of input/output devices other than that one input/output device, thereby discriminating which of the individual input/output devices and portable storage media is the cause of the error, and identifying the defective input/output device or portable storage medium.