JPH09265435A

JPH09265435A - Storage device system

Info

Publication number: JPH09265435A
Application number: JP8072934A
Authority: JP
Inventors: Akira Yamamoto; 山本　　彰; Noboru Morishita; 昇森下; Yasuo Inoue; 靖雄井上; Yoshihiro Azumi; 義弘安積
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1996-03-27
Filing date: 1996-03-27
Publication date: 1997-10-07

Abstract

PROBLEM TO BE SOLVED: To improve storage efficiency without impairing reliability, and to improve reliability during maintenance processing. SOLUTION: Write data is stored in RAID 3 format (a layout in which a record is divided and the pieces are stored on a plurality of disk devices), and cache management information is stored in RAID 1 format (a layout, also called double writing, in which data with exactly the same contents is stored on two disk devices). When a disk controller 1305 is composed of a plurality of clusters 1311, the data storage format within each cluster is given redundancy. When maintenance is performed, storing new data in the cache memory 1308 under maintenance is inhibited, and the data held in that cache memory 1308 is written either to the cache memory 1308 of the other cluster 1311 or to a disk device 1304.

Description

Detailed Description of the Invention

[0001]

Technical Field of the Invention: The present invention relates to a storage system having a cache memory, and in particular to improving its performance and reliability.

[0002]

Description of the Related Art: The following paper by Patterson is a known example of a storage system.

[0003] D. Patterson, et al., "A Case for Redundant Arrays of Inexpensive Disks (RAID)," ACM SIGMOD Conference Proceedings, Chicago, IL, June 1-3, 1988, pp. 109-116. Patterson's paper discloses techniques for arranging data on a disk array.

[0004] A disk array is a mechanism for achieving high performance and high reliability in a disk system. For high performance, a disk array makes a plurality of physical disk devices appear to the processing device as a single logical disk device. For high reliability, redundant data for recovering the data when the disk device storing it fails is stored on a separate disk device.

[0005] Patterson's paper discloses several techniques for arranging redundant data on a disk array. The arrangement methods relevant to the present invention are described below.

[0006] The arrangement method in which data with exactly the same contents is stored on two disk devices is called RAID 1, or double writing.

[0007] The set of data that serves as the read/write unit when the processing device performs read/write processing with a logical disk device is called a record. In Patterson's paper, the data arrangement method in which a record is divided and stored across a plurality of disk devices is called RAID 3. In RAID 3, one piece of redundant data is created for every n division units.

[0008] By contrast, the data arrangement methods in which a record is stored on a single disk device without being divided are called RAID 4 and RAID 5. In RAID 4 and RAID 5, redundant data is created from a plurality of records.

[0009] RAID 1 stores the same data twice, so it requires twice the disk storage capacity. Reliability therefore improves, but so does cost.

[0010] RAID 3, RAID 4, and RAID 5, on the other hand, need only one piece of redundant data for a plurality of pieces of data, so the required storage capacity does not grow as much.

[0011] In RAID 3, a record is divided and recorded across a plurality of disk devices, so executing a read/write request from the processing device occupies a plurality of disk devices. Before data can be read from or written to a disk device, seek/search processing must be executed. The data read/write time shrinks in inverse proportion to the number of pieces into which the record is divided, but the seek/search time does not shrink. Consequently, for accesses that read or write a single record, RAID 3 performs far worse than operating each of its constituent disk devices independently. However, when the amount of data per read/write is large enough that the seek/search time becomes negligible, RAID 3 delivers sufficient performance. In addition, if dedicated hardware is provided, the disk array control program can control a RAID 3 group as a single disk device, which greatly simplifies that program.
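
The trade-off described in this paragraph can be illustrated with a little arithmetic. The following is only a sketch: the function name and all millisecond figures are hypothetical and are not taken from Patterson's paper or from this publication. It assumes the seek/search time is paid in full on every request while the transfer time is divided by the number of disks.

```python
def service_time_ms(seek_ms: float, transfer_ms: float, n_disks: int = 1) -> float:
    """Time to serve one request when the record is split over n_disks.

    The seek/search time is paid in full; only the transfer time is divided.
    """
    return seek_ms + transfer_ms / n_disks


# Hypothetical figures: 10 ms seek/search, 2 ms to transfer one small record.
single_disk = service_time_ms(10.0, 2.0)             # one independent disk
raid3_small = service_time_ms(10.0, 2.0, n_disks=4)  # record split four ways

# A large transfer (400 ms on one disk) is dominated by transfer time.
single_large = service_time_ms(10.0, 400.0)
raid3_large = service_time_ms(10.0, 400.0, n_disks=4)
```

With these assumed figures, splitting the small record four ways shortens the request only from 12 ms to 10.5 ms while tying up four disk devices, whereas the large transfer drops from 410 ms to 110 ms.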

[0012] In RAID 5, a record is stored on a single disk device, so reading one record occupies only one disk device. When a record is rewritten, however, the new value of the redundant data is created from the rewritten contents, the record's value before the update, and the previous value of the redundant data. This requires reading the record's pre-update value and the previous redundant data, then writing the record and the new redundant data, for a total of four disk read/write operations. Moreover, in RAID 4 and RAID 5 the disk array control software becomes complex, because it must be aware of the individual disk devices while presenting the disk array to the processing device as a single logical disk device.
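
The four disk operations counted above can be sketched as follows. This is a minimal illustration that assumes the redundant data is a byte-wise XOR parity, which is the usual choice but is not stated in this paragraph; `CountingDisk`, `updated_parity`, and `small_write` are hypothetical names introduced only for the example.

```python
class CountingDisk:
    """A one-block stand-in for a disk device that counts its I/O operations."""

    def __init__(self, block: bytes):
        self.block = block
        self.io_count = 0

    def read(self) -> bytes:
        self.io_count += 1
        return self.block

    def write(self, block: bytes) -> None:
        self.io_count += 1
        self.block = block


def updated_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """New parity = old parity XOR old data XOR new data (assumed XOR parity)."""
    return bytes(o ^ n ^ p for o, n, p in zip(old_data, new_data, old_parity))


def small_write(data_disk: CountingDisk, parity_disk: CountingDisk,
                new_data: bytes) -> int:
    """Rewrite one record and its parity; return the number of disk I/Os used."""
    before = data_disk.io_count + parity_disk.io_count
    old_data = data_disk.read()        # 1st I/O: value before the update
    old_parity = parity_disk.read()    # 2nd I/O: previous redundant data
    data_disk.write(new_data)          # 3rd I/O: the record itself
    parity_disk.write(updated_parity(old_data, new_data, old_parity))  # 4th I/O
    return data_disk.io_count + parity_disk.io_count - before
```

Note that the other data disks of the parity group are never touched, which is why a single-record update costs four operations regardless of how many disks the group contains.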

[0013] Japanese Unexamined Patent Publication No. Hei 7-230362 discloses a technique relating to a board disk, in which a plurality of disk devices are mounted on a single board. Maintenance is normally performed board by board. So that read/write requests from the processing device can still be accepted during maintenance, the disk array is configured across a plurality of boards.

[0014] In recent disk systems, as seen in Japanese Unexamined Patent Publication No. Hei 5-189314, the control device that controls a plurality of disk devices is equipped with a cache memory. When the control device accepts a write request from the processing device, it completes the request as soon as the write data has been written to the cache memory. The control device writes the cached write data to the disk device later. If the cache memory fails before then, data that the processing device regards as already written to the disk device is lost, which is a serious problem. For this reason, techniques have been disclosed that make the cache memory non-volatile, or that duplicate it so that write data is stored twice. In such cases, the management information of the cache memory is also made non-volatile or duplicated.

[0015] There is also strong demand for non-stop operation of recent disk systems. In particular, maintenance must be executable while read/write requests from the processing device continue to be accepted. To meet this demand, as seen in Japanese Unexamined Patent Publication No. Hei 7-306844, some disk systems are composed of two clusters: one cluster is blocked and maintained while the other cluster accepts read/write requests from the processing device.

[0016] In addition, the following techniques using a disk cache with write-after control have been disclosed as ways of improving disk system performance.

[0017] Japanese Unexamined Patent Publication No. Sho 55-157053 discloses a technique for speeding up write requests by using write-after processing in a control device having a disk cache. Specifically, the control device completes the write processing at the point where the write data received from the processing device has been written to the cache. The data received from the processing device and stored in the cache is written to the disk device later by the control device's write-after processing.

[0018] Japanese Unexamined Patent Publication No. Sho 59-135563 discloses a technique relating to a control device that speeds up write processing while guaranteeing high reliability.

[0019] In Japanese Unexamined Patent Publication No. Sho 59-135563, the control device has a non-volatile memory in addition to the cache memory, and stores the write data received from the processing device in both the cache memory and the non-volatile memory. The control device writes the write data to the disk device by write-after processing. This makes write-after processing more reliable.

[0020] Japanese Unexamined Patent Publication No. Sho 60-114947 discloses a technique relating to a control device that has a disk cache and controls double-writing disk devices.

[0021] In Japanese Unexamined Patent Publication No. Sho 60-114947, in response to a write request received from the processing device, the control device writes the write data received from the processing device to one of the disk devices and to the cache memory. The control device later writes the write data stored in the cache memory to the other disk device, asynchronously with read/write requests from the processing device. This operation, in which the control device writes cached write data to a disk device later and asynchronously with read/write requests from the processing device, is called write-after processing.

[0022] Japanese Unexamined Patent Publication No. Hei 2-37418 discloses a technique for improving the performance of double-writing disk devices by using a disk cache.

[0023] Japanese Unexamined Patent Publication No. Hei 3-37746 discloses a management data structure for write-after data in a disk cache, intended to let a control device that has a disk cache and performs write-after processing execute that processing efficiently.

[0024] The following techniques relating to disk arrays have also been disclosed.

[0025] Japanese Unexamined Patent Publication No. Hei 4-302020 discloses a technique relating to a disk array that uses disk devices permitting the storage of variable-length records.

[0026] Japanese Unexamined Patent Publication No. Hei 5-46324 discloses a technique that distributes the function of creating the updated value of a parity record to the individual disk devices that make up the disk array.

[0027] Japanese Unexamined Patent Publication No. Hei 7-44326 discloses a technique for efficiently creating the updated value of a parity record by using a disk device with two actuators, or with a head that can execute read processing and write processing in parallel.

[0028]

Problems to Be Solved by the Invention: In Japanese Unexamined Patent Publication No. Hei 5-189314, write data is stored in duplicate in the cache memory, so twice the cache memory capacity is required. The cache memory management information is likewise duplicated, which doubles the storage capacity it requires.

[0029] In other words, the conventional techniques described above simply duplicate the write data and the cache management information, and therefore suffer from poor storage efficiency.

[0030] Furthermore, Japanese Unexamined Patent Publications No. Hei 5-189314 and No. Hei 7-306844 give no consideration to keeping write data redundant during maintenance processing, so write data and the like may be lost if a failure occurs during maintenance.

[0031] A first object of the present invention is to provide a storage system that can improve storage efficiency without impairing the reliability of write data and cache management information.

[0032] A second object is to provide a storage system that can also improve reliability during maintenance processing.

[0033]

Means for Solving the Problems: To achieve the first object, the present invention is characterized in that write data is stored in RAID 3 format and cache management information is stored in RAID 1 format.

[0034] Achieving the second object presupposes that the disk control device is composed of a plurality of clusters. In that case, the invention is characterized as follows. The data storage format within each cluster is given redundancy, and the data stored in the cache memories is not duplicated between clusters. When maintenance is performed, no new data is stored in the cache memory under maintenance, and the data it has held until then is written either to the cache memory of the other cluster or to the disk device.

[0035] As described above, the present invention stores write data in RAID 3 format in order to improve storage efficiency without impairing the reliability of write data and cache management information. Because the board is normally the unit of maintenance for cache memory, RAID 3 is configured across boards. On its own, however, this merely applies RAID 3 to the data storage format of the cache memory. Likewise, configuring RAID 3 across boards merely applies the board disk configuration method disclosed in Japanese Unexamined Patent Publication No. Hei 7-230362 to cache memory.

[0036] The present invention therefore stores write data in RAID 3 format and stores cache management information in RAID 1 format. Cache management information is updated in small units. If it were recorded in RAID 3 format, updates would often modify only part of the set of management information from which a piece of redundant data is created. Each such update would then require reads from the cache memory, degrading update performance. Recording the management information in RAID 1 format prevents this degradation. Moreover, the cache management information is sufficiently small compared with the size of the cache memory that storing only this portion in RAID 1 format does not reduce storage efficiency much.

[0037] Write data, in contrast, is updated by the processing device in units of records. The update unit is therefore large, and the write data can be divided among a plurality of cache memories and redundant data created for it. In addition, cache memory has no seek/search time, unlike a disk device, so dividing write data among a plurality of cache memories causes no performance problem. Storing write data in RAID 3 format and cache management information in RAID 1 format thus improves storage efficiency over the conventional techniques without greatly impairing reliability.
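
The storage efficiency argument can be made concrete with a small calculation. The sketch below is illustrative only: the function names, the capacity figures, and the choice of m = 4 divisions with one piece of redundant data are assumptions, not values given in this publication.

```python
def duplicated_capacity(write_data: float, mgmt_info: float) -> float:
    """Conventional scheme: write data and management information both doubled."""
    return 2 * write_data + 2 * mgmt_info


def proposed_capacity(write_data: float, mgmt_info: float,
                      m: int, n: int = 1) -> float:
    """Write data in RAID 3 format (m pieces plus n redundant pieces),
    management information in RAID 1 format (doubled)."""
    return write_data * (m + n) / m + 2 * mgmt_info


# Hypothetical sizes in megabytes.
conventional = duplicated_capacity(1024, 16)
proposed = proposed_capacity(1024, 16, m=4)
```

Under these assumptions the conventional scheme needs 2080 MB and the proposed scheme needs 1312 MB. The doubled management information contributes 32 MB in both cases, which is why keeping it in RAID 1 format costs little.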

[0038] Also as described above, to improve reliability during maintenance processing, the present invention gives redundancy to the data storage format within each cluster and does not duplicate the data stored in cache memory between clusters. When maintenance is performed, no new data is stored in the cache memory under maintenance, and the data it has held until then is written either to the cache memory of the other cluster or to the disk device. The data in the cache memory that is not under maintenance also has redundancy. This makes it possible to improve reliability during maintenance processing.

[0039]

Embodiments of the Invention: Embodiments of the present invention are described below with reference to the drawings.

[0040] The first embodiment is described first.

[0041] FIG. 1 is a block diagram showing an embodiment of a disk array control processor to which the present invention is applied, and FIG. 2 is a block diagram of a computer system that uses the disk array control processor of FIG. 1.

[0042] The computer system shown in FIG. 2 comprises a processing device 1300, a control device 1305, and one or more disk devices 1304.

[0043] The processing device 1300 comprises a CPU 1301, a main memory 1302, and a plurality of channels 1303.

[0044] The control device 1305 executes data transfer between the processing device 1300 and the disk devices 1304 in accordance with read/write requests from the processing device 1300, and includes two clusters 1311.

[0045] Each cluster 1311 includes one or more disk array control processors 1310, a cache memory 1308 (hereinafter simply "cache"), a bus 1312, a power supply 1313, and an alternate power supply 1314. The bus 1312 is also connected to the disk array control processors 1310 and the cache 1308 of the opposite cluster 1311.

[0046] The power supply 1313 supplies power to its cluster 1311, and the alternate power supply 1314 supplies power when the power supply 1313 fails. That is, the alternate power supply 1314 is the standby for the power supply 1313. In FIG. 2 the alternate power supply 1314 is the standby for the power supply 1313 of the same cluster 1311, but it may instead be the standby for the opposite cluster 1311.

[0047] The cache 1308 stores part of the data on the disk devices 1304, the management information for that data, the management information of the control device 1305, and so on. From the standpoint of reliability, it is desirable to make the cache non-volatile.

[0048] The disk array control processor 1310 receives read/write requests from the processing device 1300 within the control device 1305 and, using the cache 1308 and other resources, executes data transfer between the processing device 1300 and the disk devices 1304.

[0049] The disk array control processor 1310 consists of a read/write accepting unit 100, a write-after processing execution unit 101, a blocking transition processing execution unit 102, and a recovery processing execution unit 103. The operation of each unit is described later with reference to flowcharts.

[0050] The operation of the control device 1305 in this embodiment when it receives a read/write request from the processing device 1300 is briefly described below. The operations described are executed by the disk array control processor 1310 in the control device 1305.

[0051] First, the operation when the control device 1305 receives a read request from the processing device 1300 is described.

[0052] When the control device 1305 receives a read request from the processing device 1300 and the target record 500 (FIG. 5) is stored in the cache 1308, it transfers the record 500 in the cache 1308 to the processing device 1300 and completes the operation.

[0053] If the record 500 to be read is not stored in the cache 1308, the control device 1305 reads the record 500 from the disk device 1304 and transfers it to the cache 1308. It then transfers the record 500 from the cache 1308 to the processing device 1300 and completes the operation.

[0054] When the control device 1305 receives a write request from the processing device 1300, it completes the operation at the point where the write data to be written to the record 500 on the disk device 1304 has been stored in the cache 1308. The control device 1305 writes the write data to the disk device 1304 later. This processing is called write-after processing. When write-after processing is executed and the disk array is configured as RAID 3 or RAID 5, parity data corresponding to the record 500 is created and written to the disk device 1304. When the disk array is configured as RAID 1, the contents of the record 500 are written to two disk devices 1304.
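
The read and write handling described above can be summarized in a short sketch. It is a simplified model under stated assumptions, not the embodiment itself: the class and method names are hypothetical, a dictionary stands in for the disk device 1304, and the creation of parity data or the second RAID 1 copy during write-after processing is omitted.

```python
class Controller:
    """Toy model of the control device 1305 with a write-after cache."""

    def __init__(self, disk: dict):
        self.disk = disk      # record id -> bytes, stands in for disk device 1304
        self.cache = {}       # stands in for cache 1308
        self.dirty = set()    # records written to cache but not yet to disk

    def read(self, record_id: str) -> bytes:
        if record_id not in self.cache:
            # Cache miss: stage the record from disk into the cache first.
            self.cache[record_id] = self.disk[record_id]
        return self.cache[record_id]

    def write(self, record_id: str, data: bytes) -> None:
        # The request completes as soon as the data is in the cache.
        self.cache[record_id] = data
        self.dirty.add(record_id)

    def write_after(self) -> int:
        """Write all pending records to disk; return how many were written."""
        written = len(self.dirty)
        for record_id in sorted(self.dirty):
            self.disk[record_id] = self.cache[record_id]
        self.dirty.clear()
        return written
```

Between `write` and `write_after`, the only copy of the new data is in the cache, which is the window in which cache redundancy matters.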

[0055] FIG. 3 shows the logical structure of the cache 1308 in this embodiment. The cache 1308 consists of a management information storage area 300 and a disk data storage area 301. The management information storage area 300 stores the management information for the data of the disk devices 1304 held in the cache 1308, and the management information of the control device 1305. The disk data storage area 301 stores part of the data on the disk devices 1304.

[0056] FIG. 4 shows the physical structure of the cache 1308, which consists of a plurality of boards 400. The boards 400 are broadly divided into management information boards 401 and disk cache boards 402.

[0057] A management information board 401 is a board 400 corresponding to the management information storage area 300, and a disk cache board 402 is a board 400 corresponding to the disk data storage area 301. Failures are detected in units of boards 400.

[0058] FIG. 5 shows the format in which data is stored on the disk cache boards 402. The record 500 is the read/write unit of the disk device 1304. The record 500 is divided into m pieces (m ≥ 2), and each piece is stored as a divided record 501 on the disk cache board 402 corresponding to its division unit.

[0059] In addition, n pieces (n ≥ 1) of redundant data 502 are created from the m divided records 501. The m divided records 501 and the n pieces of redundant data 502 are each stored on a different one of m + n disk cache boards 402.
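
The division and redundancy described above can be sketched for the simplest case of n = 1. The code assumes the single piece of redundant data is a byte-wise XOR parity and that the record is zero-padded to a multiple of m; neither detail is specified in the text, and the function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def split_record(record: bytes, m: int) -> List[bytes]:
    """Split a record into m equal-length pieces, zero-padding the tail."""
    if m < 2:
        raise ValueError("m must be at least 2")
    piece_len = -(-len(record) // m)  # ceiling division
    padded = record.ljust(piece_len * m, b"\x00")
    return [padded[i * piece_len:(i + 1) * piece_len] for i in range(m)]


def xor_parity(pieces: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length pieces (assumed form of redundant data)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pieces))


def rebuild(boards: List[Optional[bytes]]) -> List[bytes]:
    """Restore the one missing entry (None) among m pieces plus one parity."""
    missing = [i for i, board in enumerate(boards) if board is None]
    if len(missing) > 1:
        raise ValueError("a single XOR parity tolerates only one failed board")
    restored = list(boards)
    if missing:
        survivors = [board for board in boards if board is not None]
        restored[missing[0]] = xor_parity(survivors)
    return restored
```

For example, a 15-byte record split with m = 4 yields four 4-byte pieces plus one parity piece, held on five boards; if any one board is lost, `rebuild` recovers its contents from the other four. Tolerating more than one failed board (n > 1) would need a stronger code than plain XOR, which this sketch does not attempt.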

[0060] In the first embodiment, however, it is sufficient for the storage format of the record 500 to have redundancy. For example, the contents of the record 500 may be stored in duplicate on the disk cache boards 402.

[0061] A single disk cache board 402 may store only divided records 501 or only redundant data 502. Alternatively, a single disk cache board 402 may store a mixture of divided records 501 and redundant data 502. When n pieces of redundant data 502 are stored, the contents of the record 500 can be restored even if n disk cache boards 402 fail.

[0062] FIG. 6 shows the format in which information is stored on the management information boards 401. The management information 600 is stored in duplicate on two management information boards 401.

[0063] Two management information boards 401 may be paired so that both hold exactly the same management information 600. Alternatively, each item of management information 600 may be stored on any two management information boards 401, in which case the overall contents of the individual management information boards 401 may differ. In the first embodiment, the management information 600 need not be stored in duplicate; it is sufficient for its storage format to have redundancy.

[0064] In this embodiment, when a board 400 fails and the storage of the management information 600 or the records 500 loses its redundancy, preprocessing for blocking the cluster 1311 begins.

[0065] That is, the control device 1305 attempts to stop using the cache 1308 that has lost redundancy. The cache 1308 cannot be blocked immediately, because it still holds write data that has not yet been written to the disk device 1304.

[0066] The preprocessing for blocking the cluster 1311 consists of writing to the disk device 1304 the write data that is stored in the cache 1308 to be blocked and has not yet been written to the disk device 1304, or copying that data to the cache 1308 of the opposite cluster 1311. Similarly, in this embodiment, when the cache 1308 is to be maintained, the entire cache 1308 of one cluster 1311 is maintained.

【００６７】以上より、本実施形態では、それぞれのキ
ャッシュ１３０８の冗長性がなくなった時、直ちに、冗
長性のあるキャッシュ１３０８を集中して使用する
ため、ライトアフタ処理を行っても信頼性を確保でき
る。As described above, in this embodiment, as soon as one of the caches 1308 loses its redundancy, use is concentrated on the cache 1308 that still has redundancy, so reliability can be ensured even when write-after processing is performed.

【００６８】同様に、一方のクラスタ１３１１のキャッ
シュ１３０８の保守を行う場合も、残りのクラスタ１３
１１のキャッシュ１３０８に冗長性があるため、ライト
アフタ処理を行っても信頼性を確保できる。Similarly, when the cache 1308 of one cluster 1311 is under maintenance, the cache 1308 of the remaining cluster 1311 still has redundancy, so reliability can be ensured even when write-after processing is performed.

【００６９】図７は、キャッシュ１３０８の状態の遷移
をまとめたものである。なお、本実施形態においては、
キャッシュ１３０８が２つあるため、この状態は、それ
ぞれのキャッシュ１３０８毎に管理するものとする。FIG. 7 summarizes the state transitions of the cache 1308. Because there are two caches 1308 in this embodiment, this state is managed separately for each cache 1308.

【００７０】図７において、使用可能状態９００は、当
該キャッシュ１３０８が使用可能であることを表す。使
用可能状態９００から当該キャッシュ１３０８のボード
４００の障害を検知したり、保守処理を開始しようとし
た時、当該キャッシュ１３０８を閉塞移行中状態９０１
にする。これは、当該キャッシュ１３０８を閉塞するた
めの前処理を実行中であることを表す。この処理が完了
すると、キャッシュ１３０８の状態は、閉塞状態９０２
になる。In FIG. 7, the usable state 900 indicates that the cache 1308 in question can be used. When, in the usable state 900, a failure of a board 400 of that cache 1308 is detected or maintenance processing is about to start, the cache 1308 is placed in the blocking-transition state 901. This state indicates that the preprocessing for blocking the cache 1308 is in progress. When that preprocessing completes, the cache 1308 enters the blocked state 902.

【００７１】その後、当該キャッシュ１３０８の保守が
完了したり、障害を起こしたボード４００の交換等が完
了すると、キャッシュ１３０８の状態は、回復状態９０
３になる。これは、当該キャッシュ１３０８を使用可能
状態９００にするための前処理の実行中である状態を表
す。Thereafter, when maintenance of the cache 1308 is completed, or replacement of the failed board 400 or the like is completed, the cache 1308 enters the recovering state 903. This state indicates that the preprocessing for returning the cache 1308 to the usable state 900 is in progress.
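
The four cache states of FIG. 7 and the transitions between them can be sketched as a small state table. This is only an illustrative sketch: the enum member names and the event names are assumptions made here, and only the numerals 900 to 903 come from the description.

```python
from enum import Enum


class CacheState(Enum):
    USABLE = 900               # usable state 900
    BLOCKING_TRANSITION = 901  # preprocessing for blocking is in progress
    BLOCKED = 902              # cache is blocked
    RECOVERING = 903           # preprocessing for returning to usable is in progress


# (current state, event) -> next state. Event names are hypothetical labels.
TRANSITIONS = {
    (CacheState.USABLE, "failure_or_maintenance_start"): CacheState.BLOCKING_TRANSITION,
    (CacheState.BLOCKING_TRANSITION, "blocking_preprocessing_done"): CacheState.BLOCKED,
    (CacheState.BLOCKED, "repair_or_maintenance_done"): CacheState.RECOVERING,
    (CacheState.RECOVERING, "recovery_preprocessing_done"): CacheState.USABLE,
}


def next_state(state, event):
    """Return the state that follows `state` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not allowed in state {state.name}")
```

Because the state is managed per cache, a controller built along these lines would hold one `CacheState` value for each of the two caches.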

【００７２】図８は、図３に示したディスクデータ格納
領域３０１の論理的な構造を示す図であり、ディスクデ
ータ格納領域３０１は複数のセグメント８００から構成
される。セグメント８００にはレコード５００が格納さ
れる。FIG. 8 is a diagram showing the logical structure of the disk data storage area 301 shown in FIG. 3, and the disk data storage area 301 is composed of a plurality of segments 800. A record 500 is stored in the segment 800.

【００７３】図９に、レコード５００とセグメント８０
０とディスクアレイボード４００の関係を示す。セグメ
ント８００は、ｍ＋ｎ個の分割セグメント９０１に分割
され、それぞれの分割セグメント９０１には、分割レコ
ード５０１と冗長データ５０２が格納される。FIG. 9 shows the relationship among the record 500, the segment 800, and the disk array boards 400. The segment 800 is divided into m + n divided segments 901, and the divided records 501 and the redundant data 502 are stored in the respective divided segments 901.
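
As a rough illustration of dividing a record across m + n divided segments, the sketch below assumes n = 1 and uses byte-wise XOR parity as the redundant data, in the manner of RAID 3. The description does not specify how the redundant data 502 is computed, so the parity scheme, the zero padding, and the function names are all assumptions.

```python
from functools import reduce
from typing import List


def _xor_columns(pieces: List[bytes]) -> bytes:
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pieces))


def split_record(record: bytes, m: int) -> List[bytes]:
    """Split a record into m equal-length pieces plus one XOR parity piece."""
    piece_len = -(-len(record) // m)                 # ceiling division
    padded = record.ljust(piece_len * m, b"\x00")    # zero padding (assumed)
    pieces = [padded[i * piece_len:(i + 1) * piece_len] for i in range(m)]
    return pieces + [_xor_columns(pieces)]


def rebuild_piece(pieces: List[bytes], missing: int) -> bytes:
    """Recover the piece at index `missing` from all surviving pieces."""
    survivors = [p for i, p in enumerate(pieces) if i != missing]
    return _xor_columns(survivors)
```

With one parity piece, the contents of any single failed board can be rebuilt from the others, which is the property that lets a cache built this way tolerate a single board failure.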

【００７４】図１０は、図３に示した管理情報格納領域
３００の論理的な構造を示す図であり、本実施形態にお
いて管理情報格納領域３００に含まれる情報は、自クラ
スタ内キャッシュ状態１０００、他クラスタ内キャッシ
ュ状態１００１、セグメント対応のセグメント管理情報
１００２、空きセグメントキューポインタ１００３であ
る。FIG. 10 is a diagram showing the logical structure of the management information storage area 300 shown in FIG. 3. In this embodiment, the management information storage area 300 contains the own-cluster cache state 1000, the other-cluster cache state 1001, the segment management information 1002 provided for each segment, and the free segment queue pointer 1003.

【００７５】自クラスタ内キャッシュ状態１０００、他
クラスタキャッシュ状態１００１は、自分のクラスタ１
３１１のキャッシュ１３０８、あるいは、他のクラスタ
１３１１のキャッシュ１３０８が、それぞれ、図７に示
した状態のうちのどの状態にあるかを示している。The own-cluster cache state 1000 and the other-cluster cache state 1001 indicate which of the states shown in FIG. 7 the cache 1308 of the own cluster 1311 and the cache 1308 of the other cluster 1311 are in, respectively.

【００７６】セグメント管理情報１００２は、図８のセ
グメント８００のそれぞれに対応して存在する管理情報
である。空きセグメントキューポインタ１００３は、図
１１に示すように、レコード５００を格納していないセ
グメント８００に対応するセグメント管理情報１００２
を結合した空きセグメントキュー１１００の先頭を示す
ポインタである。The segment management information 1002 is management information that exists for each of the segments 800 in FIG. 8. As shown in FIG. 11, the free segment queue pointer 1003 points to the head of the free segment queue 1100, which links together the segment management information 1002 of the segments 800 that hold no record 500.

【００７７】これらの自クラスタ内キャッシュ状態１０
００、他クラスタ内キャッシュ状態１００１、セグメン
ト管理情報１００２、空きセグメントキューポインタ１
００３のいずれの情報も２重化して２つの管理情報ボー
ド４０１に格納される。The own-cluster cache state 1000, the other-cluster cache state 1001, the segment management information 1002, and the free segment queue pointer 1003 are all duplicated and stored on two management information boards 401.

【００７８】図１２は、セグメント管理情報１００２の
詳細を示す図であり、セグメント管理情報１００２は、
セグメントポインタ１２００、使用中ビット１２０１、
ダーティビット１２０２、ディスクアドレス１２０３を
含んだ構成となっている。FIG. 12 is a diagram showing the details of the segment management information 1002. The segment management information 1002 includes a segment pointer 1200, an in-use bit 1201, a dirty bit 1202, and a disk address 1203.

【００７９】セグメントポインタ１２００は、空きセグ
メントキュー１１００に当該セグメント管理情報１００
２が空きセグメントキュー１１００につながれている時
に、次のセグメント管理情報１００２をポイントするた
めのものである。When the segment management information 1002 is linked into the free segment queue 1100, the segment pointer 1200 points to the next segment management information 1002 in the queue.

【００８０】使用中ビット１２０１は、ディスクアレイ
制御プロセッサ１３１０における各実行部が、当該セグ
メント管理情報１００２に対応するセグメント８００に
格納されたレコード５００のリード／ライト処理を実行
中にオンにする。すなわち、リード／ライト処理受付部
１００、ライトアフタ処理実行部１０１、閉塞移行処理
実行部１０２、回復処理実行部１０３などがセグメント
８００を使用する場合、本ビット１２０１をオンにす
る。他の実行部は、使用中ビット１２０１がオンのセグ
メント管理情報１００２を見出した場合、これがオフに
なるまで待つ。これにより、各処理の排他が可能にな
る。Each execution unit in the disk array control processor 1310 turns on the in-use bit 1201 while it is executing read/write processing on the record 500 stored in the segment 800 corresponding to the segment management information 1002. That is, when the read/write accepting unit 100, the write-after process execution unit 101, the blocking migration processing execution unit 102, the recovery processing execution unit 103, or the like uses the segment 800, it turns on this bit 1201. When another execution unit finds segment management information 1002 whose in-use bit 1201 is on, it waits until the bit is turned off. This provides mutual exclusion among the processes.
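
The wait-until-off behavior of the in-use bit can be sketched with a condition variable. This is a software analogy under stated assumptions, not the mechanism of the control processor: the class name `InUseBit` and the use of Python threads are illustrative only.

```python
import threading


class InUseBit:
    """A flag that one execution unit turns on while it works on a segment."""

    def __init__(self):
        self._cond = threading.Condition()
        self._on = False

    def acquire(self):
        """Wait until the bit is off, then turn it on."""
        with self._cond:
            while self._on:
                self._cond.wait()
            self._on = True

    def release(self):
        """Turn the bit off and wake one waiting execution unit."""
        with self._cond:
            self._on = False
            self._cond.notify()
```

Testing and setting the flag under one lock matters here: if an execution unit could observe the bit as off and set it in two separate steps, two units could both believe they own the segment.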

【００８１】ダーティビット１２０２は、当該セグメン
ト管理情報１００２に対応するセグメント８００に格納
されたレコード５００の内容が未だディスク装置１３０
４に書き込まれていない、すなわち、ライトアフタ処理
を行う必要があるレコード５００であることを示す。The dirty bit 1202 indicates that the contents of the record 500 stored in the segment 800 corresponding to the segment management information 1002 have not yet been written to the disk device 1304, that is, that the record 500 requires write-after processing.

【００８２】ディスクアドレス１２０３は、当該セグメ
ント管理情報１００２に対応するセグメント８００に格
納されたレコード５００が、どのディスク装置１３０４
で、かつ、そのディスク装置１３０４内のどのアドレス
であるかを表すものである。The disk address 1203 indicates which disk device 1304, and which address within that disk device 1304, the record 500 stored in the segment 800 corresponding to the segment management information 1002 belongs to.
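
The four fields of the segment management information and the free segment queue can be sketched as a singly linked free list. The field and method names below are assumptions chosen for illustration; the comments note which numeral of FIG. 12 each field is meant to mirror.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SegmentInfo:
    """One entry of segment management information (names are assumptions)."""
    next_free: Optional["SegmentInfo"] = None        # segment pointer 1200
    in_use: bool = False                             # in-use bit 1201
    dirty: bool = False                              # dirty bit 1202
    disk_address: Optional[Tuple[int, int]] = None   # disk address 1203: (device, address)


class FreeSegmentQueue:
    """Free list of SegmentInfo; `head` plays the role of queue pointer 1003."""

    def __init__(self, entries):
        self.head = None
        for entry in reversed(list(entries)):
            self.release(entry)

    def allocate(self):
        """Unlink and return the first free entry, or None if none is free."""
        entry = self.head
        if entry is None:
            return None
        self.head = entry.next_free
        entry.next_free = None
        return entry

    def release(self, entry):
        """Clear an entry and link it at the head of the free list."""
        entry.dirty = False
        entry.disk_address = None
        entry.next_free = self.head
        self.head = entry
```

Allocation and release are both constant-time because only the head pointer and one segment pointer change.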

【００８３】以下、図１３〜図１６のフローチャートを
用いて本実施形態における制御装置１３０５内のディス
クアレイ制御プロセッサ１３１０が実行する処理を説明
する。The processing executed by the disk array control processor 1310 in the control device 1305 according to this embodiment will be described below with reference to the flowcharts of FIGS. 13 to 16.

【００８４】なお、２つのクラスタ１３１１のキャッシ
ュ１３０８のいずれのキャッシュ１３０８も使用可能状
態９００か、回復中状態９０３にある時、どちらのキャ
ッシュ１３０８を使用するかは、リード／ライトするレ
コード５００のディスク装置１３０４のアドレス等によ
り一意に定まるとする。When the caches 1308 of both clusters 1311 are each in either the usable state 900 or the recovering state 903, which cache 1308 is used is assumed to be uniquely determined by, for example, the address on the disk device 1304 of the record 500 to be read or written.

【００８５】図１３は、リード／ライト受付部１００の
処理を示すフローチャートであり、この処理は処理装置
１３００からリード／ライト要求を受け付けた時に開始
される。FIG. 13 is a flowchart showing the processing of the read / write accepting unit 100. This processing is started when a read / write request is accepted from the processing device 1300.

【００８６】まず、ステップ１１０で、どちらのキャッ
シュ１３０８も使用可能状態９００の時、どちらのキャ
ッシュ１３０８を使用するかを決定する。次に、ステッ
プ１１１で、当該キャッシュ１３０８の状態を調べる。
使用可能状態９００の場合には、そのまま当該キャッシ
ュ１３０８を使用し、ステップ１１２でリード／ライト
処理の実行に入る。First, in step 110, it is determined which cache 1308 would be used if both caches 1308 were in the usable state 900. Next, in step 111, the state of that cache 1308 is checked. If it is in the usable state 900, the cache 1308 is used as is, and read/write processing starts in step 112.

【００８７】しかし、閉塞状態９０２の時には、ステッ
プ１１３で反対のキャッシュ１３０８を使用することに
決定し、この後、ステップ１１２へジャンプする。However, if the cache is in the blocked state 902, step 113 decides to use the opposite cache 1308, and processing then jumps to step 112.

【００８８】また、閉塞移行中状態９０１の場合、ま
ず、ステップ１１４で当該キャッシュ１３０８にリード
／ライトするレコード５００が格納されているかをチェ
ックする。格納されていなければ、ステップ１１３へジ
ャンプする。格納されている場合、ステップ１１５にお
いて、このレコード５００の内容を反対のキャッシュ１
３０８にコピーするか、このレコード５００の内容がデ
ィスク装置１３０４に書き込まれていないものかどうか
を図１２のダーティビット１２０２によってチェック
し、書き込まれていなければ、ディスク装置１３０４に
書き込む。If the cache is in the blocking-transition state 901, step 114 first checks whether the record 500 to be read or written is stored in that cache 1308. If it is not, processing jumps to step 113. If it is, step 115 either copies the contents of the record 500 to the opposite cache 1308, or checks the dirty bit 1202 of FIG. 12 to see whether the contents of the record 500 have not yet been written to the disk device 1304 and, if they have not, writes them to the disk device 1304.

【００８９】なお、ディスク装置１３０４に書き込む
時、ディスクアレイの構成がＲＡＩＤ３やＲＡＩＤ５の
場合、レコード５００に対応するパリティデータを作成
し、ディスク装置１３０４に書き込む。ＲＡＩＤ１の構
成の場合、レコード５００の内容を２つのディスク装置
１３０４に書き込む。When writing to the disk device 1304, if the disk array configuration is RAID 3 or RAID 5, parity data corresponding to the record 500 is created and written to the disk device 1304. In the case of the RAID 1 configuration, the contents of the record 500 are written in the two disk devices 1304.

【００９０】さらに、ステップ１１６で、当該キャッシ
ュ１３０８からこのレコードを消去する。この後、ステ
ップ１１３へジャンプする。Further, in step 116, this record is deleted from the cache 1308. After this, the process jumps to step 113.

【００９１】回復中状態９０３の場合、ステップ１１７
で反対側のキャッシュ１３０８にリード／ライトするレ
コード５００が格納されているかをチェックする。格納
されていなければ、ステップ１１２へジャンプする。格
納されている場合、ステップ１１８で、このレコード５
００の内容を反対のキャッシュ１３０８から当該キャッ
シュ１３０８にコピーするか、このレコード５００の内
容がディスク装置１３０４に書き込まれていないかチェ
ックし、書き込まれていなければ、ディスク装置１３０
４に書き込む。If the cache is in the recovering state 903, step 117 checks whether the record 500 to be read or written is stored in the opposite cache 1308. If it is not, processing jumps to step 112. If it is, step 118 either copies the contents of the record 500 from the opposite cache 1308 to the cache 1308 in question, or checks whether the contents of the record 500 have not yet been written to the disk device 1304 and, if they have not, writes them to the disk device 1304.

【００９２】なお、ディスク装置１３０４に書き込みを
行なう時、ディスクアレイの構成がＲＡＩＤ３やＲＡＩ
Ｄ５の場合、レコード５００に対応するパリティデータ
を作成し、ディスク装置１３０４に書き込む。ＲＡＩＤ
１の構成の場合、レコード５００の内容を２つのディス
ク装置１３０４に書き込む。When writing to the disk device 1304, if the disk array is configured as RAID 3 or RAID 5, parity data corresponding to the record 500 is created and written to the disk device 1304. In a RAID 1 configuration, the contents of the record 500 are written to two disk devices 1304.

【００９３】さらに、ステップ１１９で、当該キャッシ
ュ１３０８からこのレコードを消去する。この後、リー
ド／ライト処理に入る。Further, in step 119, this record is deleted from the cache 1308. After this, read / write processing starts.
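
The routing decision of FIG. 13 (steps 110 to 119) can be sketched as follows. Several points are assumptions made for the sketch: the address-parity rule stands in for the unspecified address-based rule of step 110, only the copy alternative of steps 115 and 118 is shown (the write-to-disk alternative is omitted), and the record is removed from whichever cache it was moved out of.

```python
from enum import Enum


class CacheState(Enum):
    USABLE = 900
    BLOCKING_TRANSITION = 901
    BLOCKED = 902
    RECOVERING = 903


class Cache:
    def __init__(self, state=CacheState.USABLE):
        self.state = state
        self.records = {}   # address -> (data, dirty flag)


def select_cache(caches, address):
    """Return the index (0 or 1) of the cache that should serve `address`."""
    home = address % 2                        # step 110 (assumed rule)
    other = 1 - home
    state = caches[home].state                # step 111
    if state is CacheState.USABLE:
        return home                           # step 112
    if state is CacheState.BLOCKED:
        return other                          # step 113
    if state is CacheState.BLOCKING_TRANSITION:
        src, dst = home, other                # steps 114-116: move out of home
    else:
        src, dst = other, home                # steps 117-119: move back to home
    entry = caches[src].records.pop(address, None)
    if entry is not None:
        caches[dst].records[address] = entry  # copy alternative, then erase source
    return dst
```

Moving the record on demand keeps a request from seeing a stale copy while the background blocking or recovery processing is still working through the rest of the cache.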

【００９４】図１４は、ライトアフタ処理実行部１０１
の処理を示すフローチャートであり、その概要は既に説
明したように、キャッシュ１３０８の中に未書き込みの
データをディスク装置１３０４に書き込むものである
（ステップ１３１）。FIG. 14 is a flowchart showing the processing of the write-after process execution unit 101. As already outlined, this processing writes data in the cache 1308 that has not yet been written to the disk device 1304 (step 131).

【００９５】図１５は、閉塞移行処理実行部１０２の処
理を示すフローチャートであり、この処理は、当該キャ
ッシュ１３０８のボード４００の障害を検知したり、保
守処理を開始しようとした時に開始される。本処理は、
当該キャッシュ１３０８を閉塞するための前処理であ
る。FIG. 15 is a flowchart showing the processing of the blocking migration processing execution unit 102. This processing is started when a failure of a board 400 of the cache 1308 in question is detected or when maintenance processing is about to start. It is the preprocessing for blocking that cache 1308.

【００９６】具体的には、ステップ１２０で当該キャッ
シュ１３０８の中に格納されているレコード５００がな
いかを調べる。見つかった場合、ステップ１２１で、こ
のレコード５００の内容を反対のキャッシュ１３０８に
コピーするか、このレコード５００がライトアフタ処理
を行なう必要があるかをチェックし、ライトアフタ処理
を行なう必要がある未書き込みデータならば、その内容
をディスク装置１３０４に書き込む。Specifically, step 120 checks whether any record 500 is still stored in the cache 1308. If one is found, step 121 either copies the contents of the record 500 to the opposite cache 1308, or checks whether the record 500 requires write-after processing and, if it is unwritten data that requires write-after processing, writes its contents to the disk device 1304.

【００９７】なお、ディスク装置１３０４に書き込む
時、ディスクアレイの構成がＲＡＩＤ３やＲＡＩＤ５の
場合、レコード５００に対応するパリティデータを作成
し、ディスク装置１３０４に書き込む。ＲＡＩＤ１の構
成の場合、レコード５００の内容を２つのディスク装置
１３０４に書き込む。When writing to the disk device 1304, if the disk array configuration is RAID 3 or RAID 5, parity data corresponding to the record 500 is created and written to the disk device 1304. In the case of the RAID 1 configuration, the contents of the record 500 are written in the two disk devices 1304.

【００９８】この後、このレコード５００を当該キャッ
シュ１３０８から消去する。当該キャッシュ１３０８に
格納されたレコード５００がなくなった時、ステップ１
２２で、当該キャッシュ１３０８の状態を閉塞状態９０
２にして処理を完了する。The record 500 is then erased from the cache 1308. When no record 500 remains in the cache 1308, step 122 sets the state of the cache 1308 to the blocked state 902 and the processing is completed.
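
The loop of FIG. 15 (steps 120 to 122) can be sketched as below. The dictionary layout, the `copy_to_other` switch that selects between the two alternatives of step 121, and the string state label are assumptions made for illustration; parity generation and mirroring on the disk side are left out.

```python
def drain_cache(cache, other_cache, disk, copy_to_other=True):
    """Empty `cache` record by record, then mark it blocked.

    cache / other_cache: {"state": str, "records": {address: (data, dirty)}}
    disk: {address: data}, a stand-in for the disk devices
    """
    while cache["records"]:                                  # step 120
        address, (data, dirty) = cache["records"].popitem()  # also erases the record
        if copy_to_other:
            other_cache["records"][address] = (data, dirty)  # step 121, copy alternative
        elif dirty:
            disk[address] = data                             # step 121, write alternative
    cache["state"] = "blocked"                               # step 122
```

In the write alternative, clean records are simply dropped, since the disk already holds their contents.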

【００９９】図１６は、回復処理実行部１０３の処理を
示すフローチャートであり、この処理は、当該キャッシ
ュ１３０８の保守が完了したり、障害を起こしたボード
４００の交換等が完了したときに開始される。本処理
は、当該キャッシュ１３０８を使用可能状態９００にす
るための前処理である。FIG. 16 is a flowchart showing the processing of the recovery processing execution unit 103. This processing is started when maintenance of the cache 1308 in question is completed, or when replacement of the failed board 400 or the like is completed. It is the preprocessing for returning that cache 1308 to the usable state 900.

【０１００】まず、ステップ１２３で、反対のキャッシ
ュ１３０８の中に、当該キャッシュ１３０８に格納すべ
きレコード５００がないかをチェックする。見つかった
場合、ステップ１２４で、このレコード５００の内容を
当該キャッシュ１３０８にコピーするか、このレコード
５００がライトアフタすべきレコード５００であれば、
その内容をディスク装置１３０４に書き込む。First, step 123 checks whether the opposite cache 1308 holds any record 500 that should be stored in the cache 1308 in question. If one is found, step 124 either copies the contents of the record 500 to the cache 1308 in question or, if the record 500 requires write-after processing, writes its contents to the disk device 1304.

【０１０１】なお、ディスク装置１３０４に書き込む
時、ディスクアレイの構成がＲＡＩＤ３やＲＡＩＤ５の
場合、レコード５００に対応するパリティデータを作成
し、ディスク装置１３０４に書き込む。ＲＡＩＤ１の構
成の場合、レコード５００の内容を２つのディスク装置
１３０４に書き込む。When writing to the disk device 1304, if the configuration of the disk array is RAID3 or RAID5, parity data corresponding to the record 500 is created and written to the disk device 1304. In the case of the RAID 1 configuration, the contents of the record 500 are written in the two disk devices 1304.

【０１０２】この後、当該レコード５００を反対のキャ
ッシュ１３０８から消去する。反対のキャッシュ１３０
８に当該キャッシュ１３０８に格納すべきレコード５０
０がなくなった時、ステップ１２５で、当該キャッシュ
１３０８の状態を使用可能状態９００にして処理を完了
する。The record 500 is then erased from the opposite cache 1308. When the opposite cache 1308 no longer holds any record 500 that should be stored in the cache 1308 in question, step 125 sets the state of that cache 1308 to the usable state 900 and the processing is completed.
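
The recovery loop of FIG. 16 (steps 123 to 125) mirrors the blocking loop in the opposite direction. In this sketch the predicate `belongs_here`, the dictionary layout, and the `copy_back` switch are assumptions; `belongs_here` stands in for the address-based rule that decides which cache a record should live in.

```python
def recover_cache(cache, other_cache, disk, belongs_here, copy_back=True):
    """Bring home the records that the opposite cache holds on our behalf.

    cache / other_cache: {"state": str, "records": {address: (data, dirty)}}
    disk: {address: data}; belongs_here(address) -> bool
    """
    pending = [a for a in other_cache["records"] if belongs_here(a)]   # step 123
    for address in pending:
        data, dirty = other_cache["records"].pop(address)   # erase from opposite cache
        if copy_back:
            cache["records"][address] = (data, dirty)       # step 124, copy alternative
        elif dirty:
            disk[address] = data                            # step 124, write alternative
    cache["state"] = "usable"                               # step 125
```

Records that belong to the opposite cache stay where they are, so only the displaced records are touched.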

【０１０３】ここで、リード／ライト処理受付部１００
とライトアフタ処理実行部１０１と閉塞移行処理実行部
１０２の処理は並列に実行される。各処理の排他は、キ
ャッシュ１３０８のセグメント管理情報１００２中の使
用ビット１２０１を用いて実行される。同様に、リード
／ライト処理受付部１００とライトアフタ処理実行部１
０１と回復処理実行部１０３の処理も並列に実行され
る。この場合も、各処理の排他は、キャッシュ１３０８
の使用ビット１２０１を用いて実行される。Here, the processing of the read/write accepting unit 100, the write-after process execution unit 101, and the blocking migration processing execution unit 102 is executed in parallel. Mutual exclusion among these processes is achieved with the in-use bit 1201 in the segment management information 1002 of the cache 1308. Similarly, the processing of the read/write accepting unit 100, the write-after process execution unit 101, and the recovery processing execution unit 103 is also executed in parallel. In this case too, mutual exclusion is achieved with the in-use bit 1201 of the cache 1308.

【０１０４】本実施形態では、それぞれのクラスタ内の
データの格納形式にＲＡＩＤ３のような冗長性を持た
せ、クラスタ間ではキャッシュメモリに格納するデータ
を重複させないようにしておく。さらに、保守処理を行
なう際には、保守を行うキャッシュメモリは新たにデー
タの格納を行なわないようにし、保守を行うキャッシュ
メモリにそれまで格納していたデータを、もう一方のク
ラスタのキャッシュメモリに書き込むか、ディスク装置
に書き込むようにした。以上により、保守処理中の信頼
性を向上させることができるという効果がある。In this embodiment, the data storage format within each cluster is given redundancy, as in RAID 3, and the data stored in the cache memories is kept from being duplicated between clusters. In addition, during maintenance processing, no new data is stored in the cache memory under maintenance, and the data that it held until then is written either to the cache memory of the other cluster or to the disk device. This improves reliability during maintenance processing.

【０１０５】次に、本発明の第２の実施形態について説
明する。Next, a second embodiment of the present invention will be described.

【０１０６】図１７は、本発明を適用した計算機システ
ムの第２の実施の形態を示す構成図である。FIG. 17 is a configuration diagram showing a second embodiment of a computer system to which the present invention is applied.

【０１０７】この第２の実施形態と前述した第１の実施
形態との構成上の相違点は、次の通りである。The structural difference between the second embodiment and the above-described first embodiment is as follows.

【０１０８】（１）制御装置１３０５がクラスタ１３１
１を持たない。(1) The control device 1305 has no clusters 1311.

【０１０９】（２）第２の実施形態の制御装置１３０５
が２本のバス１３１２を持っている。(2) The control device 1305 of the second embodiment has two buses 1312.

【０１１０】（３）キャッシュ１３０８の管理情報格納
領域３００の構成が異なる。(3) The configuration of the management information storage area 300 of the cache 1308 is different.

【０１１１】この実施形態における管理情報格納領域３
００の構成を図１８に示す。この実施形態の管理情報格
納領域３００は、キャッシュ状態１４００、セグメント
対応のセグメント管理情報１００２、空きセグメントキ
ューポインタ１００３で構成されている。第１の実施形
態との相違は、キャッシュ１３０８が１つしかないた
め、キャッシュ１３０８の状態が１つしかない点であ
る。FIG. 18 shows the configuration of the management information storage area 300 in this embodiment. The management information storage area 300 of this embodiment consists of a cache state 1400, the segment management information 1002 provided for each segment, and the free segment queue pointer 1003. It differs from the first embodiment in that, because there is only one cache 1308, there is only one cache state.

【０１１２】これ以外の図３〜図６、図８〜図９、図１
１〜図１２の構成は、第２の実施形態でも適用できる。Otherwise, the configurations of FIGS. 3 to 6, FIGS. 8 and 9, and FIGS. 11 and 12 also apply to the second embodiment.

【０１１３】但し、本実施形態では、管理情報６００は
図６に示したように必ず２つの管理情報ボード４０１に
格納される。また本実施形態では、図５に示した通り、
レコード５００は、ＲＡＩＤ３の形式で各ディスクキャ
ッシュとボード４０２に必ず格納される。However, in this embodiment the management information 600 is always stored on two management information boards 401, as shown in FIG. 6. Also in this embodiment, as shown in FIG. 5, the record 500 is always stored on the disk cache boards 402 in the RAID 3 format.

【０１１４】本実施形態では、キャッシュ１３０８は１
つしかないため、障害時のキャッシュ１３０８の閉塞の
単位はボード４００となる。同様に、キャッシュ１３０
８の保守の単位もボード４００である。In this embodiment, because there is only one cache 1308, the unit in which the cache 1308 is blocked on a failure is the board 400. Likewise, the unit of maintenance for the cache 1308 is also the board 400.

【０１１５】図１９は、本実施形態におけるキャッシュ
１３０８の状態遷移をまとめたものである。正常状態１
５００は、キャッシュ１３０８に冗長度がある状態であ
る。この状態で、制御装置１３０５が処理装置１３００
からライト要求を受け取った時には、ライトデータをキ
ャッシュ１３０８に書き込んだ段階で処理を完了する。FIG. 19 summarizes the state transitions of the cache 1308 in this embodiment. The normal state 1500 is the state in which the cache 1308 has redundancy. In this state, when the control device 1305 receives a write request from the processing device 1300, it completes the request as soon as the write data has been written to the cache 1308.

【０１１６】キャッシュ１３０８のボード４００に障害
が発生したり、保守処理を実行する際に、キャッシュ１
３０８の冗長度がなくなる時、冗長度なし移行状態１５
０１に移行する。この冗長度なし移行状態１５０１で
は、キャッシュ１３０８の中に、レコード５００の内容
をディスク装置１３０４に書き込んでいないすべてのレ
コード５００について、その内容をディスク装置１３０
４に書き込む処理が実行される。これが完了すると、キ
ャッシュ１３０８の状態は、冗長度なし状態１５０２に
遷移する。When a failure occurs in a board 400 of the cache 1308, or when maintenance processing is to be executed, and the cache 1308 thereby loses its redundancy, the cache moves to the non-redundancy transition state 1501. In the non-redundancy transition state 1501, every record 500 in the cache 1308 whose contents have not yet been written to the disk device 1304 is written to the disk device 1304. When this is complete, the cache 1308 moves to the non-redundancy state 1502.

【０１１７】また、キャッシュ１３０８の状態が正常状
態１５００以外の時には、信頼性の観点から、制御装置
１３０５は、ライト要求を受け取ったとき、ライトデー
タをディスク装置１３０４に書き込んだ段階で処理を完
了させる。When the cache 1308 is in any state other than the normal state 1500, then, for the sake of reliability, the control device 1305 completes a write request only once the write data has been written to the disk device 1304.

【０１１８】なお、リード要求に対する制御装置１３０
５の動作は、キャッシュ１３０８の状態によらず、第１
の実施形態の場合と同じである。Note that the operation of the control device 1305 for a read request is the same as in the first embodiment, regardless of the state of the cache 1308.

【０１１９】図２０は、制御装置１３０５内のディスク
アレイ制御プロセッサ１３１０の構成を示した図であ
り、リード／ライト受付部１６００、冗長なし移行処理
実行部１６０１、ライトアフタ処理実行部１０１で構成
されている。FIG. 20 is a diagram showing the configuration of the disk array control processor 1310 in the control device 1305, which is composed of a read / write accepting unit 1600, a non-redundant migration process executing unit 1601, and a write-after process executing unit 101. ing.

【０１２０】以下、図２１および図２２のフローチャー
トを参照してディスクアレイ制御プロセッサ１３１０が
実行する処理を説明する。なお、ライトアフタ処理実行
部１０１は、第１の実施形態と同じであるので、その説
明は省略する。The processing executed by the disk array control processor 1310 will be described below with reference to the flowcharts of FIGS. 21 and 22. The write-after process execution unit 101 is the same as that in the first embodiment, and therefore its description is omitted.

【０１２１】図２１は、リード／ライト受付部１６００
の処理を示すフローチャートであり、リード／ライト受
付部１６００は処理装置１３００からリード／ライト要
求を受け付けた時に動作を開始する。FIG. 21 is a flowchart showing the processing of the read/write accepting unit 1600, which starts operating when it accepts a read/write request from the processing device 1300.

【０１２２】まず、ステップ１６１０で、リード要求と
ライト要求のどちらを受け取ったかを判別する。リード
要求であった場合、ステップ１６１１で、リード処理の
実行に入る。ライト要求の場合、ステップ１６１２で、
キャッシュ１３０８の状態を分析する。First, in step 1610, it is determined whether the read request or the write request is received. If it is a read request, in step 1611, the read process is started. If it is a write request, in step 1612,
The state of the cache 1308 is analyzed.

【０１２３】キャッシュが正常状態の場合、ステップ１
６１３で、処理装置１３００から受け取ったライトデー
タをキャッシュ１３０８に書き込み動作を終了させる。
冗長なし状態１５０２の場合、ステップ１６１４で、処
理装置１３００から受け取ったライトデータをディスク
装置１３０４に書き込み、動作を終了させる。If the cache is in the normal state, step 1613 writes the write data received from the processing device 1300 to the cache 1308 and ends the operation. If the cache is in the non-redundancy state 1502, step 1614 writes the write data received from the processing device 1300 to the disk device 1304 and ends the operation.

【０１２４】冗長度なし移行状態１５０１の場合、図２
２に示す冗長なし移行処理実行部１６０１の処理が開始
される。If the cache is in the non-redundancy transition state 1501, the processing of the non-redundant migration processing execution unit 1601 shown in FIG. 22 is started.

【０１２５】この冗長なし移行処理では、ステップ１６
１５で、レコード５００の内容がディスク装置１３０４
に書き込まれていないかチェックし、書き込まれていれ
ば、ステップ１６１７にジャンプし、キャッシュ状態を
冗長なし状態１５０２に更新する。書き込まれていない
場合、ステップ１６１６で、その未書き込みレコード５
００をディスク装置１３０４に書き込む。その後、ステ
ップ１６１５へジャンプする。In this non-redundant migration processing, step 1615 checks whether any record 500 has contents that have not yet been written to the disk device 1304. If all contents have been written, processing jumps to step 1617, which updates the cache state to the non-redundancy state 1502. If an unwritten record remains, step 1616 writes that unwritten record 500 to the disk device 1304, and processing then jumps back to step 1615.

【０１２６】なお、ステップ１６１４およびステップ１
６１６で、ディスク装置１３０４に未書き込みレコード
５００を書き込む時、ディスクアレイの構成がＲＡＩＤ
３やＲＡＩＤ５の場合、レコード５００に対応するパリ
ティデータを作成し、ディスク装置１３０４に書き込
む。ＲＡＩＤ１の構成の場合、レコード５００の内容を
２つのディスク装置１３０４に書き込む。Note that when an unwritten record 500 is written to the disk device 1304 in step 1614 or step 1616, if the disk array is configured as RAID 3 or RAID 5, parity data corresponding to the record 500 is created and written to the disk device 1304. In a RAID 1 configuration, the contents of the record 500 are written to two disk devices 1304.

【０１２７】その後、キャッシュ１３０８の中に、ディ
スク装置１３０４にその内容を書き込んでいないレコー
ド５００がなくなった時、ステップ１６１７でキャッシ
ュ１３０８の状態を冗長なし状態１５０２に更新して処
理を完了する。After that, when there is no record 500 in the cache 1308 whose contents are not written in the disk device 1304, the state of the cache 1308 is updated to the non-redundancy state 1502 in step 1617, and the processing is completed.
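
The write handling of the second embodiment (FIGS. 21 and 22) amounts to switching from write-back to write-through once redundancy is lost. The sketch below makes several assumptions: the state labels and dictionary layout are invented here, the write that arrives during the transition state is written through after the flush (the description does not spell out what happens to it), and any cached copy is refreshed as clean so it does not go stale.

```python
def flush_unwritten(cache, disk):
    """Steps 1615-1617: write every unwritten record to disk, then change state."""
    for address, (data, dirty) in list(cache["records"].items()):
        if dirty:
            disk[address] = data                       # step 1616
            cache["records"][address] = (data, False)
    cache["state"] = "non_redundant"                   # step 1617


def handle_write(cache, disk, address, data):
    """Accept one write request; return where the data was placed on completion."""
    if cache["state"] == "normal":
        cache["records"][address] = (data, True)       # step 1613: write-back
        return "cache"
    if cache["state"] == "transition":
        flush_unwritten(cache, disk)
    disk[address] = data                               # step 1614: write-through
    if address in cache["records"]:
        cache["records"][address] = (data, False)      # keep cached copy consistent (assumed)
    return "disk"
```

The design trades write latency for safety: while the cache cannot survive a board failure, no write is acknowledged until it is on disk.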

【０１２８】以上の説明から明らかなように、本実施形
態によれば、ライトデータをＲＡＩＤ３の形式で格納
し、キャッシュの管理情報をＲＡＩＤ１の形式で格納す
るようにしたため、信頼性をそれほど損なうことなく、
記憶効率を従来に比べて向上させることができる。As is clear from the above description, according to this embodiment, the write data is stored in the RAID 3 format and the cache management information is stored in the RAID 1 format, so storage efficiency can be improved over the conventional approach without significantly impairing reliability.

【０１２９】[0129]

【発明の効果】以上のように本発明においては、ディス
クキャッシュのライトデータの格納形式をＲＡＩＤ３の
形式とし、キャッシュの管理情報をＲＡＩＤ１の形式で
格納するようにしたため、信頼性を損なうことなく、記
憶効率を従来に比べて向上させることができる。As described above, in the present invention, the write data in the disk cache is stored in the RAID 3 format and the cache management information is stored in the RAID 1 format, so storage efficiency can be improved over the conventional approach without impairing reliability.

【０１３０】一方、本発明では、複数のクラスタで構成
されるディスク制御装置の場合には、それぞれのクラス
タ内のデータの格納形式に冗長性を持たせ、さらに保守
処理を行なう際には、保守を行うキャッシュメモリに
は、新たにデータの格納を行なわないようにし、保守を
行うキャッシュメモリにそれまで格納していたデータを
もう一方のクラスタのキャッシュメモリに書き込むか、
ディスク装置に書き込むようにするようにした。これに
より、保守処理中の信頼性を向上させることができると
いう効果が得られる。In addition, in the present invention, for a disk control device composed of a plurality of clusters, the data storage format within each cluster is given redundancy. Furthermore, during maintenance processing, no new data is stored in the cache memory under maintenance, and the data that it held until then is written either to the cache memory of the other cluster or to the disk device. This improves reliability during maintenance processing.

【図面の簡単な説明】[Brief description of drawings]

【図１】本発明を適用したディスクアレイ制御プロセッ
サの第１の実施形態を示す構成図である。FIG. 1 is a configuration diagram showing a first embodiment of a disk array control processor to which the present invention is applied.

【図２】図１のディスクアレイ制御プロセッサを用いた
計算機システムの構成図である。FIG. 2 is a configuration diagram of a computer system using the disk array control processor of FIG. 1.

【図３】キャッシュメモリの論理的構成図である。FIG. 3 is a logical configuration diagram of a cache memory.

【図４】キャッシュメモリの物理的構成図である。FIG. 4 is a physical configuration diagram of a cache memory.

【図５】ディスクキャッシュボードのレコードの格納形
式を示す図である。FIG. 5 is a diagram showing a storage format of a record of a disk cache board.

【図６】管理情報ボードの情報格納形式を示す図であ
る。FIG. 6 is a diagram showing an information storage format of a management information board.

【図７】第１の実施形態におけるキャッシュメモリの状
態遷移を示す図である。FIG. 7 is a diagram showing a state transition of a cache memory in the first embodiment.

【図８】ディスクデータ格納領域の論理的構成図であ
る。FIG. 8 is a logical configuration diagram of a disk data storage area.

【図９】セグメントの物理的格納形式を示す図である。FIG. 9 is a diagram showing a physical storage format of a segment.

【図１０】第１の実施形態における管理情報格納領域の
論理的構成図である。FIG. 10 is a logical configuration diagram of a management information storage area in the first embodiment.

【図１１】空きセグメントキューの説明図である。FIG. 11 is an explanatory diagram of an empty segment queue.

【図１２】セグメント管理情報の構成図である。FIG. 12 is a configuration diagram of segment management information.

【図１３】第１の実施形態におけるリード／ライト受付
部の処理を示すフローチャートである。FIG. 13 is a flowchart showing a process of a read / write accepting unit in the first embodiment.

【図１４】第１の実施形態におけるライトアフタ処理実
行部の処理を示すフローチャートである。FIG. 14 is a flowchart showing a process of a write after process execution unit in the first embodiment.

【図１５】第１の実施形態における閉塞移行処理実行部
の処理を示すフローチャートである。FIG. 15 is a flowchart showing processing of a blocking migration processing execution unit according to the first embodiment.

【図１６】第１の実施形態における回復処理実行部の処
理を示すフローチャートである。FIG. 16 is a flowchart showing processing of a recovery processing execution unit in the first embodiment.

【図１７】本発明の第２の実施形態を示す構成図であ
る。FIG. 17 is a configuration diagram showing a second embodiment of the present invention.

【図１８】第２の実施形態における管理情報格納領域の
論理的構成図である。FIG. 18 is a logical configuration diagram of a management information storage area in the second embodiment.

【図１９】第２の実施形態におけるキャッシュ１３０８
の状態遷移を示す図である。FIG. 19 is a diagram showing the state transitions of the cache 1308 in the second embodiment.

【図２０】第２の実施形態におけるディスクアレイ制御
プロセッサの構成図である。FIG. 20 is a configuration diagram of a disk array control processor according to the second embodiment.

【図２１】第２の実施形態におけるリード／ライト受付
部の処理を示すフローチャートである。FIG. 21 is a flowchart showing a process of a read / write accepting unit in the second embodiment.

【図２２】第２の実施形態における冗長なし移行処理実
行部の処理を示すフローチャートである。FIG. 22 is a flowchart showing processing of a non-redundancy migration processing execution unit according to the second embodiment.

【符号の説明】[Explanation of symbols]

１００，１６００…リード／ライト受付部、１０１…ラ
イトアフタ処理実行部、１０２…閉塞移行処理実行部、
１０３…回復処理実行部、３００…管理情報格納領域、
３０１…ディスクデータ格納領域、４００…ボード、４
０１…管理情報ボード、４０２…ディスクキャッシュボ
ード、５００…レコード、５０１…分割レコード、５０
２…冗長データ、８００…セグメント、９０１…分割セ
グメント、１３００…処理装置、１３０４…ディスク装
置、１３０５…制御装置、１３０８…キャッシュメモリ、
１３１０…ディスクアレイ制御プロセッサ、１３１１…
クラスタ、１３１２…バス、１３１３…電源、１３１４
…交替電源、１６０１…冗長なし移行処理実行部。100, 1600 ... read/write accepting unit, 101 ... write-after process execution unit, 102 ... blocking migration processing execution unit, 103 ... recovery processing execution unit, 300 ... management information storage area, 301 ... disk data storage area, 400 ... board, 401 ... management information board, 402 ... disk cache board, 500 ... record, 501 ... divided record, 502 ... redundant data, 800 ... segment, 901 ... divided segment, 1300 ... processing device, 1304 ... disk device, 1305 ... control device, 1308 ... cache memory, 1310 ... disk array control processor, 1311 ... cluster, 1312 ... bus, 1313 ... power supply, 1314 ... alternate power supply, 1601 ... non-redundant migration processing execution unit.

───────────────────────────────────────────────────── フロントページの続き (51)Int.Cl.⁶ 識別記号庁内整理番号ＦＩ技術表示箇所Ｇ０６Ｆ 12/08 ３１０ 7623−5ＢＧ０６Ｆ 12/08 ３１０Ｚ３２０ 7623−5Ｂ３２０ (72)発明者安積義弘神奈川県小田原市国府津2880番地株式会社日立製作所ストレージシステム事業部内─────────────────────────────────────────────────── ─── Continuation of the front page (51) Int.Cl. ⁶ Identification number Internal reference number FI Technical display location G06F 12/08 310 7623-5B G06F 12/08 310Z 320 7623-5B 320 (72) Inventor Yoshihiro Azumi 2880 Kozu, Odawara-shi, Kanagawa Stock Company Hitachi Storage Systems Division

Claims

【特許請求の範囲】[Claims]

【請求項１】１つ以上の記憶装置と複数のクラスタを
有する制御装置から構成される記憶装置システムであっ
て、それぞれのクラスタが冗長性を備えたキャッシュメモリ
と、前記制御装置が、各クラスタの前記キャッシュメモリが正常状態の時、前
記記憶装置のデータを各クラスタの前記キャッシュメモ
リに格納する手段と、処理装置からライト要求を受付け、該ライト要求に付随
して前記処理装置から受け付けたライトデータを前記キ
ャッシュメモリに格納し、前記ライト要求を完了させる
手段とを有することを特徴とする記憶装置システム。1. A storage device system comprising one or more storage devices and a control device having a plurality of clusters, wherein each cluster has a cache memory provided with redundancy, and the control device has means for storing data of the storage devices in the cache memory of each cluster when the cache memory of each cluster is in a normal state, and means for accepting a write request from a processing device, storing the write data received from the processing device along with the write request in the cache memory, and completing the write request.

2. The storage device system according to claim 1, further comprising means for writing, as preprocessing when maintenance of a cluster is performed, write data that is held in the cache memory of the cluster under maintenance and has not yet been written to the storage device, either to the cache memory of a cluster not under maintenance or to the storage device.

3. The storage device system according to claim 2, comprising: means for writing write data that is held in the cache memory of the cluster under maintenance and has not yet been written to the storage device, either to the cache memory of a cluster not under maintenance or to the storage device; and means for accepting read/write requests from the processing device and executing read/write processing in parallel with the write processing of the writing means.

4. The storage device system according to claim 2, further comprising means for writing, after maintenance of the cluster under maintenance has been completed, write data that is held in a cache memory other than that of the cluster whose maintenance has been completed and has not yet been written to the storage device, either to the cache memory of the cluster whose maintenance has been completed or to the storage device.

5. The storage device system according to claim 4, comprising: means that includes means for writing write data that is held in a cache memory other than that of the cluster whose maintenance has been completed and has not yet been written to the storage device, either to the cache memory of the cluster whose maintenance has been completed or to the storage device; and means for accepting read/write requests from the processing device and executing read/write processing in parallel with the write processing of the writing means.

6. The storage device system according to claim 1, further comprising means for writing, when the cache memory in a cluster has lost redundancy because of a failure, write data that is held in the failed cache memory and has not yet been written to the storage device, either to a cache memory other than the failed cache memory or to the storage device.

7. The storage device system according to claim 6, comprising: means for writing write data that is held in the failed cache memory and has not yet been written to the storage device, either to a cache memory other than the failed cache memory or to the storage device; and means for accepting read/write requests from the processing device and executing read/write processing in parallel with the write processing of the writing means.

8. The storage device system according to claim 6, further comprising means for writing, after the redundancy of the cache memory has been restored, write data that is held in a cache memory other than the cache memory whose redundancy has been restored and has not yet been written to the storage device, either to the cache memory whose redundancy has been restored or to the storage device.

9. The storage device system according to claim 4, comprising: means that includes means for writing write data that is held in a cache memory other than the cache memory whose redundancy has been restored and has not yet been written to the storage device, either to the cache memory whose redundancy has been restored or to the storage device; and means for accepting read/write requests from the processing device and executing read/write processing in parallel with the write processing of the writing means.
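
As an informal illustration of the maintenance preprocessing recited in claims 2 and 3, the following minimal Python sketch models each cluster's cache as a dictionary of write data not yet written to the storage device. The names `Cluster`, `accepting`, `dirty`, and `prepare_for_maintenance` are hypothetical and do not appear in the specification; the sketch also assumes a single peer cluster is chosen and omits the concurrent read/write handling of claim 3.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical model of one cluster's cache memory."""
    name: str
    accepting: bool = True                       # False while under maintenance
    dirty: dict = field(default_factory=dict)    # block id -> write data not yet on disk


def prepare_for_maintenance(target, others, disk):
    """Move all unwritten write data out of the target cluster before maintenance."""
    target.accepting = False                     # stop storing new data in this cache
    # Assumption: use the first cluster that is not itself under maintenance.
    peer = next((c for c in others if c.accepting), None)
    for block, data in list(target.dirty.items()):
        if peer is not None:
            peer.dirty[block] = data             # hand over to the other cluster's cache
        else:
            disk[block] = data                   # otherwise write to the storage device
        del target.dirty[block]


a = Cluster("A", dirty={7: b"new"})
b = Cluster("B")
disk = {7: b"old"}
prepare_for_maintenance(a, [b], disk)
```

After the call, cluster A holds no unwritten data, so its cache can be serviced without losing any write that was already acknowledged to the processing device.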

10. A storage device system comprising one or more storage devices and a control device, the system having: a cache memory that is composed of a plurality of boards and stores part of the data of the storage devices; a management information storage memory that is composed of a plurality of boards and stores management information for the data stored in the cache memory; means for dividing part of the data of the storage devices and storing it on m (≧ 2) boards in the cache memory, and storing redundant data for recovering the contents of the data on n (≧ 1) boards in the cache memory; and means for storing the management information in duplicate on two or more boards in the management information storage memory.

11. The storage device system according to claim 10, further comprising means for accepting a write request from a processing device when both the cache memory and the management information storage memory have redundancy, storing the write data received from the processing device along with the write request in the cache memory, and then completing the write request.

12. The storage device system according to claim 10, further comprising means for accepting a write request from a processing device when either the cache memory or the management information storage memory has lost redundancy because of a failure, and completing the write request after writing the write data received from the processing device along with the write request to the storage device.

13. The storage device system according to claim 12, further comprising means for writing, when either the cache memory or the management information storage memory has lost redundancy because of a failure, write data that is held in the cache memory and has not yet been written to the storage device to the storage device.

14. The storage device system according to claim 10, further comprising means for writing, as preprocessing when maintenance of a board of the cache memory or of the management information storage memory is performed, write data that is held in the cache memory and has not yet been written to the storage device to the storage device.

15. The storage device system according to claim 14, further comprising means for accepting a write request from a processing device while maintenance of a board of the cache memory or of the management information storage memory is being performed, and completing the write request after writing the write data received from the processing device along with the write request to the storage device.

16. The storage device system according to claim 15, comprising: means that includes means for writing write data that is held in the cache memory and has not yet been written to the storage device to the storage device; and means for executing, in parallel with the write processing of the writing means, processing that accepts a write request from a processing device and completes the write request after writing the write data received from the processing device along with the write request to the storage device.
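
As an informal illustration of claims 10 to 12, the following minimal Python sketch divides a record across m cache boards and derives one piece of redundant data. It assumes n = 1 and byte-wise XOR parity, which is one common choice; the claims require only "redundant data" on n (≧ 1) boards. The function names and the labels "write-back" and "write-through" are shorthand introduced here, not terms from the claims, and the duplicated storage of management information is not shown.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(record, m):
    """Split a record into m equal pieces (one per board) plus one XOR parity piece."""
    size = -(-len(record) // m)                  # ceiling division
    padded = record.ljust(size * m, b"\0")       # pad so every piece has the same length
    pieces = [padded[i * size:(i + 1) * size] for i in range(m)]
    parity = reduce(xor_bytes, pieces)
    return pieces, parity


def rebuild(pieces, parity, lost):
    """Recover the piece on a failed board from the surviving pieces and the parity."""
    survivors = [p for i, p in enumerate(pieces) if i != lost]
    return reduce(xor_bytes, survivors, parity)


def write_mode(cache_redundant, mgmt_redundant):
    """Claims 11 and 12: complete a write from cache only while both memories are redundant."""
    return "write-back" if cache_redundant and mgmt_redundant else "write-through"


pieces, parity = split_with_parity(b"ABCDEFGH", m=4)
```

With XOR parity, losing any single data board leaves enough information to rebuild its piece, which is why a write can be acknowledged from cache alone while redundancy holds and must reach the storage device first once it is lost.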