JP5309816B2

JP5309816B2 - Data management program, data management apparatus, and data management method

Info

Publication number: JP5309816B2
Application number: JP2008230661A
Authority: JP
Inventors: 一隆荻原; 達夫熊野; 和一大江; 高志渡辺; 雅寿田村; 芳浩土屋; 哲太郎丸山
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2008-09-09
Filing date: 2008-09-09
Publication date: 2013-10-09
Anticipated expiration: 2028-09-09
Also published as: JP2010066862A

Abstract

PROBLEM TO BE SOLVED: To prevent the execution time of a save process from being lengthened by a recovery process.

SOLUTION: When a save request is input, a save process execution means 2 stores identification information of the designated save process target storage device in a storage means 3 and executes the save process. A failure detection means 4 detects a failed storage device in which a failure has occurred. When the occurrence of the failure is detected, a save process stop instruction means 5 stops the save process being executed by the save process execution means 2. A recovery target data determination means 6 sets data whose duplicated state has been impaired as recovery target data. A redundant data copy destination selection means 7 preferentially selects a redundant data copy destination for the recovery target data from the storage areas of storage devices other than the save process target storage device. A redundant data copy means copies the recovery target data to the storage area of the redundant data copy destination.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a data management program, a data management device, and a data management method for managing data stored in a plurality of storage devices, and more particularly to a data management program, a data management device, and a data management method for managing duplicated data.

A computer system must manage a large amount of data used by many users. One system for managing such a large amount of data is the multi-node storage system.

A multi-node storage system is a storage system composed of a plurality of nodes. For example, a multi-node storage system is composed of access nodes, a control node, and disk nodes. An access node is a computer that provides users with an environment for accessing the data stored in the disk nodes. A user can access the data stored in a disk node by issuing a data access request to an access node. The control node is a computer that performs maintenance and management of the data provided by the multi-node storage system. For example, the control node manages the correspondence between a virtually defined logical volume and the storage areas in the storage devices managed by the disk nodes. A disk node has a storage device and manages the storage area of that storage device in units called slices, each of which is an area of a fixed size.

In a multi-node storage system, the same data is managed by a plurality of different disk nodes. That is, the same data is stored in two slices belonging to different disk nodes. Because the data is made redundant in this way, no data is lost even if a failure occurs in one disk node.

When a failure occurs in a disk node, the data stored in that disk node can no longer be used, and the duplicated state of the data is lost. In that case, the control node issues instructions to the disk nodes and a recovery process is started. The recovery process is a process that restores the duplication of the data.

In the recovery process, the control node first identifies each slice that was paired with a slice that has become inaccessible due to a failure or the like (that is, a slice whose duplication has been lost). Next, the control node secures another slice (called a reserve slice) to be paired with the slice whose duplication has been lost. The control node then instructs the disk node that manages the slice whose duplication has been lost to copy its data to the secured reserve slice. Through this recovery process, the multi-node storage system restores the duplicated state of the data even if a failure occurs in a disk node.

A disk node may also be taken out of service for reasons other than a failure, for example when the hardware of the disk node is replaced with other, higher-performance hardware. In such a case, a data save process (evacuate process) is performed. The save process copies the data managed by a designated disk node to other nodes. After the save process, the administrator replaces the designated disk node with new hardware. After the replacement, the disk node is incorporated into the multi-node storage system again. The save process is performed not only when the hardware of a disk node is replaced but also when software is upgraded in a way that requires a restart, such as an upgrade of the OS (Operating System) of the disk node. Performing the save process before disk node maintenance makes it possible to carry out work such as a hardware upgrade while maintaining the duplication of the data.

The time required for the save process depends heavily on the storage capacity of the storage device managed by the disk node. As the capacity of storage devices has grown in recent years, the save process tends to take longer. Because the save process takes a long time, it becomes more likely that a failure will occur, during the save process, in a disk node other than the one being saved. If a failure occurs during the save process in a disk node that is not the target of the save process, the save process is aborted and the recovery process is executed.
JP 2005-4681 A

However, when the save process is stopped and the recovery process is executed, a slice of the disk node targeted by the save process may be selected as a reserve slice. Data is then copied to a slice of the disk node that is being saved. In this case, when the stopped save process for that disk node is resumed after the recovery completes, the amount of data that must be saved is larger than it was before the save process was stopped, and the time until the save process completes becomes longer.

The present invention has been made in view of these points, and an object of the present invention is to provide a data management program capable of preventing the execution time of the save process from being prolonged by the recovery process.

In order to solve the above problem, a data management device that realizes the following functions is provided. In order to manage data stored in a plurality of storage devices, the data management device has a save process execution means, a failure detection means, a save process stop instruction means, a recovery target data determination means, a redundant data copy destination selection means, and a redundant data copy means.

When a save request designating an arbitrary storage device is input, the save process execution means stores identification information of the designated save process target storage device in a storage means and executes a save process that copies all the data stored in the save process target storage device to other storage devices. The failure detection means detects a failed storage device in which a failure has occurred. The save process stop instruction means stops the save process being executed by the save process execution means when the failure detection means detects the occurrence of a failure during the save process. The recovery target data determination means sets, as recovery target data, the data whose duplicated state has been lost due to the failure of the failed storage device detected by the failure detection means. The redundant data copy destination selection means refers to the storage means and preferentially selects the redundant data copy destination for the recovery target data from the storage areas of storage devices other than the save process target storage device. The redundant data copy means copies the recovery target data to the storage area of the redundant data copy destination selected by the redundant data copy destination selection means.

Also provided are a data management program that causes a computer to execute the same functions as the data management device, and a data management method in which a computer executes the processing performed by the data management device.

With the above data management program, data management device, and data management method, the recovery process is executed without increasing the amount of data in the storage device targeted by the save process, so the execution time of the save process can be prevented from being prolonged.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.
FIG. 1 is a diagram showing an outline of the embodiment. In the embodiment, in order to manage data stored in a plurality of storage devices 1a, 1b, 1c, and 1d, a save process execution means 2, a failure detection means 4, a save process stop instruction means 5, a recovery target data determination means 6, a redundant data copy destination selection means 7, and a redundant data copy means 8 are provided.

In the example of FIG. 1, the interior of each of the storage devices 1a, 1b, 1c, and 1d is divided into unit storage areas (slices), and duplicated slices are given the same identification number.
When a save request designating an arbitrary storage device is input, the save process execution means 2 stores the identification information of the designated save process target storage device in the storage means 3. In the example of FIG. 1, the storage device 1c is the save process target storage device. The save process execution means 2 also executes a save process that copies all the data stored in the save process target storage device to other storage devices. In the example of FIG. 1, the data stored in the storage device 1c is copied to the other storage devices 1a, 1b, and 1d.

The failure detection means 4 detects a failed storage device in which a failure has occurred. In the example of FIG. 1, a failure has occurred in the storage device 1d.
The save process stop instruction means 5 stops the save process being executed by the save process execution means 2 when the failure detection means 4 detects the occurrence of a failure during the save process. In the example of FIG. 1, it is assumed that the save process was stopped at the point where copying of the data of the slice with identification number "1" and the data of the slice with identification number "3" had finished. That is, the data of the slice with identification number "2" has not been copied (the save process has not yet been performed for it).
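The stop behavior described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `run_save`, the callback names, and the copy order (1, 3, then 2) are assumptions chosen only to mirror the FIG. 1 example, where slices "1" and "3" are copied before the stop and slice "2" is left unprocessed.

```python
def run_save(slice_ids, copy_slice, stop_requested):
    """Copy slices one at a time, checking for a stop request before each copy.

    slice_ids      -- slices on the save target device, in copy order (assumed)
    copy_slice     -- hypothetical callback that copies one slice elsewhere
    stop_requested -- hypothetical callback; True once a failure was detected
    Returns (done, pending): slices already copied and slices left unprocessed.
    """
    done, pending = [], list(slice_ids)
    while pending:
        if stop_requested():
            break  # failure detected: abandon the remaining copies
        sid = pending.pop(0)
        copy_slice(sid)
        done.append(sid)
    return done, pending


# Simulate a failure that is detected after two slices have been copied.
copied = []
done, pending = run_save([1, 3, 2], copied.append, lambda: len(copied) >= 2)
print(done, pending)  # [1, 3] [2]
```

Checking the stop condition only between slice copies keeps each slice either fully copied or untouched, which is one simple way to make the save process resumable later.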

The recovery target data determination means 6 sets, as recovery target data, the data whose duplicated state has been lost due to the failure of the failed storage device detected by the failure detection means 4. For example, the recovery target data determination means 6 collects information about the stored data from the normally operating storage devices 1a, 1b, and 1c and examines the correspondence between the data items, thereby detecting the data whose duplicated state has been lost, and sets the detected data as the recovery target data. In the example of FIG. 1, the data of the slices with identification numbers "2", "3", and "5" are the recovery target data.
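One way to picture this determination is the following Python sketch. The function name and the slice layout are hypothetical: the layout is invented so that the result matches the identification numbers named in the text, and it is not the actual arrangement drawn in FIG. 1.

```python
from collections import defaultdict


def find_recovery_targets(slices_by_device, failed_devices):
    """Return the IDs of slices held by fewer than two healthy devices."""
    holders = defaultdict(set)
    for device, slice_ids in slices_by_device.items():
        if device in failed_devices:
            continue  # a failed device cannot report what it stores
        for sid in slice_ids:
            holders[sid].add(device)
    # A slice reported by only one healthy device has lost its duplicate.
    return sorted(sid for sid, devs in holders.items() if len(devs) < 2)


# Hypothetical layout (an assumption, not the layout of FIG. 1).
layout = {
    "1a": [1, 2, 4],
    "1b": [1, 3, 4],
    "1c": [5],
    "1d": [2, 3, 5],  # failed device
}
print(find_recovery_targets(layout, {"1d"}))  # [2, 3, 5]
```

Note that the sketch only looks at what the healthy devices report, as the text describes; a slice whose both copies were on failed devices would not appear at all and would need separate handling.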

The redundant data copy destination selection means 7 refers to the storage means 3 to recognize the save process target storage device. Next, the redundant data copy destination selection means 7 preferentially selects the redundant data copy destination for the recovery target data from the storage areas of storage devices other than the save process target storage device. In the example of FIG. 1, the storage areas of the storage devices 1a and 1b are preferentially selected as redundant data copy destinations. When a redundant data copy destination is selected, a storage area is chosen in a storage device different from the storage device that holds the copy source recovery target data.
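The selection rule can be sketched in Python as follows. Several details are assumptions rather than statements from the text: the function and parameter names, the fallback to a save target device when no other device has free space (one reading of "preferentially"), and the tie-break by largest free capacity.

```python
def select_copy_destination(free_slices, source_device, save_targets):
    """Pick a device to receive the redundant copy, or None if none is usable.

    free_slices   -- dict: healthy device -> number of free slices (assumed input)
    source_device -- device holding the surviving copy; never chosen
    save_targets  -- devices recorded as save process targets
    """
    candidates = [d for d, free in free_slices.items()
                  if free > 0 and d != source_device]
    # Prefer devices that are not being saved; use a save target only as a
    # last resort (this fallback is an assumption).
    preferred = [d for d in candidates if d not in save_targets]
    pool = preferred or candidates
    if not pool:
        return None
    return max(pool, key=lambda d: free_slices[d])  # tie-break is an assumption


free = {"1a": 2, "1b": 3, "1c": 5}  # 1d has failed and is not listed
print(select_copy_destination(free, "1a", {"1c"}))  # 1b
```

Even though the save target `1c` has the most free slices in this example, it is skipped because other devices can accept the copy.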

The redundant data copy means 8 copies the recovery target data to the storage area of the redundant data copy destination selected by the redundant data copy destination selection means 7.
With these functions, when a failure of a storage device is detected while the save process execution means 2 is executing the save process, the save process stop instruction means 5 stops the save process and the recovery process is started. In the recovery process, the recovery target data determination means 6 identifies the recovery target data. Next, the redundant data copy destination selection means 7 selects the redundant data copy destination for the recovery target data. At that time, the destination is preferentially selected from the storage areas of storage devices other than the save process target storage device. Then, the redundant data copy means 8 copies the recovery target data to the storage area of the redundant data copy destination.

As described above, because the redundant data copy destination is preferentially selected in the recovery process from the storage areas of storage devices other than the save process target storage device, the storage area of the save process target storage device can be prevented from being selected as a redundant data copy destination. As a result, the amount of data to be saved does not increase because of the recovery process, and the save process is prevented from being prolonged.

The functions shown in FIG. 1 can be applied to a multi-node storage system. In a multi-node storage system, each storage device is managed by a separate disk node. Data can therefore be copied between disk nodes by issuing copy instructions to the disk nodes via a network. Consequently, the save process and the recovery process performed when a failure is detected can be executed by remote control from a control node connected to the disk nodes via the network. An embodiment using a multi-node storage system is described in detail below.

FIG. 2 is a diagram illustrating a configuration example of the multi-node storage system according to the present embodiment. In the present embodiment, a plurality of disk nodes 100, 200, 300, and 400, a control node 500, access nodes 30 and 40, and a management node 50 are connected via a network 10. Storage devices 110, 210, 310, and 410 are connected to the disk nodes 100, 200, 300, and 400, respectively.

A plurality of hard disk drives (HDDs) 111, 112, 113, and 114 are mounted in the storage device 110. A plurality of HDDs 211, 212, 213, and 214 are mounted in the storage device 210. A plurality of HDDs 311, 312, 313, and 314 are mounted in the storage device 310. A plurality of HDDs 411, 412, 413, and 414 are mounted in the storage device 410. Each of the storage devices 110, 210, 310, and 410 is a RAID system using its built-in HDDs. In the present embodiment, each of the storage devices 110, 210, 310, and 410 provides a RAID 5 disk management service.

The disk nodes 100, 200, 300, and 400 are, for example, computers with the architecture called IA (Intel Architecture). The disk nodes 100, 200, 300, and 400 manage the data stored in the connected storage devices 110, 210, 310, and 410 and provide the managed data to the terminal devices 21, 22, and 23 via the network 10. The disk nodes 100, 200, 300, and 400 also manage the data redundantly. That is, the same data is managed by at least two disk nodes.

The control node 500 manages the disk nodes 100, 200, 300, and 400. For example, when the control node 500 receives a request from the management node 50 to add a new storage device, it defines a new logical volume so that the data stored in the added storage device can be accessed through that logical volume.

A plurality of terminal devices 21, 22, and 23 are connected to the access nodes 30 and 40 via a network 20. Logical volumes are defined in the access nodes 30 and 40. In response to requests from the terminal devices 21, 22, and 23 to access data on a logical volume, the access nodes 30 and 40 access the corresponding data in the disk nodes 100, 200, 300, and 400.

The management node 50 is a computer that an administrator uses to manage the operation of the multi-node storage system. For example, the management node 50 collects information such as the usage of the logical volumes in the multi-node storage system and displays the operating status on a screen.

FIG. 3 is a diagram illustrating a hardware configuration example of the control node. The control node 500 is controlled as a whole by a CPU (Central Processing Unit) 501. A RAM (Random Access Memory) 502, a hard disk drive (HDD) 503, a graphic processing device 504, an input interface 505, and a communication interface 506 are connected to the CPU 501 via a bus 507.

The RAM 502 is used as the main storage device of the control node 500. The RAM 502 temporarily stores at least part of the OS program and the application programs to be executed by the CPU 501. The RAM 502 also stores various data needed for processing by the CPU 501. The HDD 503 is used as the secondary storage device of the control node 500. The HDD 503 stores the OS program, application programs, and various data. A semiconductor storage device such as a flash memory can also be used as the secondary storage device.

A monitor 11 is connected to the graphic processing device 504. The graphic processing device 504 displays images on the screen of the monitor 11 in accordance with commands from the CPU 501. Examples of the monitor 11 include a display device using a CRT (Cathode Ray Tube) and a liquid crystal display device.

A keyboard 12 and a mouse 13 are connected to the input interface 505. The input interface 505 transmits signals sent from the keyboard 12 and the mouse 13 to the CPU 501 via the bus 507. The mouse 13 is one example of a pointing device, and other pointing devices can also be used. Other pointing devices include a touch panel, a tablet, a touch pad, and a trackball.

The communication interface 506 is connected to the network 10. The communication interface 506 transmits and receives data to and from other computers via the network 10.

The processing functions of the present embodiment can be realized with the hardware configuration described above. Although FIG. 3 shows the hardware configuration of the control node 500, the disk nodes 100, 200, 300, and 400, the access nodes 30 and 40, and the management node 50 can also be realized with the same hardware configuration.

FIG. 4 is a diagram showing the data structure of the logical volume. In the present embodiment, the logical volume identifier "LVOL-X" is assigned to the logical volume 60. The four disk nodes 100, 200, 300, and 400 connected via the network are assigned the node identifiers "DP-A", "DP-B", "DP-C", and "DP-D", respectively, to identify the individual nodes. The storage devices 110, 210, 310, and 410 connected to the disk nodes 100, 200, 300, and 400 are uniquely identified on the network 10 by the node identifiers of the disk nodes 100, 200, 300, and 400.

A RAID 5 storage system is configured in each of the storage devices 110, 210, and 310 of the disk nodes 100, 200, 300, and 400. The storage function provided by each of the storage devices 110, 210, 310, and 410 is divided into a plurality of slices 115a to 115f, 215a to 215f, 315a to 315f, and 415a to 415f and managed in those units.

The logical volume 60 is composed of units called segments 61 to 66. The storage capacity of each of the segments 61 to 66 is the same as the storage capacity of a slice, which is the management unit in the storage devices 110, 210, 310, and 410. For example, if the storage capacity of a slice is 1 gigabyte, the storage capacity of a segment is also 1 gigabyte. The storage capacity of the logical volume 60 is an integral multiple of the storage capacity of one segment. The segments 61 to 66 are each composed of a pair (slice pair) of a primary slice 61a, 62a, 63a, 64a, 65a, or 66a and a secondary slice 61b, 62b, 63b, 64b, 65b, or 66b.

The two slices belonging to the same segment belong to different disk nodes. The area used to manage each slice holds a logical volume identifier, segment information, and information on the other slice constituting the same segment, as well as a flag that stores a value representing primary, secondary, or the like.

In the example of FIG. 4, the identifier of each slice in the logical volume 60 is shown as a combination of the letter "P" or "S" and a number. "P" indicates a primary slice and "S" indicates a secondary slice. The number following the letter indicates which segment the slice belongs to. For example, the primary slice of the first segment 61 is shown as "P1" and its secondary slice is shown as "S1".
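The segment and slice-pair structure described above could be modeled as in the following Python sketch. The class and field names are illustrative assumptions, as is the per-node slice numbering; only the "P"/"S" labels, the node identifiers, and the rule that the two slices of a pair live on different disk nodes come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slice:
    node_id: str   # disk node identifier, e.g. "DP-A"
    slice_no: int  # position within that node's storage device (assumed numbering)


@dataclass
class Segment:
    index: int        # 1 for the first segment, 2 for the second, and so on
    primary: Slice
    secondary: Slice

    def __post_init__(self):
        # The two slices of a pair must belong to different disk nodes.
        if self.primary.node_id == self.secondary.node_id:
            raise ValueError("primary and secondary must be on different nodes")

    def labels(self):
        """Return the FIG. 4 style labels, e.g. ("P1", "S1")."""
        return f"P{self.index}", f"S{self.index}"


seg = Segment(1, Slice("DP-A", 1), Slice("DP-B", 2))
print(seg.labels())  # ('P1', 'S1')
```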

FIG. 5 is a block diagram illustrating the functions of the disk node and the access node. The access node 30 has a logical volume access control unit 31. In response to an access request from the terminal devices 21, 22, and 23 that designates data in the logical volume 60, the logical volume access control unit 31 accesses the disk node that manages the designated data. Specifically, the logical volume access control unit 31 identifies the block in the logical volume 60 in which the data to be accessed is stored. Next, the logical volume access control unit 31 identifies the segment corresponding to the identified block. The logical volume access control unit 31 then identifies the disk node corresponding to the primary slice constituting that segment and the slice within that disk node. Finally, the logical volume access control unit 31 outputs an access request for the identified slice to the identified disk node. The access node 40 has the same functions as the access node 30.
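The lookup sequence (block, then segment, then primary slice and disk node) can be sketched as below. This is an illustration under assumptions: addressing by byte offset, a segment size of 2**30 bytes (the text only gives 1 gigabyte as an example), 1-based segment numbering, and the names `resolve` and `primary_map` are all hypothetical.

```python
SEGMENT_SIZE = 1 << 30  # assumed: 1 GiB per segment, following the text's example


def resolve(offset, primary_map):
    """Map a byte offset in the logical volume to its primary slice.

    primary_map -- dict: segment number (1-based) -> (node_id, slice_no)
    Returns (node_id, slice_no, offset_within_slice).
    """
    segment_no = offset // SEGMENT_SIZE + 1
    node_id, slice_no = primary_map[segment_no]  # KeyError if out of range
    return node_id, slice_no, offset % SEGMENT_SIZE


# Hypothetical placement of the primary slices of segments 1 and 2.
primary_map = {1: ("DP-A", 1), 2: ("DP-B", 3)}
print(resolve(SEGMENT_SIZE + 5, primary_map))  # ('DP-B', 3, 5)
```

The access node would then send its request for that slice to the returned disk node.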

The disk node 100 includes an I/F unit 120, a device control unit 130, a slice information storage unit 140, and a control unit 150.
The I/F unit 120 is a communication function for carrying out data communication with other devices via the network 10 using a predetermined communication protocol. The device control unit 130 controls the storage device 110.

The slice information storage unit 140 is a storage function that stores slice information for managing the slices of the storage device 110. For example, part of the storage area in the RAM of the disk node 100 is used as the slice information storage unit 140. The slice information is a collection of the metadata provided for each slice. The metadata is information indicating the segment to which the corresponding slice is allocated and the other slice with which it is paired in a slice pair.

The control unit 150 includes a data access control unit 151, a heartbeat transmission unit 152, and a data management unit 153.
When the data access control unit 151 receives an access request (a read request or a write request) from an access node, it refers to the slice information storage unit 140 to determine the slice to be accessed. The data access control unit 151 then accesses the data in the storage device 110 that corresponds to the access request. For example, if the access request is a read request, the data access control unit 151 acquires the data from the storage device 110 and transmits it to the access node 30. If the access request is a write request, the data access control unit 151 writes the data to the storage device 110.

The heartbeat transmission unit 152 periodically notifies the control node 500 that the disk node 100 and the storage device 110 are operating normally. Specifically, while the disk node 100 and the storage device 110 are operating normally, the heartbeat transmission unit 152 periodically transmits to the control node 500 a normal-operation notification (heartbeat). When the heartbeat from the disk node 100 has been absent for a certain period of time, the control node 500 recognizes that a failure has occurred in the disk node 100.

When the data management unit 153 receives a slice information update instruction from the control node 500, it updates the slice information stored in the slice information storage unit 140. One example of slice information update processing is the promotion of a slice to primary. When the data management unit 153 receives a primary promotion instruction, it updates the attribute of the metadata in the slice information storage unit 140 that is identified by the slice ID included in the instruction to "primary (P)".

When the data management unit 153 receives a slice allocation instruction, it identifies the metadata in the slice information storage unit 140 using the slice ID included in the instruction as a key. The data management unit 153 updates the identified metadata according to the content specified in the slice allocation instruction (which includes the allocation destination segment and information on the other slice of the pair). When a slice of the storage device 110 is allocated to a segment, the data management unit 153 acquires the data from the paired slice managed by another disk node and stores it in the newly allocated slice.

When the data management unit 153 updates metadata in the slice information, it also updates the corresponding metadata in the storage device 110. This keeps the metadata in the storage device 110 identical to the metadata in the slice information storage unit 140. In addition, when the data management unit 153 receives a slice information acquisition request from the control node 500, it transmits the slice information in the slice information storage unit 140 to the control node 500.

When a save process or a recovery process is performed, the control node 500 issues to the disk node 100 an instruction to copy data in the storage device 110. On receiving the copy instruction from the control node 500, the data management unit 153 transfers the data in the storage device 110 to another disk node.

Further, the data management unit 153 performs processing that keeps the data of the primary slice and the secondary slice redundant. Specifically, when the data access control unit 151 writes data in response to a write request, the data management unit 153 cooperates with the data management unit of the disk node that manages the secondary slice corresponding to the written slice (the primary slice), and updates the data in the secondary slice.
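As a rough illustration of this redundancy-keeping behavior, the sketch below mirrors each write on a primary slice to its paired secondary slice. It is an in-memory toy, not the patented implementation: the `DiskNode` class, its attribute names, and the direct method call standing in for node-to-node communication are all assumptions.

```python
class DiskNode:
    """Toy disk node: slices are byte buffers, pairs point to the secondary copy."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.slices = {}  # slice_id -> bytearray holding the slice data
        self.pairs = {}   # primary slice_id -> (peer DiskNode, peer slice_id)

    def write(self, slice_id, offset, data):
        """Write to a primary slice, then update the paired secondary slice."""
        self._apply(slice_id, offset, data)
        peer = self.pairs.get(slice_id)
        if peer is not None:
            peer_node, peer_slice_id = peer
            # A direct call stands in for the cooperation between data management units.
            peer_node._apply(peer_slice_id, offset, data)

    def _apply(self, slice_id, offset, data):
        buf = self.slices[slice_id]
        buf[offset:offset + len(data)] = data
```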

The disk nodes 200, 300, and 400 have the same functions as the disk node 100.
Next, the data stored in the slice information storage unit 140 will be described in detail.

FIG. 6 is a diagram illustrating an example of the data structure of the slice information storage unit. The slice information storage unit 140 stores a metadata table 141. The metadata table 141 has columns for disk node ID, slice ID, status, logical volume ID, segment ID, paired disk node ID, and paired slice ID. The items of information in each row of the metadata table 141 are associated with one another and together form one record of metadata.

In the disk node ID column, the identification information (disk node ID) of the disk node 100 that manages the storage device 110 is set.
In the slice ID column, the identification information (slice ID) that identifies, within the storage device 110, the slice corresponding to the metadata is set.

In the status column, a status flag indicating the state of the slice is set. If the slice is not allocated to any segment of a logical volume, the status flag "F" is set. If the slice is allocated as the primary storage of a segment of a logical volume, the status flag "P" is set. If the slice is allocated as the secondary storage of a segment of a logical volume, the status flag "S" is set.

In the logical volume ID column, the identification information (logical volume ID) of the logical volume to which the segment corresponding to the slice belongs is set.
In the paired disk node ID column, the identification information (disk node ID) of the disk node that manages the storage device holding the paired slice (the other slice belonging to the same segment) is set.

In the paired slice ID column, the identification information (slice ID) that identifies the paired slice within the storage device to which it belongs is set.
Next, the functions of the control node 500 will be described in detail. The control node 500 manages the logical volumes and performs recovery processing when a failure occurs in any of the disk nodes 100, 200, 300, and 400. The control node 500 also performs a save process for the data stored in any of the storage devices, in response to, for example, an operation input from the system administrator.

FIG. 7 is a block diagram illustrating the functions of the control node. The control node 500 includes an interface (I/F) unit 510, a storage unit 520, and a control unit 530. The I/F unit 510 is a communication function that performs data communication via the network 10 with the disk nodes 100, 200, 300, and 400, the access nodes 30 and 40, and the management node 50, using a predetermined communication protocol.

The storage unit 520 is a storage function that stores various types of information. For example, part of the storage area of the RAM 502 can be used as the storage unit 520. The storage unit 520 stores a slice information table 521, a segment information table 522, a recovery target list 523, a save target slice list 524, and a save process target disk node list 525.

The slice information table 521 is a data table in which the metadata collected from the disk nodes 100, 200, 300, and 400 is registered (for each slice in a storage device, the metadata indicates the segment to which the slice is allocated and the other slice of its pair). The slice information table 521 is created by the slice information collection unit 532 in the control unit 530.

The segment information table 522 is information indicating the slices allocated to each segment of a logical volume. The segment information table 522 is generated by the slice information collection unit 532 based on the metadata collected from the disk nodes 100, 200, 300, and 400.

The recovery target list 523 is information indicating the segments that are targets of the recovery process. The recovery target list 523 is generated by the recovery target extraction unit 533 in the control unit 530.
The save target slice list 524 is a list of the slices whose data is to be saved when the save process is executed. The save target slice list 524 is generated by the save control unit 535 in the control unit 530.

The save process target disk node list 525 is a list of the disk nodes that are targets of the save process. The save process target disk node list 525 is generated by the save control unit 535 in the control unit 530.

The control unit 530 includes a monitoring unit 531, a slice information collection unit 532, a recovery target extraction unit 533, a duplexing control unit 534, and a save (evacuate) control unit 535.
The monitoring unit 531 is a processing unit that monitors whether the disk nodes 100, 200, 300, and 400 are operating normally. Specifically, the monitoring unit 531 receives the heartbeats sent periodically from the disk nodes 100, 200, 300, and 400. Each time, it stores the reception time of the heartbeat in association with the disk node. If a disk node has not sent its next heartbeat after a predetermined period or more has elapsed since its previous heartbeat was received, the monitoring unit 531 determines that a failure has occurred in that disk node.
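The timeout-based failure detection described for the monitoring unit 531 could look roughly like the following. This is a minimal sketch under assumptions: the class name, the injectable clock, and the choice to treat the monitor's start time as each node's first "last seen" time are illustrative and do not come from the description.

```python
import time


class HeartbeatMonitor:
    """Records the last heartbeat time per disk node and reports nodes that timed out."""

    def __init__(self, node_ids, timeout_s, now=time.monotonic):
        self._timeout_s = timeout_s
        self._now = now  # injectable clock, which makes the class easy to test
        start = now()
        # Assumption: monitoring starts as if every node had just sent a heartbeat.
        self._last_seen = {node_id: start for node_id in node_ids}

    def record_heartbeat(self, node_id):
        """Store the reception time in association with the disk node."""
        self._last_seen[node_id] = self._now()

    def failed_nodes(self):
        """Return nodes whose last heartbeat is at least timeout_s old."""
        current = self._now()
        return sorted(
            node_id
            for node_id, seen in self._last_seen.items()
            if current - seen >= self._timeout_s
        )
```

In a deployment, `failed_nodes()` would be polled periodically, and a non-empty result would trigger the slice information collection and recovery described next.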

The slice information collection unit 532 collects slice information from the disk nodes 100, 200, 300, and 400 when system operation starts, and generates the slice information table 521 and the segment information table 522. In addition, when the monitoring unit 531 detects that an abnormality has occurred in any of the disk nodes 100, 200, 300, and 400, the slice information collection unit 532 collects slice information (sets of metadata) from the disk nodes that are operating normally. It then regenerates the slice information table 521 and the segment information table 522 based on the collected slice information.

When the monitoring unit 531 detects a disk node failure, the recovery target extraction unit 533 detects the segments to be recovered. Specifically, the recovery target extraction unit 533 extracts the recovery target segments based on the segment information table 522 that was regenerated after the failure. A recovery target segment is a segment for which only one of the primary slice and the secondary slice remains stored. The recovery target extraction unit 533 then stores information indicating the recovery target segments in the recovery target list 523.
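The extraction rule above (a segment is a recovery target when exactly one of its two slices remains) can be sketched as follows. The dictionary layout of the segment table is an assumption made for this example; the description only specifies which columns the table has.

```python
def extract_recovery_targets(segment_table):
    """Return the (logical volume ID, segment ID) keys of segments that lost one slice.

    segment_table maps (volume_id, segment_id) to a dict with "primary" and
    "secondary" entries, each either a (disk_node_id, slice_id) tuple or None.
    """
    return sorted(
        key
        for key, entry in segment_table.items()
        # Exactly one of the two slices is present (exclusive or).
        if (entry["primary"] is None) != (entry["secondary"] is None)
    )
```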

The duplexing control unit 534 controls the allocation of slices to segments so that the data of each segment stays duplicated. That is, when a disk node failure leaves a segment whose duplicated state is broken, the duplexing control unit 534 performs recovery processing (duplication recovery processing) for that segment. The recovery process duplicates the slice allocated to the recovery target segment onto a free slice of a storage management device that is operating normally. The recovery process is broadly divided into primary promotion processing, free slice allocation processing, and slice information update processing.

The primary promotion processing makes every slice currently allocated to a recovery target segment a primary slice. Specifically, the duplexing control unit 534 acquires, from the recovery target list 523 and the segment information table 522, the disk node ID and slice ID of the slice allocated to each recovery target segment. The duplexing control unit 534 then transmits a primary promotion instruction to the disk node that has the storage device corresponding to the acquired disk node ID. The disk node that receives the primary promotion instruction changes the attribute of the slice designated in the instruction to primary.

Subsequently, the duplexing control unit 534 performs the free slice allocation processing. Specifically, the duplexing control unit 534 searches the slice information table 521 for slices that belong to a storage device different from that of the slice currently allocated to the recovery target segment (the primary slice) and that are not allocated to any segment (free slices).

The duplexing control unit 534 then selects one of the slices found and allocates it as the secondary slice of the recovery target segment. In doing so, the duplexing control unit 534 refers to the save process target disk node list 525 and checks whether any disk node is a save process target. If there is such a disk node, the duplexing control unit 534 gives the slices managed by that disk node the lowest priority when selecting a secondary slice in the free slice allocation processing. That is, a slice managed by a save process target disk node is allocated to a recovery target segment only when no free slice remains among the slices managed by the other disk nodes.
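The selection rule above, in which free slices on save process target disk nodes are used only as a last resort, can be sketched as follows. The `SliceMeta` record and the function name are hypothetical, and the sketch assumes one storage device per disk node, so "different storage device" is approximated by "different disk node ID".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SliceMeta:
    disk_node_id: str
    slice_id: int
    status: str  # "F" (free), "P" (primary), or "S" (secondary)


def select_redundant_copy_destination(slice_table, surviving_node_id, save_target_node_ids):
    """Pick a free slice for the new secondary, preferring nodes that are not being saved.

    Returns None when no free slice exists outside the surviving slice's own node.
    """
    candidates = [
        s for s in slice_table
        if s.status == "F" and s.disk_node_id != surviving_node_id
    ]
    if not candidates:
        return None
    # False sorts before True, so slices on save process target nodes come last.
    return min(candidates, key=lambda s: s.disk_node_id in save_target_node_ids)
```

Keeping recovery copies off the node being saved avoids creating new data there that the save process would later have to move again, which is the effect the abstract describes.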

Furthermore, the duplexing control unit 534 transmits, to the disk node that manages the primary slice, an instruction to copy data to the slice that is to become the secondary slice. The data copy instruction includes the disk node ID of the disk node that manages the secondary slice and the slice ID of the secondary slice. The disk node that receives the data copy instruction transfers the data of the primary slice to the disk node that manages the secondary slice. The disk node that receives the data stores it in the secondary slice. As a result, the data of the recovery target segment is duplicated.

The duplexing control unit 534 then transmits a slice information update instruction to each of the disk nodes that manage the primary slice and the secondary slice of the recovery target segment, and erases the contents of the recovery target list 523. The slice information update instruction includes the metadata of the slices allocated to the recovery target segment. A disk node that receives the slice information update instruction updates the metadata in its slice information storage unit and in its storage device using the metadata included in the instruction.

The save control unit 535 performs the save process based on, for example, an operation input by the administrator. Specifically, when the save control unit 535 receives a save instruction designating the storage device to be saved, it registers the disk node ID of the disk node that manages the designated storage device in the save process target disk node list 525. Next, the save control unit 535 refers to the slice information table 521 and detects, among the slices of the storage device to be saved, the slices that are allocated to segments (status "P" or "S"). The save control unit 535 then registers the disk node ID and slice ID of each detected slice (save process target slice) in the save target slice list 524.

Next, the save control unit 535 refers to the segment information table 522 and determines whether the save process target slice is a primary slice. If it is, the save control unit 535 transmits an instruction to change to a primary slice (a primary promotion instruction) to the disk node that manages the secondary slice allocated to the same segment.

Further, the save control unit 535 selects a slice to which the data of the save process target slice will be saved, from among the free slices managed by disk nodes other than the save process target disk node. The save control unit 535 then transmits, to the save process target disk node, an instruction to copy the data to the selected slice. When the save control unit 535 receives a copy completion response from the disk node, it transmits a slice information update instruction to the copy destination disk node. When there are multiple save process target slices, the save control unit 535 selects them one at a time. When the copy of the previously selected save process target slice has finished, the save control unit 535 transmits the data copy instruction for the next save process target slice.

Each time the data copy of one selected save process target slice finishes, the save control unit 535 transmits a segment information update instruction to the copy destination disk node. That is, the save control unit 535 transmits, to the disk node that manages the copy destination slice, metadata that makes the copy destination slice the secondary slice of the same segment as the copy source slice.

If a failure is detected during the save process in a disk node other than the save process target disk node, the save control unit 535 suspends the save process. The suspended save process is resumed after the recovery process is completed. For example, when the save control unit 535 detects that the recovery process by the duplexing control unit 534 has completed, it resumes the suspended save process. Alternatively, the save control unit 535 may resume the suspended save process when, after the recovery process by the duplexing control unit 534 has completed, the administrator issues a resume instruction.
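The suspend-and-resume behavior described above can be illustrated with a small state holder. This is a sketch under assumptions: the class and method names are invented for the example, the actual data copy is reduced to moving an entry between two lists, and only the automatic-resume variant is shown.

```python
class SaveController:
    """Processes save target slices one at a time and pauses while recovery runs."""

    def __init__(self, pending_slices, save_target_node_ids):
        self.pending = list(pending_slices)        # slices still to be saved, in order
        self.targets = set(save_target_node_ids)   # disk nodes being saved
        self.suspended = False
        self.copied = []                           # slices whose copy has been issued

    def on_failure_detected(self, failed_node_id):
        # Only a failure on a node other than the save process target suspends the save.
        if failed_node_id not in self.targets:
            self.suspended = True

    def on_recovery_completed(self):
        self.suspended = False

    def step(self):
        """Issue the copy for the next slice, or return None if suspended or finished."""
        if self.suspended or not self.pending:
            return None
        next_slice = self.pending.pop(0)
        self.copied.append(next_slice)
        return next_slice
```

Because slices are saved one at a time, suspension takes effect at the next slice boundary in this sketch; how an in-flight copy is handled is not modeled here.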

The functions shown in FIG. 1 correspond to the functions shown in FIG. 7 as follows. The functions of the save process execution means 2 and the save process stop instruction means 5 in FIG. 1 are included in the save control unit 535 in FIG. 7. The function of the storage means 3 in FIG. 1 is included in the storage unit 520 in FIG. 7. The function of the failure detection means 4 in FIG. 1 is included in the monitoring unit 531 in FIG. 7. The functions of the recovery target data determination means 6, the redundant data copy destination selection means 7, and the redundant data copy means 8 in FIG. 1 are included in the duplexing control unit 534 in FIG. 7.

Next, the information stored in the storage unit 520 will be described in detail.
FIG. 8 is a diagram illustrating an example of the data structure of the slice information table. The slice information table 521 has columns for disk node ID, slice ID, status, logical volume ID, segment ID, paired disk node ID, and paired slice ID. The items of information in each row of the slice information table 521 are associated with one another and together form one record of metadata. The information set in each column of the slice information table 521 is the same as in the column of the same name in the metadata table 141. However, unlike the metadata table 141 of the disk node 100, the slice information table 521 holds the metadata contained in all the slice information collected from the disk nodes 100, 200, 300, and 400.

FIG. 9 is a diagram illustrating an example of the data structure of the segment information table. The segment information table 522 has columns for logical volume ID, segment ID, primary information, and secondary information.

In the logical volume ID column, the identification information (logical volume ID) of a logical volume defined in the multi-node storage system is set. In the segment ID column, the identification information (segment ID) of a segment defined in the logical volume is set.

In the primary information column, information on the primary slice allocated to the corresponding segment is set. The primary information column is subdivided into a disk node ID column and a slice ID column. In the disk node ID column, the identification information (disk node ID) of the disk node that manages the slice allocated as the primary slice is set. In the slice ID column, the identification information (slice ID) of the slice allocated as the primary slice is set.

In the secondary information column, information on the secondary slice allocated to the corresponding segment is set. The secondary information column is subdivided into a disk node ID column and a slice ID column. In the disk node ID column, the identification information (disk node ID) of the disk node that manages the slice allocated as the secondary slice is set. In the slice ID column, the identification information (slice ID) of the slice allocated as the secondary slice is set.
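To make the relationship between the two tables concrete, the sketch below derives a segment information table from slice metadata records. The dictionary keys (`volume_id`, `segment_id`, and so on) are hypothetical names chosen for this example; the description specifies the columns but not a concrete encoding.

```python
def build_segment_table(slice_records):
    """Group allocated slices by (logical volume ID, segment ID).

    Each record is a dict with disk_node_id, slice_id, status, volume_id and
    segment_id. Free slices (status "F") do not belong to any segment.
    """
    table = {}
    for record in slice_records:
        if record["status"] not in ("P", "S"):
            continue
        key = (record["volume_id"], record["segment_id"])
        entry = table.setdefault(key, {"primary": None, "secondary": None})
        role = "primary" if record["status"] == "P" else "secondary"
        entry[role] = (record["disk_node_id"], record["slice_id"])
    return table
```

If the records of a failed disk node are simply absent from the input, the affected segments end up with one empty entry, which is the condition the recovery target extraction looks for.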

FIG. 10 is a diagram illustrating an example of the data structure of the recovery target list. The recovery target list 523 has columns for logical volume ID and segment ID.
In the logical volume ID column, the identification information (logical volume ID) of the logical volume to which a segment to be restored by the recovery process belongs is set. In the segment ID column, the identification information (segment ID) of the segment to be restored by the recovery process is set.

FIG. 11 is a diagram illustrating an example of the data structure of the save target slice list. The save target slice list 524 has columns for disk node ID and slice ID. The items of information in each row of the save target slice list 524 are associated with each other and together form one record indicating a save process target slice.

In the disk node ID column, the identification information (disk node ID) of the disk node that manages a slice to be saved is set. In the slice ID column, the identification information (slice ID) of the slice to be saved is set.

FIG. 12 is a diagram illustrating an example of the data structure of the save process target disk node list. In the save process target disk node list 525, the identification information (disk node ID) of each disk node that is a save process target is set.

The system configured as described above executes the save process and the recovery process. If, during the save process, a failure occurs in a disk node other than the save process target disk node, the save process is suspended and the recovery process is started. The save process and the recovery process are described in detail below.

FIG. 13 is a flowchart showing the procedure of the save process. The process shown in FIG. 13 is described below in order of step number. The save process starts when a save instruction designating a disk node is input.

[Step S11] The save control unit 535 generates the save process target disk node list 525, in which the disk node ID of the disk node designated in the save instruction (the save process target disk node) is registered. The save control unit 535 then stores the generated save process target disk node list 525 in the storage unit 520.

[Step S12] The save control unit 535 generates the save target slice list 524. Specifically, the save control unit 535 refers to the slice information table 521 and searches for the metadata of those slices managed by the save process target disk node that are allocated to a segment of a logical volume (save target slices). A slice allocated to a segment is a slice whose status column holds the flag "P" (primary slice) or "S" (secondary slice). The save control unit 535 sets the pair of disk node ID and slice ID from each item of metadata found in the save target slice list 524. The save control unit 535 then stores the generated save target slice list 524 in the storage unit 520.
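Step S12 amounts to a filter over the slice information table. A minimal sketch, assuming the table is held as a list of dictionaries with the hypothetical keys `disk_node_id`, `slice_id`, and `status`:

```python
def build_save_target_slice_list(slice_table, save_target_node_ids):
    """Return (disk node ID, slice ID) pairs for allocated slices on save target nodes."""
    return [
        (record["disk_node_id"], record["slice_id"])
        for record in slice_table
        if record["disk_node_id"] in save_target_node_ids
        and record["status"] in ("P", "S")  # "F" slices hold no segment data to save
    ]
```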

[Step S13] The save control unit 535 selects one save target slice from the save target slice list 524. Specifically, the save control unit 535 selects one record (a pair of disk node ID and slice ID) from the save target slice list 524.

[Step S14] The save control unit 535 determines whether the selected slice is a primary slice. Specifically, the save control unit 535 searches the slice information table 521 for the metadata corresponding to the pair of disk node ID and slice ID selected in step S13. If the status in the detected metadata is “P” (primary slice), the save control unit 535 determines that the selected slice is a primary slice. If the selected slice is a primary slice, the process proceeds to step S15. If it is a secondary slice, the process proceeds to step S16.

[Step S15] The save control unit 535 promotes the slice paired with the selected slice to primary. Specifically, the save control unit 535 identifies the paired slice from the paired disk node ID and paired slice ID in the metadata detected in step S14. Because the selected slice is a primary slice, the paired slice is a secondary slice. The save control unit 535 therefore sends a promotion instruction (an instruction to change the metadata attribute to “P”) to the disk node that manages the secondary slice. The promotion instruction contains the slice ID of the slice to be promoted.

The disk node that receives the promotion instruction changes the metadata attribute of the designated slice to “P” and returns a promotion completion response to the save control unit 535. On receiving the promotion completion response, the save control unit 535 updates the slice information table 521 and the segment information table 522. That is, the metadata attribute of the promoted slice is changed to “P”, and the metadata attribute of the selected save target slice is changed to “S”.

[Step S16] The save control unit 535 copies the data of the save target slice to another slice. Specifically, the save control unit 535 refers to the slice information table 521 and selects one free slice managed by a disk node different from the disk node that manages the slice paired with the selected save target slice. The save control unit 535 then sends, to the disk node that manages the save target slice, an instruction to copy the data of the save target slice to the free slice. This data copy instruction contains the slice ID of the save target slice, the disk node ID of the copy destination disk node, and the slice ID of the selected free slice.

The disk node that receives the data copy instruction copies the data in the save target slice to the designated free slice and returns a data copy completion response to the save control unit 535.

[Step S17] The save control unit 535 updates the metadata. Specifically, the save control unit 535 sends a slice information update instruction to the disk node that manages the copy destination slice. The instruction indicates that the formerly free slice is to be assigned as the secondary slice of the same segment as the selected save target slice. The disk node that receives the slice information update instruction updates the metadata of the copy destination slice as specified and returns a metadata update completion response to the save control unit 535. On receiving the metadata update completion response, the save control unit 535 updates the slice information table 521 and the segment information table 522.

[Step S18] The save control unit 535 deletes the save target slice from the save target slice list 524.

[Step S19] The save control unit 535 determines whether any unselected save target slice remains in the save target slice list 524. If one remains, the process returns to step S13. If the save process has been completed for all save target slices, the process proceeds to step S20.

[Step S20] The save control unit 535 deletes the save process target disk node from the save process target disk node list 525. The process then ends.

Thus, in the save process, save target slices are selected one at a time and processed sequentially. If a disk node failure is detected before the save process completes, the save process is stopped. Even when the save process is stopped, the contents of the save process target disk node list 525 are retained.
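The loop of steps S11 to S20 can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the patented implementation: the names `SliceMeta`, `build_save_target_list`, `save_one_slice`, and `run_save` are hypothetical, the messages exchanged with disk nodes are replaced by direct table updates, and two behaviours are assumptions not stated in the text (slices on nodes being saved are never chosen as destinations, and the saved slice is released as free after step S17).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

SliceKey = Tuple[str, int]  # (disk node ID, slice ID)


@dataclass
class SliceMeta:
    status: str                      # "P" (primary), "S" (secondary) or "F" (free)
    segment: Optional[int] = None    # segment ID; None while the slice is free
    pair: Optional[SliceKey] = None  # slice that holds the other copy


def build_save_target_list(table: Dict[SliceKey, SliceMeta],
                           save_nodes: Iterable[str]) -> List[SliceKey]:
    """Step S12: slices on the target nodes that are allocated to a segment."""
    nodes = set(save_nodes)
    return [key for key in sorted(table)
            if key[0] in nodes and table[key].status in ("P", "S")]


def save_one_slice(table: Dict[SliceKey, SliceMeta], target: SliceKey,
                   save_nodes: Iterable[str]) -> SliceKey:
    """Steps S14 to S17 for a single save target slice."""
    nodes = set(save_nodes)
    meta = table[target]
    pair_key = meta.pair
    if meta.status == "P":
        # S15: promote the paired secondary, demote the save target.
        table[pair_key].status = "P"
        meta.status = "S"
    # S16: pick a free slice on a node other than the pair's node.
    # Assumption: nodes being saved are also excluded as destinations.
    dest = next((key for key in sorted(table)
                 if table[key].status == "F"
                 and key[0] != pair_key[0] and key[0] not in nodes), None)
    if dest is None:
        raise RuntimeError("no free slice available")  # case not covered by the text
    # (the data copy itself would happen here)
    # S17: the destination becomes the new secondary of the segment.
    table[dest] = SliceMeta("S", meta.segment, pair_key)
    table[pair_key].pair = dest
    table[target] = SliceMeta("F")  # assumption: the saved slice is released
    return dest


def run_save(table: Dict[SliceKey, SliceMeta], save_nodes: Iterable[str]) -> None:
    node_list = list(save_nodes)                        # S11 (list 525)
    targets = build_save_target_list(table, node_list)  # S12 (list 524)
    while targets:                                      # S19
        target = targets[0]                             # S13
        save_one_slice(table, target, node_list)        # S14 to S17
        targets.remove(target)                          # S18
    node_list.clear()                                   # S20
```

Promoting the paired slice before the copy means the segment always has a primary slice outside the node being saved, so the save target itself is handled as a secondary slice from step S16 onward.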

Next, the recovery process is described in detail.

FIG. 14 is a flowchart showing the procedure of the recovery process. The process shown in FIG. 14 is described below in order of step number. The process of FIG. 14 is executed when heartbeats from a disk node have stopped for a predetermined time or longer.

[Step S31] The monitoring unit 531 detects an abnormality in a disk node when heartbeats from that disk node have stopped for the predetermined time.

[Step S32] The duplexing control unit 534 determines whether the save control unit 535 is executing the save process. If the save process is not in progress, the process proceeds to step S33. If it is in progress, the process proceeds to step S34.

[Step S33] The elements in the control unit 530 cooperate to execute the normal recovery process. Details of this process are described later. The recovery process then ends.

[Step S34] The elements in the control unit 530 cooperate to execute the recovery process during the save process. Details of this process are described later. The recovery process then ends.
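Steps S31 and S32 amount to a timeout check followed by a two-way branch. The sketch below assumes a simple per-node "last heartbeat" timestamp. `HeartbeatMonitor` and `choose_recovery` are hypothetical names, and the injectable `clock` parameter exists only to make the example testable.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Tracks the last heartbeat time of each disk node (step S31)."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def beat(self, node_id: str) -> None:
        # Record that a heartbeat arrived from this node just now.
        self.last_seen[node_id] = self.clock()

    def failed_nodes(self) -> List[str]:
        # A node is abnormal once its heartbeat has been silent for
        # the predetermined time or longer.
        now = self.clock()
        return [node for node, seen in self.last_seen.items()
                if now - seen >= self.timeout_s]


def choose_recovery(save_in_progress: bool) -> str:
    """Step S32: pick the recovery variant (S34 if saving, else S33)."""
    return "recovery_during_save" if save_in_progress else "normal_recovery"
```

A caller would poll `failed_nodes()` periodically and, for each node it returns, run whichever procedure `choose_recovery` names.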

As described above, the content of the recovery process differs depending on whether the save process is in progress. The detailed procedures of the normal recovery process and of the recovery process during the save process are described below.

FIG. 15 is a flowchart showing the procedure of the normal recovery process. The process shown in FIG. 15 is described below in order of step number.

[Step S41] The slice information collection unit 532 collects slice information from the normal disk nodes. Specifically, the slice information collection unit 532 obtains from the monitoring unit 531 the disk node ID of the disk node in which the abnormality was detected. It then sends a slice information acquisition request to every disk node other than that one. Each disk node that receives the request sends the slice information (per-slice metadata) held in its slice information storage unit to the slice information collection unit 532, which receives the slice information sent by each disk node.

[Step S42] The slice information collection unit 532 reconstructs the slice information table 521 and the segment information table 522. Specifically, it generates a slice information table from the collected slice information and updates the slice information table 521 in the storage unit 520. It likewise generates a segment information table from the collected slice information and updates the segment information table 522 in the storage unit 520. When generating the segment information table 522, the slice information collection unit 532 refers to the status of each metadata entry in the collected slice information. If the status is “P”, it sets that metadata in the primary information column of the segment information table 522. If the status is “S”, it sets that metadata in the secondary information column. Because no slice information is collected from the failed disk node, the primary information or secondary information column is left blank in the reconstructed segment information table 522 for every segment to which a slice managed by the failed disk node had been allocated.

[Step S43] The recovery target extraction unit 533 creates the recovery target list 523. Specifically, it refers to the segment information table 522 and extracts each segment whose primary information column or secondary information column is blank. It then registers the information on each extracted segment (logical volume ID and segment ID) in the recovery target list 523. This produces a recovery target list 523 that identifies the segments to which slices managed by the abnormal disk node had been allocated (recovery target segments). The generated recovery target list 523 is stored in the storage unit 520.

[Step S44] The duplexing control unit 534 makes the already allocated slice of the recovery target segment the primary slice. Specifically, the duplexing control unit 534 obtains the disk node ID and segment ID of the recovery target segment from the recovery target list 523. It then refers to the segment information table 522 and determines whether the existing slice allocated to the recovery target segment (the already allocated slice) is a primary slice or a secondary slice. If the already allocated slice is a secondary slice, the duplexing control unit 534 sends a promotion instruction to the disk node identified by the disk node ID of that slice, and the already allocated slice is changed to a primary slice.

[Step S45] The duplexing control unit 534 allocates a secondary slice to the recovery target segment. Specifically, it refers to the slice information table 521 and selects one free slice (a slice whose status is “F”) managed by a disk node other than the one managing the already allocated slice of the recovery target segment. It then allocates the selected slice as the secondary slice of the recovery target segment. At this point, the status of the allocated slice in the slice information table 521 is set to “R” (reserved slice). When the metadata update described later completes, the status of the slice is changed to “S” (secondary slice).

[Step S46] The duplexing control unit 534 instructs a data copy from the already allocated slice (primary slice) of the recovery target segment to the newly allocated slice (secondary slice). Specifically, the duplexing control unit 534 sends, to the disk node that manages the primary slice, a data copy instruction specifying the disk node ID and slice ID of the secondary slice.

[Step S47] The duplexing control unit 534 sends a slice information update instruction to the disk nodes that manage the primary slice and the secondary slice. The instruction indicates that the already allocated slice is the primary slice of the recovery target segment and that the slice allocated in step S45 is its secondary slice. Each disk node updates the metadata stored in both its slice information storage unit and its storage device, and then sends a slice information update completion response to the duplexing control unit 534. When the response is returned from the disk nodes, the duplexing control unit 534 updates the slice information table 521 and the segment information table 522.
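Steps S44 to S47 for one recovery target segment can be sketched as follows. This is an illustration under assumptions: `restore_duplication` and the `copy_data` callback are hypothetical names, the segment table is a plain dictionary with `primary` and `secondary` entries, and the exchange of instructions and completion responses with disk nodes is collapsed into direct updates.

```python
from typing import Callable, Dict, Optional, Tuple

SliceKey = Tuple[str, int]  # (disk node ID, slice ID)
Segment = Dict[str, Optional[SliceKey]]  # {"primary": ..., "secondary": ...}


def restore_duplication(segments: Dict[int, Segment],
                        status: Dict[SliceKey, str],
                        seg_id: int,
                        copy_data: Callable[[SliceKey, SliceKey], None]) -> SliceKey:
    seg = segments[seg_id]
    # S44: if only the secondary survived, promote it to primary.
    if seg["primary"] is None:
        seg["primary"], seg["secondary"] = seg["secondary"], None
        status[seg["primary"]] = "P"
    survivor = seg["primary"]
    # S45: choose a free slice on a different disk node and reserve it.
    dest = next((key for key in sorted(status)
                 if status[key] == "F" and key[0] != survivor[0]), None)
    if dest is None:
        raise RuntimeError("no free slice on another disk node")
    status[dest] = "R"
    # S46: copy the data from the primary to the reserved slice.
    copy_data(survivor, dest)
    # S47: after the metadata update, the reserved slice becomes the secondary.
    seg["secondary"] = dest
    status[dest] = "S"
    return dest
```

Keeping the destination in state “R” until the copy and metadata update finish prevents another allocation from picking the same slice while it is not yet a valid secondary.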

This completes the recovery process performed when the save process is not running. Next, the recovery process during the save process is described in detail.

FIG. 16 is a first flowchart showing the recovery process during the save process. The process shown in FIG. 16 is described below in order of step number.

[Step S51] The save control unit 535 stops the save process. Specifically, the duplexing control unit 534 issues a save process stop instruction to the save control unit 535, and the save control unit 535 stops the save process in response. At that time, the contents of the save target slice list 524 are cleared, but the contents of the save process target disk node list 525 are retained.

[Step S52] The slice information collection unit 532 collects slice information from the normal disk nodes. The details of this process are the same as step S41 in FIG. 15.

[Step S53] The slice information collection unit 532 reconstructs the slice information table 521 and the segment information table 522. The details of this process are the same as step S42 in FIG. 15.

[Step S54] The recovery target extraction unit 533 creates the recovery target list 523. The details of this process are the same as step S43 in FIG. 15.

[Step S55] The duplexing control unit 534 makes the already allocated slice of the recovery target segment the primary slice. The details of this process are the same as step S44 in FIG. 15.

[Step S56] The duplexing control unit 534 determines whether any unprocessed recovery target segment has a save target slice. Specifically, the duplexing control unit 534 refers to the recovery target list 523 to identify the recovery target segments. The duplexing control unit 534 internally keeps track of whether a secondary slice has been allocated to each recovery target segment for restoring duplication, and treats a recovery target segment to which no secondary slice has been allocated as unprocessed. Next, the duplexing control unit 534 refers to the segment information table 522 to identify the already allocated slice of each unprocessed recovery target segment. The duplexing control unit 534 then refers to the save process target disk node list 525: an already allocated slice that belongs to a save process target disk node is a save target slice, and a segment having such a slice is a matching recovery target segment. If there is a matching recovery target segment, the process proceeds to step S57. If there is none, the process proceeds to step S61 (see FIG. 17).

[Step S57] The duplexing control unit 534 selects one unprocessed recovery target segment to which a save target slice is allocated.

[Step S58] The duplexing control unit 534 allocates, to the selected recovery target segment, a slice managed by a disk node other than the save process target disk nodes. Specifically, it selects from the slice information table 521 one free slice (status “F”) among the slices of disk nodes other than the save process target disk nodes, and allocates it as the secondary slice of the recovery target segment selected in step S57. The process then returns to step S56.

FIG. 17 is a second flowchart showing the recovery process during the save process. The process shown in FIG. 17 is described below in order of step number.

[Step S61] The duplexing control unit 534 determines whether any unprocessed recovery target segment remains. Specifically, the duplexing control unit 534 refers to the recovery target list 523 to identify the recovery target segments. If any recovery target segment has no secondary slice allocated, the duplexing control unit 534 determines that an unprocessed recovery target segment remains. If one remains, the process proceeds to step S62. If none remains, the process proceeds to step S74 (see FIG. 18).

[Step S62] The duplexing control unit 534 determines whether any disk node other than the save process target disk nodes has a free slice. Specifically, it checks the slice information table 521 for a free slice (status “F”) among the slices of disk nodes other than the save process target disk nodes. If there is a free slice, the process proceeds to step S63. If there is none, the process proceeds to step S71 (see FIG. 18).

[Step S63] The duplexing control unit 534 selects one unprocessed recovery target segment.

[Step S64] The duplexing control unit 534 allocates, to the recovery target segment selected in step S63, a slice managed by a disk node other than the save process target disk nodes. Specifically, it selects from the slice information table 521 one free slice (status “F”) among the slices of disk nodes other than the save process target disk nodes, and allocates it as the secondary slice of the recovery target segment selected in step S63. The process then returns to step S61.

FIG. 18 is a third flowchart showing the recovery process during the save process. The process shown in FIG. 18 is described below in order of step number.

[Step S71] The duplexing control unit 534 selects one unprocessed recovery target segment.

[Step S72] The duplexing control unit 534 allocates, to the recovery target segment selected in step S71, a slice managed by a save process target disk node. Specifically, it selects from the slice information table 521 one free slice (status “F”) among the slices of the save process target disk nodes, and allocates it as the secondary slice of the recovery target segment selected in step S71.

[Step S73] The duplexing control unit 534 determines whether any unprocessed recovery target segment remains. If one remains, the process returns to step S71. If none remains, the process proceeds to step S74.

[Step S74] The duplexing control unit 534 instructs a data copy from the already allocated slice (primary slice) of each recovery target segment to the newly allocated slice (secondary slice). Specifically, the duplexing control unit 534 sends, to the disk node that manages the primary slice, a data copy instruction specifying the disk node ID and slice ID of the secondary slice.

[Step S75] The duplexing control unit 534 sends a slice information update instruction to the disk nodes that manage the primary slice and the secondary slice. The instruction indicates that the already allocated slice is the primary slice of the recovery target segment and that the slice allocated in step S45 is its secondary slice. Each disk node updates the metadata stored in both its slice information storage unit and its storage device, and then sends a slice information update completion response to the duplexing control unit 534. When the response is returned from the disk nodes, the duplexing control unit 534 updates the slice information table 521 and the segment information table 522.

[Step S76] The duplexing control unit 534 determines whether to re-execute the save process. For example, if an instruction to re-execute the stopped save process after the recovery process ends has been input in advance, the duplexing control unit 534 determines that the save process is to be re-executed. Alternatively, the duplexing control unit 534 may display a screen asking whether to re-execute the save process and decide based on the administrator's operation input. If the save process is to be re-executed, the process proceeds to step S77. Otherwise, the recovery process during the save process ends.

[Step S77] The duplexing control unit 534 sends a save process re-execution instruction to the save control unit 535, which then re-executes the save process. The details of the re-executed save process are the same as the process shown in FIG. 13, except that the save process target disk node list 525 created by the stopped save process is reused. The re-executed save process therefore skips the list creation of step S11. Once the duplexing control unit 534 has output the re-execution instruction, the recovery process during the save process ends.
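The allocation order of steps S56 to S73 can be viewed as three passes over the recovery target segments. The sketch below is one possible reading, with hypothetical names (`allocate_for_recovery`, `survivors`, `free_slices`). It also assumes, by analogy with step S45, that a new secondary is never placed on the disk node that holds the surviving copy, which the text for steps S58, S64, and S72 does not state explicitly.

```python
from typing import Dict, Iterable, List, Set, Tuple

SliceKey = Tuple[str, int]  # (disk node ID, slice ID)


def allocate_for_recovery(survivors: Dict[int, SliceKey],
                          free_slices: Iterable[SliceKey],
                          save_nodes: Set[str]) -> Dict[int, SliceKey]:
    """Map each recovery target segment ID to its new secondary slice.

    survivors maps a segment ID to the slice that still holds its data.
    """
    free: List[SliceKey] = sorted(free_slices)
    assigned: Dict[int, SliceKey] = {}

    def take(seg_id: int, from_save_nodes: bool) -> None:
        src_node = survivors[seg_id][0]
        for key in free:
            if (key[0] in save_nodes) == from_save_nodes and key[0] != src_node:
                free.remove(key)  # safe: we return right after mutating
                assigned[seg_id] = key
                return

    pending = sorted(survivors)
    # Pass 1 (S56 to S58): segments whose surviving slice is a save target
    # slice get free slices on nodes that are not being saved.
    for seg_id in [s for s in pending if survivors[s][0] in save_nodes]:
        take(seg_id, from_save_nodes=False)
    # Pass 2 (S61 to S64): remaining segments, still avoiding saved nodes.
    for seg_id in [s for s in pending if s not in assigned]:
        take(seg_id, from_save_nodes=False)
    # Pass 3 (S71 to S73): fall back to free slices on the saved nodes.
    for seg_id in [s for s in pending if s not in assigned]:
        take(seg_id, from_save_nodes=True)
    return assigned
```

Serving the segments with save target slices first matters when free slices outside the saved nodes are scarce: if such a segment received its new copy on a saved node, both of its copies would sit on nodes that still have to be saved.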

The recovery process is performed by the above procedure. The transition of the slice allocation state when a failure occurs during the save process is described below.

FIG. 19 shows the state transition of slice allocation when a failure occurs during the save process. In FIG. 19, the disk node ID of the disk node that manages each of the storage devices 110, 210, 310, and 410 is shown below that storage device. The slice ID of each slice in the storage devices 110, 210, 310, and 410 is shown to the left of the slice.

The first state [ST1] shows the slice allocation state at the start of the save process. In the example of FIG. 19, the save process for the storage device 310 has been started. Three slices of the storage device 310 are allocated to segments. The data of the slice with slice ID “2” in the storage device 310 is to be saved to the slice with slice ID “2” in the storage device 210, the data of the slice with slice ID “3” in the storage device 310 to the slice with slice ID “2” in the storage device 410, and the data of the slice with slice ID “6” in the storage device 310 to the slice with slice ID “2” in the storage device 110.

In this case, the save target slices in the storage device 310 are selected in order and their data is copied to the save destination slices. The slice with slice ID “3” in the storage device 310 is the primary slice of the segment with segment ID “3”. The secondary slice of that segment (the slice with slice ID “3” in the storage device 210) is therefore promoted to primary before the data is copied.

If a failure occurs in a storage device other than the storage device 310 being saved while this save process is running, the save process is stopped.

The second state [ST2] shows the slice allocation state when the failure occurs. In the example of FIG. 19, a failure occurs in the storage device 410 before the data of the slice with slice ID “6” in the storage device 310 has been saved. In this case, the save process is stopped.

The third state [ST3] shows the slice allocation state after the save process is stopped. Because the save process was stopped, the slice with slice ID “6” in the storage device 310 is still allocated to the segment with segment ID “2”. Even though the save process is stopped, the information indicating that the storage device is a save process target is retained.
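The stop and re-execution behaviour behind states ST2 and ST3 can be illustrated with a small state holder. This is a sketch under assumptions: `SaveControl` and the `assigned_slices` callback are hypothetical, and rebuilding the slice list on re-execution stands in for step S12 of the re-executed save process.

```python
from typing import Callable, Iterable, List, Tuple

SliceKey = Tuple[str, int]  # (disk node ID, slice ID)


class SaveControl:
    """Keeps the two lists used by the save process (lists 525 and 524)."""

    def __init__(self, assigned_slices: Callable[[str], List[SliceKey]]) -> None:
        # assigned_slices(node) returns the slices on that node that are
        # currently allocated to segments (status "P" or "S").
        self._assigned_slices = assigned_slices
        self.node_list: List[str] = []        # save process target disk nodes
        self.slice_list: List[SliceKey] = []  # save target slices

    def start(self, nodes: Iterable[str]) -> None:
        self.node_list = list(nodes)          # step S11
        self._rebuild_slice_list()            # step S12

    def stop(self) -> None:
        # Step S51: the slice list is cleared, the node list is kept.
        self.slice_list.clear()

    def resume(self) -> None:
        # Step S77: re-execute without repeating step S11.
        self._rebuild_slice_list()

    def _rebuild_slice_list(self) -> None:
        self.slice_list = [key for node in self.node_list
                           for key in self._assigned_slices(node)]
```

Because the node list survives the stop, the recovery process can tell which free slices belong to a node that is about to be emptied, and the re-executed save process picks up only the slices that are still allocated.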

When the save process is stopped, the recovery process is started. In the recovery process, the slice information table 521 and the segment information table 522 are first reconstructed.
FIG. 20 shows an example of the reconstructed slice information table. The reconstructed slice information table 521 does not contain the metadata of the failed disk node 400 (disk node ID “DP-D”). The segment information table 522 is therefore reconstructed from the metadata collected from the three normally operating disk nodes 100, 200, and 300.

FIG. 21 shows an example of the reconstructed segment information table. The reconstructed segment information table 522 contains a segment to which only one slice is allocated. This segment is selected as a recovery target segment.
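As a rough sketch of this step, the recovery target segments can be found by grouping the metadata collected from the healthy nodes by segment and keeping the segments that have exactly one slice. The tuple layout and the sample values are illustrative assumptions and do not reproduce the contents of FIG. 20 or FIG. 21.

```python
from collections import defaultdict


def find_recovery_targets(metadata):
    """Return {segment_id: (device, slice_id)} for segments with one slice only.

    metadata: iterable of (device, slice_id, segment_id) tuples collected
    from the normally operating nodes (hypothetical record layout).
    """
    by_segment = defaultdict(list)
    for device, slice_id, segment_id in metadata:
        by_segment[segment_id].append((device, slice_id))
    return {
        segment_id: slices[0]
        for segment_id, slices in by_segment.items()
        if len(slices) == 1
    }


# Illustrative metadata: segment 1 still has two slices, while segment 2
# has lost the slice that was held by the failed device.
metadata = [
    (110, 1, 1),
    (210, 4, 1),
    (310, 6, 2),
]
targets = find_recovery_targets(metadata)
```

Since the failed node contributes no metadata, a segment that lost one of its two slices shows up with a single entry without any explicit bookkeeping of what was lost.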

Next, an example of the state transition of slice allocation in the recovery process during the save process is described.
FIG. 22 shows the state transition of slice allocation in the recovery process during the save process. In the recovery process, the segments to which slices of the failed storage device 410 were allocated become the recovery targets. Data duplication is then performed using the data of the other slice allocated to each recovery target segment. When the data is duplicated, the slices of the save target storage device 310 are processed preferentially.

The fourth state [ST4] shows the slice allocation during the recovery process for the save target slice. In the example of FIG. 22, the slice with slice ID “6” of the storage device 410 was allocated as the primary slice to the segment with segment ID “2”, and the slice with slice ID “6” of the save target storage device 310 is allocated to the same segment as the secondary slice. Therefore, the slice with slice ID “6” of the storage device 310 is changed to the primary slice, and the data of that slice is then copied. In the example of FIG. 22, the data of the save target slice with slice ID “6” of the storage device 310 is copied to the slice with slice ID “2” of the storage device 110.

After that, the recovery process is performed for the data stored in the storage devices 110 and 210, that is, the storage devices other than the save target storage device 310.
The fifth state [ST5] shows the slice allocation during the recovery process for slices that are not save targets. In the example of FIG. 22, the data of the slice with slice ID “2” of the storage device 110 is copied to the slice with slice ID “1” of the storage device 210, and the data of the slice with slice ID “3” of the storage device 210 is copied to the slice with slice ID “5” of the storage device 110.

The sixth state [ST6] shows the state at the end of the recovery process. With the recovery process complete, each segment to which a slice of the storage device 410 had been allocated now has duplicated slices allocated from the normally operating storage devices 110, 210, and 310. In addition, because the save target storage device 310 has the lowest priority as a copy destination during the recovery process, no new data has been copied to it by the recovery process.
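One way to picture the "lowest priority" rule is a selection function that excludes the device already holding the surviving slice and then sorts the remaining free slices so that save target devices come last. The function name, argument layout, and free-slice list are assumptions for illustration only.

```python
def select_copy_destination(free_slices, surviving_device, save_targets):
    """Pick a free slice for the redundant copy, or None if none qualifies.

    free_slices: list of (device, slice_id) tuples
    surviving_device: device that holds the remaining copy of the data
    save_targets: set of devices currently marked as save targets
    """
    # The pair must live on a different device than the surviving slice.
    candidates = [s for s in free_slices if s[0] != surviving_device]
    # False sorts before True, so slices outside the save targets come first.
    # sorted() is stable, so the original order is kept within each group.
    candidates = sorted(candidates, key=lambda s: s[0] in save_targets)
    return candidates[0] if candidates else None


# Hypothetical free-slice list; the surviving copy is slice 3 of device 210.
free = [(310, 2), (310, 3), (110, 5), (210, 1)]
dest = select_copy_destination(free, surviving_device=210, save_targets={310})
```

With these inputs the sketch picks a slice of the storage device 110 even though free slices of the save target device 310 are listed first, and it falls back to the device 310 only when nothing else is available.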

After the recovery process ends, the stopped save process can be resumed.
FIG. 23 shows the state transition of slice allocation in the resumed save process. The seventh state [ST7] shows the slice allocation when the save process is resumed. In the example of FIG. 23, the only free slice is the slice with slice ID “6” of the storage device 210. The save target slice is therefore saved to the slice with slice ID “6” of the storage device 210.

Note that the save target slice, the slice with slice ID “6” of the storage device 310, is a primary slice. Therefore, the slice paired with it, the slice with slice ID “2” of the storage device 110, is changed to the primary slice.

The eighth state [ST8] shows the slice allocation after the save process has ended. With the save process complete, all slices in the storage device 310 are free slices.
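The resumed save process of FIG. 23 can be approximated by a planning step that pairs each still-pending save target slice with a free slice outside the save target devices. The function and its data layout are hypothetical, and the requirement that the destination differ from the device of the paired slice is an assumption made here to keep the two copies on separate devices.

```python
def plan_resumed_save(pending, free_slices, save_targets):
    """Return a list of (source, destination) copy plans.

    pending: list of ((device, slice_id), pair_device) for unsaved slices
    free_slices: list of (device, slice_id) tuples
    save_targets: set of devices whose slices are being saved
    """
    free = [s for s in free_slices if s[0] not in save_targets]
    plan = []
    for source, pair_device in pending:
        # Avoid placing both copies of a segment on the same device.
        dest = next((s for s in free if s[0] != pair_device), None)
        if dest is None:
            continue  # nothing suitable; this slice stays pending
        free.remove(dest)
        plan.append((source, dest))
    return plan


# ST7 as described in the text: slice 6 of device 310 is still pending, its
# pair is on device 110, and the only free slice is slice 6 of device 210.
plan = plan_resumed_save([((310, 6), 110)], [(210, 6)], {310})
```

Only the slice that was unprocessed at the time of the stop appears in the plan, which mirrors the statement that the resumed save process does not have to handle any additional slices.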

The resumed save process is executed in this way. Slices of the save target storage device 310 have a low priority for selection as reserve slices in the recovery process. Consequently, even when the save process is stopped and the recovery process is executed, the number of save target slices does not increase, and the resumed save process only has to save the data of the save target slices that were still unprocessed when the save process was stopped. In other words, even if a storage device other than the save target storage device fails during the save process and the recovery process is executed, the save process is prevented from being prolonged.

Moreover, in the recovery process, the copy destinations for the slices managed by the save processing target disk node are selected first. As a result, a slice can be reliably selected for each unprocessed recovery target segment even if the disk nodes other than the save processing target disk node run out of free slices while unprocessed recovery target segments remain.

That is, when a slice is selected for allocation to an unprocessed recovery target segment in the recovery process, the selected slice must be managed by a disk node different from the one managing the already allocated slice. If the disk nodes other than the save processing target disk node have run out of free slices, free slices exist only in the save processing target disk node. If, at that point, the already allocated slice of an unprocessed recovery target segment is managed by the save processing target disk node, no slice can be selected as its pair for duplication. Selecting the copy destinations for the slices managed by the save processing target disk node first avoids the situation in which no pair slice can be selected.
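The effect of the processing order can be reproduced with a small simulation. The sketch below assigns destinations segment by segment and optionally handles first the segments whose surviving slice sits on a save target device. All names and the two-segment scenario are assumptions chosen to expose the dead end described above.

```python
def assign_destinations(targets, free_slices, save_targets, save_target_first=True):
    """Return {segment_id: destination or None}.

    targets: list of (segment_id, surviving_device)
    free_slices: list of (device, slice_id) tuples
    """
    order = list(targets)
    if save_target_first:
        # Segments whose surviving slice is on a save target device go first.
        order = sorted(order, key=lambda t: t[1] not in save_targets)
    free = list(free_slices)
    result = {}
    for segment_id, surviving_device in order:
        candidates = [s for s in free if s[0] != surviving_device]
        candidates = sorted(candidates, key=lambda s: s[0] in save_targets)
        dest = candidates[0] if candidates else None
        if dest is not None:
            free.remove(dest)
        result[segment_id] = dest
    return result


# Segment 1 survives on device 110, segment 2 on the save target device 310.
# One free slice exists outside the save target and one inside it.
targets = [(1, 110), (2, 310)]
free = [(210, 1), (310, 2)]
with_priority = assign_destinations(targets, free, {310})
without_priority = assign_destinations(targets, free, {310}, save_target_first=False)
```

Without the priority, segment 1 consumes the only slice outside the device 310 and segment 2 is left with no valid pair. With the priority, both segments receive a destination.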

Furthermore, in the above embodiment, when a save target slice is a primary slice, the paired slice is made primary before the data is copied. This directs data accesses from the access nodes 30 and 40 to a storage node that is not performing the data copy. Specifically, the access nodes 30 and 40 acquire the information in the segment information table 522 from the control node 500 and always access the primary slice. If the slice accessed by the access node 30 or 40 is not the primary slice of the target segment, the access node acquires the latest segment information table 522 from the control node 500 and accesses the primary slice again. Because the access nodes 30 and 40 are controlled so that they always access the primary slice, making the paired slice primary when the copy source slice of the save process is a primary slice prevents access to the save target slice. As a result, the data copy for the save process and accesses from the access nodes 30 and 40 are never performed on the same segment at the same time, which prevents a decrease in access efficiency.
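The access behaviour described here (use the cached table, and on a miss fetch the latest table and retry) might look like the following. The classes are stand-ins invented for this sketch. In particular, `is_primary` on the control object merely simulates the check that would reveal that the accessed slice is no longer the primary.

```python
class ControlNode:
    """Stand-in for the control node holding the authoritative table."""

    def __init__(self, primary_by_segment):
        self.primary_by_segment = dict(primary_by_segment)

    def segment_table(self):
        return dict(self.primary_by_segment)

    def is_primary(self, segment_id, location):
        # Simulates the check made when a slice receives an access.
        return self.primary_by_segment[segment_id] == location


class AccessNode:
    def __init__(self, control):
        self.control = control
        self.table = control.segment_table()  # cached copy of the table
        self.refresh_count = 0

    def access(self, segment_id):
        location = self.table[segment_id]
        if not self.control.is_primary(segment_id, location):
            # Stale cache: fetch the latest table and retry on the primary.
            self.table = self.control.segment_table()
            self.refresh_count += 1
            location = self.table[segment_id]
        return location


control = ControlNode({3: (310, 3)})
node = AccessNode(control)
control.primary_by_segment[3] = (210, 3)  # pair promoted before the copy
served_by = node.access(3)
```

After the single refresh, later accesses go straight to the promoted slice, so the save source on the device 310 is left to the copy operation.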

The processing functions described above can be realized by a computer. In that case, a program is provided that describes the processing contents of the functions that the control node and the disk nodes should have. The processing functions are realized on the computer by executing the program on the computer. The program describing the processing contents can be recorded on a computer-readable recording medium. Computer-readable recording media include magnetic recording devices, optical discs, magneto-optical recording media, and semiconductor memories. Magnetic recording devices include hard disk drives (HDD), flexible disks (FD), and magnetic tapes. Optical discs include DVD (Digital Versatile Disc), DVD-RAM, CD-ROM (Compact Disc Read Only Memory), and CD-R (Recordable)/RW (ReWritable). Magneto-optical recording media include MO (Magneto-Optical disc).

To distribute the program, for example, portable recording media such as DVDs and CD-ROMs on which the program is recorded are sold. The program can also be stored in a storage device of a server computer and transferred from the server computer to other computers via a network.

The computer that executes the program stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage device. The computer then reads the program from its own storage device and executes processing according to the program. The computer can also read the program directly from the portable recording medium and execute processing according to it. In addition, the computer can execute processing according to the received program each time a program is transferred from the server computer.

The present invention is not limited to the embodiment described above, and various modifications can be made without departing from the gist of the present invention.
The main technical features of the embodiment described above are given in the following supplementary notes.

(Supplementary note 1) A data management program for causing a computer to manage data stored in a plurality of storage devices, the program causing the computer to function as:
save processing execution means for, when a save request designating an arbitrary storage device is input, storing identification information of the designated save processing target storage device in storage means and executing save processing that copies all data stored in the save processing target storage device to other storage devices;
failure detection means for detecting a failed storage device in which a failure has occurred;
save processing stop instruction means for stopping the save processing by the save processing execution means when the failure detection means detects the occurrence of a failure during the save processing;
recovery target data determination means for setting, as recovery target data, data whose duplicated state has been lost due to the failure of the failed storage device detected by the failure detection means;
redundant data copy destination selection means for referring to the storage means and preferentially selecting a redundant data copy destination for the recovery target data from storage areas of storage devices other than the save processing target storage device; and
redundant data copy means for copying the recovery target data to the storage area of the redundant data copy destination selected by the redundant data copy destination selection means.

(Supplementary note 2) The data management program according to supplementary note 1, wherein, when the free space in the storage devices other than the save processing target storage device is insufficient for the total amount of the recovery target data, the redundant data copy destination selection means selects free space in the save processing target storage device as the redundant data copy destination only for the amount of data that is lacking.

(Supplementary note 3) The data management program according to supplementary note 1, wherein the redundant data copy destination selection means determines the redundant data copy destination of the recovery target data stored in the save processing target storage device first, and then determines the redundant data copy destination of the recovery target data stored in the storage devices other than the save processing target storage device.

(Supplementary note 4) The data management program according to supplementary note 1, wherein the recovery target data determination means examines the correspondence between data stored in the normal storage devices other than the failed storage device detected by the failure detection means, and sets data for which no corresponding other data exists as the recovery target data.

(Supplementary note 5) The data management program according to supplementary note 1, wherein the recovery target data determination means collects, from the plurality of storage devices, metadata indicating how slices constituting the storage areas of the plurality of storage devices are allocated to a plurality of segments constituting the storage area of a virtually provided logical volume, sets, based on the collected metadata, a segment to which only one slice is allocated as a recovery target segment, and sets the data in the slice allocated to the recovery target segment as the recovery target data.

(Supplementary note 6) The data management program according to supplementary note 1, wherein, when the save processing has been stopped, the save processing execution means executes the save processing of the data in the save target storage device again after the copy processing by the redundant data copy means is completed.

(Supplementary note 7) The data management program according to supplementary note 1, wherein each of the plurality of storage devices is managed by an individual disk node, and
the save processing execution means selects a save data copy destination, to which the data stored in the save processing target storage device is to be copied, from free space in the storage devices other than the save processing target storage device, and causes the disk node that manages the save processing target storage device to copy the data in the save processing target storage device to the save data copy destination.

(Supplementary note 8) The data management program according to supplementary note 1, wherein each of the plurality of storage devices is managed by an individual disk node, and
the redundant data copy means causes the disk node that manages the storage device storing the recovery target data to copy the recovery target data to the redundant data copy destination selected by the redundant data copy destination selection means.

(Supplementary note 9) A data management device for managing data stored in a plurality of storage devices, comprising:
save processing execution means for, when a save request designating an arbitrary storage device is input, storing identification information of the designated save processing target storage device in storage means and executing save processing that copies all data stored in the save processing target storage device to other storage devices;
failure detection means for detecting a failed storage device in which a failure has occurred;
save processing stop instruction means for stopping the save processing by the save processing execution means when the failure detection means detects the occurrence of a failure during the save processing;
recovery target data determination means for identifying data whose duplicated state has been lost due to the failure of the failed storage device detected by the failure detection means and setting that data as recovery target data;
redundant data copy destination selection means for referring to the storage means and preferentially selecting a redundant data copy destination for the recovery target data from storage areas of storage devices other than the save processing target storage device; and
redundant data copy means for copying the recovery target data to the storage area of the redundant data copy destination selected by the redundant data copy destination selection means.

(Supplementary note 10) A data management method for managing data stored in a plurality of storage devices, wherein a computer:
when a save request designating an arbitrary storage device is input, stores identification information of the designated save processing target storage device in storage means and executes save processing that copies all data stored in the save processing target storage device to other storage devices;
detects a failed storage device in which a failure has occurred;
stops the save processing when the occurrence of a failure is detected during the save processing;
sets, as recovery target data, data whose duplicated state has been lost due to the failure of the failed storage device;
refers to the storage means and preferentially selects a redundant data copy destination for the recovery target data from storage areas of storage devices other than the save processing target storage device; and
copies the recovery target data to the selected storage area of the redundant data copy destination.

FIG. 1 is a diagram showing an overview of the embodiment.
FIG. 2 is a diagram showing a configuration example of the multi-node storage system of the embodiment.
FIG. 3 is a diagram showing a hardware configuration example of the control node.
FIG. 4 is a diagram showing the data structure of a logical volume.
FIG. 5 is a block diagram showing the functions of a disk node and an access node.
FIG. 6 is a diagram showing a data structure example of the slice information storage unit.
FIG. 7 is a block diagram showing the functions of the control node.
FIG. 8 is a diagram showing a data structure example of the slice information table.
FIG. 9 is a diagram showing a data structure example of the segment information table.
FIG. 10 is a diagram showing a data structure example of the recovery target list.
FIG. 11 is a diagram showing a data structure example of the save target slice list.
FIG. 12 is a diagram showing a data structure example of the save processing target disk node list.
FIG. 13 is a flowchart showing the procedure of the save process.
FIG. 14 is a flowchart showing the procedure of the recovery process.
FIG. 15 is a flowchart showing the procedure of the normal recovery process.
FIG. 16 is a first flowchart showing the recovery process during the save process.
FIG. 17 is a second flowchart showing the recovery process during the save process.
FIG. 18 is a third flowchart showing the recovery process during the save process.
FIG. 19 is a diagram showing the state transition of slice allocation when a failure occurs during the save process.
FIG. 20 is a diagram showing an example of the reconstructed slice information table.
FIG. 21 is a diagram showing an example of the reconstructed segment information table.
FIG. 22 is a diagram showing the state transition of slice allocation in the recovery process during the save process.
FIG. 23 is a diagram showing the state transition of slice allocation in the resumed save process.

Explanation of symbols

1a, 1b, 1c, 1d Storage device
2 Save processing execution means
3 Storage means
4 Failure detection means
5 Save processing stop instruction means
6 Recovery target data determination means
7 Redundant data copy destination selection means
8 Redundant data copy means

Claims

A data management program for causing a computer to manage data stored in a plurality of storage devices, the program causing the computer to execute processing comprising:
when a save request designating an arbitrary storage device is input, storing identification information of the designated save processing target storage device in storage means and executing save processing that copies the data stored in the save processing target storage device to another storage device;
when the occurrence of a failure in any of the storage devices is detected during the save processing, stopping the save processing;
setting, as recovery target data, data whose duplicated state has been lost due to the failure of that storage device;
referring to the storage means and preferentially selecting a redundant data copy destination for the recovery target data from storage areas of storage devices other than the save processing target storage device; and
copying the recovery target data to the selected storage area of the redundant data copy destination.

The data management program according to claim 1, wherein, in selecting the redundant data copy destination, when the free space in the storage devices other than the save processing target storage device is insufficient for the total amount of the recovery target data, free space in the save processing target storage device is selected as the redundant data copy destination only for the amount of data that is lacking.

The data management program according to claim 1 or 2, wherein, in selecting the redundant data copy destination, the redundant data copy destination of the recovery target data stored in the save processing target storage device is determined first, and the redundant data copy destination of the recovery target data stored in the storage devices other than the save processing target storage device is determined afterwards.

The data management program according to any one of claims 1 to 3, wherein, in setting the data whose duplicated state has been lost as the recovery target data, the correspondence between data stored in the normal storage devices other than the failed storage device is examined, and data for which no corresponding other data exists is set as the recovery target data.

The data management program according to any one of claims 1 to 4, wherein, when the save processing has been stopped, the save processing of the data in the save target storage device is executed again after the copy processing of the recovery target data is completed.

A data management device for managing data stored in a plurality of storage devices, comprising:
save processing execution means for, when a save request designating an arbitrary storage device is input, storing identification information of the designated save processing target storage device in storage means and executing save processing that copies the data stored in the save processing target storage device to another storage device;
failure detection means for detecting a storage device in which a failure has occurred;
save processing stop instruction means for stopping the save processing by the save processing execution means when the failure detection means detects the occurrence of a failure during the save processing;
recovery target data determination means for identifying data whose duplicated state has been lost due to the failure of the storage device detected by the failure detection means and setting that data as recovery target data;
redundant data copy destination selection means for referring to the storage means and preferentially selecting a redundant data copy destination for the recovery target data from storage areas of storage devices other than the save processing target storage device; and
redundant data copy means for copying the recovery target data to the storage area of the redundant data copy destination selected by the redundant data copy destination selection means.

A data management method for managing data stored in a plurality of storage devices, wherein a computer:
when a save request designating an arbitrary storage device is input, stores identification information of the designated save processing target storage device in storage means and executes save processing that copies the data stored in the save processing target storage device to another storage device;
when the occurrence of a failure in any of the storage devices is detected during the save processing, stops the save processing;
sets, as recovery target data, data whose duplicated state has been lost due to the failure of that storage device;
refers to the storage means and preferentially selects a redundant data copy destination for the recovery target data from storage areas of storage devices other than the save processing target storage device; and
copies the recovery target data to the selected storage area of the redundant data copy destination.