JP7201775B2

JP7201775B2 - Storage system, data management method, and data management program

Info

Publication number: JP7201775B2
Application number: JP2021182809A
Authority: JP
Inventors: 悠冬鴨生; 良介達見; 朋宏吉原; 尚長尾
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2018-11-08
Filing date: 2021-11-09
Publication date: 2023-01-10
Anticipated expiration: 2038-11-08
Also published as: JP2022010181A

Description

The present invention relates to a technique for managing data in a storage system.

In a storage system, data is multiplexed (typically duplicated) across multiple storage controllers to protect it from failures such as power loss. Some storage systems also use multiple dedicated circuits to multiplex write data into cache areas simultaneously.

For example, Patent Document 1 discloses a technique for speeding up writes in a storage system that duplicates its cache: write completion is reported to the host once the data has been stored in a first cache and a FIFO buffer, and the data is then sent from the FIFO buffer to a second cache.

Meanwhile, with the aim of reducing the development cost of dedicated circuits, Patent Document 2 discloses a technique for guaranteeing data consistency in a storage system in which the processing of a dedicated circuit is emulated by a general-purpose controller. In this technique, data received from outside the controller is stored in a buffer area, and the controller then transfers it from the buffer area to a cache area, so that the data is not corrupted even if a failure occurs during I/O processing.

Japanese Patent Application Laid-Open No. 2005-44010
International Publication No. WO 2015/052798

To improve storage system performance, storage systems equipped with many general-purpose controllers have appeared.

In such a storage system, it is desirable to limit data multiplexing to duplication (two copies) in order to curb the increase in memory capacity.

For example, when data is duplicated in such a storage system, the controller that receives the data from the host and the controllers that are the duplication destinations may all be different. In that case, the technique described in Patent Document 2 requires each duplication-destination controller to transfer the data from its buffer area to its cache area. This places a processing load on the controllers' processors and may degrade controller performance.

The present invention has been made in view of the above circumstances, and its object is to provide a technique that can appropriately ensure data consistency while suppressing the processing load on the controllers' processors.

To achieve the above object, a storage system according to one aspect has a plurality of controllers and a storage device unit capable of storing data, and has a DMA (Direct Memory Access) unit that can directly access memory and can communicate with other controllers. Each controller has a processor unit and a memory that has a buffer area for temporarily storing data and a cache area for caching data. When new data for a write request has been stored in the buffer area, the processor unit of a controller uses the DMA unit to transfer the new data sequentially from the buffer area in which it is stored to the cache areas of a plurality of controllers, without passing through any other buffer area.

According to the present invention, data consistency can be appropriately ensured while suppressing the processing load on the controllers' processors.

FIG. 1 is a diagram outlining write processing in the storage system according to the first embodiment.
FIG. 2 is a diagram outlining write processing when a failure occurs in the storage system according to the first embodiment.
FIG. 3 is a configuration diagram of the computer system according to the first embodiment.
FIG. 4 is a diagram showing an example of the data structure of the controller state management information according to the first embodiment.
FIG. 5 is a diagram showing an example of the data structure of the cache state management information according to the first embodiment.
FIG. 6 is a diagram showing an example of the data structure of the transfer management information according to the first embodiment.
FIG. 7 is a diagram showing an example of the data structure of the transfer state management information according to the first embodiment.
FIG. 8 is a flowchart of sequential transfer request processing according to the first embodiment.
FIG. 9 is a flowchart of sequential transfer completion wait processing according to the first embodiment.
FIG. 10 is a flowchart of sequential transfer processing according to the first embodiment.
FIG. 11 is a flowchart of failure handling processing according to the first embodiment.
FIG. 12 is a flowchart of sequential transfer completion wait processing according to the second embodiment.
FIG. 13 is a flowchart of failure handling processing according to the second embodiment.

Embodiments of the present invention are described below with reference to the drawings. However, the present invention should not be construed as being limited to the description of the embodiments below. Those skilled in the art will readily understand that the specific configuration can be changed without departing from the idea or gist of the present invention.

In the configurations of the invention described below, identical or similar configurations or functions are denoted by the same reference numerals, and duplicate descriptions are omitted.

Notations such as "first", "second", and "third" in this specification are used to identify constituent elements and do not necessarily limit their number or order.

The position, size, shape, range, and the like of each component shown in the drawings may not represent the actual position, size, shape, range, and the like, in order to make the invention easier to understand. The present invention is therefore not limited to the positions, sizes, shapes, ranges, and the like disclosed in the drawings.

The meanings of terms used in the following description are as follows.
(*) "PDEV" stands for a non-volatile physical storage device. A plurality of RAID groups may be configured from a plurality of PDEVs. "RAID" stands for Redundant Array of Independent (or Inexpensive) Disks. A RAID group may also be called a parity group.
(*) An HCA (Host Channel Adaptor) is a device that performs communication between controllers as instructed by a CPU. The HCA is an example of a DMA (Direct Memory Access) unit and can directly access memory.
(*) The processor unit includes one or more processors. At least one processor is typically a microprocessor such as a CPU (Central Processing Unit). Each of the one or more processors may be single-core or multi-core. A processor may include a hardware circuit that performs part or all of the processing.

First, the computer system according to the first embodiment is described.

FIG. 1 is a diagram outlining write processing in the storage system according to the first embodiment. FIG. 1 shows the flow of write processing in which data is transferred sequentially to the cache areas 243 (#1, #2) of the duplication-destination controllers 22 (#1, #2).

The storage system 2 of the computer system 100 according to this embodiment includes a plurality of controllers 22 (controllers #0, #1, #2). The controllers 22 are connected to one another. Each controller 22 is, for example, a general-purpose controller rather than a controller dedicated to a storage system. The controller 22 has an FE-I/F 210, a CPU 230 as an example of the processor unit, and a memory 240. The memory 240 has a buffer area 242 and a cache area 243 and stores transfer state management information 247.

In the storage system 2, a controller 22 in charge of processing is designated for each I/O target space (for example, a logical unit: LU) so that the controllers 22 can process I/O requests from the host computers (also called hosts) 1 in parallel. The right to be in charge of this processing is called ownership. For example, when controller #1 has ownership of the LU with LUN #0, I/O requests to that LU are processed under the control of controller #1.
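The ownership rule can be illustrated with a minimal Python sketch. The table `lu_owner`, its contents, and the function `route_io` are hypothetical names introduced only for illustration; the specification does not define them.

```python
# Hypothetical ownership table: LUN -> ID of the controller that owns the LU.
lu_owner = {0: 1, 1: 0, 2: 2}


def route_io(receiving_controller: int, lun: int) -> tuple[int, bool]:
    """Return (owner controller ID, whether the request must be forwarded).

    The I/O request is processed under the control of the LU's owner,
    regardless of which controller received it from the host.
    """
    owner = lu_owner[lun]
    forwarded = owner != receiving_controller
    return owner, forwarded


# Controller #0 receives a request for LUN #0, which controller #1 owns.
owner, forwarded = route_io(receiving_controller=0, lun=0)
print(owner, forwarded)
```

In this sketch the receiving controller only decides whether to forward; all further decisions belong to the owner, which matches the division of roles described above.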

The host computers 1 (host computers #0, #1, #2) are connected to the storage system 2 via a communication network 11. Each host computer 1 is connected to, for example, one controller 22.

The controller 22 does not write the write data of a write request from the host computer 1 directly to the storage device unit 20. Instead, it notifies the host computer 1 that write processing is complete after the write data has been duplicated and stored in the cache areas 243 of a plurality of controllers 22. This enables high-speed write processing.

The controller 22 writes the write data in the cache area 243 to the storage device unit 20 asynchronously with the write request. Suppose duplicated write data is already stored in the cache areas 243 but has not yet been written to the storage device unit 20 (this is called the dirty state, and such write data is called dirty data). When the controller 22 receives new write data for the same write destination from the host computer 1 in this state, it stores the new write data in the buffer area 242 to avoid corrupting the write data in the cache area 243. The controller 22 then preserves the consistency of the write data by sequentially transferring the new write data stored in the buffer area 242 to the cache areas 243 of a plurality of controllers (two in the case of duplication). The transfer state management information 247 is information for managing the progress (transfer state) of the sequential transfer of write data to the cache areas 243 of the duplication-destination controllers 22.

Here, "sequential transfer" means transferring the write data to the cache area 243 of one controller 22 without passing through the buffer area 242 of any controller 22 other than the controller 22 that received the write request and, after that transfer has completed, transferring the write data to the cache area 243 of another controller 22.

The following describes write processing when controller #0 receives, from host computer #0, a write request for an LU owned by controller #1.

When controller #0 receives the write request from host computer #0, CPU #0 of controller #0 transfers the write request to CPU #1 of controller #1, which has ownership of the LU targeted by the write request.

CPU #1 has an area for storing the write data reserved in buffer area #0 and checks the state of the data stored in the cache areas 243 corresponding to the write data (cache areas #1 and #2 in this example). In this embodiment, the data in the cache areas 243 is assumed to be in the dirty state. Because the data in the cache areas 243 is dirty data, CPU #1 determines that sequential transfer is necessary.

CPU #0 of controller #0 then stores the write data, via FE-I/F #0, in the area reserved in buffer area #0 (step S1).

Next, CPU #1 requests HCA #0 to copy (transfer) the write data from buffer area #0 to cache area #2 and then to copy (transfer) it to cache area #1, that is, to perform a sequential transfer (step S2).

HCA #0 copies the write data from buffer area #0 to cache area #2 via HCA #2 (hereinafter, the first transfer) (step S3). While copying, HCA #0 checks the guarantee code attached to the data. The guarantee code may consist of information indicating the storage location of the data (a VOL number, an address within the VOL, and so on) and information for checking data consistency (such as a CRC (Cyclic Redundancy Check)).
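A guarantee code of the kind described here can be sketched in Python as follows. The 16-byte layout (4-byte VOL number, 8-byte address, 4-byte CRC-32), the choice of CRC-32, and the function names are assumptions made for illustration; the specification only says the code may contain location information and a consistency check such as a CRC.

```python
import struct
import zlib


def make_guarantee_code(vol_number: int, address: int, data: bytes) -> bytes:
    """Pack VOL number, address, and a CRC-32 of the data (assumed layout)."""
    return struct.pack(">IQI", vol_number, address, zlib.crc32(data))


def check_guarantee_code(code: bytes, vol_number: int, address: int,
                         data: bytes) -> bool:
    """Return True if the location and CRC in the code match the data."""
    return code == make_guarantee_code(vol_number, address, data)


code = make_guarantee_code(vol_number=3, address=0x1000, data=b"new write data")
print(check_guarantee_code(code, 3, 0x1000, b"new write data"))
print(check_guarantee_code(code, 3, 0x2000, b"new write data"))
```

Including the location in the code lets the check catch data that is intact but written to the wrong place, which a CRC over the payload alone would miss.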

Next, HCA #0, via HCA #2, has transfer state management information #2 record that the write data transferred to cache area #2 has been received, and requests HCA #2 to have transfer state management information #1 record that the transfer of the write data to cache area #2 is complete (step S4). On receiving the request, HCA #2, via HCA #1, has transfer state management information #1 record that the transfer of the write data to cache area #2 is complete (step S5).

Next, HCA #0 copies the write data from buffer area #0 to cache area #1 via HCA #1 (hereinafter, the second transfer) (step S6).

Next, HCA #0, via HCA #1, has transfer state management information #1 record that the transfer of the write data to cache area #1 is complete (step S7).

CPU #1 refers to transfer state management information #1 and confirms that duplication of the write data is complete (step S8). CPU #1 then reports completion of the write request to host computer #0 via CPU #0 and FE-I/F #0 (step S9). As a result, the write data from the host computer 1 is duplicated and stored in cache area #1 and cache area #2.
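Steps S1 to S8 can be traced with a small in-memory simulation. This is only a sketch: the classes `Controller` and `TransferState`, the use of dictionaries for buffer and cache areas, and the function names are hypothetical, and the HCAs are modeled as plain assignments rather than real DMA operations.

```python
from dataclasses import dataclass, field


@dataclass
class TransferState:
    first_received: bool = False  # recorded at the first destination (S4)
    first_done: bool = False      # recorded at the owner controller (S5)
    second_done: bool = False     # recorded at the owner controller (S7)


@dataclass
class Controller:
    cid: int
    buffer: dict = field(default_factory=dict)  # address -> data
    cache: dict = field(default_factory=dict)   # address -> data
    state: dict = field(default_factory=dict)   # address -> TransferState


def sequential_transfer(receiver, first_dst, owner, addr):
    """Copy buffer data to two caches one after another, recording progress."""
    data = receiver.buffer[addr]
    first_dst.cache[addr] = data                                             # S3
    first_dst.state.setdefault(addr, TransferState()).first_received = True  # S4
    owner.state.setdefault(addr, TransferState()).first_done = True          # S5
    owner.cache[addr] = data                                                 # S6
    owner.state[addr].second_done = True                                     # S7


def duplication_complete(owner, addr):
    """Check corresponding to S8: both transfers are recorded as complete."""
    st = owner.state.get(addr)
    return st is not None and st.first_done and st.second_done


c0, c1, c2 = Controller(0), Controller(1), Controller(2)
c1.cache[0x10] = c2.cache[0x10] = b"old dirty data"
c0.buffer[0x10] = b"new write data"                        # S1
sequential_transfer(c0, first_dst=c2, owner=c1, addr=0x10)  # S2 to S7
print(duplication_complete(c1, 0x10))                       # S8
```

Note that the new data never enters the buffer areas of controllers #1 or #2; only the receiving controller's buffer is used, which is the point of the sequential transfer.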

In the above example, the write data is transferred to cache area #2 first and then to cache area #1, but the order may instead be cache area #1 and then cache area #2.

FIG. 2 is a diagram outlining write processing when a failure occurs in the storage system according to the first embodiment. FIG. 2 outlines the write processing when a failure occurs partway through the write processing shown in FIG. 1.

If HCA #0 receives a sequential transfer request from CPU #1 and, while the sequential transfer is in progress, a failure occurs in an HCA 250 or in a path of the network connecting the HCAs 250 (the HCA network 23 in FIG. 3) (step S11), the dirty write data (dirty data) in cache area #1 or #2 may be corrupted (step S12). That is, only part of the dirty data may be overwritten with part of the new write data, leaving data that matches neither the old nor the new write data.

CPU #1, which manages the write data, therefore refers to the transfer state management information 247 and identifies a cache area 243 that holds intact dirty data (step S13). CPU #1 then destages the dirty data in the identified cache area 243, that is, transfers it to the storage device unit 20 (step S14). After destaging is complete, CPU #1 discards the dirty data in cache areas #1 and #2. In the following description, unless otherwise stated, once destaging is complete, the data in the plurality of cache areas 243 that held data corresponding to the destaged data is discarded.

Through the above processing, intact dirty data can be selected and written to the storage device unit 20, so the consistency of the write data can be guaranteed.
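How the recorded transfer state could be used in step S13 can be sketched as follows. The selection rule below is an assumption inferred from the order of the two transfers, not the procedure defined by the specification's flowcharts, and the function name and parameters are hypothetical.

```python
def select_destage_source(first_done: bool, second_done: bool,
                          first_dst: int, second_dst: int) -> int:
    """Return the ID of a controller whose cached copy is assumed intact.

    Assumed rule: the second transfer starts only after the first has
    been recorded as complete, so at most one cache is being overwritten
    at any moment.
    """
    if not first_done:
        # The first destination may be partially overwritten; the second
        # destination has not been touched and still holds the old data.
        return second_dst
    # The first destination holds the complete new data. The second
    # destination is either also complete or possibly partially
    # overwritten, so the first destination is the safe choice.
    return first_dst


# Failure during the first transfer (cache #2 first, then cache #1).
print(select_destage_source(False, False, first_dst=2, second_dst=1))
# Failure during the second transfer.
print(select_destage_source(True, False, first_dst=2, second_dst=1))
```

Under this rule the destaged data is always either the complete old data or the complete new data, never a mixture of the two.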

Next, the computer system according to this embodiment is described in detail.

FIG. 3 is a configuration diagram of the computer system according to the first embodiment.

The computer system 100 includes one or more host computers 1 and the storage system 2. The host computers 1 and the storage system 2 are connected via a network 11. The network 11 is, for example, a SAN (Storage Area Network).

The storage system 2 has a plurality of (for example, three or more) controllers 22 (controllers 22 #0, ..., #N) and a storage device unit 20. The controllers 22 are connected to one another via an HCA network 23. To improve the availability of the storage system 2, a dedicated power supply may be provided for each controller 22 so that each controller 22 is powered from its own supply.

The controller 22 has communication interfaces, a storage device, and a processor connected to them. The communication interfaces are, for example, an FE-I/F (Front End Interface) 210, a BE-I/F (Back End Interface) 220, and an HCA 250. The storage device is, for example, the memory 240. The processor is, for example, a CPU (Central Processing Unit) 230. Although the controller 22 in FIG. 3 has one memory 240, it may have a plurality of memories 240.

The FE-I/F 210 is an interface device for communicating with external devices on the front end, such as the host computer 1. The BE-I/F 220 is an interface device through which the controller 22 communicates with the storage device unit 20. The HCA 250 is an interface device for communicating with other HCAs 250 in order to operate on the memory 240 of each controller 22.

The memory 240 is, for example, a RAM (Random Access Memory) and includes the buffer area 242 and the cache area 243. The memory 240 also stores a control module 241, controller state management information 244, cache state management information 245, transfer management information 246, and transfer state management information 247. The memory 240 may be either non-volatile or volatile memory.

The control module 241 is a module (program) that controls the storage system 2 as a whole when executed by the CPU 230. More specifically, the control module 241, when executed by the CPU 230, controls I/O processing and the like.

The buffer area 242 is an area for temporarily storing write data received from the host computer 1.

The cache area 243 is an area for caching write data sent from the host computer 1 to the storage device unit 20. Because the cache area 243 may hold dirty data, it may be made non-volatile by a backup power supply or the like.

The controller state management information 244 is information for managing whether each controller 22 is in a normal state or a failed state. The cache state management information 245 is information for managing the controllers 22 whose cache areas 243 are used for duplication and the state of the cache. The transfer management information 246 is information for managing the controller 22 that received the write data to be transferred by sequential transfer and the address of the corresponding entry in the transfer state management information 247. The transfer state management information 247 is information for managing the progress (transfer state) of a sequential transfer. Details of the controller state management information 244, the cache state management information 245, the transfer management information 246, and the transfer state management information 247 are described later with reference to FIGS. 4 to 7.

The storage device unit 20 has a plurality of PDEVs 200. A PDEV 200 may be an HDD (Hard Disk Drive) or another type of non-volatile storage device, for example an FM (Flash Memory) device such as an SSD (Solid State Drive). The storage device unit 20 may contain PDEVs 200 of different types. A RAID group may be configured from a plurality of PDEVs 200 of the same type. Data is stored in a RAID group according to a predetermined RAID level. The FE-I/F 210 attaches a guarantee code to write data that the controller 22 receives from the host computer 1. The data with the guarantee code attached is stored in the RAID group.

The HCA 250 receives instructions from the CPU 230 and operates on the memory 240 of its own controller 22 or, via the HCA network 23, on the memory 240 of another controller 22.

Next, the controller state management information 244 is described in detail.

FIG. 4 is a diagram showing an example of the data structure of the controller state management information according to the first embodiment.

The controller state management information 244 stores an entry for each controller 22. Each entry of the controller state management information 244 includes the fields controller ID 401 and state 402. The controller ID 401 stores the identifier (controller ID) of the controller 22 corresponding to the entry. The state 402 stores the operating state of the controller 22 corresponding to the entry. Operating states include normal and failed.
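The entry layout of the controller state management information can be written down as a small Python sketch. The class and function names, the example table contents, and the restriction to two states are illustrative assumptions; the specification lists normal and failed only as examples of operating states.

```python
from dataclasses import dataclass
from enum import Enum


class ControllerState(Enum):
    NORMAL = "normal"
    FAILED = "failed"


@dataclass
class ControllerStateEntry:
    controller_id: int      # corresponds to controller ID 401
    state: ControllerState  # corresponds to state 402


def normal_controllers(table):
    """Return the IDs of controllers currently recorded as normal."""
    return [e.controller_id for e in table if e.state is ControllerState.NORMAL]


controller_state_table = [
    ControllerStateEntry(0, ControllerState.NORMAL),
    ControllerStateEntry(1, ControllerState.NORMAL),
    ControllerStateEntry(2, ControllerState.FAILED),
]
print(normal_controllers(controller_state_table))
```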

Next, the cache state management information 245 is described in detail.

FIG. 5 is a diagram showing an example of the data structure of the cache state management information according to the first embodiment.

The cache state management information 245 stores an entry for each data address. Each entry of the cache state management information 245 includes the fields data address 501, first transfer destination controller ID 502, second transfer destination controller ID 503, and cache state 504.

The data address 501 stores a value (data address) indicating the storage location, within the storage system 2, of the user data corresponding to the entry.

The first transfer destination controller ID 502 stores the identifier (controller ID: first transfer destination controller ID) of the controller 22 (an example of a transfer destination controller) that has the cache area 243 serving as the destination of the first transfer, in which the data at the data address corresponding to the entry is cached as one of the duplicated copies.

The second transfer destination controller ID 503 stores the identifier (controller ID: second transfer destination controller ID) of the controller 22 (an example of a controller in charge) that has the cache area 243 serving as the destination of the second transfer, in which the data at the data address corresponding to the entry is cached as one of the duplicated copies. In this embodiment, the second transfer destination controller ID 503 stores the controller ID of the controller 22 (owner controller) that has ownership of the logical unit to which the data at the data address corresponding to the entry belongs.

The cache state 504 stores information indicating the cache state of the data at the data address corresponding to the entry. The cache states are dirty, which indicates that the data has not been destaged to the storage device unit 20, and clean, which indicates that it has been destaged.
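The fields above map naturally onto a record type. The following Python sketch uses hypothetical names and a dictionary keyed by data address; the helper reflects only the case stated in the text (dirty cached data makes sequential transfer necessary), and treating every other case as not requiring it is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CacheStateEntry:
    data_address: int              # data address 501
    first_dst_controller_id: int   # first transfer destination controller ID 502
    second_dst_controller_id: int  # second transfer destination controller ID 503 (owner)
    cache_state: str               # cache state 504: "dirty" or "clean"


def needs_sequential_transfer(table: dict, data_address: int) -> bool:
    """Return True if the cached data at the address is dirty.

    Assumption: a missing entry or a clean entry does not require
    sequential transfer in this sketch.
    """
    entry = table.get(data_address)
    return entry is not None and entry.cache_state == "dirty"


cache_state_table = {
    0x10: CacheStateEntry(0x10, first_dst_controller_id=2,
                          second_dst_controller_id=1, cache_state="dirty"),
    0x20: CacheStateEntry(0x20, first_dst_controller_id=0,
                          second_dst_controller_id=1, cache_state="clean"),
}
print(needs_sequential_transfer(cache_state_table, 0x10))
```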

Next, the transfer management information 246 is described in detail.

FIG. 6 is a diagram showing an example of the data structure of the transfer management information according to the first embodiment.

The transfer management information 246 stores an entry for each data address. Each entry of the transfer management information 246 includes the fields data address 601, controller ID 602, and transfer state management information address 603. The data address 601 stores a value (data address) indicating the storage location (storage space), within the storage system 2, of the user data corresponding to the entry. The controller ID 602 stores the identification information (controller ID) of the controller 22 (receiving controller) that received, from the host 1, the write data for the data address corresponding to the entry. The transfer state management information address 603 stores a value (address) indicating where the entry for that data address is stored in the transfer state management information 247.
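As a rough illustration, a transfer management entry acts as an index from a data address to the receiving controller and to the location of the matching transfer state entry. In the Python sketch below, all names are hypothetical and the "address" of a transfer state entry is modeled as a plain integer key, which is an assumption about representation.

```python
from dataclasses import dataclass


@dataclass
class TransferManagementEntry:
    data_address: int                 # data address 601
    controller_id: int                # controller ID 602 (receiving controller)
    transfer_state_info_address: int  # transfer state management information address 603


def find_transfer(table: dict, data_address: int):
    """Return (receiving controller ID, transfer state entry address), or None."""
    entry = table.get(data_address)
    if entry is None:
        return None
    return entry.controller_id, entry.transfer_state_info_address


transfer_management_table = {
    0x10: TransferManagementEntry(0x10, controller_id=0,
                                  transfer_state_info_address=0x7000),
}
print(find_transfer(transfer_management_table, 0x10))
```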

Next, the transfer state management information 247 will be described in detail.

FIG. 7 is a diagram showing an example of the data structure of the transfer state management information according to the first embodiment.

The transfer state management information 247 stores an entry for each data address. Each entry includes the fields data address 701, first transfer data received flag 702, first transfer completion flag 703, and second transfer completion flag 704.

The data address 701 stores a value (data address) indicating the storage location of user data in the storage system 2 corresponding to the entry. The first transfer data received flag 702 stores a value (received flag) indicating whether the first-transfer data of the write data for the data address in the data address 701 has been received by the HCA 250. The received flag is set to "1" if the data has been received and to "0" if it has not. The first transfer completion flag 703 stores a value (first transfer completion flag) indicating whether the first transfer of the write data for the data address in the data address 701 has been completed by the HCA 250. The first transfer completion flag is set to "1" if the first transfer has been completed and to "0" if it has not. The second transfer completion flag 704 stores a value (second transfer completion flag) indicating whether the second transfer of the write data for the data address in the data address 701 has been completed by the HCA 250. The second transfer completion flag is set to "1" if the second transfer has been completed and to "0" if it has not.
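The relationship between the two tables can be sketched as follows. This is a minimal model, not the patented implementation: the class names, the dictionaries standing in for memory, and the example addresses are all assumptions, while the field numbers follow the description.

```python
from dataclasses import dataclass


@dataclass
class TransferState:
    """Hypothetical model of one entry of transfer state management information 247."""
    data_address: int              # field 701
    first_data_received: int = 0   # field 702: 1 once first-transfer data is received
    first_transfer_done: int = 0   # field 703: 1 once the first transfer is complete
    second_transfer_done: int = 0  # field 704: 1 once the second transfer is complete

    def sequential_transfer_done(self) -> bool:
        # Both completion flags must be 1 for the sequential transfer to count as done.
        return self.first_transfer_done == 1 and self.second_transfer_done == 1


@dataclass
class TransferEntry:
    """Hypothetical model of one entry of transfer management information 246."""
    data_address: int   # field 601
    controller_id: int  # field 602: controller that received the write data from the host
    state_address: int  # field 603: location of the matching TransferState entry


# Assumed layout: a dict keyed by "address" stands in for controller memory.
state_memory = {0x10: TransferState(data_address=0x2000)}
transfer_table = {0x2000: TransferEntry(0x2000, controller_id=0, state_address=0x10)}

entry = transfer_table[0x2000]
state = state_memory[entry.state_address]  # follow field 603 to the state entry
state.first_data_received = 1
state.first_transfer_done = 1
print(state.sequential_transfer_done())  # False: the second transfer is still pending
```

Keeping the flags in a separate table reached through an address field means the flags can be updated in place without touching the per-request entry.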

Next, processing operations of the computer system according to the first embodiment will be described.

First, the sequential transfer request process will be described.

FIG. 8 is a flowchart of the sequential transfer request process according to the first embodiment.

The sequential transfer request process is executed when a write request is received by the controller 22 that has ownership of the logical unit (storage space) in the storage device unit 20 targeted by the write data (new data) corresponding to the write request (referred to as the owner controller 22; an example of the controller in charge). A write request reaches the owner controller 22 in one of two ways: it is sent directly from the host computer 1 to the owner controller 22, or it is forwarded to the owner controller 22 from a controller 22 that does not have the cache area 243 corresponding to the write data and that received the write data from the host computer 1 via the FE-I/F 210 (referred to as the FE controller 22; an example of the receiving controller).

This example describes the case where the write request is forwarded from the FE controller 22 to the owner controller 22.

The owner controller 22 receives the write request (step S101). Next, the owner controller 22 refers to the cache state management information 245, acquires the cache state from the cache state 504 of the entry corresponding to the data address of the write request (step S102), and determines whether the cache state is dirty (step S103).

If the cache state is determined not to be dirty (step S103: NO), the data (old data) in the cache area 243 has already been stored in the storage device unit 20, so the owner controller 22 transfers the write data to the cache areas 243 of the two controllers 22 simultaneously (in parallel) (step S106) and ends the process.

If the cache state is determined to be dirty (step S103: YES), the owner controller 22 acquires the value (transfer state management information address) indicating where the entry of the transfer state management information 247 corresponding to the data address of the write request is stored, and adds an entry to the transfer management information 246. The owner controller 22 sets the data address 601, the controller ID 602, and the transfer state management information address 603 of the added entry to the data address of the write data, the controller ID of the FE controller 22, and the transfer state management information address of the entry of the transfer state management information 247, respectively (step S104).

Next, the owner controller 22 requests the HCA 250 in the FE controller 22 to perform the sequential transfer of the write data (step S105), and then executes the next process (the sequential transfer completion waiting process of FIG. 9) (L0). The request to the HCA 250 of the FE controller 22 may be sent via the HCA 250 of the owner controller 22 itself.
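The branch in steps S101 to S106 can be sketched as follows. This is a simplified model under stated assumptions: the class, method, and dictionary names are hypothetical, the actual data movement is reduced to a returned label, and a data address with no cache state entry is treated as not dirty.

```python
from dataclasses import dataclass, field


@dataclass
class OwnerController:
    """Hypothetical stand-in for the owner controller 22."""
    cache_state: dict    # data address -> "dirty" or "clean" (field 504)
    state_address: dict  # data address -> address of the entry in information 247
    transfer_table: dict = field(default_factory=dict)  # transfer management information 246

    def handle_write_request(self, data_address: int, fe_controller_id: int) -> str:
        # S102-S103: look up the cache state of the old data.
        if self.cache_state.get(data_address) != "dirty":
            # S106: old data is already on the storage device, so both caches
            # may be overwritten at the same time.
            return "parallel"
        # S104: register the request in the transfer management information.
        self.transfer_table[data_address] = {
            "data_address": data_address,                       # field 601
            "controller_id": fe_controller_id,                  # field 602
            "state_address": self.state_address[data_address],  # field 603
        }
        # S105: ask the FE controller's HCA to transfer to one cache after the other.
        return "sequential"


owner = OwnerController(cache_state={1: "dirty", 2: "clean"}, state_address={1: 0x10})
print(owner.handle_write_request(2, fe_controller_id=7))  # parallel
print(owner.handle_write_request(1, fe_controller_id=7))  # sequential
```

The sequential path exists because dirty old data is the only copy of that data; writing both caches at once could corrupt both copies if a failure occurred mid-transfer.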

Next, the sequential transfer completion waiting process will be described.

FIG. 9 is a flowchart of the sequential transfer completion waiting process according to the first embodiment.

The owner controller 22 determines whether the sequential transfer has been completed (step S201). Specifically, the owner controller 22 refers to the transfer state management information 247 and determines whether the first transfer completion flag 703 and the second transfer completion flag 704 are set, that is, whether their values are "1", in the entry corresponding to the data address of the write data (the entry whose data address 701 holds the data address of the write data). This step may be performed at a fixed interval.

If it is determined that the sequential transfer has been completed (step S201: YES), the owner controller 22 sends a Good response, which means that the write process has finished, to the host 1 via the FE controller 22 (step S202) and ends the process. If it is determined that the sequential transfer has not been completed (step S201: NO), the owner controller 22 acquires the states of the other controllers 22 from the controller state management information 244 and identifies the controller IDs whose state is failed (failed controller IDs) (step S203).

Next, the owner controller 22 determines whether the destination controller 22 of the first transfer (first transfer destination controller 22) has failed (step S204). Specifically, the owner controller 22 makes this determination by checking whether the failed controller IDs identified in step S203 include the first transfer destination controller ID stored in the first transfer destination controller ID 502 of the entry of the cache state management information 245 corresponding to the data address.

If it is determined that the first transfer destination controller 22 has failed (step S204: YES), the owner controller 22 determines whether the second transfer has been completed (step S205). Specifically, the owner controller 22 refers to the transfer state management information 247 and determines whether the second transfer completion flag 704 of the entry corresponding to the data address is set.

If it is determined that the second transfer has been completed (step S205: YES), the write data has been stored by the second transfer in the cache area 243 of the second transfer destination controller (the owner controller 22), so the owner controller 22 destages (transfers to the storage device unit 20) the write data (guaranteed data) stored in its own cache area 243 (step S207). Next, the owner controller 22 sends a failure response to the host 1 via the FE controller 22 (step S211) and ends the process.

When the second transfer has been completed (step S205: YES), the write data in the cache area 243 of the destination controller 22 of the second transfer (the second transfer destination controller 22, that is, the owner controller) is not corrupted, so destaging the write data in that cache area 243 guarantees data consistency.

If it is determined that the second transfer has not been completed (step S205: NO), the owner controller 22 determines whether the first transfer has been completed (step S206). Specifically, the owner controller 22 refers to the transfer state management information 247 and determines whether the first transfer completion flag 703 of the entry corresponding to the data address is set.

If it is determined that the first transfer has been completed (step S206: YES), the owner controller 22 returns the process to step S201 and waits for the second transfer to complete.

If it is determined that the first transfer has not been completed (step S206: NO), the owner controller 22 advances the process to step S207.

When the first transfer has not been completed, the first transfer destination controller 22 failed before the second transfer started. The dirty data in the cache area 243 of the owner controller 22, which is the second transfer destination controller, has therefore not been updated, and destaging this dirty data (guaranteed data) guarantees data consistency.

If it is determined in step S204 that the first transfer destination controller 22 has not failed (step S204: NO), the owner controller 22 determines whether the FE controller 22 has failed (step S208). Specifically, the owner controller 22 refers to the transfer management information 246, acquires the controller ID in the controller ID 602 of the entry corresponding to the data address, and determines whether the FE controller 22 has failed by checking whether that controller ID is among the failed controller IDs identified in step S203.

If it is determined that the FE controller 22 has not failed (step S208: NO), the owner controller 22 returns the process to step S201 and waits for the sequential transfer to complete.

If it is determined that the FE controller 22 has failed (step S208: YES), the owner controller 22 determines whether the first transfer has been completed (step S209). Specifically, the owner controller 22 refers to the transfer state management information 247 and determines whether the first transfer completion flag 703 of the entry corresponding to the data address is set. Instead of checking the first transfer completion flag 703, the owner controller 22 may check whether the first transfer data received flag 702 is set. Also, before step S209, the owner controller 22 may determine whether the second transfer has been completed and, if it has, send a Good response to the host 1 and end the process.

If it is determined that the first transfer has been completed (step S209: YES), the owner controller 22 requests the first transfer destination controller 22 to destage the data (guaranteed data) from the cache area 243 of that controller 22 (step S210), and advances the process to step S211. In response to the request, the first transfer destination controller 22 destages the data in its cache area 243. When the first transfer has been completed, the write data in the cache area 243 of the first transfer destination controller 22 is not corrupted, so destaging that write data guarantees data consistency.

If it is determined that the first transfer has not been completed (step S209: NO), the owner controller 22 advances the process to step S207 and destages the data (guaranteed data) in its own cache area 243. When the first transfer has not been completed, the FE controller 22 failed before the second transfer started, so the dirty data in the cache area 243 of the owner controller 22 has not been updated, and destaging this dirty data guarantees data consistency.
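The decision made in one pass through steps S201 to S211 can be condensed into a single function, sketched below. The function name, the `Action` enum, and the boolean parameters are assumptions introduced for illustration. The flag and failure lookups are taken as already done, and the optional variants mentioned for step S209 are not modeled.

```python
from enum import Enum


class Action(Enum):
    GOOD = "send Good response to the host"                          # S202
    WAIT = "return to S201 and poll again"
    DESTAGE_OWN = "destage own cache (S207), then failure response"  # S207 -> S211
    DESTAGE_FIRST = "ask first destination to destage (S210), then failure response"


def decide(first_done: bool, second_done: bool,
           first_dest_failed: bool, fe_failed: bool) -> Action:
    """One pass of the owner controller's completion wait (hypothetical sketch)."""
    if first_done and second_done:   # S201: both completion flags are set
        return Action.GOOD
    if first_dest_failed:            # S204: first transfer destination has failed
        if second_done:              # S205: owner's cache holds intact new data
            return Action.DESTAGE_OWN
        if first_done:               # S206: second transfer may still finish
            return Action.WAIT
        return Action.DESTAGE_OWN    # owner's cache still holds the old dirty data
    if not fe_failed:                # S208: nothing has failed, keep waiting
        return Action.WAIT
    if first_done:                   # S209: first destination holds intact new data
        return Action.DESTAGE_FIRST
    return Action.DESTAGE_OWN        # FE failed before the second transfer started


print(decide(True, False, first_dest_failed=False, fe_failed=True).name)  # DESTAGE_FIRST
```

In every failure branch the cache chosen for destaging is one that the flags show was either fully written or never touched, which is how the sketch reflects the consistency argument in the text.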

Next, the sequential transfer process will be described.

FIG. 10 is a flowchart of the sequential transfer process according to the first embodiment.

The FE controller 22 (specifically, the HCA 250 of the FE controller 22) receives the sequential transfer request sent from the owner controller 22 and acquires the data address of the write data from the request (step S301). Next, the FE controller 22 refers to the cache state management information 245 and acquires the first transfer destination controller ID and the second transfer destination controller ID from the first transfer destination controller ID 502 and the second transfer destination controller ID 503 of the entry corresponding to the data address (step S302).

Next, the HCA 250 of the FE controller 22 executes the first transfer (step S303). Specifically, the HCA 250 of the FE controller 22 reads the write data from the buffer area 242 and transfers it to the cache area 243 of the first transfer destination controller 22 via the HCA 250 of that controller. The write data is transferred to the cache area 243 without passing through the buffer area 242 of the first transfer destination controller 22 and without involvement of the CPU 230 of the first transfer destination controller 22.

Next, the HCA 250 of the FE controller 22 determines whether the transfer of the write data succeeded (step S304).

If it is determined that the transfer failed (step S304: NO), the HCA 250 of the FE controller 22 ends the sequential transfer process.

If it is determined that the transfer succeeded (step S304: YES), the HCA 250 of the FE controller 22 sets the first transfer data received flag 702 of the entry corresponding to the data address in the transfer state management information 247 held in the memory 240 of the first transfer destination controller 22, that is, sets the flag to 1 (step S305).

Next, the HCA 250 of the FE controller 22 instructs the HCA 250 of the first transfer destination controller 22 to set the first transfer completion flag 703 of the entry corresponding to the data address in the transfer state management information 247 held in the memory 240 of the second transfer destination controller 22 (step S306).

Next, the HCA 250 of the FE controller 22 executes the second transfer (step S307). Specifically, the HCA 250 of the FE controller 22 reads the write data from the buffer area 242 and transfers it to the cache area 243 of the second transfer destination controller 22 via the HCA 250 of that controller.

Next, the HCA 250 of the FE controller 22 determines whether the transfer of the write data succeeded (step S308).

If it is determined that the transfer failed (step S308: NO), the FE controller 22 ends the sequential transfer process.

If it is determined that the transfer succeeded (step S308: YES), the HCA 250 of the FE controller 22 sets the second transfer completion flag 704 of the entry corresponding to the data address in the transfer state management information 247 held in the memory 240 of the second transfer destination controller 22, that is, sets the flag to 1 (step S309), and ends the process.
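The ordering of steps S303 to S309 can be sketched as follows. This is a toy model, not the HCA behavior itself: the `Controller` class, its dictionaries, and the `alive` switch used to simulate a failed transfer are assumptions, and failures of the flag updates themselves are not modeled.

```python
from dataclasses import dataclass, field


def new_flags() -> dict:
    # Flags 702, 703 and 704 of one entry of transfer state management information 247.
    return {"received": 0, "first_done": 0, "second_done": 0}


@dataclass
class Controller:
    """Hypothetical stand-in for a transfer destination controller 22."""
    cache: dict = field(default_factory=dict)  # cache area 243: data address -> data
    flags: dict = field(default_factory=dict)  # information 247 held in memory 240
    alive: bool = True                         # False simulates a failed controller

    def write_cache(self, addr: int, data: bytes) -> bool:
        if not self.alive:
            return False
        self.cache[addr] = data
        return True


def sequential_transfer(addr: int, data: bytes,
                        first: Controller, second: Controller) -> bool:
    if not first.write_cache(addr, data):   # S303-S304: first transfer
        return False
    first.flags.setdefault(addr, new_flags())["received"] = 1     # S305
    second.flags.setdefault(addr, new_flags())["first_done"] = 1  # S306
    if not second.write_cache(addr, data):  # S307-S308: second transfer
        return False
    second.flags[addr]["second_done"] = 1   # S309
    return True


first, second = Controller(), Controller()
print(sequential_transfer(0x2000, b"new", first, second))  # True
print(second.flags[0x2000])  # {'received': 0, 'first_done': 1, 'second_done': 1}
```

Because a flag is set only after the corresponding copy has succeeded, a set flag is evidence that the matching cache area holds complete data.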

Next, the failure handling process will be described.

FIG. 11 is a flowchart of the failure handling process according to the first embodiment. The failure handling process is executed by a controller 22 other than the owner controller 22. It may be executed once every fixed period, or when the controller 22 detects a failure.

The controller 22 acquires the states of the other controllers 22 from the controller state management information 244 and identifies the controller IDs 401 whose state is failed (failed controller IDs) (step S401).

Next, the controller 22 determines whether the owner controller 22 (second transfer destination controller 22) has failed (step S402). Specifically, the controller 22 makes this determination by checking whether the failed controller IDs include the controller ID in the second transfer destination controller ID 503 of the entry of the cache state management information 245 corresponding to the data address.

If it is determined that the owner controller 22 has not failed (step S402: NO), the controller 22 returns the process to step S401. If it is determined that the owner controller 22 has failed (step S402: YES), the controller 22 determines whether it is itself the first transfer destination controller (step S403). Specifically, the controller 22 determines whether its own controller ID matches the controller ID in the first transfer destination controller ID 502 of the entry of the cache state management information 245 corresponding to the data address. Instead of this check, the controller 22 may determine whether it is the FE controller, that is, whether its own controller ID matches the controller ID in the controller ID 602 of the entry of the transfer management information 246 corresponding to the data address, and perform the subsequent processing if it is the FE controller.

If the controller 22 determines in step S403 that it is not the first transfer destination controller (step S403: NO), it returns the process to step S401.

If the controller 22 determines that it is the first transfer destination controller (step S403: YES), it determines whether the first transfer has been completed (step S404). Specifically, the controller 22 uses the address in the transfer state management information address 603 of the entry of the transfer management information 246 corresponding to the data address to look up the entry of the transfer state management information 247, and determines whether the first transfer data received flag 702 of that entry is set.

If it is determined that the first transfer has not been completed (step S404: NO), the controller 22 returns the process to step S401 and waits for the first transfer to complete.

If it is determined that the first transfer has been completed (step S404: YES), the controller 22 destages the write data (guaranteed data) in its cache area 243 (step S405), sends a failure response to the host 1 via the FE controller 22 (step S406), and ends the process. When the first transfer has been completed (step S404: YES), the write data in the cache area 243 of the first transfer destination controller 22 is not corrupted, so destaging that write data guarantees data consistency.
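One pass through steps S402 to S406, as seen by a controller other than the owner, might look like the following sketch. The function name, parameters, and returned labels are hypothetical; the table lookups of steps S401 to S404 are assumed to have been done by the caller.

```python
def handle_failure(self_id: int, failed_ids: set, owner_id: int,
                   first_dest_id: int, first_data_received: bool) -> str:
    """One pass of the failure handling process (hypothetical sketch)."""
    if owner_id not in failed_ids:  # S402: the owner controller is still running
        return "retry"              # back to S401
    if self_id != first_dest_id:    # S403: only the first destination takes over
        return "retry"
    if not first_data_received:     # S404: flag 702 not yet set
        return "retry"
    # S405-S406: this controller's cache holds intact write data, so destage it
    # and report failure to the host.
    return "destage_and_fail"


# Controller 1 is the first transfer destination and owner controller 2 has failed.
print(handle_failure(self_id=1, failed_ids={2}, owner_id=2,
                     first_dest_id=1, first_data_received=True))  # destage_and_fail
```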

As described above, the computer system according to this embodiment guarantees the consistency of write data by choosing, according to the progress of the write data duplication process, which cache area 243 is written to the storage device unit 20 when a failure occurs.

Next, a computer system according to the second embodiment will be described.

The computer system according to the second embodiment is the computer system according to the first embodiment shown in FIG. 3, modified so that the controller 22 in charge of a logical unit is not limited to a specific controller 22, that is, so that ownership of logical units is not set. In this computer system, for example, the controller that receives a write request from the host 1 (receiving controller) serves as the controller in charge.

In the computer system according to the second embodiment, the sequential transfer request process shown in FIG. 8 is executed by the controller 22 that received the write request from the host 1 (the FE controller 22).

FIG. 12 is a flowchart of the sequential transfer completion waiting process according to the second embodiment.

The FE controller 22 determines whether the sequential transfer has been completed (step S501). Specifically, the FE controller 22 refers to the transfer state management information 247 and determines whether the first transfer completion flag 703 and the second transfer completion flag 704 are set, that is, whether their values are "1", in the entry corresponding to the data address of the write data (the entry whose data address 701 holds the data address of the write data).

If it is determined that the sequential transfer has been completed (step S501: YES), the FE controller 22 sends a Good response to the host 1 (step S502) and ends the process. If it is determined that the sequential transfer has not been completed (step S501: NO), the FE controller 22 acquires the states of the other controllers 22 from the controller state management information 244 and identifies the controller IDs whose state is failed (failed controller IDs) (step S503).

Next, the FE controller 22 determines whether the destination controller 22 of the first transfer (first transfer destination controller 22) has failed (step S504).

If it is determined that the first transfer destination controller 22 has failed (step S504: YES), the FE controller 22 determines whether the second transfer has been completed (step S505). Specifically, the FE controller 22 refers to the transfer state management information 247 and determines whether the second transfer completion flag 704 of the entry corresponding to the data address is set.

If it is determined that the second transfer has been completed (step S505: YES), the FE controller 22 requests the second transfer destination controller 22 to destage the write data (guaranteed data) stored in the cache area 243 of that controller 22 (step S507). Next, the FE controller 22 sends a failure response to the host 1 (step S511) and ends the process.

When the second transfer has been completed (step S505: YES), the write data in the cache area 243 of the destination controller 22 of the second transfer (second transfer destination controller 22) is not corrupted, so destaging the write data in that cache area 243 guarantees data consistency.

If it is determined that the second transfer has not been completed (step S505: NO), the FE controller 22 determines whether the first transfer has been completed (step S506).

If it is determined that the first transfer has been completed (step S506: YES), the FE controller 22 returns the process to step S501 and waits for the second transfer to complete.

If it is determined that the first transfer has not been completed (step S506: NO), the FE controller 22 advances the process to step S507.

Here, if the first transfer has not completed, the first transfer destination controller 22 failed before the second transfer started. The dirty data in the cache area 243 of the second transfer destination controller has therefore not been updated, and destaging the dirty data in that cache area 243 guarantees data consistency.

On the other hand, if it is determined in step S504 that the first transfer destination controller 22 has not failed (S504: NO), the FE controller 22 determines whether the second transfer destination controller 22 has failed (step S508). Specifically, the FE controller 22 refers to the cache state management information 245, obtains the controller ID in the second transfer destination controller ID 503 of the entry corresponding to the data address, and determines whether the second transfer destination controller 22 has failed by checking whether that controller ID is among the failed controller IDs identified in step S503.

If the second transfer destination controller 22 is determined not to have failed (step S508: NO), the FE controller 22 returns the process to step S501 and waits for the sequential transfer to complete.

On the other hand, if the second transfer destination controller 22 is determined to have failed (step S508: YES), the FE controller 22 determines whether the first transfer has completed (step S509).

If the first transfer is determined not to have completed (step S509: NO), the FE controller 22 moves the process to step S501 and waits for the first transfer to finish.

On the other hand, if the first transfer is determined to have completed (step S509: YES), the FE controller 22 requests the first transfer destination controller 22 to destage the data (guaranteed data) from that controller's cache area 243 (step S510) and advances the process to step S511. Here, when the first transfer has completed, the write data in the cache area 243 of the first transfer destination controller 22 is intact, so destaging the write data in that cache area 243 guarantees data consistency.
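The branching in steps S504 to S510 can be summarized as a small decision function. The following is only an illustrative sketch: the function name, the `Action` enumeration, and the boolean parameters are hypothetical and do not appear in the embodiment. Both destage actions are followed by the failure response to the host in step S511.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical labels for the outcomes of steps S504 to S510."""
    WAIT = "move to S501 and wait for the transfer to complete"
    DESTAGE_SECOND = "request destage from the second transfer destination (S507)"
    DESTAGE_FIRST = "request destage from the first transfer destination (S510)"


def fe_failure_action(first_dest_failed: bool,
                      second_dest_failed: bool,
                      first_done: bool,
                      second_done: bool) -> Action:
    """Choose the FE controller's next action after failed controllers are identified."""
    if first_dest_failed:                  # S504: YES
        if second_done:                    # S505: YES, new data in second cache is intact
            return Action.DESTAGE_SECOND
        if first_done:                     # S506: YES, second transfer still in flight
            return Action.WAIT
        return Action.DESTAGE_SECOND       # S506: NO, old dirty data was never overwritten
    if not second_dest_failed:             # S508: NO
        return Action.WAIT
    if not first_done:                     # S509: NO
        return Action.WAIT
    return Action.DESTAGE_FIRST            # S509: YES, new data in first cache is intact
```

In every destage case the function selects a cache whose contents are known to be either fully new or fully old, which is the condition the description relies on for consistency.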

The sequential transfer process of the computer system according to Embodiment 2 differs from the sequential transfer process shown in FIG. 10 in the processing performed in steps S306 and S309.

In the computer system according to Embodiment 2, in step S306 the HCA 250 of the controller 22 sets the first transfer completion flag 703 of the entry of the transfer state management information 247 that corresponds to the data address and resides in the memory 240 of its own controller 22. Likewise, in step S309 the HCA 250 of the controller 22 sets the second transfer completion flag 704 of the entry of the transfer state management information 247 that corresponds to the data address and resides in the memory 240 of its own controller 22.
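The role of the two completion flags can be illustrated with a minimal sketch. The `TransferStatusEntry` class, the `sequential_transfer` function, and the callables passed to it are assumptions made for illustration; the sketch also abstracts away which controller's memory 240 holds the entry and models a failed transfer as a raised exception.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TransferStatusEntry:
    """Hypothetical stand-in for one entry of the transfer state management information 247."""
    data_address: int
    first_transfer_done: bool = False    # corresponds to flag 703
    second_transfer_done: bool = False   # corresponds to flag 704


def sequential_transfer(entry: TransferStatusEntry,
                        send_first: Callable[[], None],
                        send_second: Callable[[], None]) -> None:
    """Run the two transfers one after the other, recording each completion."""
    send_first()                         # a failure here propagates; no flag is set
    entry.first_transfer_done = True     # step S306
    send_second()                        # a failure here leaves only flag 703 set
    entry.second_transfer_done = True    # step S309
```

Because a flag is set only after the corresponding transfer returns, a later failure-handling step can infer from the flags alone which cache holds undamaged data.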

FIG. 13 is a flowchart of the failure handling process according to Embodiment 2.

The failure handling process is executed by a controller other than the FE controller 22 (another controller). It may be performed periodically at fixed intervals, or when a failure of another controller 22 is detected.

The controller 22 obtains the states of the other controllers 22 from the controller state management information 244 and identifies the controller IDs whose state is failed (failed controller IDs) (step S601). Next, the controller 22 determines whether the FE controller 22 has failed (step S602).

If the FE controller 22 is determined not to have failed (step S602: NO), the controller 22 returns the process to step S601. On the other hand, if the FE controller 22 is determined to have failed (step S602: YES), the controller 22 determines whether it (its own controller) is the first transfer destination controller (step S603). Instead of this check, the controller may determine whether it is the second transfer destination controller and then perform the subsequent processing.

If the determination in step S603 is that the own controller 22 is not the first transfer destination controller (step S603: NO), the controller 22 returns the process to step S601.

On the other hand, if the own controller is determined to be the first transfer destination controller (S603: YES), the controller 22 determines whether the first transfer has completed (step S604).

If the first transfer is determined to have completed (step S604: YES), the controller 22 destages the write data (guaranteed data) in the cache area 243 (step S606), sends a failure response to the host 1 via the FE controller 22 (step S607), and ends the process. Here, when the first transfer has completed (step S604: YES), the write data in the cache area 243 of the first transfer destination controller 22 is intact, so destaging the write data in that cache area 243 guarantees data consistency.

On the other hand, if the first transfer is determined not to have completed (step S604: NO), the controller 22 requests the second transfer destination controller 22 to destage the write data (guaranteed data) stored in the cache area 243 (step S605) and advances the process to step S607. Here, when the first transfer has not completed (step S604: NO), the failure occurred before the second transfer started, so the dirty data in the cache area 243 of the second transfer destination controller 22 has not been updated, and destaging the dirty data in that cache area 243 guarantees data consistency.
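The flow of FIG. 13 as seen from a controller other than the FE controller can be sketched in a few lines. The function name, parameter names, and returned string labels below are hypothetical; the sketch covers only the variant in which the controller checks whether it is the first transfer destination.

```python
def peer_failure_action(fe_failed: bool,
                        is_first_destination: bool,
                        first_transfer_done: bool) -> str:
    """Return a label for the action a non-FE controller takes in steps S602 to S606."""
    if not fe_failed:                 # S602: NO, nothing to recover
        return "recheck"              # back to S601
    if not is_first_destination:      # S603: NO, another controller handles recovery
        return "recheck"              # back to S601
    if first_transfer_done:           # S604: YES, own cache holds intact new data
        return "destage_own_cache"    # S606, then failure response (S607)
    # S604: NO, the second destination still holds unmodified old dirty data
    return "request_destage_from_second"  # S605, then failure response (S607)
```

Only one controller (here, the first transfer destination) acts on the failure, which avoids two controllers destaging different versions of the same data.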

The present invention is not limited to the embodiments described above and includes various modifications. For example, the embodiments above describe the configurations in detail to make the present invention easy to understand, and the invention is not necessarily limited to implementations having all of the described configurations. In addition, part of the configuration of each embodiment may be added to, deleted from, or replaced with another configuration.

Each of the above configurations, functions, processing units, processing means, and the like may be realized partly or wholly in hardware, for example by designing them as an integrated circuit. The present invention may also be realized by program code of software (a data management program) that implements the functions of the embodiments. In that case, a storage medium on which the program code is recorded is provided to a computer, and a processor of the computer reads the program code stored on the storage medium. The program code read from the storage medium then itself implements the functions of the embodiments described above, and the program code and the storage medium storing it constitute the present invention. Storage media for supplying such program code include, for example, flexible disks, CD-ROMs, DVD-ROMs, hard disks, SSDs (Solid State Drives), optical disks, magneto-optical disks, CD-Rs, magnetic tapes, non-volatile memory cards, and ROMs.

The program code that implements the functions described in the embodiments can be written in a wide range of programming or scripting languages, such as assembler, C/C++, perl, Shell, PHP, and Java (registered trademark).

Furthermore, the program code of the software that implements the functions of the embodiments may be distributed over a network and stored in storage means such as a hard disk or memory of a computer, or on a storage medium such as a CD-RW or CD-R, and a processor of the computer may read and execute the program code stored in that storage means or on that storage medium.

In the above embodiments, the control lines and information lines shown are those considered necessary for the explanation, and not all control lines and information lines of a product are necessarily shown. All of the components may be interconnected.

In the above embodiments, when duplication across the cache areas 243 of a plurality of controllers 22 is not possible, data consistency is guaranteed by destaging the normal data to the storage device unit 20. However, the present invention is not limited to this. For example, when duplication across the cache areas 243 of a plurality of controllers 22 is not possible, the normal data may be copied to the cache area 243 of a controller 22 that is operating normally, so that the data is again duplicated across the cache areas 243 of a plurality of controllers 22.

In the above embodiments, the write data is duplicated across the cache areas 243 of a plurality of controllers 22. However, the present invention is not limited to this, and the data may be multiplexed three or more times.
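As one possible reading of how the transfer ordering could extend to three or more copies, the sketch below applies the rule stated in the claims (separate, one-after-another transfers when the old data is dirty; parallel transfers otherwise) to an arbitrary list of destinations. The function name and the representation of a plan as a list of stages are assumptions made for illustration and are not part of the embodiments.

```python
from typing import List, Sequence


def plan_transfers(destinations: Sequence[str], old_data_dirty: bool) -> List[List[str]]:
    """Return transfer stages; destinations inside one stage are written concurrently.

    Assumption: each stage starts only after the previous stage has completed.
    """
    if old_data_dirty:
        # One destination per stage, so at most one cached copy is being
        # overwritten at any moment and the others remain fully old or fully new.
        return [[dest] for dest in destinations]
    # Clean old data can be re-read from the storage device unit, so all
    # destinations may be overwritten at once.
    return [list(destinations)]
```

For example, calling `plan_transfers(["ctl2", "ctl3", "ctl4"], old_data_dirty=True)` yields three single-destination stages, while the same call with `old_data_dirty=False` yields one stage containing all three destinations.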

1: host computer, 2: storage system, 11: network, 20: storage device unit, 22: controller, 23: HCA network, 100: computer system, 200: PDEV, 230: CPU, 240: memory, 243: cache area, 244: controller state management information, 245: cache state management information, 246: transfer management information, 247: transfer state management information, 250: HCA

Claims

A storage system comprising a plurality of controllers and a storage device unit capable of storing data, wherein
each of the controllers includes
a processor unit, and
a memory,
when new data for a write request has been stored in a first memory of a first controller, and old data corresponding to the new data in a second memory of a second controller and in a third memory of a third controller is in a dirty state, the first controller transfers the new data from the first memory, in which the new data corresponding to the write request is stored, to the second memory of the second controller and, after the transfer to the second memory has completed, transfers the new data to the third memory of the third controller, thereby performing the transfers to the second memory and the third memory separately and overwriting the old data, and
when the old data corresponding to the new data for the write request in the second memory and the third memory is not in a dirty state, the new data is transferred to the second memory and the third memory in parallel to overwrite the old data.

The storage system according to claim 1, wherein
when the transfers are performed separately, the transfer to the third memory is performed after the success of the transfer to the second memory has been confirmed, and
when a failure occurs during the transfer, processing is performed using the new data or the old data that has not been damaged by the failure.

The storage system according to claim 2, wherein, when a failure occurs during the transfer,
if the transfer to either the second memory or the third memory has completed normally, the normally transferred new data is destaged to the storage device unit, and
if the transfer to neither the second memory nor the third memory has completed normally, the old data already stored in the second memory or the third memory is destaged to the storage device unit so that the data in the second memory or the third memory is no longer dirty.

The storage system according to claim 1, wherein the first controller can directly access the second memory of the second controller and the third memory of the third controller.

The storage system according to claim 4, wherein the transfer is performed by a DMA (Direct Memory Access) unit of the first controller.

A data management method performed by a storage system comprising a plurality of controllers and a storage device unit capable of storing data, wherein
each of the controllers includes
a processor unit, and
a memory,
when new data for a write request has been stored in a first memory of a first controller, and old data corresponding to the new data in a second memory of a second controller and in a third memory of a third controller is in a dirty state, the first controller transfers the new data from the first memory, in which the new data corresponding to the write request is stored, to the second memory of the second controller and, after the transfer to the second memory has completed, transfers the new data to the third memory of the third controller, thereby performing the transfers to the second memory and the third memory separately and overwriting the old data, and
when the old data corresponding to the new data for the write request in the second memory and the third memory is not in a dirty state, the new data is transferred to the second memory and the third memory in parallel to overwrite the old data.