JP2012008934A

JP2012008934A - Distributed file system and redundancy method in distributed file system

Info

Publication number: JP2012008934A
Application number: JP2010146383A
Authority: JP
Inventors: Akihiko Nishitani; 明彦西谷; Masato Terashita; 雅人寺下; Tomohiko Ogishi; 智彦大岸
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2010-06-28
Filing date: 2010-06-28
Publication date: 2012-01-12

Abstract

PROBLEM TO BE SOLVED: To avoid a service stop of the whole system caused by the failure of a management server that manages file servers in a distributed file system.

SOLUTION: A plurality of pairs, each consisting of a proxy server 2 and a corresponding metadata server, are installed. Each proxy server 2 is provided with: an access control part 21 that receives I/O requests from users; a file system access part 25 that accesses the metadata server and serves as the client function for storing meta information relating to the write requests among the I/O requests; a duplication processing part 22 that duplicates the meta information to the other proxy servers 2 and manages the write history; a data updating part 23 that periodically executes data complementing processing between the proxy servers based on the write error history stored by the duplication processing part 22; and an environment setting fetching part 24 that reads, from an environment setting file, the update period at which the data complementing processing is executed.

Description

The present invention relates to the management of meta information performed among a plurality of management servers that operate a distributed file system, when files (user data) are stored in a distributed file system formed by distributing, over a wide area, a plurality of file servers each having a storage unit (storage), in a network where file write requests and file read requests are made by a plurality of users. In particular, it relates to a distributed file system, and a redundancy method in a distributed file system, aimed at avoiding a service stop of the entire system.

As technology of this type, distributed platforms that combine the disks of a plurality of machines so that they function as one file system have been proposed, as shown in Non-Patent Document 1 and Non-Patent Document 2.
Gfarm, shown in Non-Patent Document 1, is a scalable distributed file system platform that meets the demands of large-capacity, large-scale data processing on a wide area network, and is suited to efficient file sharing over a wide area network.
Hadoop, shown in Non-Patent Document 2, processes in parallel large amounts of data that cannot be stored on a single disk, and so handles them quickly and efficiently. It is a distributed platform suited to I/O on files that are relatively large and basically never updated.
In addition, drbd, shown in Non-Patent Document 3, discloses a technique for copying data from one side to the other.

Non-Patent Document 1: http://datafarm.apgrid.org/index.ja.html
Non-Patent Document 2: http://hadoop.apache.org/
Non-Patent Document 3: http://www.drbd.org/

Gfarm, shown in Non-Patent Document 1, is a distributed platform suited to efficient file sharing over a wide area network. On this platform, a file that a user has requested to write is distributed across a plurality of file servers located over a wide area in order to increase the availability of the data. However, although the availability of the file itself can be increased, the meta information of the file is managed in one place by the management server. If the management server fails, users can no longer access the distributed file system (service downtime occurs).

To avoid this phenomenon, a technique has been proposed in which a plurality of servers that manage meta information are provided for redundancy. For example, as shown in FIG. 13, a distributed file system may be configured from a user terminal 1, a plurality of proxy servers 2 that process requests from the user terminal 1, a plurality of file servers 4 that store data, and a DNS server 5 that selects a proxy server 2 for each access from the user terminal 1. In the proposed configuration, a plurality of metadata servers 3 that manage meta information are provided. The metadata servers 3 periodically exchange the meta information they manage with one another, asynchronously with user operations, and all the metadata servers 3 operate in parallel as active machines, so that even if one of the metadata servers 3 fails, the service continues on the remaining active machines.

However, in this configuration the meta information is synchronized at a timing separate from user access, so the meta information is not consistent at all times. Consequently, if a user writes a file, one metadata server holds the meta information of that file, and that metadata server fails before it hands the meta information to another metadata server at the next synchronization, the file becomes inaccessible.

In addition, for a file write from a user, completion of the write is reported to the user before the meta information has been passed between the metadata servers. This raises the further problem that the user wrongly assumes the availability of the written data is already assured.

Furthermore, the management servers could be set to replicate at short intervals. However, when huge storage is provided as a cloud service, for example, the enormous meta information of a growing number of users would have to be synchronized at high frequency, and the processing load among the management servers would increase, so this is not realistic.

The present invention has been proposed in view of the above circumstances. Its object is to provide a distributed file system, and a redundancy method in a distributed file system, that make it possible to avoid a service stop of the entire system caused by the failure of a management server that mediates between the file servers providing storage units (storage) and the users of the distributed file system.

To achieve the above object, the invention of claim 1 is a distributed file system comprising a plurality of file servers each having a storage unit, a proxy server that processes write requests from users, and a metadata server that manages meta information relating to the write requests, in which a network is configured so that a plurality of users issue file write requests to the proxy server and the files are recorded in the storage units of the file servers, and in which the plurality of file servers are distributed over a wide area. The system is characterized by including the following configuration.
A plurality of pairs of the proxy server and the metadata server corresponding to it are provided.
Each proxy server comprises: an access control unit that receives I/O requests from users; a file system access unit that accesses the metadata server and serves as the client function for storing the meta information relating to the write requests among the I/O requests; a replication processing unit that replicates the meta information to the other proxy servers and manages the write history; a data update unit that periodically executes data complementing processing between the proxy servers based on the write error history held by the replication processing unit; and an environment setting fetch unit that reads, from an environment setting file, the update timing at which the data complementing processing is executed.

Claim 2 concerns a distributed file system comprising a plurality of file servers each having a storage unit, a proxy server that processes I/O requests from users, and a metadata server that manages meta information relating to the I/O requests, in which a network is configured so that a plurality of users issue file write requests to the proxy server and the files are recorded in the storage units of the file servers, and in which the plurality of file servers are distributed over a wide area. It is characterized as follows.
A plurality of pairs of the proxy server and the metadata server corresponding to it are provided.
In response to an I/O request for a file server issued by one user to one proxy server, the proxy server that processes the I/O request writes the meta information for the file server to its corresponding metadata server, and also replicates the write request of the I/O request to the other proxy servers. The meta information is thereby updated and stored in parallel on the metadata servers corresponding to the other proxy servers, in synchronization with the user's access, which guarantees the consistency of the meta information.

Claim 3 is the redundancy method in the distributed file system of claim 2, characterized in that a liveness monitoring server that monitors whether the proxy servers and the metadata servers are alive is provided, a failure of a proxy server or a metadata server is detected, and the setting information of the DNS server, which selects a proxy server for an I/O request from a user, is updated to reflect the failure status.

Claim 4 is the redundancy method in the distributed file system of claim 3, characterized in that, when a proxy server or a metadata server recovers from a failure, its meta information is automatically rebuilt from a metadata server that updated and stored the meta information normally.

Claim 5 is the redundancy method in the distributed file system of claim 2, characterized in that, when the meta information is stored, the write request is replicated to the other proxy servers in parallel with the write request of the user's I/O request and is written on each metadata server, and, when the proxy server processing the write request receives a success response to the write request from any one of the proxy servers, the meta information is stored and a response indicating completion of the file server write is returned to the user.

According to the distributed file system of the present invention, the proxy server that processes the write request of an I/O request writes the meta information for the file server to its corresponding metadata server and also replicates the write request to the other proxy servers. The meta information is thus held in parallel on the metadata servers corresponding to the other proxy servers, in synchronization with the user's access, which guarantees the consistency of the meta information. A copy of the written data can also be stored on the file servers.
Therefore, whenever a metadata server subsequently fails, the other metadata servers always hold a copy of the meta information, so the user can read the data from the file server to which it was written by way of the replicated meta information on another metadata server.

In addition, the data update unit periodically executes data complementing processing between the proxy servers based on the write error history held by the replication processing unit. Even if replication of a write request to a proxy server fails, the data can therefore be updated to the correct state.

In addition, a liveness monitoring server that monitors whether the proxy servers and the metadata servers are alive is provided, and it updates the setting information that the DNS server uses when selecting a proxy server, so that an appropriate proxy server can be selected at all times.

In addition, the meta information is stored and the response indicating completion of the file server write is returned to the user as soon as a success response to the write request arrives from any one of the proxy servers, which speeds up the response to the user.

FIG. 1 is an overall configuration model diagram showing an example of an embodiment of the distributed file system of the present invention.
FIG. 2 is a model diagram for explaining the redundancy method in the distributed file system of the present invention.
FIG. 3 is a sequence diagram for explaining the write processing procedure in the distributed file system of the present invention.
FIG. 4 is a sequence diagram for explaining the read processing procedure (successful case) in the distributed file system of the present invention.
FIG. 5 is a sequence diagram for explaining the read processing procedure (failure case) in the distributed file system of the present invention.
FIG. 6 is a block diagram for explaining the configuration of the proxy server used in the distributed file system.
FIG. 7 is a flowchart for explaining the request replication issue (write) processing in the distributed file system.
FIG. 8 is a flowchart for explaining the request replication reception (read) processing in the distributed file system.
FIG. 9 is a flowchart for explaining the data transmission processing in the distributed file system.
FIG. 10 is a model diagram for explaining liveness monitoring of the proxy servers in the distributed file system.
FIG. 11 is a model diagram for explaining liveness monitoring of the metadata servers in the distributed file system.
FIG. 12 is a model diagram for explaining data complementing based on the abnormal response history in the distributed file system.
FIG. 13 is a model diagram for explaining the redundancy method in a conventional distributed file system configuration.

The distributed file system of the present invention will be described with reference to the drawings.
As shown in FIG. 1, in the distributed file system to which the redundancy method of the present invention is applied, a network is formed by a plurality of file servers 4, a plurality of proxy servers 2 that each manage the plurality of file servers 4, and metadata servers 3 installed in one-to-one correspondence with the proxy servers 2. A proxy server 2 exchanges I/O requests (writes and reads) and notifications (write completion notices and data browsing) with a user terminal 1, and, in response to a file write request or file read request from the user terminal 1, writes to or reads from a file server 4.
The proxy server group and the metadata server group are connected to a liveness monitoring server 6, which detects whether each server is operating normally and thereby monitors their operating status. The settings of the DNS server 5 are updated according to the liveness data that the liveness monitoring server 6 obtains for each server.
In this example, five proxy servers 2 and five metadata servers 3 operate in parallel. A proxy server 2 that receives a file access request from any of the three user terminals 1 replicates the write request to the other proxy servers 2, so that the meta information on all five metadata servers is updated.

That is, as shown in FIG. 2, the user terminal 1 queries the DNS server 5 for an access destination, and the one proxy server 2 that receives the data access request from the user terminal 1 replicates the write request to all the other (four) proxy servers 2. Inside the distributed file system at each destination to which the write request is replicated, availability is maintained by writing the data to a plurality of file servers.

Next, the sequence of the write operation in the proxy server 2 and the metadata server 3 will be described with reference to FIGS. 3 and 1.
When the user terminal 1 writes to a file server 4, it first queries the DNS server 5 for an access destination (step 31).
The DNS server 5 selects a proxy server 2 that is operating normally and returns it to the user terminal 1 as the access destination (step 32).
On receiving the response from the DNS server 5, the user terminal 1 issues a write command to the proxy server 2 selected as the replication source (step 33).
The (replication source) proxy server 2 that receives the write command instructs each of the other proxy servers 2 (in this example, the remaining four proxy servers other than the replication source) to replicate the write command (replication of the I/O command) (step 34).
The replication source proxy server 2 writes the data to a file server 4 in the file server group (step 35).
The replication source proxy server 2 also receives a replication success response from each replication destination proxy server 2 (step 36).
When a success response arrives from a replication destination, the replication source proxy server 2 sends a success response to the user terminal 1 (step 37).

The processing inside the file system at the replication source proxy server 2 is as follows.
The replication source proxy server 2 asks its corresponding metadata server 3 which file servers 4 to write to (step 41).
The metadata server 3 responds to the proxy server 2 with the file servers 4 that are the write destinations (step 42), and the write to each of those file servers 4 is performed (step 43). The proxy server 2 then receives the write completion from the file servers 4 (step 44).
The other proxy servers (replication destination proxy servers) perform this processing in parallel, so the write is also carried out on a plurality of file servers, and the data is recorded side by side on a plurality of file servers.

The success response from the replication source proxy server (step 37) may be returned after the replication source proxy server receives the write completion (step 44), or after any one of the success responses from the other proxy servers (replication destination proxy servers) (step 36).
If the response indicating completion of the file server write in step 37 (the success response to the write request) is sent from the replication source proxy server to the user terminal as soon as any one of these success responses arrives, the response to the user can be made quickly.
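The early-acknowledgement timing described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `write_with_replication`, the `tasks` mapping, and the `on_ack` callback are hypothetical, and threads merely stand in for the parallel writes performed by the replication source and the replication destinations.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def write_with_replication(tasks, on_ack):
    """Run all write tasks in parallel and acknowledge on the first success.

    tasks:  dict of hypothetical server name -> zero-argument callable that
            performs the write there and raises an exception on failure.
    on_ack: callable invoked once, with the name of the first server whose
            write succeeded (stands in for the success response of step 37).
    Returns the sorted names of the servers whose write failed, which would
    feed the write error history.
    """
    failed = []
    acknowledged = False
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        futures = {pool.submit(fn): name for name, fn in tasks.items()}
        for future in as_completed(futures):
            name = futures[future]
            if future.exception() is not None:
                failed.append(name)      # remember the failure for later repair
            elif not acknowledged:
                acknowledged = True
                on_ack(name)             # answer the user without waiting for the rest
    return sorted(failed)
```

The remaining writes keep running after the acknowledgement, so failures that arrive later can still be recorded for the periodic complementing process.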

Next, the sequence of the read operation on a proxy server will be described with reference to FIG. 4.
When the user terminal reads from a file server, it queries the DNS server for an access destination (step 51), and the proxy server to read from is returned to the user terminal (step 52). The user terminal reads from the designated proxy server (step 53). The proxy server then refers to its metadata server to identify the file server on which the data is recorded, performs the read, and sends a success response signal to the user terminal (step 54).

If the read from the proxy server does not succeed on the first attempt, it is handled by the sequence of FIG. 5.
When the user terminal reads from a file server, it queries the DNS server for an access destination (step 51), and the proxy server to read from is returned to the user terminal (step 52).
If the read by the user terminal 1 from the designated proxy server 2 (step 53) fails, the read command is replicated by being transferred to another (replication destination) proxy server 2 (step 55).
The replication destination proxy server 2 refers to its metadata server 3 to identify the file server 4 on which the data is recorded and performs the read. It returns a success response to the replication source proxy server 2 (step 56), and a success response signal is sent to the user terminal 1 (step 54).

Next, an outline of the modules that make up the proxy server 2 will be given with reference to FIG. 6.
The proxy server 2 comprises: an access control unit 21 that receives I/O requests from users; a file system access unit 25 that accesses the metadata server 3 and serves as the client function for storing the meta information relating to the write requests among the I/O requests; a replication processing unit 22 that replicates the meta information to the other proxy servers 2 and manages the write history; a data update unit 23 that periodically executes data complementing processing between the proxy servers based on the write error history held by the replication processing unit 22; and an environment setting fetch unit 24 that reads, from a configuration file set in advance, the timing at which data is updated after a write error occurs.

The access control unit 21 (WebDAV or the like) mediates between external applications and this file system so that the applications can use it.
The environment setting fetch unit 24 reads the data update (synchronization) timing from the environment setting file. Because data is updated at a fixed interval, the latest data can be retained by complementing the data even when a write error has occurred.
The replication processing unit 22 replicates file access commands to the other proxy servers 2, receives and analyzes their responses, and manages the write history. The write error history is managed by creating an error history management table, described later, when a write error occurs.
The data update unit 23 periodically executes data complementing processing between the proxy servers based on the write error history held by the replication processing unit 22.
The file system access unit 25 serves as the client function for accessing the file system. Taking Gfarm, described above as prior art, as an example, the Gfarm client program corresponds to this unit.

Next, the procedure for the request replication issue (write) processing will be described with reference to FIG. 7.
When the access control unit 21 receives a write request from an external application, it issues a file access request to the replication processing unit 22 (step 101).
The replication processing unit 22 extracts the request parameters (step 102), selects a proxy server 2 as the destination for replicating the request (step 103), and checks the state of that replication destination (step 104).
If a proxy server 2 that can serve as a replication destination exists (step 105), a replication request is sent to that proxy server 2 (step 106). Steps 103 to 106 are repeated as long as a proxy server 2 that can serve as a replication destination remains.
When no further replication destination proxy server 2 remains (step 105), the replication processing unit 22 receives a response from a replication destination (step 107) and temporarily registers the response content (registration in the error history) (step 108).
It then determines whether the specified number of responses has been received, that is, whether as many responses have arrived as there are replication destinations (step 109). If not, steps 107 to 109 are repeated; if so, it responds to the access control unit 21 (step 110).
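The flow of steps 102 to 110 can be sketched in Python as follows. This is only an illustration under stated assumptions: `issue_replicated_write`, the `is_alive` and `send` callables, and the shape of the error history entries are all hypothetical, and the sketch sends requests sequentially for clarity. The source does not say how destinations that fail the state check are recorded, so they are simply skipped here.

```python
def issue_replicated_write(request, peers, is_alive, send):
    """Replicate one write request to peer proxies and record failures.

    request:  dict with hypothetical keys "path" and "data".
    peers:    list of peer proxy names (hypothetical identifiers).
    is_alive: callable(peer) -> bool, the replication destination state check.
    send:     callable(peer, params) -> bool, True when the peer reports success.
    Returns (responses, error_history).
    """
    # Step 102: extract the request parameters.
    params = {"path": request["path"], "data": request["data"]}
    # Steps 103-105: pick every destination that passes the state check.
    targets = [peer for peer in peers if is_alive(peer)]
    # Steps 106-107 and 109: send the request and collect one response per target.
    responses = {peer: send(peer, params) for peer in targets}
    succeeded = [peer for peer, ok in responses.items() if ok]
    # Step 108: register each failed response in the error history.
    error_history = [
        {"path": params["path"], "failed_peer": peer, "succeeded_peers": succeeded}
        for peer, ok in responses.items()
        if not ok
    ]
    # Step 110: the caller (the access control unit) receives the results.
    return responses, error_history
```

Keeping the list of peers that succeeded alongside each failure gives the later complementing process a source from which to copy the missing data.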

Next, the procedure for the request replication reception (read) process will be described with reference to FIG. 8.
When the access control unit 21 receives a read request from an external application, it issues a file access request to the replication control unit 22 (step 201).
The replication control unit 22 reads from its own local file system (step 202). If the read from the local file system succeeds (step 203), the process ends there.
If the read from the local file system fails (step 203), the replication control unit 22 extracts the request parameters (step 204) and selects a proxy server 2 as the request replication destination (step 205).
If a proxy server 2 is available as a replication destination (step 206), a replication request is sent to that proxy server 2 (step 207), the response from the replication destination is received (step 208), and a response is made to the access control unit 21 (step 210). If no proxy server is available as a replication destination (step 206), an error response is made to the access control unit 21 (step 210).
If receiving the response from the replication destination fails (step 209), steps 205 to 209 are repeated until no proxy server 2 remains as a replication destination (step 206).
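The read path above amounts to a local read with sequential fallback to the other proxy servers. The sketch below illustrates that control flow under stated assumptions: `read_with_fallback`, the `local_read` callable, and the list of per-peer read callables are hypothetical names, and failures are modeled as `OSError`.

```python
def read_with_fallback(path, local_read, peers):
    """Try the local file system first, then each replication destination.

    local_read and every item of peers are callables taking a path and
    returning bytes, or raising OSError on failure (an assumed convention).
    """
    try:
        return local_read(path)        # steps 202-203: local read succeeded
    except OSError:
        pass                           # local read failed; fall through
    for peer_read in peers:            # steps 205-206: pick the next destination
        try:
            return peer_read(path)     # steps 207-208: request and receive
        except OSError:                # step 209: this destination failed
            continue
    # Step 206 with no destination left: error response (step 210).
    raise FileNotFoundError(path)
```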

Next, the data synchronization procedure performed by the replication processing unit 22, the environment setting fetch unit 24, and the data update unit 23 will be described with reference to FIG. 9.
When the time specified in the environment setting file of the environment setting fetch unit 24 has elapsed (synchronization timer timeout) (step 301), the error history temporarily registered in the replication processing unit 22 is consulted (step 302). If there is a history entry (step 303), an instruction is issued to transfer the data from a successful replication destination to the failed replication destination (step 304). The replication processing unit 22 receives the transfer result (step 305). If the transfer succeeded (step 306), the error history entry temporarily registered in the replication processing unit 22 is deleted (step 307); if it failed, the entry is left in place. Steps 302 to 307 are repeated until no history remains to be processed (step 303).
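One timer-driven pass over the error history might look like the sketch below. It is illustrative only: the entry keys (`source`, `target`, `file`) and the `transfer` callable are hypothetical, and the sketch assumes that entries whose transfer fails are simply carried over to the next period rather than retried within the same pass.

```python
def run_sync_pass(error_history, transfer):
    """Process every registered error once and return the entries still pending.

    transfer(source, target, file_name) -> bool is an assumed callable that
    asks `source` to send `file_name` to `target` and reports the outcome.
    """
    remaining = []
    for entry in error_history:                        # steps 302-303
        ok = transfer(entry["source"], entry["target"], entry["file"])  # 304-305
        if not ok:                                     # step 306: transfer failed
            remaining.append(entry)                    # history is not deleted
        # On success the entry is dropped, which corresponds to step 307.
    return remaining
```

A caller would invoke this each time the period read from the environment setting file elapses and keep the returned list as the new error history.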

Next, the handling of a case in which a proxy server 2 or a metadata server 3 itself goes down will be described with reference to FIGS. 10 and 11.
The operating status of the proxy servers 2 and metadata servers 3 is monitored by the liveness monitoring server 6. When a proxy server 2 goes down, the liveness monitoring server 6 detects this and updates the settings of the DNS server 5 so that accesses from user terminals 1 are no longer directed to the downed proxy server 2 (FIG. 10).
When the downed proxy server 2 comes back up, the liveness monitoring server 6 updates the settings of the DNS server 5 again.

When a metadata server 3 goes down, the liveness monitoring server 6 detects this and updates the settings of the DNS server 5 so that accesses from user terminals 1 are no longer directed to the proxy server 2 paired with the downed metadata server 3 (FIG. 11).
A user may still access the proxy server 2 paired with the downed metadata server 3 in the interval between the metadata server 3 going down and the settings of the DNS server 5 being updated. For a write in that interval, success is returned to the user once a request replication destination has succeeded. For a read, the data read from a request replication destination is returned.

In addition to monitoring by the liveness monitoring server 6 described above, a downed proxy server 2 or metadata server 3 may also be detected by a response timeout on a replication request, after which the proxy server 2 or metadata server 3 may be detached by the same kind of DNS update.
Note that, as a rule, multiple proxy servers 2 or metadata servers 3 are not assumed to go down at the same time.
When a downed metadata server 3 comes back up, the liveness monitoring server 6 automatically updates the settings of the DNS server 5.
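The DNS update rule described above can be summarized as: a proxy server is offered to users only while both it and its paired metadata server are up. The sketch below expresses that rule; the function name, the `pairs` mapping, and the `is_alive` probe are hypothetical, and how the resulting list is actually pushed to a DNS server is outside the scope of the example.

```python
def selectable_proxies(pairs, is_alive):
    """Return the proxy servers that the DNS server may hand out.

    pairs: mapping of proxy server name -> paired metadata server name.
    is_alive: assumed callable reporting whether a named server responds.
    """
    return sorted(
        proxy
        for proxy, metadata in pairs.items()
        if is_alive(proxy) and is_alive(metadata)   # either one down detaches the pair
    )
```

Re-running the same function after a server recovers puts its proxy back in the list, matching the automatic update on recovery.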

Next, the data complementing process performed when a write error occurs during replication to a proxy server 2 will be described with reference to FIG. 12.
When proxy server A is the replication source, the write request (file1) is replicated to proxy servers B, C, D, and E. If replication at proxy server E fails because of a write error, the replication processing unit 22 of proxy server A creates an error history management table. The table records the proxy servers to which replication succeeded, the proxy server to which replication failed, the name of the file that failed, and the cause.
The data update unit 23 then updates the data at each period (data update time) defined in the environment setting file read by the environment setting fetch unit 24. At the data update time, proxy server A knows that the write request (file1) was replicated successfully to proxy servers B, C, and D, so it selects one of them, for example proxy server C, and issues a complement instruction to it. Proxy server C then complements the write request (file1) to proxy server E, and the data is reconstructed.

Similarly, when proxy server B is the replication source, the write request (file2) is replicated to proxy servers A, C, D, and E. If replication at proxy server C fails because of a timeout, the replication processing unit 22 of proxy server B creates an error history management table. The table records the proxy servers to which replication succeeded, the proxy server to which replication failed, the name of the file that failed, and the cause.
The data update unit 23 then updates the data at each period (data update time) defined in the environment setting file read by the environment setting fetch unit 24. At the data update time, proxy server B knows that the write request (file2) was replicated successfully to proxy servers A, D, and E, so it selects one of them, for example proxy server D, and issues a complement instruction to it. Proxy server D then complements the write request (file2) to proxy server C, and the data is reconstructed.
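The error history management table and the choice of a complement source in the two examples above can be modeled as below. The record fields follow the four items the table is said to hold; the class name, function name, and the default policy of taking the first successful destination are assumptions, since the text only says that one of the successful destinations is selected "for example".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorRecord:
    succeeded: tuple   # proxy servers to which replication succeeded
    failed: str        # proxy server to which replication failed
    file_name: str     # name of the file that failed
    cause: str         # e.g. "write error" or "timeout"


def complement_instruction(record, choose=lambda candidates: candidates[0]):
    """Return (source, target, file_name) for one complement instruction.

    choose is a hypothetical selection policy over the successful destinations.
    """
    source = choose(record.succeeded)
    return (source, record.failed, record.file_name)
```

With the first example, `ErrorRecord(("B", "C", "D"), "E", "file1", "write error")` and a policy that picks C yield an instruction for C to send file1 to E.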

That is, to make up for data lost when a proxy server 2 or metadata server 3 goes down, each proxy server supplies the missing data to the other file systems based on its own history of past write errors, at a timing independent of user request processing (the period defined in the configuration file). Because the mechanism relies only on the error history, each proxy server 2 can maintain data consistency while devoting only a small amount of processing capacity to it.

According to the distributed file system described above, the write request is replicated to the other proxy servers 2, so the metadata server 3 paired with each proxy server 2 always holds a copy of the meta information. Even if a failure occurs in a metadata server 3 at any time after a write has completed, the service can therefore continue while avoiding the risk that part of the data written by the user can no longer be retrieved.

In addition, the data update unit 23 executes periodic data complementing between proxy servers based on the write error history held by the replication processing unit 22, so the data can be updated to the correct state even when replication of a write request to a proxy server 2 has failed.

In addition, a liveness monitoring server 6 that monitors the liveness of the proxy servers 2 and metadata servers 3 is provided, and it updates the setting information that the DNS server 5 uses when selecting a proxy server, so that an appropriate proxy server 2 can always be selected.

In addition, when a success response to the write request is received from any one of the proxy servers 2, the meta information is stored and a file server write completion response is returned to the user. The user can thus be told that fault tolerance of the data has been secured without waiting for responses from all of the proxy servers 2.
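Responding as soon as one replication destination reports success can be sketched with the Python standard library as follows. This is only an illustration of the idea: the function name and the `destinations` mapping of name to send callable are hypothetical, and real replication requests would go over the network rather than through local callables.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def replicate_until_first_success(request, destinations):
    """Send the request to all destinations in parallel.

    destinations: mapping of name -> callable(request) returning True on success.
    Returns the name of the first destination that succeeded, or None.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(destinations)))
    futures = {
        pool.submit(send, request): name for name, send in destinations.items()
    }
    first = None
    for future in as_completed(futures):
        # A destination that raised or returned False does not count.
        if future.exception() is None and future.result():
            first = futures[future]
            break                      # enough to answer the user
    pool.shutdown(wait=False)          # remaining replications keep running
    return first
```

Any destinations that fail after the early response would still need to be recorded in the error history so that the periodic complementing can repair them.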

DESCRIPTION OF SYMBOLS: 1 ... user terminal, 2 ... proxy server, 3 ... metadata server, 4 ... file server, 5 ... DNS server, 6 ... liveness monitoring server, 21 ... access control unit, 22 ... replication control unit, 23 ... data update unit, 24 ... environment setting fetch unit, 25 ... file system access unit.

Claims

A distributed file system comprising a plurality of file servers each having a storage unit, a proxy server that processes I/O requests from users, and a metadata server that manages meta information relating to the I/O requests, the system forming a network in which a plurality of users make file write requests to the proxy server and the files are recorded in the storage units of the file servers, the plurality of file servers being distributed over a wide area, wherein
a plurality of pairs of the proxy server and a corresponding metadata server are provided, and
the proxy server comprises:
an access control unit that receives an I/O request from a user;
a file system access unit that serves as a client for accessing the metadata server and storing meta information relating to a write request among the I/O requests;
a replication processing unit that replicates the meta information to other proxy servers and manages write history;
a data update unit that executes periodic data complementing between proxy servers based on the write error history held by the replication processing unit; and
an environment setting fetch unit that reads, from an environment setting file, the update timing at which the data complementing is executed.

A redundancy method in a distributed file system comprising a plurality of file servers each having a storage unit, a proxy server that processes I/O requests from users, and a metadata server that manages meta information relating to the I/O requests, the system forming a network in which a plurality of users make file write requests to the proxy server and the files are recorded in the storage units of the file servers, the plurality of file servers being distributed over a wide area, wherein
a plurality of pairs of the proxy server and a corresponding metadata server are provided, and
in response to a write request to a file server made by one user to one proxy server, the proxy server that processes the I/O request writes the meta information for the file server to its corresponding metadata server and replicates the write request of the I/O request to the other proxy servers, so that, in synchronization with the access from the user, the meta information is updated and stored in parallel on the metadata servers corresponding to the other proxy servers, thereby guaranteeing consistency of the meta information.

The redundancy method in a distributed file system according to claim 2, wherein a liveness monitoring server that monitors the liveness of the proxy servers and metadata servers is provided, a failure of a proxy server or metadata server is detected, and the setting information of a DNS server that selects a proxy server in response to an I/O request from a user is updated in consideration of the failure status.

The redundancy method in a distributed file system according to claim 3, wherein, when a proxy server or metadata server recovers from a failure, the meta information is automatically reconstructed from a metadata server on which the meta information was updated and stored normally.

The redundancy method in a distributed file system according to claim 2, wherein, when the meta information is stored,
the write request of the I/O request is replicated to the other proxy servers in parallel with the write request from the user, and the write is performed on each metadata server, and
when the proxy server that processes the write request receives a success response to the write request from any one of the proxy servers,
the meta information is stored and a file server write completion response is returned to the user.