JP2012027587A

JP2012027587A - Data distribution storage, method, program and storage medium

Info

Publication number: JP2012027587A
Application number: JP2010163834A
Authority: JP
Inventors: Noriharu Miyaho (宮保憲治); Yoichiro Ueno (上野洋一郎); Shuichi Suzuki (鈴木秀一); Kazuo Ichihara (市原和雄)
Original assignee: Net&logic; NET&LOGIC Inc; Tokyo Denki University
Current assignee: Net&logic; NET&LOGIC Inc; Tokyo Denki University
Priority date: 2010-07-21
Filing date: 2010-07-21
Publication date: 2012-02-09
Anticipated expiration: 2030-07-21
Also published as: JP5594828B2

Abstract

PROBLEM TO BE SOLVED: To provide a data distributed storage device and method capable of reducing the load on the data distributed storage device.
SOLUTION: A data distributed storage device according to the invention comprises: a data modification section 11 that modifies the data array of input data I; a data dividing section 12 that divides the modified data into a plurality of data pieces P_1 to P_N; a calculation operation section 13 that calculates a calculation value unique to the input data I and calculation values O_P1 to O_PN of the data pieces; a metadata storage section 14 that stores metadata M in which the calculation value unique to the input data, the calculation values O_P1 to O_PN, and the modification record of the data modification section 11 are associated with each other; a distribution data configuring section 15 that configures distribution data D_1 to D_N by adding the calculation value unique to the input data and the calculation values O_P1 to O_PN to the respective data pieces P_1 to P_N; and a server distribution data receiving/transmitting section 17 that transmits the distribution data D_1 to D_N to a client 3 having an identifier corresponding to the calculation values O_P1 to O_PN.

Description

The present invention relates to a data distributed storage device, method, program, and recording medium that use a communication network to store data on a plurality of clients consisting of a plurality of physical devices, a plurality of logical devices, or a combination of these.

In recent years, data has been digitized rapidly, and the amount of data stored on servers has grown markedly. Data that is essential to business operations and service provision must, as concepts such as the BCP (business continuity plan) indicate, be protected from disasters, unforeseen accidents, and malicious access such as cyber terrorism, and mechanisms are needed to minimize system downtime and data loss. Against this background, systems that physically protect server sites, interconnect them, and back each other up, as well as cloud-based backup systems, have been proposed, and management mechanisms are gradually coming into use. Various techniques have also been proposed for storing data in a distributed manner under the heading of disaster recovery.

When confidential data is transmitted over a wired or wireless network and stored on a device other than the one that created it, the sender generally encrypts the data with a cipher such as DES (Data Encryption Standard) or AES (Advanced Encryption Standard) and sends it to the storage-destination client.

To secure redundancy together with strong confidentiality, data distributed storage devices that store data in a distributed manner have been proposed (see, for example, Patent Documents 1 to 3 and Non-Patent Documents 1 and 2). For example, the data distributed storage device of Patent Document 1 scrambles and divides the data to be stored and stores the resulting data pieces across a plurality of clients on a network. Using an approach different from conventional one-to-one encryption, this improves confidentiality, makes storage more reliable, and lightens processing.

Patent Document 1: WO 2007/111086
Patent Document 2: JP 2010-45670 A
Patent Document 3: JP 2010-92337 A

Non-Patent Document 1: Noriharu Miyaho, "New development of disaster recovery technology in case of disaster", IPEJ Journal, Vol. 21, No. 12, pp. 8-11 (2009)
Non-Patent Document 2: Kenji Mori, Yoichiro Ueno, Suzuki Shuichi, Kazuo Ichihara, Noriharu Miyaho, "Study on the performance evaluation applying to the pull-type network mechanism for realizing the disaster recovery system", 2010 IEICE General Conference (English Session Symposium), March 16, 2010, pp. S42-S43

The data distributed storage devices of Patent Documents 1 to 3 must manage information on the clients to which data pieces are to be sent, so the table that holds this client information grows large. As a result, managing client information places a heavy load on the data distributed storage device.

Accordingly, an object of the present invention is to provide a data distributed storage device, method, program, and recording medium that can reduce the load on the data distributed storage device.

To achieve the above object, a data distributed storage device according to the present invention comprises: a data changing unit that changes the data arrangement of input data based on a predetermined rule; a data dividing unit that divides the changed data from the data changing unit into a plurality of data pieces; a calculation value calculation unit that uses a predetermined calculation algorithm to calculate a calculation value unique to the input data and a calculation value of each data piece; a metadata storage unit that stores metadata in which the calculation value unique to the input data and the calculation values of the data pieces calculated by the calculation value calculation unit are associated with the change history of the data changing unit; a distribution data configuration unit that configures distribution data by attaching, to each data piece from the data dividing unit, the calculation value unique to the input data and the calculation value of that data piece; and a server distribution data transmission/reception unit that transmits the distribution data configured by the distribution data configuration unit to any client among the clients having an identifier that matches the calculation value of the data piece attached to the distribution data.

Because the distribution data configuration unit attaches a specific calculation value to each data piece, and the server distribution data transmission/reception unit sends each data piece only to clients that match that calculation value, the data distributed storage device does not need to manage which client holds which distribution data. The data distributed storage device of the present invention can thereby reduce its own load.
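The matching rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not name the calculation algorithm or define what it means for an identifier to match, so SHA-256 and a prefix comparison are used here as stand-ins, and the names `piece_value`, `conforms`, and `eligible_clients` are hypothetical.

```python
import hashlib

def piece_value(piece: bytes) -> str:
    """Calculation value of a data piece (SHA-256 is an assumed stand-in)."""
    return hashlib.sha256(piece).hexdigest()

def conforms(client_id: str, value: str) -> bool:
    """Assumed matching rule: the client ID is a prefix of the piece value."""
    return value.startswith(client_id)

def eligible_clients(piece: bytes, connected_ids: list[str]) -> list[str]:
    """Clients that may receive this piece; no placement table is consulted."""
    value = piece_value(piece)
    return [cid for cid in connected_ids if conforms(cid, value)]

piece = b"example data piece"
value = piece_value(piece)
ids = [value[:2], "zz", value[:1]]  # "zz" can never prefix a hex digest
print(eligible_clients(piece, ids))
```

Because eligibility is recomputed from the piece value each time, the sender in this sketch keeps no record of where a piece went, which is the point the paragraph makes.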

The data distributed storage device of the present invention may further comprise a distribution data storage unit that stores the distribution data configured by the distribution data configuration unit in a state in which it can be acquired by a client having an identifier that matches the calculation value of the data piece but cannot be acquired by a client having an identifier that does not match it.
With the distribution data storage unit, a pull-type configuration becomes possible in which, when a client asks whether distribution data is available, the device sends any that exists. The data distributed storage device of the present invention can thereby reduce the load of communicating the distribution data to clients.
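A pull-type store of this kind might look like the following Python sketch. It is a toy model, not the disclosed implementation: the class `DistributionStore`, its methods, the use of SHA-256, and the prefix-based matching rule are all assumptions made for illustration.

```python
import hashlib

class DistributionStore:
    """Server-side store that clients poll (pull type)."""

    def __init__(self):
        self._pending = {}  # piece value -> distribution data bytes

    def put(self, payload: bytes) -> str:
        """Store a payload under its calculation value and return that value."""
        value = hashlib.sha256(payload).hexdigest()
        self._pending[value] = payload
        return value

    def poll(self, client_id: str) -> dict:
        """Return only the entries whose value matches client_id (prefix rule)."""
        return {v: p for v, p in self._pending.items() if v.startswith(client_id)}

store = DistributionStore()
v = store.put(b"piece-1")
matching = store.poll(v[:2])             # a client whose ID matches the value
other_id = "0" if v[0] != "0" else "1"   # an ID that cannot match v
print(len(matching), len(store.poll(other_id)))
```

The server never initiates a connection in this model; a client that does not match simply receives an empty result.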

The data distributed storage device of the present invention may further comprise a parity calculation unit that generates parity data for the data pieces from an arbitrary number of data pieces from the data dividing unit, the arbitrary number being variable. In this case, the calculation value calculation unit further calculates a calculation value of the parity data using the calculation algorithm; the distribution data configuration unit further configures distribution data by attaching the calculation value unique to the input data and the calculation value of the parity data to the parity data generated by the parity calculation unit; and the server distribution data transmission/reception unit transmits that distribution data to a client having an identifier that matches the calculation value of the parity data.
With the parity calculation unit, the data pieces can be stored redundantly. Because what is stored is parity data, the redundant storage is efficient.
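To make the redundancy idea concrete, the following Python sketch computes parity over a group of pieces and rebuilds one lost piece. The text does not specify the parity scheme, so bytewise XOR over equal-length pieces is an assumption, and the function names are hypothetical.

```python
from functools import reduce

def xor_parity(pieces: list[bytes]) -> bytes:
    """XOR parity over equal-length pieces (XOR is an assumed parity scheme)."""
    if not pieces:
        raise ValueError("need at least one piece")
    length = len(pieces[0])
    if any(len(p) != length for p in pieces):
        raise ValueError("pieces must have equal length")
    # XOR the bytes found at the same offset in every piece
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pieces))

def recover_missing(survivors: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing piece of a group from the rest plus parity."""
    return xor_parity(survivors + [parity])

group = [b"ABCD", b"EFGH", b"IJKL"]   # the group size is a free parameter
parity = xor_parity(group)
rebuilt = recover_missing([group[0], group[2]], parity)
print(rebuilt)
```

One parity block protects a whole group against the loss of any single member, so a larger group costs less extra storage but tolerates fewer losses per stored byte; this is one reading of why the number of pieces per parity block is left variable.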

In the data distributed storage device of the present invention, the server distribution data transmission/reception unit may transmit the distribution data configured by the distribution data configuration unit to at least one of: a client having an identifier equal to the calculation value of the data piece; and a client having an identifier corresponding to a calculation value located near the calculation value of the data piece in the numerical space derived using the calculation algorithm.
Thus, even when none of the clients accessing the data distributed storage device has an identifier equal to the calculation value of the data piece, the distribution data can be sent to a client promptly.
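The fallback to a nearby identifier can be illustrated with the Python sketch below. It assumes that identifiers and calculation values are hexadecimal strings in the same numerical space and that "near" means smallest absolute difference; neither detail is given in the text, and `pick_client` is a hypothetical name.

```python
def pick_client(piece_value: str, connected_ids: list[str]) -> str:
    """Prefer an exact ID match; otherwise take the numerically nearest ID."""
    if not connected_ids:
        raise ValueError("no clients connected")
    if piece_value in connected_ids:
        return piece_value
    target = int(piece_value, 16)
    # Assumed distance: absolute difference of the values read as integers
    return min(connected_ids, key=lambda cid: abs(int(cid, 16) - target))

ids = ["10", "80", "f0"]
print(pick_client("80", ids), pick_client("7a", ids))
```

A ring distance (wrapping around the ends of the space) would be an equally plausible choice; the linear distance is used here only to keep the sketch short.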

The data distributed storage device of the present invention may comprise: a metadata acquisition unit that acquires the metadata stored in the metadata storage unit; a distribution data collection unit that collects, from clients matching the calculation values of the data pieces contained in the acquired metadata, the distribution data to which the calculation value unique to the input data contained in the metadata is attached; a data combining unit that arranges the collected distribution data according to the calculation values of the data pieces contained in the acquired metadata and combines the data pieces; and a data restoration unit that restores the combined data from the data combining unit to the input data based on the change history of the data changing unit contained in the acquired metadata.
With the metadata acquisition unit and the distribution data collection unit, the distribution data can be collected using the specific calculation values attached to the data pieces. With the data combining unit and the data restoration unit, the input data can be restored from the distribution data. Because the specific calculation values attached to the data pieces are used, the data distributed storage device does not need to manage which client holds which distribution data. The data distributed storage device of the present invention can thereby reduce its own load.

In the data distributed storage device of the present invention, the calculation value calculation unit may further calculate a calculation value of the input data itself using a predetermined calculation algorithm, and the metadata storage unit may store metadata that further contains this calculation value of the input data itself. The device may further comprise a calculation value collation unit that calculates a calculation value of the restored data from the data restoration unit using the same calculation algorithm that produced the calculation value of the input data itself, and collates it against the calculation value of the input data itself contained in the metadata acquired by the metadata acquisition unit.
With the calculation value collation unit, it can be determined whether the restored data matches the input data. Because the result can be used to judge whether the collected data pieces are genuine, the data distributed storage device of the present invention can exclude a client that attempts to participate maliciously.
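The collation step amounts to recomputing a digest and comparing it with the stored one. A minimal Python sketch follows, assuming SHA-256 as the calculation algorithm and a metadata dictionary with a hypothetical `input_value` key; neither is specified in the text.

```python
import hashlib

def data_value(data: bytes) -> str:
    """Calculation value of the data itself (SHA-256 is an assumed stand-in)."""
    return hashlib.sha256(data).hexdigest()

def verify_restored(restored: bytes, metadata: dict) -> bool:
    """Collate the restored data's value against the value kept in the metadata."""
    return data_value(restored) == metadata["input_value"]

original = b"confidential report"
metadata = {"input_value": data_value(original)}
print(verify_restored(original, metadata))
print(verify_restored(b"confidential rep0rt", metadata))
```

A mismatch shows only that at least one collected piece was wrong; identifying which client supplied it would need the per-piece calculation values as well.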

To achieve the above object, a data distributed storage method according to the present invention has, in order: a data change procedure (S102) of changing the data arrangement of input data based on a predetermined rule; a data division procedure (S103) of dividing the changed data obtained from the input data into a plurality of data pieces; a calculation value calculation procedure (S104) of using a predetermined calculation algorithm to calculate a calculation value unique to the input data and a calculation value of each data piece; a metadata storage procedure (S105) of storing metadata in which the calculation value unique to the input data, the calculation values of the data pieces, and the change history of the data change procedure are associated with each other; a distribution data storage procedure (S106) of configuring distribution data by attaching the calculation value unique to the input data and the calculation value of the data piece to each data piece, and storing the distribution data; and a server distribution data transmission procedure (S107) of transmitting the distribution data to any client among the clients having an identifier that matches the calculation value of the data piece attached to the distribution data.

Because a specific calculation value is attached to each data piece in the distribution data storage procedure, and each data piece is sent only to clients that match that calculation value in the server distribution data transmission procedure, the data distributed storage device does not need to manage which client holds which distribution data. The data distributed storage method of the present invention can thereby reduce the load on the data distributed storage device.
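Procedures S102 to S106 can be strung together in a short Python sketch. Everything concrete here is an assumption made for illustration: the "predetermined rule" is modeled as a seeded byte permutation whose seed serves as the change history, SHA-256 stands in for the calculation algorithm, the value unique to the input is derived from a labeled digest of the input, and all function and key names are hypothetical. Transmission (S107) is left out.

```python
import hashlib
import random

def h(data: bytes) -> str:
    """Assumed calculation algorithm: SHA-256 hex digest."""
    return hashlib.sha256(data).hexdigest()

def change(data: bytes, seed: int) -> bytes:
    """S102: reorder bytes by a seeded permutation (an assumed rule)."""
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)
    return bytes(data[i] for i in order)

def split(data: bytes, n: int) -> list[bytes]:
    """S103: divide into n pieces of nearly equal size."""
    size = -(-len(data) // n)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(n)]

def prepare(data: bytes, n: int, seed: int):
    changed = change(data, seed)                      # S102
    pieces = split(changed, n)                        # S103
    unique_value = h(b"input:" + data)                # S104 (assumed derivation)
    piece_values = [h(p) for p in pieces]             # S104
    metadata = {"unique_value": unique_value,         # S105
                "piece_values": piece_values,
                "change_history": {"seed": seed}}
    distribution = [{"unique_value": unique_value, "piece_value": v, "piece": p}
                    for v, p in zip(piece_values, pieces)]  # S106
    return metadata, distribution

metadata, distribution = prepare(b"0123456789abcdef", n=4, seed=7)
print(len(distribution), metadata["change_history"])
```

Note that the metadata holds only calculation values and the change history, never a client address, which mirrors the claim that no placement table is required.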

In the data distributed storage method of the present invention, in the distribution data storage procedure, the distribution data may be stored in a state in which it can be acquired by a client having an identifier that matches the calculation value of the data piece but cannot be acquired by a client having an identifier that does not match it.
This allows a pull-type configuration in which, when a client asks whether distribution data is available, any that exists is sent. The load of communicating the distribution data to clients can thereby be reduced.

In the data distributed storage method of the present invention, parity data for the data pieces may be generated in the data division procedure; a calculation value of the parity data may further be calculated using the calculation algorithm in the calculation value calculation procedure; distribution data may further be configured in the distribution data storage procedure by attaching the calculation value unique to the input data and the calculation value of the parity data to the parity data; and the distribution data may further be transmitted, in the server distribution data transmission procedure, to a client having an identifier that matches the calculation value of the parity data.
Because parity data is generated, the data pieces can be stored redundantly. Because what is stored is parity data, the redundant storage is efficient.

In the data distributed storage method of the present invention, in the server distribution data transmission procedure, the distribution data may be transmitted to at least one of: a client having an identifier equal to the calculation value of the data piece; and a client having an identifier corresponding to a calculation value located near the calculation value of the data piece in the numerical space derived using the calculation algorithm.
Thus, even when none of the clients accessing the data distributed storage device has an identifier equal to the calculation value of the data piece, the distribution data can be sent to a client promptly.

The data distributed storage method of the present invention may have, in order after the server distribution data transmission procedure: a metadata acquisition procedure (S201) of acquiring the metadata stored in the metadata storage procedure; a distribution data collection procedure (S202) of collecting, from clients matching the calculation values of the data pieces contained in the acquired metadata, the distribution data to which the calculation value unique to the input data contained in the metadata is attached; a data combining procedure (S203) of arranging the collected distribution data according to the calculation values of the data pieces contained in the acquired metadata and combining the data pieces; and a data restoration procedure (S204) of restoring the combined data to the input data based on the change history contained in the acquired metadata.
With the metadata acquisition procedure and the distribution data collection procedure, the distribution data can be collected using the specific calculation values attached to the data pieces. With the data combining procedure and the data restoration procedure, the input data can be restored from the distribution data. Because the specific calculation values attached to the data pieces are used, the data distributed storage device does not need to manage which client holds which distribution data. The data distributed storage method of the present invention can thereby reduce the load on the data distributed storage device.
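The recovery path S202 to S204 can be sketched in Python under the same kind of assumptions as a storage-side sketch would need: a seeded byte permutation as the "predetermined rule" with the seed as the change history, SHA-256 as the calculation algorithm, and hypothetical dictionary keys and function names. The metadata is simply passed in, standing in for S201.

```python
import hashlib
import random

def h(data: bytes) -> str:
    """Assumed calculation algorithm: SHA-256 hex digest."""
    return hashlib.sha256(data).hexdigest()

def permutation(length: int, seed: int) -> list[int]:
    order = list(range(length))
    random.Random(seed).shuffle(order)
    return order

def restore(changed: bytes, seed: int) -> bytes:
    """S204: undo the seeded permutation recorded in the change history."""
    order = permutation(len(changed), seed)
    out = bytearray(len(changed))
    for dst, src in enumerate(order):
        out[src] = changed[dst]  # the forward step wrote data[src] to position dst
    return bytes(out)

def recover(metadata: dict, collected: list[dict]) -> bytes:
    # S202: keep only data tagged with this input's unique value
    mine = {d["piece_value"]: d["piece"] for d in collected
            if d["unique_value"] == metadata["unique_value"]}
    # S203: order the pieces by the values listed in the metadata, then join
    joined = b"".join(mine[v] for v in metadata["piece_values"])
    return restore(joined, metadata["change_history"]["seed"])  # S204

# Build a small stored example to exercise the recovery path
data, seed = b"0123456789abcdef", 7
order = permutation(len(data), seed)
changed = bytes(data[i] for i in order)
pieces = [changed[i:i + 4] for i in range(0, len(changed), 4)]
metadata = {"unique_value": h(b"input:" + data),
            "piece_values": [h(p) for p in pieces],
            "change_history": {"seed": seed}}
collected = [{"unique_value": metadata["unique_value"], "piece_value": h(p), "piece": p}
             for p in reversed(pieces)]  # pieces arrive out of order
collected.append({"unique_value": "other", "piece_value": "x", "piece": b"junk"})
recovered = recover(metadata, collected)
print(recovered == data)
```

The ordering comes entirely from the piece values listed in the metadata, so the arrival order of the pieces and the identity of the clients that returned them play no role in this sketch.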

In the data distributed storage method of the present invention, a calculation value of the input data itself may further be calculated using a predetermined calculation algorithm in the calculation value calculation procedure, and metadata that further contains this calculation value of the input data itself may be stored in the metadata storage procedure. The method may further have, after the data restoration procedure, a calculation value collation procedure (S205) of calculating a calculation value of the restored data itself using the same calculation algorithm that produced the calculation value of the input data itself, and collating it against the calculation value of the input data itself contained in the metadata acquired in the metadata acquisition procedure.
With the calculation value collation procedure, it can be determined whether the restored data matches the input data. Because the result can be used to judge whether the collected data pieces are genuine, the data distributed storage method of the present invention can exclude a client that attempts to participate maliciously.

The program of the present invention causes a computer to execute the data distributed storage method of the present invention.
This allows the data distributed storage method of the present invention to be executed on a computer. The program of the present invention can thereby reduce the load on the data distributed storage device.

The recording medium of the present invention is a computer-readable recording medium on which is recorded a program for causing a computer to execute the data distributed storage method of the present invention.
This allows the data distributed storage method of the present invention to be executed on a computer. The program recorded on the recording medium can thereby reduce the load on the data distributed storage device.

The above inventions can be combined wherever possible.

According to the present invention, a data distributed storage device, method, program, and recording medium that can reduce the load on the data distributed storage device can be provided.

FIG. 1 shows an example of a data distributed storage system according to Embodiment 1.
FIG. 2 is a sequence diagram showing an example of a data distributed storage method according to Embodiment 1.
FIG. 3 shows an example of the data handled in this embodiment.
FIG. 4 shows an example of distribution data D_j.
FIG. 5 shows an example of the distribution data stored in the distribution data storage unit.
FIG. 6 shows an example of the distribution data transmitted by the server distribution data transmission/reception unit.
FIG. 7 shows a flowchart of a data distributed storage system according to Embodiment 2.
FIG. 8 shows an example of a data distributed storage system according to Embodiment 3.
FIG. 9 shows an example of the data handled in Embodiment 3.
FIG. 10 shows an example of a data distributed storage system according to Embodiment 4.
FIG. 11 is a sequence diagram showing an example of a data distributed storage method according to Embodiment 4.
FIG. 12 is a flowchart showing an example of the operation of the data distributed storage system when collecting distribution data D_j.

Embodiments of the present invention will be described with reference to the accompanying drawings. The embodiments described below are examples of the present invention, and the present invention is not limited to them. In this specification and the drawings, components with the same reference numerals are identical to one another.

(Embodiment 1)
FIG. 1 shows an example of a data distributed storage system according to Embodiment 1. The data distributed storage system according to this embodiment includes a data distributed storage device 1A, M clients 3_1 to 3_M, and a data distributed storage device 1B. FIG. 1 shows only the i-th client 3_i of the M clients 3_1 to 3_M (i is any integer from 1 to M). The data distributed storage devices 1A and 1B and the clients 3_1 to 3_M are connected by a communication network.

The data distributed storage device 1A is configured to distribute input data across the plurality of clients 3_1 to 3_M. For example, the data distributed storage device 1A includes a data changing unit 11, a data dividing unit 12, a calculation value calculation unit 13, a metadata storage unit 14, a distribution data configuration unit 15, a distribution data storage unit 16, a server distribution data transmission/reception unit 17, and a metadata collection unit 19.

The client 3_i has an identifier ID_i and stores the distribution data D_j that matches those of the calculation values O_P1 to O_PN of the data pieces P_1 to P_N that correspond to the identifier ID_i. For example, the client 3_i includes a distribution data acquisition unit 31, a distribution data storage unit 32, and a client distribution data transmission/reception unit 33.

Any communication device with a storage function, such as a PC or a mobile phone, can serve as the client 3_i. A user can join the system by installing an application (hereinafter, the DRT (Distribution and Rake Technology) application) on the communication device.

The data distribution storage device 1B collects the distribution data D_1 to D_N from the plurality of clients 3_1 to 3_M and restores the input data. For example, the data distribution storage device 1B includes a metadata acquisition unit 21, a distribution data collection unit 22, a data combination unit 23, and a data restoration unit 24.

FIG. 2 is a sequence diagram illustrating an example of the data distribution storage method according to Embodiment 1. The data distribution storage method according to this embodiment has, in this order, an input data acquisition procedure S101, a data change procedure S102, a data division procedure S103, a calculated value calculation procedure S104, a metadata storage procedure S105, a distribution data storage procedure S106, a server distribution data transmission procedure S107, a metadata transmission procedure S108, a distribution data acquisition procedure S301, a distribution data storage procedure S302, a client distribution data transmission procedure S303, a metadata acquisition procedure S201, a distribution data collection procedure S202, a data combination procedure S203, and a data restoration procedure S204. The data distribution storage program according to this embodiment is a program for causing a computer to execute the data distribution storage method according to this embodiment.

FIG. 3 shows an example of the data handled in this embodiment. The data distribution storage method according to this embodiment is described below with reference to FIGS. 1, 2, and 3.

In the input data acquisition procedure S101, the data distribution storage device 1A acquires the input data I. For example, when the input data I is stored in a folder on a server that the DRT application can access, the data distribution storage device 1A acquires the input data I.

When the data distribution storage device 1A recognizes data to be stored, it executes the data change procedure S102. In the data change procedure S102, the data change unit 11 changes the data arrangement of the input data I according to a predetermined rule. The changed data C obtained by changing the input data I is then input to the data division unit 12. The predetermined rule is, for example, an integration function, the order or date and time of generation, a checksum, a CRC (Cyclic Redundancy Check), or a bit string of a specific part of the input data I.

In the data division procedure S103, the data division unit 12 divides the changed data C into a plurality of data pieces P_1 to P_N.

In the calculated value calculation procedure S104, the calculated value calculation unit 13 uses a predetermined calculation algorithm to calculate a calculated value O_I unique to the input data I and calculated values O_P1 to O_PN of the data pieces P_1 to P_N. The calculated value O_I unique to the input data I is, for example, a calculated value of the file name of the input data I. The predetermined calculation algorithm is, for example, a hash function. In this case, the calculated value calculation unit 13 obtains O_I by calculating the hash value of the file name of the input data I, and obtains each O_Pj by calculating the hash value of the data piece P_j. The calculated value O_I is not limited to one derived from the file name; it may be, for example, a calculated value of the input data I itself or of the header information of the input data I.
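The division and calculation steps S103 and S104 can be sketched as follows. This is a minimal illustration, not the patented implementation: the choice of SHA-256, the fixed piece size, and the names `calc_value` and `split_into_pieces` are all assumptions, since the text only says that a hash function is one possible calculation algorithm.

```python
import hashlib
from typing import List


def calc_value(data: bytes) -> str:
    """Return a hex digest used as a calculated value (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def split_into_pieces(changed: bytes, piece_size: int) -> List[bytes]:
    """Divide the changed data C into fixed-size data pieces P_1..P_N."""
    if piece_size <= 0:
        raise ValueError("piece_size must be positive")
    return [changed[i:i + piece_size] for i in range(0, len(changed), piece_size)]


changed_data = b"example changed data C"          # stands in for the changed data C
pieces = split_into_pieces(changed_data, piece_size=8)

o_i = calc_value("report.txt".encode("utf-8"))    # O_I from a hypothetical file name
o_p = [calc_value(p) for p in pieces]             # O_P1..O_PN, one per data piece
```

Deriving O_I from the file name rather than the content keeps it stable across pieces, which is what later allows one value to identify every piece of the same input.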

In the metadata storage procedure S105, the metadata storage unit 14 stores metadata M in which the calculated value O_I unique to the input data I, the calculated values O_P1 to O_PN of the data pieces P_1 to P_N, and the change history of the data change procedure S102 are associated with one another. The metadata M includes the file name of the input data I, the calculated value O_I unique to the input data I, the calculated values O_P1 to O_PN of the data pieces P_1 to P_N, the change history of the data change unit 11, and the division history of the data division unit 12. The metadata M may also include the order of the data pieces P_1 to P_N and an encryption key for each of the data pieces P_1 to P_N.
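One way to picture the metadata M is as a small record type. The sketch below is only an assumed layout: the field names and the use of dictionaries for the two histories are hypothetical, and the text does not prescribe any particular encoding.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Metadata:
    """Hypothetical in-memory form of the metadata M."""
    file_name: str                      # file name of the input data I
    o_i: str                            # calculated value unique to the input data I
    o_p: List[str]                      # O_P1..O_PN, kept in division order
    change_history: Dict[str, str]      # what the data change unit 11 did
    division_history: Dict[str, int]    # how the data division unit 12 split C
    piece_keys: List[bytes] = field(default_factory=list)  # optional per-piece keys


m = Metadata(
    file_name="report.txt",
    o_i="12345",
    o_p=["6612", "7777"],
    change_history={"rule": "example-rule"},
    division_history={"piece_size": 8},
)
```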

In the distribution data storage procedure S106, the distribution data configuration unit 15 configures the distribution data D_1 to D_N by attaching the calculated value O_I unique to the input data I and the calculated values O_P1 to O_PN of the data pieces, both calculated by the calculated value calculation unit 13, to the respective data pieces P_1 to P_N from the data division unit 12. The distribution data storage unit 16 then stores the distribution data D_1 to D_N. FIG. 4 shows an example of the distribution data D_j. The distribution data D_j includes the calculated value O_I unique to the input data I and the calculated value O_Pj of the data piece P_j. The management information is, for example, a checksum.
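The construction of the distribution data can be sketched as below. The record layout and the use of CRC-32 for the checksum are assumptions made for illustration; the text states only that O_I and O_Pj are attached to each piece and that the management information may be a checksum.

```python
import zlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DistributionData:
    """Hypothetical layout of one distribution data item D_j."""
    o_i: str        # calculated value unique to the input data I
    o_pj: str       # calculated value of this data piece P_j
    piece: bytes    # the data piece P_j itself
    checksum: int   # management information (CRC-32 is an assumption)


def build_distribution_data(o_i: str, o_p: List[str],
                            pieces: List[bytes]) -> List[DistributionData]:
    """Attach O_I and O_Pj to each piece to form D_1..D_N."""
    if len(o_p) != len(pieces):
        raise ValueError("need exactly one calculated value per data piece")
    return [DistributionData(o_i, value, piece, zlib.crc32(piece))
            for value, piece in zip(o_p, pieces)]


items = build_distribution_data("12345", ["6612", "7777"], [b"ab", b"cd"])
```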

The distribution data storage unit 16 stores the distribution data D_1 to D_N configured by the distribution data configuration unit 15 in such a state that a client 3 whose identifier ID_i conforms to the calculated values O_P1 to O_PN of the data pieces P_1 to P_N can acquire them, whereas a client 3_i whose identifier ID_i does not conform to those calculated values cannot. For example, access is restricted by referring to the identifier of the client.

For example, as shown in FIG. 5, suppose the distribution data storage unit 16 stores distribution data D_j that contains a data piece P_j whose calculated value O_I is "12345" and whose calculated value O_Pj is "6612". In this case, a client 3_i whose identifier ID_i is "6612", matching the calculated value O_Pj, can acquire the distribution data D_j. In contrast, the client 3_1, whose identifier ID_1 is "5147" and does not conform to the calculated value O_Pj, cannot acquire the distribution data D_j.

Here, the calculated value O_Pj and the identifier ID_i are regarded as conforming both when the identifier ID_i is identical to the calculated value O_Pj and when the identifier is identical to a calculated value located near the calculated value O_Pj of the data piece P_j in the numerical space derived by the calculation algorithm. For example, as shown in FIG. 5, when the identifier "7700" of the client 3_M corresponds to the calculated value "7700", which lies near the calculated value O_P2 "7777" of the data piece P_2 in the hash space, the client 3_M can acquire the distribution data D_2.
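The conformance rule, using the values from FIG. 5, can be sketched as follows. The text does not define how close "near" is, so the numeric distance test and the `radius` parameter are assumptions introduced only for this example.

```python
from typing import List, Tuple


def conforms(identifier: int, o_pj: int, radius: int = 100) -> bool:
    """True if the identifier equals O_Pj or lies within `radius` of it.

    `radius` is a hypothetical tuning parameter, not taken from the source.
    """
    return abs(identifier - o_pj) <= radius


def retrievable(stored: List[Tuple[int, str]], identifier: int,
                radius: int = 100) -> List[str]:
    """Return the names of stored items that a client with this identifier may fetch."""
    return [name for o_pj, name in stored if conforms(identifier, o_pj, radius)]


stored = [(6612, "D_j"), (7777, "D_2")]   # (O_Pj, distribution data) pairs
print(retrievable(stored, 6612))          # identifier identical to O_Pj
print(retrievable(stored, 7700))          # identifier near O_P2 = 7777
print(retrievable(stored, 5147))          # identifier conforming to neither value
```

Widening `radius` enlarges the set of clients allowed to hold a given piece, which trades storage selectivity for availability.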

The server distribution data transmission procedure S107 is executed when a client 3_j inquires whether distribution data D_1 to D_N exist. FIG. 6 shows an example of the distribution data transmitted by the server distribution data transmission/reception unit. In the server distribution data transmission procedure S107, the server distribution data transmission/reception unit 17 transmits the distribution data D_1 to D_N to the clients 3_1 to 3_N. Because this mechanism removes the need to confirm that the destination exists, communication efficiency improves. In particular, a communication timeout for an absent destination takes from several seconds to several tens of seconds, so avoiding it makes congestion unlikely even when a large amount of data to be stored arises temporarily, and traffic control for congestion itself becomes unnecessary.

When the DRT application starts, the client 3_i communicates with the data distribution storage device 1A and checks whether there is data D_j to be stored. After that, the client 3_i checks at regular intervals whether the data distribution storage device 1A has data D_j to be stored. The DRT application may also monitor the load of the CPU running in the client 3_i and skip the check when the CPU usage rate exceeds a specified threshold.
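The load-aware check described above might look like the following sketch. How CPU usage is measured is platform specific, so it is passed in as a callable; `poll_once`, `check_server`, and the 80 percent default threshold are all hypothetical names and values.

```python
from typing import Callable, List, Optional


def poll_once(check_server: Callable[[], List[str]],
              cpu_usage: Callable[[], float],
              threshold: float = 80.0) -> Optional[List[str]]:
    """Ask the server for data to store unless the client is too busy.

    Returns None when the check is skipped because CPU usage exceeds the
    threshold, otherwise whatever the server reports.
    """
    if cpu_usage() > threshold:
        return None               # skip this round; try again at the next interval
    return check_server()


# Stand-ins for the real server query and CPU measurement.
busy = poll_once(lambda: ["D_j"], cpu_usage=lambda: 95.0)
idle = poll_once(lambda: ["D_j"], cpu_usage=lambda: 10.0)
```

A periodic scheduler would call `poll_once` at each interval, so a skipped round simply defers the check instead of failing.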

Each distribution data D_j carries the calculated value O_Pj of its data piece P_j, while each client 3_i has an identifier ID_i. The server distribution data transmission/reception unit 17 transmits each of the distribution data D_1 to D_N to an arbitrary client 3_i among the clients whose identifiers conform to the calculated value, among O_P1 to O_PN, attached to that distribution data.

At this time, the client 3_i executes the distribution data acquisition procedure S301 and the distribution data storage procedure S302. In the distribution data acquisition procedure S301, the distribution data acquisition unit 31 receives the distribution data D_j. In the distribution data storage procedure S302, the distribution data storage unit 32 stores the distribution data D_j. As a result, the distribution data D_1 to D_N are stored in a distributed manner across the clients 3_1 to 3_M.

After the metadata storage procedure S105, the data distribution storage device 1B acquires the metadata M. For example, the metadata transmission procedure S108 and the metadata acquisition procedure S201 are executed. In the metadata acquisition procedure S201, the metadata acquisition unit 21 accesses the metadata storage unit 14 and acquires the metadata M stored in the metadata storage procedure S105. At this time, the data distribution storage device 1A executes the metadata transmission procedure S108 and transmits the metadata M to the data distribution storage device 1B.

In the distribution data collection procedure S202, the distribution data collection unit 22 collects the distribution data D_1 to D_N from the clients 3_1 to 3_M. For example, the distribution data collection unit 22 sends, to the clients that conform to the calculated values O_P1 to O_PN of the data pieces P_1 to P_N contained in the metadata M, an instruction to return the distribution data carrying the calculated value O_I unique to the input data I contained in the metadata M. For example, as shown in FIGS. 5 and 6, when the input data I is to be restored, the client 3_i whose identifier "6612" conforms to the calculated value O_Pj "6612" is instructed to transmit the distribution data carrying the calculated value O_I "12345".

The distribution data collection unit 22 then collects, from each of the clients 3_1 to 3_M that conform to the calculated values O_P1 to O_PN of the data pieces P_1 to P_N contained in the metadata M, the distribution data D_1 to D_N carrying the calculated value O_I unique to the input data I contained in the metadata M. At this time, the client 3_i executes the client distribution data transmission procedure S303.

Here, the distribution data D_1 to D_N are collected not only from clients 3_1 to 3_M whose identifiers are identical to the calculated values O_P1 to O_PN but also from clients whose identifiers match calculated values located near O_P1 to O_PN. The access range of the distribution data collection unit 22 thus widens automatically, so the distribution data can be collected promptly even when few clients store the same distribution data and collection would otherwise take time. Because the data is managed by calculated values rather than by a management table, collection delays are avoided, and the life-cycle management of a management table does not become a heavy burden even when an unspecified large number of clients participate.

In the client distribution data transmission procedure S303, the client distribution data transmission/reception unit 33 checks whether the distribution data storage unit 32 holds distribution data D_j carrying the calculated value O_I unique to the input data I contained in the metadata M. If such distribution data D_j is stored, the client distribution data transmission/reception unit 33 transmits it.
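On the client side, answering a collection instruction amounts to filtering the local store by O_I. The sketch below assumes a simple list of dictionaries as the local store; the function name and record keys are hypothetical.

```python
from typing import Dict, List


def handle_collection_instruction(store: List[Dict[str, object]],
                                  o_i: str) -> List[Dict[str, object]]:
    """Return every locally stored distribution data item that carries O_I.

    An empty list means the client holds nothing for this input data and
    therefore transmits nothing.
    """
    return [item for item in store if item["o_i"] == o_i]


local_store = [
    {"o_i": "12345", "o_pj": "6612", "piece": b"first"},
    {"o_i": "99999", "o_pj": "6650", "piece": b"other file"},
    {"o_i": "12345", "o_pj": "6600", "piece": b"second"},
]
to_send = handle_collection_instruction(local_store, "12345")
```

Because the lookup key is O_I alone, a single instruction returns every piece of that input data held by the client, however many there are.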

It is preferable to adopt a configuration in which, when a client inquires whether distribution data D_1 to D_N exist, the device instructs the client to transmit any such distribution data it holds. The clients 3_1 to 3_M then transmit the indicated data if they have it. With this mechanism, the data distribution storage device 1B does not need to manage which of the clients 3_1 to 3_M stores which of the distribution data D_1 to D_N, so the management load is greatly reduced compared with conventional methods. In other words, the clients 3_1 to 3_M need not be managed directly. This not only lightens the processing load of the data distribution storage device 1B but also lets the system respond flexibly to increases and decreases in the number of clients 3_1 to 3_M (system scalability).

In the data combination procedure S203, the data combination unit 23 arranges the distribution data D_1 to D_N collected in the distribution data collection procedure S202 according to the calculated values O_P1 to O_PN of the data pieces P_1 to P_N contained in the metadata M, and combines the data pieces P_1 to P_N. For example, when the calculated values O_P1 to O_PN in the metadata M are listed in the order in which the data division unit 12 divided the data, the data combination unit 23 reads the calculated values O_P1 to O_PN of the distribution data D_1 to D_N, arranges the distribution data in the order of the calculated values given in the metadata M, and combines the data pieces P_1 to P_N contained in the distribution data in that order. The data combination unit 23 then outputs the combined data B to the data restoration unit 24.
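The combination step can be sketched as a reordering by the calculated values listed in the metadata. This is an illustrative sketch: the `(O_Pj, piece)` tuple form and the policy of keeping the first copy when several clients return the same piece are assumptions.

```python
from typing import Dict, List, Tuple


def combine(o_p_order: List[str], collected: List[Tuple[str, bytes]]) -> bytes:
    """Join collected pieces in the order of O_P1..O_PN from the metadata M."""
    by_value: Dict[str, bytes] = {}
    for o_pj, piece in collected:
        by_value.setdefault(o_pj, piece)   # ignore duplicate copies of a piece
    missing = [value for value in o_p_order if value not in by_value]
    if missing:
        raise KeyError(f"pieces not yet collected: {missing}")
    return b"".join(by_value[value] for value in o_p_order)


# Pieces arrive out of order and one arrives twice.
collected = [("c3", b"C."), ("a1", b"changed "), ("b2", b"data "), ("a1", b"changed ")]
combined_b = combine(["a1", "b2", "c3"], collected)
```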

In the data restoration procedure S204, the data restoration unit 24 restores the combined data B to the input data I based on the change history contained in the metadata M. For example, when the data change unit 11 integrates the input data, the data restoration unit 24 reads the integration function of the data change unit 11 from the metadata M and applies the inverse integration process to the combined data B using that function. The data restoration unit 24 can thereby restore the input data I.
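The relationship between the change procedure S102 and the restoration procedure S204 can be illustrated with any reversible rearrangement. The sketch below uses a seeded byte permutation purely as a stand-in: it is not the integration function of the patent, and the `seed` recorded as change history is a hypothetical choice.

```python
import random
from typing import List


def _permutation(length: int, seed: int) -> List[int]:
    """Derive a repeatable index permutation from the recorded seed."""
    indices = list(range(length))
    random.Random(seed).shuffle(indices)
    return indices


def change(data: bytes, seed: int) -> bytes:
    """Rearrange the bytes of the input data I (stand-in for data change unit 11)."""
    perm = _permutation(len(data), seed)
    return bytes(data[i] for i in perm)


def restore(changed: bytes, seed: int) -> bytes:
    """Undo `change` using the same seed read back from the change history."""
    perm = _permutation(len(changed), seed)
    original = bytearray(len(changed))
    for position, source_index in enumerate(perm):
        original[source_index] = changed[position]
    return bytes(original)


input_i = b"input data I"
changed_c = change(input_i, seed=42)
restored = restore(changed_c, seed=42)
```

The point of the example is only that the change history must contain enough information (here, the seed) for the restoring side to invert the rearrangement.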

The data distribution storage device 1A may additionally have the functions of the data distribution storage device 1B. For example, the data distribution storage device 1A may further include the metadata acquisition unit 21, the distribution data collection unit 22, the data combination unit 23, and the data restoration unit 24. Likewise, the data distribution storage device 1B may additionally have the functions of the data distribution storage device 1A.

The data distribution storage system and the data distribution storage method according to this embodiment can store and collect the distribution data D_1 to D_N without assigning fixed addresses to the clients 3_1 to 3_M. This makes the communication for storing and collecting the distribution data D_1 to D_N markedly more efficient. In addition, because there is no need to manage which of the clients 3_1 to 3_M stores which of the distribution data D_1 to D_N, the data distribution storage devices 1A and 1B need neither the communication nor the storage capacity for tracking the state of the clients 3_1 to 3_M, which makes it easy to build a system in which an unspecified large number of participants take part. Furthermore, when clients are added to or removed from the system, the Consistent Hashing technique is applied so that new devices can join the system gradually and devices can be removed without disruption, which secures system scalability and simplifies operation.
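For readers unfamiliar with consistent hashing, the following generic sketch shows why adding or removing a client disturbs only part of the value space. It is a textbook-style ring, not the mechanism of the patent, which says only that the technique is applied; the `Ring` class and its clockwise-successor rule are assumptions.

```python
import bisect
from typing import List


class Ring:
    """Minimal consistent-hashing ring over integer client identifiers."""

    def __init__(self, client_ids: List[int]) -> None:
        self._ids = sorted(client_ids)

    def add(self, client_id: int) -> None:
        bisect.insort(self._ids, client_id)

    def remove(self, client_id: int) -> None:
        self._ids.remove(client_id)

    def owner(self, value: int) -> int:
        """Return the first client identifier at or after `value`, wrapping around."""
        index = bisect.bisect_left(self._ids, value)
        return self._ids[index % len(self._ids)]


ring = Ring([5147, 6612, 7700])
before = {value: ring.owner(value) for value in (6612, 7000, 7777, 9000)}
ring.add(8000)                      # a new client joins
after = {value: ring.owner(value) for value in (6612, 7000, 7777, 9000)}
```

In this run only the value 7777 changes owner after client 8000 joins; the other values stay with their previous clients.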

Furthermore, the data distribution storage system and the data distribution storage method according to this embodiment can adopt an access method that originates from the clients 3_1 to 3_M (pull-type communication). This minimizes the delay in collecting the distribution data D_1 to D_N.

Because the calculated value of the input data I is specified, all of the distribution data can be collected with a single collection instruction even when one client 3_i stores a plurality of distribution data.

(Embodiment 2)
FIG. 7 shows a flowchart of the data distribution storage system according to this embodiment. In this system, in the distribution data acquisition procedure S301 shown in FIG. 2 and described in Embodiment 1, the client 3_i shown in FIG. 1 communicates with the data distribution storage device 1A (S413) when the DRT application starts (S412) and checks whether there is data D_j to be stored (S414). At this time, the client 3_i also checks whether there is an instruction from the data distribution storage device 1A to collect distribution data (S415). If there is a collection instruction, collection takes priority and step S416 is executed.

In step S416, when the client 3_i has received, in response to its inquiry to the data distribution storage device 1A, information on the distribution data D_j to be collected, it transmits the corresponding data among the distribution data D_j it stores locally to the data distribution storage device 1A. If there is data to be stored (S417), the client receives that distribution data and stores it (S418).

In this embodiment, every exchange starts with an access from the clients 3_1 to 3_M, so no difference in collection efficiency arises from managing distribution destinations. This allows the system to be simplified. Moreover, defining the distribution destinations by, for example, a hash keeps the number of collection instructions from becoming enormous.

In the data distribution storage system according to this embodiment, the data distribution storage device 1A described in Embodiment 1 preferably also has the functions of the data distribution storage device 1B and performs the following operations.
In step S411, the client 3_i performs start-up processing. At this time, the client 3_i acquires the pre-registered access destination and information specific to the PC. Step S411 may be executed automatically after the client 3_i starts, may run continuously while the client 3_i is running, or may be executed at an arbitrary time and for an arbitrary period in response to another trigger. Step S411 may also require a password to be entered for start-up. In step S412, the DRT application is started based on the various information and the authentication result acquired in step S411.

In step S413, the client 3_i checks whether IP communication is possible with the data distribution storage device 1A, which is the pre-registered access destination. This check may use a general-purpose web browser protocol such as http or https, or another method. During the check, the data distribution storage device 1A may verify that the client 3_i is a legitimate target device. In that case, the data distribution storage device 1A may act as its own certificate authority and compare a certificate that it issued in advance with the certificate sent by the client 3_i. A public certification service such as VeriSign may also be used for this procedure.

In step S414, the client 3_i checks whether there is distribution data to be stored. For example, the client 3_i sends the data distribution storage device 1A a data confirmation packet for checking whether distribution data exists, and waits for the response packet. It is desirable to limit the amount of information in the data confirmation packet and the response packet so that each fits in a single packet in most network environments and can be transmitted efficiently. The maximum packet size that can be carried without fragmentation is called the path MTU (Maximum Transmission Unit), and a value of about 1400 to 1500 bytes is generally used.

In step S415, the client 3_i examines the contents of the response packet and determines whether it contains a collection instruction for distribution data. If there is a collection instruction, the client transmits all of the corresponding distribution data to the data distribution storage device 1A in step S416. If there is no collection instruction in step S415, the process proceeds to step S417.

In step S417, the client 3_i checks whether there is data to be received for storage; if there is, the process proceeds to step S418, and if not, to step S419. In step S418, the client 3_i receives and stores the distribution data. To keep packet communication efficient, the distribution data to be received by the client 3_i is preferably contained in the response packet itself. The data to be received has been divided, and each part is sized to fit in one packet in view of the path MTU. When there are several distribution data to be received, several packets are received; each packet carries the number of packets still to be received, so the client 3_i refers to this number and repeats reception until it reaches 0. In step S419, reception is complete once the remaining packet count in step S418 reaches 0, and the client enters a standby state. After waiting for a predetermined time, the process returns to step S413.
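The remaining-packet counter used in step S418 can be sketched as follows. The packet is modeled as a `(remaining, payload)` tuple and the 1400-byte payload limit is an assumed value within the range mentioned in the text; a real implementation would define an actual wire format.

```python
from typing import Iterable, List, Tuple

Packet = Tuple[int, bytes]   # (number of packets still to come, payload)


def make_packets(items: List[bytes], max_payload: int = 1400) -> List[Packet]:
    """Wrap each distribution data item in one packet with a remaining count."""
    for item in items:
        if len(item) > max_payload:
            raise ValueError("item does not fit in a single packet")
    total = len(items)
    return [(total - 1 - index, item) for index, item in enumerate(items)]


def receive_all(packets: Iterable[Packet]) -> List[bytes]:
    """Keep receiving until a packet reports that 0 packets remain."""
    received: List[bytes] = []
    for remaining, payload in packets:
        received.append(payload)
        if remaining == 0:
            break                # reception complete; the client can go to standby
    return received


packets = make_packets([b"D_1", b"D_2", b"D_3"])
stored_items = receive_all(iter(packets))
```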

In step S421, the data distribution storage device 1A performs start-up processing. Step S421 is an internal operation of the data distribution storage device 1A and, in this embodiment, is executed immediately after the device starts. Step S421 may also be executed at an arbitrary time in response to a trigger other than the start-up of the data distribution storage device 1A.

After step S421, steps S422, S428, and S431 are executed. Steps S422, S428, and S431 are handled by separate software processes that run in parallel inside the data distribution storage device 1A. Although this embodiment runs them in parallel, a device with relatively low computing performance may process steps S422, S428, and S431 sequentially.

In step S422, the data distribution storage device 1A monitors for input data to be stored. If there is none, it waits for a specified time and continues monitoring periodically. If input data to be stored is detected, the process proceeds to step S423.
In step S423, the data distribution storage device 1A integrates the input data to be stored, divides it into units of transmission packets, and temporarily stores the resulting distribution data inside the data distribution storage device 1A. In this embodiment the temporary distribution data is registered in a database, but it may instead be stored temporarily as files. After this processing, the process returns to step S422 and monitoring of input data continues.

In step S431, the data distributed storage device 1A waits for a collection request for the distribution data needed to restore the input data. A collection request arises, for example, when the user who requested storage of the input data instructs a restore operation through a web application. Collection requests are triggered mainly by user operations, but a request may also be generated automatically, for example when loss of the input data is detected. If there is no collection request, the device keeps checking for one periodically; when a collection request is detected, the process proceeds to step S432.

In step S432, the data distributed storage device 1A checks whether the distribution data is present in its internal cache. Checking the cache before sending a collection instruction to the clients 3_i allows the distribution data to be collected efficiently.
In step S433, the data distributed storage device 1A determines whether its own cache holds all of the distribution data needed for restoration. If the cache does not hold all of that distribution data, the process proceeds to step S434; if it does, the process proceeds to step S436.

In step S434, the data distributed storage device 1A collects the distribution data from each client 3_i. The data distributed storage device 1A sends the collection instruction as part of its response to the packet that the client 3_i sends in step S414. Only the distribution data that was not found in the cache may be collected selectively.
In step S435, the data distributed storage device 1A determines whether collection of the required distribution data is complete, and waits to receive data until all of the distribution data needed for restoration has arrived. It keeps collecting distribution data until collection is complete, and then proceeds to step S436.

In step S436, the data distributed storage device 1A restores the input data from the collected distribution data. For example, it combines the collected distribution data and applies reverse integration.
In step S437, the data distributed storage device 1A sends the restored input data to the user. For example, the restored input data is sent to a predetermined storage location or to a storage location specified by the user.

In step S428, the data distributed storage device 1A receives from a client 3_i a storage confirmation asking whether there is distribution data to be stored, and thereby identifies the clients 3_i that can store distribution data. Step S428 is handled by the software process that sends the group of distribution data generated in step S423 to the group of clients 3_i. Note, however, that this is a step in which the data distributed storage device 1A takes the lead in distributing distribution data to, and collecting it from, the clients 3_i it can access; it applies when this embodiment is combined with Embodiment 1 so that the push and pull modes operate side by side. Step S428 is executed when the processing of step S423 completes, but it may also be executed periodically.

In step S424, it is determined whether there is distribution data whose collection has not been completed in step S435. If there is, the process proceeds to step S425; if not, it proceeds to step S426. In step S425, a collection instruction is sent to the client 3_i and the distribution data is collected. In this embodiment, restoration of the data collected here is also delegated to steps S435 and S436, which belong to a separate software process. This avoids a complicated procedure when clients capable of the push-type access of Embodiment 1 coexist with clients capable only of the pull-type access of Embodiment 2.

In step S426, it is determined whether there is distribution data to be delivered to the client 3_i. If there is, the process proceeds to step S427; if not, it proceeds to step S428. In step S427, the distribution data is sent to the client 3_i.

(Embodiment 3)
The data distributed storage system according to this embodiment stores data redundantly by replicating the data pieces P_1 to P_N. A conventional system might be configured with, for example, 30 replicas (30-fold redundancy). That is because such a system assumes client devices that are not guaranteed to be powered on and reachable at all times; when highly available clients such as data distributed storage devices are assumed, the design can be excessively redundant. Two-fold redundancy is the minimum level of redundant storage, yet even then the effective storage capacity is only half of the physical storage capacity. When highly reliable, highly available clients are assumed, a redundant storage mechanism that takes efficiency into account is therefore needed.

FIG. 8 shows an example of the data distributed storage system according to this embodiment. In this system, the data distributed storage device 1A of Embodiment 1 further includes a parity calculation unit 18. The system is characterized in that parity data for the data pieces P_1 to P_N is generated in the data division procedure S103 shown in FIG. 2.

FIG. 9 shows an example of the data handled in this embodiment. Steps S221, S222, S223, S224, and S225 show the process of storing data in the clients 3_1 to 3_3, and steps S225, S226, S227, S228, and S229 schematically show the process of collecting the stored data. For simplicity, the figure shows the case where N in FIG. 8 is 6, M is 3, and K is 3. The features of this embodiment are described below.

In the data division procedure S103 shown in FIG. 2, the parity calculation unit 18 additionally generates one piece of parity data Pp_1 from the two data pieces P_1 and P_2. The parity calculation unit 18 may use two of the data pieces P_1 to P_6, or three or more. Any reversible operation may be used to generate the parity data Pp_1 to Pp_3, for example addition, subtraction, or a combination of the two. As a result, the data pieces P_1 to P_6 and the parity data Pp_1 to Pp_3 shown in step S223 are input to the distribution data configuration unit 15.
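As an illustration of a reversible parity operation of this kind, the following minimal sketch builds one parity block from two data pieces by byte-wise addition modulo 256 and recovers a lost piece by byte-wise subtraction. The function names, the modulus, and the requirement that both pieces have equal length are assumptions made for this example; the description only requires that the operation be reversible.

```python
def make_parity(piece_a: bytes, piece_b: bytes) -> bytes:
    """Byte-wise sum modulo 256 of two equal-length data pieces (assumed scheme)."""
    if len(piece_a) != len(piece_b):
        raise ValueError("pieces must have equal length")
    return bytes((a + b) % 256 for a, b in zip(piece_a, piece_b))


def recover_piece(surviving: bytes, parity: bytes) -> bytes:
    """Recover the missing piece by subtracting the surviving piece from the parity."""
    if len(surviving) != len(parity):
        raise ValueError("piece and parity must have equal length")
    return bytes((p - s) % 256 for p, s in zip(parity, surviving))


# Hypothetical stand-ins for data pieces P_1 and P_2 (both 7 bytes long).
p1, p2 = b"\x10\xf0hello", b"\x20\x20world"
pp1 = make_parity(p1, p2)
assert recover_piece(p1, pp1) == p2  # P_2 lost: rebuild it from P_1 and Pp_1
assert recover_piece(p2, pp1) == p1  # P_1 lost: rebuild it from P_2 and Pp_1
```

Because addition modulo 256 is invertible, either piece of a block can be rebuilt from the other piece and the parity, which is the property the restoration step relies on.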

The number of data pieces P_1 to P_6 that the parity calculation unit 18 uses per parity block is preferably variable. For example, it is changed according to the state of the communication or line, the collection time, the communication accuracy, the response time of the clients 3_1 to 3_3, or the access frequency of the clients 3_1 to 3_3.

In the calculation value calculation procedure S104, the calculation value calculation unit 13 additionally uses the calculation algorithm to compute the calculation values Op_P1 to Op_P3 of the parity data Pp_1 to Pp_3. In the distribution data storage procedure S106, the distribution data configuration unit 15 attaches the calculation value O_I unique to the input data I and the calculation values Op_P1 to Op_P3 to the parity data Pp_1 to Pp_3, thereby also forming distribution data Dp_1 to Dp_3. The distribution data storage unit 16 also stores the distribution data Dp_1 to Dp_3.

In the server distribution data transmission procedure S107, the server distribution data transmitting/receiving unit 17 additionally sends the distribution data Dp_1 to Dp_3 to the clients 3_1 to 3_3 whose identifiers ID_1 to ID_3 match the calculation values Op_P1 to Op_P3 of the parity data Pp_1 to Pp_3. The distribution data Dp_1 to Dp_3, which contain the parity data Pp_1 to Pp_3, are thus stored in the clients 3_1 to 3_3.

In the distribution data collection procedure S202, the distribution data collection unit 22 also collects the distribution data Dp_1 to Dp_3 from the clients 3_1 to 3_3. When some of the distribution data D_1 to D_6 cannot be collected, the missing data is computed using the distribution data Dp_1 to Dp_3. This requires the calculation algorithm used by the calculation value calculation unit 13, so in the metadata storage procedure S105 the metadata collection unit 19 also collects that algorithm, and the metadata storage unit 14 stores it in the metadata M. As a result, even when not all of the distribution data D_1 to D_6 can be collected directly, the distribution data collection unit 22 can recover all of D_1 to D_6 by using the distribution data Dp_1 to Dp_3.

A specific example of the data distributed storage method of this embodiment is described next. When the clients 3_1 to 3_3 are assumed to be sufficiently reliable, adding parity processing at division time optimizes the degree of redundancy and reduces the storage cost. The communication used for collection can also be optimized, so that the collection time is minimized while the extra traffic for parity is avoided.

For example, in steps S224 and S225 shown in FIG. 9, the data held by several of the clients 3_1 to 3_3 can be bundled, with one of those clients used to store the parity data. With three clients 3_1 to 3_3, for instance, the two data pieces P_1 and P_2 can be stored in the two clients 3_1 and 3_2, and the parity data Pp_1 of these two pieces can be stored in the remaining client 3_3. In this case the effective capacity of the clients 3_1 to 3_3 is 2/3, or about 67%. With five clients it is 4/5, an improvement to 80%.
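The capacity figures in this paragraph follow from a simple ratio, sketched below for checking the arithmetic. The function names are hypothetical, and the sketch assumes that every client in a group contributes the same physical capacity.

```python
from fractions import Fraction


def effective_capacity(clients_per_group: int, parity_per_group: int = 1) -> Fraction:
    """Fraction of physical capacity that holds user data in one parity group."""
    if not 0 <= parity_per_group < clients_per_group:
        raise ValueError("need 0 <= parity_per_group < clients_per_group")
    return Fraction(clients_per_group - parity_per_group, clients_per_group)


def replication_capacity(copies: int) -> Fraction:
    """Fraction of physical capacity that holds user data under plain replication."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return Fraction(1, copies)


for n in (3, 5):
    ratio = effective_capacity(n)
    print(n, ratio, f"{float(ratio):.0%}")
print("2 copies:", replication_capacity(2))
```

With one parity client per group, groups of three and five clients give 2/3 and 4/5, compared with 1/2 for two-fold replication.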

With this approach, however, changing the number of bundled clients while the system is in operation is cumbersome. In this embodiment, the above processing is therefore performed at the integration stage of step S222, so that the distribution and collection mechanisms can be kept as simple as possible.

For example, for the storage in steps S223 to S225 shown in FIG. 9, the parity data Pp_1 to Pp_3 are added when the data dividing unit 12 performs the division. In this example one parity is added for every two pieces of data: the six data pieces P_1 to P_6 form three blocks of two pieces each, and one of the parity data Pp_1 to Pp_3 is attached to each block.

During collection in step S227 shown in FIG. 9, the parity data Pp_1 to Pp_3 and the data pieces P_1 to P_6 are collected without distinction and then used for restoration. In step S227 the second item of the first block is drawn with a broken line, which indicates that the data piece P_2 was lost and could not be collected. The figure likewise shows that all data pieces of the second block, P_3 and P_4, were collected, and that in the third block the parity data Pp_3 could not be collected.

In the first block, the lost second data piece P_2 is restored from the first data piece P_1 and the parity data Pp_1. In the second and third blocks the data pieces themselves were collected, so the parity data Pp_2 and Pp_3 are discarded. All of the valid data pieces P_1 to P_6 are then available, so the reverse integration process of step S228 shown in FIG. 9 is performed and the input data I is restored.

When collection of the distribution data D_1 to D_N is led by the data distributed storage device 1B (push type), the traffic at collection time is 1.5 times the data size, the same factor as the total stored amount, even when none of the data pieces P_1 to P_N has been lost. This drawback can be avoided by changing the scheme so that the parity data Pp_1 to Pp_3 are collected only when some of the distribution data D_1 to D_N cannot be collected.

When collection is led by the clients (pull type), collection of a block is complete as soon as two of the three items that make up the block have arrived, so the distribution data collection unit 22 can cancel the collection request at that point. In that case the traffic does not grow to 1.5 times as described above.

(Embodiment 4)
It is impossible in principle to tamper with the divided and distributed data in such a way that data of the attacker's choosing is restored as the input data I. A malicious participant could, however, alter the stored distribution data D_j and return wrong data as D_j in response to a collection instruction, or return entirely different data, and thereby obstruct restoration of the input data I.

Simple tampering can be rejected on the data distributed storage device 1B side by using the management information attached to the distribution data D_j. That check, however, is designed with communication errors and similar faults in mind; it is not intended as a countermeasure against malicious acts that go as far as tampering with the calculation value O_I or O_Pj. If the calculation value O_I or O_Pj is tampered with as well, the restored data R produced by the data distributed storage device 1B will differ from the input data I.

The data distributed storage system according to this embodiment is therefore characterized in that, when any of the clients 3_1 to 3_M attempts malicious participation, that client is excluded.

FIG. 10 shows an example of the data distributed storage system according to this embodiment. In this system the calculation value calculation unit 13 of the data distributed storage device 1A differs from that of the earlier embodiments, and the data distributed storage device 1B further includes a calculation value matching unit 25.

FIG. 11 is a sequence diagram showing an example of the data distributed storage method according to this embodiment. This method further includes a calculation value matching procedure S205 and an invalid data exclusion procedure S206 after the data restoration procedure S204 shown in FIG. 2.

In the calculation value calculation procedure S104 shown in FIG. 2, the calculation value calculation unit 13 additionally uses a predetermined calculation algorithm to compute the calculation value O_O of the input data I itself. In the metadata storage procedure S105, the metadata storage unit 14 stores metadata M that further includes the calculation value O_O computed in the calculation value calculation procedure S104.

In the server distribution data transmission procedure S107, the server distribution data transmitting/receiving unit 17 distributes the distribution data D_1 to D_N redundantly among the clients 3_1 to 3_M. For example, as shown in FIG. 6, when the identifier "6614" of the client 3_i+1 matches the calculation value "6614", which lies near the calculation value O_Pj "6612" of the data piece P_j in the hash space, the server distribution data transmitting/receiving unit 17 sends the distribution data D_j to two or more clients, 3_i and 3_i+1.

FIG. 12 is a flowchart showing an example of the operation of the data distributed storage system when the distribution data D_j is collected.
In step S311, the distribution data collection unit 22 determines whether it has received a collection instruction for distribution data. When it has, the process proceeds to step S312.

In step S312, the distribution data collection unit 22 executes the distribution data collection procedure S202 shown in FIG. 11. Step S312 consists of steps S319 to S323. In step S319, the distribution data collection unit 22 determines whether any of the clients 3_1 to 3_M has accessed it. If so, the process proceeds to step S320. In step S320, the calculation value O_Pj, for example a hash value, of the data piece P_j contained in the collected distribution data D_j is computed. In step S321, it is determined whether the computed calculation value O_Pj matches the calculation value O_Pj contained in the metadata M. If it does not match, the distribution data D_j is discarded (step S323); if it matches, the process proceeds to step S322. In step S322, it is determined whether the predetermined number of copies of every item of distribution data D_1 to D_N has been gathered. If not, the process returns to step S319, and steps S319 to S322 are repeated until the predetermined number has been gathered for all of D_1 to D_N. Once step S322 finds that it has, step S312 ends.
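The loop of steps S319 to S323 can be pictured with the following minimal sketch, which accepts a returned data piece only when its digest matches the value recorded in the metadata. SHA-256, the function names, and the representation of client responses as (index, bytes) pairs are assumptions made for this illustration; the description says only that the calculation value may be, for example, a hash value. The sketch gathers one verified copy per piece.

```python
import hashlib
from typing import Dict, Iterable, Tuple


def piece_digest(piece: bytes) -> str:
    """Calculation value of a data piece; SHA-256 is an assumed choice."""
    return hashlib.sha256(piece).hexdigest()


def collect_verified(
    responses: Iterable[Tuple[int, bytes]],
    expected: Dict[int, str],
) -> Dict[int, bytes]:
    """Accept pieces whose digest matches the metadata until every index is present.

    responses yields (piece index j, piece bytes) as clients access the device
    (S319); expected maps j to the digest recorded in the metadata M.
    """
    collected: Dict[int, bytes] = {}
    for j, piece in responses:
        if j not in expected or j in collected:
            continue  # unknown piece, or a verified copy is already held
        if piece_digest(piece) != expected[j]:
            continue  # S321 mismatch: discard this copy (S323)
        collected[j] = piece
        if len(collected) == len(expected):
            break  # S322: every piece has been gathered
    return collected


# Hypothetical data: a tampered copy of piece 2 arrives before the genuine one.
pieces = {1: b"alpha", 2: b"beta"}
metadata = {j: piece_digest(p) for j, p in pieces.items()}
stream = [(1, b"alpha"), (2, b"tampered"), (2, b"beta")]
assert collect_verified(stream, metadata) == pieces
```

A per-piece check of this kind rejects copies that do not match the stored calculation value, but it does not help if the stored values themselves have been altered, which is the case the whole-data comparison of O_R and O_O addresses.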

In step S313, the data combining unit 23 executes the data combining procedure S203, and the data restoration unit 24 executes the data restoration procedure S204 shown in FIG. 11. In the data restoration procedure S204, the data restoration unit 24 outputs restored data R obtained by restoring the combined data B to the input data I.

In step S314, the calculation value matching unit 25 executes the calculation value matching procedure S205 shown in FIG. 11. In this procedure, the calculation value matching unit 25 computes the calculation value O_R of the restored data R itself, using the same calculation algorithm that was used to compute the calculation value O_O of the input data I itself. It then compares O_R with the calculation value O_O contained in the metadata M obtained in the metadata acquisition procedure S201. If O_R and O_O match, the calculation value matching unit 25 outputs the restored data R as the input data I and ends collection of the distribution data D_1 to D_N.

If the calculation values O_R and O_O do not match in step S314, the data distributed storage device 1B executes the invalid data exclusion procedure S206 shown in FIG. 11. In this procedure, the data distributed storage device 1B collects the distribution data D_1 to D_N again, but does so in a manner that differs from normal operation. This method is referred to below as doubt mode.

In doubt mode the data collection request itself is the same as usual. Unlike normal operation, however, even when an accessing client 3_j holds the distribution data D_j and it has been collected, the collection requests to the clients 3_1 to 3_M are not stopped until two redundantly stored copies of the distribution data D_j have been collected for each data piece P_j.

In practice, it is envisaged that the collection requests are not stopped until the data is complete and the calculation values O_R and O_O are confirmed to match, that is, until it is confirmed that the input data I has been restored correctly. This approach helps improve collection efficiency in doubt mode, and when there is spare traffic capacity it shortens the collection latency.

Specifically, the invalid data exclusion procedure S206 in doubt mode executes steps S315, S316, S317, and S318 shown in FIG. 12. In step S315, the calculation value matching unit 25 causes the distribution data collection unit 22 to collect the distribution data D_j. The distribution data collection unit 22 collects the distribution data D_1 to D_N in the same way as in the distribution data collection procedure S202 shown in FIG. 11. Here the predetermined number of copies to gather for each item of distribution data D_1 to D_N is two or more, and is preferably an odd number.

In step S316, the distribution data collection unit 22 collects two copies of the distribution data D_j for the data piece P_j and compares the data pieces P_j they contain. If the two are identical, it outputs one of the copies of D_j to the data combining unit 23. If they are not identical, it collects a third copy, uses majority logic to identify the distribution data D_j to be used for restoration, and outputs that copy to the data combining unit 23. This is done for each item of distribution data D_1 to D_N.

In step S317, the data combining unit 23 combines the data pieces P_1 to P_N using the distribution data D_1 to D_N output by the distribution data collection unit 22. The data restoration unit 24 then restores the combined data B from the data combining unit 23, and the calculation value matching unit 25 compares the calculation value O_R of the restored data R from the data restoration unit 24 with the calculation value O_O of the input data I itself. If O_R and O_O match, the calculation value matching unit 25 outputs the restored data R as the input data I and ends collection of the distribution data D_1 to D_N.

If restoration still fails with the above method, it is preferable to increase the number of copies of the distribution data D_j collected by the distribution data collection unit 22 in steps of two, for example to five copies and then seven, and to try to identify the correct data by majority logic.
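The majority logic used in doubt mode can be sketched as follows. The function names and the strict-majority rule (more than half of the collected copies must agree) are assumptions made for this example; the description does not specify how ties are resolved.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_piece(copies: Sequence[bytes]) -> Optional[bytes]:
    """Return the copy held by a strict majority, or None if there is none."""
    if not copies:
        return None
    value, count = Counter(copies).most_common(1)[0]
    return value if count * 2 > len(copies) else None


def next_copy_count(current: int) -> int:
    """Grow the number of copies to collect in steps of two (3 -> 5 -> 7 ...)."""
    return current + 2


# Two matching copies are accepted as they are.
assert majority_piece([b"good", b"good"]) == b"good"
# Two differing copies give no decision, so a third copy is collected.
assert majority_piece([b"good", b"bad"]) is None
assert majority_piece([b"good", b"bad", b"good"]) == b"good"
assert next_copy_count(3) == 5
```

Keeping the number of copies odd avoids ties between two candidate values, which is consistent with the stated preference for an odd count.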

Once the calculation values O_R and O_O match and the input data I has been restored, the client 3_i that returned the wrong data can be identified. That client 3_i is excluded from the system from then on, and the distribution data it was responsible for is replicated again from the other redundant storage destinations. Collecting the same data from two or more storage destinations and comparing the copies at this point prevents maliciously altered data from becoming entrenched in the system.

In this way, when a client accesses the system maliciously, the redundantly stored distribution data D_j can be used to exclude that client efficiently.

(Embodiment 5)
The data distributed storage system according to this embodiment has two or more kinds of destination-determination logic, so that when the server distribution data transmitting/receiving unit 17 described in Embodiment 1 delivers data, it can either choose the destinations randomly or use the simplest approach of distributing the data in the order in which it was generated.

When destinations are chosen randomly, the destination clients 3_1 to 3_M are determined, for example, on the basis of the calculation values O_P1 to O_PN computed from the data pieces P_1 to P_N. Various calculations can be used to determine the destination; using a digest value such as a hash value, which is not affected by bias in the input data I, is one calculation method applicable to this scheme. In this case a calculation value is computed from part or all of each divided data piece P_1 to P_N and compared with the calculation value of the identifier ID_1 to ID_M of each accessing client 3_1 to 3_M. A client becomes a transmission target when the values match exactly or match within a certain range. With this mechanism, even when a large amount of distribution data D_1 to D_N to be collected arises temporarily, the list of data to collect in the collection instruction command sent to a client 3_i does not become enormous.
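A minimal sketch of range matching in a hash space is given below. The size of the space (10,000, chosen so that values have four digits as in the "6612" and "6614" example), the reduction of SHA-256 modulo that size, the wrap-around distance, and the client value 6611 are all assumptions made for illustration rather than details taken from the description.

```python
import hashlib
from typing import Dict, List

SPACE = 10_000  # assumed size of the hash space (four-digit values)


def calc_value(data: bytes) -> int:
    """Map bytes into the hash space (SHA-256 reduced modulo SPACE; an assumption)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % SPACE


def ring_distance(a: int, b: int) -> int:
    """Distance between two values when the hash space wraps around (assumed)."""
    d = abs(a - b) % SPACE
    return min(d, SPACE - d)


def matching_clients(piece_value: int, client_values: Dict[str, int],
                     tolerance: int) -> List[str]:
    """Clients whose identifier value lies within `tolerance` of the piece value."""
    return sorted(
        name for name, value in client_values.items()
        if ring_distance(piece_value, value) <= tolerance
    )


# Literal values are used here instead of calc_value() so the example is easy to follow.
clients = {"3_i": 6611, "3_i+1": 6614, "3_k": 1200}
print(matching_clients(6612, clients, tolerance=2))
assert 0 <= calc_value(b"any data piece") < SPACE
```

Widening or narrowing `tolerance` changes how many clients qualify for a given piece, which corresponds to adjusting the range of calculation values.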

Note that transmission efficiency may drop unless a reasonably large number of clients 3_i exist, so in a small-scale system the range of calculated values treated as matching can be widened or narrowed.

When operating by a method such as the simplest one of distributing in order of generation, one approach is, for example, for the server distribution data transmission/reception unit 17 to divide the terminals into groups by a reliability parameter determined from each client 3_i's access frequency (average interval), provided data capacity, average communication speed, and past collection probability, and to select a group according to the type and required reliability of the data pieces P_1 to P_N to be stored.

Apart from slight differences in the parameters, and although the reliability parameter requires time-series history information, a major distinguishing feature of this scheme is that the calculation is performed mainly on the client 3_i side in order to lighten the processing of the data distributed storage device 1A. Specifically, the DRT application can record, for the past 14 days, the access interval (not the number of accesses divided by 14 days, but the average time from the completion of processing to the start of the next access), the provided data amount (the amount of data actually stored, not the maximum amount permitted), the average communication bit rate during storage, and the average communication bit rate during data collection, and send these parameters together when it accesses the data distributed storage device 1A. The collection probability of the distribution data D_j can be calculated only on the data distributed storage device 1A side, so it is preferable that the server distribution data transmission/reception unit 17 calculate this value alone.
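As a rough sketch of the client-side bookkeeping described above, the following computes the average access interval as the mean gap between one session's completion and the next session's start, plus a mean bit rate. The `Session` record and both function names are hypothetical; the 14-day window is assumed to have been applied already when selecting the sessions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Session:
    start: float      # access start time, in seconds
    end: float        # processing completion time, in seconds
    bytes_moved: int  # payload transferred during the session


def average_interval(sessions: list[Session]) -> float:
    """Mean time from completion of one session to the start of the next."""
    ordered = sorted(sessions, key=lambda s: s.start)
    gaps = [nxt.start - cur.end for cur, nxt in zip(ordered, ordered[1:])]
    return mean(gaps) if gaps else 0.0


def average_bit_rate(sessions: list[Session]) -> float:
    """Mean of the per-session bit rates, in bits per second."""
    rates = [8 * s.bytes_moved / (s.end - s.start)
             for s in sessions if s.end > s.start]
    return mean(rates) if rates else 0.0
```

A client would send these values along with its actually stored data amount on each access, leaving only the collection probability to be computed on the storage device side.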

Using the amount of data actually stored, rather than the maximum amount permitted, as the provided data amount is effective as a mechanism for "gradually trusting" clients that have newly joined the system. A client that has just joined holds no distribution data, so its reliability parameter is low even if its access frequency and communication rate are high; reliability is thus not determined by the attributes of the storage terminal alone but reflects its track record.

The server distribution data transmission/reception unit 17 receives these data, stores them together with the data collection probability recorded on the data distributed storage device 1A side, and can use them as factors in deciding the next transmission.

Because the server distribution data transmission/reception unit 17 can calculate, from the average access interval and the communication bit rate, the average time required to collect data from each client, the time required for collection, which tends to be uncertain, can be narrowed to a certain range. Conversely, data for which some uncertainty is acceptable can be sent to clients with a longer required time, so that the system can be tuned according to data type and service quality.

These two methods can be operated in combination or individually. For example, a relatively small system may adopt only the latter grouping, while an in-house intranet environment in which client operation is guaranteed to some degree may adopt only the former, calculated-value-based transmission. Furthermore, grouping destinations by the calculated values of the data pieces can improve communication efficiency at later data collection.

(Embodiment 6)
In the data distributed storage system according to this embodiment, the server distribution data transmission/reception unit 17 described in Embodiment 1 also uses the various information on each of the clients 3_1 to 3_M to determine the storage redundancy.

Consider a group of clients with nearly identical parameters and, for example, 4-fold redundancy (the same data stored in four places). If every client has the same access interval Ti and the four clients operate asynchronously, then, taking the communication time of the collected data as 0 for simplicity, the expected average collection time is half the interval divided further by the number of clients, that is, Ti/8. With this as the baseline, when the average collection time exceeds a certain threshold, it is judged that effective redundant distribution is not being achieved and that the redundancy needs to be raised. In this system, a threshold of 1.8 times, for example, may be adopted. This is because, in actual operation with low redundancy, the average collection time can become comparable to the interval time (a little under two times); it is desirable to set the threshold near this worst value that can occur routinely.
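The redundancy check above can be written as a small sketch. The baseline Ti/(2k) for k-fold redundancy is taken from the text as is; reading the 1.8 threshold as a multiple of that baseline is an interpretation, and the function names are hypothetical.

```python
def baseline_collection_time(interval: float, redundancy: int) -> float:
    """Expected average collection time per the text: half the access
    interval divided by the number of clients holding the same data."""
    return interval / (2 * redundancy)


def needs_more_redundancy(measured: float, interval: float,
                          redundancy: int, factor: float = 1.8) -> bool:
    """True when the measured average collection time exceeds the threshold,
    assumed here to be `factor` times the baseline."""
    return measured > factor * baseline_collection_time(interval, redundancy)
```

For example, with an interval of 80 seconds and 4-fold redundancy the baseline is 10 seconds, so under this reading a measured average above 18 seconds would suggest raising the redundancy.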

When sufficient actual storage capacity can be secured, it is preferable to set an upper and a lower limit and adjust the redundancy up or down automatically.

On the client side, to avoid the unintended synchronous operation mentioned above, the first waiting time after startup can be a random waiting time whose maximum is the configured waiting time. This allows the multiple clients storing the same distribution data to respond to a collection instruction from the data distributed storage device in the shortest time. The reason is that when multiple clients operate in synchrony the average collection time becomes half the interval, whereas when each client accesses at random, the data distributed storage device 1B receives an access in the shortest time after data to be collected arises.
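A minimal sketch of the randomized first wait described above, assuming a uniform distribution (the text only says the wait is random with the configured wait as its maximum); `first_wait` and its parameters are hypothetical names.

```python
import random
from typing import Optional


def first_wait(configured_wait: float,
               rng: Optional[random.Random] = None) -> float:
    """Return the first wait after startup: a random value between 0 and
    the configured wait, so that clients started together drift apart."""
    rng = rng or random.Random()
    return rng.uniform(0.0, configured_wait)
```

Subsequent waits would use the configured value unchanged, so the random offset chosen at startup persists as each client's phase.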

In a system in which an unspecified large number of clients participate, the capacity attribute needs to be abstracted to some degree; this can be assigned as an attribute at the time of the grouping described in Embodiment 5.

(Embodiment 7)
When the distribution data D_1 to D_N are stored in a distributed manner in the data distributed storage system described in Embodiment 1, the time-series information on that processing (metadata M) is important data equivalent to a block-encryption password. Block encryption can also be applied before and after the integration processing and the division, and in that case too the metadata is expected to contain the password.

In conventional schemes, this metadata is stored and operated in a separate, dedicated management device. Some schemes store the metadata in a distributed manner to guard against its loss, and storing the metadata itself redundantly is also conceivable.

The metadata could also be held within the data distributed storage device 1A; since the metadata is then never communicated, this is desirable from a security standpoint. However, the metadata M may be lost if the data distributed storage device 1A fails, in which case the distribution data D_1 to D_N stored in a distributed manner could no longer be restored, so a mechanism to avoid this is needed.

In addition, when a different data distributed storage device 1B collects data from the clients 3_1 to 3_M, the metadata M must be shared. If the data distributed storage device 1B is on an external network, the metadata M would be communicated over a network such as the Internet, which may be an inadvisable practice from a security standpoint.

First, when storing the metadata M of the data distributed storage device 1A, it is preferable to assume a hardware failure of the device and always store the metadata M on physically separate hardware. In this scheme, it is preferable to install two or more data distributed storage devices 1A and 1B to avoid loss of the metadata M. Conventional schemes make the management device redundant by means such as secret sharing. By installing multiple data distributed storage devices 1A and 1B that hold one another's metadata M, redundancy is obtained, and because no dedicated management device is needed, the need to make the management device itself redundant is eliminated.

Specifically, when the data distributed storage devices 1A and 1B operate near each other, the metadata M of data processed by the data distributed storage device 1A is also stored in the data distributed storage device 1B. For example, it is preferable to register the metadata in an SQL database and to make the actual file eligible for deletion only after the metadata has been registered both in the other data distributed storage device and in the device itself.

When multiple data distributed storage devices 1A and 1B are installed in a large-scale system, the devices can also be arranged logically in a ring, with each data distributed storage device 1A registering its metadata in the next device 1B in the clockwise direction. This is the technique called Consistent Hashing, as described at http://www8.org/w8-papers/2a-webserver/caching/paper2.html. It is generally used for web caches, but in this scheme it can be put to use for distributed storage.

A system that requires neither scalability nor dynamic configuration changes can be implemented without this technique. The new refinements and means of realization needed when it is adopted are described below. In web cache applications the technique is used to avoid cache-outs when a cache is added. In storage, unlike use as a cache, a miss is not acceptable, and data loss corresponding to a cache-out can be a serious problem that goes beyond a temporary increase in traffic. New measures are therefore needed in the processing performed when a data distributed storage device is removed, rather than when one is added. A means of performing redundant storage must also be added.

A specific implementation example is described below.
First, a hash value is calculated as the calculated value O_I of the input data I to be stored, and the data distributed storage device 1B corresponding to that hash value takes charge of it. Assume, for example, that the hash value is 32 bits and that each data distributed storage device address is registered at 256 points in the 32-bit numeric space. The data distributed storage device 1B registered at the nearest point smaller than the hash value calculated from the data to be stored is responsible for processing the data. After processing the stored data, this data distributed storage device 1B stores the data redundantly in its right-hand neighbor on the ring.

When a data distributed storage device is added to the system, 256 points are likewise added to the 32-bit numeric space and the device begins operating. With this mechanism, work can be distributed evenly across all data distributed storage devices immediately after the addition.

When a data distributed storage device is removed from the system, it is preferable to delete that device's points from the 32-bit numeric space and to transfer the metadata database it holds to the device after the next device, as calculated on the new ring. At that point, the next data distributed storage device already holds a copy of the removed device's DB.

With this mechanism, system operation can continue after a data distributed storage device is removed, without reconfiguring the other data distributed storage devices.
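The ring described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: SHA-256 truncated to 32 bits stands in for the unspecified hash, point labels of the form `device#i` are invented, and `Ring`, `h32`, and `owner_and_replica` are hypothetical names. The transfer of the metadata database on removal is not modeled.

```python
import bisect
import hashlib

POINTS_PER_DEVICE = 256  # points registered per device, as in the text


def h32(data: bytes) -> int:
    """32-bit hash value (first 4 bytes of SHA-256; an assumed choice)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


class Ring:
    def __init__(self) -> None:
        self._points: list[tuple[int, str]] = []  # sorted (point, device)

    def add(self, device: str) -> None:
        """Register 256 points for a new device."""
        for i in range(POINTS_PER_DEVICE):
            bisect.insort(self._points, (h32(f"{device}#{i}".encode()), device))

    def remove(self, device: str) -> None:
        """Delete all points of a device; other points are left untouched."""
        self._points = [p for p in self._points if p[1] != device]

    def owner_and_replica(self, key: bytes):
        """Owner: device at the nearest point smaller than the key's hash.
        Replica: next different device to the right of that point.
        Raises IndexError on an empty ring."""
        values = [p for p, _ in self._points]
        i = bisect.bisect_left(values, h32(key)) - 1  # -1 wraps to the last point
        owner = self._points[i][1]
        n = len(self._points)
        for step in range(1, n + 1):
            device = self._points[(i + step) % n][1]
            if device != owner:
                return owner, device
        return owner, None  # only one device on the ring
```

Because removing a device deletes only its own points, keys owned by the remaining devices keep the same owner, which is what allows operation to continue without reconfiguring the other devices.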

Next, the mechanism by which an external data distributed storage device 1B retrieves the data of a client 3_i is described. A data distributed storage device 1B on an external network is normally assumed to be unable to use the mutual, sequential redundant storage of metadata described above, because it is difficult to physically protect communication between data distributed storage devices on an external network. It is therefore preferable to share the metadata under sufficient encryption, so that the metadata M itself is never communicated over the network.

Because the metadata M is equivalent to the shared key of a kind of symmetric-key encryption, mechanisms such as CHAP or a key exchange based on the DH method (public-key cryptography) could be used to exchange and share it. However, the metadata differs for each unit of encryption, such as a file or folder, and cannot be inferred, so a challenge computation cannot be performed (a digest value cannot be sent to the data distributed storage device); techniques such as CHAP therefore cannot be used. With the DH method, the generated shared key is random and so cannot itself serve as the metadata. It is therefore preferable, for example, to use the DH method to generate a shared key for each file to be acquired and to encrypt the exchanged information with that shared key.

The data distributed storage device 1A holds the metadata M of the target file, and the data distributed storage device 1B does not. From the calculated value O_I of the required input data I, the data distributed storage device 1B learns that the data distributed storage device 1A holds the metadata M for that input data I. This corresponds to the Consistent Hashing technique described above.

The data distributed storage device 1B uses a random number as its secret key to generate a shared key with the data distributed storage device 1A. This applies the shared-key generation technique of the DH method. The data distributed storage device 1B sends the information on the file to be acquired, encrypted with the shared key, to the data distributed storage device 1A. The encryption here need not be lightweight, so there are various options: for example, AES encryption of the file-identifying character string using the shared key, or XORing the shared key into the character string, performing the integration processing six times, and then applying AES encryption. Combining this with a host authentication mechanism such as RADIUS further improves confidentiality.
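The per-file shared-key generation can be illustrated with a toy Diffie-Hellman exchange. The modulus and base below are chosen only to keep the example short and are not a secure DH group; `keypair` and `shared_key` are hypothetical names, and hashing the shared secret with SHA-256 to obtain key bytes is an assumed step that the text does not specify.

```python
import hashlib
import secrets

P = 2**127 - 1  # a Mersenne prime; toy-sized, NOT suitable for real use
G = 3           # assumed base for illustration


def keypair() -> tuple[int, int]:
    """Return (secret, public), where the secret is a fresh random number."""
    secret = secrets.randbelow(P - 3) + 2  # random value in [2, P - 2]
    return secret, pow(G, secret, P)


def shared_key(own_secret: int, peer_public: int) -> bytes:
    """Derive 32 key bytes from the DH shared secret."""
    shared = pow(peer_public, own_secret, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()
```

Devices 1A and 1B would each call `keypair()`, exchange only the public values, and arrive at the same 32 bytes, which could then key the AES step mentioned in the text. That step is omitted here because the Python standard library has no AES implementation.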

The data distributed storage device 1A encrypts the metadata M of the file requested by the data distributed storage device 1B in the same way and transmits it. The data distributed storage device 1B collects the distribution data D_1 to D_N distributed over the network in accordance with the acquired metadata.

This method is feasible when the data distributed storage device 1B delivers actively, that is, when it can identify and access the clients 3_1 to 3_M; it has been difficult to realize when access is assumed to originate from the clients 3_1 to 3_M.

An extension for the case where access originates from the clients is described below. First, the data distributed storage device 1B, which does not hold the metadata M, declares the data it wishes to collect to the data distributed storage device 1A, which holds the metadata M. The procedure is the same as described above up to the point where the metadata M is received from the data distributed storage device 1A.

After that, when the data distributed storage device 1A accepts an access from any of the clients 3_1 to 3_M thought to hold the declared data, it instructs that client to access the data distributed storage device 1B. The clients 3_1 to 3_M access the data distributed storage device 1B and check whether there is distribution data D_1 to D_N to be collected.

The data distributed storage device 1B communicates with the clients 3_1 to 3_M about the data to be collected by the same procedure as usual. After collecting the data, the data distributed storage device 1B notifies the data distributed storage device 1A that collection is complete. The data distributed storage device 1A stops issuing the instruction to re-access the data distributed storage device 1B when it is notified of completion by the data distributed storage device 1B or when a specified time has elapsed.

With this method, a different data distributed storage device can collect data even in a client-originated (pull-type) storage system. Thus, when multiple data distributed storage devices exist, the mechanism of storing one another's management information makes a dedicated management device unnecessary and reduces operating and equipment costs. The refinements to the mutual storage method also secure scalability, so that increases and decreases in scale can be handled easily. In addition, a mechanism by which multiple data distributed storage devices at remote locations collect the same distributed data is implemented by applying the DH method, making it possible to do so securely.

(Embodiment 8)
The data distributed storage system according to Embodiments 1 to 7 may further include a search server (not shown) that searches for the data distributed storage devices 1A and 1B on the communication network.

On receiving a search request from a client 3_i, the search server (not shown) selects one of the registered data distributed storage devices at random and notifies the client 3_i of it. Once the data distributed storage device to communicate with has been determined, the client 3_i operates in the same way as when it stores data with the data distributed storage device 1A specified.

When the client 3_i specifies the data distributed storage device 1A to the search server (not shown), the client 3_i stores the distribution data D_j of the specified data distributed storage device 1A.

When the client 3_i does not specify the data distributed storage device 1A to the search server (not shown), the client 3_i stores distribution data D_j from any data distributed storage device on a communication network such as the Internet. In this case, it accesses the search server (not shown), which searches for data distributed storage devices on the communication network, finds a data distributed storage device (not shown) that has distribution data, and accesses it.

The present invention can be applied to the information and communications industry.

1A, 1B: Data distributed storage device
3_1, 3_2, 3_3, 3_i, 3_i+1, 3_M: Client
11: Data changing unit
12: Data dividing unit
13: Calculated value calculation unit
14: Metadata storage unit
15: Distribution data configuration unit
16: Distribution data storage unit
17: Server distribution data transmission/reception unit
18: Parity calculation unit
19: Metadata collection unit
21: Metadata acquisition unit
22: Distribution data collection unit
23: Data combining unit
24: Data restoration unit
25: Calculated value collation unit
31: Distribution data acquisition unit
32: Distribution data storage unit
33: Client distribution data transmission/reception unit

Claims

A data distributed storage device comprising:
a data changing unit that changes the data arrangement of input data based on a predetermined rule;
a data dividing unit that divides the changed data from the data changing unit into a plurality of data pieces;
a calculated value calculation unit that calculates, using a predetermined calculation algorithm, a calculated value unique to the input data and calculated values of the data pieces;
a metadata storage unit that stores metadata in which the calculated value unique to the input data and the calculated values of the data pieces, calculated by the calculated value calculation unit, are associated with the change history of the data changing unit;
a distribution data configuration unit that configures distribution data by attaching the calculated value unique to the input data and the calculated value of the data piece, calculated by the calculated value calculation unit, to each data piece from the data dividing unit; and
a server distribution data transmission/reception unit that transmits the distribution data configured by the distribution data configuration unit to any client among the clients having an identifier that conforms to the calculated value of the data piece attached to the distribution data.

The data distributed storage device according to claim 1, further comprising a distribution data storage unit that stores the distribution data configured by the distribution data configuration unit in a state in which it can be acquired by a client having an identifier that conforms to the calculated value of the data piece but cannot be acquired by a client having an identifier that does not conform to the calculated value of the data piece.

The data distributed storage device according to claim 1 or 2, further comprising a parity calculation unit that generates parity data for the data pieces using an arbitrary number of data pieces from the data dividing unit, wherein:
the arbitrary number is variable in the parity calculation unit;
the calculated value calculation unit further calculates a calculated value of the parity data using the calculation algorithm;
the distribution data configuration unit further configures distribution data by attaching the calculated value unique to the input data and the calculated value of the parity data, calculated by the calculated value calculation unit, to the parity data generated by the parity calculation unit; and
the server distribution data transmission/reception unit transmits the distribution data configured by the distribution data configuration unit to a client having an identifier that conforms to the calculated value of the parity data.

The data distributed storage device according to any one of claims 1 to 3, wherein the server distribution data transmission/reception unit transmits the distribution data configured by the distribution data configuration unit to at least one of: a client having an identifier equal to the calculated value of the data piece; and a client having an identifier corresponding to a calculated value located near the calculated value of the data piece in the numeric space derived using the calculation algorithm.

The data distributed storage device according to any one of claims 1 to 4, comprising:
a metadata acquisition unit that acquires the metadata stored in the metadata storage unit;
a distribution data collection unit that collects, from clients conforming to the calculated values of the data pieces contained in the metadata acquired by the metadata acquisition unit, the distribution data to which the calculated value unique to the input data contained in the metadata is attached;
a data combining unit that arranges the distribution data collected by the distribution data collection unit in accordance with the calculated values of the data pieces contained in the metadata acquired by the metadata acquisition unit, and combines the data pieces; and
a data restoration unit that restores the combined data from the data combining unit to the input data on the basis of the history of the data changing unit contained in the metadata acquired by the metadata acquisition unit.

The data distributed storage device according to claim 5, wherein
the calculation value calculation unit further calculates a calculation value of the input data itself using a predetermined calculation algorithm,
the metadata storage unit stores the metadata further including the calculation value of the input data itself calculated by the calculation value calculation unit, and
the device further comprises a calculation value collation unit that calculates a calculation value of the restored data from the data restoration unit using the calculation algorithm used to calculate the calculation value of the input data itself, and collates the calculated value of the restored data against the calculation value of the input data itself included in the metadata acquired by the metadata acquisition unit.

A data distributed storage method comprising, in order:
a data change procedure of changing the data arrangement of input data based on a predetermined rule;
a data division procedure of dividing the changed data, obtained by changing the input data, into a plurality of data pieces;
a calculation value calculation procedure of calculating a calculation value unique to the input data and calculation values of the data pieces, using a predetermined calculation algorithm;
a metadata storage procedure of storing metadata in which the calculation value unique to the input data, the calculation values of the data pieces, and the change history of the data change procedure are associated with one another;
a distribution data storage procedure of configuring distribution data by attaching the calculation value unique to the input data and the calculation value of the data piece to each data piece, and storing the distribution data; and
a server distribution data transmission procedure of transmitting the distribution data to any client among the clients having an identifier that matches the calculation value of the data piece attached to the distribution data.
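The six procedures of the storage method can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: SHA-1 stands in for the unspecified calculation algorithm, a left byte rotation stands in for the unspecified change rule, the value unique to the input data is taken as the hash of a hypothetical `label`, and a client "matches" a piece when its one-digit hexadecimal identifier equals the first digit of the piece's calculation value. The names `calc_value` and `store` are hypothetical.

```python
import hashlib


def calc_value(data: bytes) -> str:
    """Predetermined calculation algorithm (SHA-1 is an assumption)."""
    return hashlib.sha1(data).hexdigest()


def store(input_data: bytes, label: str, piece_size: int, shift: int,
          clients: dict[str, list[dict]]) -> dict:
    # 1. Data change: rotate the bytes left by `shift` (hypothetical rule).
    k = shift % len(input_data) if input_data else 0
    changed = input_data[k:] + input_data[:k]
    history = {"rule": "rotate_left", "shift": k}

    # 2. Data division: cut the changed data into fixed-size pieces.
    pieces = [changed[i:i + piece_size]
              for i in range(0, len(changed), piece_size)]

    # 3. Calculation values: one unique to the input data, one per piece.
    data_value = calc_value(label.encode("utf-8"))  # assumed source of uniqueness
    piece_values = [calc_value(p) for p in pieces]

    # 4. Metadata: associate both kinds of value with the change history.
    metadata = {"data_value": data_value,
                "piece_values": piece_values,
                "history": history}

    # 5. Distribution data: each piece tagged with both calculation values.
    distribution = [{"data_value": data_value, "piece_value": v, "piece": p}
                    for v, p in zip(piece_values, pieces)]

    # 6. Transmission: hand each item to the client whose identifier
    #    matches the piece value (assumed rule: same first hex digit).
    for item in distribution:
        clients[item["piece_value"][0]].append(item)
    return metadata
```

Here `clients` is a plain dictionary from a one-digit identifier to that client's stored items, for example `{d: [] for d in "0123456789abcdef"}`; a real system would replace the `append` with a network send.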

The data distributed storage method according to claim 7, wherein, in the distribution data storage procedure, the distribution data configured by the distribution data configuration unit is stored in a state in which it can be acquired by a client having an identifier that matches the calculation value of the data piece but cannot be acquired by a client having an identifier that does not match the calculation value of the data piece.

The data distributed storage method according to claim 7 or 8, wherein
in the data division procedure, parity data of the data pieces is generated,
in the calculation value calculation procedure, a calculation value of the parity data is further calculated using the calculation algorithm,
in the distribution data storage procedure, the distribution data is further configured by attaching the calculation value unique to the input data and the calculation value of the parity data to the parity data, and
in the server distribution data transmission procedure, the distribution data is further transmitted to a client having an identifier that matches the calculation value of the parity data.
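The claim does not say how the parity data is generated. As one possible reading, the sketch below uses a byte-wise XOR over the pieces (an assumption, in the style of RAID parity), zero-padding shorter pieces, and then tags the parity with calculation values like any other piece. SHA-1 and the names `xor_parity` and `parity_item` are likewise assumptions.

```python
import hashlib
from functools import reduce


def xor_parity(pieces: list[bytes]) -> bytes:
    """Byte-wise XOR of all pieces, zero-padded to the longest piece."""
    width = max(len(p) for p in pieces)
    padded = [p.ljust(width, b"\x00") for p in pieces]
    # zip(*padded) yields one tuple of byte values per column.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*padded))


def parity_item(pieces: list[bytes], data_value: str) -> dict:
    """Distribution data for the parity: parity bytes plus both values."""
    parity = xor_parity(pieces)
    return {"data_value": data_value,
            "piece_value": hashlib.sha1(parity).hexdigest(),
            "piece": parity}
```

With XOR parity, a single lost piece of full width can be rebuilt by XOR-ing the parity with the surviving pieces, for example `xor_parity([parity, pieces[0], pieces[2]])` in place of a missing `pieces[1]`.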

The data distributed storage method according to any one of claims 7 to 9, wherein, in the server distribution data transmission procedure, the distribution data is transmitted to at least one of: a client having an identifier that matches the calculation value of the data piece; and a client having an identifier corresponding to a calculation value located near the calculation value of the data piece in the numerical space derived using the calculation algorithm.
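Selecting a client "near" a piece's calculation value might look like the sketch below. It assumes a 160-bit numerical space (as a SHA-1 based algorithm would produce), hexadecimal client identifiers, and circular distance as the measure of nearness; none of these choices is fixed by the claim. An exactly matching identifier has distance zero and is therefore always chosen first. The names `ring_distance` and `choose_clients` are hypothetical.

```python
SPACE = 1 << 160  # size of the assumed 160-bit numerical space


def ring_distance(a: int, b: int) -> int:
    """Distance between two values when the space wraps around."""
    d = abs(a - b) % SPACE
    return min(d, SPACE - d)


def choose_clients(piece_value: str, client_ids: list[str],
                   count: int = 1) -> list[str]:
    """Return the `count` client identifiers closest to the piece value."""
    target = int(piece_value, 16)
    ranked = sorted(client_ids,
                    key=lambda cid: ring_distance(int(cid, 16), target))
    return ranked[:count]
```

Raising `count` sends the same distribution data to several neighbouring clients, which is one way to read "at least one of" the matching and nearby clients.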

The data distributed storage method according to any one of claims 7 to 10, further comprising, in order after the server distribution data transmission procedure:
a metadata acquisition procedure of acquiring the metadata stored in the metadata storage procedure;
a distribution data collection procedure of collecting, from clients matching the calculation values of the data pieces included in the metadata acquired in the metadata acquisition procedure, the distribution data to which the calculation value unique to the input data included in the metadata is attached;
a data combining procedure of arranging the distribution data collected in the distribution data collection procedure according to the calculation values of the data pieces included in the metadata acquired in the metadata acquisition procedure, and combining the data pieces; and
a data restoration procedure of restoring the combined data produced in the data combining procedure to the input data, based on the change history included in the metadata acquired in the metadata acquisition procedure.
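The collection, combining and restoration procedures can be sketched as a single function. The data layout is an assumption made for illustration: metadata is a dictionary with `data_value`, `piece_values` (in original order) and a `history` recording a left rotation by `shift` bytes; each client is a list of stored items keyed by a one-digit hexadecimal identifier equal to the first digit of the piece value. The name `restore` is hypothetical.

```python
def restore(metadata: dict, clients: dict[str, list[dict]]) -> bytes:
    pieces = []
    # Collection and arrangement: walk the piece values in the order
    # recorded in the metadata, so the pieces come back already sorted.
    for piece_value in metadata["piece_values"]:
        holder = clients[piece_value[0]]  # client matching the piece value
        # Accept only items tagged with this input data's unique value.
        item = next(i for i in holder
                    if i["data_value"] == metadata["data_value"]
                    and i["piece_value"] == piece_value)
        pieces.append(item["piece"])

    # Combining: concatenate the pieces into the changed data.
    combined = b"".join(pieces)

    # Restoration: undo the recorded left rotation.
    k = metadata["history"]["shift"]
    n = len(combined)
    return combined[n - k:] + combined[:n - k]
```

Filtering on `data_value` is what lets one client hold pieces of many different inputs without mixing them up during collection.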

The data distributed storage method according to claim 11, wherein
in the calculation value calculation procedure, a calculation value of the input data itself is further calculated using a predetermined calculation algorithm,
in the metadata storage procedure, the metadata further including the calculation value of the input data itself calculated in the calculation value calculation procedure is stored, and
the method further comprises, after the data restoration procedure, a calculation value collation procedure of calculating a calculation value of the restored data itself, restored in the data restoration procedure, using the calculation algorithm used to calculate the calculation value of the input data itself, and collating the calculated value of the restored data itself against the calculation value of the input data itself included in the metadata acquired in the metadata acquisition procedure.
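The collation procedure amounts to recomputing the calculation value over the restored data and comparing it with the one stored in the metadata. A minimal sketch, assuming SHA-1 as the calculation algorithm and a hypothetical metadata key `input_value`:

```python
import hashlib


def calc_value(data: bytes) -> str:
    """Same algorithm for input and restored data (SHA-1 is an assumption)."""
    return hashlib.sha1(data).hexdigest()


def collate(restored: bytes, metadata: dict) -> bool:
    """True when the restored data hashes to the stored input value."""
    return calc_value(restored) == metadata["input_value"]
```

A `False` result indicates that a piece was lost, altered or combined in the wrong order, at which point a caller could retry collection from other clients.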

A program for causing a computer to execute the data distributed storage method according to any one of claims 7 to 12.

A computer-readable recording medium on which is recorded a program for causing a computer to execute the data distributed storage method according to any one of claims 7 to 12.