JP5816299B2 - Secret search method and secret search device

Info

Publication number: JP5816299B2
Application number: JP2013546927A
Authority: JP
Inventors: 康広藤井; 進芹田
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2011-12-01
Filing date: 2011-12-01
Publication date: 2015-11-18
Anticipated expiration: 2031-12-01
Also published as: JPWO2013080365A1; US9311494B2; US20140331044A1; WO2013080365A1

Description

The present invention relates to a secret search system that uses "searchable encryption" to search encrypted data deposited on a server without decrypting it, in a client-server model such as cloud computing.

With the spread of cloud computing, depositing information on database servers has become common. At the same time, leakage of confidential information such as personal information has become a major social problem.

To deposit information on a database server safely while preventing leakage, searchable encryption techniques have been proposed that allow deposited data to be searched while it remains encrypted. Searchable encryption protects against information leakage not only to third parties on the communication path but also to the database server administrator.

Various searchable encryption methods have been proposed. A search using searchable encryption is generally performed by the following procedure.

(1) The client depositing data calculates an index representing the contents of the data to be deposited (hereinafter, deposited data) and conceals it. Here, concealment refers to processing specific to searchable encryption that makes it difficult to derive the contents of the deposited data from the index. Hereinafter, an index concealed in this way is called a concealed index.

(2) The client encrypts the data to be deposited (hereinafter, encrypted data) and transmits it to the database server together with the concealed index.

(3) The database server registers the pair of encrypted data and concealed index in the database.

(4) The search client that searches the data calculates a trapdoor for the keyword to be searched (the search query). Here, a trapdoor is search information, specifically the search keyword contained in the search query in concealed form.

(5) The search client transmits the trapdoor to the database server.

(6) The database server collates the concealed indexes registered in the database with the trapdoor, using a procedure specific to searchable encryption, and thereby finds the data that hits the search query.

(7) The database server transmits the encrypted data and related information corresponding to each concealed index that was hit to the search client.

(8) The search client identifies, from the received search result, the client that deposited the data, and shares a decryption key with that client.

(9) The search client decrypts the encrypted data received from the database server using the shared key.

Because the deposited data is encrypted, it is practically impossible for the database server administrator to decipher it. Because the index is concealed, it is difficult to extract the contents of the deposited data from the index. Because the search query is converted into a trapdoor, the search query itself is unlikely to leak. Furthermore, because it is difficult to determine whether two different concealed indexes contain the same keyword, attacks such as frequency analysis, which infers plaintext from the frequency of word occurrences, can be prevented. Searchable encryption therefore effectively prevents information leakage not only to third parties on the communication path but also to parties such as the database server administrator.
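Steps (1) to (6) and the frequency-analysis property described above can be illustrated with a minimal sketch. It is not the scheme of any cited document: it assumes a simple construction in which the trapdoor is an HMAC of the keyword and each concealed index holds a fresh random nonce plus one tag per keyword. All function names are hypothetical, and the sketch does not hide repeated queries, because the trapdoor for a given keyword is always the same.

```python
import hashlib
import hmac
import os


def trapdoor(key: bytes, keyword: str) -> bytes:
    """Conceal a search keyword with a keyed hash (hypothetical helper)."""
    return hmac.new(key, keyword.encode("utf-8"), hashlib.sha256).digest()


def concealed_index(key: bytes, keywords: list[str]) -> tuple[bytes, set[bytes]]:
    """Build a randomized index: a fresh nonce plus one tag per keyword."""
    nonce = os.urandom(16)
    tags = {
        hmac.new(trapdoor(key, kw), nonce, hashlib.sha256).digest()
        for kw in keywords
    }
    return nonce, tags


def matches(index: tuple[bytes, set[bytes]], td: bytes) -> bool:
    """Server-side collation: needs only the index and the trapdoor, not the key."""
    nonce, tags = index
    return hmac.new(td, nonce, hashlib.sha256).digest() in tags


key = os.urandom(32)                      # secret held by the clients only
idx_a = concealed_index(key, ["cloud", "server"])
idx_b = concealed_index(key, ["cloud", "index"])
td = trapdoor(key, "cloud")

# Both indexes hit the trapdoor for "cloud" ...
print(matches(idx_a, td), matches(idx_b, td))
# ... yet their tag sets share no element, so comparing the indexes
# with each other does not reveal the common keyword.
print(idx_a[1].isdisjoint(idx_b[1]))
```

Because each index uses its own nonce, the server cannot build a keyword-to-document table in advance; it has to test every index against each trapdoor, which is the performance problem the invention addresses.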

Non-Patent Document 1 and Non-Patent Document 2, for example, are known searchable encryption techniques. These schemes adopt probabilistic encryption, in which plaintext and ciphertext have a complex one-to-m correspondence, rather than deterministic encryption, in which plaintext and ciphertext have a simple one-to-one correspondence. Probabilistic encryption is the more secure of the two, so these schemes are relatively secure against attacks such as frequency analysis.

Non-Patent Document 3, Non-Patent Document 4, and Patent Document 1 are also known. The schemes described in Non-Patent Document 3 and Non-Patent Document 4 gain resistance to attacks such as frequency analysis by using a Bloom filter, which is a probabilistic data structure. The method described in Patent Document 1 gains resistance to attacks such as frequency analysis by using "Fuzzy Vault", which realizes fuzzy matching between sets by means of an error-correcting code.
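For readers unfamiliar with the Bloom filter mentioned above, the following is a minimal plain (unkeyed) sketch of the data structure itself. It is not the construction of Non-Patent Document 3 or 4, which combine the filter with keyed functions; the class name, filter size, and number of hash functions are assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 256, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array stored in a single Python integer

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with a counter.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


bf = BloomFilter()
for word in ["cloud", "server", "backup"]:
    bf.add(word)

print(bf.might_contain("cloud"))  # an added word is always reported as present
```

A word that was never added is usually reported as absent, but may occasionally be reported as present; this built-in uncertainty is what the cited schemes exploit.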

The techniques described in Non-Patent Documents 1 to 4 and Patent Document 1 all use probabilistic encryption, probabilistic data structures, fuzzy matching, or similar means to remain secure against frequency analysis. As a concrete example, when several pieces of data containing the keyword "cloud" are deposited on the database server, the concealed index differs for each piece of deposited data. Moreover, even if the concealed indexes are compared with one another, it is difficult to determine that they contain the same keyword "cloud". Likewise, when a search is performed for "cloud", it is difficult to infer the search query "cloud" from the trapdoor. Consequently, even if the database server administrator learns that a search query produced a hit, the administrator effectively cannot learn whether a given concealed index contains "cloud".

Patent Document 1: JP 2009-271584 A

Non-Patent Document 1: Dawn Xiaodong Song, David Wagner and Adrian Perrig: "Practical Techniques for Searches on Encrypted Data," In Proceedings of the 2000 IEEE Symposium on Security and Privacy, pp. 44-55 (2000).
Non-Patent Document 2: Zhiqiang Yang, Sheng Zhong, Rebecca N. Wright: "Privacy-Preserving Queries on Encrypted Data," In Proceedings of the 11th European Symposium on Research in Computer Security (ESORICS), Vol. 4189 of Lecture Notes in Computer Science, pp. 476-495 (2006).
Non-Patent Document 3: Eu-Jin Goh: "Secure Indexes," Cryptology ePrint Archive, Report 2003/216 (2003).
Non-Patent Document 4: Takanori Suga, Takashi Nishide, Yoshiaki Hori, Kouichi Sakurai: "Design and Implementation Evaluation of Searchable Encryption with High Search Flexibility Using a Bloom Filter" (in Japanese), IEICE Technical Report, Vol. 111, No. 30, pp. 111-116 (2011).
Non-Patent Document 5: A. D. Bimbo: "Visual Information Retrieval", Morgan Kaufmann Publishers (1999).
Non-Patent Document 6: Susumu Serita, Yasuhiro Fujii, Satoshi Kai, Takao Murakami, Yoshinori Honda: "A Study of a Similarity Hash Calculation Method Resistant to File Expansion and Contraction" (in Japanese), IEICE Technical Report, Vol. 110, No. 282, pp. 31-36 (2010).
Non-Patent Document 7: C. M. Bishop: "Pattern Recognition and Machine Learning" (Japanese edition), Springer Japan (2007).
Non-Patent Document 8: F. Murtagh: "A Survey of Recent Advances in Hierarchical Clustering Algorithms", The Computer Journal, Vol. 26, pp. 354-359 (1983).

Generally, in character string search, search response time is shortened by maintaining an index that maps each word to the documents containing it (an inverted index). Without such an index, the search query must be collated against all deposited data by brute force on every search, which greatly delays the search response.
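As a reminder of what an inverted index provides for ordinary plaintext search, the following minimal sketch builds one over a few sample documents. The document contents and the function name are illustrative assumptions.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each word to the set of document IDs that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return dict(index)


docs = {
    1: "cloud server backup",
    2: "encrypted cloud storage",
    3: "local disk",
}
index = build_inverted_index(docs)

# A lookup touches only the entry for the queried word instead of
# scanning every document.
hits = sorted(index.get("cloud", set()))
print(hits)  # [1, 2]
```

Building this table requires knowing which words each document contains, which is precisely the information that a concealed index is designed to hide from the server.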

In the techniques described in Non-Patent Documents 1 to 4 and Patent Document 1, resistance to attacks such as frequency analysis is obtained by making it difficult to determine which words a concealed index contains. As a result, it is practically impossible to construct an index such as an inverted index. With these conventional techniques, the search query must therefore be collated against every concealed index by brute force on each search, and the search response is greatly delayed. An object of the present invention is to speed up search in a secret search system that uses searchable encryption.

To achieve this object, the present invention provides means for registering in the database server not only the encrypted data and the concealed index but also a feature quantity of the deposited data. Here, a feature quantity is a greatly shortened representation of the deposited data that preserves its characteristics as far as possible. It is defined as a quantity from which the similarity between pieces of deposited data can be calculated using the feature quantities alone, but from which the original data is difficult to infer. Known feature quantities include, for example, a feature vector calculated from the words in the deposited data, and a quantity called a fuzzy hash, obtained by dividing the deposited data into pieces, hashing each piece, and concatenating the hash values.
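The idea of a short feature quantity that still supports similarity calculation can be sketched as follows. This is an illustrative assumption, not the feature calculation of the embodiment: each word is hashed into one of a small number of buckets to form a fixed-length count vector, and similarity is measured by cosine similarity. The dimension `DIM` and both function names are hypothetical, and the sketch makes no claim about how hard it is to recover the original text.

```python
import hashlib
import math

DIM = 16  # hypothetical vector length, far shorter than a typical document


def feature_vector(text: str, dim: int = DIM) -> list[float]:
    """Count words per hash bucket; word order and exact words are lost."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    return vec


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity in [0, 1] for non-negative vectors; 0.0 if either is empty."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


fa = feature_vector("cloud server backup schedule")
fb = feature_vector("cloud server restore schedule")
print(cosine_similarity(fa, fb))
```

The two sample texts share three of four words, so their vectors overlap in the corresponding buckets and the similarity is well above zero; the server can compute this value without seeing either text.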

Next, the present invention provides means by which the database server uses the feature quantities received together with the encrypted data and the concealed indexes to calculate the similarity between the corresponding pieces of deposited data, and clusters the concealed indexes and associated data in advance so that similar deposited data falls into the same cluster.
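Server-side clustering on feature quantities might look like the following minimal sketch. It substitutes a simple threshold-based "leader" clustering for whatever algorithm the embodiment uses, purely to keep the example short; the threshold value, the sample vectors, and the function names are assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def assign_clusters(features: dict[int, list[float]],
                    threshold: float = 0.8) -> dict[int, list[int]]:
    """Leader clustering: a record joins the first cluster whose leader is
    similar enough; otherwise it starts a new cluster as its leader."""
    clusters: dict[int, list[int]] = {}  # leader record ID -> member record IDs
    for rec_id, vec in features.items():
        for leader in clusters:
            if cosine(features[leader], vec) >= threshold:
                clusters[leader].append(rec_id)
                break
        else:
            clusters[rec_id] = [rec_id]
    return clusters


# Feature quantities of four registered records (illustrative values).
features = {
    1: [1.0, 0.0, 0.0],
    2: [0.9, 0.1, 0.0],
    3: [0.0, 1.0, 0.0],
    4: [0.0, 0.9, 0.2],
}
clusters = assign_clusters(features)
print(clusters)  # {1: [1, 2], 3: [3, 4]}
```

Only the feature quantities are consulted; the encrypted data and the concealed indexes are grouped by record ID without being inspected.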

Furthermore, in the secret search process, the present invention provides means for first selecting a representative concealed index (hereinafter, a pivot) in each cluster and collating the pivot with the trapdoor (the concealed form of the search keyword contained in the search query). If a pivot hits the trapdoor, the collation priority of all registered data in the cluster to which the pivot belongs is raised; if a pivot does not hit the trapdoor, the collation priority of all registered data in that cluster is lowered. Once the priorities have been determined, the registered data is collated in priority order and collation is cut off after a fixed number of comparisons. This speeds up the secret search process while limiting any loss of security or search accuracy.
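The pivot-based prioritization and cutoff described above can be sketched as follows. Records are represented by plain string IDs and the collation of a concealed index with a trapdoor is replaced by a stand-in predicate `hit`; the function name, the choice to count pivot checks toward the cutoff, and the sample data are assumptions made for illustration.

```python
from typing import Callable


def prioritized_search(clusters: dict[str, list[str]],
                       pivots: dict[str, str],
                       hit: Callable[[str], bool],
                       max_checks: int) -> list[str]:
    """Collate pivots first, then the remaining records, visiting clusters
    whose pivot hit before the others, and stop after max_checks collations."""
    results: list[str] = []
    checks = 0
    pivot_hit: dict[str, bool] = {}
    for cid, pivot in pivots.items():
        pivot_hit[cid] = hit(pivot)
        checks += 1
        if pivot_hit[cid]:
            results.append(pivot)

    # False sorts before True, so clusters whose pivot hit come first.
    order = sorted(clusters, key=lambda cid: not pivot_hit[cid])
    for cid in order:
        for rec in clusters[cid]:
            if rec == pivots[cid]:
                continue  # already collated as the pivot
            if checks >= max_checks:
                return results  # cutoff reached
            checks += 1
            if hit(rec):
                results.append(rec)
    return results


clusters = {"c1": ["a1", "a2", "a3"], "c2": ["b1", "b2", "b3"]}
pivots = {"c1": "a1", "c2": "b1"}
matching = {"b1", "b2", "b3", "a3"}  # records that would hit the trapdoor

print(prioritized_search(clusters, pivots, matching.__contains__, max_checks=4))
# ['b1', 'b2', 'b3']
```

With a cutoff of four collations the search returns the three hits in the promising cluster and misses `a3`, which sits in a cluster whose pivot did not hit; raising the cutoff recovers it at the cost of more collations.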

In a secret search system that searches encrypted data deposited on a database server without decrypting it, clustering the concealed indexes in advance using feature quantities from which the original data is difficult to infer makes it possible to speed up secret search while limiting any loss of security or search accuracy.

FIG. 1 is a diagram illustrating an outline of the secret search processing system in an embodiment of the present invention.
FIG. 2 is a diagram illustrating a schematic configuration of the registration client in the embodiment of the present invention.
FIG. 3 is a diagram illustrating a schematic configuration of the search client in the embodiment of the present invention.
FIG. 4 is a diagram illustrating a schematic configuration of the database server in the embodiment of the present invention.
FIG. 5 is a sequence diagram illustrating the data registration process between the registration client and the database server in the embodiment of the present invention.
FIG. 6 is a diagram illustrating the data structure of the registration data storage location management table and the cluster management table created by the database server in the embodiment of the present invention.
FIG. 7 is a sequence diagram illustrating the secret search process between the search client and the database server in the embodiment of the present invention.
FIG. 8 is a flowchart illustrating the priority calculation procedure performed by the database server in the embodiment of the present invention.
FIG. 9 is a flowchart illustrating the procedure by which the database server collates concealed indexes with a trapdoor in the embodiment of the present invention.
FIG. 10 is a flowchart illustrating the procedure by which the database server collates concealed indexes with a trapdoor in the embodiment of the present invention.
FIG. 11 is a diagram illustrating a setting screen used on the search client or the database server in the embodiment of the present invention.
FIG. 12 is a diagram showing an outline of the processing of the present invention.
FIG. 13 is a diagram showing the procedure for generating a concealed index.
FIG. 14 is a diagram showing the procedure for collating a concealed index with a trapdoor.
FIG. 15 is a diagram showing the procedure for generating a feature vector from deposited data.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

(System configuration)
FIG. 1 is a schematic diagram of a secret search processing system according to an embodiment of the present invention. As illustrated, the search processing system includes n registration clients 20-1 to 20-n, m search clients 30-1 to 30-m, and a database server 40, which are designed to exchange information with one another via a network 10. Here, n and m are integers of 1 or more and may differ from each other. The registration clients 20-1 to 20-n all have the same configuration; hereinafter, any one of them is referred to as the registration client 20. Similarly, the search clients 30-1 to 30-m all have the same configuration; hereinafter, any one of them is referred to as the search client 30.

The registration client 20 in this embodiment functions as a transmission/reception device for data registration that transmits encrypted deposited data and related information to the database server 40. The search client 30 functions as a transmission/reception device for search that transmits a concealed search query to the database server 40 and receives the search result. The database server 40 functions as a secret search device that registers the encrypted deposited data and related information in a database and searches the data in the database.

(Registration client)
FIG. 2 is a diagram illustrating a schematic configuration of the registration client 20. As illustrated, the registration client 20 includes a CPU 212, a memory 214, a storage device 216, a key generation unit 218, a registration unit 220, a user interface 230, and a communication interface 232, which are designed to exchange information with one another via an internal bus 200. The registration unit 220 includes an encryption unit 222, a concealed index generation unit 224, a feature quantity calculation unit 226, and a setting unit 228. Each of these units can individually exchange information with the CPU 212 and the other components via the internal bus 200.

First, the general-purpose components are described. The CPU 212 is a central processing unit that performs various numerical calculations, information processing, device control, and the like. The memory 214 is a semiconductor storage device, such as RAM or ROM, that the CPU 212 can read and write directly. The storage device 216 is a device that stores data and programs in a computer, such as a hard disk, magnetic tape, or flash memory. It stores, among other things, the data to be deposited on the database server 40.

The key generation unit 218 is a device that generates keys for encrypting or decrypting data, and that performs the processing for sharing a decryption key with the search client 30. Decryption key sharing will be described later with reference to FIG. 7.

The user interface 230 is a device, such as a display, mouse, or keyboard, that outputs processing results to the user and accepts user instructions and reflects them in the components of the registration client 20. The communication interface 232 is a device that controls the transmission and reception of data between the components of the registration client 20 and external devices such as the search client 30 and the database server 40.

The components specific to the present invention are the registration unit 220 and the units that constitute it: the encryption unit 222, the concealed index generation unit 224, the feature quantity calculation unit 226, and the setting unit 228. Of these, the most characteristic component, absent from conventional searchable encryption techniques, is the feature quantity calculation unit 226. The components of the registration unit 220 are described first.

The encryption unit 222 is a device that reads the data to be deposited on the database server 40 from the storage device 216, encrypts it using the encryption key generated by the key generation unit 218, and either notifies the registration unit 220 of the encrypted data or temporarily outputs it to the memory 214 or the storage device 216.

The concealed index generation unit 224 is a device that reads the data to be deposited on the database server 40 from the storage device 216, generates a concealed index from the contents of the deposited data using an algorithm specific to searchable encryption, and either notifies the registration unit 220 of the generated concealed index or temporarily outputs it to the memory 214 or the storage device 216. The specific procedure for generating the concealed index will be described later with reference to FIG. 5.

The feature quantity calculation unit 226 is a device that reads the data to be deposited on the database server 40 from the storage device 216, calculates the feature quantity of the deposited data using a predetermined algorithm, and either notifies the registration unit 220 of the calculated feature quantity or temporarily outputs it to the memory 214 or the storage device 216. The specific procedure for calculating the feature quantity will be described later with reference to FIG. 5.

The setting unit 228 is a device for setting the parameters required for processing such as encryption, concealed index generation, and feature quantity calculation. The parameters are set by the user via the user interface 230 and are reflected in the registration unit 220, the encryption unit 222, the concealed index generation unit 224, and the feature quantity calculation unit 226.

The registration unit 220 is a device that receives from the user, via the user interface 230, an instruction to register data on the database server 40. For the designated data stored in the storage device 216, it controls the encryption unit 222, the concealed index generation unit 224, and the feature quantity calculation unit 226 to calculate the encrypted data, the concealed index, and the feature quantity, and transmits the resulting set to the database server 40 via the communication interface 232. Details of the data registration process will be described later with reference to FIG. 5.

The key generation unit 218, the registration unit 220, and the units constituting the registration unit 220 (the encryption unit 222, the concealed index generation unit 224, the feature quantity calculation unit 226, and the setting unit 228) may each execute their processing as stand-alone devices. Alternatively, each may hold only a program, which the CPU 212 loads into the memory 214 and executes.

(Search client)
FIG. 3 is a diagram illustrating a schematic configuration of the search client 30. As illustrated, the search client 30 includes a CPU 312, a memory 314, a storage device 316, a search unit 320, a user interface 330, and a communication interface 332, which are designed to exchange information with one another via an internal bus 300. The search unit 320 includes a trapdoor generation unit 322, a key sharing unit 324, a decryption unit 326, and a setting unit 328. Each of these units can individually exchange information with the CPU 312 and the other components via the internal bus 300. A "trapdoor" is search information, namely the search keyword contained in the search query in concealed form.

The general-purpose components, namely the CPU 312, the memory 314, the storage device 316, the user interface 330, and the communication interface 332, have the same functions as described for FIG. 2. The components specific to the present invention are the search unit 320 and the units that constitute it: the trapdoor generation unit 322, the key sharing unit 324, the decryption unit 326, and the setting unit 328. The description of general-purpose components such as the CPU 312 is omitted below, and the components of the search unit 320 are described first.

The trapdoor generation unit 322 is a device that receives a search query from the user via the user interface 330, generates a trapdoor by concealing the keyword contained in the search query using an algorithm specific to searchable encryption, and either notifies the search unit 320 of the generated trapdoor or outputs it to the memory 314 or the storage device 316. The specific procedure for generating the trapdoor will be described later with reference to FIG. 7.

The key sharing unit 324 is a device that, when encrypted data that hit the search query is received from the database server 40, shares the decryption key for that encrypted data with the registration client 20. The shared decryption key is temporarily stored in the search unit 320, or in the memory 314 or the storage device 316. The specific key sharing process will be described later with reference to FIG. 7.

The decryption unit 326 is a device that decrypts the encrypted data received from the database server 40 using the decryption key obtained by the key sharing unit 324, and either notifies the search unit 320 of the result or temporarily outputs it to the memory 314 or the storage device 316.

The setting unit 328 is a device for setting the parameters required for processing such as trapdoor generation, key sharing, and decryption. The parameters are set by the user via the user interface 330 and are reflected in the search unit 320, the trapdoor generation unit 322, the key sharing unit 324, and the decryption unit 326. An example of parameter setting will be described later with reference to FIG. 11.

The search unit 320 is a device that receives a search query from the user via the user interface 330, controls the trapdoor generation unit 322 to generate a trapdoor from the search query, and transmits the generated trapdoor to the database server 40. It then controls the key sharing unit 324 and the decryption unit 326 to decrypt the encrypted data returned from the database server 40, and either outputs the result to the memory 314 or the storage device 316 or presents it to the user via the user interface 330. Details of the processing will be described later with reference to FIG. 7.

The search unit 320 and the units constituting it (the trapdoor generation unit 322, the key sharing unit 324, the decryption unit 326, and the setting unit 328) may each execute their processing as stand-alone devices. Alternatively, each may hold only a program, which the CPU 312 loads into the memory 314 and executes.

(Database server)
FIG. 4 is a diagram illustrating a schematic configuration of the database server 40. As illustrated, the database server 40 includes a CPU 412, a memory 414, a storage device 416, an authentication unit 418, a registration unit 420, a clustering unit 430, a search unit 440, a setting unit 450, a user interface 460, and a communication interface 462, which are designed to exchange information with one another via an internal bus 400. The clustering unit 430 includes a similarity calculation unit 432, which can individually exchange information with the CPU 412 and the other components via the internal bus 400. The search unit 440 includes a priority calculation unit 442 and a collation unit 444, each of which can likewise individually exchange information with the CPU 412 and the other components via the internal bus 400.

汎用的な構成要素であるＣＰＵ４１２、メモリ４１４、記憶装置４１６、ユーザインターフェース４６０および通信インターフェース４６２は、図２の説明と同様の機能を有するので説明を割愛する。 The CPU 412, the memory 414, the storage device 416, the user interface 460, and the communication interface 462 are general-purpose components with the same functions as those described for FIG. 2, so their description is omitted.

認証部４１８は、データベースサーバ４０へのデータ登録や検索を許可するユーザのＩＤやパスワードを管理する装置である。詳細については後で図５を用いて説明する。 The authentication unit 418 is a device that manages the IDs and passwords of users who are permitted to register and search data in the database server 40. Details will be described later with reference to FIG. 5.

本発明特有の構成要素は、登録部４２０、クラスタリング部４３０、これを構成する類似度算出部４３２、検索部４４０およびこれを構成する優先順位算出部４４２、照合部４４４、ならびに設定部４５０である。 The constituent elements unique to the present invention are the registration unit 420, the clustering unit 430 and the similarity calculation unit 432 that constitutes it, the search unit 440 and the priority order calculation unit 442 and collation unit 444 that constitute it, and the setting unit 450.

登録部４２０は、登録クライアント２０から暗号化データ、秘匿化インデックスおよび特徴量の組を受信したとき、この組を記憶装置４１６に登録する装置である。以下、この組のことを登録データと呼ぶ。具体的な登録内容については後で図６を用いて説明する。 The registration unit 420 is a device that, on receiving a set of encrypted data, a concealment index, and a feature amount from the registration client 20, registers the set in the storage device 416. Hereinafter, this set is referred to as registration data. The specific registration contents will be described later with reference to FIG. 6.

クラスタリング部４３０は、記憶装置４１６に登録されている登録データをクラスタリングし、クラスタリング結果をメモリ４１４もしくは記憶装置４１６に一時的に出力する装置である。クラスタリングを行うには登録データ間の類似度を計算する必要があるが、この計算は類似度算出部４３２が登録データ中の特徴量を用いてクラスタリングを行う。具体的なクラスタリング処理については後で図５を用いて説明する。また、クラスタリング結果の具体例を図６であげる。 The clustering unit 430 is a device that clusters the registration data registered in the storage device 416 and temporarily outputs the clustering result to the memory 414 or the storage device 416. Clustering requires the similarity between registration data; the similarity calculation unit 432 computes it from the feature amounts in the registration data. The specific clustering process will be described later with reference to FIG. 5, and a specific example of a clustering result is given in FIG. 6.

類似度算出部４３２は、クラスタリング部４３０からの要求に応じて２つの登録データの類似度を算出する装置である。類似度を算出する登録データは、クラスタリング部４３０が一時的にメモリ４１４もしくは記憶装置４１６に格納するか、類似度算出部４３２に直接通知する。算出した類似度は、メモリ４１４もしくは記憶装置４１６に一時的に出力されるか、クラスタリング部４３０に直接返される。具体的な類似度の計算手順については後で図５を用いて説明する。 The similarity calculation unit 432 is a device that calculates the similarity between two pieces of registration data in response to a request from the clustering unit 430. The clustering unit 430 either temporarily stores the registration data to be compared in the memory 414 or the storage device 416 or passes it directly to the similarity calculation unit 432. The calculated similarity is either temporarily output to the memory 414 or the storage device 416 or returned directly to the clustering unit 430. The specific similarity calculation procedure will be described later with reference to FIG. 5.

検索部４４０は、検索クライアント３０から検索クエリのトラップドアを受信し、検索クエリにヒットした暗号化データを、通信インターフェース４６２を介して検索クライアント３０に返信する装置である。トラップドアを受信したとき検索部４４０は、まず優先順位算出部４４２を起動する。 The search unit 440 is a device that receives the trap door of a search query from the search client 30 and returns the encrypted data that hit the search query to the search client 30 via the communication interface 462. On receiving the trap door, the search unit 440 first activates the priority order calculation unit 442.

優先順位算出部４４２は、クラスタリング部４３０の処理によるクラスタリング結果をもとに照合処理の優先順位を算出する装置である。各クラスタの秘匿インデックスを代表する秘匿インデックスであるピボットとトラップドアとを比較して、各クラスタの照合処理の優先順位を決める。算出した優先順位はメモリ４１４もしくは記憶装置４１６に一時的に出力されるか、検索部４４０に直接返される。具体的な優先順位算出手順については後で図８を用いて説明する。各クラスタのピボットは、クラスタリング部４３０又は優先順位算出部４４２で設定／決定される。 The priority order calculation unit 442 is a device that calculates the priority order of the collation processing based on the clustering result produced by the clustering unit 430. It compares the trap door with each cluster's pivot, a concealment index chosen to represent the concealment indexes of that cluster, and thereby determines the priority of collation for each cluster. The calculated priority order is either temporarily output to the memory 414 or the storage device 416 or returned directly to the search unit 440. The specific priority order calculation procedure will be described later with reference to FIG. 8. The pivot of each cluster is set or determined by the clustering unit 430 or the priority order calculation unit 442.

次に検索部４４０は照合部４４４を呼び出し、メモリ４１４もしくは記憶装置４１６に、または検索部４４０に格納されている優先順位に基づいて、優先度が高い順にトラップドアと秘匿化インデックスとの照合を行う。照合すべきトラップドアおよび秘匿化インデックスは、検索部４４０が一時的にメモリ４１４もしくは記憶装置４１６に格納するか、直接照合部４４４に渡す。照合部４４４は検索可能暗号特有のアルゴリズムによりトラップドアと秘匿化インデックスとの照合を行い、照合の結果をメモリ４１４もしくは記憶装置４１６に一時的に出力するか、検索部４４０に直接返す。具体的な照合手順については後で図９を用いて説明する。 Next, the search unit 440 calls the collation unit 444 and, based on the priority order held in the memory 414, the storage device 416, or the search unit 440, collates the trap door with the concealment indexes in descending order of priority. The search unit 440 either temporarily stores the trap door and the concealment index to be collated in the memory 414 or the storage device 416 or passes them directly to the collation unit 444. The collation unit 444 collates the trap door with the concealment index using an algorithm specific to the searchable encryption and either temporarily outputs the result to the memory 414 or the storage device 416 or returns it directly to the search unit 440. The specific collation procedure will be described later with reference to FIG. 9.

従来技術では全登録データについて総当りでトラップドアと照合する必要があったが、優先順位算出部４４２が特徴量を用いて登録データの照合処理の優先順位を設定し、検索部４４０が優先度の高い順でトラップドアと秘匿化インデックスとの照合を行い、一定回数で照合を打ち切ることで、検索応答時間の大幅な短縮を実現する。さらに、特徴量からは元の預託データの中身を推測することが困難なので、安全性の低下を抑えることができる。具体的な検索手順については、後で図７ないし図１０を用いて説明する。 In the prior art, the trap door had to be collated exhaustively against all registration data. Here, the priority order calculation unit 442 uses the feature amounts to set the priority order of collation for the registration data, and the search unit 440 collates the trap door with the concealment indexes in descending order of priority and stops after a fixed number of collations, which greatly shortens the search response time. Furthermore, because it is difficult to infer the contents of the original deposit data from the feature amounts, the loss of security is kept small. The specific search procedure will be described later with reference to FIGS. 7 to 10.

設定部４５０はクラスタリングや検索などの処理に必要なパラメータを設定するための装置である。当該パラメータはユーザインターフェース４６０を介してデータベースサーバ管理者によって設定され、登録部４２０、クラスタリング部４３０、類似度算出部４３２、検索部４４０、優先順位算出部４４２および照合部４４４に反映される。 The setting unit 450 is a device for setting parameters necessary for processing such as clustering and search. The parameter is set by the database server administrator via the user interface 460 and is reflected in the registration unit 420, clustering unit 430, similarity calculation unit 432, search unit 440, priority order calculation unit 442, and collation unit 444.

なお、認証部４１８、登録部４２０、クラスタリング部４３０、類似度算出部４３２、検索部４４０、優先順位算出部４４２、照合部４４４および設定部４５０については、それぞれの装置が単体で処理を実行してもよいし、それぞれの装置はプログラムのみを具備し、ＣＰＵ４１２が当該プログラムをメモリ４１４に読み込んで実行してもよい。 Note that, regarding the authentication unit 418, the registration unit 420, the clustering unit 430, the similarity calculation unit 432, the search unit 440, the priority order calculation unit 442, the collation unit 444, and the setting unit 450, each device executes processing alone. Alternatively, each device may include only a program, and the CPU 412 may read the program into the memory 414 and execute it.

（処理概要） (Outline of processing)
まず始めに、図１２を用いて、本発明の秘匿検索方法の処理概要を説明する。 First, an outline of the processing of the secret search method of the present invention will be described with reference to FIG. 12.

（１）登録クライアントは、預託データを暗号化した暗号化データや、預託データから抽出したインデックスを秘匿化した秘匿化インデックスだけではなく、預託データの特徴量もデータベースサーバに登録する（１２０１）。ここで特徴量とは、預託データの特徴をできるだけ損なわないようにデータ長を大幅に削減したもので、例えば預託データ内の単語などから計算される特徴ベクトルや、預託データを分割してハッシュ値を求めて連結したファジィハッシュとよばれる量などがある。 (1) The registration client registers in the database server not only the encrypted data obtained by encrypting the deposit data and the concealment index obtained by concealing the index extracted from the deposit data, but also a feature amount of the deposit data (1201). Here, a feature amount is a greatly shortened representation of the deposit data that preserves its characteristics as far as possible. Examples include a feature vector calculated from the words in the deposit data, and a quantity called a fuzzy hash, obtained by dividing the deposit data, hashing each piece, and concatenating the hash values.

（２）データベースサーバは、受信した特徴量を用いて、特徴量に対応する預託データの類似度を計算し、類似した預託データが同じクラスタに含まれるように秘匿化インデックスなどをクラスタリングする（１２０２）。 (2) Using the received feature amounts, the database server calculates the similarity of the corresponding deposit data and clusters the concealment indexes and related data so that similar deposit data fall into the same cluster (1202).

（３）データベースサーバは、秘匿検索処理において、まず、各クラスタにおいて秘匿化インデックスの代表（ピボット）を選択し、このピボットと、検索クライアントで検索クエリに含まれる検索キーワードを秘匿化したトラップドアとの照合を行なうことにより、登録データ照合時のクラスタの優先順位を決定する（１２０３）。 (3) In the secret search process, the database server first selects a representative concealment index (pivot) in each cluster. It then collates each pivot with the trap door, which the search client produces by concealing the search keyword contained in the search query, and thereby determines the priority order of the clusters for collating the registration data (1203).

（４）データベースサーバは、優先順位に基づいて、クラスタ単位に全登録データの照合を行い（１２０４）、検索結果を検索クライアントに出力する。 (4) The database server collates all registered data for each cluster based on the priority order (1204), and outputs the search result to the search client.

なお、優先順位決定の際に、ピボットがトラップドアにヒットした場合、当該ピボットが属するクラスタに含まれる全登録データについてトラップドアとの照合の優先順位を上げ、ピボットがトラップドアにヒットしなかった場合、当該ピボットが属するクラスタに含まれる全登録データについて照合の優先順位を下げる。さらに、秘匿検索処理を高速化するために、照合対象の優先順位を定めた後全登録データについて照合を順次行なう際に、一定回数で照合処理を打ち切る。 If the pivot hits the trap door at the time of priority determination, the priority of collation with the trap door is raised for all registered data included in the cluster to which the pivot belongs, and the pivot did not hit the trap door. In this case, the collation priority is lowered for all registered data included in the cluster to which the pivot belongs. Further, in order to speed up the confidential search process, the collation process is terminated at a fixed number of times when the collation is sequentially performed on all registered data after the priority order of collation targets is determined.
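The overview in steps (3) and (4), together with the cut-off described in the note above, can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `prioritized_search`, the `matches` callback (standing in for the searchable-encryption collation of a trap door against one concealment index), and the choice to leave pivot collations outside the `max_checks` budget are all assumptions made for the example.

```python
from typing import Callable, Dict, List


def prioritized_search(
    clusters: Dict[int, List[int]],   # cluster ID -> registration data IDs (assumed layout)
    pivots: Dict[int, int],           # cluster ID -> registration data ID of its pivot
    matches: Callable[[int], bool],   # hypothetical: True if the trap door hits this record
    max_checks: int,                  # cut-off on the number of collations
) -> List[int]:
    # Step (3): collate the trap door with each pivot. Clusters whose pivot
    # hits are moved to the front; sorted() is stable, so ties keep their order.
    order = sorted(clusters, key=lambda cid: not matches(pivots[cid]))

    hits: List[int] = []
    checks = 0
    # Step (4): collate cluster by cluster, stopping once the budget is spent.
    for cid in order:
        for rid in clusters[cid]:
            if checks >= max_checks:
                return hits
            checks += 1
            if matches(rid):
                hits.append(rid)
    return hits
```

With a small budget the search returns only the hits found in the high-priority clusters, which is the trade-off between response time and completeness that the cut-off introduces.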

（データ登録処理） (Data registration process)
図５は、登録クライアント２０とデータベースサーバ４０のデータ登録処理を例示するシーケンス図である。図５に基づいて当該データ登録処理を説明するとともに、図２の説明で述べた秘匿化インデックス生成手順、特徴量算出手順や、図４の説明で述べたクラスタリング処理、類似度算出手順についても、具体的な処理内容を例示する。 FIG. 5 is a sequence diagram illustrating the data registration process between the registration client 20 and the database server 40. The data registration process is described with reference to FIG. 5, together with concrete examples of the concealment index generation and feature amount calculation procedures mentioned in the description of FIG. 2 and of the clustering process and similarity calculation procedure mentioned in the description of FIG. 4.

登録クライアント２０によるデータベースサーバ４０へのデータ登録は、大別すると、登録クライアント２０が登録データを生成するデータ生成処理Ｓ５０と、登録クライアント２０とデータベースサーバ４０との間でデータの授受を行うデータ送受信処理Ｓ５２と、データベースサーバ４０が登録データをクラスタリングするクラスタリング処理Ｓ５４からなる。 The data registration to the database server 40 by the registration client 20 is roughly divided into a data generation process S50 in which the registration client 20 generates registration data, and data transmission / reception in which data is exchanged between the registration client 20 and the database server 40. Processing S52 and clustering processing S54 in which the database server 40 clusters registration data.

データ生成処理Ｓ５０は以下の手順で進行する。 The data generation process S50 proceeds in the following procedure.

（Ｓ５００）登録クライアント２０のユーザは、ユーザインターフェース２３０を介してデータベースサーバ４０に登録するデータを指定する。当該指定を受けて登録部２２０は、まず鍵生成部２１８を起動する。鍵生成部２１８は暗号鍵と復号鍵のペアを生成し、メモリ２１４または記憶装置２１６に格納する。暗号化部２２２は登録指示を受けたデータ（預託データ）に対して、鍵生成部２１８が生成した暗号鍵を用いて暗号化処理を施す。なお、生成した復号鍵は後で検索クライアント３０からの要求により外部に送信される可能性があるため、記憶装置２１６または鍵生成部２１８自体が保持しておく。 (S500) The user of the registration client 20 designates data to be registered in the database server 40 via the user interface 230. In response to the designation, the registration unit 220 first activates the key generation unit 218. The key generation unit 218 generates an encryption key / decryption key pair and stores the pair in the memory 214 or the storage device 216. The encryption unit 222 performs encryption processing on the data (deposited data) for which the registration instruction has been received, using the encryption key generated by the key generation unit 218. Since the generated decryption key may be transmitted to the outside in response to a request from the search client 30, the storage device 216 or the key generation unit 218 itself holds it.

（Ｓ５０２）秘匿化インデックス生成部２２４は預託データの内容から秘匿化インデックスを生成する。非特許文献１に基づいて具体的な生成方法を例示すると以下のようになる。秘匿化インデックス生成の処理手順を図１３に示す。 (S502) The concealment index generation unit 224 generates a concealment index from the contents of the deposit data. An example of a specific generation method based on Non-Patent Document 1 is as follows. FIG. 13 shows a processing procedure for generating the concealment index.

（Ｓ５０２−１）預託データから照合対象となる単語（ｗ１，ｗ２，．．．）を抽出する。単語の抽出は、英語であれば空白で区切られた文字列を抽出することでなされる。日本語の場合は、預託データ内の文章を一定の長さの文字列に分解する方法（Ｎ−ｇｒａｍ）や、形態素解析により単語の抽出を行うことができる。 (S502-1) Extract words (w1, w2,...) To be verified from the deposit data. Words are extracted by extracting character strings separated by spaces in the case of English. In the case of Japanese, words can be extracted by a method (N-gram) for decomposing a sentence in the deposit data into a character string of a certain length or by morphological analysis.

（Ｓ５０２−２）抽出した単語（ｗ１，ｗ２，．．．）それぞれについてハッシュ値（ｈ１，ｈ２，．．．）を計算する。各ハッシュ値のビット長をｎとする。 (S502-2) A hash value (h1, h2,...) Is calculated for each extracted word (w1, w2,...). Let n be the bit length of each hash value.

（Ｓ５０２−３）各ハッシュ値ｈｉに対してｃビットの乱数列ｒｉを生成する（ｉ＝１，２，．．．）。各乱数列ｒｉについて所定の演算を行ってｎ−ｃビットのメッセージダイジェストｄｉを求める。ダイジェスト算出のための所定の演算は、例えば、上記のような単語からハッシュ値を計算する場合とは異なる別のハッシュ関数である。 (S502-3) For each hash value hi, a c-bit random number sequence ri is generated (i = 1, 2, ...). A predetermined operation is applied to each random number sequence ri to obtain an (n − c)-bit message digest di. The predetermined operation for computing the digest is, for example, a hash function different from the one used to compute the hash values from the words.

（Ｓ５０２−４）乱数列ｒｉの末尾にメッセージダイジェストｄｉを連結して、長さｎのビット列ｓｉを得る（ｉ＝１，２，．．．）。各ハッシュ値ｈｉとビット列ｓｉとの排他的論理和が秘匿化インデックスＨｉとなる。 (S502-4) A message digest di is concatenated to the end of the random number sequence ri to obtain a bit sequence si of length n (i = 1, 2,...). The exclusive OR of each hash value hi and bit string si becomes the concealment index Hi.

ここで、図１３を参照しながら、上記のステップＳ５０２−１〜４によって得られる秘匿化インデックスＨｉの表現について説明する。排他的論理和を「ＸＯＲ」、２つのビット列の結合を「｜」とすると、秘匿化インデックスＨｉは、ｈｉ（単語ｗｉのハッシュ値）と（ｒｉ｜ｄｉ）（乱数列ｒｉとメッセージダイジェストｄｉからなるビット列ｓｉ）を用いて、Ｈｉ＝ｈｉ ＸＯＲ（ｒｉ｜ｄｉ）と表される。但し、単語ｗｉのハッシュ値ｈｉ、乱数列ｒｉ、メッセージダイジェストｄｉは、それぞれ、ハッシュ関数ｈ（ｗｉ）＝ｈｉ、乱数生成Ｒ（ｈｉ）＝ｒｉ、所定演算ｆ（ｒｉ）＝ｄｉによって生成される。 Here, the representation of the concealment index Hi obtained by steps S502-1 to S502-4 above is described with reference to FIG. 13. Writing exclusive OR as "XOR" and the concatenation of two bit strings as "|", the concealment index Hi is expressed, using hi (the hash value of the word wi) and (ri | di) (the bit string si formed from the random number sequence ri and the message digest di), as Hi = hi XOR (ri | di). Here the hash value hi of the word wi, the random number sequence ri, and the message digest di are generated by the hash function h(wi) = hi, the random number generation R(hi) = ri, and the predetermined operation f(ri) = di, respectively.

単語そのものではなくハッシュ値を用いたり、乱数列との排他的論理和を取ったりしているため、秘匿化インデックスから預託データ内の単語を求めることは困難となる。詳細については非特許文献１参照のこと。こうして求めた秘匿化インデックスとトラップドアとの照合方法については、後の図８の説明で明らかにする。 Because hash values are used instead of the words themselves and are XORed with a random number sequence, it is difficult to recover the words in the deposit data from the concealment index. See Non-Patent Document 1 for details. The method of collating the concealment index obtained in this way with the trap door will be clarified later in the description of FIG. 8.
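Steps S502-2 to S502-4 and the expression Hi = hi XOR (ri | di) can be sketched as below. The concrete choices are assumptions made for illustration only: n = 128 bits, c = 64 bits, SHA-256 with two different prefixes standing in for the two distinct hash functions h and f, and a trap door equal to the keyword hash h(w). The `hits` check is simply what the construction implies (XOR the index with the trap door and test whether the tail equals f of the head); the document's own collation procedure is described elsewhere.

```python
import hashlib
import secrets

N_BYTES = 16  # n = 128 bits (assumed for this sketch)
C_BYTES = 8   # c = 64 bits (assumed for this sketch)


def h(word: str) -> bytes:
    """Hash function h(w_i) = h_i, truncated to n bits."""
    return hashlib.sha256(b"word:" + word.encode("utf-8")).digest()[:N_BYTES]


def f(r: bytes) -> bytes:
    """Predetermined operation f(r_i) = d_i: an (n - c)-bit digest, distinct from h."""
    return hashlib.sha256(b"digest:" + r).digest()[:N_BYTES - C_BYTES]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def concealed_index(word: str) -> bytes:
    r = secrets.token_bytes(C_BYTES)   # S502-3: c-bit random sequence r_i
    s = r + f(r)                       # S502-4: s_i = r_i | d_i, n bits long
    return xor(h(word), s)             # H_i = h_i XOR (r_i | d_i)


def hits(index: bytes, trapdoor: bytes) -> bool:
    """Assumed check: index XOR trapdoor must have the form r | f(r)."""
    s = xor(index, trapdoor)
    r, d = s[:C_BYTES], s[C_BYTES:]
    return f(r) == d
```

Because a fresh random sequence is drawn each time, two concealment indexes for the same word differ, yet both satisfy `hits(index, h(word))`.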

（Ｓ５０４）特徴量算出部２２６は預託データの内容から特徴量を算出する。特徴量として、例えば預託データの大きさなど、恣意的に変更することが難しく、かつ連続的な値を取るような属性情報を用いる方法がまずあげられる。預託データの大きさを特徴量とした場合、２つの預託データ間の類似度は、預託データの大きさをｓ１、ｓ２とすると、１／（１＋｜ｓ１−ｓ２｜）で近似することができる。この類似度は０から１までの値をとり、預託データが類似するほど１に近づく。 (S504) The feature quantity calculation unit 226 calculates the feature quantity from the contents of the deposit data. As the feature quantity, for example, a method of using attribute information that is difficult to change arbitrarily, such as the size of deposit data, and takes a continuous value is first mentioned. When the size of the deposit data is a feature amount, the similarity between the two deposit data can be approximated by 1 / (1+ | s1-s2 |), where the size of the deposit data is s1 and s2. . This similarity takes a value from 0 to 1, and approaches 1 as the deposit data becomes similar.

より洗練された特徴量として、預託データ内の単語からベクトル（特徴ベクトル）を生成する方法が知られている。特徴ベクトルは以下の手順で求められる。特徴ベクトル生成の処理手順を図１５に示す。 As a more sophisticated feature amount, a method of generating a vector (feature vector) from the words in the deposit data is known. The feature vector is obtained by the following procedure. The processing procedure for generating the feature vector is shown in FIG. 15.

（Ｓ５０４−１）預託データから照合対象となる単語（ｗ１，ｗ２，．．．）を抽出する。 (S504-1) Extract words (w1, w2,...) To be verified from the deposit data.

（Ｓ５０４−２）抽出した単語列（ｗ１，ｗ２，．．．）それぞれについてハッシュ値（ｈ１，ｈ２，．．．）を計算する。各ハッシュ値のビット長をｎとする。 (S504-2) A hash value (h1, h2,...) Is calculated for each extracted word string (w1, w2,...). Let n be the bit length of each hash value.

（Ｓ５０４−３）ｈ１、ｈ２、．．．の各ビットの論理和をとる。これをｎ次元のベクトルとみなして特徴ベクトルと呼ぶ。 (S504-3) The bitwise OR of h1, h2, ... is taken. The result is regarded as an n-dimensional vector and is called the feature vector.

こうして算出した特徴ベクトルを用いれば、２つの預託データ間の類似度を特徴ベクトル内のともに１になるビットの個数で近似することができる。即ち、２つの預託データに対応するハッシュ値のそれぞれのビットに対して論理積（ＡＮＤ）を求めた結果に含まれるビット「１」の個数が類似度となる。特徴ベクトルの詳細については非特許文献５を参照のこと。 By using the feature vector calculated in this way, the similarity between two deposit data can be approximated by the number of bits that are both 1 in the feature vector. That is, the number of bits “1” included in the result of calculating the logical product (AND) for each bit of the hash values corresponding to the two deposit data is the similarity. See Non-Patent Document 5 for details of feature vectors.
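A minimal sketch of steps S504-1 to S504-3 and of the AND-based similarity follows. The bit length n = 64 and the use of truncated SHA-256 as the word hash are assumptions for the example; the source does not name a hash function. The feature vector is held as a Python integer whose bits are the vector elements.

```python
import hashlib
from typing import Iterable

N_BITS = 64  # n, the bit length of each hash value (assumed)


def word_hash(word: str) -> int:
    """S504-2: n-bit hash of one word (truncated SHA-256, an assumption)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[: N_BITS // 8], "big")


def feature_vector(words: Iterable[str]) -> int:
    """S504-3: bitwise OR of all word hashes, read as an n-dimensional 0/1 vector."""
    v = 0
    for w in words:
        v |= word_hash(w)
    return v


def similarity(v1: int, v2: int) -> int:
    """Number of bit positions that are 1 in both feature vectors (popcount of AND)."""
    return bin(v1 & v2).count("1")
```

One caveat on this sketch: with a dense hash such as truncated SHA-256, OR-ing many words quickly sets almost every bit, so a practical system would likely use a sparser per-word encoding. That design choice is outside what the text specifies.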

また、単語の抽出などを行なわずに、計算可能な特徴量として「ファジィハッシュ」とよばれる量を用いて類似度を求めることができる。ファジィハッシュは以下の手順で算出される。 Further, the similarity can be obtained by using a quantity called “fuzzy hash” as a feature quantity that can be calculated without extracting a word or the like. The fuzzy hash is calculated by the following procedure.

（Ｓ５０４−ａ）預託データを分割する。固定長で分割するほか、特定のビット列が境界となるように分割する方法が知られている。 (S504-a) The deposit data is divided. In addition to dividing at a fixed length, a method of dividing so that a specific bit string becomes a boundary is known.

（Ｓ５０４−ｂ）各分割データ（ｄ１，ｄ２，．．．）のハッシュ値（ｈ１，ｈ２，．．．）を計算する。 (S504-b) The hash value (h1, h2,...) Of each divided data (d1, d2,...) Is calculated.

（Ｓ５０４−ｃ）ハッシュ値の配列（ｈ１，ｈ２，．．．）をファジィハッシュとして出力する。 (S504-c) The hash value array (h1, h2,...) Is output as a fuzzy hash.

当該預託データに対応するファジィハッシュＨ＝（ｈ１，ｈ２，．．．）、他の預託データに対応するファジィハッシュＦ＝（ｆ１，ｆ２，．．．）を用いると、２つの預託データ間の類似度を、（ｈ１，ｈ２，．．．）と（ｆ１，ｆ２，．．．）との積集合の要素数（ＨとＦの双方に含まれているハッシュ値の数ｎ）と、（ｈ１，ｈ２，．．．）と（ｆ１，ｆ２，．．．）の和集合の要素数（ＨとＦの双方における要素の数の和Ｎからｎを引いた数）との比ｎ／（Ｎ−ｎ）で近似することができる。なお、Ｈ及びＦの要素の数をｍ１及びｍ２とすると、Ｎ＝ｍ１＋ｍ２、０＝＜ｎ＝＜ｍｉｎ（ｍ１、ｍ２）＜Ｎである。 Given the fuzzy hash H = (h1, h2, ...) of the deposit data in question and the fuzzy hash F = (f1, f2, ...) of another deposit data, the similarity between the two deposit data can be approximated by the ratio n / (N − n), where n is the number of elements in the intersection of (h1, h2, ...) and (f1, f2, ...) (the number of hash values contained in both H and F), and N − n is the number of elements in their union (the sum N of the numbers of elements in H and F, minus n). If H and F have m1 and m2 elements respectively, then N = m1 + m2 and 0 ≤ n ≤ min(m1, m2) < N.

これまでさまざまなファジィハッシュ技術が提案されている。詳細については非特許文献６を参照のこと。 Various fuzzy hash techniques have been proposed so far. See Non-Patent Document 6 for details.
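Steps S504-a to S504-c and the ratio n / (N − n) can be illustrated with fixed-length splitting. The chunk size, the truncated SHA-256 chunk hash, and the function names are assumptions for this sketch. Duplicate chunk hashes are collapsed with a set, in which case n / (N − n) is the ratio of intersection size to union size.

```python
import hashlib
from typing import List


def fuzzy_hash(data: bytes, chunk_size: int = 4) -> List[str]:
    """S504-a to S504-c: split into fixed-length chunks and hash each one."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return [hashlib.sha256(c).hexdigest()[:8] for c in chunks]


def fuzzy_similarity(H: List[str], F: List[str]) -> float:
    """Intersection size over union size, i.e. n / (N - n) when hashes are distinct."""
    hs, fs = set(H), set(F)
    n = len(hs & fs)        # hash values contained in both H and F
    union = len(hs | fs)    # equals N - n for duplicate-free H and F
    return n / union if union else 0.0
```

For example, `b"aaaabbbbcccc"` and `b"aaaabbbbdddd"` share two of their three chunks, so n = 2, N = 6, and the similarity is 2 / 4 = 0.5.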

次にデータ送受信処理Ｓ５２は、以下の手順で進行する。 Next, the data transmission / reception process S52 proceeds in the following procedure.

（Ｓ５２０）登録クライアント２０とデータベースサーバ４０との間でデータ授受のための通信路を確立する。具体的には、まず認証部４１８が、通信インターフェース２３２，４６２やユーザインターフェース２３０を介して登録クライアント２０のユーザのＩＤ、及びパスワードなどでユーザの認証を行う。あらかじめ登録された正規のユーザと判断した場合、認証部４１８は登録クライアント２０の登録部２２０とデータベースサーバ４０の登録部４２０との間で通信路を確立する。この際、認証部４１８は登録クライアント２０のＩＰアドレスなどの識別情報も収集して、記憶装置４１６に格納する。この識別情報は検索クライアント３０と登録クライアント２０との間で鍵共有をする場合に必要になる。詳細については後で図６及び図７を用いて説明する。なお、正規のユーザでないと判断した場合は、通信路を確立せず処理を終了する。 (S520) A communication path for data exchange is established between the registration client 20 and the database server 40. Specifically, the authentication unit 418 first authenticates the user of the registration client 20 by the user's ID, password, and the like, via the communication interfaces 232 and 462 and the user interface 230. If the user is judged to be a pre-registered authorized user, the authentication unit 418 establishes a communication path between the registration unit 220 of the registration client 20 and the registration unit 420 of the database server 40. At this time, the authentication unit 418 also collects identification information such as the IP address of the registration client 20 and stores it in the storage device 416. This identification information is needed when the search client 30 and the registration client 20 share a key. Details will be described later with reference to FIGS. 6 and 7. If the user is judged not to be an authorized user, the process ends without establishing a communication path.

（Ｓ５２２）登録部２２０は通信インターフェース２３２を介して、登録データ（暗号化データ、秘匿化インデックス、及び特徴量の組）をデータベースサーバ４０に送信する。 (S522) The registration unit 220 transmits registration data (a set of encrypted data, a concealment index, and a feature amount) to the database server 40 via the communication interface 232.

（Ｓ５２４）登録部４２０は通信インターフェース４６２を介して送信された登録データを受信し、記憶装置４１６に格納する。具体的な登録内容については後で図６を用いて説明する。 (S524) The registration unit 420 receives the transmitted registration data via the communication interface 462 and stores it in the storage device 416. The specific registration contents will be described later with reference to FIG. 6.

（Ｓ５２６）登録部４２０は通信インターフェース４６２を介して登録完了の旨を登録クライアント２０の登録部２２０に通知する。 (S526) The registration unit 420 notifies the registration unit 220 of the registration client 20 of the completion of registration via the communication interface 462.

（Ｓ５２８）認証部４１８は、登録クライアント２０の登録部２２０とデータベースサーバ４０の登録部４２０との間で確立した通信路を開放する。 (S528) The authentication unit 418 releases the communication path established between the registration unit 220 of the registration client 20 and the registration unit 420 of the database server 40.

以上の手順により、登録クライアント２０のユーザは、データベースサーバ４０の管理者や通信路上の第三者に内容を事実上知られることなくデータベースサーバ４０に自己のデータを預託できる。 Through the above procedure, the user of the registration client 20 can deposit his or her data in the database server 40 without its contents becoming known, in practice, to the administrator of the database server 40 or to a third party on the communication path.

データの登録後データベースサーバ４０は登録データのクラスタリング処理Ｓ５４を行う。代表的なクラスタリング手法として、Ｋ平均法と階層的クラスタリングが知られている。まず、Ｋ平均法は以下の手順で行われる。特徴量の個数をＮとする。 After the data registration, the database server 40 performs a clustering process S54 for the registered data. As a representative clustering method, a K-means method and hierarchical clustering are known. First, the K-average method is performed according to the following procedure. Let N be the number of feature quantities.

（Ｓ５４−１）Ｋ個のクラスタの中心をランダムに設定する。あるいは、各クラスタに含まれる複数の秘匿化インデックスを所定の順に並べた時に、順序の中心の位置するものを当該クラスタの「中心」に設定する。 (S54-1) The centers of K clusters are set at random. Alternatively, when a plurality of concealment indexes included in each cluster are arranged in a predetermined order, the one at the center of the order is set as the “center” of the cluster.

（Ｓ５４−２）各特徴量ｘｉ（ｉ＝１，２，．．．，Ｎ）についてＫ個の中心との類似度を計算して、最も類似する中心を求める。ｘｉを当該中心が属するクラスタに割り振る。 (S54-2) For each feature quantity xi (i = 1, 2,..., N), the degree of similarity with K centers is calculated to find the most similar center. xi is assigned to the cluster to which the center belongs.

（Ｓ５４−３）全ての特徴量についてクラスタへの割り振りが変化しなかった場合は処理を終了する。それ以外の場合は割り振った特徴量から各クラスタの中心を計算し直してから、ステップＳ５４−２に戻る。 (S54-3) If the allocation to the cluster has not changed for all the feature values, the process is terminated. In other cases, the center of each cluster is recalculated from the assigned feature amount, and the process returns to step S54-2.

結果は最初のクラスタのランダムな設定に依存するが、計算量はＮＫのオーダなので比較的高速に動作するというメリットがある。詳細については非特許文献７を参照のこと。 The result depends on the random initial setting of the clusters, but because the amount of computation is on the order of NK, the method has the advantage of running relatively fast. See Non-Patent Document 7 for details.

なおＫ平均法では、同じクラスタに属している特徴量からクラスタの中心を求める必要がある。特徴量として預託データの大きさを用いた場合、あるクラスタには、そのクラスタに含まれるｍ個の預託データそれぞれの大きさである特徴量ｘｋ（ｋ＝１，．．．，ｍ）が属しているとすると、その中心ｖは（ｘ１＋．．．＋ｘｍ）／ｍで与えられる。 In the K-average method, it is necessary to obtain the center of the cluster from the feature quantities belonging to the same cluster. When the size of the deposit data is used as the feature amount, a feature amount xk (k = 1,..., M) that is the size of each of the m deposit data included in the cluster belongs to a certain cluster. The center v is given by (x1 +... + Xm) / m.
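For the case where the feature amount is the deposit data size, steps S54-1 to S54-3 can be sketched as follows, using the similarity 1 / (1 + |s1 − s2|) and the mean as the cluster center. The function names and the `max_iter` safeguard are assumptions; the initial centers are passed in rather than drawn at random so that the example is reproducible.

```python
from typing import List


def size_similarity(s1: float, s2: float) -> float:
    """Similarity between two deposit data sizes: 1 / (1 + |s1 - s2|)."""
    return 1.0 / (1.0 + abs(s1 - s2))


def k_means_sizes(sizes: List[float], centers: List[float], max_iter: int = 100) -> List[int]:
    """Return, for each size, the index of the cluster it is assigned to."""
    centers = list(centers)            # S54-1: initial centers (supplied by the caller here)
    assignment: List[int] = []
    for _ in range(max_iter):
        # S54-2: assign each feature amount to the most similar center.
        new_assignment = [
            max(range(len(centers)), key=lambda k: size_similarity(x, centers[k]))
            for x in sizes
        ]
        # S54-3: stop when no assignment changed.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Recompute each center as the mean (x1 + ... + xm) / m of its members.
        for k in range(len(centers)):
            members = [x for x, a in zip(sizes, assignment) if a == k]
            if members:
                centers[k] = sum(members) / len(members)
    return assignment
```

Calling `k_means_sizes([10, 12, 11, 100, 104, 98], centers=[0, 50])` separates the three small files from the three large ones.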

特徴量として特徴ベクトルを用いた場合、中心ベクトルｖのｉ番目の要素をｖｉ、ｍ個の預託データからなるクラスタに含まれる預託データｋの、ｎ個の特徴量からなる特徴ベクトルｘｋのｉ番目の要素をｘｋ,ｉ（ｉ＝１，・・・・，ｎ）とかくと、ｘｋ,ｉのｋ（ｋ＝１，．．．，ｍ）個の預託データについての標準偏差ｕｉと平均値＜ｘｉ＞との比（ｕｉ／＜ｘｉ＞）が、１／Ｃ（Ｃは２〜１０程度の定数）よりも小さければ（（ｕｉ／＜ｘｉ＞）＝＜（１／Ｃ）、即ち、（＜ｘｉ＞／ｕｉ）＞＝Ｃ）、多くのｘｋ,ｉが平均値＜ｘｉ＞の近傍に集中しているので、ｎ個の特徴量におけるｉ番目の特徴量がそのクラスタの特徴量として有効であることを意味しており、その場合はｖｉ＝１、それ以外の場合（ｉ番目の特徴量がそのクラスタの特徴量として有効でない）はｖｉ＝０とし、１又は０であるｖｉを要素とする中心ベクトルｖ＝（ｖ１，・・・・，ｖｎ）を求めることができる。即ち、中心ベクトルｖは、ｎ個の特徴量の中で、どの特徴量がそのクラスタの特徴量として有効であるかを示す特徴量ベクトルである。 When feature vectors are used as the feature amount, let vi be the i-th element of the center vector v, and let xk,i (i = 1, ..., n) be the i-th element of the feature vector xk, consisting of n feature amounts, of deposit data k in a cluster of m deposit data. Let ui and <xi> be the standard deviation and the mean of xk,i over the m deposit data (k = 1, ..., m). If the ratio ui / <xi> is no greater than 1/C, where C is a constant of about 2 to 10 (that is, ui / <xi> ≤ 1/C, or equivalently <xi> / ui ≥ C), then many of the xk,i are concentrated near the mean <xi>, which means that the i-th of the n feature amounts is effective as a feature amount of the cluster; in that case vi = 1. Otherwise (the i-th feature amount is not effective as a feature amount of the cluster) vi = 0. This yields the center vector v = (v1, ..., vn) whose elements vi are 1 or 0. In other words, the center vector v is a feature vector indicating which of the n feature amounts are effective as feature amounts of the cluster.

また、上記の＜ｘｉ＞が正であれば、上記の判定式によって得られる、中心ベクトルの要素ｖｉ（１又は０）は、ガウスの記号［］を用いてｖｉ＝［ｐｉ］−［｜ｐｉ−１｜］と表される。但し、ｐｉ＝（＜ｘｉ＞／（ｕｉ・Ｃ））である。 If <xi> above is positive, the element vi (1 or 0) of the center vector given by the above criterion can be written, using the Gauss bracket [ ] (the floor function), as vi = [pi] − [|pi − 1|], where pi = <xi> / (ui · C).

一方、特徴量としてファジィハッシュを用いた場合は、類似度の計算が特殊なので特徴量から中心を求めることは難しい。 On the other hand, when a fuzzy hash is used as a feature value, it is difficult to obtain the center from the feature value because the similarity calculation is special.

類似度さえ計算できればどのような特徴量でもクラスタリング可能な方法として階層的クラスタリングがある。これは以下の手順で行われる。 Hierarchical clustering is a method capable of clustering any feature amount as long as the similarity can be calculated. This is done in the following procedure.

（Ｓ５４−ａ）１個の特徴量だけを含むＮ個のクラスタを生成する。 (S54-a) N clusters including only one feature amount are generated.

（Ｓ５４−ｂ）クラスタｉとｊのそれぞれの特徴量ｘｉとｘｊの距離（非類似度）からクラスタ間の距離を計算し、最も距離の近い２つのクラスタを逐次的に１つのクラスタに併合する。 (S54-b) The distance between the clusters is calculated from the distance (dissimilarity) between the respective feature quantities xi and xj of the clusters i and j, and the two nearest clusters are sequentially merged into one cluster. .

（Ｓ５４−ｃ）この併合を全ての対象が１つのクラスタに併合されるまで繰り返す。 (S54-c) This merging is repeated until all objects are merged into one cluster.

階層的クラスタリングによる出力はデンドログラム（dendrogram、樹状図、系統樹）とよばれるツリー構造をとる。デンドログラムによりどのデータがどのクラスタに属するかということだけではなく、クラスタ内のデータ同士がどの程度離れているかということも求められる。なお、クラスタの併合を工夫することで、計算量のオーダをＮの２乗にまで抑えられることが知られている。詳細については非特許文献８を参照のこと。 The output of hierarchical clustering takes a tree structure called a dendrogram. A dendrogram shows not only which data belong to which cluster but also how far apart the data within a cluster are. It is known that, by merging clusters cleverly, the amount of computation can be kept to the order of N squared. See Non-Patent Document 8 for details.

以上まとめると、Ｋ平均法は高速に動作するが、特徴量から中心が定まる場合にのみ適用できる。階層的クラスタリングはＫ平均法より低速だが、類似度さえ計算できればどのようなデータでもクラスタリングできる。 In summary, the K-average method operates at high speed, but can be applied only when the center is determined from the feature amount. Hierarchical clustering is slower than the K-means method, but any data can be clustered as long as the similarity can be calculated.
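Steps S54-a to S54-c can be sketched with a deliberately naive agglomerative loop that needs only a pairwise distance function, which is what makes the approach usable with fuzzy hashes. Single linkage (the cluster distance is the smallest pairwise distance) is an assumption, since the text does not fix how the inter-cluster distance is derived, and this version does not attempt the N-squared optimisation mentioned above.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Merge = Tuple[List[int], List[int], float]


def agglomerate(items: Sequence[T], distance: Callable[[T, T], float]) -> List[Merge]:
    """Return the merge history (a flat encoding of the dendrogram)."""
    clusters: List[List[int]] = [[i] for i in range(len(items))]  # S54-a
    merges: List[Merge] = []
    while len(clusters) > 1:                                      # S54-c
        best = None
        # S54-b: find the two clusters that are closest to each other.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(distance(items[i], items[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges
```

Each entry of the returned list records which two clusters were merged and at what distance, so cutting the list at a distance threshold yields a flat clustering.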

図６はデータベースサーバが作成する登録データ格納位置管理テーブル６０とクラスタ管理テーブル６２のデータ構成を例示する図である。データベースサーバ４０の登録部４２０は、登録クライアント２０から暗号化データ、秘匿化インデックス、特徴量の組（登録データ）を受信すると、登録データ格納位置管理テーブル６０を作成して記憶装置４１６に格納する。 FIG. 6 is a diagram illustrating a data configuration of the registered data storage location management table 60 and the cluster management table 62 created by the database server. When the registration unit 420 of the database server 40 receives the encrypted data, the concealment index, and the feature amount set (registration data) from the registration client 20, the registration unit 420 creates the registration data storage location management table 60 and stores it in the storage device 416. .

登録データ格納位置管理テーブル６０は、登録データを一意に識別する登録データＩＤを格納する登録ＩＤカラム６００、受信した暗号化データの記憶装置４１６内の格納場所を記録する暗号化データカラム６０２、秘匿化インデックスの記憶装置４１６内の格納場所を記録する秘匿化インデックスカラム６０４、特徴量の記憶装置４１６内の格納場所を記録する特徴量カラム６０６、暗号化データ等を登録した登録クライアント２０の識別情報を格納する登録クライアントカラム６０８およびその他必要な事項を格納するカラム６０８を備える。 The registration data storage location management table 60 includes a registration ID column 600 that stores a registration data ID uniquely identifying the registration data, an encrypted data column 602 that records where the received encrypted data is stored in the storage device 416, a concealment index column 604 that records where the concealment index is stored in the storage device 416, a feature amount column 606 that records where the feature amount is stored in the storage device 416, a registration client column 608 that stores identification information of the registration client 20 that registered the encrypted data and related items, and a column 608 that stores other necessary items.

登録部４２０は、登録データが追加される度に値を１ずつ増やすなどして、登録データを一意に識別できるように登録データＩＤを発行する。 The registration unit 420 issues a registration data ID so that the registration data can be uniquely identified by increasing the value by one each time registration data is added.

暗号化データカラム６０２、秘匿化インデックスカラム６０４および特徴量カラム６０６に記録する情報として、暗号化データなどのファイル名や記憶装置４１６内のセクタアドレスなどがある。なお、特徴量は暗号化データなどと比較してデータ量が少ないため、直接特徴量カラム６０６に格納してもよい。 Information recorded in the encrypted data column 602, the concealment index column 604, and the feature amount column 606 includes a file name such as encrypted data and a sector address in the storage device 416. Note that since the feature amount is smaller than the encrypted data or the like, the feature amount may be directly stored in the feature amount column 606.

登録クライアントカラム６０８に格納すべき情報として、暗号化データ等を登録した登録クライアント２０のＩＰアドレスがある。この情報は認証部４１８が図５のステップＳ５２０で取得しており、検索クライアント３０と登録クライアント２０の間で鍵共有をする場合に必要になる。鍵共有処理の詳細については後で図７を用いて説明する。 Information to be stored in the registration client column 608 includes the IP address of the registration client 20 that registered the encrypted data and related items. The authentication unit 418 acquires this information in step S520 of FIG. 5, and it is needed when the search client 30 and the registration client 20 share a key. Details of the key sharing process will be described later with reference to FIG. 7.

さらに、カラム６１０に格納すべき情報として、例えばデータの登録日時がある。 Furthermore, the column 610 may store, for example, the date and time at which the data was registered.

クラスタリング部４３０は特徴量を用いてクラスタリングした結果をクラスタ管理テーブル６２に記録し、記憶装置４１６に格納する。そのクラスタ管理テーブル６２は、クラスタを一意に識別するクラスタＩＤを格納するクラスタＩＤカラム６２０、クラスタに属する登録データの登録データＩＤ６００を格納する登録データＩＤカラム６２２およびその他必要な事項を格納するカラム６２４からなる。カラム６２４に格納すべき情報として、例えばＫ平均法におけるクラスタの中心に関する情報などがある。 The clustering unit 430 records the result of clustering based on the feature amounts in the cluster management table 62 and stores the table in the storage device 416. The cluster management table 62 consists of a cluster ID column 620 that stores a cluster ID uniquely identifying each cluster, a registration data ID column 622 that stores the registration data IDs (column 600) of the registration data belonging to the cluster, and a column 624 that stores other necessary items. The column 624 may store, for example, information on the cluster centers used in the K-means method.
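The two tables can be pictured as simple in-memory records. The following is a minimal sketch, not the patented implementation: the class and field names are hypothetical, and only the column numbers in the comments come from the description above.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional


@dataclass
class RegistrationRecord:
    """One row of the registration data storage location management table 60."""
    registration_id: int                 # column 600
    encrypted_data_path: str             # column 602: file name or sector address
    index_path: str                      # column 604: where the concealment index is stored
    feature: bytes                       # column 606: small, so stored inline here
    client_ip: str                       # column 608: registration client identification
    registered_at: Optional[str] = None  # column 610: other items, e.g. registration date


@dataclass
class ClusterRow:
    """One row of the cluster management table 62."""
    cluster_id: int                                             # column 620
    registration_ids: List[int] = field(default_factory=list)   # column 622
    center: Optional[List[float]] = None                        # column 624, e.g. K-means center


class RegistrationTable:
    """Issues registration data IDs by counting up by one per added record."""

    def __init__(self) -> None:
        self._next_id = count(1)
        self.rows: Dict[int, RegistrationRecord] = {}

    def add(self, encrypted_data_path: str, index_path: str, feature: bytes,
            client_ip: str, registered_at: Optional[str] = None) -> int:
        rid = next(self._next_id)
        self.rows[rid] = RegistrationRecord(
            rid, encrypted_data_path, index_path, feature, client_ip, registered_at)
        return rid
```

Storing the feature inline mirrors the remark that feature amounts are small enough to be kept directly in column 606.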

（秘匿検索処理の詳細） (Details of the secret search process)
以上、特徴量算出方法やクラスタリング方法、およびこれらを実現するための装置について説明してきた。これらの方法および装置は、秘匿検索の高速化のために必要な、いわば事前準備に相当する。以下、データベースサーバが行う秘匿検索処理の詳細について説明する。 The feature amount calculation method, the clustering method, and the apparatus for realizing them have been described above. These methods and the apparatus amount to the advance preparation needed to speed up the secret search. The secret search process performed by the database server is described in detail below.

（検索クライアントとデータベースサーバの検索処理） (Search processing by the search client and the database server)
図７は検索クライアントとデータベースサーバの検索処理を例示するシーケンス図である。図７に基づいて当該検索処理を説明するとともに、図２、図３および図６の説明で述べた鍵共有手順や、図３の説明で述べたトラップドア生成手順についても具体的な処理を例示する。 FIG. 7 is a sequence diagram illustrating the search processing performed by the search client and the database server. The search processing is described with reference to FIG. 7, together with concrete examples of the key sharing procedure mentioned in the descriptions of FIGS. 2, 3, and 6 and of the trapdoor generation procedure mentioned in the description of FIG. 3.

検索クライアント３０とデータベースサーバ４０で行われる秘匿検索処理は、大別すると、検索クライアント３０が検索クエリからトラップドアを生成するトラップドア生成処理Ｓ７０と、検索クライアント３０とデータベースサーバ４０との間で検索処理を行う秘匿検索処理Ｓ７２と、登録クライアント２０と検索クライアント３０との間で復号鍵を共有し、暗号データを復号する復号化処理Ｓ７４からなる。 The secret search performed by the search client 30 and the database server 40 is broadly divided into a trapdoor generation process S70, in which the search client 30 generates a trapdoor from a search query; a secret search process S72, in which the search is carried out between the search client 30 and the database server 40; and a decryption process S74, in which the registration client 20 and the search client 30 share a decryption key and the encrypted data is decrypted.

検索クライアント３０の検索部３２０はトラップドア生成処理Ｓ７０として、ユーザインターフェース３３０を介してユーザから検索クエリを受けとり、トラップドア生成部３２２を制御して検索クエリからトラップドアを生成する。非特許文献１に基づいて具体的なトラップドア生成処理Ｓ７０を例示すると、以下のようになる。 As the trapdoor generation process S70, the search unit 320 of the search client 30 receives a search query from the user via the user interface 330 and controls the trapdoor generation unit 322 to generate a trapdoor from the search query. A concrete example of the trapdoor generation process S70, based on Non-Patent Document 1, is as follows.

（Ｓ７０−１）秘匿化インデックス生成処理（Ｓ５０２−２）で用いたハッシュ関数を準備する。 (S70-1) The hash function used in the concealment index generation process (S502-2) is prepared.

（Ｓ７０−２）当該ハッシュ関数を用いて検索クエリ（検索キーワード）のハッシュ値を計算する（検索キーワードの秘匿化）。これがトラップドアとなる。 (S70-2) The hash value of the search query (search keyword) is calculated using the hash function (search keyword concealment). This is the trap door.

検索クエリのハッシュ値を用いるので、トラップドアから検索クエリを特定することは困難となる。こうして求めたトラップドアと秘匿化インデックスとの照合方法については、後の図９の説明で明らかにする。 Because the hash value of the search query is used, it is difficult to identify the search query from the trapdoor. How the resulting trapdoor is collated with a concealment index is explained later in the description of FIG. 9.
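Steps S70-1 and S70-2 can be sketched in a few lines. This is an illustration under assumptions: the description only says that the same hash function as in index generation is used, so the choice of SHA-256, the 256-bit width, and the function names are all hypothetical, as is the question of whether the hash is keyed.

```python
import hashlib
from typing import Iterable, List

N_BYTES = 32  # assumed width n of a trapdoor (256 bits); not specified in the text


def trapdoor(keyword: str) -> bytes:
    """S70-2: hash the search keyword; the hash value serves as the trapdoor."""
    return hashlib.sha256(keyword.encode("utf-8")).digest()[:N_BYTES]


def trapdoors(keywords: Iterable[str]) -> List[bytes]:
    """One trapdoor per keyword, as needed for AND/OR searches with several keywords."""
    return [trapdoor(k) for k in keywords]
```

The same keyword always yields the same trapdoor, which is what later allows the server to collate it against concealment indexes without seeing the keyword itself.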

秘匿検索処理Ｓ７２は以下の手順で進行する。 The secret search process S72 proceeds in the following procedure.

（Ｓ７２０）検索クライアント３０とデータベースサーバ４０との間でデータ授受のための通信路を確立する。具体的には、まず認証部４１８が通信インターフェース３３２，４６２やユーザインターフェース３３０を介して検索クライアント３０のユーザとのＩＤ、及びパスワードなどでユーザの認証を行う。あらかじめ登録された正規のユーザと判断した場合、認証部４１８は、検索クライアント３０の検索部３２０とデータベースサーバ４０の検索部４４０との間で通信路を確立する。正規のユーザでないと判断した場合は通信路を確立せず、処理を終了する。 (S720) A communication path for data exchange is established between the search client 30 and the database server 40. Specifically, the authentication unit 418 first authenticates the user with the ID of the user of the search client 30 and a password via the communication interfaces 332 and 462 and the user interface 330. If it is determined that the user is a registered regular user, the authentication unit 418 establishes a communication path between the search unit 320 of the search client 30 and the search unit 440 of the database server 40. If it is determined that the user is not a legitimate user, the communication path is not established and the process is terminated.

（Ｓ７２２）検索部３２０は通信インターフェース３３２を介してトラップドアをデータベースサーバ４０に送信する。データベースサーバ４０の検索部４４０は通信インターフェース４６２を介して当該トラップドアを受信する。 (S722) The search unit 320 transmits the trap door to the database server 40 via the communication interface 332. The search unit 440 of the database server 40 receives the trap door via the communication interface 462.

（Ｓ７２４）検索部４４０の優先順位算出部４４２は、受信したトラップドアと記憶装置４１６に格納されている秘匿化インデックスの一部と照合を行うことで照合の優先順位を算出する。具体的な優先順位算出手順については次の図８を用いて説明する。 (S724) The priority calculation unit 442 of the search unit 440 calculates the collation priority by collating the received trapdoor with some of the concealment indexes stored in the storage device 416. The concrete priority calculation procedure is described next with reference to FIG. 8.

（Ｓ７２６）検索部４４０はステップＳ７２４で求めた優先順位をもとに照合部４４４を制御して、優先度が高い順にトラップドアと秘匿化インデックスの照合処理を行う。具体的な照合手順については後で図９を用いて説明する。 (S726) The search unit 440 controls the collation unit 444 based on the priority obtained in step S724, and performs collation processing between the trapdoor and the concealment index in descending order of priority. A specific verification procedure will be described later with reference to FIG.

（Ｓ７２８）検索部４４０は、ヒットした暗号化データを検索部３２０に返信する。併せて、登録データ格納位置管理テーブル６０の登録クライアントカラム６０８に格納されている登録クライアント２０のＩＰアドレスも返信する。後の復号化処理Ｓ７４で検索クライアント３０が復号鍵を入手するために必要となるためである。 (S728) The search unit 440 returns the hit encrypted data to the search unit 320. At the same time, the IP address of the registration client 20 stored in the registration client column 608 of the registration data storage location management table 60 is also returned. This is because it is necessary for the search client 30 to obtain the decryption key in the subsequent decryption processing S74.

（Ｓ７３０）認証部４１８は、検索部３２０と検索部４４０との間で確立した通信路を開放する。 (S730) The authentication unit 418 releases the communication path established between the search unit 320 and the search unit 440.

次に復号化処理Ｓ７４について説明する。検索結果として入手した暗号化データを復号するためには、登録クライアント２０から復号鍵を共有しなければならない。鍵共有の方法として、ＳＳＬ(Secure Sockets Layer)で利用されている公開鍵暗号を用いる鍵共有法や、ＩＰＳｅｃ(Security Architecture for Internet Protocol)で利用されているＤＨ(Diffie-Hellman)鍵共有法が知られている。ここでは公開鍵暗号を用いる鍵共有法について具体的手順を説明する。 Next, the decryption process S74 is described. To decrypt the encrypted data obtained as a search result, the search client 30 must obtain the decryption key shared by the registration client 20. Known key sharing methods include the method using public key cryptography employed in SSL (Secure Sockets Layer) and the DH (Diffie-Hellman) key sharing method employed in IPsec (Security Architecture for Internet Protocol). The concrete procedure for the key sharing method using public key cryptography is described here.

（Ｓ７４０）鍵共有部３２４はデータベースサーバ４０から受信した検索結果から、暗号化データ等を登録した登録クライアント２０の、ＩＰアドレスなどといった識別情報を取り出す。登録クライアント２０は当該暗号化データの復号鍵を所有している。復号鍵を共有する前に、まずは検索クライアント３０がなりすましなどをしていない正規のクライアントであることを証明しなければならない。鍵共有部３２４は以下の手順で登録クライアント２０の認証を行う。 (S740) From the search result received from the database server 40, the key sharing unit 324 extracts identification information, such as the IP address, of the registration client 20 that registered the encrypted data and related items. The registration client 20 holds the decryption key for that encrypted data. Before the decryption key is shared, the search client 30 must first prove that it is a legitimate client and not, for example, an impersonator. The key sharing unit 324 carries out this authentication with the registration client 20 in the following procedure.

（Ｓ７４０−１）鍵共有部３２４は登録クライアント２０のＩＰアドレスをもとに通信インターフェース３３２を介して登録クライアント２０に接続する。 (S740-1) The key sharing unit 324 connects to the registration client 20 via the communication interface 332 based on the IP address of the registration client 20.

（Ｓ７４０−２）登録クライアント２０の鍵生成部２１８は、通信インターフェース２３２を介して検索クライアント３０の鍵共有部３２４に証明書を要求する。ここで証明書とは、信頼できる第三者（ＣＡ：Certificate Authority、認証局）が検索クライアント３０の公開鍵に電子署名を施したものである。 (S740-2) The key generation unit 218 of the registration client 20 requests a certificate from the key sharing unit 324 of the search client 30 via the communication interface 232. Here, the certificate is a certificate obtained by applying a digital signature to the public key of the search client 30 by a trusted third party (CA: Certificate Authority).

（Ｓ７４０−３）鍵共有部３２４は当該証明書を登録クライアント２０へ送付する。 (S740-3) The key sharing unit 324 sends the certificate to the registration client 20.

（Ｓ７４０−４）鍵生成部２１８は当該証明書の署名を検証し、検索クライアント３０の公開鍵を取得する。署名の検証に失敗した場合は、証明書が不正であるとして通信路を切断して処理を終了する。 (S740-4) The key generation unit 218 verifies the signature of the certificate and acquires the public key of the search client 30. If the verification of the signature fails, the communication path is cut off because the certificate is invalid, and the process ends.

（Ｓ７４０−５）鍵共有部３２４はメッセージを生成してメッセージダイジェストを付加し、鍵共有部３２４の持つ秘密鍵で暗号化して、鍵生成部２１８へ送信する。 (S740-5) The key sharing unit 324 generates a message, adds a message digest, encrypts it with the secret key held by the key sharing unit 324, and transmits the encrypted message to the key generation unit 218.

（Ｓ７４０−６）鍵生成部２１８は、鍵共有部３２４の公開鍵を使ってメッセージを解読する。解読したメッセージからメッセージダイジェストを作成し、鍵共有部３２４が付加したメッセージダイジェストと比較する。双方のメッセージダイジェストの一致が確認されれば正規の検索クライアント３０から改ざんされていないメッセージを受信したと判定され、認証が完了する。そうでない場合は、検索クライアント３０は正規のクライアントではないと判定され、通信路を切断し、処理を終了する。 (S740-6) The key generation unit 218 decrypts the message using the public key of the key sharing unit 324. A message digest is created from the decrypted message and compared with the message digest added by the key sharing unit 324. If a match between both message digests is confirmed, it is determined that a message that has not been tampered with has been received from the authorized search client 30, and authentication is completed. Otherwise, it is determined that the search client 30 is not a regular client, the communication path is disconnected, and the process ends.

（Ｓ７４０−１）から（Ｓ７４０−６）で述べた認証手順では、登録クライアント２０のユーザの意思にかかわらず、ＣＡに正規の証明書を発行してもらった正規の検索クライアント３０は全て、暗号化データを復号し得ることになる。復号鍵の送付先を限定するためには、ステップＳ７４０−１で検索クライアント３０から接続された際や、ステップＳ７４０−４において証明書を検証した際に、鍵生成部２１８が接続元の情報や証明書などから検索クライアント３０の識別情報も読み取り、所定の検索クライアント以外に復号鍵を送信しないように通信路を切断するといった方法をとればよい。復号鍵の送信先の指定は、ユーザがユーザインターフェース２３０を介して行うことができる。 In the authentication procedure described in (S740-1) to (S740-6), every legitimate search client 30 that has been issued a valid certificate by the CA can decrypt the encrypted data, regardless of the intention of the user of the registration client 20. To restrict the recipients of the decryption key, the key generation unit 218 may also read the identification information of the search client 30 from the connection source information, the certificate, or the like, either when the search client 30 connects in step S740-1 or when the certificate is verified in step S740-4, and disconnect the communication path so that the decryption key is not sent to anyone other than predetermined search clients. The user can specify the recipients of the decryption key via the user interface 230.

認証が完了した後、検索クライアント３０は、以下の手順で登録クライアント２０から復号鍵を入手する。 After the authentication is completed, the search client 30 obtains a decryption key from the registration client 20 in the following procedure.

（Ｓ７４２）登録クライアント２０の鍵生成部２１８は、証明書から取得した公開鍵で自己の有する復号鍵を暗号化して、通信インターフェース２３２を介して検索クライアント３０の鍵共有部３２４に送信する。鍵共有部３２４は当該暗号化された復号鍵を自己の秘密鍵で復号して、所望の復号鍵を得る。 (S742) The key generation unit 218 of the registration client 20 encrypts the decryption key that it has with the public key acquired from the certificate, and transmits it to the key sharing unit 324 of the search client 30 via the communication interface 232. The key sharing unit 324 decrypts the encrypted decryption key with its own secret key to obtain a desired decryption key.

（Ｓ７４４）復号化部３２６がＳ７４２で入手した復号鍵を用いて暗号データを復号して、検索処理が完了する。 (S744) The decryption unit 326 decrypts the encrypted data using the decryption key obtained in S742, and the search process is completed.
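The core of steps S740-5, S740-6, and S742 can be illustrated with textbook RSA. This is a toy sketch under loud assumptions: the description does not name a public key algorithm, the parameters below are built from two well-known Mersenne primes purely for illustration, there is no padding, and certificate handling (S740-2 to S740-4) is omitted. It signs the message digest rather than encrypting the whole message, a common simplification of the step described. All function names are hypothetical, and none of this is secure for real use.

```python
import hashlib

# Toy key pair of the search client (illustration only, NOT secure).
P, Q = 2**127 - 1, 2**89 - 1            # two known Mersenne primes
N = P * Q                               # public modulus
E = 65537                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))       # private exponent (Python 3.8+)


def digest_int(message: bytes) -> int:
    """Message digest as a 128-bit integer, small enough to be below N."""
    return int.from_bytes(hashlib.sha256(message).digest()[:16], "big")


def sign(message: bytes) -> int:
    """S740-5: the search client transforms the digest with its private key."""
    return pow(digest_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """S740-6: the registration client recovers the digest with the public
    key and compares it with a digest it computes from the message itself."""
    return pow(signature, E, N) == digest_int(message)


def wrap_key(decryption_key: bytes) -> int:
    """S742: the registration client encrypts the decryption key with the
    search client's public key."""
    return pow(int.from_bytes(decryption_key, "big"), E, N)


def unwrap_key(wrapped: int, length: int) -> bytes:
    """S742: the search client recovers the decryption key with its private key."""
    return pow(wrapped, D, N).to_bytes(length, "big")
```

A production system would rely on an established protocol such as TLS rather than hand-rolled RSA; the sketch only shows which key is used in which direction.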

以上の手順により検索クライアント３０のユーザは、データベースサーバ４０の管理者や通信路上の第三者に検索クエリや検索結果の内容を事実上知られることなく、所望の検索結果を入手することができる。 Through the above procedure, the user of the search client 30 can obtain the desired search result while keeping the content of the search query and of the search result effectively hidden from the administrator of the database server 40 and from third parties on the communication path.

（優先順位決定） (Priority determination)
図８はデータベースサーバ４０が行う優先順位算出Ｓ７２４の処理手順を例示するフローチャートである。図８に基づいて検索クライアント３０からトラップドアを受信してから行う優先順位算出処理について以下に説明する。図５で述べた秘匿化インデックスとトラップドアとの照合についても、具体的な処理を例示する。なお、以下の処理はすべてデータベースサーバ４０の検索部４４０の優先順位算出部４４２が行う。 FIG. 8 is a flowchart illustrating the procedure of the priority calculation S724 performed by the database server 40. The priority calculation process performed after a trapdoor is received from the search client 30 is described below with reference to FIG. 8. A concrete example is also given of the collation between a concealment index and a trapdoor mentioned in the description of FIG. 5. All of the following processing is performed by the priority calculation unit 442 of the search unit 440 of the database server 40.

（Ｓ８００）クラスタをカウントするための変数ｐに１を設定する。 (S800) 1 is set to a variable p for counting clusters.

（Ｓ８０２）クラスタＩＤがｐとなるクラスタに属する全登録データから代表データを選択する。代表データは、当該クラスタに属する全登録データからランダムに選択してもいいし、Ｋ平均法を用いた場合には、当該クラスタの中心に最も近い登録データを代表データとしてもよい。例えば、所定の順序でクラスタ内の登録データを並べた時に、全体の順序の中心付近に位置するデータを代表データとする。 (S802) Representative data is selected from all the registration data belonging to the cluster whose cluster ID is p. The representative data may be chosen at random from all the registration data belonging to the cluster or, when the K-means method is used, the registration data closest to the center of the cluster may be taken as the representative data. Alternatively, for example, when the registration data in the cluster is arranged in a predetermined order, the data located near the middle of that order may be used as the representative data.

（Ｓ８０４）変数ｐが全クラスタ数よりも小さい場合、ｐを１増やして（Ｓ８０６）、ステップＳ８０２に戻り、次のクラスタについて同様の処理を行う。そうでない場合はステップＳ８１０へ進む。 (S804) If the variable p is smaller than the total number of clusters, p is incremented by 1 (S806), the process returns to step S802, and the same processing is performed for the next cluster. Otherwise, the process proceeds to step S810.

（Ｓ８１０）クラスタをカウントするための変数ｑに１を設定する。 (S810) 1 is set to the variable q for counting clusters.

（Ｓ８１２）クラスタＩＤがｑとなるクラスタの代表データの秘匿化インデックスとトラップドアとを照合する。非特許文献１に基づいて具体的な照合方法を例示すると以下のようになる。秘匿化インデックスとトラップドアの照合の処理手順を図１４に示す。なお、図１３の（Ｓ５０２−１）から（Ｓ５０２−４）で求められた秘匿化インデックスＨｉは、預託データ内の単語Ｗｉから生成したハッシュ値ｈｉと、乱数列ｒｉとそのメッセージダイジェストｄｉを連結したものとの排他的論理和で生成されたものとする（ｉ＝１，２，．．．）。また、（Ｓ７０−１）から（Ｓ７０−２）の手順で求めたトラップドアをｈ′とする。なお、添え字「ｉ」は、預託データ内に含まれるそれぞれの単語に対する識別子である。 (S812) The concealment index of the representative data of the cluster whose cluster ID is q is collated with the trapdoor. A concrete collation method based on Non-Patent Document 1 is as follows; FIG. 14 shows the processing procedure for collating a concealment index with a trapdoor. The concealment index Hi obtained in (S502-1) to (S502-4) of FIG. 13 is assumed to have been generated as the exclusive OR of the hash value hi generated from the word Wi in the deposit data and the concatenation of the random number sequence ri and its message digest di (i = 1, 2, ...). The trapdoor obtained by the procedure of (S70-1) to (S70-2) is denoted h′. The subscript "i" identifies each word contained in the deposit data.

（Ｓ８１２−１）各単語Ｗｉについて、秘匿化インデックスＨｉとトラップドアｈ′との排他的論理和を取る。 (S812-1) For each word Wi, the exclusive OR of the concealment index Hi and the trap door h 'is calculated.

（Ｓ８１２−２）ステップＳ５０２−３と同様に、当該排他的論理和（ビット列Ｓ′ｉ）の先頭ｃビットのビット列ｒ′ｉから、図１３のステップＳ５０２−３と同じ所定の演算を行なってメッセージダイジェストＤｉを計算し、このＤｉを、当該排他的論理和の後半ｎ−ｃビットのビット列ｄ′ｉと比較する。 (S812-2) As in step S502-3, a message digest Di is calculated from the bit string r′i, the first c bits of the exclusive OR (bit string S′i), by the same predetermined operation as in step S502-3 of FIG. 13, and Di is compared with the bit string d′i, the remaining n−c bits of the exclusive OR.

（Ｓ８１２−３）もしｈｉ＝ｈ′、即ち、元の単語Ｗｉと検索キーワードとが一致するならば、ステップＳ８１２−１の排他的論理和を取ることで、乱数列ｒｉとそのメッセージダイジェストｄｉだけが残るはずである。よって、ｒ′ｉのメッセージダイジェストＤｉがｄ′ｉに一致すれば、ｈｉはｈ′に等しく、秘匿化インデックスに対応する預託データはトラップドアに対応する検索クエリ（検索キーワード）を含んでいると判断できる。以下この事象を単純に検索にヒットしたという。一致しない場合、秘匿化インデックスに対応する預託データはトラップドアに対応する検索クエリを含んでいないと判断する。 (S812-3) If hi = h′, that is, if the original word Wi and the search keyword are the same, the exclusive OR taken in step S812-1 cancels the hash value, so that only the random number sequence ri and its message digest di should remain. Therefore, if the message digest Di of r′i matches d′i, hi is judged to be equal to h′, and the deposit data corresponding to the concealment index is judged to contain the search query (search keyword) corresponding to the trapdoor. This event is hereinafter simply called a search hit. If they do not match, the deposit data corresponding to the concealment index is judged not to contain the search query corresponding to the trapdoor.

ここで、図１４を参照しながら、ステップＳ８１２−２、８１２−３における検索クエリとトラップドアとの一致判定のアルゴリズムを説明する。 Here, an algorithm for determining whether or not the search query matches the trapdoor in steps S812-2 and 812-3 will be described with reference to FIG.

排他的論理和を「ＸＯＲ」、集合Ａの補集合（否定）を「¬」、論理和を「＋」、論理積を「・」、２つのビット列の結合を「｜」とすると、一般に、３つの集合の排他的論理和は、（ＡＸＯＲＢ）ＸＯＲＣ＝Ｘ・Ｂ＋¬Ｘ・¬Ｂ、かつ、Ｘ＝¬（ＡＸＯＲＣ）、¬Ｘ＝（ＡＸＯＲＣ）となる。ここで、Ａ＝ｈｉ（単語ｗｉのハッシュ値）、Ｂ＝（ｒｉ｜ｄｉ）（乱数列ｒｉとメッセージダイジェストｄｉからなるビット列Ｓｉ）、Ｃ＝ｈ′（トラップドア）とすると、Ｘ＝ｈｉＸＯＲ ¬ｈ′、¬Ｘ＝ｈｉＸＯＲｈ′となる。特に、ｈｉ＝ｈ′の場合、Ｘ＝１、¬Ｘ＝０となる。 When the exclusive OR is “XOR”, the complement of the set A (negation) is “¬”, the logical sum is “+”, the logical product is “•”, and the combination of two bit strings is “|”, The exclusive OR of the three sets is (A XOR B) XOR C = X · B + ¬X · ¬B, X = ¬ (A XOR C), and ¬X = (A XOR C). Here, if A = hi (hash value of word wi), B = (ri | di) (bit string Si composed of random number sequence ri and message digest di), C = h ′ (trap door), X = hi XOR ¬h 'and ¬X = hi XOR h'. In particular, when hi = h ′, X = 1 and ¬X = 0.

秘匿化インデックスＨｉとトラップドアｈ′との排他的論理和を、ＨｉＸＯＲｈ′＝（ｒ′ｉ｜ｄ′ｉ）（乱数列とメッセージダイジェストからなるビット列Ｓ′ｉ）とおいて（Ｓ８１２−１）、この左辺のＨｉに上記のＨｉの表現を代入すると、ｒ′ｉ＝Ｘ・ｒｉ＋¬Ｘ・¬ｒｉ、及びｄ′ｉ＝Ｘ・ｄｉ＋¬Ｘ・¬ｄｉとなる。 Let the exclusive OR of the concealment index Hi and the trapdoor h′ be Hi XOR h′ = (r′i | d′i) (bit string S′i consisting of a random number sequence and a message digest) (S812-1). Substituting the above expression for Hi into the left-hand side gives r′i = X·ri + ¬X·¬ri and d′i = X·di + ¬X·¬di.

もし、ｈｉ＝ｈ′ならば、Ｘ＝１、¬Ｘ＝０であるから、ｒ′ｉ＝ｒｉ、及びｄ′ｉ＝ｄｉとなる。従って、トラップドアによるメッセージダイジェストＤｉ＝ｆ（ｒ′ｉ）＝ｆ（ｒｉ）＝ｄｉ（Ｓ８１２−２）、さらに、ｄ′ｉ＝ｄｉであるから、Ｄｉ＝ｄ′ｉとなる（Ｓ８１２−３）。 If hi = h ′, since X = 1 and ¬X = 0, r′i = ri and d′ i = di. Therefore, the message digest Di = f (r′i) = f (ri) = di (S812-2) by the trap door, and since d′ i = di, Di = d′ i (S812-3). ).

以上のことから、ｈｉ＝ｈ′、即ち、単語Ｗｉ＝検索キーワードであれば、Ｄｉ＝ｄ′ｉとなる（Ｓ８１２−３）。 From the above, if hi = h ′, that is, if the word Wi = search keyword, then Di = d′ i (S812-3).
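The Boolean identity used in this derivation, (A XOR B) XOR C = X·B + ¬X·¬B with X = ¬(A XOR C), can be checked numerically on random bit strings. The sketch below is only a sanity check; the 16-bit width and the function names are arbitrary choices made for the illustration.

```python
import random

WIDTH = 16
MASK = (1 << WIDTH) - 1   # keeps negations within WIDTH bits


def xor3_via_x(a: int, b: int, c: int) -> int:
    """Evaluate X·B + ¬X·¬B with X = ¬(A XOR C), bit by bit."""
    x = ~(a ^ c) & MASK
    not_x = (a ^ c) & MASK
    return (x & b) | (not_x & (~b & MASK))


def identity_holds(trials: int = 1000) -> bool:
    """Compare both sides of the identity on random WIDTH-bit inputs."""
    for _ in range(trials):
        a, b, c = (random.getrandbits(WIDTH) for _ in range(3))
        if (a ^ b) ^ c != xor3_via_x(a, b, c):
            return False
    return True
```

When A equals C (the case hi = h′), X is all ones and the expression reduces to B, which is exactly the statement that r′i = ri and d′i = di.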

乱数列と排他的論理和の性質を利用することで、平文と暗号文が1対1とならない場合でもトラップドアとの照合を行うことができる。詳細については非特許文献１参照のこと。なお、クラスタ内の代表データ（登録データ）、特に、登録データ内の秘匿化インデックスを「ピボット」とも言う。 By using the property of the random number sequence and exclusive OR, even when the plaintext and ciphertext are not one-to-one, it is possible to collate with the trapdoor. See Non-Patent Document 1 for details. Note that the representative data (registered data) in the cluster, in particular, the concealment index in the registered data is also called “pivot”.
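Index generation and the collation of steps S812-1 to S812-3 can be put together in a short sketch. The construction Hi = hi XOR (ri | di) and the check Di = d′i come from the description; everything else is an assumption made for illustration: SHA-256 as the hash, a prefixed SHA-256 as the digest operation f, n = 256 bits, c = 128 bits, and all function names.

```python
import hashlib
import os

N_BYTES = 32   # assumed total width n (256 bits)
C_BYTES = 16   # assumed width c of the random sequence (128 bits)


def _hash(word: str) -> bytes:
    """Hash value h of a word or keyword, n bits wide."""
    return hashlib.sha256(word.encode("utf-8")).digest()[:N_BYTES]


def _digest(r: bytes) -> bytes:
    """Stand-in for the predetermined operation f: a digest of n - c bits."""
    return hashlib.sha256(b"digest" + r).digest()[:N_BYTES - C_BYTES]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_index(word: str) -> bytes:
    """Concealment index H = h XOR (r | d), with a fresh random r per call."""
    r = os.urandom(C_BYTES)
    return _xor(_hash(word), r + _digest(r))


def trapdoor(keyword: str) -> bytes:
    """Trapdoor h' is the hash of the search keyword."""
    return _hash(keyword)


def matches(index: bytes, td: bytes) -> bool:
    s = _xor(index, td)                          # S812-1: S' = H XOR h'
    r_prime, d_prime = s[:C_BYTES], s[C_BYTES:]  # first c bits, remaining n - c bits
    return _digest(r_prime) == d_prime           # S812-2 and S812-3: D == d' ?
```

Because a fresh random sequence is drawn each time, two indexes of the same word differ, yet both match the word's trapdoor. A mismatching keyword passes the check only if a random-looking bit string happens to satisfy the digest relation, which is negligible at these widths.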

（Ｓ８１４）秘匿化インデックスとトラップドアとの合致率が大きいほど優先度が高くなるよう、クラスタＩＤがｑとなるクラスタの優先度を算出する。トラップドアが１つの場合、合致度は検索にヒットしたか否かの二値で与えられる。複数の検索キーワードを指定してこれらのａｎｄ（論理積）検索やｏｒ（論理和）検索を行った場合、トラップドアは複数となるため、合致度は、検索にヒットしたトラップドアの個数と全トラップドアとの比で与えられる。 (S814) The priority of the cluster whose cluster ID is q is calculated so that the higher the match rate between the concealment index and the trapdoor, the higher the priority. When there is one trapdoor, the degree of match is given as a binary value indicating whether or not the search is hit. When these AND (logical product) search or or (logical sum) search is performed by specifying multiple search keywords, there are multiple trap doors, so the degree of match is the number of trap doors that hit the search and the total number of trap doors. It is given as a ratio to the trapdoor.

（Ｓ８１６）変数ｑが全クラスタ数よりも小さい場合、ｑを１増やして（Ｓ８１８）、ステップＳ８１２に戻り、次のクラスタについて同様の処理を行う。そうでない場合はステップＳ８２０へ進む。 (S816) If the variable q is smaller than the total number of clusters, q is incremented by 1 (S818), the process returns to step S812, and the same processing is performed for the next cluster. Otherwise, the process proceeds to step S820.

（Ｓ８２０）優先順位算出部４４２は、優先度が高い順になるようにクラスタＩＤをソートし、当該結果をメモリ４１４または記憶装置４１６に出力する。以上で優先順位算出処理を終了する。 (S820) The priority order calculation unit 442 sorts the cluster IDs so that the priorities are in descending order, and outputs the results to the memory 414 or the storage device 416. This completes the priority order calculation process.
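The priority calculation of steps S810 to S820 amounts to scoring one pivot per cluster and sorting. The sketch below assumes the pivots have already been chosen (S800 to S806) and takes the collation routine as a parameter; `matches`, `pivots`, and the function names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

Index = TypeVar("Index")
Trapdoor = TypeVar("Trapdoor")


def match_rate(index: Index, trapdoors: Sequence[Trapdoor],
               matches: Callable[[Index, Trapdoor], bool]) -> float:
    """S814: share of trapdoors that hit the given concealment index.
    With a single trapdoor this is simply 0.0 or 1.0."""
    hits = sum(1 for td in trapdoors if matches(index, td))
    return hits / len(trapdoors)


def rank_clusters(pivots: Dict[int, Index], trapdoors: Sequence[Trapdoor],
                  matches: Callable[[Index, Trapdoor], bool]) -> List[int]:
    """S812 to S820: score each cluster by its pivot, then sort the
    cluster IDs so that higher match rates come first."""
    priority = {cid: match_rate(pivot, trapdoors, matches)
                for cid, pivot in pivots.items()}
    return sorted(priority, key=lambda cid: priority[cid], reverse=True)
```

For a quick experiment, plain sets of words can stand in for concealment indexes, with `lambda idx, td: td in idx` as the collation routine.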

（クラスタの優先順位に基づく登録データの照合）
図９は、データベースサーバ４０が行う秘匿化インデックスとトラップドアの照合処理（図７のステップＳ７２６）の手順を例示するフローチャートである。本発明の実施の形態では、預託データの特徴量を用いて照合対象の優先順位を定めてから照合を順次行い、一定回数で照合を打ち切ることで、安全性や検索精度の低下を抑えつつ秘匿検索処理の高速化を実現することを特徴とする。以下、照合を行う回数を照合回数とよぶ。当該照合回数はあらかじめ登録クライアント２０のユーザがユーザインターフェース２３０を介して設定しておく。照合回数の設定については後で図１１を用いて説明する。 FIG. 9 is a flowchart illustrating the procedure of the collation process between concealment indexes and a trapdoor (step S726 in FIG. 7) performed by the database server 40. The embodiment of the present invention is characterized in that the priority of the collation targets is determined using the feature amounts of the deposit data, collation is then performed in that order, and collation is cut off after a fixed number of times, which speeds up the secret search while limiting any loss of security or search accuracy. The number of times collation is performed is hereinafter called the number of collations. It is set in advance by the user of the registration client 20 via the user interface 230. Setting the number of collations is described later with reference to FIG. 11.

秘匿化インデックスとトラップドアの照合処理は以下の手順で行われる。なお、以下の処理（Ｓ９０４を除く）はすべて検索部４４０が行う。 The verification process of the concealment index and the trapdoor is performed according to the following procedure. Note that the search unit 440 performs all of the following processing (except S904).

（Ｓ９００）照合回数をカウントする変数ｔを０に、クラスタをカウントする変数ｋを１に設定する。 (S900) A variable t for counting the number of collations is set to 0, and a variable k for counting clusters is set to 1.

（Ｓ９０２）クラスタ内の登録データをカウントする変数ｎを１に設定する。メモリ４１４または記憶装置４１６からステップＳ８２０で優先順位算出部４４２が出力した優先順位を読み込み、ｋ番目に優先順位が高いクラスタＣｋを特定する。 (S902) A variable n for counting registered data in the cluster is set to 1. The priority order output by the priority order calculation unit 442 in step S820 is read from the memory 414 or the storage device 416, and the k-th highest priority cluster Ck is specified.

（Ｓ９０４）照合部４４４は、クラスタＣｋに含まれるｎ番目の登録データの秘匿化インデックスとトラップドアとを照合する。ヒットした場合、秘匿化インデックスとトラップドアの合致率と併せて、対応する登録データＩＤをメモリ４１４または記憶装置４１６に一時的に出力する。ヒットしなかった場合は何も出力しない。 (S904) The collation unit 444 collates the concealment index of the n-th registration data in the cluster Ck with the trapdoor. On a hit, it temporarily outputs the corresponding registration data ID, together with the match rate between the concealment index and the trapdoor, to the memory 414 or the storage device 416. If there is no hit, nothing is output.

（Ｓ９０６）照合回数の変数ｔを１増やす。 (S906) The variable t of the number of verifications is incremented by one.

（Ｓ９０８）もし変数ｔがあらかじめ定めた照合回数より小さい場合は、次のステップＳ９１０に進む。そうでない場合はステップＳ９１８に進み、処理を終了する。 (S908) If the variable t is smaller than the predetermined number of collations, the process proceeds to the next step S910. Otherwise, the process proceeds to step S918, and the process ends.

（Ｓ９１０）登録データをカウントする変数ｎがクラスタＣｋに含まれる全登録データ数よりも小さい場合、ｎを１増やして（Ｓ９１２）、ステップＳ９０４に戻り、クラスタ内の次の登録データについて同様の処理を行う。そうでない場合はステップＳ９１４へ進む。 (S910) When the variable n for counting registered data is smaller than the total number of registered data included in the cluster Ck, n is incremented by 1 (S912), and the process returns to step S904, and the same processing is performed for the next registered data in the cluster. I do. Otherwise, the process proceeds to step S914.

（Ｓ９１４）優先順位を表す変数ｋが全クラスタ数よりも小さい場合、ｋを１増やして（Ｓ９１６）、ステップＳ９０２に戻り、次に優先順位が高いクラスタについて同様の処理を行う。そうでない場合はステップＳ９１８へ進む。 (S914) If the variable k representing the priority order is smaller than the total number of clusters, k is incremented by 1 (S916), the process returns to step S902, and the same processing is performed for the cluster with the next highest priority order. Otherwise, the process proceeds to step S918.

（Ｓ９１８）検索部４４０は、照合部４４４がメモリ４１４または記憶装置４１６に一時的に出力した登録データＩＤに対応する暗号化データを、メモリ４１４または記憶装置４１６に合致率とともに出力する。以上で秘匿化インデックスとトラップドアの照合処理を終了する。 (S918) The search unit 440 outputs the encrypted data corresponding to the registered data ID temporarily output to the memory 414 or the storage device 416 by the matching unit 444 to the memory 414 or the storage device 416 together with the match rate. The concealment index and the trap door verification process are thus completed.
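The loop of steps S900 to S918 can be written compactly. The sketch below is an illustration under assumptions: it handles a single trapdoor, returns only the registration data IDs that hit (the match rate is omitted), and takes the collation routine as a parameter. `members`, `indexes`, and `matches` are hypothetical names.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

Index = TypeVar("Index")
Trapdoor = TypeVar("Trapdoor")


def collate_with_cap(ranked_clusters: Sequence[int],
                     members: Dict[int, List[int]],
                     indexes: Dict[int, Index],
                     trapdoor: Trapdoor,
                     matches: Callable[[Index, Trapdoor], bool],
                     max_collations: int) -> List[int]:
    """Collate in cluster priority order, stopping after max_collations."""
    hits: List[int] = []
    t = 0                                    # S900: number of collations so far
    for cid in ranked_clusters:              # S902, S914, S916: next cluster by priority
        for rid in members[cid]:             # S910, S912: next record in the cluster
            if matches(indexes[rid], trapdoor):   # S904
                hits.append(rid)
            t += 1                           # S906
            if t >= max_collations:          # S908: cut off at the preset count
                return hits                  # S918
    return hits                              # S918: all clusters exhausted
```

Since the number of collations is bounded in advance, the response time is bounded as well, at the cost of possibly missing hits in low-priority clusters.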

例えば、登録クライアント２０が「雲」というキーワードを含む１０００個の預託データと、「雲」を含まない９０００個の預託データを、本実施形態によりデータベースサーバ４０に登録したとする。データベースサーバ４０では、クラスタリング処理により、「雲」を含むクラスタＡと含まないクラスタＢの２つに分解して登録データが管理されることになる。検索クライアント３０が「雲」という検索クエリで検索した場合、検索クエリ（トラップドア）とピボット（秘匿化インデックスの代表）との照合により、クラスタＡの１０００個の登録データが優先的に照合される。よって、仮に１０００回で照合を打ち切ったとしても、「雲」を含む１０００個の全預託データにヒットすることになる。一方、従来の検索可能暗号を用いた秘匿検索システムにおいては、１００００個の全登録データに対して「雲」という検索クエリのトラップドアと照合して、初めて「雲」を含む全預託データにヒットする。従って、この例では、本発明は従来と比較して１０倍検索速度を向上できたといえる。このように、図１から図９を用いて説明した本発明の実施の形態に従って、元のデータを推測しにくい特徴量を用いて秘匿化インデックスをクラスタリングしておくことで、安全性や検索精度の低下を抑えつつ、秘匿検索を高速化することができた。 For example, suppose that the registration client 20 has registered, according to the present embodiment, 1000 items of deposit data that contain the keyword "cloud" and 9000 items that do not. In the database server 40, the clustering process divides the registration data into two clusters for management: cluster A, whose data contains "cloud", and cluster B, whose data does not. When the search client 30 searches with the query "cloud", collating the search query (trapdoor) with the pivots (representative concealment indexes) causes the 1000 items of registration data in cluster A to be collated first. Even if collation is cut off after 1000 times, all 1000 items of deposit data containing "cloud" are hit. In a secret search system using conventional searchable encryption, by contrast, the trapdoor for the query "cloud" must be collated against all 10000 items of registration data before every item containing "cloud" is hit. In this example, therefore, the present invention improves the search speed by a factor of 10 over the conventional approach. Thus, according to the embodiment of the present invention described with reference to FIGS. 1 to 9, clustering the concealment indexes in advance using feature amounts from which the original data is hard to infer makes it possible to speed up the secret search while limiting any loss of security or search accuracy.
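The factor of 10 in this example follows from counting collations. The short sketch below reproduces the arithmetic, assuming that collation cost dominates and that every collation costs the same; the optional `pivot_checks` term, which accounts for the two pivot collations of the priority calculation, is an addition made here for illustration.

```python
def speedup(total_records: int, collations: int, pivot_checks: int = 0) -> float:
    """Ratio of conventional collations (all records) to collations actually
    performed, optionally counting the pivot collations used for ranking."""
    return total_records / (collations + pivot_checks)


# 10000 records in total, collation cut off after the 1000 records of cluster A.
print(speedup(10_000, 1_000))       # 10.0
# Counting one pivot collation for each of the two clusters changes little.
print(speedup(10_000, 1_000, 2))
```

The pivot collations grow with the number of clusters rather than the number of records, so their cost stays small as long as clusters are few compared with records.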

図１０も、データベースサーバ40が行う秘匿化インデックスとトラップドアとの照合処理手順を例示するフローチャートである。図９の例では一定回数で照合を打ち切ることで秘匿検索処理の高速化を実現したが、照合回数ではなく検索にヒットした回数（以下、ヒット回数とよぶ）で打ち切ることでも高速化を実現できる。具体的には、図９のステップＳ９０６およびＳ９０８が以下のステップＳ９０６−ａ，Ｓ９０６−ｂ，Ｓ９０８′に置き換わる。以下の処理はすべて検索部４４０が行う。 FIG. 10 is also a flowchart illustrating the collating process procedure between the concealment index and the trapdoor performed by the database server 40. In the example of FIG. 9, the speed of the confidential search processing is realized by aborting the collation at a fixed number of times. However, the speeding up can also be realized by aborting the search by the number of hits (hereinafter referred to as the hit number) instead of the number of collations. . Specifically, steps S906 and S908 in FIG. 9 are replaced with the following steps S906-a, S906-b, and S908 ′. The search unit 440 performs all the following processing.

（Ｓ９０６−ａ）照合部４４４において検索がヒットしたかどうかを判断する。ヒットしたときＳ９０６−ｂへ進む。ヒットしなかったときステップＳ９０８′へ進む。 (S906-a) The collation unit 444 determines whether or not the search is hit. When a hit is found, the process proceeds to S906-b. If not hit, the process proceeds to step S908 '.

（Ｓ９０６−ｂ）照合回数の変数ｔを１増やす。 (S906-b) The variable t, which counts collations in FIG. 9 and counts hits in this variant, is incremented by one.

（Ｓ９０８′）もし変数ｔがあらかじめ定めたヒット回数より小さい場合は、次のＳ９１０に進む。そうでない場合はＳ９１８に進み、処理を終了する。 (S908 ′) If the variable t is smaller than the predetermined number of hits, the process proceeds to the next S910. Otherwise, the process proceeds to S918, and the process ends.

ヒット回数は、あらかじめ登録クライアント２０のユーザがユーザインターフェース２３０を介して設定しておく。ヒット回数の設定については後で図１１を用いて説明する。 The number of hits is set in advance by the user of the registration client 20 via the user interface 230. The setting of the number of hits will be described later with reference to FIG.

図１０のフローチャートによる方法はあらかじめ設定したヒット回数に達するまで照合を繰り返すため、図９で説明した検索方法と比較して検索漏れが少なくなるという利点がある。その反面、検索にヒットしなければ検索が遅延するという欠点がある。一方、図９で説明した方法は、あらかじめ設定した照合回数しか照合を行わないため、図１０の方法よりも検索漏れが生じやすいという欠点があるが、検索結果のいかんにかかわらず検索応答時間が一定に保たれるという利点がある。 Because the method of the flowchart in FIG. 10 repeats collation until the preset number of hits is reached, it has the advantage of fewer search omissions than the search method described with FIG. 9. Its drawback is that the search takes longer when hits are scarce. The method described with FIG. 9, on the other hand, performs only the preset number of collations, so search omissions are more likely than with the method of FIG. 10, but it has the advantage that the search response time stays constant regardless of the search results.
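The hit-count variant of FIG. 10 differs from FIG. 9 only in what the counter t counts. The sketch below makes the trade-off visible by also returning the number of collations performed. It assumes a single trapdoor and a hit limit of at least 1; `members`, `indexes`, and `matches` are hypothetical names.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

Index = TypeVar("Index")
Trapdoor = TypeVar("Trapdoor")


def collate_until_hits(ranked_clusters: Sequence[int],
                       members: Dict[int, List[int]],
                       indexes: Dict[int, Index],
                       trapdoor: Trapdoor,
                       matches: Callable[[Index, Trapdoor], bool],
                       max_hits: int) -> Tuple[List[int], int]:
    """Collate in cluster priority order until max_hits hits are found.
    Returns the hit IDs and the number of collations that were needed."""
    hits: List[int] = []
    collations = 0
    for cid in ranked_clusters:
        for rid in members[cid]:
            collations += 1
            if matches(indexes[rid], trapdoor):   # S906-a: did the search hit?
                hits.append(rid)                  # S906-b: t counts hits
                if len(hits) >= max_hits:         # S908': preset hit count reached
                    return hits, collations
    return hits, collations                       # fewer hits than requested
```

If the requested number of hits does not exist, the loop runs over every record, which is the delay mentioned above as the drawback of this variant.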

図１１は、検索クライアント３０またはデータベースサーバ４０で行う設定画面を例示する図である。図３の説明で述べた検索クライアント３０の設定部３２８が設定するパラメータとして、例えば照合回数がある。 FIG. 11 illustrates a setting screen displayed on the search client 30 or the database server 40. One parameter set by the setting unit 328 of the search client 30, mentioned in the description of FIG. 3, is, for example, the number of collations.

ダイアログ１１００や１１２０は、設定部３２８が照合回数をユーザに設定させるためにユーザインターフェース３３０を介してユーザに提示する画面の例である。ダイアログ１１００では、スライドバー１１０２を左に動かすほど照合回数が小さくなって検索速度が向上する一方で、検索にヒットする登録データに到達する可能性が下がり、検索精度が低下する。スライドバー１１０２を右に動かすほど照合回数が大きくなって、検索速度が低下する一方、検索精度が向上する。スライドバー１１０２の位置に応じて、照合回数は、設定部３２８が保持する所定の値に設定される。また、ダイアログボックス１１２０の入力ボックス１１２２で、ユーザが直接照合回数を設定することもできる。なお図１０で説明した実施形態においては、上記説明における照合回数をヒット回数に読み替える。 Dialogs 1100 and 1120 are examples of screens that the setting unit 328 presents via the user interface 330 so that the user can set the number of collations. In the dialog 1100, the further the slide bar 1102 is moved to the left, the smaller the number of collations becomes and the faster the search, but the less likely it is that registration data hitting the search is reached, so search accuracy falls. The further the slide bar 1102 is moved to the right, the larger the number of collations becomes, so the search slows down while search accuracy improves. The number of collations is set to a predetermined value held by the setting unit 328 according to the position of the slide bar 1102. The user can also enter the number of collations directly in the input box 1122 of the dialog box 1120. In the embodiment described with reference to FIG. 10, "number of collations" in the above description is to be read as "number of hits".
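One way to picture how a slide bar position maps to a predetermined number of collations is a lookup into a short list of presets. The sketch below is purely illustrative: the preset values, the normalized position range from 0.0 (left) to 1.0 (right), and the function name are all assumptions, since the description does not specify them.

```python
from typing import Sequence

# Hypothetical presets held by the setting unit, ordered from left to right.
PRESETS: Sequence[int] = (100, 500, 1_000, 5_000, 10_000)


def collations_for_slider(position: float, presets: Sequence[int] = PRESETS) -> int:
    """Map a slider position in [0.0, 1.0] to one of the preset counts.
    Left means fewer collations (faster, less accurate); right means more."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0.0 and 1.0")
    slot = min(int(position * len(presets)), len(presets) - 1)
    return presets[slot]
```

In the FIG. 10 variant the same mapping would yield a number of hits instead of a number of collations.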

In another embodiment, the administrator of the database server 40 sets the number of collations (or the number of hits). In this case, the setting unit 450 of the database server 40 presents the dialogs 1100 and 1120 via the user interface 460. By periodically checking the number of registered data items and the state of the clustering, and adjusting the number of collations so that the search response time is not delayed, the administrator of the database server 40 can assure the quality of the secret search service.

10: Network, 20-1 to 20-n, 20: Registration client, 30-1 to 30-m, 30: Search client, 40: Database server,
200, 300, 400: Internal bus, 212, 312, 412: CPU, 214, 314, 414: Memory, 216, 316, 416: Storage device, 230, 330, 460: User interface, 232, 332, 462: Communication interface,
218: Key generation unit, 220: Registration unit, 222: Encryption unit, 224: Concealed index generation unit, 226: Feature amount calculation unit, 228: Setting unit,
320: Search unit, 322: Trapdoor generation unit, 324: Key sharing unit, 326: Decryption unit, 328: Setting unit, 418: Authentication unit, 420: Registration unit, 430: Clustering unit, 432: Similarity calculation unit, 440: Search unit, 442: Priority calculation unit, 444: Collation unit, 450: Setting unit,
S50: Data generation processing, S52: Data transmission/reception processing, S54: Clustering processing,
60: Registered data storage location management table, 600: Registered data ID column, 602: Encrypted data column, 604: Concealed index column, 606: Feature amount column, 608: Registration client column, 610: Column for storing other necessary items, 62: Cluster management table, 620: Cluster ID column, 622: Registered data ID column, 624: Column for storing other necessary items,
S70: Trapdoor generation processing, S72: Secret search processing, S74: Decryption processing,
1100, 1120: Dialog, 1102: Slide bar, 1122: Input box

Claims

A secret search device that receives data from a registration client and receives search information from a search client, the secret search device comprising:
receiving means for receiving, from the registration client, a set of encrypted data obtained by encrypting the data, a concealed index obtained by concealing an index extracted from the data, and a feature amount for calculating a similarity between data;
similarity calculation means for calculating a similarity between two pieces of data based on the feature amount received from the registration client;
clustering means for clustering the encrypted data received from the registration client based on the similarity calculated by the similarity calculation means;
priority calculation means for receiving, from the search client, a trapdoor obtained by concealing a search keyword included in a search query for searching the data registered in the secret search device, and for calculating, based on the clustering result generated by the clustering means, a priority order for collating the clustered encrypted data with the trapdoor;
collation means for collating the concealed index received from the registration client with the trapdoor; and
search means for, when the trapdoor is received from the search client, causing the collation means to collate the encrypted data with the trapdoor a predetermined number of times, in descending order of cluster priority, based on the priority order calculated by the priority calculation means, and for returning the encrypted data that hit the trapdoor to the search client.

The secret search device according to claim 1, wherein the clustering means:
generates one or more clusters and randomly sets a center for each cluster;
for all data received from the registration client, causes the similarity calculation means to calculate the similarity to each center based on the feature amount included in each piece of data, and assigns each piece of data to the cluster to which the most similar center belongs;
ends the processing if the assignment to clusters did not change for any of the data received from the registration client; and
otherwise recalculates the center of each cluster using the feature amounts of the data belonging to that cluster, and then repeats the processing for obtaining the center.
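The clustering steps recited in this claim follow the familiar k-means pattern. The following Python sketch is one possible reading, under stated assumptions: feature amounts are numeric vectors, similarity is taken as negative squared Euclidean distance, and the random centers are drawn from the registered data. None of these choices is fixed by the claim, and the names `kmeans` and `squared_distance` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Smaller distance is treated as higher similarity (assumption)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(
    features: List[Vector], k: int, seed: int = 0
) -> Tuple[List[int], List[List[float]]]:
    rng = random.Random(seed)
    # Set a center for each cluster at random (here: k distinct data points).
    centers = [list(c) for c in rng.sample(list(features), k)]
    assignment = [-1] * len(features)
    while True:
        # Assign every data item to the cluster with the most similar center.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(f, centers[c]))
            for f in features
        ]
        # End the processing when no assignment changed.
        if new_assignment == assignment:
            return assignment, centers
        assignment = new_assignment
        # Otherwise recompute each center from its members' feature amounts.
        for c in range(k):
            members = [features[i] for i, a in enumerate(assignment) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
```

For example, `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)` places the first two items in one cluster and the last two in the other.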

The secret search device according to claim 1, wherein the clustering means:
generates as many clusters as the total number of data items received from the registration client, each cluster containing exactly one data item;
causes the similarity calculation means to calculate the distance between clusters using the feature amounts of the data belonging to each cluster, and successively merges the two clusters that are closest to each other; and
repeats the merging until all items have been merged into a single cluster.
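The bottom-up merging recited in this claim corresponds to agglomerative hierarchical clustering. The sketch below is a minimal illustration that assumes numeric feature vectors and single-linkage distance (the distance between the two closest members); the claim does not specify how the inter-cluster distance is computed, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def agglomerate(features: List[Vector]) -> List[Tuple[List[int], List[int]]]:
    """Return the merge history as pairs of merged clusters (lists of data indices)."""
    # Start with one cluster per data item.
    clusters: List[List[int]] = [[i] for i in range(len(features))]
    merges: List[Tuple[List[int], List[int]]] = []
    while len(clusters) > 1:
        best = None
        # Find the two clusters that are closest to each other.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(
                    squared_distance(features[i], features[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merges.append((clusters[a], clusters[b]))
        merged = clusters[a] + clusters[b]
        # Replace the two merged clusters with their union.
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return merges
```

The merge history can be cut at any level to obtain the desired number of clusters, which is one reason to merge all the way down to a single cluster.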

The secret search device according to claim 1, wherein the priority calculation means:
for each cluster, selects one of the data items belonging to the cluster as representative data;
for each cluster, causes the collation means to collate the concealed index of the representative data of the cluster with the trapdoor;
calculates the priority of each cluster such that the priority is higher the greater the match rate with the trapdoor; and
sorts the collation order of the data contained in the clusters in descending order of priority.
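The ordering step of this claim can be sketched as follows. The `match_rate` callback stands in for collating a representative's concealed index with the trapdoor and is purely hypothetical, as is the choice of the first member of each cluster as its representative.

```python
from typing import Callable, Dict, Hashable, List


def collation_order(
    clusters: Dict[Hashable, List[int]],
    match_rate: Callable[[int], float],
) -> List[int]:
    """Return data IDs in the order in which they should be collated."""
    # Assumption: the first member of each cluster serves as its representative.
    representative = {cid: members[0] for cid, members in clusters.items()}
    # A higher match rate of the representative gives the cluster a higher priority.
    ranked = sorted(
        clusters,
        key=lambda cid: match_rate(representative[cid]),
        reverse=True,
    )
    # Flatten the clusters into a single collation order.
    return [data_id for cid in ranked for data_id in clusters[cid]]
```

Only one collation per cluster is spent on ranking, so the cost of computing the order grows with the number of clusters rather than the number of registered data items.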

The secret search device according to claim 1, wherein the collation means:
takes the exclusive OR of the concealed index and the trapdoor;
takes, from the head of the exclusive OR, a bit string of the same length as the random number sequence generated when the concealed index was generated, and calculates a message digest of the bit string;
determines that the concealed index includes the search query corresponding to the trapdoor if the message digest matches the remaining bit string of the exclusive OR, for which the message digest was not calculated; and
determines that the concealed index does not include the search query corresponding to the trapdoor if they do not match.
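The collation test of this claim can be illustrated with a short Python sketch. Only `collate` mirrors the claimed steps. The way the trapdoor and the concealed index are built here (HMAC-SHA-384 as the trapdoor, SHA-256 as the message digest, a 16-byte random sequence) is an assumption made so that the example is self-contained, and is not taken from the specification.

```python
import hashlib
import hmac
import os

RANDOM_LEN = 16  # length in bytes of the random sequence (assumed)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_trapdoor(key: bytes, keyword: str) -> bytes:
    # Hypothetical trapdoor: a 48-byte keyed hash of the keyword.
    return hmac.new(key, keyword.encode("utf-8"), hashlib.sha384).digest()


def make_concealed_index(key: bytes, keyword: str) -> bytes:
    # Hypothetical index: (r || digest(r)) masked with the keyword's trapdoor.
    r = os.urandom(RANDOM_LEN)
    body = r + hashlib.sha256(r).digest()  # 16 + 32 = 48 bytes
    return xor_bytes(body, make_trapdoor(key, keyword))


def collate(concealed_index: bytes, trapdoor: bytes) -> bool:
    # Take the exclusive OR of the concealed index and the trapdoor.
    x = xor_bytes(concealed_index, trapdoor)
    # The head has the length of the random sequence; the tail is the rest.
    head, tail = x[:RANDOM_LEN], x[RANDOM_LEN:]
    # Match if the digest of the head equals the remaining bit string.
    return hmac.compare_digest(hashlib.sha256(head).digest(), tail)
```

Because a fresh random sequence is drawn for every index, two concealed indexes for the same keyword differ, yet both pass `collate` against that keyword's trapdoor. The server never sees the keyword or the key.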

The secret search device according to claim 1, wherein
the number of collations performed by the search means is set by the search client.

The secret search device according to claim 1, wherein
the number of collations performed by the search means is set by the secret search device.

A computer system in which at least one registration client, a secret search device, and at least one search client are connected to one another via a network, wherein
the registration client, which transmits data to the secret search device, comprises:
encryption means for encrypting the data to be transmitted to the secret search device to generate encrypted data;
concealed index generation means for generating a concealed index by concealing an index extracted from the data;
feature amount calculation means for calculating, from the data, a feature amount for calculating a similarity between data; and
registration means for transmitting a set of the encrypted data, the concealed index, and the feature amount to the secret search device,
the search client, which searches the secret search device, comprises:
trapdoor generation means for generating a trapdoor by concealing a search keyword included in a search query for searching the data registered in the secret search device, and
the secret search device, which receives data from the registration client and receives search information from the search client, comprises:
receiving means for receiving the set of the encrypted data, the concealed index, and the feature amount from the registration client;
similarity calculation means for calculating a similarity between two pieces of data based on the feature amount received from the registration client;
clustering means for clustering the encrypted data received from the registration client based on the similarity calculated by the similarity calculation means;
priority calculation means for receiving the trapdoor from the search client and calculating, based on the clustering result, a priority order for collating the clustered encrypted data with the trapdoor;
collation means for collating the concealed index received from the registration client with the trapdoor; and
search means for, when the trapdoor is received from the search client, causing the collation means to collate the encrypted data with the trapdoor a predetermined number of times, in descending order of cluster priority, based on the priority order, and for returning the encrypted data that hit the trapdoor to the search client.

A secret search method in a secret search device connected via a network to a registration client that transmits data to be registered in the secret search device and to a search client that searches the secret search device, the method comprising:
receiving, from the registration client, a set of encrypted data obtained by encrypting the data, a concealed index obtained by concealing an index extracted from the data, and a feature amount for calculating a similarity between data;
calculating a similarity between two pieces of data based on the feature amount received from the registration client;
clustering the encrypted data received from the registration client based on the calculated similarity;
receiving, from the search client, a trapdoor obtained by concealing a search keyword included in a search query for searching the data registered in the secret search device;
calculating, based on the generated clustering result, a priority order for collating the clustered encrypted data with the trapdoor;
when the trapdoor is received from the search client, collating the encrypted index received from the registration client with the trapdoor a predetermined number of times based on the calculated priority order; and
returning the encrypted data that hit the trapdoor to the search client.