JP2014074959A - Adjacent search processing device, method and program

Info

Publication number: JP2014074959A
Application number: JP2012220719A
Authority: JP
Inventors: Seiji Matsumura; 聖司松村; Hiroki Akama; 浩樹赤間; Yoshinori Matsuo; 嘉典松尾; Naoya Kotani; 尚也小谷; Masashi Yamamuro; 雅司山室
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2012-10-02
Filing date: 2012-10-02
Publication date: 2014-04-24

Abstract

PROBLEM TO BE SOLVED: To provide an adjacent search processing device that, when the vectors are high-dimensional, performs high-accuracy, high-speed search between a query vector and a large-scale vector DB using a kNN graph, by exploiting the high parallel computing performance of many-core processors such as GPU devices. SOLUTION: Graph search is performed using M branches, fewer than the N branches with which the kNN graph was constructed. At the node where the graph search ends, the distances between the query and the branch nodes, in the number used when the kNN graph was constructed, are calculated in parallel, and the node at the shortest distance from the query is selected.

Description

The present invention relates to a neighborhood search processing apparatus, method, and program. In particular, it relates to a neighborhood search processing apparatus, method, and program that, when the vectors are high-dimensional, perform neighborhood search between a query vector and a large-scale vector database using a kNN (k-Nearest Neighbor) graph, exploiting the high parallel computing capability of many-core computing units such as GPU (Graphics Processing Unit) devices.

When the vectors are high-dimensional, the following methods exist for performing high-accuracy, high-speed neighborhood search between a query vector and a large-scale vector DB using a kNN graph, exploiting the high parallel computing capability of many-core computing units such as GPU devices. Here, a kNN graph is a structure used for nearest neighbor search in which each node is connected by edges to its k nearest nodes. If the graph is undirected, the degree of each node is k.
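The kNN graph described above can be illustrated with a minimal brute-force construction. This is only a sketch for small inputs: the function names `euclidean` and `build_knn_graph` and the sample points are assumptions made for illustration, and the source does not say how the graph is actually built.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_knn_graph(vectors, k):
    """Return {node_id: [ids of the k nearest other nodes, nearest first]}.

    Brute force: every node is compared with every other node.
    """
    graph = {}
    for i, v in enumerate(vectors):
        others = [(euclidean(v, w), j) for j, w in enumerate(vectors) if j != i]
        others.sort()  # nearest first; ties are broken by node id
        graph[i] = [j for _, j in others[:k]]
    return graph


# Hypothetical 2-D sample data, not taken from the source.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
graph = build_knn_graph(points, k=2)
```

Keeping each branch list sorted nearest first is convenient for the later steps, which distinguish the branches nearest to a node from the ones farther away.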

[Prior art 1]
One approach performs neighborhood search using a kNN graph on a host computer alone (see, for example, Non-Patent Document 1).

FIG. 1 shows a flowchart of the processing of prior art 1.

In this neighborhood search, the number of branches with which the kNN graph was constructed is also used for the graph search. The distance between the query and the current node and the distances between the query and the branch nodes of the current node are calculated in turn (steps 4 to 7), and the node at the shortest distance from the query is selected (step 8). If the shortest-distance node is the current node, that node is output as the nearest neighbor (step 10). Otherwise, the search hops to the shortest-distance node (step 9), makes the hop destination the current node, and resumes the graph search (steps 4 to 9).
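The hop-until-no-improvement loop of prior art 1 can be sketched as follows. This is an illustrative reading of the flowchart steps, not the cited implementation; the function names, the hand-written graph, and the one-dimensional sample data are assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_search(graph, vectors, query, start):
    """Greedy kNN-graph search that uses every branch of the current node."""
    current = start
    while True:
        # Steps 4 to 7: distance from the query to the current node and its branches.
        candidates = [current] + graph[current]
        # Step 8: pick the node closest to the query (ties keep the current node).
        best = min(candidates, key=lambda n: euclidean(query, vectors[n]))
        if best == current:
            # Step 10: no branch is closer, so output the current node.
            return current, euclidean(query, vectors[current])
        # Step 9: hop and continue from the new current node.
        current = best


# Hypothetical data: five points on a line, each with two hand-picked branches.
vectors = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
graph = {0: [1, 2], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 2]}
node, dist = greedy_search(graph, vectors, query=(3.9,), start=0)
```

In this example the search hops 0, 2, 3, 4 and stops at node 4. Because a hop happens only when a branch is strictly closer to the query, the loop always terminates.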

In this search, speed and accuracy can be adjusted using the number of branches as a parameter. However, prioritizing speed requires reducing the number of branches, which lowers accuracy.

[Prior art 2]
There is also a method in which the vector distance calculations performed to decide whether to hop in the kNN graph are parallelized on a GPU device.

FIG. 2 shows the processing of prior art 2.

This method performs neighborhood search using a kNN graph on a host computer and a GPU device. The number of branches with which the kNN graph was constructed is also used for the graph search. The GPU device calculates, in parallel, the distances between the query and the current node and its branch nodes, and transfers the results to the host computer (step 16). The node at the shortest distance from the query is then selected and passed to the host computer (steps 17 and 18). On the host computer side, if the shortest-distance node is the current node, that node is output as the nearest neighbor (steps 19 and 21). Otherwise, the search hops to the shortest-distance node (step 20), makes the hop destination the current node, and resumes the graph search (steps 16 to 20).

In this graph search, speed and accuracy can be adjusted using the number of branches as a parameter, but prioritizing speed requires reducing the number of branches, which lowers accuracy. Performing the distance calculations in parallel on a GPU device keeps the distance calculation fast even when the number of branches is increased to improve accuracy. However, the vector data of the current node and its branch nodes must be transferred to the GPU device at every hop, and this transfer overhead reduces speed.
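A back-of-the-envelope sketch can make the per-hop transfer cost concrete. It counts the bytes sent to the GPU device when the current node and its branch nodes are transferred at every hop. The branch count (32), the hop count (20), and the 4-byte value size are illustrative assumptions and do not come from the source; 128 dimensions matches the SURF vectors used in Example 1.

```python
def bytes_per_hop(num_branches, dims, bytes_per_value=4):
    """Bytes sent to the device for one hop: the current node plus its branch nodes."""
    return (num_branches + 1) * dims * bytes_per_value


def total_transfer_bytes(num_hops, num_branches, dims, bytes_per_value=4):
    """Bytes sent to the device over a whole search of num_hops hops."""
    return num_hops * bytes_per_hop(num_branches, dims, bytes_per_value)


# Assumed values for illustration only.
per_hop = bytes_per_hop(num_branches=32, dims=128)
total = total_transfer_bytes(num_hops=20, num_branches=32, dims=128)
```

Under these assumptions each hop sends 16,896 bytes and the whole search sends 337,920 bytes. The volume is modest, but it is split into one host-to-device transfer per hop, and every transfer sits on the critical path of the search.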

[Prior art 3]
There is also a method that uses a GPU to calculate, in parallel, the distances between the query vector and every vector in the vector DB, that is, an exhaustive search with parallelized Euclidean distance calculation. Exhaustive search calculates the distance between the query vector and each vector in the DB and takes the minimum of the results, so the search is exact. Furthermore, because the processing is parallelized on the GPU, it is faster than sequential processing on a CPU.
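For reference, the exhaustive search of prior art 3 reduces to the following minimal sketch, written sequentially here. On a GPU the per-vector distance calculations would run in parallel. The function names and the sample data are assumptions for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def full_search(vectors, query):
    """Exact nearest neighbor: compare the query with every vector in the DB."""
    dists = [euclidean(query, v) for v in vectors]
    best = min(range(len(vectors)), key=dists.__getitem__)
    return best, dists[best]


# Hypothetical one-dimensional data.
vectors = [(0.0,), (1.0,), (2.0,), (3.0,)]
best_id, best_dist = full_search(vectors, query=(2.2,))
```

The result is always exact, but the work grows linearly with the number of vectors in the DB.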

However, because exhaustive search calculates the distance to every vector, it is slower than prior arts 1 and 2 even when parallelized on a GPU.

Non-Patent Document 1: Metric-based Shape Retrieval in Large Database (file format: PDF).

As described above, techniques exist for fast nearest neighbor search over large sets of high-dimensional vectors that perform approximate nearest neighbor search using the number of branches with which the kNN graph was constructed. However, with a host-only search as in prior art 1, the search is fast when the number of branches is small, but the number of branches must be increased to ensure accuracy, which reduces speed.

When the distance calculation is parallelized on a GPU device as in prior art 2, the distances to many branches can be calculated quickly and accuracy can be raised, but a transfer between the host and the GPU device occurs at every hop, which lowers processing speed.

The present invention has been made in view of the above points. Its object is to provide a neighborhood search processing apparatus, method, and program that, when the vectors are high-dimensional, realize high-accuracy, high-speed neighborhood search between a query vector and a large-scale vector DB using a kNN graph, exploiting the high parallel computing capability of many-core computing units such as GPU devices.

To solve the above problems, the present invention (claim 1) is a neighborhood search processing apparatus that performs neighborhood search over a group of vectors using a kNN (k-Nearest Neighbor) graph structure, comprising:
pre-processing means for constructing the kNN graph with N branches and setting a graph search start node as the current node;
distance calculation means for selecting, from the N branches of the current node, the M branches nearest to the current node and calculating the distances between the current node and the nodes of the M branches;
shortest distance node selection means for selecting, from the distance calculation results obtained by the distance calculation means, the node at the shortest distance from an input query;
graph search means for, when the selected shortest-distance node is not equal to the current node, hopping to the shortest-distance node and executing the distance calculation means and the shortest distance node selection means; and
nearest neighbor vector extraction means for, when the selected shortest-distance node is equal to the current node, selecting the vector IDs corresponding to the N-M branch nodes farther from the current node, performing parallel distance calculation between the vectors corresponding to the N-M branch nodes and the query vector, and selecting the node at the shortest distance from the query from the N-M+1 calculated distances between the query and the current node and the N-M branch nodes.

The present invention (claim 2) further comprises vector storage means storing a group of vectors, a GPU device, and means for transferring the data of the vector group in the vector storage means to the GPU device, wherein
the nearest neighbor vector extraction means includes
means for transferring the selected vector IDs to the GPU device and obtaining from the GPU device the results of the parallel distance calculation between the vectors corresponding to the N-M branch nodes and the query vector, and
the GPU device includes
parallel distance calculation means for referring to the data of the vector group on the basis of the vector IDs and performing parallel distance calculation between the vectors corresponding to the vector IDs and the query vector.

The present invention (claim 3) further comprises a many-core computing unit, wherein
the nearest neighbor vector extraction means includes
parallel distance calculation instruction means for, when the selected shortest-distance node is equal to the current node, instructing the many-core computing unit to perform parallel distance calculation between the vectors corresponding to the N-M branch nodes farther from the current node and the query vector, and obtaining the calculation results, and
the many-core computing unit includes
means for performing parallel distance calculation between the vectors corresponding to the N-M branch nodes farther from the current node and the query vector.

As described above, according to the present invention, providing neighbor vectors for graph search separately from neighbor vectors for distance calculation when the kNN graph is constructed enables neighborhood search that is faster and more accurate than conventional kNN search.

Thus, even for neighborhood search over large sets of high-dimensional vectors, where processing is costly, a high-accuracy search is possible: a kNN graph is constructed, the number of neighbor vectors used for graph search is separated from the number used for distance calculation after the graph search, and distances to a large number of vectors are calculated once the graph search has finished.

Furthermore, parallelizing the distance calculation after the graph search on a many-core computing unit such as a GPU device enables a high-speed search.

As a result, processing time can be shortened in object recognition systems that handle high-dimensional vectors.

FIG. 1 is a flowchart of prior art 1.
FIG. 2 is a flowchart of prior art 2.
FIG. 3 is a configuration example of a parallelized neighborhood search processing apparatus having a host computer according to the first embodiment of the present invention.
FIG. 4 is a flowchart of the processing of the parallelized neighborhood search processing apparatus according to the first embodiment of the present invention.
FIG. 5 is a configuration example of a parallelized neighborhood search processing apparatus having a GPU according to the second embodiment of the present invention.
FIG. 6 is a flowchart of the processing of the parallelized neighborhood search processing apparatus according to the second embodiment of the present invention.
FIG. 7 is a configuration example of a parallelized neighborhood search processing apparatus having a many-core computing unit according to the third embodiment of the present invention.
FIG. 8 is a flowchart of the processing of the parallelized neighborhood search processing apparatus according to the third embodiment of the present invention.
FIG. 9 is a list of the branches of the current node in Example 1 of the present invention.
FIG. 10 is an example of the ten branches nearest to the current node selected in Example 1 of the present invention.
FIG. 11 is an example of the nearest neighbor vector candidates in Example 1 of the present invention.
FIG. 12 is a list of the branches of the current node in Example 2 of the present invention.
FIG. 13 is an example of the eight branches nearest to the current node selected in Example 2 of the present invention.
FIG. 14 is an example of the nearest neighbor vector candidates in Example 2 of the present invention.
FIG. 15 is an example of a situation in which the true nearest neighbor is not found by the prior art (kNN graph constructed with three branches, N = 3).
FIG. 16 is an example in which the true nearest neighbor can be found by the present invention.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

[First embodiment]
This embodiment describes a method that uses different numbers of neighbor vectors for graph search and for distance calculation.

FIG. 3 is a configuration example of a parallelized neighborhood search processing apparatus having a host computer according to the first embodiment of the present invention.

The parallelized neighborhood search processing apparatus 11 has a host computer 12, and the host computer 12 has a main memory 13 and a CPU 14. Any number of other devices can be connected to the host computer 12.

The main memory 13 holds the vector data of queries input from outside, the calculation results of the CPU 14, and the like.

The CPU 14 has the following functions: constructing a kNN graph with N branches; setting the graph search start node as the current node; selecting, from the N branches of the current node, the M branches nearest to it and calculating the distances between the current node and the nodes of the M branches; selecting, from the distance calculation results, the node at the shortest distance from the input query; determining whether the selected shortest-distance node is equal to the current node; searching the kNN graph; after the graph search, selecting the node at the shortest distance from the query from the N-M+1 calculated distances between the query and the current node and the N-M branch nodes; and outputting the vector ID of the shortest-distance node and the distance between that node and the query.

The operation of the above configuration is described below.

FIG. 4 is a flowchart of the processing of the parallelized neighborhood search processing apparatus according to the first embodiment of the present invention.

In the following, it is assumed that a kNN graph with N branches has been constructed in the main memory 13 of the host computer 12.

Step 101) The CPU 14 of the host computer 12 determines a graph search start node in advance and stores it in the main memory 13.

Step 102) The host computer 12 accepts the input of query vector data from outside and stores it in the main memory 13.

Step 103) For the input vector data, the CPU 14 sets the graph search start node as the current node.

Step 104) The CPU 14 selects, from the N branches of the current node, the M branches nearest to the current node (N > M).

Step 105) The CPU 14 calculates the distances between the query and the current node and between the query and the nodes of its M branches, and stores them in the main memory 13.

Step 106) The CPU 14 selects the node at the shortest distance from the query from the calculation results stored in the main memory 13.

Step 107) If the node at the shortest distance from the query is not equal to the current node, the process proceeds to step 108. If it is equal, the process proceeds to step 109.

Step 108) The CPU 14 hops to the shortest-distance branch node, returns to step 104, and continues the graph search with the hop destination as the current node.

Step 109) The CPU 14 selects the vector IDs corresponding to the N-M branch nodes farther from the current node.

Step 110) The CPU 14 calculates the distances between the query and the current node and the selected N-M branch nodes, and stores them in the main memory 13.

Step 111) The CPU 14 selects the node at the shortest distance from the query from the N-M+1 calculation results stored in the main memory 13.

Step 112) The CPU 14 reads the vector ID of the shortest-distance node and the distance between that node and the query from the main memory 13 and outputs them.
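Steps 103 to 112 can be sketched in a few lines of host-only code. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are invented, each branch list is assumed to be stored nearest first relative to its own node, and the four sample points and the hand-written graph (N = 3, M = 2) are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def two_stage_search(graph, vectors, query, start, m):
    """graph[node] lists that node's N branch ids, nearest first (assumed layout)."""
    current = start                                        # step 103
    while True:
        near = graph[current][:m]                          # step 104
        dists = {n: euclidean(query, vectors[n])           # step 105
                 for n in [current] + near}
        best = min(dists, key=dists.get)                   # step 106 (ties keep current)
        if best == current:                                # step 107
            break
        current = best                                     # step 108
    far = graph[current][m:]                               # step 109
    final = {current: dists[current]}
    final.update((n, euclidean(query, vectors[n])) for n in far)  # step 110
    nearest = min(final, key=final.get)                    # step 111
    return nearest, final[nearest]                         # step 112


# Hypothetical data: node 3 is the farthest branch of node 0 but the closest to the query.
vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.5, -1.5)]
graph = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
nearest, dist = two_stage_search(graph, vectors, query=(-1.0, -1.0), start=0, m=2)
```

With only the M = 2 nearest branches, the walk stops immediately at node 0 because neither node 1 nor node 2 is closer to the query. The final pass over the remaining N-M = 1 branch then finds node 3, which a search limited to the M branches would have missed.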

[Second embodiment]
Next, an example is described in which a GPU device for performing parallel distance calculation is added to the configuration of FIG. 3.

FIG. 5 shows a configuration example of a parallelized neighborhood search processing apparatus having a GPU according to the second embodiment of the present invention.

The parallelized neighborhood search processing apparatus 21 shown in the figure adds a vector DB 25 and a GPU device 26 to the host computer 12 of FIG. 3. The GPU device 26 has a GPU global memory 27 and a GPU 28.

The CPU 24 shown in FIG. 5 has a function of transferring the vector data of the vector DB 25 to the GPU device 26, and a function of instructing the GPU device 26 to perform the parallel distance calculation of the first embodiment between the vectors corresponding to the N-M branch nodes and the query vector and obtaining the calculation results. Its other functions are the same as in the first embodiment.

Upon receiving a parallel distance calculation instruction from the CPU 24, the GPU 28 performs parallel distance calculation between the vectors corresponding to the N-M branch nodes and the query vector and returns the results to the CPU 24.

The operation of the above configuration is described below.

FIG. 6 is a flowchart of the processing of the parallelized neighborhood search processing apparatus according to the second embodiment of the present invention.

In the following, it is assumed that a kNN graph with N branches has been constructed in the main memory 23 of the host computer 22.

Step 201) The host computer 22 transfers the vector DB 25 to the GPU device 26 in advance.

Step 202) The CPU 24 determines a graph search start node.

Step 203) The CPU 24 accepts the input of query vector data from outside and stores it in the main memory 23.

Step 204) For the input vector data, the CPU 24 sets the graph search start node as the current node.

Step 205) The CPU 24 selects, from the N branches of the current node, the M branches nearest to the current node (N > M) and stores them in the main memory 23.

Step 206) The CPU 24 calculates the distances between the query and the current node and between the query and the nodes of its M branches, and stores them in the main memory 23.

Step 207) The CPU 24 selects the node at the shortest distance from the query from the calculation results stored in the main memory 23.

Step 208) If the node at the shortest distance from the query is not equal to the current node, the CPU 24 proceeds to step 209. If it is equal, the process proceeds to step 210.

Step 209) The CPU 24 hops to the shortest-distance branch node, returns to step 205, and continues the graph search with the hop destination as the current node.

Step 210) When the node at the shortest distance from the query is equal to the current node, the CPU 24 selects the vector IDs corresponding to the N-M branch nodes farther from the current node.

Step 211) The CPU 24 transfers the query vector data and the N-M vector IDs selected in step 210 to the GPU device 26.

Step 212) The GPU device 26 performs parallel distance calculation between the vectors corresponding to the N-M nodes and the query vector.

Step 213) The GPU device 26 transfers the N-M distance calculation results obtained in step 212 to the host computer 22.

Step 214) The CPU 24 stores the distance calculation results obtained from the GPU device 26 in the main memory 23 and selects the node at the shortest distance from the query from the N-M+1 distance calculation results between the query and the current node and the selected N-M branch nodes.

Step 215) The CPU 24 outputs the vector ID of the shortest-distance node and the distance between that node and the query.
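The division of labor in steps 201 and 210 to 214 can be sketched as follows. The key point is that the vector DB is made resident on the device once, so the final stage sends only the query vector and N-M vector IDs. No real GPU API is used here: `SimulatedDevice` is a hypothetical stand-in that uses a thread pool in place of a GPU kernel, and all names and sample data are assumptions for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class SimulatedDevice:
    """Hypothetical stand-in for the GPU device; not a real GPU API."""

    def __init__(self, vector_db):
        # Step 201: the whole vector DB is copied to the device in advance.
        self.resident = list(vector_db)

    def parallel_distances(self, query, vector_ids):
        # Step 212: one distance per vector ID; results keep the order of vector_ids.
        with ThreadPoolExecutor() as pool:
            return list(pool.map(
                lambda i: euclidean(query, self.resident[i]), vector_ids))


def final_stage(device, current, current_dist, far_ids, query):
    """Steps 211 to 214: send the query and IDs, receive distances, pick the minimum."""
    far_dists = device.parallel_distances(query, far_ids)   # steps 211 to 213
    candidates = [(current_dist, current)] + list(zip(far_dists, far_ids))
    best_dist, best_id = min(candidates)  # step 214; ties resolve to the lower id
    return best_id, best_dist


# Hypothetical data: the graph search has already stopped at node 0.
vector_db = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.5, -1.5)]
device = SimulatedDevice(vector_db)
query = (-1.0, -1.0)
best_id, best_dist = final_stage(device, 0, euclidean(query, vector_db[0]), [3], query)
```

Unlike prior art 2, nothing is sent to the device while the graph is being walked. The only per-query traffic is the query vector and the ID list in one direction and N-M distances in the other.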

[Third embodiment]
This embodiment describes the case where the GPU device 26 of the second embodiment is a many-core computing unit.

FIG. 7 shows a configuration example of a parallelized neighborhood search processing apparatus having a many-core computing unit according to the third embodiment of the present invention.

In the parallelized neighborhood search processing apparatus 31 shown in the figure, a many-core computing unit 36 is connected to the host computer 32 in the role of the GPU device 26 of FIG. 5. A plurality of many-core computing units 36 can be connected to the host computer 32.

The CPU 34 shown in FIG. 7 has a function of instructing the many-core computing unit 36 to perform the parallel distance calculation of the first embodiment between the vectors corresponding to the N-M branch nodes and the query vector and obtaining the calculation results. Its other functions are the same as in the first embodiment.

Upon receiving a parallel distance calculation instruction from the CPU 34, the many-core computing unit 36 performs parallel distance calculation between the vectors corresponding to the N-M branch nodes and the query vector and returns the results to the CPU 34.

FIG. 8 is a flowchart of the processing of the parallelized neighborhood search processing apparatus according to the third embodiment of the present invention.

Step 301) The host computer 32 determines a graph search start node.

Step 302) The host computer 32 accepts the input of query vector data from outside and stores it in the main memory 33.

Step 303) For the input vector data, the CPU 34 sets the graph search start node as the current node.

Step 304) The CPU 34 selects, from the N branches of the current node, the M branches nearest to the current node (N > M).

Step 305) The CPU 34 calculates the distances between the query and the current node and between the query and the nodes of its M branches.

Step 306) The CPU 34 selects the node at the shortest distance from the query from the calculation results of step 305.

Step 307) If the node at the shortest distance from the query is equal to the current node, the process proceeds to step 309. If it is not equal, the process proceeds to step 308.

Step 308) The CPU 34 hops to the shortest-distance branch node, returns to step 304, and continues the graph search with the hop destination as the current node.

Step 309) The CPU 34 instructs the many-core computing unit 36 to perform parallel distance calculation. The many-core computing unit 36 calculates, in parallel, the distances between the vectors corresponding to the N-M branch nodes farther from the current node and the query vector, and transfers the results to the CPU 34.

Step 310) The CPU 34 stores the distance calculation results of the many-core computing unit 36 in the main memory 33 and selects the node at the shortest distance from the query from the N-M+1 distance calculation results between the query and the current node and the selected N-M branch nodes.

Step 311) The CPU 34 outputs the vector ID of the shortest-distance node and the distance between that node and the query.

なお、本発明では、ホスト及びその内部の装置（メインメモリやＣＰＵ）、ベクトルＤＢが保存されている記憶装置、GPUデバイス、メニーコア演算器は複数搭載されていても適用可能である。 Note that the present invention is also applicable when multiple hosts and their internal devices (main memory and CPU), storage devices storing the vector DB, GPU devices, or many-core computing units are installed.

また、本発明では、kNNグラフのノードの1つがベクトルＤＢ中のベクトル1に対応しているものとする。 In the present invention, it is assumed that each node of the kNN graph corresponds to one vector in the vector DB.

以下に、上記の実施の形態における実施例を説明する。 Examples of the above embodiment will be described below.

［実施例１］ [Example 1]
本実施例では、前述の第２の実施の形態の具体的な例を示す。 This example shows a specific instance of the second embodiment described above.

事前処理として、ベクトルＤＢ２６に保存されている画像群に対して、画像認識プログラムであるSURFアルゴリズムにより、128次元ベクトル群が合計10,000,000ベクトル抽出され、ハードディスク側（ベクトルDB２５）に保存されている。 As pre-processing, a total of 10,000,000 128-dimensional vectors are extracted from the images stored in the vector DB 26 by the SURF algorithm, an image recognition program, and are stored on the hard disk (vector DB 25).

SURFアルゴリズムにより抽出されるベクトルの具体例を以下示す。 A specific example of a vector extracted by the SURF algorithm is shown below.
(0.00771829, 0.00876939, -0.00164389, 0.00242805, 0.00591626, 0.0059704, -0.000699053, 0.0010269, -0.0402349, 0.0478606, -0.00675006, 0.008548, 0.0079681, 0.00937274, 0.0254017, 0.0303479, 0.0105966, 0.0781956, 0.0237573, 0.0313011, -0.0058261, 0.00972548, 0.00640149, 0.00890163, 0.00202366, 0.005785, 0.0008189, 0.00261104, 0.00168162, 0.00313512, 0.000645605, 0.00187556, 0.0156556, 0.0198709, 0.043964, 0.0491846, -0.0457624, 0.0519745, 0.00104955, 0.00352257, -0.0473678, 0.0676495, -0.290302, 0.372653, -0.105448, 0.125615, -0.165742, 0.227193, 0.00130149, 0.33567, 0.171501, 0.368598, -0.0482353, 0.108972, 0.0130523, 0.0544993, -0.00404431, 0.0100914, 0.0318822, 0.05294, -0.00937941, 0.0154139, -0.000685246, 0.00858936, 0.00154014, 0.0031448, 0.00700949, 0.00804435, -0.00668938, 0.00847851, 0.000471806, 0.00180168, 0.0882967, 0.18727, 0.0836585, 0.0885382, -0.00612456, 0.128606, 0.0955226, 0.0993798, -0.122265, 0.198727, 0.124402, 0.35299, -0.0196364, 0.0753894, 0.0210032, 0.155274, -0.00605615, 0.0433227, -0.0162856, 0.0611591, -0.0026333, 0.00693902, 1.00823e-005, 0.00847174, 0.00122396, 0.00126437, -4.30777e-005, 4.30777e-005, 0.000536311, 0.000536311, -2.0572e-005, 2.79183e-005, 0.0566939, 0.0566939, -0.00197521, 0.00197521, 0.0473506, 0.0473506, -0.00099733, 0.00099733, -0.0350208, 0.0410233, -0.00671439, 0.0295273, 0.00189219, 0.003061, 0.0154454, 0.0282393, 0.00100916, 0.00239708, -0.00259997, 0.00458948, 0.00259418, 0.00365359, 0.000663469, 0.00140468)
並列化近傍探索処理装置のCPU２４は、ベクトルＤＢ２５の各ベクトルにおいて、探索用の近傍ベクトル数を10、距離計算用の近傍ベクトル数を512としてkNNグラフを構築する。 The CPU 24 of the parallelized neighborhood search processing device constructs a kNN graph over the vectors of the vector DB 25, with 10 neighbor vectors per node used for search and 512 neighbor vectors per node used for distance calculation.
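The graph construction described here can be pictured with a brute-force sketch. The text does not say how the kNN graph is actually built, so the function name `build_knn_graph`, the exhaustive pairwise comparison, and the Euclidean metric are all assumptions made for illustration; an exhaustive build would not be practical for 10,000,000 vectors. The point of the sketch is the layout: each node keeps its N nearest neighbors sorted nearest first, so the first M entries serve as search branches and the remaining N−M entries serve the later distance calculation.

```python
import math


def build_knn_graph(vectors, n):
    """Return, for each vector ID, the IDs of its n nearest neighbors.

    Neighbors are sorted nearest first (brute force, O(len(vectors)**2)).
    """
    graph = []
    for i, v in enumerate(vectors):
        others = [j for j in range(len(vectors)) if j != i]
        others.sort(key=lambda j: math.dist(v, vectors[j]))
        graph.append(others[:n])
    return graph
```

In the setting of this example, `n` would be 512, and the search branches of node `i` would be `graph[i][:10]`.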

ユーザが認識したい画像を撮影し、撮影画像からSURFアルゴリズムにより128次元のベクトル群が1,000ベクトル抽出されたものとして、各クエリのベクトルに対して以下の処理を行う。 The user photographs an image to be recognized. Assuming that 1,000 128-dimensional vectors are extracted from the captured image by the SURF algorithm, the following processing is performed for each query vector.

以下、第２の実施の形態におけるホストコンピュータ２２とGPUデバイス２６からなる並列化近傍探索処理装置２１において、枝数512本でkNNグラフが構築されている場合における処理を説明する。以下では、図６に示すフローチャートに沿って説明する。 The following describes the processing in the parallelized neighborhood search processing device 21 of the second embodiment, which consists of the host computer 22 and the GPU device 26, when the kNN graph is constructed with 512 branches per node. The description follows the flowchart shown in FIG. 6.

予めホストコンピュータ２２のCPU２４は、ベクトルＤＢ２５のベクトルをGPUデバイス２６に転送し（ステップ２０１）、探索開始ノードを５点決定し（ステップ２０２）、外部から入力されたクエリのベクトルデータを取得し（ステップ２０３）、探索開始ノードを現ノードに設定する（ステップ２０４）。現ノードの枝512本が、例えば、図９のリストで示されていると、このリスト内で現ノードから距離が近い枝10本を選択する（ステップ２０５）。その結果図１０に示すような表となる。 In advance, the CPU 24 of the host computer 22 transfers the vectors of the vector DB 25 to the GPU device 26 (step 201), determines five search start nodes (step 202), acquires the vector data of the externally input query (step 203), and sets a search start node as the current node (step 204). If the 512 branches of the current node are, for example, as listed in FIG. 9, the 10 branches nearest to the current node are selected from this list (step 205). The result is a table such as the one shown in FIG. 10.

クエリが現ノードとその枝10本のノード群との距離を計算し（ステップ２０６）、計算結果よりクエリから最短距離のノードを選択し（ステップ２０７）、もし、クエリから最短距離のノードが現ノードと等しくないならば（ステップ２０８、No）、最短距離の枝ノードにホップして（ステップ２０９）、ホップ先のノードを現ノードとして探索を続ける（ステップ２０５〜２０８）。もし、クエリから最短距離のノードが現ノードと等しいならば（ステップ２０８、Yes）、クエリのベクトルデータ及び、現ノードから距離が遠い枝502本のノードに対応するベクトルのＩＤをGPUデバイス２６に転送する（ステップ２１０，２１１）。 The distances between the query and the current node and its 10 branch nodes are calculated (step 206), and the node at the shortest distance from the query is selected from the results (step 207). If that node is not the current node (step 208, No), the search hops to the shortest-distance branch node (step 209) and continues with the hop destination as the current node (steps 205 to 208). If that node is the current node (step 208, Yes), the query vector data and the IDs of the vectors corresponding to the 502 branch nodes farther from the current node are transferred to the GPU device 26 (steps 210 and 211).

GPUデバイス２６は、枝502本のノードとベクトルの並列距離計算を行い（ステップ２１２）、計算結果をホストコンピュータ２２に転送する（ステップ２１３）。 The GPU device 26 calculates, in parallel, the distances between the vectors of the 502 branch nodes and the query vector (step 212), and transfers the results to the host computer 22 (step 213).

CPU２４は、現ノードとGPUデバイス２６で計算された枝ノード502個の計算結果より、クエリから最短距離のノードを選択し（ステップ２１４）、最短距離のノードのＤＢベクトルＩＤと、クエリとそのノードの最短距離値を出力する（ステップ２１５）。 From the distance results for the current node and for the 502 branch nodes calculated by the GPU device 26, the CPU 24 selects the node at the shortest distance from the query (step 214), and outputs the DB vector ID of that node together with its distance from the query (step 215).

探索開始ノードは５点なので、図１１に示すように、最近傍ベクトル候補が５つ出てくるため、図１１に示すリストの中からクエリのベクトルから最短距離のベクトルID:172818と距離値：0.061359を出力する。 Since there are five search start nodes, five nearest-neighbor vector candidates are obtained, as shown in FIG. 11. From the list shown in FIG. 11, the vector at the shortest distance from the query vector is output: vector ID 172818 with distance value 0.061359.
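The final selection among the candidates from the different search start nodes amounts to taking a minimum over (vector ID, distance) pairs. A minimal sketch follows; the function name `best_candidate` is hypothetical, and the contents of FIG. 11 are not reproduced in the text, so apart from the pair (172818, 0.061359) quoted above any candidate values passed to it would be made up.

```python
def best_candidate(candidates):
    """Pick the (vector_id, distance) pair with the smallest distance.

    candidates holds one pair per search start node.
    """
    return min(candidates, key=lambda c: c[1])
```

Running the search once per start node and reducing the results this way keeps the searches independent of each other, so they could in principle be run in any order.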

この結果、各クエリのベクトルに対して、ベクトルＤＢ２５内で距離が近似的に最近傍であるベクトルを求めることができる。 As a result, for each query vector, a vector whose distance is approximately the nearest in the vector DB 25 can be obtained.

このように、SURFアルゴリズムから抽出される高次元ベクトル群に対して、本発明を適用することで、高精度かつ高速な近傍探索処理が可能になる。 As described above, by applying the present invention to the high-dimensional vector group extracted from the SURF algorithm, high-precision and high-speed neighborhood search processing can be performed.

この他にも、音声や文書、位置データなどから抽出されたベクトルの近傍探索にも本発明を適用することが可能である。 In addition to this, the present invention can also be applied to the neighborhood search of vectors extracted from speech, documents, position data, and the like.

［実施例２］ [Example 2]
本実施例では、前述の第３の実施の形態における具体的な例を示す。 This example shows a specific instance of the third embodiment described above.

並列化近傍探索処理装置３１は、ホストコンピュータ３２とメニーコア演算器３６を有する。 The parallelization neighborhood search processing device 31 includes a host computer 32 and a many-core arithmetic unit 36.

事前処理として、DB側に保存されている画像群に対して、局所領域を抽出するPCA-SIFTアルゴリズムにより、36次元のベクトル群が5,000,000ベクトル抽出され、ハードディスク側（ベクトルDB３５）に保存されている。 As pre-processing, 5,000,000 36-dimensional vectors are extracted from the images stored on the DB side by the PCA-SIFT algorithm, which extracts local regions, and are stored on the hard disk (vector DB 35).

PCA-SIFTアルゴリズムにより抽出されるベクトルの具体例を以下に挙げる。 A specific example of a vector extracted by the PCA-SIFT algorithm is given below.
(1573, -1611, 1546, -1450, -11, -322, -384, -385, -652, -459, 0, 377, -619, -969, -858, -591, 1138, -535, -2471, -2174, 2022, -257, -248, -1820, 537, 1224, -1272, 279, 512, -1340, 1124, 1955, 95, -623, 1181, -2839)
ベクトルDB３５の各DBベクトルにおいて、探索用の近傍ベクトル数を８、距離計算用の近傍ベクトル数を1024として、kNNグラフを構築する。 For each DB vector in the vector DB 35, a kNN graph is constructed with 8 neighbor vectors used for search and 1024 neighbor vectors used for distance calculation.

ユーザが認識したい画像を撮影し、撮影画像からSURFアルゴリズムにより36次元のベクトル群が500ベクトル抽出されたとして、各クエリのベクトルに対して以下の処理を行う。 The user photographs an image to be recognized. Assuming that 500 36-dimensional vectors are extracted from the captured image by the SURF algorithm, the following processing is performed for each query vector.

枝数1024本でkNNグラフが構築されている場合において、ホストコンピュータ３２のCPU３４は、探索開始ノードを10点決定し（ステップ３０１）、クエリのベクトルデータが入力されると（ステップ３０２）、クエリが各探索開始ノードから探索し（ステップ３０３）、現ノードの枝1024本が、例えば図１２のリストで示されていると、このリスト中で、現ノードから距離が近い枝8本を選択する（ステップ３０４）。その例を図１３に示す。 When the kNN graph is constructed with 1024 branches per node, the CPU 34 of the host computer 32 determines 10 search start nodes (step 301). When the query vector data is input (step 302), the search for the query begins from each search start node (step 303). If the 1024 branches of the current node are, for example, as listed in FIG. 12, the 8 branches nearest to the current node are selected from this list (step 304). An example is shown in FIG. 13.

クエリが、現ノードとその枝８本のノード群と距離計算し（ステップ３０５）、計算結果より、クエリから最短距離のノードを選択し（ステップ３０６）、もし、クエリから最短距離のノードが現ノード以外ならば（ステップ３０７,No）、最短距離の枝ノードにホップし、ホップ先のノードを現ノードとして探索を続ける（ステップ３０４〜３０７）。クエリから最短距離のノードが現ノードの場合（ステップ３０７,Yes）、メニーコア演算器３６において、現ノードから距離が遠い枝1016本のノード群に対応ベクトルとクエリのベクトルの並列距離計算を行い、ホストコンピュータ３２に転送する（ステップ３０９）。 The distances between the query and the current node and its 8 branch nodes are calculated (step 305), and the node at the shortest distance from the query is selected from the results (step 306). If that node is not the current node (step 307, No), the search hops to the shortest-distance branch node and continues with the hop destination as the current node (steps 304 to 307). If that node is the current node (step 307, Yes), the many-core computing unit 36 calculates, in parallel, the distances between the query vector and the vectors corresponding to the 1016 branch nodes farther from the current node, and transfers the results to the host computer 32 (step 309).

ホストコンピュータ３２のCPU３４は、現ノードとメニーコア演算器３６で計算された1016個の枝ノードの距離計算結果より、クエリから最短距離のノードを選択し（ステップ３１０）、最短距離ノードのベクトルＩＤと最短距離ノードとクエリ間の距離値を出力する（ステップ３１１）。 From the distance results for the current node and for the 1016 branch nodes calculated by the many-core computing unit 36, the CPU 34 of the host computer 32 selects the node at the shortest distance from the query (step 310), and outputs the vector ID of that node together with its distance from the query (step 311).

探索開始ノードは10点なので、図１４に示すように、最近傍ベクトル候補が10でてくるので、図１４のリストの中から最短距離のベクトルID:27990と距離値:432507を出力する。 Since there are 10 search start nodes, 10 nearest-neighbor vector candidates are obtained, as shown in FIG. 14. From the list in FIG. 14, the shortest-distance vector is output: vector ID 27990 with distance value 432507.

この結果、各クエリベクトルに対して、ベクトルＤＢ３５内で距離が近似的に最近傍なベクトルを求めることができる。 As a result, for each query vector, a vector whose distance is approximately the nearest in the vector DB 35 can be obtained.

このように、PCA-SIFTアルゴリズムから抽出される高次元ベクトル群に対して、本発明を適用することで、高精度かつ高速な近傍探索処理が可能となる。 Thus, by applying the present invention to a high-dimensional vector group extracted from the PCA-SIFT algorithm, high-precision and high-speed neighborhood search processing can be performed.

以下に、従来技術と本発明の差分を説明する。 The difference between the prior art and the present invention will be described below.

従来技術１のホストのみを利用したkNNグラフを用いた探索は、グラフ探索後、到達したノードのベクトルを出力する。しかし、図１５に示すように、到達したノードが必ずしも最近傍ではない可能性もある。これに対し、本発明は、グラフ探索後、到達したノードのベクトルとその近傍のベクトル群の複数個と距離計算し、その中で最短距離のベクトルを最近傍ベクトルとして出力する。従来技術と比較すると、グラフ探索後に到達したノードとその周辺のノードともう一度距離計算して、到達したノード以外に最近傍がないかを再確認していることにより、従来技術より精度の向上が見込める。 The search of Prior Art 1, which uses a kNN graph on the host only, outputs the vector of the node reached at the end of the graph search. However, as shown in FIG. 15, the reached node is not necessarily the nearest neighbor. In contrast, after the graph search the present invention calculates the distances between the query and both the reached node's vector and a number of its neighboring vectors, and outputs the shortest-distance vector among them as the nearest-neighbor vector. Because the distances to the reached node and its surrounding nodes are calculated once more to recheck whether a nearer neighbor exists, higher accuracy than the prior art can be expected.

図１６は、本発明により真の最近傍が探索可能となる例であり、到達ノードの周囲とグラフ探索時より広範囲に、もう一度距離計算することで従来技術では到達し得なかった真の最近傍が探索可能となる例を示している。例えば、枝数100（Ｎ＝100）(Ａの枝（点線）＋Ｂの枝（細い実線）)でkNNグラフを構築し、グラフ探索時は、枝数３（Ｍ＝３）（Ａの枝）で探索し、グラフ探索後に、枝数97（Ｎ−Ｍ）（Ｂの枝）で到達ノード近傍のノードと距離計算を行う。なお、計算コストの上昇はGPUデバイスによる並列化を行うことで抑制できる。 FIG. 16 shows an example in which the present invention makes the true nearest neighbor reachable: by calculating distances once more around the reached node, over a wider range than during the graph search, a true nearest neighbor that the prior art could not reach is found. For example, the kNN graph is constructed with 100 branches per node (N = 100; A branches (dotted lines) plus B branches (thin solid lines)), the graph search uses 3 branches (M = 3; the A branches), and after the graph search the distances to the nodes near the reached node are calculated over the remaining 97 branches (N−M; the B branches). The increase in calculation cost can be suppressed by parallelizing on the GPU device.

従来技術においてもN=100でグラフ探索しても、真の最近傍に到達可能であるが、グラフ探索時の距離計算回数が増加して探索速度が低下する。 The prior art can also reach the true nearest neighbor by searching the graph with N = 100, but the number of distance calculations during the graph search increases and the search speed drops.
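The trade-off can be made concrete with a rough count of distance calculations. The sketch below is an illustration under stated assumptions, not a measurement: the function name `distance_calcs` and the hop count of 5 are hypothetical, distances are assumed not to be cached between hops, and the same hop count is used for both cases even though a search over more branches may need fewer hops.

```python
def distance_calcs(hops, branches_searched, branches_refined=0):
    """Count distance calculations for one search from one start node.

    Every visited node (hops + 1 of them, including the final one) costs
    one calculation for itself plus one per searched branch. The refinement
    pass adds one calculation per remaining branch.
    """
    return (hops + 1) * (1 + branches_searched) + branches_refined


proposed = distance_calcs(hops=5, branches_searched=3, branches_refined=97)
conventional = distance_calcs(hops=5, branches_searched=100)
print(proposed, conventional)  # 121 606
```

Of the 121 calculations in the first case, the 97 refinement calculations form a single batch that can run in parallel, whereas the 606 calculations in the second case are spread across every hop.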

また、ホストとGPUデバイスを利用した従来技術２は、グラフを探索時に、ノードをホップする度に、その距離計算をGPUデバイスで行うために、ホスト−GPUデバイス間の転送が多発し、探索速度が低下するが、本発明では、グラフ探索後にGPUデバイスによる並列距離計算を行うため、ホスト−GPUデバイス間の速度低下を抑えることができる。 In Prior Art 2, which uses a host and a GPU device, the distance calculation is performed on the GPU device at every node hop during the graph search, so transfers between the host and the GPU device occur frequently and the search speed drops. In the present invention, the parallel distance calculation on the GPU device is performed only after the graph search, so the slowdown caused by host-GPU transfers can be suppressed.

上記の並列化近傍探索処理装置の図４、図６、図８に示す各構成要素の動作をプログラムとして構築し、並列化近傍探索装置として利用されるコンピュータにインストールして実行させる、または、ネットワークを介して流通させることが可能である。 The operations of the components of the above parallelized neighborhood search processing device shown in FIG. 4, FIG. 6, and FIG. 8 can be implemented as a program, which can be installed and executed on a computer used as the parallelized neighborhood search device, or distributed via a network.

なお、本発明は、上記の実施の形態に限定されることなく、特許請求の範囲内において、種々変更・応用が可能である。 The present invention is not limited to the above-described embodiment, and various modifications and applications can be made within the scope of the claims.

１１，２１，３１ 並列化近傍探索処理装置 11, 21, 31 Parallelized neighborhood search processing device
１２，２２，３２ ホストコンピュータ 12, 22, 32 Host computer
１３，２３，３３ メインメモリ 13, 23, 33 Main memory
１４，２４，３４ ＣＰＵ 14, 24, 34 CPU
２５ ベクトルDB 25 Vector DB
２６ GPUデバイス 26 GPU device
２７ GPUグローバルメモリ 27 GPU global memory
２８ GPU 28 GPU
３５ ベクトルDB 35 Vector DB
３６ メニーコア演算器 36 Many-core computing unit

Claims

kNN（k-Nearest Neighbor）グラフ構造を用いてベクトル群の近傍探索を行う近傍探索処理装置であって、
枝数Ｎ本の前記kNNグラフを構築し、グラフ探索開始ノードを現ノードとする事前処理手段と、
前記現ノードの枝Ｎ本のうち、該現ノードから距離が近いＭ本を選択して、該現ノードと該Ｍ本のノードとの距離を計算する距離計算手段と、
前記距離計算手段で得られた距離計算結果から、入力されたクエリから最短距離のノードを選択する最短距離ノード選択手段と、
選択された前記最短距離のノードが前記現ノードと等しくない場合は、該最短距離のノードにホップし、前記距離計算手段及び前記最短距離ノード選択手段を実行するグラフ探索手段と、
前記選択された前記最短距離のノードが等しい場合は、該現ノードから距離が遠いN−M個の枝ノードに対応するベクトルＩＤを選択し、該N−M個の枝ノードに対応するベクトル群と前記クエリのベクトルの並列距離計算を行い、該現ノード及び該N−M個の枝ノードと前記クエリ間の距離のN−M+1個の計算結果より、該クエリから最短距離のノードを選択する最近傍ベクトル抽出手段と、
を有することを特徴とする近傍探索処理装置。
A neighborhood search processing device that performs a neighborhood search of a vector group using a kNN (k-Nearest Neighbor) graph structure, comprising:
pre-processing means for constructing the kNN graph with N branches and setting a graph search start node as a current node;
distance calculation means for selecting, from the N branches of the current node, the M branches nearest to the current node, and calculating the distance between the current node and the M nodes;
shortest distance node selection means for selecting, from the distance calculation results obtained by the distance calculation means, the node at the shortest distance from an input query;
graph search means for, when the selected shortest-distance node is not equal to the current node, hopping to the shortest-distance node and executing the distance calculation means and the shortest distance node selection means; and
nearest neighbor vector extraction means for, when the selected shortest-distance node is equal to the current node, selecting the vector IDs corresponding to the N−M branch nodes farther from the current node, performing parallel distance calculation between the vector group corresponding to the N−M branch nodes and the vector of the query, and selecting the node at the shortest distance from the query from among the N−M+1 calculated distances between the query and the current node and the N−M branch nodes.

ベクトル群を格納したベクトル記憶手段と、
GPUデバイスと、
前記ベクトル記憶手段の前記ベクトル群のデータを、前記GPUデバイスに転送する手段と、
を更に有し、
前記最近傍ベクトル抽出手段は、
選択した前記ベクトルＩＤを前記GPUデバイスに転送し、前記N−M個の枝ノードに対応するベクトル群と前記クエリのベクトルの並列距離計算結果を該GPUデバイスから取得する手段を含み、
前記GPUデバイスは、
前記ベクトルＩＤに基づいて前記ベクトル群のデータを参照して、該ベクトルＩＤに対応するベクトル群と前記クエリのベクトルの並列距離計算を行う並列距離計算手段を含む
請求項１記載の近傍探索処理装置。
The neighborhood search processing device according to claim 1, further comprising:
vector storage means storing the vector group;
a GPU device; and
means for transferring the data of the vector group in the vector storage means to the GPU device,
wherein the nearest neighbor vector extraction means includes means for transferring the selected vector IDs to the GPU device and obtaining from the GPU device the result of the parallel distance calculation between the vector group corresponding to the N−M branch nodes and the vector of the query, and
wherein the GPU device includes parallel distance calculation means for referring to the data of the vector group based on the vector IDs and performing the parallel distance calculation between the vector group corresponding to the vector IDs and the vector of the query.

メニーコア演算器を更に有し、
前記最近傍ベクトル抽出手段は、
前記選択された前記最短距離のノードが等しい場合は、前記メニーコア演算器に対して
前記現ノードから距離が遠いN−M個の枝ノードに対応するベクトル群と前記クエリベクトルの並列距離計算を指示し、計算結果を取得する並列距離計算指示手段を含み、
前記メニーコア演算器は、
前記現ノードから距離が遠いN−M個の枝ノードに対応するベクトル群と前記クエリベクトルの並列距離計算を行う手段を含む
請求項１記載の近傍探索処理装置。
The neighborhood search processing device according to claim 1, further comprising a many-core computing unit,
wherein the nearest neighbor vector extraction means includes parallel distance calculation instruction means for, when the selected shortest-distance node is equal to the current node, instructing the many-core computing unit to perform parallel distance calculation between the vector group corresponding to the N−M branch nodes farther from the current node and the query vector, and obtaining the calculation result, and
wherein the many-core computing unit includes means for performing the parallel distance calculation between the vector group corresponding to the N−M branch nodes farther from the current node and the query vector.

kNN（k-Nearest Neighbor）グラフ構造を用いてベクトル群の近傍探索を行う近傍探索処理方法であって、
事前処理手段、距離計算手段、最短距離ノード選択手段、グラフ探索手段、最近傍ベクトル抽出手段を有する装置において、
前記事前処理手段が、枝数Ｎ本の前記kNNグラフを構築し、グラフ探索開始ノードを現ノードとする事前処理ステップと、
前記距離計算手段が、前記現ノードの枝Ｎ本のうち、該現ノードから距離が近いM（N＞M）本を選択して、該現ノードと該Ｍ本のノードとの距離を計算する距離計算ステップと、
前記最短距離ノード選択手段が、前記距離計算ステップで得られた距離計算結果から、入力されたクエリから最短距離のノードを選択する最短距離ノード選択ステップと、
前記グラフ探索手段が、前記最短距離ノード選択ステップで選択された前記最短距離のノードが前記現ノードと等しくない場合は、該最短距離のノードにホップし、前記距離計算ステップ及び前記最短距離ノード選択ステップを行うグラフ探索ステップと、
前記最近傍ベクトル抽出手段が、前記グラフ探索ステップで前記選択された前記最短距離のノードが等しい場合は、該現ノードから距離が遠いN−M個の枝ノードに対応するベクトルＩＤを選択し、該N−M個の枝ノードに対応するベクトル群と前記クエリのベクトルの並列距離計算を行い、該現ノード及び該N−M個の枝ノードと前記クエリ間の距離のN−M+1個の計算結果より、該クエリから最短距離のノードを選択する最近傍ベクトル抽出ステップと、
を行うことを特徴とする近傍探索処理方法。
A neighborhood search processing method for performing a neighborhood search of a vector group using a kNN (k-Nearest Neighbor) graph structure, in a device having pre-processing means, distance calculation means, shortest distance node selection means, graph search means, and nearest neighbor vector extraction means, the method comprising:
a pre-processing step in which the pre-processing means constructs the kNN graph with N branches and sets a graph search start node as a current node;
a distance calculation step in which the distance calculation means selects, from the N branches of the current node, the M (N > M) branches nearest to the current node, and calculates the distance between the current node and the M nodes;
a shortest distance node selection step in which the shortest distance node selection means selects, from the distance calculation results obtained in the distance calculation step, the node at the shortest distance from an input query;
a graph search step in which the graph search means, when the shortest-distance node selected in the shortest distance node selection step is not equal to the current node, hops to the shortest-distance node and performs the distance calculation step and the shortest distance node selection step; and
a nearest neighbor vector extraction step in which the nearest neighbor vector extraction means, when the shortest-distance node selected in the graph search step is equal to the current node, selects the vector IDs corresponding to the N−M branch nodes farther from the current node, performs parallel distance calculation between the vector group corresponding to the N−M branch nodes and the vector of the query, and selects the node at the shortest distance from the query from among the N−M+1 calculated distances between the query and the current node and the N−M branch nodes.

ベクトル群を格納したベクトル記憶手段と、
GPUデバイスと、
を更に有する装置において、
前記ベクトル記憶手段のベクトル群のデータを前記GPUデバイスに転送するステップを更に行い、
前記最近傍ベクトル抽出ステップにおいて、
選択した前記ベクトルＩＤを前記GPUデバイスに転送し、
前記GPUデバイスにおいて、
前記ベクトルＩＤに基づいて前記ベクトル群のデータを参照して、該ベクトルＩＤに対応するベクトル群と前記クエリのベクトルの並列距離計算を行う
請求項４記載の近傍探索処理方法。
The neighborhood search processing method according to claim 4, in a device further having vector storage means storing the vector group and a GPU device, the method further comprising:
a step of transferring the data of the vector group in the vector storage means to the GPU device,
wherein, in the nearest neighbor vector extraction step, the selected vector IDs are transferred to the GPU device, and
wherein the GPU device refers to the data of the vector group based on the vector IDs and performs the parallel distance calculation between the vector group corresponding to the vector IDs and the vector of the query.

メニーコア演算器を更に有する装置において、
前記最近傍ベクトル抽出ステップにおいて、
前記選択された前記最短距離のノードが等しい場合は、前記メニーコア演算器が、前記現ノードから距離が遠いN−M個の枝ノードに対応するベクトル群と前記クエリベクトルの並列距離計算を行う
請求項４記載の近傍探索処理方法。
The neighborhood search processing method according to claim 4, in a device further having a many-core computing unit,
wherein, in the nearest neighbor vector extraction step, when the selected shortest-distance node is equal to the current node, the many-core computing unit performs the parallel distance calculation between the vector group corresponding to the N−M branch nodes farther from the current node and the query vector.

コンピュータを、
請求項１乃至３のいずれか１項に記載の近傍探索処理装置の各手段として機能させるための近傍探索処理プログラム。
A neighborhood search processing program for causing a computer to function as each of the means of the neighborhood search processing device according to any one of claims 1 to 3.