JP5670993B2

JP5670993B2 - Reconstruction apparatus, method and program for tree structure by single path aggregation

Info

Publication number: JP5670993B2
Application number: JP2012280603A
Authority: JP
Inventors: 健山室; 史和小西
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2012-12-25
Filing date: 2012-12-25
Publication date: 2015-02-18
Anticipated expiration: 2032-12-25
Also published as: JP2014126881A

Description

The present invention relates to an apparatus, a method, and a program for reconstructing a tree structure by single-path aggregation. More particularly, it relates to reconstructing an unbalanced tree structure (a trie or a Patricia tree) built from the character strings to be searched, so that locality of reference is improved when a string is searched for using the tree structure.

As a method of optimizing a string-search tree, there is a method that couples some of the nodes in order to cut redundant pointers and redundant traversal steps in the tree structure. This reduces the number of pointers and improves locality of reference to a limited degree (see, for example, Patent Document 1).

There is also a method that improves locality of reference in a string-search tree structure by recursively aggregating the entire set of nodes from the root node to a single leaf node into a single physical node (see, for example, Non-Patent Document 1).

Patent Document 1: Japanese Patent No. 4402120

Non-Patent Document 1: Roberto Grossi and Giuseppe Ottaviano, "Fast Compressed Tries through Path Decompositions", ALENEX, 2012.

However, with the method of Patent Document 1, locality of reference can become extremely poor and search speed can deteriorate markedly, depending on the shape of the tree structure built from the strings to be searched and on the reference probability of each node. The method of Non-Patent Document 1 is inefficient, particularly for tall tree structures, because it also aggregates nodes that have a low reference probability and are not accessed.

The present invention has been made in view of the above points. Its object is to provide an apparatus, a method, and a program for reconstructing a tree structure by single-path aggregation that can speed up searches in a hierarchical memory architecture by improving the locality of reference of a tall, unbalanced tree structure with few branches.

To solve the above problem, the present invention (claim 1) is a tree structure reconstruction apparatus that reconstructs an input tree structure when an arbitrary substring M is searched for in a character string of length N and the input tree structure is a tall structure with few branches.
The apparatus has tree structure optimization processing means that: stores input tree structure information, which includes an identifier, connection relationships, and a reference probability for each node of the tree structure, in tree structure storage means when that information is input; extracts nodes with a high reference probability from the tree structure storage means, starting from the root node; taking the extracted node as an aggregation origin node, aggregates a predetermined number K of nodes in the direction of a single leaf into a single physical node and stores it in optimized tree structure storage means; and relocates the subtrees made up of the remaining nodes as child nodes of the aggregated single physical node.

In the present invention (claim 2), the tree structure optimization processing means includes means that repeats, for a predetermined number of aggregations, a process of acquiring a candidate node for the aggregation origin from aggregation origin candidate node storage means, deleting that candidate node from the aggregation origin candidate node storage means, and inputting the aggregation origin node to single-path aggregation processing means, and that then stores the subtrees rooted at the origin nodes remaining in the aggregation origin candidate node storage means in the optimized tree structure storage means.
The single-path aggregation processing means has means that repeats, for the predetermined number of aggregated nodes (K), a process of extracting the aggregation origin node input from the tree structure optimization processing means from the tree structure storage means as the node to be processed, storing it in node information storage means, extracting the set of child nodes of the node to be processed from the node information storage means, taking the child node with the highest reference probability as the next node to be processed, and storing the child nodes other than that node in the aggregation origin candidate node storage means; and that then acquires the group of nodes aggregated over those K iterations from the node information storage means and stores it in the optimized tree structure storage means as the single physical node.

As described above, when a tree structure (trie) is built to search efficiently for an arbitrary substring M in a character string of length N (assuming N >> M), the resulting structure is often unbalanced, in particular tall with few branches. According to the present invention, among the paths of such a tree, a path with a high reference probability is gathered into a single physical node as an aggregated node and placed in adjacent areas of memory. The previously unbalanced tree structure as a whole becomes balanced, locality of reference in hierarchical memory improves, and search speed improves as a result.

FIG. 1 is a diagram showing an outline of the present invention.
FIG. 2 is a configuration example of the tree structure reconstruction apparatus in one embodiment of the present invention.
FIG. 3 is the data format of the aggregation origin candidate node storage unit in one embodiment of the present invention.
FIG. 4 is a flowchart of the tree structure optimization processing unit in one embodiment of the present invention.
FIG. 5 is an example of the data input from the tree structure input device in one embodiment of the present invention.
FIG. 6 is a flowchart of the single-path aggregation processing unit in one embodiment of the present invention.
FIG. 7 is a specific example of input characteristics and reconstruction in one embodiment of the present invention.
FIG. 8 is an example of updates to the aggregation origin candidate node storage unit in one embodiment of the present invention.
FIG. 9 is a specific example of a search within an aggregated node in one embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a diagram showing an outline of the present invention.

The present invention optimizes an unbalanced tree structure used for string search. Specifically, it reconstructs an unbalanced tree structure (a trie or a Patricia tree) built from the strings to be searched, and generates a tree structure with better locality of reference during search.

FIG. 1(a) is an example of a tall tree structure. In the present invention, the input is assumed to be an unbalanced tree of this kind, whose shape depends on the characteristics of the strings to be searched. Starting from a tree structure like that of FIG. 1(a), the imbalance is reduced and an optimized tree structure is output, as shown in FIG. 1(b).
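The kind of input described here can be reproduced with a small sketch. The code below is only an illustration under stated assumptions: `build_trie`, `shape`, and the sample strings are hypothetical and do not come from the specification. It builds a plain trie from strings that share long prefixes and reports the height of the tree and the number of nodes that branch.

```python
def build_trie(words):
    """Build a plain trie as nested dicts (character -> child node)."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
    return root


def shape(trie):
    """Return (height, number of branching nodes) using an explicit stack."""
    height, branching = 0, 0
    stack = [(trie, 0)]
    while stack:
        node, depth = stack.pop()
        height = max(height, depth)
        if len(node) > 1:          # more than one child: a branch point
            branching += 1
        stack.extend((child, depth + 1) for child in node.values())
    return height, branching


# Hypothetical strings with long shared prefixes.
trie = build_trie(["aaaaaaab", "aaaaaaac", "aaab"])
height, branching = shape(trie)
print(height, branching)
```

With these sample strings the trie has eleven nodes but only two branch points, which is the tall, sparsely branching shape the reconstruction targets.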

FIG. 2 is a configuration example of the tree structure reconstruction apparatus in one embodiment of the present invention.

The tree structure reconstruction apparatus 10 shown in the figure has a tree structure optimization processing unit 11, an aggregation origin candidate node storage unit 12, a single-path aggregation processing unit 13, a node information storage unit 14, an optimized tree structure storage unit 15, and a tree structure storage unit 16. The tree structure optimization processing unit 11 is connected to a tree structure input device 1 and a result output device 2.

The aggregation origin candidate node storage unit 12 stores the root node set by the tree structure optimization processing unit 11 and the aggregation origin candidate nodes. FIG. 3 shows an example of the aggregation origin candidate node storage unit 12. As shown in the figure, the aggregation origin candidate node storage unit 12 stores a node number for each update order number, and its entries are referenced in order from the top.
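A store that pairs each node number with an update order number and is read from the top behaves like a first-in, first-out queue. The following is a minimal sketch of that behavior, not the format of FIG. 3: the class name `CandidateStore`, its method names, and the choice to start numbering at 1 are assumptions made for illustration.

```python
from collections import OrderedDict
from itertools import count


class CandidateStore:
    """Aggregation origin candidates keyed by an increasing update order number."""

    def __init__(self):
        self._next_order = count(1)   # assumption: update order numbers start at 1
        self._rows = OrderedDict()    # update order number -> node number

    def insert(self, node_id):
        """Append a candidate under the next update order number."""
        self._rows[next(self._next_order)] = node_id

    def pop_oldest(self):
        """Return the node with the smallest update order number and delete its row."""
        _, node_id = self._rows.popitem(last=False)
        return node_id

    def __len__(self):
        return len(self._rows)


store = CandidateStore()
for node in (8, 7, 5):
    store.insert(node)
first = store.pop_oldest()
print(first, len(store))
```

Deleting a row on read means the store only ever holds candidates that have not yet been processed.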

The node information storage unit 14 stores the node information corresponding to the root node specified by the single-path aggregation processing unit 13. The node information storage unit 14 is a temporary storage unit that is cleared when the flowchart of FIG. 6, described later, finishes.

The optimized tree structure storage unit 15 stores the sets of node numbers optimized by the tree structure optimization processing unit 11 and the single-path aggregation processing unit 13.

The tree structure storage unit 16 stores the input tree structure passed from the tree structure optimization processing unit 11 and is referenced by the single-path aggregation processing unit 13.

The processing in the above configuration is described below.

FIG. 4 is a flowchart of the tree structure optimization processing unit in one embodiment of the present invention.

Step 400) The tree structure optimization processing unit 11 waits until it receives an input tree structure for string search from the tree structure input device 1.

Step 405) An input tree structure like the one shown in FIG. 5 is acquired from the tree structure input device 1. The input data consists of a node number, adjacent node numbers, and a reference probability for each node, and represents a tall tree structure as illustrated in FIG. 5(b). The tree structure optimization processing unit 11 stores the input tree structure in the tree structure storage unit 16 and inserts the designated root node into the aggregation origin candidate node storage unit 12. The tree structure optimization processing unit 11 also outputs the aggregation origin node information to the single-path aggregation processing unit 13, which extracts the part of the tree structure corresponding to that aggregation origin node from the tree structure storage unit 16 as the node to be processed and stores it in the node information storage unit 14. The node information storage unit 14 stores, for example, node numbers, adjacent node numbers, and reference probabilities, as shown in FIG. 5(a).
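One possible in-memory form of the input rows (node number, adjacent node numbers, reference probability) is sketched below. The record name `NodeRecord`, the loader `load_tree`, and the sample rows are hypothetical. The sketch also assumes that the adjacent node numbers list a node's children, which is how step 515 uses them.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class NodeRecord:
    node_id: int
    adjacent: List[int] = field(default_factory=list)  # assumed: child node numbers
    ref_prob: float = 0.0                               # reference probability


def load_tree(rows: Iterable[Tuple[int, List[int], float]]) -> Dict[int, NodeRecord]:
    """Index the input rows by node number."""
    return {nid: NodeRecord(nid, list(adj), prob) for nid, adj, prob in rows}


# Hypothetical rows, not the data of FIG. 5.
rows = [(1, [2, 8], 1.0), (2, [3, 7], 0.9), (3, [], 0.6), (7, [], 0.3), (8, [], 0.1)]
tree = load_tree(rows)
print(tree[2].adjacent, tree[2].ref_prob)
```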

Step 410) The aggregation counter i is initialized to 0.

Step 415) One node to serve as the aggregation origin is acquired from the aggregation origin candidate node storage unit 12. The first aggregation origin node is the root node obtained in step 405. From the second time onward, the node with the smallest update order number in the aggregation origin candidate node storage unit 12 is acquired as the aggregation origin node. The acquired node is then deleted from the aggregation origin candidate node storage unit 12, so that only unprocessed candidate nodes remain there.

Step 420) The aggregation origin node acquired in step 415 is input to the single-path aggregation processing unit 13, and the processing of the single-path aggregation processing unit 13 is executed.

Step 425) The tree structure optimization processing unit 11 acquires the number of candidates S present in the aggregation origin candidate node storage unit 12. If S is greater than 0, the process proceeds to step 430. Otherwise, an optimization completion notification is output to the result output device 2 and the processing of the tree structure optimization processing unit 11 ends.

Step 430) The counter i is incremented by 1 (i = i + 1). If i is smaller than NUM_COMPACTION, the input parameter giving the number of aggregations, the process returns to step 415; otherwise it proceeds to step 435. NUM_COMPACTION (the number of aggregations) is the number of times steps 415 to 430 are repeated. It is determined in advance and is either input from input means (not shown) or stored in a memory (not shown).

Step 435) The subtrees rooted at the origin candidate nodes remaining in the aggregation origin candidate node storage unit 12 are recorded in the optimized tree structure storage unit 15. As noted in step 430, NUM_COMPACTION is set in advance.
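The control flow of steps 405 to 435 can be sketched as a short driver loop. This is an illustrative reading of the flowchart, not the patented implementation. The function `optimize`, the callback parameters `aggregate` and `subtree_of`, the tagged tuples in the result, and the sample data are all assumptions. The aggregation step is passed in as a callback, and the demonstration uses a deliberately trivial stand-in that corresponds to K = 1.

```python
from collections import deque


def optimize(root, aggregate, num_compaction, subtree_of):
    """Driver loop following steps 405 to 435 of the flowchart."""
    candidates = deque([root])                  # step 405: root becomes the first candidate
    optimized = []                              # stands in for optimized tree structure storage unit 15
    i = 0                                       # step 410
    while True:
        origin = candidates.popleft()           # step 415: oldest candidate, removed on read
        optimized.append(("aggregated", aggregate(origin, candidates)))  # step 420
        if not candidates:                      # step 425: S == 0, optimization is complete
            return optimized
        i += 1                                  # step 430
        if i >= num_compaction:
            break
    for origin in candidates:                   # step 435: leftover subtrees are stored as they are
        optimized.append(("subtree", subtree_of(origin)))
    return optimized


# Hypothetical data and stand-in callbacks for the demonstration.
children = {1: [2, 8], 2: [3, 7]}


def aggregate_one(origin, candidates):
    """Stand-in aggregation with K = 1: keep the origin, queue all of its children."""
    candidates.extend(children.get(origin, []))
    return [origin]


def subtree_of(node):
    """Collect a subtree in preorder with an explicit stack."""
    out, stack = [], [node]
    while stack:
        n = stack.pop()
        out.append(n)
        stack.extend(reversed(children.get(n, [])))
    return out


result = optimize(1, aggregate_one, num_compaction=2, subtree_of=subtree_of)
print(result)
```

With NUM_COMPACTION set to 2, two aggregations run and the three candidates still queued are stored as plain subtrees.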

Next, the processing of the single-path aggregation processing unit 13 is described.

FIG. 6 is a flowchart of the single-path aggregation processing unit in one embodiment of the present invention.

Step 500) The single-path aggregation processing unit 13 acquires the aggregation origin node input from the tree structure optimization processing unit 11 and takes it as the node to be processed.

Step 505) The counter j for the number of aggregated nodes is initialized to 0 (j = 0).

Step 510) The node to be processed is read from the tree structure storage unit 16 and recorded in the node information storage unit 14.

Step 515) The set of child nodes of the node to be processed is looked up in the node information storage unit 14. Child nodes are extracted by referring to the adjacent node numbers of the input tree structure.

Step 520) Among the acquired child nodes, the node with the highest reference probability is set as the next node to be processed.

Step 525) All child nodes other than the one set as the node to be processed in step 520 are inserted into the aggregation origin candidate node storage unit 12.

Step 530) The counter j is incremented by 1 (j = j + 1). If j is smaller than the predetermined number of aggregated nodes K, the process returns to step 510 and the above processing is repeated; otherwise it proceeds to step 535.

Step 535) When j ≥ K, the sequence of nodes (the path) that served as nodes to be processed is acquired from the node information storage unit 14.

Step 540) A physically single aggregated node path is constructed from the acquired node sequence and recorded in the optimized tree structure storage unit 15, and the information in the node information storage unit 14 is cleared.
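Steps 500 to 540 amount to a greedy walk that always follows the most probable child. The sketch below is one possible reading of the flowchart, with several assumptions that the text does not state: the walk stops early if it reaches a leaf before K nodes are collected, and on the K-th node every child is queued as a candidate so that no subtree is lost. The function name `aggregate_single_path`, the dictionaries, and the sample tree are hypothetical, and the tree is not the data of FIG. 5.

```python
from collections import deque


def aggregate_single_path(children, ref_prob, origin, k, candidates):
    """Follow the most probable child for up to k nodes (steps 500 to 540)."""
    path = []                                   # stands in for node information storage unit 14
    current = origin                            # step 500
    for j in range(k):                          # steps 505 and 530: counter j
        path.append(current)                    # step 510
        kids = children.get(current, [])        # step 515
        if not kids:
            break                               # assumption: stop early at a leaf
        if j == k - 1:
            candidates.extend(kids)             # assumption: queue every child of the last node
            break
        best = max(kids, key=lambda n: ref_prob[n])        # step 520; ties go to the first child
        candidates.extend(n for n in kids if n != best)    # step 525
        current = best
    return path                                 # steps 535 and 540: one aggregated node


# Hypothetical tree: node -> child nodes, and node -> reference probability.
children = {1: [2, 8], 2: [3, 7], 3: [4], 4: [5, 6], 6: [9]}
ref_prob = {1: 1.0, 2: 0.9, 8: 0.1, 3: 0.7, 7: 0.2, 4: 0.7, 5: 0.2, 6: 0.5, 9: 0.5}

candidates = deque()
path = aggregate_single_path(children, ref_prob, origin=1, k=5, candidates=candidates)
print(path, list(candidates))
```

Because `max` returns the first maximal element, children with equal reference probabilities are resolved in favor of the one recorded first.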

The above processing is described concretely below.

FIG. 7 shows a specific example of input characteristics and reconstruction in one embodiment of the present invention.

When tree structure data like that shown in FIG. 5 is input from the tree structure input device 1, the tree structure optimization processing unit 11 follows high reference probabilities from the root toward a single leaf and aggregates K nodes (here K = 5: nodes 1, 2, 3, 4, and 6) into a single physical node.

In the example of FIG. 5, node [1] is taken as the aggregation origin node and inserted into the aggregation origin candidate node storage unit 12 (step 405). At this point, node [1] is stored in the aggregation origin candidate node storage unit 12 as shown in FIG. 8(a). The reference probability of a node is the total reference probability of the entire subtree rooted at that node.
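The definition of a node's reference probability as the total over its subtree can be computed bottom-up. The sketch below assumes that each node also has a probability of being referenced itself (`own_prob`). That quantity, the function name `subtree_totals`, and the sample values are hypothetical. An explicit stack is used instead of recursion because the trees in question are tall.

```python
def subtree_totals(children, own_prob, root):
    """Return, for every node, the summed probability of the subtree rooted there."""
    order, stack = [], [root]
    while stack:                       # preorder: each node appears after its parent
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))
    totals = {}
    for node in reversed(order):       # reversed preorder: children before parents
        totals[node] = own_prob[node] + sum(totals[c] for c in children.get(node, []))
    return totals


# Hypothetical tree and per-node probabilities (exact binary fractions).
children = {1: [2, 3], 2: [4]}
own_prob = {1: 0.125, 2: 0.25, 3: 0.125, 4: 0.5}
totals = subtree_totals(children, own_prob, root=1)
print(totals)
```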

Next, the tree structure optimization processing unit 11 takes the row with the smallest update order number out of the aggregation origin candidate node storage unit 12 and deletes that row (step 415). Assume that node [2] is acquired in the example of FIG. 5(a).

Next, the tree structure optimization processing unit 11 inputs the aggregation origin node to the single-path aggregation processing unit 13 (step 420). The single-path aggregation processing unit 13 takes the node input from the tree structure optimization processing unit 11 as the node to be processed, reads it from the tree structure storage unit 16, and stores it in the node information storage unit 14 (step 500). It then looks up the child nodes of the node to be processed in the node information storage unit 14 (step 515). In FIG. 5(a), among the child nodes of aggregation origin node [2], the node with the highest reference probability is set as the next node to be processed (step 520). Here, assume that node [3] is acquired. The nodes that were not selected for aggregation are inserted into the aggregation origin candidate node storage unit 12 and stored, for example, as shown in FIG. 8(c). As a result, the aggregation origin node acquired the next time the tree structure optimization processing unit 11 executes step 415 is node number [8], which has the smallest update order number. If child nodes of the node to be processed have equal reference probabilities, the node recorded earlier in the node information storage unit 14 is selected.

The single-path aggregation processing unit 13 performs steps 510 to 530 for K nodes in the direction of a single leaf. When j ≥ K, it acquires the sequence of aggregated nodes (nodes 1, 2, 3, 4, 6) from the node information storage unit 14 (step 535) and makes them a single aggregated node path. As a result, as shown in FIG. 7(b), the aggregated node {1, 2, 3, 4, 6} is obtained for K = 5 and stored in the optimized tree structure storage unit 15 (step 540).

Thereafter, the above processing is applied recursively, for the number of aggregations (NUM_COMPACTION), to each subtree (dotted triangle) that was reattached as a child of the aggregated node {1, 2, 3, 4, 6} in FIG. 7(b).

When i ≥ NUM_COMPACTION, the tree structure optimization processing unit 11 stores the subtrees rooted at the origin nodes remaining in the aggregation origin candidate node storage unit 12 in the optimized tree structure storage unit 15. When the number of candidate nodes S in the aggregation origin candidate node storage unit 12 reaches 0, it outputs an optimization completion notification to the result output device 2.

FIG. 9 shows an example of a tree that has been partially aggregated by the above processing and stored in the optimized tree structure storage unit 15. The aggregated group of nodes is placed in adjacent areas of the optimized tree structure storage unit 15. By reading that area sequentially, the search determines the branch point within the aggregated node and then the subtree to search next. For example, at node [4] of the aggregated node group, the search jumps to the subtree node [5] associated with node [4].
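The search inside an aggregated node can be pictured as a sequential scan over a contiguous run of entries that leaves the run only at a branch point. The layout below is an assumption made for illustration, not the format of FIG. 9: `Entry`, `PhysicalNode`, and `contains` are hypothetical names. Each entry keeps the character that leads to it along the aggregated path, plus a table of off-path subtrees keyed by their branch character.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entry:
    char: str                                   # character leading to this logical node on the path
    terminal: bool = False                      # True if a stored key ends here
    side: Dict[str, "PhysicalNode"] = field(default_factory=dict)  # off-path subtrees


@dataclass
class PhysicalNode:
    entries: List[Entry]                        # contiguous run of aggregated logical nodes


def contains(root: PhysicalNode, key: str) -> bool:
    """Scan the aggregated entries in order, jumping only at a branch point."""
    node, i, pos = root, 0, 0
    while pos < len(key):
        entries = node.entries
        if i + 1 < len(entries) and entries[i + 1].char == key[pos]:
            i += 1                              # next entry is adjacent in memory
        else:
            child = entries[i].side.get(key[pos])
            if child is None:
                return False
            node, i = child, 0                  # branch point: jump to the off-path subtree
        pos += 1
    return node.entries[i].terminal


# Hypothetical aggregated node holding the keys "abc" and "ad".
leaf_d = PhysicalNode([Entry("d", terminal=True)])
root = PhysicalNode([
    Entry(""),                                  # trie root, consumes no character
    Entry("a", side={"d": leaf_d}),
    Entry("b"),
    Entry("c", terminal=True),
])
print(contains(root, "abc"), contains(root, "ad"), contains(root, "ab"))
```

A lookup for "abc" never leaves the contiguous run, while "ad" makes a single jump at the entry for "a".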

As described above, starting from the input tree structure information, the node with the highest reference probability is taken as the root node, K nodes in the direction of a single leaf are aggregated into a single physical node, and the subtrees made up of the remaining nodes are relocated as children of that aggregated node; this process is then applied recursively to the subtrees. As a result, among the paths that make up a tall tree structure, a path with a high reference probability is physically gathered into an aggregated node and placed in adjacent areas of memory. The tall tree structure as a whole becomes balanced, locality of reference in hierarchical memory improves, and search speed improves as a result.

The operation of each component of the tree structure reconstruction apparatus of FIG. 2 can be implemented as a program, which can be installed and executed on a computer used as the tree structure reconstruction apparatus, or distributed over a network.

The present invention is not limited to the above embodiment, and various modifications and applications are possible within the scope of the claims.

1 Tree structure input device
2 Result output device
10 Tree structure reconstruction apparatus
11 Tree structure optimization processing unit
12 Aggregation origin candidate node storage unit
13 Single-path aggregation processing unit
14 Node information storage unit
15 Optimized tree structure storage unit
16 Tree structure storage unit

Claims

A tree structure reconstruction apparatus that, when an arbitrary substring M is searched for in a character string of length N and an input tree structure is a tall structure with few branches, reconstructs the tree structure,
the apparatus being a tree structure reconstruction apparatus using single-path aggregation, characterized by having tree structure optimization processing means that, when input tree structure information including an identifier, connection relationships, and a reference probability for each node of the tree structure is input, stores the information in tree structure storage means; extracts nodes with a high reference probability from the tree structure storage means, starting from the root node; taking the extracted node as an aggregation origin node, aggregates a predetermined number K of nodes in the direction of a single leaf into a single physical node and stores it in optimized tree structure storage means; and relocates the subtrees made up of the remaining nodes as child nodes of the aggregated single physical node.

The tree structure reconstruction apparatus using single-path aggregation according to claim 1, wherein
the tree structure optimization processing means includes means that repeats, for a predetermined number of aggregations, a process of acquiring a candidate node for the aggregation origin from aggregation origin candidate node storage means, deleting the candidate node from the aggregation origin candidate node storage means, and inputting the aggregation origin node to single-path aggregation processing means, and that stores the subtrees rooted at the origin nodes remaining in the aggregation origin candidate node storage means in the optimized tree structure storage means, and
the single-path aggregation processing means has means that repeats, for a predetermined number of aggregated nodes (K), a process of extracting the aggregation origin node input from the tree structure optimization processing means from the tree structure storage means as a node to be processed, storing it in node information storage means, extracting the set of child nodes of the node to be processed from the node information storage means, taking the node with the highest reference probability in the set of child nodes as the next node to be processed, and storing the child nodes other than that node in the aggregation origin candidate node storage means, and that acquires the group of nodes aggregated for the number of aggregated nodes from the node information storage means and stores it in the optimized tree structure storage means as the single physical node.

A tree structure reconstruction method that, when an arbitrary substring M is searched for in a character string of length N and an input tree structure is a tall structure with few branches, reconstructs the tree structure,
wherein, in an apparatus having node information storage means, optimized tree structure storage means, tree structure optimization processing means, aggregation origin candidate storage means, and single-path aggregation processing means,
the method being a tree structure reconstruction method using single-path aggregation, characterized in that the tree structure optimization processing means performs a tree structure optimization processing step of, when input tree structure information including an identifier, connection relationships, and a reference probability for each node of the tree structure is input, storing the information in tree structure storage means; extracting nodes with a high reference probability from the tree structure storage means, starting from the root node; taking the extracted node as an aggregation origin node, aggregating a predetermined number K of nodes in the direction of a single leaf into a single physical node and storing it in the optimized tree structure storage means; and relocating the subtrees made up of the remaining nodes as child nodes of the aggregated single physical node.

The tree structure reconstruction method using single-path aggregation according to claim 3, wherein,
in the tree structure optimization processing step, a process of acquiring a candidate node for the aggregation origin from the aggregation origin candidate node storage means, deleting the candidate node from the aggregation origin candidate node storage means, and inputting the aggregation origin node to the single-path aggregation processing means is repeated for a predetermined number of aggregations, and the subtrees rooted at the origin nodes remaining in the aggregation origin candidate node storage means are stored in the optimized tree structure storage means, and
the single-path aggregation processing means repeats, for a predetermined number of aggregated nodes (K), a process of extracting the aggregation origin node input in the tree structure optimization processing step from the tree structure storage means as a node to be processed and storing it in the node information storage means, extracting the set of child nodes of the node to be processed, taking the node with the highest reference probability in the set of child nodes as the next node to be processed, and storing the child nodes other than that node in the aggregation origin candidate node storage means, and then acquires the group of nodes aggregated for the number of aggregated nodes from the node information storage means and stores it in the optimized tree structure storage means as the single physical node.

A tree structure reconstruction program using single-path aggregation for causing a computer to function as each means of the tree structure reconstruction apparatus using single-path aggregation according to claim 1 or 2.