JP6296044B2

JP6296044B2 - Code processing program and data structure

Info

Publication number: JP6296044B2
Application number: JP2015238737A
Authority: JP
Inventors: 大介二宮; 真嘉樋口; 豊小山; 雅樹西垣; 量松村; 敏郎小野; 崇記小澤; 純史川井
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2015-12-07
Filing date: 2015-12-07
Publication date: 2018-03-20
Anticipated expiration: 2032-06-28
Also published as: JP2016034158A

Description

The present technology relates to an encoding technology.

In the prior art, a phrase tree is generated in order to compress an input character string. Each node of this phrase tree includes a code that replaces a character or character string, the character to be compressed, data representing the hierarchy level, a parent node pointer, pointers to the child nodes corresponding to each character that may be used, and counters that count the number of times the character corresponding to each child node appears. For example, when the input character string “ABABCABCABCCBCBCBCAAACBACBACBBCCBB” is input, a phrase tree as shown in FIG. 1 is generated. Note that the appearance-count threshold for generating a new node is 2. In this example, a root node is provided as the node of the 0th level, and nodes for the characters “0x00” to “0xFF” are provided as the nodes of the first level. As nodes of the second level, nodes for the characters “0x42” and “0x41” are provided as child nodes of the character “0x41”, a node for the character “0x43” is provided as a child node of the character “0x42”, and a node for the character “0x42” is provided as a child node of the character “0x43”. Furthermore, as nodes of the third level, a node for the character “0x43” is provided as a child node of the character “0x42”, and nodes for the characters “0x43”, “0x41”, and “0x42” are provided as child nodes of the character “0x42”. For each node, the figure schematically shows the code (A), the character (B), the appearance counts of the characters for the child nodes (C), and the pointers to the child nodes (D).

As shown in FIG. 2, each node holds child node pointers and child-character appearance counts for all 256 usable character types, so a single node consumes 3085 bytes of memory. If the maximum of 65536 nodes that can be expressed with a code length of 2 bytes is provided, about 192 Mbytes of memory is consumed in total.
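The total quoted above can be checked with a short calculation. This is only a sketch of the arithmetic using the two figures stated in the text (3085 bytes per node, 65536 nodes); the variable names are illustrative, and the per-field breakdown of FIG. 2 is not reproduced here.

```python
# Figures taken from the text: 3085 bytes per node, 2-byte codes.
BYTES_PER_NODE = 3085
MAX_NODES = 2 ** 16  # 65536 distinct codes fit in 2 bytes

total_bytes = BYTES_PER_NODE * MAX_NODES
total_mib = total_bytes / (1024 * 1024)

print(total_bytes)          # 202178560
print(round(total_mib, 1))  # 192.8
```

The result, roughly 192.8 when 1 Mbyte is taken as 2^20 bytes, agrees with the "about 192 Mbytes" stated above.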

The more types of codes that replace character strings (that is, the more nodes) a phrase tree has, the wider the variety of data it can compress. However, when the number of code types grows in this way, the number of nodes grows with it, and the overall size of the phrase tree becomes large.

Japanese Laid-open Patent Publication No. 2011-221845 (JP 2011-221845 A)

Accordingly, an object of the present technology is, in one aspect, to provide a technology for reducing the memory capacity required for a data structure representing a phrase tree.

The data structure of a phrase tree according to a first aspect of the present technology is one in which each of a plurality of nodes constituting the phrase tree has: (A) a first area that holds a code corresponding to a specific character or character string; (B) a second area that holds, for each of a plurality of characters, the order of appearance of the characters that appear in the input character string next to the specific character or character string; and (C) a third area that holds, in accordance with the order of appearance of a character that appeared in the input character string next to the specific character or character string, either the number of appearances of that character when the number of appearances has not exceeded a specific threshold, or, when the number of appearances has exceeded the specific threshold, the code generated for the character string consisting of the specific character or character string followed by that character.

A data structure according to a second aspect of the present technology has first data corresponding to a phrase tree, second data about the character or character string corresponding to each node of the phrase tree, and third data about the hierarchy levels of the phrase tree. In the first data, for each node of the phrase tree, data that associates either the code of the node's parent node or the largest of the codes of the node's child nodes, the code of the character represented by the node, and the level to which the node belongs is arranged in the order of the codes of the nodes. The second data includes, for each node of each level from the second level of the phrase tree onward, the code of the character or character string corresponding to that node, in the order of the node codes. Further, the third data includes, for each level from the second level of the phrase tree onward, data that associates the number of nodes belonging to that level, the smallest code among the codes of the nodes belonging to that level, and the offset, from the head of the second data, of the position at which the character or character string corresponding to that smallest code is placed.

The memory capacity required for the data structure representing the phrase tree can be reduced.

FIG. 1 is a diagram schematically showing an example of a data structure representing a conventional phrase tree.
FIG. 2 is a diagram showing an example of the amount of memory used in the conventional example.
FIG. 3 is a diagram showing the data structure for one node of a phrase tree according to the first embodiment.
FIG. 4 is a diagram showing an example of an array of extension counter / child node numbers.
FIG. 5 is a functional block diagram of the information processing apparatus according to the embodiment.
FIG. 6 is a diagram showing the main processing flow according to the embodiment.
FIG. 7 is a diagram showing the processing flow of the phrase tree generation process.
FIGS. 8 to 19, 20A to 20F, 21, 22A, 22B, 23, 24A to 24F, and 25 are diagrams for explaining the phrase tree data generation process.
FIG. 26A is a diagram showing the processing flow of the number acquisition process.
FIG. 26B is a diagram showing the processing flow of the array setting process.
FIG. 27 is a diagram showing the processing flow of the current node setting process.
FIG. 28 is a diagram showing an example of a phrase tree.
FIG. 29 is a diagram showing an example of the phrase tree after sorting.
FIG. 30 is a diagram for explaining the compression map.
FIG. 31 is a diagram for explaining generation of the compression map.
FIGS. 32 to 50 are diagrams for explaining the compression map generation process.
FIG. 51 is a diagram showing the processing flow of the compression map generation process.
FIG. 52 is a diagram showing the processing flow of the entry addition process.
FIG. 53 is a diagram showing the processing flow of the compression process.
FIG. 54 is a diagram showing an example of a compression map.
FIG. 55 is a diagram for explaining the compression process.
FIG. 56 is a diagram showing the processing flow of the decompression process.
FIG. 57 is a diagram for explaining the decompression process.
FIGS. 58 to 61 are diagrams for explaining processing when compression is performed using the phrase tree data structure.
FIGS. 62 to 66 are diagrams for explaining processing when decompression is performed using the phrase tree data structure.
FIG. 67 is a diagram showing a phrase tree for explaining the second embodiment.
FIG. 68 is a schematic diagram of the compression map in the second embodiment.
FIG. 69 is a diagram showing the processing flow of entry addition process 2 in the second embodiment.
FIG. 70 is a diagram showing the processing flow of compression process 2 in the second embodiment.
FIG. 71 is a diagram showing an example of the compression map in the second embodiment.
FIG. 72 is a diagram showing an example of the compression map in the third embodiment.
FIG. 73 is a diagram showing an example of the hierarchy information in the third embodiment.
FIG. 74 is a diagram showing an example of the decompression map in the third embodiment.
FIG. 75 is a diagram showing the processing flow of compression map generation process 2 in the third embodiment.
FIG. 76 is a diagram showing the processing flow of entry addition process 3 in the third embodiment.
FIG. 77 is a diagram showing the processing flow of the setting process in the third embodiment.
FIG. 78 is a diagram showing the processing flow of compression process 3 in the third embodiment.
FIG. 79 is a diagram showing the processing flow of decompression process 2 in the third embodiment.
FIG. 80 is a diagram for explaining decompression process 2.
FIG. 81 is a diagram showing an example of the compression map in the fourth embodiment.
FIG. 82 is a diagram showing an example of the hierarchy information in the fourth embodiment.
FIG. 83 is a diagram showing an example of the decompression map in the fourth embodiment.
FIG. 84 is a diagram showing the processing flow of compression map generation process 3 in the fourth embodiment.
FIG. 85 is a diagram showing the processing flow of entry addition process 4 in the fourth embodiment.
FIG. 86 is a diagram showing the processing flow of setting process 2 in the fourth embodiment.
FIG. 87 is a diagram showing the processing flow of compression process 4 in the fourth embodiment.
FIG. 88 is a diagram showing the processing flow of decompression process 3 in the fourth embodiment.
FIG. 89 is a functional block diagram of a computer.

[Embodiment 1]
FIG. 3 shows the data structure for one node of the phrase tree according to the present embodiment. The data block of a node includes a code (IDX) area, an area for the character appearance number array, an area for the counter / child node number array, a spanned number area, and a character appearance count area. Separately from this, there is an area for counting the number of extension counter / child node number arrays.

The character appearance number area is an array that holds, for each character (0x00 to 0xFF) that appears next to the character or character string corresponding to the code of this node, its order of appearance. The value “FF” means unused, except when the character appearance count is 256. Any other value is an index into the counter / child node number array. For example, consider the node for the character “0x41 (A)” when the character string used to generate the phrase tree is “ABAAC”. The character that appears after the first “A” is B (0x42), so “0” is set at position 0x42 of the character appearance number area. The character that appears after the next “A” is A, so “1” is set at position 0x41. The character that appears after the following “A” is C, so “2” is set at position 0x43.
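The “ABAAC” example can be reproduced with a few lines of Python. This is a simplified sketch, not the procedure of the embodiment: it only looks at which character follows each occurrence of a single target character, and it ignores both the current-node tracking of the full generation process and the special case in which all 256 characters have appeared. The function name `appearance_numbers` is hypothetical.

```python
UNUSED = 0xFF  # "FF" marks an unused entry, as in the text


def appearance_numbers(text: str, target: str) -> list[int]:
    """Order in which each character first follows `target` in `text`."""
    table = [UNUSED] * 256
    next_order = 0
    for cur, nxt in zip(text, text[1:]):
        if cur != target:
            continue
        code = ord(nxt)
        if table[code] == UNUSED:  # first time this character follows target
            table[code] = next_order
            next_order += 1
    return table


table = appearance_numbers("ABAAC", "A")
print(table[0x42], table[0x41], table[0x43])  # 0 1 2
```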

The counter / child node number array stores, in order of appearance, either an appearance count (counter) or a child node number (a number written with the “0x” prefix). In the example above, the child node number (code) “0x0100” for the character “0x42” is stored at position “0”, the child node number (code) “0x0104” for the character “0x41” is stored at position “1”, and the appearance count “1” for the character “0x43” is stored at position “2”. In the present embodiment, the counter / child node number area can hold only eight appearance counts or child node numbers. When more than eight appearance counts or child node numbers must be held, the number of an extension counter / child node number array is set in the spanned number area. The initial value of the spanned number is “0xFFFF”.

The extension counter / child node number arrays are provided in a common area that can be referenced from any node. As shown in FIG. 4, each of these arrays can also hold eight appearance counts or child node numbers. At its end, it holds the number of the next extension counter / child node number array to consult when this array is also insufficient (the extended spanned number).
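The chaining of eight-entry arrays can be sketched as follows. This is an illustration under assumptions, not the layout of FIG. 4: the names `Block`, `locate`, and `pool` are hypothetical, the shared area is modeled as a Python list, and reusing 0xFFFF as the "no further array" marker for extension arrays is an assumption (the text gives that initial value only for the spanned number of a node).

```python
from dataclasses import dataclass, field

SLOTS_PER_BLOCK = 8  # from the text: eight entries per array
NO_SPAN = 0xFFFF     # initial spanned number; assumed for extension arrays too


@dataclass
class Block:
    """Eight counter / child node number slots plus a link to the next array."""
    slots: list[int] = field(default_factory=lambda: [0] * SLOTS_PER_BLOCK)
    span: int = NO_SPAN


def locate(node_block: Block, pool: list[Block], order: int, allocate: bool = False):
    """Return (block, offset) of the slot for appearance order `order`."""
    block = node_block
    while order >= SLOTS_PER_BLOCK:
        if block.span == NO_SPAN:
            if not allocate:
                raise IndexError("no extension array for this appearance order")
            pool.append(Block())        # take a new array from the shared area
            block.span = len(pool) - 1  # record its number as the spanned number
        block = pool[block.span]
        order -= SLOTS_PER_BLOCK
    return block, order


node_block = Block()
pool: list[Block] = []
block, offset = locate(node_block, pool, 9, allocate=True)
block.slots[offset] = 1
print(node_block.span, offset, len(pool))  # 0 1 1
```

Here `len(pool)` plays the role of the separate area that counts the extension arrays. Appearance orders 0 to 7 stay in the node's own array, and each additional group of eight follows one more link.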

The character appearance count area holds the number of distinct characters that have appeared next to the character or character string corresponding to this node. In the example above, the three characters “B”, “A”, and “C” appeared, so “3” is set.
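Putting the areas together, the node state described in the example above can be written out as a small Python sketch. The class and field names are illustrative, not taken from the patent. Treating any value of 0x0100 or more as a child node number (and anything smaller as a counter) follows the comparison used in the worked example later in this description. The extension arrays are omitted.

```python
from dataclasses import dataclass, field

UNUSED = 0xFF              # unused character appearance number
NO_SPAN = 0xFFFF           # initial spanned number
FIRST_CHILD_CODE = 0x0100  # values below this are treated as counters


@dataclass
class Node:
    idx: int  # code (IDX)
    appearance_no: list[int] = field(default_factory=lambda: [UNUSED] * 256)
    counter_or_child: list[int] = field(default_factory=lambda: [0] * 8)
    spanned_no: int = NO_SPAN
    appearance_count: int = 0


def lookup(node: Node, ch: int) -> tuple[str, int]:
    """Classify what the node currently stores for the following character `ch`."""
    order = node.appearance_no[ch]
    if order == UNUSED:
        return ("unseen", 0)
    value = node.counter_or_child[order]
    kind = "child" if value >= FIRST_CHILD_CODE else "count"
    return (kind, value)


# State of the node for "A" (0x0041) as described in the text.
node = Node(idx=0x0041)
for order, (ch, value) in enumerate([(0x42, 0x0100), (0x41, 0x0104), (0x43, 1)]):
    node.appearance_no[ch] = order
    node.counter_or_child[order] = value
    node.appearance_count += 1

print(lookup(node, 0x42))     # ('child', 256)
print(lookup(node, 0x43))     # ('count', 1)
print(node.appearance_count)  # 3
```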

In this way, the number of child nodes that can be identified by default is limited, which eliminates the area for pointers to child nodes that are not necessarily used. In addition, the node holds no pointer to its parent node, which also reduces memory usage.

The present embodiment describes a process for generating such a phrase tree, a process for generating from the phrase tree a compression map (a data structure for storage on disk), compression and decompression processes that use the compression map, and an apparatus that performs these processes.

FIG. 5 shows a functional block diagram of the information processing apparatus 100 according to the present embodiment. The information processing apparatus 100 includes a phrase tree generation unit 110, a compression map generation unit 120, a data storage unit 130, a compression processing unit 140, a decompression processing unit 150, and an input/output unit 160.

The phrase tree generation unit 110 processes an input character string for phrase tree generation, input from the input/output unit 160 for example, and generates the phrase tree data according to the present embodiment. The compression map generation unit 120 generates a compression map from the phrase tree data generated by the phrase tree generation unit 110 and stores the compression map in the data storage unit 130.

The compression processing unit 140 uses the compression map stored in the data storage unit 130 to compress an input character string to be compressed, input from the input/output unit 160 for example, and stores the compression result in the data storage unit 130. The decompression processing unit 150 uses the compression map stored in the data storage unit 130 to decompress, for example, a compression result stored in the data storage unit 130, and stores the decompression result in the data storage unit 130. The input/output unit 160 may also output the decompression result to an output device such as a display device, or to another computer.

Next, the processing performed by the information processing apparatus 100 will be described with reference to FIGS. 6 to 66. First, the phrase tree generation unit 110 performs a phrase tree generation process on the input character string (FIG. 6: step S1). This phrase tree generation process will be described in detail with reference to FIGS. 7 to 27. The compression map generation unit 120 then performs a compression map generation process that generates a compression map from the phrase tree generated by the phrase tree generation unit 110, and stores the compression map in the data storage unit 130 (step S3). The compression map generation process will be described in detail with reference to FIGS. 28 to 52.

Thereafter, when an input character string to be compressed is input from the input/output unit 160 for example, the compression processing unit 140 performs a compression process using the compression map and stores the compression result in the data storage unit 130 (step S5). The compression process will be described in detail with reference to FIGS. 53 to 55. Further, when instructed by the input/output unit 160 for example, the decompression processing unit 150 performs a decompression process, using the compression map, on a compression result stored in the data storage unit 130 for example, and stores the processing result in the data storage unit 130 (step S7). The decompression process will be described in detail with reference to FIGS. 56 and 57.

Next, the phrase tree generation process will be described with reference to FIG. 7. First, the phrase tree generation unit 110 generates the data of the nodes with codes 0x0000 to 0x00FF (step S11). The nodes of the first level, which are always provided, are generated as the initial setting.

Then, the phrase tree generation unit 110 reads one byte of input character from the input character string and sets the node for that character as the current node (step S13). The phrase tree generation unit 110 then reads the next input character, one byte, from the input character string (step S15) and determines whether the end of the input character string has been reached (step S17). If the end of the input character string has been reached, the process returns to the calling process.

On the other hand, if the end of the input character string has not been reached, the phrase tree generation unit 110 uses the code of the input character as an array index to obtain the value A stored in the character appearance number array of the current node (step S18). The phrase tree generation unit 110 then performs a number acquisition process (step S19). This number acquisition process will be described with reference to FIGS. 26A and 26B.

Thereafter, the phrase tree generation unit 110 performs a current node setting process (step S21). The current node setting process will be described with reference to FIG. 27. The process then returns to step S15.

Before describing the number acquisition process and the current node setting process in detail, a specific example will be described with reference to FIGS. 8 to 25 to make the processing easier to follow. The example describes the processing performed when the character string “ABABCABCABCCBCBCBCAAACBACBACBBCCBBCBECBECBDCBDCBGCBGCBHCBHCBFCBFCBICBI” is input. Schematic diagrams of the phrase tree are described along with the data structure of the phrase tree.

In step S11, as shown schematically in FIG. 8, nodes with codes 0x0001 to 0x00FF are generated as the first level. In this example, the nodes with codes “0x0041”, “0x0042”, and “0x0043” and their child nodes are the main processing targets, so only this portion is shown. Data blocks with the data structure shown in FIG. 9 are also generated. That is, a data block as shown in FIG. 3 is generated for each of the code “0x0041” corresponding to the character “0x41”, the code “0x0042” corresponding to the character “0x42”, and the code “0x0043” corresponding to the character “0x43”. At this stage, all values are set to their initial values.

Next, the first character “A” of the input character string is read, and the node with the corresponding code “0x0041” is set as the current node. Then the second character “B” of the input character string is read. As shown in FIG. 10, the value for “0x42 (B)” in the character appearance number array of the current node is “0xFF”, so character appearance number = appearance order = 0 is set. The character appearance count is also updated from “0” to “1”. Further, in the counter / child node number array, the value at appearance order “0” is smaller than “0x0100”, so the appearance count “0” is incremented by 1 and “1” is set. Because the appearance count has not reached the threshold “2”, the current node is changed to the node with code “0x0042” corresponding to the character “0x42”. In the figures, the new current node is indicated by a filled black triangle and the old current node by a hollow white triangle.

Next, the third character, “A”, of the input character string is read. As shown in FIG. 11, the entry for “0x41 (A)” in the character appearance number array of the current node is “0xFF”, so character appearance number = appearance order = 0 is set, and the character appearance count is updated from “0” to “1”. In the counter/child node number array, the value at appearance order “0” is smaller than “0x0100”, so the appearance count “0” is incremented by 1 to “1”. Because the appearance count has not reached the threshold “2”, the current node is changed to the node with code “0x0041”, which corresponds to the character “0x41”.

Next, the fourth character, “B”, of the input character string is read. As shown in FIG. 12, referring to the entry for “0x42 (B)” in the character appearance number array of the current node yields the appearance order “0”. In the counter/child node number array, the value at appearance order “0” is smaller than “0x0100”, so the appearance count “1” is incremented by 1. The count thereby reaches the threshold “2”, so, as shown in FIG. 12, a new child node with code “0x0100” is generated, and the child node's code “0x0100” is stored at position “0” of the counter/child node number array. The code “0x0100” corresponds to “AB”. The current node is then set to the child node with code “0x0100”. The phrase tree changes from the state shown in FIG. 8 to the state shown in FIG. 13.

Next, the fifth character, “C”, of the input character string is read. As shown in FIG. 14, the entry for “0x43 (C)” in the character appearance number array of the current node is “0xFF”, so character appearance number = appearance order = 0 is set, and the character appearance count is updated from “0” to “1”. In the counter/child node number array, the value at appearance order “0” is smaller than “0x0100”, so the appearance count “0” is incremented by 1 to “1”. Because the appearance count has not reached the threshold “2”, the current node is changed to the node with code “0x0043”, which corresponds to the character “0x43”.

Next, the sixth character, “A”, of the input character string is read. As shown in FIG. 15, the entry for “0x41 (A)” in the character appearance number array of the current node is “0xFF”, so character appearance number = appearance order = 0 is set, and the character appearance count is updated from “0” to “1”. In the counter/child node number array, the value at appearance order “0” is smaller than “0x0100”, so the appearance count “0” is incremented by 1 to “1”. Because the appearance count has not reached the threshold “2”, the current node is changed to the node with code “0x0041”, which corresponds to the character “0x41”.

Next, the seventh character, “B”, of the input character string is read. As shown in FIG. 16, the entry for “0x42 (B)” in the character appearance number array of the current node is “0”, so the value at appearance order “0” in the counter/child node number array is referenced. This yields the code “0x0100”, so the current node is changed to the node with code “0x0100”.

Next, the eighth character, “C”, of the input character string is read. As shown in FIG. 17, referring to the entry for “0x43 (C)” in the character appearance number array of the current node yields the appearance order “0”. In the counter/child node number array, the value at appearance order “0” is smaller than “0x0100”, so the appearance count “1” is incremented by 1. The count thereby reaches the threshold “2”, so, as shown in FIG. 17, a new child node with code “0x0101” is generated, and the child node's code “0x0101” is stored at position “0” of the counter/child node number array. The code “0x0101” corresponds to “ABC”. The current node is then set to the child node with code “0x0101”. The phrase tree changes from the state shown in FIG. 13 to the state shown in FIG. 18.

Next, the ninth character, “A”, of the input character string is read. As shown in FIG. 19, the entry for “0x41 (A)” in the character appearance number array of the current node is “0xFF”, so character appearance number = appearance order = 0 is set, and the character appearance count is updated from “0” to “1”. In the counter/child node number array, the value at appearance order “0” is smaller than “0x0100”, so the appearance count “0” is incremented by 1 to “1”. Because the appearance count has not reached the threshold “2”, the current node is changed to the node with code “0x0041”, which corresponds to the character “0x41”. This processing is repeated for the subsequent characters.

Now assume that processing has reached the first “I” (the 67th character) of the input character string “ABABCABCABCCBCBCBCAAACBACBACBBCCBBCBECBECBDCBDCBGCBGCBHCBHCBFCBFCBICBI”. At this point the current node is the node with code “0x0102”. As shown in FIGS. 20A to 20F, the entry for “0x49 (I)” in the character appearance number array of the current node is “0xFF”, so character appearance number = appearance order “8” is set, and the character appearance count is updated from “8” to “9”. Appearance order “8” does not fit in the default counter/child node number array, so an extended counter/child node number array is referenced. Here, extended counter/child node number array “0” is allocated, and the value at position “0 (= 8 − 8)” of that array is referenced. Because the value at appearance order “0” in extended array “0” is smaller than “0x0100”, the appearance count “0” is incremented by 1 to “1”. Because the appearance count has not reached the threshold “2”, the current node is changed to the node with code “0x0049”, which corresponds to the character “0x49”. At this stage the phrase tree is in the state shown in FIG. 21: eight child nodes (third-layer nodes) of the code “0x0102” have already been generated.
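The index arithmetic behind “0 (= 8 − 8)” can be illustrated with a small sketch. It assumes, as in the example above, that every counter/child node number array (default or extended) holds 8 entries; the function name `slot_position` is hypothetical.

```python
def slot_position(appearance_order, width=8):
    """Split an appearance order into (arrays to skip, position in the array).

    A skip count of 0 means the default counter/child node number array,
    1 means the first extended array in the chain, and so on.
    """
    return divmod(appearance_order, width)
```

With this helper, appearance order 8 maps to position 0 of the first extended array, which matches the reference made for the character “I”.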

After that, the 68th character, “C”, of the input character string is read. As shown in FIGS. 22A and 22B, referring to the entry for “0x43 (C)” in the character appearance number array of the current node yields the appearance order “0”. In the counter/child node number array, the value at appearance order “0” is smaller than “0x0100”, so the appearance count “0” is incremented by 1 to “1”. Because the appearance count has not reached the threshold “2”, the current node is changed to the node with code “0x0043”, which corresponds to the character “0x43”.

Next, the 69th character, “B”, of the input character string is read. As shown in FIG. 23, the entry for “0x42 (B)” in the character appearance number array of the current node is “1”, so the value at appearance order “1” in the counter/child node number array is referenced. This yields the code “0x0102”, so the current node is changed to the node with code “0x0102”.

Finally, the last character, “I”, of the input character string is read. As shown in FIGS. 24A to 24F, referring to the entry for “0x49 (I)” in the character appearance number array of the current node yields the appearance order “8”. Appearance order “8” does not fit in the default counter/child node number array, so the extended counter/child node number array is referenced. The value at position “0 (= 8 − 8)” of extended array “0” is smaller than “0x0100”, so the appearance count “1” is incremented by 1. The count thereby reaches the threshold “2”, so, as shown in FIGS. 24A to 24F, a new child node with code “0x010D” is generated, and the child node's code “0x010D” is stored at position “0” of the extended counter/child node number array. Because the current node “0x0102” represents “CB”, the code “0x010D” corresponds to “CBI”. The current node is then set to the child node with code “0x010D”. The phrase tree changes from the state shown in FIG. 23 to the state shown in FIG. 25. In this way, a phrase tree with 4 nodes in the second layer and 10 nodes in the third layer is generated. Because child nodes are generated in order of appearance, however, the codes in the second and third layers are not arranged in the character-code order of the corresponding characters.

The number acquisition process used in the above processing is described with reference to FIG. 26A. The phrase tree generation unit 110 determines whether the condition that the value A is 0xFF and the character appearance count is not 256 is satisfied (step S31). If the condition is satisfied, a character that has not appeared before has appeared, so the phrase tree generation unit 110 sets the value A to the character appearance number for the current node (step S33) and increments the character appearance count by 1 (step S35). In the walkthrough above, the appearance number assigned in this way equals the character appearance count before the increment.

If it is determined that the condition of step S31 is not satisfied, or after step S35, the phrase tree generation unit 110 determines whether the value A is 8 or more (step S37). If the value A is less than 8, the phrase tree generation unit 110 acquires the value B stored at position A of the counter/child node number array (step S51). The processing then returns to the caller.

If, on the other hand, the value A is 8 or more, the phrase tree generation unit 110 determines whether the spanned number is 0xFFFF (step S39). A spanned number of 0xFFFF means that no extended counter/child node number array has been allocated yet. In that case, the phrase tree generation unit 110 performs the array setting process (step S41). The processing then proceeds to step S43.

The array setting process is described with reference to FIG. 26B. The phrase tree generation unit 110 sets the current number of arrays as the spanned number of the current node, or as the extended spanned number of the extended counter/child node number array currently being referenced (step S68). The phrase tree generation unit 110 then increments the number of arrays by 1 (step S69), and the processing returns to the caller.

If the spanned number in step S39 is not 0xFFFF but some other value, or after step S41, the phrase tree generation unit 110 refers to the extended counter/child node number array identified by the acquired spanned number or extended spanned number (step S43). The phrase tree generation unit 110 then calculates A = A − 8 (step S45) and determines whether A is 8 or more (step S47). If A is 8 or more, two or more extended counter/child node number arrays are in use. In that case, the phrase tree generation unit 110 determines whether the extended spanned number of the extended array currently being referenced is 0xFFFF (step S53). If the extended spanned number is not 0xFFFF, the processing proceeds to step S57. If it is 0xFFFF, a new extended counter/child node number array has to be set up, so the phrase tree generation unit 110 performs the array setting process (FIG. 26B) (step S55), and the processing then proceeds to step S57.

The phrase tree generation unit 110 then acquires the extended spanned number of the referenced extended counter/child node number array, or the extended spanned number that was set in the array setting process (step S57). The processing then returns to step S43.

Once the value A has become less than 8, the phrase tree generation unit 110 acquires the value B stored at position A of the referenced extended counter/child node number array (step S49). The processing then returns to the caller.

Through the above processing, either the code of the child node or the appearance count of the character is obtained.
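The number acquisition process and the array setting process can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the implementation of FIGS. 26A and 26B: the names `SlotArray`, `Node`, `locate`, and `pool` are hypothetical, the extended arrays are assumed to live in one shared list whose length plays the role of the "number of arrays", and the function returns the array and position so that the caller can read or update the value B.

```python
from dataclasses import dataclass, field

UNSEEN = 0xFF       # "not yet appeared" marker in the appearance number array
NO_ARRAY = 0xFFFF   # spanned number meaning "no extended array allocated"
WIDTH = 8           # entries per counter/child node number array


@dataclass
class SlotArray:
    slots: list = field(default_factory=lambda: [0] * WIDTH)
    next_array: int = NO_ARRAY   # spanned / extended spanned number


@dataclass
class Node:
    order: list = field(default_factory=lambda: [UNSEEN] * 256)
    seen: int = 0                # character appearance count
    base: SlotArray = field(default_factory=SlotArray)


def locate(node, ch, pool):
    """Return (array, position) holding the value B for byte `ch`."""
    a = node.order[ch]
    if a == UNSEEN and node.seen != 256:       # step S31
        a = node.seen                          # step S33
        node.order[ch] = a
        node.seen += 1                         # step S35
    array = node.base
    while a >= WIDTH:                          # steps S37 and S47
        if array.next_array == NO_ARRAY:       # steps S39 and S53
            array.next_array = len(pool)       # step S68
            pool.append(SlotArray())           # step S69
        array = pool[array.next_array]         # steps S43 and S57
        a -= WIDTH                             # step S45
    return array, a                            # caller reads array.slots[a]
```

The check `node.seen != 256` matters because 0xFF is also a legitimate appearance number once all 256 byte values have appeared under a node.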

Next, the current node setting process is described with reference to FIG. 27. The phrase tree generation unit 110 determines whether the value B stored at position A of the counter/child node number array (or the extended counter/child node number array) is a child node number, that is, a code of 0x0100 or more (step S71). If the value B is a child node number, the phrase tree generation unit 110 moves the current node to that child node (step S83). The processing then returns to the caller.

If, on the other hand, the value B is not a child node number, the phrase tree generation unit 110 increments the value B by 1 (step S73) and determines whether the value B has reached the threshold (step S75). If the value B is less than the threshold, the phrase tree generation unit 110 moves the current node to the node whose code corresponds to the input character, that is, the node of the input character code (step S77). The processing then returns to the caller.

If the value B is equal to or greater than the threshold, the phrase tree generation unit 110 stores the code of a new child node at position A of the counter/child node number array (or the extended counter/child node number array) (step S79). The new child node's code is the largest code among the existing data blocks plus 1. The phrase tree generation unit 110 then generates the data block of the new child node with that code and moves the current node to the generated node (step S81).

Performing this processing generates the phrase tree shown in FIGS. 24A to 24F.
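The overall tree construction described above can be sketched compactly as follows. This is a simplified sketch, not the data layout of the embodiment: a dictionary and a growable list stand in for the fixed character appearance number array and the chained counter/child node number arrays, and the names `build_phrase_tree`, `blocks`, and `phrase` are hypothetical. The threshold of 2 and the first new code 0x0100 follow the description.

```python
THRESHOLD = 2            # appearance count at which a child node is created
FIRST_NEW_CODE = 0x0100  # values at or above this are child node codes


def build_phrase_tree(text, threshold=THRESHOLD):
    data = text.encode("ascii")
    # One block per code. "order" maps a byte to its appearance order;
    # "slots" holds, per appearance order, a counter or a child node code.
    blocks = {c: {"order": {}, "slots": []} for c in range(256)}
    phrase = {c: bytes([c]) for c in range(256)}   # code -> byte string
    current = data[0]
    for ch in data[1:]:
        blk = blocks[current]
        # Number acquisition: first appearance gets the next order number.
        a = blk["order"].setdefault(ch, len(blk["order"]))
        if a == len(blk["slots"]):
            blk["slots"].append(0)
        b = blk["slots"][a]
        if b >= FIRST_NEW_CODE:        # step S71: already a child node
            current = b                # step S83
            continue
        b += 1                         # step S73
        if b < threshold:              # step S75
            blk["slots"][a] = b
            current = ch               # step S77: first-layer node of ch
        else:
            new_code = len(blocks)     # step S79: largest code + 1
            blk["slots"][a] = new_code
            blocks[new_code] = {"order": {}, "slots": []}
            phrase[new_code] = phrase[current] + bytes([ch])
            current = new_code         # step S81
    return blocks, phrase
```

Run on the example input string of this description, the sketch assigns the code 0x0100 to “AB”, 0x0101 to “ABC”, and 0x0102 to “CB”, and the last code it assigns is 0x010D, for a total of 4 second-layer and 10 third-layer nodes.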

Next, the compression map generation process is described. As noted above, in the phrase tree the first layer is assigned codes in character-code order and is therefore sorted in ascending order of code, but from the second layer onward the codes are assigned in order of appearance, regardless of layer or of the corresponding character code. In the simple example shown in FIG. 28, the second layer contains nodes with codes “0x0100”, “0x0104”, “0x0106”, and “0x0102”, which are not consecutive. In addition, under the code “0x0041” corresponding to the character code “0x41”, the child node “0x0100” represents the character code “0x42” while the child node “0x0104” represents the character code “0x41”, so in this respect too the codes are not arranged in character-code order.

In this embodiment, when the compression map is generated, the codes are reassigned, going from the assignment shown in FIG. 28 to the one shown in FIG. 29, so that they are sorted by layer and, under each parent node, by the code of the corresponding character (the last character of the character string that the code represents). In the example of FIG. 29, in the second layer the code “0x0100” is associated with the node of the character “0x41”, the code “0x0101” with the node of the character “0x42”, the code “0x0102” with the node of the character “0x43”, and the code “0x0103” with the node of the character “0x42”. In the third layer, the code “0x0104” is associated with the node of the character “0x43”, the code “0x0105” with the node of the character “0x41”, the code “0x0106” with the node of the character “0x42”, and the code “0x0107” with the node of the character “0x43”. This makes a binary search of the nodes possible.

The compression map is now described more specifically. The compression map is data in which association data, each associating the code of a parent node with the character code of the node itself (the character, or the last character of the character string, corresponding to the node's code), is arranged in order of code. In practice, each item of association data associates the code of the current node with an index of the character appearance number array for which a code has been confirmed to be stored in the counter/child node number array of the current node. For the phrase tree shown in FIG. 28, the compression map shown on the left side of FIG. 30 would be generated. As noted above, codes are assigned in order of appearance, so the column of parent node codes is not sorted, and consequently the character strings represented by the codes are not sorted in character-code order either. The codes and character strings shown in the example of FIG. 30 are supplementary information that is not included in the compression map.

In this embodiment, as shown on the right side of FIG. 30, the sorting and code reassignment described above are performed on the phrase tree, and the compression map is generated by arranging the association data, each associating the code of a parent node with the character code of the node itself, in order of the newly reassigned codes. As the example of FIG. 30 shows, the parent node codes are then sorted in ascending order, which makes a binary search of the compression map possible.
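To illustrate why the reassigned order permits a binary search, the following sketch looks up a child node in a map laid out like the FIG. 29 example. It is a sketch under stated assumptions: the map is modeled as a list of (parent code, character code) pairs whose list position plus 0x0100 is the node's reassigned code, and the names `FIG29_MAP` and `find_child` are hypothetical. The third-layer parent codes 0x0101 (“AB”) and 0x0103 (“CB”) are inferred from the second-layer assignment in FIG. 29.

```python
from bisect import bisect_left

BASE = 0x0100  # code of the first entry in the map

# (parent code, character code) pairs in reassigned code order.
FIG29_MAP = [
    (0x0041, 0x41), (0x0041, 0x42), (0x0042, 0x43), (0x0043, 0x42),  # layer 2
    (0x0101, 0x43), (0x0103, 0x41), (0x0103, 0x42), (0x0103, 0x43),  # layer 3
]


def find_child(cmap, parent, ch):
    """Return the code of the child of `parent` for byte `ch`, or None."""
    i = bisect_left(cmap, (parent, ch))
    if i < len(cmap) and cmap[i] == (parent, ch):
        return BASE + i
    return None
```

Because the pairs are in ascending order, each lookup takes a logarithmic number of comparisons instead of a scan over all entries.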

In this embodiment, a sort area is introduced to perform the sorting, as shown in FIG. 31. The sort area stores the codes identified by the processing described below, in the order in which they are identified. The array index in the sort area is the post-sort code. In the example of FIG. 31, the pre-sort code “0x0104” is stored first, as the code of a second-layer node, at array index “0x0100”. That is, the node with pre-sort code “0x0104” is henceforth handled as post-sort code “0x0100”. For this node, the code of the parent node is “0x0041”, and the character code represented by the node is “0x41”.

The processing is described below step by step, taking as an example the case in which the phrase tree data shown in FIGS. 24A to 24F has been obtained.

First, the nodes with codes “0x0000” to “0x00FF” need not be sorted, so their data is registered in the compression map as is. For example, the data shown in FIG. 32 is registered in the compression map. Because the parent of each of these nodes is the root node, the data “root” is registered as the parent node code, together with the character “0x00” to “0xFF” that each node represents. In addition, as shown in FIG. 33, the nodes “0x0000” to “0x00FF” are registered in the sort area in ascending order.

Next, the nodes whose codes are registered in the sort area are processed in ascending order of array index (= post-sort code). Association data is registered in the compression map only where a child node exists, and, as the phrase tree in FIG. 25 shows, the nodes up to code “0x0040” have none, so the description of their processing is omitted. When array index “0x0041” is processed, as shown in FIG. 34, the data block with code “0x0041” is referenced, and its character appearance number array is searched from index “0x00” onward for positions where a value other than “0xFF” is registered. In this example, the appearance order “1” is registered at index “0x41”, so position “1” of the counter/child node number array is referenced. The code “0x0104”, a value of “0x100” or more, is registered there, so “0x0104” is registered at the tail “0x0100” of the sort area. In addition, as shown in FIG. 35, association data that associates the parent node's post-sort code “0x0041” with the character code corresponding to index “0x41” of the character appearance number array is registered in the compression map at the position of post-sort code “0x100”.

Also, as shown in FIG. 36, the appearance order “0” is registered at index “0x42” of the character appearance number array in the data block with code “0x0041”, so position “0” of the counter/child node number array is referenced. The code “0x0100”, a value of “0x100” or more, is registered there, so “0x0100” is registered at the tail “0x0101” of the sort area. In addition, as shown in FIG. 37, association data that associates the parent node's post-sort code “0x0041” with the character code corresponding to index “0x42” of the character appearance number array is registered in the compression map at the position of post-sort code “0x101”.

Further, as shown in FIG. 38, the appearance order “2” is registered at index “0x43” of the character appearance number array in the data block with code “0x0041”, so position “2” of the counter/child node number array is referenced. The value “1”, which is less than “0x100”, is registered there, so no association data is registered in the compression map. Processing the remaining indices of the character appearance number array does not register any association data in the compression map either.

Next, when array index “0x0042” is processed, as shown in FIG. 39, the data block with code “0x0042” is referenced, and its character appearance number array is searched from index “0x00” onward for positions where a value other than “0xFF” is registered. In this example, the appearance order “0” is registered at index “0x41”, so position “0” of the counter/child node number array is referenced. The value “1”, which is less than “0x100”, is registered there, so no association data is registered in the compression map.

Also, “0xFF” is registered at index “0x42” of the character appearance number array in the data block with code “0x0042”, so no data is registered in the compression map for that index.

Further, as shown in FIG. 40, the appearance order “1” is registered at index “0x43” of the character appearance number array in the data block with code “0x0042”, so position “1” of the counter/child node number array is referenced. The code “0x0106”, a value of “0x100” or more, is registered there, so “0x0106” is registered at the tail “0x0102” of the sort area. In addition, as shown in FIG. 41, association data that associates the parent node's post-sort code “0x0042” with the character code corresponding to index “0x43” of the character appearance number array is registered in the compression map at the position of post-sort code “0x102”. Processing the remaining indices of the character appearance number array does not register any further association data in the compression map.

次に、図４２に示すように、配列番号「０ｘ００４３」を処理することになると、符号「０ｘ００４３」のデータブロックを参照し、その文字出現番号の配列番号「０ｘ００」から順番に「０ｘＦＦ」以外の値が登録されている位置を探索する。この例では、「０ｘ４１」番目に出現順番「０」が登録されているので、カウンタ兼子ノード番号の配列で「０」番目を参照する。そうすると、「０ｘ１００」未満の値「１」が登録されているので、圧縮マップに対応付けデータを登録することはない。 Next, as shown in FIG. 42, when array number “0x0043” is processed, the data block of code “0x0043” is referenced, and its character appearance number array is searched in order from array number “0x00” for positions where a value other than “0xFF” is registered. In this example, appearance order “0” is registered at position “0x41”, so position “0” of the counter/child node number array is referenced. Because the value “1”, which is less than “0x100”, is registered there, no association data is registered in the compression map.

さらに、図４３に示すように、符号「０ｘ００４３」のデータブロックにおける文字出現番号の配列番号「０ｘ４２」番目には、出現順番「１」が登録されているので、カウンタ兼子ノード番号の配列で「１」番目を参照する。そうすると、「０ｘ１００」以上の値として符号「０ｘ０１０２」が登録されているので、ソート領域の最後端「０ｘ０１０３」に「０ｘ０１０２」を登録する。さらに、圧縮マップに、図４４に示すように、ソート後符号「０ｘ１０３」番目に、親ノードのソート後の符号「０ｘ００４３」と文字出現番号の配列番号「０ｘ４２」に相当する文字コードとを対応付ける対応付けデータを登録する。文字出現番号の以後の配列番号について処理しても圧縮マップに対応付けデータを登録することはない。 Further, as shown in FIG. 43, appearance order “1” is registered at array number “0x42” of the character appearance number array in the data block of code “0x0043”, so position “1” of the counter/child node number array is referenced. Because the code “0x0102”, a value of “0x100” or more, is registered there, “0x0102” is registered at the tail “0x0103” of the sort area. In addition, as shown in FIG. 44, association data that associates the sorted code “0x0043” of the parent node with the character code corresponding to array number “0x42” of the character appearance number array is registered at sorted code “0x103” of the compression map. Processing the subsequent array numbers of the character appearance number array registers no further association data in the compression map.

ソート後符号「０ｘ００４４」のデータブロックを処理しても圧縮マップにデータが登録されることはなく、ソート後符号「０ｘ００ＦＦ」までの全データブロックを処理しても圧縮マップにデータが登録されることはない。 Processing the data block of sorted code “0x0044” registers no data in the compression map, and neither does processing all the remaining data blocks up to sorted code “0x00FF”.

また、図４５に示すように、配列番号「０ｘ０１００」を処理することになると、符号「０ｘ０１００」のデータブロックを参照し、その文字出現番号の配列番号「０ｘ００」から順番に「０ｘＦＦ」以外の値が登録されている位置を探索する。この例では、「０ｘ４３」番目に出現順番「０」が登録されているので、カウンタ兼子ノード番号の配列で「０」番目を参照する。そうすると、「０ｘ１００」以上の値として符号「０ｘ０１０１」が登録されているので、ソート領域の最後端「０ｘ０１０４」に「０ｘ０１０１」を登録する。さらに、圧縮マップに、図４６に示すように、ソート後符号「０ｘ１０４」番目に、親ノードのソート後の符号「０ｘ０１００」と文字出現番号の配列番号「０ｘ４３」に相当する文字コードとを対応付ける対応付けデータを登録する。文字出現番号の以後の配列番号について処理しても圧縮マップに対応付けデータを登録することはない。 Also, as shown in FIG. 45, when array number “0x0100” is processed, the data block of code “0x0100” is referenced, and its character appearance number array is searched in order from array number “0x00” for positions where a value other than “0xFF” is registered. In this example, appearance order “0” is registered at position “0x43”, so position “0” of the counter/child node number array is referenced. Because the code “0x0101”, a value of “0x100” or more, is registered there, “0x0101” is registered at the tail “0x0104” of the sort area. In addition, as shown in FIG. 46, association data that associates the sorted code “0x0100” of the parent node with the character code corresponding to array number “0x43” of the character appearance number array is registered at sorted code “0x104” of the compression map. Processing the subsequent array numbers of the character appearance number array registers no further association data in the compression map.

また、ソート後符号「０ｘ０１０１」のデータブロックを処理しても圧縮マップにデータが登録されることはない。 Further, even if the data block with the sorted code “0x0101” is processed, data is not registered in the compression map.

また、図４７に示すように、配列番号「０ｘ０１０２」を処理することになると、符号「０ｘ０１０２」のデータブロックを参照し、その文字出現番号の配列番号「０ｘ００」から順番に「０ｘＦＦ」以外の値が登録されている位置を探索する。この例では、「０ｘ４１」番目に出現順番「１」が登録されているので、カウンタ兼子ノード番号の配列で「１」番目を参照する。そうすると、「０ｘ１００」以上の値として符号「０ｘ０１０５」が登録されているので、ソート領域の最後端「０ｘ０１０５」に「０ｘ０１０５」を登録する。さらに、圧縮マップに、図４８に示すように、ソート後符号「０ｘ１０５」番目に、親ノードのソート後の符号「０ｘ０１０３」と文字出現番号の配列番号「０ｘ４１」に相当する文字コードとを対応付ける対応付けデータを登録する。このデータブロックについては、文字出現番号の以後の配列番号について処理すると、順調に圧縮マップに対応付けデータが追加されるようになる。 Also, as shown in FIG. 47, when array number “0x0102” is processed, the data block of code “0x0102” is referenced, and its character appearance number array is searched in order from array number “0x00” for positions where a value other than “0xFF” is registered. In this example, appearance order “1” is registered at position “0x41”, so position “1” of the counter/child node number array is referenced. Because the code “0x0105”, a value of “0x100” or more, is registered there, “0x0105” is registered at the tail “0x0105” of the sort area. In addition, as shown in FIG. 48, association data that associates the sorted code “0x0103” of the parent node with the character code corresponding to array number “0x41” of the character appearance number array is registered at sorted code “0x105” of the compression map. For this data block, processing the subsequent array numbers of the character appearance number array adds further association data to the compression map in the same manner.

そして、図４９に示すように、符号「０ｘ０１０２」のデータブロックにおける文字出現番号の配列番号「０ｘ４９」番目には、出現順番「８」が登録されているので、拡張カウンタ兼子ノード番号の配列で「０」番目を参照する。そうすると、「０ｘ１００」以上の値として符号「０ｘ０１０Ｄ」が登録されているので、ソート領域の最後端「０ｘ０１０Ｄ」に「０ｘ０１０Ｄ」を登録する。さらに、圧縮マップに、図５０に示すように、ソート後符号「０ｘ１０Ｄ」番目に、親ノードのソート後の符号「０ｘ０１０３」と文字出現番号の配列番号「０ｘ４９」に相当する文字コードとを対応付ける対応付けデータを登録する。文字出現番号の以後の配列番号について処理しても圧縮マップに対応付けデータを登録することはない。以下の処理で圧縮マップにはデータは登録されないので、説明を省略する。 Then, as shown in FIG. 49, appearance order “8” is registered at array number “0x49” of the character appearance number array in the data block of code “0x0102”, so position “0” of the extended counter/child node number array is referenced. Because the code “0x010D”, a value of “0x100” or more, is registered there, “0x010D” is registered at the tail “0x010D” of the sort area. In addition, as shown in FIG. 50, association data that associates the sorted code “0x0103” of the parent node with the character code corresponding to array number “0x49” of the character appearance number array is registered at sorted code “0x10D” of the compression map. Processing the subsequent array numbers of the character appearance number array registers no further association data in the compression map. The remaining processing registers no data in the compression map, so its description is omitted.

次に、図５１及び図５２を用いて圧縮マップ生成処理を説明する。 Next, the compression map generation process will be described with reference to FIGS. 51 and 52.

圧縮マップ生成部１２０は、圧縮マップに、第１階層のノード「０ｘ００００」乃至「０ｘ００ＦＦ」のデータとして、根ノードを表すデータｒｏｏｔとノードの符号に対応する文字コードとを対応付ける対応付けデータを順番に追加する（図５１：ステップＳ９１）。また、圧縮マップ生成部１２０は、ソート領域に、第１の階層のノード「０ｘ００００」乃至「０ｘ００ＦＦ」の符号を、順番に追加する（ステップＳ９３）。そして、圧縮マップ生成部１２０は、ソート領域から、ソート後符号の小さい順に、未処理の符号を１つ読み出す（ステップＳ９５）。ここで未処理の符号を読み出すことができない場合には（ステップＳ９７：Ｎｏルート）、呼出元の処理に戻る。 The compression map generation unit 120 adds to the compression map, in order, association data for the first-level nodes “0x0000” to “0x00FF”, each entry associating the data “root” representing the root node with the character code corresponding to the code of the node (FIG. 51: step S91). The compression map generation unit 120 also adds the codes of the first-level nodes “0x0000” to “0x00FF” to the sort area in order (step S93). The compression map generation unit 120 then reads one unprocessed code from the sort area, in ascending order of sorted code (step S95). If no unprocessed code can be read (step S97: No route), the process returns to the calling process.

一方、未処理の符号を読み出すことができれば（ステップＳ９７：Ｙｅｓルート）、圧縮マップ生成部１２０は、文節木において、読み出した符号のノードのデータブロックを参照する（ステップＳ９９）。そして、圧縮マップ生成部１２０は、エントリ追加処理を実施する（ステップＳ１０１）。このエントリ追加処理については、図５２を用いて説明する。エントリ追加処理が終了すると、ステップＳ９５に戻る。 On the other hand, if the unprocessed code can be read (step S97: Yes route), the compression map generation unit 120 refers to the data block of the read code node in the phrase tree (step S99). Then, the compression map generation unit 120 performs entry addition processing (step S101). This entry addition process will be described with reference to FIG. When the entry addition process ends, the process returns to step S95.

次に、エントリ追加処理の処理フローについて説明する。 Next, the processing flow of entry addition processing will be described.

圧縮マップ生成部１２０は、読み出した符号のノードのデータブロックにおいて、文字出現番号の配列における未処理の番号における値Ａを、番号の小さい順に取り出す（図５２：ステップＳ１１１）。ここで、圧縮マップ生成部１２０は、処理が、文字出現番号の配列における終端まで既に行われていたか判断する（ステップＳ１１３）。処理が文字出現番号の配列における終端まで既に行われていた場合には、呼出元の処理に戻る。 In the data block of the read code node, the compression map generation unit 120 extracts the value A in the unprocessed number in the character appearance number array in ascending order (FIG. 52: step S111). Here, the compression map generation unit 120 determines whether the processing has already been performed up to the end of the character appearance number array (step S113). If the process has already been performed up to the end of the character appearance number array, the process returns to the caller process.

一方、処理が文字出現番号の配列における終端まで行われていない場合には、圧縮マップ生成部１２０は、取り出した値Ａが０ｘＦＦであり且つ文字出現数が２５６ではないか判断する（ステップＳ１１５）。この条件を満たす場合には、符号がカウンタ兼子ノード番号の配列に登録されていないのでステップＳ１１１に戻る。 On the other hand, if the processing has not yet reached the end of the character appearance number array, the compression map generation unit 120 determines whether the extracted value A is 0xFF and the number of character appearances is not 256 (step S115). If this condition is satisfied, no code is registered in the counter/child node number array, so the process returns to step S111.

一方、値Ａが上で述べた条件を満たさない場合には、圧縮マップ生成部１２０は、値Ａが８以上であるか判断する（ステップＳ１１７）。Ａが８以上の場合には、拡張カウンタ兼子ノード番号の配列を用いているので、圧縮マップ生成部１２０は、スパンド番号の値を読み取り、スパンド番号番目（拡張スパンド番号の場合もある）の拡張カウンタ兼子ノード番号の配列を参照する（ステップＳ１１９）。また、圧縮マップ生成部１２０は、値Ａを−８する（ステップＳ１２１）。そして、圧縮マップ生成部１２０は、値Ａが８以上であるか判断する（ステップＳ１２３）。まだ値Ａが８以上である場合には、圧縮マップ生成部１２０は、拡張カウンタ兼子ノード番号の最後尾に格納されている拡張スパンド番号の値Ｃを取得する（ステップＳ１２５）。そして処理はステップＳ１１９に戻る。 On the other hand, if the value A does not satisfy the above condition, the compression map generation unit 120 determines whether the value A is 8 or more (step S117). If A is 8 or more, an extended counter/child node number array is in use, so the compression map generation unit 120 reads the spanned number and references the extended counter/child node number array at that index (the index may instead be an extended spanned number) (step S119). The compression map generation unit 120 then subtracts 8 from the value A (step S121) and determines whether the value A is still 8 or more (step S123). If it is, the compression map generation unit 120 acquires the extended spanned number C stored at the tail of the extended counter/child node number array (step S125), and the process returns to step S119.

一方、値Ａが８より小さい場合には、圧縮マップ生成部１２０は、参照先の拡張カウンタ兼子ノード番号の配列においてＡ番目に格納されている値Ｂを読み出す（ステップＳ１２７）。そして処理はステップＳ１３０に移行する。 On the other hand, when the value A is smaller than 8, the compression map generation unit 120 reads the value B stored in the Ath position in the array of reference destination extended counter / child node numbers (step S127). Then, the process proceeds to step S130.

一方、値Ａが初めからＡが８未満である場合には、圧縮マップ生成部１２０は、カウンタ兼子ノード番号の配列においてＡ番目に格納されている値Ｂを読み出す（ステップＳ１２９）。 On the other hand, if the value A is initially less than 8, the compression map generator 120 reads the value B stored in the Ath position in the counter / child node number array (step S129).
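The lookup in steps S117 to S129 can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionary field names (`slots`, `spanned`, `next`) are hypothetical, and the sketch assumes that every extended counter/child node number array also holds 8 slots, which the text implies by subtracting 8 per hop but does not state outright.

```python
SLOTS = 8  # assumed slot count per array (the text subtracts 8 per hop)


def read_slot(block, extended, a):
    """Return value B for appearance order A.

    block    -- hypothetical dict: {"slots": [...], "spanned": index}
    extended -- hypothetical list of dicts: {"slots": [...], "next": index}
    """
    if a < SLOTS:
        # Step S129: A fits in the base counter/child node number array.
        return block["slots"][a]
    # Step S119: follow the spanned number to the first extended array.
    ext = extended[block["spanned"]]
    a -= SLOTS  # step S121
    while a >= SLOTS:  # step S123
        # Step S125: follow the extended spanned number stored at the tail.
        ext = extended[ext["next"]]
        a -= SLOTS
    # Step S127: read the A-th slot of the extended array reached.
    return ext["slots"][a]


block = {"slots": [1, 2, 3, 4, 5, 6, 7, 8], "spanned": 0}
extended = [
    {"slots": [0x010D, 0, 0, 0, 0, 0, 0, 0], "next": 1},
    {"slots": [0x0120, 0, 0, 0, 0, 0, 0, 0], "next": None},
]
print(hex(read_slot(block, extended, 8)))
```

With appearance order 8, the lookup lands on slot 0 of the first extended array, which mirrors the FIG. 49 walkthrough in which appearance order “8” leads to position “0” of the extended array.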

そして、圧縮マップ生成部１２０は、取り出した値Ｂが「０ｘ０１００」以上であるか判断する（ステップＳ１３０）。取り出した値Ｂが「０ｘ０１００」未満であれば、圧縮マップのデータを追加することはないので、処理はステップＳ１１１に戻る。一方、取り出した値Ｂが「０ｘ０１００」以上であれば、圧縮マップ生成部１２０は、符号Ｂをソート領域の最後尾に追加する（ステップＳ１３１）。また、圧縮マップ生成部１２０は、圧縮マップに、ソート領域における処理対象の配列番号と文字出現番号の配列における処理対象の配列番号とを対応付ける対応付けデータを追加する（ステップＳ１３３）。そして処理はステップＳ１１１に戻る。 Then, the compression map generation unit 120 determines whether or not the extracted value B is “0x0100” or more (step S130). If the extracted value B is less than “0x0100”, the compression map data is not added, and the process returns to step S111. On the other hand, if the extracted value B is “0x0100” or more, the compression map generation unit 120 adds the code B to the end of the sort area (step S131). Further, the compression map generation unit 120 adds association data that associates the processing target array number in the sort area with the processing target array number in the character appearance number array in the compression map (step S133). Then, the process returns to step S111.

以上のような処理を実施することで、上で具体的に説明した処理が行われるようになる。また、このように生成された圧縮マップであれば、ディスクにそのまま格納することができ、圧縮処理及び伸張処理において利用することができる。 By performing the processing as described above, the processing specifically described above is performed. In addition, the compression map generated in this way can be stored on the disk as it is, and can be used in compression processing and decompression processing.
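The flow of FIGS. 51 and 52 amounts to a breadth-first renumbering of the phrase tree. The sketch below is a simplified illustration under stated assumptions: each data block is modeled as a plain dictionary from character code to stored value (the character appearance number indirection and the extended arrays are collapsed away), the root is represented by a hypothetical sentinel `ROOT = -1`, and the small tree is invented for the example rather than taken from the figures.

```python
ROOT = -1  # hypothetical sentinel standing in for the "root" data


def build_compression_map(tree):
    """tree maps a pre-sort node code to {character code: stored value}.

    A stored value below 0x100 is a counter; 0x100 or more is a child code.
    Returns (compression_map, sort_area); the index into compression_map is
    the sorted code, and each entry is (sorted parent code, character code).
    """
    compression_map = [(ROOT, c) for c in range(256)]  # step S91
    sort_area = list(range(256))                       # step S93
    pos = 0
    while pos < len(sort_area):                        # steps S95, S97
        block = tree.get(sort_area[pos], {})           # step S99
        for ch in sorted(block):                       # step S111
            value = block[ch]
            if value >= 0x100:                         # step S130
                sort_area.append(value)                # step S131
                compression_map.append((pos, ch))      # step S133
        pos += 1
    return compression_map, sort_area


# Invented example tree (pre-sort codes), not the one in the figures.
tree = {
    0x41: {0x41: 0x0104, 0x42: 0x0100},
    0x42: {0x41: 1, 0x43: 0x0106},
    0x43: {0x41: 1, 0x42: 0x0102},
    0x0100: {0x43: 0x0101},
}
cmap, sort_area = build_compression_map(tree)
for code in range(0x100, len(cmap)):
    print(hex(code), cmap[code], hex(sort_area[code]))
```

Because nodes are visited in ascending sorted-code order and characters in ascending order within each node, the entries from `0x100` onward end up ordered by (parent code, character code), which is what makes the later binary search possible.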

次に、圧縮マップを用いた圧縮処理について、図５３乃至図５５を用いて説明する。 Next, compression processing using the compression map will be described with reference to FIGS. 53 to 55.

圧縮処理部１４０は、圧縮対象の入力文字列から１文字取り出し、圧縮マップにおいて当該文字に対応する符号をカレントノードに位置づける（図５３：ステップＳ１４１）。また、圧縮処理部１４０は、入力文字列から次の１文字取り出す（ステップＳ１４３）。ここで、圧縮処理部１４０は、ステップＳ１４３で文字を取り出すことができたか判断する（ステップＳ１４４）。文字を取り出すことができれば、圧縮処理部１４０は、カレントノードの符号が「０ｘ０１００」より小さいか判断する（ステップＳ１４７）。 The compression processing unit 140 extracts one character from the input character string to be compressed, and positions the code corresponding to the character in the compression map as the current node (FIG. 53: step S141). The compression processing unit 140 extracts the next character from the input character string (step S143). Here, the compression processing unit 140 determines whether or not characters have been extracted in step S143 (step S144). If the character can be extracted, the compression processing unit 140 determines whether the code of the current node is smaller than “0x0100” (step S147).

カレントノードの符号が「０ｘ０１００」より小さい場合には、圧縮処理部１４０は、「親ノードの符号＝カレントノードの符号、文字コード＝取り出した文字」となるノードについて、圧縮マップにおいて符号「０ｘ０１００」から最終ノードの範囲で二分探索を実施する（ステップＳ１４９）。そして処理はステップＳ１５３に移行する。 If the code of the current node is smaller than “0x0100”, the compression processing unit 140 performs a binary search of the compression map, over the range from code “0x0100” to the last node, for the node satisfying “parent node code = current node code, character code = extracted character” (step S149). The process then proceeds to step S153.

一方、カレントノードが「０ｘ０１００」以上であれば、圧縮処理部１４０は、「親ノードの符号＝カレントノードの符号、文字コード＝取り出した文字」となるノードについて、圧縮マップにおいてカレントノードの符号＋１から最終ノードの範囲で二分探索を実施する（ステップＳ１５１）。そして処理はステップＳ１５３に移行する。 On the other hand, if the code of the current node is “0x0100” or more, the compression processing unit 140 performs a binary search of the compression map, over the range from the current node code + 1 to the last node, for the node satisfying “parent node code = current node code, character code = extracted character” (step S151). The process then proceeds to step S153.

ステップＳ１５３の処理に移行して、圧縮処理部１４０は、二分探索により該当ノードが見つかったか判断する（ステップＳ１５３）。該当ノードが見つからなかった場合には、圧縮処理部１４０は、圧縮結果としてカレントノードの符号を出力し（ステップＳ１５７）、カレントノードを、ステップＳ１４３で取り出した文字のノードに設定する（ステップＳ１５９）。そして処理はステップＳ１４３に戻る。一方、該当ノードが見つかった場合には、圧縮処理部１４０は、カレントノードを該当ノードに変更する（ステップＳ１５５）。そして処理はステップＳ１４３に戻る。 Shifting to the process of step S153, the compression processing unit 140 determines whether the corresponding node is found by the binary search (step S153). If no corresponding node is found, the compression processing unit 140 outputs the code of the current node as the compression result (step S157), and sets the current node as the node of the character extracted in step S143 (step S159). . Then, the process returns to step S143. On the other hand, when the corresponding node is found, the compression processing unit 140 changes the current node to the corresponding node (step S155). Then, the process returns to step S143.

ステップＳ１４３で文字が取り出せなかった場合には、圧縮処理部１４０は、カレントノードの符号を圧縮結果として出力する（ステップＳ１４５）。そして処理は呼出元の処理に戻る。 If the character cannot be extracted in step S143, the compression processing unit 140 outputs the code of the current node as the compression result (step S145). Then, the process returns to the caller process.

このような処理を実施することで、文字列の圧縮が行われる。 By performing such processing, the character string is compressed.
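The compression flow of FIG. 53 can be sketched as below. This is an illustrative sketch, not the patent's implementation: the compression map is modeled as a list of (parent code, character code) tuples indexed by sorted code, `ROOT = -1` is a hypothetical stand-in for the root marker, and the map contents beyond the three entries named in the text (“0x0100”, “0x0101”, “0x0104”) are assumed for the example. The standard-library `bisect` module supplies the binary search.

```python
import bisect

ROOT = -1  # hypothetical sentinel standing in for the "root" data


def demo_entries():
    """Illustrative map: first-level nodes plus eight assumed entries."""
    entries = [(ROOT, c) for c in range(256)]
    entries += [(0x41, 0x41), (0x41, 0x42), (0x42, 0x43), (0x43, 0x42),
                (0x101, 0x43), (0x103, 0x41), (0x103, 0x42), (0x103, 0x43)]
    return entries


def find_child(entries, current, ch):
    """Steps S147 to S153: binary search for (parent=current, char=ch)."""
    lo = 0x100 if current < 0x100 else current + 1  # S149 or S151
    hi = len(entries)
    key = (current, ch)
    i = bisect.bisect_left(entries, key, lo, hi)
    return i if i < hi and entries[i] == key else None


def compress(entries, data):
    out = []
    if not data:
        return out
    current = data[0]                    # step S141
    for ch in data[1:]:                  # step S143
        child = find_child(entries, current, ch)
        if child is None:
            out.append(current)          # step S157
            current = ch                 # step S159
        else:
            current = child              # step S155
    out.append(current)                  # step S145
    return out


print([hex(c) for c in compress(demo_entries(), b"ABCAA")])
```

The binary search relies on the entries from `0x100` onward being ordered by (parent code, character code), which follows from the breadth-first order in which the map is generated.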

例えば、図５４に示すような圧縮マップが生成され、「ABCAA」という圧縮対象の文字列が入力された場合の処理について具体的に説明する。 For example, a process when a compression map as shown in FIG. 54 is generated and a character string to be compressed of “ABCAA” is input will be specifically described.

まず、入力文字列「ＡBCAA」の「Ａ」を処理する場合には、カレントノードが「０ｘ００４１」に設定される（図５５における［１］）。次に、「AＢCAA」の「Ｂ」を処理する場合には、「親ノードの符号＝０ｘ００４１、文字コード＝０ｘ４２」というノードについて、符号「０ｘ０１００」から「０ｘ０１０７」の範囲で二分探索を実施する。そうすると、符号「０ｘ０１０１」が該当するノードであることが分かるので、カレントノードを「０ｘ０１０１」に設定する（図５５における［２］）。 First, when processing “A” of the input character string “ABCAA”, the current node is set to “0x0041” ([1] in FIG. 55). Next, when “B” of “ABCAA” is processed, a binary search is performed for the node “parent node code = 0x0041, character code = 0x42” in the range of codes “0x0100” to “0x0107”. . Then, since it can be seen that the code “0x0101” is the corresponding node, the current node is set to “0x0101” ([2] in FIG. 55).

さらに、「ABＣAA」の「Ｃ」を処理する場合には、「親ノードの符号＝０ｘ０１０１、文字コード＝０ｘ４３」というノードについて、符号「０ｘ０１０２」から「０ｘ０１０７」の範囲で二分探索を実施する。そうすると、符号「０ｘ０１０４」が該当するノードであることが分かるので、カレントノードを「０ｘ０１０４」に設定する（図５５における［３］）。さらに、「ABCＡA」の「Ａ」を処理する場合には、「親ノードの符号＝０ｘ０１０４、文字コード＝０ｘ４１」となるノードについて、符号「０ｘ０１０５」から「０ｘ０１０７」の範囲で二分探索を実施する。そうすると、見つからないので、カレントノードの符号「０ｘ０１０４」を圧縮結果に追加する（図５５における［４］）。なお、文字コード「０ｘ４１」に対応する符号「０ｘ００４１」のノードをカレントノードに設定する。 Further, when “C” of “ABCAA” is processed, a binary search is performed in the range of codes “0x0102” to “0x0107” for the node with “parent node code = 0x0101, character code = 0x43”. The code “0x0104” is found to be the matching node, so the current node is set to “0x0104” ([3] in FIG. 55). Next, when the fourth character “A” of “ABCAA” is processed, a binary search is performed in the range of codes “0x0105” to “0x0107” for the node with “parent node code = 0x0104, character code = 0x41”. No such node is found, so the code “0x0104” of the current node is added to the compression result ([4] in FIG. 55). The node of code “0x0041”, which corresponds to the character code “0x41”, is then set as the current node.

さらに、「ABCAＡ」の「Ａ」を処理する場合には、「親ノードの符号＝０ｘ００４１、文字コード＝０ｘ４１」というノードについて、符号「０ｘ０１００」から「０ｘ０１０７」の範囲で二分探索を実施する。そうすると、符号「０ｘ０１００」が該当するノードであることが分かるので、カレントノードを「０ｘ０１００」に設定する（図５５における［５］）。これで入力文字列の全文字を処理したので、カレントノードの符号「０ｘ０１００」を出力して、処理を終了する（図５５における［６］）。 Further, when “A” of “ABCAA” is processed, a binary search is performed in the range of the codes “0x0100” to “0x0107” for the node “parent node code = 0x0041, character code = 0x41”. Then, since the code “0x0100” is found to be the corresponding node, the current node is set to “0x0100” ([5] in FIG. 55). Since all the characters of the input character string have been processed, the current node code “0x0100” is output and the processing is terminated ([6] in FIG. 55).

このように圧縮処理を実施することができる。 In this way, the compression process can be performed.

次に、図５６及び図５７を用いて伸張処理の処理フローを説明する。まず、伸張処理部１５０は、圧縮データから未処理の符号を１つ読み出し、圧縮マップにおいて該当するノードをカレントノードに設定する（図５６：ステップＳ１６１）。なお、ステップＳ１６１で符号を取得できなければ処理は呼出元の処理に戻る（ステップＳ１６３：Ｎｏルート）。一方、符号が取得できれば（ステップＳ１６３：Ｙｅｓルート）、伸張処理部１５０は、カレントノードの文字を作業域に出力する（ステップＳ１６５）。そして、伸張処理部１５０は、カレントノードに親ノードの符号が含まれているか判断する（ステップＳ１６７）。親ノードの符号がｒｏｏｔである場合には、親ノードの符号は無しと判断する。親ノードの符号がない場合には、伸張処理部１５０は、作業域の文字列を後ろから伸張結果として出力する（ステップＳ１６９）。処理はステップＳ１６１に戻る。一方、親ノードの符号がある場合には、伸張処理部１５０は、親ノードの符号をカレントノードに位置づける（ステップＳ１７１）。そして処理はステップＳ１６５に戻る。 Next, the processing flow of the decompression process will be described with reference to FIGS. 56 and 57. First, the decompression processing unit 150 reads one unprocessed code from the compressed data and sets the corresponding node in the compression map as the current node (FIG. 56: step S161). If no code can be acquired in step S161, the process returns to the calling process (step S163: No route). If a code is acquired (step S163: Yes route), the decompression processing unit 150 outputs the character of the current node to the work area (step S165). The decompression processing unit 150 then determines whether the current node contains a parent node code (step S167). If the parent node code is “root”, the node is treated as having no parent node code. If there is no parent node code, the decompression processing unit 150 outputs the character string in the work area, read from the end backwards, as the decompression result (step S169), and the process returns to step S161. If there is a parent node code, the decompression processing unit 150 sets the node of that parent code as the current node (step S171), and the process returns to step S165.

このような処理を実施することで符号を文字列に伸張することができるようになる。 By executing such processing, the code can be expanded into a character string.
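The decompression flow of FIG. 56 can be sketched as below. As before, this is a hedged illustration: the compression map is modeled as a list of (parent code, character code) tuples indexed by sorted code, `ROOT = -1` is a hypothetical stand-in for the root marker, and only the entries for “0x0100”, “0x0101”, and “0x0104” are taken from the text; the others are assumed.

```python
ROOT = -1  # hypothetical sentinel standing in for the "root" data


def demo_entries():
    """Illustrative map: first-level nodes plus eight assumed entries."""
    entries = [(ROOT, c) for c in range(256)]
    entries += [(0x41, 0x41), (0x41, 0x42), (0x42, 0x43), (0x43, 0x42),
                (0x101, 0x43), (0x103, 0x41), (0x103, 0x42), (0x103, 0x43)]
    return entries


def decompress(entries, codes):
    out = bytearray()
    for code in codes:                   # step S161
        work = bytearray()               # work area for one code
        node = code
        while node != ROOT:
            parent, ch = entries[node]
            work.append(ch)              # step S165
            node = parent                # step S171 (or stop at root, S167)
        out += work[::-1]                # step S169: emit in reverse order
    return bytes(out)


print(decompress(demo_entries(), [0x0104, 0x0100]))
```

Each code is expanded by walking parent links up to a first-level node, so the characters are collected last-to-first and must be reversed before being appended to the result.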

例えば図５４に示すような圧縮マップを用いて符号「０ｘ０１０４、０１００」が圧縮データとして入力された場合を説明する。 For example, a case where a code “0x0104, 0100” is input as compressed data using a compression map as shown in FIG. 54 will be described.

まず、符号「０ｘ０１０４」のノードをカレントノードに設定し、カレントノードにおける文字「Ｃ」を作業領域に出力する。また、親ノードの符号が「０ｘ０１０１」であるので、カレントノードを符号「０ｘ０１０１」のノードに設定する（図５７における［１］）。また、カレントノードの符号「０ｘ０１０１」における文字「Ｂ」を作業域に出力する。さらに、親ノードの符号が「０ｘ００４１」であるので、カレントノードを符号「０ｘ００４１」に設定する（図５７における［２］）。 First, the node with the code “0x0104” is set as the current node, and the character “C” at the current node is output to the work area. Further, since the code of the parent node is “0x0101”, the current node is set to the node of code “0x0101” ([1] in FIG. 57). In addition, the character “B” in the code “0x0101” of the current node is output to the work area. Further, since the code of the parent node is “0x0041”, the current node is set to the code “0x0041” ([2] in FIG. 57).

そして、カレントノードの符号「０ｘ００４１」における文字「Ａ」を作業域に出力する。但し、親ノードは存在しないので、作業域の文字を逆順に出力すると、「ＡＢＣ」が得られる。 Then, the character “A” in the code “0x0041” of the current node is output to the work area. However, since there is no parent node, “ABC” is obtained when characters in the work area are output in reverse order.

次に、新たな符号「０ｘ０１００」を読み出すと、カレントノードに設定して、当該符号のノードにおける文字「Ａ」を作業域に出力する。そして、親ノードの符号が「０ｘ００４１」であるので、カレントノードを符号「０ｘ００４１」のノードに設定する（図５７における［４］）。そして、カレントノードにおける文字「Ａ」を作業域に出力する。ここでカレントノードには親ノードの符号は無いので、作業域の文字列を逆順に出力すると、「ＡＡ」がさらに得られる。ここで伸張処理が完了する（図５７における［５］）。 Next, when a new code “0x0100” is read, it is set as the current node, and the character “A” at the node of the code is output to the work area. Since the code of the parent node is “0x0041”, the current node is set to the node of code “0x0041” ([4] in FIG. 57). Then, the character “A” at the current node is output to the work area. Here, since the current node does not have a parent node code, “AA” is further obtained when the character strings in the work area are output in reverse order. Here, the decompression process is completed ([5] in FIG. 57).

このように文節木のデータ構造を変更したため、処理途中で消費するメモリ容量を削減できる。 Since the phrase tree data structure is changed in this way, the memory capacity consumed during processing can be reduced.

なお、圧縮マップを生成せずとも、文節木のデータを用いて圧縮処理を実施することができる。 Note that the compression processing can be performed using the phrase tree data without generating the compression map.

例えば、図２４Ａ乃至図２４Ｆで表した文節木のデータを用いて文字列「ＡＢＣＡＡ」を圧縮する場合を一例に説明する。 For example, a case where the character string “ABCAA” is compressed using the data of the phrase tree shown in FIGS. 24A to 24F will be described as an example.

「ＡBCAA」における「Ａ」を読み出すと、図５８に示すように、文字「Ａ」に対応する符号「０ｘ００４１」のノードをカレントノードに設定する。さらに、「AＢCAA」における「Ｂ」を読み出すと、文字出現番号の配列において「０ｘ４２（Ｂ）」番目の値「０」を得て、カウンタ兼子ノード番号の配列における出現順番「０」の値を読み出す。この場合、符号「０ｘ０１００」が得られる。符号「０ｘ０１００」であれば、この符号に対応するノードをカレントノードに設定する。 When “A” in “ABCAA” is read, the node of code “0x0041”, which corresponds to the character “A”, is set as the current node, as shown in FIG. 58. When “B” in “ABCAA” is then read, the value “0” at position “0x42 (B)” of the character appearance number array is obtained, and the value at appearance order “0” of the counter/child node number array is read. In this case, the code “0x0100” is obtained, so the node corresponding to code “0x0100” is set as the current node.

次に、「ABＣAA」における「Ｃ」を読み出すと、図５９に示すように、文字出現番号の配列において「０ｘ４３（Ｃ）」番目の値「０」を得て、カウンタ兼子ノード番号の配列における出現順番「０」の値を読み出す。ここでは、符号「０ｘ０１０１」が得られる。符号「０ｘ０１０１」であれば、この符号のノードをカレントノードに設定する。 Next, when “C” in “ABCAA” is read, the “0x43 (C)”-th value “0” is obtained in the array of character appearance numbers, as shown in FIG. The value of the appearance order “0” is read. Here, the code “0x0101” is obtained. If the code is “0x0101”, the node of this code is set as the current node.

また、「ABCＡA」における「Ａ」を読み出すと、図６０に示すように、文字出現番号の配列において「０ｘ４１（Ａ）」番目の値「０」を得て、カウンタ兼子ノード番号の配列における出現順番「０」の値を読み出す。そうすると、符号「０ｘ０１００」より小さい値であるので、カレントノードの符号「０ｘ０１０１」を圧縮結果として出力する。 When the fourth character “A” in “ABCAA” is read, the value “0” at position “0x41 (A)” of the character appearance number array is obtained, as shown in FIG. 60, and the value at appearance order “0” of the counter/child node number array is read. That value is smaller than the code “0x0100”, so the code “0x0101” of the current node is output as the compression result.

さらに、「ABCAＡ」における「Ａ」を読み出すと、図６１に示すように、文字出現番号の配列において「０ｘ４１（Ａ）」番目の値「１」を得て、カウンタ兼子ノード番号の配列における出現順番「１」の値を読み出す。そうすると、符号「０ｘ０１０４」が得られる。ここで入力文字列は終了するので、符号「０ｘ０１０４」も圧縮結果として出力する。そうすると、最終的に「０ｘ０１０１」「０ｘ０１０４」が圧縮結果として出力されることになる。 When the final “A” in “ABCAA” is read, the value “1” at position “0x41 (A)” of the character appearance number array is obtained, as shown in FIG. 61, and the value at appearance order “1” of the counter/child node number array is read. This yields the code “0x0104”. The input character string ends here, so the code “0x0104” is also output as the compression result. The final compression result is therefore “0x0101” followed by “0x0104”.
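Compression directly on the phrase tree can be sketched as follows. This is a simplified model, not the patent's data layout: each data block is reduced to a dictionary from character code to stored value, so the two-step lookup through the character appearance number array and the counter/child node number array collapses into one dictionary access. The tree below contains only the entries mentioned in the walkthrough; the counter value 1 for “A” under “0x0101” is an assumption.

```python
def compress_with_tree(tree, data):
    """tree maps node code -> {character code: counter or child code}."""
    out = []
    if not data:
        return out
    current = data[0]                      # first-level node of first char
    for ch in data[1:]:
        value = tree.get(current, {}).get(ch)
        if value is not None and value >= 0x100:
            current = value                # a child node exists: descend
        else:
            out.append(current)            # counter only, or unseen char
            current = ch
    out.append(current)                    # flush at end of input
    return out


# Entries taken from the walkthrough; the counter value is assumed.
tree = {
    0x41: {0x42: 0x0100, 0x41: 0x0104},
    0x0100: {0x43: 0x0101},
    0x0101: {0x41: 1},
}
print([hex(c) for c in compress_with_tree(tree, b"ABCAA")])
```

Note that the codes produced here are pre-sort codes, so the output differs from the result obtained with the compression map for the same input.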

また、文節木のデータを用いて圧縮データを伸張することもできる。ここでは、図２４Ａ乃至図２４Ｆで表した文節木のデータを用いて符号「０ｘ０１０１」「０ｘ０１０４」を伸張する場合を一例に説明する。 It is also possible to decompress compressed data using phrase tree data. Here, a case where the codes “0x0101” and “0x0104” are expanded using the phrase tree data shown in FIGS. 24A to 24F will be described as an example.

まず、図６２に示すように、符号「０ｘ０１０１」を、カウンタ兼子ノード番号の配列の中に含むノードを探索する。そうすると、今回は符号「０ｘ０１００」のノードの出現順番「０」が特定される。そうすると、文字出現番号の配列で「０」が登録されている番号を探索すると「０ｘ４３」が得られる。この文字コード「０ｘ４３」を作業域に出力する。 First, as shown in FIG. 62, a node including the code “0x0101” in the counter / child node number array is searched. Then, the appearance order “0” of the node with the code “0x0100” is specified this time. Then, when a number in which “0” is registered in the character appearance number array is searched, “0x43” is obtained. This character code “0x43” is output to the work area.

次に、図６３に示すように、符号「０ｘ０１００」を、カウンタ兼子ノード番号の配列の中に含むノードを探索する。そうすると、今回は符号「０ｘ００４１」のノードの出現順番「０」が特定される。そうすると、文字出現番号の配列で「０」が登録されている番号を探索すると「０ｘ４２」が得られる。この文字コード「０ｘ４２」を作業域に出力する。 Next, as shown in FIG. 63, a node including the code “0x0100” in the counter / child node number array is searched. Then, the appearance order “0” of the node with the code “0x0041” is specified this time. Then, when a number in which “0” is registered in the character appearance number array is searched, “0x42” is obtained. This character code “0x42” is output to the work area.

次に、符号「０ｘ００４１」が探索対象となるが符号「０ｘ０１００」より小さいので、符号「０ｘ００４１」に対応する文字「０ｘ４１」を作業域に出力する。 Next, since the code “0x0041” is a search target but is smaller than the code “0x0100”, the character “0x41” corresponding to the code “0x0041” is output to the work area.

そうすると、図６４に示すように、作業域内の文字の順番を入れ替えて、「０ｘ４１」「０ｘ４２」「０ｘ４３」が伸張結果として出力される。 Then, as shown in FIG. 64, the order of the characters in the work area is changed, and “0x41”, “0x42”, and “0x43” are output as the expansion result.

次に、図６５に示すように、符号「０ｘ０１０４」を、カウンタ兼子ノード番号の配列の中に含むノードを探索する。そうすると、今回は符号「０ｘ００４１」のノードの出現番号「１」が特定される。そうすると、文字出現番号の配列で「１」が登録されている番号を探索すると「０ｘ４１」が得られる。この文字コード「０ｘ４１」を作業域に出力する。 Next, as shown in FIG. 65, a node including the code “0x0104” in the counter / child node number array is searched. Then, the appearance number “1” of the node with the code “0x0041” is specified this time. Then, when a number in which “1” is registered in the character appearance number array is searched, “0x41” is obtained. This character code “0x41” is output to the work area.

さらに、符号「０ｘ００４１」が探索対象となるが上で述べたのと同様に符号「０ｘ０１００」より小さいので、符号「０ｘ００４１」に対応する文字「０ｘ４１」を作業域に出力する。 Further, although the code “0x0041” is a search target, it is smaller than the code “0x0100” as described above, so the character “0x41” corresponding to the code “0x0041” is output to the work area.

そうすると、図６６に示すように、作業域内の文字の順番を入れ替えて、「０ｘ４１」「０ｘ４１」を伸張結果に追加する。このようにして、「ＡＢＣＡＡ」が得られる。 Then, as shown in FIG. 66, the order of the characters in the work area is changed, and “0x41” and “0x41” are added to the decompression result. In this way, “ABCAA” is obtained.
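The tree-based decompression above can be sketched in the same simplified model, where each data block is a dictionary from character code to stored value. The linear scan in `find_parent` is an illustrative stand-in for "searching for the node whose counter/child node number array contains the code"; the patent text does not say how that search is organized. The tree holds only the entries mentioned in the walkthrough, plus one assumed counter value.

```python
def find_parent(tree, code):
    """Return (parent code, character code) of the block entry holding code."""
    for parent, block in tree.items():
        for ch, value in block.items():
            if value == code:
                return parent, ch
    raise KeyError(code)


def decompress_with_tree(tree, codes):
    out = bytearray()
    for code in codes:
        work = bytearray()
        while code >= 0x100:
            # Find which node lists this code as a child, and for which char.
            code, ch = find_parent(tree, code)
            work.append(ch)
        work.append(code)       # codes below 0x100 are the character itself
        out += work[::-1]       # reverse the work area before emitting
    return bytes(out)


# Entries taken from the walkthrough; the counter value is assumed.
tree = {
    0x41: {0x42: 0x0100, 0x41: 0x0104},
    0x0100: {0x43: 0x0101},
    0x0101: {0x41: 1},
}
print(decompress_with_tree(tree, [0x0101, 0x0104]))
```

Counter values are always below `0x100`, so they can never be mistaken for a child code during the scan.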

［実施の形態２］ [Embodiment 2]
本実施の形態では圧縮処理の処理速度を向上させるために、圧縮マップのデータ構造を変更する。 In this embodiment, the data structure of the compression map is changed in order to improve the processing speed of the compression process.

具体的には、各ノードについて子ノードの最大ノード番号（最大子ノードの符号）を保持するようにする。圧縮処理では、ステップＳ１４９及びステップＳ１５１でも、カレントノードを親ノードとするノードを探索している。すなわち、カレントノードの子ノードであることが検索の条件となっているが、探索範囲はステップＳ１４９では「０ｘ０１００」から最終ノードまでであり、ステップＳ１５１ではカレントノードの符号＋１から最終ノードまでである。 Specifically, each node holds the largest node number among its child nodes (the code of its maximum child node). In the compression process, both step S149 and step S151 search for a node whose parent is the current node. In other words, the search condition is that the node is a child of the current node, yet the search range runs from “0x0100” to the last node in step S149, and from the current node code + 1 to the last node in step S151.

図６７に模式的に示す文節木において、例えば、カレントノードが「０ｘ０１０３」のノードであるとすると、子ノードの探索範囲は、実際には「０ｘ０１０５」乃至「０ｘ０１０７」のノードであるが、第１の実施の形態では、「０ｘ０１０４」についても探索対象となっていた。この例では１つのノードしか余分になっていないが、実際には「０ｘ０１００」「０ｘ０１０１」「０ｘ０１０２」に多数の子ノードが存在する場合には影響がある。また、「０ｘ０１０１」がカレントノードであれば、子ノードは「０ｘ０１０４」のみであるが、第１の実施の形態では「０ｘ０１０２」乃至「０ｘ０１０７」が探索範囲となっていた。 In the phrase tree schematically shown in FIG. 67, for example, if the current node is a node of “0x0103”, the search range of child nodes is actually a node of “0x0105” to “0x0107”. In the first embodiment, “0x0104” is also a search target. In this example, there is only one extra node, but there is actually an effect when there are many child nodes in “0x0100”, “0x0101”, and “0x0102”. If “0x0101” is the current node, the child node is only “0x0104”. However, in the first embodiment, “0x0102” to “0x0107” are search ranges.

これに対して各ノードについて最大子ノードの符号を保持すれば、カレントノードの１つ前のノードの最大子ノードの符号＋１のノードからカレントノードの最大子ノードまでを探索範囲として絞り込むことができるようになる。なお、子ノードが存在しない場合には、そのノードの符号−１のノードの最大子ノードの符号をコピーしておく。 In contrast, if the code of the maximum child node is held for each node, the search range can be narrowed to run from the node whose code is the maximum child node code of the node immediately before the current node, plus 1, up to the maximum child node of the current node. If a node has no child node, the maximum child node code of the node whose code is one less than that node's code is copied instead.

このように最大子ノードの符号を保持すれば、カレントノードが「０ｘ０１０３」であれば最大子ノードは「０ｘ０１０７」となっている。さらにカレントノードの符号「０ｘ０１０３」−１＝「０ｘ０１０２」の最大子ノードの符号「０ｘ０１０４」＋１＝「０ｘ０１０５」が得られるので、探索範囲は「０ｘ０１０５」乃至「０ｘ０１０７」であると効率的に特定できるようになる。 With the maximum child node codes held in this way, if the current node is “0x0103”, its maximum child node is “0x0107”. In addition, the node before the current node is “0x0103” − 1 = “0x0102”, whose maximum child node code is “0x0104”, and “0x0104” + 1 = “0x0105”. The search range can therefore be identified efficiently as “0x0105” to “0x0107”.

具体的には、圧縮マップは、図６８に示すように、親ノードの符号と、文字コードと、最大子ノードの符号とを対応付ける形に変形される。 Specifically, as shown in FIG. 68, the compression map is transformed into a form in which the code of the parent node, the character code, and the code of the maximum child node are associated with each other.

次に、本実施の形態に係る圧縮マップ生成処理について説明する。本実施の形態では、圧縮マップ生成処理におけるエントリ追加処理を、図６９に示すように変更する。但し、変更部分は、ステップＳ１３５が追加された部分のみである。 Next, the compression map generation process according to this embodiment will be described. In this embodiment, the entry addition process within the compression map generation process is changed as shown in FIG. 69. The only change is the addition of step S135.

In step S135, after the array of character appearance numbers has been processed to the end for one node, the compression map generation unit 120 sets the array number of the code most recently added to the sort area as the code of the maximum child node. If no child-node code was added to the sort area for the node being processed, that is, if the node has no child node, the same code as for the preceding node is set as the code of the maximum child node.

The compression processing according to the present embodiment follows the processing flow shown in FIG. 70. It differs from the processing flow of FIG. 53 only in steps S147 to S151.

Specifically, the compression processing unit 140 determines whether the code of the current node is "0x0000" (step S147b). This check is needed because "0x0000" is the only node for which the maximum child node of the immediately preceding node cannot be obtained. If the code of the current node is "0x0000", the compression processing unit 140 performs a binary search in the compression map, over the range from code "0x0100" to the code of the maximum child node, for a node satisfying "parent node code = current node code, character code = extracted character" (step S149b). The processing then proceeds to step S153. The search range is narrower than in step S149.

If, on the other hand, the current node is not "0x0000", the compression processing unit 140 performs a binary search in the compression map, over the range from the maximum-child-node code of the node immediately preceding the current node, plus 1, to the maximum-child-node code of the current node, for a node satisfying "parent node code = current node code, character code = extracted character" (step S151b). The processing then proceeds to step S153. Here too, the search range is narrower than in step S151.

Performing the processing described above on the phrase tree of FIG. 67 yields a compression map like the one shown in FIG. 71. When the compression processing described above is applied to the input character string "ABCAA" using this compression map, it proceeds as follows. The processing result itself is the same as that shown in FIG. 55.

First, when the first "A" of the input character string "ABCAA" is processed, the current node is set to "0x0041" ([1] in FIG. 55). Next, when "B" is processed, a binary search for the node with "parent node code = 0x0041, character code = 0x42" is performed over the range from code "0x0100" to "0x0101" (the maximum-child-node code of the current node). Code "0x0101" is found to be the matching node, so the current node is set to "0x0101" ([2] in FIG. 55).

Next, when "C" is processed, the binary search for the node with "parent node code = 0x0101, character code = 0x43" is performed on code "0x0104" alone (maximum-child-node code of the preceding node + 1 = maximum-child-node code of the current node). Code "0x0104" is found to be the matching node, so the current node is set to "0x0104" ([3] in FIG. 55).

Next, when the second "A" is processed, the search range for the node with "parent node code = 0x0104, character code = 0x41" is identified as running from code "0x0108" (maximum-child-node code of the preceding node + 1) to code "0x0107" (maximum child node of the current node). Because the start and end of the range are reversed, it is clear that no search is needed. The target node is therefore not found, and the code "0x0104" of the current node is appended to the compression result ([4] in FIG. 55). The node with code "0x0041", corresponding to character code "0x41", is then set as the current node.

Finally, when the last "A" is processed, a binary search for the node with "parent node code = 0x0041, character code = 0x41" is performed over the range from code "0x0100" to "0x0101" (the maximum-child-node code of the current node). Code "0x0100" is found to be the matching node, so the current node is set to "0x0100" ([5] in FIG. 55). All characters of the input character string have now been processed, so the code "0x0100" of the current node is output and the processing ends ([6] in FIG. 55).

As described above, the concrete example also shows that the search range is narrowed.
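The walk-through above can be reproduced with the following minimal sketch. The list layout (`parent`, `char`, `max_child` indexed by code), the `ROOT` marker, and the function names are illustrative assumptions; the node data is the FIG. 67 tree as described in the text, not a copy of FIG. 71.

```python
ROOT = -1  # hypothetical marker for "parent is the root node"

# (parent code, character code) for codes 0x0100..0x0107, in code order
EXTRA = [(0x41, 0x41), (0x41, 0x42), (0x42, 0x43), (0x43, 0x42),
         (0x0101, 0x43), (0x0103, 0x41), (0x0103, 0x42), (0x0103, 0x43)]
parent = [ROOT] * 256 + [p for p, _ in EXTRA]
char = list(range(256)) + [c for _, c in EXTRA]

# Maximum child code per node; childless nodes copy the preceding node's value.
max_child, last = [], 0x00FF
for code in range(len(parent)):
    kids = [k for k in range(256, len(parent)) if parent[k] == code]
    if kids:
        last = max(kids)
    max_child.append(last)


def find_child(cur, ch):
    """Binary-search the children of `cur` for character `ch`; None if absent."""
    lo = 0x0100 if cur == 0 else max_child[cur - 1] + 1
    hi = max_child[cur]
    while lo <= hi:  # an inverted range skips the search entirely
        mid = (lo + hi) // 2
        if char[mid] == ch:
            return mid
        if char[mid] < ch:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


def compress(data):
    """Return the list of codes for the byte string `data`."""
    out, cur = [], None
    for ch in data:
        nxt = ch if cur is None else find_child(cur, ch)
        if nxt is None:          # no longer match: emit and restart at level 1
            out.append(cur)
            nxt = ch
        cur = nxt
    if cur is not None:
        out.append(cur)
    return out


print([hex(c) for c in compress(b"ABCAA")])
```

Because every code in the narrowed range is a child of the current node, the search only needs to compare character codes.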

[Embodiment 3]
In the third embodiment, the speed of the decompression processing is improved. In the processing shown in FIG. 56, nodes are traced back all the way to the root node, so the number of node searches is large. In the present embodiment, therefore, the data structure of the compression map is changed, and hierarchy information and a decompression map are newly generated and held. Even with this additional data, the additional capacity required is only about 800 KB, so the burden does not increase much. The compression map itself is about 446 KB.

FIG. 72 shows an example of the compression map according to the present embodiment. As shown in FIG. 72, the basic structure is the same as in the first embodiment, but the code of the parent node, the character code of the node itself, and the number of the hierarchy level to which the node belongs are stored in association with one another.

In the hierarchy information shown in FIG. 73, the number of nodes in the level, the first code in the level, and the decompressed-string head offset value are associated with one another for each hierarchy level from the first level onward. The example of FIG. 73 is based on the phrase tree of FIG. 67. From the hierarchy information it can be seen that the second level contains four nodes, "0x0100", "0x0101", "0x0102", and "0x0103", each representing a string of two characters, and that these strings total 2 × 4 = 8 bytes. Similarly, the third level contains four nodes, "0x0104", "0x0105", "0x0106", and "0x0107", each representing a string of three characters, and these strings total 3 × 4 = 12 bytes. Once a level has been identified, the decompressed-string head offset value indicates how far from the head of the decompression map the character or string for the first code of that level is located.

FIG. 74 shows an example of the decompression map according to the present embodiment. The decompression map is an associative array in which the characters or strings corresponding to the codes of the nodes in each level are stored in code order. For the phrase tree of FIG. 67, the second level holds "AA" at position 0, "AB" at position 1, "BC" at position 2, and "CB" at position 3. The third level holds "ABC" at position 0, "CBA" at position 1, "CBB" at position 2, and "CBC" at position 3.

The processing for generating the compression map and the other data described above will be explained with reference to FIGS. 75 to 77.

The compression map generation unit 120 adds to the compression map, in order, as the data for the first-level nodes "0x0000" to "0x00FF", association data that associates the data "root" representing the root node, the character code corresponding to the node's code, and the level number "1" (FIG. 75: step S181).

The compression map generation unit 120 also sets the first-level information in the hierarchy information (step S183). Because the first level has 256 nodes, the number of nodes in the level is set to 256, the first code in the level is set to "0x0000", and the decompressed-string head offset is set to "0". These are fixed values.

The compression map generation unit 120 further sets the first-level information in the decompression map (step S185). Because each first-level entry is a single character, namely the character code itself, the values "0x00" to "0xFF" are set. These are fixed values.

The compression map generation unit 120 then adds the codes of the first-level nodes "0x0000" to "0x00FF" to the sort area in order (step S187). The compression map generation unit 120 also sets the next-level head code to "0x0100" (step S189). The next-level head code is a variable used in the steps below. In addition, the compression map generation unit 120 initializes the level counter to 1 (step S190).

After that, the compression map generation unit 120 reads one unprocessed code from the sort area, in ascending order of the sorted codes (step S191). If no unprocessed code can be read (step S193: No route), the compression map generation unit 120 sets the value of the node counter, described below, as the number of nodes in the final level of the hierarchy information (step S195). The processing then returns to the calling process.

If, on the other hand, an unprocessed code can be read (step S193: Yes route), the compression map generation unit 120 refers to the data block of the node with that code in the phrase tree (step S197). The compression map generation unit 120 then performs entry addition process 3 according to the present embodiment (step S199). Entry addition process 3 will be described with reference to FIG. 76. After entry addition process 3, the compression map generation unit 120 performs the setting process (step S201). The setting process will be described with reference to FIG. 77. The processing then returns to step S191.

Next, entry addition process 3 will be described with reference to FIG. 76. FIG. 76 is almost the same as FIG. 52, except that step S133 is changed to step S133b and steps S137 and S139 are added.

That is, in step S133b, the compression map generation unit 120 adds to the compression map association data that associates the array number being processed in the sort area (the code of the parent node), the array number being processed in the array of character appearance numbers (the character code of the child node), and the value of the level counter (the level to which the node belongs).

The compression map generation unit 120 also sets, in the decompression map at the level given by the level counter, the string of the parent node followed by the array number in the array of character appearance numbers (the character code of the child node) as the string corresponding to the child node (step S137). The string of the parent node (the string for the code being processed) is identified, for example, by tracing the compression map using the code identified in step S95. Alternatively, the string of the parent node can be obtained by acquiring the node's level from the compression map and referring, in the decompression map, to the position given by the "decompressed-string head offset" of the level to which the current node belongs + (code of the current node - first code in the level) × level number, where the offset and the first code are taken from the hierarchy information.

The compression map generation unit 120 further increments the value of the node counter by 1 (step S139). The processing then returns to step S111.

In this way, parts of the compression map, the hierarchy information, and the decompression map can be generated.

Next, the setting process will be described with reference to FIG. 77.

The compression map generation unit 120 determines whether the next-level head code set so far is the code of the current node (step S211). The initial value of the next-level head code is "0x0100". If the next-level head code is not the code of the current node, the processing proceeds to step S219. If the next-level head code is the code of the current node, the compression map generation unit 120 determines whether a child node was identified in entry addition process 3 (step S213). If no child node exists, the compression map generation unit 120 sets the next-level head code to the code of the current node + 1 (step S217). The processing then returns to the calling process.

If, on the other hand, a child node exists, the next-level head code is set to the code of the smallest child node among the child nodes identified in entry addition process 3 (step S215). The processing then proceeds to step S219.

In step S219, the compression map generation unit 120 determines whether the code of the current node + 1 equals the next-level head code, that is, whether the node being processed is the last node of the current level. If it does, the compression map generation unit 120 sets the value of the node counter as the number of nodes in the level in the hierarchy information (step S221). The compression map generation unit 120 also sets the code of the current node + 1 as the first code of the next level in the hierarchy information (step S223). In addition, the compression map generation unit 120 sets, as the decompressed-string head offset of the next level in the hierarchy information, the decompressed-string head offset of the current level + node counter × level number (the value of the level counter) (step S225).

The compression map generation unit 120 then initializes the node counter to 0 (step S227) and increments the value of the level counter by 1 (step S229). The processing then returns to the calling process.

By performing this processing, the compression map, hierarchy information, and decompression map described above can be generated.
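The relationship between the per-level node counts, first codes, and decompressed-string head offsets set in steps S221 to S225 can be sketched as follows. This is a simplified illustration under the assumption that the node counts per level are already known; the function name `build_level_info` and the tuple layout are hypothetical, and the actual flow derives these values incrementally while walking the sort area.

```python
def build_level_info(node_counts):
    """Return (level, node_count, head_code, offset) tuples for each level.

    node_counts[i] is the number of nodes in level i + 1. Each string in
    level n occupies n bytes of the decompression map, so the offset of the
    next level is offset + node_count * level.
    """
    info = []
    head_code = 0
    offset = 0
    for level, count in enumerate(node_counts, start=1):
        info.append((level, count, head_code, offset))
        head_code += count        # codes are assigned consecutively
        offset += count * level   # recurrence of step S225
    return info


# FIG. 67 tree: 256 first-level nodes, then 4 nodes each in levels 2 and 3
for row in build_level_info([256, 4, 4]):
    print(row)
```

For the FIG. 67 tree this gives a third-level offset of 256 + 4 × 2 = 264 bytes.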

Next, compression process 3 according to the present embodiment will be described with reference to FIG. 78. Compression process 3 is almost the same as the compression process of the first embodiment; the difference is that step S151c is executed in place of steps S147 to S151.

That is, the compression processing unit 140 performs a binary search in the compression map for a node satisfying "parent node code = current node code, character code = extracted character", over the range that starts at the first code of the level following the current node's level and covers that next level's number of nodes, both values being taken from the hierarchy information (step S151c).

In this way, the search range is narrower than in the compression process of the first embodiment.

As an example, the processing performed when the character string "ABCAA" is input for compression will be described concretely.

First, when the first "A" of the input character string "ABCAA" is processed, the current node is set to "0x0041" ([1] in FIG. 55). Next, when "B" is processed, the current level is "1", so a binary search for the node with "parent node code = 0x0041, character code = 0x42" is performed over the second-level codes "0x0100" to "0x0103". Code "0x0101" is found to be the matching node, so the current node is set to "0x0101" ([2] in FIG. 55).

Next, when "C" is processed, the current level is "2", so a binary search for the node with "parent node code = 0x0101, character code = 0x43" is performed over the third-level codes "0x0104" to "0x0107". Code "0x0104" is found to be the matching node, so the current node is set to "0x0104" ([3] in FIG. 55).

Next, when the second "A" is processed, the node sought has "parent node code = 0x0104, character code = 0x41". The current level is "3" and no next level exists, so it is known without searching that there is no matching node. The code "0x0104" of the current node is therefore appended to the compression result ([4] in FIG. 55). The node with code "0x0041", corresponding to character code "0x41", is then set as the current node.

Finally, when the last "A" is processed, the current level is "1", so a binary search for the node with "parent node code = 0x0041, character code = 0x41" is performed over the second-level codes "0x0100" to "0x0103". Code "0x0100" is found to be the matching node, so the current node is set to "0x0100" ([5] in FIG. 55). All characters of the input character string have now been processed, so the code "0x0100" of the current node is output and the processing ends ([6] in FIG. 55).
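The level-based lookup of step S151c might be sketched as below. The names `keys`, `level_of`, `level_info`, and `find_child` are illustrative assumptions, and the tables hold the FIG. 67 tree as described in the text. The sketch relies on the entries within one level being ordered by (parent code, character code), which follows from the way codes are assigned in sorted order.

```python
from bisect import bisect_left

ROOT = -1  # hypothetical marker for "parent is the root node"

# Compression map keys (parent code, character code), indexed by code
keys = [(ROOT, c) for c in range(256)] + [
    (0x41, 0x41), (0x41, 0x42), (0x42, 0x43), (0x43, 0x42),          # level 2
    (0x0101, 0x43), (0x0103, 0x41), (0x0103, 0x42), (0x0103, 0x43),  # level 3
]
level_of = [1] * 256 + [2] * 4 + [3] * 4
# Hierarchy information: level -> (number of nodes, first code in the level)
level_info = {1: (256, 0x0000), 2: (4, 0x0100), 3: (4, 0x0104)}


def find_child(cur, ch):
    """Search only the level below `cur` for (parent=cur, char=ch)."""
    nxt = level_of[cur] + 1
    if nxt not in level_info:      # no deeper level: no search is needed
        return None
    count, head = level_info[nxt]
    i = bisect_left(keys, (cur, ch), head, head + count)
    if i < head + count and keys[i] == (cur, ch):
        return i
    return None


print(find_child(0x41, 0x42), find_child(0x0104, 0x41))
```

Unlike the second embodiment, the range here covers the whole next level, so the comparison key must include the parent code as well as the character code.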

Next, the content of decompression process 2 according to the present embodiment will be described with reference to FIGS. 79 and 80.

First, the decompression processing unit 150 takes one code from the compressed data and positions the corresponding node in the compression map as the current node (step S231). If no code could be read from the compressed data (step S233: No route), the processing returns to the calling process. If a code could be read (step S233: Yes route), the decompression processing unit 150 refers, in the decompression map, to the position given by the "decompressed-string head offset" of the level to which the current node belongs + (code of the current node - first code in the level) × level number, and outputs as many bytes as the level number (step S235). The processing then returns to step S231. The decompressed-string head offset and the first code in the level are obtained by reading them from the hierarchy information.

Using the hierarchy information and the decompression map in this way speeds up the processing.

As an example, the processing for decompressing "0x0104" and "0x0100" will be described with reference to FIG. 80.

First, when the code "0x0104" is read, the compression map identifies it as belonging to the third level. The position is therefore the third-level decompressed-string head offset "264" + (code of the current node "0x0104" - first code in the level "0x0104") × 3 = "264", so 3 bytes are taken starting 264 bytes from the head of the decompression map. As a result, "ABC" is output (FIG. 80 [1]).
Next, when the code "0x0100" is read, the compression map identifies it as belonging to the second level. The position is therefore the second-level decompressed-string head offset "256" + (code of the current node "0x0100" - first code in the level "0x0100") × 2 = "256", so 2 bytes are taken starting 256 bytes from the head of the decompression map. As a result, "AA" is output as well (FIG. 80 [2]).

As described above, the decompression completes in two steps, so the processing is faster.
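The offset arithmetic of step S235 can be illustrated with the following minimal sketch. The flat `bytes` layout of the decompression map and the names `decomp_map`, `level_info`, `level_of`, and `decompress` are assumptions made for illustration; the strings and offsets are those given in the text for the FIG. 67 tree.

```python
# Decompression map: level 1 (0x00..0xFF), then level 2, then level 3 strings
decomp_map = bytes(range(256)) + b"AA" b"AB" b"BC" b"CB" + b"ABC" b"CBA" b"CBB" b"CBC"

# Hierarchy information: level -> (first code in the level, head offset)
level_info = {1: (0x0000, 0), 2: (0x0100, 256), 3: (0x0104, 264)}

# Level of each code, as stored in the compression map of this embodiment
level_of = [1] * 256 + [2] * 4 + [3] * 4


def decompress(codes):
    """Expand each code with one lookup, without walking back to the root."""
    out = bytearray()
    for code in codes:
        level = level_of[code]
        head_code, offset = level_info[level]
        pos = offset + (code - head_code) * level
        out += decomp_map[pos:pos + level]   # a level-n string is n bytes long
    return bytes(out)


print(decompress([0x0104, 0x0100]))
```

Each code costs one table lookup and one slice, regardless of how deep the node sits in the phrase tree.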

[Embodiment 4]
The present embodiment combines the second and third embodiments. In addition, memory usage is reduced by omitting from the decompression map and the hierarchy information any information that is fixed and can be obtained from the data of the compression map.

FIG. 81 shows an example of the compression map according to the present embodiment. In the present embodiment, the character code of the node itself, the code of the maximum child node, and the level to which the node belongs are associated with one another in the order of the sorted codes. The code of the parent node is omitted because it is not used when the hierarchy information and the decompression map are available.

FIG. 82 shows the hierarchy information according to the present embodiment. Unlike in the third embodiment, the first-level data is omitted. For each level from the second level onward, the number of nodes in the level, the first code in the level, and the decompressed-string head offset are registered. Because the first-level data is omitted, the values of the decompressed-string head offset differ from those of the third embodiment.

FIG. 83 shows the decompression map according to the present embodiment. Unlike in the third embodiment, the first-level data is omitted. The decompression map is an associative array in which the strings corresponding to the codes of the second and later levels are stored in ascending order of code.

FIG. 84 shows the processing flow of compression map generation process 3 according to the present embodiment. It differs from compression map generation process 2 of the third embodiment in that steps S183 and S185 are absent, step S199b (entry addition process 4) is performed in place of step S199 (entry addition process 3), and step S201b (setting process 2) is performed in place of step S201 (the setting process).

FIG. 85 shows the processing flow of entry addition process 4. It differs from entry addition process 3 of the third embodiment in that step S135 (described for entry addition process 2), which is executed when the end of the array is reached, is performed, and in that step S133c is performed in place of step S133b.

In step S133c, the compression map generation unit 120 adds to the compression map association data that associates the array number being processed in the array of character appearance numbers (the character code of the child node) with the value of the level counter (the level to which the node belongs). The difference from step S133b is that the array number being processed in the sort area (the code of the parent node) is no longer registered.

Setting process 2, which is included in compression map generation process 3, follows the processing flow shown in FIG. 86 rather than that of FIG. 77. The differences from FIG. 77 are that steps S220 and S222 are added and that steps S223b and S225b are performed in place of steps S223 and S225.

In setting process 2, if the code of the current node + 1 equals the next-level head code in step S219, that is, if the current node is the last node of its level, the compression map generation unit 120 determines whether the current value of the level counter is 1 (step S220). If the value is 1, no first-level data is added to the hierarchy information, so the compression map generation unit 120 sets the decompressed-string head offset of the next level in the hierarchy information to "0" (step S222). The processing then proceeds to step S225b.

If, on the other hand, the current value of the level counter is not 1, the compression map generation unit 120 sets the value of the node counter as the number of nodes in the level in the hierarchy information (step S221). The compression map generation unit 120 also sets, as the decompressed-string head offset of the next level in the hierarchy information, the decompressed-string head offset of the current level + node counter × level number (the value of the level counter) (step S223b). This processing is the same as step S225. The compression map generation unit 120 further sets the code of the current node + 1 as the first code of the next level in the hierarchy information (step S225b). This step is the same as step S223. The subsequent processing is the same as in FIG. 77.

Next, FIG. 87 shows the processing flow of compression process 4 according to the present embodiment. It differs from compression process 2 of the second embodiment in that steps S149b and S151b are replaced with steps S149d and S151d.

Specifically, the compression processing unit 140 determines whether the code of the current node is "0x0000" (step S147b). This check is needed because the node "0x0000" is the only node for which the maximum child node of the immediately preceding node cannot be obtained. If the code of the current node is "0x0000", the compression processing unit 140 performs a binary search of the compression map, over the range from code "0x0100" to the code of the maximum child node, for a node whose character code equals the extracted character (step S149d). The search takes this form because the data on the parent node has been deleted from the compression map.

If, on the other hand, the code of the current node is not "0x0000", the compression processing unit 140 performs a binary search of the compression map, over the range from the code of the maximum child node of the node immediately preceding the current node plus 1 to the code of the maximum child node, for a node whose character code equals the extracted character (step S151d). The process then proceeds to step S153.
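The range-limited search of steps S147b to S151d can be sketched in Python as follows. This is a sketch under stated assumptions, not the implementation of FIG. 87: each compression map entry is modeled as a hypothetical tuple `(max_child_code, char_code)` indexed by node code, the children of a node are assumed to be stored in ascending character-code order, and a node without children is assumed to carry the maximum child code of the preceding node, so that its search range is empty.

```python
def find_child(cmap, current, ch, first_layer2_code=0x0100):
    """Return the code of the child of node `current` whose character is `ch`,
    or None if there is no such child. cmap[code] = (max_child_code, char_code)."""
    if current == 0x0000:
        lo = first_layer2_code          # no preceding node to consult
    else:
        lo = cmap[current - 1][0] + 1   # max child of the preceding node, plus 1
    hi = cmap[current][0]               # max child of the current node
    while lo <= hi:                     # ordinary binary search on the char code
        mid = (lo + hi) // 2
        c = cmap[mid][1]
        if c == ch:
            return mid
        if c < ch:
            lo = mid + 1
        else:
            hi = mid - 1
    return None
```

Because the parent code is no longer stored in each entry, the children of a node are located purely from the two maximum child codes that bracket them.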

For decompression, decompression process 3 shown in FIG. 88 is performed instead of decompression process 2 of FIG. 79. FIG. 88 differs from FIG. 79 in that steps S237 and S239 are added. Specifically, the decompression processing unit 150 determines whether the layer value of the current node in the compression map is "1" (step S237). If it is "1", the decompression processing unit 150 outputs the character code of the current node in the compression map (step S239), and the process proceeds to step S231. If the layer value is "2" or greater, the process proceeds to step S235. With this processing, decompression remains fast even though the data amounts of the layer information and the decompression map are reduced.
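The branch added in steps S237 and S239 can be illustrated with the following Python sketch. The table layouts are assumptions made for illustration: `cmap[code]` is taken to be `(char_code, layer)`, `layer_info[layer]` to be `(node_count, first_code, start_offset)`, and `strings` to be the concatenated decompressed strings of the second and subsequent layers. The offset arithmetic for layers 2 and above is an assumed reading of step S235, which is not reproduced in this section.

```python
def expand(code, cmap, layer_info, strings):
    """Return the byte string that `code` stands for (illustrative sketch)."""
    char_code, layer = cmap[code]
    if layer == 1:
        # steps S237/S239: a first-layer code is a single character,
        # taken directly from the compression map
        return bytes([char_code])
    # assumed lookup for layers >= 2: fixed-length strings of `layer` bytes
    _count, first_code, start_offset = layer_info[layer]
    start = start_offset + (code - first_code) * layer
    return strings[start:start + layer]
```

Under these assumptions the first layer needs no entry in either `layer_info` or `strings`, which is the data reduction the paragraph above describes.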

Although the present embodiment has been described above, the present technology is not limited to it.

For example, the functional block diagrams described above do not necessarily correspond to actual program module configurations. As for the processing flows, the order of steps may be changed, or steps may be executed in parallel, as long as the processing result does not change.

Furthermore, although the processing described above was illustrated as running on a single computer, it may be performed by a plurality of computers.

The information processing apparatus 100 described above is a computer apparatus in which, as shown in FIG. 89, a memory 2501, a CPU (Central Processing Unit) 2503, a hard disk drive (HDD) 2505, a display control unit 2507 connected to a display device 2509, a drive device 2513 for a removable disk 2511, an input device 2515, and a communication control unit 2517 for connecting to a network are connected by a bus 2519. An operating system (OS) and an application program for carrying out the processing of this embodiment are stored in the HDD 2505 and are read from the HDD 2505 into the memory 2501 when executed by the CPU 2503. The CPU 2503 controls the display control unit 2507, the communication control unit 2517, and the drive device 2513 according to the processing of the application program so that they perform predetermined operations. Data being processed is stored mainly in the memory 2501 but may also be stored in the HDD 2505. In the embodiment of the present technology, the application program for carrying out the processing described above is stored on and distributed via the computer-readable removable disk 2511 and is installed from the drive device 2513 onto the HDD 2505. It may also be installed onto the HDD 2505 via a network such as the Internet and the communication control unit 2517. Such a computer apparatus realizes the various functions described above through the organic cooperation of hardware such as the CPU 2503 and the memory 2501 with programs such as the OS and the application program.

The embodiment described above can be summarized as follows.

The data structure generation method according to the present embodiment includes the following processes.

(A) Generating, for each character that may be used, a data block that includes a first area for holding a code corresponding to a character or character string, a second area for holding, for each character, the appearance order of the character appearing next after that character or character string, and a third area for holding, according to the appearance order of the character appearing next after the character or character string, either the number of appearances or, once the number of appearances has exceeded a threshold, a code corresponding to the character string made up of the character or character string and the character that has just appeared.

(B) Upon detecting that, in the second area of the data block for the character or character string of interest among the characters included in an input character string, an appearance order is held for the character that appears next after the character or character string of interest in the input character string, that the third area holds a number of appearances as the data for that appearance order, and that the number of appearances will now exceed the threshold, storing in the third area, as the data for that appearance order, a code corresponding to a second character string made up of the character or character string of interest and the next character.

(C) Generating a data block for the second character string.

Using the plurality of data blocks obtained by this processing reduces memory usage.
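The three-area data block and the update rules summarized in this section can be sketched in Python as follows. Several details are assumptions that the summary does not fix: the third area tags each slot as `("count", n)` or `("code", c)`, a first appearance starts its count at 1, a new block is created when the incremented count would exceed `threshold`, and after a first appearance or a block creation the character of interest moves to the single-character block of the next character. The names `Block` and `build_blocks` are hypothetical.

```python
class Block:
    def __init__(self, code):
        self.code = code        # first area: code of this character or string
        self.order = [0] * 256  # second area: appearance order per next character (0 = unseen)
        self.slots = []         # third area: per appearance order, a count or a child code

def build_blocks(data, threshold=1):
    """Build data blocks for the byte string `data` (illustrative sketch)."""
    blocks = [Block(c) for c in range(256)]  # one block per possible character
    if not data:
        return blocks
    cur = blocks[data[0]]
    for ch in data[1:]:
        k = cur.order[ch]
        if k == 0:
            # first appearance after `cur`: take the next slot (assumed rule)
            cur.slots.append(("count", 1))
            cur.order[ch] = len(cur.slots)
            cur = blocks[ch]
            continue
        kind, value = cur.slots[k - 1]
        if kind == "code":
            cur = blocks[value]              # the longer string becomes the string of interest
        elif value + 1 > threshold:
            new_block = Block(len(blocks))   # create a block for the second character string
            blocks.append(new_block)
            cur.slots[k - 1] = ("code", new_block.code)
            cur = blocks[ch]                 # assumed restart point
        else:
            cur.slots[k - 1] = ("count", value + 1)
            cur = blocks[ch]                 # the next character becomes the character of interest
    return blocks
```

The memory saving comes from the third area: it has one slot per character that has actually followed the block's string, rather than one pointer and one counter per possible character as in the prior-art node.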

The data structure generation method described above may further include (D) a process of, upon detecting that, in the second area of the data block for the character or character string of interest among the characters included in the input character string, an appearance order is held for the character that appears next after the character or character string of interest in the input character string, that the third area holds a number of appearances as the data for that appearance order, and that the number of appearances would not exceed the threshold even if incremented, setting the next character as the character of interest.

The data structure generation method described above may further include (E) a process of, upon detecting that, in the second area of the data block for the character or character string of interest among the characters included in the input character string, an appearance order is held for the character that appears next after the character or character string of interest in the input character string, and that the third area holds, as the data for that appearance order, a code corresponding to a second character string made up of the character or character string of interest and the next character, setting the second character string as the character string of interest.

The third area described above may be limited to an area covering appearance orders up to a predetermined order. In that case, the data block may further include a fourth area that holds data indicating an extension area of the third area. This makes it possible to handle cases in which the number of subordinate data blocks is large.

The data structure generation method described above may further include (F) a generation process of generating, from the plurality of data blocks generated for the input character string, data that includes first association data and second association data. Both are defined with respect to the state in which, for each of the data blocks, the code held in the first area and the codes held in the third area have been reassigned as second codes based on the codes of the corresponding characters or character strings. For each data block in the first layer of the tree structure represented by the data blocks in that state, the first association data associates data representing the top layer with the code of the character corresponding to the second code held in the first area of the data block. For each data block in the second and subsequent layers of that tree structure, the second association data associates the second code held in the first area of the parent data block of the data block with the code of the character added in the data block.

This produces a data structure (for example, the compression map of the embodiments) that can be stored on disk and is well suited to compression and decompression.

The first association data and the second association data may each further associate the largest second code held in the third area of the data block. This makes it possible, for example, to speed up the search performed during compression.

The data structure generation method described above may further include the following processes.

(G) A process of setting each of the first data blocks, which are the data blocks for the characters that may be used among the plurality of data blocks generated for the input character string, as the processing target in ascending order of code.

(H) A first storage process of reading the appearance orders from the second area of the first data block being processed in order of character code and, when the third area of that first data block holds a code as the data for an appearance order, adding the data block of that code to the second data blocks to be processed after the first data blocks, and storing, in order, association data that associates the second code corresponding to the processing order of the first data block being processed with the code of the character.

(I) A process of setting each of the second data blocks to be processed after the first data blocks as the processing target in the order in which they were added.

(J) A second storage process of reading the appearance orders from the second area of the second data block being processed in order of character code and, when the third area of that second data block holds a code as the data for an appearance order, adding the data block of that code to the second data blocks to be processed later, and storing, in order, association data that associates the second code corresponding to the processing order of the second data block being processed with the code of the character.

The data structure generation method described above may further include (K) a process of storing, for each of the first data blocks for the characters that may be used among the plurality of data blocks generated for the input character string, association data that associates data representing the top-level data block with the character code of that first data block, in order of character code. In this case, in the first storage process or the second storage process, the second code at the time the association data was last stored may be stored in association with the character code of the first data block being processed or with the code of the final character of the character string of the second data block being processed.

The data structure generation method according to the second aspect of the present embodiment includes the following processes.

(A) A first generation process of generating a data block for each character that may be used and for each character that appears at least a predetermined number of times next after a character or character string in the input character string for which a data block has already been generated. The data block includes a first area for holding a code corresponding to a character or character string, a second area for holding, for each character, the appearance order of the character appearing next after that character or character string, and a third area for holding, according to the appearance order of the character appearing next after the character or character string, either the number of appearances or, once the number of appearances has exceeded a threshold, a code corresponding to the character string made up of the character or character string and the character that has just appeared.

(B) A second generation process of generating, from the plurality of data blocks generated by the first generation process, data that includes first association data and second association data. Both are defined with respect to the state in which, for each of the data blocks, the code held in the first area and the codes held in the third area have been reassigned as second codes based on the codes of the corresponding characters or character strings. For each data block in the first layer of the tree structure represented by the data blocks in that state, the first association data associates either data representing the top layer or the largest second code held in the third area of the data block with the code of the character corresponding to the second code held in the first area of the data block and with a layer number. For each data block in the second and subsequent layers of that tree structure, the second association data associates either the second code held in the first area of the parent data block of the data block or the largest second code held in the third area of the data block with the code of the character added in the data block and with a layer number.

(C) A third generation process of generating first data that includes the codes of the character strings corresponding to the second codes held in the first areas of the data blocks belonging to each of the second and subsequent layers of the tree structure represented by the data blocks in the above state.

(D) A fourth generation process of generating layer information that associates, for each of the second and subsequent layers of the tree structure represented by the data blocks in the above state, the number of data blocks in the layer, the smallest second code held in the first areas of the data blocks in the layer, and the offset, from the head of the first data, of the position of the character string corresponding to that smallest second code.

Using the data generated in this way increases the speed of compression and decompression.

The data structure according to the third aspect of the present embodiment is the data structure of a phrase tree that includes, as the data of each node, a data block including a first area for holding a code corresponding to a character or character string, a second area for holding, for each character, the appearance order of the character appearing next after that character or character string, and a third area for holding, according to the appearance order of the character appearing next after the character or character string, either the number of appearances or, once the number of appearances has exceeded a threshold, a code corresponding to the character string made up of the character or character string and the character that has just appeared. This makes it possible to reduce memory usage substantially.

The data structure according to the fourth aspect of the present embodiment is a data structure corresponding to a phrase tree in which association data that associates, for each node of the phrase tree, the code of the parent node of the node with the code of the character represented by the node is arranged in order of the codes of the nodes. Data in this form can be stored on disk as is and used later.

In the data structure according to the fourth aspect of the present embodiment, the association data described above may further associate the largest of the codes of the child nodes of the node. This makes it possible to increase the efficiency of compression.

The data structure according to the fifth aspect of the present embodiment has first data corresponding to a phrase tree, second data on the characters or character strings corresponding to the nodes of the phrase tree, and third data on the layers of the phrase tree. In the first data, data that associates, for each node of the phrase tree, either the code of the parent node of the node or the largest of the codes of the child nodes of the node with the code of the character represented by the node and the layer to which the node belongs is arranged in order of the codes of the nodes. The second data includes, for each node in each of the second and subsequent layers of the phrase tree, the code of the character or character string corresponding to the node, in order of the codes of the nodes. The third data includes data that associates, for each of the second and subsequent layers of the phrase tree, the number of nodes belonging to the layer, the smallest of the codes of the nodes belonging to the layer, and the offset, from the head of the second data, of the position of the character or character string corresponding to that smallest code.

This makes it possible to improve the speed of compression and decompression.

The compression method according to the sixth aspect of the present embodiment uses a data structure corresponding to a phrase tree in which association data that associates, for each node of the phrase tree, the code of the parent node of the node with the code of the character represented by the node is arranged in order of the codes of the nodes. The method includes the following processes.

(A) A process of referring, in the data structure, to the association data of the code corresponding to a first character included in an input character string.

(B) A search process of searching the data structure for association data in which the code of the parent node is the code corresponding to the first character and the code of the character represented by the node is the code of a second character that appears next in the input character string.

(C) A first reference process of referring to that association data when it is found.

(D) A second reference process of, when no such association data is found, outputting the code of the association data currently referred to and referring to the association data of the code corresponding to the second character.

(E) A process of performing the search process, the first reference process, and the second reference process while advancing the second character through the input character string in order until the last character of the input character string has been processed, and, after the last character has been processed, outputting the code of the association data currently referred to.

Compression can thus be performed using the data structure described above.
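The compression loop of the sixth aspect can be sketched in Python as follows. The map is modeled as a list indexed by node code whose entries are `(parent_code, char_code)`. The sentinel `ROOT`, the assumption that the first 256 entries are the single-character nodes with code equal to character code, and the plain linear scan (used here in place of the narrowed searches described for the other variants) are all illustrative choices, not details taken from the embodiments.

```python
ROOT = -1  # hypothetical marker for "parent is the root node"

def compress(cmap, data):
    """Encode the byte string `data` as a list of node codes (illustrative sketch)."""
    if not data:
        return []
    out = []
    cur = data[0]  # assumption: first-layer code == character code
    for ch in data[1:]:
        # search process: look for a node whose parent is `cur` and whose character is `ch`
        nxt = next((code for code in range(256, len(cmap))
                    if cmap[code] == (cur, ch)), None)
        if nxt is not None:
            cur = nxt          # first reference process: extend the current match
        else:
            out.append(cur)    # second reference process: emit and restart at `ch`
            cur = ch
    out.append(cur)            # emit the code still referred to after the last character
    return out
```

For instance, if the map holds "AB" as code 0x100 and "ABC" as code 0x101, the input "ABCAB" is encoded as the two codes 0x101 and 0x100.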

In the search process of the compression method according to the sixth aspect of the present embodiment, when association data for the first layer of the data structure is being referred to, the association data of the second and subsequent layers may be searched, and when association data for the second or a subsequent layer of the data structure is being referred to, the association data that follows the code of the association data being referred to may be searched.

The association data described above may further associate the largest of the codes of the child nodes of the node. In that case, in the search process described above, when association data for the first layer of the data structure is being referred to, the search may cover the range from the first association data belonging to the second layer to the association data of the largest code included in the association data being referred to. When association data for the second or a subsequent layer of the data structure is being referred to, the search may cover the range from the association data of the largest code included in the association data immediately preceding the one being referred to, to the association data of the largest code included in the association data being referred to. This narrows the search range and speeds up compression.

The association data described above may further associate the layer number of the layer to which the node belongs. In that case, the search process described above may search the association data associated with the layer number that follows the layer number of the association data being referred to in the data structure. This also narrows the search range and speeds up compression.

The compression method according to the seventh aspect of the present embodiment uses a data structure having first data corresponding to a phrase tree and second data on the layers of the phrase tree. In the first data, data that associates, for each node of the phrase tree, the code of the parent node of the node, the code of the character represented by the node, and the layer to which the node belongs is arranged in order of the codes of the nodes. The second data includes data that associates, for each layer of the phrase tree, the number of nodes belonging to the layer with the smallest of the codes of the nodes belonging to the layer. The compression method includes the following processes.

(A) A process of referring, in the first data included in the data structure, to the association data of the code corresponding to a first character included in an input character string.

(B) A search process of searching the first data for association data in which the code of the parent node is the code of the node corresponding to the association data being referred to and the code of the character represented by the node is the code of a second character that appears next in the input character string, the search being limited to the range determined from the number of nodes and the smallest code that the second data holds for the layer one level below the layer of the node corresponding to the association data being referred to.

(C) A first reference process of referring to that association data when it is found.

(D) A second reference process of, when no such association data is found, outputting the code of the association data currently referred to and referring to the association data of the code corresponding to the second character.

(E) A process of performing the search process, the first reference process, and the second reference process while advancing the second character through the input character string in order until the last character of the input character string has been processed, and, after the last character has been processed, outputting the code of the association data currently referred to.

Such processing can also speed up decompression.

The decompression method according to the eighth aspect of the present embodiment uses a data structure corresponding to a phrase tree in which association data that associates, for each node of the phrase tree, the code of the parent node of the node with the code of the character represented by the node is arranged in order of the codes of the nodes. The method includes the following processes.

(A) A process of identifying, in the data structure, the association data at the position given by a first code among input codes.

(B) A storage process of storing in memory the character code included in the identified association data.

(C) A reference process of, when the code of the parent node included in the identified association data indicates the code of a node other than the root node of the phrase tree, referring to the association data of the code of that parent node.

(D) An output process of, when the code of the parent node included in the identified association data indicates the code of the root node of the phrase tree, outputting the character codes stored in the memory in reverse order.

(E) A process of performing the storage process, the reference process, and the output process in order for each code that follows the first code among the input codes.

In this manner, decompression can be performed using the data structure described above.
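
As a non-limiting illustration, the parent-following decompression described above might be sketched in Python as follows. The table contents, the three-character alphabet, and the sentinel `ROOT = -1` standing for the code of the root node are assumptions made for this sketch and do not appear in the specification.

```python
ROOT = -1  # hypothetical sentinel for the code of the root node

# Association data arranged in code order: (parent code, character).
# Codes 0-2 are first-hierarchy nodes; code 3 is "AB"; code 4 is "ABC".
TABLE = [(ROOT, "A"), (ROOT, "B"), (ROOT, "C"), (0, "B"), (3, "C")]


def decompress(table, codes):
    """Expand each code by walking parent codes up to the root."""
    out = []
    for code in codes:
        stored = []                      # characters stored in memory
        parent, ch = table[code]         # identify the entry for this code
        stored.append(ch)
        while parent != ROOT:            # parent is not the root: follow it
            parent, ch = table[parent]
            stored.append(ch)
        out.extend(reversed(stored))     # parent is the root: output in reverse
    return "".join(out)
```

With this illustrative table, `decompress(TABLE, [4, 3, 2])` expands the codes to "ABC", "AB", and "C" in turn.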

The compression method according to the ninth aspect of the present embodiment uses a data structure corresponding to a phrase tree in which association data that associates, for each node of the phrase tree, the maximum code among the codes of the child nodes of the node with the code of the character represented by the node is arranged in the order of the codes of the nodes. The compression method includes: (A) a process of referring to the association data of the code corresponding to a first character included in an input character string; (B) a search process of searching the data structure for association data in which the code of the character represented by the node is the code of a second character that appears next in the input character string; (C) a first reference process of referring to that association data when it is detected; (D) a second reference process of, when no such association data is detected, outputting the code of the association data being referred to and referring to the association data of the code corresponding to the second character; and (E) a process of performing the search process, the first reference process, and the second reference process while moving the second character through the input character string in order until the last character of the input character string has been processed, and, after the last character has been processed, outputting the code of the association data being referred to. In the search process, when association data for the first hierarchy of the data structure is being referred to, the search runs from the first association data belonging to the second hierarchy to the association data of the maximum code included in the association data being referred to. When association data for the second or a later hierarchy is being referred to, the search runs from the association data of the maximum code included in the association data immediately before the one being referred to, to the association data of the maximum code included in the association data being referred to. Because the search range is narrowed in this way, compression can be performed at high speed.

The decompression method according to the tenth aspect of the present embodiment uses a data structure that is stored in a data storage unit and has first data corresponding to a phrase tree, second data on the character or character string corresponding to each node of the phrase tree, and third data on the hierarchies of the phrase tree. In the first data, entries each including, for a node of the phrase tree, the hierarchy number of the hierarchy to which the node belongs are arranged in the order of the codes of the nodes. The second data includes, for each node in each hierarchy of the phrase tree, the character or character string corresponding to the node, in the order of the codes of the nodes. The third data includes, for each hierarchy of the phrase tree, data that associates the minimum code among the codes of the nodes belonging to the hierarchy with the offset value, from the beginning of the second data, of the position of the character or character string corresponding to that minimum code. The decompression method includes: (A) a process of identifying, in the first data included in the data structure, the entry at the position given by a first code among the input codes; (B) a specifying process of specifying the minimum code and the offset value in the third data according to the hierarchy number included in the identified entry; (C) a reading process of reading, from the second data, a character or character string whose length equals the hierarchy number, starting at the position obtained by adding, to the specified offset value, the difference between the code of the identified entry and the specified minimum code multiplied by the hierarchy number; and (D) a process of identifying the entry in the first data, and performing the specifying process and the reading process, for each code from the second code onward among the input codes.

In this way, the decompression process is accelerated.
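
As a non-limiting illustration, the position calculation of the tenth aspect (offset value plus the code difference multiplied by the hierarchy number) might be sketched in Python as follows. The concrete codes, the three-character alphabet, and the names `LAYER_OF_CODE`, `STRINGS`, and `LAYER_INFO` are assumptions made for this sketch.

```python
# First data: hierarchy number of each node, arranged in code order.
LAYER_OF_CODE = [1, 1, 1, 2, 2, 3]

# Second data: the strings of all nodes, concatenated in code order.
# Hierarchy 1: "A" "B" "C"; hierarchy 2: "AB" "BC"; hierarchy 3: "ABC".
STRINGS = "ABC" + "ABBC" + "ABC"

# Third data: hierarchy number -> (minimum code, offset into STRINGS).
LAYER_INFO = {1: (0, 0), 2: (3, 3), 3: (5, 7)}


def decompress(codes):
    """Expand each code with one position calculation and one read."""
    out = []
    for code in codes:
        layer = LAYER_OF_CODE[code]                # identify the entry
        min_code, offset = LAYER_INFO[layer]       # specifying process
        pos = offset + (code - min_code) * layer   # position in second data
        out.append(STRINGS[pos:pos + layer])       # read `layer` characters
    return "".join(out)
```

Because every node in hierarchy k has a string of exactly k characters, no parent chain needs to be followed. In this sketch, code 4 resolves to position 3 + (4 - 3) × 2 = 5, where "BC" is read.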

The decompression method according to the eleventh aspect of the present embodiment uses a data structure that is stored in a data storage unit and has first data corresponding to a phrase tree, second data on the character or character string corresponding to each node of the phrase tree, and third data on the hierarchies of the phrase tree. In the first data, entries each including, for a node of the phrase tree, the hierarchy number of the hierarchy to which the node belongs are arranged in the order of the codes of the nodes. The second data includes, for each node in each hierarchy from the second hierarchy onward of the phrase tree, the character string corresponding to the node, in the order of the codes of the nodes. The third data includes, for each hierarchy from the second hierarchy onward of the phrase tree, data that associates the minimum code among the codes of the nodes belonging to the hierarchy with the offset value, from the beginning of the second data, of the position of the character or character string corresponding to that minimum code. The decompression method includes: (A) a process of identifying, in the first data included in the data structure, the entry at the position given by a first code among the input codes; (B) an output process of, when the identified entry is an entry in the first hierarchy, outputting the character corresponding to the code of the identified entry; (C) a specifying process of, when the identified entry is an entry in the second or a later hierarchy, specifying the minimum code and the offset value in the third data according to the hierarchy number included in the identified entry; (D) a reading process of reading, from the second data, a character or character string whose length equals the hierarchy number, starting at the position obtained by adding, to the specified offset value, the difference between the code of the identified entry and the specified minimum code multiplied by the hierarchy number; and (E) a process of identifying the entry in the first data, and performing the output process, the specifying process, and the reading process, for each code from the second code onward among the input codes.

In this way, the decompression process can be accelerated even when the amount of data in the data structure is reduced.

A program for causing a computer to carry out the processing described above can be created, and the program is stored in a computer-readable storage medium or storage device such as a flexible disk, an optical disk such as a CD-ROM, a magneto-optical disk, a semiconductor memory (for example, a ROM), or a hard disk.

The following supplementary notes are further disclosed with respect to the embodiments including the above examples.

(Appendix 1)
A program for causing a computer to execute a process comprising:
generating, for each character that may be used, a data block including a first area for holding a code corresponding to a character or character string, a second area for holding, for each character, the appearance order of the character that appears next after the character or character string, and a third area for holding, for each appearance order of a character that appears next after the character or character string, either the number of appearances or, when the number of appearances has exceeded a threshold, a code corresponding to the character string consisting of the character or character string and the character that has appeared this time;
upon detecting that the second area of the data block for a character or character string of interest, among the plurality of characters included in an input character string, holds an appearance order for the character that appeared next after the character or character string of interest in the input character string, that the third area holds a number of appearances as the data for that appearance order, and that the number of appearances will exceed the threshold this time, storing, as the data for that appearance order in the third area, a code corresponding to a second character string consisting of the character or character string of interest and the character that appeared next; and
generating a data block for the second character string.

(Appendix 2)
The program according to appendix 1, further causing the computer to execute a process of:
upon detecting that the second area of the data block for the character or character string of interest, among the plurality of characters included in the input character string, holds an appearance order for the character that appeared next after the character or character string of interest in the input character string, that the third area holds a number of appearances as the data for that appearance order, and that the number of appearances does not exceed the threshold even when incremented, setting the character that appeared next as the character of interest.

(Appendix 3)
The program according to appendix 1 or 2, further causing the computer to execute a process of:
upon detecting that the second area of the data block for the character or character string of interest, among the plurality of characters included in the input character string, holds an appearance order for the character that appeared next after the character or character string of interest in the input character string, and that the third area holds, as the data for that appearance order, a code corresponding to a second character string consisting of the character or character string of interest and the character that appeared next, setting the second character string as the character string of interest.
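
As a non-limiting illustration, the data block of appendix 1 and the three cases of appendices 1 to 3 might be sketched in Python as follows. The names `DataBlock`, `order_of`, `slots`, and `build_blocks` are hypothetical. The handling of a next character seen for the first time (it receives the next appearance order and a count of 1) and the choice of the next character as the new focus after a block is created are assumptions of this sketch, since the notes above do not state them.

```python
from dataclasses import dataclass, field


@dataclass
class DataBlock:
    code: int                                     # first area: code of this string
    order_of: dict = field(default_factory=dict)  # second area: next char -> order
    slots: list = field(default_factory=list)     # third area: ("count", n) or ("code", c)


def build_blocks(text, alphabet, threshold):
    """Return (blocks, strings); `strings` maps code -> string for inspection."""
    blocks, strings = {}, {}
    for ch in alphabet:                           # one block per usable character
        code = len(blocks)
        blocks[code] = DataBlock(code)
        strings[code] = ch
    code_of_char = {ch: code for code, ch in strings.items()}

    focus = blocks[code_of_char[text[0]]]
    for ch in text[1:]:
        order = focus.order_of.get(ch)
        if order is None:
            # Assumption: a first sighting gets the next order and a count of 1.
            focus.order_of[ch] = len(focus.slots)
            focus.slots.append(("count", 1))
            focus = blocks[code_of_char[ch]]
            continue
        kind, value = focus.slots[order]
        if kind == "code":                        # appendix 3: follow the longer string
            focus = blocks[value]
        elif value + 1 > threshold:               # appendix 1: count exceeds threshold
            new_code = len(blocks)
            blocks[new_code] = DataBlock(new_code)
            strings[new_code] = strings[focus.code] + ch
            focus.slots[order] = ("code", new_code)
            focus = blocks[code_of_char[ch]]      # assumption: restart at next char
        else:                                     # appendix 2: count only
            focus.slots[order] = ("count", value + 1)
            focus = blocks[code_of_char[ch]]
    return blocks, strings
```

Because the second area maps a character to a compact appearance order, the third area only grows with the characters actually observed after a given string, rather than reserving one counter per possible character.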

(Appendix 4)
The program according to any one of appendices 1 to 3, wherein
the third area is limited to areas up to a predetermined appearance order, and
the data block further includes a fourth area holding data indicating an extension area of the third area.

(Appendix 5)
The program according to any one of appendices 1 to 4, further causing the computer to execute a generation process of generating, from the plurality of data blocks generated for the input character string, data including:
first association data that, for each data block in the first hierarchy of the tree structure represented by the plurality of data blocks in a state in which, for each data block, the code held in the first area and the codes held in the third area have been reassigned as second codes based on the codes of the characters or character strings corresponding to those codes, associates data representing the top hierarchy with the code of the character corresponding to the second code held in the first area of the data block; and
second association data that, for each data block in the second or a later hierarchy of the tree structure represented by the plurality of data blocks in said state, associates the second code held in the first area of the data block with the code of the next-appearing character associated with a second code held in the third area of the data block.

(Appendix 6)
The program according to appendix 5, wherein
the first association data and the second association data are each further associated with the largest second code held in the third area for the data block.

(Appendix 7)
A program for causing a computer to execute:
a first generation process of generating a data block including a first area for holding a code corresponding to a character or character string, a second area for holding, for each character, the appearance order of the character that appears next after the character or character string, and a third area for holding, for each appearance order of a character that appears next after the character or character string, either the number of appearances or, when the number of appearances has exceeded a threshold, a code corresponding to the character string consisting of the character or character string and the character that has appeared this time, for each character that may be used and for each character that appears a predetermined number of times or more next after a character or character string in an input character string for which a data block has already been generated;
a second generation process of generating, from the plurality of data blocks generated by the first generation process, data including first association data and second association data, wherein the first association data, for each data block in the first hierarchy of the tree structure represented by the plurality of data blocks in a state in which, for each data block, the code held in the first area and the codes held in the third area have been reassigned as second codes based on the codes of the characters or character strings corresponding to those codes, associates data representing the top hierarchy, or the largest second code held in the third area for the data block, with the code of the character corresponding to the second code held in the first area of the data block and with a hierarchy number, and the second association data, for each data block in the second or a later hierarchy of the tree structure represented by the plurality of data blocks in said state, associates the second code held in the first area of the data block, or the largest second code held in the third area for the data block, with the code of the next-appearing character associated with a second code held in the third area of the data block and with a hierarchy number;
a third generation process of generating first data including the codes of the character strings corresponding to the second codes held in the first areas of the data blocks belonging to each hierarchy from the second hierarchy onward of the tree structure represented by the plurality of data blocks in said state; and
a fourth generation process of generating hierarchy information that associates, for each hierarchy from the second hierarchy onward of the tree structure represented by the plurality of data blocks in said state, the number of data blocks in the hierarchy, the minimum second code held in the first areas of the data blocks in the hierarchy, and the offset value, from the beginning of the first data, of the position of the character or character string corresponding to that minimum second code.

(Appendix 8)
A data structure of a phrase tree including, as the data of each node, a data block including a first area for holding a code corresponding to a character or character string, a second area for holding, for each character, the appearance order of the character that appears next after the character or character string, and a third area for holding, for each appearance order of a character that appears next after the character or character string, either the number of appearances or, when the number of appearances has exceeded a threshold, a code corresponding to the character string consisting of the character or character string and the character that has appeared this time.

(Appendix 9)
A data structure corresponding to a phrase tree, in which
association data that associates, for each node of the phrase tree, the code of the parent node of the node with the code of the character represented by the node is arranged in the order of the codes of the nodes.

(Appendix 10)
The data structure according to appendix 9, wherein the association data further associates the maximum code among the codes of the child nodes of the node.

(Appendix 11)
A data structure comprising:
first data corresponding to a phrase tree;
second data on the character or character string corresponding to each node of the phrase tree; and
third data on the hierarchies of the phrase tree, wherein
in the first data, data that associates, for each node of the phrase tree, the code of the parent node of the node or the maximum code among the codes of the child nodes of the node, the code of the character represented by the node, and the hierarchy to which the node belongs is arranged in the order of the codes of the nodes,
the second data includes, for each node in each hierarchy from the second hierarchy onward of the phrase tree, the code of the character or character string corresponding to the node, in the order of the codes of the nodes, and
the third data includes, for each hierarchy from the second hierarchy onward of the phrase tree, data that associates the number of nodes belonging to the hierarchy, the minimum code among the codes of the nodes belonging to the hierarchy, and the offset value, from the beginning of the second data, of the position of the character or character string corresponding to that minimum code.

(Appendix 12)
A compression program for causing a computer to execute:
a process of referring, in a data structure corresponding to a phrase tree in which association data that associates, for each node of the phrase tree, the code of the parent node of the node with the code of the character represented by the node is arranged in the order of the codes of the nodes, to the association data of the code corresponding to a first character included in an input character string;
a search process of searching the data structure for association data in which the parent-node code is the code of the node corresponding to the association data being referred to and the code of the character represented by the node is the code of a second character that appears next in the input character string;
a first reference process of referring to that association data when it is detected;
a second reference process of, when no such association data is detected, outputting the code of the association data being referred to and referring to the association data of the code corresponding to the second character; and
a process of performing the search process, the first reference process, and the second reference process while moving the second character through the input character string in order until the last character of the input character string has been processed, and, after the last character has been processed, outputting the code of the association data being referred to.

(Appendix 13)
The compression program according to appendix 12, wherein, in the search process,
when association data for the first hierarchy of the data structure is being referred to, the association data of the second and later hierarchies is searched, and
when association data for the second or a later hierarchy of the data structure is being referred to, the association data after the code of the association data being referred to is searched.
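
As a non-limiting illustration, the compression loop of appendix 12, with the narrowed search start of appendix 13, might be sketched in Python as follows. The table contents, the sentinel `ROOT = -1`, and the parameter `first_layer_size` (the number of first-hierarchy entries) are assumptions made for this sketch. The sketch also assumes that every child has a larger code than its parent, which is what makes the "search after the current code" rule safe.

```python
ROOT = -1  # hypothetical sentinel for the code of the root node

# Association data arranged in code order: (parent code, character).
TABLE = [(ROOT, "A"), (ROOT, "B"), (ROOT, "C"), (0, "B"), (3, "C")]


def compress(table, first_layer_size, text):
    """Emit the code of the longest registered string at each position."""
    code_of_char = {ch: code
                    for code, (_, ch) in enumerate(table[:first_layer_size])}
    current = code_of_char[text[0]]          # refer to the first character
    out = []
    for ch in text[1:]:                      # ch is the "second character"
        # First hierarchy: search the second hierarchy onward.
        # Later hierarchies: search only entries after the current code.
        start = max(current + 1, first_layer_size)
        found = None
        for code in range(start, len(table)):
            if table[code] == (current, ch):
                found = code
                break
        if found is not None:
            current = found                  # first reference process
        else:
            out.append(current)              # second reference process
            current = code_of_char[ch]
    out.append(current)                      # after the last character
    return out
```

With this illustrative table, "ABCABB" is consumed as "ABC", "AB", "B", giving the codes 4, 3, 1.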

(Appendix 14)
The compression program according to appendix 12, wherein
the association data further associates the maximum code among the codes of the child nodes of the node, and
in the search process,
when association data for the first hierarchy of the data structure is being referred to, the search runs from the first association data belonging to the second hierarchy to the association data of the maximum code included in the association data being referred to, and
when association data for the second or a later hierarchy of the data structure is being referred to, the search runs from the association data of the maximum code included in the association data immediately before the one being referred to, to the association data of the maximum code included in the association data being referred to.

(Appendix 15)
The compression program according to appendix 12, wherein
the association data further associates the hierarchy number of the hierarchy to which the node belongs, and
in the search process, the data structure is searched for association data associated with the hierarchy number following the hierarchy number of the association data being referred to.

(Appendix 16)
A compression program for causing a computer to execute:
a process of referring, in first data included in a data structure having the first data corresponding to a phrase tree and second data on the hierarchies of the phrase tree, to the association data of the code corresponding to a first character included in an input character string, wherein, in the first data, association data that associates, for each node of the phrase tree, the code of the parent node of the node, the code of the character represented by the node, and the hierarchy to which the node belongs is arranged in the order of the codes of the nodes, and the second data includes, for each hierarchy of the phrase tree, data that associates the number of nodes belonging to the hierarchy with the minimum code among the codes of the nodes belonging to the hierarchy;
a search process of searching the first data, within the range specified in the second data by the number of nodes belonging to the hierarchy one level below the hierarchy of the node corresponding to the association data being referred to and by the minimum code, for association data in which the parent-node code is the code of the node corresponding to the association data being referred to and the code of the character represented by the node is the code of a second character that appears next in the input character string;
a first reference process of referring to that association data when it is detected;
a second reference process of, when no such association data is detected, outputting the code of the association data being referred to and referring to the association data of the code corresponding to the second character; and
a process of performing the search process, the first reference process, and the second reference process while moving the second character through the input character string in order until the last character of the input character string has been processed, and, after the last character has been processed, outputting the code of the association data being referred to.
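
As a non-limiting illustration, the search process of appendix 16 might be sketched in Python as follows. Only the search step is shown; the surrounding loop is the same as in appendix 12. The table contents, the sentinel `ROOT = -1`, and the names `NODES`, `LAYERS`, and `find_child` are assumptions made for this sketch.

```python
ROOT = -1  # hypothetical sentinel for the code of the root node

# First data, in code order: (parent code, character, hierarchy number).
NODES = [
    (ROOT, "A", 1), (ROOT, "B", 1), (ROOT, "C", 1),  # codes 0-2
    (0, "B", 2), (1, "C", 2),                        # codes 3-4: "AB", "BC"
    (3, "C", 3),                                     # code 5: "ABC"
]

# Second data: hierarchy number -> (number of nodes, minimum code).
LAYERS = {1: (3, 0), 2: (2, 3), 3: (1, 5)}


def find_child(current, ch):
    """Search only the hierarchy directly below the current node."""
    layer = NODES[current][2]
    # A missing entry means there is no hierarchy below: empty range.
    count, min_code = LAYERS.get(layer + 1, (0, 0))
    for code in range(min_code, min_code + count):
        parent, node_ch, _ = NODES[code]
        if parent == current and node_ch == ch:
            return code
    return None
```

The search never visits more entries than the hierarchy one level below contains, regardless of how large the whole table is.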

(Appendix 17)
A compression program for causing a computer to execute:
a process of referring, in a data structure corresponding to a phrase tree in which association data that associates, for each node of the phrase tree, the maximum code among the codes of the child nodes of the node with the code of the character represented by the node is arranged in the order of the codes of the nodes, to the association data of the code corresponding to a first character included in an input character string;
a search process of searching the data structure for association data in which the code of the character represented by the node is the code of a second character that appears next in the input character string;
a first reference process of referring to that association data when it is detected;
a second reference process of, when no such association data is detected, outputting the code of the association data being referred to and referring to the association data of the code corresponding to the second character; and
a process of performing the search process, the first reference process, and the second reference process while moving the second character through the input character string in order until the last character of the input character string has been processed, and, after the last character has been processed, outputting the code of the association data being referred to, wherein
in the search process,
when association data for the first hierarchy of the data structure is being referred to, the search runs from the first association data belonging to the second hierarchy to the association data of the maximum code included in the association data being referred to, and
when association data for the second or a later hierarchy of the data structure is being referred to, the search runs from the association data of the maximum code included in the association data immediately before the one being referred to, to the association data of the maximum code included in the association data being referred to.

（付記１８）
文節木に対応するデータ構造であって、前記文節木の各ノードについて、当該ノードの親ノードの符号と、当該ノードで表される文字のコードとを対応付ける対応付けデータが、各ノードの符号の順番に並べられたデータ構造において、入力符号のうち第１の符号の順番の対応付けデータを特定する処理と、
特定された前記対応付けデータに含まれる文字のコードをメモリに格納する格納処理と、
特定された前記対応付けデータに含まれる親ノードの符号が前記文節木の根ノード以外のノードの符号を示している場合には、当該親ノードの符号の対応付けデータを参照する参照処理と、
特定された前記対応付けデータに含まれる親ノードの符号が前記文節木の根ノードの符号を示している場合には、前記メモリに格納されている文字のコードを逆順に出力する出力処理と、
前記格納処理と前記参照処理と前記出力処理とを、前記入力符号のうち前記第１の符号より後ろの各符号について順番に実施する処理と、
を、コンピュータに実行させるための伸張プログラム。

(Appendix 18)
In a data structure corresponding to a phrase tree, in which association data associating, for each node of the phrase tree, the code of the node's parent node with the code of the character represented by the node is arranged in the order of the codes of the nodes, a process of identifying the association data at the position given by a first code among input codes;
A storing process for storing a code of a character included in the identified association data in a memory;
When the code of the parent node included in the identified association data indicates the code of a node other than the root node of the phrase tree, a reference process for referring to the association data of the code of the parent node;
When the code of the parent node included in the identified association data indicates the code of the root node of the phrase tree, an output process for outputting the codes of the characters stored in the memory in reverse order;
A process of sequentially performing the storage process, the reference process, and the output process for each code after the first code among the input codes;
A decompression program for causing a computer to execute the above processes.
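
The decompression steps of Appendix 18 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the seven-node table, the three-character alphabet, the `ROOT = -1` sentinel for the root node's code, and the function name `decompress` are all hypothetical.

```python
ROOT = -1  # hypothetical sentinel standing in for the root node's code

# Index = node code; each entry is (parent code, character of the node).
# Illustrative tree: A, B, C on the first hierarchy, then AB, BC, CB, then ABC.
TABLE = [
    (ROOT, "A"),  # code 0: "A"
    (ROOT, "B"),  # code 1: "B"
    (ROOT, "C"),  # code 2: "C"
    (0, "B"),     # code 3: "AB"
    (1, "C"),     # code 4: "BC"
    (2, "B"),     # code 5: "CB"
    (3, "C"),     # code 6: "ABC"
]


def decompress(codes, table=TABLE):
    """Expand each input code by walking parent codes up to the root."""
    out = []
    for code in codes:
        stack = []  # the "memory" of the storing process
        while True:
            parent, ch = table[code]   # identify the entry for this code
            stack.append(ch)           # storing process
            if parent == ROOT:         # parent is the root: stop climbing
                break
            code = parent              # reference process: move to the parent
        out.extend(reversed(stack))    # output process: reverse order
    return "".join(out)


print(decompress([6, 5, 0]))  # ABCCBA
```

Because each entry stores only a parent code and one character, the characters are collected leaf-first and must be reversed before output, which is why the claim specifies reverse-order output.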

（付記１９）
文節木に対応する第１のデータと、
前記文節木の各ノードに対応する文字又は文字列についての第２のデータと、
前記文節木の階層についての第３のデータと、
を有し且つデータ格納部に格納されているデータ構造であって、
前記第１のデータにおいて、
前記文節木の各ノードについて、当該ノードが所属する階層の階層番号を含むエントリが、各ノードの符号の順番に並べられており、
前記第２のデータは、
前記文節木の各階層の各ノードについて当該ノードに対応する文字又は文字列を当該ノードの符号の順に含み、
前記第３のデータは、
前記文節木の各階層について、当該階層に属するノードの符号のうち最小の符号と、前記第２のデータにおいて当該最小の第２の符号に対応する文字又は文字列の配置位置の、先頭からのオフセット値とを対応付けるデータを含む
前記データ構造に含まれる前記第１のデータにおいて、入力符号のうち第１の符号の順番のエントリを特定する処理と、
特定された前記エントリに含まれる階層番号に従って前記第３のデータにおいて前記最小の符号と前記オフセット値とを特定する特定処理と、
前記第２のデータから、特定された前記オフセット値に対して、特定された前記エントリの符号と特定された前記最小の符号との差に前記階層番号を乗じた値を加算することで得られる配置位置から前記階層番号分の文字又は文字列を読み出す読み出し処理と、
前記入力符号のうち前記第１の符号の後ろの第２の符号以降の各符号について、前記第１のデータにおけるエントリを特定し、前記特定処理と前記読み出し処理とを実施する処理と、
を、コンピュータに実行させるための伸張プログラム。

(Appendix 19)
First data corresponding to a phrase tree;
Second data about the character or character string corresponding to each node of the phrase tree;
Third data about the hierarchies of the phrase tree;
A data structure that has the above data and is stored in a data storage unit, wherein,
In the first data,
For each node of the phrase tree, an entry including the hierarchy number of the hierarchy to which the node belongs is arranged in the order of the codes of the nodes,
The second data
Includes, for each node in each hierarchy of the phrase tree, the character or character string corresponding to the node, in the order of the codes of the nodes, and
The third data
Includes, for each hierarchy of the phrase tree, data that associates the minimum code among the codes of the nodes belonging to the hierarchy with the offset value, from the beginning of the second data, of the position at which the character or character string corresponding to that minimum code is placed;
In the first data included in the data structure, a process of identifying the entry at the position given by a first code among input codes;
A specifying process of specifying the minimum code and the offset value in the third data according to the hierarchy number included in the identified entry;
A reading process of reading, from the second data, as many characters as the hierarchy number, starting at the position obtained by adding to the specified offset value the product of the hierarchy number and the difference between the code of the identified entry and the specified minimum code;
A process of identifying the entry in the first data for each code from the second code, which follows the first code among the input codes, onward, and performing the specifying process and the reading process;
A decompression program for causing a computer to execute the above processes.
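
The position arithmetic in Appendix 19 (offset plus the code difference times the hierarchy number) can be illustrated with a short Python sketch. The three tables below, their names, and the tiny example tree are assumptions made for illustration; the sketch also assumes that every node on hierarchy k holds a string of exactly k characters and that codes within a hierarchy are contiguous.

```python
# First data: index = node code, value = hierarchy number of that node.
LEVEL_OF = [1, 1, 1, 2, 2, 2, 3]

# Second data: the strings of all nodes, concatenated in code order
# ("A","B","C" | "AB","BC","CB" | "ABC").
STRINGS = "ABC" + "ABBCCB" + "ABC"

# Third data: hierarchy number -> (minimum code, offset into STRINGS).
LEVEL_INFO = {1: (0, 0), 2: (3, 3), 3: (6, 9)}


def lookup(code):
    """Return the string for one code without walking the tree."""
    level = LEVEL_OF[code]                      # identify the entry
    min_code, offset = LEVEL_INFO[level]        # specifying process
    start = offset + (code - min_code) * level  # position arithmetic
    return STRINGS[start:start + level]         # reading process


def decompress(codes):
    return "".join(lookup(code) for code in codes)


print(lookup(4))              # BC
print(decompress([6, 5, 0]))  # ABCCBA
```

Compared with following parent codes one character at a time, this layout recovers each string with a single slice, at the cost of storing every node's full string.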

（付記２０）
文節木に対応する第１のデータと、
前記文節木の各ノードに対応する文字又は文字列についての第２のデータと、
前記文節木の階層についての第３のデータと、
を有し且つデータ格納部に格納されているデータ構造であって、
前記第１のデータにおいて、
前記文節木の各ノードについて、当該ノードが所属する階層の階層番号を含むエントリが、各ノードの符号の順番に並べられており、
前記第２のデータは、
前記文節木の第２の階層以降の各階層の各ノードについて当該ノードに対応する文字列を当該ノードの符号の順に含み、
前記第３のデータは、
前記文節木の第２の階層以降の各階層について、当該階層に属するノードの符号のうち最小の符号と、前記第２のデータにおいて当該最小の第２の符号に対応する文字又は文字列の配置位置の、先頭からのオフセット値とを対応付けるデータを含む
前記データ構造に含まれる前記第１のデータにおいて、入力符号のうち第１の符号の順番のエントリを特定する処理と、
特定された前記エントリが第１の階層におけるエントリであれば、特定された前記エントリの符号に対応する文字を出力する出力処理と、
特定された前記エントリが第２の階層以降のエントリであれば、特定された前記エントリに含まれる階層番号に従って前記第３のデータにおいて前記最小の符号と前記オフセット値とを特定する特定処理と、
前記第２のデータにおいて、特定された前記オフセット値に対して、特定された前記エントリの符号と特定された前記最小の符号との差に前記階層番号を乗じた値を加算することで得られる配置位置から前記階層番号分の文字又は文字列を読み出す読み出し処理と、
前記入力符号のうち前記第１の符号の後ろの第２の符号以降の各符号について、前記第１のデータにおけるエントリを特定し、前記出力処理と前記特定処理と前記読み出し処理とを実施する処理と、
を、コンピュータに実行させるためのプログラム。

(Appendix 20)
First data corresponding to a phrase tree;
Second data about the character or character string corresponding to each node of the phrase tree;
Third data about the hierarchies of the phrase tree;
A data structure that has the above data and is stored in a data storage unit, wherein,
In the first data,
For each node of the phrase tree, an entry including the hierarchy number of the hierarchy to which the node belongs is arranged in the order of the codes of the nodes,
The second data
Includes, for each node in the second and subsequent hierarchies of the phrase tree, the character string corresponding to the node, in the order of the codes of the nodes, and
The third data
Includes, for each of the second and subsequent hierarchies of the phrase tree, data that associates the minimum code among the codes of the nodes belonging to the hierarchy with the offset value, from the beginning of the second data, of the position at which the character or character string corresponding to that minimum code is placed;
In the first data included in the data structure, a process of identifying the entry at the position given by a first code among input codes;
If the identified entry is an entry in the first hierarchy, an output process of outputting the character corresponding to the code of the identified entry;
If the identified entry is an entry in the second or a subsequent hierarchy, a specifying process of specifying the minimum code and the offset value in the third data according to the hierarchy number included in the identified entry;
A reading process of reading, from the second data, as many characters as the hierarchy number, starting at the position obtained by adding to the specified offset value the product of the hierarchy number and the difference between the code of the identified entry and the specified minimum code;
A process of identifying the entry in the first data for each code from the second code, which follows the first code among the input codes, onward, and performing the output process, the specifying process, and the reading process;
A program for causing a computer to execute the above processes.

（付記２１）
文字又は文字列に対応する符号を保持するための第１の領域と当該文字又は文字列の次に出現する文字の出現順番を各文字について保持するための第２の領域と前記文字又は文字列の次に出現する文字の出現順番に応じて出現回数又は当該出現回数が閾値を超えた場合に前記文字又は文字列と今回出現した文字とからなる文字列に対応する符号を保持するための第３の領域とを含むデータブロックを、使用される可能性のある各文字について生成し、
入力文字列に含まれる複数の文字のうち着目する文字又は文字列についてのデータブロックの第２の領域において前記入力文字列において前記着目する文字又は文字列の次に出現した文字についての出現順番が保持されており且つ第３の領域において当該出現順番のデータとして出現回数が保持されており且つ当該出現回数が今回閾値を超えることになることを検出すると、前記第３の領域において当該出現順番のデータとして、前記着目する文字又は文字列と前記次に出現した文字とからなる第２の文字列に対応する符号を格納し、
前記第２の文字列についてのデータブロックを生成する
処理がコンピュータにより実行される情報処理方法。

(Appendix 21)
Generating, for each character that may be used, a data block including a first area for holding the code corresponding to a character or character string, a second area for holding, for each character, the appearance order of the character that appears next after the character or character string, and a third area for holding, according to that appearance order, either the number of appearances or, when the number of appearances exceeds a threshold, the code corresponding to the character string consisting of the character or character string and the character that has appeared this time;
Upon detecting that, in the second area of the data block for the character or character string of interest among the plurality of characters included in an input character string, an appearance order is held for the character that appears next after the character or character string of interest in the input character string, that the third area holds a number of appearances as the data for that appearance order, and that the number of appearances will now exceed the threshold, storing in the third area, as the data for that appearance order, the code corresponding to a second character string consisting of the character or character string of interest and the next-appearing character;
Generating a data block for the second character string;
An information processing method in which the above processes are executed by a computer.
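
The data block of Appendix 21 and its threshold rule might be sketched in Python like this. The class and field names, the `("count", n)` / `("code", c)` tagging of the third area, and in particular the way the character of interest advances through the input are assumptions for illustration; the appendix itself only specifies the three areas and the promotion step.

```python
from dataclasses import dataclass, field


@dataclass
class DataBlock:
    code: int                                  # first area: code of the string
    order: dict = field(default_factory=dict)  # second area: next char -> appearance order
    slots: list = field(default_factory=list)  # third area: ("count", n) or ("code", c)


def build_blocks(text, alphabet, threshold=2):
    """Build data blocks keyed by the string they represent (hypothetical helper)."""
    blocks = {ch: DataBlock(code=i) for i, ch in enumerate(alphabet)}
    if not text:
        return blocks
    next_code = len(alphabet)
    focus = text[0]  # character or string of interest
    for ch in text[1:]:
        block = blocks[focus]
        if ch not in block.order:             # first time ch follows focus
            block.order[ch] = len(block.slots)
            block.slots.append(("count", 0))
        idx = block.order[ch]
        kind, value = block.slots[idx]
        if kind == "code":                    # longer string already registered
            focus = focus + ch                # assumed policy: extend the focus
            continue
        if value + 1 > threshold:             # count would now exceed the threshold
            block.slots[idx] = ("code", next_code)
            blocks[focus + ch] = DataBlock(code=next_code)
            next_code += 1
        else:
            block.slots[idx] = ("count", value + 1)
        focus = ch                            # assumed policy: restart at ch
    return blocks


blocks = build_blocks("ABABABAB", "ABC")
print(sorted(blocks))  # ['A', 'AB', 'B', 'BA', 'C']
```

With a threshold of 2, the pair "AB" is promoted to its own data block on its third occurrence, because that is the first time the count would exceed the threshold.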

（付記２２）
文字又は文字列に対応する符号を保持するための第１の領域と当該文字又は文字列の次に出現する文字の出現順番を各文字について保持するための第２の領域と前記文字又は文字列の次に出現する文字の出現順番に応じて出現回数又は当該出現回数が閾値を超えた場合に前記文字又は文字列と今回出現した文字とからなる文字列に対応する符号を保持するための第３の領域とを含むデータブロックを、使用される可能性のある各文字について生成する手段と、
入力文字列に含まれる複数の文字のうち着目する文字又は文字列についてのデータブロックの第２の領域において前記入力文字列において前記着目する文字又は文字列の次に出現した文字についての出現順番が保持されており且つ第３の領域において当該出現順番のデータとして出現回数が保持されており且つ当該出現回数が今回閾値を超えることになることを検出すると、前記第３の領域において当該出現順番のデータとして、前記着目する文字又は文字列と前記次に出現した文字とからなる第２の文字列に対応する符号を格納する手段と、
前記第２の文字列についてのデータブロックを生成する手段と、
を有する情報処理装置。

(Appendix 22)
Means for generating, for each character that may be used, a data block including a first area for holding the code corresponding to a character or character string, a second area for holding, for each character, the appearance order of the character that appears next after the character or character string, and a third area for holding, according to that appearance order, either the number of appearances or, when the number of appearances exceeds a threshold, the code corresponding to the character string consisting of the character or character string and the character that has appeared this time;
Means for storing in the third area, as the data for the appearance order, the code corresponding to a second character string consisting of the character or character string of interest and the next-appearing character, upon detecting that, in the second area of the data block for the character or character string of interest among the plurality of characters included in an input character string, an appearance order is held for the character that appears next after the character or character string of interest in the input character string, that the third area holds a number of appearances as the data for that appearance order, and that the number of appearances will now exceed the threshold;
Means for generating a data block for the second character string;
An information processing apparatus having the above means.

（付記２３）
文字又は文字列に対応する符号を保持するための第１の領域と当該文字又は文字列の次に出現する文字の出現順番を各文字について保持するための第２の領域と前記文字又は文字列の次に出現する文字の出現順番に応じて出現回数又は当該出現回数が閾値を超えた場合に前記文字又は文字列と今回出現した文字とからなる文字列に対応する符号を保持するための第３の領域とを含むデータブロックを、使用される可能性のある各文字と、既にデータブロックが生成されている、入力文字列内の文字又は文字列の次に所定の出現回数以上出現する文字とについて生成する第１生成処理と、
前記第１生成処理により生成された複数のデータブロックから、当該複数のデータブロックの各データブロックについて当該データブロックの第１の領域に保持されている符号及び第３の領域に保持されている符号を当該符号に対応する文字又は文字列のコードに基づき第２の符号に付与し直した状態における複数のデータブロックで表される木構造の第１階層の各データブロックについては最上位階層を表すデータ又は当該データブロックについて第３の領域に保持されている最も大きい第２の符号と当該データブロックの第１の領域に保持されている第２の符号に対応する文字のコードと階層番号とを対応付ける第１の対応付けデータと、前記状態における複数のデータブロックで表される木構造の第２階層以降の各データブロックについては当該データブロックの第１の領域に保持されている第２の符号又は当該データブロックについて第３の領域に保持されている最も大きい第２の符号と当該データブロックの第３の領域に保持されている第２の符号に関連付けられている前記次に出現する文字のコードと階層番号とを対応付ける第２の対応付けデータとを含むデータを生成する第２生成処理と、
前記状態における複数のデータブロックで表される木構造の第２階層以降の各階層に属する各データブロックの第１の領域に保持されている第２の符号に対応する文字列のコードを含む第１のデータを生成する第３生成処理と、
前記状態における複数のデータブロックで表される木構造の第２階層以降の各階層について当該階層内のデータブロック数と当該階層内のデータブロックの第１の領域に保持されている最小の第２の符号と前記第１のデータにおいて当該最小の第２の符号に対応する文字又は文字列の配置位置の、先頭からのオフセット値とを対応付ける階層情報を生成する第４の生成処理と、
がコンピュータにより実行される情報処理方法。

(Appendix 23)
A first generation process of generating a data block including a first area for holding the code corresponding to a character or character string, a second area for holding, for each character, the appearance order of the character that appears next after the character or character string, and a third area for holding, according to that appearance order, either the number of appearances or, when the number of appearances exceeds a threshold, the code corresponding to the character string consisting of the character or character string and the character that has appeared this time, for each character that may be used and for each character that appears at least a predetermined number of times next after a character or character string in an input character string for which a data block has already been generated;
A second generation process of generating, from the plurality of data blocks generated by the first generation process, data including first association data and second association data, where, in a state in which, for each of the plurality of data blocks, the code held in the first area and the codes held in the third area have been reassigned as second codes based on the codes of the corresponding characters or character strings, the first association data associates, for each data block in the first hierarchy of the tree structure represented by the plurality of data blocks, data representing the top hierarchy or the largest second code held in the third area of the data block with the code of the character corresponding to the second code held in the first area of the data block and with a hierarchy number, and the second association data associates, for each data block in the second and subsequent hierarchies of the tree structure, the second code held in the first area of the data block or the largest second code held in the third area of the data block with the code of the next-appearing character associated with the second code held in the third area of the data block and with a hierarchy number;
A third generation process of generating first data including the codes of the character strings corresponding to the second codes held in the first areas of the data blocks belonging to each of the second and subsequent hierarchies of the tree structure represented by the plurality of data blocks in the state;
A fourth generation process of generating hierarchy information that associates, for each of the second and subsequent hierarchies of the tree structure represented by the plurality of data blocks in the state, the number of data blocks in the hierarchy, the minimum second code held in the first areas of the data blocks in the hierarchy, and the offset value, from the beginning of the first data, of the position at which the character or character string corresponding to that minimum second code is placed;
An information processing method in which the above processes are executed by a computer.
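
The third and fourth generation processes of Appendix 23 could look roughly like the Python sketch below. It assumes, as one reading of "reassigned based on the codes of the corresponding characters or character strings", that second codes are handed out by sorting strings first by length (hierarchy) and then by character code; the function name `build_tables` and the sample phrases are hypothetical.

```python
def build_tables(phrases, alphabet):
    """Return (codes, strings, level_info) for a set of registered phrases.

    codes      : string -> reassigned (second) code
    strings    : concatenated strings of hierarchy >= 2 nodes, in code order
    level_info : hierarchy -> (node count, minimum code, offset into strings)
    """
    # Assumed ordering: single characters first, then longer strings
    # sorted by (length, string) so that each hierarchy is contiguous.
    longer = sorted((p for p in phrases if len(p) >= 2), key=lambda p: (len(p), p))
    ordered = sorted(alphabet) + longer
    codes = {p: i for i, p in enumerate(ordered)}

    strings = ""
    info = {}
    for p in longer:
        level = len(p)
        if level not in info:
            # First node of this hierarchy: record its code and its offset.
            info[level] = [0, codes[p], len(strings)]
        info[level][0] += 1
        strings += p
    level_info = {level: tuple(v) for level, v in info.items()}
    return codes, strings, level_info


codes, strings, level_info = build_tables({"AB", "BC", "CB", "ABC"}, "ABC")
print(strings)     # ABBCCBABC
print(level_info)  # {2: (3, 3, 0), 3: (1, 6, 6)}
```

Sorting within a hierarchy keeps the children of one parent adjacent, since they share the parent's string as a prefix.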

（付記２４）
文字又は文字列に対応する符号を保持するための第１の領域と当該文字又は文字列の次に出現する文字の出現順番を各文字について保持するための第２の領域と前記文字又は文字列の次に出現する文字の出現順番に応じて出現回数又は当該出現回数が閾値を超えた場合に前記文字又は文字列と今回出現した文字とからなる文字列に対応する符号を保持するための第３の領域とを含むデータブロックを、使用される可能性のある各文字と、既にデータブロックが生成されている、入力文字列内の文字又は文字列の次に所定の出現回数以上出現する文字とについて生成する手段と、
前記第１生成手段により生成された複数のデータブロックから、当該複数のデータブロックの各データブロックについて当該データブロックの第１の領域に保持されている符号及び第３の領域に保持されている符号を当該符号に対応する文字又は文字列のコードに基づき第２の符号に付与し直した状態における複数のデータブロックで表される木構造の第１階層の各データブロックについては最上位階層を表すデータ又は当該データブロックについて第３の領域に保持されている最も大きい第２の符号と当該データブロックの第１の領域に保持されている第２の符号に対応する文字のコードと階層番号とを対応付ける第１の対応付けデータと、前記状態における複数のデータブロックで表される木構造の第２階層以降の各データブロックについては当該データブロックの第１の領域に保持されている第２の符号又は当該データブロックについて第３の領域に保持されている最も大きい第２の符号と当該データブロックの第３の領域に保持されている第２の符号に関連付けられている前記次に出現する文字のコードと階層番号とを対応付ける第２の対応付けデータとを含むデータを生成する手段と、
前記状態における複数のデータブロックで表される木構造の第２階層以降の各階層に属する各データブロックの第１の領域に保持されている第２の符号に対応する文字列のコードを含む第１のデータを生成する手段と、
前記状態における複数のデータブロックで表される木構造の第２階層以降の各階層について当該階層内のデータブロック数と当該階層内のデータブロックの第１の領域に保持されている最小の第２の符号と前記第１のデータにおいて当該最小の第２の符号に対応する文字又は文字列の配置位置の、先頭からのオフセット値とを対応付ける階層情報を生成する手段と、
を有する情報処理装置。

(Appendix 24)
Means for generating a data block including a first area for holding the code corresponding to a character or character string, a second area for holding, for each character, the appearance order of the character that appears next after the character or character string, and a third area for holding, according to that appearance order, either the number of appearances or, when the number of appearances exceeds a threshold, the code corresponding to the character string consisting of the character or character string and the character that has appeared this time, for each character that may be used and for each character that appears at least a predetermined number of times next after a character or character string in an input character string for which a data block has already been generated;
Means for generating, from the plurality of data blocks generated by the first generation means, data including first association data and second association data, where, in a state in which, for each of the plurality of data blocks, the code held in the first area and the codes held in the third area have been reassigned as second codes based on the codes of the corresponding characters or character strings, the first association data associates, for each data block in the first hierarchy of the tree structure represented by the plurality of data blocks, data representing the top hierarchy or the largest second code held in the third area of the data block with the code of the character corresponding to the second code held in the first area of the data block and with a hierarchy number, and the second association data associates, for each data block in the second and subsequent hierarchies of the tree structure, the second code held in the first area of the data block or the largest second code held in the third area of the data block with the code of the next-appearing character associated with the second code held in the third area of the data block and with a hierarchy number;
Means for generating first data including the codes of the character strings corresponding to the second codes held in the first areas of the data blocks belonging to each of the second and subsequent hierarchies of the tree structure represented by the plurality of data blocks in the state;
Means for generating hierarchy information that associates, for each of the second and subsequent hierarchies of the tree structure represented by the plurality of data blocks in the state, the number of data blocks in the hierarchy, the minimum second code held in the first areas of the data blocks in the hierarchy, and the offset value, from the beginning of the first data, of the position at which the character or character string corresponding to that minimum second code is placed;
An information processing apparatus having the above means.

（付記２５）
文節木に対応するデータ構造であって、前記文節木の各ノードについて、当該ノードの親ノードの符号と、当該ノードで表される文字のコードとを対応付ける対応付けデータが、各ノードの符号の順番に並べられたデータ構造において、入力文字列に含まれる第１の文字に対応する符号の対応付けデータを参照する処理と、
前記親ノードの符号が、参照している前記対応付けデータに対応するノードの符号となっており且つ前記ノードで表される文字のコードが前記入力文字列において次に現れる第２の文字のコードとなっている対応付けデータを、前記データ構造において探索する探索処理と、
対応付けデータが検出された場合には、当該対応付けデータを参照する第１参照処理と、
対応付けデータが検出されない場合には、参照している前記対応付けデータの符号を出力し、前記第２の文字に対応する符号の対応付けデータを参照する第２参照処理と、
前記探索処理と前記第１参照処理と前記第２参照処理とを、前記入力文字列の最後の文字を処理するまで、前記第２の文字を前記入力文字列の文字の順に移動させつつ実施し、前記入力文字の最後の文字を処理した後に、参照している前記対応付けデータの符号を出力する処理と、
が、コンピュータにより実行される圧縮方法。

(Appendix 25)
In a data structure corresponding to a phrase tree, in which association data associating, for each node of the phrase tree, the code of the node's parent node with the code of the character represented by the node is arranged in the order of the codes of the nodes, a process of referring to the association data of the code corresponding to a first character included in an input character string;
A search process of searching the data structure for association data in which the code of the parent node is the code of the node corresponding to the association data being referred to and the code of the character represented by the node is the code of a second character that appears next in the input character string;
When the association data is detected, a first reference process for referring to the association data;
If the association data is not detected, a second reference process for outputting the code of the association data referred to and referring to the association data of the code corresponding to the second character;
A process of performing the search process, the first reference process, and the second reference process while advancing the second character through the input character string in character order until the last character of the input character string has been processed, and, after the last character of the input character string has been processed, outputting the code of the association data being referred to;
A compression method in which the above processes are executed by a computer.
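
The compression loop of Appendix 25 can be sketched in Python as below. The table contents, the `ROOT = -1` sentinel, the `first_level` lookup dictionary, and the function name `compress` are illustrative assumptions; the search here is a plain linear scan over the whole table, which the later appendices narrow down.

```python
ROOT = -1  # hypothetical sentinel standing in for the root node's code

# Index = node code; each entry is (parent code, character of the node).
TABLE = [
    (ROOT, "A"), (ROOT, "B"), (ROOT, "C"),  # codes 0-2: first hierarchy
    (0, "B"), (1, "C"), (2, "B"),           # codes 3-5: "AB", "BC", "CB"
    (3, "C"),                               # code 6: "ABC"
]


def compress(text, table=TABLE):
    """Emit the code of the longest registered string at each position."""
    if not text:
        return []
    # Codes of the single-character nodes (children of the root).
    first_level = {ch: code for code, (parent, ch) in enumerate(table)
                   if parent == ROOT}
    out = []
    current = first_level[text[0]]  # refer to the first character's entry
    for ch in text[1:]:
        # Search process: an entry whose parent is `current` and whose
        # character is the next input character.
        match = next((code for code, (parent, c) in enumerate(table)
                      if parent == current and c == ch), None)
        if match is not None:
            current = match               # first reference process
        else:
            out.append(current)           # second reference process
            current = first_level[ch]
    out.append(current)                   # code for the final match
    return out


print(compress("ABCCBA"))  # [6, 5, 0]
```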

（付記２６）
文節木に対応するデータ構造であって、前記文節木の各ノードについて、当該ノードの親ノードの符号と、当該ノードで表される文字のコードとを対応付ける対応付けデータが、各ノードの符号の順番に並べられたデータ構造において、入力文字列に含まれる第１の文字に対応する符号の対応付けデータを参照する手段と、
前記親ノードの符号が、参照している前記対応付けデータに対応するノードの符号となっており且つ前記ノードで表される文字のコードが前記入力文字列において次に現れる第２の文字のコードとなっている対応付けデータを、前記データ構造において探索する探索手段と、
対応付けデータが検出された場合には、当該対応付けデータを参照する第１参照手段と、
対応付けデータが検出されない場合には、参照している前記対応付けデータの符号を出力し、前記第２の文字に対応する符号の対応付けデータを参照する第２参照手段と、
前記探索手段と前記第１参照手段と前記第２参照手段とを、前記入力文字列の最後の文字を処理するまで、前記第２の文字を前記入力文字列の文字の順に移動させつつ動作させ、前記入力文字の最後の文字を処理した後に、参照している前記対応付けデータの符号を出力する手段と、
を有する情報処理装置。

(Appendix 26)
Means for referring, in a data structure corresponding to a phrase tree, in which association data associating, for each node of the phrase tree, the code of the node's parent node with the code of the character represented by the node is arranged in the order of the codes of the nodes, to the association data of the code corresponding to a first character included in an input character string;
Search means for searching the data structure for association data in which the code of the parent node is the code of the node corresponding to the association data being referred to and the code of the character represented by the node is the code of a second character that appears next in the input character string;
A first reference means for referring to the association data when the association data is detected;
A second reference means for outputting a code of the association data referred to and referring to the association data of the code corresponding to the second character when the association data is not detected;
Means for operating the search means, the first reference means, and the second reference means while advancing the second character through the input character string in character order until the last character of the input character string has been processed, and, after the last character of the input character string has been processed, outputting the code of the association data being referred to;
An information processing apparatus having the above means.

（付記２７）
文節木に対応する第１のデータと前記文節木の階層についての第２のデータとを有するデータ構造であって、前記第１のデータにおいて、前記文節木の各ノードについて、当該ノードの親ノードの符号と、当該ノードで表される文字のコードと、当該ノードが所属する階層とを対応付けデータが、各ノードの符号の順番に並べられており、前記第２のデータは、前記文節木の各階層について、当該階層に属するノードの数と、当該階層に属するノードの符号のうち最小の符号とを対応付けるデータを含む前記データ構造に含まれる前記第１のデータにおいて、入力文字列に含まれる第１の文字に対応する符号の対応付けデータを参照する処理と、
前記親ノードの符号が、参照している前記対応付けデータに対応するノードの符号となっており且つ前記ノードで表される文字のコードが前記入力文字列において次に現れる第２の文字のコードとなっている対応付けデータを、前記第２のデータにおいて、参照している対応付けデータに対応するノードの階層の１階層下の階層に属するノードの数及び前記最小の符号とから特定される範囲を前記第１のデータについて探索する探索処理と、
対応付けデータが検出された場合には、当該対応付けデータを参照する第１参照処理と、
対応付けデータが検出されない場合には、参照している前記対応付けデータの符号を出力し、前記第２の文字に対応する符号の対応付けデータを参照する第２参照処理と、
前記探索処理と前記第１参照処理と前記第２参照処理とを、前記入力文字列の最後の文字を処理するまで、前記第２の文字を前記入力文字列の文字の順に移動させつつ実施し、前記入力文字の最後の文字を処理した後に、参照している前記対応付けデータの符号を出力する処理と、
が、コンピュータにより実行される圧縮方法。

(Appendix 27)
In the first data included in a data structure having first data corresponding to a phrase tree and second data about the hierarchies of the phrase tree, in which, in the first data, association data associating, for each node of the phrase tree, the code of the node's parent node, the code of the character represented by the node, and the hierarchy to which the node belongs is arranged in the order of the codes of the nodes, and the second data includes, for each hierarchy of the phrase tree, data associating the number of nodes belonging to the hierarchy with the minimum code among the codes of the nodes belonging to the hierarchy, a process of referring to the association data of the code corresponding to a first character included in an input character string;
A search process of searching the first data, within the range specified in the second data by the number of nodes and the minimum code of the hierarchy one level below the hierarchy of the node corresponding to the association data being referred to, for association data in which the code of the parent node is the code of the node corresponding to the association data being referred to and the code of the character represented by the node is the code of a second character that appears next in the input character string;
When the association data is detected, a first reference process for referring to the association data;
If the association data is not detected, a second reference process for outputting the code of the association data referred to and referring to the association data of the code corresponding to the second character;
A process of performing the search process, the first reference process, and the second reference process while advancing the second character through the input character string in character order until the last character of the input character string has been processed, and, after the last character of the input character string has been processed, outputting the code of the association data being referred to;
A compression method in which the above processes are executed by a computer.
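
The range-limited search of Appendix 27 can be sketched in Python as follows. The node table, the per-hierarchy table, and the names `NODES`, `LEVELS`, `find_child`, `first_code`, and `compress` are assumptions for illustration; the sketch relies on codes within one hierarchy being contiguous.

```python
ROOT = -1  # hypothetical sentinel standing in for the root node's code

# First data: index = node code; entry = (parent code, character, hierarchy).
NODES = [
    (ROOT, "A", 1), (ROOT, "B", 1), (ROOT, "C", 1),
    (0, "B", 2), (1, "C", 2), (2, "B", 2),
    (3, "C", 3),
]

# Second data: hierarchy -> (number of nodes, minimum code).
LEVELS = {1: (3, 0), 2: (3, 3), 3: (1, 6)}


def find_child(current, ch):
    """Search only the hierarchy one level below `current`."""
    below = NODES[current][2] + 1
    if below not in LEVELS:
        return None
    count, lo = LEVELS[below]
    for code in range(lo, lo + count):
        parent, c, _ = NODES[code]
        if parent == current and c == ch:
            return code
    return None


def first_code(ch):
    """Code of the first-hierarchy node for a single character."""
    count, lo = LEVELS[1]
    for code in range(lo, lo + count):
        if NODES[code][1] == ch:
            return code
    raise KeyError(ch)


def compress(text):
    if not text:
        return []
    out = []
    current = first_code(text[0])
    for ch in text[1:]:
        child = find_child(current, ch)
        if child is None:
            out.append(current)         # no longer match: emit and restart
            current = first_code(ch)
        else:
            current = child             # extend the current match
    out.append(current)
    return out


print(compress("ABCCBA"))  # [6, 5, 0]
```

Restricting the scan to one hierarchy's code range avoids examining entries that cannot be children of the current node.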

（付記２８）
文節木に対応する第１のデータと前記文節木の階層についての第２のデータとを有するデータ構造であって、前記第１のデータにおいて、前記文節木の各ノードについて、当該ノードの親ノードの符号と、当該ノードで表される文字のコードと、当該ノードが所属する階層とを対応付けデータが、各ノードの符号の順番に並べられており、前記第２のデータは、前記文節木の各階層について、当該階層に属するノードの数と、当該階層に属するノードの符号のうち最小の符号とを対応付けるデータを含む前記データ構造に含まれる前記第１のデータにおいて、入力文字列に含まれる第１の文字に対応する符号の対応付けデータを参照する手段と、
前記親ノードの符号が、参照している前記対応付けデータに対応するノードの符号となっており且つ前記ノードで表される文字のコードが前記入力文字列において次に現れる第２の文字のコードとなっている対応付けデータを、前記第２のデータにおいて、参照している対応付けデータに対応するノードの階層の１階層下の階層に属するノードの数及び前記最小の符号とから特定される範囲を前記第１のデータについて探索する探索手段と、
対応付けデータが検出された場合には、当該対応付けデータを参照する第１参照手段と、
対応付けデータが検出されない場合には、参照している前記対応付けデータの符号を出力し、前記第２の文字に対応する符号の対応付けデータを参照する第２参照手段と、
前記探索手段と前記第１参照手段と前記第２参照手段とを、前記入力文字列の最後の文字を処理するまで、前記第２の文字を前記入力文字列の文字の順に移動させつつ動作させ、前記入力文字の最後の文字を処理した後に、参照している前記対応付けデータの符号を出力する手段と、
を有する情報処理装置。

(Appendix 28)
Means for referring, in the first data included in a data structure having first data corresponding to a phrase tree and second data about the hierarchies of the phrase tree, in which, in the first data, association data associating, for each node of the phrase tree, the code of the node's parent node, the code of the character represented by the node, and the hierarchy to which the node belongs is arranged in the order of the codes of the nodes, and the second data includes, for each hierarchy of the phrase tree, data associating the number of nodes belonging to the hierarchy with the minimum code among the codes of the nodes belonging to the hierarchy, to the association data of the code corresponding to a first character included in an input character string;
Search means for searching the first data, within the range specified in the second data by the number of nodes and the minimum code of the hierarchy one level below the hierarchy of the node corresponding to the association data being referred to, for association data in which the code of the parent node is the code of the node corresponding to the association data being referred to and the code of the character represented by the node is the code of a second character that appears next in the input character string;
A first reference means for referring to the association data when the association data is detected;
A second reference means for outputting a code of the association data referred to and referring to the association data of the code corresponding to the second character when the association data is not detected;
Means for operating the search means, the first reference means, and the second reference means while advancing the second character through the input character string in character order until the last character of the input character string has been processed, and, after the last character of the input character string has been processed, outputting the code of the association data being referred to;
An information processing apparatus having the above means.

（付記２９）
文節木に対応するデータ構造であって、前記文節木の各ノードについて、当該ノードの子ノードの符号のうち最大の符号と、当該ノードで表される文字のコードとを対応付ける対応付けデータが、各ノードの符号の順番に並べられたデータ構造において、入力文字列に含まれる第１の文字に対応する符号の対応付けデータを参照する処理と、
前記ノードで表される文字のコードが前記入力文字列において次に現れる第２の文字のコードとなっている対応付けデータを、前記データ構造において探索する探索処理と、
対応付けデータが検出された場合には、当該対応付けデータを参照する第１参照処理と、
対応付けデータが検出されない場合には、参照している前記対応付けデータの符号を出力し、前記第２の文字に対応する符号の対応付けデータを参照する第２参照処理と、
前記探索処理と前記第１参照処理と前記第２参照処理とを、前記入力文字列の最後の文字を処理するまで、前記第２の文字を前記入力文字列の文字の順に移動させつつ実施し、前記入力文字の最後の文字を処理した後に、参照している前記対応付けデータの符号を出力する処理と、
が、コンピュータにより実行され、
前記探索処理においては、
前記データ構造の第１階層についての対応付けデータを参照している場合には、第２階層に属する最初の対応付けデータから、参照している対応付けデータに含まれる最大の符号の対応付けデータまでを探索し、
前記データ構造の第２階層以降の階層についての対応付けデータを参照している場合には、参照している対応付けデータの１つ前の対応付けデータに含まれる最大の符号の対応付けデータから、参照している対応付けデータに含まれる最大の符号の対応付けデータまでを探索する
圧縮方法。

(Appendix 29)
In a data structure corresponding to a phrase tree, in which association data associating, for each node of the phrase tree, the maximum code among the codes of the node's child nodes with the code of the character represented by the node is arranged in the order of the codes of the nodes, a process of referring to the association data of the code corresponding to a first character included in an input character string;
A search process of searching the data structure for association data in which the code of the character represented by the node is the code of a second character that appears next in the input character string;
When the association data is detected, a first reference process for referring to the association data;
If the association data is not detected, a second reference process for outputting the code of the association data referred to and referring to the association data of the code corresponding to the second character;
A process of performing the search process, the first reference process, and the second reference process while advancing the second character through the input character string in character order until the last character of the input character string has been processed, and, after the last character of the input character string has been processed, outputting the code of the association data being referred to;
A compression method in which the above processes are executed by a computer, wherein,
In the search process,
When the association data being referred to belongs to the first hierarchy of the data structure, the search runs from the first association data belonging to the second hierarchy to the association data of the maximum code included in the association data being referred to, and
When the association data being referred to belongs to the second or a subsequent hierarchy of the data structure, the search runs from the association data of the maximum code included in the association data immediately preceding the one being referred to, to the association data of the maximum code included in the association data being referred to.

（付記３０）
文節木に対応するデータ構造であって、前記文節木の各ノードについて、当該ノードの子ノードの符号のうち最大の符号と、当該ノードで表される文字のコードとを対応付ける対応付けデータが、各ノードの符号の順番に並べられたデータ構造において、入力文字列に含まれる第１の文字に対応する符号の対応付けデータを参照する手段と、
前記ノードで表される文字のコードが前記入力文字列において次に現れる第２の文字のコードとなっている対応付けデータを、前記データ構造において探索する探索手段と、
対応付けデータが検出された場合には、当該対応付けデータを参照する第１参照手段と、
対応付けデータが検出されない場合には、参照している前記対応付けデータの符号を出力し、前記第２の文字に対応する符号の対応付けデータを参照する第２参照手段と、
前記探索手段と前記第１参照手段と前記第２参照手段とを、前記入力文字列の最後の文字を処理するまで、前記第２の文字を前記入力文字列の文字の順に移動させつつ動作させ、前記入力文字の最後の文字を処理した後に、参照している前記対応付けデータの符号を出力する手段と、
を有し、
前記探索手段が、
前記データ構造の第１階層についての対応付けデータを参照している場合には、第２階層に属する最初の対応付けデータから、参照している対応付けデータに含まれる最大の符号の対応付けデータまでを探索し、
前記データ構造の第２階層以降の階層についての対応付けデータを参照している場合には、参照している対応付けデータの１つ前の対応付けデータに含まれる最大の符号の対応付けデータから、参照している対応付けデータに含まれる最大の符号の対応付けデータまでを探索する
情報処理装置。 (Appendix 30)
In a data structure that corresponds to a phrase tree and in which, for each node of the phrase tree, association data associating the maximum code among the codes of the node's child nodes with the character code represented by the node is arranged in the order of the node codes, means for referring to the association data of the code corresponding to a first character contained in an input character string;
search means for searching the data structure for association data whose character code is the code of a second character that appears next in the input character string;
first reference means for referring to that association data when it is detected;
second reference means for outputting, when no such association data is detected, the code of the association data being referred to and then referring to the association data of the code corresponding to the second character; and
means for operating the search means, the first reference means, and the second reference means while advancing the second character through the input character string in order until the last character of the input character string has been processed, and for outputting, after the last character of the input character string has been processed, the code of the association data being referred to.
An information processing apparatus comprising the above means, wherein
the search means,
when association data for the first hierarchy of the data structure is being referred to, searches from the first association data belonging to the second hierarchy to the association data of the maximum code contained in the association data being referred to, and,
when association data for the second or a subsequent hierarchy of the data structure is being referred to, searches from the association data of the maximum code contained in the association data immediately preceding the association data being referred to, to the association data of the maximum code contained in the association data being referred to.

（付記３１）
文節木に対応するデータ構造であって、前記文節木の各ノードについて、当該ノードの親ノードの符号と、当該ノードで表される文字のコードとを対応付ける対応付けデータが、各ノードの符号の順番に並べられたデータ構造において、入力符号のうち第１の符号の順番の対応付けデータを特定する処理と、
特定された前記対応付けデータに含まれる文字のコードをメモリに格納する格納処理と、
特定された前記対応付けデータに含まれる親ノードの符号が前記文節木の根ノード以外のノードの符号を示している場合には、当該親ノードの符号の対応付けデータを参照する参照処理と、
特定された前記対応付けデータに含まれる親ノードの符号が前記文節木の根ノードの符号を示している場合には、前記メモリに格納されている文字のコードを逆順に出力する出力処理と、
前記格納処理と前記参照処理と前記出力処理とを、前記入力符号のうち前記第１の符号より後ろの各符号について順番に実施する処理と、
が、コンピュータにより実行される伸張方法。 (Appendix 31)
In a data structure that corresponds to a phrase tree and in which, for each node of the phrase tree, association data associating the code of the node's parent node with the character code represented by the node is arranged in the order of the node codes, a process of identifying the association data at the position given by a first code among input codes;
a storing process of storing, in a memory, the character code contained in the identified association data;
a reference process of referring, when the parent node code contained in the identified association data indicates a node other than the root node of the phrase tree, to the association data of that parent node code;
an output process of outputting, when the parent node code contained in the identified association data indicates the root node of the phrase tree, the character codes stored in the memory in reverse order; and
a process of performing the storing process, the reference process, and the output process in turn for each code following the first code among the input codes.
A decompression method in which the above processes are executed by a computer.
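
The parent-code walk of Appendix 31 can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the sentinel `ROOT`, the helper `build_table`, the small example tree, and the convention that first-hierarchy codes equal the byte values 0x00 to 0xFF are all assumptions made for the example.

```python
ROOT = -1  # hypothetical sentinel standing in for the root node's code


def build_table():
    """Return a list of (parent_code, char_code); the list index is the node's code."""
    table = [(ROOT, c) for c in range(256)]               # first hierarchy: codes 0x00..0xFF
    table += [(0x41, 0x42), (0x42, 0x43), (0x43, 0x42)]   # 256 "AB", 257 "BC", 258 "CB"
    table += [(256, 0x43), (258, 0x41), (258, 0x43)]      # 259 "ABC", 260 "CBA", 261 "CBC"
    return table


def decompress(codes, table):
    out = bytearray()
    for code in codes:
        stack = []                      # the "memory" of the storing process
        parent, ch = table[code]        # identify the entry at the code's position
        stack.append(ch)
        while parent != ROOT:           # reference process: move to the parent's entry
            parent, ch = table[parent]
            stack.append(ch)
        out.extend(reversed(stack))     # output process: emit the stored codes in reverse
    return bytes(out)
```

Because each entry stores only its parent code and one character, the string for a code is recovered last character first, which is why the stored characters are emitted in reverse order.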

（付記３２）
文節木に対応するデータ構造であって、前記文節木の各ノードについて、当該ノードの親ノードの符号と、当該ノードで表される文字のコードとを対応付ける対応付けデータが、各ノードの符号の順番に並べられたデータ構造において、入力符号のうち第１の符号の順番の対応付けデータを特定する手段と、
特定された前記対応付けデータに含まれる文字のコードをメモリに格納する格納処理手段と、
特定された前記対応付けデータに含まれる親ノードの符号が前記文節木の根ノード以外のノードの符号を示している場合には、当該親ノードの符号の対応付けデータを参照する参照手段と、
特定された前記対応付けデータに含まれる親ノードの符号が前記文節木の根ノードの符号を示している場合には、前記メモリに格納されている文字のコードを逆順に出力する出力手段と、
前記格納処理手段と前記参照手段と前記出力手段とを、前記入力符号のうち前記第１の符号より後ろの各符号について順番に動作させる手段と、
を有する情報処理装置。 (Appendix 32)
In a data structure that corresponds to a phrase tree and in which, for each node of the phrase tree, association data associating the code of the node's parent node with the character code represented by the node is arranged in the order of the node codes, means for identifying the association data at the position given by a first code among input codes;
storage processing means for storing, in a memory, the character code contained in the identified association data;
reference means for referring, when the parent node code contained in the identified association data indicates a node other than the root node of the phrase tree, to the association data of that parent node code;
output means for outputting, when the parent node code contained in the identified association data indicates the root node of the phrase tree, the character codes stored in the memory in reverse order; and
means for operating the storage processing means, the reference means, and the output means in turn for each code following the first code among the input codes.
An information processing apparatus comprising the above means.

（付記３３）
文節木に対応する第１のデータと、
前記文節木の各ノードに対応する文字又は文字列についての第２のデータと、
前記文節木の階層についての第３のデータと、
を有し且つデータ格納部に格納されているデータ構造であって、
前記第１のデータにおいて、
前記文節木の各ノードについて、当該ノードが所属する階層の階層番号を含むエントリが、各ノードの符号の順番に並べられており、
前記第２のデータは、
前記文節木の各階層の各ノードについて当該ノードに対応する文字又は文字列を当該ノードの符号の順に含み、
前記第３のデータは、
前記文節木の各階層について、当該階層に属するノードの符号のうち最小の符号と、前記第２のデータにおいて当該最小の第２の符号に対応する文字又は文字列の配置位置の、先頭からのオフセット値とを対応付けるデータを含む
前記データ構造に含まれる前記第１のデータにおいて、入力符号のうち第１の符号の順番のエントリを特定する処理と、
特定された前記エントリに含まれる階層番号に従って前記第３のデータにおいて前記最小の符号と前記オフセット値とを特定する特定処理と、
前記第２のデータから、特定された前記オフセット値に対して、特定された前記エントリの符号と特定された前記最小の符号との差に前記階層番号を乗じた値を加算することで得られる配置位置から前記階層番号分の文字又は文字列を読み出す読み出し処理と、
前記入力符号のうち前記第１の符号の後ろの第２の符号以降の各符号について、前記第１のデータにおけるエントリを特定し、前記特定処理と前記読み出し処理とを実施する処理と、
が、コンピュータにより実行される伸張方法。 (Appendix 33)
A data structure stored in a data storage unit has:
first data corresponding to a phrase tree;
second data about the character or character string corresponding to each node of the phrase tree; and
third data about the hierarchies of the phrase tree,
wherein, in the first data,
for each node of the phrase tree, an entry containing the hierarchy number of the hierarchy to which the node belongs is arranged in the order of the node codes,
the second data
contains, for each node in each hierarchy of the phrase tree, the character or character string corresponding to the node, in the order of the node codes, and
the third data
contains, for each hierarchy of the phrase tree, data associating the minimum code among the codes of the nodes belonging to that hierarchy with the offset value, measured from the beginning of the second data, of the position of the character or character string corresponding to that minimum code.
A process of identifying, in the first data contained in this data structure, the entry at the position given by a first code among input codes;
a specifying process of identifying, in the third data, the minimum code and the offset value according to the hierarchy number contained in the identified entry;
a reading process of reading, from the second data, characters or a character string of a length equal to the hierarchy number, starting at the position obtained by adding to the identified offset value the product of the hierarchy number and the difference between the code of the identified entry and the identified minimum code; and
a process of identifying, for each code from the second code onward that follows the first code among the input codes, the entry in the first data and performing the specifying process and the reading process.
A decompression method in which the above processes are executed by a computer.
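
The position arithmetic of Appendix 33 (offset plus the code difference times the hierarchy number) can be sketched as below. This is a hedged illustration only: the names `build`, `first`, `second`, `third`, the example strings, and the use of a Python dict for the third data are assumptions made for the example.

```python
def build():
    """Build the three data items for a small hypothetical phrase tree."""
    # Strings in code order: codes 0..255 are single bytes, then hierarchy 2, then hierarchy 3.
    strings = [bytes([c]) for c in range(256)]
    strings += [b"AB", b"BC", b"CB", b"ABC", b"CBA", b"CBC"]   # codes 256..261
    first = [len(s) for s in strings]        # first data: hierarchy number per code
    second = b"".join(strings)               # second data: strings laid out in code order
    third = {}                               # third data: hierarchy -> (min code, offset)
    offset = 0
    for code, s in enumerate(strings):
        third.setdefault(len(s), (code, offset))   # only the first code of a hierarchy is kept
        offset += len(s)
    return first, second, third


def decompress(codes, first, second, third):
    out = bytearray()
    for code in codes:
        level = first[code]                          # hierarchy number of the entry
        min_code, offset = third[level]              # specifying process
        pos = offset + (code - min_code) * level     # strings in a hierarchy have length `level`
        out += second[pos:pos + level]               # reading process
    return bytes(out)
```

The multiplication works because every string in hierarchy k has exactly k characters, so the strings of one hierarchy form a fixed-width array inside the second data and no parent chain has to be followed.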

（付記３４）
文節木に対応する第１のデータと、
前記文節木の各ノードに対応する文字又は文字列についての第２のデータと、
前記文節木の階層についての第３のデータと、
を有し且つデータ格納部に格納されているデータ構造であって、
前記第１のデータにおいて、
前記文節木の各ノードについて、当該ノードが所属する階層の階層番号を含むエントリが、各ノードの符号の順番に並べられており、
前記第２のデータは、
前記文節木の各階層の各ノードについて当該ノードに対応する文字又は文字列を当該ノードの符号の順に含み、
前記第３のデータは、
前記文節木の各階層について、当該階層に属するノードの符号のうち最小の符号と、前記第２のデータにおいて当該最小の第２の符号に対応する文字又は文字列の配置位置の、先頭からのオフセット値とを対応付けるデータを含む
前記データ構造に含まれる前記第１のデータにおいて、入力符号のうち第１の符号の順番のエントリを特定する手段と、
特定された前記エントリに含まれる階層番号に従って前記第３のデータにおいて前記最小の符号と前記オフセット値とを特定する特定手段と、
前記第２のデータから、特定された前記オフセット値に対して、特定された前記エントリの符号と特定された前記最小の符号との差に前記階層番号を乗じた値を加算することで得られる配置位置から前記階層番号分の文字又は文字列を読み出す読み出し手段と、
前記入力符号のうち前記第１の符号の後ろの第２の符号以降の各符号について、前記第１のデータにおけるエントリを特定し、前記特定手段と前記読み出し手段とを動作させる手段と、
を有する情報処理装置。 (Appendix 34)
A data structure stored in a data storage unit has:
first data corresponding to a phrase tree;
second data about the character or character string corresponding to each node of the phrase tree; and
third data about the hierarchies of the phrase tree,
wherein, in the first data,
for each node of the phrase tree, an entry containing the hierarchy number of the hierarchy to which the node belongs is arranged in the order of the node codes,
the second data
contains, for each node in each hierarchy of the phrase tree, the character or character string corresponding to the node, in the order of the node codes, and
the third data
contains, for each hierarchy of the phrase tree, data associating the minimum code among the codes of the nodes belonging to that hierarchy with the offset value, measured from the beginning of the second data, of the position of the character or character string corresponding to that minimum code.
Means for identifying, in the first data contained in this data structure, the entry at the position given by a first code among input codes;
specifying means for identifying, in the third data, the minimum code and the offset value according to the hierarchy number contained in the identified entry;
reading means for reading, from the second data, characters or a character string of a length equal to the hierarchy number, starting at the position obtained by adding to the identified offset value the product of the hierarchy number and the difference between the code of the identified entry and the identified minimum code; and
means for identifying, for each code from the second code onward that follows the first code among the input codes, the entry in the first data and operating the specifying means and the reading means.
An information processing apparatus comprising the above means.

（付記３５）
文節木に対応する第１のデータと、
前記文節木の各ノードに対応する文字又は文字列についての第２のデータと、
前記文節木の階層についての第３のデータと、
を有し且つデータ格納部に格納されているデータ構造であって、
前記第１のデータにおいて、
前記文節木の各ノードについて、当該ノードが所属する階層の階層番号を含むエントリが、各ノードの符号の順番に並べられており、
前記第２のデータは、
前記文節木の第２の階層以降の各階層の各ノードについて当該ノードに対応する文字列を当該ノードの符号の順に含み、
前記第３のデータは、
前記文節木の第２の階層以降の各階層について、当該階層に属するノードの符号のうち最小の符号と、前記第２のデータにおいて当該最小の第２の符号に対応する文字又は文字列の配置位置の、先頭からのオフセット値とを対応付けるデータを含む
前記データ構造に含まれる前記第１のデータにおいて、入力符号のうち第１の符号の順番のエントリを特定する処理と、
特定された前記エントリが第１の階層におけるエントリであれば、特定された前記エントリの符号に対応する文字を出力する出力処理と、
特定された前記エントリが第２の階層以降のエントリであれば、特定された前記エントリに含まれる階層番号に従って前記第３のデータにおいて前記最小の符号と前記オフセット値とを特定する特定処理と、
前記第２のデータにおいて、特定された前記オフセット値に対して、特定された前記エントリの符号と特定された前記最小の符号との差に前記階層番号を乗じた値を加算することで得られる配置位置から前記階層番号分の文字又は文字列を読み出す読み出し処理と、
前記入力符号のうち前記第１の符号の後ろの第２の符号以降の各符号について、前記第１のデータにおけるエントリを特定し、前記出力処理と前記特定処理と前記読み出し処理とを実施する処理と、
が、コンピュータにより実行される伸張方法。 (Appendix 35)
A data structure stored in a data storage unit has:
first data corresponding to a phrase tree;
second data about the character or character string corresponding to each node of the phrase tree; and
third data about the hierarchies of the phrase tree,
wherein, in the first data,
for each node of the phrase tree, an entry containing the hierarchy number of the hierarchy to which the node belongs is arranged in the order of the node codes,
the second data
contains, for each node in the second and each subsequent hierarchy of the phrase tree, the character string corresponding to the node, in the order of the node codes, and
the third data
contains, for the second and each subsequent hierarchy of the phrase tree, data associating the minimum code among the codes of the nodes belonging to that hierarchy with the offset value, measured from the beginning of the second data, of the position of the character or character string corresponding to that minimum code.
A process of identifying, in the first data contained in this data structure, the entry at the position given by a first code among input codes;
an output process of outputting, if the identified entry is an entry in the first hierarchy, the character corresponding to the code of the identified entry;
a specifying process of identifying, if the identified entry is an entry in the second or a subsequent hierarchy, the minimum code and the offset value in the third data according to the hierarchy number contained in the identified entry;
a reading process of reading, from the second data, characters or a character string of a length equal to the hierarchy number, starting at the position obtained by adding to the identified offset value the product of the hierarchy number and the difference between the code of the identified entry and the identified minimum code; and
a process of identifying, for each code from the second code onward that follows the first code among the input codes, the entry in the first data and performing the output process, the specifying process, and the reading process.
A decompression method in which the above processes are executed by a computer.
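
The variant of Appendix 35, in which the second and third data cover only the second and subsequent hierarchies, might look like the sketch below. It is a sketch under stated assumptions: the names `STRINGS`, `build`, and `decompress` are hypothetical, and a first-hierarchy code is assumed to equal the byte value of its character so that the output process can emit it directly.

```python
# Hypothetical strings for codes 256..261 (second and third hierarchies only).
STRINGS = [b"AB", b"BC", b"CB", b"ABC", b"CBA", b"CBC"]


def build():
    first = [1] * 256 + [len(s) for s in STRINGS]   # first data: hierarchy number per code
    second = b"".join(STRINGS)                      # second data: no first-hierarchy strings
    third = {}                                      # third data: hierarchy -> (min code, offset)
    offset = 0
    for code, s in enumerate(STRINGS, start=256):
        third.setdefault(len(s), (code, offset))
        offset += len(s)
    return first, second, third


def decompress(codes, first, second, third):
    out = bytearray()
    for code in codes:
        level = first[code]
        if level == 1:
            out.append(code)                        # output process (assumes code == byte value)
            continue
        min_code, offset = third[level]             # specifying process
        pos = offset + (code - min_code) * level
        out += second[pos:pos + level]              # reading process
    return bytes(out)
```

Compared with storing all hierarchies, this layout drops the 256 single-character strings from the second data at the cost of one extra branch per code.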

（付記３６）
文節木に対応する第１のデータと、
前記文節木の各ノードに対応する文字又は文字列についての第２のデータと、
前記文節木の階層についての第３のデータと、
を有し且つデータ格納部に格納されているデータ構造であって、
前記第１のデータにおいて、
前記文節木の各ノードについて、当該ノードが所属する階層の階層番号を含むエントリが、各ノードの符号の順番に並べられており、
前記第２のデータは、
前記文節木の第２の階層以降の各階層の各ノードについて当該ノードに対応する文字列を当該ノードの符号の順に含み、
前記第３のデータは、
前記文節木の第２の階層以降の各階層について、当該階層に属するノードの符号のうち最小の符号と、前記第２のデータにおいて当該最小の第２の符号に対応する文字又は文字列の配置位置の、先頭からのオフセット値とを対応付けるデータを含む
前記データ構造に含まれる前記第１のデータにおいて、入力符号のうち第１の符号の順番のエントリを特定する手段と、
特定された前記エントリが第１の階層におけるエントリであれば、特定された前記エントリの符号に対応する文字を出力する出力手段と、
特定された前記エントリが第２の階層以降のエントリであれば、特定された前記エントリに含まれる階層番号に従って前記第３のデータにおいて前記最小の符号と前記オフセット値とを特定する特定手段と、
前記第２のデータにおいて、特定された前記オフセット値に対して、特定された前記エントリの符号と特定された前記最小の符号との差に前記階層番号を乗じた値を加算することで得られる配置位置から前記階層番号分の文字又は文字列を読み出す読み出し手段と、
前記入力符号のうち前記第１の符号の後ろの第２の符号以降の各符号について、前記第１のデータにおけるエントリを特定し、前記出力手段と前記特定手段と前記読み出し手段とを動作させる処理と、
を有する情報処理装置。 (Appendix 36)
A data structure stored in a data storage unit has:
first data corresponding to a phrase tree;
second data about the character or character string corresponding to each node of the phrase tree; and
third data about the hierarchies of the phrase tree,
wherein, in the first data,
for each node of the phrase tree, an entry containing the hierarchy number of the hierarchy to which the node belongs is arranged in the order of the node codes,
the second data
contains, for each node in the second and each subsequent hierarchy of the phrase tree, the character string corresponding to the node, in the order of the node codes, and
the third data
contains, for the second and each subsequent hierarchy of the phrase tree, data associating the minimum code among the codes of the nodes belonging to that hierarchy with the offset value, measured from the beginning of the second data, of the position of the character or character string corresponding to that minimum code.
Means for identifying, in the first data contained in this data structure, the entry at the position given by a first code among input codes;
output means for outputting, if the identified entry is an entry in the first hierarchy, the character corresponding to the code of the identified entry;
specifying means for identifying, if the identified entry is an entry in the second or a subsequent hierarchy, the minimum code and the offset value in the third data according to the hierarchy number contained in the identified entry;
reading means for reading, from the second data, characters or a character string of a length equal to the hierarchy number, starting at the position obtained by adding to the identified offset value the product of the hierarchy number and the difference between the code of the identified entry and the identified minimum code; and
means for identifying, for each code from the second code onward that follows the first code among the input codes, the entry in the first data and operating the output means, the specifying means, and the reading means.
An information processing apparatus comprising the above means.

１００情報処理装置
１１０文節木生成部
１２０圧縮マップ生成部
１３０データ格納部
１４０圧縮処理部
１５０伸張処理部
１６０入出力部
100 Information processing device
110 Phrase tree generation unit
120 Compression map generation unit
130 Data storage unit
140 Compression processing unit
150 Decompression processing unit
160 Input/output unit

Claims

文節木に対応するデータ構造であって、
前記文節木の各ノードについて、当該ノードの親ノードの符号と、当該ノードで表される文字のコードと、当該ノードの子ノードの符号のうち最大の符号とを対応付ける対応付けデータが、各ノードの符号の順番に並べられ、
前記各ノードの符号が、前記文節木における各階層内において連続するように付与されており、
情報処理装置が、
前記文節木におけるあるノードの子ノードを探索する際に、前記あるノードの１つ前のノードについての対応付けデータから前記あるノードの１つ前のノードの子ノードの符号のうち最大の符号を読み出して当該最大の符号の次の符号を特定し、前記あるノードについての対応付けデータから当該あるノードの子ノードのうち最大の符号を読み出すことに用いることが可能である
データ構造。
A data structure corresponding to a phrase tree, wherein
for each node of the phrase tree, association data associating the code of the node's parent node, the character code represented by the node, and the maximum code among the codes of the node's child nodes is arranged in the order of the node codes,
the codes of the nodes are assigned so as to be consecutive within each hierarchy of the phrase tree, and
the data structure can be used by an information processing apparatus,
when searching for the child nodes of a given node in the phrase tree, to read from the association data of the node immediately preceding the given node the maximum code among the codes of that preceding node's child nodes and identify the code that follows that maximum code, and to read from the association data of the given node the maximum code among the codes of the given node's child nodes.
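
How the two maximum child codes bound a node's children can be sketched as follows. This is an illustrative sketch with several assumptions that the claim does not state: the names `ROOT`, `FIRST_LEVEL`, `build_table`, and `child_range` are hypothetical, first-hierarchy codes are taken to be the byte values 0 to 255, a childless node is assumed to repeat the previous entry's maximum child code, and code 0 (which has no preceding entry) is assumed to start at the first second-hierarchy code.

```python
ROOT = -1           # hypothetical sentinel for the root node's code
FIRST_LEVEL = 256   # number of first-hierarchy nodes (assumption: code == byte value)


def build_table():
    """Return a list of (parent_code, char_code, max_child_code); index = node code."""
    pairs = [(ROOT, c) for c in range(FIRST_LEVEL)]
    pairs += [(0x41, 0x42), (0x42, 0x43), (0x43, 0x42)]   # 256 "AB", 257 "BC", 258 "CB"
    pairs += [(256, 0x43), (258, 0x41), (258, 0x43)]      # 259 "ABC", 260 "CBA", 261 "CBC"
    last_child = {}
    for code, (parent, _) in enumerate(pairs):
        if parent != ROOT:
            last_child[parent] = code       # codes ascend, so the last write is the maximum
    table, running = [], FIRST_LEVEL - 1
    for code, (parent, ch) in enumerate(pairs):
        running = last_child.get(code, running)   # childless nodes repeat the previous value
        table.append((parent, ch, running))
    return table


def child_range(table, code):
    """Codes of the children of `code`: (previous entry's max child + 1) .. own max child."""
    lo = table[code - 1][2] + 1 if code > 0 else FIRST_LEVEL
    hi = table[code][2]
    return range(lo, hi + 1)
```

For example, `list(child_range(build_table(), 258))` yields the codes 260 and 261, and a childless node such as 257 yields an empty range. The scheme relies on codes being consecutive within a hierarchy and on children being numbered in the order of their parents.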

文節木に対応する第１のデータと、
前記文節木の各ノードに対応する文字又は文字列についての第２のデータと、
前記文節木の階層についての第３のデータと、
を有するデータ構造であって、
前記第１のデータにおいて、
前記文節木の各ノードについて、当該ノードの親ノードの符号又は当該ノードの子ノードの符号のうち最大の符号と、当該ノードで表される文字のコードと、当該ノードが所属する階層の番号とを対応付ける対応付けデータが、各ノードの符号の順番に並べられており、
前記第２のデータは、
前記文節木の第２階層以降の各階層の各ノードについて当該ノードに対応する文字列に含まれる文字のコードを当該ノードの符号の順に含み、
前記第３のデータは、
前記文節木の第２階層以降の各階層について、当該階層に属するノードの符号のうち最小の符号と、前記第２のデータにおいて当該最小の符号に対応する文字列の配置位置の、先頭からのオフセット値とを対応付けるデータを含む
データ構造であって、
情報処理装置が、
前記第１のデータにおいて、入力符号のうちある符号の順番の対応付けデータを特定し、
特定された前記対応付けデータが前記第２階層以降についての対応付けデータであれば、特定された前記対応付けデータに含まれる階層の番号に従って前記第３のデータにおいて前記最小の符号と前記オフセット値とを特定し、
前記第２のデータにおいて、特定された前記最小の符号と前記オフセット値と前記階層の番号とに基づき特定される文字のコードを読み出す
処理に用いることが可能である
データ構造。
A data structure having:
first data corresponding to a phrase tree;
second data about the character or character string corresponding to each node of the phrase tree; and
third data about the hierarchies of the phrase tree,
wherein, in the first data,
for each node of the phrase tree, association data associating the code of the node's parent node or the maximum code among the codes of the node's child nodes, the character code represented by the node, and the number of the hierarchy to which the node belongs is arranged in the order of the node codes,
the second data
contains, for each node in the second and each subsequent hierarchy of the phrase tree, the character codes contained in the character string corresponding to the node, in the order of the node codes, and
the third data
contains, for the second and each subsequent hierarchy of the phrase tree, data associating the minimum code among the codes of the nodes belonging to that hierarchy with the offset value, measured from the beginning of the second data, of the position of the character string corresponding to that minimum code,
the data structure being usable by an information processing apparatus in processing that:
identifies, in the first data, the association data at the position given by a certain code among input codes;
if the identified association data is association data for the second or a subsequent hierarchy, identifies the minimum code and the offset value in the third data according to the hierarchy number contained in the identified association data; and
reads, from the second data, the character codes identified on the basis of the identified minimum code, the offset value, and the hierarchy number.

文節木に対応するデータ構造であって、前記文節木の各ノードについて、当該ノードの親ノードの符号と、当該ノードで表される文字のコードとを対応付ける対応付けデータが、各ノードの符号の順番に並べられたデータ構造において、入力文字列に含まれる第１の文字に対応する符号の対応付けデータを参照する処理と、
前記親ノードの符号が、参照している前記対応付けデータに対応するノードの符号となっており且つ前記ノードで表される文字のコードが前記入力文字列において次に現れる第２の文字のコードとなっている対応付けデータを、前記データ構造において探索する探索処理と、
対応付けデータが検出された場合には、当該対応付けデータを参照する第１参照処理と、
対応付けデータが検出されない場合には、参照している前記対応付けデータの符号を出力し、前記第２の文字に対応する符号の対応付けデータを参照する第２参照処理と、
前記探索処理と前記第１参照処理と前記第２参照処理とを、前記入力文字列の最後の文字を処理するまで、前記第２の文字を前記入力文字列の文字の順に移動させつつ実施し、前記入力文字列の最後の文字を処理した後に、参照している前記対応付けデータの符号を出力する処理と、
を、コンピュータに実行させるための圧縮プログラム。
In a data structure that corresponds to a phrase tree and in which, for each node of the phrase tree, association data associating the code of the node's parent node with the character code represented by the node is arranged in the order of the node codes, a process of referring to the association data of the code corresponding to a first character contained in an input character string;
a search process of searching the data structure for association data whose parent node code is the code of the node corresponding to the association data being referred to and whose character code is the code of a second character that appears next in the input character string;
a first reference process of referring to that association data when it is detected;
a second reference process of outputting, when no such association data is detected, the code of the association data being referred to and then referring to the association data of the code corresponding to the second character; and
a process of performing the search process, the first reference process, and the second reference process while advancing the second character through the input character string in order until the last character of the input character string has been processed, and of outputting, after the last character of the input character string has been processed, the code of the association data being referred to.
A compression program for causing a computer to execute the above processes.
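
The compression loop of this claim can be sketched as a longest-match walk over the parent-code table. The sketch is illustrative only: `ROOT`, `build_table`, `compress`, the example tree, and the convention that a first-hierarchy code equals the byte value are assumptions, and the search here is a plain linear scan with no range restriction.

```python
ROOT = -1  # hypothetical sentinel standing in for the root node's code


def build_table():
    """Return a list of (parent_code, char_code); the list index is the node's code."""
    table = [(ROOT, c) for c in range(256)]               # first hierarchy: codes 0x00..0xFF
    table += [(0x41, 0x42), (0x42, 0x43), (0x43, 0x42)]   # 256 "AB", 257 "BC", 258 "CB"
    table += [(256, 0x43), (258, 0x41), (258, 0x43)]      # 259 "ABC", 260 "CBA", 261 "CBC"
    return table


def compress(data, table):
    if not data:
        return []
    out = []
    cur = data[0]                           # refer to the entry of the first character
    for ch in data[1:]:                     # ch plays the role of the "second character"
        found = None
        for code in range(256, len(table)): # search process: parent == cur and char == ch
            if table[code] == (cur, ch):
                found = code
                break
        if found is not None:
            cur = found                     # first reference process: descend to the child
        else:
            out.append(cur)                 # second reference process: emit the current code
            cur = ch                        # and restart from the character's own entry
    out.append(cur)                         # after the last character, emit the current code
    return out
```

With this table, `compress(b"ABCCBAA", build_table())` produces the codes 259, 260, and 65, which correspond to the strings "ABC", "CBA", and "A".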

前記探索処理において、
前記データ構造の第１階層についての対応付けデータを参照している場合には、第２階層以降の対応付けデータを探索し、
前記データ構造の第２階層以降の階層についての対応付けデータを参照している場合には、参照している対応付けデータの符号より後ろの対応付けデータを探索する
請求項３記載の圧縮プログラム。
The compression program according to claim 3, wherein, in the search process,
when association data for the first hierarchy of the data structure is being referred to, the association data of the second and subsequent hierarchies is searched, and
when association data for the second or a subsequent hierarchy of the data structure is being referred to, the association data following the code of the association data being referred to is searched.

前記対応付けデータが、さらに前記ノードの子ノードの符号のうち最大の符号をさらに対応付けており、
前記探索処理において、
前記データ構造の第１階層についての対応付けデータを参照している場合には、第２階層に属する最初の対応付けデータから、参照している対応付けデータに含まれる最大の符号の対応付けデータまでを探索し、
前記データ構造の第２階層以降の階層についての対応付けデータを参照している場合には、参照している対応付けデータの１つ前の対応付けデータに含まれる最大の符号の対応付けデータの次の対応付けデータから、参照している対応付けデータに含まれる最大の符号の対応付けデータまでを探索する
請求項３記載の圧縮プログラム。
The compression program according to claim 3, wherein the association data further associates the maximum code among the codes of the node's child nodes, and,
in the search process,
when association data for the first hierarchy of the data structure is being referred to, the search runs from the first association data belonging to the second hierarchy to the association data of the maximum code contained in the association data being referred to, and
when association data for the second or a subsequent hierarchy of the data structure is being referred to, the search runs from the association data that follows the association data of the maximum code contained in the association data immediately preceding the association data being referred to, to the association data of the maximum code contained in the association data being referred to.
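
The restricted search range of this claim can be sketched as below. It is a sketch under assumptions, not the patent's implementation: the names `ROOT`, `FIRST_LEVEL`, `build_table`, and `compress` are hypothetical, first-hierarchy codes are taken to equal the byte values, and a childless node is assumed to repeat the previous entry's maximum child code so that its range comes out empty.

```python
ROOT = -1           # hypothetical sentinel for the root node's code
FIRST_LEVEL = 256   # number of first-hierarchy nodes (assumption: code == byte value)


def build_table():
    """Return a list of (parent_code, char_code, max_child_code); index = node code."""
    pairs = [(ROOT, c) for c in range(FIRST_LEVEL)]
    pairs += [(0x41, 0x42), (0x42, 0x43), (0x43, 0x42)]   # 256 "AB", 257 "BC", 258 "CB"
    pairs += [(256, 0x43), (258, 0x41), (258, 0x43)]      # 259 "ABC", 260 "CBA", 261 "CBC"
    last_child = {}
    for code, (parent, _) in enumerate(pairs):
        if parent != ROOT:
            last_child[parent] = code       # codes ascend, so the last write is the maximum
    table, running = [], FIRST_LEVEL - 1
    for code, (parent, ch) in enumerate(pairs):
        running = last_child.get(code, running)
        table.append((parent, ch, running))
    return table


def compress(data, table):
    if not data:
        return []
    out, cur = [], data[0]
    for ch in data[1:]:
        if cur < FIRST_LEVEL:
            lo = FIRST_LEVEL                 # first hierarchy: start of the second hierarchy
        else:
            lo = table[cur - 1][2] + 1       # entry after the preceding entry's max child
        hi = table[cur][2]                   # max child code of the current entry
        found = next((c for c in range(lo, hi + 1)
                      if table[c][0] == cur and table[c][1] == ch), None)
        if found is not None:
            cur = found
        else:
            out.append(cur)
            cur = ch
    out.append(cur)
    return out
```

The parent-code check from claim 3 is kept because, for a first-hierarchy entry, the range starts at the beginning of the second hierarchy and so can include children of other nodes.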

前記対応付けデータが、さらに当該ノードの属する階層の階層番号をさらに対応付けており、
前記探索処理において、
前記データ構造において参照している対応データの階層番号の次の階層番号が対応付けられている対応付けデータを探索する
請求項３記載の圧縮プログラム。
The compression program according to claim 3, wherein the association data further associates the hierarchy number of the hierarchy to which the node belongs, and,
in the search process,
association data associated with the hierarchy number that follows the hierarchy number of the association data being referred to in the data structure is searched.

文節木に対応する第１のデータと前記文節木の階層についての第２のデータとを有するデータ構造であって、前記第１のデータにおいて、前記文節木の各ノードについて、当該ノードの親ノードの符号と、当該ノードで表される文字のコードと、当該ノードが所属する階層とを対応付けるデータが、各ノードの符号の順番に並べられており、前記第２のデータは、前記文節木の各階層について、当該階層に属するノードの数と、当該階層に属するノードの符号のうち最小の符号とを対応付けるデータを含む前記データ構造に含まれる前記第１のデータにおいて、入力文字列に含まれる第１の文字に対応する符号の対応付けデータを参照する処理と、
前記親ノードの符号が、参照している前記対応付けデータに対応するノードの符号となっており且つ前記ノードで表される文字のコードが前記入力文字列において次に現れる第２の文字のコードとなっている対応付けデータを、前記第２のデータにおいて、参照している対応付けデータに対応するノードの階層の１階層下の階層に属するノードの数及び前記最小の符号とから特定される範囲を前記第１のデータについて探索する探索処理と、
対応付けデータが検出された場合には、当該対応付けデータを参照する第１参照処理と、
対応付けデータが検出されない場合には、参照している前記対応付けデータの符号を出力し、前記第２の文字に対応する符号の対応付けデータを参照する第２参照処理と、
前記探索処理と前記第１参照処理と前記第２参照処理とを、前記入力文字列の最後の文字を処理するまで、前記第２の文字を前記入力文字列の文字の順に移動させつつ実施し、前記入力文字列の最後の文字を処理した後に、参照している前記対応付けデータの符号を出力する処理と、
をコンピュータに実行させるための圧縮プログラム。
In first data contained in a data structure that has the first data corresponding to a phrase tree and second data about the hierarchies of the phrase tree, wherein, in the first data, for each node of the phrase tree, data associating the code of the node's parent node, the character code represented by the node, and the hierarchy to which the node belongs is arranged in the order of the node codes, and the second data contains, for each hierarchy of the phrase tree, data associating the number of nodes belonging to that hierarchy with the minimum code among the codes of the nodes belonging to that hierarchy, a process of referring to the association data of the code corresponding to a first character contained in an input character string;
a search process of searching the first data, within the range identified in the second data from the number of nodes belonging to the hierarchy one level below the hierarchy of the node corresponding to the association data being referred to and from the minimum code of that hierarchy, for association data whose parent node code is the code of the node corresponding to the association data being referred to and whose character code is the code of a second character that appears next in the input character string;
a first reference process of referring to that association data when it is detected;
a second reference process of outputting, when no such association data is detected, the code of the association data being referred to and then referring to the association data of the code corresponding to the second character; and
a process of performing the search process, the first reference process, and the second reference process while advancing the second character through the input character string in order until the last character of the input character string has been processed, and of outputting, after the last character of the input character string has been processed, the code of the association data being referred to.
A compression program for causing a computer to execute the above processes.
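
Restricting the search to the next hierarchy by means of a node count and a minimum code can be sketched as follows. The sketch is illustrative and rests on assumptions: `ROOT`, `build`, `compress`, the example tree, and the dict used for the second data are hypothetical, and first-hierarchy codes are taken to equal the byte values.

```python
ROOT = -1  # hypothetical sentinel standing in for the root node's code


def build():
    """Return (first, second): per-code entries and per-hierarchy (node count, min code)."""
    first = [(ROOT, c, 1) for c in range(256)]                       # (parent, char, hierarchy)
    first += [(0x41, 0x42, 2), (0x42, 0x43, 2), (0x43, 0x42, 2)]     # 256 "AB", 257 "BC", 258 "CB"
    first += [(256, 0x43, 3), (258, 0x41, 3), (258, 0x43, 3)]        # 259 "ABC", 260 "CBA", 261 "CBC"
    second = {}
    for code, (_, _, level) in enumerate(first):
        count, min_code = second.get(level, (0, code))   # first code seen is the minimum
        second[level] = (count + 1, min_code)
    return first, second


def compress(data, first, second):
    if not data:
        return []
    out, cur = [], data[0]
    for ch in data[1:]:
        level = first[cur][2]
        count, min_code = second.get(level + 1, (0, 0))  # empty range below the deepest hierarchy
        found = next((c for c in range(min_code, min_code + count)
                      if first[c][0] == cur and first[c][1] == ch), None)
        if found is not None:
            cur = found          # first reference process
        else:
            out.append(cur)      # second reference process
            cur = ch
    out.append(cur)
    return out
```

Because codes are consecutive within a hierarchy, the pair (node count, minimum code) is enough to delimit all candidates one hierarchy below the current entry.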

文節木に対応するデータ構造であって、前記文節木の各ノードについて、当該ノードの子ノードの符号のうち最大の符号と、当該ノードで表される文字のコードとを対応付ける対応付けデータが、各ノードの符号の順番に並べられたデータ構造において、入力文字列に含まれる第１の文字に対応する符号の対応付けデータを参照する処理と、
前記ノードで表される文字のコードが前記入力文字列において次に現れる第２の文字のコードとなっている対応付けデータを、前記データ構造において探索する探索処理と、
対応付けデータが検出された場合には、当該対応付けデータを参照する第１参照処理と、
対応付けデータが検出されない場合には、参照している前記対応付けデータの符号を出力し、前記第２の文字に対応する符号の対応付けデータを参照する第２参照処理と、
前記探索処理と前記第１参照処理と前記第２参照処理とを、前記入力文字列の最後の文字を処理するまで、前記第２の文字を前記入力文字列の文字の順に移動させつつ実施し、前記入力文字列の最後の文字を処理した後に、参照している前記対応付けデータの符号を出力する処理と、
を、コンピュータに実行させ、
前記探索処理においては、
前記データ構造の第１階層についての対応付けデータを参照している場合には、第２階層に属する最初の対応付けデータから、参照している対応付けデータに含まれる最大の符号の対応付けデータまでを探索し、
前記データ構造の第２階層以降の階層についての対応付けデータを参照している場合には、参照している対応付けデータの１つ前の対応付けデータに含まれる最大の符号の対応付けデータの次の対応付けデータから、参照している対応付けデータに含まれる最大の符号の対応付けデータまでを探索する
圧縮プログラム。
In a data structure that corresponds to a phrase tree and in which, for each node of the phrase tree, association data associating the maximum code among the codes of the node's child nodes with the character code represented by the node is arranged in the order of the node codes, a process of referring to the association data of the code corresponding to a first character contained in an input character string;
a search process of searching the data structure for association data whose character code is the code of a second character that appears next in the input character string;
a first reference process of referring to that association data when it is detected;
a second reference process of outputting, when no such association data is detected, the code of the association data being referred to and then referring to the association data of the code corresponding to the second character; and
a process of performing the search process, the first reference process, and the second reference process while advancing the second character through the input character string in order until the last character of the input character string has been processed, and of outputting, after the last character of the input character string has been processed, the code of the association data being referred to.
A compression program for causing a computer to execute the above processes, wherein,
in the search process,
when association data for the first hierarchy of the data structure is being referred to, the search runs from the first association data belonging to the second hierarchy to the association data of the maximum code contained in the association data being referred to, and
when association data for the second or a subsequent hierarchy of the data structure is being referred to, the search runs from the association data that follows the association data of the maximum code contained in the association data immediately preceding the association data being referred to, to the association data of the maximum code contained in the association data being referred to.

文節木に対応するデータ構造であって、前記文節木の各ノードについて、当該ノードの親ノードの符号と、当該ノードで表される文字のコードとを対応付ける対応付けデータが、各ノードの符号の順番に並べられたデータ構造において、入力符号のうち第１の符号の順番の対応付けデータを特定する処理と、
特定された前記対応付けデータに含まれる文字のコードをメモリに格納する格納処理と、
特定された前記対応付けデータに含まれる親ノードの符号が前記文節木の根ノード以外のノードの符号を示している場合には、当該親ノードの符号の対応付けデータを参照する参照処理と、
特定された前記対応付けデータに含まれる親ノードの符号が前記文節木の根ノードの符号を示している場合には、前記メモリに格納されている文字のコードを逆順に出力する出力処理と、
前記格納処理と前記参照処理と前記出力処理とを、前記入力符号のうち前記第１の符号より後ろの各符号について順番に実施する処理と、
を、コンピュータに実行させるための伸張プログラム。
In a data structure that corresponds to a phrase tree and in which, for each node of the phrase tree, association data associating the code of the node's parent node with the character code represented by the node is arranged in the order of the node codes, a process of identifying the association data at the position given by a first code among input codes;
a storing process of storing, in a memory, the character code contained in the identified association data;
a reference process of referring, when the parent node code contained in the identified association data indicates a node other than the root node of the phrase tree, to the association data of that parent node code;
an output process of outputting, when the parent node code contained in the identified association data indicates the root node of the phrase tree, the character codes stored in the memory in reverse order; and
a process of performing the storing process, the reference process, and the output process in turn for each code following the first code among the input codes.
A decompression program for causing a computer to execute the above processes.

文節木に対応する第１のデータと、
前記文節木の各ノードに対応する文字又は文字列についての第２のデータと、
前記文節木の階層についての第３のデータと、
を有し且つデータ格納部に格納されているデータ構造であって、
前記第１のデータにおいて、
前記文節木の各ノードについて、当該ノードが所属する階層の階層番号を含むエントリが、各ノードの符号の順番に並べられており、
前記第２のデータは、
前記文節木の各階層の各ノードについて当該ノードに対応する文字又は文字列を当該ノードの符号の順に含み、
前記第３のデータは、
前記文節木の各階層について、当該階層に属するノードの符号のうち最小の符号と、前記第２のデータにおいて当該最小の符号に対応する文字又は文字列の配置位置の、先頭からのオフセット値とを対応付けるデータを含む
前記データ構造に含まれる前記第１のデータにおいて、入力符号のうち第１の符号の順番のエントリを特定する処理と、
特定された前記エントリに含まれる階層番号に従って前記第３のデータにおいて前記最小の符号と前記オフセット値とを特定する特定処理と、
前記第２のデータから、特定された前記オフセット値に対して、特定された前記エントリの符号と特定された前記最小の符号との差に前記階層番号を乗じた値を加算することで得られる配置位置から前記階層番号分の文字又は文字列を読み出す読み出し処理と、
前記入力符号のうち前記第１の符号の後ろの第２の符号以降の各符号について、前記第１のデータにおけるエントリを特定し、前記特定処理と前記読み出し処理とを実施する処理と、
を、コンピュータに実行させるためのプログラム。 A program for causing a computer to execute processes on a data structure that is stored in a data storage unit and has:
first data corresponding to a phrase tree;
second data about the character or character string corresponding to each node of the phrase tree; and
third data about the hierarchies of the phrase tree,
wherein the first data contains, for each node of the phrase tree, an entry including the hierarchy number of the hierarchy to which the node belongs, the entries being arranged in the order of the node codes,
the second data includes, for each node in each hierarchy of the phrase tree, the character or character string corresponding to the node, in the order of the node codes, and
the third data includes, for each hierarchy of the phrase tree, data associating the minimum code among the codes of the nodes belonging to the hierarchy with the offset value, measured from the head of the second data, of the position of the character or character string corresponding to that minimum code,
the processes comprising:
a process of identifying, in the first data included in the data structure, the entry at the position given by a first code among input codes;
a specifying process of specifying the minimum code and the offset value in the third data according to the hierarchy number included in the identified entry;
a reading process of reading, from the second data, as many characters as the hierarchy number, starting at the position obtained by adding to the specified offset value the product of the hierarchy number and the difference between the code of the identified entry and the specified minimum code; and
a process of identifying the entry in the first data for each code from a second code, which follows the first code among the input codes, onward, and performing the specifying process and the reading process.
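The position arithmetic in the reading process can be illustrated with a short Python sketch. The concrete containers (a list for the first data, a string for the second data, a dictionary for the third data) and the example values are assumptions chosen for illustration. The sketch also assumes that a node in hierarchy k corresponds to a string of exactly k characters and that codes are assigned hierarchy by hierarchy, which is what makes the multiplication by the hierarchy number work.

```python
def decompress(layer_of, strings, layer_info, codes):
    """layer_of[code]    -> hierarchy number (first data)
    strings             -> concatenated node strings in code order (second data)
    layer_info[layer]   -> (minimum code, offset into strings) (third data)"""
    out = []
    for code in codes:
        layer = layer_of[code]                      # identify the entry
        min_code, offset = layer_info[layer]        # specifying process
        start = offset + (code - min_code) * layer  # position in second data
        out.append(strings[start:start + layer])    # reading process
    return "".join(out)

# Hypothetical example: hierarchy 1 holds A, B, C (codes 0..2),
# hierarchy 2 holds "AB", "BC" (codes 3..4), hierarchy 3 holds "ABC" (code 5).
LAYER_OF = [1, 1, 1, 2, 2, 3]
STRINGS = "ABC" + "ABBC" + "ABC"
LAYER_INFO = {1: (0, 0), 2: (3, 3), 3: (5, 7)}

print(decompress(LAYER_OF, STRINGS, LAYER_INFO, [5, 4, 0]))
```

Unlike the parent-code variant, each code is resolved with one table lookup and one slice, with no climb toward the root.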

文節木に対応する第１のデータと、
前記文節木の各ノードに対応する文字又は文字列についての第２のデータと、
前記文節木の階層についての第３のデータと、
を有し且つデータ格納部に格納されているデータ構造であって、
前記第１のデータにおいて、
前記文節木の各ノードについて、当該ノードが所属する階層の階層番号を含むエントリが、各ノードの符号の順番に並べられており、
前記第２のデータは、
前記文節木の第２の階層以降の各階層の各ノードについて当該ノードに対応する文字列を当該ノードの符号の順に含み、
前記第３のデータは、
前記文節木の第２の階層以降の各階層について、当該階層に属するノードの符号のうち最小の符号と、前記第２のデータにおいて当該最小の符号に対応する文字又は文字列の配置位置の、先頭からのオフセット値とを対応付けるデータを含む
前記データ構造に含まれる前記第１のデータにおいて、入力符号のうち第１の符号の順番のエントリを特定する処理と、
特定された前記エントリが第１の階層におけるエントリであれば、特定された前記エントリの符号に対応する文字を出力する出力処理と、
特定された前記エントリが第２の階層以降のエントリであれば、特定された前記エントリに含まれる階層番号に従って前記第３のデータにおいて前記最小の符号と前記オフセット値とを特定する特定処理と、
前記第２のデータにおいて、特定された前記オフセット値に対して、特定された前記エントリの符号と特定された前記最小の符号との差に前記階層番号を乗じた値を加算することで得られる配置位置から前記階層番号分の文字又は文字列を読み出す読み出し処理と、
前記入力符号のうち前記第１の符号の後ろの第２の符号以降の各符号について、前記第１のデータにおけるエントリを特定し、前記出力処理と前記特定処理と前記読み出し処理とを実施する処理と、
を、コンピュータに実行させるためのプログラム。 A program for causing a computer to execute processes on a data structure that is stored in a data storage unit and has:
first data corresponding to a phrase tree;
second data about the character or character string corresponding to each node of the phrase tree; and
third data about the hierarchies of the phrase tree,
wherein the first data contains, for each node of the phrase tree, an entry including the hierarchy number of the hierarchy to which the node belongs, the entries being arranged in the order of the node codes,
the second data includes, for each node in the second and each subsequent hierarchy of the phrase tree, the character string corresponding to the node, in the order of the node codes, and
the third data includes, for the second and each subsequent hierarchy of the phrase tree, data associating the minimum code among the codes of the nodes belonging to the hierarchy with the offset value, measured from the head of the second data, of the position of the character or character string corresponding to that minimum code,
the processes comprising:
a process of identifying, in the first data included in the data structure, the entry at the position given by a first code among input codes;
an output process of outputting the character corresponding to the code of the identified entry if the identified entry is an entry in the first hierarchy;
a specifying process of specifying, if the identified entry is an entry in the second or a subsequent hierarchy, the minimum code and the offset value in the third data according to the hierarchy number included in the identified entry;
a reading process of reading, from the second data, as many characters as the hierarchy number, starting at the position obtained by adding to the specified offset value the product of the hierarchy number and the difference between the code of the identified entry and the specified minimum code; and
a process of identifying the entry in the first data for each code from a second code, which follows the first code among the input codes, onward, and performing the output process, the specifying process, and the reading process.
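A sketch of this variant, in which first-hierarchy codes are output directly and only the second and later hierarchies are stored in the second data, might look like the following. It assumes, purely for illustration, that the first hierarchy has 256 nodes whose codes equal the character codes (so `chr(code)` recovers the character); the container types and example values are likewise hypothetical.

```python
FIRST_LAYER_SIZE = 256  # assumption: one first-hierarchy node per byte value

def decompress(layer_of, strings, layer_info, codes):
    out = []
    for code in codes:
        layer = layer_of[code]                      # identify the entry
        if layer == 1:
            out.append(chr(code))                   # output process
            continue
        min_code, offset = layer_info[layer]        # specifying process
        start = offset + (code - min_code) * layer
        out.append(strings[start:start + layer])    # reading process
    return "".join(out)

# Hypothetical example: codes 256 and 257 are "AB" and "BC" (hierarchy 2),
# code 258 is "ABC" (hierarchy 3). The first hierarchy needs no stored strings.
LAYER_OF = [1] * FIRST_LAYER_SIZE + [2, 2, 3]
STRINGS = "ABBC" + "ABC"
LAYER_INFO = {2: (256, 0), 3: (258, 4)}

print(decompress(LAYER_OF, STRINGS, LAYER_INFO, [258, 257, ord("A")]))
```

Leaving the first hierarchy out of the second and third data saves the space for the single-character strings at the cost of one extra branch per code.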

文節木に対応するデータ構造であって、前記文節木の各ノードについて、当該ノードの親ノードの符号と、当該ノードで表される文字のコードとを対応付ける対応付けデータが、各ノードの符号の順番に並べられたデータ構造において、入力文字列に含まれる第１の文字に対応する符号の対応付けデータを参照する処理と、
前記親ノードの符号が、参照している前記対応付けデータに対応するノードの符号となっており且つ前記ノードで表される文字のコードが前記入力文字列において次に現れる第２の文字のコードとなっている対応付けデータを、前記データ構造において探索する探索処理と、
対応付けデータが検出された場合には、当該対応付けデータを参照する第１参照処理と、
対応付けデータが検出されない場合には、参照している前記対応付けデータの符号を出力し、前記第２の文字に対応する符号の対応付けデータを参照する第２参照処理と、
前記探索処理と前記第１参照処理と前記第２参照処理とを、前記入力文字列の最後の文字を処理するまで、前記第２の文字を前記入力文字列の文字の順に移動させつつ実施し、前記入力文字列の最後の文字を処理した後に、参照している前記対応付けデータの符号を出力する処理と、
が、コンピュータにより実行される圧縮方法。 A compression method in which a computer executes:
a process of referring, in a data structure that corresponds to a phrase tree and in which, for each node of the phrase tree, association data associating the code of the node's parent node with the code of the character represented by the node is arranged in the order of the node codes, to the association data of the code corresponding to a first character included in an input character string;
a search process of searching the data structure for association data in which the parent node code is the code of the node corresponding to the association data being referred to and the character code represented by the node is the code of a second character that appears next in the input character string;
a first reference process of referring to that association data when it is detected;
a second reference process of, when no such association data is detected, outputting the code of the association data being referred to and referring to the association data of the code corresponding to the second character; and
a process of performing the search process, the first reference process, and the second reference process while advancing the second character through the input character string in character order until the last character of the input character string has been processed, and then outputting the code of the association data being referred to.
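The search and reference processes of this compression method can be sketched in Python as below. The sketch assumes a non-empty input string, a sentinel `ROOT = -1` for the root node's code, and a hypothetical `first_code` mapping from a character to its first-layer code; the search is a plain linear scan over the whole table, which the claim does not require or exclude.

```python
ROOT = -1  # assumed sentinel standing for the code of the root node

def compress(table, first_code, text):
    """table[code] = (parent_code, character), arranged in code order."""
    cur = first_code[text[0]]            # refer to the first character's entry
    out = []
    for ch in text[1:]:                  # ch plays the role of the second character
        # search process: entry whose parent is cur and whose character is ch
        nxt = next((code for code, (parent, c) in enumerate(table)
                    if parent == cur and c == ch), None)
        if nxt is not None:
            cur = nxt                    # first reference process
        else:
            out.append(cur)              # second reference process: emit code,
            cur = first_code[ch]         # then restart from the first layer
    out.append(cur)                      # emit the code still being referred to
    return out

# Hypothetical table: codes 0..2 are A, B, C; code 3 is "AB"; code 4 is "ABC".
TABLE = [(ROOT, "A"), (ROOT, "B"), (ROOT, "C"), (0, "B"), (3, "C")]
FIRST_CODE = {"A": 0, "B": 1, "C": 2}

print(compress(TABLE, FIRST_CODE, "ABCABBC"))
```

The full-table scan is the simplest possible search; the later claims narrow the searched range using per-hierarchy or per-node information.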

文節木に対応するデータ構造であって、前記文節木の各ノードについて、当該ノードの親ノードの符号と、当該ノードで表される文字のコードとを対応付ける対応付けデータが、各ノードの符号の順番に並べられたデータ構造において、入力文字列に含まれる第１の文字に対応する符号の対応付けデータを参照する手段と、
前記親ノードの符号が、参照している前記対応付けデータに対応するノードの符号となっており且つ前記ノードで表される文字のコードが前記入力文字列において次に現れる第２の文字のコードとなっている対応付けデータを、前記データ構造において探索する探索手段と、
対応付けデータが検出された場合には、当該対応付けデータを参照する第１参照手段と、
対応付けデータが検出されない場合には、参照している前記対応付けデータの符号を出力し、前記第２の文字に対応する符号の対応付けデータを参照する第２参照手段と、
前記探索手段と前記第１参照手段と前記第２参照手段とを、前記入力文字列の最後の文字を処理するまで、前記第２の文字を前記入力文字列の文字の順に移動させつつ動作させ、前記入力文字列の最後の文字を処理した後に、参照している前記対応付けデータの符号を出力する手段と、
を有する情報処理装置。 An information processing apparatus comprising:
means for referring, in a data structure that corresponds to a phrase tree and in which, for each node of the phrase tree, association data associating the code of the node's parent node with the code of the character represented by the node is arranged in the order of the node codes, to the association data of the code corresponding to a first character included in an input character string;
search means for searching the data structure for association data in which the parent node code is the code of the node corresponding to the association data being referred to and the character code represented by the node is the code of a second character that appears next in the input character string;
first reference means for referring to that association data when it is detected;
second reference means for, when no such association data is detected, outputting the code of the association data being referred to and referring to the association data of the code corresponding to the second character; and
means for operating the search means, the first reference means, and the second reference means while advancing the second character through the input character string in character order until the last character of the input character string has been processed, and then outputting the code of the association data being referred to.

文節木に対応する第１のデータと前記文節木の階層についての第２のデータとを有するデータ構造であって、前記第１のデータにおいて、前記文節木の各ノードについて、当該ノードの親ノードの符号と、当該ノードで表される文字のコードと、当該ノードが所属する階層とを対応付けデータが、各ノードの符号の順番に並べられており、前記第２のデータは、前記文節木の各階層について、当該階層に属するノードの数と、当該階層に属するノードの符号のうち最小の符号とを対応付けるデータを含む前記データ構造に含まれる前記第１のデータにおいて、入力文字列に含まれる第１の文字に対応する符号の対応付けデータを参照する処理と、
前記親ノードの符号が、参照している前記対応付けデータに対応するノードの符号となっており且つ前記ノードで表される文字のコードが前記入力文字列において次に現れる第２の文字のコードとなっている対応付けデータを、前記第２のデータにおいて、参照している対応付けデータに対応するノードの階層の１階層下の階層に属するノードの数及び前記最小の符号とから特定される範囲を前記第１のデータについて探索する探索処理と、
対応付けデータが検出された場合には、当該対応付けデータを参照する第１参照処理と、
対応付けデータが検出されない場合には、参照している前記対応付けデータの符号を出力し、前記第２の文字に対応する符号の対応付けデータを参照する第２参照処理と、
前記探索処理と前記第１参照処理と前記第２参照処理とを、前記入力文字列の最後の文字を処理するまで、前記第２の文字を前記入力文字列の文字の順に移動させつつ実施し、前記入力文字列の最後の文字を処理した後に、参照している前記対応付けデータの符号を出力する処理と、
が、コンピュータにより実行される圧縮方法。 A compression method in which a computer executes:
a process of referring, in first data included in a data structure having the first data, which corresponds to a phrase tree, and second data about the hierarchies of the phrase tree, to the association data of the code corresponding to a first character included in an input character string, wherein in the first data, for each node of the phrase tree, association data associating the code of the node's parent node, the code of the character represented by the node, and the hierarchy to which the node belongs is arranged in the order of the node codes, and the second data includes, for each hierarchy of the phrase tree, data associating the number of nodes belonging to the hierarchy with the minimum code among the codes of the nodes belonging to the hierarchy;
a search process of searching the first data, within the range specified from the number of nodes and the minimum code that the second data gives for the hierarchy one level below the hierarchy of the node corresponding to the association data being referred to, for association data in which the parent node code is the code of the node corresponding to the association data being referred to and the character code represented by the node is the code of a second character that appears next in the input character string;
a first reference process of referring to that association data when it is detected;
a second reference process of, when no such association data is detected, outputting the code of the association data being referred to and referring to the association data of the code corresponding to the second character; and
a process of performing the search process, the first reference process, and the second reference process while advancing the second character through the input character string in character order until the last character of the input character string has been processed, and then outputting the code of the association data being referred to.
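The range-limited search of this method can be sketched in Python as follows. The tuple layout `(parent_code, character, hierarchy)`, the dictionary `layers` mapping a hierarchy number to `(node count, minimum code)`, the sentinel `ROOT = -1`, and the `first_code` mapping are all assumptions made for the example; a non-empty input is also assumed.

```python
ROOT = -1  # assumed sentinel standing for the code of the root node

def find_child(nodes, layers, cur, ch):
    """Search only the codes of the hierarchy one level below nodes[cur]."""
    below = nodes[cur][2] + 1
    if below not in layers:              # no deeper hierarchy exists
        return None
    count, min_code = layers[below]
    for code in range(min_code, min_code + count):
        parent, c, _ = nodes[code]
        if parent == cur and c == ch:
            return code
    return None

def compress(nodes, layers, first_code, text):
    cur = first_code[text[0]]
    out = []
    for ch in text[1:]:
        nxt = find_child(nodes, layers, cur, ch)   # search process
        if nxt is not None:
            cur = nxt                              # first reference process
        else:
            out.append(cur)                        # second reference process
            cur = first_code[ch]
    out.append(cur)
    return out

# Hypothetical tree: A, B, C (codes 0..2); "AB" (3) and "BC" (4); "ABC" (5).
NODES = [(ROOT, "A", 1), (ROOT, "B", 1), (ROOT, "C", 1),
         (0, "B", 2), (1, "C", 2), (3, "C", 3)]
LAYERS = {1: (3, 0), 2: (2, 3), 3: (1, 5)}
FIRST_CODE = {"A": 0, "B": 1, "C": 2}

print(compress(NODES, LAYERS, FIRST_CODE, "ABCBC"))
```

Because codes are arranged hierarchy by hierarchy, a child can only lie in the contiguous code range of the next hierarchy, so the scan never touches entries of other hierarchies.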

文節木に対応する第１のデータと前記文節木の階層についての第２のデータとを有するデータ構造であって、前記第１のデータにおいて、前記文節木の各ノードについて、当該ノードの親ノードの符号と、当該ノードで表される文字のコードと、当該ノードが所属する階層とを対応付けるデータが、各ノードの符号の順番に並べられており、前記第２のデータは、前記文節木の各階層について、当該階層に属するノードの数と、当該階層に属するノードの符号のうち最小の符号とを対応付けるデータを含む前記データ構造に含まれる前記第１のデータにおいて、入力文字列に含まれる第１の文字に対応する符号の対応付けデータを参照する手段と、
前記親ノードの符号が、参照している前記対応付けデータに対応するノードの符号となっており且つ前記ノードで表される文字のコードが前記入力文字列において次に現れる第２の文字のコードとなっている対応付けデータを、前記第２のデータにおいて、参照している対応付けデータに対応するノードの階層の１階層下の階層に属するノードの数及び前記最小の符号とから特定される範囲を前記第１のデータについて探索する探索手段と、
対応付けデータが検出された場合には、当該対応付けデータを参照する第１参照手段と、
対応付けデータが検出されない場合には、参照している前記対応付けデータの符号を出力し、前記第２の文字に対応する符号の対応付けデータを参照する第２参照手段と、
前記探索手段と前記第１参照手段と前記第２参照手段とを、前記入力文字列の最後の文字を処理するまで、前記第２の文字を前記入力文字列の文字の順に移動させつつ動作させ、前記入力文字列の最後の文字を処理した後に、参照している前記対応付けデータの符号を出力する手段と、
を有する情報処理装置。 An information processing apparatus comprising:
means for referring, in first data included in a data structure having the first data, which corresponds to a phrase tree, and second data about the hierarchies of the phrase tree, to the association data of the code corresponding to a first character included in an input character string, wherein in the first data, for each node of the phrase tree, data associating the code of the node's parent node, the code of the character represented by the node, and the hierarchy to which the node belongs is arranged in the order of the node codes, and the second data includes, for each hierarchy of the phrase tree, data associating the number of nodes belonging to the hierarchy with the minimum code among the codes of the nodes belonging to the hierarchy;
search means for searching the first data, within the range specified from the number of nodes and the minimum code that the second data gives for the hierarchy one level below the hierarchy of the node corresponding to the association data being referred to, for association data in which the parent node code is the code of the node corresponding to the association data being referred to and the character code represented by the node is the code of a second character that appears next in the input character string;
first reference means for referring to that association data when it is detected;
second reference means for, when no such association data is detected, outputting the code of the association data being referred to and referring to the association data of the code corresponding to the second character; and
means for operating the search means, the first reference means, and the second reference means while advancing the second character through the input character string in character order until the last character of the input character string has been processed, and then outputting the code of the association data being referred to.

文節木に対応するデータ構造であって、前記文節木の各ノードについて、当該ノードの子ノードの符号のうち最大の符号と、当該ノードで表される文字のコードとを対応付ける対応付けデータが、各ノードの符号の順番に並べられたデータ構造において、入力文字列に含まれる第１の文字に対応する符号の対応付けデータを参照する処理と、
前記ノードで表される文字のコードが前記入力文字列において次に現れる第２の文字のコードとなっている対応付けデータを、前記データ構造において探索する探索処理と、
対応付けデータが検出された場合には、当該対応付けデータを参照する第１参照処理と、
対応付けデータが検出されない場合には、参照している前記対応付けデータの符号を出力し、前記第２の文字に対応する符号の対応付けデータを参照する第２参照処理と、
前記探索処理と前記第１参照処理と前記第２参照処理とを、前記入力文字列の最後の文字を処理するまで、前記第２の文字を前記入力文字列の文字の順に移動させつつ実施し、前記入力文字列の最後の文字を処理した後に、参照している前記対応付けデータの符号を出力する処理と、
が、コンピュータにより実行され、
前記探索処理においては、
前記データ構造の第１階層についての対応付けデータを参照している場合には、第２階層に属する最初の対応付けデータから、参照している対応付けデータに含まれる最大の符号の対応付けデータまでを探索し、
前記データ構造の第２階層以降の階層についての対応付けデータを参照している場合には、参照している対応付けデータの１つ前の対応付けデータに含まれる最大の符号の対応付けデータの次の対応付けデータから、参照している対応付けデータに含まれる最大の符号の対応付けデータまでを探索する
圧縮方法。 A compression method in which a computer executes:
a process of referring, in a data structure that corresponds to a phrase tree and in which, for each node of the phrase tree, association data associating the maximum code among the codes of the node's child nodes with the code of the character represented by the node is arranged in the order of the node codes, to the association data of the code corresponding to a first character included in an input character string;
a search process of searching the data structure for association data in which the character code represented by the node is the code of a second character that appears next in the input character string;
a first reference process of referring to that association data when it is detected;
a second reference process of, when no such association data is detected, outputting the code of the association data being referred to and referring to the association data of the code corresponding to the second character; and
a process of performing the search process, the first reference process, and the second reference process while advancing the second character through the input character string in character order until the last character of the input character string has been processed, and then outputting the code of the association data being referred to,
wherein, in the search process,
when association data for the first hierarchy of the data structure is being referred to, the search runs from the first association data belonging to the second hierarchy up to the association data of the maximum code included in the association data being referred to, and
when association data for the second or a later hierarchy of the data structure is being referred to, the search runs from the association data following the association data of the maximum code included in the association data immediately preceding the one being referred to, up to the association data of the maximum code included in the association data being referred to.

文節木に対応するデータ構造であって、前記文節木の各ノードについて、当該ノードの子ノードの符号のうち最大の符号と、当該ノードで表される文字のコードとを対応付ける対応付けデータが、各ノードの符号の順番に並べられたデータ構造において、入力文字列に含まれる第１の文字に対応する符号の対応付けデータを参照する手段と、
前記ノードで表される文字のコードが前記入力文字列において次に現れる第２の文字のコードとなっている対応付けデータを、前記データ構造において探索する探索手段と、
対応付けデータが検出された場合には、当該対応付けデータを参照する第１参照手段と、
対応付けデータが検出されない場合には、参照している前記対応付けデータの符号を出力し、前記第２の文字に対応する符号の対応付けデータを参照する第２参照手段と、
前記探索手段と前記第１参照手段と前記第２参照手段とを、前記入力文字列の最後の文字を処理するまで、前記第２の文字を前記入力文字列の文字の順に移動させつつ動作させ、前記入力文字列の最後の文字を処理した後に、参照している前記対応付けデータの符号を出力する手段と、
を有し、
前記探索手段が、
前記データ構造の第１階層についての対応付けデータを参照している場合には、第２階層に属する最初の対応付けデータから、参照している対応付けデータに含まれる最大の符号の対応付けデータまでを探索し、
前記データ構造の第２階層以降の階層についての対応付けデータを参照している場合には、参照している対応付けデータの１つ前の対応付けデータに含まれる最大の符号の対応付けデータの次の対応付けデータから、参照している対応付けデータに含まれる最大の符号の対応付けデータまでを探索する
情報処理装置。 An information processing apparatus comprising:
means for referring, in a data structure that corresponds to a phrase tree and in which, for each node of the phrase tree, association data associating the maximum code among the codes of the node's child nodes with the code of the character represented by the node is arranged in the order of the node codes, to the association data of the code corresponding to a first character included in an input character string;
search means for searching the data structure for association data in which the character code represented by the node is the code of a second character that appears next in the input character string;
first reference means for referring to that association data when it is detected;
second reference means for, when no such association data is detected, outputting the code of the association data being referred to and referring to the association data of the code corresponding to the second character; and
means for operating the search means, the first reference means, and the second reference means while advancing the second character through the input character string in character order until the last character of the input character string has been processed, and then outputting the code of the association data being referred to,
wherein the search means,
when association data for the first hierarchy of the data structure is being referred to, searches from the first association data belonging to the second hierarchy up to the association data of the maximum code included in the association data being referred to, and
when association data for the second or a later hierarchy of the data structure is being referred to, searches from the association data following the association data of the maximum code included in the association data immediately preceding the one being referred to, up to the association data of the maximum code included in the association data being referred to.

文節木に対応するデータ構造であって、前記文節木の各ノードについて、当該ノードの親ノードの符号と、当該ノードで表される文字のコードとを対応付ける対応付けデータが、各ノードの符号の順番に並べられたデータ構造において、入力符号のうち第１の符号の順番の対応付けデータを特定する処理と、
特定された前記対応付けデータに含まれる文字のコードをメモリに格納する格納処理と、
特定された前記対応付けデータに含まれる親ノードの符号が前記文節木の根ノード以外のノードの符号を示している場合には、当該親ノードの符号の対応付けデータを参照する参照処理と、
特定された前記対応付けデータに含まれる親ノードの符号が前記文節木の根ノードの符号を示している場合には、前記メモリに格納されている文字のコードを逆順に出力する出力処理と、
前記格納処理と前記参照処理と前記出力処理とを、前記入力符号のうち前記第１の符号より後ろの各符号について順番に実施する処理と、
が、コンピュータにより実行される伸張方法。 A decompression method in which a computer executes:
a process of identifying, in a data structure that corresponds to a phrase tree and in which, for each node of the phrase tree, association data associating the code of the node's parent node with the code of the character represented by the node is arranged in the order of the node codes, the association data at the position given by a first code among input codes;
a storing process of storing, in a memory, the character code included in the identified association data;
a reference process of referring to the association data of the parent node's code when the parent node code included in the identified association data indicates the code of a node other than the root node of the phrase tree;
an output process of outputting the character codes stored in the memory in reverse order when the parent node code included in the identified association data indicates the code of the root node of the phrase tree; and
a process of performing the storing process, the reference process, and the output process in order for each code after the first code among the input codes.

文節木に対応するデータ構造であって、前記文節木の各ノードについて、当該ノードの親ノードの符号と、当該ノードで表される文字のコードとを対応付ける対応付けデータが、各ノードの符号の順番に並べられたデータ構造において、入力符号のうち第１の符号の順番の対応付けデータを特定する手段と、
特定された前記対応付けデータに含まれる文字のコードをメモリに格納する格納処理手段と、
特定された前記対応付けデータに含まれる親ノードの符号が前記文節木の根ノード以外のノードの符号を示している場合には、当該親ノードの符号の対応付けデータを参照する参照手段と、
特定された前記対応付けデータに含まれる親ノードの符号が前記文節木の根ノードの符号を示している場合には、前記メモリに格納されている文字のコードを逆順に出力する出力手段と、
前記格納処理手段と前記参照手段と前記出力手段とを、前記入力符号のうち前記第１の符号より後ろの各符号について順番に動作させる手段と、
を有する情報処理装置。 An information processing apparatus comprising:
means for identifying, in a data structure that corresponds to a phrase tree and in which, for each node of the phrase tree, association data associating the code of the node's parent node with the code of the character represented by the node is arranged in the order of the node codes, the association data at the position given by a first code among input codes;
storage processing means for storing, in a memory, the character code included in the identified association data;
reference means for referring to the association data of the parent node's code when the parent node code included in the identified association data indicates the code of a node other than the root node of the phrase tree;
output means for outputting the character codes stored in the memory in reverse order when the parent node code included in the identified association data indicates the code of the root node of the phrase tree; and
means for operating the storage processing means, the reference means, and the output means in order for each code after the first code among the input codes.

文節木に対応する第１のデータと、
前記文節木の各ノードに対応する文字又は文字列についての第２のデータと、
前記文節木の階層についての第３のデータと、
を有し且つデータ格納部に格納されているデータ構造であって、
前記第１のデータにおいて、
前記文節木の各ノードについて、当該ノードが所属する階層の階層番号を含むエントリが、各ノードの符号の順番に並べられており、
前記第２のデータは、
前記文節木の各階層の各ノードについて当該ノードに対応する文字又は文字列を当該ノードの符号の順に含み、
前記第３のデータは、
前記文節木の各階層について、当該階層に属するノードの符号のうち最小の符号と、前記第２のデータにおいて当該最小の符号に対応する文字又は文字列の配置位置の、先頭からのオフセット値とを対応付けるデータを含む
前記データ構造に含まれる前記第１のデータにおいて、入力符号のうち第１の符号の順番のエントリを特定する処理と、
特定された前記エントリに含まれる階層番号に従って前記第３のデータにおいて前記最小の符号と前記オフセット値とを特定する特定処理と、
前記第２のデータから、特定された前記オフセット値に対して、特定された前記エントリの符号と特定された前記最小の符号との差に前記階層番号を乗じた値を加算することで得られる配置位置から前記階層番号分の文字又は文字列を読み出す読み出し処理と、
前記入力符号のうち前記第１の符号の後ろの第２の符号以降の各符号について、前記第１のデータにおけるエントリを特定し、前記特定処理と前記読み出し処理とを実施する処理と、
が、コンピュータにより実行される伸張方法。 A decompression method in which a computer executes processes on a data structure that is stored in a data storage unit and has:
first data corresponding to a phrase tree;
second data about the character or character string corresponding to each node of the phrase tree; and
third data about the hierarchies of the phrase tree,
wherein the first data contains, for each node of the phrase tree, an entry including the hierarchy number of the hierarchy to which the node belongs, the entries being arranged in the order of the node codes,
the second data includes, for each node in each hierarchy of the phrase tree, the character or character string corresponding to the node, in the order of the node codes, and
the third data includes, for each hierarchy of the phrase tree, data associating the minimum code among the codes of the nodes belonging to the hierarchy with the offset value, measured from the head of the second data, of the position of the character or character string corresponding to that minimum code,
the processes comprising:
a process of identifying, in the first data included in the data structure, the entry at the position given by a first code among input codes;
a specifying process of specifying the minimum code and the offset value in the third data according to the hierarchy number included in the identified entry;
a reading process of reading, from the second data, as many characters as the hierarchy number, starting at the position obtained by adding to the specified offset value the product of the hierarchy number and the difference between the code of the identified entry and the specified minimum code; and
a process of identifying the entry in the first data for each code from a second code, which follows the first code among the input codes, onward, and performing the specifying process and the reading process.

文節木に対応する第１のデータと、
前記文節木の各ノードに対応する文字又は文字列についての第２のデータと、
前記文節木の階層についての第３のデータと、
を有し且つデータ格納部に格納されているデータ構造であって、
前記第１のデータにおいて、
前記文節木の各ノードについて、当該ノードが所属する階層の階層番号を含むエントリが、各ノードの符号の順番に並べられており、
前記第２のデータは、
前記文節木の各階層の各ノードについて当該ノードに対応する文字又は文字列を当該ノードの符号の順に含み、
前記第３のデータは、
前記文節木の各階層について、当該階層に属するノードの符号のうち最小の符号と、前記第２のデータにおいて当該最小の符号に対応する文字又は文字列の配置位置の、先頭からのオフセット値とを対応付けるデータを含む
前記データ構造に含まれる前記第１のデータにおいて、入力符号のうち第１の符号の順番のエントリを特定する手段と、
特定された前記エントリに含まれる階層番号に従って前記第３のデータにおいて前記最小の符号と前記オフセット値とを特定する特定手段と、
前記第２のデータから、特定された前記オフセット値に対して、特定された前記エントリの符号と特定された前記最小の符号との差に前記階層番号を乗じた値を加算することで得られる配置位置から前記階層番号分の文字又は文字列を読み出す読み出し手段と、
前記入力符号のうち前記第１の符号の後ろの第２の符号以降の各符号について、前記第１のデータにおけるエントリを特定し、前記特定手段と前記読み出し手段とを動作させる手段と、
を有する情報処理装置。 An information processing apparatus that operates on a data structure that is stored in a data storage unit and has:
first data corresponding to a phrase tree;
second data about the character or character string corresponding to each node of the phrase tree; and
third data about the hierarchies of the phrase tree,
wherein the first data contains, for each node of the phrase tree, an entry including the hierarchy number of the hierarchy to which the node belongs, the entries being arranged in the order of the node codes,
the second data includes, for each node in each hierarchy of the phrase tree, the character or character string corresponding to the node, in the order of the node codes, and
the third data includes, for each hierarchy of the phrase tree, data associating the minimum code among the codes of the nodes belonging to the hierarchy with the offset value, measured from the head of the second data, of the position of the character or character string corresponding to that minimum code,
the apparatus comprising:
means for identifying, in the first data included in the data structure, the entry at the position given by a first code among input codes;
specifying means for specifying the minimum code and the offset value in the third data according to the hierarchy number included in the identified entry;
reading means for reading, from the second data, as many characters as the hierarchy number, starting at the position obtained by adding to the specified offset value the product of the hierarchy number and the difference between the code of the identified entry and the specified minimum code; and
means for identifying the entry in the first data for each code from a second code, which follows the first code among the input codes, onward, and operating the specifying means and the reading means.

文節木に対応する第１のデータと、
前記文節木の各ノードに対応する文字又は文字列についての第２のデータと、
前記文節木の階層についての第３のデータと、
を有し且つデータ格納部に格納されているデータ構造であって、
前記第１のデータにおいて、
前記文節木の各ノードについて、当該ノードが所属する階層の階層番号を含むエントリが、各ノードの符号の順番に並べられており、
前記第２のデータは、
前記文節木の第２の階層以降の各階層の各ノードについて当該ノードに対応する文字列を当該ノードの符号の順に含み、
前記第３のデータは、
前記文節木の第２の階層以降の各階層について、当該階層に属するノードの符号のうち最小の符号と、前記第２のデータにおいて当該最小の符号に対応する文字又は文字列の配置位置の、先頭からのオフセット値とを対応付けるデータを含む
前記データ構造に含まれる前記第１のデータにおいて、入力符号のうち第１の符号の順番のエントリを特定する処理と、
特定された前記エントリが第１の階層におけるエントリであれば、特定された前記エントリの符号に対応する文字を出力する出力処理と、
特定された前記エントリが第２の階層以降のエントリであれば、特定された前記エントリに含まれる階層番号に従って前記第３のデータにおいて前記最小の符号と前記オフセット値とを特定する特定処理と、
前記第２のデータにおいて、特定された前記オフセット値に対して、特定された前記エントリの符号と特定された前記最小の符号との差に前記階層番号を乗じた値を加算することで得られる配置位置から前記階層番号分の文字又は文字列を読み出す読み出し処理と、
前記入力符号のうち前記第１の符号の後ろの第２の符号以降の各符号について、前記第１のデータにおけるエントリを特定し、前記出力処理と前記特定処理と前記読み出し処理とを実施する処理と、
が、コンピュータにより実行される伸張方法。 A decompression method in which a computer executes processes on a data structure that is stored in a data storage unit and has:
first data corresponding to a phrase tree;
second data about the character or character string corresponding to each node of the phrase tree; and
third data about the hierarchies of the phrase tree,
wherein the first data contains, for each node of the phrase tree, an entry including the hierarchy number of the hierarchy to which the node belongs, the entries being arranged in the order of the node codes,
the second data includes, for each node in the second and each subsequent hierarchy of the phrase tree, the character string corresponding to the node, in the order of the node codes, and
the third data includes, for the second and each subsequent hierarchy of the phrase tree, data associating the minimum code among the codes of the nodes belonging to the hierarchy with the offset value, measured from the head of the second data, of the position of the character or character string corresponding to that minimum code,
the processes comprising:
a process of identifying, in the first data included in the data structure, the entry at the position given by a first code among input codes;
an output process of outputting the character corresponding to the code of the identified entry if the identified entry is an entry in the first hierarchy;
a specifying process of specifying, if the identified entry is an entry in the second or a subsequent hierarchy, the minimum code and the offset value in the third data according to the hierarchy number included in the identified entry;
a reading process of reading, from the second data, as many characters as the hierarchy number, starting at the position obtained by adding to the specified offset value the product of the hierarchy number and the difference between the code of the identified entry and the specified minimum code; and
a process of identifying the entry in the first data for each code from a second code, which follows the first code among the input codes, onward, and performing the output process, the specifying process, and the reading process.

文節木に対応する第１のデータと、
前記文節木の各ノードに対応する文字又は文字列についての第２のデータと、
前記文節木の階層についての第３のデータと、
を有し且つデータ格納部に格納されているデータ構造であって、
前記第１のデータにおいて、
前記文節木の各ノードについて、当該ノードが所属する階層の階層番号を含むエントリが、各ノードの符号の順番に並べられており、
前記第２のデータは、
前記文節木の第２の階層以降の各階層の各ノードについて当該ノードに対応する文字列を当該ノードの符号の順に含み、
前記第３のデータは、
前記文節木の第２の階層以降の各階層について、当該階層に属するノードの符号のうち最小の符号と、前記第２のデータにおいて当該最小の符号に対応する文字又は文字列の配置位置の、先頭からのオフセット値とを対応付けるデータを含む
前記データ構造に含まれる前記第１のデータにおいて、入力符号のうち第１の符号の順番のエントリを特定する手段と、
特定された前記エントリが第１の階層におけるエントリであれば、特定された前記エントリの符号に対応する文字を出力する出力手段と、
特定された前記エントリが第２の階層以降のエントリであれば、特定された前記エントリに含まれる階層番号に従って前記第３のデータにおいて前記最小の符号と前記オフセット値とを特定する特定手段と、
前記第２のデータにおいて、特定された前記オフセット値に対して、特定された前記エントリの符号と特定された前記最小の符号との差に前記階層番号を乗じた値を加算することで得られる配置位置から前記階層番号分の文字又は文字列を読み出す読み出し手段と、
前記入力符号のうち前記第１の符号の後ろの第２の符号以降の各符号について、前記第１のデータにおけるエントリを特定し、前記出力手段と前記特定手段と前記読み出し手段とを動作させる処理と、
を有する情報処理装置。 An information processing apparatus that operates on a data structure that is stored in a data storage unit and has:
first data corresponding to a phrase tree;
second data about the character or character string corresponding to each node of the phrase tree; and
third data about the hierarchies of the phrase tree,
wherein the first data contains, for each node of the phrase tree, an entry including the hierarchy number of the hierarchy to which the node belongs, the entries being arranged in the order of the node codes,
the second data includes, for each node in the second and each subsequent hierarchy of the phrase tree, the character string corresponding to the node, in the order of the node codes, and
the third data includes, for the second and each subsequent hierarchy of the phrase tree, data associating the minimum code among the codes of the nodes belonging to the hierarchy with the offset value, measured from the head of the second data, of the position of the character or character string corresponding to that minimum code,
the apparatus comprising:
means for identifying, in the first data included in the data structure, the entry at the position given by a first code among input codes;
output means for outputting the character corresponding to the code of the identified entry if the identified entry is an entry in the first hierarchy;
specifying means for specifying, if the identified entry is an entry in the second or a subsequent hierarchy, the minimum code and the offset value in the third data according to the hierarchy number included in the identified entry;
reading means for reading, from the second data, as many characters as the hierarchy number, starting at the position obtained by adding to the specified offset value the product of the hierarchy number and the difference between the code of the identified entry and the specified minimum code; and
a process of identifying the entry in the first data for each code from a second code, which follows the first code among the input codes, onward, and operating the output means, the specifying means, and the reading means.