JP2019003501A

JP2019003501A - Information processor, information processing method, and program

Info

Publication number: JP2019003501A
Application number: JP2017118876A
Authority: JP
Inventors: Minoru Nakamura (中村 実)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2017-06-16
Filing date: 2017-06-16
Publication date: 2019-01-10

Abstract

To provide a storage technology enabling high-speed search of documents whose data type is not determined and which include field names and values corresponding to the field names.
SOLUTION: An information processor includes: an acquisition part that acquires a document whose data type is not determined and which includes a field name and a value corresponding to the field name; a holding part that holds field name specification information, which specifies the field names described in the document, in the description order of the field names in the document; a generation part that generates description-order type relationship information associating the description order with type information of the document specified in accordance with the description order; and a storage control part that stores, in a storage region, type-value relationship information associating the type information with the values corresponding to the field names indicated by the field name specification information held in the description order.
SELECTED DRAWING: Figure 4

Description

The present invention relates to an information processing apparatus, an information processing method, and a program.

A database (DB) is a mechanism for making efficient use of data handled on a computer or an information communication network, and a database management system (DBMS) is a system that manages such a DB.

DBs include the relational DB (RDB), which stores data in two-dimensional tables that can be joined as needed, and its management system (RDBMS).

In recent years, however, owing to growth in the volume of data handled, response-speed requirements, diversification of data, and similar factors, DBMSs other than RDBMSs, so-called NoSQL (Not Only SQL) systems, have increasingly been used depending on the usage environment and conditions. One kind of NoSQL database is the document DB.

A document DB stores data written in a description format such as JSON (JavaScript Object Notation) or XML (Extensible Markup Language). MongoDB and MarkLogic are examples of document DBs.

The following techniques, for example, concern storing and searching documents (structured documents) in which the individual logical structural elements making up a document of a predetermined format, such as XML, can be identified.

As a first technique, there is a structured document retrieval system that searches structured documents in which a single document consists of a plurality of logical structures (for example, Patent Document 1). In the first technique, documents are registered by assigning a predetermined index group identifier to character string data that is likely to be referenced together at search time. A document search is then performed using index data that shares the same index group identifier. This makes it possible to perform, at high speed, a structure-specified search that targets only the logical structure of interest.

As a second technique, there is a technique for indexing XML documents (for example, Patent Document 2).

As a third technique, there is a search processing technique for structured documents in binary format. In the third technique, a search expression conversion unit converts a search expression for a structured document by converting each node of the expression into the corresponding index using a vocabulary list. A document analysis unit uses the vocabulary list to identify the index corresponding to each node of the structured document. A search expression evaluation unit searches for the part of the structured document that matches the converted search expression, using the indexes written in the converted search expression and the indexes that the document analysis unit identified for the nodes. This enables faster search processing for binary-format structured documents.

JP 2000-3366 A
JP 2007-533008 A (published Japanese translation of a PCT application)
JP 2010-250449 A

In an RDBMS, the structure of a table (column names and data types) is sometimes called a "schema". In an RDBMS, the schema is defined before data is inserted (stored) into a table.

In a document DB, by contrast, document data in JSON, XML, or a similar format can be inserted without defining a schema beforehand. In MongoDB, for example, JSON-format data such as Program 1 below (called "documents" in MongoDB) is inserted into a collection.

For example, Program 1 below contains four documents. In the programs below, one document is the range enclosed by a pair of "{" and "}". Apart from sharing the "name" field, these four documents differ from one another in their other fields. The first and fourth documents, however, have the same format.
Program 1
{
"name": "Horny",
"date": "1992-02-13",
"gender": "m",
"weight": 600
},
{
"name": "SO43659",
"date": "2011-05-31",
"account": "AWS29825",
"price": 59.99,
"tags": "Sales"
},
{
"name": "sue",
"age": 26,
"status": "A",
"groups": "news"
},
{
"name": "Aurora",
"date": "1991-01-24",
"gender": "f",
"weight": 450
},
In MongoDB, on the other hand, a search is performed by specifying fields. A query might ask, for example, for "documents whose name field is "Merry" and whose date field falls between 2015 and 2016". MongoDB searches while parsing each document to check which fields it contains. Its processing speed is therefore lower than that of an RDBMS, in which every column is stored under a schema fixed from the outset.

In one aspect, the present invention provides a storage technique that enables high-speed search of documents whose data type is not determined and which include field names and values corresponding to the field names.

In one aspect, an information processing apparatus disclosed in the present application includes an acquisition unit, a holding unit, a generation unit, and a storage control unit. The acquisition unit acquires a document whose data type is not determined and which includes field names and values corresponding to the field names. The holding unit holds field name specifying information, which specifies the field names described in the document, in the order in which the field names are described in the document. The generation unit generates description-order type relationship information that associates the description order with type information of the document specified according to the description order. The storage control unit stores, in a storage area, type-value relationship information that associates the type information with the values corresponding to the field names indicated by the field name specifying information held in the description order.

In one aspect of the information processing method disclosed in the present application, a computer performs the following processing. The computer acquires a document whose data type is not determined and which includes field names and values corresponding to the field names. The computer holds field name specifying information, which specifies the field names described in the document, in the order in which the field names are described in the document. The computer generates description-order type relationship information that associates the description order with type information of the document specified according to the description order. The computer stores, in a storage area, type-value relationship information that associates the type information with the values corresponding to the field names indicated by the field name specifying information held in the description order.

In one aspect, the program disclosed in the present application causes a computer to perform the following processing. The computer acquires a document whose data type is not determined and which includes field names and values corresponding to the field names. The computer holds field name specifying information, which specifies the field names described in the document, in the order in which the field names are described in the document. The computer generates description-order type relationship information that associates the description order with type information of the document specified according to the description order. The computer stores, in a storage area, type-value relationship information that associates the type information with the values corresponding to the field names indicated by the field name specifying information held in the description order.

According to the present invention, in one aspect, documents whose data type is not determined and which include field names and values corresponding to the field names can be stored in a form that allows high-speed search.

FIG. 1 is a diagram for explaining a simple method of storing JSON data.
FIG. 2 is a diagram for explaining a method of storing JSON data in the BSON format.
FIG. 3(A) is a diagram for explaining how data stored by the method of FIG. 1 is read, and FIG. 3(B) is a diagram for explaining how data stored by the method of FIG. 2 is read.
FIG. 4 shows an example of the information processing apparatus in this embodiment.
FIG. 5 is a block diagram of the information processing apparatus in this embodiment.
FIG. 6 shows an example of Program P used in this embodiment.
FIG. 7 shows an example of documents stored in the store area in this embodiment.
FIG. 8 shows an example of the field ID management tree in this embodiment.
FIG. 9 shows an example of the field name management table in this embodiment.
FIG. 10 shows an example of the type ID management tree in this embodiment.
FIG. 11 shows an example of the type management table in this embodiment.
FIG. 12 is a diagram for explaining generation of a field ID array that stores field IDs from JSON in the order in which they are described in the JSON in this embodiment.
FIG. 13A is a flowchart (part 1) of inserting a document into a collection based on JSON in this embodiment.
FIG. 13B is a flowchart (part 2) of inserting a document into a collection based on JSON in this embodiment.
FIG. 14 shows data used in FIG. 13A and FIG. 13B.
FIG. 15 is a flowchart of document search processing in this embodiment.
FIG. 16 is a diagram explaining how a specified field value is acquired when this embodiment is used.
FIG. 17 shows an example of the hardware configuration of a computer that executes the program in this embodiment.

To improve search performance, the structure may be analyzed when a document is inserted, so that identical fields are grouped together or indexes are built. This speeds up search but slows down insertion.
(i) The simplest way to store JSON, for example, is to store it as a character string as is, except that superfluous whitespace in the string is removed. If Program A1 is stored as is in MongoDB, the data in storage or memory is saved in the format of FIG. 1.

Program A1
{
"name": "Horny",
"date": "1992-02-13",
"gender": "m",
"weight": 600
},

In this way, a JSON-format document is inserted by appending it after the preceding JSON. Because the input data is simply written out as JSON, insertion is fast.

At search time, the operation is to specify a particular field name for the stored documents and retrieve the field value. In this case the stored JSON data is read in order from the beginning. For example, to retrieve the field value of the field name "gender", the JSON data is read sequentially from the top, as shown in FIG. 3(A). The search therefore takes time and is slow.
(ii) Next, storage in the BSON format is described. MongoDB's storage format is written in BSON (binary JSON). For example, storing Program A1 in the BSON format saves it in the format of FIG. 2.

FIG. 2 is a diagram for explaining a method of storing JSON data in the BSON format. As indicated by reference numeral 1, the byte length of the entire BSON is stored as an integer in the first 4 bytes of the BSON. As indicated by reference numeral 9, the BSON ends with a NULL terminator (written "\0").

Each field in the BSON includes a field name (reference numeral 3) and the data type of the field name (reference numeral 2), together with a field value (reference numeral 5) and the data length of the field value (reference numeral 4).

If the data type of the field name is a character string, "2" is set (reference numeral 2). If the data type of the field name is a 4-byte integer, "16" is set (reference numeral 2"). The field name (reference numeral 3) is set to the field name with a NULL appended at the end.

The field value depends on the specified data type (reference numeral 2). As indicated by reference numeral 2', if the data type is character string (2), the field value first stores the length of the string as a 4-byte integer, followed by the NULL-terminated string. The data length of the field value holds the length of the stored data. As indicated by reference numeral 2", if the data type is integer (16), the 4-byte integer is stored as is. The data length of the field value holds the length of the stored data.

Thus, because the BSON storage method parses the JSON and builds a BSON as in FIG. 2, it takes longer than storing the JSON as is, and the processing speed drops.

At search time, the operation is again to specify a particular field name for the stored documents and retrieve the field value. In the BSON format the fields are already delimited, so the search can skip ahead field by field. For example, as shown in FIG. 3(B), to retrieve the field value of the field name "gender", the search skips over the data in units of fields. The search time is therefore shorter than with the JSON format, but the field names must still be read sequentially from the beginning.
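The field-by-field skipping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not MongoDB's implementation: it handles only the two data types named in the text (string = 2, 4-byte integer = 16), and the function names `encode` and `find_field` are hypothetical.

```python
import struct

TYPE_STRING = 2   # type code for a string value, as given in the text
TYPE_INT32 = 16   # type code for a 4-byte integer value, as given in the text

def encode(doc):
    """Encode a flat dict of str/int values in a simplified BSON-like layout."""
    body = b""
    for name, value in doc.items():
        key = name.encode("utf-8") + b"\x00"          # NULL-terminated field name
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"    # NULL-terminated string
            body += bytes([TYPE_STRING]) + key + struct.pack("<i", len(data)) + data
        else:
            body += bytes([TYPE_INT32]) + key + struct.pack("<i", value)
    # 4-byte total length, then the fields, then the terminating NULL byte
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"

def find_field(buf, wanted):
    """Walk the fields from the top, jumping over values whose name differs."""
    pos = 4                                           # skip the total-length header
    while buf[pos] != 0:                              # 0 marks the end of the document
        type_code = buf[pos]
        end = buf.index(b"\x00", pos + 1)             # end of the field name
        name = buf[pos + 1:end].decode("utf-8")
        pos = end + 1
        if type_code == TYPE_STRING:
            (length,) = struct.unpack_from("<i", buf, pos)
            start, size = pos + 4, length
        else:
            start, size = pos, 4
        if name == wanted:                            # one string comparison per field
            if type_code == TYPE_STRING:
                return buf[start:start + size - 1].decode("utf-8")
            return struct.unpack_from("<i", buf, start)[0]
        pos = start + size                            # skip the value without decoding it
    return None

program_a1 = {"name": "Horny", "date": "1992-02-13", "gender": "m", "weight": 600}
encoded = encode(program_a1)
```

Calling `find_field(encoded, "gender")` skips the values of "name" and "date" without decoding them, but it still compares each field name as a string, which is the remaining cost the text points out.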

Thus, although MongoDB uses BSON, a structured data format obtained by parsing JSON, the field names are stored as character strings and must be matched by character comparison. MessagePack is similar to MongoDB in this respect.

Oracle's JSON support matches field names to field IDs but does not identify the type structure of the JSON as a whole.

This embodiment therefore provides a storage technique that enables high-speed search of documents whose data type is not determined and which include field names and values corresponding to the field names.

FIG. 4 shows an example of the information processing apparatus in this embodiment. The information processing apparatus 11 includes an acquisition unit 12, a holding unit 13, a generation unit 14, and a storage control unit 15.

The acquisition unit 12 acquires a document whose data type is not determined and which includes field names and values corresponding to the field names. An example of the acquisition unit 12 is the acquisition unit 23 described later, and the processing of S1.

The holding unit 13 holds field name specifying information, which specifies the field names described in the document, in the order in which the field names are described in the document. An example of the holding unit 13 is the storage search control unit 24 described later, and the processing of S9.

The generation unit 14 generates description-order type relationship information that associates the description order with type information of the document specified according to the description order. An example of the generation unit 14 is the storage search control unit 24 described later, and the processing of S10 to S13. An example of the description-order type relationship information is the type management table 32 described later.

The storage control unit 15 stores, in a storage area, type-value relationship information that associates the type information with the values corresponding to the field names indicated by the field name specifying information held in the description order. An example of the storage control unit 15 is the storage search control unit 24 described later, and the processing of S14 to S15. An example of the type-value relationship information is a document stored in the store area 28 described later.

Based on first tree structure information, in which field names are held using a first tree structure, the holding unit 13 acquires, from a field name, the field name specifying information corresponding to that field name. The holding unit 13 holds the field name specifying information in the order in which the field names are described in the document.

Based on second tree structure information, in which the field name specifying information is held in description order using a second tree structure, the storage control unit 15 acquires, from the description order of the field name specifying information, the type information of the document specified according to that description order. The storage control unit 15 stores, in the storage area, the type-value relationship information that associates the description order with the type information.

The information processing apparatus 11 further includes a search unit 16. When a search condition is acquired by the acquisition unit 12, the search unit 16 associates the description-order type relationship information with the type-value relationship information on the basis of the type information. Using the associated description-order type relationship information and type-value relationship information, the search unit 16 performs the search according to whether the value corresponding to the field name included in the search condition satisfies the condition. An example of the search unit 16 is the storage search control unit 24 described later, and the processing of S23 to S30.

With this configuration, documents whose data type is not determined and which include field names and values corresponding to the field names can be stored in a form that allows high-speed search. That is, the constituent elements (meta information) of a schemaless document are separated from the values corresponding to those elements, and the meta information is managed and saved in a form that allows efficient search, which improves search speed.
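A rough sketch of how separating meta information from values can speed up search, under stated assumptions: the tables below are hand-built from the four documents of Program 1, they use field names directly instead of field name specifying information, and `TYPE_TABLE`, `STORE`, and `search` are hypothetical names, not structures defined by the source.

```python
# Hypothetical description-order/type table: type ID -> field names in description order.
TYPE_TABLE = {
    1: ["name", "date", "gender", "weight"],
    2: ["name", "date", "account", "price", "tags"],
    3: ["name", "age", "status", "groups"],
}

# Hypothetical store area: (type ID, values in description order), no field names.
STORE = [
    (1, ["Horny", "1992-02-13", "m", 600]),
    (2, ["SO43659", "2011-05-31", "AWS29825", 59.99, "Sales"]),
    (3, ["sue", 26, "A", "news"]),
    (1, ["Aurora", "1991-01-24", "f", 450]),
]

def search(field, predicate):
    """Return the documents whose value for `field` satisfies `predicate`."""
    # Resolve the field's position once per type rather than once per document.
    positions = {
        type_id: names.index(field)
        for type_id, names in TYPE_TABLE.items()
        if field in names
    }
    hits = []
    for type_id, values in STORE:
        pos = positions.get(type_id)
        if pos is not None and predicate(values[pos]):
            # Rebuild the document by joining the type table with the stored values.
            hits.append(dict(zip(TYPE_TABLE[type_id], values)))
    return hits
```

In this sketch, documents of a type that lacks the requested field are rejected by a single dictionary lookup on the type ID, and no field name is compared while scanning the stored values.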

The details of this embodiment are described below.
First, the term "schema" is defined. The "schema" of an RDBMS refers to the column-format definition given by a table definition command. In PostgreSQL, one such RDBMS, a table is defined with a CREATE TABLE command like Program 2, and this is the "schema".

(Program 2)
CREATE TABLE tablename (
name TEXT,
date DATE,
gender CHAR(1),
weight INTEGER
);

The "schema" of a JSON-format document refers to the structure that remains when only the field names are kept. Given the JSON-format document of Program 3, Program 4 is the "schema" of that document.

(Program 3)
{
"name": "Horny",
"date": "1992-02-13",
"gender": "m",
"weight": 600
},

(Program 4)
{
"name":,
"date":,
"gender":,
"weight":
},

In this embodiment, a schema takes the order of the field names into account: documents that contain the same field names in a different order are treated as having different schemas. Program 4 and Program 5 are therefore different schemas.

(Program 5)
{
"date":,
"name":,
"gender":,
"weight":
},

Within the scope of this embodiment, however, a variation that regards Program 4 and Program 5 as the same schema is also possible.
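The order-sensitive notion of schema can be illustrated with a short sketch. The function name `schema_of` and the two sample strings are assumptions made for illustration; the sketch handles only flat JSON objects.

```python
import json

def schema_of(json_text):
    """Return the field names of a flat JSON object in description order."""
    # json.loads builds a dict, which keeps the keys in the order they appear in the text.
    return tuple(json.loads(json_text).keys())

# Same fields as Program 4 and Program 5, with illustrative values filled in.
doc_a = '{"name": "Horny", "date": "1992-02-13", "gender": "m", "weight": 600}'
doc_b = '{"date": "1992-02-13", "name": "Horny", "gender": "m", "weight": 600}'

order_sensitive_equal = schema_of(doc_a) == schema_of(doc_b)
order_insensitive_equal = frozenset(schema_of(doc_a)) == frozenset(schema_of(doc_b))
```

Comparing the tuples treats the two documents as different schemas, as in the embodiment; comparing the sets of names corresponds to the variation that regards them as the same schema.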

This embodiment provides a technique that enables high-speed search without sacrificing document insertion speed. The embodiment achieves the following.
- When a JSON-format document is inserted into the document DB, the meta information of the JSON is analyzed and a unique ID is assigned to the structure of the JSON. This ID is used to speed up search.
- The drop in insertion speed is minimized by optimizing the analysis of meta information performed when a JSON-format document is inserted into the document DB.
- The amount of stored data is reduced by removing the meta information (fields and the like) from the JSON. This reduces the amount of input and output to storage and speeds up overall performance.

FIG. 5 is a block diagram of the information processing apparatus in this embodiment. The information processing apparatus 21 is, for example, a computer. The information processing apparatus 21 includes a control unit 22 and a storage unit 27.

The control unit 22 includes an acquisition unit 23, a storage search control unit 24, and an output unit 25. The acquisition unit 23 acquires a command and JSON-format document information from a request source. The request source is, for example, an external terminal or an internal program.

In accordance with the acquired command, the storage search control unit 24 stores the received JSON-format document (hereinafter simply "document") in the storage unit 27 in the format described later. Also in accordance with the acquired command, the storage search control unit 24 reads documents from the storage unit 27 in the format described later. The output unit 25 outputs the documents read from the storage unit 27 to the request source in accordance with the acquired command.

In this embodiment, each collection has the following components. A unique number is assigned to each field name that appears in the collection. This unique number is called the field ID (Field-ID).

The data structure of a JSON document can be regarded as the array of the field IDs it contains. A unique number is assigned to each such "array of field IDs" in order of appearance. This unique number is called the type ID (Type-ID).
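The numbering scheme can be sketched as follows. This is an illustration under assumptions: plain dictionaries stand in for the management trees, IDs start at 1 (the source does not state the starting value here), and the class name `IdAssigner` is hypothetical.

```python
class IdAssigner:
    """Assign Field-IDs to field names and Type-IDs to arrays of Field-IDs."""

    def __init__(self):
        self.field_ids = {}   # field name -> Field-ID
        self.type_ids = {}    # tuple of Field-IDs in description order -> Type-ID

    def field_id(self, name):
        # A new name gets the next unused number; a known name keeps its ID.
        return self.field_ids.setdefault(name, len(self.field_ids) + 1)

    def type_id(self, doc):
        # The document's structure is the array of its Field-IDs in description order.
        field_id_array = tuple(self.field_id(name) for name in doc)
        return self.type_ids.setdefault(field_id_array, len(self.type_ids) + 1)

# The four documents of Program 1 as Python dicts (key order = description order).
PROGRAM_1 = [
    {"name": "Horny", "date": "1992-02-13", "gender": "m", "weight": 600},
    {"name": "SO43659", "date": "2011-05-31", "account": "AWS29825",
     "price": 59.99, "tags": "Sales"},
    {"name": "sue", "age": 26, "status": "A", "groups": "news"},
    {"name": "Aurora", "date": "1991-01-24", "gender": "f", "weight": 450},
]

assigner = IdAssigner()
assigned_type_ids = [assigner.type_id(doc) for doc in PROGRAM_1]
```

Under these assumptions the first and fourth documents receive the same type ID, because they list the same field names in the same order.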

The storage unit 27 includes a store area 28, a field ID management tree 29, a field name management table 30, a type ID management tree 31, and a type management table 32.

The store area 28 is an area that stores the main part of each document, that is, chiefly the field values.

The field ID management tree 29 is a tree data structure with which a field ID can be looked up from a field name. The field ID management tree 29 is assumed, as one example, to be built as the data structure called a trie or prefix tree, but it is not limited to this.
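A minimal sketch of a prefix tree that maps field names to field IDs is shown below. It assumes one node per character and IDs starting at 1; the class and method names are hypothetical, and the node layout in the source's figure may differ.

```python
class TrieNode:
    def __init__(self):
        self.children = {}     # character -> child node
        self.field_id = None   # set only on nodes that end a registered field name

class FieldIdTrie:
    def __init__(self):
        self.root = TrieNode()
        self.count = 0         # number of field names registered so far

    def lookup(self, name):
        """Return the Field-ID for `name`, or None if it is not registered."""
        node = self.root
        for ch in name:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.field_id

    def get_or_add(self, name):
        """Return the Field-ID for `name`, registering it if it is new."""
        node = self.root
        for ch in name:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        if node.field_id is None:
            self.count += 1
            node.field_id = self.count
        return node.field_id

trie = FieldIdTrie()
ids = [trie.get_or_add(n) for n in ["name", "date", "gender", "weight", "name"]]
```

Lookup cost in this sketch depends on the length of the field name rather than on the number of registered names, which is one reason a prefix tree suits this role.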

フィールド名管理テーブル３０は、フィールドＩＤ管理ツリー２９と対となるデータ構造で、フィールドＩＤからフィールド名を取得する場合に用いられる。フィールド名管理テーブル３０は、一例として、配列やＢ木構造で構成することを想定しているが、これに限定されない。 The field name management table 30 has a data structure that forms a pair with the field ID management tree 29 and is used when a field name is acquired from a field ID. As an example, it is assumed that the field name management table 30 is configured by an array or a B-tree structure, but the field name management table 30 is not limited to this.

タイプＩＤ管理ツリー３１は、ＪＳＯＮ中に出現するフィールドＩＤの配列からタイプＩＤを検索するデータ構造である。タイプＩＤ管理ツリー３１は、ＪＳＯＮのスキーマを管理する。 The type ID management tree 31 has a data structure for searching for a type ID from an array of field IDs appearing in JSON. The type ID management tree 31 manages the JSON schema.

タイプ管理テーブル３２は、ＪＳＯＮのスキーマにあたるタイプ毎の情報を管理する領域である。 The type management table 32 is an area for managing information for each type corresponding to the JSON schema.

図６は、本実施形態で用いるプログラムＰの一例を示す。プログラムＰは、上述したプログラム１と同じものなので、その説明を省略する。なお、図６では、プログラムＰには、４つのプログラムが含まれており、それぞれをプログラムＰ１、プログラムＰ２、プログラムＰ３、プログラムＰ４と称する。 FIG. 6 shows an example of the program P used in this embodiment. Since the program P is the same as the program 1 described above, its description is omitted. In FIG. 6, the program P includes four programs, which are referred to as a program P1, a program P2, a program P3, and a program P4.

図７は、本実施形態におけるストア領域２８に格納されたドキュメントの一例を示す。ストア領域２８には、図６のプログラムＰに含まれるフィールドの値がタイプＩＤ毎に格納されている。 FIG. 7 shows an example of documents stored in the store area 28 in the present embodiment. In the store area 28, the field values included in the program P of FIG. 6 are stored for each type ID.

図８は、本実施形態におけるフィールドＩＤ管理ツリーの一例を示す。図８において、フィールドＩＤ管理ツリー２９の木構造は、プレフィックス木を想定して記述している。 FIG. 8 shows an example of a field ID management tree in the present embodiment. In FIG. 8, the tree structure of the field ID management tree 29 is described assuming a prefix tree.

図９は、本実施形態におけるフィールド名管理テーブルの一例を示す。フィールド名管理テーブル３０は、「フィールドＩＤ」と「フィールド名」の項目を含む。「フィールドＩＤ」は、フィールド名を識別する識別情報である。「フィールド名」には、管理対象となるフィールド名が格納される。 FIG. 9 shows an example of a field name management table in the present embodiment. The field name management table 30 includes items of “field ID” and “field name”. “Field ID” is identification information for identifying a field name. The “field name” stores the name of the field to be managed.

図１０は、本実施形態におけるタイプＩＤ管理ツリーの一例を示す。図１０において、タイプＩＤ管理ツリー３１の木構造は、プレフィックス木を想定して記述している。 FIG. 10 shows an example of a type ID management tree in the present embodiment. In FIG. 10, the tree structure of the type ID management tree 31 is described assuming a prefix tree.

図１１は、本実施形態におけるタイプ管理テーブルの一例を示す。タイプ管理テーブル３２は、「タイプＩＤ」３２−１、「フィールド数」３２−２、「フィールド」３２−３の項目を含む。項目「フィールド数」３２−２には、タイプＩＤで特定されるドキュメントを構成するフィールド数が格納される。項目「フィールド」３２−３は、タイプＩＤに対応するドキュメントに含まれるフィールドのフィールドＩＤが配置順３２−４及び昇順３２−５に、その順序に応じて、フィールドＩＤが格納される。 FIG. 11 shows an example of a type management table in the present embodiment. The type management table 32 includes items of “type ID” 32-1, “number of fields” 32-2, and “field” 32-3. The item “number of fields” 32-2 stores the number of fields constituting the document specified by the type ID. In the item “field” 32-3, field IDs of fields included in the document corresponding to the type ID are stored in the arrangement order 32-4 and the ascending order 32-5 according to the order.

図１２は、本実施形態におけるＪＳＯＮからフィールドＩＤを、ＪＳＯＮでの記述順に格納したフィールドＩＤ配列の生成について説明するための図である。図１２については、後述する。 FIG. 12 is a diagram for explaining generation of a field ID array storing field IDs from JSON in the order of description in JSON in the present embodiment. FIG. 12 will be described later.

次に、図６から図１１を用いて、コレクションへのドキュメントの挿入工程について説明する。例えば、取得部２３は、コレクションへのドキュメントの挿入命令と共に、ＪＳＯＮ形式のドキュメント（図６）を取得するとする。 Next, a process for inserting a document into the collection will be described with reference to FIGS. For example, it is assumed that the acquisition unit 23 acquires a document in JSON format (FIG. 6) together with a document insertion instruction to the collection.

ＪＳＯＮ（図６）の中には複数のフィールドが含まれている。格納検索制御部２４は、ＪＳＯＮ（図６）から各フィールドを抽出してフィールドＩＤ管理ツリー２９を検索し、各フィールド名に対応するフィールドＩＤを取得する。 A plurality of fields are included in JSON (FIG. 6). The storage search control unit 24 extracts each field from the JSON (FIG. 6), searches the field ID management tree 29, and acquires a field ID corresponding to each field name.

ＪＳＯＮ中に存在するフィールド名がフィールドＩＤ管理ツリー２９に存在しない場合は、格納検索制御部２４は、フィールドＩＤ管理ツリー２９に新たにそのフィールド名を登録する。ここで、格納検索制御部２４は、新しいフィールドＩＤをその登録するフィールド名に割り振る。同時に、格納検索制御部２４は、フィールドＩＤ管理ツリー２９に新しいフィールド名とフィールドＩＤのペアを登録する。 If the field name existing in the JSON does not exist in the field ID management tree 29, the storage search control unit 24 newly registers the field name in the field ID management tree 29. Here, the storage search control unit 24 assigns a new field ID to the registered field name. At the same time, the storage search control unit 24 registers a new field name / field ID pair in the field ID management tree 29.

ここまででＪＳＯＮの各フィールドのフィールドＩＤを取得することができる。例えば図１２で示すＪＳＯＮの場合、“name”のフィールドＩＤが“１”、“date”のフィールドＩＤが“７”、“gender”のフィールドＩＤが“５”、“weight”のフィールドＩＤが“４”となる。これらのフィールドＩＤを登場順に並べると［１，７，５，４］の配列となる。これを、フィールドＩＤ配列と称する。 At this point, the field ID of each field in the JSON has been obtained. For example, in the case of the JSON shown in FIG. 12, the field ID of "name" is "1", the field ID of "date" is "7", the field ID of "gender" is "5", and the field ID of "weight" is "4". Arranging these field IDs in their order of appearance gives the array [1, 7, 5, 4]. This is referred to as a field ID array.
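As a rough sketch of how a field ID array could be derived, the following reads a JSON object in description order and maps each field name to its ID. The name-to-ID mapping uses the IDs quoted in the text; the field values in `doc` and the function name `field_id_array` are made up for illustration.

```python
import json

# Name -> field ID mapping, using the IDs quoted in the text.
FIELD_IDS = {"name": 1, "date": 7, "gender": 5, "weight": 4}

def field_id_array(json_text, field_ids):
    # object_pairs_hook receives the (name, value) pairs in description order.
    pairs = json.loads(json_text, object_pairs_hook=list)
    return [field_ids[name] for name, _ in pairs]

# Illustrative document; the values are placeholders, not taken from FIG. 12.
doc = '{"name": "alice", "date": "2017-06-16", "gender": "f", "weight": 50}'
print(field_id_array(doc, FIELD_IDS))  # [1, 7, 5, 4]
```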

次に、格納検索制御部２４は、得られたフィールドＩＤ配列を用いてタイプＩＤ管理ツリー３１（図１０）を検索して、ＪＳＯＮの構造を示すタイプＩＤを取得する。検索の結果、タイプＩＤ管理ツリー３１にて、得られたフィールドＩＤ配列がヒットしない場合は、その得られたフィールドＩＤ配列は、未登録のＪＳＯＮの構造といえる。そこで、格納検索制御部２４は、その得られたフィールドＩＤ配列に新たにタイプＩＤを付与し、タイプＩＤ管理ツリー３１及びタイプ管理テーブル３２に登録する。 Next, the storage search control unit 24 searches the type ID management tree 31 (FIG. 10) using the obtained field ID array, and acquires the type ID indicating the JSON structure. As a result of the search, if the obtained field ID array does not hit in the type ID management tree 31, the obtained field ID array can be said to be an unregistered JSON structure. Therefore, the storage search control unit 24 newly assigns a type ID to the obtained field ID array and registers it in the type ID management tree 31 and the type management table 32.

格納検索制御部２４は、タイプ管理テーブル３２の項目「フィールド」の「配置順」には、ドキュメント内のフィールドの出現順にフィールドＩＤを並べて格納する。格納検索制御部２４は、これ以外にフィールドＩＤを正順にソートし、その出現順とペアにしてタイプ管理テーブル３２に格納する。 In the "arrangement order" of the item "field" of the type management table 32, the storage search control unit 24 stores the field IDs in the order in which the fields appear in the document. In addition, the storage search control unit 24 sorts the field IDs in ascending order, pairs each field ID with its position in the order of appearance, and stores the pairs in the type management table 32.
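One way to picture a row of the type management table 32 is shown below. The `TypeRow` layout and the helper `make_type_row` are assumptions for illustration; the source only states that the row holds the type ID, the number of fields, the field IDs in arrangement order, and the field IDs in ascending order paired with their appearance positions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TypeRow:
    type_id: int
    field_count: int
    arrangement: tuple  # field IDs in description (appearance) order
    ascending: tuple    # (field ID, position in arrangement), sorted by field ID

def make_type_row(type_id, field_ids):
    # Pair each field ID with its 0-based appearance position, then sort by ID.
    ascending = tuple(sorted((fid, pos) for pos, fid in enumerate(field_ids)))
    return TypeRow(type_id, len(field_ids), tuple(field_ids), ascending)

row = make_type_row(1, [1, 7, 5, 4])
print(row.ascending)  # ((1, 0), (4, 3), (5, 2), (7, 1))
```

Keeping the position next to each sorted ID means a lookup by field ID can jump straight to the matching value in the stored row.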

格納検索制御部２４は、挿入するＪＳＯＮのタイプＩＤが得られたら、図７に示すように、タイプＩＤと共に、出現順のままフィールド値をストア領域２８へ格納する。 When the type ID of the JSON to be inserted is obtained, the storage search control unit 24 stores the field values in the store area 28 together with the type IDs in the order of appearance as shown in FIG.

図１３Ａ及び図１３Ｂは、本実施形態におけるＪＳＯＮに基づくコレクションへのドキュメントの挿入のフローチャートである。図１４は、図１３Ａ及び図１３Ｂにおいて用いられるデータである。 13A and 13B are flowcharts for inserting a document into a collection based on JSON in the present embodiment. FIG. 14 shows data used in FIGS. 13A and 13B.

取得部２３は、要求元から、コマンド及びＪＳＯＮを取得する（Ｓ１）。要求元は、例えば、外部の端末、または内部のプログラムを含む。ＪＳＯＮは、例えば、図１４（Ａ）に示す内容である。 The acquisition unit 23 acquires a command and JSON from the request source (S1). The request source includes, for example, an external terminal or an internal program. The JSON has, for example, the contents shown in FIG. 14(A).

格納検索制御部２４は、取得したコマンドを判別する。取得したコマンドがコレクションへのドキュメントの挿入コマンドであると判定した場合、格納検索制御部２４は、ＪＳＯＮの字句解析（ＬｅｘｉｃａｌＡｎａｌｙｓｉｓ）を行う（Ｓ２）。ＪＳＯＮの字句解析では、格納検索制御部２４は、ＪＳＯＮを先頭から順に読み取っていき、ＪＳＯＮの文字列をフィールド（名前と値の組）に分割する。字句解析結果として、格納検索制御部２４は、図１４（Ｂ）の内容を得る。字句解析結果には、フィールド名（Ｎａｍｅ）と値（Ｖａｌｕｅ）の組が、読み取り順に格納される。 The storage search control unit 24 determines the acquired command. If it determines that the acquired command is a command for inserting a document into the collection, the storage search control unit 24 performs lexical analysis of the JSON (S2). In the lexical analysis, the storage search control unit 24 reads the JSON in order from the beginning and divides the JSON character string into fields (name and value pairs). As the lexical analysis result, the storage search control unit 24 obtains the contents of FIG. 14(B). In the lexical analysis result, the pairs of field name (Name) and value (Value) are stored in the order in which they were read.

格納検索制御部２４は、字句解析結果の上位から１フィールドを取得し、これをＦとする（Ｓ３）。格納検索制御部２４は、フィールドＩＤ管理ツリー２９からＦの「Ｎａｍｅ」（フィールド名）を検索する（Ｓ４）。 The storage search control unit 24 acquires one field from the top of the lexical analysis result, and designates this as F (S3). The storage search control unit 24 searches for “Name” (field name) of F from the field ID management tree 29 (S4).

ＦのＮａｍｅがフィールドＩＤ管理ツリー２９に存在しない場合（Ｓ５で「ＮＯ」）、格納検索制御部２４は、そのＦのＮａｍｅに、新たにフィールドＩＤを割り当てる（Ｓ６）。格納検索制御部２４は、ＦのＮａｍｅと割り当てたフィールドＩＤをフィールドＩＤ管理ツリー２９とフィールド名管理テーブル３０に登録する（Ｓ７）。フィールドＩＤ管理ツリー２９は、例えば、トライ木またはプレフィックス木等のデータ構造で構築される。Ｓ７の処理後、処理はＳ４へ戻る。 If the name of F does not exist in the field ID management tree 29 (“NO” in S5), the storage search control unit 24 assigns a new field ID to the name of F (S6). The storage search control unit 24 registers the name of F and the assigned field ID in the field ID management tree 29 and the field name management table 30 (S7). The field ID management tree 29 is constructed with a data structure such as a trie tree or a prefix tree, for example. After the process of S7, the process returns to S4.

Ｆのフィールド名がフィールドＩＤ管理ツリー２９に存在する場合（Ｓ５で「ＹＥＳ」）、格納検索制御部２４は、フィールドＩＤ管理ツリー２９から、その検索されたフィールド名のフィールドＩＤを取得する。格納検索制御部２４は、その取得したフィールドＩＤをＩとして、図１４（Ｃ）に示すように字句解析結果に付与する（Ｓ８）。 When the field name of F exists in the field ID management tree 29 (“YES” in S5), the storage search control unit 24 acquires the field ID of the searched field name from the field ID management tree 29. The storage search control unit 24 assigns the acquired field ID as I to the lexical analysis result as shown in FIG. 14C (S8).

格納検索制御部２４は、字句解析結果の上位から順に１行ずつ取得し、字句解析結果に含まれる行数分、Ｓ３〜Ｓ８のループ処理を繰り返す。 The storage search control unit 24 obtains one line at a time from the top of the lexical analysis result, and repeats the loop processing of S3 to S8 for the number of lines included in the lexical analysis result.

格納検索制御部２４は、各フィールドのＩから配列を作成し、その作成した配列をＪとする（Ｓ９）。例えば、図１４（Ｃ）の場合、Ｉは上から「１，７，５，４」である。この場合、図１４（Ｄ）に示すように、Ｊは、［１，７，５，４］で表される。 The storage search control unit 24 creates an array from I of each field, and designates the created array as J (S9). For example, in the case of FIG. 14C, I is “1, 7, 5, 4” from the top. In this case, as shown in FIG. 14D, J is represented by [1, 7, 5, 4].

格納検索制御部２４は、タイプＩＤ管理ツリー３１からＪを検索する（Ｓ１０）。ＪがタイプＩＤ管理ツリー３１に存在しない場合（Ｓ１１で「ＮＯ」）、格納検索制御部２４は、Ｊに新たにタイプＩＤを割り当てる（Ｓ１２）。 The storage search control unit 24 searches J from the type ID management tree 31 (S10). When J does not exist in the type ID management tree 31 (“NO” in S11), the storage search control unit 24 newly assigns a type ID to J (S12).

格納検索制御部２４は、Ｊと割り当てたタイプＩＤとを、タイプＩＤ管理ツリー３１とタイプ管理テーブル３２に登録する（Ｓ１３）。タイプＩＤ管理ツリー３１は、例えば、プレフィックス木等のデータ構造で構築される。Ｓ１３の処理後、処理はＳ９へ戻る。 The storage search control unit 24 registers J and the assigned type ID in the type ID management tree 31 and the type management table 32 (S13). The type ID management tree 31 is constructed with a data structure such as a prefix tree, for example. After the process of S13, the process returns to S9.

ＪがタイプＩＤ管理ツリー３１に存在する場合（Ｓ１１で「ＹＥＳ」）、格納検索制御部２４は、タイプＩＤ管理ツリー３１から、その検索されたＪのタイプＩＤを取得する（Ｓ１４）。 When J exists in the type ID management tree 31 ("YES" in S11), the storage search control unit 24 acquires the type ID of the found J from the type ID management tree 31 (S14).
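The lookup and registration of J in the type ID management tree 31 can be sketched as a prefix tree keyed by field IDs. The class `TypeIdTree` and its method names are hypothetical, and the type IDs are simply assigned in order of first appearance, so they need not match the type IDs in the figures.

```python
class TypeIdTree:
    """Hypothetical prefix tree mapping a field ID array to a type ID."""

    def __init__(self):
        self._root = {}    # node: field ID -> child node; the key None holds a type ID
        self._next_id = 1

    def find(self, field_ids):
        """Return the type ID registered for exactly this array, or None."""
        node = self._root
        for fid in field_ids:
            node = node.get(fid)
            if node is None:
                return None
        return node.get(None)

    def find_or_assign(self, field_ids):
        """Return the type ID for this array, registering a new one if needed."""
        node = self._root
        for fid in field_ids:
            node = node.setdefault(fid, {})
        if None not in node:
            node[None] = self._next_id
            self._next_id += 1
        return node[None]


tree = TypeIdTree()
a = tree.find_or_assign([1, 7, 5, 4])
b = tree.find_or_assign([1, 3, 10, 9])
c = tree.find_or_assign([1, 7, 5, 4])
print(a, b, c)  # 1 2 1
```

Because the key is the array in description order, two documents with the same fields in a different order receive different type IDs in this sketch.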

格納検索制御部２４は、ストア領域２８に、ＪＳＯＮを登録する（Ｓ１５）。ここでは、格納検索制御部２４は、ストア領域２８に、取得したタイプＩＤと、Ｊを構成する各Ｉに対応するフィールドの値（Ｖａｌｕｅ）を順に、格納する。 The storage search control unit 24 registers JSON in the store area 28 (S15). Here, the storage search control unit 24 sequentially stores the acquired type ID and the field value (Value) corresponding to each I constituting J in the store area 28.
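Putting steps S2 to S15 together, the insertion flow might look like the sketch below. It is a simplified illustration under stated assumptions: plain dictionaries stand in for the field ID management tree 29 and the type ID management tree 31, the class name `Collection` is invented, and IDs are assigned sequentially from 1.

```python
import json

class Collection:
    def __init__(self):
        self.field_ids = {}    # field name -> field ID (stands in for tree 29)
        self.field_names = {}  # field ID -> field name (stands in for table 30)
        self.type_ids = {}     # tuple of field IDs -> type ID (stands in for tree 31)
        self.type_table = {}   # type ID -> field IDs in description order (table 32)
        self.store = []        # rows of (type ID, values in description order)

    def insert(self, json_text):
        # S2: split the JSON into (name, value) pairs in reading order.
        pairs = json.loads(json_text, object_pairs_hook=list)
        j, values = [], []
        for name, value in pairs:              # S3-S8 loop
            fid = self.field_ids.get(name)     # S4
            if fid is None:                    # S5 "NO" -> S6, S7
                fid = len(self.field_ids) + 1
                self.field_ids[name] = fid
                self.field_names[fid] = name
            j.append(fid)                      # S8
            values.append(value)
        key = tuple(j)                         # S9: field ID array J
        type_id = self.type_ids.get(key)       # S10
        if type_id is None:                    # S11 "NO" -> S12, S13
            type_id = len(self.type_ids) + 1
            self.type_ids[key] = type_id
            self.type_table[type_id] = key
        self.store.append((type_id, values))   # S15
        return type_id


c = Collection()
t1 = c.insert('{"name": "a", "age": 30}')
t2 = c.insert('{"name": "b", "age": 18}')
t3 = c.insert('{"age": 40, "name": "c"}')
print(t1, t2, t3)  # 1 1 2
print(c.store)
```

Each document is parsed once, and only the type ID and the values reach the store; the field names live in the shared metadata.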

次に、コレクションに含まれるドキュメントの検索工程について説明する。 Next, a search process for documents included in the collection will be described.

図１５は、本実施形態におけるドキュメントの検索処理のフローチャートである。一例として、コレクションの中から条件に合致するドキュメントを抽出することを考える。例えば「“ａｇｅ”が２０以上のドキュメントを抽出する」という条件で検索すると想定する。 FIG. 15 is a flowchart of document search processing in the present embodiment. As an example, consider extracting documents that meet a condition from a collection. For example, it is assumed that a search is performed under the condition "extract documents whose "age" is 20 or more".

取得部２３は、要求元から、コマンド及びＪＳＯＮを取得する（Ｓ２１）。要求元は、例えば、外部の端末、または内部のプログラムを含む。ＪＳＯＮには、対象となるフィールド名（以下、「対象フィールド名」と称する）が記述されている。 The acquisition unit 23 acquires a command and JSON from the request source (S21). The request source includes, for example, an external terminal or an internal program. In the JSON, a target field name (hereinafter referred to as “target field name”) is described.

格納検索制御部２４は、取得したコマンドを判別する。取得したコマンドがフィールドＩＤ管理ツリー２９（フィールド名管理テーブル３０）から対象フィールド名を検索するコマンドであると判定した場合、格納検索制御部２４は、フィールドＩＤ管理ツリー２９（フィールド名管理テーブル３０）から対象フィールド名を検索する（Ｓ２２）。 The storage search control unit 24 determines the acquired command. When it is determined that the acquired command is a command for searching the target field name from the field ID management tree 29 (field name management table 30), the storage search control unit 24 uses the field ID management tree 29 (field name management table 30). The target field name is searched from (S22).

上記の検索条件の場合、格納検索制御部２４は、フィールドＩＤ管理ツリー２９（図８）（フィールド名管理テーブル３０）からフィールド名“ａｇｅ”を検索し、そのフィールド名に対応するフィールドＩＤを取得する（この場合、“ａｇｅ”のフィールドＩＤは“３”である）。 In the case of the above search conditions, the storage search control unit 24 searches the field ID management tree 29 (FIG. 8) (field name management table 30) for the field name “age” and acquires the field ID corresponding to the field name. (In this case, the field ID of “age” is “3”).

フィールドＩＤ管理ツリー２９（フィールド名管理テーブル３０）に、対象フィールド名が存在しない場合（Ｓ２２で「ＮＯ」）、フィールドＩＤ管理ツリー２９（フィールド名管理テーブル３０）において、取り出すドキュメントがないので、次の処理を行う。すなわち、格納検索制御部２４は、出力部２５に、検索結果として、フィールドＩＤ管理ツリー２９（フィールド名管理テーブル３０）に対象フィールド名が存在しないことを通知する。出力部２５は、要求元に、検索結果を出力する（Ｓ３１）。 If the target field name does not exist in the field ID management tree 29 (field name management table 30) ("NO" in S22), there is no document to retrieve, so the following processing is performed. That is, the storage search control unit 24 notifies the output unit 25, as the search result, that the target field name does not exist in the field ID management tree 29 (field name management table 30). The output unit 25 outputs the search result to the request source (S31).

フィールドＩＤ管理ツリー２９（フィールド名管理テーブル３０）に、対象フィールド名が存在する場合（Ｓ２２で「ＹＥＳ」）、格納検索制御部２４は、フィールドＩＤ管理ツリー２９（フィールド名管理テーブル３０）から対象フィールド名のフィールドＩＤを取得し、それをＦとする（Ｓ２３）。 When the target field name exists in the field ID management tree 29 (field name management table 30) ("YES" in S22), the storage search control unit 24 acquires the field ID of the target field name from the field ID management tree 29 (field name management table 30) and sets it as F (S23).

格納検索制御部２４は、ストア領域２８から１行（１ドキュメント）を取得し、これをドキュメントＩとする（Ｓ２４）。 The storage search control unit 24 acquires one line (one document) from the store area 28 and sets it as the document I (S24).

格納検索制御部２４は、ドキュメントＩからタイプＩＤを取り出し、これをＳとする（Ｓ２５）。格納検索制御部２４は、タイプ管理テーブル３２から、タイプＩＤがＳと一致する行（ドキュメント）を取り出し、これをドキュメントＴとする（Ｓ２６）。 The storage search control unit 24 extracts the type ID from the document I and sets it as S (S25). The storage search control unit 24 extracts a line (document) whose type ID matches S from the type management table 32, and sets this as a document T (S26).

タイプ管理テーブル３２の各行（各ドキュメント）には、その行（ドキュメント）に含まれるフィールドがフィールドＩＤ順に並んでいる。格納検索制御部２４は、ドキュメントＴの中のフィールドＩＤに、Ｆが含まれているか否かを検索する（Ｓ２７）。検索は２分探索でもＳＩＭＤ（Single Instruction Multiple Data）命令を用いることもできる。 In each row (each document) of the type management table 32, the fields included in that row (document) are arranged in order of field ID. The storage search control unit 24 searches whether or not F is included among the field IDs in the document T (S27). Because the field IDs are sorted, this search can be performed by binary search or by using SIMD (Single Instruction Multiple Data) instructions.
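The membership test of S27 can be illustrated with a binary search over the sorted field IDs. This sketch uses Python's standard `bisect` module; the function name `contains_field` is hypothetical, and the two arrays are the field ID orders that the text later quotes for type ID 1 and type ID 3.

```python
from bisect import bisect_left

def contains_field(sorted_field_ids, target):
    """Binary search: is target among the ascending field IDs of a type row?"""
    i = bisect_left(sorted_field_ids, target)
    return i < len(sorted_field_ids) and sorted_field_ids[i] == target

print(contains_field([1, 4, 5, 7], 3))   # False: "age" (ID 3) is absent
print(contains_field([1, 3, 9, 10], 3))  # True: "age" (ID 3) is present
```

Only integers are compared here; no field-name string comparison is needed once the target field ID is known.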

Ｔの中のフィールドＩＤにＦが含まれない場合（Ｓ２７で「ＮＯ」）、格納検索制御部２４は、ループ処理の先頭の処理（Ｓ２４）を行う。すなわち、その行（ドキュメントＴ）には、「ａｇｅ」というフィールドが含まれていないことが分かる。この場合、格納検索制御部２４は、次のドキュメントを取り出すためにＳ２４の処理に戻る。 When F is not included among the field IDs in T ("NO" in S27), the storage search control unit 24 returns to the first step of the loop (S24). That is, it can be seen that the row (document T) does not include the field "age". In this case, the storage search control unit 24 returns to S24 to take out the next document.

Ｔの中のフィールドＩＤにＦが含まれる場合（Ｓ２７で「ＹＥＳ」）、格納検索制御部２４は、Ｉの中のＦを抽出する（Ｓ２８）。この場合、上記の例では、格納検索制御部２４は、そのフィールド「ａｇｅ」の値をストア領域２８から取り出す。 When F is included in the field ID in T (“YES” in S27), the storage search control unit 24 extracts F in I (S28). In this case, in the above example, the storage search control unit 24 extracts the value of the field “age” from the store area 28.

格納検索制御部２４は、検索条件が成立するか否かを判定する（Ｓ２９）。上記の例では、格納検索制御部２４は、ストア領域２８から取り出したフィールド「ａｇｅ」の値と、検索条件（「“ａｇｅ”が２０以上」）とを比較する。 The storage search control unit 24 determines whether the search condition is satisfied (S29). In the above example, the storage search control unit 24 compares the value of the field “age” extracted from the store area 28 with the search condition (“age” is 20 or more ”).

検索条件が成立しない場合（Ｓ２９で「ＮＯ」）、すなわち、ストア領域２８から取り出したフィールドの値が検索条件を満たさない場合、格納検索制御部２４は、次のドキュメントを取り出すために、ループ処理の先頭の処理（Ｓ２４）を行う。 If the search condition is not satisfied ("NO" in S29), that is, if the field value extracted from the store area 28 does not satisfy the search condition, the storage search control unit 24 returns to the first step of the loop (S24) to take out the next document.

検索条件が成立する場合（Ｓ２９で「ＹＥＳ」）、すなわち、ストア領域２８から取り出したフィールドの値が検索条件を満たす場合、格納検索制御部２４は、その検索条件を満たすドキュメントを取り出し（Ｓ３０）、作業領域に保存する。具体的には、格納検索制御部２４は、Ｓ２６で取得したタイプ管理テーブル３２の行の中の配置順のフィールドＩＤの並びを取り出し、フィールド名管理テーブル３０を検索してフィールド名を作業領域に出力する。格納検索制御部２４は、同様に、ストア領域２８からフィールドの値を作業領域に出力する。格納検索制御部２４は、全てのフィールドを出力し終えたら、次のドキュメントを取り出すために、Ｓ２４の処理に戻る。 If the search condition is satisfied ("YES" in S29), that is, if the field value extracted from the store area 28 satisfies the search condition, the storage search control unit 24 extracts the document that satisfies the search condition (S30) and saves it in the work area. Specifically, the storage search control unit 24 takes the sequence of field IDs in arrangement order from the row of the type management table 32 acquired in S26, searches the field name management table 30, and outputs the field names to the work area. Similarly, the storage search control unit 24 outputs the field values from the store area 28 to the work area. When all the fields have been output, the storage search control unit 24 returns to S24 to take out the next document.

格納検索制御部２４は、ストア領域の上位から順に１行ずつ取得し、ストア領域に含まれる行数分、Ｓ２４〜Ｓ３０のループ処理を繰り返す。 The storage search control unit 24 acquires one row at a time from the top of the store area and repeats the loop processing of S24 to S30 for the number of rows included in the store area.

格納検索制御部２４は、出力部２５に、検索結果としてＳ３０で累積的に作業領域に保存したドキュメントを送信する。出力部２５は、要求元に、検索結果を出力する（Ｓ３１）。 The storage search control unit 24 transmits to the output unit 25 the documents stored in the work area in S30 as search results. The output unit 25 outputs the search result to the request source (S31).
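The search loop of S22 to S31 can be sketched as follows. This is an illustration only: the function `search`, the dictionary-based tables, and all sample data are assumptions, and a linear membership test stands in for the binary search or SIMD search mentioned for S27.

```python
def search(store, type_table, field_names, target_name, predicate):
    # S22/S23: resolve the target field name to its field ID F.
    target = next((fid for fid, n in field_names.items() if n == target_name), None)
    if target is None:
        return []                                   # S22 "NO": nothing to retrieve
    results = []
    for type_id, values in store:                   # S24, S25
        arrangement = type_table[type_id]           # S26
        if target not in arrangement:               # S27 "NO": skip this document
            continue
        value = values[arrangement.index(target)]   # S28
        if predicate(value):                        # S29
            # S30: rebuild the document from field names and stored values.
            results.append({field_names[f]: v for f, v in zip(arrangement, values)})
    return results                                  # S31


# Made-up sample data for illustration.
field_names = {1: "name", 3: "age", 5: "gender"}
type_table = {1: (1, 5), 2: (1, 3)}
store = [(1, ["sato", "m"]), (2, ["suzuki", 25]), (2, ["tanaka", 19])]

hits = search(store, type_table, field_names, "age", lambda v: v >= 20)
print(hits)  # [{'name': 'suzuki', 'age': 25}]
```

Documents whose type does not contain the target field are rejected from the type row alone, without reading their values.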

例えば、図６のプログラムＰ１を例にとる。すなわち、格納検索制御部２４は、ストア領域２８から１行目のドキュメントを取り出し、そのドキュメントからタイプＩＤ＝１を取得する。格納検索制御部２４は、タイプ管理テーブル３２から、その取得したタイプＩＤ＝１に対応する行を取得する。 For example, the program P1 in FIG. 6 is taken as an example. That is, the storage search control unit 24 takes out the document on the first line from the store area 28 and acquires type ID = 1 from the document. The storage search control unit 24 acquires a row corresponding to the acquired type ID = 1 from the type management table 32.

ここで、タイプ管理テーブル３２から取得した行に登録されたフィールドＩＤ順は、［１，４，５，７］である。したがって、Ｓ２７において、“ａｇｅ”のフィールドＩＤに３は含まれていないことが分かる。この場合、Ｓ２７で「ＮＯ」へ進み、次のドキュメントについてループ処理が実行される。 Here, the field IDs registered in the row acquired from the type management table 32 are, in field ID order, [1, 4, 5, 7]. Therefore, in S27, it is found that 3, the field ID of "age", is not included. In this case, the process proceeds to "NO" in S27, and the loop processing is executed for the next document.

次に、例えば、図６のプログラムＰ３を例にとる。この場合、格納検索制御部２４は、ストア領域２８から３行目のドキュメントを取り出し、そのドキュメントからタイプＩＤ＝３を取得する。格納検索制御部２４は、タイプ管理テーブル３２から、その取得したタイプＩＤ＝３に対応する行を取得する。 Next, for example, the program P3 in FIG. 6 is taken as an example. In this case, the storage search control unit 24 takes out the document on the third line from the store area 28 and acquires type ID = 3 from the document. The storage search control unit 24 acquires a row corresponding to the acquired type ID = 3 from the type management table 32.

ここで、タイプ管理テーブル３２から取得した行に登録されたフィールドＩＤ順は、［１，３，９，１０］である。Ｓ２７において、“ａｇｅ”のフィールドＩＤに３が含まれていることが分かるので、格納検索制御部２４は、Ｓ２８、Ｓ２９の処理を行う。 Here, the field IDs registered in the row acquired from the type management table 32 are, in field ID order, [1, 3, 9, 10]. In S27, it is found that 3, the field ID of "age", is included, so the storage search control unit 24 performs the processes of S28 and S29.

ここで、タイプ管理テーブル３２から取得した行に登録されたフィールドＩＤの配置順は［１，３，１０，９］である。格納検索制御部２４は、フィールドＩＤの配置順［１，３，１０，９］に基づいて、フィールド名管理テーブル３０を検索して、対応するフィールド名が“ｎａｍｅ”、“ａｇｅ”、“ｓｔａｔｕｓ”、“ｇｒｏｕｐｓ”であることが分かる。格納検索制御部２４は、ストア領域２８から取り出したフィールドの値と合わせて出力する。 Here, the arrangement order of the field IDs registered in the row acquired from the type management table 32 is [1, 3, 10, 9]. The storage search control unit 24 searches the field name management table 30 based on the field ID arrangement order [1, 3, 10, 9], and the corresponding field names are “name”, “age”, “status”. "," Groups ". The storage search control unit 24 outputs it together with the value of the field extracted from the store area 28.

図１６は、本実施形態を用いた場合の指定のフィールド値の取得方法について説明する図である。例えば、検索においてフィールド“ｇｅｎｄｅｒ”が指定されたとする。このとき、図８に示すように、フィールド“ｇｅｎｄｅｒ”のフィールドＩＤは、“５”である。タイプＩＤ＝１のドキュメントの場合には、タイプ管理テーブル３２を参照すると、第３番目の列に、フィールドＩＤは、“５”が格納されている。これに基づいて、ストア領域２８から、タイプＩＤ＝１のドキュメントについて、フィールドＩＤ＝“５”に対応する値“ｍ”が取り出される。その後は、この値について、検索条件との比較がなされる。 FIG. 16 is a diagram for explaining a method for acquiring a designated field value when this embodiment is used. For example, it is assumed that the field “gender” is designated in the search. At this time, as shown in FIG. 8, the field ID of the field “gender” is “5”. In the case of a document of type ID = 1, referring to the type management table 32, the field ID “5” is stored in the third column. Based on this, the value “m” corresponding to the field ID = “5” is extracted from the store area 28 for the document of type ID = 1. Thereafter, this value is compared with the search condition.
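The position-based value fetch described here can be sketched with the ascending (field ID, position) pairs of a type row. The function `fetch_value` is hypothetical; the arrangement [1, 7, 5, 4] and the value "m" for "gender" follow the text, while the other stored values are placeholders.

```python
from bisect import bisect_left

def fetch_value(ascending, stored_values, target_id):
    """ascending: list of (field ID, 0-based position) pairs sorted by field ID."""
    # (target_id,) sorts just before any (target_id, position) pair.
    i = bisect_left(ascending, (target_id,))
    if i == len(ascending) or ascending[i][0] != target_id:
        return None                       # the type does not contain this field
    return stored_values[ascending[i][1]]

# Type ID 1: arrangement order [1, 7, 5, 4], so "gender" (ID 5) is the third column.
ascending = [(1, 0), (4, 3), (5, 2), (7, 1)]
values = ["taro", "2017-06-16", "m", 60]  # placeholders except "m"

print(fetch_value(ascending, values, 5))  # m
print(fetch_value(ascending, values, 3))  # None
```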

このように、ＪＳＯＮのスキーマがタイプＩＤで管理されているので、フィールド名からフィールドＩＤを検索した後は数値の比較だけで検索できる。 As described above, since the JSON schema is managed by the type ID, after searching the field ID from the field name, it can be searched only by comparing numerical values.

このように、ＪＳＯＮを解析して、値を除いたフィールド名及びどのフィールド名を含むかというドキュメントの構造に関する情報（メタ情報）と、フィールドの値とを分離させることができる。そして、検索対象となるフィールドを効率的に検索ができるように、トライ木、プレフィックス木、Ｂ木等の木構造を用いて、メタ情報は管理・保存されている。 In this way, by analyzing the JSON, it is possible to separate the field name excluding the value and information (meta information) regarding the structure of the document indicating which field name is included from the field value. The meta information is managed and stored using a tree structure such as a trie tree, a prefix tree, and a B-tree so that a field to be searched can be efficiently searched.

よって、本実施形態を用いることにより、スキームレス型文書を解析してメタ情報をＩＤで管理すると共に、フィールド値と管理して、高速検索可能なようにスキームレス型文書を格納することができる。 Therefore, by using this embodiment, a schemaless document can be analyzed, its meta information can be managed by IDs separately from its field values, and the schemaless document can be stored in a form that allows high-speed search.

図１７は、本実施形態におけるプログラムを実行するコンピュータのハードウェア構成の一例を示す。情報処理装置は、例えば、コンピュータ４１である。 FIG. 17 shows an example of the hardware configuration of a computer that executes a program according to this embodiment. The information processing apparatus is, for example, a computer 41.

コンピュータ４１は、中央演算装置（ＣＰＵ）４２、メモリ４３、ハードディスクドライブ（ＨＤＤ）４４、ネットワークインターフェースカード（ＮＩＣ）４５、入力インターフェース４６及びビデオインターフェース４７を含む。以下、インターフェースを「Ｉ／Ｆ」と称する。 The computer 41 includes a central processing unit (CPU) 42, a memory 43, a hard disk drive (HDD) 44, a network interface card (NIC) 45, an input interface 46 and a video interface 47. Hereinafter, the interface is referred to as “I / F”.

ＣＰＵ４２、メモリ４３、ＨＤＤ４４、ＮＩＣ４５、入力Ｉ／Ｆ４６及びビデオＩ／Ｆ４７はバス４８によって接続されている。入力Ｉ／Ｆ４６にはキーボードまたはマウス等の入力装置４９が接続されている。ビデオＩ／Ｆ４７には、ディスプレイ５０が接続されている。 The CPU 42, memory 43, HDD 44, NIC 45, input I / F 46 and video I / F 47 are connected by a bus 48. An input device 49 such as a keyboard or a mouse is connected to the input I / F 46. A display 50 is connected to the video I / F 47.

ＣＰＵ４２は、プロセッサの一例であって、コンピュータ４１の全体動作を制御する中央演算装置である。メモリ４３は、ワーキングエリアとして機能する。 The CPU 42 is an example of a processor, and is a central processing unit that controls the overall operation of the computer 41. The memory 43 functions as a working area.

ＨＤＤ４４は、オペレーティングシステム（ＯＳ）や本実施形態に係るプログラムが格納されている大容量記憶装置である。ＨＤＤ４４は、ストア領域２８を含み、フィールドＩＤ管理ツリー２９、フィールド名管理テーブル３０、タイプＩＤ管理ツリー３１、及びタイプ管理テーブル３２を格納する。ＨＤＤ４４は、大容量記憶装置の一例であって、これに限定されず、例えばＳＳＤ（ＳｏｌｉｄＳｔａｔｅＤｒｉｖｅ）であってもよい。 The HDD 44 is a mass storage device that stores an operating system (OS) and a program according to the present embodiment. The HDD 44 includes a store area 28 and stores a field ID management tree 29, a field name management table 30, a type ID management tree 31, and a type management table 32. The HDD 44 is an example of a large-capacity storage device, and is not limited thereto. For example, the HDD 44 may be an SSD (Solid State Drive).

ＮＩＣ４５は、インターネット、ローカルエリアネットワーク（ＬＡＮ）等の通信ネットワークと有線または無線で接続するためのインターフェースである。 The NIC 45 is an interface for connecting to a communication network such as the Internet or a local area network (LAN) by wire or wireless.

入力Ｉ／Ｆ４６は、入力装置４９から入力された指令をＣＰＵ４２へ伝達するインターフェースである。ビデオＩ／Ｆ４７は、ディスプレイ５０に画像を出力するインターフェースである。 The input I/F 46 is an interface that transmits a command input from the input device 49 to the CPU 42. The video I/F 47 is an interface that outputs an image to the display 50.

ＣＰＵ４２は、本実施形態に係るプログラム、すなわちＨＤＤ４４に格納されているドキュメントＤＢに対してドキュメントの格納、検索等を行うプログラムを読み出し、実行する。これにより、ＣＰＵ４２は、取得部２３、格納検索制御部２４、出力部２５として機能する。 The CPU 42 reads and executes a program according to the present embodiment, that is, a program for storing and searching a document with respect to the document DB stored in the HDD 44. Thereby, the CPU 42 functions as the acquisition unit 23, the storage search control unit 24, and the output unit 25.

上記実施形態で説明した処理を実現するプログラムは、プログラム提供者側から通信ネットワーク及びＮＩＣ４５を介して、例えば記憶装置ＨＤＤに格納されてもよい。また、上記実施形態で説明した処理を実現するプログラムは、市販され、流通している可搬型記憶媒体に格納されていてもよい。この場合、この可搬型記憶媒体は外付け又は内蔵の読取装置にセットされて、ＣＰＵ４２によってそのプログラムが読み出されて、実行されてもよい。可搬型記憶媒体としてはＣＤ−ＲＯＭ、ＤＶＤ−ＲＯＭ、フレキシブルディスク、光ディスク、光磁気ディスク、ＩＣカード、ＵＳＢメモリ装置など様々な形式の記憶媒体を使用することができる。このような記憶媒体に格納されたプログラムが読取装置によって読み取られる。 The program for realizing the processing described in the above embodiment may be provided by the program provider via the communication network and the NIC 45 and stored in, for example, the HDD 44. The program may also be stored in a commercially distributed portable storage medium. In this case, the portable storage medium may be set in an external or built-in reader, and the program may be read out and executed by the CPU 42. As the portable storage medium, various types of storage media such as a CD-ROM, a DVD-ROM, a flexible disk, an optical disk, a magneto-optical disk, an IC card, and a USB memory device can be used. A program stored in such a storage medium is read by the reader.

本実施形態によれば、フィールド名を整数値となるフィールドＩＤにマッピングさせ、既にフィールドＩＤ順に並べられたフィールドＩＤ配列から検索することにより、検索処理の高速化を実現することができる。 According to the present embodiment, the field name is mapped to a field ID that is an integer value, and a search is performed from a field ID array that has already been arranged in the field ID order, so that the search process can be speeded up.

また、本実施形態によれば、新しいＪＳＯＮのスキーマが出現しないパターンでの高速化が極力高速に可能なようにしている。例えば、挿入対象のＪＳＯＮを先頭から最後へ一度だけパースすれば、ストア領域に格納するデータが作成可能となっている。これにより、挿入処理のオーバーヘッド削減を実現している。 In addition, according to the present embodiment, insertion is made as fast as possible in the case where no new JSON schema appears. For example, the data to be stored in the store area can be created by parsing the JSON to be inserted only once, from beginning to end. This reduces the overhead of insertion processing.

なお、本発明は、以上に述べた実施の形態に限定されるものではなく、本発明の要旨を逸脱しない範囲内で種々の構成または実施形態を取ることができる。 The present invention is not limited to the above-described embodiment, and various configurations or embodiments can be taken without departing from the gist of the present invention.

以上の実施の形態に関し、更に以下の付記を開示する。
（付記１）
データ型が決められてなく、かつフィールド名と該フィールド名に対応する値とを含むドキュメントを取得する取得部と、
前記ドキュメントに記述されたフィールド名を特定するフィールド名特定情報を、前記ドキュメントにおける該フィールド名の記述順に保持する保持部と、
前記記述順と、該記述順に応じて特定される前記ドキュメントのタイプ情報とを関係付けた記述順タイプ関係情報を生成する生成部と、
前記タイプ情報と、前記記述順に保持された前記フィールド名特定情報に対応する前記フィールド名に対応する前記値とを関係づけたタイプ‐値関係情報を格納領域に格納する格納制御部と、
を備えることを特徴とする情報処理装置。
（付記２）
前記保持部は、第１木構造を用いて前記フィールド名を保持した第１木構造情報に基づいて、該フィールド名から、該フィールド名に対応する前記フィールド名特定情報を取得し、該フィールド名特定情報を、前記ドキュメントにおける該フィールド名の記述順に保持し、
前記格納制御部は、第２木構造を用いて前記フィールド名特定情報を前記記述順に保持した第２木構造情報に基づいて、該フィールド名特定情報の該記述順から、該記述順に応じて特定される前記ドキュメントの前記タイプ情報を取得し、前記記述順と該タイプ情報とを関係付けたタイプ‐値関係情報を格納領域に格納する
ことを特徴とする付記１に記載の情報処理装置。
（付記３）
前記情報処理装置は、さらに、
前記取得部により検索条件を取得する場合、前記タイプ情報に基づいて、前記記述順タイプ関係情報と前記タイプ‐値関係情報とを関係付け、関係づけた前記記述順タイプ関係情報と前記タイプ‐値関係情報から該検索条件に含まれるフィールド名に対応する値が該条件を満たすか否かに基づいて、検索を行う検索部
を備えることを特徴とする付記１又は２に記載の情報処理装置。
（付記４）
コンピュータが、
データ型が決められてなく、かつフィールド名と該フィールド名に対応する値とを含むドキュメントを取得し、
前記ドキュメントに記述されたフィールド名を特定するフィールド名特定情報を、前記ドキュメントにおける該フィールド名の記述順に保持し、
前記記述順と、該記述順に応じて特定される前記ドキュメントのタイプ情報とを関係付けた記述順タイプ関係情報を生成し、
前記タイプ情報と、前記記述順に保持された前記フィールド名特定情報に対応する前記フィールド名に対応する前記値とを関係づけたタイプ‐値関係情報を格納領域に格納する、
ことを特徴とする情報処理方法。
（付記５）
前記保持において、第１木構造を用いて前記フィールド名を保持した第１木構造情報に基づいて、該フィールド名から、該フィールド名に対応する前記フィールド名特定情報を取得し、該フィールド名特定情報を、前記ドキュメントにおける該フィールド名の記述順に保持し、
前記格納において、第２木構造を用いて前記フィールド名特定情報を前記記述順に保持した第２木構造情報に基づいて、該フィールド名特定情報の該記述順から、該記述順に応じて特定される前記ドキュメントの前記タイプ情報を取得し、前記記述順と該タイプ情報とを関係付けたタイプ‐値関係情報を格納領域に格納する
ことを特徴とする付記４に記載の情報処理方法。
（付記６）
前記情報処理方法は、さらに、
前記検索条件を取得する場合、前記タイプ情報に基づいて、前記記述順タイプ関係情報と前記タイプ‐値関係情報とを関係付け、関係づけた前記記述順タイプ関係情報と前記タイプ‐値関係情報から該検索条件に含まれるフィールド名に対応する値が該条件を満たすか否かに基づいて、検索を行う
ことを特徴とする付記４又は５に記載の情報処理方法。
（付記７）
コンピュータに、
データ型が決められてなく、かつフィールド名と該フィールド名に対応する値とを含むドキュメントを取得し、
前記ドキュメントに記述されたフィールド名を特定するフィールド名特定情報を、前記ドキュメントにおける該フィールド名の記述順に保持し、
前記記述順と、該記述順に応じて特定される前記ドキュメントのタイプ情報とを関係付けた記述順タイプ関係情報を生成し、
前記タイプ情報と、前記記述順に保持された前記フィールド名特定情報に対応する前記フィールド名に対応する前記値とを関係づけたタイプ‐値関係情報を格納領域に格納する、
処理を実行させるプログラム。
（付記８）
前記保持において、第１木構造を用いて前記フィールド名を保持した第１木構造情報に基づいて、該フィールド名から、該フィールド名に対応する前記フィールド名特定情報を取得し、該フィールド名特定情報を、前記ドキュメントにおける該フィールド名の記述順に保持し、
前記格納において、第２木構造を用いて前記フィールド名特定情報を前記記述順に保持した第２木構造情報に基づいて、該フィールド名特定情報の該記述順から、該記述順に応じて特定される前記ドキュメントの前記タイプ情報を取得し、前記記述順と該タイプ情報とを関係付けたタイプ‐値関係情報を格納領域に格納する
ことを特徴とする付記７に記載のプログラム。
（付記９）
前記プログラムは、さらに、
前記検索条件を取得する場合、前記タイプ情報に基づいて、前記記述順タイプ関係情報と前記タイプ‐値関係情報とを関係付け、関係づけた前記記述順タイプ関係情報と前記タイプ‐値関係情報から該検索条件に含まれるフィールド名に対応する値が該条件を満たすか否かに基づいて、検索を行う
ことを特徴とする付記７又は８に記載のプログラム。 Regarding the above embodiment, the following additional notes are disclosed.
(Appendix 1)
An acquisition unit that acquires a document whose data type is not determined and includes a field name and a value corresponding to the field name;
A holding unit for holding field name specifying information for specifying a field name described in the document in the order of description of the field name in the document;
A generating unit that generates description order type relationship information that associates the description order with the type information of the document specified according to the description order;
A storage control unit for storing, in a storage area, type-value relationship information that associates the type information with the value corresponding to the field name corresponding to the field name specifying information held in the description order;
An information processing apparatus comprising:
(Appendix 2)
The holding unit obtains the field name specifying information corresponding to the field name from the field name based on the first tree structure information holding the field name using the first tree structure, and the field name Holding specific information in the order of description of the field names in the document;
The storage control unit specifies the field name specifying information according to the description order from the description order of the field name specifying information based on the second tree structure information in which the field name specifying information is held in the description order using the second tree structure. The information processing apparatus according to appendix 1, wherein the type information of the document to be acquired is acquired, and type-value relationship information in which the description order and the type information are related is stored in a storage area.
(Appendix 3)
The information processing apparatus according to appendix 1 or 2, further comprising:
a search unit that, when a search condition is acquired by the acquisition unit, associates the description order type relationship information with the type-value relationship information based on the type information, and performs a search based on whether a value corresponding to a field name included in the search condition, obtained from the associated description order type relationship information and type-value relationship information, satisfies the condition.
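The search of Appendix 3 can be sketched as a join on the type ID, as in the following Python example. The sample data and the names `field_ids`, `type_table`, `store`, and `search` are hypothetical and are not taken from the publication; the sketch only shows how the position of a field within each type might be resolved once and then used to test the stored values.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical sample data for illustration.
field_ids: Dict[str, int] = {"name": 1, "age": 2, "city": 3}
# Description order type relationship: type ID -> field IDs in description order.
type_table: Dict[int, Tuple[int, ...]] = {1: (1, 2), 2: (2, 1, 3)}
# Type-value relationship: (type ID, values in the same order as the field IDs).
store: List[Tuple[int, List[Any]]] = [
    (1, ["Alice", 30]),
    (2, [25, "Bob", "Osaka"]),
    (1, ["Carol", 41]),
]


def search(field_name: str, predicate: Callable[[Any], bool]) -> List[Dict[str, Any]]:
    """Return documents whose value for field_name satisfies predicate."""
    field_id = field_ids.get(field_name)
    if field_id is None:
        return []  # unknown field name: no document can match
    id_to_name = {fid: name for name, fid in field_ids.items()}
    # Resolve the field's position once per type rather than once per record.
    positions = {
        type_id: seq.index(field_id)
        for type_id, seq in type_table.items()
        if field_id in seq
    }
    hits: List[Dict[str, Any]] = []
    for type_id, values in store:
        pos = positions.get(type_id)
        if pos is not None and predicate(values[pos]):
            # Rebuild the document by joining the record with its type.
            names = [id_to_name[fid] for fid in type_table[type_id]]
            hits.append(dict(zip(names, values)))
    return hits


adults = search("age", lambda v: v >= 30)
in_osaka = search("city", lambda v: v == "Osaka")
```

Records whose type does not contain the searched field are skipped without inspecting their values, which is the kind of shortcut that grouping documents by type makes possible.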
(Appendix 4)
An information processing method in which a computer:
acquires a document for which no data type is defined and which includes a field name and a value corresponding to the field name;
holds field name specifying information, which specifies each field name described in the document, in the order in which the field names are described in the document;
generates description order type relationship information associating the description order with type information of the document specified according to the description order; and
stores, in a storage area, type-value relationship information associating the type information with the values corresponding to the field names that correspond to the field name specifying information held in the description order.
(Appendix 5)
The information processing method according to appendix 4, wherein
in the holding, the field name specifying information corresponding to each field name is acquired from that field name based on first tree structure information in which the field names are held using a first tree structure, and the field name specifying information is held in the order in which the field names are described in the document, and
in the storing, the type information of the document specified according to the description order is acquired from the description order of the field name specifying information based on second tree structure information in which the field name specifying information is held in the description order using a second tree structure, and type-value relationship information associating the description order with the type information is stored in the storage area.
(Appendix 6)
The information processing method according to appendix 4 or 5, further comprising:
when a search condition is acquired, associating the description order type relationship information with the type-value relationship information based on the type information, and performing a search based on whether a value corresponding to a field name included in the search condition, obtained from the associated description order type relationship information and type-value relationship information, satisfies the condition.
(Appendix 7)
A program that causes a computer to execute processing comprising:
acquiring a document for which no data type is defined and which includes a field name and a value corresponding to the field name;
holding field name specifying information, which specifies each field name described in the document, in the order in which the field names are described in the document;
generating description order type relationship information associating the description order with type information of the document specified according to the description order; and
storing, in a storage area, type-value relationship information associating the type information with the values corresponding to the field names that correspond to the field name specifying information held in the description order.
(Appendix 8)
The program according to appendix 7, wherein
in the holding, the field name specifying information corresponding to each field name is acquired from that field name based on first tree structure information in which the field names are held using a first tree structure, and the field name specifying information is held in the order in which the field names are described in the document, and
in the storing, the type information of the document specified according to the description order is acquired from the description order of the field name specifying information based on second tree structure information in which the field name specifying information is held in the description order using a second tree structure, and type-value relationship information associating the description order with the type information is stored in the storage area.
(Appendix 9)
The program according to appendix 7 or 8, the processing further comprising:
when a search condition is acquired, associating the description order type relationship information with the type-value relationship information based on the type information, and performing a search based on whether a value corresponding to a field name included in the search condition, obtained from the associated description order type relationship information and type-value relationship information, satisfies the condition.

DESCRIPTION OF SYMBOLS
11 Information processing apparatus
12 Acquisition unit
13 Holding unit
14 Generation unit
15 Storage control unit
16 Search unit
21 Information processing apparatus
22 Control unit
23 Acquisition unit
24 Storage search control unit
25 Output unit
27 Storage unit
28 Store area
29 Field ID management tree
30 Field name management table
31 Type ID management tree
32 Type management table

Claims

An information processing apparatus comprising:
an acquisition unit that acquires a document for which no data type is defined and which includes a field name and a value corresponding to the field name;
a holding unit that holds field name specifying information, which specifies each field name described in the document, in the order in which the field names are described in the document;
a generation unit that generates description order type relationship information associating the description order with type information of the document specified according to the description order; and
a storage control unit that stores, in a storage area, type-value relationship information associating the type information with the values corresponding to the field names that correspond to the field name specifying information held in the description order.

The information processing apparatus according to claim 1, wherein
the holding unit acquires, from each field name, the field name specifying information corresponding to that field name based on first tree structure information in which the field names are held using a first tree structure, and holds the field name specifying information in the order in which the field names are described in the document, and
the storage control unit acquires, from the description order of the field name specifying information, the type information of the document specified according to the description order based on second tree structure information in which the field name specifying information is held in the description order using a second tree structure, and stores, in the storage area, type-value relationship information associating the description order with the type information.

The information processing apparatus according to claim 1 or 2, further comprising:
a search unit that, when a search condition is acquired by the acquisition unit, associates the description order type relationship information with the type-value relationship information based on the type information, and performs a search based on whether a value corresponding to a field name included in the search condition, obtained from the associated description order type relationship information and type-value relationship information, satisfies the condition.

An information processing method in which a computer:
acquires a document for which no data type is defined and which includes a field name and a value corresponding to the field name;
holds field name specifying information, which specifies each field name described in the document, in the order in which the field names are described in the document;
generates description order type relationship information associating the description order with type information of the document specified according to the description order; and
stores, in a storage area, type-value relationship information associating the type information with the values corresponding to the field names that correspond to the field name specifying information held in the description order.

A control program that causes a computer to execute processing comprising:
acquiring a document for which no data type is defined and which includes a field name and a value corresponding to the field name;
holding field name specifying information, which specifies each field name described in the document, in the order in which the field names are described in the document;
generating description order type relationship information associating the description order with type information of the document specified according to the description order; and
storing, in a storage area, type-value relationship information associating the type information with the values corresponding to the field names that correspond to the field name specifying information held in the description order.