JP2009129202A

JP2009129202A - Data processor, data processing method, and program

Info

Publication number: JP2009129202A
Application number: JP2007303720A
Authority: JP
Inventors: Kohei Takeda; 光平武田; Yasuo Sanbe; 靖夫三部
Original assignee: NTT Data Corp
Current assignee: NTT Data Group Corp
Priority date: 2007-11-22
Filing date: 2007-11-22
Publication date: 2009-06-11
Anticipated expiration: 2027-11-22
Also published as: JP5162215B2

Abstract

PROBLEM TO BE SOLVED: To process unstructured data faster than conventional techniques. SOLUTION: A query analysis part 21 analyzes an input query whose processing target is unstructured data, and acquires information indicating a "processing target data extraction condition", a "processing target data type", and a "processing type". A structured data/index creation part 23 extracts data from the unstructured data according to the "processing target data extraction condition" and creates structured data by converting the extracted data to the "processing target data type". It then creates, for the values of the created structured data, index data of the type best suited to the "processing target data type" and the "processing type". A data processing part 24 speeds up the processing specified by the query by referring to this index data when executing it. COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a data processing apparatus, a data processing method, and a program.

To speed up processing such as searching, sorting, and comparison on structured data, it is common to assign, to specific data items that make up the data, an index suited to the type of the data item and to the processing to be performed.
For example, Patent Documents 1 and 2 disclose techniques that analyze the history of queries issued against structured data stored in a relational database and automatically assign indexes to the data items that make up the structured data.

Some data, such as office documents and log data, has no clear structure (unstructured data). Unstructured data consists of free-form text, binary data, and the like, and is not divided into data items, so it cannot be indexed to speed up processing.

Patent Document 3 discloses a technique for converting an unstructured document (unstructured data) into a structured document (structured data) having data items, based on predetermined rules created in advance.
Patent Document 1: JP 11-053401 A
Patent Document 2: JP 10-111819 A
Patent Document 3: JP 9-069101 A

However, with the technique described in Patent Document 3, the conversion rules must be defined for each format of the target structured document, which places a heavy burden on the user.

Another approach is to convert unstructured data into text data, analyze it (for example, by morphological analysis), and assign a full-text search index so that processing on the unstructured data can be sped up. However, a full-text search index is designed to speed up processing on characters and character strings, so it cannot speed up processing such as numerical comparison or aggregation.

The present invention has been made in view of the above circumstances, and an object of the present invention is to provide a data processing apparatus and the like that can process unstructured data faster than before.
Another object of the present invention is to provide a data processing apparatus and the like that can reduce the burden on the user when structuring unstructured data.
A further object of the present invention is to provide a data processing apparatus and the like that can assign to data an index suited to the content of the processing.

To achieve the above objects, a data processing apparatus according to a first aspect of the present invention comprises:
unstructured data storage means for storing unstructured data;
query receiving means for receiving a query whose processing target is the unstructured data;
query information acquisition means for analyzing the query received by the query receiving means and acquiring extraction condition information indicating a condition for extracting the unstructured data to be processed by the query, type information indicating the data type handled by the processing of the query, and processing type information indicating the type of processing performed by the query;
structured data creation means for extracting, from the unstructured data, data that matches the condition indicated by the extraction condition information, and creating structured data by converting the extracted data into the data type indicated by the type information;
structured data storage means for storing the structured data created by the structured data creation means;
index data creation means for creating index data for the structured data stored in the structured data storage means, based on the type information and the processing type information;
index data storage means for storing the index data created by the index data creation means; and
query execution means for executing the processing of the query received by the query receiving means, using the structured data stored in the structured data storage means as the processing target data and referring to the index data stored in the index data storage means.

The index data creation means may identify an index type based on the type information and the processing type information, and create index data of the identified index type.

To achieve the above objects, a data processing method according to a second aspect of the present invention is a data processing method for processing data using a computer, comprising the steps of:
receiving a query whose processing target is unstructured data;
analyzing the received query and acquiring extraction condition information indicating a condition for extracting the unstructured data to be processed by the query, type information indicating the data type handled by the processing of the query, and processing type information indicating the type of processing performed by the query;
extracting, from the unstructured data, data that matches the condition indicated by the extraction condition information, and creating structured data by converting the extracted data into the data type indicated by the type information;
creating index data for the created structured data, based on the type information and the processing type information; and
executing the processing of the received query, using the created structured data as the processing target data and referring to the created index data.

To achieve the above objects, a program according to a third aspect of the present invention causes a computer to function as:
unstructured data storage means for storing unstructured data;
query receiving means for receiving a query whose processing target is the unstructured data;
query information acquisition means for analyzing the query received by the query receiving means and acquiring extraction condition information indicating a condition for extracting the unstructured data to be processed by the query, type information indicating the data type handled by the processing of the query, and processing type information indicating the type of processing performed by the query;
structured data creation means for extracting, from the unstructured data, data that matches the condition indicated by the extraction condition information, and creating structured data by converting the extracted data into the data type indicated by the type information;
structured data storage means for storing the structured data created by the structured data creation means;
index data creation means for creating index data for the structured data stored in the structured data storage means, based on the type information and the processing type information;
index data storage means for storing the index data created by the index data creation means; and
query execution means for executing the processing of the query received by the query receiving means, using the structured data stored in the structured data storage means as the processing target data and referring to the index data stored in the index data storage means.

According to the present invention, unstructured data is structured based on the result of analyzing a query, and index data best suited to the processing of the query is created for the values of the structured data. Unstructured data can therefore be processed faster than before.

A data processing apparatus 1 according to an embodiment of the present invention is described below with reference to the drawings.
As shown in FIG. 1, the data processing apparatus 1 includes an input unit 11, an output unit 12, a storage unit 13, and a control unit 14, which are connected to one another via a bus 15.

The input unit 11 consists of a keyboard, a mouse, and the like, and is used to enter various information and instructions into the data processing apparatus 1. For example, the user operates the input unit 11 to enter a processing request such as a data search (hereinafter referred to as a query).

The output unit 12 consists of a display device such as a monitor and outputs various information. For example, the output unit 12 displays a screen showing the result of executing the processing specified by the query entered by the user.

The storage unit 13 consists of a hard disk or the like and stores various information, fixed data, programs for operating the control unit 14, and so on.
As shown in FIG. 2, the storage unit 13 also includes an unstructured data storage unit 131, a structured data storage unit 132, an optimum index selection table 133, and an index storage unit 134.

The unstructured data storage unit 131 stores unstructured data (word processor documents, spreadsheet documents, log data, and the like) and an unstructured data management table 131a that stores attribute information of the unstructured data.
As shown in FIG. 3, the unstructured data management table 131a stores, for each piece of unstructured data, an unstructured data ID, the name (file name) of the unstructured data, and information indicating where the unstructured data is stored.

Returning to FIG. 2, the structured data storage unit 132 stores the values of data structured by the query processing described later (structured data), together with a structured data management table 132a that stores attribute information of the structured data.
As shown in FIG. 4, the structured data management table 132a stores, for each piece of structured data, a structured data ID, the extraction condition for that structured data, and an extraction type. The values obtained by extracting data from the unstructured data under the extraction condition and converting it to the data type indicated by the extraction type are stored in the structured data storage unit 132 as the values of the corresponding structured data.
For example, the first entry of the structured data management table 132a shown in FIG. 4 indicates that the structured data with structured data ID "1" holds, as its values, the data located at the fifth tab position on the third line of the unstructured data "purchase order", converted to the numeric type.

The values of structured data are stored per entry of the structured data management table 132a, each with a structured value ID assigned. FIG. 5 shows the values of the structured data indicated by the first entry of the management table shown in FIG. 4 (the structured data whose structured data ID is 1). This structured data has three values, "10776", "6657", and "77879", each of which is assigned its own structured value ID.
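As an illustration only, the management-table entry of FIG. 4 and the values of FIG. 5 could be modeled in memory as follows. The class name, field names, and the structured value IDs 1 to 3 are assumptions made for this sketch; the text does not give the actual IDs or any storage format.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StructuredData:
    """One entry of the structured data management table plus its values."""
    structured_data_id: int
    extraction_condition: str  # where the data comes from in the unstructured data
    extraction_type: str       # data type the extracted data is converted to
    values: Dict[int, int] = field(default_factory=dict)  # structured value ID -> value


# First entry of FIG. 4 with the three values of FIG. 5.
# The structured value IDs 1..3 are assumed; the text does not state them.
purchase_order_field = StructuredData(
    structured_data_id=1,
    extraction_condition="purchase order: line 3, tab position 5",
    extraction_type="numeric",
)
for value_id, value in enumerate([10776, 6657, 77879], start=1):
    purchase_order_field.values[value_id] = value
```

Keeping the values keyed by structured value ID lets an index refer to a value by ID instead of copying it.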

Returning to FIG. 2, the optimum index selection table 133 is a table that is consulted to select the index type best suited to processing the processing target data.
As shown in FIG. 6, the optimum index selection table 133 stores, for each combination of the data type of the processing target data and the processing type, information indicating the optimum index type.
For example, the first entry of the optimum index selection table 133 shown in FIG. 6 indicates that the optimum index type for performing an "exact-match comparison" on "numeric" data is "Hash Table".
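A lookup of this kind can be sketched as a dictionary keyed by the (data type, processing type) pair. Only the first entry below is taken from the description of FIG. 6; the second entry and all identifier names are assumptions added for illustration.

```python
from typing import Optional

# (processing target data type, processing type) -> optimum index type
OPTIMUM_INDEX_TABLE = {
    ("numeric", "exact_match"): "Hash Table",       # stated for FIG. 6
    ("numeric", "magnitude_comparison"): "B-Tree",  # assumption, not stated in the text
}


def select_index_type(data_type: str, processing_type: str) -> Optional[str]:
    """Return the optimum index type, or None if the pair is not registered."""
    return OPTIMUM_INDEX_TABLE.get((data_type, processing_type))
```

Returning `None` for an unregistered pair is a choice made for this sketch; the text does not describe what happens when no entry matches.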

Returning to FIG. 2, the index storage unit 134 stores index data and an index management table 134a that stores attribute information of the index data.

Index data is data that is referred to in order to execute processing on the processing target data at high speed, and its structure differs for each index type.
FIG. 7 shows an example of the structure of index data whose index type is "Hash Table". This index data is organized as a table in which a hash value is calculated by a predetermined formula for each value of the structured data to be processed (structured value), and the IDs of the structured values are grouped by the calculated hash value. To search for structured values equal to a given search value, the hash value of the search value is computed first, the IDs of the structured values with that hash value are obtained from the index data, and only the structured values with those IDs are compared with the search value. Because the index data narrows down the structured values that have to be examined, the search is faster.
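The two-stage lookup described above (hash first, then compare only the candidates) might look like the following sketch. The "predetermined formula" is not given in the text, so Python's built-in `hash` reduced modulo an assumed bucket count stands in for it; the function names are hypothetical.

```python
from collections import defaultdict

NUM_BUCKETS = 8  # assumed bucket count; the text does not specify the hash formula


def build_hash_index(values):
    """Group structured value IDs by the hash of their value.

    values: dict mapping structured value ID -> structured value.
    """
    index = defaultdict(list)
    for value_id, value in values.items():
        index[hash(value) % NUM_BUCKETS].append(value_id)
    return index


def lookup(index, values, search_value):
    """Return the IDs of structured values equal to search_value."""
    # Stage 1: the index narrows the search to values sharing the hash.
    candidates = index.get(hash(search_value) % NUM_BUCKETS, [])
    # Stage 2: compare only those candidates, since different values can collide.
    return [vid for vid in candidates if values[vid] == search_value]
```

For example, with the three values of FIG. 5 stored under assumed IDs 1 to 3, `lookup(index, values, 6657)` inspects a single bucket rather than every value.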

As shown in FIG. 8, the index management table 134a stores, for each piece of index data, the ID of the structured data from which the index data was created, the index type, and information indicating where the index data is stored, in association with one another.
For example, the first entry of the index management table 134a shown in FIG. 8 indicates that, for the structured data associated with index ID "1", index data of index type "B-Tree" has been created at "/index/dat1" in the index storage unit 134.

Returning to FIG. 1, the control unit 14 performs arithmetic processing on data and controls the input unit 11, the output unit 12, and the storage unit 13 via the bus 15. It includes a CPU (Central Processing Unit) 141, a ROM (Read Only Memory) 142, a RAM (Random Access Memory) 143, and the like. Specifically, the arithmetic and control processing in the control unit 14 is performed by the CPU 141 executing a control program stored in the ROM 142 while using the RAM 143 as a work area to temporarily store various data.
The control unit 14 also carries out the processing of the data processing apparatus 1 described later by controlling the above units in accordance with the control programs stored in the ROM 142 and the storage unit 13.

Functionally, as shown in FIG. 9, the data processing apparatus 1 includes a query analysis unit 21, a structured data/index search unit 22, a structured data/index creation unit 23, and a data processing unit 24. Each of these components is realized by the control unit 14 shown in FIG. 1 controlling the input unit 11, the output unit 12, or the storage unit 13, also shown in FIG. 1.

The query analysis unit 21 receives the query entered by the user, analyzes it, and acquires information indicating the "processing target data extraction condition", the "processing target data type", and the "processing type".

The "processing target data extraction condition" is information indicating the condition for extracting the data to be processed by the processing that the query specifies.
The "processing target data type" is information indicating the data type into which the processing target data extracted under the processing target data extraction condition is converted. A list of the candidate categories for the "processing target data type" (numeric type, vector type, character string type, binary type, and so on) is stored in the storage unit 13 in advance.
The "processing type" is information indicating the type of processing performed on the processing target data that has been extracted under the processing target data extraction condition and converted to the processing target data type. A list of the candidate categories for the "processing type" (exact-match comparison, prefix-match comparison, partial-match comparison, suffix-match comparison, magnitude comparison, and so on) is stored in the storage unit 13 in advance.

The "processing target data extraction condition", the "processing target data type", and the "processing type" are now explained with a concrete example. Suppose that the unstructured data storage unit 131 stores "purchase orders", which are multiple text files and thus unstructured data, and that the user enters a query specifying the processing "extract the item at the fifth tab position on the third line of each purchase order, convert it to a number, and obtain the largest value".
In this case, the "processing target data extraction condition" is "the item at the fifth tab position on the third line of the purchase order", the "processing target data type" is "numeric type", and the "processing type" is "magnitude comparison".
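The result of the analysis for this example can be pictured as a record with three fields. The text does not define a query syntax, so the dictionary-shaped query, its keys, and the operation-to-processing-type mapping below are all hypothetical and serve only to show what the query analysis unit 21 hands to the later steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryInfo:
    extraction_condition: str  # "processing target data extraction condition"
    data_type: str             # "processing target data type"
    processing_type: str       # "processing type"


# Hypothetical mapping from query operations to processing types.
OPERATION_TO_PROCESSING_TYPE = {
    "max": "magnitude_comparison",
    "min": "magnitude_comparison",
    "equals": "exact_match",
    "starts_with": "prefix_match",
}


def analyze_query(query: dict) -> QueryInfo:
    """Derive the three pieces of information from a (hypothetical) query dict."""
    return QueryInfo(
        extraction_condition=query["target"],
        data_type=query["convert_to"],
        processing_type=OPERATION_TO_PROCESSING_TYPE[query["operation"]],
    )


info = analyze_query({
    "target": "purchase order: line 3, tab position 5",
    "convert_to": "numeric",
    "operation": "max",
})
```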

The structured data/index search unit 22 determines whether the data to be processed by the query analyzed by the query analysis unit 21 has already been structured and stored in the structured data storage unit 132.
If it determines that the processing target data is stored, the structured data/index search unit 22 refers to the optimum index selection table 133 and the index storage unit 134 to determine whether the optimum index has been created for that processing target data.

When the structured data/index search unit 22 determines that the processing target data is not stored in the structured data storage unit 132, the structured data/index creation unit 23 extracts the processing target data from the unstructured data storage unit 131, structures the extracted data, and stores it in the structured data storage unit 132. It then creates the optimum index for the structured processing target data by referring to the optimum index selection table 133, and stores the index in the index storage unit 134.
When the structured data/index search unit 22 determines that the processing target data is stored in the structured data storage unit 132 but the optimum index has not been created for it, the structured data/index creation unit 23 creates the optimum index by referring to the optimum index selection table 133 and stores it in the index storage unit 134.

The data processing unit 24 executes the query. At this point, index data best suited to the processing has been stored in the index storage unit 134 for the data to be processed by the query, and the data processing unit 24 refers to this index data to execute the processing at high speed.
The data processing unit 24 then causes the output unit 12 to display the processing result.

Next, the operation of the processing in which the data processing apparatus 1 executes a query entered by the user (query processing) is described.

When the user enters, through the input unit 11, a query for processing unstructured data stored in the data processing apparatus 1 and the input is sent to the control unit 14, the query processing shown in FIG. 10 starts.

First, the query analysis unit 21 analyzes the entered query and acquires the "processing target data extraction condition", the "processing target data type", and the "processing type" (step S10).

Next, the structured data/index search unit 22 determines whether the pair of the "processing target data extraction condition" and the "processing target data type" acquired in step S10 is registered in the structured data management table 132a (step S20).

If it is determined that the pair is not registered in the structured data management table 132a (step S20; No), the data to be processed by the query (processing target data) has not yet been structured, and the structured data/index creation unit 23 performs a structuring process that structures the processing target data (step S30).

FIG. 11 shows the details of the structuring process (step S30).
First, the structured data/index creation unit 23 issues a new structured data ID and adds one entry to the structured data management table 132a (step S31). The "processing target data extraction condition" and the "processing target data type" of the added entry are those acquired in step S10.

Next, the structured data/index creation unit 23 extracts, from the unstructured data storage unit 131, the values of the unstructured data that match the "processing target data extraction condition" acquired in step S10 (step S32).

Next, the structured data/index creation unit 23 converts the values of the unstructured data extracted in step S32 into the type indicated by the "processing target data type" acquired in step S10 (step S33).

Then, the structured data/index creation unit 23 assigns a structured value ID to each converted value and stores the values in the structured data storage unit 132 as the values of the structured data indicated by the entry added to the structured data management table 132a in step S31 (step S34).
This completes the structuring process, and processing proceeds to step S60 in FIG. 10.
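Steps S32 to S34 can be sketched for the purchase-order example as follows. The sketch assumes that "tab position 5" means the fifth tab-separated field of a line and that documents are available as in-memory strings; both are assumptions, as are the function names and the converter table.

```python
def extract_field(text, line_no, tab_pos):
    """Step S32: return the tab-separated field at (line_no, tab_pos), 1-based."""
    lines = text.splitlines()
    if len(lines) < line_no:
        return None
    fields = lines[line_no - 1].split("\t")
    return fields[tab_pos - 1] if len(fields) >= tab_pos else None


# Assumed converters for two of the candidate data types named in the text.
CONVERTERS = {"numeric": float, "string": str}


def structure(documents, line_no, tab_pos, data_type):
    """Steps S32 to S34: extract, convert, and number the structured values.

    Returns a dict mapping structured value ID -> converted value.
    Documents that lack the requested field are skipped; a field that cannot
    be converted raises ValueError in this sketch.
    """
    convert = CONVERTERS[data_type]
    values = {}
    next_id = 1
    for text in documents:
        raw = extract_field(text, line_no, tab_pos)
        if raw is None:
            continue
        values[next_id] = convert(raw)  # step S33: type conversion
        next_id += 1                    # step S34: one ID per stored value
    return values
```

How the real apparatus handles missing or malformed fields is not described in the text; skipping and raising are simply the choices made here.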

If it is determined that the pair of the "processing target data extraction condition" and the "processing target data type" is registered in the structured data management table 132a (step S20; Yes), the structured data/index search unit 22 acquires the structured data ID associated with this pair (step S40).

The structured data/index search unit 22 then determines whether the optimum index for executing the processing of the query has been created (step S50).
Specifically, the structured data/index search unit 22 acquires, from the optimum index selection table 133, the index type associated with the pair of the "processing target data type" and the "processing type" acquired in step S10. It then determines whether the optimum index has been created by checking whether the pair of the acquired index type and the structured data ID acquired in step S40 is stored in the index management table 134a.

If it is determined that the optimum index has not been created (step S50; No), processing proceeds to step S60.
If it is determined that the optimum index has been created (step S50; Yes), processing proceeds to step S70.
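The branching of steps S20 to S50 can be summarized in a small decision function. The in-memory shapes of the three tables (two dictionaries and a set) and the function name are assumptions made for this sketch.

```python
def next_step(info, structured_table, index_table, selection_table):
    """Decide which step follows the query analysis of step S10.

    info:             dict with keys "extraction_condition", "data_type", "processing_type"
    structured_table: dict mapping (extraction condition, data type) -> structured data ID
    index_table:      set of (structured data ID, index type) pairs already created
    selection_table:  dict mapping (data type, processing type) -> optimum index type
    """
    key = (info["extraction_condition"], info["data_type"])
    structured_id = structured_table.get(key)  # step S20 (and S40 when found)
    if structured_id is None:
        return "S30"  # not yet structured: structure the data, then build the index
    index_type = selection_table[(info["data_type"], info["processing_type"])]
    if (structured_id, index_type) not in index_table:  # step S50
        return "S60"  # structured, but the optimum index is missing
    return "S70"      # both exist: execute the query
```

Note that the same structured data can carry several indexes, one per index type, which is why the check in step S50 uses the pair rather than the structured data ID alone.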

In step S60, the structured data/index creation unit 23 performs an index creation and registration process that creates and registers an index.

FIG. 12 shows the details of the index creation and registration process (step S60).
First, the structured data/index creation unit 23 acquires the index type best suited to executing the processing of the query (step S61).
Specifically, the structured data/index creation unit 23 acquires, from the optimum index selection table 133, the index type associated with the pair of the "processing target data type" and the "processing type" acquired in step S10.

Next, the structured data/index creation unit 23 acquires, from the structured data storage unit 132, the values of the structured data indicated by the structured data ID acquired in step S40, or the values of the structured data created in step S30 (step S62).

Then, using a method based on the index type acquired in step S61, the structured data / index creation unit 23 creates index data for processing the acquired structured data values at high speed, and stores the index data in the index storage unit 134 (step S63).

Finally, the structured data / index creation unit 23 registers an entry corresponding to the stored index data in the index management table 134a (step S64), and the index creation / registration process ends.
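Steps S61 to S64 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation described in the specification: the dictionaries and set are hypothetical in-memory stand-ins for the tables and storage units 132, 133, 134 and 134a, the function names are invented for illustration, and only the "Hash Table" index type is handled.

```python
from collections import defaultdict

# Hypothetical stand-in for the optimal index selection table 133.
OPTIMAL_INDEX_SELECTION = {("string", "exact_match"): "Hash Table"}


def build_hash_index(values):
    """Map each distinct value to the row positions where it occurs."""
    index = defaultdict(list)
    for row, value in enumerate(values):
        index[value].append(row)
    return dict(index)


def create_and_register_index(data_type, processing_type, structured_data_id,
                              structured_store, index_store, index_management):
    # S61: look up the index type suited to this data type and processing type.
    index_type = OPTIMAL_INDEX_SELECTION[(data_type, processing_type)]
    # S62: fetch the structured data values (stand-in for storage unit 132).
    values = structured_store[structured_data_id]
    # S63: build the index data and store it (stand-in for storage unit 134).
    if index_type != "Hash Table":
        raise NotImplementedError(f"index type not covered by this sketch: {index_type}")
    index_store[(structured_data_id, index_type)] = build_hash_index(values)
    # S64: register the entry (stand-in for index management table 134a).
    index_management.add((structured_data_id, index_type))
    return index_type


structured_store = {"SD001": ["tokyo", "osaka", "tokyo"]}
index_store, index_management = {}, set()
create_and_register_index("string", "exact_match", "SD001",
                          structured_store, index_store, index_management)
print(index_store[("SD001", "Hash Table")])
```

Keying both the index store and the management set by the pair (structured data ID, index type) mirrors the existence check made in step S50.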

Returning to FIG. 10, in step S70 the data processing unit 24 executes the processing specified by the query. At this point, the data to be processed has been structured and stored in the structured data storage unit 132. In addition, index data of the type best suited to the processing has been created for this structured data, and the data processing unit 24 refers to this index data when executing the processing specified by the query.
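To illustrate why referring to the index data in step S70 helps, the following sketch compares an exact-match search done by scanning every structured value with the same search done through a hash-table index. The function names and sample values are hypothetical; the specification does not prescribe this code.

```python
def linear_scan(values, target):
    """Exact-match search without an index: inspects every value."""
    return [row for row, value in enumerate(values) if value == target]


def lookup_with_index(hash_index, target):
    """Exact-match search through a hash-table index: one dictionary lookup."""
    return hash_index.get(target, [])


# Illustrative structured data values and their hash-table index.
values = ["tokyo", "osaka", "tokyo", "nagoya"]
hash_index = {}
for row, value in enumerate(values):
    hash_index.setdefault(value, []).append(row)

# Both approaches return the same rows; only the amount of work differs.
print(lookup_with_index(hash_index, "tokyo"))
print(linear_scan(values, "tokyo"))
```

The scan touches every value on each query, whereas the indexed lookup costs roughly the same regardless of how many values are stored, which is the kind of speed-up the index data is meant to provide.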

Subsequently, the data processing unit 24 causes the output unit 12 to display the result of the processing specified by the query (step S80). This completes the query processing.

As described above, in this embodiment the query is analyzed, and the unstructured data targeted by the query is converted into structured data without any operation by the user. Index data best suited to processing this structured data is then created. Because this index data is referred to when the query is executed, query processing on unstructured data can be performed faster than with conventional approaches.
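The overall flow summarized above (extract, convert type, index, query) can be sketched end to end. Everything concrete here is an assumption made for illustration: the log-like input lines, the use of a regular expression as the extraction condition, the integer conversion, and the choice of a sorted list searched with `bisect` as the index for a range search.

```python
import bisect
import re


def structure(lines, pattern, convert):
    """Extract the first capture group of `pattern` from each matching line
    and convert it to the target data type."""
    values = []
    for line in lines:
        match = re.search(pattern, line)
        if match:
            values.append(convert(match.group(1)))
    return values


def build_sorted_index(values):
    """Index for range searches: (value, row) pairs in ascending order."""
    return sorted((value, row) for row, value in enumerate(values))


def range_query(sorted_index, low, high):
    """Return rows whose value lies in [low, high], using binary search."""
    start = bisect.bisect_left(sorted_index, (low, -1))
    end = bisect.bisect_right(sorted_index, (high, float("inf")))
    return [row for _, row in sorted_index[start:end]]


unstructured = [
    "user=a latency=120ms",
    "no measurement on this line",
    "user=b latency=45ms",
    "user=c latency=300ms",
]
values = structure(unstructured, r"latency=(\d+)ms", int)
index = build_sorted_index(values)
print(values)
print(range_query(index, 100, 300))
```

The rows returned by `range_query` are positions in the structured value list, not line numbers of the original unstructured input.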

The present invention is not limited to the embodiment described above, and various applications are possible.

For example, in this embodiment, unstructured data without a clear structure is the target of query processing. However, the same effect can be obtained by applying the present invention to data that has some structure but is not used in a consistent way (semi-structured data).
This is because, in the present invention, the semi-structured data is structured and index data is created based on the information obtained by analyzing a query that targets the semi-structured data, so an index suited to how the semi-structured data is actually used is created.

In this embodiment, the unstructured data to be processed is stored in the data processing device 1, but this is not a requirement. Unstructured data stored on a recording medium, or unstructured data supplied via a network such as the Internet, may also be processed.

The data processing device of the present invention is not limited to dedicated hardware and can also be realized by an ordinary computer system.
Specifically, in the embodiment described above, the program of the data processing device 1 is described as being stored in advance in a memory or the like. However, the program for executing the processing operations described above may be stored and distributed on a computer-readable recording medium such as a flexible disk, a CD-ROM (Compact Disk Read-Only Memory), a DVD (Digital Versatile Disk), or an MO (Magneto-Optical disk), and the data processing device 1 that executes the processing described above may be configured by installing the program on a computer.

The program may also be stored in a disk device or the like of a server device on a communication network such as the Internet and, for example, superimposed on a carrier wave and downloaded to a computer. The processing described above can also be achieved by starting and executing the program while transferring it via the communication network.
When the functions described above are shared by an OS (Operating System), or are realized through cooperation between the OS and an application, only the portion other than the OS may be stored on a medium and distributed, or downloaded to a computer.

FIG. 1 is a block diagram showing a configuration example of the data processing device according to the embodiment of the present invention.
FIG. 2 is a diagram showing the configuration of the storage unit of the data processing device shown in FIG. 1.
FIG. 3 is a diagram showing the configuration of the unstructured data management table.
FIG. 4 is a diagram showing the configuration of the structured data management table.
FIG. 5 is a diagram showing a table of structured data values.
FIG. 6 is a diagram showing the configuration of the optimal index selection table.
FIG. 7 is a diagram showing an example of the structure of index data when the index type is "Hash Table".
FIG. 8 is a diagram showing the configuration of the index management table.
FIG. 9 is a block diagram showing a functional configuration example of the data processing device according to the embodiment of the present invention.
FIG. 10 is a flowchart for explaining the operation of the query processing.
FIG. 11 is a flowchart for explaining the operation of the structuring process.
FIG. 12 is a flowchart for explaining the operation of the index creation / registration process.

Explanation of symbols

1 Data processing device
131 Unstructured data storage unit
132 Structured data storage unit
133 Optimal index selection table
134 Index storage unit
21 Query analysis unit
22 Structured data / index search unit
23 Structured data / index creation unit
24 Data processing unit

Claims

A data processing device comprising:
unstructured data storage means for storing unstructured data;
query receiving means for receiving a query that targets the unstructured data for processing;
query information acquisition means for analyzing the query received by the query receiving means and acquiring extraction condition information indicating an extraction condition for the unstructured data to be processed by the query, type information indicating a data type handled by the processing of the query, and processing type information indicating a type of the processing of the query;
structured data creation means for extracting, from the unstructured data, data that matches the condition indicated by the extraction condition information, and creating structured data by converting the extracted data into the data type indicated by the type information;
structured data storage means for storing the structured data created by the structured data creation means;
index data creation means for creating index data for the structured data stored in the structured data storage means, based on the type information and the processing type information;
index data storage means for storing the index data created by the index data creation means; and
query execution means for executing the processing of the query received by the query receiving means, using the structured data stored in the structured data storage means as the processing target data and referring to the index data stored in the index data storage means.

The data processing device according to claim 1, wherein the index data creation means
identifies an index type based on the type information and the processing type information, and
creates index data of the identified index type.

A data processing method for processing data using a computer, the method comprising:
receiving a query that targets unstructured data for processing;
analyzing the received query and acquiring extraction condition information indicating an extraction condition for the unstructured data to be processed by the query, type information indicating a data type handled by the processing of the query, and processing type information indicating a type of the processing of the query;
extracting, from the unstructured data, data that matches the condition indicated by the extraction condition information, and creating structured data by converting the extracted data into the data type indicated by the type information;
creating index data for the created structured data, based on the type information and the processing type information; and
executing the processing of the received query, using the created structured data as the processing target data and referring to the created index data.

A program for causing a computer to function as:
unstructured data storage means for storing unstructured data;
query receiving means for receiving a query that targets the unstructured data for processing;
query information acquisition means for analyzing the query received by the query receiving means and acquiring extraction condition information indicating an extraction condition for the unstructured data to be processed by the query, type information indicating a data type handled by the processing of the query, and processing type information indicating a type of the processing of the query;
structured data creation means for extracting, from the unstructured data, data that matches the condition indicated by the extraction condition information, and creating structured data by converting the extracted data into the data type indicated by the type information;
structured data storage means for storing the structured data created by the structured data creation means;
index data creation means for creating index data for the structured data stored in the structured data storage means, based on the type information and the processing type information;
index data storage means for storing the index data created by the index data creation means; and
query execution means for executing the processing of the query received by the query receiving means, using the structured data stored in the structured data storage means as the processing target data and referring to the index data stored in the index data storage means.