JPH08314966A

JPH08314966A - Method for generating index of document retrieving device and document retrieving device

Info

Publication number: JPH08314966A
Application number: JP7121338A
Authority: JP
Inventors: Kenichi Nogami; 謙一野上; Yukio Nakamoto; 幸夫中本; Isamu Iwai; 勇岩井; Takeshi Matsukuma; 剛松隈
Original assignee: Toshiba Corp; Toshiba Computer Engineering Corp
Current assignee: Toshiba Corp; Toshiba Computer Engineering Corp
Priority date: 1995-05-19
Filing date: 1995-05-19
Publication date: 1996-11-29

Abstract

PURPOSE: To provide an index creation method for a document retrieval device, and a document retrieval device, that extract only documents containing the search key within an arbitrarily designated range, thereby reducing the extraction of documents the user did not intend, achieving a search speed that does not depend on the number of areas, keeping the index from growing with the number of areas, and efficiently retrieving documents that meet the user's request. CONSTITUTION: The device comprises a control device 1; an input device 2; an output device 3 that displays and prints the search key, search operation instructions, search results, and the contents of the retrieved documents; an external storage device 4 that stores the documents to be searched and the information needed for searching; and a division dictionary 5 used to divide a document into word and character units when creating character position information. For a search key composed of arbitrarily entered words and characters, a search range can be designated so that the search determines, for each word and character unit, where in the target documents it appears.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an index creation method for a document retrieval device that retrieves relevant documents, based on search keywords, from a large number of documents registered in a database, and to such a document retrieval device.

[0002]

[Prior Art] Conventionally, there are document retrieval devices that retrieve, from a large number of documents registered in a database, all documents containing a character string arbitrarily entered by the user. There are two types of document retrieval method. The first is the string matching method, in which a search is performed by comparing the character string entered by the user with the character strings in the documents.

[0003] The second is a method in which, to search a large number of documents registered in a database at high speed, a search index is created in preprocessing and the search is performed using that index.

[0004] This search index is built by extracting all words and characters from all documents to be searched, and the index is expressed in terms of the extracted words and characters. When the user uses as a search key a word or character extracted during preprocessing, the search can be performed at high speed by referring to this index.

[0005] However, the user may use a search key such as a compound of words and characters, or a sentence. In that case, the entered search key is further divided into words and characters, and the documents containing those words and characters are extracted using the search index. Whether the extracted documents actually contain the character string entered as the search key can then be checked using an index of adjacency information, which represents how characters and words are connected.

[0006] This approach retrieves any document that contains the user's search key anywhere in the document, so it sometimes extracts documents in which the search key appears in a part the user did not intend. For this reason, there has been a search method in which documents are divided into multiple areas and only the areas designated by the user are searched.

[0007]

[Problems to Be Solved by the Invention] As described above, in the conventional method a document was retrieved from the search targets if the search key entered by the user appeared anywhere in it. Because a match anywhere in the document was enough, documents other than those the user intended were also extracted. In addition, when the user searched with designated areas and many areas were designated, each area had to be searched, so the search was slower than before. Furthermore, creating a search index for each area increased the data volume.

[0008] The present invention has been made in view of the above circumstances and resolves these problems. When the user enters an arbitrary search key, the search does not target every word and character in the document; instead, only documents that contain the search key within a range arbitrarily designated by the user are extracted. This reduces the extraction of documents the user did not intend, yields a search speed that does not depend on the number of areas, and keeps the index from growing with the number of areas. The object of the invention is thus to provide an index creation method for a document retrieval device, and a document retrieval device, that make it possible to efficiently retrieve documents that meet the user's request.

[0009]

[Means for Solving the Problems] To achieve the above object, the present invention is an index creation method for a document retrieval device that retrieves documents containing an arbitrary input character string, the method creating the search index used to retrieve documents, characterized in that: keywords such as all words and characters that appear in all documents to be searched are extracted; a full-text search index that enables document retrieval is created by holding the occurrence information of these keywords as bit strings; and, at the same time, an area index is created by holding, as bit strings for each area, keyword existence information indicating in which area each keyword appears.
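
The two indexes described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each bit string has one bit per document (bit `d` set means "present in document `d`"), stores bit strings as Python integers, and builds the area index from single characters only. The function name `build_indexes` and the input layout are hypothetical.

```python
from collections import defaultdict


def build_indexes(docs):
    """Build a full-text keyword index and a per-area character index.

    docs[d] maps an area number to the list of keywords found in that
    area of document d (hypothetical input layout for this sketch).
    """
    keyword_index = defaultdict(int)  # keyword -> bitmap over documents
    area_index = defaultdict(int)     # (area, character) -> bitmap over documents
    for doc_id, areas in enumerate(docs):
        bit = 1 << doc_id
        for area, keywords in areas.items():
            for keyword in keywords:
                # Full-text index: the keyword occurs somewhere in the document.
                keyword_index[keyword] |= bit
                # Area index: record only the individual characters per area.
                for ch in keyword:
                    area_index[(area, ch)] |= bit
    return dict(keyword_index), dict(area_index)


docs = [
    {1: ["computer"], 2: ["search"]},  # document 0
    {1: ["search"], 2: ["computer"]},  # document 1
]
keyword_index, area_index = build_indexes(docs)
```

Because the area index is keyed by character rather than by every keyword, its size is bounded by the character set times the number of areas, which is one way to read the claim that index growth is suppressed.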

[0010] Also, to achieve the above object, in the above index creation method for a document retrieval device, the area index is created using only the characters that appear in each area of the document.

[0011] Also, to achieve the above object, in the above index creation method for a document retrieval device, the number of areas and the areas themselves can be set arbitrarily. Also, to achieve the above object, in the above index creation method, when the created indexes are used for a search in which an area is designated, documents are retrieved by using the full-text search index and the area index together.

[0012] Furthermore, to achieve the above object, the present invention is a document retrieval device that retrieves documents containing an arbitrary input keyword, using a search index created by an index creation method in which the area index is created using only the characters that appear in each area of the document, the device comprising: search key input means for entering a search key; keyword dividing means for dividing the search key into word units when the search key consists of two or more words; area designating means capable of arbitrarily designating a search range; search key matching means for extracting documents containing the search key; and appearance position matching means for checking, when the search key has been divided by the keyword dividing means, whether the divided parts are connected within the designated area, using keyword appearance position information indicating where the keywords appear; the device retrieving the documents that meet the request from all documents to be searched and outputting the result.

[0013]

[Operation] With the above configuration, for a search key composed of words and characters arbitrarily entered by the user, a search range can be designated so that the search determines, for each word and character, at what position in the target documents (for example, a chapter or section) it appears.

[0014] Also, the search index indicating which document, and where in it, each word appears is expressed as bit strings, and for areas an index expressed as bit strings is created for characters only.

[0015] For a search with a designated area, the full-text index is used together with the area index, which is built from characters only; this keeps the search index from growing excessively.

[0016] As a result, whatever search range the user designates, from a small range to the full text, the search can be performed at high speed by logical operations on bit strings. Furthermore, the number of documents extracted other than those the user intended can be reduced, and the large improvement in search speed and search efficiency reduces the burden on the user.

[0017]

[Embodiment] The outline of the present invention is as follows. The invention concerns an index creation device that creates an index for retrieving relevant documents from a large number of target documents based on a character string entered as a search key, and a document retrieval device that performs searches using the index created by the index creation device and displays the results on an output device. So that, when the documents the user requests are extracted from the target documents, the documents containing the user's arbitrarily entered search key can be found, the device is characterized by comprising: search range designating means by which the user can designate a search range within a document; area index creating means that, in order to retrieve documents containing the search key within the range designated by the search range designating means, expresses as bit strings the information indicating which characters appear within that range and creates the search index for areas; keyword index creating means that creates the search index used when searching the full text; and designated range search means that, using the character position information created by character position information creating means, retrieves at high speed, by operations on bit strings, the documents in which the user's search key exists within the designated range.

[0018] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the schematic configuration of a document retrieval device that implements the invention. As shown in FIG. 1, the document retrieval device of this embodiment comprises a control device 1 made up of a CPU, memory, and the like; an input device 2 with which the user enters search keys and performs search operations using a keyboard, mouse, or the like; an output device 3 that displays or prints the search keys and search operation instructions entered through the input device 2, the search results, and the contents of the retrieved documents; an external storage device 4 that stores the documents to be searched and the information needed for searching; and a division dictionary 5 used to divide a document into word and character units when creating character position information.

[0019] The configuration of the control device 1 in this embodiment is shown in FIGS. 2 and 3. As shown there, the control device 1 can be divided functionally into two parts: an index creation processing block 10 (see FIG. 2), which creates the index for searching a large number of documents, and a search processing block 11 (see FIG. 3), which retrieves the documents that meet the user's request from the created index.

[0020] As shown in FIG. 2, the index creation processing block 10 consists of processing units, namely an index creation control unit 20, an initialization unit 200, an area information reading unit 201, a document reading unit 202, an area acquisition unit 203, a keyword dividing unit 204, an area position information matching unit 205, a keyword index creation unit 206, an area index creation unit 207, and an index storage unit 208, and of buffers, namely a document count storage buffer 225, an area count storage buffer 226, an area division word storage buffer 227, a document storage buffer 228, an area information storage buffer 229, a divided keyword storage buffer 230, a character position information storage buffer 231, an area position information storage buffer 232, a keyword index storage buffer 233, and an area index storage buffer 234.

[0021] As shown in FIG. 3, the search processing block 11 consists of processing units, namely a search control unit 21, an initialization unit 250, a buffer clearing unit 251, an index reading unit 252, a search execution unit 253, a keyword extraction unit 254, a keyword matching unit 255, an area matching unit 256, a bit operation unit 257, an area number extraction unit 258, a document ID counting unit 260, an appearance position matching unit 261, a logical operation dividing unit 262, a search response unit 263, an output unit 264, and an input unit 265, and of buffers, namely a keyword index storage buffer 275, an area index storage buffer 276, a character position information storage buffer 279, an area position information storage buffer 280, an area number storage buffer 281, a search key storage buffer 282, a keyword storage buffer 283, a character storage buffer 284, a logical operator storage buffer 285, a keyword bit string storage buffer 287, an area bit string storage buffer 288, a temporary bit string storage buffer 289, a result bit string storage buffer 290, a document ID storage buffer 292, a search expression storage buffer 293, and a matching document number storage buffer 294.

[0022] The index creation control unit 20 controls the process of creating the search index. The initialization unit 200 initializes each buffer. The area information reading unit 201 reads the area information (the number of areas and the area division keywords) stored in the external storage device 4 and stores it in the area count storage buffer 226 and the area division word storage buffer 227.

[0023] The document reading unit 202 reads a document stored in the external storage device 4 and stores it in the document storage buffer 228. The area acquisition unit 203 checks which character positions in the document held in the document storage buffer 228 delimit each area, and stores the resulting start and end positions of the areas in the area information storage buffer 229.

[0024] The keyword dividing unit 204 divides the document held in the document storage buffer 228 into words and characters using the division dictionary 5 and stores them in the divided keyword storage buffer 230.

[0025] The area position information matching unit 205 checks in which area each divided character appears and stores that information in the area position information storage buffer 232.

[0026] The keyword index creation unit 206 creates an index for each keyword and character, in accordance with the division dictionary 5, and stores it in the keyword index storage buffer 233.

[0027] The area index creation unit 207 creates the area index from the area position information held in the area position information storage buffer 232 and stores it in the area index storage buffer 234.

[0028] The index storage unit 208 controls the storing of the created keyword index, area index, character position information, area information, and so on in the external storage device 4.

[0029] The search control unit 21 controls the processing for searching documents. The initialization unit 250 initializes each buffer. The buffer clearing unit 251 clears only those buffers that need to be cleared when searches are performed in succession.

[0030] The index reading unit 252 reads the keyword index, area index, character position information, and area information stored in the external storage device 4 and stores them in the keyword index storage buffer 275, the area index storage buffer 276, the character position information storage buffer 279, and the area information storage buffer 280, respectively.

[0031] The search execution unit 253 stores the search expression in the search expression storage buffer 293. The keyword extraction unit 254 divides the search key string into the keywords and characters registered in the keyword index, and stores the resulting keywords and characters in the keyword storage buffer 283 and the character storage buffer 284, respectively.
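
One way to picture the division performed by the keyword extraction unit is the sketch below. The text does not state the matching strategy, so greedy longest-match against the registered keywords is an assumption made here for illustration, as are the function name `split_search_key` and the sample keyword set.

```python
def split_search_key(search_key, registered_keywords):
    """Split a search key into registered keywords and leftover characters.

    Greedy longest-match is an assumption of this sketch; the source only
    says the key is divided into keyword units and character units.
    """
    keywords, chars = [], []
    max_len = max((len(k) for k in registered_keywords), default=1)
    i = 0
    while i < len(search_key):
        # Try the longest candidate first, down to a single character.
        for length in range(min(max_len, len(search_key) - i), 0, -1):
            candidate = search_key[i:i + length]
            if candidate in registered_keywords:
                keywords.append(candidate)
                i += length
                break
        else:
            # No registered keyword starts here: keep the single character.
            chars.append(search_key[i])
            i += 1
    return keywords, chars


# "計算機室" splits into the registered keyword "計算機" and the character "室".
parts = split_search_key("計算機室", {"計算機", "計算"})
```

The two returned lists play the roles of the keyword storage buffer and the character storage buffer, respectively.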

[0032] The keyword matching unit 255 extracts from the keyword index the bit strings that match the keywords held in the keyword storage buffer 283 and stores them in the keyword bit string storage buffer 287.

[0033] The area matching unit 256 extracts from the area index the bit strings that correspond to the characters held in the character storage buffer 284 and to the area numbers held in the area number storage buffer 281, and stores them in the area bit string storage buffer 288.

[0034] The bit operation unit 257 takes the logical AND of the bit strings held in the keyword bit string storage buffer 287 and the area bit string storage buffer 288 and stores the result in the temporary bit string storage buffer 289.
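
The AND step can be illustrated with integer bitmaps. This sketch assumes one bit per document, with bit `d` standing for document `d`; the variable names are hypothetical stand-ins for the keyword bit string, the area bit strings, and the temporary result.

```python
def and_bit_strings(keyword_bits, area_bits_list):
    """AND the keyword bitmap with every area bitmap for the designated area."""
    result = keyword_bits
    for area_bits in area_bits_list:
        result &= area_bits
    return result


keyword_bits = 0b1011              # keyword occurs in documents 0, 1 and 3
area_bits_list = [0b0011, 0b1010]  # per-character bitmaps for the chosen area
candidates = and_bit_strings(keyword_bits, area_bits_list)  # 0b0010: document 1
```

The cost of this step depends on the length of the bit strings, that is, on the number of documents, and not on how many areas the documents are divided into.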

[0035] The area number extraction unit 258 stores the designated area numbers in the area number storage buffer 281. The document ID counting unit 260 extracts the matching document IDs from the final bit string held in the result bit string storage buffer 290 and stores them in the document ID storage buffer 292.
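
Turning a final bit string into document IDs might look like the following sketch. It assumes that the least significant bit corresponds to the first document and that IDs start at 1; neither convention is confirmed by the text, and `document_ids` is a hypothetical helper.

```python
def document_ids(bits, first_id=1):
    """Return the IDs of the documents whose bit is set in the result bitmap."""
    ids = []
    position = 0
    while bits:
        if bits & 1:
            ids.append(first_id + position)
        bits >>= 1
        position += 1
    return ids


hits = document_ids(0b1010)  # bits 1 and 3 are set, so IDs 2 and 4
```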

[0036] The appearance position matching unit 261 checks whether the parts of a divided search key actually appear consecutively. The logical operation dividing unit 262 divides the search expression at its logical operators, storing the resulting search keys in the search key storage buffer 282 and the logical operators in the logical operator storage buffer 285.
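
The continuity check can be sketched as below for a single candidate document. The layout of the position information is assumed here (each part maps to the character offsets at which it starts, already restricted to the designated area); the function name `occurs_contiguously` is hypothetical.

```python
def occurs_contiguously(parts, positions):
    """Check that the divided parts of a search key appear back to back.

    parts: the pieces of the search key, in order.
    positions: maps each piece to the character offsets where it starts
    in one document (assumed layout for this sketch).
    """
    if not parts:
        return False
    starts = set(positions.get(parts[0], []))
    offset = 0
    for prev, part in zip(parts, parts[1:]):
        offset += len(prev)
        following = set(positions.get(part, []))
        # Keep only the start offsets where the next part follows directly.
        starts = {s for s in starts if s + offset in following}
        if not starts:
            return False
    return bool(starts)


positions = {"計算機": [10, 40], "室": [13, 60]}
found = occurs_contiguously(["計算機", "室"], positions)  # True: 10 + 3 == 13
```

This step is needed because the bit-string AND only shows that all parts occur in the same document and area, not that they are adjacent.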

[0037] The search response unit 263 obtains the document IDs stored in the matching document ID storage buffer 294. The output unit 264 outputs the search results to the output device 3. The input unit 265 manages the search-related operations that the user performs with the input device 2.

[0038] FIGS. 4 to 30 are diagrams for explaining the operation of the above embodiment. FIG. 4 shows the flow of the index creation process in the embodiment, and FIGS. 5 and 6 show the flow of the search process. FIG. 7 shows an example of how the area division keywords are stored in the external storage device 4. FIGS. 8 to 11 show examples of the contents of the area count storage buffer 226, the area division word storage buffer 227, the document storage buffer 228, and the document count storage buffer 225, respectively. FIG. 12 shows an example of the structure of the division dictionary 5, and FIG. 13 shows an example in which one area lies inside another. FIGS. 14 to 17 show examples of the contents of the area information storage buffer 229, the divided keyword storage buffer 230, the character position information storage buffer 231, and the area position information storage buffer 232, respectively. FIG. 18 shows an example of the keyword index, and FIG. 19 an example of the area index. FIGS. 20 to 25 show examples of the contents of the search expression storage buffer 293, the search key storage buffer 282, the logical operator storage buffer 285, the area number storage buffer 281, the keyword storage buffer 283, and the character storage buffer 284, respectively. FIG. 26 shows an example of the bit operation, FIG. 27 an example of the contents of the document ID storage buffer 292, FIG. 28 an example of the processing of the appearance position matching unit 261, and FIG. 29 an example of the contents of the matching document ID storage buffer 294. FIG. 30 shows an example of the document retrieval device and of the method of designating areas.

[0039] The specific operation of the device of the embodiment configured as described above is explained in detail below with reference to the flowcharts of FIGS. 4 to 6. First, a specific example of creating the search index needed to search documents is given.

[0040] Before the search index is actually created, the user decides the number of areas and the area division keywords as desired and stores them in the external storage device 4 (see FIG. 7).

[0041] First, the initialization unit 200 starts and clears each buffer (step 301). At this time, the value of the area count storage buffer 226 and the value of the document count storage buffer 225 are both initialized to "0".

[0042] Next, the area information reading unit 201 starts, reads the number of areas and the area division keywords stored in the external storage device 4, and stores them in the area count storage buffer 226 (see FIG. 8) and the area division word storage buffer 227 (see FIG. 9), respectively (step 302).

[0043] Next, the document reading unit 202 starts, reads the data of an original document stored in the external storage device 4, and stores it in the document storage buffer 228 (step 304). At this time, the value of the document count storage buffer 225 is incremented by 1 (step 305; see FIG. 11).

[0044] When the data of the original document has been read into the document storage buffer 228 (see FIG. 10), the area acquisition unit 203 starts and determines, for as many areas as the count held in the area count storage buffer 226, the character positions in the document at which the start string and end string of each area, held in the area division word storage buffer 227, occur (step 306).

[0045] For example, if area 1 runs from "Introduction" to "Chapter 1", the start position of area 1 is the number of characters from the beginning of the document to the position of the first character of "Introduction" (「は」 in the original 「はじめに」), and its end position is the number of characters up to the position immediately before the first character of "Chapter 1" (「第」 in 「第１章」).

[0046] That is, if the first character of "Introduction" is the 50th character from the beginning of the document and the first character of "Chapter 1" is the 100th, the position information of this area is "from the 50th character to the 99th character".
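
The arithmetic in this example can be reproduced with a short sketch. It assumes 1-based character counting (so the "50th character" sits at string index 49) and that an area ends just before its end string; `area_bounds` and the sample text are hypothetical.

```python
def area_bounds(text, start_marker, end_marker):
    """Return the 1-based (start, end) character positions of an area.

    The area begins at the first character of start_marker and ends just
    before the first character of end_marker; None if a marker is missing.
    """
    start = text.find(start_marker)
    if start < 0:
        return None
    end = text.find(end_marker, start + len(start_marker))
    if end < 0:
        return None
    # str.find is 0-based: index 49 is the 50th character, and the area's
    # last character (the one before index `end`) is the `end`-th character.
    return start + 1, end


# 49 filler characters, then "Introduction" at the 50th character,
# then filler so that "Chapter 1" starts at the 100th character.
text = "x" * 49 + "Introduction" + "y" * 38 + "Chapter 1" + "z" * 10
bounds = area_bounds(text, "Introduction", "Chapter 1")  # (50, 99)
```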

[0047] Because the area acquisition unit 203 obtains the start and end positions separately for each area, an area may lie inside another area, and areas may overlap (for example, Section 1, a separate area, inside Chapter 1; see FIG. 13).

[0048] The area information acquired by the area acquisition unit 203 is stored in the area information storage buffer 229 (see FIG. 14). The search index is created on the basis of the area information stored in this area information storage buffer 229.

[0049] Next, the keyword dividing unit 204 is activated. Using the division dictionary 5 (see FIG. 12), it divides the document stored in the document storage buffer 228 into keyword units such as words and characters, and stores them in the divided keyword storage buffer 230 (step 307, see FIG. 15).

[0050] During division, character position information, which indicates in which area the characters making up each keyword appear, is stored at the same time in the character position information storage buffer 231 (step 308, see FIG. 16).

[0051] This character position information is created for each document and records the ordinal position at which each character appears. At the same time, the area position information matching unit 205 is activated, checks in which area the character currently being divided appears (step 309), and stores the result in the area position information storage buffer 232 (step 310, see FIG. 17).

[0052] Next, the keyword index creation unit 206 is activated. The keyword index uses one bit per document: the bit is "1" if the keyword is present in the document and "0" if it is not.

[0053] For example, if the keyword index for the keyword 「計算機」 ("computer") is the bit string "10010001(2)", the documents corresponding to the 1st, 4th, and 8th bits contain the keyword 「計算機」 (see FIG. 18).
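As an illustration of how such a bit string is read, the following Python sketch lists the document numbers whose bits are "1". The function name `documents_with_keyword` and the representation of the bit string as a Python `str` are assumptions made for the example; bit positions are counted from the left, starting at 1, as in the text.

```python
def documents_with_keyword(bits):
    """Return the 1-based document numbers whose bit is '1'.

    bits is a string of '0' and '1' characters, one per document,
    read from left to right.
    """
    return [n for n, b in enumerate(bits, start=1) if b == "1"]


# Bit string from the example for the keyword 「計算機」.
print(documents_with_keyword("10010001"))  # [1, 4, 8]
```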

[0054] The keyword index creation unit 206 creates the keyword index and stores it in the keyword index storage buffer 233 (step 311). Then the area index creation unit 207 is activated and creates indexes for all areas stored in the area position information storage buffer 232. An area index exists for each area and has the same format as the keyword index, but it is built only from the characters that appear in that area (see FIG. 19).
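The two kinds of index can be sketched together in Python. This is a simplified illustration under stated assumptions, not the embodiment itself: the name `build_indexes`, the `area_spans` structure (one dictionary per document mapping an area number to its 1-based inclusive start and end positions), and the use of a plain substring test in place of the division dictionary are all hypothetical.

```python
def build_indexes(docs, keywords, area_spans):
    """Build a keyword index and per-area character indexes as bit strings.

    docs       : list of document texts
    keywords   : keywords to index (the dictionary-based division is replaced
                 here by a substring test, which is an assumption)
    area_spans : area_spans[d][area_no] = (start, end), 1-based inclusive
    """
    # Keyword index: one bit per document for each keyword.
    keyword_index = {
        kw: "".join("1" if kw in doc else "0" for doc in docs)
        for kw in keywords
    }

    # Area index: for each area, one bit string per character that appears
    # inside that area in at least one document.
    area_index = {}
    area_numbers = sorted({a for spans in area_spans for a in spans})
    for area_no in area_numbers:
        chars = {}
        for d, (doc, spans) in enumerate(zip(docs, area_spans)):
            if area_no not in spans:
                continue
            start, end = spans[area_no]
            for ch in set(doc[start - 1:end]):
                bits = chars.setdefault(ch, ["0"] * len(docs))
                bits[d] = "1"
        area_index[area_no] = {ch: "".join(b) for ch, b in chars.items()}
    return keyword_index, area_index


docs = ["abc計算機def", "計xyz", "no match"]
spans = [{1: (4, 6)}, {1: (2, 4)}, {}]
kw_index, ar_index = build_indexes(docs, ["計算機"], spans)
print(kw_index["計算機"], ar_index[1]["計"])
```

In this sketch the area index stores only single characters, so its size depends on the character set rather than on the vocabulary, which matches the intent stated for the area index.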

[0055] The area index creation unit 207 creates the area indexes and stores them in the area index storage buffer 234 (step 312). When keyword division for one document is finished and another document remains to be read, processing returns to step 303.

[0056] When keyword division has finished for all documents, the index storage unit 208 is activated and stores the following in the external storage device 4 (step 313): the number of documents held in the document number storage buffer 225, the number of areas held in the area number storage buffer 226, the character position information held in the character position information storage buffer 231, the area information held in the area information storage buffer 229, the keyword index held in the keyword index storage buffer 233, and the area indexes held in the area index storage buffer 234.

[0057] This completes creation of the indexes needed for searching. Next, a specific example of searching with this device is described. First, the initialization unit 250 is activated and clears each buffer (step 402). Next, the index reading unit 252 is activated and reads the following from the external storage device 4: the keyword index, which indicates the keywords contained in the documents to be searched, into the keyword index storage buffer 275; the area indexes, which indicate the keywords contained in each area, into the area index storage buffer 276; the character position information into the character position information storage buffer 279 (see FIG. 16); and the area information into the area information storage buffer 280 (step 403, see FIG. 14).

[0058] The keyword index storage buffer 275 holds, for each keyword, a bit string over the document IDs in which the bit at the position of every document ID containing that keyword is "1".

[0059] The area indexes held in the area index storage buffer 276 have the same structure as the keyword index. When the user selects [End] with the input device 2 (step 404), the device terminates (step 414).

[0060] When the user designates an area with the input device 2 (step 405), the designated area is stored in the area storage buffer 295 (step 413) and is used when the search expression is created.

[0061] When no area is designated, the full text is taken as the search range, and the area number is set to "0" (step 406). The user can enter search keys and build a search expression with the input device 2, such as a keyboard (step 407).

[0062] When the user executes [Search] with the input device 2 (step 408), the search execution unit 253 is activated. The search execution unit 253 stores the entered search expression in the search expression storage buffer 293 (see FIG. 20).

[0063] At this time, a search key for which an area has been designated is stored with the area number held in the area storage buffer 295 attached to it. When the search expression has been placed in the search expression storage buffer 293, the logical operation dividing unit 262 is activated and splits the search expression at the logical operation symbols. The search keys that make up the expression are stored in the search key storage buffer 282 (see FIG. 21), and the logical operation symbols are stored in the logical operation symbol storage buffer 285 (step 409, see FIG. 22).
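The splitting of a search expression into search keys and logical operation symbols might look like the following Python sketch. The concrete syntax of the search expression is not given in this passage, so the form assumed here (keys separated by the words `AND` or `OR` surrounded by whitespace) and the function name `split_expression` are purely hypothetical.

```python
import re


def split_expression(expr):
    """Split a search expression into (search_keys, logical_symbols).

    Assumed syntax: key1 AND key2 OR key3 ... with whitespace around
    each operator. This syntax is an assumption, not taken from the source.
    """
    # A capturing group makes re.split keep the operators in the result,
    # so keys land at even indexes and operators at odd indexes.
    tokens = re.split(r"\s+(AND|OR)\s+", expr.strip())
    return tokens[0::2], tokens[1::2]


keys, symbols = split_expression("計算機 AND 検索 OR 索引")
print(keys, symbols)
```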

[0064] Processing then proceeds search key by search key. First, the area number extraction unit 258 is activated and stores the designated area number in the area number storage buffer 281 (step 410, see FIG. 23).

[0065] Next, the keyword extraction unit 254 is activated and divides the search key character string both into the keyword units registered in the keyword index and into individual characters. The resulting keywords and characters are stored in the keyword storage buffer 283 (see FIG. 24) and the character storage buffer 284, respectively (step 411, see FIG. 25).

[0066] When all search keys have been divided, processing moves to matching. In the matching process, the keyword matching unit 255 is activated first. It matches the search keywords held in the keyword storage buffer 283 against the index held in the keyword index storage buffer 275 and extracts the keyword index bit string that indicates which document IDs contain each search keyword.

[0067] The bit string extracted by the keyword matching unit 255 is stored in the keyword bit string storage buffer 287 (step 412). If the area number held in the area number storage buffer 281 is "0", that is, the full text is the search target (step 425), the bit string held in the keyword bit string storage buffer 287 is moved to the temporary bit string storage buffer 289 (see FIG. 26).

[0068] If an area other than the full text is designated in the area number storage buffer 281, the area matching unit 256 is activated. From the area indexes held in the area index storage buffer 276, it extracts the bit strings for the characters held in the character storage buffer 284 in the area whose number is held in the area number storage buffer 281 (step 415).

[0069] If the search key consists of several characters, the bit strings are ANDed together as they are extracted, and the result is stored in the area bit string storage buffer 288. Once bit strings are held in the keyword bit string storage buffer 287 and the area bit string storage buffer 288, the bit operation unit 257 is activated, takes the logical AND of the two bit strings, and stores the result in the temporary bit string storage buffer 289 (step 416).
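The bit operations of steps 415 and 416 can be sketched in Python as follows. The function names `bit_and` and `area_candidates`, the string representation of bit strings, and the treatment of a character absent from the area index as an all-zero bit string are assumptions made for this illustration.

```python
def bit_and(a, b):
    """Bitwise AND of two equal-length strings of '0' and '1'."""
    return "".join("1" if x == "1" and y == "1" else "0" for x, y in zip(a, b))


def area_candidates(keyword_bits, area_char_index, key_chars):
    """AND the per-character area bit strings, then AND with the keyword bits.

    keyword_bits    : bit string from the keyword (full-text) index
    area_char_index : {character: bit string} for the designated area
    key_chars       : characters that make up the search key
    """
    width = len(keyword_bits)
    area_bits = "1" * width
    for ch in key_chars:
        # Assumption: a character missing from the area index matches no document.
        area_bits = bit_and(area_bits, area_char_index.get(ch, "0" * width))
    return bit_and(keyword_bits, area_bits)


# Hypothetical index contents for one area and eight documents.
area_1 = {"計": "10010000", "算": "11010001", "機": "10000001"}
print(area_candidates("10010001", area_1, "計算機"))  # 10000000
```

A "1" in the result only says that the document contains the keyword somewhere and that every character of the key occurs somewhere in the designated area. It does not yet show that the characters occur together there, which is why a position check follows in the later steps.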

[0070] If no logical operation symbol is held in the logical operation symbol storage buffer 285, or if there are several search keys and this is the bit string of the first one, the bit string held in the temporary bit string storage buffer 289 is moved unchanged to the result bit string storage buffer 290.

[0071] If a logical operation symbol is held in the logical operation symbol storage buffer 285 and the search key is the second or a later one, the bit strings held in the temporary bit string storage buffer 289 and the result bit string storage buffer 290 are combined using the logical operation symbol held in the logical operation symbol storage buffer 285, and the result is stored back in the result bit string storage buffer 290 (step 417).
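The accumulation of per-key bit strings into a result bit string can be sketched in Python as follows. The function name `combine`, the operator names `AND` and `OR`, and strict left-to-right evaluation are assumptions; the text states only that the stored logical operation symbol is applied to the temporary and result bit strings.

```python
import operator

# Hypothetical operator table; the source does not list the supported symbols.
OPS = {"AND": operator.and_, "OR": operator.or_}


def combine(bit_strings, symbols):
    """Fold per-key bit strings into one result, left to right.

    bit_strings : one bit string per search key, all of equal length
    symbols     : logical operation symbols between consecutive keys
    """
    width = len(bit_strings[0])
    # The first key's bit string is copied to the result unchanged.
    result = int(bit_strings[0], 2)
    for bits, symbol in zip(bit_strings[1:], symbols):
        result = OPS[symbol](result, int(bits, 2))
    # Re-pad with leading zeros so that the bit positions stay aligned.
    return format(result, f"0{width}b")


print(combine(["1100", "1010", "0001"], ["AND", "OR"]))  # 1001
```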

[0072] When bit string extraction and bit operations have finished for all search keys (step 418), the document ID counting unit 260 is activated. From the bit string finally derived by the logical operations and held in the result bit string storage buffer 290, it counts the document IDs whose bits are "1" and stores them in the document ID storage buffer 292 (step 419, see FIG. 27).

[0073] Next, the appearance position matching unit 261 is activated. It checks whether the divided parts of the search key appear consecutively within the area designated by the user (step 420, see FIG. 28).

[0074] First, from the area information held in the area information storage buffer 280, it looks up the start position and end position, in each document whose ID is held in the document ID storage buffer 292, of the area whose number is held in the area number storage buffer 281.

[0075] It then finds occurrences, between the start and end positions of the area, of the characters that make up the search key, and checks whether the character following each such character has the next position value. For example, if the position information of the first character is "76" and that of the next character is "77", the two characters are consecutive.

[0076] Repeating this determines whether the search key is present in the area designated by the user. This is done, for each document, for every divided search key. If all search keys are present in the document, the document ID is stored in the corresponding document ID storage buffer 294 (see FIG. 29).
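The consecutive-position check can be sketched in Python as follows. The function name `appears_in_area` and the `positions` structure (a dictionary from each character to the list of its 1-based positions in one document) are assumptions standing in for the character position information; the area is given as 1-based inclusive start and end positions.

```python
def appears_in_area(positions, key, area):
    """Return True if the characters of key occur consecutively inside area.

    positions : {character: [1-based positions in the document]}
    key       : search key string
    area      : (start, end), 1-based inclusive
    """
    if not key:
        return False
    start, end = area
    for p in positions.get(key[0], []):
        # The whole key must fit between the area's start and end positions.
        if p < start or p + len(key) - 1 > end:
            continue
        # Each following character must sit at the next position value.
        if all(p + i in positions.get(ch, ()) for i, ch in enumerate(key)):
            return True
    return False


# Hypothetical position information for one document.
pos = {"計": [76, 120], "算": [77], "機": [78, 121]}
print(appears_in_area(pos, "計算機", (50, 99)))    # True
print(appears_in_area(pos, "計算機", (100, 130)))  # False
```

In the second call the character 「計」 occurs inside the area at position 120, but 「算」 does not follow at position 121, so the document would not be reported for that area.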

[0077] When the appearance position matching unit 261 has stored the extracted document IDs in the corresponding document ID storage buffer 294 (step 421), the search response unit 263 is activated and obtains the document IDs held in the corresponding document ID storage buffer 294 as the search answer (step 422).

[0078] The output unit 264 is then activated and outputs the search result to the output device 3 (step 423). To enter a new search key and search again, the user selects [Clear] with the input device 2 (step 424); the buffer clear unit 251 is activated (step 401), and processing returns to step 403. To continue searching, processing returns to step 404.

[0079]

[Effects of the Invention] As described in detail above, the present invention makes it possible to search with a search key composed of words and characters freely entered by the user while designating a search range (area), that is, the part of the target documents (for example, a chapter or section) in which the key must appear.

[0080] In addition, because the search index that indicates which area of which document contains a search key is expressed as bit strings, whether a keyword is present in the designated area of a document can be expressed in one bit, regardless of the size of the area.

[0081] In addition, the search index is split into two kinds of index, one for full-text search and one for areas, and the area index is not used for searches over the full text only. Processing therefore never needs to hold an index it does not use.

[0082] In an area search, the bit string for a word is taken from the full-text index, the bit strings for the characters making up that word are taken from the area index, and the two are combined by logical operations. The area index therefore needs to index characters only, which keeps it small.

[0083] With such a search index, whatever search range the user designates, from a small range to the full text, the same bit string operations provide a fast area search that reduces the number of retrieved documents the user did not intend. This greatly improves the efficiency of the user's search work.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing a schematic configuration of an embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of the control unit in the embodiment.

FIG. 3 is a block diagram showing the configuration of the control unit in the embodiment.

FIG. 4 is a diagram showing the flow of index creation processing in the embodiment.

FIG. 5 is a diagram showing the flow of search processing in the embodiment.

FIG. 6 is a diagram showing the flow of search processing in the embodiment.

FIG. 7 is a diagram showing an example of storing area division keywords in the external storage device in the embodiment.

FIG. 8 is a diagram showing an example of storage in the area number storage buffer in the embodiment.

FIG. 9 is a diagram showing an example of storage in the area division word storage buffer in the embodiment.

FIG. 10 is a diagram showing an example of storage in the document storage buffer in the embodiment.

FIG. 11 is a diagram showing an example of storage in the document number storage buffer in the embodiment.

FIG. 12 is a diagram showing an example of the division dictionary in the embodiment.

FIG. 13 is a diagram showing an example in which an area exists inside another area in the embodiment.

FIG. 14 is a diagram showing an example of storage in the area information storage buffer in the embodiment.

FIG. 15 is a diagram showing an example of storage in the divided keyword storage buffer in the embodiment.

FIG. 16 is a diagram showing an example of storage in the character position information storage buffer in the embodiment.

FIG. 17 is a diagram showing an example of storage in the area position information storage buffer in the embodiment.

FIG. 18 is a diagram showing an example of the keyword index in the embodiment.

FIG. 19 is a diagram showing an example of the area index in the embodiment.

FIG. 20 is a diagram showing an example of storage in the search expression storage buffer in the embodiment.

FIG. 21 is a diagram showing an example of storage in the search key storage buffer in the embodiment.

FIG. 22 is a diagram showing an example of storage in the logical operation symbol storage buffer in the embodiment.

FIG. 23 is a diagram showing an example of storage in the area number storage buffer in the embodiment.

FIG. 24 is a diagram showing an example of storage in the keyword storage buffer in the embodiment.

FIG. 25 is a diagram showing an example of storage in the character storage buffer in the embodiment.

FIG. 26 is a diagram showing an example of a bit operation in the embodiment.

FIG. 27 is a diagram showing an example of storage in the document ID storage buffer in the embodiment.

FIG. 28 is a diagram showing an example of the processing of the appearance position matching unit in the embodiment.

FIG. 29 is a diagram showing an example of storage in the corresponding document ID storage buffer in the embodiment.

FIG. 30 is a diagram showing an example of the document retrieval device and of a method of designating an area in the embodiment.

[Explanation of Symbols]

1 ... control device, 2 ... input device, 3 ... output device, 4 ... external storage device, 5 ... division dictionary, 10 ... index creation processing block, 11 ... search processing block, 20 ... index creation control unit, 21 ... search control unit, 200 ... initialization unit, 201 ... area information reading unit, 202 ... document reading unit, 203 ... area acquisition unit, 204 ... keyword dividing unit, 205 ... area position information matching unit, 206 ... keyword index creation unit, 207 ... area index creation unit, 208 ... index storage unit, 225 ... document number storage buffer, 226 ... area number storage buffer, 227 ... area division word storage buffer, 228 ... document storage buffer, 229 ... area information storage buffer, 230 ... divided keyword storage buffer, 231 ... character position information storage buffer, 232 ... area position information storage buffer, 233 ... keyword index storage buffer, 234 ... area index storage buffer, 250 ... initialization unit, 251 ... buffer clear unit, 252 ... index reading unit, 253 ... search execution unit, 254 ... keyword extraction unit, 255 ... keyword matching unit, 256 ... area matching unit, 257 ... bit operation unit, 258 ... area number extraction unit, 260 ... document ID counting unit, 261 ... appearance position matching unit, 262 ... logical operation dividing unit, 263 ... search response unit, 264 ... output unit, 265 ... input unit, 275 ... keyword index storage buffer, 276 ... area index storage buffer, 279 ... character position information storage buffer, 280 ... area information storage buffer, 281 ... area number storage buffer, 282 ... search key storage buffer, 283 ... keyword storage buffer, 284 ... character storage buffer, 285 ... logical operation symbol storage buffer, 287 ... keyword bit string storage buffer, 288 ... area bit string storage buffer, 289 ... temporary bit string storage buffer, 290 ... result bit string storage buffer, 292 ... document ID storage buffer, 293 ... search expression storage buffer, 294 ... corresponding document ID storage buffer, 295 ... area storage buffer.

Continuation of the front page
(72) Inventor: Yukio Nakamoto, c/o Toshiba Computer Engineering Corp., 1381-1 Shinmachi, Ome-shi, Tokyo
(72) Inventor: Isamu Iwai, c/o Toshiba Corp. Ome Works, 2-9 Suehiro-cho, Ome-shi, Tokyo
(72) Inventor: Tsuyoshi Matsukuma, c/o Toshiba Computer Engineering Corp., 1381-1 Shinmachi, Ome-shi, Tokyo

Claims

[Claims]

[Claim 1] A method of creating a search index for retrieving documents in a document retrieval device that retrieves documents containing an arbitrary input character string, the method comprising: extracting all keywords, such as words and characters, that appear in all documents to be searched; creating a full-text search index that enables document retrieval by holding the occurrence information of these keywords as bit strings; and, at the same time, creating an area index by holding, as bit strings for each area, keyword presence information indicating in which area each keyword appears.

[Claim 2] The index creation method for a document retrieval device according to claim 1, wherein the area index is created only from the characters that appear in the area of the document.

[Claim 3] The index creation method for a document retrieval device according to claim 2, wherein the number of areas and the areas themselves can be set arbitrarily.

[Claim 4] The index creation method for a document retrieval device according to claim 2, wherein, when the created indexes are used for a search in which an area is designated, documents are retrieved by using the full-text search index together with the area index.

[Claim 5] A document retrieval device that retrieves documents containing an arbitrary input keyword, using a search index created by an index creation method in which the area index is created only from the characters that appear in the area of the document, the device comprising: search key input means for inputting a search key; keyword dividing means for dividing the search key into word units when the search key consists of two or more words; area designating means by which a search range can be designated arbitrarily; search key matching means for extracting documents containing the search key; and appearance position matching means for examining, when the search key has been divided by the keyword dividing means, whether the divided parts are contiguous within the designated area by using keyword appearance position information indicating where the keywords appear; wherein the device outputs the result obtained by retrieving, from all documents to be searched, the documents that meet the request.