JPH06309368A - Document retrieving device

Info

Publication number: JPH06309368A
Application number: JP5115402A
Authority: JP
Inventors: Nobuhiro Yamazaki; 伸宏山崎; Masato Kobe; 正人小部
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 1993-04-20
Filing date: 1993-04-20
Publication date: 1994-11-04

Abstract

PURPOSE: To reduce retrieval noise and improve the efficiency of document retrieval, and to efficiently obtain only the necessary part of a document passed as a retrieval result. CONSTITUTION: The document retrieving device is equipped with a document storage means 13 which stores plural document data consisting of document elements; a retrieval condition input means 11 for specifying document element specification conditions, which are conditions for specifying the document elements, and document contents conditions, which are conditions for specifying the document contents corresponding to the document elements; a document element retrieval means 122 which retrieves the document elements in the document storage means that satisfy the document element specification conditions specified by the retrieval condition input means; and a contents retrieval means 123 which retrieves, from among the document elements taken out by the document element retrieval means, those meeting the document contents conditions specified by the retrieval condition input means.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a document retrieval device for efficiently retrieving document data.

[0002]

[Prior Art] Conventionally, the common method of retrieving documents from a large collection has been so-called keyword retrieval, in which representative keywords describing the content of each target document are attached to and stored with the document, and retrieval is performed using those keywords (see, for example, Japanese Patent Laid-Open No. 2-277168 and Japanese Patent Laid-Open No. 3-156678). This method has drawbacks, however: it is cumbersome to deal with cases such as changes to the document content, and the degree of freedom in retrieval is low because documents can be found only through the keywords assigned to them.

[0003] Against this background, full-text search has attracted attention in recent years. Full-text search is a method in which a search condition, such as an input character string, is matched against all of the information in each stored document, and the documents for which the match succeeds are output as the result. Its advantages are that the search string is not limited to specific words such as keywords, so the degree of freedom in searching is high, and that no processing such as assigning keywords to each document is required. However, because a match succeeds no matter where in the document the matched string occurs, full-text search is prone to so-called search noise, in which documents of little relevance are included in the search results.
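The search noise described above can be illustrated with a minimal sketch. The function name `full_text_search` and the two sample documents are hypothetical and are not taken from the patent; the sketch only assumes that matching is a plain substring test over the whole text.

```python
def full_text_search(documents, query):
    """Return the names of all documents whose text contains `query` anywhere."""
    return [name for name, text in documents.items() if query in text]


# Hypothetical two-document collection; only the first is really about networks.
documents = {
    "lan_design": "Network design. The office network uses a star topology.",
    "lunch_memo": "Lunch moves to noon because the network was down yesterday.",
}

# Both documents match, although the memo mentions the word only in passing.
noisy_hits = full_text_search(documents, "network")
```

Because the position of the match is ignored, the unrelated memo is returned alongside the relevant document.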

[0004] One example of a technique for mitigating the search noise problem of full-text search is described in Japanese Patent Laid-Open No. 3-248272 (document search device). It uses partial morphological analysis to reduce the search noise that arises because a match succeeds wherever the matched string occurs in the document.

[0005]

[Problems to Be Solved by the Invention] However, such conventional full-text search techniques have drawbacks: processing cannot be performed at high speed because the search target area for a character string extends over the entire text, and a morphological analysis device and a dictionary storage device are needed in order to perform morphological analysis at high speed.

[0006] A conventional document retrieval device passes the entire document (from beginning to end) as the search result and outputs it by display, printing, or the like. When the user does not need the whole document but only wants to see the parts in which a specified keyword appears, passing the entire document means that a large amount of data is exchanged with the storage unit. In addition, it takes time to find the places in the whole document where the specified keyword appears.

[0007] In devices that pass only the matching sentence for display or printing, on the other hand, the document must be retrieved from the storage unit again whenever the user wants to see what precedes or follows that sentence, which is laborious.

[0008] The object of the present invention is to solve these problems. That is, an object of the present invention is to reduce search noise and improve the efficiency of document search in a document retrieval device.

[0009] A further object of the present invention is to make it possible, in a document retrieval device, to efficiently obtain only the required part of a document passed as a search result.

[0010]

[Means for Solving the Problems] The document processing apparatus of the present invention (claim 1) comprises: document storage means (13 in FIG. 1) for storing a plurality of document data composed of document elements; search condition input means (11 in FIG. 1) for specifying a document element specification condition, which is a condition for identifying document elements, and a document content condition, which is a condition for identifying the document content corresponding to document elements; document element search means (122 in FIG. 1) for searching the document storage means for document elements that satisfy the document element specification condition specified by the search condition input means; and content search means (123 in FIG. 1) for searching, among the document elements retrieved by the document element search means, for those that satisfy the document content condition specified by the search condition input means.

[0011] The present invention (claim 2) comprises: document storage means (131 in FIG. 13) for storing a plurality of document data composed of document elements; heading detection means (132 in FIG. 13) for detecting character strings representing headings in the documents stored in the document storage means; search information generation means (133 in FIG. 13) for generating search information representing the correspondence between the character strings detected by the heading detection means and the document elements stored in the document storage means; search information storage means (134 in FIG. 13) for storing the search information generated by the search information generation means; search condition input means (136 in FIG. 13) for specifying a document element specification condition, which is a condition for identifying document elements, and a document content condition, which is a condition for identifying the document content corresponding to document elements; document element search means (135 in FIG. 13) for searching, using the search information, for document elements that satisfy the document element specification condition specified by the search condition input means; and content search means (138 in FIG. 13) for searching, among the document elements retrieved by the document element search means, for those that satisfy the document content condition specified by the search condition input means.

[0012] The present invention (claim 3) is a document retrieval device for searching structured document data, comprising extraction range designation means (54 in FIG. 5) for designating the range of document data to be extracted as a search result as either the whole document or a partial structure of it, and means (52 in FIG. 5) for retrieving the partial structure designated by the extraction range designation means.

[0013]

[Operation] The search condition input means (11 in FIG. 1) is used to specify a document element specification condition and a document content condition. Document elements are the elements that make up a document, such as chapters, sections, and paragraphs, and include what are called the document components of a structured document. The document element specification condition is a condition for identifying such document elements; examples are the character string of a heading contained in a document element, or a position in the document structure. The document element search means (122 in FIG. 1) finds, in the document storage means, all document elements that satisfy the document element specification condition. If, for example, a heading character string is specified as the condition, the specified string is compared with the heading string of each document element, and all document elements that satisfy the condition are found.

[0014] The content search means (123 in FIG. 1) takes the group of document elements found by the document element search means as its search target and searches for those that satisfy the document content condition specified by the search condition input means. The document content condition is a condition for finding particular document content corresponding to a document element, and is given as a character string. The content search means (123 in FIG. 1) searches for document elements whose document content contains the character string specified as the document content condition.

[0015] As described above, the present invention selects document elements by a search using the document element condition and then applies the search by document content condition only to the selected document elements, so that search efficiency is improved and search noise is reduced. In particular, when the documents are structured documents, document components can be selected by heading character strings or the like from any number of structured documents stored in the document storage means, and the corresponding document content read out and used as the search target area, which allows the search area to be specified with a high degree of freedom.
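The two-stage search described in this section can be sketched as follows. This is only an illustration under stated assumptions: the `Element` class, the function names, and the sample data are hypothetical, and both conditions are assumed to be plain substring tests, which the text gives as one example.

```python
from dataclasses import dataclass


@dataclass
class Element:
    heading: str   # heading string of the document element
    content: str   # document content belonging to the element


def search_elements(elements, heading_condition):
    """Stage 1: keep the elements whose heading contains the condition string."""
    return [e for e in elements if heading_condition in e.heading]


def search_contents(elements, content_condition):
    """Stage 2: among the given elements, keep those whose content contains the string."""
    return [e for e in elements if content_condition in e.content]


elements = [
    Element("Claims", "A document retrieval device comprising storage means."),
    Element("Prior art", "Keyword retrieval attaches keywords to documents."),
    Element("Claims", "A printer comprising a paper tray."),
]

# Only elements headed "Claims" are examined for the string "retrieval".
hits = search_contents(search_elements(elements, "Claims"), "retrieval")
```

Running the content search over all elements would also return the "Prior art" element; restricting it to the elements selected in stage 1 removes that hit.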

[0016] In the present invention (claim 2), the headings of documents are detected by the heading detection means, and the correspondence between each heading and its document element is registered by the document element search information generation means in the document element search information storage means as document element search information. Using this document element search information makes the search for document elements faster. Moreover, unlike conventional keyword retrieval, registering the search information takes no manual effort, so it can be registered easily.
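A minimal sketch of such search information is given below. The patent text here does not say how headings are detected, so the pattern `HEADING` (a line beginning with a section number) is purely an assumption, as are the function names and the sample documents.

```python
import re
from collections import defaultdict

# Assumed heading pattern, e.g. "2 Network setup" or "2.1 Cabling" (hypothetical).
HEADING = re.compile(r"^\d+(?:\.\d+)*\s+(.+)$")


def build_heading_index(documents):
    """Map each detected heading string to the (document, line number) pairs where it occurs."""
    index = defaultdict(list)
    for name, text in documents.items():
        for line_no, line in enumerate(text.splitlines()):
            match = HEADING.match(line.strip())
            if match:
                index[match.group(1)].append((name, line_no))
    return dict(index)


def find_elements(index, heading_condition):
    """Look up elements by heading using only the index, without rescanning the documents."""
    return [loc for heading, locs in index.items()
            if heading_condition in heading for loc in locs]


documents = {
    "manual": "1 Overview\nThis manual ...\n2 Network setup\nConnect the cable.",
    "report": "1 Network costs\nCosts rose.",
}
index = build_heading_index(documents)
```

The index is built automatically from the stored text, which corresponds to the point that no manual keyword registration is needed.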

[0017] In a document retrieval device that searches structured document data, the document data of the documents hit by a search are transferred to the user or printed/displayed. In the present invention (claim 3), the extraction range is designated in advance by the extraction range designation means (54 in FIG. 5) as the whole document or as a partial structure. The structures that can be designated are, for example, partial structures of the document such as the title only, the first page, the table of contents, or the sentence, paragraph, section, or chapter that matched the condition, as well as the entire document. The partial structure designated by the extraction range designation means is retrieved by the document structure search means (52 in FIG. 5).

[0018] As described above, according to the present invention (claim 3), when documents satisfying a search condition such as a specified keyword are retrieved, only the partial structure designated in advance is transferred from each retrieved document to the user, which avoids transferring and displaying or printing other, unnecessary parts of the document. In particular, when the document storage unit and the destination of the search results are remote from each other, the amount of data transferred is reduced.

[0019]

[Embodiments]

(First Embodiment) FIG. 1 shows the configuration of this embodiment. As shown in FIG. 1, the document retrieval device of this embodiment comprises a search condition input unit 11, a document structure search unit 12, a document storage unit 13, a search result storage unit 14, and a search result display unit 15.

[0020] The search condition input unit 11 is used to input the document component condition, which selects the targets of the content search, and the document content condition.

[0021] The document structure search unit 12 takes documents stored as structured documents out of the document storage unit 13, operates on them, and outputs search results sequentially to the search result storage unit 14. It has a document root acquisition unit 121, a document component selection unit 122, and a component content collation unit 123.

[0022] The document root acquisition unit 121 sequentially takes out the roots of the structure trees of the documents stored in the document storage unit 13.

[0023] The document component selection unit 122 traverses the structure tree from the document root taken out by the document root acquisition unit 121 and selects the document components that satisfy the document component condition obtained by the search condition input unit 11.

[0024] The component content collation unit 123 takes the document content corresponding to a document component obtained by the document component selection unit 122 out of the document storage unit 13 and determines whether it satisfies the document content condition input from the search condition input unit 11. The document storage unit 13 is a storage unit that stores documents in ODA format. In this embodiment, chapters, sections, paragraphs, and headings are handled as document components.

[0025] The search result storage unit 14 is a storage unit that holds, as results, the roots of the documents containing document components for which collation by the component content collation unit 123 succeeded.

[0026] The search result display unit 15 displays the search results held in the search result storage unit 14 on a display or the like when the document search ends.

[0027] The search process in the document retrieval device configured in this way is described in detail below. The logical structure of each document is stored in the document storage unit 13 as a tree structure, as shown in FIG. 2.

[0028] Further, the document roots are assumed to be stored as an array of entries, each consisting of a document name and a pointer to the document root, as shown in FIG. 3.
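The storage layout described above (a structure tree per document plus an array of document names and root pointers) might be modelled as in the sketch below. The `Node` fields, the `preorder` helper, and the sample document are assumptions made for illustration; FIG. 2 and FIG. 3 are not reproduced here, and the real ODA representation is considerably richer.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                                     # "document", "chapter", "heading", "paragraph", ...
    text: str = ""                                # content string; empty for structural nodes
    children: list = field(default_factory=list)  # child nodes in document order


def preorder(node):
    """Yield a node and all of its descendants in document order."""
    yield node
    for child in node.children:
        yield from preorder(child)


# One small logical structure tree (hypothetical content).
report = Node("document", children=[
    Node("chapter", children=[
        Node("heading", "Claims"),
        Node("paragraph", "A document retrieval device."),
    ]),
])

# Root table: an array of (document name, reference to the document root).
root_table = [("report", report)]

kinds = [n.kind for _, root in root_table for n in preorder(root)]
```

Iterating over `root_table` and walking each tree visits every document component of every stored document.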

[0029] FIG. 4 shows the processing flow.
(1) First, the document component condition and the document content condition are input through the search condition input unit 11 (step S401). The document component condition designates a specific position or range in the structure of a document in order to limit the range searched within the document. For example, it can describe a condition that uses heading text, such as requiring that the heading string of a document component contain the string "Claims", or a condition that depends on the structure of the document, such as the "Abstract" part that appears at a fixed position in documents with a fixed form. The document content condition is the search string used for a character string search over the content of the sections and paragraphs corresponding to the selected document components.

[0030] (2) It is checked whether any document remains to be collated (step S402). If so, the document root acquisition unit 121 takes one root of a document tree out of the document storage unit 13 (step S403). If all documents have been collated and none remains to be processed, the process proceeds to (7).

[0031] (3) It is checked whether that root has any document component that has not yet been collated (step S404). If so, the document component selection unit 122 traverses the tree of document components from the root obtained in (2) and takes out one document component (step S405). If the root has no document component left to take out, the process returns to (2). When documents with a fixed form are the search target, it is also conceivable to take out as candidates only the document components at certain positions in the tree.

[0032] (4) It is determined whether each document component satisfies the document component condition (step S406). For example, when the condition specifies a character string that must be contained in the heading of a document component, the heading string corresponding to the document component taken out in (3) is compared with the condition string; if they match, the document content belonging to that document component is identified as the range to be searched with the document content condition. In the structured document example of FIG. 2, suppose document component 201 is taken out and the string of content 203 of its heading 202 contains the condition string; then contents 208 and 213, which correspond to that document component, become the range searched with the document content condition. If document component 201 does not satisfy the document component condition, but document component 204 is then taken out and compared and the string of its heading content 206 is found to contain the condition string, content 208 of document component 204 becomes the range to be searched with the document content condition.

[0033] (5-1) If the document component condition is satisfied, the content corresponding to the document component is taken out of the document storage unit 13 (step S407), and the component content collation unit 123 determines whether it satisfies the document content condition, that is, performs string matching against the search string (step S408).
(5-2) If the document component condition is not satisfied, the process returns to (3).

[0034] (6-1) If collation succeeds, the document containing that content is stored in the search result storage unit 14 (step S409), and the process returns to (2).
(6-2) If collation fails, the process returns to (3).

[0035] (7) The search result display unit 15 displays the search results stored in the search result storage unit 14 on a display or the like (step S410).
(8) The process ends.
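Steps (1) to (8) of FIG. 4 can be sketched as a pair of nested loops. The `Component` class, the helper names, and the sample documents are hypothetical; both conditions are assumed to be substring tests, and the content belonging to a component is assumed to include the content of its descendants, in line with the example given for FIG. 2.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    heading: str = ""                             # heading string, empty if none
    content: str = ""                             # content attached directly to this component
    children: list = field(default_factory=list)


def walk(component):
    """Yield a component and all of its descendants in document order."""
    yield component
    for child in component.children:
        yield from walk(child)


def content_of(component):
    """Content belonging to a component: its own plus that of its descendants."""
    return " ".join(c.content for c in walk(component) if c.content)


def retrieve(root_table, component_condition, content_condition):
    results = []                                              # search result storage
    for name, root in root_table:                             # (2) S402/S403: next document root
        for component in walk(root):                          # (3) S404/S405: next component
            if component_condition not in component.heading:  # (4) S406
                continue                                      # (5-2) back to (3)
            if content_condition in content_of(component):    # (5-1) S407/S408
                results.append(name)                          # (6-1) S409
                break                                         # back to (2)
    return results                                            # (7) S410 would display these


root_table = [
    ("patent_a", Component(children=[
        Component("Claims", "A network printer."),
        Component("Abstract", "Prints pages."),
    ])),
    ("patent_b", Component(children=[
        Component("Claims", "A paper tray."),
        Component("Background", "Printers on a network jam."),
    ])),
]

hits = retrieve(root_table, "Claims", "network")
```

In this sample, `patent_b` mentions "network" only under "Background", so it is not returned when the search is limited to components headed "Claims".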

[0036] According to this embodiment, the document component selection unit 122 first searches for and identifies the document components that satisfy the document component condition, so that the range searched with the document content condition is limited to the necessary areas within each document. This raises search efficiency and reduces search noise.

[0037] (Second Embodiment) FIG. 5 shows the configuration of the retrieval device of the second embodiment. This device comprises a document storage unit 51, a document structure search unit 52, a search condition input unit 53, an extraction range designation unit 54, a document structure movement designation unit 55, and a search result display unit 56.

[0038] The document storage unit 51 is a storage unit for storing documents. The document structure search unit 52 searches the stored documents for documents that meet the condition specified through the search condition input unit 53, takes out the document structure in accordance with the range designated through the extraction range designation unit 54, and traverses the document structure as directed by the document structure movement designation unit 55. The search condition input unit 53 lets the user specify the condition for the documents to be retrieved. The extraction range designation unit 54 designates the document structure in which results are to be taken out. The document structure movement designation unit 55 directs, as designated by the user or the like, a move from the partial document structure taken out as a search result to another document structure. The search result display unit 56 displays the search results of the document structure search unit 52.

[0039] Next, the operation of the document retrieval device of this embodiment configured as above is described with reference to the flowchart of FIG. 6. A search condition is input through the search condition input unit 53 (step S61). Assume that "network" is entered as the keyword to search for.

[0040] The extraction range designation unit 54 is used to designate which document structure is to be taken out of the document storage unit 51 as the search result (step S62). This designation may instead be made after the search has been performed and it has been found that documents matching the condition exist. The structure to be taken out can be the sentence, paragraph, section, or chapter in which the keyword appears, the entire document in which the keyword appears, the first page of the document, the table of contents, and so on. FIG. 8 shows an example of the interface of the extraction range designation unit 54. For example, if the paragraph matching the condition is designated, only the paragraphs in which "network" appears are extracted as the search result and displayed on the search result display unit 56, as shown in FIG. 9.

[0041] Documents in which the search condition word "network" appears are searched for in the document storage unit 51 through the document structure search unit 52 (step S63).

[0042] If the search finds no matching document, the process proceeds to step S65; if it finds any, the process proceeds to step S66 (step S64). Here, assume that three documents such as those shown in FIG. 7 are found. When no matching document is found, the search result display unit 56 displays a message to that effect (step S65).

[0043] For the documents found, the document structure designated in step S62 is extracted (step S66). When a paragraph is designated as in the example above, the paragraphs of each matching document that match the condition are extracted and displayed on the search result display unit 56 as shown in FIG. 9 (step S67).
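The extraction of a designated partial structure might look like the sketch below. It assumes a tree with parent links whose leaves are sentences; the `Node` class, the `extract` function, and the sample document are hypothetical and stand in for the structures shown in the figures.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare nodes by identity; avoids recursing through parent links
class Node:
    kind: str                                     # "document", "section", "paragraph", "sentence"
    text: str = ""
    children: list = field(default_factory=list)
    parent: Optional["Node"] = None

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child


def walk(node):
    yield node
    for child in node.children:
        yield from walk(child)


def text_of(node):
    return " ".join(n.text for n in walk(node) if n.text)


def extract(root, keyword, range_kind):
    """Return the text of each `range_kind` structure that encloses a sentence containing `keyword`."""
    found = []
    for node in walk(root):
        if node.kind != "sentence" or keyword not in node.text:
            continue
        target = node
        while target is not None and target.kind != range_kind:
            target = target.parent                # climb towards the designated structure
        if target is not None and target not in found:
            found.append(target)
    return [text_of(t) for t in found]


doc = Node("document")
section = doc.add(Node("section"))
p1 = section.add(Node("paragraph"))
p1.add(Node("sentence", "Offices need printers."))
p2 = section.add(Node("paragraph"))
p2.add(Node("sentence", "The network to the office is slow."))
p2.add(Node("sentence", "It will be upgraded."))

paragraph_hits = extract(doc, "network", "paragraph")
```

Passing `"sentence"` or `"document"` as `range_kind` returns the matching sentence alone or the whole document, which mirrors the choice offered by the extraction range designation.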

[0044] For a retrieved document, the paragraphs before and after the displayed paragraph, or a higher-level structure such as the section, the chapter, or the entire document, can also be designated through the document structure movement designation unit 55 (steps S68 and S69) and displayed. FIG. 10 shows an example of the interface of the document structure movement designation unit 55.

[0045] FIG. 11 shows how related structures are taken out and displayed in various cases designated through the document structure movement designation unit 55. For document 2, which contains the search condition "network", FIG. 11 illustrates the display when, relative to the partial structure that matched the search condition, the partial structure one position forward is designated, when the partial structure one level higher is designated, when the partial structure one further level higher is designated, and when the entire document at the top level is designated.

[0046] The operation of the document structure search unit 52 is described taking document 2 as an example. The document storage unit 51 stores documents in a form that represents the document structure, as shown in FIG. 12. When documents containing the keyword "network" are searched for, the text "The network to the office ..." is found by an ordinary method. If the structure designated for extraction is "paragraph", the node one level above is the structure that represents the paragraph, so the structure at and below node 9 is extracted from the document storage unit 51. When the document structure movement designation unit 55 designates "to the structure one level higher", the structure one level above node 9 is traced in FIG. 12 and the structure at and below node 7 is extracted. When the partial structure one position forward is requested while the structure below node 9 is being extracted, the partial structure one position forward of node 9 is similarly traced in FIG. 7 and the structure at and below node 10 is extracted.
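
The navigation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patent's implementation: the `Node` class, the node labels, and the function names are hypothetical, and "one position forward" is assumed here to mean the next sibling under the same parent.

```python
class Node:
    """A node of a structured-document tree (hypothetical representation)."""

    def __init__(self, kind, label="", text=""):
        self.kind = kind          # e.g. "document", "section", "paragraph", "text"
        self.label = label        # illustrative name only
        self.text = text
        self.parent = None
        self.children = []

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child


def find_text(root, keyword):
    """Depth-first search for the first text node containing the keyword."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.kind == "text" and keyword in node.text:
            return node
        stack.extend(reversed(node.children))
    return None


def enclosing(node, kind):
    """Walk upward until a node of the requested kind is reached."""
    while node is not None and node.kind != kind:
        node = node.parent
    return node


def one_level_up(node):
    """Move to the structure one level higher."""
    return node.parent


def one_forward(node):
    """Move to the next sibling (assumed meaning of 'one position forward')."""
    if node.parent is None:
        return None
    siblings = node.parent.children
    i = next(k for k, s in enumerate(siblings) if s is node)
    return siblings[i + 1] if i + 1 < len(siblings) else None


# Small illustrative document: one section holding two paragraphs.
doc = Node("document", "doc")
sec = doc.add(Node("section", "sec"))
p1 = sec.add(Node("paragraph", "p1"))
p1.add(Node("text", text="The network to the office ..."))
p2 = sec.add(Node("paragraph", "p2"))
p2.add(Node("text", text="A following paragraph."))

hit = find_text(doc, "network")
para = enclosing(hit, "paragraph")   # the paragraph that contains the hit
```

From `para`, calling `one_level_up(para)` yields the enclosing section and `one_forward(para)` yields the following paragraph, which mirrors the movement designations in the text.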

[0047] In this second embodiment, when documents having the designated keyword are searched for, only the partial structure of the document designated in advance with the extraction range designation unit 54 is transferred to the user, which prevents other, unnecessary parts of the document from being transferred and displayed or printed. In particular, when the document storage unit 51 and the destination of the search results are remote from each other, the amount of data transferred is reduced. Furthermore, by navigating from the partial structure returned as the result, on the basis of the document structure, with the document structure movement designation unit 55, parts other than the initially designated document structure can also be obtained.

[0048] (Third Embodiment) FIG. 13 shows the configuration of the third embodiment. This device comprises a document creation device 130, a document storage unit 131, a heading detection unit 132, a search information generation unit 133, a component search information storage unit 134, a component search unit 135, a search condition input unit 136, a document component extraction unit 137, a component content collation unit 138, and a search result display unit 139.

[0049] The document storage unit 131 stores the documents created by the document creation device 130.

[0050] When document data created by the document creation device 130 is stored in the document storage unit 131, the heading detection unit 132 analyzes the document data and detects the portions that serve as headings of document components such as chapters, sections, and figures and tables.

[0051] The search information generation unit 133 extracts the character strings of the headings detected by the heading detection means and generates component search information that associates each string with a pointer to the document component. The generated component search information is stored in the component search information storage unit 134.

[0052] The document component search unit 135 performs a search that narrows the search range when documents are searched. It searches the component search information storage unit 134 for headings that are identical to, or contain, the character string given by the search condition input unit 136, and sends the pointers of the components containing the matching headings to the document component extraction unit 137.

[0053] The search condition input unit 136 inputs the condition for searching for components to the document component search unit 135, and inputs the content condition for performing a content search on the retrieved document components to the component content collation unit 138.

[0054] The document component extraction unit 137 uses the document component pointers obtained as the search result from the document component search unit 135 to extract the document components from the document storage unit 131, and passes them to the component content collation unit 138.

[0055] The component content collation unit 138 collates the contents of the document components extracted from the document storage unit 131 by the document component extraction unit 137 with the content condition input from the search condition input unit 136.

[0056] The search result display unit 139 displays a list of the documents that satisfy the search conditions as a result of the collation by the component content collation unit 138.

[0057] FIG. 14 shows the flow of the processing that generates and stores the component search information in this embodiment. In FIG. 14, p is a pointer to the document content. What the pointer points to is a node in the tree structure of the structured document.

[0058] First, in step S141, p is initialized to point to the start of the document. The start point corresponds, for example, to the root 200 of document 1 in the structured document of FIG. 2.

[0059] Next, step S142 checks whether the portion currently pointed to by p is a heading. Here, the function type examines the kind of object pointed to by p and, if it is a heading, returns a value indicating a heading. In a structured document, each node of the tree structure carries a value that distinguishes the kind of node, for example title, heading, figure, or body text, so the function type can determine whether p is a heading by examining this value.

[0060] If step S142 determines that p is a heading, then in step S143 the component search information generation unit 133 generates component search information in the form of a correspondence table between the content of p and information indicating its position, and this information is stored in the component search information storage unit 134. In a structured document such as that shown in FIG. 2, when p is a heading its child is a content part, and the character string that forms the heading is stored in that content part.

[0061] If step S142 determines that p is not a heading, or after the generation and storage of the component search information in step S143 is complete, step S144 executes p := next(p) so that p points to the next node.

[0062] The value of p is then examined. If it is nil, processing ends; otherwise processing returns to step S142 and the loop continues.
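
The registration loop of steps S141 to S144 might be sketched as follows. This is a minimal sketch under stated assumptions: the `Node` class, the pre-order definition of `next_node`, and the list-of-tuples table are hypothetical choices, since the text specifies only that p visits nodes one after another until nil and that headings keep their string in a child content part.

```python
HEADING = "heading"


class Node:
    """Tree node of a structured document (hypothetical representation)."""

    def __init__(self, kind, text="", children=None):
        self.kind = kind
        self.text = text
        self.children = children or []
        self.parent = None
        for child in self.children:
            child.parent = self


def node_type(p):
    """Counterpart of the function 'type' in the text: the kind of node p."""
    return p.kind


def next_node(p):
    """Pre-order successor of p, or None (nil) when the tree is exhausted.

    Visiting nodes in pre-order is an assumption made for this sketch.
    """
    if p.children:
        return p.children[0]
    while p.parent is not None:
        siblings = p.parent.children
        i = siblings.index(p)
        if i + 1 < len(siblings):
            return siblings[i + 1]
        p = p.parent
    return None


def build_table(root):
    """Build the correspondence table of (heading string, pointer) rows."""
    table = []
    p = root                                    # step S141: start of document
    while p is not None:                        # stop when p is nil
        if node_type(p) == HEADING:             # step S142: is p a heading?
            # The heading's child is the content part holding the string.
            table.append((p.children[0].text, p))   # step S143: register
        p = next_node(p)                        # step S144: p := next(p)
    return table


# Illustrative document with two headings.
root = Node("document", children=[
    Node(HEADING, children=[Node("content", "Introduction")]),
    Node("body", "Some introductory text."),
    Node("section", children=[
        Node(HEADING, children=[Node("content", "Network setup")]),
        Node("body", "How the office network is configured."),
    ]),
])

table = build_table(root)
```

Each row pairs a heading string with the node it came from, so a later search can match on the string and follow the pointer back into the document tree.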

[0063] The component search information generated and stored in this way takes the form shown in correspondence table 161 of FIG. 16. The search processing that uses it to extract a desired document component from the document storage unit 131 is described next. FIG. 15 shows the flow of the search processing.

[0064] In FIG. 15, c is data representing the query condition, i is an integer variable indicating a row of correspondence table 161, and u is a set-type variable that stores those of the document components 164 to 169 pointed to from correspondence table 161 that satisfy the condition.

[0065] In step S151, the data representing the query condition is first assigned to c.

[0066] In step S152, i and u are initialized. Next, step S153 checks whether the character string stored in the i-th row of the correspondence table, "correspondence table [i]", satisfies condition c. When i is 1, the first row of correspondence table 161 is examined.

[0067] If the check in step S153 finds that the condition is satisfied, step S154 adds the document component pointed to from "correspondence table [i]" to u.

[0068] If step S153 determines that the condition is not satisfied, or after step S154 is complete, step S155 increments i by 1.

[0069] Step S156 then checks whether i exceeds the size of correspondence table 161. If it does not, processing returns to step S153, and the loop repeats until i exceeds the size of correspondence table 161.
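
The search loop of steps S151 to S156 can be sketched as below. The sketch makes several assumptions that are not in the text: the table is a Python list of (heading string, pointer) tuples, pointers are plain strings, the condition c is a substring test (matching the "identical to or contains" rule given for the document component search unit 135), and rows are indexed from 0 rather than 1. The size check is placed before the first row is read so that an empty table is handled safely.

```python
def search_components(table, c):
    """Return pointers of the rows whose heading string satisfies condition c."""
    u = []                          # step S152: initialize u ...
    i = 0                           # ... and i (0-based here, 1-based in the text)
    while i < len(table):           # step S156: stop once i passes the table size
        heading, pointer = table[i]
        if c in heading:            # step S153: does table[i] satisfy c?
            u.append(pointer)       # step S154: add the component to u
        i += 1                      # step S155: increment i
    return u


# Illustrative correspondence table; headings and pointers are hypothetical.
table = [
    ("1. Network overview", "c1"),
    ("2. Printer setup", "c2"),
    ("3. Network troubleshooting", "c3"),
]

matches = search_components(table, "Network")
```

A list is used for u instead of a set so that the results keep table order; the text only requires a collection of the matching components.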

[0070] When this processing is complete, u holds information pointing to the document components that satisfy the condition. This information is passed to the document component extraction unit 137, which extracts the contents of the corresponding document components from the document storage unit 131. The component content collation unit 138 collates these contents with the content condition input from the search condition input unit 136, and a list of the document components that satisfy the condition is displayed on the search result display unit 139.
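
Taken together, the heading search and the content collation form a two-stage filter, which might be sketched as follows. Everything named here is an assumption for illustration: the document storage is modeled as a dictionary from pointer to text, and both conditions are simple substring tests, whereas the text leaves the form of the conditions open.

```python
def two_stage_search(table, store, component_cond, content_cond):
    """Filter by heading first, then check content only for the survivors."""
    # Stage 1: component search over the correspondence table.
    pointers = [ptr for heading, ptr in table if component_cond in heading]
    # Stage 2: fetch each surviving component and collate its content.
    return [ptr for ptr in pointers if content_cond in store[ptr]]


# Hypothetical correspondence table and document storage.
table = [
    ("1. Network overview", "c1"),
    ("2. Printer setup", "c2"),
    ("3. Network troubleshooting", "c3"),
]
store = {
    "c1": "The office network uses Ethernet.",
    "c2": "Install the driver before connecting the cable.",
    "c3": "Check the cable if the connection drops.",
}

result = two_stage_search(table, store, "Network", "cable")
```

In this example component c2 also mentions "cable", but it is never read because its heading fails the first stage. That is the sense in which restricting the content search to matching components reduces both work and search noise.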

[0071] According to this embodiment, the document component search unit 135 first searches for and identifies the document components that satisfy the document component condition, so that the search by the document content condition is limited to only the necessary regions within the documents. This raises search efficiency and reduces search noise. Moreover, because the component search uses the component search information stored in the component search information storage unit 134, the search is faster still. In addition, because the heading detection unit 132 detects headings and the component search information generation unit 133 turns them into component search information, generating and registering the search information requires no manual work, and the search information can be created automatically.

【００７２】[0072]

[Effects of the Invention] As described above, the present invention (claim 1) selects document elements by a search using the document element condition and then searches the selected document elements by the document content condition, which improves search efficiency and reduces search noise.

[0073] In the present invention (claim 2), the headings of a document are detected by the heading detection means, and the correspondence between each heading and its document element is registered by the document element search information generation means in the document element search information storage means as document element search information. Using the document element search information speeds up the search for document elements. The present invention (claim 2) can therefore further increase the search-efficiency improvement of the above invention (claim 1). Moreover, unlike conventional keyword search, registering the search information takes no manual effort, so the search information can be registered easily.

[0074] Further, according to the present invention (claim 3), when documents having the designated keyword are searched for, only the partial structure of the document designated in advance can be transferred to the user, which prevents other, unnecessary parts of the document from being transferred and displayed or printed. In particular, when the document storage and the destination of the search results are remote from each other, the amount of data transferred is reduced.

【図面の簡単な説明】[Brief description of drawings]

FIG. 1 is a diagram showing the configuration of the first embodiment.

FIG. 2 is a diagram showing an example of a structured document.

FIG. 3 is a diagram for explaining how document roots are stored.

FIG. 4 is a diagram showing the processing flow of the first embodiment.

FIG. 5 is a diagram showing the configuration of the second embodiment.

FIG. 6 is a diagram showing the processing flow of the second embodiment.

FIG. 7 is a diagram showing an example of a set of documents containing a designated keyword.

FIG. 8 is a diagram showing an example of the extraction range designation unit.

FIG. 9 is a diagram showing an example of a document portion displayed as a result.

FIG. 10 is a diagram showing an example of the document structure movement designation unit.

FIG. 11 is a diagram for explaining the extraction of related structures.

FIG. 12 is a diagram for explaining an example of document structure search.

FIG. 13 is a diagram showing the configuration of the third embodiment.

FIG. 14 is a diagram showing the flow of the search information registration processing in the third embodiment.

FIG. 15 is a diagram showing the flow of the search processing in the third embodiment.

FIG. 16 is a diagram showing an example of the search information (correspondence table) in the third embodiment.

【符号の説明】[Explanation of symbols]

11 ... search condition input unit, 12 ... document structure search unit, 121 ... document root acquisition unit, 122 ... document component selection unit, 123 ... component content collation unit, 13 ... document storage unit, 14 ... search result storage unit, 15 ... search result display unit, 51 ... document storage unit, 52 ... document structure search unit, 53 ... search condition input unit, 54 ... extraction range designation unit, 55 ... document structure movement designation unit, 56 ... search result display unit.

Claims

【特許請求の範囲】[Claims]

1. A document retrieval device comprising:
document storage means for storing a plurality of document data items each composed of document elements;
search condition input means for specifying a document element specification condition, which is a condition for identifying document elements, and a document content condition, which is a condition for identifying the document content corresponding to a document element;
document element search means for searching the document storage means for document elements that satisfy the document element specification condition specified by the search condition input means; and
content search means for searching the document elements extracted by the document element search means for those that satisfy the document content condition specified by the search condition input means.

2. A document retrieval device comprising:
document storage means for storing a plurality of document data items each composed of document elements;
heading detection means for detecting character strings representing headings from the documents stored in the document storage means;
search information generation means for generating search information representing the correspondence between the character strings detected by the heading detection means and the document elements stored by the document storage means;
search information storage means for storing the search information generated by the search information generation means;
search condition input means for specifying a document element specification condition, which is a condition for identifying document elements, and a document content condition, which is a condition for identifying the document content corresponding to a document element;
document element search means for searching, using the search information, for document elements that satisfy the document element specification condition specified by the search condition input means; and
content search means for searching the document elements extracted by the document element search means for those that satisfy the document content condition specified by the search condition input means.

3. A document retrieval device for searching structured document data, comprising:
extraction range designation means for designating the range of document data to be extracted as a search result as either the whole of a document or a partial structure; and
means for retrieving the partial structure designated by the extraction range designation means.