JPH07121552A

JPH07121552A - Document group analyzing device

Info

Publication number: JPH07121552A
Application number: JP5291274A
Authority: JP
Inventors: Shintaro Kojo; 慎太郎古城
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 1993-10-27
Filing date: 1993-10-27
Publication date: 1995-05-12
Anticipated expiration: 2018-01-14
Also published as: JP3367174B2

Abstract

PURPOSE: To obtain immediate and dynamic classification by retrieving every document within a defined scope, then classifying and tallying the documents against new combinations of categories obtained by logical operations between the defined category elements.

CONSTITUTION: A search condition creation part 13 uses a term expansion database 12 to expand a category specified from an input part 11 into its lower-level category elements and their synonyms. An inquiry engine 14 performs a full-text search of a search target database 15 using each synonym obtained by the creation part 13. A tallying part 16 tallies the search results by category, and a display part 17 displays them in the display format specified from the input part 11. The user examines the display and judges whether it is the one required. If it is not, the user changes the display method specification at the input part 11, the processing is executed again, and the results are displayed in the new way. This is repeated until the required result is obtained, and in this way the document group is analyzed.

Description

Detailed Description of the Invention

[0001]

Field of the Invention: The present invention relates to a document group analysis apparatus that retrieves a group of documents from a database storing and managing documents, and analyzes that group.

[0002]

Description of the Related Art: One known method of automatically classifying documents judges whether a document contains a specific character string. Building on this, a method has also been proposed (Japanese Patent Laid-Open No. 2-105973) that defines a "category" collectively denoting several character strings. This catches spelling variants and similar words that strict string matching would miss. In either case, however, the handling of character strings or categories is fixed. These methods force documents into classification items assumed in advance, even where they do not fit.

Such an approach rests on two assumptions: (1) word frequencies will not change in the future, and (2) no new classification items will ever need to be created. Neither assumption matches reality. Even if such a system works reasonably well for a while, it will eventually have to be reorganized. In other words, categories are used only for a classification decided beforehand, and the emergence of new classification items is overlooked. This may be adequate when the nature of the documents to be classified is constant. Because the nature of documents generally changes, however, the classification scheme should also be able to change from time to time.

[0003]

Problems to Be Solved by the Invention: As described above, the conventional technique performs fixed classification and cannot classify dynamically. For example, suppose "computer"-related documents and "medical"-related documents are filed in separate places using the conventional technique. The system then cannot respond to a situation in which computer manufacturers all begin entering the medical field, because each document ends up in one place or the other. In such a case, a user would want to see the documents about "computer AND medical" separated from the rest.

Fixing classification items in advance is unwise in a rapidly changing environment. Moreover, categories should come first, and classifications should be obtained by combining categories as appropriate. Defining categories for the sake of a classification puts the cart before the horse.

An object of the present invention is to solve these problems of the conventional technique. It aims to build, in a document group analysis apparatus, a flexible system that makes effective use of categories for retrieval, yields classifications immediately, and tolerates operations such as moving a word from one category to another. A further object is to make it easy to change category definitions to reflect the latest information.

[0004]

Means for Solving the Problems: The document group analysis apparatus of the present invention comprises the following:

- target storage means (15 in FIG. 1) for storing a plurality of documents to be analyzed;
- input means (11 in FIG. 1) for entering a combination of words that name a plurality of categories for retrieval;
- expansion information storage means (12 in FIG. 1), in which categories are defined and which is used to expand category words;
- term expansion means (13 in FIG. 1) for expanding each entered category word into the search key words belonging to that category, with reference to the expansion information storage means;
- retrieval means (14 in FIG. 1) for searching the documents in the target storage means using all of the expanded search key words;
- tallying means (16 in FIG. 1) for classifying and tallying the retrieved document group into a new group of categories, determined by a specified logical operation between the entered categories;
- display means (17 in FIG. 1) for displaying the tallying results in a specified display format.

In this configuration, editing means (18 in FIG. 1) may also be provided for editing the category definitions held in the expansion information storage means.

[0005]

Operation: The user uses the input means to enter a combination of words naming several categories that are likely to be relevant to the document group to be analyzed. The term expansion means then obtains the search key words according to the category definitions stored in the expansion information storage means. The retrieval means searches the documents stored in the target storage means using those search key words. The resulting document group covers the entire scope of the categories entered through the input means (for example, "product", "competitor", and "region").

The tallying means classifies and tallies the retrieved document group into combined categories. These combinations are determined by logical operations between the elements of the entered categories. Examples of such elements are "copier", "facsimile", and "workstation" in the "product" category; "Company P" and "Company Q" in the "competitor" category; and "April", "May", and "June" in the "second quarter" category. The display means lists the tallying results in the specified display format (FIG. 13).

According to the present invention, the entered categories are expanded according to their definitions and every document within the defined scope is retrieved. The documents are then classified and tallied against new combinations of categories, obtained by logical operations between the defined category elements. Classification is therefore immediate and dynamic, and fine-grained classification is also possible. This makes it possible to build a flexible system that tolerates operations such as moving a word to another category.

When the results are displayed, the distribution of several key words can be examined with a single user input. This makes trends across the whole document group, and the relationships between document groups, concretely visible. Furthermore, if editing means for editing the category definitions in the expansion information storage means is provided, category definitions can be changed at any time. They can therefore easily be updated to reflect the latest information.

[0006]

Embodiment: FIG. 1 shows the configuration of a document group analysis apparatus according to an embodiment of the present invention. As shown in FIG. 1, the apparatus comprises the following units:

- an input unit 11 for entering operation instructions, including document group analysis commands, and the data format for display;
- a term expansion database 12 storing information that associates each search term (category) specified in a document group analysis command with the terms used to search for it, such as its related category elements and synonyms;
- a search condition creation unit 13 that uses the term expansion database 12 to convert a document group analysis command into an appropriate search command;
- a search target database 15 that has an index and stores documents in searchable form;
- an inquiry engine 14 that searches the search target database 15 (using the index or similar means) according to the search command supplied by the search condition creation unit 13, and passes the results to a tallying unit 16;
- the tallying unit 16, which tallies the search results;
- a display unit 17 that displays the tallying results in the display format specified from the input unit 11;
- a term expansion data editing unit 18 that edits the term expansion data.

[0007] The operation of the document group analysis apparatus of this embodiment is described next. FIG. 2 is a flowchart outlining how the apparatus analyzes a document group.

1. At the input unit 11, the categories to be used are specified (step S21), and the display method is specified (step S22).
2. The search condition creation unit 13 uses the term expansion database 12 to expand each specified category into the set of lower-level category elements and their synonyms (step S23).
3. The inquiry engine 14 performs a full-text search of the search target database 15, using each synonym obtained by the search condition creation unit 13 (step S24).
4. The tallying unit 16 tallies the search results by category (step S25).
5. The display unit 17 displays the results in the display format specified at the input unit 11 (step S26).

The user examines the display and judges whether it is the desired one. If it is not, the user changes the display method specification at the input unit 11, steps S23 to S26 are executed again, and the results are shown using the new display method. Once the desired display has been obtained, the process proceeds to step S28, where the user decides whether to attempt a further reclassification. Steps S22 to S28 are repeated until the desired result is obtained, that is, the kind of succinct conclusion being sought from the document group at that moment. In this way, the user explores and analyzes the document group by trial and error. The processing in each unit is described in detail below.
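The S23 to S25 portion of this flow can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation:

- the term expansion database is a hypothetical in-memory dictionary;
- the document store is a hypothetical dictionary of short texts;
- full-text search is naive substring matching.

Tallying here counts documents per category element only. Cross-category combinations are a separate step.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for term expansion database 12:
# category -> {category element -> synonyms}
EXPANSION: Dict[str, Dict[str, List[str]]] = {
    "product": {"copier": ["copying machine"], "fax": ["facsimile"]},
    "competitor": {"Company P": [], "Company Q": []},
}

# Hypothetical stand-in for search target database 15: doc ID -> text
DOCS: Dict[str, str] = {
    "d1": "Company P launched a new copier",
    "d2": "Company Q cut facsimile prices",
    "d3": "Company P bundles a fax with a copying machine",
}


def expand(category: str) -> Dict[str, List[str]]:
    """Step S23: category -> its elements and their synonyms.
    An unregistered word is treated as its own single element."""
    return EXPANSION.get(category, {category: []})


def search(term: str) -> Set[str]:
    """Step S24: naive full-text search by substring match."""
    return {doc_id for doc_id, text in DOCS.items() if term in text}


def tally(categories: List[str]) -> Dict[str, Dict[str, int]]:
    """Step S25: number of distinct documents hit by each category element,
    counting a hit on the element itself or on any of its synonyms."""
    result: Dict[str, Dict[str, int]] = {}
    for category in categories:
        counts: Dict[str, int] = {}
        for element, synonyms in expand(category).items():
            hits: Set[str] = set()
            for term in [element, *synonyms]:
                hits |= search(term)
            counts[element] = len(hits)
        result[category] = counts
    return result


print(tally(["product", "competitor"]))
# -> {'product': {'copier': 2, 'fax': 2}, 'competitor': {'Company P': 2, 'Company Q': 1}}
```

Re-running `tally` with different categories corresponds to going round the S22 to S28 loop with a new specification.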

[0008] (Input unit 11) The input unit 11 may accept input in any manner, as long as it can specify a combination of categories. The input unit 11 also specifies, at the same time, how the results are to be displayed. The example of FIG. 3 requests an analysis of all documents, with the combination of the categories "product" and "competitor" on the X axis and "second quarter" on the Y axis. Element categories below "product" in the hierarchy, such as "copier" and "facsimile", need not be entered. How the result is presented is described later in the explanation of the display unit 17. In the example of FIG. 3, the input data has the structure shown in FIG. 4, which associates each analysis axis with the categories on that axis.

[0009] (Term expansion database 12) The term expansion database 12 may use any structure for expanding a category into lower-level or terminal category elements and synonyms. Generally, it is represented as a hierarchy, as shown in FIG. 5. Under a root category 51 there are several subcategories 52 at lower levels, ending in end categories 53. Several synonyms 54 are associated with each end category 53.

FIG. 6 shows an example of the screen on which the term expansion data editing unit 18 edits the term expansion database 12. The categories are displayed as a hierarchy on the left. The right side shows the synonyms of the end category highlighted on the left. Among the categories shown on the left of FIG. 6, "competitor" is the root category; "product", "region", and so on are subcategories; and "copier", "facsimile", and so on are end categories. The right side shows the synonyms of the highlighted end category "personal computer". All of these can be edited freely on this editing screen, and the edits are immediately reflected in the term expansion data.
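A hierarchy like the one in FIG. 5 could be held in memory as a simple tree. The sketch below assumes a hypothetical `Category` node type. A node with no children is an end category and carries the synonym list. The category names loosely follow FIG. 6; the region names and the synonym "PC" are illustrative placeholders, not data from the patent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Category:
    """One node of a hypothetical term-expansion hierarchy (cf. FIG. 5).

    A node without children is an end category; only end categories
    carry synonyms in this sketch."""
    name: str
    children: List["Category"] = field(default_factory=list)
    synonyms: List[str] = field(default_factory=list)

    def is_end(self) -> bool:
        return not self.children

    def end_categories(self) -> List["Category"]:
        """All end categories at or below this node, in depth-first order."""
        if self.is_end():
            return [self]
        found: List["Category"] = []
        for child in self.children:
            found.extend(child.end_categories())
        return found


# Illustrative tree loosely following FIG. 6 ("competitor" is the root there).
root = Category("competitor", children=[
    Category("product", children=[
        Category("copier"),
        Category("facsimile"),
        Category("personal computer", synonyms=["PC"]),
    ]),
    Category("region", children=[Category("Asia"), Category("Europe")]),
])

print([c.name for c in root.end_categories()])
# -> ['copier', 'facsimile', 'personal computer', 'Asia', 'Europe']
```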

[0010] (Search condition creation unit 13) The search condition creation unit 13 gives the search engine the search conditions for finding documents that match the combination of categories specified at the input unit 11. Following the contents of the term expansion database 12, it expands each category down to the level of the words actually searched for. When synonyms are defined, every synonym is searched, so the number of queries increases accordingly. In effect, this work builds a correspondence table between category elements and synonyms, as shown in FIG. 7.

The category expansion process is described with reference to FIG. 8:

1. The search condition creation unit 13 checks whether the category entered at the input unit 11 is a word registered in the term expansion database 12 (step S81).
2. If it is not registered, the entered category word is output as is (step S82).
3. If it is registered, the words belonging to that category are output as category elements (step S83).
4. Each category element is then checked for subcategories (step S84). If it has any, the subcategory becomes the target (step S88) and step S83 is repeated.
5. When a category element with no subcategories is reached, its synonyms are output (step S85).

The processing from step S83 is then repeated for the next category, outputting its category elements and synonyms. The expansion ends when no categories remain.
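The expansion of FIG. 8 is naturally recursive. The following is a minimal sketch. It assumes the database is reduced to two hypothetical dictionaries: one maps a registered category to its category elements, and the other maps end categories to their synonyms. The intermediate subcategory "office equipment" is invented for illustration. The result is a correspondence table of end elements and synonyms in the spirit of FIG. 7. Keeping only end elements in the table is one reading of the flowchart, not a detail confirmed by the text.

```python
from typing import Dict, List

# Hypothetical contents of the term expansion database 12.
# SUBCATEGORIES: registered category -> its category elements
# SYNONYMS: end category -> its synonyms
SUBCATEGORIES: Dict[str, List[str]] = {
    "product": ["office equipment", "personal computer"],
    "office equipment": ["copier", "facsimile"],  # invented intermediate level
    "second quarter": ["April", "May", "June"],
}
SYNONYMS: Dict[str, List[str]] = {
    "personal computer": ["PC"],
    "facsimile": ["fax"],
}


def expand(category: str) -> Dict[str, List[str]]:
    """Expand a category into {end element: synonyms} (cf. FIG. 7).

    A word with no subcategories is returned as-is together with any
    synonyms (S81/S82, S85); otherwise each element is expanded in turn,
    recursing into subcategories (S83/S84/S88)."""
    if category not in SUBCATEGORIES:
        return {category: list(SYNONYMS.get(category, []))}
    table: Dict[str, List[str]] = {}
    for element in SUBCATEGORIES[category]:
        table.update(expand(element))
    return table


print(expand("product"))
# -> {'copier': [], 'facsimile': ['fax'], 'personal computer': ['PC']}
print(expand("Company Q"))  # unregistered word passes through unchanged
# -> {'Company Q': []}
```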

[0011] (Inquiry engine 14) The inquiry engine 14 performs a full-text search of the search target database 15 and returns the answer. It is a module matched to the format of the search target database 15. Each category element and synonym obtained by the search condition creation unit 13 is used as a search word. The IDs (file names, physical addresses, etc.) of the documents containing that word are collected as the search result.

FIG. 9 shows the flow of the inquiry processing:

1. The search target database 15 is searched using a category element as the key (step S91).
2. Searches are then run one after another using each synonym of that category element as the key (steps S92 and S94).
3. When the searches for all synonyms of that element are finished (Y at step S93), the next category element is taken (step S96), and steps S91 to S94 are repeated.

This continues until searches have been completed for all category elements supplied by the search condition creation unit 13 and their synonyms (step S95).
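The nested loop of FIG. 9 might look like the sketch below. It assumes the inquiry engine can look each search word up in a hypothetical `index` dictionary (word to set of document IDs) rather than calling a real full-text engine. The element and document names are illustrative.

```python
from typing import Dict, List, Set


def run_queries(table: Dict[str, List[str]],
                index: Dict[str, Set[str]]) -> Dict[str, Dict[str, Set[str]]]:
    """Query every category element and its synonyms (cf. FIG. 9).

    For each element, search the element itself (S91) and then each synonym
    in turn (S92/S94); after the last synonym (S93) move on to the next
    element (S96) until all elements are done (S95). `index` is assumed to
    map a search word to the IDs of the documents containing it."""
    results: Dict[str, Dict[str, Set[str]]] = {}
    for element, synonyms in table.items():
        per_term: Dict[str, Set[str]] = {}
        for term in [element, *synonyms]:
            per_term[term] = set(index.get(term, set()))
        results[element] = per_term
    return results


# Illustrative data only.
index = {"personal computer": {"doc7"}, "PC": {"doc2"}, "copier": {"doc1", "doc2"}}
table = {"personal computer": ["PC"], "copier": []}
print(run_queries(table, index))
```

Results are kept per term at this stage, so merging synonyms is left to the tallying unit.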

[0012] (Search target database 15) If speed is not a concern, the search target database 15 may be of any type. In general, however, faster is better, and to that end the database should have an index on terms. In its simplest form, the index has the format shown in FIG. 10. If space efficiency matters, both the "word" and "document ID" entries should be compressed.
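An index in the style of FIG. 10, mapping each word to the IDs of the documents that contain it, can be built as below. This is a minimal sketch with two simplifications. Tokenization is a naive lowercase whitespace split; Japanese text would need a word segmenter or n-gram indexing. The compression recommended above is also omitted.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Set


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Build a minimal inverted index: word -> set of document IDs."""
    index: DefaultDict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():  # naive tokenization (assumption)
            index[word].add(doc_id)
    return dict(index)


docs = {"doc1": "copier sales rose", "doc2": "PC and copier bundle"}
index = build_index(docs)
print(sorted(index["copier"]))  # -> ['doc1', 'doc2']
```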

[0013] (Tallying unit 16) The tallying unit 16 counts the document IDs obtained by the search. It also performs logical operations (for example, finding the documents that contain both A and B) in preparation for display. The work is divided mainly into two stages.

[0014] (Work 1) Merging synonyms. Suppose the category element "pasokon" (a Japanese abbreviation of "personal computer") has the synonyms "PC" and "personal computer", and the search returns the following results:
<1> documents containing the word "pasokon" = {document 1, document 2}
<2> documents containing the word "PC" = {document 2}
<3> documents containing the word "personal computer" = {document 7, document 8, document 11}
The documents containing the category element "pasokon" are then the merge of all of these, namely {document 1, document 2, document 7, document 8, document 11}.
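Work 1 is a set union. Using the search results from the example above, with document numbers as integers:

```python
from typing import Dict, Set

# Search results from the example above (document numbers only).
hits: Dict[str, Set[int]] = {
    "pasokon": {1, 2},
    "PC": {2},
    "personal computer": {7, 8, 11},
}

merged: Set[int] = set()
for term_hits in hits.values():
    merged |= term_hits  # union: a document hit by several synonyms counts once
print(sorted(merged))  # -> [1, 2, 7, 8, 11]
```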

[0015] (Work 2) Classifying by category elements. Suppose the display unit 17, described later, needs the documents that are "pasokon" AND "Company A" AND "April", and the search returns the following results:
<1> documents containing the category element "pasokon" = {document 1, document 2}
<2> documents containing the category element "Company A" = {document 2, document 7, document 8, document 11}
<3> documents containing the category element "April" = {document 2, document 8}
The result is then {document 2}.
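Work 2 is a set intersection. Using the example figures above:

```python
from typing import Dict, Set

# Per-element document sets from the example above.
element_hits: Dict[str, Set[int]] = {
    "pasokon": {1, 2},
    "Company A": {2, 7, 8, 11},
    "April": {2, 8},
}

# AND across elements: keep only documents present in every set.
result = set.intersection(*element_hits.values())
print(result)  # -> {2}
```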

[0016] FIGS. 11 and 12 are flowcharts that detail the above processing. Work 1 of the tallying unit proceeds as shown in FIG. 11:

1. An empty list is created (step S111).
2. The search results for a category element are merged into it (step S112).
3. The search results for each synonym of that element are merged in turn (step S113), until the last synonym has been merged (judged at step S114).
4. Steps S112 to S115 are then repeated in the same way for the next category element.
5. When the results for the last category element and its synonyms have been merged (judged at step S116), processing moves to Work 2 in FIG. 12.

At the end of Work 1, the tallying unit holds a list of the documents that contain each category element itself or any of its synonyms. In the following Work 2, the items that make up the display data are computed.

[0017] FIG. 12 shows one example of Work 2 of the tallying unit: the processing for an analysis using the three categories "product", "competitor", and "second quarter". It generates the data for the display of FIG. 13. In this example, the categories contain the following elements:

- "product": "copier", "facsimile", "personal computer", "workstation", and so on;
- "competitor": "Company A", "Company B", and so on;
- "second quarter": "April", "May", "June", and so on.

The processing of FIG. 12 builds the combinations of category elements of the three categories one after another. For each combination, it outputs the documents common to all of its elements (step S123). Three nested loops select the elements in turn:

- the loop of steps S120, S128, and S129 selects the "product" elements;
- the loop of steps S121, S126, and S127 selects the "competitor" elements;
- the loop of steps S122, S124, and S125 selects the "second quarter" elements.

Together, these nested loops generate every combination of the three kinds of category elements. FIG. 13 shows, for example, that A is the document common to the "product" element "copier", the "competitor" element "Company P", and the "second quarter" element "April". The more categories are specified for the analysis, the deeper the nesting becomes. This computation is not always necessary, however, because caching the results can reduce the number of computations.
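The nested loops of FIG. 12 generalize to any number of categories by taking the Cartesian product of the element lists. This also sidesteps the growing nesting depth noted above. The sketch makes three assumptions:

- the output of Work 1 is available as a hypothetical `ELEMENT_DOCS` dictionary, with invented document sets;
- `itertools.product` replaces the explicit loops;
- `functools.lru_cache` serves as one possible way to cache intersection results.

```python
from functools import lru_cache
from itertools import product
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical Work 1 output: category -> element -> documents containing
# the element or any of its synonyms. Document sets are invented.
ELEMENT_DOCS: Dict[str, Dict[str, FrozenSet[str]]] = {
    "product": {"copier": frozenset({"A", "B", "C"}), "facsimile": frozenset({"D"})},
    "competitor": {"Company P": frozenset({"A", "D"}), "Company Q": frozenset({"B"})},
    "second quarter": {"April": frozenset({"A", "B"}), "May": frozenset({"C", "D"})},
}


@lru_cache(maxsize=None)
def common_docs(combo: Tuple[Tuple[str, str], ...]) -> FrozenSet[str]:
    """Documents shared by every (category, element) pair (cf. step S123).
    Cached, so repeating an analysis does not recompute intersections."""
    sets = [ELEMENT_DOCS[cat][elem] for cat, elem in combo]
    return frozenset.intersection(*sets)


def cross_tabulate(categories: List[str]) -> Dict[Tuple[str, ...], FrozenSet[str]]:
    """One 'loop level' per category, however many categories are given."""
    axes = [[(cat, elem) for elem in ELEMENT_DOCS[cat]] for cat in categories]
    table: Dict[Tuple[str, ...], FrozenSet[str]] = {}
    for combo in product(*axes):
        table[tuple(elem for _, elem in combo)] = common_docs(combo)
    return table


cells = cross_tabulate(["product", "competitor", "second quarter"])
print(sorted(cells[("copier", "Company P", "April")]))  # -> ['A']
```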

[0018] (Display unit 17) The display unit 17 displays the results according to the display format instructions from the input unit 11. The search results used as the example for the input unit 11 can be expressed as a table like the one in FIG. 13, with documents classified into each cell. For example, A is a document that is "April" AND "copier" AND "Company P", and B is a document that is "April" AND "copier" AND "Company Q". The number of matching documents may be shown as a graph, or each document may be made directly accessible.
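A grid in the style of FIG. 13 can be rendered as plain text as follows. This is an illustrative sketch, not the patent's display method. The X axis is the product of the elements of the X-axis categories (here "product" and "competitor"), and each row is a Y-axis element. The cell dictionary is assumed to be keyed by the X elements followed by the Y element, as a tallying step like the one in [0017] might produce.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Tuple


def render_table(x_axes: List[List[str]], y_elements: List[str],
                 cells: Dict[Tuple[str, ...], FrozenSet[str]]) -> str:
    """Render a FIG. 13 style grid as plain text.

    Columns are every combination of the X-axis category elements, rows are
    the Y-axis elements, and each cell lists the matching document IDs
    ("-" when there are none)."""
    columns = list(product(*x_axes))
    rows: List[List[str]] = [[""] + ["/".join(col) for col in columns]]
    for y in y_elements:
        row = [y]
        for col in columns:
            docs = cells.get((*col, y), frozenset())
            row.append(",".join(sorted(docs)) or "-")
        rows.append(row)
    width = max(len(cell) for r in rows for cell in r)
    return "\n".join(" | ".join(cell.ljust(width) for cell in r) for r in rows)


cells = {
    ("copier", "Company P", "April"): frozenset({"A"}),
    ("copier", "Company Q", "April"): frozenset({"B"}),
}
table_text = render_table([["copier"], ["Company P", "Company Q"]], ["April", "May"], cells)
print(table_text)
```

Replacing the joined IDs with `str(len(docs))` would give the per-cell counts that a graph could be drawn from.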

[0019] (Term expansion data editing unit 18) The term expansion data editing unit 18 edits the category definitions stored in the term expansion database 12 (deleting, adding, pasting, and so on). It also adds and deletes synonyms. FIG. 14 shows the event loop of this processing. When an event is accepted, one of the following is performed, depending on the event type:

- start processing (step S141);
- end processing (step S142);
- save processing (step S143);
- category selection processing (step S144);
- category deletion processing (step S145);
- category addition processing (step S146);
- category pasting processing (step S147);
- synonym deletion processing (step S148);
- synonym addition processing (step S149).

The display screen of FIG. 6 is the interface for editing. An event is generated when an icon or a particular position on this screen is clicked with the mouse, or for some other reason, and the corresponding processing is then performed.
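The event loop of FIG. 14 amounts to dispatching on the event type. In this minimal sketch the event names are hypothetical, and each handler only records the FIG. 14 step it stands for. Ignoring unknown events and stopping after the end event are assumptions, not details from the flowchart.

```python
from typing import Callable, Dict, Iterable, List


def make_handlers(log: List[str]) -> Dict[str, Callable[[], None]]:
    """Map each (hypothetical) event type to a placeholder handler that
    records which FIG. 14 step would run."""
    steps = {
        "start": "S141", "end": "S142", "save": "S143",
        "select_category": "S144", "delete_category": "S145",
        "add_category": "S146", "paste_category": "S147",
        "delete_synonym": "S148", "add_synonym": "S149",
    }
    # Default argument binds each step label at lambda creation time.
    return {kind: (lambda s=step: log.append(s)) for kind, step in steps.items()}


def event_loop(events: Iterable[str]) -> List[str]:
    """Dispatch each incoming event to its handler; stop after "end"."""
    log: List[str] = []
    handlers = make_handlers(log)
    for kind in events:
        handler = handlers.get(kind)
        if handler is None:
            continue  # unknown events are ignored (an assumption)
        handler()
        if kind == "end":
            break
    return log


print(event_loop(["start", "select_category", "bogus", "add_synonym", "end", "save"]))
# -> ['S141', 'S144', 'S149', 'S142']
```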

[0020] FIG. 15 details the start processing of FIG. 14. First, the term expansion database is opened (step S151). If it cannot be opened, the processing ends with an error. If it can, the contents of the term expansion database 12 are loaded into memory (step S153). The apparatus checks whether memory is insufficient (step S154), and the contents are displayed in the editing window (step S155). It then checks whether display memory is insufficient (step S156); if there is enough, the start processing ends. If either main memory or display memory is insufficient, the term expansion database is closed (step S159).

[0021] FIG. 16 is a flowchart of the end processing. The apparatus checks whether the data in memory has changed since the last save (step S161). If it has, the user is asked whether to save (step S162), and if the user chooses to save, the save processing is performed (step S164). If the memory contents have not changed at step S161, or if the user does not choose OK at step S163, the file used for editing the term expansion database is closed (step S164). The window and memory are then released (steps S166 to S167).

[0022] FIG. 17 shows the save processing, in which the data is written to the term expansion database.

[0023] FIG. 18 shows the flow of the category selection processing. The newly selected category (= Cn) is acquired (step S181). It is then determined whether the currently selected category (= Cs) matches the newly selected category Cn. If they match, the selection is canceled and Cs is set to nil. If Cs and Cn do not match, the new category is selected and Cn is stored in Cs.
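Category selection (FIG. 18) behaves as a toggle: clicking the selected category again deselects it. A minimal sketch, assuming a hypothetical `Category` node class and an `Editor` holding Cs as `selected` (Python `None` plays the role of nil):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare categories by identity, not by field values
class Category:
    """A node of the term hierarchy (hypothetical model)."""
    name: str
    synonyms: List[str] = field(default_factory=list)
    children: List["Category"] = field(default_factory=list)


@dataclass
class Editor:
    selected: Optional[Category] = None  # Cs

    def select(self, cn: Category) -> None:
        """Category selection (FIG. 18): selecting Cs again cancels the selection."""
        if self.selected is cn:  # Cs matches Cn
            self.selected = None  # cancel the selection: Cs := nil
        else:
            self.selected = cn   # select the new category: Cs := Cn
```

For example, selecting "A Fisheries" twice in a row leaves no category selected.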

[0024] FIG. 19 is a flowchart of the category deletion processing. First, it is checked whether the selected category (= Cs) is empty (step S191). If it is not empty, the category cut buffer (= Cb) is cleared (step S192), and the selected category Cs and its subcategories are copied to the cut buffer Cb (step S193). The selected category Cs is then deleted (step S194). Finally, the selection is canceled and Cs is set to nil.
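Because deletion copies Cs into the cut buffer before erasing it, it acts like a "cut" operation. The sketch below assumes a hypothetical tree model in which a dummy `root` node holds the top-level categories; `find_parent` is an illustrative helper, not something named in the patent.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison, so list.remove() removes this exact node
class Category:
    name: str
    synonyms: List[str] = field(default_factory=list)
    children: List["Category"] = field(default_factory=list)


def find_parent(node: Category, target: Category) -> Optional[Category]:
    """Depth-first search for the node whose children include `target`."""
    for child in node.children:
        if child is target:
            return node
        found = find_parent(child, target)
        if found is not None:
            return found
    return None


@dataclass
class Editor:
    root: Category                         # holds the top-level categories
    selected: Optional[Category] = None    # Cs
    cut_buffer: Optional[Category] = None  # Cb

    def delete_selected(self) -> None:
        """Category deletion (FIG. 19), sketch."""
        if self.selected is None:          # S191: nothing selected
            return
        self.cut_buffer = None             # S192: clear the cut buffer
        self.cut_buffer = copy.deepcopy(self.selected)  # S193: copy Cs and its subtree
        parent = find_parent(self.root, self.selected)
        if parent is not None:
            parent.children.remove(self.selected)       # S194: delete Cs from the tree
        self.selected = None               # cancel the selection: Cs := nil
```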

[0025] FIG. 20 is a flowchart of the category addition processing. The user is first prompted to enter a category name (step S201). It is then determined whether the input was canceled or an empty string was entered (step S202). If so, processing ends without adding anything. Otherwise, a new category Cn is created (step S203). It is then determined whether the selected category (Cs) is empty (step S204). If it is empty, Cn is added to the root category. If it is not empty, Cn is inserted as a subcategory of the selected category Cs.
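The addition flow of FIG. 20 can be sketched with the same kind of hypothetical tree model. Here `prompt` stands in for the name-entry dialog and is assumed to return `None` when the user cancels.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)
class Category:
    name: str
    children: List["Category"] = field(default_factory=list)


@dataclass
class Editor:
    root: Category                       # holds the top-level categories
    selected: Optional[Category] = None  # Cs

    def add_category(self, prompt: Callable[[], Optional[str]]) -> Optional[Category]:
        """Category addition (FIG. 20), sketch."""
        name = prompt()                        # S201: ask for a category name
        if name is None or name == "":         # S202: canceled or empty string
            return None                        # end without adding anything
        cn = Category(name)                    # S203: create the new category Cn
        if self.selected is None:              # S204: is Cs empty?
            self.root.children.append(cn)      # add Cn to the root category
        else:
            self.selected.children.append(cn)  # insert Cn as a subcategory of Cs
        return cn
```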

[0026] FIG. 21 is a flowchart of the category pasting processing. It is first checked whether the category cut buffer Cb is empty (step S211). If it is empty, there is nothing to paste, so processing ends immediately. If it is not empty, it is checked whether the selected category Cs is empty (step S212). If Cs is empty, the category tree held in the cut buffer Cb is inserted into the root category. If Cs is not empty, that tree is inserted as a subcategory of the selected category Cs (step S206).
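A sketch of the pasting flow of FIG. 21, again on a hypothetical tree model. One design choice here is an assumption not stated in the patent: a deep copy of Cb is pasted, so the same cut buffer can be pasted more than once without sharing nodes between two places in the tree.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Category:
    name: str
    children: List["Category"] = field(default_factory=list)


@dataclass
class Editor:
    root: Category                         # holds the top-level categories
    selected: Optional[Category] = None    # Cs
    cut_buffer: Optional[Category] = None  # Cb

    def paste(self) -> None:
        """Category pasting (FIG. 21), sketch."""
        if self.cut_buffer is None:              # S211: nothing to paste
            return
        pasted = copy.deepcopy(self.cut_buffer)  # paste a copy; Cb stays reusable
        if self.selected is None:                # S212: is Cs empty?
            self.root.children.append(pasted)    # insert under the root category
        else:
            self.selected.children.append(pasted)  # insert as a subcategory of Cs
```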

[0027] FIG. 22 shows the flow of the synonym deletion processing. It is checked whether the selected synonym (= Rs) is empty (step S221). If it is empty, the deletion processing ends. If it is not empty, the synonym Rs is erased (step S222), and processing ends.
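Synonym deletion (FIG. 22) is a single guarded erase. In this minimal sketch the selected synonym Rs is modeled, as an assumption, as a `(category, word)` pair, with `None` meaning that nothing is selected. The function reports whether anything was deleted.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Category:
    name: str
    synonyms: List[str] = field(default_factory=list)


def delete_synonym(rs: Optional[Tuple[Category, str]]) -> bool:
    """Synonym deletion (FIG. 22), sketch; rs is the selected synonym or None."""
    if rs is None:               # S221: no synonym selected
        return False
    owner, word = rs
    owner.synonyms.remove(word)  # S222: erase Rs from its category
    return True
```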

[0028] By combining the processes described above, the categories (the terminology system) can be edited. As an example, consider moving a category from its current parent to a different one. Suppose "A Fisheries" had until now been recognized only as "a company that operates pelagic fishing vessels" and was therefore placed in the category "fishery", and it is now to be moved to "food manufacturer". The following operations are performed: select the category "A Fisheries" → delete the category → select the category "food manufacturer" → paste the category.
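The move in this example is just the selection, deletion, and pasting flows chained together. The sketch below combines simplified versions of those flows on a hypothetical tree model; `Editor`, `find`, `find_parent`, and `outline` are illustrative names, not taken from the patent.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Category:
    name: str
    children: List["Category"] = field(default_factory=list)


def find(node: Category, name: str) -> Optional[Category]:
    """Depth-first lookup of a category by name."""
    for child in node.children:
        if child.name == name:
            return child
        hit = find(child, name)
        if hit is not None:
            return hit
    return None


def find_parent(node: Category, target: Category) -> Optional[Category]:
    """Depth-first search for the node whose children include `target`."""
    for child in node.children:
        if child is target:
            return node
        hit = find_parent(child, target)
        if hit is not None:
            return hit
    return None


class Editor:
    def __init__(self, root: Category):
        self.root = root
        self.selected: Optional[Category] = None    # Cs
        self.cut_buffer: Optional[Category] = None  # Cb

    def select(self, name: str) -> None:  # FIG. 18: toggle selection
        cn = find(self.root, name)
        self.selected = None if cn is self.selected else cn

    def delete(self) -> None:  # FIG. 19: cut Cs into Cb
        if self.selected is None:
            return
        self.cut_buffer = copy.deepcopy(self.selected)
        find_parent(self.root, self.selected).children.remove(self.selected)
        self.selected = None

    def paste(self) -> None:  # FIG. 21: paste Cb under Cs (or the root)
        if self.cut_buffer is None:
            return
        target = self.root if self.selected is None else self.selected
        target.children.append(copy.deepcopy(self.cut_buffer))


def outline(node: Category, depth: int = 0) -> List[str]:
    """Render the tree as indented lines."""
    lines = []
    for child in node.children:
        lines.append("  " * depth + child.name)
        lines.extend(outline(child, depth + 1))
    return lines


root = Category("root", [Category("fishery", [Category("A Fisheries")]),
                         Category("food manufacturer")])
ed = Editor(root)
ed.select("A Fisheries")
ed.delete()
ed.select("food manufacturer")
ed.paste()
print("\n".join(outline(root)))
```

Running it prints `fishery`, then `food manufacturer`, then `A Fisheries` indented under it. This shows that the category has left "fishery" and now sits under "food manufacturer".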

[0029] As another example, consider adding a synonym to accommodate a new name. The word "jam" has come into common use for the category "paper jam", so it is registered as a synonym. The operations in this case are: select the category "paper jam" → add the synonym "jam".
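To show why registering the synonym matters, here is a hedged sketch of its effect on retrieval. Assumptions: the term expansion data is a plain dict from category name to synonyms, a category expands to its own name plus its synonyms (lower-level categories are ignored for brevity), and substring matching stands in for the full-text search of the query engine 14. The documents are invented sample data.

```python
from typing import Dict, List

# Hypothetical term expansion data: category -> registered synonyms.
terms: Dict[str, List[str]] = {"paper jam": []}


def add_synonym(terms: Dict[str, List[str]], category: str, synonym: str) -> None:
    """Register `synonym` under `category` (cf. the synonym addition of FIG. 23)."""
    if synonym and synonym not in terms[category]:
        terms[category].append(synonym)


def expand(terms: Dict[str, List[str]], category: str) -> List[str]:
    """Search key words for a category: its own name plus its synonyms."""
    return [category] + terms[category]


def count_hits(docs: List[str], keys: List[str]) -> int:
    """Number of documents containing at least one key word."""
    return sum(any(k in d for k in keys) for d in docs)


docs = ["report: paper jam in tray 2", "customer says jam at fuser", "toner low"]
before = count_hits(docs, expand(terms, "paper jam"))
add_synonym(terms, "paper jam", "jam")
after = count_hits(docs, expand(terms, "paper jam"))
print(before, after)  # 1 2
```

The second report is found only after "jam" is registered, which is the kind of immediate feedback that paragraph [0030] describes.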

[0030] The results of these operations are reflected immediately in the analysis results for the document group. The characteristics of the document group and the validity of the defined terminology system can therefore be examined by trial and error.

【００３１】[0031]

[Effects of the Invention] The present invention provides the following effects.
(1) Dynamic classification
According to the present invention, the plurality of input categories are expanded according to their definitions, and all documents within the defined range are retrieved. The documents are then classified and totaled against new combinations of categories obtained by logical operations between the defined category elements. Classification is therefore immediate and dynamic. Rather than relying on an outdated taxonomy that has become a mere formality and doing meaningless sorting work, the latest classification can always be performed on the latest data.

[0032] (2) Fine-grained classification by combining categories
According to the present invention, documents can be classified and then subdivided further. This is achieved by using the editing means to change the combination of defined categories. For example, after narrowing the scope to the genre of athletics, the category "drugs", which is otherwise unrelated to this genre, can be used to pick out the issue of doping.
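Effects (1) and (2) rest on totaling documents against new categories defined by logical operations between input categories. The minimal sketch below illustrates this with the athletics/drugs example. Several details are assumptions: the key word lists and documents are invented, substring matching stands in for full-text search, and the combined categories are written as Python predicates because the patent does not fix a syntax for the logical operations.

```python
from typing import Callable, Dict, List, Set

# Hypothetical expansion results: category -> search key words.
EXPANSION: Dict[str, List[str]] = {
    "athletics": ["athletics", "sprint", "marathon"],
    "drugs": ["drug", "doping", "steroid"],
}

# New categories defined by logical operations between input categories.
COMBINATIONS: Dict[str, Callable[[Set[str]], bool]] = {
    "athletics AND drugs": lambda c: "athletics" in c and "drugs" in c,
    "athletics AND NOT drugs": lambda c: "athletics" in c and "drugs" not in c,
    "drugs AND NOT athletics": lambda c: "drugs" in c and "athletics" not in c,
}


def categories_of(doc: str) -> Set[str]:
    """Input categories whose expanded key words occur in the document."""
    text = doc.lower()
    return {cat for cat, keys in EXPANSION.items() if any(k in text for k in keys)}


def tally(docs: List[str]) -> Dict[str, int]:
    """Count the documents falling into each combined category."""
    hits = [categories_of(d) for d in docs]
    return {name: sum(pred(c) for c in hits) for name, pred in COMBINATIONS.items()}


docs = [
    "Sprint champion tests positive for steroid",
    "Marathon route announced",
    "New drug approved for asthma",
    "Weather forecast",
]
print(tally(docs))
```

Changing a combination only means editing a predicate and re-running the tally, which is the kind of regrouping without re-sorting that effect (1) claims.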

[0033] (3) Flexible modification of the terminology system
According to the present invention, when the editing means is provided, the category definitions can be changed at any time. In other words, the recognition model is easy to update. For example, "A Fisheries", which until now was recognized only as "a company that operates pelagic fishing boats" and was placed in the category "fishery", can be moved to "food manufacturer" when that turns out to be more appropriate.

[0034] (4) Discovery by trial and error
The invention does more than retrieve documents containing keywords. When the results are displayed, the distribution of multiple keywords can be examined with a single user input. This makes it possible to see trends across the whole document group and to understand concretely how document subgroups relate to one another. In addition, by observing how the display changes when the contents of the term expansion database are modified, meaningful information can be extracted from the documents.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of a document group analysis device according to an embodiment of the present invention.

FIG. 2 is a diagram showing the overall flow of the document group analysis processing according to the embodiment.

FIG. 3 is a diagram showing an example of the input screen of the input unit 11.

FIG. 4 is a diagram showing an example of the data structure of the input data.

FIG. 5 is a diagram for explaining the contents of the term expansion database.

FIG. 6 is a diagram showing an example of a screen on which the term expansion database is being edited.

FIG. 7 is a diagram showing an example of search conditions created by the search condition creation unit.

FIG. 8 is a flowchart of the category expansion processing.

FIG. 9 is a flowchart of the processing by which the query engine searches the target database.

FIG. 10 is a diagram showing search results.

FIG. 11 is a flowchart of task 1 of the aggregation unit.

FIG. 12 is a flowchart of task 1 of the aggregation unit.

FIG. 13 is a diagram showing an example of analysis results displayed by the display unit.

FIG. 14 is a flowchart of the processing of the term expansion data editing unit.

FIG. 15 is a flowchart of the start processing of the term expansion data editing unit.

FIG. 16 is a flowchart of the termination processing of the term expansion data editing unit.

FIG. 17 is a flowchart of the saving processing of the term expansion data editing unit.

FIG. 18 is a flowchart of the category selection processing of the term expansion data editing unit.

FIG. 19 is a flowchart of the category deletion processing of the term expansion data editing unit.

FIG. 20 is a flowchart of the addition processing of the term expansion data editing unit.

FIG. 21 is a flowchart of the pasting processing of the term expansion data editing unit.

FIG. 22 is a flowchart of the synonym deletion processing of the term expansion data editing unit.

FIG. 23 is a flowchart of the synonym addition processing of the term expansion data editing unit.

[Explanation of Symbols]

11 ... input unit, 12 ... term expansion database, 13 ... search condition creation unit, 14 ... query engine, 15 ... database to be searched, 16 ... aggregation unit, 17 ... display unit, 18 ... term expansion data editing unit.

Claims

[Claims]

[Claim 1] A document group analysis apparatus comprising:
target storage means for storing a plurality of documents to be analyzed;
input means for inputting a combination of words of a plurality of categories for searching;
expansion information storage means in which the categories are defined and which is used for expanding the words of the categories;
term expansion means for expanding each input category word into the search key words belonging to that category by referring to the expansion information storage means;
search means for searching the documents in the target storage means using all of the expanded search key words;
totalizing means for classifying and totaling the retrieved document group into a new group of categories determined by a specified logical operation between the input categories; and
display means for displaying the totaling results in a specified display format.

[Claim 2] The document group analysis apparatus according to claim 1, further comprising editing means for editing the category definitions held in the expansion information storage means.