JPH05225247A

JPH05225247A - Inter-document structure display method

Info

Publication number: JPH05225247A
Application number: JP4004806A
Authority: JP
Inventors: Hiroyasu Chimura; 浩靖千村
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1992-01-14
Filing date: 1992-01-14
Publication date: 1993-09-03

Abstract

PURPOSE: To considerably reduce man-hours by using keywords to automatically extract the relations between a plurality of documents and displaying the resulting structure on a display device. CONSTITUTION: A document pair generating part 1 selects two documents A and B, and a sentence search part 2 searches all sentences in documents A and B for keywords registered in a previously prepared keyword collection. This determines the sets Ka and Kb of keywords included in documents A and B. A set comparing part 3 compares Ka with Kb; when Ka is a subset of Kb, or when the number of elements of Ka lying outside Kb is smaller than a set value alpha, a document inclusion relation discriminating part 4 regards document B as including document A and substitutes '1' into the A-th row and B-th column of a prepared matrix M. This processing is repeated for all pairs of documents. A graph operation part 6 then treats the matrix M as an adjacency matrix in graph theory and applies a known hierarchizing algorithm, and the structure of the whole set of documents is displayed as a tree.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an inter-document structure display method for managing documents by automatically extracting the relations between a plurality of documents and displaying the resulting structure on a display.

[0002]

[Prior Art] Conventionally, in order to display the relational structure between a plurality of documents, a human had to read each document and judge whether or not each document pair was related.

[0003]

[Problems to Be Solved by the Invention] However, when there are a large number of documents, or when the individual documents are large in volume, having a human read all of them and find the relations requires an enormous number of man-hours and was practically impossible. Moreover, even if relations between documents could be established, there was no display method that could also express the strength of those relations.

[0004]

[Means for Solving the Problems] The present invention solves the above problems. The inter-document structure display method of the first invention automatically extracts the relations between a plurality of documents and displays the structure on a display, and is characterized by comprising: a first step of taking, for any two documents A and B, the set of keywords included in document A as K_a and the set of keywords included in document B as K_b, from among the keywords registered in a keyword collection prepared in advance; a second step of regarding document B as including document A, and defining a relation in the direction from document A to document B, when K_a is a subset of K_b or when the number of elements of the overflowing portion, that is, the set {K_a − K_b}, is smaller than a set value α; a third step of obtaining the relations between all documents by performing the above for all document pairs; and a fourth step of applying a known hierarchization algorithm and displaying the overall structure as a tree on the display.
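The condition used in the second step can be written compactly as follows. This is only a restatement, not part of the original text: M_{AB} denotes the entry of the matrix M that the description later fills in, and the restriction A ≠ B is an assumption, since the source speaks only of "document pairs".

```latex
M_{AB} =
\begin{cases}
1 & \text{if } K_a \subseteq K_b \ \text{ or } \ \lvert K_a \setminus K_b \rvert < \alpha, \\
0 & \text{otherwise,}
\end{cases}
\qquad A \neq B .
```

If α is taken to be positive, the subset case is already covered by the second clause, because K_a ⊆ K_b means |K_a \ K_b| = 0 < α.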

[0005] The inter-document structure display method of the second invention is the inter-document structure display method of the first invention, characterized by further comprising a fifth step of changing the set value α.

[0006] The inter-document structure display method of the third invention is the inter-document structure display method of the first invention, characterized by further comprising a fifth step of expressing the strength of the inclusion relation, when the tree is displayed on the display, by changing the thickness of the arc between documents A and B according to the number of elements of the set {K_a − K_b}.

[0007] The inter-document structure display method of the fourth invention is the inter-document structure display method of the first invention, characterized by further comprising a fifth step of expressing the strength of the inclusion relation, when the tree is displayed on the display, by changing the color of the arc between documents A and B according to the number of elements of the set {K_a − K_b}.

[0008]

[Operation] The present invention consists of four inventions. Of these, the second, third, and fourth inventions are each based on the first invention and add a new function that provides a distinct effect.

[0009] The first invention displays the inter-element relational structure on a display through the following four steps.

[0010] First step: All sentences in any two documents A and B are searched for keywords registered in a keyword collection prepared in advance. Suppose that the set of keywords included in document A is K_a and the set of keywords included in document B is K_b.
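As a minimal sketch of this first step, the keyword sets might be built as below. The function name `keyword_set`, the sample keyword collection, and the two sample documents are hypothetical, and plain substring matching is an assumption: the source says only that the sentences are searched for registered keywords.

```python
def keyword_set(document_text, keyword_collection):
    """Return the set of registered keywords that occur in the document."""
    # Substring matching is an assumption; the source does not say how
    # a keyword occurrence is detected.
    return {kw for kw in keyword_collection if kw in document_text}


# Hypothetical keyword collection and documents, for illustration only.
keywords = ["graph", "matrix", "tree", "keyword"]
doc_a = "An adjacency matrix describes a graph."
doc_b = "A graph can be drawn as a tree; an adjacency matrix stores its arcs."

k_a = keyword_set(doc_a, keywords)  # keywords found in document A
k_b = keyword_set(doc_b, keywords)  # keywords found in document B
```

Representing K_a and K_b as Python sets keeps the later subset and difference operations direct.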

[0011] Second step: The elements of the sets K_a and K_b are compared. If K_a is a subset of K_b, or if the number of elements of the overflowing portion, that is, the set {K_a − K_b}, is smaller than the set value α, document B is regarded as including document A, a relation is defined in the direction from document A to document B, and "1" is substituted into row A, column B of a matrix M prepared in advance.
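The inclusion test of this second step could look roughly like the following. The name `includes`, the list-of-lists matrix, and the sample keyword sets are assumptions made for illustration; only the condition itself comes from the text.

```python
def includes(k_a, k_b, alpha):
    """True if document B is regarded as including document A."""
    overflow = k_a - k_b  # keywords of A that do not appear in B
    return k_a <= k_b or len(overflow) < alpha


# Hypothetical keyword sets for three documents, indexed 0..2.
keyword_sets = [{"graph"}, {"graph", "tree"}, {"color", "arc", "display"}]
n = len(keyword_sets)
M = [[0] * n for _ in range(n)]  # matrix prepared in advance, all zeros

a, b = 0, 1
if includes(keyword_sets[a], keyword_sets[b], alpha=1):
    M[a][b] = 1  # relation directed from document A to document B
```

Note that the test is not symmetric: swapping the two documents generally gives a different answer, which is why the matrix entry is written at row A, column B only.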

[0012] Third step: The first and second steps above are repeated for all document pairs. This yields the relations between all of the documents.
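One way to sketch this repetition over all pairs is shown below. Because the relation is directional, the sketch visits ordered pairs (A, B) with A ≠ B; that reading of "all document pairs" is an assumption, as are the function name `relation_matrix` and the sample data.

```python
from itertools import permutations


def relation_matrix(keyword_sets, alpha):
    """Fill matrix M by testing every ordered pair of distinct documents."""
    n = len(keyword_sets)
    M = [[0] * n for _ in range(n)]
    for a, b in permutations(range(n), 2):  # ordered pairs with a != b
        k_a, k_b = keyword_sets[a], keyword_sets[b]
        if k_a <= k_b or len(k_a - k_b) < alpha:
            M[a][b] = 1  # document b is regarded as including document a
    return M


# Hypothetical keyword sets forming a simple chain of inclusions.
sets = [{"graph"}, {"graph", "tree"}, {"graph", "tree", "arc"}]
M = relation_matrix(sets, alpha=1)
```

Two documents with identical keyword sets would receive a "1" in both directions under this rule, a case the source does not discuss.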

[0013] Fourth step: The matrix M obtained as a result of the third step is regarded as an adjacency matrix in graph theory, and a known hierarchization algorithm is applied to display the overall structure as a tree on the display.
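The source does not name the hierarchization algorithm. As one possible stand-in, the sketch below uses longest-path layering, which assumes the matrix describes an acyclic graph and raises an error otherwise. The function names and the choice of the first eligible including document as tree parent are assumptions.

```python
def layer_depths(M):
    """Longest-path layering: depth 0 for documents that no document includes."""
    n = len(M)
    depth = {}
    visiting = set()

    def visit(a):
        if a in depth:
            return depth[a]
        if a in visiting:
            raise ValueError("cycle in relation matrix; layering needs a DAG")
        visiting.add(a)
        including = [b for b in range(n) if M[a][b]]  # documents that include a
        depth[a] = (1 + max(visit(b) for b in including)) if including else 0
        visiting.discard(a)
        return depth[a]

    for a in range(n):
        visit(a)
    return depth


def tree_parents(M):
    """Pick, for each document, one including document exactly one layer above."""
    depth = layer_depths(M)
    parent = {}
    for a in range(len(M)):
        candidates = [b for b in range(len(M))
                      if M[a][b] and depth[b] == depth[a] - 1]
        parent[a] = candidates[0] if candidates else None  # None marks a root
    return parent


# Hypothetical matrix: document 1 includes 0, document 2 includes 0 and 1.
M = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
parents = tree_parents(M)
```

Choosing a parent one layer above drops arcs that skip layers, such as the arc from document 0 to document 2 here, so the displayed structure is a tree (or a forest if several documents have no including document).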

[0014] The second invention adjusts the sensitivity of the determination of whether or not an inclusion relation exists between documents by changing the set value α of the first invention.
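The effect of α on sensitivity can be illustrated by counting how many relations are detected for several values. The keyword sets and the function name `count_relations` are hypothetical; the condition is the one stated for the first invention.

```python
def count_relations(keyword_sets, alpha):
    """Count ordered document pairs judged to have an inclusion relation."""
    count = 0
    for i, k_a in enumerate(keyword_sets):
        for j, k_b in enumerate(keyword_sets):
            if i != j and (k_a <= k_b or len(k_a - k_b) < alpha):
                count += 1
    return count


# Hypothetical keyword sets for three documents.
sets = [{"graph", "tree"}, {"graph", "arc"}, {"graph", "tree", "arc", "color"}]
counts = {alpha: count_relations(sets, alpha) for alpha in (1, 2, 3)}
```

With these sample sets, α = 1 accepts only true subsets, while larger values tolerate more overflowing keywords and therefore report more relations.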

[0015] In the third invention, in the tree displayed in the first invention, when for example K_a is a subset of K_b for documents A and B, the arc from document A to document B is drawn at maximum thickness. When K_a is not a subset of K_b and there is an overflowing portion, that is, when the set {K_a − K_b} is not empty, the arc is drawn progressively thinner as the number of elements of {K_a − K_b} increases. In this way, the thickness of the arcs between documents shows the strength of the inclusion relation between them at a glance.
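A possible thickness mapping is sketched below. The source requires only that the arc be thickest for a true subset and thinner as the overflow grows; the linear rule, the function name `arc_width`, and the default widths are assumptions.

```python
def arc_width(overflow_count, max_width=5.0, min_width=1.0, step=1.0):
    """Widest arc when the overflow is 0, thinner as the overflow grows."""
    # Linear decrease clamped at min_width; both constants are illustrative.
    return max(min_width, max_width - step * overflow_count)


# Hypothetical keyword sets for documents A and B.
k_a = {"graph", "tree"}
k_b = {"graph"}
width = arc_width(len(k_a - k_b))  # one overflowing keyword: "tree"
```

Clamping at a minimum width keeps weak relations visible instead of letting the arc vanish.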

[0016] In the fourth invention, in the tree displayed in the first invention, when for example K_a is a subset of K_b for documents A and B, the arc from document A to document B is drawn in a bright or strong color. When K_a is not a subset of K_b and there is an overflowing portion, that is, when the set {K_a − K_b} is not empty, the arc color is made progressively darker or weaker as the number of elements of {K_a − K_b} increases. In this way, the color of the arcs between documents shows the strength of the inclusion relation between them at a glance.
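A color mapping in the same spirit might look like this. The choice of red, the '#rrggbb' string format, the brightness range, and the function name `arc_color` are all assumptions; the source asks only for a bright or strong color for a true subset and a darker or weaker one as the overflow grows.

```python
def arc_color(overflow_count, alpha):
    """Return an '#rrggbb' red that darkens as the overflow count grows."""
    # Scale the overflow against alpha, the largest value the test tolerates.
    span = max(alpha, 1)
    fraction = min(overflow_count, span) / span  # 0.0 = true subset
    red = round(255 - 175 * fraction)            # 255 (bright) down to 80 (dark)
    return f"#{red:02x}0000"


subset_color = arc_color(0, alpha=3)  # true subset: brightest red
limit_color = arc_color(3, alpha=3)   # overflow at the limit: darkest red
```

Scaling by α means the full brightness range is used whatever threshold is chosen.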

[0017]

[Embodiments] Part A of FIG. 1 is a block diagram showing an embodiment of the first invention. First, the document pair generation unit 1 selects the two documents A and B to be processed. Next, the sentence search unit 2 searches all sentences in documents A and B for keywords registered in a keyword collection prepared in advance. As a result, the set K_a of keywords included in document A and the set K_b of keywords included in document B are determined. Next, the set comparison unit 3 compares K_a with K_b and checks whether K_a is a subset of K_b or, if there is an overflowing portion (that is, if the set {K_a − K_b} is not empty), whether the number of elements of {K_a − K_b} is smaller than the set value α. Based on this result, the document inclusion relation determination unit 4 determines the inclusion relation between document A and document B and, if an inclusion relation exists, substitutes "1" into the corresponding entry of the matrix M prepared in advance. Next, the determination unit 5 determines whether processing has been completed for all document pairs. If it has not, control returns to the document pair generation unit 1 and the above processing is repeated for a new document pair. If processing has been completed for all document pairs, the graph operation unit 6 regards the matrix M as an adjacency matrix in graph theory and performs hierarchization by applying a known hierarchization algorithm. The tree display unit 7 then actually displays the overall structure as a tree on the display.

[0018] Parts A and B of FIG. 1 together form a block diagram showing an embodiment of the second invention. The set value α input unit 8 receives the value of the set value α and sends it to the set comparison unit 3. This operation adjusts the sensitivity of the determination of whether or not an inclusion relation exists between documents.

[0019] Parts A and C of FIG. 1 together form a block diagram showing an embodiment of the third invention. The arc thickness determination unit 9 receives the number of elements of the set {K_a − K_b} from the set comparison unit 3, determines the thickness of each arc, and supplies the thickness information to the tree display unit.

[0020] Parts A and D of FIG. 1 together form a block diagram showing an embodiment of the fourth invention. The arc color determination unit 10 receives the number of elements of the set {K_a − K_b} from the set comparison unit 3, determines the color of each arc, and supplies the color information to the tree display unit.

[0021]

[Effects of the Invention] Whereas conventional methods required a person to judge manually whether or not a document pair is related, the method of the present invention makes this judgment automatically, which has the effect of significantly reducing man-hours.

[Brief Description of the Drawings]

[FIG. 1] A block diagram showing an embodiment of the first, second, third, and fourth inventions.

[Explanation of Symbols]

1 Document pair generation unit
2 Sentence search unit
3 Set comparison unit
4 Document inclusion relation determination unit
5 Determination unit
6 Graph operation unit
7 Tree display unit
8 Set value α input unit
9 Arc thickness determination unit
10 Arc color determination unit

Claims

[Claims]

[Claim 1] An inter-document structure display method for automatically extracting the relations between a plurality of documents and displaying the structure on a display, characterized by comprising: a first step of taking, for any two documents A and B, the set of keywords included in document A as K_a and the set of keywords included in document B as K_b, from among the keywords registered in a keyword collection prepared in advance; a second step of regarding document B as including document A, and defining a relation in the direction from document A to document B, when K_a is a subset of K_b or when the number of elements of the overflowing portion, that is, the set {K_a − K_b}, is smaller than a set value α; a third step of obtaining the relations between all documents by performing the above for all document pairs; and a fourth step of applying a known hierarchization algorithm and displaying the overall structure as a tree on the display.

[Claim 2] The inter-document structure display method according to claim 1, characterized by further comprising a fifth step of changing the set value α.

[Claim 3] The inter-document structure display method according to claim 1, characterized by further comprising a fifth step of expressing the strength of the inclusion relation, when the tree is displayed on the display, by changing the thickness of the arc between documents A and B according to the number of elements of the set {K_a − K_b}.

[Claim 4] The inter-document structure display method according to claim 1, characterized by further comprising a fifth step of expressing the strength of the inclusion relation, when the tree is displayed on the display, by changing the color of the arc between documents A and B according to the number of elements of the set {K_a − K_b}.