JPH0743728B2 - Summary sentence generation method - Google Patents

Summary sentence generation method

Info

Publication number
JPH0743728B2
Authority
JP
Japan
Prior art keywords
sentence
sentences
tree structure
connection relation
categorical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP2203865A
Other languages
Japanese (ja)
Other versions
JPH0490055A (en)
Inventor
顕司 小野
聡 木下
輝彦 浮田
Original Assignee
工業技術院長
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 工業技術院長
Priority to JP2203865A
Publication of JPH0490055A
Publication of JPH0743728B2
Anticipated expiration
Expired - Lifetime

Landscapes

  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Description

DETAILED DESCRIPTION OF THE INVENTION [Object of the Invention] (Industrial Field of Application) The present invention relates to a summary sentence generation method capable of effectively generating a high-quality summary of a text written in natural language.

(Prior Art) Conventionally, a summary of a natural-language text has generally been generated by judging certain sentences to be of high importance, extracting them, and joining them together. The sentences chosen are those that contain many words occurring frequently throughout the text, or those that contain specific words or expressions such as "The important thing is ...".

However, because such extraction criteria look only at word frequency or at specific words, several sentences with essentially the same meaning tend to be extracted together, and the resulting summary is often redundant. Moreover, because the criteria do not take the flow of the argument through the whole text into account, it is difficult to determine the connection relations between the extracted sentences, and therefore difficult to generate an appropriate summary. Furthermore, a summary produced by joining the extracted sentences often lacks a necessary conjunction or contains an inappropriate one, so its meaning tends to be ambiguous.

Meanwhile, various attempts have recently been made to analyze the context structure of an entire text and to examine the flow of its argument and the relative relations between its individual sentences. This analysis is introduced in, for example, Information Processing Society of Japan SIG Technical Report Vol. 89, No. 6, 89-NL-70-2, pp. 1-8 (1989.1.20) and Information Processing Society of Japan Symposium Proceedings Vol. 89, No. 5, pp. 125-136 (1989.11). It basically takes rhetorical expressions appearing in the text, such as conjunctions, as clues and, from those expressions, finds a preferable sequence of connection relations between sentences by consulting sequence rules that hold between connection relations.

However, even when the context structure of the entire text is obtained by such analysis, it generally shows only the flow of the argument and the connection relations between the sentences of the text. No method of generating a summary from this information has been made clear.

(Problems to be Solved by the Invention) As described above, conventional summary generation tends to extract several sentences with the same meaning, and it is difficult to determine the connection relations between the extracted sentences. Consequently, when the extracted sentences are joined into a summary, the summary is often redundant or contains inappropriate conjunctions.

The present invention has been made in view of these circumstances. Its object is to provide a summary sentence generation method that extracts the important sentences of a text while clearly determining the connection relations between them, and thereby effectively generates a high-quality summary.

[Constitution of the Invention] (Means for Solving the Problems) The summary sentence generation method according to the present invention comprises: means for analyzing a natural-language text to obtain the argument structure of the entire text; means for representing that argument structure as a tree whose units are sentences, using co-categorical markers that express the connection relations between sentences; means for extracting from the tree, on the basis of pre-stored selection rules corresponding to the co-categorical markers, minimum unit structures, each a subtree consisting of two sentences, classifying each extracted minimum unit structure according to the co-categorical marker it contains, and recursively repeating a rejection operation that deletes the former sentence, the latter sentence, or both, depending on the classification; means for taking out the sentences that remain in the tree after the rejection operation as important sentences; and means for generating connective expressions between the important sentences from the co-categorical markers that remain in the tree after the rejection operation, thereby obtaining a summary.

(Operation) According to the present invention, analysis yields a context structure that shows the flow of the argument through the entire text, and the sentences of the text are represented, in accordance with this context structure, as a tree using co-categorical markers that express the connection relations between sentences. Minimum unit structures, each a subtree consisting of two sentences, are extracted from the tree on the basis of the selection rules corresponding to the co-categorical markers. Each is classified according to the marker it contains, and the rejection operation that deletes the former sentence, the latter sentence, or both, depending on the classification, is repeated recursively. The sentences remaining in the tree after the rejection operation can therefore be taken out as important sentences, and the markers remaining in the tree can be used to generate connective expressions between them, yielding a summary.

(Embodiment) A summary sentence generation method according to one embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a schematic block diagram of a document processing apparatus to which the method of the embodiment is applied. Reference numeral 1 denotes a text input unit that inputs a natural-language text (text data). A context structure analysis unit 2 analyzes the text received from the text input unit 1, for example by the technique described above. It takes rhetorical expressions such as conjunctions appearing in the text as clues, finds from them a preferable sequence of connection relations between sentences by consulting the sequence rules between connection relations, and obtains the context structure that shows the argument structure of the text.

A tree structure generation unit 3 represents the sentences of the input text as a tree, with the sentence as the unit and with co-categorical markers expressing the connection relations between sentences, in accordance with the context structure obtained by the above analysis.

A connection relation determination unit 4 consults the selection rules, specific to each connection relation between sentences, that are registered in advance in a rule unit 5. Following the tree generated by the tree structure generation unit 3, it determines from the co-categorical marker showing the connection relation between sentences whether one or both of the sentences may be rejected. When a sentence to be rejected is found, the information about that sentence is removed from the tree. This rejection of unnecessary sentences by the connection relation determination unit 4 is repeated recursively while the tree is modified. When the rejection operation has been repeated a predetermined number of times, or when the number of remaining sentences has fallen to a predetermined number or fewer, the rejection of unnecessary sentences stops, and the remaining sentences are joined according to the tree to generate and output a summary of the input text.

The flow of this summary generation processing is described in more detail below.

When, for example, a text such as that shown in FIG. 2 is input from the text input unit 1, the context structure analysis unit 2 detects, for example, periods as sentence boundaries and examines the connection relations between sentences, taking rhetorical expressions in the text, such as conjunctions and specific phrasings, as clues. Specifically, it consults a dictionary organized as shown in FIG. 3 and, for each conjunction or specific phrasing found in the text, obtains the "connection type" and the "connection relation name" of the relation between the sentences joined by that word or phrasing, and expresses the relation with a co-categorical marker symbol. In doing so, it examines over the entire text which sequence of connection relations is preferable, consulting the sequence rules between connection relations, and determines the context structure that shows the argument structure of the input text.

As shown in FIG. 3, the connection relations obtained in this way comprise, for example, "duplication", "supplement", "reason", and so on for the "statement type"; "forward connection", "reverse connection", "apposition", and so on for the "serial type"; "parallel", "contrast", "topic shift", and so on for the "parallel type"; and, as other connection relations, "plan", "reference", and so on.
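The classification just listed can be pictured as a small lookup table. The following is a minimal sketch only: the relation names and the three named types come from the text, while the English keys, the "other" bucket label, and the helper `connection_type` are assumptions made for illustration. The lists are partial, as they are in the text.

```python
# Connection relation names grouped by connection type, as listed in the text
# for FIG. 3. The lists are partial ("and so on" in the source); the English
# keys and the "other" label are an assumed rendering.
CONNECTION_TYPES = {
    "statement": ["duplication", "supplement", "reason"],
    "serial": ["forward connection", "reverse connection", "apposition"],
    "parallel": ["parallel", "contrast", "topic shift"],
    "other": ["plan", "reference"],
}

# Invert the table so that a relation name can be mapped to its type.
TYPE_OF_RELATION = {
    name: ctype for ctype, names in CONNECTION_TYPES.items() for name in names
}


def connection_type(relation_name):
    """Return the connection type of a relation name, or None if unlisted."""
    return TYPE_OF_RELATION.get(relation_name)


print(connection_type("reverse connection"))  # serial
print(connection_type("contrast"))            # parallel
```

A real dictionary would be keyed on the conjunctions and phrasings found in the text; those entries are not reproduced in this document, so they are left out here.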

In this way the context structure analysis unit 2 analyzes the input text shown in FIG. 2 and finds that it consists of eight sentences, 1 through 8, and that these sentences are connected as

((((1→2)−(((3→4)ap5)×6))→7)※8)

Of the co-categorical marker symbols used here, [→] denotes "forward connection", [−] "contrast", [×] "reverse connection", [※] "reference", and [ap] "continuation".
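One way to hold such a bracketed structure in a program is as nested tuples. The sketch below is illustrative only: the encoding `(left, marker, right)` and the helper names `leaves` and `to_text` are assumptions, not something the patent specifies.

```python
# A node is either an int (a sentence number) or a tuple (left, marker, right).
# This nested-tuple encoding is an assumption made for illustration.
TREE = ((((1, "→", 2), "−", (((3, "→", 4), "ap", 5), "×", 6)), "→", 7), "※", 8)


def leaves(node):
    """Return the sentence numbers of a (sub)tree in left-to-right order."""
    if isinstance(node, int):
        return [node]
    left, _marker, right = node
    return leaves(left) + leaves(right)


def to_text(node):
    """Render a (sub)tree back into the bracketed notation used in the text."""
    if isinstance(node, int):
        return str(node)
    left, marker, right = node
    return "(" + to_text(left) + marker + to_text(right) + ")"


print(leaves(TREE))   # [1, 2, 3, 4, 5, 6, 7, 8]
print(to_text(TREE))  # the bracketed structure given in the text
```

Each tuple corresponds to one subtree of the kind drawn in FIG. 5, with the marker symbol playing the role of the node label.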

The tree structure generation unit 3 generates a tree, for example as shown in FIG. 5(a), in accordance with the context structure obtained above, which shows the connection relations of the input text in units of sentences. For each subtree that represents a connection relation between sentences, it attaches the corresponding co-categorical marker symbol to the node, and thereby represents the argument structure of the input text, that is, its context structure. On the basis of the context structure thus represented as a tree with co-categorical marker symbols, summary generation is carried out under the control of the connection relation determination unit 4, for example as shown in FIG. 4.

The procedure begins by initializing an upper limit I on the number of sentences in the summary to be generated (step a); I specifies the maximum number of sentences the summary may contain. Then, as preprocessing, substructures (subtrees) bearing "reference type" or "plan type" co-categorical marker symbols are judged to be redundant for the summary and are removed from the context structure of the input text (step b). Specifically, in the tree of FIG. 5(a), the substructure bearing the "reference type" marker symbol [※] is a single sentence, so that sentence is deleted from the input text and the tree is modified as shown in FIG. 5(b).
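The preprocessing of step b can be sketched on the bracketed example structure given earlier. This is a hedged illustration: the tuple encoding, the name `strip_redundant`, and the choice to drop the right-hand operand of a [※] node are assumptions (the last one matches the example, where [※] attaches the final sentence). The symbol for the plan type is not given in the text, so it is not included.

```python
# Nodes are ints (sentence numbers) or tuples (left, marker, right).
# Assumption: for a reference-type node the right-hand operand is the
# redundant part, as in the example where [※] attaches the last sentence.
REDUNDANT_MARKERS = {"※"}  # the plan-type symbol is not given in the text


def strip_redundant(node):
    """Step b: replace every redundant-marker node by its left operand."""
    if isinstance(node, int):
        return node
    left, marker, right = node
    left = strip_redundant(left)
    if marker in REDUNDANT_MARKERS:
        return left
    return (left, marker, strip_redundant(right))


tree_a = ((((1, "→", 2), "−", (((3, "→", 4), "ap", 5), "×", 6)), "→", 7), "※", 8)
tree_b = strip_redundant(tree_a)
print(tree_b)  # the same structure without the [※] node and its sentence
```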

After this step, the number of sentences remaining in the tree, that is, the number of sentences in the context structure of FIG. 5(b) that is the object of subsequent processing, is set as the initial value of a control value J (step c). It is then determined whether J has come within the upper limit I, the number of sentences allowed in the summary (step d). If the number of sentences is found to be I or fewer at this point, the sentences are joined in accordance with the tree as modified by the deletions (the reduced context structure) and output as the summary (step e).

In general, however, deleting only the substructures (subtrees) bearing "reference type" and "plan type" marker symbols cannot bring the number of sentences down to the upper limit I or below.

In that case, attention is directed to the minimum unit parts of the tree representing the context structure of the entire text, that is, parts in which two sentences are directly structured in the form

[sentence connection-relation sentence]

and these minimum unit parts are reduced. The reduction follows the selection rules, specific to each connection relation between sentences, that are stored in advance in the rule unit 5. When the k-th minimum unit structure (k = 1, ..., L, where L is the total number of unit structures) joins sentence Nk and sentence Mk by connection relation Rk and is written [Nk, Rk, Mk], the rules are, for example:

(1) When the connection relation Rk is of the serial type, the whole unit is replaced by sentence Mk: [Nk, Rk, Mk] → [Mk]
(2) When the connection relation Rk is of the statement type, the whole unit is replaced by sentence Nk: [Nk, Rk, Mk] → [Nk]
(3) When the connection relation Rk is of the parallel type, the whole unit is deleted: [Nk, Rk, Mk] → [deleted]

This reduction is executed recursively and repeatedly, starting from the minimum unit structures of the tree representing the context structure.
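Rules (1) to (3) amount to a three-way choice on the connection type of a single unit. A minimal sketch follows; the function name `reduce_unit` and the use of `None` to stand for "the whole unit is deleted" are assumptions made for illustration.

```python
def reduce_unit(n_k, r_type, m_k):
    """Apply rules (1)-(3) to one minimum unit [Nk, Rk, Mk].

    Returns the surviving sentence, or None when the whole unit is deleted.
    """
    if r_type == "serial":      # rule (1): keep the latter sentence Mk
        return m_k
    if r_type == "statement":   # rule (2): keep the former sentence Nk
        return n_k
    if r_type == "parallel":    # rule (3): delete the whole unit
        return None
    raise ValueError(f"no selection rule for connection type {r_type!r}")


print(reduce_unit(1, "serial", 2))     # 2
print(reduce_unit(3, "statement", 4))  # 3
print(reduce_unit(5, "parallel", 6))   # None
```

Raising on an unknown type, rather than defaulting to the parallel rule as the flowchart description does, is a design choice of this sketch so that unlisted relations are not silently deleted.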

That is, the reduction first extracts all minimum unit structures [Nk, Rk, Mk] from the tree representing the context structure (step f) and sets a control parameter K to their total number L (step g). For the minimum unit structure designated by K, it is determined whether the connection relation Rk is of the "statement type" or the "serial type" (steps h, i). If Rk is of the "statement type", the minimum unit structure [Nk, Rk, Mk] is replaced by sentence Nk (step j). If Rk is of the "serial type", it is replaced by sentence Mk (step k). If Rk is neither, then, as is clear from the connection relations shown in FIG. 3, it must be of the remaining "parallel type", so the entire minimum unit structure [Nk, Rk, Mk] is deleted (step m).

This processing is repeated while the control parameter K is decremented (step n) until K reaches zero, that is, until every minimum unit structure [Nk, Rk, Mk] extracted above has been processed (step o).

Through this reduction, two sentences are deleted from the context structure of FIG. 5(b) under the rule for "serial type" connection relations, and the tree is reduced as shown in FIG. 5(c). The input text of FIG. 2 is accordingly condensed as shown in FIG. 6(a).

When this reduction pass is finished, the number of sentences at that point is counted and set as the new control value J (step p). It is then determined whether the number of sentences J is the upper limit I or fewer (step q). If the number of sentences has not yet come down to the allowed number I, the reduction described above is executed again, recursively.
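The pass over all minimum units (steps f to o) and the outer loop on the sentence count (steps c, d, p, q) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the tuple encoding, the names `sentences`, `reduce_pass` and `reduce_to_limit`, the treatment of [ap] as parallel-type, and the rule that a node losing one operand collapses to the other are all assumptions. A recursive walk stands in for the counter K of the flowchart.

```python
# Marker symbol -> connection type. The first three follow the text; treating
# "ap" (continuation) as parallel-type is an assumption of this sketch.
MARKER_TYPE = {"→": "serial", "×": "serial", "−": "parallel", "ap": "parallel"}


def sentences(node):
    """Sentence numbers remaining in the tree, left to right."""
    if node is None:
        return []
    if isinstance(node, int):
        return [node]
    return sentences(node[0]) + sentences(node[2])


def reduce_pass(node):
    """Steps f-o: reduce every minimum unit present at the start of the pass."""
    if node is None or isinstance(node, int):
        return node
    left, marker, right = node
    if isinstance(left, int) and isinstance(right, int):
        kind = MARKER_TYPE[marker]
        if kind == "serial":
            return right      # rule (1): keep Mk
        if kind == "statement":
            return left       # rule (2): keep Nk
        return None           # rule (3): delete the unit
    left, right = reduce_pass(left), reduce_pass(right)
    if left is None:          # assumed collapse: a node that loses one
        return right          # operand is replaced by the other
    if right is None:
        return left
    return (left, marker, right)


def reduce_to_limit(tree, limit):
    """Steps c, d, p, q: repeat passes until at most `limit` sentences remain."""
    while len(sentences(tree)) > limit:
        reduced = reduce_pass(tree)
        if reduced == tree:   # nothing more can be removed
            break
        tree = reduced
    return tree


# The example structure after step b (the [※] part already removed).
TREE_B = (((1, "→", 2), "−", (((3, "→", 4), "ap", 5), "×", 6)), "→", 7)
print(sentences(reduce_to_limit(TREE_B, 3)))  # [2, 6, 7]
print(sentences(reduce_to_limit(TREE_B, 1)))  # [7]
```

The intermediate trees are what this sketch produces under its assumptions; they are not checked against FIG. 5, which is not reproduced in this document.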

For example, when the upper limit I is set to two sentences, the reduction is executed again. From the minimum unit structures [Nk, Rk, Mk] of the tree shown in FIG. 5(c), two sentences in a "parallel type" connection relation are found. Deleting both yields the tree shown in FIG. 5(d), and the text is condensed as shown in FIG. 6(b).

When the upper limit I is set to one sentence, the reduction is started once more on the result shown in FIG. 5(d). From the minimum unit structures [Nk, Rk, Mk] of the tree shown in FIG. 5(d), two sentences in a "parallel type" connection relation are found. Deleting both modifies the tree as shown in FIG. 5(e), and the text is condensed as shown in FIG. 6(c).

When sentences numbering no more than the upper limit I have thus been extracted from the input text, that is, when the reduction of the context structure represented by the tree has left I or fewer sentences, these sentences are joined according to the connection relations between them and output as the summary (step e). In generating the summary, the sentences belonging to the reduced context structure (tree) are, for example, arranged from left to right, earliest first, and a connective expression corresponding to each co-categorical marker symbol in the tree is inserted between them.
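The joining step can be sketched as an in-order walk of the reduced tree that emits each sentence and, between sentences, a connective chosen from the marker symbol. Everything specific below is an assumption made for illustration: the tuple encoding, the English connectives in `CONNECTIVE`, the function names, and the three sample sentences, which are invented placeholders rather than the text of FIG. 2.

```python
# Assumed mapping from marker symbols to English connective expressions.
CONNECTIVE = {"→": "Therefore,", "×": "However,", "−": "In contrast,"}


def linearize(node):
    """Return sentences and markers interleaved in left-to-right order."""
    if isinstance(node, int):
        return [node]
    left, marker, right = node
    return linearize(left) + [marker] + linearize(right)


def render(tree, sentence_text):
    """Join the remaining sentences, inserting a connective for each marker."""
    out = []
    after_connective = False
    for item in linearize(tree):
        if isinstance(item, int):
            text = sentence_text[item]
            if after_connective:  # lower-case the start after a connective
                text = text[0].lower() + text[1:]
            out.append(text)
            after_connective = False
        else:
            out.append(CONNECTIVE[item])
            after_connective = True
    return " ".join(out)


# Invented placeholder sentences, keyed by sentence number.
SAMPLE = {
    1: "The old method counts words.",
    2: "It ignores the argument.",
    3: "The summary is redundant.",
}
print(render(((1, "×", 2), "→", 3), SAMPLE))
# The old method counts words. However, it ignores the argument.
# Therefore, the summary is redundant.  (printed as a single line)
```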

Thus, in this embodiment the context structure is reduced while the connective expressions that the summary should contain are preserved as co-categorical marker symbols in the tree, and sentences judged unnecessary from the connection relation in each subtree that is a minimum unit structure are deleted as the reduction proceeds. As a result, only the sentences of high importance are effectively extracted for the summary. Moreover, even when the number of sentences in the summary changes, the relations between them are preserved as the context structure, so a high-quality summary with appropriate connective expressions can always be produced.

When the input text contains several sentences with the same meaning, they are eliminated as follows. Suppose that analysis of an input text yields the context structure

(((1→2)=(3←4))→5)

and the tree shown in FIG. 7. Reducing it gives the structure

((2=3)→5)

Here the co-categorical marker symbol [=] joining sentences 2 and 3 denotes "apposition", so two sentences with the same meaning would be included side by side in the summary. If, however, the upper limit I is two sentences, this reduced context structure is reduced again, and one of the two sentences is deleted under the "serial type" rule that applies to the connection relation between them. Thus, when there is room in the length of the summary, it does no harm for sentences of the same meaning to remain side by side. When there is not, resetting the upper limit I effectively deletes one of the duplicated sentences. Repeating the reduction recursively in this way makes it possible to generate, simply and very effectively, a high-quality summary with appropriate wording.
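The two reduction passes in this duplicate-sentence example can be traced with a short sketch. The assumptions are labeled in the code: the tuple encoding and function names are illustrative, [=] is taken as serial-type because "apposition" is listed under the serial type, and [←] is taken as statement-type because that is the reading under which the first pass yields the structure the text gives.

```python
# Marker symbol -> connection type. "=" follows the FIG. 3 listing in the
# text; treating "←" as statement-type is an assumption of this sketch.
MARKER_TYPE = {"→": "serial", "=": "serial", "←": "statement"}


def reduce_pass(node):
    """Reduce every minimum unit (two-sentence subtree) present in the tree."""
    if isinstance(node, int):
        return node
    left, marker, right = node
    if isinstance(left, int) and isinstance(right, int):
        kind = MARKER_TYPE[marker]
        if kind == "serial":
            return right   # rule (1): keep the latter sentence
        if kind == "statement":
            return left    # rule (2): keep the former sentence
        raise ValueError(f"unhandled connection type {kind!r}")
    return (reduce_pass(left), marker, reduce_pass(right))


def to_text(node):
    """Render a tree in the bracketed notation used in the text."""
    if isinstance(node, int):
        return str(node)
    left, marker, right = node
    return "(" + to_text(left) + marker + to_text(right) + ")"


tree = (((1, "→", 2), "=", (3, "←", 4)), "→", 5)
first = reduce_pass(tree)
second = reduce_pass(first)
print(to_text(first))   # ((2=3)→5)
print(to_text(second))  # (3→5)
```

The second pass is the one triggered when the upper limit I is two sentences; under the serial-type rule it keeps the latter of the two apposed sentences.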

The present invention is not limited to the embodiment described above. In the embodiment, the reduction of the context structure is repeated subject to an upper limit I on the number of sentences in the summary, but the number of repetitions itself may be limited instead. The rules for deleting sentences according to the connection relation may also be specified in finer detail, according to the connection relation name. The invention can be modified in various other ways without departing from its gist.

[Effects of the Invention] As described above, according to the present invention, minimum unit structures, each a subtree consisting of two sentences, are extracted from a tree that represents the argument structure of a natural-language text with the sentence as the unit. Each extracted minimum unit structure is classified according to the co-categorical marker it contains, and the rejection operation that deletes the former sentence, the latter sentence, or both, depending on the classification, is repeated recursively. The sentences remaining in the tree after the rejection operation are taken out as important sentences, so the co-categorical markers remaining in the tree can be used to generate connective expressions between the important sentences and obtain a summary. The invention thus provides substantial practical benefits, such as the ability to adjust the length of the summary effectively.

[Brief Description of the Drawings]

The drawings show a summary sentence generation method according to one embodiment of the present invention. FIG. 1 is a schematic block diagram of a natural language processing apparatus to which the method of the embodiment is applied; FIG. 2 shows an example of an input text; FIG. 3 shows the connective expressions used in context analysis, classified with their co-categorical marker symbols and properties; FIG. 4 shows the algorithm of summary generation in the embodiment; FIG. 5 shows the tree representation of a context structure using co-categorical marker symbols and how the tree changes under reduction; FIG. 6 shows an example of a text being condensed by reduction; and FIG. 7 illustrates the processing for avoiding duplicated sentences of the same meaning.
1: text input unit; 2: context structure analysis unit; 3: tree structure generation unit; 4: connection relation determination unit; 5: rule unit; b: deletion of sentences according to connection relation; e: summary generation according to the reduced context structure; f: extraction of unit structures from the tree; j, k, m: deletion of sentences according to connection relation (reduction of the context structure).

Claims (1)

[Claims]
1. A summary sentence generation method comprising: means for analyzing a natural-language text to obtain the logical structure of the entire text; means for representing the logical structure of the text as a tree structure in units of sentences, using co-categorical markers that express connection relations between a plurality of sentences; means for extracting from the tree structure, based on selection rules stored in advance in correspondence with the co-categorical markers, minimum unit structures each being a partial tree structure composed of two sentences, classifying each extracted minimum unit structure according to the co-categorical marker it contains, and recursively repeating a rejection operation that deletes the preceding sentence, the following sentence, or both of the two sentences according to the classification; means for extracting the sentences remaining in the tree structure after the rejection operation as important sentences; and means for generating connection expressions between the important sentences using the co-categorical markers remaining in the tree structure after the rejection operation, thereby obtaining a summary.
JP2203865A 1990-08-02 1990-08-02 Summary sentence generation method Expired - Lifetime JPH0743728B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2203865A JPH0743728B2 (en) 1990-08-02 1990-08-02 Summary sentence generation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2203865A JPH0743728B2 (en) 1990-08-02 1990-08-02 Summary sentence generation method

Publications (2)

Publication Number Publication Date
JPH0490055A JPH0490055A (en) 1992-03-24
JPH0743728B2 true JPH0743728B2 (en) 1995-05-15

Family

ID=16480986

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2203865A Expired - Lifetime JPH0743728B2 (en) 1990-08-02 1990-08-02 Summary sentence generation method

Country Status (1)

Country Link
JP (1) JPH0743728B2 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0612447A (en) * 1992-03-31 1994-01-21 Toshiba Corp Summary sentence preparing device
JP3001047B2 (en) 1997-04-17 2000-01-17 日本電気株式会社 Document summarization device
JP2004524559A (en) 2001-01-23 2004-08-12 エデュケーショナル テスティング サービス Automatic paper analysis method
US7127208B2 (en) 2002-01-23 2006-10-24 Educational Testing Service Automated annotation
WO2007113903A1 (en) * 2006-04-04 2007-10-11 Fujitsu Limited Summary creation program, summary creation device, summary creation method, and computer-readable recording medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Tetsuro Ito (伊藤哲郎), "Information Retrieval" (情報検索), Shokodo (昭晃堂), August 10, 1986 (Showa 61), pp. 26-54

Also Published As

Publication number Publication date
JPH0490055A (en) 1992-03-24

Similar Documents

Publication Publication Date Title
US6338034B2 (en) Method, apparatus, and computer program product for generating a summary of a document based on common expressions appearing in the document
RU2571373C2 (en) Method of analysing text data tonality
JP2726568B2 (en) Character recognition method and device
US5895446A (en) Pattern-based translation method and system
US6014680A (en) Method and apparatus for generating structured document
US5748973A (en) Advanced integrated requirements engineering system for CE-based requirements assessment
US7398196B1 (en) Method and apparatus for summarizing multiple documents using a subsumption model
JP2010061176A (en) Text mining device, text mining method, and text mining program
US20050033566A1 (en) Natural language processing method
JPH0743728B2 (en) Summary sentence generation method
JP2021179781A (en) Sentence extraction device and sentence extraction method
JP2000040085A (en) Method and device for post-processing for japanese morpheme analytic processing
JP3139658B2 (en) Document display method
JP3518998B2 (en) Method and apparatus for creating semantic attribute dictionary and recording medium recording semantic attribute dictionary creating program
JP7227705B2 (en) Natural language processing device, search device, natural language processing method, search method and program
JP2005025555A (en) Thesaurus construction system, thesaurus construction method, program for executing the method, and storage medium with the program stored thereon
JP2004318809A (en) Information extraction rule generating apparatus and method
JPH04167049A (en) Document processor
JP2000250913A (en) Example type natural language translation method, production method and device for list of bilingual examples and recording medium recording program of the production method and device
JPH09179868A (en) Translation correspondence support system
JPH07230468A (en) Method and device for automatically extracting keyword
JPH0474259A (en) Document summarizing device
Love Benchmarking the performance of Two Automated Term-extraction systems: LOGOS and ATAO
JP6476638B2 (en) Specific term candidate extraction device, specific term candidate extraction method, and specific term candidate extraction program
JPH1115826A (en) Document analyzer and its method

Legal Events

Date Code Title Description
EXPY Cancellation because of completion of term