JPH0490055A - Summarized sentence generating system - Google Patents

Summarized sentence generating system

Info

Publication number
JPH0490055A
Authority
JP
Japan
Prior art keywords
sentences
sentence
tree structure
connection
context
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2203865A
Other languages
Japanese (ja)
Other versions
JPH0743728B2 (en)
Inventor
Kenji Ono
顕司 小野
Satoshi Kinoshita
聡 木下
Teruhiko Ukita
浮田 輝彦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Institute of Advanced Industrial Science and Technology AIST
Original Assignee
Agency of Industrial Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1990-08-02
Filing date
1990-08-02
Publication date
1992-03-24
Application filed by Agency of Industrial Science and Technology
Priority to JP2203865A
Publication of JPH0490055A
Publication of JPH0743728B2
Anticipated expiration
Expired - Lifetime (current legal status)


Landscapes

  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

PURPOSE: To efficiently generate high-quality summaries with appropriate connective expressions. This is done by extracting only the important sentences of an input text, together with its context structure, according to the designated number of summary sentences.

CONSTITUTION: The context structure of the input text is represented as a tree. In this tree, sentences are joined by co-categorical markers that indicate the connection relations between them. A connection relation judging part 4 refers to selection rules registered in advance in a rule part 5. These rules are specific to each connection relation between sentences. Using this tree, part 4 judges whether one or both of the joined sentences should be rejected. If a sentence to be rejected is found, its information is removed from the tree structure. The highly important sentences that remain are joined according to the tree structure to obtain the summary. Thus, high-quality summaries in which the connection relations between sentences are clear are generated efficiently.

Description

DETAILED DESCRIPTION OF THE INVENTION [Object of the Invention] (Industrial Application Field) The present invention relates to a summary generation method that can efficiently generate a high-quality summary of a natural-language text.

(Prior Art) Conventionally, summaries of natural-language texts have generally been generated by extracting certain sentences as highly important. These are sentences that contain many words occurring frequently throughout the text, or sentences containing specific words or expressions such as "What is important is ...".

Because such methods simply extract these sentences, several sentences with the same meaning tend to be extracted redundantly. As a result, the generated summaries are often redundant. Moreover, the extraction criteria do not take into account the flow of the argument of the text as a whole. This makes it difficult to determine the connection relations between the extracted sentences, and therefore difficult to generate an appropriate summary. Furthermore, a summary produced by joining the extracted sentences often lacks necessary conjunctions or contains inappropriate ones, which tends to make its meaning ambiguous.

On the other hand, various attempts have recently been made to analyze the context structure of an entire text. These attempts examine the flow of the text's argument and the relative relations between the individual sentences that compose it. Such context structure analysis is described, for example, in:

- Information Processing Society of Japan (IPSJ) SIG Report, Vol. 89, 89-NL-70-2, pp. 1-8 (January 20, 1989)
- Proceedings of the IPSJ Symposium, Vol. 89, No. 5, pp. 125-136 (1989)

However, even when the context structure of the entire text is obtained by such analysis, it generally shows only two things: the flow of the argument, and the connection relations between the sentences that make up the text. No method has been disclosed for generating a summary from this information.

(Problems to be Solved by the Invention) As described above, conventional summary generation has two weaknesses. It tends to extract several sentences with the same meaning, and it makes it difficult to determine the connection relations between the extracted sentences. Consequently, when the extracted sentences are joined into a summary, the summary is often redundant or contains inappropriate conjunctions.

The present invention was made in view of these circumstances. Its object is to provide a summary generation method that extracts the important sentences of a text while clearly determining the connection relations between them, and thereby efficiently generates a high-quality summary.

[Structure of the Invention] (Means for Solving the Problems) The summary generation method according to the present invention first represents the argument structure of a natural-language text as a tree whose units are sentences. The tree uses co-categorical markers that indicate the connection relations between sentences.

The method then recursively repeats an operation that rejects one or both of the sentences joined in the tree. Each rejection is based on selection rules specific to the connection relation indicated by the co-categorical marker. In this way the method accurately extracts the important sentences suitable for a summary. It then efficiently produces a high-quality summary by joining the extracted sentences according to the tree structure.

(Function) According to the present invention, analysis yields a context structure indicating the flow of the argument of the entire text. Following this context structure, the sentences that make up the text are represented as a tree, using co-categorical markers that express the connection relations between them.

An operation that rejects one or both of the sentences joined in the tree is then performed recursively, based on selection rules specific to the connection relation indicated by each marker. This makes it possible to remove sentences that are redundant, or of low importance, with respect to the argument structure or the connection relations. The highly important sentences that were not rejected are then joined according to the tree structure. The result is a high-quality summary with clear connection relations between sentences, generated efficiently.

(Embodiment) A summary generation method according to an embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a schematic configuration diagram of a document processing device to which the method of the embodiment is applied. Reference numeral 1 denotes a text input unit through which a natural-language text (text data) is entered.

The context structure analysis unit 2 analyzes the text entered through the text input unit 1, for example by the techniques cited above. It takes rhetorical expressions, such as conjunctions appearing in the text, as clues. Given those expressions, and with reference to sequence rules between connection relations, it examines which sequence of connection relations between sentences is preferable. From this it determines the context structure that represents the argument structure of the text.

The tree structure generation unit 3 follows the context structure of the input text analyzed as described above. It represents the sentences that make up the input text as a tree whose units are sentences, using co-categorical markers that express the connection relations between them.

The connection relation determination unit 4 refers to selection rules registered in advance in the rule unit 5, each specific to one connection relation between sentences. Using these rules and the tree structure, it judges whether one or both of the sentences joined by a co-categorical marker may be rejected. If a sentence to be rejected is found, its information is removed from the tree structure.

Unit 4 repeats this rejection of unnecessary sentences recursively, modifying the tree structure as it goes. The rejection stops in either of two cases:

- the rejection operation has been repeated a predetermined number of times, or
- the number of sentences remaining falls to or below a predetermined number.

The remaining sentences are then joined according to the tree structure, and a summary of the input text is generated and output.
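The stopping behaviour described above can be sketched as a generic loop: repeat a rejection pass until either a preset number of passes has run or the number of remaining sentences is at or below a preset count. This is a minimal sketch under stated assumptions, not the patent's implementation:

- `repeat_rejection`, `reject_pass`, `count_sentences`, `max_sentences` and `max_passes` are hypothetical names.
- The early exit when a pass changes nothing is an added safeguard that the text does not describe.

```python
from typing import Callable, List, Optional, TypeVar

State = TypeVar("State")


def repeat_rejection(
    state: State,
    reject_pass: Callable[[State], State],
    count_sentences: Callable[[State], int],
    max_sentences: Optional[int] = None,
    max_passes: Optional[int] = None,
) -> State:
    """Repeat a rejection pass until a stop condition holds.

    Stops when the remaining sentence count is at or below max_sentences,
    or once max_passes passes have run, whichever happens first.
    """
    passes = 0
    while True:
        if max_sentences is not None and count_sentences(state) <= max_sentences:
            return state
        if max_passes is not None and passes >= max_passes:
            return state
        new_state = reject_pass(state)
        if new_state == state:  # added safeguard: nothing left to reject
            return state
        state = new_state
        passes += 1


if __name__ == "__main__":
    # Toy stand-in for a real rejection pass: drop the first remaining sentence.
    def drop_first(sentences: List[int]) -> List[int]:
        return sentences[1:]

    print(repeat_rejection([1, 2, 3, 4, 5], drop_first, len, max_sentences=2))  # [4, 5]
    print(repeat_rejection([1, 2, 3, 4, 5], drop_first, len, max_passes=1))  # [2, 3, 4, 5]
```

Passing the pass function and the counter as parameters keeps the two termination conditions separate from the tree-specific rejection rules.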

The flow of this summary generation process is described in more detail below.

For example, when a text such as that shown in FIG. 2 is entered through the text input unit 1, the context structure analysis unit 2 detects sentence boundaries, for example at full stops. It then examines the connection relations between the sentences, using rhetorical expressions such as conjunctions and specific phrasings in the text as clues. These relations are determined, for example, by looking up the rhetorical expressions that appear in the text in a dictionary organized as shown in FIG. 3.

At the same time, the analysis unit examines, over the whole text, which sequence of connection relations is preferable given those expressions, with reference to sequence rules between connection relations. From this it determines the context structure that represents the argument structure of the input text.

The connection relations between sentences obtained in this way are grouped as shown, for example, in FIG. 3:

- assertion-type relations: "repetition", "supplement", "reason", and so on;
- serial-type relations: "consequence", "adversative", "equivalence", and so on;
- parallel-type relations: "parallel", "contrast", "topic shift", and so on;
- other relations: "anticipation" and "reference".
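The grouping of connection relations listed above can be held in a simple lookup table. This is a minimal sketch, assuming English labels chosen for this translation ("repetition", "consequence", "anticipation" and so on are not identifiers from the source). It covers only the relations named in the text, not the full dictionary of FIG. 3. `RELATION_TYPES` and `relation_type` are hypothetical names.

```python
# Hypothetical English labels for the relations named in the text; FIG. 3 lists more.
RELATION_TYPES = {
    # assertion type
    "repetition": "assertion",
    "supplement": "assertion",
    "reason": "assertion",
    # serial type
    "consequence": "serial",
    "adversative": "serial",
    "equivalence": "serial",
    # parallel type
    "parallel": "parallel",
    "contrast": "parallel",
    "topic_shift": "parallel",
    # other
    "anticipation": "other",
    "reference": "other",
}


def relation_type(relation: str) -> str:
    """Return the type class (assertion/serial/parallel/other) of a relation."""
    try:
        return RELATION_TYPES[relation]
    except KeyError:
        raise ValueError(f"unknown connection relation: {relation!r}") from None
```

The selection rules described later depend only on this type class, not on the individual relation. That is why a flat lookup is enough here.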

The context structure analysis unit 2 analyzes an input text such as that shown in FIG. 2 in this way. It finds that the text consists of eight sentences, 1 to 8, connected by the relations

((((1 − 2) − (((3 → 4) ap 5) X 6)) − 7) * 8)

Of the co-categorical marker symbols shown here, [−] denotes "consequence", a second dash-like marker denotes "contrast", and [X] denotes "adversative" … Following this structure, a tree is generated as shown, for example, in FIG. 5(a). The tree represents the connection relations between the individual sentences, that is, the context structure of the text. According to this tree-structured context structure, the connection relation determination unit 4 carries out the summary generation process shown, for example, in FIG. 4.

The process begins by initializing an upper limit I (step a). I is the maximum number of sentences the summary may contain.

Then, as preprocessing, the partial structures (subtrees) in the tree that carry "reference"-type or "anticipation"-type co-categorical marker symbols are judged redundant for the summary. They are removed from the context structure of the input text (step b). Specifically, in the tree of FIG. 5(a), the partial structure carrying the reference-type marker symbol [※] is sentence 8. Sentence 8 is therefore deleted from the input text, and the tree is modified as shown in FIG. 5(b).
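Step b and the sentence count of the next step can be sketched as two short recursive functions. This is a minimal sketch under these assumptions:

- Trees are nested `(left, relation, right)` tuples with integer sentence numbers as leaves.
- The removed part is the right-hand constituent of a reference-type or anticipation-type node, as in the worked example where sentence 8 hangs off the reference marker.
- `prune_other_types` and `count_sentences` are hypothetical names.

```python
from typing import Tuple, Union

# A sentence number, or (left, relation, right).
Tree = Union[int, Tuple["Tree", str, "Tree"]]

# Relations whose attached constituent is removed in step b (hypothetical labels).
REMOVED_IN_PREPROCESSING = {"reference", "anticipation"}


def prune_other_types(tree: Tree) -> Tree:
    """Step b: drop constituents attached by reference/anticipation relations."""
    if isinstance(tree, int):
        return tree
    left, relation, right = tree
    if relation in REMOVED_IN_PREPROCESSING:
        # Assumption: the right-hand (attached) constituent is the redundant one.
        return prune_other_types(left)
    return (prune_other_types(left), relation, prune_other_types(right))


def count_sentences(tree: Tree) -> int:
    """Number of sentences (leaves) remaining, i.e. the value compared with I."""
    if isinstance(tree, int):
        return 1
    left, _, right = tree
    return count_sentences(left) + count_sentences(right)


if __name__ == "__main__":
    tree = (((1, "consequence", 2), "contrast", 3), "reference", 4)
    pruned = prune_other_types(tree)
    print(pruned, count_sentences(pruned))
```

Which side of a reference or anticipation link counts as redundant is a design choice. A different convention would only change the `return prune_other_types(left)` line.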

Next, the number of sentences remaining in the tree is counted (step c), and this count is held in the control value J. It is then determined whether J is at or below the number of sentences allowed in the summary, given by the upper limit I (step d).

If at this point the number of sentences is at or below I, the sentences are joined according to the tree modified by the deletions described above (the reduced context structure). The result is output as the summary (step e).

In general, however, deleting only the partial structures (subtrees) carrying reference-type and anticipation-type markers does not bring the number of sentences down to the upper limit I or below.

In such cases, attention turns to the smallest units of the tree representing the context structure of the whole text. These are the units in which two sentences are directly joined, in the form

[sentence, connection relation, sentence]

and it is these minimal units that are reduced. The reduction is based on selection rules, specific to each connection relation between sentences, that are stored in advance in the rule unit 5. Writing a minimal unit as [Nk, Rk, Mk], the reduction follows rules such as:

(1) When the connection relation Rk is of the serial type, the whole unit is replaced by sentence Mk:
    [Nk, Rk, Mk] → [Mk]
(2) When the connection relation Rk is of the assertion type, the whole unit is replaced by sentence Nk:
    [Nk, Rk, Mk] → [Nk]
(3) When the connection relation Rk is of the parallel type, the whole unit is deleted:
    [Nk, Rk, Mk] → [deleted]

This reduction is executed recursively, starting from the minimal-unit structures of the tree that represents the context structure.
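Rules (1) to (3), applied once to every minimal unit of the tree, can be sketched as follows. This is a minimal sketch under these assumptions:

- The tree uses nested `(left, type, right)` tuples whose middle element is already the relation type: "serial", "assertion" or "parallel".
- Reference-type and anticipation-type nodes have already been removed in step b.
- When a parallel-type unit is deleted, its sibling takes the parent's place. The text does not state this detail.
- `reduce_unit` and `reduce_pass` are hypothetical names.

```python
from typing import Optional, Tuple, Union

# A sentence number, or (left, relation_type, right).
Tree = Union[int, Tuple["Tree", str, "Tree"]]


def reduce_unit(n: int, relation_type: str, m: int) -> Optional[int]:
    """Apply rules (1)-(3) to a minimal unit [Nk, Rk, Mk]; None means deleted."""
    if relation_type == "serial":      # rule (1): keep the second sentence Mk
        return m
    if relation_type == "assertion":   # rule (2): keep the first sentence Nk
        return n
    if relation_type == "parallel":    # rule (3): delete the whole unit
        return None
    raise ValueError(f"unexpected relation type: {relation_type!r}")


def reduce_pass(tree: Tree) -> Optional[Tree]:
    """Reduce every minimal unit present in the tree exactly once."""
    if isinstance(tree, int):
        return tree
    left, relation_type, right = tree
    if isinstance(left, int) and isinstance(right, int):
        return reduce_unit(left, relation_type, right)
    new_left, new_right = reduce_pass(left), reduce_pass(right)
    # Assumption: if one side was deleted entirely, the other side replaces the node.
    if new_left is None:
        return new_right
    if new_right is None:
        return new_left
    return (new_left, relation_type, new_right)


if __name__ == "__main__":
    tree = ((1, "serial", 2), "parallel", ((3, "parallel", 4), "serial", 5))
    once = reduce_pass(tree)
    print(once, reduce_pass(once))
```

A node that only becomes minimal during a pass is left for the next pass. This mirrors the steps below, which extract all minimal units first and then process each of them once.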

Specifically, the reduction first extracts all minimal-unit structures [Nk, Rk, Mk] from the tree representing the context structure (step f). Their total number is set in the control parameter K (step g). For the minimal unit identified by K, the process then determines the type of the connection relation Rk:

- If Rk is of the assertion type, the unit [Nk, Rk, Mk] is replaced by sentence Nk (step j).
- If Rk is of the serial type, the unit is replaced by sentence Mk (step k).
- If Rk is neither, it must be of the remaining parallel type, as the connection relations in FIG. 3 show, so the whole minimal unit [Nk, Rk, Mk] is deleted (step m).

This processing is repeated while decrementing the control parameter K (step n) until K reaches zero (step o). At that point every minimal unit [Nk, Rk, Mk] extracted as described above has been processed.

Through this reduction, two sentences are deleted from the context structure of FIG. 5(b) under the rule for serial-type connection relations, and the tree is reduced as shown in FIG. 5(c).

The input text of FIG. 2 is thereby condensed as shown in FIG. 6(a).

The number of remaining sentences is then counted again, and it is determined whether it is at or below the upper limit I (step q). If it still exceeds the allowed number, the reduction described above is executed again.

Among the minimal units [Nk, Rk, Mk] of the tree shown in FIG. 5(c), two sentences in a parallel-type connection relation are found. Deleting both yields the tree shown in FIG. 5(d), and the text is condensed as shown in FIG. 6(b).

If the upper limit I is set to one sentence, the reduction is started once more on the result shown in FIG. 5(d).

Among the minimal units [Nk, Rk, Mk] of the tree shown in FIG. 5(d), two sentences in a parallel-type connection relation are found. Deleting both modifies the tree as shown in FIG. 5(e), and the text is condensed as shown in FIG. 6(c).

Eventually the reduction of the tree-structured context structure leaves at most I sentences of the input text. The summary is then generated from these remaining sentences. They are arranged in order, starting from the leftmost (earliest) one in the tree. Between each pair, the connective expression corresponding to the co-categorical marker symbol shown in the tree is inserted.
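Joining the remaining sentences can be sketched as an in-order walk of the reduced tree. The walk emits sentences from left to right and inserts the connective for each marker between them. This is a minimal sketch under these assumptions:

- `CONNECTIVES` is a hypothetical table; the actual connective expressions of FIG. 3 are not reproduced here.
- A marker with no table entry contributes no connective.
- `linearize` is a hypothetical name.

```python
from typing import Dict, List, Tuple, Union

# A sentence number, or (left, marker, right).
Tree = Union[int, Tuple["Tree", str, "Tree"]]

# Hypothetical marker-to-connective table; FIG. 3 defines the real one.
CONNECTIVES: Dict[str, str] = {
    "consequence": "Therefore,",
    "adversative": "However,",
    "contrast": "On the other hand,",
}


def linearize(tree: Tree, sentences: Dict[int, str]) -> str:
    """Join sentences left to right, inserting the connective for each marker."""
    parts: List[str] = []

    def walk(node: Tree) -> None:
        if isinstance(node, int):
            parts.append(sentences[node])
            return
        left, marker, right = node
        walk(left)
        connective = CONNECTIVES.get(marker)
        if connective:  # markers without an entry add no connective
            parts.append(connective)
        walk(right)

    walk(tree)
    return " ".join(parts)


if __name__ == "__main__":
    text = {2: "[S2]", 6: "[S6]", 7: "[S7]"}
    print(linearize(((2, "contrast", 6), "consequence", 7), text))
    # [S2] On the other hand, [S6] Therefore, [S7]
```

Because each marker survives in the tree until its unit is reduced, the connective placed between two remaining sentences always reflects the relation that actually joins them.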

In this embodiment, then, the context structure is reduced while the connective expressions to be included in the summary are preserved as co-categorical marker symbols in the tree. Within each minimal-unit subtree, the reduction deletes the sentences judged unnecessary from their connection relation.

As a result, only highly important sentences are extracted for the summary. The relations between the remaining sentences are preserved as a context structure even when the number of sentences in the summary changes. A high-quality summary with appropriate connective expressions can therefore always be produced.

If the input text contains several sentences with the same meaning, they are eliminated as follows. Suppose, for example, that analysis of an input text yields the context structure ((2 − 3) − 5). Here the co-categorical marker symbol [−] joining sentences 2 and 3 means "equivalence". Sentences 2 and 3, which have the same meaning, would then both be included side by side in the summary.

If, however, the upper limit I is two sentences, the reduction is applied once more to this reduced context structure. The connection relation between sentences 2 and 3 is of the serial type, so sentence 2 is deleted under the serial-type rule.

Thus, when there is room for a longer summary, keeping sentences with the same meaning side by side causes no problem. When there is not, one of the duplicated sentences can be deleted simply by resetting the upper limit I. Recursively repeating the reduction in this way makes it easy to generate high-quality summaries with appropriate expressions.

The present invention is not limited to the embodiment described above. In the embodiment, the upper limit on the number of sentences in the summary … may also be specified in more detail. The present invention can be implemented with various other modifications without departing from its gist.

[Effects of the Invention] As described above, according to the present invention, only the important sentences of an input text are extracted, together with their context structure, according to the specified number of summary sentences. A high-quality summary with appropriate connective expressions can therefore be generated efficiently, while the connection relations between the extracted sentences are clearly determined.

Moreover, the summarization is implemented as a repeated recursive process. The length of the summary can therefore be adjusted simply by limiting the number of repetitions, which is of great practical benefit.

[Brief Description of the Drawings]

The drawings illustrate a summary generation method according to an embodiment of the present invention:

- FIG. 1 is a schematic configuration diagram of a natural-language processing device to which the method of the embodiment is applied.
- FIG. 2 shows an example of an input text.
- FIG. 3 shows the connective expressions and co-categorical marker symbols used in context analysis. …

Reference signs:
1: text input unit
2: context structure analysis unit
3: tree structure generation unit
4: connection relation determination unit
5: rule unit
b: deletion of sentences according to connection relation
e: summary generation according to the reduced context structure
f: extraction of unit structures from the tree
j, k, m: deletion of sentences according to connection relation (reduction of the context structure)

Claims (2)

[Claims]
(1) A summary generation method comprising:
means for analyzing a natural-language text to obtain the argument structure of the whole text;
means for representing the argument structure of the text as a tree whose units are sentences, using co-categorical markers that express the connection relations between sentences;
means for recursively repeating an operation of rejecting one or both of the sentences joined in the tree, based on selection rules specific to the connection relation between sentences indicated by each co-categorical marker; and
means for extracting a summary of the text according to the tree on which the rejection operations have been performed.
(2) The summary generation method as claimed in claim 1, wherein the rejection operation on the tree is performed with a limited number of recursive repetitions, and the length of the summary is adjusted by this repetition limit.

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2203865A JPH0743728B2 (en) 1990-08-02 1990-08-02 Summary sentence generation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2203865A JPH0743728B2 (en) 1990-08-02 1990-08-02 Summary sentence generation method

Publications (2)

Publication Number Publication Date
JPH0490055A true JPH0490055A (en) 1992-03-24
JPH0743728B2 JPH0743728B2 (en) 1995-05-15

Family

ID=16480986

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2203865A Expired - Lifetime JPH0743728B2 (en) 1990-08-02 1990-08-02 Summary sentence generation method

Country Status (1)

Country Link
JP (1) JPH0743728B2 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0612447A (en) * 1992-03-31 1994-01-21 Toshiba Corp Summary sentence preparing device
US6338034B2 (en) 1997-04-17 2002-01-08 Nec Corporation Method, apparatus, and computer program product for generating a summary of a document based on common expressions appearing in the document
WO2007113903A1 (en) * 2006-04-04 2007-10-11 Fujitsu Limited Summary creation program, summary creation device, summary creation method, and computer-readable recording medium
US7796937B2 (en) 2002-01-23 2010-09-14 Educational Testing Service Automated annotation
US8452225B2 (en) 2001-01-23 2013-05-28 Educational Testing Service Methods for automated essay analysis

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0612447A (en) * 1992-03-31 1994-01-21 Toshiba Corp Summary sentence preparing device
US6338034B2 (en) 1997-04-17 2002-01-08 Nec Corporation Method, apparatus, and computer program product for generating a summary of a document based on common expressions appearing in the document
US8452225B2 (en) 2001-01-23 2013-05-28 Educational Testing Service Methods for automated essay analysis
US7796937B2 (en) 2002-01-23 2010-09-14 Educational Testing Service Automated annotation
US8626054B2 (en) 2002-01-23 2014-01-07 Educational Testing Service Automated annotation
WO2007113903A1 (en) * 2006-04-04 2007-10-11 Fujitsu Limited Summary creation program, summary creation device, summary creation method, and computer-readable recording medium

Also Published As

Publication number Publication date
JPH0743728B2 (en) 1995-05-15

Similar Documents

Publication Publication Date Title
Pal et al. An approach to automatic text summarization using WordNet
US4730270A (en) Interactive foreign language translating method and apparatus
US5708829A (en) Text indexing system
US5369577A (en) Text searching system
US5323316A (en) Morphological analyzer
DE69032712T2 (en) HIERARCHICAL PRE-SEARCH TYPE DOCUMENT SEARCH METHOD, DEVICE THEREFOR, AND A MAGNETIC DISK ARRANGEMENT FOR THIS DEVICE
CN102576358B (en) Word pair acquisition device, word pair acquisition method, and program
WO2003038664A2 (en) Machine translation
CN112148359B (en) Distributed code clone detection and search method, system and medium based on subblock filtering
JPH07244666A (en) Method and device for automatic natural language translation
CN109508448A (en) Short information method, medium, device are generated based on long article and calculate equipment
US20040122660A1 (en) Creating taxonomies and training data in multiple languages
Theeramunkong et al. Non-dictionary-based Thai word segmentation using decision trees
Pal et al. An approach to automatic text summarization using simplified lesk algorithm and wordnet
US11436278B2 (en) Database creation apparatus and search system
JPH0490055A (en) Summarized sentence generating system
CN110413779B (en) Word vector training method, system and medium for power industry
Bloem et al. Distributional semantics for neo-Latin
Al-Msie'Deen et al. Naming the identified feature implementation blocks from software source code
Markellos et al. Knowledge discovery in patent databases
JPS61278970A (en) Method for controlling display and calibration of analyzed result of sentence structure in natural language processor
Çilden Stemming Turkish words using snowball
JPH07225770A (en) Data retrieval device
Chartron Lexicon management tools for large textual databases: the Lexinet system
US11500867B2 (en) Identification of multiple foci for topic summaries in a question answering system

Legal Events

Date Code Title Description
EXPY Cancellation because of completion of term