JPH03156669A - Document processor

Info

Publication number: JPH03156669A
Application number: JP1295057A
Authority: JP
Inventors: Shinichi Kobayashi; 紳一小林
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1989-11-15
Filing date: 1989-11-15
Publication date: 1991-07-04

Abstract

PURPOSE: To spare a document creator the time and effort of entering punctuation marks during document input by providing punctuation mark addition position recognition rule data and a punctuation mark addition means that adds punctuation marks to a text on the basis of that rule data. CONSTITUTION: Punctuation mark addition position recognition rule data 8, which includes punctuation marks as elements of phrase structure rules, is provided together with a punctuation mark addition means 7 that adds punctuation marks to a text on the basis of the data 8. The input text data is divided into words, and the part of speech of each word is determined. A syntax tree is then built from the phrase structure rules, and the data 8 is applied to that syntax tree. Because punctuation marks are included in the data 8 in advance as elements, the positions at which to add punctuation marks are determined from the data 8. As a result, punctuation marks can be added at appropriate positions in the text, and they need not be entered when the text is created.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a document processing device, and in particular to a document processing device with a punctuation mark addition function suited to improving the readability of Japanese text.

[Conventional Technology]

Entering punctuation marks when creating a Japanese document is described in Canon Japanese Word Processor Canoword 4100 Operating Instructions, Basic Edition, 2-26 to 2-28 (1989).

In that method, punctuation marks are entered from an input device such as a keyboard in the same way as kana, as part of the reading used in kana-kanji conversion.

[Problem to Be Solved by the Invention]

With the conventional technology described above, the document creator must decide where to place each punctuation mark and must also enter every punctuation mark by hand.

An object of the present invention is to provide a document processing device that adds punctuation marks automatically, thereby sparing the document creator the effort of entering them during document input while also supporting the creation of easy-to-read documents.

[Means for Solving the Problem]

The above problem is solved by providing punctuation mark addition position recognition rule data that includes punctuation marks as elements of phrase structure rules, and a punctuation mark addition means that adds punctuation marks to text on the basis of that data.

[Operation]

In the document processing device, text data that has already been entered is divided into words, and the part of speech of each word is determined. A syntax tree is then built from the phrase structure rules, and the punctuation mark addition position recognition rule data is applied to this syntax tree.

Because the punctuation mark addition position recognition rule data contains punctuation marks as elements in advance, the positions at which punctuation marks are to be added are determined.

[Embodiment]

An embodiment of the present invention is described below with reference to FIGS. 1 to 6.

FIG. 1 shows a configuration in which the present invention is incorporated into a word processing function (hereinafter, "word processor function"). Reference numeral 1 denotes a word processor function block, 2 an input device, 3 an output device, 4 a text data area, 5 a morphological/syntactic analysis function block, 6 a morphological/syntactic analysis result area, 7 a punctuation mark addition function block, and 8 a punctuation mark addition position recognition rule data area.

The word processor function block 1 stores a character string entered from the input device 2 in the text data area 4 as a document. It also outputs the document stored in the text data area 4 to the output device 3. When the punctuation mark addition function is activated, the morphological/syntactic analysis function block 5 divides the document in the text data area 4 into words and determines the part of speech of each word. It then builds a syntax tree from the phrase structure rules and stores it in the morphological/syntactic analysis result area 6. The punctuation mark addition function block 7 applies the punctuation mark addition position recognition rule data stored in the punctuation mark addition position recognition rule data area 8 to the syntax tree stored in the morphological/syntactic analysis result area 6. Because the punctuation mark addition position recognition rule data contains punctuation marks as elements in advance, the positions at which punctuation marks are to be added are determined.

The processing of FIG. 1 is described in detail with reference to FIG. 2 onward.

FIG. 2 shows the hardware configuration of an embodiment of the present invention. Reference numeral 9 denotes a processing device; 2 denotes an input device, such as a keyboard, for entering characters and issuing editing instructions; and 3 denotes an output device, such as a CRT display or a printer, for displaying and printing text. Reference numeral 10 denotes an internal storage device, such as memory, which consists of the following four areas: 11 is a program area, 4 is the text data area for storing documents, 6 is the morphological/syntactic analysis result area for storing syntax trees, and 8 is the punctuation mark addition position recognition rule data area.

Reference numeral 12 denotes an external storage device, such as a floppy disk, which consists of the following two storage sections: 13 is a text data storage section, and 14 is a punctuation mark addition position recognition rule data storage section.

FIG. 3 is a flowchart of the processing performed by the word processor function block 1. First, in processing step 15, a key is entered from the input device.

If the key is determined in processing step 16 to be a character input key, the entered character is appended to the text data in processing step 17, and processing returns to processing step 15. If the key is determined in processing step 16 to be a word processor editing key, the editing function corresponding to that key is executed in processing step 18, and processing returns to processing step 15.

If the key is determined in processing step 16 to be the key that requests punctuation mark addition, processing branches to processing step 19. In processing step 19, the text data is divided into words, and the part of speech of each word is determined. In processing step 20, a syntax tree is built from the phrase structure rules, and the punctuation mark addition position recognition rule data is applied to that syntax tree. Because the rule data contains punctuation marks as elements in advance, the positions at which punctuation marks are to be added are determined. In processing step 21, punctuation marks judged to be clearly unnecessary are deleted, such as a period inside a sentence or commas generated so that two or more overlap. Processing step 22 determines whether one sentence (a sentence being a character string that begins with any character and ends with a period) contains a substring of length 3 or less (a substring being a character string delimited by punctuation marks, and its length being its number of words). If it does, processing branches to processing step 23. If processing step 22 determines that the sentence contains no substring of length 3 or less, processing returns to processing step 15.

In processing step 23, if the substring in question has punctuation marks on both sides, the comma at the lower level is deleted, and processing returns to processing step 22.
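The control flow of FIG. 3 can be sketched as a key dispatch loop. This is a minimal illustration, not the patented implementation: the key name `PUNCTUATE`, the `edit_functions` table, and the `add_punctuation` callback (which stands in for processing steps 19 to 23) are all hypothetical names introduced here.

```python
def run_word_processor(keys, add_punctuation, edit_functions):
    """Process a sequence of key presses and return the resulting text.

    keys            -- iterable of key names or single characters (hypothetical encoding)
    add_punctuation -- callable standing in for processing steps 19 to 23
    edit_functions  -- dict mapping editing-key names to functions on the text
    """
    text = []                                  # text data area (a list of characters)
    for key in keys:                           # step 15: read one key
        if key == "PUNCTUATE":                 # step 16 -> steps 19 to 23
            text = add_punctuation(text)
        elif key in edit_functions:            # step 16 -> step 18: editing key
            text = edit_functions[key](text)
        else:                                  # step 16 -> step 17: character key
            text.append(key)
    return "".join(text)
```

In a real device the loop would block on the keyboard rather than iterate over a list; a list is used here only so the sketch can be run as is.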

FIG. 4 shows an example of the punctuation mark addition position recognition rule data 8 of FIG. 1. These are so-called phrase structure grammar rules. They state that a <sentence> consists of a <noun phrase>, a comma, a <verb phrase>, and a period. A <noun phrase> consists of a <noun> and [<particle>]; or a <noun phrase>, a <noun phrase>, and a comma; or a <sentence>, a <noun phrase>, and a comma. A <verb phrase> consists of a <verb> and [<auxiliary verb>]; or a <noun phrase> and a <verb phrase>; or an <adverb> and a <verb phrase>. Elements in [ ] may be omitted.
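As an illustration of how punctuation marks can sit inside phrase structure rules as ordinary elements, the rules described for FIG. 4 can be written as data and applied to a syntax tree. This is a sketch under assumptions: the abbreviated labels (`S`, `NP`, `VP`, `N`, `P`, `V`, `AUX`, `ADV`), the tuple-based tree format, and the function name `punctuate` are introduced here and do not come from the patent.

```python
COMMA, PERIOD = "、", "。"

# The rules of FIG. 4 as data. Optional elements ([...] in the text)
# are spelled out as separate alternatives.
RULES = {
    "S":  [("NP", COMMA, "VP", PERIOD)],
    "NP": [("N",), ("N", "P"), ("NP", "NP", COMMA), ("S", "NP", COMMA)],
    "VP": [("V",), ("V", "AUX"), ("NP", "VP"), ("ADV", "VP")],
}


def punctuate(tree):
    """Return the words of `tree` plus the punctuation its rules prescribe.

    A tree is (label, children); a leaf is (part_of_speech, word).
    """
    label, children = tree
    if isinstance(children, str):              # leaf node: just the word
        return [children]
    child_labels = tuple(child[0] for child in children)
    for rhs in RULES[label]:
        # A rule matches when it equals the child labels once its
        # punctuation elements are set aside.
        if tuple(x for x in rhs if x not in (COMMA, PERIOD)) == child_labels:
            out, kids = [], iter(children)
            for x in rhs:
                out += [x] if x in (COMMA, PERIOD) else punctuate(next(kids))
            return out
    raise ValueError(f"no rule for {label} -> {child_labels}")


clause = ("S", [("NP", [("N", "漱石"), ("P", "が")]),
                ("VP", [("V", "書い"), ("AUX", "た")])])
print("".join(punctuate(clause)))              # 漱石が、書いた。
```

Because the punctuation is read straight off the matching rule, no separate position-finding logic is needed; the later deletion steps then remove the marks that turn out to be excessive.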

FIG. 5 shows the data flow of the processing in FIG. 3. (1) is a sample text prepared to explain this embodiment. (2) shows the result of processing step 19 of FIG. 3: the original text of (1) divided into words, with the part of speech of each word determined. (3) shows the result of processing step 20 of FIG. 3: the syntax tree built by applying the grammar rules of FIG. 4 to the morphological analysis result of (2). It also shows the punctuation mark deletion performed in processing steps 21 and 23 of FIG. 3. The substring 「漱石が」 ("Soseki" followed by the subject particle) consists of two words and therefore qualifies as a substring of length 3 or less, so one of the commas on either side of it is deleted. Comparing the two commas, in 「彼は、」 and 「漱石が、」, the one after 「漱石が」 is at the lower level (the farther a node is from the root, the lower its level), so that comma is deleted. In addition, because the complex sentence is recognized as a single sentence, the <sentence> 「漱石が、書いた。」 is regarded as a subordinate clause of the complex sentence; its final period is unnecessary and is deleted. Furthermore, the substring 「読む」 ("read") qualifies as a substring of length 3 or less, so the comma immediately before it is deleted. A period follows it, but a period cannot be deleted. 「彼は、」 is not a candidate for deletion because no punctuation mark precedes it.

As described above, punctuation marks can be added at appropriate positions by applying grammar rules such as those of FIG. 4 to the syntax tree to add punctuation marks and then deleting the unnecessary ones.

The result is shown in (4).
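The deletion loop of processing steps 22 and 23 can be sketched as follows. Several things here are assumptions rather than content of the patent: the sample sentence is reconstructed from the fragments quoted in the text (the figure itself is not reproduced), the `depth` values are what the FIG. 4 rules would give for that reconstruction, ties between equally deep commas go to the right-hand one, and the names `Token`, `find_comma_to_delete`, and `prune_short_segments` are hypothetical. The input is taken after processing step 21, so the sentence-internal period is already gone.

```python
from dataclasses import dataclass

COMMA, PERIOD = "、", "。"


@dataclass
class Token:
    text: str
    depth: int = 0   # for punctuation: distance of its rule from the root

    @property
    def is_punct(self):
        return self.text in (COMMA, PERIOD)


def find_comma_to_delete(tokens, max_len):
    """Step 22: find a short segment enclosed by punctuation; return the
    index of the comma to delete (step 23), or None if there is none."""
    punct = [i for i, t in enumerate(tokens) if t.is_punct]
    for left, right in zip(punct, punct[1:]):
        if right - left - 1 > max_len:          # segment length in words
            continue
        # Periods are never deleted, so only commas are candidates.
        commas = [i for i in (left, right) if tokens[i].text == COMMA]
        if commas:
            # Greater depth means lower level; ties go to the right-hand comma.
            return max(commas, key=lambda i: (tokens[i].depth, i))
    return None


def prune_short_segments(tokens, max_len=3):
    """Repeat steps 22 and 23 until no short enclosed segment remains."""
    tokens = list(tokens)
    while (target := find_comma_to_delete(tokens, max_len)) is not None:
        del tokens[target]
    return tokens


T = Token
sample = [T("彼"), T("は"), T(COMMA, 0), T("漱石"), T("が"), T(COMMA, 3),
          T("書い"), T("た"), T("本"), T("を"), T(COMMA, 2),
          T("読む"), T(PERIOD, 0)]
print("".join(t.text for t in prune_short_segments(sample)))
# 彼は、漱石が書いた本を読む。
```

The leading segment 「彼は」 is never examined because it has no punctuation mark before it, which matches the behaviour described for FIG. 5.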

Like FIG. 5, FIG. 6 shows the data flow of the processing in FIG. 3. The syntax tree in (3) is an example in which several commas are stacked at one position, as in 「判断が、、、」. These commas were attached at different levels, and two of them are deleted as clearly unnecessary in processing step 21 of FIG. 3.
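Processing step 21 can be sketched as a single pass over the punctuated word list. This is an illustrative reading of the text, not the patented procedure: the function name `remove_obvious_excess` is hypothetical, the input is assumed to be exactly one sentence, and the word following 「判断が」 in the example is invented here because the rest of the FIG. 6 sentence is not given in the text.

```python
COMMA, PERIOD = "、", "。"


def remove_obvious_excess(tokens):
    """Drop periods inside the sentence and collapse stacked commas."""
    out = []
    last = len(tokens) - 1
    for i, tok in enumerate(tokens):
        if tok == PERIOD and i != last:
            continue                            # period inside the sentence
        if tok == COMMA and out and out[-1] == COMMA:
            continue                            # comma stacked on another comma
        out.append(tok)
    return out


stacked = ["判断", "が", COMMA, COMMA, COMMA, "難しい", PERIOD]
print("".join(remove_obvious_excess(stacked)))  # 判断が、難しい。
```

The text does not say which of the stacked commas survives; since the marks are identical as characters, this sketch simply keeps the first one.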

The embodiment has been described in detail above.

According to this embodiment, punctuation marks are added to text automatically, so there is no particular need to enter punctuation marks when creating a document with a word processor.

Furthermore, the punctuation mark addition position recognition rule data is stored separately from the punctuation mark addition means and is highly independent of it. The structure can therefore readily accommodate changes to the grammar rules, and the system can be extended flexibly.
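One way to picture this separation is to keep the rules in a plain data file that the program loads at start-up, so that a grammar change never touches the code. The sketch below is only an illustration: JSON as the storage format, the abbreviated category labels, and the name `load_rules` are assumptions made here, not details from the patent.

```python
import json

# Rule data as it might be kept in external storage (hypothetical format).
RULE_DATA = """
{
  "S":  [["NP", "、", "VP", "。"]],
  "NP": [["N"], ["N", "P"], ["NP", "NP", "、"], ["S", "NP", "、"]],
  "VP": [["V"], ["V", "AUX"], ["NP", "VP"], ["ADV", "VP"]]
}
"""


def load_rules(text):
    """Parse rule data into {left-hand side: [right-hand-side tuples]}."""
    return {lhs: [tuple(rhs) for rhs in alternatives]
            for lhs, alternatives in json.loads(text).items()}


rules = load_rules(RULE_DATA)
print(rules["S"])   # [('NP', '、', 'VP', '。')]
```

Dropping the comma from the sentence rule, for example, would then be an edit to the data alone.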

In addition, any punctuation marks already present in the target text can simply be ignored, so it is also possible to point out misplaced punctuation marks or to reposition them appropriately.

Furthermore, processing step 22 of FIG. 3 looks for substrings of length 3 or less as a guide for deleting commas; by increasing this value somewhat, the number of commas to be deleted can also be reduced.

Also, because punctuation marks can be added to Japanese text, the invention can be used in the Japanese generation stage of machine translation from foreign languages into Japanese.

[Effect of the Invention]

According to the present invention, punctuation marks can be added at appropriate positions in text, so there is no need to enter punctuation marks when creating text.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing a configuration in which the present invention is incorporated into a word processor function. FIG. 2 is a hardware configuration diagram of an embodiment of the present invention. FIG. 3 is a flowchart of the processing performed by the word processor function block 1. FIG. 4 shows an example of the punctuation mark addition position recognition rule data 8 of FIG. 1. FIGS. 5 and 6 are data flow diagrams for the processing in FIG. 3.

1... word processor function block, 2... input device, 3... output device, 4... text data area, 5... morphological/syntactic analysis function block, 6... morphological/syntactic analysis result area, 7... punctuation mark addition function block, 8...

Claims

[Scope of Claims]

1. A document processing device for editing text, characterized by comprising punctuation mark addition position recognition rule data and a punctuation mark addition means that adds punctuation marks to text on the basis of that data.

2. The document processing device according to claim 1, characterized in that the punctuation mark addition position recognition rule data includes punctuation marks as elements of phrase structure rules.