JP3599734B2

JP3599734B2 - Sentence proofreading apparatus and method

Info

Publication number: JP3599734B2
Application number: JP2003372878A
Authority: JP
Inventors: 克己望月; 和義長保
Original assignee: エー・アイ・ソフト株式会社
Priority date: 2003-10-31
Filing date: 2003-10-31
Publication date: 2004-12-08
Anticipated expiration: 2016-12-05
Also published as: JP2004094972A

Description

The present invention relates to a sentence proofreading apparatus and method. More particularly, it relates to a sentence proofreading apparatus and method for proofreading text composed of characters that can constitute Japanese.

Conventionally, sentence proofreading devices of this type have offered various proofreading functions suited to the type and purpose of the document being written. Many recent devices aim to unify notation and sentence-ending style. They display a warning for an inappropriate expression or combination and prompt the user to correct it.

For example, one notation-unifying function lets the user preset the character width (full-width or half-width) for symbols, alphabetic characters, kana, and numerals. It then flags any character string read from the text that does not match the setting.

For unifying sentence endings, a function has been proposed that follows the user's setting. It identifies whether each sentence ends in the plain style (e.g., 「〜である」, "-de aru") or the polite style (e.g., 「〜です」, "-desu"). When the ending differs from the setting, it displays correction candidates.

Such proofreading devices can flag errors and stylistic inconsistencies in pre-registered words, as well as pre-registered homophones and variant character forms. However, they cannot flag unregistered words, even when the notation is likely a misuse in Japanese. As a result, many notations that ought to be corrected may be missed for unregistered words.

For example, the prolonged sound mark and the minus sign are sometimes confused in Japanese text. Registering every loanword and other word containing a prolonged sound mark as a proofreading target would require an enormous amount of storage. Moreover, new loanwords that have not been registered cannot be checked at all. Neither can cases where a prolonged sound mark is mixed into an English word.

This problem can no longer be overlooked now that OCR (optical character readers) has become widespread, especially when proofreading Japanese documents read by OCR. OCR often reads a character that should be a prolonged sound mark as a minus sign.

Conventional proofreading devices could only identify and check whether prolonged sound marks and minus signs were written in the full-width or half-width form chosen in advance. They could not point out a prolonged sound mark used in place of a minus sign, or vice versa.

In addition, OCR may read the same notation as full-width (double-byte) characters in one place and half-width (single-byte) characters in another. Such cases were not proofread conventionally either.

There were two reasons for this. First, the proper use of the prolonged sound mark versus the minus sign had not been studied sufficiently, so no rules usable for proofreading were known. Second, Japanese has two kinds of characters, full-width and half-width, yet no rules had been examined for cases where these are mixed with prolonged sound marks and minus signs.

Accordingly, the sentence proofreading device of the present invention was made to provide a technique for detecting character sequences that are likely misused as Japanese notation. In particular, it makes two things targets of proofreading: the use of prolonged sound marks and minus signs in text, and consistent notation for runs of katakana or alphanumeric characters.

The first sentence proofreading device of the present invention is characterized as follows.
It is a sentence proofreading device for proofreading text composed of characters that can constitute Japanese, comprising:
input means for sequentially inputting characters from the text;
word extraction means for extracting, as a word, a run of consecutive input characters consisting of either katakana or alphanumeric characters, as single-byte or double-byte characters;
character type conversion means for converting each character constituting the extracted word into the other type, single-byte or double-byte;
converted character string determination means for determining whether the character string converted by the character type conversion means has already appeared in the text; and
warning output means for outputting a warning when the converted character string determination means determines that the word has already appeared in the text.

The first sentence proofreading method, corresponding to this device, is characterized as follows.
It is a sentence proofreading method for proofreading text composed of characters that can constitute Japanese, comprising:
sequentially inputting characters from the text;
extracting, as a word, a run of consecutive input characters consisting of either katakana or alphanumeric characters, as single-byte or double-byte characters;
converting each character constituting the extracted word into the other type, single-byte or double-byte;
determining whether the converted character string has already appeared in the text; and
outputting a warning when the word is determined to have already appeared in the text.

The first sentence proofreading device and method configured as above work as follows.

They determine whether the characters input through the input means form a run of katakana or a run of alphanumeric characters. If so, they extract the run as a single word.

If the extracted word consists of single-byte characters, it is converted into a string of double-byte characters. If it consists of double-byte characters, it is converted into a string of single-byte characters. This produces the same word with a different byte width.

They then determine whether the converted string has already appeared in the text being proofread. If it has, the earlier string and the input string differ in character type, so a warning is output prompting the user to correct it.

One way to decide whether a word has already appeared is to store the first occurrence of each word in main memory or an external storage device as an already-seen word, and compare later words against it.
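The procedure above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation:

- The text is Unicode.
- "Single-byte" means ASCII plus half-width katakana; "double-byte" means their full-width counterparts.
- A set of previously seen words serves as the record of earlier occurrences.

The helper names (`to_full_width`, `to_half_width`, `check_width_consistency`) are hypothetical.

```python
import re
import unicodedata

# Hypothetical word pattern: a run of katakana (full- or half-width, incl. the
# prolonged sound mark) or a run of alphanumerics (ASCII or full-width).
KATAKANA_RUN = "[\u30A1-\u30FA\u30FC\uFF66-\uFF9F]+"
ALNUM_RUN = "[0-9A-Za-z\uFF10-\uFF19\uFF21-\uFF3A\uFF41-\uFF5A]+"
WORD_RE = re.compile(f"{KATAKANA_RUN}|{ALNUM_RUN}")

WIDE_OFFSET = 0xFEE0  # ASCII '!'..'~' <-> full-width U+FF01..U+FF5E


def _build_full_to_half():
    """Map full-width katakana to half-width, e.g. ガ -> ｶﾞ, パ -> ﾊﾟ."""
    table = {}
    for cp in range(0xFF66, 0xFF9E):  # half-width ｦ .. ﾝ, including ｰ
        half = chr(cp)
        full = unicodedata.normalize("NFKC", half)
        table[full] = half
        for mark, half_mark in (("\u3099", "\uFF9E"), ("\u309A", "\uFF9F")):
            voiced = unicodedata.normalize("NFC", full + mark)
            if len(voiced) == 1:  # only where a precomposed voiced form exists
                table[voiced] = half + half_mark
    return table


FULL_TO_HALF = _build_full_to_half()


def is_half_width(word):
    """True if every character is ASCII or half-width katakana."""
    return all(ord(c) < 0x80 or 0xFF61 <= ord(c) <= 0xFF9F for c in word)


def to_full_width(word):
    # NFKC turns half-width katakana into full-width (ｶﾞ -> ガ) and leaves ASCII alone;
    # ASCII is then shifted into the full-width block.
    s = unicodedata.normalize("NFKC", word)
    return "".join(chr(ord(c) + WIDE_OFFSET) if "!" <= c <= "~" else c for c in s)


def to_half_width(word):
    out = []
    for c in word:
        if "\uFF01" <= c <= "\uFF5E":
            out.append(chr(ord(c) - WIDE_OFFSET))
        else:
            out.append(FULL_TO_HALF.get(c, c))
    return "".join(out)


def check_width_consistency(text):
    """Return (word, earlier_form) pairs where a word reappears in the other width."""
    seen, warnings = set(), []
    for match in WORD_RE.finditer(text):
        word = match.group()
        converted = to_full_width(word) if is_half_width(word) else to_half_width(word)
        if converted != word and converted in seen:
            warnings.append((word, converted))
        seen.add(word)
    return warnings
```

Consider the text 「コンピュータとｺﾝﾋﾟｭｰﾀ、ＯＣＲとOCR。」. The sketch reports the half-width 「ｺﾝﾋﾟｭｰﾀ」 against the earlier 「コンピュータ」, and 「OCR」 against the earlier 「ＯＣＲ」. A text that uses 「ＯＣＲ」 twice produces no warning.

The check is per word, so one word may be written half-width and another full-width without complaint, as long as each is used consistently.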

With the first device and method, inconsistencies in the character type of a katakana or alphanumeric word within the same text can be detected. This does not require registering the word, in the user's preferred character type, in a dictionary beforehand. There is therefore no need to reserve a large amount of dictionary storage.

Furthermore, because inconsistencies are detected per word, different words may legitimately use different character types. Conventionally, it was only possible to decide in advance that, for example, katakana must be double-byte. Any differing string (such as a single-byte katakana string) was then flagged as inconsistent, and the standard could not vary from word to word. The invention therefore allows proofreading that accommodates the user's varied patterns of character usage.

It can also detect character-type inconsistencies when the target is hard to regard as a word, such as an unregistered word or a number.

The second sentence proofreading device of the present invention is characterized as follows.
It is a sentence proofreading device for proofreading text composed of characters that can constitute Japanese, comprising:
input means for sequentially inputting characters from the text;
symbol determination means for determining whether an input character is a prolonged sound mark or a minus sign;
character type determination means for determining the type of the character preceding the symbol, when the symbol determination means determines that the input character is a prolonged sound mark or a minus sign;
validity determination means for determining the validity of the combination of the character type determined by the character type determination means and the symbol type determined by the symbol determination means; and
warning output means for outputting a warning when the validity determination means determines that the combination is invalid.

The second sentence proofreading method, corresponding to this device, is characterized as follows.
It is a sentence proofreading method for proofreading text composed of characters that can constitute Japanese, comprising:
inputting characters;
determining whether an input character is a prolonged sound mark or a minus sign;
determining the type of the character preceding the symbol, when the input character is determined to be a prolonged sound mark or a minus sign;
determining the validity of the combination of the determined character type and the determined symbol type; and
outputting a warning when the combination is determined to be invalid.

The second device and method of the invention, configured as above, work as follows during proofreading. They check whether a character input from the text is a prolonged sound mark or a minus sign. If it is, they also determine the type of the preceding character and judge whether the combination of the two is valid.

Validity is judged by rules for combining a prolonged sound mark or a minus sign with the preceding character. If the combination is invalid, a warning is output prompting the user to correct it.

Accordingly, the second device and method can detect misuse of these symbols without registering words containing prolonged sound marks or minus signs in a dictionary beforehand. There is no need to reserve a large amount of dictionary storage. Misuse of either symbol can also be detected when the target is hard to regard as a word, such as an unregistered word or a mathematical expression using alphabetic characters.

The warning output means may, for example, display a message on a CRT screen, but it may instead write to a file, a printer, or the like. Other forms are also possible, such as presenting possible correction candidates for selection rather than only issuing a warning.

The character types identified before the symbol can include at least kana. The combination can be deemed invalid in two cases: when the character before a prolonged sound mark is not kana, and when the character before a minus sign is kana. Kana is the character type most likely to accompany either symbol, so these two rules catch a wide range of misuse.
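A minimal Python sketch of these two rules follows. The character sets are assumptions, since the source names no code points:

- ー (U+30FC) and ｰ (U+FF70) are treated as prolonged sound marks.
- − (U+2212), － (U+FF0D) and the ASCII hyphen-minus are treated as minus signs.
- "Kana" means hiragana, katakana (including ー) and half-width katakana.

A symbol at the very start of a sentence has no preceding character and is not judged. The function name `check_mark_usage` is hypothetical.

```python
# Assumed character sets; the source does not list code points.
PROLONGED_MARKS = {"\u30FC", "\uFF70"}      # ー (full-width), ｰ (half-width)
MINUS_SIGNS = {"\u2212", "\uFF0D", "-"}     # −, －, ASCII hyphen-minus


def is_kana(ch):
    """Hiragana, katakana (incl. ー) and half-width katakana (incl. ｰ)."""
    cp = ord(ch)
    return (0x3041 <= cp <= 0x3096 or 0x30A1 <= cp <= 0x30FA or cp == 0x30FC
            or 0xFF66 <= cp <= 0xFF9F)


def check_mark_usage(sentence):
    """Return (index, problem) for each symbol whose preceding character
    makes the combination invalid."""
    problems = []
    for i in range(1, len(sentence)):  # index 0 has no preceding character
        ch, prev = sentence[i], sentence[i - 1]
        if ch in PROLONGED_MARKS and not is_kana(prev):
            problems.append((i, "prolonged mark after non-kana"))
        elif ch in MINUS_SIGNS and is_kana(prev):
            problems.append((i, "minus sign after kana"))
    return problems
```

An OCR result such as 「コンピュ−タ」 is flagged at index 4, a minus sign after kana. 「3−2」 passes.

Because ー itself counts as kana here, a run such as 「ーー」 is not flagged. The source does not say how such runs should be treated.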

The device and method can also identify whether the symbol and the preceding character are full-width or half-width. When their widths differ, the combination is deemed invalid even if the symbol/character-type combination is otherwise valid. For example, 「Ａ-Ｂ」 (full-width letters around a half-width minus) and 「A−Ｂ」 (a half-width letter before a full-width minus) are both deemed invalid.

With this check, the prolonged sound mark and minus sign need not be assigned a fixed width in advance. Full-width/half-width inconsistencies can still be detected among the characters and symbols that make up a word containing a prolonged sound mark or minus sign.
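The width-mismatch rule can be sketched in Python as below. The sketch makes these assumptions:

- Unicode East Asian Width stands in for the single-byte/double-byte distinction of the original character encoding. Characters with "ambiguous" or "neutral" width are not judged.
- Symbol widths are listed explicitly, because U+2212 MINUS SIGN has East Asian Width "A" and cannot be classified by `unicodedata` alone.
- − (U+2212) and － (U+FF0D) are full-width minus signs; ASCII `-` is the half-width one.

The name `check_symbol_width` is hypothetical.

```python
import unicodedata

# Explicit widths for the symbols (assumption); U+2212 is East Asian Width "A".
SYMBOL_WIDTH = {"\u30FC": "full", "\u2212": "full", "\uFF0D": "full",
                "\uFF70": "half", "-": "half"}


def char_width(ch):
    """Approximate double-byte/single-byte with Unicode East Asian Width."""
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("F", "W"):
        return "full"
    if eaw in ("H", "Na"):
        return "half"
    return None  # ambiguous or neutral: not judged


def check_symbol_width(sentence):
    """Return indices of symbols whose width differs from the preceding character."""
    flagged = []
    for i in range(1, len(sentence)):
        sym_w = SYMBOL_WIDTH.get(sentence[i])
        prev_w = char_width(sentence[i - 1])
        if sym_w and prev_w and sym_w != prev_w:
            flagged.append(i)
    return flagged
```

Both 「Ａ-Ｂ」 and 「A−Ｂ」 are flagged at index 1. 「Ａ−Ｂ」, 「A-B」 and the half-width 「ｺｰ」 pass. Nothing here fixes a preferred width for the symbols; only agreement with the neighbouring character is tested.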

In the first and second devices and methods, the following configuration is also preferable. A sequence of input characters is judged to be the end of a sentence when it is one of:

- two consecutive line feeds;
- a line feed followed by a space; or
- a period followed by a line feed.

Everywhere else, characters are input continuously, with at least line feeds skipped.

A Japanese sentence normally ends with a period (「。」 or 「．」), but some sentences end with only a line feed and no period. Meanwhile, text created with a line-oriented editor, or received via personal computer communication and similar services, may have a line-feed code at the end of every line. Simply treating everything up to a period or a line-feed code as one sentence therefore cannot locate sentence ends correctly.

The configuration above solves this. It correctly determines whether a line ending in a line feed is the end of a sentence. Characters are input continuously except at sentence ends, so a sentence can be proofread as a unit even when a line feed occurs in the middle of it.
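The three sentence-end rules can be sketched in Python as follows. This works on a character stream rather than the line-by-line flowchart of the embodiment. It assumes `\n` is the line-feed code, treats the half-width space and the full-width space U+3000 as blanks, and recognizes 「。」 and 「．」 as periods.

The sketch also splits at a period with no following line feed. The text describes that split as conventional; it is included here as an assumption. The name `split_sentences` is hypothetical.

```python
PERIODS = ("。", "．")          # periods named in the text
SPACES = (" ", "\u3000")        # half-width and full-width space


def split_sentences(text):
    """Split text into sentences, joining line feeds that are mere line wraps."""
    sentences, buf = [], []

    def flush():
        sentence = "".join(buf).strip(" \u3000")
        if sentence:
            sentences.append(sentence)
        buf.clear()

    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == "\n":
            # "\n\n" or "\n" + space ends a sentence; any other "\n" is skipped.
            if nxt == "\n" or nxt in SPACES:
                flush()
            continue
        buf.append(ch)
        if ch in PERIODS:
            # Split at a period (conventional); this also covers period + "\n".
            flush()
    flush()
    return sentences
```

For the input 「これは一行目と⏎二行目にまたがる文⏎⏎次の段落の文。最後の文⏎　字下げした文」, where ⏎ is a line feed, the sketch returns four sentences:

1. 「これは一行目と二行目にまたがる文」
2. 「次の段落の文。」
3. 「最後の文」
4. 「字下げした文」

The first line feed was only a wrap, so it is dropped and the two lines join.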

The present invention also includes the following other aspects. A first aspect is a recording medium on which a program is recorded so as to be readable or executable by a computer. The program implements a sentence proofreading device for proofreading text composed of characters that can constitute Japanese, and comprises:
an input process of sequentially inputting characters from the text;
a symbol determination process of determining whether an input character is a prolonged sound mark or a minus sign;
a character type determination process of determining the type of the character preceding the symbol, when the symbol determination process determines that the input character is a prolonged sound mark or a minus sign;
a validity determination process of determining the validity of the combination of the character type determined by the character type determination process and the symbol type determined by the symbol determination process; and
a warning output process of outputting a warning when the validity determination process determines that the combination is invalid.

This medium is loaded into a computer's flexible disk drive, magneto-optical disk drive, or the like. The program, recorded by magnetic or optical means, is transferred into the computer's memory, which gives the computer a sentence proofreading function.

A second aspect is a device that supplies a program over a communication line. When loaded into a computer's memory and executed by the microprocessor of the computer system, the program implements a sentence proofreading device for proofreading text composed of characters that can constitute Japanese.

To further clarify the configuration and operation of the invention described above, embodiments of the invention are described below by way of examples.

FIG. 1 is a block diagram showing the hardware configuration of a sentence proofreading apparatus according to a preferred embodiment of the invention. As shown in FIG. 1, the apparatus is built around a CPU 21 that executes various arithmetic operations according to preset programs. The following units are interconnected with it by a bus 31:

- ROM 22 stores in advance the programs and data for the sentence proofreading apparatus that the CPU 21 needs for these operations.
- RAM 23 is a memory to which the CPU 21 temporarily writes, and from which it reads, the data needed for these operations.
- A keyboard interface 25 processes key signals from a keyboard 24.
- A CRTC 27 controls signal output to a color-capable CRT 26.
- A printer interface 29 controls data output to a printer 28.
- A hard disk controller (HDC) 30 controls a hard disk 32. The hard disk 32 stores various programs, such as the sentence proofreading program that is loaded into RAM 23 and executed, together with various conversion dictionaries.

Besides proofreading input text, this hardware performs character string input, kana-to-kanji conversion, editing, display, printing, and so on. These consist essentially of two processes:

- creating text by inputting kana strings from the keyboard 24 and converting them into mixed kanji-kana text, with reference to the dictionaries stored on the hard disk 32; and
- editing already-entered text according to input from the keyboard 24.

Detailed descriptions of these processes are omitted. In this embodiment, the created text is stored in a predetermined area of RAM 23 and displayed on the screen of the CRT 26 via the CRTC 27.

Next, the sentence proofreading process executed by this hardware is described in detail. The text to be proofread is assumed to be stored in a predetermined area of RAM 23. When a key operation on the keyboard 24 instructs proofreading of this text, the CPU 21 loads the sentence proofreading program from the hard disk 32 into RAM 23, where it becomes executable.

An outline of the sentence proofreading program follows. FIG. 2 is a flowchart of the general sentence proofreading process in the apparatus.

When the process starts, the user first designates the desired proofreading functions (step S100). This fixes what is to be checked, for example:

- typographical errors and omissions;
- unification of character types such as katakana or alphabetic characters;
- readability evaluation;
- notational variants.

Next, the user designates the desired proofreading range with the keyboard 24, which specifies the proofreading target (step S110). The target can be the entire text, a selection, or the span from the current cursor (caret) position to the beginning or end of the text. A paragraph can also be specified as the unit.

Next, character strings are read sequentially from the target specified in step S110 (step S120). The proofreading fixed in step S100 is applied to each input string (step S130). The results are then displayed on the screen of the CRT 26 together with correction candidates (step S140). If additional proofreading is requested, the process returns to step S100 and repeats.

This completes the description of the overall proofreading flow. Next, the overall structure of the processing specific to this embodiment is described.

FIG. 3 is a block diagram showing the processing specific to the sentence proofreading apparatus of this embodiment, within the proofreading and display routines of FIG. 2. As shown, the apparatus comprises the following modules:

- a one-sentence acquisition module 40;
- a Japanese analysis module 50;
- a typo/omission detection module 55;
- a prolonged sound mark / minus sign usage determination module 60;
- a katakana word notation determination module 70;
- an alphabetic word notation determination module 80;
- a detection result display module 90.

The one-sentence acquisition module 40 acquires character strings one sentence at a time by locating the sentence ends in the text being proofread.

The Japanese analysis module 50 refers to dictionary files stored on the hard disk 32. It analyzes each sentence acquired by module 40 for validity as Japanese.

The typo/omission detection module 55 uses the results of this dictionary-based analysis to detect typographical errors and omissions. For example, suppose a compound of several kanji is not found in the dictionary file. The module then searches for compounds that differ from it in some characters. If it finds a compound that differs only in part of its kanji, it judges a typo. If it finds a compound from which the input appears to have dropped one kanji, it judges an omission.
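The typo/omission test of module 55 can be illustrated with a small Python sketch. The patent does not specify how similar compounds are found, so the following are assumptions:

- A "typo" candidate is a dictionary entry of the same length that differs in exactly one character.
- An "omission" candidate is an entry one character longer that becomes the input when one of its characters is deleted.
- The dictionary is a plain set scanned linearly.

The function name `classify_unknown_compound` and the sample dictionary are hypothetical.

```python
def classify_unknown_compound(word, dictionary):
    """Return ("known" | "typo" | "omission" | "unknown", candidates)."""
    if word in dictionary:
        return "known", []
    # Typo: same length, exactly one character differs.
    typos = sorted(e for e in dictionary
                   if len(e) == len(word)
                   and sum(a != b for a, b in zip(e, word)) == 1)
    if typos:
        return "typo", typos
    # Omission: deleting one character of a longer entry yields the input.
    omissions = sorted(e for e in dictionary
                       if len(e) == len(word) + 1
                       and any(e[:i] + e[i + 1:] == word for i in range(len(e))))
    if omissions:
        return "omission", omissions
    return "unknown", []
```

With the sample dictionary {「校正」, 「文章」, 「装置」, 「文章校正」}:

- 「校生」 is classified as a typo for 「校正」.
- 「文校正」 is classified as an omission from 「文章校正」.

A real dictionary would need an index instead of a linear scan.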

The prolonged sound mark / minus sign usage determination module 60 judges whether the prolonged sound mark 「ー」 and the minus sign 「−」 are used appropriately in Japanese sentences.

The katakana word notation determination module 70 checks, among other things, that words written in katakana are written consistently. The alphabetic word notation determination module 80 does the same for words made of alphabetic characters. These modules are described in detail later.

The detection result display module 90 displays the validity judgments made by the modules above. In practice, it displays two windows: one listing the proofreading results and one presenting correction candidates for them.

The processing characteristic of this embodiment is outlined below, focusing on the function of each module. First, the processing in the one-sentence acquisition module 40 is described with reference to the flowchart of FIG. 4.

The one-sentence acquisition module 40 accurately extracts sentences, the minimum units of Japanese text. It supplies them to proofreading modules such as the prolonged sound mark / minus sign usage determination module 60.

A Japanese sentence normally ends with a period (「。」 or 「．」), but some sentences end with only a line feed and no period. However, text created with a line-oriented editor, or received via personal computer communication and similar services, may have a line-feed code at the end of every line. Simply treating everything up to a period or a line-feed code as one sentence therefore cannot acquire sentences correctly.

The one-sentence acquisition module 40 of this embodiment uses the processing below to determine sentence ends when line feeds are present, and so acquires each sentence accurately. Extracting sentences from period to period where no line feeds are present is conventional and is not described here.

When the one-sentence acquisition routine shown in FIG. 4 starts, it first acquires the character string of one line from the text to be proofread (step S200). It then determines whether a line feed code exists at the end of the line preceding the acquired line (hereinafter, the "acquired line") (step S210). Whether the preceding line ended with a line feed code was determined while processing that line and stored as a flag (described later), so this determination is easily made by referring to the flag. If the preceding line ends with a line feed code, the character code at the head of the acquired line is checked to determine whether the line begins with a blank (a full-width or half-width space) or with a line feed code (steps S220, S230). If the line begins with a line feed code or a blank, the preceding line is determined to have ended a sentence (step S240). In the blank case the acquired line does contain characters, but because the preceding line ends with a line feed code and the following line begins with a blank, the routine concludes that no character string continues across the two lines and that the preceding line ended the sentence. In this case, the character string up to the preceding line is cut out as one sentence and output to the proofreading modules (step S245).

If, after a line has been acquired (step S200), the preceding line is found not to end with a line feed code (step S210), or it does end with one but the acquired line begins with neither a blank nor a line feed code (steps S210 to S230), the routine next determines whether the acquired line contains a line feed code (step S250). If it does, the routine further determines whether the character immediately before the line feed code is a full stop (step S260). If it is, the line feed position in the acquired line is determined to be the end of a sentence (step S270). In this case, the acquired line up to its end is appended to the character string accumulated so far, and the whole is output as one sentence (step S275).

On the other hand, if the acquired line contains no line feed code (step S250), or the character before the line feed code at the end of the acquired line is not a full stop (step S260), the routine determines that the acquired line contains no sentence end and appends the line's character string, without its line feed code, to the end of the string being built up as one sentence (step S280). The routine then determines whether all lines in the proofreading target have been analyzed (step S290); if not, it repeats the above processing from the acquisition of the next line (step S200). When all lines have been analyzed, any remaining string accumulated in step S280 is output as one sentence (step S295), and the routine ends.

Through the above processing, sentence ends are correctly determined on lines that contain line feed codes. Even in text that has a line feed code inserted in every line, such as text edited in a line-oriented editor, line feed codes that do not mark a sentence end are removed, and each sentence is cut out and passed to the proofreading modules. For example, in the text shown in FIG. 5, a line feed code (shown as "▽") is inserted at the end of every line, and:
(1) where a line feed code is followed by a blank at the head of the next line, the routine determines a sentence end (steps S210, S220, S240) and cuts out the string up to that point (A) as one sentence (step S245);
(2) where line feed codes are consecutive, it determines a sentence end (steps S210 to S240) and cuts out the string up to that point (B) as one sentence (step S245);
(3) where the character before a line feed code is a full stop, it determines a sentence end (steps S210, S250 to S270) and cuts out the string up to that point (C) as one sentence (step S275).

That is, the sentence proofreading apparatus of this embodiment does not uniformly treat every line feed code as the end of a sentence; it determines sentence ends by examining, in addition to the presence of a line feed code, the head of the following line and the character immediately before the line feed code. The true end of a sentence can therefore be determined even when proofreading text received over PC communication or the like. As a result, even if a line ends with a line feed code that serves only for layout, a character string that continues from the end of one line to the head of the next is correctly treated as one continuous string. In this embodiment, line feed codes at line ends judged not to be sentence ends are removed; in addition, when a space is located at the head of the line following a line judged not to be a sentence end, that space can also be removed. Some texts place one or more spaces at the head of a line for indentation, and in such cases it is desirable to build the character string with those leading spaces removed.
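The line-joining rules of FIG. 4 can be sketched in Python as follows. This is an illustrative reading of steps S200 to S295, not the patent's implementation: the function name `acquire_sentences`, the `strip_indent` switch for the optional indentation removal just described, and the conversion of "\r\n" to "\n" are assumptions, and splitting at full stops inside a single line (which the text calls conventional) is omitted.

```python
from typing import List

PERIODS = ("。", "．")      # full stops recognised at a line end (S260)
BLANKS = (" ", "\u3000")    # half-width and full-width spaces (S220)


def acquire_sentences(text: str, strip_indent: bool = False) -> List[str]:
    """Join lines into sentences following the FIG. 4 rules (illustrative sketch)."""
    sentences: List[str] = []
    buffer = ""                 # sentence being built up (S280)
    prev_had_newline = False    # flag kept from the previous line (S210)

    def flush() -> None:
        nonlocal buffer
        if buffer:              # skip empty output, e.g. after a blank line
            sentences.append(buffer)
        buffer = ""

    lines = text.replace("\r\n", "\n").split("\n")
    for i, body in enumerate(lines):                 # S200: acquire one line
        has_newline = i < len(lines) - 1             # every piece but the last had a LF
        # S210-S240: after a line feed, an empty line or a leading blank
        # means the previous line ended a sentence.
        if prev_had_newline and (body == "" or body.startswith(BLANKS)):
            flush()                                  # S245
        if strip_indent:
            body = body.lstrip("".join(BLANKS))      # optional: drop indentation
        buffer += body                               # the line feed itself is dropped
        # S250-S275: a line feed preceded by a full stop ends the sentence here.
        if has_newline and body.endswith(PERIODS):
            flush()
        prev_had_newline = has_newline
    flush()                                          # S295: output any remainder
    return sentences
```

For the input "今日は晴れて\nいる。\n明日は雨が\n降るだろう\n\n 字下げされた文\nはここで終わる", the sketch returns three sentences: the first closes at "。" followed by a line feed (case 3), the second at the empty line (case 2), and the third, which began after a leading blank (case 1), is flushed at the end of the input.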

Next, the configuration and operation of the long-vowel mark / minus sign usage determination module 60, which receives each sentence output by the one-sentence acquisition module 40 and judges whether the long-vowel mark and the minus sign are used appropriately, are described in detail. FIG. 6 is a flowchart outlining the processing in module 60 (FIG. 3). On receiving a sentence cut out by the sentence-end determination routine described above, module 60 sets the analysis position to the head of the sentence and executes the long-vowel mark / minus sign usage determination routine shown in FIG. 6. The routine first determines whether the character at the analysis position is a long-vowel mark or a minus sign (steps S300, S305). If it is neither, the analysis position is advanced by one character (step S370), and the routine returns to step S300, repeating until the whole received sentence has been analyzed.

If, on the other hand, steps S300 and S305 find that the character at the analysis position is a long-vowel mark or a minus sign, the analysis position is moved back by one character (steps S310, S315), and the routine determines whether the mark and the character immediately before it agree in width, that is, whether both are full-width or both half-width (steps S320, S325). In other words, if the long-vowel mark or minus sign is full-width, the routine checks whether the preceding character is also full-width; if the mark is half-width, it checks whether the preceding character is also half-width. If the widths do not match, the character types are judged inconsistent, and the result is output to the detection result display module 90 (step S330).

If the mark and the preceding character are found to match in width (step S320 or S325), the routine next determines whether the character at the analysis position (at this point, the character immediately before the long-vowel mark or minus sign) is a kana character (steps S340, S345). If the character before a long-vowel mark is not a kana character, as in "kanji" + "ー", "alphanumeric" + "ー", or "symbol" + "ー", the combination is judged an inappropriate use of the long-vowel mark, and the result is output to the detection result display module 90 (step S350). If the character before a minus sign is judged to be a kana character (steps S305, S345), the string "kana" + "−" is present, which is judged an inappropriate use of the minus sign, and the result is output to the detection result display module 90 (step S355).

The analysis position, which was moved back one character in step S310 or S315, is then restored by advancing it one character (step S360); it is then advanced one more character (step S370), and the routine returns to step S300, repeating until the whole sentence has been analyzed. If the character before a long-vowel mark is a kana character, or the character before a minus sign is not a kana character, the combination of mark and preceding character is judged appropriate; the routine does nothing further, restores the analysis position (step S360), advances one character to the next analysis position (step S370), and repeats the above processing until the whole sentence has been analyzed (step S380).
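The FIG. 6 routine can be sketched compactly in Python. This is a hedged illustration under several assumptions: "full-width" is approximated with Unicode East Asian Width classes F and W, plus an explicit rule that U+2212 MINUS SIGN and U+FF0D count as full-width (they are double-byte characters in Japanese text); U+2212, U+FF0D and the ASCII hyphen-minus are all treated as minus signs; and a mark at the very start of a sentence, a case the flowchart description does not cover, is skipped. The names `check_long_minus`, `is_full_width` and `is_kana` are hypothetical. Instead of moving the analysis position back and forth (S310/S315, S360), the sketch simply reads the previous index.

```python
import unicodedata
from typing import List, Tuple

LONG_VOWEL_MARKS = {"\u30fc", "\uff70"}         # ー (full-width), ｰ (half-width)
MINUS_SIGNS = {"\u2212", "\uff0d", "-"}         # − and － (full-width), - (half-width)
FULL_WIDTH_MARKS = {"\u30fc", "\u2212", "\uff0d"}


def is_full_width(ch: str) -> bool:
    # Assumption: the listed marks plus East Asian Width F/W approximate "double-byte".
    return ch in FULL_WIDTH_MARKS or unicodedata.east_asian_width(ch) in ("F", "W")


def is_kana(ch: str) -> bool:
    cp = ord(ch)
    return (0x3041 <= cp <= 0x309F       # hiragana
            or 0x30A1 <= cp <= 0x30FF    # katakana block (includes ー)
            or 0xFF66 <= cp <= 0xFF9F)   # half-width katakana (includes ｰ, ﾞ, ﾟ)


def check_long_minus(sentence: str) -> List[Tuple[int, str, str]]:
    """Return (index, mark, reason) for each questionable mark (FIG. 6 sketch)."""
    findings: List[Tuple[int, str, str]] = []
    for i, ch in enumerate(sentence):                # S300/S305, advanced by S370
        is_long = ch in LONG_VOWEL_MARKS
        if not is_long and ch not in MINUS_SIGNS:
            continue
        if i == 0:
            continue    # no preceding character; not covered by FIG. 6
        prev = sentence[i - 1]                       # S310/S315: look one back
        if is_full_width(prev) != is_full_width(ch):                   # S320/S325
            findings.append((i, ch, "width mismatch"))                  # S330
        elif is_long and not is_kana(prev):                             # S340
            findings.append((i, ch, "long vowel mark after non-kana"))  # S350
        elif not is_long and is_kana(prev):                             # S345
            findings.append((i, ch, "minus sign after kana"))           # S355
    return findings
```

With these rules, "漢ー" is flagged as a long-vowel mark after a non-kana character, "かな−" as a minus sign after kana, and "Ａ-Ｂ" as a width mismatch, while "東京−新橋" and the half-width "ﾃﾞｰﾀ" pass.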

When the long-vowel mark / minus sign usage determination routine finishes, the results produced by module 60 (the combinations of these marks and their preceding characters that were judged inappropriate) are stored in a predetermined area. The detection result display module 90 then displays them on the CRT 26; this processing corresponds to the display processing (step S140) shown in FIG. 2. Module 90 displays the determination results of module 60 on the CRT 26 together with correction candidates. An example of this display is shown in FIG. 7, which shows the determination result and correction candidates displayed when a combination is judged inappropriate because the character immediately before a long-vowel mark is not a kana character.

The sentence proofreading apparatus of this embodiment can determine accurately and efficiently whether the long-vowel mark and the minus sign are used appropriately in a text. Moreover, proofreading these marks does not require words containing them to be registered in advance in the proofreading dictionary file stored on the hard disk 32, so the usage of the long-vowel mark and minus sign can be judged correctly even in new loanwords and the like. Misuse of either mark can also be detected in strings that are hard to regard as words, such as mathematical expressions. The storage capacity of the dictionary file therefore does not grow unnecessarily. Furthermore, because this embodiment judges the usage of each mark not only by whether the mark suits the type of the character before it but also by whether the two agree in full-width/half-width, combinations such as "Ａ-Ｂ" (full-width letters with a half-width minus sign) and "A−Ｂ" (a half-width letter followed by a full-width minus sign) are also judged inappropriate and flagged for correction. As a result, the sentence proofreading apparatus of this embodiment can detect full-width/half-width inconsistency among the characters and marks that make up a word containing a long-vowel mark or a minus sign.

In this embodiment, step S345 determines whether the preceding character is a kana character, but this check may be replaced, for example, by one that determines whether it is an alphanumeric character. In that case, the combination is judged correct if the character before the minus sign is alphanumeric, and inappropriate otherwise. The two approaches give the same result when the minus sign is preceded by a kana character or an alphanumeric character, but differ when it is preceded by a kanji: the former does not treat this as an inappropriate combination, whereas the latter does. The former configuration suits cases in which a kanji followed by a minus sign can legitimately occur among the characters that make up Japanese (for example, a place-name expression such as "東京−新橋間", "between Tokyo and Shimbashi"), and the latter suits cases in which it cannot.
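The difference between the two minus-sign policies can be shown with a small hypothetical helper, `minus_usage_ok`. It is a sketch, not the patent's code: the Unicode kana ranges and the use of NFKC folding (so that both full-width and half-width alphanumerics count as alphanumeric) are assumptions.

```python
import unicodedata


def is_kana(ch: str) -> bool:
    cp = ord(ch)
    return 0x3041 <= cp <= 0x309F or 0x30A1 <= cp <= 0x30FF or 0xFF66 <= cp <= 0xFF9F


def is_alnum(ch: str) -> bool:
    # NFKC folds full-width letters and digits (e.g. Ａ, ３) to ASCII, so both widths pass
    folded = unicodedata.normalize("NFKC", ch)
    return folded.isascii() and folded.isalnum()


def minus_usage_ok(prev: str, policy: str = "kana") -> bool:
    """Is a minus sign acceptable after `prev`? (hypothetical helper)"""
    if policy == "kana":     # embodiment (S345): reject only kana + minus
        return not is_kana(prev)
    if policy == "alnum":    # variant: accept only alphanumeric + minus
        return is_alnum(prev)
    raise ValueError(f"unknown policy: {policy}")
```

Both policies reject "か" and accept "3" before a minus sign; they disagree only on a kanji such as "京", which the kana policy accepts (suited to "東京−新橋間") and the alphanumeric policy rejects.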

Next, the processing of the katakana word notation determination module 70 and the English-letter word notation determination module 80 in the sentence proofreading apparatus of this embodiment is described together. These modules detect inconsistent notation of words composed only of katakana and of words composed only of alphanumeric characters. The detailed processing is shown in the flowchart of FIG. 8. The katakana and English word notation determination routine of FIG. 8 starts with the analysis position set to the head of the first sentence received from the one-sentence acquisition module 40. It first determines whether the character at the analysis position is katakana (step S400). If not, it determines whether the character is alphanumeric (step S405). Neither determination distinguishes full-width from half-width characters. If the character at the analysis position is neither katakana nor alphanumeric, the analysis position is advanced by one character (step S410), and the routine determines whether analysis is complete for all character strings received in turn via the one-sentence acquisition module 40 (all character strings in the range designated for proofreading) (step S470). If analysis is not complete, the routine returns to step S400 and repeats.

If the character at the analysis position is katakana (step S400), the katakana characters that continue from the analysis position are cut out as one word, producing a word composed only of katakana (step S420). In this processing, if the katakana detected in step S400 is full-width, the run of consecutive full-width katakana is taken as one word; if it is half-width, the run of consecutive half-width katakana is taken as one word. The word cut out in this way is called word A. The same processing is performed when the character at the analysis position is alphanumeric (steps S405, S425): a string consisting only of consecutive full-width alphanumerics, or only of consecutive half-width alphanumerics, is produced as word A.
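The word extraction of steps S400 to S425 can be sketched as follows. The Unicode ranges are assumptions standing in for the patent's single-byte/double-byte character classes: U+30A1 to U+30FA plus "ー" for full-width katakana, U+FF66 to U+FF9F for half-width katakana (which includes "ｰ", "ﾞ" and "ﾟ"), the full-width digit and letter blocks, and ASCII letters and digits. The names `char_kind` and `extract_words` are hypothetical.

```python
from typing import List, Optional, Tuple


def char_kind(ch: str) -> Optional[Tuple[str, str]]:
    """Return (script, width) for katakana/alphanumeric characters, else None."""
    cp = ord(ch)
    if 0x30A1 <= cp <= 0x30FA or cp == 0x30FC:     # full-width katakana and ー
        return ("katakana", "full")
    if 0xFF66 <= cp <= 0xFF9F:                      # half-width katakana incl. ｰ ﾞ ﾟ
        return ("katakana", "half")
    if 0xFF10 <= cp <= 0xFF19 or 0xFF21 <= cp <= 0xFF3A or 0xFF41 <= cp <= 0xFF5A:
        return ("alnum", "full")                    # full-width 0-9, A-Z, a-z
    if ch.isascii() and ch.isalnum():
        return ("alnum", "half")
    return None


def extract_words(text: str) -> List[str]:
    """Cut out runs of same-script, same-width characters as words (FIG. 8 sketch)."""
    words: List[str] = []
    i = 0
    while i < len(text):
        kind = char_kind(text[i])                   # S400 / S405
        if kind is None:
            i += 1                                  # S410
            continue
        j = i + 1
        while j < len(text) and char_kind(text[j]) == kind:
            j += 1                                  # S420 / S425: extend the run
        words.append(text[i:j])                     # word A
        i = j                                       # S460: advance past word A
    return words
```

Because the run must keep the same width, a mixed string such as "ＡＢ12" yields two words, "ＡＢ" and "12", which matches the description of taking a run of full-width or of half-width characters as one word.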

Next, word A is converted into the corresponding character string of the other width (full-width to half-width, or vice versa) to produce word B (step S430). For example, if word A is the full-width "エディター", the half-width "ｴﾃﾞｨﾀｰ" is produced as word B; if word A is the full-width "ＡＢ１２", word B is the half-width "AB12". Next, the routine determines whether word A has already been used between the head of the range designated for proofreading and the analysis position (step S440). If it has, the notation of word A matches the notation already used, and word A is judged to be written consistently. If word A has not yet been used (step S440), the routine determines whether word B, which differs from word A in width, has already been used (step S445). If word B has been used, a notation different from that of word A has already appeared, so the notation of word A is judged inconsistent with the earlier notation (step S455). This inconsistency is output, together with its details, to the detection result display module 90. If word B has not been used either, word A is the first occurrence of this notation in the proofreading target, so word A is stored in a predetermined area of the RAM 23 (step S450) for reference in subsequent determinations in steps S440 and S445.
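Steps S430 to S455 can be sketched in Python as follows, taking the word-A sequence as input. Python's standard library offers NFKC normalization, which turns half-width katakana and full-width alphanumerics into their other-width forms, but no function for the reverse katakana direction; the sketch therefore builds a reverse table from NFKC and splits voiced kana such as "デ" with NFD. Katakana with no half-width form (for example "ヶ") are left unchanged. The names `to_other_width` and `check_notation` are hypothetical, and a Python `set` stands in for the predetermined area of the RAM 23.

```python
import unicodedata
from typing import Dict, List, Tuple

# Reverse of NFKC for half-width katakana: full-width kana (and the combining
# voiced / semi-voiced marks U+3099 / U+309A) -> half-width form.
_TO_HALF: Dict[str, str] = {
    unicodedata.normalize("NFKC", chr(cp)): chr(cp) for cp in range(0xFF66, 0xFFA0)
}


def to_other_width(word: str) -> str:
    """Word A -> word B: the same word in the other width (S430)."""
    if not word:
        return word
    first = ord(word[0])
    if 0xFF66 <= first <= 0xFF9F or 0xFF10 <= first <= 0xFF5A:
        # half-width katakana or full-width alphanumerics: NFKC gives the other form
        return unicodedata.normalize("NFKC", word)
    if word.isascii():
        # half-width alphanumerics: shift into the full-width block U+FF01..U+FF5E
        return "".join(chr(ord(c) + 0xFEE0) for c in word)
    # full-width katakana: NFD splits デ into テ + U+3099, then map each piece
    return "".join(_TO_HALF.get(c, c) for c in unicodedata.normalize("NFD", word))


def check_notation(words: List[str]) -> List[Tuple[str, str]]:
    """Return (word A, earlier word B) pairs whose width differs (S440-S455)."""
    seen = set()                        # notations already used (stored in S450)
    mismatches: List[Tuple[str, str]] = []
    for a in words:
        if a in seen:                   # S440: same notation already used
            continue
        b = to_other_width(a)           # S430
        if b in seen:                   # S445: other-width notation already used
            mismatches.append((a, b))   # S455
        else:
            seen.add(a)                 # S450: first occurrence of this notation
    return mismatches
```

For the word sequence "エディター", "AB12", "ｴﾃﾞｨﾀｰ", "ＡＢ１２", "エディター", the sketch reports the half-width "ｴﾃﾞｨﾀｰ" and the full-width "ＡＢ１２" as inconsistent with the forms seen first; the final "エディター" matches an earlier notation and is not reported. As in the flowchart, an inconsistent word A is not stored, so a repeated inconsistent form is reported again.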

If step S440 finds that word A has already been used, or after the processing of step S450 or S455, the analysis position is advanced by the number of characters in word A (step S460), and the above processing is repeated until the whole proofreading target has been analyzed (step S470). When all character strings in the proofreading target have been analyzed, the detection result display module 90 displays the determination results of this routine, together with correction candidates, on the screen of the CRT 26. FIG. 9 shows an example of such a display: the full-width word "エディター" appears first in the proofreading target and the half-width word "ｴﾃﾞｨﾀｰ" appears later, so the notation is judged inconsistent, and the result and correction candidates are displayed on the screen.

The sentence proofreading apparatus of this embodiment can determine accurately and efficiently whether katakana words are written inconsistently in full-width and half-width form within the proofreading target. Moreover, because runs of consecutive katakana characters in the proofreading target are cut out as words, inconsistent notation can be detected not only in words absent from the dictionary but also in strings that are hard to regard as words, such as numbers. Words composed of katakana or alphanumeric characters need not be registered in advance in the proofreading dictionary stored on the hard disk 32. Furthermore, conventional proofreading required the user, when fixing the proofreading settings, to specify separately whether katakana, English letters, and so on should be unified to full-width or half-width; the sentence proofreading apparatus of this embodiment needs no such specification and judges full-width/half-width inconsistency word by word. It can therefore proofread in a way that accommodates users' diverse patterns of character usage.

Furthermore, because this embodiment stores the first-analyzed run of consecutive characters (katakana, for example) in the proofreading target as an already-seen word, together with whether it is full-width or half-width, it can reliably detect inconsistent notation even when, as can happen with OCR input, one of several occurrences of the same notation in a text (for example, the full-width string "エディター") has been misread as "ｴﾃﾞｨﾀｰ" during recognition.

One embodiment of the present invention has been described above, but the present invention is in no way limited to this embodiment and can, of course, be implemented in various forms without departing from the gist of the present invention.

FIG. 1 is a block diagram showing the hardware on which the sentence proofreading apparatus according to one embodiment of the present invention is implemented.
FIG. 2 is a flowchart illustrating the sentence proofreading routine executed in the embodiment of the present invention.
FIG. 3 is a block diagram showing the relationship among the modules that implement the sentence proofreading function of the sentence proofreading apparatus of the embodiment.
FIG. 4 is a flowchart showing the sentence-end determination routine executed in the one-sentence acquisition module 40.
FIG. 5 is an explanatory diagram showing an example of sentences acquired by the one-sentence acquisition module 40.
FIG. 6 is a flowchart showing the long-vowel mark and minus sign usage determination routine executed in the long-vowel mark / minus sign usage determination module 60.
FIG. 7 is an explanatory diagram showing an example display of long-vowel mark and minus sign usage determination results.
FIG. 8 is a flowchart showing the katakana word and English-letter word notation determination routine executed in the katakana word notation determination module 70 and the English-letter word notation determination module 80.
FIG. 9 is a reference diagram showing an example display of katakana word and English-letter word notation determination results.

Explanation of reference numerals

21 ... CPU
22 ... ROM
23 ... RAM
24 ... Keyboard
25 ... Keyboard interface
26 ... CRT display
27 ... CRTC
28 ... Printer
29 ... Printer interface
31 ... Bus
32 ... Hard disk
40 ... One-sentence acquisition module
50 ... Japanese analysis module
55 ... Typo and omission detection module
60 ... Long-vowel mark / minus sign usage determination module
70 ... Katakana word notation determination module
80 ... English-letter word notation determination module

Claims

A sentence proofreading apparatus for proofreading a text composed of characters that can make up Japanese, the apparatus comprising:
input means for sequentially inputting characters from the text;
word extraction means for extracting from the input characters, as a word, a character string in which either katakana or English letters, each as a 1-byte or 2-byte character, occur consecutively;
character type conversion means for converting each character of the extracted word into the other character type, that is, a 1-byte character into a 2-byte character and a 2-byte character into a 1-byte character;
converted string determination means for determining whether the character string converted by the character type conversion means has already appeared in the text; and
warning output means for outputting a warning when the converted string determination means determines that the word has already appeared in the text.
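The width-variant check in the claim above can be sketched in Python as follows. This is a minimal illustration under assumptions that the patent does not state. Text is handled as Unicode strings, so half-width and full-width code points stand in for 1-byte and 2-byte characters. "Already appeared" is taken to mean "matches a word extracted earlier in the text." The names `WORD_RE`, `FULL_TO_HALF`, `to_other_width`, and `check_width_variants` are hypothetical.

```python
import re
import unicodedata

# Assumption: half-width forms stand in for 1-byte characters, full-width for 2-byte.
# A "word" is a run of katakana (either width) or a run of Latin letters (either width).
# The katakana middle dot (U+30FB) is deliberately excluded.
WORD_RE = re.compile(
    r"[\u30A1-\u30FA\u30FC\uFF66-\uFF9F]+"    # katakana: full-width and half-width
    r"|[A-Za-z\uFF21-\uFF3A\uFF41-\uFF5A]+"   # Latin letters: ASCII and full-width
)


def _build_full_to_half() -> dict:
    """Map each full-width katakana to its half-width spelling (1 or 2 chars)."""
    table = {}
    for cp in range(0xFF66, 0xFFA0):
        half = chr(cp)
        table[unicodedata.normalize("NFKC", half)] = half
    # Voiced / semi-voiced kana (e.g. ガ, パ) are written with two half-width characters.
    for cp in range(0xFF66, 0xFF9E):
        for mark in ("\uFF9E", "\uFF9F"):
            full = unicodedata.normalize("NFKC", chr(cp) + mark)
            if len(full) == 1:
                table[full] = chr(cp) + mark
    return table


FULL_TO_HALF = _build_full_to_half()


def _flip_char(c: str) -> str:
    """Convert one character to the other width; leave it unchanged if there is no counterpart."""
    if "A" <= c <= "Z" or "a" <= c <= "z":
        return chr(ord(c) + 0xFEE0)               # ASCII letter -> full-width letter
    if "\uFF21" <= c <= "\uFF3A" or "\uFF41" <= c <= "\uFF5A":
        return chr(ord(c) - 0xFEE0)               # full-width letter -> ASCII letter
    if "\uFF66" <= c <= "\uFF9F":
        return unicodedata.normalize("NFKC", c)   # half-width kana -> full-width
    return FULL_TO_HALF.get(c, c)                 # full-width kana -> half-width


def to_other_width(word: str) -> str:
    flipped = "".join(_flip_char(c) for c in word)
    # NFC recombines kana with the combining voiced marks produced from ﾞ / ﾟ.
    return unicodedata.normalize("NFC", flipped)


def check_width_variants(text: str) -> list:
    """Return (offset, word, variant) for each word whose other-width form appeared earlier."""
    seen = set()
    warnings = []
    for m in WORD_RE.finditer(text):
        word = m.group()
        variant = to_other_width(word)
        if variant != word and variant in seen:
            warnings.append((m.start(), word, variant))
        seen.add(word)
    return warnings


if __name__ == "__main__":
    sample = "ｺﾝﾋﾟｭｰﾀを使う。次にコンピュータとＰＣとPCを比べる。"
    for pos, word, variant in check_width_variants(sample):
        print(f"offset {pos}: {word} (already written as {variant})")
```

On the sample, the sketch flags コンピュータ because ｺﾝﾋﾟｭｰﾀ appears earlier, and it flags PC because ＰＣ appears earlier. The sketch compares against earlier whole words, not arbitrary substrings. This is a design choice, and it avoids flagging a full-width ＡＢ merely because ABC occurs earlier.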

The sentence proofreading apparatus according to claim 2,
further comprising:
detection means for detecting when a sequence of characters input by the input means matches any of the following combinations: consecutive line feeds, a line feed followed by a space, or a full stop (。) followed by a line feed;
sentence end determination means for determining that the sequence is the end of a sentence when the detection means detects that it matches any of these combinations; and
continuous input means for causing the input means to input characters continuously, skipping at least line feeds, at every location other than those determined to be sentence ends by the sentence end determination means.
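The sentence-end rule in the claim above can be illustrated with a short Python sketch. The sketch makes these assumptions:

* Line feeds are plain `\n`.
* Both an ASCII space and the full-width space U+3000 count as a "space."
* Leading indentation is stripped from each sentence.

It covers only the three combinations named in the claim. The function name `split_sentences` is hypothetical.

```python
from typing import List

# Assumption: both ASCII and full-width (U+3000) spaces count as a "space".
SPACES = (" ", "\u3000")


def split_sentences(text: str) -> List[str]:
    """Split text at the claimed sentence ends; join all other line breaks."""
    sentences: List[str] = []
    buf: List[str] = []

    def flush() -> None:
        sentence = "".join(buf).strip(" \u3000")  # drop indentation spaces
        if sentence:
            sentences.append(sentence)
        buf.clear()

    i = 0
    while i < len(text):
        c = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if c == "。" and nxt == "\n":
            # Full stop followed by a line feed: sentence end.
            buf.append(c)
            flush()
            i += 2
            continue
        if c == "\n":
            if nxt == "\n" or nxt in SPACES:
                # Consecutive line feeds, or a line feed followed by a space: sentence end.
                flush()
            # Any other line feed is skipped so the sentence continues across lines.
            i += 1
            continue
        buf.append(c)
        i += 1
    flush()
    return sentences


if __name__ == "__main__":
    doc = "これは一行目の\n続きです。\n　次の段落の文で\nある。\n\n最後の文"
    for s in split_sentences(doc):
        print(s)
```

Line feeds that match none of the three combinations are dropped. As a result, a sentence wrapped across lines, such as これは一行目の続きです。 in the sample, is passed on as a single unit.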

A sentence proofreading method for proofreading a text composed of characters that can make up Japanese, the method comprising:
sequentially inputting characters from the text;
extracting from the input characters, as a word, a character string in which either katakana or English letters, each as a 1-byte or 2-byte character, occur consecutively;
converting each character of the extracted word into the other character type, that is, a 1-byte character into a 2-byte character and a 2-byte character into a 1-byte character;
determining whether the converted character string has already appeared in the text; and
outputting a warning when it is determined that the word has already appeared in the text.

The sentence proofreading method according to claim 3,
further comprising:
detecting when a sequence of input characters matches any of the following combinations: consecutive line feeds, a line feed followed by a space, or a full stop (。) followed by a line feed;
determining that the sequence is the end of a sentence when it is detected to match any of these combinations; and
inputting characters continuously, skipping at least line feeds, at every location other than those determined to be sentence ends.