JPS5899829A - Erroneous character detection and correction backing device - Google Patents

Erroneous character detection and correction backing device

Info

Publication number
JPS5899829A
Authority
JP
Japan
Prior art keywords
character
characters
dictionary
erroneous
contents
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP56198941A
Other languages
Japanese (ja)
Other versions
JPH0248938B2 (en)
Inventor
Tamaki Saito
斎藤 珠喜
Toshiaki Sugimura
利明 杉村
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP56198941A
Publication of JPS5899829A
Publication of JPH0248938B2
Granted

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)
  • Processing Or Creating Images (AREA)

Abstract

PURPOSE: To automate part of the work of detecting and correcting errors, and thereby raise working efficiency, by detecting candidate erroneous characters from whether the characters in a character string of a Japanese sentence can be connected, and by outputting, as candidate corrected characters, those characters similar to a candidate that can be connected to the preceding or following character. CONSTITUTION: Two or more consecutive characters of a Japanese character string from an input unit 1 are stored temporarily in a register 2, and the two characters held in register 2 are stored in a buffer 3a of a control unit 3. A character concatenation dictionary 5 stores whether two consecutive characters of Japanese text can be concatenated. A collation circuit 4 collates the contents of buffer 3a with dictionary 5 and supplies the result to control unit 3. Depending on the result, an error code generation circuit 7-1, a candidate start code generation circuit 7-2, and a candidate end code generation circuit 7-3 of a code generation circuit 7 are driven. Characters that are similar to the candidate erroneous character and can be connected to the preceding or following character are then output to an output unit 6 as candidate corrections, so that error detection and correction are supported automatically and working efficiency is raised.

Description

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a device that supports the detection and correction of erroneous characters contained in character string data in a Japanese information processing system.

Conventionally, language processing systems commercialized as, for example, word processors use a dictionary that stores words as spelled character strings. The character string of the input data (sentences, words, etc.) is compared and collated with the character strings of the words stored in the dictionary to detect errors in the input string, and when the input string contains an error, the error is corrected by replacing it with the character(s) of a word stored in the dictionary. For example, an English word processor collates each input string, delimited word by word, against the dictionary words one character at a time, regards the dictionary word with the largest number of matching characters (including character order) as the intended word, and corrects the erroneous characters by treating the non-matching characters as errors and replacing them with the corresponding characters of the dictionary word. For Japanese text, however, the language itself poses various problems for computer processing: there are many kinds of kanji; many words consist of two kanji, which makes it difficult to judge which character is wrong (in a two-character word, one cannot tell which of the two is the error); and text is not written with spaces between words, which makes it difficult to segment words. Because of these problems it has been difficult to detect and correct errors in the data automatically, and checking has had to be done by hand.

To eliminate the above drawbacks of the prior art, the present invention detects candidates for erroneous characters from whether adjacent characters in a character string of a Japanese sentence can be connected, and outputs, as candidates for the corrected character, those characters similar to the erroneous-character candidate that can be connected to the preceding and/or following character. Its purpose is to improve the efficiency of detecting and correcting erroneous characters in Japanese text. The invention is described in detail below with reference to the drawings.

FIG. 1 shows an embodiment of the present invention. In the figure, 1 is an input unit; 2 is a register that temporarily stores two characters of Japanese data; 3 is a control unit having a buffer 3a that temporarily stores two characters of data; 4 is a collation circuit; 5 is a storage unit that stores a character concatenation dictionary whose content is whether two different characters of Japanese text can be connected (hereinafter simply called the character concatenation dictionary for brevity); 6 is an output unit; 7 is a code generation circuit consisting of an error code generation circuit 7-1, a candidate start code generation circuit 7-2, and a candidate end code generation circuit 7-3; 8 is a search circuit; and 9 is a storage unit that stores a character classification dictionary in which, for each character, similar characters are classified according to their features (hereinafter simply called the character classification dictionary). Register 2 is a shift register with a storage area for two characters of data. Under the control of control unit 3, it temporarily stores the character data entered from input unit 1 and sends it to output unit 6 one character at a time.

The content of character concatenation dictionary 5 is, for example, as shown in FIG. 2. The first of two different characters corresponds to a character arranged along the rows, and the second to a character arranged along the columns. The value "1" is given at the intersection of two characters that form a concatenation found in Japanese words, and the value "0" at the intersection of two characters that form a concatenation not found in Japanese words. Instead of the value "1", a value representing how frequently the concatenation is used can also be stored. Although the drawing shows only combinations of kanji with kanji, the dictionary actually also covers connectability between kana and kanji, between kana and kana, and so on.
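As a rough illustration of the table in FIG. 2, the following Python sketch stores connectability as a row/column lookup. The character set, the sample pairs, and the names `build_table` and `can_connect` are assumptions made for this example and are not taken from the patent.

```python
# Hypothetical toy data: these pairs are illustrative, not the contents of FIG. 2.
CHARS = ["日", "本", "語", "文", "字"]
VALID_PAIRS = {("日", "本"), ("本", "語"), ("文", "字")}


def build_table(chars, valid_pairs):
    """Build table[first][second]: 1 if the pair can be connected, else 0."""
    return {a: {b: int((a, b) in valid_pairs) for b in chars} for a in chars}


def can_connect(table, first, second):
    """Read the intersection of row `first` and column `second`.

    Characters missing from the table are treated as not connectable (0),
    which is an assumption of this sketch.
    """
    return table.get(first, {}).get(second, 0)


TABLE = build_table(CHARS, VALID_PAIRS)
```

Storing a usage frequency instead of 1, as the text permits, would only change the value kept at each intersection.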

Error code generation circuit 7-1, candidate start code generation circuit 7-2, and candidate end code generation circuit 7-3 are driven by control unit 3 and output to output unit 6 the error code "/" (slash), the candidate start code "〔" (opening bracket), and the candidate end code "〕" (closing bracket), respectively. The error code may instead be a two-byte code defined by the system itself in an unassigned area of the JIS C 6226 code table.

The content of character classification dictionary 9 is, for example, one of the following. The first option combines a table such as FIG. 3, in which the headword kanji are arranged in JIS C 6226 code order and each is followed by the kanji that share its reading; a table such as FIG. 4, in which the headword hiragana are arranged in a-i-u-e-o order and each is followed by the hiragana of similar shape; and further tables (not shown) listing katakana, alphabetic characters, etc. of similar shape. The second option combines a table such as FIG. 5, in which each headword kanji is followed by the kanji of similar shape, with the table of FIG. 4 and the like. The third option combines all of these.
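The classification dictionary can be pictured as a mapping from a headword character to its similar characters, with the reading-based and shape-based tables merged as needed. The sketch below is only an illustration: the entries and the names `combine` and `similar_chars` are assumptions, not data from FIGS. 3 to 5.

```python
# Illustrative entries only (assumed, not copied from the patent's figures).
BY_READING = {"機": ["器", "期"], "器": ["機", "期"]}   # kanji sharing the reading "ki"
BY_SHAPE_KANJI = {"末": ["未"], "機": ["械"]}           # kanji of similar shape
BY_SHAPE_KANA = {"は": ["ほ"], "ぬ": ["め"]}            # hiragana of similar shape


def combine(*tables):
    """Merge several headword tables, keeping order and dropping duplicates."""
    merged = {}
    for table in tables:
        for head, similar in table.items():
            bucket = merged.setdefault(head, [])
            for ch in similar:
                if ch not in bucket:
                    bucket.append(ch)
    return merged


def similar_chars(dictionary, head):
    """Return the characters listed under `head`, or an empty list if absent."""
    return dictionary.get(head, [])


DICTIONARY = combine(BY_READING, BY_SHAPE_KANJI, BY_SHAPE_KANA)
```

Which tables to merge would depend on the input device, since reading-based confusions and shape-based confusions arise from different kinds of input.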

Next, the operation is described. First, of the Japanese character string data entered from input unit 1, in which kanji, kana, punctuation marks, etc. are mixed, the first and second items are loaded into register 2 as the first and second characters. Control unit 3 writes the two characters in register 2 unchanged into buffer 3a and also sends them to collation circuit 4. Collation circuit 4 uses the first and second characters to access the corresponding row and column of character concatenation dictionary 5 and sends the value at their intersection to control unit 3. If the value from collation circuit 4 is "1" (that is, the two characters in register 2 connect correctly as Japanese), control unit 3 sends the first character in register 2 unchanged to output unit 6, moves the second character into the area that held the first character, and loads the next character from input unit 1 into register 2. The new first and second characters in register 2 are then written into buffer 3a, and the concatenation is collated in the same way.

If the value from collation circuit 4 is "0" (that is, the two characters in register 2 do not connect as Japanese), control unit 3 causes error code generation circuit 7-1 to output the error code "/" to output unit 6, then sends the first character in register 2 to output unit 6 and loads the next character from input unit 1. Here, however, the contents of buffer 3a are left unchanged.
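The two branches described so far (value "1" and value "0") can be sketched as a loop over a two-slot register. This is a software analogy under stated assumptions, not the circuit itself: `detect` and `can_connect` are hypothetical names, and passing the final character through unchecked is an assumption, since the text does not say how the end of the input is handled.

```python
def detect(chars, can_connect, error_code="/"):
    """Emit each character, preceded by error_code when it cannot connect to the next one."""
    stream = iter(chars)
    # Two-slot "register": first character and second character.
    register = [next(stream, None), next(stream, None)]
    out = []
    while register[0] is not None:
        first, second = register
        if second is not None and not can_connect(first, second):
            out.append(error_code)          # value "0": mark the pair as suspicious
        out.append(first)                   # the first character is always sent to the output
        register = [second, next(stream, None)]  # shift and load the next character
    return "".join(out)


# Hypothetical connectable pairs for the example.
VALID = {("日", "本"), ("本", "語")}
marked = detect("日木語", lambda a, b: (a, b) in VALID)
```

With these sample pairs, the mistyped string "日木語" yields "/日/木語": both pairs that involve the wrong character are flagged, which matches the situation the description later illustrates with FIG. 7.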

Next, control unit 3 causes candidate start code generation circuit 7-2 to output the candidate start code "〔" to output unit 6 and sends the first character in buffer 3a to search circuit 8. Search circuit 8 searches character classification dictionary 9 using that first character as the headword and sends the similar characters listed for it to control unit 3. Control unit 3 takes each similar character in turn as the first character, combines it with the second character in buffer 3a, and sends the resulting two characters to collation circuit 4. Collation circuit 4 accesses character concatenation dictionary 5, checks whether each pair can be connected, and outputs the value "0" or "1" to control unit 3. Based on these values, control unit 3 sends to output unit 6, in order, only those similar characters that can occur immediately before the second character in buffer 3a, and then causes candidate end code generation circuit 7-3 to send the candidate end code "〕" to output unit 6.

Next, control unit 3 causes candidate start code generation circuit 7-2 to send the candidate start code "〔" to output unit 6, sends the second character in buffer 3a to search circuit 8, and has it search character classification dictionary 9 with that second character as the headword. Search circuit 8 sends the characters similar to the second character in buffer 3a from character classification dictionary 9 to control unit 3. Control unit 3 takes each of them in turn as the second character, combines it with the first character in buffer 3a, and sends the resulting two characters to collation circuit 4, which checks connectability against character concatenation dictionary 5. According to the result, control unit 3 sends to output unit 6, in order, only those characters that can be connected immediately after the first character in buffer 3a, and then causes candidate end code generation circuit 7-3 to send the candidate end code "〕" to output unit 6.
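The two candidate searches can be summarized as two filters over the classification dictionary. The following is a minimal sketch; `candidates_for_pair` and the sample data are assumptions made for illustration.

```python
def candidates_for_pair(first, second, similar, valid_pairs):
    """Return (replacements for `first`, replacements for `second`).

    A replacement for `first` must be similar to it and connectable before `second`;
    a replacement for `second` must be similar to it and connectable after `first`.
    """
    for_first = [c for c in similar.get(first, []) if (c, second) in valid_pairs]
    for_second = [c for c in similar.get(second, []) if (first, c) in valid_pairs]
    return for_first, for_second


# Hypothetical data: similar-shape characters and connectable pairs.
SIMILAR = {"日": ["目", "白"], "木": ["本", "未"]}
VALID_PAIRS = {("日", "本"), ("目", "木")}

for_first, for_second = candidates_for_pair("日", "木", SIMILAR, VALID_PAIRS)
```

Here `for_first` is `["目"]` and `for_second` is `["本"]`: the sketch, like the device, does not decide which of the two characters is wrong and leaves both hypotheses open.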

Next, control unit 3 writes the next two characters held in register 2 into buffer 3a, collates the concatenation as before, and repeats the above steps.

FIG. 6 is a flowchart of the processing described in the above embodiment.

FIG. 7 shows an example of an input character string and the corresponding output character string in the above embodiment. In the figure, ai and aij (i, j ≥ 1) denote individual characters of Japanese text.

The example shows the case where the second character a2 of the input string is erroneous (that is, the concatenations a1 a2 and a2 a3 do not exist in Japanese). (The underline under a2 is added to show that it is the erroneous character and does not appear in the actual string.) In the output string, 〔a11 a12 ...〕 is the set of characters similar to a1 after which a2 can be connected, that is, the candidates that could replace a1 if a2 is taken to be correctly entered and a1 to be erroneous. 〔a21 a22 ...〕 is the set of characters similar to a2 before which a1 can be connected, that is, the candidates that could replace a2 if a1 is taken to be correctly entered and a2 to be erroneous.

The same applies to 〔a'21 a'22 ...〕 and 〔a31 a32 ...〕. However, 〔a'21 a'22 ...〕 is the set of characters to which a3 can be connected, and in general it differs from 〔a21 a22 ...〕.
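To make the output format of FIG. 7 concrete, the sketch below runs the whole procedure on a short string and produces the marked-up output. It is an illustration under assumptions: the function name `annotate`, the sample dictionaries, and the pass-through of the final character are not taken from the patent.

```python
def annotate(text, valid_pairs, similar, err="/", start="〔", end="〕"):
    """Mark each non-connectable pair and append the two candidate sets after its first character."""
    out = []
    for i in range(len(text) - 1):
        first, second = text[i], text[i + 1]
        if (first, second) in valid_pairs:
            out.append(first)               # pair is fine: pass the first character through
            continue
        out.append(err + first)             # error code, then the first character of the pair
        for_first = [c for c in similar.get(first, []) if (c, second) in valid_pairs]
        for_second = [c for c in similar.get(second, []) if (first, c) in valid_pairs]
        out.append(start + "".join(for_first) + end)
        out.append(start + "".join(for_second) + end)
    if text:
        out.append(text[-1])                # assumption: the last character is emitted as-is
    return "".join(out)


# Hypothetical data: "木" was entered instead of "本".
VALID_PAIRS = {("日", "本"), ("本", "語")}
SIMILAR = {"日": ["目"], "木": ["本", "未"], "語": ["話"]}

result = annotate("日木語", VALID_PAIRS, SIMILAR)
```

For this input `result` is "/日〔〕〔本〕/木〔本〕〔〕語". The two non-empty candidate sets both propose replacing 木 with 本, which corresponds to the a2-is-wrong case in the figure; an operator would still make the final decision.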

The erroneous-character candidates detected in this way, and the candidate correction characters to replace them, are generally judged correct or incorrect by a human, and the erroneous characters are then corrected. The display device, correct/incorrect indication device, character designation device, character input device, etc. used for this purpose (not shown) are placed after the device of the invention (after output unit 6 in the embodiment).

The content of character concatenation dictionary 5 is as described above. If, for example, the entries ("words") of a Japanese-language dictionary are used as the source text when building it, the result is a dictionary of connectability conditions within a single word.

To obtain a dictionary of connectability conditions for general running text, the dictionary can be built from the kind of input character string data on which the device will be used to detect and correct erroneous characters. However, when the vocabulary of the data used to build character concatenation dictionary 5 differs from the vocabulary of the data on which detection and correction support is actually performed (for example, different fields such as legal and medical terminology), not every detected error is necessarily a real error.
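Building the concatenation dictionary from reference text amounts to collecting the adjacent character pairs that occur in it. The sketch below counts pairs, so it also covers the frequency variant mentioned earlier; `count_pairs`, `connectable`, the `min_count` threshold, and the word lists are assumptions for illustration.

```python
from collections import Counter


def count_pairs(texts):
    """Count every adjacent character pair found in the reference texts."""
    counts = Counter()
    for text in texts:
        counts.update(zip(text, text[1:]))
    return counts


def connectable(counts, first, second, min_count=1):
    """Treat a pair as connectable if it was seen at least min_count times."""
    return counts[(first, second)] >= min_count


# Hypothetical word list from one field (legal terms).
LEGAL = count_pairs(["契約書", "契約"])
```

A correct medical term such as "診断" would be reported as suspicious by a dictionary built only from this legal word list, which is the domain-mismatch caveat the text describes.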

When the part of character classification dictionary 9 that covers kanji is classified by reading, as in FIG. 3, the device is effective for detecting and correcting errors in character string data entered with an input device in which each kanji is entered using its reading as the cue, such as a tablet-type input device with a 50-sound (gojūon) layout. When it is classified by shape, as in FIG. 5, the device is effective for detecting and correcting errors in character string data entered with an input device in which each kanji is entered using its shape as the cue, such as kanji OCR.

The error code, candidate start code, candidate end code, and so on are not all strictly necessary, and fewer of them may be used. (For example, the error code "/" is included to simplify the explanation and may be omitted.)

The accuracy of erroneous-character detection and correction can be improved by checking concatenation over longer ranges of three, four, or more characters rather than only two. This is effective for checking how function words, pronouns, formal nouns, and the like connect with function words.
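The gain from a longer window can be seen in a small sketch that checks n-character runs against those observed in reference text. The names `build_ngrams` and `unseen_ngrams` and the reference strings are assumptions for illustration; the patent does not specify how a longer-range dictionary would be stored.

```python
def build_ngrams(texts, n):
    """Collect every run of n consecutive characters seen in the reference texts."""
    return {t[i:i + n] for t in texts for i in range(len(t) - n + 1)}


def unseen_ngrams(text, known, n):
    """Return start positions of n-character windows that never occur in the reference texts."""
    return [i for i in range(len(text) - n + 1) if text[i:i + n] not in known]


# Hypothetical reference text.
REFERENCE = ["ではない", "はいる"]
KNOWN_2 = build_ngrams(REFERENCE, 2)
KNOWN_3 = build_ngrams(REFERENCE, 3)

pairs_flagged = unseen_ngrams("ではいる", KNOWN_2, 2)
triples_flagged = unseen_ngrams("ではいる", KNOWN_3, 3)
```

In this example every two-character pair of "ではいる" appears somewhere in the reference text, so the pair check flags nothing, while the three-character check flags the window starting at position 0.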

As described above, the invention detects candidates for erroneous characters in the character string data of Japanese text and also outputs candidate characters to replace them. It can therefore take over part of the data checking (so-called verification) of Japanese text, which has so far depended entirely on manual work, improving both the efficiency of that work and the accuracy of erroneous-character detection and correction.

The invention can be applied to all so-called Japanese input devices, such as Japanese word processors, kanji OCR, and Japanese input devices using speech recognition. It can also be applied to detecting and correcting erroneous characters in data that has already been entered and completed (for example, data stored in a database).

BRIEF DESCRIPTION OF THE DRAWINGS

The drawings serve to explain the present invention. FIG. 1 is a block diagram of an erroneous character detection and correction support device showing one embodiment of the invention; FIG. 2 is an explanatory diagram showing an example of the contents of the character concatenation dictionary; FIGS. 3, 4 and 5 are explanatory diagrams showing examples of the contents of the character classification dictionary; FIG. 6 is a flowchart of the processing in the device of FIG. 1; and FIG. 7 is an explanatory diagram showing an example of an input character string and an output character string.

1: input unit; 2: register; 3: control unit; 4: collation circuit; 5: character concatenation dictionary; 6: output unit; 7: code generation circuit; 8: search circuit; 9: character classification dictionary.

Patent applicant: Nippon Telegraph and Telephone Public Corporation
Agent: patent attorney 吉田 精孝

[FIGS. 1 to 7 are not reproduced. Recoverable labels: FIGS. 3 to 5 have the columns "headword" and "corresponding characters"; FIG. 7 shows the input character string a1 a2 a3 a4 ...]

Claims (1)

[Scope of Claims]

An erroneous character detection and correction support device comprising:
first means for temporarily storing two or more different characters of a character string of a Japanese sentence;
second means for storing a character concatenation dictionary whose content is whether at least two different characters of Japanese text can be connected;
third means for storing a character classification dictionary in which, for each character, similar characters are classified according to their features;
fourth means for collating the contents of the first means with the contents of the second means to detect a candidate erroneous character; and
fifth means for retrieving from the third means characters similar to the candidate erroneous character, collating them with the contents of the second means, and selecting and outputting only those characters that can be connected to the other character(s) in the first means.
JP56198941A 1981-12-10 1981-12-10 Erroneous character detection and correction backing device Granted JPS5899829A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP56198941A JPS5899829A (en) 1981-12-10 1981-12-10 Erroneous character detection and correction backing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP56198941A JPS5899829A (en) 1981-12-10 1981-12-10 Erroneous character detection and correction backing device

Publications (2)

Publication Number Publication Date
JPS5899829A (en) 1983-06-14
JPH0248938B2 (en) 1990-10-26

Family

ID=16399508

Family Applications (1)

Application Number Title Priority Date Filing Date
JP56198941A Granted JPS5899829A (en) 1981-12-10 1981-12-10 Erroneous character detection and correction backing device

Country Status (1)

Country Link
JP (1) JPS5899829A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01180666A (en) * 1988-01-13 1989-07-18 Sanyo Electric Co Ltd Connectability testing device
JPH01291366A (en) * 1988-05-18 1989-11-22 Nec Corp Grammatical case recognizing system

Also Published As

Publication number Publication date
JPH0248938B2 (en) 1990-10-26

Similar Documents

Publication Publication Date Title
US7584093B2 (en) Method and system for generating spelling suggestions
US5161245A (en) Pattern recognition system having inter-pattern spacing correction
JPH07325828A (en) Grammar checking system
JPH07325824A (en) Grammar checking system
JPS61217863A (en) Electronic dictionary
Kaur et al. Spell Checking and Error Correcting System for text paragraphs written in Punjabi Language using Hybrid approach
JPS5899829A (en) Erroneous character detection and correction backing device
JP2870375B2 (en) Sentence correction device
WO2022059556A1 (en) Document retrieval device
JP2908460B2 (en) Error recognition correction method and apparatus
Samsuri et al. A comparison of distributed, pam, and trie data structure dictionaries in automatic spelling correction for indonesian formal text
JPS60164864A (en) Device for processing data
JP3390567B2 (en) Typo correction device
JP3045886B2 (en) Character processing device with handwriting input function
JP3387421B2 (en) Word input support device and word input support method
JPH087046A (en) Document recognition device
JPS62212871A (en) Sentence reading correcting device
JPH026098B2 (en)
JP2693489B2 (en) Japanese sentence error detection method
JPH1078953A (en) Address notation conversion and check method
JPH02136959A (en) Extracting device for correction candidate of japanese sentence
JPH0540853A (en) Post-processing system for character recognizing result
JPH0484261A (en) Error notation retrieval system
JPH0378081A (en) Word processor
JPS6029823A (en) Adaptive type symbol string conversion system