JP3274014B2

JP3274014B2 - Character recognition device and character recognition method

Info

Publication number: JP3274014B2
Application number: JP05736294A
Authority: JP
Inventors: 里志江村; 一郎中尾; 磨理子竹之内; 穂高倉
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1994-03-28
Filing date: 1994-03-28
Publication date: 2002-04-15
Anticipated expiration: 2017-04-15
Also published as: JPH07271921A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Industrial Field of Application] The present invention relates to a character recognition device and a character recognition method.

[0002]

[Prior Art] In recent years, systems have been researched and developed, and some have been put to practical use, in which a printed or handwritten document is read by photoelectric conversion or the like and first converted into image data holding bit information for each pixel, after which the characters in this image data are recognized in order to save labor in data entry, to translate the text into a foreign language, or to read it aloud for blind and visually impaired persons.

[0003] The present invention relates to a character recognition device used in such systems. Conventionally, character recognition devices in such systems have performed post-processing that corrects the character recognition result using a word dictionary in order to improve recognition accuracy. Post-processing that does not take the appearance frequency of words into account determines the word from the character recognition evaluation values alone, so its recognition rate is unsatisfactory. For this reason, post-processing that uses a word dictionary containing word appearance frequencies has been proposed, as shown for example in JP-A-1-255989, JP-A-3-198180, and JP-A-4-256194. A character recognition device using such a word dictionary is described below with reference to FIGS. 7 and 8. FIG. 7 is a configuration diagram of a conventional character recognition device. In the figure, 701 is character feature extraction means, 702 is a character recognition dictionary, 703 is character recognition means, 704 is a word dictionary, 705 is word search means, 706 is word determination means, 707 is word correction means, and 708 is word dictionary update means. FIG. 8 shows an example of processing in this character recognition device. The character feature extraction means 701 cuts out character images from image data in which a character string is written and extracts a feature pattern from each cut-out character image. The character recognition dictionary 702 stores pairs of a character code and an identification pattern used to identify the character corresponding to that character code. The character recognition means 703 compares the feature pattern extracted by the character feature extraction means 701 with the identification patterns registered in the character recognition dictionary 702 and outputs recognition candidate character data, each item consisting of a matching character code and a recognition evaluation value indicating its certainty, up to a predetermined number of items per cut-out character being recognized. The word dictionary 704 stores pairs of the character codes of a word that can appear in a character string and its appearance frequency. The word search means 705 searches the word dictionary 704 for words formed by combinations of the recognition candidate characters output by the character recognition means 703, and outputs the words stored in the word dictionary together with their appearance frequencies. The word determination means 706 outputs, from among the words obtained by the word search means 705, the word with the highest appearance frequency. The word correction means 707 allows the user to correct the word output by the word determination means 706 to the correct word. The word dictionary update means 708 updates the appearance frequency registered in the word dictionary 704 for the word output by the word determination means 706 if no correction was made with the word correction means 707, or for the corrected word if a correction was made.

[0004] Next, the processing from the word search onward in the character recognition device configured as above is described with reference to FIG. 8. FIG. 8(a) shows an example of the output of the character recognition means 703 for image data in which the word "松居" (Matsui) is written. FIG. 8(b) shows examples of the words searched for in the word dictionary. FIG. 8(c) shows examples of the words stored in the word dictionary 704 and their appearance frequencies. Suppose that the character recognition means 703 outputs the recognition candidate characters shown in FIG. 8(a). For the first character "松", the recognition candidate characters "松", "林", and "拡" are obtained, with recognition evaluation values of 62, 70, and 72, respectively. Here, as in the embodiments described later, a smaller recognition evaluation value indicates greater certainty. For the second character "居", the recognition candidate characters "居", "届", and "尾" are obtained, with recognition evaluation values of 58, 64, and 73, respectively. The word search means 705 checks whether the words formed by combining the candidates "松", "林", "拡" with the candidates "居", "届", "尾" shown in FIG. 8(a), that is, the nine words shown in FIG. 8(b) such as "松居", "松届", "松尾", and "林居", are registered in the word dictionary 704. Suppose that, as a result, only the word "松居" with an appearance frequency of 10 and the word "松尾" (Matsuo) with an appearance frequency of 500 are found. Of the words "松居" and "松尾" obtained by the word search means 705, the word determination means 706 determines the word with the higher appearance frequency, "松尾", to be the correct word and outputs it as the character recognition result. Using the word correction means 707, the user corrects the word "松尾", which the word determination means 706 erroneously determined and output, to the correct word "松居". The word dictionary update means 708 updates the appearance frequency of the corrected word "松居" stored in the word dictionary 704 from 10 to 11.
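The conventional flow in the FIG. 8 example can be sketched roughly as follows. The candidate values, the dictionary entries, and the final frequency update come from the example above; the function names and data layout are illustrative assumptions, not taken from the patent.

```python
from itertools import product

# Recognition candidates from the FIG. 8(a) example: (character, recognition evaluation value).
# A smaller evaluation value means greater certainty.
candidates = [
    [("松", 62), ("林", 70), ("拡", 72)],  # first character position
    [("居", 58), ("届", 64), ("尾", 73)],  # second character position
]

# Word dictionary from the FIG. 8(c) example: word -> appearance frequency.
word_dictionary = {"松居": 10, "松尾": 500}


def search_words(candidates, dictionary):
    """Combine candidates position by position and keep only registered words."""
    found = {}
    for combo in product(*candidates):
        word = "".join(char for char, _ in combo)
        if word in dictionary:
            found[word] = dictionary[word]
    return found


def determine_by_frequency(found):
    """Conventional rule: output the registered word with the highest frequency."""
    return max(found, key=found.get)


found = search_words(candidates, word_dictionary)
decided = determine_by_frequency(found)
print(decided)  # 松尾, although the image actually reads 松居

# The user corrects the result, and the frequency of the corrected word is updated.
corrected = "松居"
word_dictionary[corrected] += 1
print(word_dictionary[corrected])  # 11
```

Because the recognition evaluation values play no part in the final choice, the better-recognized word "松居" (62 + 58) loses to "松尾" (62 + 73) purely on frequency.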

[0005] As other prior art related to the present invention, a method that restricts the words subjected to the word search on the basis of limiting conditions on the recognition evaluation values (JP-A-1-255989, JP-A-4-256194) and a method that holds word appearance frequencies as a time series (JP-A-4-256194) have also been proposed. The segmentation of character strings and characters, character recognition, and recognition evaluation values are well-known techniques disclosed in, for example, Japanese Patent Application No. 63-312288 "Character recognition method", Japanese Patent Application No. 60-106404 "Character recognition device", JP-A-5-128307 "Character recognition device", and JP-A-5-128308 "Character recognition device", so detailed descriptions of them are omitted.

[0006]

[Problems to Be Solved by the Invention] However, the character recognition device described above determines the recognition result by evaluating words composed of recognition candidate characters on appearance frequency alone. Consequently, even when the correct word consists solely of top-ranked recognition candidate characters, the device may erroneously output a word with a higher appearance frequency. This happens not only when the correct word is not registered in the word dictionary but also when it is registered with a low appearance frequency.

[0007] In view of the above problems, an object of the present invention is to provide a character recognition device that performs character recognition by comprehensive judgment. A further object is to give such a character recognition device a learning capability.

[0008]

[Means for Solving the Problems] To achieve the above objects, the present invention is characterized in that character recognition is performed by providing: character feature extraction means for extracting feature patterns, such as points, line segments, and curves, used to recognize characters; a character recognition dictionary in which pairs of a character code and an identification pattern used to identify a character as the character of that character code are registered (stored in a form usable for matching); provisional character recognition means for comparing the feature pattern extracted by the character feature extraction means with the identification patterns registered in the character recognition dictionary, calculating recognition evaluation values by a predetermined procedure, and thereby outputting, for up to a predetermined number of recognition candidate characters per extracted character, recognition candidate character data consisting of the character code and a recognition evaluation value indicating its certainty; a word dictionary in which, for each character, pairs of the character codes of the characters that combine with it to form a word and the appearance frequency (a numerical value indicating the likelihood of appearance) are registered; word search means for creating, for each group of consecutive characters being recognized, words (combinations of candidate characters, whether or not they are in actual use) by combining the recognition candidate characters output by the provisional character recognition means while keeping their positions in the character string unchanged, checking whether each such word is registered in the word dictionary, and, if it is registered, outputting its characters and the appearance frequency registered in the word dictionary; word evaluation means for calculating, by a predetermined procedure, a word evaluation value for each word output by the word search means from the recognition evaluation values output by the provisional character recognition means and the appearance frequency output by the word search means; word determination means for determining the correct word on the basis of the word evaluation values calculated by the word evaluation means; word correction means for allowing the user to correct the word to the correct word when the word determined by the word determination means is wrong; and evaluation procedure correction means for changing, when the word determined by the word determination means is wrong, the weight on the appearance frequency that the word evaluation means uses when calculating the word evaluation value from the recognition evaluation values and the appearance frequency.
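The text leaves the "predetermined procedure" for the word evaluation value and the way the weight is changed unspecified at this point. The sketch below is therefore only one hypothetical instance: it assumes the word evaluation value is the sum of the per-character recognition evaluation values minus a weighted logarithm of the appearance frequency (smaller is better), and it assumes the correction simply halves the weight. The names `word_evaluation`, `determine_word`, and `correct_weight` are illustrative and do not appear in the patent.

```python
import math


def word_evaluation(recognition_values, frequency, weight):
    """Hypothetical word evaluation value; smaller is better.

    Sums the per-character recognition evaluation values and subtracts a
    frequency bonus scaled by `weight`. The formula itself is an assumption.
    """
    return sum(recognition_values) - weight * math.log10(frequency)


def determine_word(words, weight):
    """Pick the word with the smallest word evaluation value."""
    return min(words, key=lambda w: word_evaluation(words[w][0], words[w][1], weight))


def correct_weight(weight, factor=0.5):
    """Hypothetical evaluation procedure correction: shrink the frequency weight."""
    return weight * factor


# word -> (recognition evaluation values of its characters, appearance frequency),
# using the values of the FIG. 8 example.
words = {"松居": ([62, 58], 10), "松尾": ([62, 73], 500)}

weight = 10.0
first = determine_word(words, weight)   # the frequency term dominates
weight = correct_weight(weight)         # the user corrected the result, so reduce the weight
second = determine_word(words, weight)
print(first, second)  # 松尾 松居
```

In this sketch the weight is only ever reduced, which suits the case where a high-frequency word wrongly displaced the correct one; a fuller treatment would presumably adjust the weight in either direction.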

[0009] In the present invention, character recognition is performed by providing: character feature extraction means for extracting feature patterns used to recognize characters; a character recognition dictionary in which pairs of a character code and an identification pattern used to identify a character as the character of that character code are registered; a word dictionary in which, for each character, pairs of the character codes of the characters that combine with it to form a word and the appearance frequency are registered; recognition evaluation value calculation means for comparing, for each word registered in the word dictionary, the identification patterns stored in the character recognition dictionary for the character codes making up the word with the feature patterns extracted by the character feature extraction means, and calculating and outputting, by a predetermined procedure, recognition evaluation values indicating the similarity between the two; candidate word appearance frequency output means for selecting as candidate words a predetermined number of words whose constituent characters have a high degree of recognition according to the recognition evaluation value calculation means, and outputting the appearance frequencies stored in the word dictionary for these words; word evaluation means for calculating, by a predetermined procedure, a word evaluation value for each candidate word from the recognition evaluation values output by the recognition evaluation value calculation means and the appearance frequencies output by the candidate word appearance frequency output means; word determination means for selecting and determining the correct word on the basis of the word evaluation values calculated by the word evaluation means; word correction means for allowing the user to correct the word to the correct word when the word determined by the word determination means is wrong; and evaluation procedure correction means for changing, when the word determined by the word determination means is wrong, the weight on the appearance frequency that the word evaluation means uses when calculating the word evaluation value from the recognition evaluation values and the appearance frequency.
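The dictionary-driven variant described here starts from the registered words rather than from per-character candidates. A minimal sketch follows, assuming feature and identification patterns are small numeric vectors and that the recognition evaluation value is a city-block distance (smaller is better). The vectors, the distance measure, and the names `recognition_value` and `select_candidate_words` are all made up for illustration.

```python
def recognition_value(feature, pattern):
    """Hypothetical recognition evaluation value: city-block distance (smaller is better)."""
    return sum(abs(f - p) for f, p in zip(feature, pattern))


def select_candidate_words(features, recognition_dictionary, word_dictionary, limit):
    """Score every registered word of matching length and keep the `limit` best."""
    scored = []
    for word, frequency in word_dictionary.items():
        if len(word) != len(features):
            continue
        values = [
            recognition_value(feature, recognition_dictionary[char])
            for feature, char in zip(features, word)
        ]
        scored.append((word, values, frequency))
    scored.sort(key=lambda item: sum(item[1]))
    return scored[:limit]


# Toy identification patterns (made-up 3-element vectors, not real character features).
recognition_dictionary = {
    "松": (1, 5, 2), "居": (4, 4, 1), "尾": (4, 2, 3), "林": (2, 5, 4),
}
word_dictionary = {"松居": 10, "松尾": 500, "林": 80, "松林": 40}

# Feature patterns extracted from two cut-out character images (also made up).
features = [(1, 5, 3), (4, 4, 2)]

candidates = select_candidate_words(features, recognition_dictionary, word_dictionary, limit=2)
for word, values, frequency in candidates:
    print(word, values, frequency)
```

Each selected candidate carries its per-character recognition evaluation values and its appearance frequency, which is the input the word evaluation means needs.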

[0010] The present invention is also characterized in that character recognition is performed by providing: character feature extraction means for extracting feature patterns used to recognize characters; a character recognition dictionary in which pairs of a character code and an identification pattern used to identify a character as the character of that character code are registered; provisional character recognition means for comparing the feature pattern extracted by the character feature extraction means with the identification patterns registered in the character recognition dictionary, calculating recognition evaluation values by a predetermined procedure, and thereby outputting, for up to a predetermined number of recognition candidate characters per extracted character, recognition candidate character data consisting of the character code and a recognition evaluation value indicating its certainty; a word dictionary in which, for each character, pairs of the character codes of the characters that combine with it to form a word and the appearance frequency are registered; word search means for creating, for each group of consecutive characters being recognized, words by combining the recognition candidate characters output by the provisional character recognition means while keeping their positions in the character string unchanged, checking whether each such word is registered in the word dictionary, and, if it is registered, outputting its characters and the appearance frequency registered in the word dictionary; word evaluation means for calculating, for a word registered in the word dictionary, a word evaluation value by a predetermined procedure from the recognition evaluation values of the recognition candidate characters making up that word, among the recognition evaluation values output by the provisional character recognition means, and the appearance frequency of that word output by the word search means, and for calculating, for a word not registered in the word dictionary, a word evaluation value by a predetermined procedure solely from the recognition evaluation values of the recognition candidate characters making up that word, among the recognition evaluation values output by the provisional character recognition means; and word determination means for determining the correct word from among these words on the basis of the word evaluation values of the words registered in the word dictionary and the word evaluation values of the words not registered in the word dictionary.
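To illustrate how registered and unregistered words can compete in one ranking, the sketch below scores every combination of candidates and applies a frequency term only to registered words. The scoring formula (sum of recognition evaluation values, minus a weighted logarithm of the frequency for registered words) and the weight of 5.0 are assumptions, as are the function names. The candidate values are those of the FIG. 8 example, with "松居" deliberately left out of the dictionary.

```python
import math
from itertools import product


def evaluate(word, values, dictionary, weight):
    """Hypothetical word evaluation value; smaller is better."""
    base = sum(values)
    if word in dictionary:
        # Registered word: recognition evaluation values plus a frequency term.
        return base - weight * math.log10(dictionary[word])
    # Unregistered word: recognition evaluation values only.
    return base


def determine(candidates, dictionary, weight):
    """Score every position-preserving combination and return the best word."""
    scores = {}
    for combo in product(*candidates):
        word = "".join(char for char, _ in combo)
        scores[word] = evaluate(word, [value for _, value in combo], dictionary, weight)
    return min(scores, key=scores.get), scores


candidates = [
    [("松", 62), ("林", 70), ("拡", 72)],
    [("居", 58), ("届", 64), ("尾", 73)],
]
word_dictionary = {"松尾": 500}  # the correct word 松居 is not registered

decided, scores = determine(candidates, word_dictionary, weight=5.0)
print(decided)  # 松居
```

With these assumed numbers the unregistered but well-recognized word "松居" (120) still beats the registered high-frequency word "松尾" (about 121.5), which is the situation the frequency-only conventional method cannot handle.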

[0011] Here, the device is characterized by comprising: word correction means for allowing the user to correct the word to the correct word when the word determined by the word determination means is wrong; and evaluation procedure correction means for changing, when the word determined by the word determination means is wrong, the weight on the appearance frequency that the word evaluation means uses when calculating the word evaluation value from the recognition evaluation values and the appearance frequency.

[0012] Here, the device is characterized by comprising: word correction means for allowing the user to correct the word to the correct word when the word determined by the word determination means is wrong; and word dictionary update means for performing, for at least one of the erroneously determined word and the corrected word, at least one of updating its appearance frequency if it is registered in the word dictionary and newly registering the word itself together with an appearance frequency if it is not registered.
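The dictionary update described here can be sketched as below. The text does not state the initial frequency given to a newly registered word or the size of each update, so `initial_frequency=1` and `increment=1` are assumptions, as is the function name.

```python
def update_word_dictionary(dictionary, word, initial_frequency=1, increment=1):
    """Update the frequency of a registered word, or register a new word.

    `initial_frequency` and `increment` are hypothetical defaults.
    """
    if word in dictionary:
        dictionary[word] += increment
    else:
        dictionary[word] = initial_frequency
    return dictionary


word_dictionary = {"松居": 10, "松尾": 500}
update_word_dictionary(word_dictionary, "松居")  # registered: 10 -> 11
update_word_dictionary(word_dictionary, "松届")  # not registered: added with frequency 1
print(word_dictionary)
```

A negative `increment` could be passed to lower the frequency of an erroneously determined word, although whether the update for such a word is a decrease is not stated in this passage.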

[0013] The present invention is also characterized in that character recognition is performed by a method having: a character feature extraction step of extracting feature patterns; a comparison step of comparing the feature patterns obtained in the character feature extraction step with the identification patterns in a character recognition dictionary prepared in advance, in which character data, each being a pair of a character code and an identification pattern used to identify a character as the character of that character code, are registered; a provisional character recognition step of obtaining, through the comparison with the identification patterns of the character recognition dictionary, recognition candidate character data consisting of the character code and a recognition evaluation value indicating its certainty for up to a predetermined number of candidate characters per character being recognized; a word search step of searching the word dictionary for words formed by combinations of the recognition candidate characters obtained in the provisional character recognition step and obtaining the appearance frequencies stored in the word dictionary; a word evaluation step of obtaining, by a predetermined procedure, a word evaluation value for each word found in the word search step from the recognition evaluation values of the recognition candidate characters making up the word, among the recognition evaluation values obtained in the character recognition step, and the appearance frequency of the word obtained in the word search step; a word determination step of selecting, using the word selection means, the correct word on the basis of the word evaluation values obtained in the word evaluation step; a word correction step of allowing the user to correct the word to the correct word when the word determined in the word determination step is wrong; and an evaluation procedure correction step of changing, when the word determined in the word determination step is wrong, the weight on the appearance frequency used in the word evaluation step when calculating the word evaluation value from the recognition evaluation values and the appearance frequency.

[0014] The present invention is also characterized by having: a character feature extraction step of obtaining feature patterns used to recognize characters; a recognition evaluation value calculation step of using a word dictionary prepared in advance, in which, for each character, pairs of the character codes of the characters that combine with it to form a word and the appearance frequency are registered, and a character recognition dictionary prepared in advance, in which pairs of a character code and an identification pattern used to identify a character as the character of that character code are registered, to compare, for each word registered in the word dictionary, the identification patterns stored in the character recognition dictionary for the character codes making up the word with the feature patterns obtained in the character feature extraction step, and calculating and outputting recognition evaluation values indicating the certainty of the characters making up the word; a word recognition step of selecting as candidate words a predetermined number of words whose constituent characters were found in the recognition evaluation value calculation step to have a high degree of recognition, and obtaining and outputting the appearance frequencies stored in the word dictionary for these candidate words; a word evaluation step of obtaining, by a predetermined procedure, a word evaluation value for each candidate word selected in the word recognition step from the per-character recognition evaluation values of the word obtained in the recognition evaluation value calculation step and the appearance frequency of the word; a word determination step of selecting and determining the word judged to be correct on the basis of the word evaluation values obtained in the word evaluation step, and outputting the determined word; a word correction step of allowing the user to correct the word to the correct word when the word determined in the word determination step is wrong; and an evaluation procedure correction step of changing, when the word determined in the word determination step is wrong, the weight on the appearance frequency used in the word evaluation step when calculating the word evaluation value from the recognition evaluation values and the appearance frequency.

[0015] The present invention is also characterized in that character recognition is performed by a method having: a character feature extraction step of extracting feature patterns; a comparison step of comparing the feature patterns obtained in the character feature extraction step with the identification patterns in a character recognition dictionary prepared in advance, in which character data, each being a pair of a character code and an identification pattern used to identify a character as the character of that character code, are registered; a provisional character recognition step of obtaining, through the comparison with the identification patterns of the character recognition dictionary, recognition candidate character data consisting of the character code and a recognition evaluation value indicating its certainty for up to a predetermined number of candidate characters per character being recognized; a word search step of creating, for each group of consecutive characters being recognized, words by combining the recognition candidate characters output in the provisional character recognition step while keeping their positions in the character string unchanged, checking whether each such word is registered in the word dictionary, and, if it is registered, outputting its characters and the appearance frequency registered in the word dictionary; a word evaluation step of calculating, for a word registered in the word dictionary, a word evaluation value by a predetermined procedure from the recognition evaluation values of the recognition candidate characters making up that word, among the recognition evaluation values obtained in the provisional character recognition step, and the appearance frequency of that word output in the word search step, and of calculating, for a word not registered in the word dictionary, a word evaluation value by a predetermined procedure solely from the recognition evaluation values of the recognition candidate characters making up that word, among the recognition evaluation values obtained in the provisional character recognition step; and a word determination step of determining the correct word from among these words on the basis of the word evaluation values of the words registered in the word dictionary and the word evaluation values of the words not registered in the word dictionary.

[0016] Here, the method is characterized by including: a word correction step of allowing the user to correct the word to the correct word when the word determined in the word determination step is wrong; and an evaluation procedure correction step of changing, when the word determined in the word determination step is wrong, the weight on the appearance frequency used in the word evaluation step when calculating the word evaluation value from the recognition evaluation values and the appearance frequency. The present invention is also characterized by comprising: a word correction step in which the user corrects the word to the correct word when the word determined in the word determination step is wrong; and a word dictionary update step of performing, using the word dictionary update means, for at least one of the word determined in the word determination step and the word corrected in the word correction step, at least one of updating the appearance frequency if the word is registered in the word dictionary and registering the word itself together with an appearance frequency if the word is not registered.

【００１７】[0017]

【作用】上記構成により本発明においては、文字特徴抽
出手段が、文字を認識するのに使用する特徴パターンを
抽出する。文字認識辞書には、文字コードと該文字コー
ドの文字であることを識別するのに使用される識別パタ
ーンの組が文字認識に利用可能な態様で登録されてい
る。文字認識手段が、前記文字特徴抽出手段によって抽
出された特徴パターンと前記文字認識辞書に登録されて
いる識別パターンとを比較照合して、所定の手順で認識
評価値を計算することにより、抽出した１文字当たりあ
らかじめ定められた個数以下の文字コードおよびその確
からしさを示す認識評価値からなる認識候補文字データ
を出力する。単語辞書には、各文字毎にその文字と組み
合わせて単語を作る文字の文字コードとその単語そのも
のの文字画像中での出現頻度の組を登録している。単語
探索手段が、前記文字認識手段によって出力された相連
続して認識対象となっている複数の文字毎に出力された
あらかじめ定められた個数以下の認識候補文字を、文字
列中の位置を不変としたまま組み合わせることにより、
単語を作成し、前記単語辞書中にこの単語が登録されて
いるか否かを調べ、若し登録されているならばその文字
と単語辞書に格納されている出現頻度を出力する。単語
評価手段が、単語探索手段によって出力された単語につ
いて前記仮文字認識手段によって出力された認識評価値
と、前記単語探索手段によって出力された出現頻度とか
ら、所定の手順を用いて単語評価値を計算する。単語決
定手段が、前記単語評価手段によって計算された単語評
価値をもとに正しい単語を決定する。単語修正手段が、
前記単語決定手段によって決定された単語が誤っていた
場合には、使用者に正しい単語に修正可能とさせる。評
価手順修正手段が、前記単語決定手段によって決定され
た単語が誤っていた場合には、前記単語評価手段におい
て認識評価値と出現頻度とから単語評価値を算出する際
に用いる出現頻度に関する重みを変化させる。 With the above configuration, in the present invention the character feature extracting means extracts a feature pattern used for recognizing a character. In the character recognition dictionary, pairs of a character code and an identification pattern used to identify the character of that code are registered in a form usable for character recognition. The character recognition means compares the feature pattern extracted by the character feature extracting means with the identification patterns registered in the character recognition dictionary, calculates recognition evaluation values by a predetermined procedure, and outputs, for each extracted character, recognition candidate character data consisting of no more than a predetermined number of character codes and recognition evaluation values indicating their likelihood. The word dictionary registers, for each character, the character codes of the characters that combine with it to form a word, paired with the appearance frequency of that word in character images. The word search means creates words by combining the recognition candidate characters, no more than the predetermined number of which were output by the character recognition means for each of a plurality of consecutive characters to be recognized, while keeping their positions in the character string unchanged; it checks whether each word is registered in the word dictionary and, if it is, outputs the characters and the appearance frequency stored in the word dictionary. The word evaluation means calculates a word evaluation value for each word output by the word search means, using a predetermined procedure, from the recognition evaluation values output by the provisional character recognition means and the appearance frequency output by the word search means. The word determination means determines the correct word based on the word evaluation values calculated by the word evaluation means. When the word determined by the word determination means is incorrect, the word correcting means allows the user to correct it to the correct word. When the word determined by the word determination means is incorrect, the evaluation procedure correcting means changes the weight given to the appearance frequency that the word evaluation means uses when calculating the word evaluation value from the recognition evaluation values and the appearance frequency.

【００１８】また、本発明においては、文字特徴抽出手
段が、文字を認識するのに使用する特徴パターンを抽出
する。文字認識辞書には、あらかじめ文字コードと該文
字コードの文字であることを識別するのに使用される識
別パターンの組を登録してある。単語辞書には、各文字
毎にその文字と組み合わせて単語を作る文字の文字コー
ドとあらかじめの印刷文書一般の調査の結果等によりも
とめたその出現頻度の組を登録している。単語認識計算
手段が、前記単語辞書に登録されている単語について、
前記文字特徴抽出手段によって抽出された特徴パターン
と、単語を構成する文字コードに対応する前記文字認識
辞書に格納されている識別パターンとを比較して、所定
の手順で両者の類似度を示す認識評価値を計算する。候
補単語出現頻度出力手段が、認識評価値計算手段によっ
て構成する文字の認識度が高い単語を候補単語として所
定数選出し、この上でこれらの出現頻度を単語辞書から
もとめて出力する。単語評価手段が、単語認識手段によ
って出力された認識評価値と出現頻度とから、所定の手
順で各候補単語についてその単語評価値を計算する。単
語決定手段が、単語評価手段によって計算された単語評
価値をもとに正しい単語を選択の上決定する。単語修正
手段が、前記単語決定手段によって決定された単語が誤
っていた場合には、使用者に正しい単語に修正可能とさ
せる。評価手順修正手段が、前記単語決定手段によって
決定された単語が誤っていた場合には、前記単語評価手
段において認識評価値と出現頻度とから単語評価値を算
出する際に用いる出現頻度に関する重みを変化させる。 Further, in the present invention, the character feature extracting means extracts a feature pattern used for recognizing a character. In the character recognition dictionary, pairs of a character code and an identification pattern used to identify the character of that code are registered in advance. The word dictionary registers, for each character, pairs of the character codes of the characters that combine with it to form a word and the appearance frequency of that word, obtained in advance from, for example, a survey of printed documents in general. For the words registered in the word dictionary, the word recognition calculating means compares the feature patterns extracted by the character feature extracting means with the identification patterns stored in the character recognition dictionary for the character codes constituting each word, and calculates by a predetermined procedure a recognition evaluation value indicating their similarity. The candidate word appearance frequency output means selects, as candidate words, a predetermined number of words whose constituent characters were rated highly by the recognition evaluation value calculating means, and then obtains their appearance frequencies from the word dictionary and outputs them. The word evaluation means calculates a word evaluation value for each candidate word by a predetermined procedure from the recognition evaluation values output by the word recognition means and the appearance frequency. The word determination means selects and determines the correct word based on the word evaluation values calculated by the word evaluation means. When the word determined by the word determination means is incorrect, the word correcting means allows the user to correct it to the correct word. When the word determined by the word determination means is incorrect, the evaluation procedure correcting means changes the weight given to the appearance frequency that the word evaluation means uses when calculating the word evaluation value from the recognition evaluation values and the appearance frequency.

【００１９】また、本発明においては、文字特徴抽出手
段が、文字を認識するのに使用する特徴パターンを抽出
する。文字認識辞書が、文字コードと該文字コードの文
字であることを識別するのに使用される識別パターンの
組を登録している。仮文字認識手段が、前記文字特徴抽
出手段によって抽出された特徴パターンと前記文字認識
辞書に登録されている識別パターンとを比較して所定の
手順で認識評価値を計算することにより、抽出した１文
字当たりあらかじめ定められた個数以下の認識候補文字
につき、その文字コード及びその確からしさを示す認識
評価値からなる認識候補文字データを出力する。単語辞
書が、各文字毎に、その文字と組み合わせて単語を作る
文字の文字コードとその出現頻度の組を登録している。
単語探索手段が、相連続して認識対象となっている複数
の文字毎に、前記仮文字認識手段によって出力された各
認識候補文字を文字列中の位置を不変としたまま組み合
わせることにより単語を作成し、前記単語辞書中にこの
単語が登録されているか否かを調べ、若し登録されてい
るならばその文字と単語辞書に登録されている出現頻度
を出力する。単語評価手段が、前記単語辞書に登録され
ている単語について、前記仮文字認識手段によって出力
された認識評価値のうち当該単語を構成する認識候補文
字の認識評価値と、前記単語探索手段によって出力され
た当該単語の出現頻度とから、所定の手順で単語評価値
を計算し、前記単語辞書に登録されていない単語につい
て、前記仮文字認識手段によって出力された認識評価値
のうち当該単語を構成する認識候補文字の認識評価値の
みから所定の手順で単語評価値を計算する。単語決定手
段が、前記単語辞書に登録されている単語の前記単語評
価値と、前記単語辞書に登録されていない単語の前記単
語評価値とを基にし、これらの単語の中から正しい単語
を決定する。 Further, in the present invention, the character feature extracting means extracts a feature pattern used for recognizing a character. The character recognition dictionary registers pairs of a character code and an identification pattern used to identify the character of that code. The provisional character recognition means compares the feature pattern extracted by the character feature extracting means with the identification patterns registered in the character recognition dictionary, calculates recognition evaluation values by a predetermined procedure, and outputs, for no more than a predetermined number of recognition candidate characters per extracted character, recognition candidate character data consisting of the character code and a recognition evaluation value indicating its likelihood. The word dictionary registers, for each character, pairs of the character codes of the characters that combine with it to form a word and the appearance frequency. For each of a plurality of consecutive characters to be recognized, the word search means creates words by combining the recognition candidate characters output by the provisional character recognition means while keeping their positions in the character string unchanged, checks whether each word is registered in the word dictionary, and, if it is, outputs the characters and the appearance frequency registered in the word dictionary. For a word registered in the word dictionary, the word evaluation means calculates a word evaluation value by a predetermined procedure from the recognition evaluation values of the recognition candidate characters constituting the word, among those output by the provisional character recognition means, and the appearance frequency of the word output by the word search means; for a word not registered in the word dictionary, it calculates the word evaluation value by a predetermined procedure only from the recognition evaluation values of the recognition candidate characters constituting the word. The word determination means determines the correct word from among these words based on the word evaluation values of the registered words and those of the unregistered words.

【００２０】また、本発明においては、単語修正手段
が、前記単語決定手段によって決定された単語が誤って
いた場合には、使用者に正しい単語に修正可能とさせ
る。評価手順修正手段が、前記単語決定手段によって決
定された単語が誤っていた場合には、前記単語評価手段
において認識評価値と出現頻度とから単語評価値を算出
する際に用いる出現頻度に関する重みを変化させる。 Further, in the present invention, when the word determined by the word determination means is incorrect, the word correcting means allows the user to correct it to the correct word. When the word determined by the word determination means is incorrect, the evaluation procedure correcting means changes the weight given to the appearance frequency that the word evaluation means uses when calculating the word evaluation value from the recognition evaluation values and the appearance frequency.

【００２１】また、本発明においては、単語修正手段が
単語決定手段によって誤って決定された単語が存在する
場合に、ワードプロセッサ等と同じくこれをＣＲＴへの
表示等で見つけた使用者にキーボード操作等により正し
い単語に修正することを可能とさせる。単語辞書更新手
段が、誤って決定された単語及び修正後の正しい単語の
少なくも一（含む、両方）については、前記単語辞書に
登録されておれば出現頻度を更新すること及び登録され
ていなければ単語そのものと出現頻度を新規に登録する
ことの少なくとも一を行う。 In the present invention, when there is a word erroneously determined by the word determination means, the word correcting means allows a user who notices it, for example on a CRT display as with a word processor, to correct it to the correct word by keyboard operation or the like. For at least one of (or both) the erroneously determined word and the corrected word, the word dictionary updating means does at least one of the following: updates the appearance frequency if the word is registered in the word dictionary, or newly registers the word itself and its appearance frequency if it is not.

【００２２】[0022]

【００２３】[0023]

【実施例】以下、本発明に係る文字認識装置を実施例に
基づいて説明する。（第１実施例）図１は本発明の第一実施例の構成図であ
る。本図において、１０１は文字特徴抽出手段であり、
１０２は文字認識辞書であり、１０３は文字認識手段で
あり、１０４は単語辞書であり、１０５は単語探索手段
であり、１０６は単語評価手段であり、１０７は単語決
定手段であり、１０８は単語修正手段であり、１０９は
単語辞書更新手段であり、１１０は評価関数修正手段で
ある。DESCRIPTION OF THE PREFERRED EMBODIMENTS Hereinafter, a character recognition device according to the present invention will be described based on embodiments. (First Embodiment) FIG. 1 is a configuration diagram of the first embodiment of the present invention. In the figure, 101 is the character feature extracting means, 102 the character recognition dictionary, 103 the character recognition means, 104 the word dictionary, 105 the word search means, 106 the word evaluation means, 107 the word determination means, 108 the word correcting means, 109 the word dictionary updating means, and 110 the evaluation function correcting means.

【００２４】次に、以上のように構成された文字認識装
置について図１、図２及び図３を用いてその動作を説明
する。なお、ここに図２は、本実施例の処理を示すフロ
ーチャートであり、図３は、本実施例における処理例の
図である。文字特徴抽出ステップ（２０１）では、文字
特徴抽出手段１０１が光源と光／電気変換素子を有する
スキャナあるいはファイルなどから読み込まれた認識対
象文字列を含む画像データに対して、例えば互いに直行
する二方向への投影を取るなどの方法によって、文字列
を切り出し、更に一文字ずつの文字画像を文字画素の連
続の有無や行間隔から求めた文字ピッチ等をもとに切り
出す。更に、この切り出された一文字ずつの画像から、
例えば輪郭の方向など文字の特徴を示す特徴パターンの
抽出がなされる。Next, the operation of the character recognition device configured as described above will be described with reference to FIGS. 1, 2 and 3. FIG. 2 is a flowchart showing the processing of this embodiment, and FIG. 3 is a diagram of a processing example in this embodiment. In the character feature extraction step (201), the character feature extracting means 101 takes image data containing the character string to be recognized, read from a scanner having a light source and a photoelectric conversion element or from a file, and cuts out the character string by a method such as taking projections in two mutually orthogonal directions. It then cuts out an image of each individual character based on, for example, whether character pixels are contiguous and a character pitch estimated from the line spacing. From each of these single-character images, a feature pattern indicating characteristics of the character, such as contour direction, is extracted.
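The projection-based cutting mentioned for step (201) can be illustrated with a minimal sketch. It assumes a binary image given as a list of rows of 0/1 pixels and shows only the split along one direction; the function name `split_columns` is hypothetical, and the patent does not specify the procedure beyond taking projections in two orthogonal directions.

```python
def split_columns(bitmap):
    """Return (start, end) column ranges that contain at least one ink pixel.

    bitmap: list of rows, each a list of 0/1 values (1 = character pixel).
    The end index is exclusive.
    """
    width = len(bitmap[0])
    # Vertical projection: number of ink pixels in each column.
    profile = [sum(row[x] for row in bitmap) for x in range(width)]

    spans, start = [], None
    for x, count in enumerate(profile):
        if count and start is None:
            start = x                      # a run of inked columns begins
        elif not count and start is not None:
            spans.append((start, x))       # a blank column closes the run
            start = None
    if start is not None:
        spans.append((start, width))       # run reaches the right edge
    return spans


image = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 1, 1],
]
print(split_columns(image))
```

Applying the same function to the transposed image would give the split along the other direction, for example to separate lines before separating characters.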

【００２５】文字認識ステップ（２０２）では、文字認
識手段１０３を用いて、文字特徴抽出ステップ（２０
１）に抽出された特徴パターンと、文字認識辞書１０２
に格納されている全ての識別パターンとの相違を多次元
ベクトル化してその距離を計算するなどして、複数の認
識候補文字を得る。ここに、認識候補文字データは、文
字コードと、その確からしさを示す例えば距離値などの
認識評価値とからなる。なお、この場合、出力する認識
候補文字数はシステム全体の処理速度の面から３としか
つ評価値の低いものは足切りを行うものとしている。ま
たこのため、最大３となる。例えば、図３（ａ）に単語
「松居」が記載された画像データに対する文字認識手段
１０３の出力例を示す。最初の文字「松」に対しては
「松」、「林」、「拡」の認識候補文字が得られ、その
確からしさを示す認識評価値がそれぞれ、６２、７０、
７２であったとする。二番目の文字「居」に対しては認
識候補文字「居」、「届」、「尾」が得られ、認識評価
値はそれぞれ５８、６４、７３であったとする。In the character recognition step (202), the character recognition means 103 obtains a plurality of recognition candidate characters by, for example, treating the difference between the feature pattern extracted in the character feature extraction step (201) and every identification pattern stored in the character recognition dictionary 102 as a multidimensional vector and calculating its distance. The recognition candidate character data consists of a character code and a recognition evaluation value, such as a distance value, indicating its likelihood. In this case, the number of recognition candidate characters output is set to three in view of the processing speed of the whole system, and candidates with poor evaluation values are cut off, so at most three are output. For example, FIG. 3(a) shows an output example of the character recognition means 103 for image data in which the word "Matsui" (松居) is written. For the first character 「松」, the recognition candidate characters 「松」, 「林」, and 「拡」 are obtained, with recognition evaluation values of 62, 70, and 72, respectively. For the second character 「居」, the recognition candidate characters 「居」, 「届」, and 「尾」 are obtained, with recognition evaluation values of 58, 64, and 73, respectively.
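As a rough illustration of the character recognition step (202), the sketch below ranks dictionary patterns by Euclidean distance to the extracted feature vector and keeps at most three. The feature representation, the `cutoff` parameter, and the function name are assumptions made for illustration; the patent only says that a distance is calculated "for example" and that poor candidates are cut off.

```python
import math


def recognize_character(feature, recognition_dict, max_candidates=3, cutoff=None):
    """Return up to max_candidates (character, distance) pairs, best first.

    feature:          feature vector of the cut-out character image
    recognition_dict: maps each character to its identification pattern
    cutoff:           optional maximum distance; worse candidates are dropped
    A smaller distance means a more likely candidate.
    """
    scored = []
    for char, pattern in recognition_dict.items():
        distance = math.dist(feature, pattern)  # Euclidean distance (Python 3.8+)
        if cutoff is None or distance <= cutoff:
            scored.append((char, distance))
    scored.sort(key=lambda pair: pair[1])
    return scored[:max_candidates]


# Toy patterns in a 2-dimensional feature space (purely hypothetical values).
patterns = {"a": (0, 0), "b": (3, 4), "c": (6, 8), "d": (30, 40)}
print(recognize_character((0, 0), patterns))
```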

【００２６】候補選択ステップ（２０３）では、文字認
識ステップ２０２で得られた認識候補文字の文字列中の
配置を変更しない組合せ（実際には、単語とは限らない
が、本明細書では「単語」という。）のうち、まだ単語
探索ステップ２０４によって探索されていない単語を一
つ選択する。例えば、図３（ｂ）に示す９つの単語「松
居」、「松届」などの中から一つを選択する。In the candidate selection step (203), from the combinations of the recognition candidate characters obtained in the character recognition step 202 that leave their positions in the character string unchanged (these are not necessarily actual words, but are called "words" in this specification), one word that has not yet been searched in the word search step 204 is selected. For example, one is selected from the nine words shown in FIG. 3(b), such as 「松居」 and 「松届」.
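The position-preserving combinations of step (203) correspond to a Cartesian product over the per-position candidate lists. The following sketch, using the candidates of FIG. 3(a), is only an illustration; the function name `candidate_words` is hypothetical.

```python
from itertools import product


def candidate_words(candidates_per_position):
    """Combine candidates while keeping each one at its own position."""
    return ["".join(chars) for chars in product(*candidates_per_position)]


first = ["松", "林", "拡"]    # candidates for the first character
second = ["居", "届", "尾"]   # candidates for the second character

words = candidate_words([first, second])
print(len(words), words)
```

With three candidates at each of two positions this yields the nine "words" of FIG. 3(b); a candidate for the second position never appears in the first position.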

【００２７】単語探索ステップ（２０４）では、単語探
索手段１０５を用いて、候補選択ステップ（２０３）に
よって選択された単語が、単語辞書１０４に登録されて
いるか否かが調べられ、登録されているならばその出現
頻度が得られる。なお、単語辞書に格納されていない単
語の出現頻度は０とする。図３（ｃ）に単語辞書１０４
に格納されている単語とその出現頻度を例示する。図３
（ｂ）に示した単語の中では、出現頻度１０の「松居」
および出現頻度５００の「松尾」が単語辞書に存在し、
その他の単語は存在しなかったとする。In the word search step (204), the word search means 105 checks whether the word selected in the candidate selection step (203) is registered in the word dictionary 104 and, if it is, obtains its appearance frequency. The appearance frequency of a word not stored in the word dictionary is taken to be 0. FIG. 3(c) illustrates the words stored in the word dictionary 104 and their appearance frequencies. Of the words shown in FIG. 3(b), assume that "Matsui" (松居), with an appearance frequency of 10, and "Matsuo" (松尾), with an appearance frequency of 500, exist in the word dictionary and that the other words do not.

【００２８】単語評価ステップ（２０５）では、単語評
価手段１０６を用いて、文字認識ステップ（２０２）で
得られた認識評価値のうち単語を構成する認識候補文字
の認識評価値の和と、単語探索ステップ（２０４）で得
られた単語の出現頻度とから、あらかじめ定められた関
数、例えば以下の式を用いて単語評価値が計算される。In the word evaluation step (205), the word evaluation means 106 calculates a word evaluation value from the sum of the recognition evaluation values of the recognition candidate characters constituting the word, among the recognition evaluation values obtained in the character recognition step (202), and the appearance frequency of the word obtained in the word search step (204), using a predetermined function such as the following expression.

【００２９】（出現頻度が０でない場合）単語評価値＝（認識評価値の和）×｛１−Ｐ×ｌｎ（出
現頻度）｝ここで、Ｐは設定値であり、本実施例では、当初０．０
２とする。また、ｌｎは自然対数である。（出現頻度が０の場合）単語評価値＝認識評価値の和そうすると、単語「松居」の単語評価値は、（６２＋５８）×｛１−０．０２×ｌｎ（１０）｝＝１
１４となる。同じく、単語「松尾」の単語評価値は、（６２＋７３）×｛１−０．０２×ｌｎ（５００）｝＝
１１８となる。同じく、単語「林居」の単語評価値は、７０＋５８＝１２８となる。
(When the appearance frequency is not 0)
word evaluation value = (sum of recognition evaluation values) × {1 − P × ln(appearance frequency)}
Here, P is a set value, initially 0.02 in this embodiment, and ln is the natural logarithm.
(When the appearance frequency is 0)
word evaluation value = sum of recognition evaluation values
The word evaluation value of the word "Matsui" (松居) is then (62 + 58) × {1 − 0.02 × ln(10)} ≈ 114. Likewise, that of the word "Matsuo" (松尾) is (62 + 73) × {1 − 0.02 × ln(500)} ≈ 118, and that of the word 「林居」 is 70 + 58 = 128.
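The example expression can be checked numerically with a short sketch. The function name and argument layout are assumptions; only the formula, the initial value P = 0.02, and the input numbers come from the text.

```python
import math


def word_evaluation_value(recognition_values, frequency, p=0.02):
    """Evaluate a word; a smaller value means a better candidate.

    recognition_values: recognition evaluation values of the word's characters
    frequency:          appearance frequency from the word dictionary (0 if absent)
    p:                  weight given to the appearance frequency
    """
    total = sum(recognition_values)
    if frequency == 0:
        return total  # unregistered word: recognition values only
    return total * (1 - p * math.log(frequency))  # math.log is the natural log


print(round(word_evaluation_value([62, 58], 10)))    # 松居
print(round(word_evaluation_value([62, 73], 500)))   # 松尾
print(word_evaluation_value([70, 58], 0))            # 林居
```

Rounded to integers, the three calls reproduce the values 114, 118, and 128 given in the worked example.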

【００３０】図３（ｄ）に、以上の手順でもとめた各単
語の単語評価値を示す。候補終了判定ステップ（２０
６）では、文字認識ステップ（２０２）で得られた認識
候補文字の全ての組合せについて単語探索ステップ（２
０４）での処理を終えたか否かが判定され、未終了の場
合には候補選択ステップ（２０３）へ進み、終了の場合
には単語決定ステップ（２０７）へ進むこととなる。例
えば、図３（ｂ）の９つの単語について単語探索ステッ
プでの処理を終えると、単語決定ステップ（２０７）へ
進む。FIG. 3(d) shows the word evaluation value of each word obtained by the above procedure. In the candidate end determination step (206), it is determined whether the processing of the word search step (204) has been completed for all combinations of the recognition candidate characters obtained in the character recognition step (202). If not, the process returns to the candidate selection step (203); if so, it proceeds to the word determination step (207). For example, once the nine words in FIG. 3(b) have been processed in the word search step, the process proceeds to the word determination step (207).

【００３１】単語決定ステップ（２０７）では、単語決
定手段１０７が、単語評価ステップ（２０５）で計算さ
れた単語評価値に基づいて、最も評価が良い単語が一つ
選択された上出力される。例えば、図３（ｄ）に示した
各単語から単語評価値が最も良い、すなわち、最も値が
小さい単語「松居」を選択する。単語修正ステップ（２
０８）では、単語修正手段１０８が、単語決定ステップ
（２０７）で得られた単語を、表示装置に表示するなど
して使用者の確認を促す。そして、正しい単語が表示さ
れておれば、使用者は正しい旨を示す確認の入力を行
い、また正しくなければ正しい単語をキーボードなどか
ら入力したり、次順位の候補単語をカーソルで指定した
りして訂正することにより確認、修正処理を可能とさせ
る。例えば、単語決定ステップで正しい単語「松居」が
決定された場合には、結果を確認する入力を行う。ま
た、例えば他の記載場所で読み取った画像データが本来
は「松居」であるのに誤った単語「松尾」が決定された
場合には正しい単語「松居」と修正がなされる。In the word determination step (207), the word determination means 107 selects and outputs the single best-rated word based on the word evaluation values calculated in the word evaluation step (205). For example, from the words shown in FIG. 3(d), the word with the best word evaluation value, that is, the smallest value, "Matsui" (松居), is selected. In the word correcting step (208), the word correcting means 108 prompts the user to confirm the word obtained in the word determination step (207), for example by displaying it on a display device. If the correct word is displayed, the user enters a confirmation to that effect; if not, the user corrects it by typing the correct word on a keyboard or the like, or by pointing to the next-ranked candidate word with a cursor. For example, when the correct word "Matsui" is determined in the word determination step, the user enters a confirmation of the result. When image data read from another location actually reads "Matsui" but the wrong word "Matsuo" (松尾) was determined, it is corrected to the correct word "Matsui".
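Steps (203) through (207) amount to evaluating every candidate combination and keeping the one with the smallest word evaluation value. The sketch below runs that loop on the values of FIG. 3(a) and FIG. 3(c); the data layout and the function name `decide_word` are assumptions for illustration, not the patent's implementation.

```python
import math
from itertools import product

# (character, recognition evaluation value) per position, from FIG. 3(a).
candidates = [
    [("松", 62), ("林", 70), ("拡", 72)],
    [("居", 58), ("届", 64), ("尾", 73)],
]
word_dictionary = {"松居": 10, "松尾": 500}  # word -> appearance frequency


def decide_word(candidates, word_dictionary, p=0.02):
    """Return (word, value) with the smallest word evaluation value."""
    best_word, best_value = None, math.inf
    for combo in product(*candidates):           # candidate selection (203)
        word = "".join(ch for ch, _ in combo)
        total = sum(value for _, value in combo)
        freq = word_dictionary.get(word, 0)      # word search (204)
        if freq == 0:                            # word evaluation (205)
            value = total
        else:
            value = total * (1 - p * math.log(freq))
        if value < best_value:                   # word determination (207)
            best_word, best_value = word, value
    return best_word, best_value


print(decide_word(candidates, word_dictionary))
```

With P = 0.02 the loop selects 松居. Raising P to 0.05 in this sketch makes the frequent word 松尾 win instead, which shows how strongly the decision depends on the weight given to the appearance frequency.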

【００３２】単語辞書更新ステップ（２０９）では、単
語修正ステップ（２０８）で修正された場合には修正後
の単語について、修正されなかった場合には単語決定ス
テップ（２０７）で得られた単語について、単語辞書更
新手段１０９を用いて単語辞書１０４の「松居」の出現
頻度に１を加えたり、出現率を増加させるなどしてその
内容を更新する。また、単語修正ステップ（２０８）で
修正された単語が単語辞書１０４に格納されていない単
語であった場合には、新たに該当する単語を単語辞書に
登録し、出現頻度を例えば１と初期化する。例えば、単
語辞書中の単語「松居」の出現頻度に１を加えて１１と
更新する。また、単語修正ステップ（２０８）で単語辞
書に存在しない単語「林尾」と修正された場合には、単
語辞書に単語「林尾」とその出現頻度１を新たに登録す
る。In the word dictionary updating step (209), the word dictionary updating means 109 updates the contents of the word dictionary 104 for the corrected word if a correction was made in the word correcting step (208), or for the word obtained in the word determination step (207) if no correction was made, for example by adding 1 to its appearance frequency or by increasing its appearance rate. If the word entered in the word correcting step (208) is not stored in the word dictionary 104, the word is newly registered in the word dictionary and its appearance frequency is initialized to, for example, 1. For example, 1 is added to the appearance frequency of the word "Matsui" (松居) in the word dictionary, updating it to 11. If the word is corrected in the word correcting step (208) to 「林尾」, which is not in the word dictionary, the word 「林尾」 is newly registered with an appearance frequency of 1.
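The dictionary update of step (209) can be sketched as a single counter update, assuming the word dictionary is held as a mapping from word to appearance frequency. The function name is hypothetical, and the increment of 1 and initial value of 1 follow the examples in the text.

```python
def update_word_dictionary(word_dictionary, confirmed_word):
    """Add 1 to the word's frequency, registering it with frequency 1 if new."""
    word_dictionary[confirmed_word] = word_dictionary.get(confirmed_word, 0) + 1
    return word_dictionary


dictionary = {"松居": 10, "松尾": 500}
update_word_dictionary(dictionary, "松居")   # confirmed word: 10 becomes 11
update_word_dictionary(dictionary, "林尾")   # corrected, unregistered word: new entry
print(dictionary)
```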

【００３３】修正判定ステップ（２１０）では、単語修
正ステップ（２０８）での使用者による修正の有無を判
定し、修正されなかった場合には画像データ中の当該文
字の認識処理を終了し、修正された場合には評価関数修
正ステップ（２１１）へ行くこととなる。評価関数修正
ステップ（２１１）では、使用者が単語決定ステップ
（２０７）で得られた単語を単語修正ステップ（２０
８）で修正した場合には、単語評価手段１０６が単語を
評価するために使用している関数のパラメータを修正す
ることがなされる。例えば、単語決定ステップ（２０
７）で、認識評価値が良く出現頻度が低い単語を誤って
選択した場合には、出現頻度に関する重みが大きくなる
ように修正し、逆に認識評価値が悪く出現頻度が高い単
語を誤って選択した場合には、出現頻度に関する重みが
小さくなるように修正する。以下、図３（ａ）に示した
認識候補文字及び図３（ｃ）に示した出現頻度を持った
単語について、これらの修正の内容を説明する。正しい
単語が「松尾」である場合に、若しより出現頻度が低い
単語「松居」と誤って決定出力された場合には、パラメ
ータＰを大きな値へ修正する。これは、単語評価値を求
める関数での出現頻度の重みを増すためである。一方正
しい単語が「松居」である場合に、より出現頻度が高い
単語「松尾」と誤って決定出力された場合には、パラメ
ータＰを小さな値へ修正する。これは、単語評価値を求
める関数での出現頻度の重みを減らすためである。（第２実施例）図４は、本実施例の構成図である。本図
において、４０１は文字特徴抽出手段であり、４０２は
文字認識辞書であり、４０３は単語辞書であり、４０４
は単語認識手段であり、４０５は単語評価手段であり、
４０６は単語決定手段であり、４０７は単語修正手段で
あり、４０８は単語辞書更新手段であり、４０９は評価
関数修正手段である。次に、以上のように構成された文
字認識装置について、図４、図５及び図６を用いてその
動作を説明する。ここに、図５は本実施例における処理
のフローチャートであり、図６は本実施例における処理
例の図である。In the correction determination step (210), it is determined whether the user made a correction in the word correcting step (208). If no correction was made, recognition processing for the characters concerned in the image data ends; if a correction was made, the process proceeds to the evaluation function correction step (211). In the evaluation function correction step (211), when the user has corrected, in the word correcting step (208), the word obtained in the word determination step (207), the parameter of the function that the word evaluation means 106 uses to evaluate words is corrected. For example, if the word determination step (207) wrongly selected a word with a good recognition evaluation value and a low appearance frequency, the weight given to the appearance frequency is increased; conversely, if it wrongly selected a word with a poor recognition evaluation value and a high appearance frequency, the weight is decreased. These corrections are explained below for the recognition candidate characters of FIG. 3(a) and the words with the appearance frequencies of FIG. 3(c). If the correct word is "Matsuo" (松尾) and the less frequent word "Matsui" (松居) was wrongly output, the parameter P is corrected to a larger value, which increases the weight of the appearance frequency in the function for obtaining the word evaluation value. If the correct word is "Matsui" and the more frequent word "Matsuo" was wrongly output, the parameter P is corrected to a smaller value, which reduces that weight.
(Second Embodiment) FIG. 4 is a configuration diagram of this embodiment. In the figure, 401 is the character feature extracting means, 402 the character recognition dictionary, 403 the word dictionary, 404 the word recognition means, 405 the word evaluation means, 406 the word determination means, 407 the word correcting means, 408 the word dictionary updating means, and 409 the evaluation function correcting means. Next, the operation of the character recognition device configured as described above will be described with reference to FIGS. 4, 5 and 6. FIG. 5 is a flowchart of the processing in this embodiment, and FIG. 6 is a diagram of a processing example in this embodiment.
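Regarding the evaluation function correction step (211) of the first embodiment, the direction of the change to P can be sketched as follows. The text specifies only whether P becomes larger or smaller; the fixed step size, the lower bound of 0, and the function name are assumptions made for illustration.

```python
def adjust_frequency_weight(p, chosen_frequency, correct_frequency, step=0.005):
    """Return an updated P after the user corrected a wrongly chosen word.

    chosen_frequency:  appearance frequency of the word that was wrongly output
    correct_frequency: appearance frequency of the word the user entered
    step:              hypothetical adjustment amount (not given in the text)
    """
    if correct_frequency > chosen_frequency:
        return p + step             # frequency was under-weighted: raise P
    if correct_frequency < chosen_frequency:
        return max(0.0, p - step)   # frequency was over-weighted: lower P
    return p


# Correct word 松尾 (500) but 松居 (10) was output: P grows.
print(adjust_frequency_weight(0.02, chosen_frequency=10, correct_frequency=500))
# Correct word 松居 (10) but 松尾 (500) was output: P shrinks.
print(adjust_frequency_weight(0.02, chosen_frequency=500, correct_frequency=10))
```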

【００３４】文字特徴抽出ステップ（５０１）では、先
の第１実施例と同じく文字特徴抽出手段４０１を用い
て、例えば輪郭の方向など文字の特徴を示す特徴パター
ンを抽出することがなされる。単語選択ステップ（５０
２）では、単語辞書４０３に格納されている単語の中か
ら、まだ単語認識ステップ５０３での処理を終えていな
い単語が一つ選択される。In the character feature extraction step (501), as in the first embodiment, the character feature extracting means 401 extracts a feature pattern indicating characteristics of a character, such as contour direction. In the word selection step (502), one word that has not yet been processed in the word recognition step 503 is selected from the words stored in the word dictionary 403.

【００３５】単語認識ステップ（５０３）では、単語認
識手段４０４を用いて、単語選択ステップ（５０２）で
選択された単語について、文字特徴抽出ステップ（５０
１）で得られた特徴パターンと、単語を構成する文字コ
ードに対応する文字認識辞書４０２内の識別パターンと
の相違の程度を足切りや多次元ベクトル化してその距離
を計算するなどの処理をして、単語を構成する文字の確
からしさを示す距離値等の認識評価値が得られる。ま
た、同時に単語辞書４０３に格納されている単語の出現
頻度も得られる。図６（ａ）に単語辞書に格納されてい
る単語と出現頻度の例を、図６（ｂ）にそれぞれの単語
を構成する文字の認識評価値の例を示す。例えば、単語
「松居」の文字「松」の認識評価値は６２であり、文字
「居」の認識評価値は５８である。単語評価ステップ
（５０４）では、単語評価手段４０５を用いて、単語認
識ステップ５０３で得られた単語を構成する文字の認識
評価値の和と、単語の出現頻度とから、あらかじめ定め
られた関数、例えば単語評価値を以下の式を用いて単語
評価値を計算する。In the word recognition step (503), for the word selected in the word selection step (502), the word recognition means 404 obtains recognition evaluation values, such as distance values, indicating the likelihood of the characters constituting the word. It does so by processing the degree of difference between the feature patterns obtained in the character feature extraction step (501) and the identification patterns in the character recognition dictionary 402 corresponding to the character codes constituting the word, for example by applying a cutoff or by treating the difference as a multidimensional vector and calculating its distance. At the same time, the appearance frequency of the word stored in the word dictionary 403 is obtained. FIG. 6(a) shows an example of the words and appearance frequencies stored in the word dictionary, and FIG. 6(b) shows an example of the recognition evaluation values of the characters constituting each word. For example, the recognition evaluation value of the character 「松」 in the word "Matsui" (松居) is 62, and that of the character 「居」 is 58. In the word evaluation step (504), the word evaluation means 405 calculates the word evaluation value from the sum of the recognition evaluation values of the characters constituting the word obtained in the word recognition step 503 and the appearance frequency of the word, using a predetermined function such as the following expression.

【００３６】単語評価値＝（認識評価値の和）×｛１−
Ｐ×ｌｎ（出現頻度）｝ここで、Ｐは設定値であり、第１実施例と同じく、当初
０．０２とする。また、ｌｎは自然対数である。そうす
ると、単語「松居」の単語評価値は、（６２＋５８）×｛１−０．０２×ｌｎ（１０）｝＝１
１４となり、単語「松尾」の単語評価値は、（６２＋７３）×｛１−０．０２×ｌｎ（５００）｝＝
１１８となる。
word evaluation value = (sum of recognition evaluation values) × {1 − P × ln(appearance frequency)}
Here, P is a set value, initially 0.02 as in the first embodiment, and ln is the natural logarithm. The word evaluation value of the word "Matsui" (松居) is then (62 + 58) × {1 − 0.02 × ln(10)} ≈ 114, and that of the word "Matsuo" (松尾) is (62 + 73) × {1 − 0.02 × ln(500)} ≈ 118.

【００３７】図６（ｃ）に各単語の単語評価値の例を示
す。単語終了判定ステップ（５０５）では、単語辞書４
０３に格納されている全ての単語について、単語認識ス
テップ（５０３）の処理が終えているか否かを判定し、
未終了の場合には、単語選択ステップ（５０２）へ進
み、終了の場合には単語決定ステップ（５０６）へ進
む。FIG. 6(c) shows an example of the word evaluation value of each word. In the word end determination step (505), it is determined whether the processing of the word recognition step (503) has been completed for all the words stored in the word dictionary 403. If not, the process returns to the word selection step (502); if so, it proceeds to the word determination step (506).

【００３８】単語決定ステップ（５０６）では、単語辞
書４０３に格納されている全ての単語について単語認識
ステップ（５０３）及び単語評価ステップ（５０４）で
の処理を終了した後に、単語決定手段４０６を用いて決
定が行われる。本ステップでは、単語評価ステップ（５
０４）で計算された単語評価値に基づいて、最も評価が
良い単語を一つ選択する。例えば、図６（ｃ）に示した
各単語から単語評価値が最も良い、すなわち、最も値が
小さい単語「松居」を選択する。In the word determination step (506), after the processing of the word recognition step (503) and the word evaluation step (504) has been completed for all the words stored in the word dictionary 403, the determination is made using the word determination means 406. In this step, the single best-rated word is selected based on the word evaluation values calculated in the word evaluation step (504). For example, from the words shown in FIG. 6(c), the word with the best word evaluation value, that is, the smallest value, "Matsui" (松居), is selected.
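In contrast to the first embodiment, the loop of steps (502) through (506) iterates over the dictionary words rather than over combinations of recognition candidates. The sketch below uses the values quoted from FIG. 6; the lookup table `char_value` is a stand-in for the comparison of feature patterns with identification patterns, and all names are hypothetical. It also assumes every registered word has an appearance frequency of at least 1, since ln(0) is undefined.

```python
import math

word_dictionary = {"松居": 10, "松尾": 500}  # word -> appearance frequency
# (position, character) -> recognition evaluation value of that character
# against the image at that position (stand-in for step 503).
char_value = {(0, "松"): 62, (1, "居"): 58, (1, "尾"): 73}


def decide_from_dictionary(word_dictionary, char_value, p=0.02):
    """Evaluate every dictionary word and return (word, value) with the smallest value."""
    best_word, best_value = None, math.inf
    for word, frequency in word_dictionary.items():        # word selection (502)
        total = sum(char_value[(pos, ch)]                  # word recognition (503)
                    for pos, ch in enumerate(word))
        value = total * (1 - p * math.log(frequency))      # word evaluation (504)
        if value < best_value:                             # word determination (506)
            best_word, best_value = word, value
    return best_word, best_value


print(decide_from_dictionary(word_dictionary, char_value))
```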

【００３９】以降の処理は、本発明第一の実施例と同様
であるので省略する。（第３実施例）本実施例は、基本的には先の第１、第２
実施例と同じである。ただし、文字認識辞書と単語辞書
とが文字認識の対象とされる画像データの内容に応じて
切換可能となっているのが異なる。Subsequent processing is the same as in the first embodiment of the present invention, and its description is omitted. (Third Embodiment) This embodiment is basically the same as the first and second embodiments. It differs in that the character recognition dictionary and the word dictionary can be switched according to the content of the image data to be recognized.

【００４０】周知のごとく、日本語、中国語、英語ある
いは特許出願の明細書、新聞、数学の論文等各種文書
は、その言語や内容の種類、如何等によって使用される
文字やその字体、あるいは記号や単語（学術用語）やそ
の使用頻度等に個別の傾向が見られる。例えば、特許出
願の明細書では「発明」や「手段」、新聞では「政府」
や「経済」等の単語の使用頻度が高い。As is well known, documents of various kinds, whether in Japanese, Chinese, or English, and whether patent specifications, newspapers, or mathematics papers, show individual tendencies in the characters and typefaces used, in symbols and words (technical terms), and in their frequency of use, depending on the language and the type of content. For example, words such as 「発明」 (invention) and 「手段」 (means) are used frequently in patent specifications, and words such as 「政府」 (government) and 「経済」 (economy) in newspapers.

【００４１】このため、本実施例ではあらかじめ各種文
書用に文字認識辞書と単語辞書とを作成しておき、認識
開始に先立って使用者が認識対象の文書の種類を教え
（入力し）たり、辞書の優先度を指定したりすることに
より、文字認識の効率向上を図っている。ただし、辞書
の使用者による優先度の入力等を実現するためのハード
面、ソフト面の構成はいわば周知の技術である。例え
ば、日本語と欧米系言語との自動翻訳装置において、類
縁の深いポルトガル語とスペイン語等は装置の使用する
辞書を変換するだけで流用されている。このため、構成
図やフローチャート等を使用してのハード面やソフト面
の説明は省略する。For this reason, in this embodiment a character recognition dictionary and a word dictionary are prepared in advance for each type of document, and before recognition starts the user tells the device (inputs) the type of document to be recognized or specifies the priority of the dictionaries, which improves the efficiency of character recognition. The hardware and software configurations for accepting such input, for example the user's dictionary priority, are well-known techniques. For example, in automatic translation devices between Japanese and Western languages, closely related languages such as Portuguese and Spanish are handled simply by switching the dictionary used by the device. A description of the hardware and software using configuration diagrams, flowcharts, and the like is therefore omitted.
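Since the specification omits the configuration for dictionary switching, the following is only one possible sketch. The names `DictionarySet`, `DICTIONARIES`, and `select_dictionaries`, the document-type keys, and the frequencies are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DictionarySet:
    # character code -> identification pattern (left empty in this sketch)
    char_dict: dict = field(default_factory=dict)
    # word -> appearance frequency
    word_dict: dict = field(default_factory=dict)


# One dictionary set per document type, prepared in advance.
# The frequencies are made-up numbers for illustration.
DICTIONARIES = {
    "patent": DictionarySet(word_dict={"発明": 120, "手段": 95}),
    "newspaper": DictionarySet(word_dict={"政府": 80, "経済": 70}),
}


def select_dictionaries(priority):
    """Return dictionary sets in the user's priority order, skipping unknown types."""
    return [DICTIONARIES[name] for name in priority if name in DICTIONARIES]


chosen = select_dictionaries(["newspaper", "unknown", "patent"])
```

The recognizer would then consult `chosen[0]` first and fall back to later entries, which is one way to read "specifying the priority of the dictionaries".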

【００４２】以上、本発明を実施例にもとづいて説明し
てきたが、本発明は何も上記実施例に限定されないのは
勿論である。すなわち、以下のようなものも本発明に含
まれる。（１）製造等の都合で、各特許請求の範囲に記載した１
の構成要素（要件、ステップ）を複数のものとしてい
る。逆に、複数のものを１としている。あるいは、これ
らを適宜組み合わせている。The present invention has been described above on the basis of embodiments, but it is of course not limited to those embodiments. That is, the following are also included in the present invention. (1) For reasons of manufacturing or the like, a single constituent element (requirement or step) recited in a claim is divided into a plurality of elements; conversely, a plurality of elements are combined into one; or these are combined as appropriate.

【００４３】（２）実施例では、単語は２字のものとし
ているが、多くの文字を対象として認識効果を上げるた
め３字、４字等の単語をも登録している。この際、認識
対象として採用する単語の優先順位の選択としては、他
に形態素解析や最長一致法が併用されるのは勿論であ
る。また、認識対象が欧米系の文書であるならば、この
効果は更に向上するであろう。(2) In the embodiments, words consist of two characters, but words of three, four, or more characters are also registered so that the recognition benefit extends to more characters. In that case, morphological analysis and the longest-match method are of course used together in choosing the priority of the words adopted as recognition targets. If the recognition target is a document in a Western language, this benefit should be greater still.

【００４４】（３）画像切出し手段は取出し、装着可能
のフロッピーディスク等の記憶部を内蔵した上で、他の
手段と別体としている。また、別体の他の手段はこの記
憶部を取り出し、装着可能としている。これにより、携
帯性の向上、高価な文字認識部本体の有効活用を図る。（４）文字は、漢字に限定されず、仮字（仮名）、ハン
グル文字、アルファベット、数学の記号あるいはこれら
と漢字からなるものとしている。(3) The image cutout means has a built-in storage unit, such as a removable floppy disk, and is separate from the other means. The other means, housed separately, are arranged so that this storage unit can be removed and mounted in them. This improves portability and makes effective use of the expensive main body of the character recognition unit. (4) The characters are not limited to kanji; they may be kana, Hangul, alphabetic letters, mathematical symbols, or combinations of these with kanji.

【００４５】（５）単語修正ステップで単語が修正され
る度に、評価関数修正ステップにおいて評価関数のパラ
メータを修正するとしたが、単語修正ステップでの単語
の修正が一定回数に達した時点で、評価関数修正ステッ
プでの評価関数のパラメータの修正を行うとしている。（６）文字のパターン認識は、アルファベットやアラビ
ア数字に対して用いられる決定木法（特願平５−６８５
８６号）等各種の方法がある。このため、その方法如何
によって、単語決定手段で用いられる所定の手順は、設
定値や自然対数を用いるものでなく他のものにしてい
る。(5) In the embodiments, the parameters of the evaluation function are corrected in the evaluation function correction step every time a word is corrected in the word correction step; alternatively, the parameters are corrected in the evaluation function correction step once the number of corrections made in the word correction step reaches a fixed count. (6) Various methods exist for character pattern recognition, such as the decision tree method used for alphabetic characters and Arabic numerals (Japanese Patent Application No. 5-68586). Depending on the method, the predetermined procedure used by the word determination means therefore uses something other than set values and natural logarithms.
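Modification (5) amounts to batching the parameter correction. A minimal sketch, assuming a single weight on the appearance frequency and a fixed increment per batch; the class name `BatchedWeightUpdater`, the threshold, and the increment are placeholders, since the text states only that the parameter is corrected once the number of corrections reaches a fixed count.

```python
class BatchedWeightUpdater:
    """Adjust the frequency weight only after a fixed number of user corrections."""

    def __init__(self, weight=1.0, threshold=3, step=0.1):
        self.weight = weight        # weight on the appearance frequency
        self.threshold = threshold  # corrections needed before an update
        self.step = step            # placeholder adjustment amount
        self.pending = 0            # corrections seen since the last update

    def record_correction(self):
        """Count one correction; return True if the weight was updated."""
        self.pending += 1
        if self.pending < self.threshold:
            return False
        self.weight += self.step
        self.pending = 0
        return True


updater = BatchedWeightUpdater(weight=1.0, threshold=3, step=0.5)
results = [updater.record_correction() for _ in range(3)]
```

Setting `threshold=1` reproduces the behaviour of the embodiments, where the parameter is corrected after every word correction.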

【００４６】更に、文字の認識評価値そのものや単語の
出現頻度そのものが非常に低いならば適宜足切りを行う
ような手段を採用している。逆に、文字の認識評価値が
非常に高かったり、既に確定した文字があれば、それら
を候補単語選択に際して重要視している。（７）特殊な文書にあっては、その文書固有の記号、文
字等についてはあらかじめ装置側に教えておく、若しく
は入力可能とする機能を付加している。例えば、特許出
願書類における「〔」、「〕」類似の記号や数学の論文
における「＝」等である。Further, means are adopted for cutting off candidates as appropriate when a character's recognition evaluation value or a word's appearance frequency is itself very low. Conversely, when a character's recognition evaluation value is very high, or when some characters have already been confirmed, those characters are given greater importance in selecting candidate words. (7) For special documents, a function is added by which symbols, characters, and the like peculiar to the document are taught to the device in advance or can be input. Examples are symbols resembling "〔" and "〕" in patent application documents and "＝" in mathematics papers.
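The cutoff and the use of confirmed characters can be sketched as a simple filter. This sketch assumes a per-character confidence in which a larger value is better, following the wording of this paragraph; the scale, the thresholds, and the name `filter_candidates` are hypothetical. Treating confirmed characters as a hard constraint is one possible reading of "given greater importance".

```python
def filter_candidates(words, confirmed, min_confidence, min_freq):
    """Drop weak candidates and keep only words consistent with confirmed characters.

    words:     list of (word, per-character confidences, appearance frequency)
    confirmed: mapping of position -> character already confirmed
    """
    kept = []
    for word, confidences, freq in words:
        if freq < min_freq:
            continue  # cutoff on a very low appearance frequency
        if any(c < min_confidence for c in confidences):
            continue  # cutoff on a very low character confidence
        if any(pos >= len(word) or word[pos] != ch for pos, ch in confirmed.items()):
            continue  # contradicts a character that is already confirmed
        kept.append(word)
    return kept


# Hypothetical candidates, for illustration only.
candidates = [
    ("松居", [0.9, 0.8], 12),
    ("松屋", [0.9, 0.1], 40),
    ("杉居", [0.6, 0.8], 0),
    ("松井", [0.9, 0.7], 50),
]
survivors = filter_candidates(candidates, {0: "松"}, min_confidence=0.3, min_freq=1)
```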

【００４７】[0047]

【発明の効果】以上説明してきたように本発明によれ
ば、文字に対する認識評価値とその文字にて構成される
単語の出現頻度とからあらかじめ定められた手順を用い
て単語評価値を計算することにより、総合的な判断によ
って文字認識結果を正しく決定することができる。ま
た、誤って出力された単語を修正し、この誤って選択さ
れた単語の出現頻度を更新し、更に修正後の単語が未登
録ならば、該単語そのものと出現頻度とを新規に登録す
る。これにより、正しい文字認識結果を得られる。さら
に、修正のあった単語などについては、その評価を行う
ために用いる関数そのものやそのパラメータを修正し、
調整することにより、更に正しい文字認識結果を得られ
る。As described above, according to the present invention, a word evaluation value is calculated by a predetermined procedure from the recognition evaluation values of characters and the appearance frequency of the word composed of those characters, so that the character recognition result can be determined correctly by comprehensive judgment. In addition, a word that was output in error is corrected, the appearance frequency of the erroneously selected word is updated, and, if the corrected word is not yet registered, the word itself and its appearance frequency are newly registered. This yields correct character recognition results. Furthermore, for words that have been corrected, the function used to evaluate them, or its parameters, is corrected and adjusted, which yields still more accurate character recognition results.
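The dictionary update described above can be sketched as follows. The direction and size of the frequency change are assumptions (a decrement for the erroneously selected word and an increment for an already registered correct word); the text states only that the frequency is updated and that an unregistered word is newly registered together with a frequency. The name `update_word_dictionary` and the initial frequency are hypothetical.

```python
def update_word_dictionary(word_dict, wrong_word, correct_word, initial_freq=1):
    """Update a word -> appearance-frequency mapping in place after a user correction."""
    # Assumed rule: lower the frequency of the erroneously selected word, never below zero.
    if wrong_word in word_dict:
        word_dict[wrong_word] = max(0, word_dict[wrong_word] - 1)
    # Assumed rule: raise the frequency of the correct word, or register it if missing.
    if correct_word in word_dict:
        word_dict[correct_word] += 1
    else:
        word_dict[correct_word] = initial_freq
    return word_dict


# Hypothetical dictionary contents, for illustration only.
dictionary = {"松屋": 5}
update_word_dictionary(dictionary, wrong_word="松屋", correct_word="松居")
```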

【図面の簡単な説明】[Brief description of the drawings]

【図１】本発明に係る文字認識装置の第一実施例の構成
図である。FIG. 1 is a configuration diagram of a first embodiment of a character recognition device according to the present invention.

【図２】上記実施例における処理のフローチャートであ
る。FIG. 2 is a flowchart of a process in the embodiment.

【図３】上記実施例における処理例の図である。FIG. 3 is a diagram of a processing example in the embodiment.

【図４】本発明に係る文字認識装置の第二実施例の構成
図である。FIG. 4 is a configuration diagram of a second embodiment of the character recognition device according to the present invention.

【図５】上記実施例における処理のフローチャートであ
る。FIG. 5 is a flowchart of a process in the embodiment.

【図６】上記実施例における処理例の図である。FIG. 6 is a diagram of a processing example in the embodiment.

【図７】従来の文字認識装置の構成図である。FIG. 7 is a configuration diagram of a conventional character recognition device.

【図８】従来の文字認識装置における処理例の図であ
る。FIG. 8 is a diagram of a processing example in a conventional character recognition device.

【符号の説明】[Explanation of symbols]

１０１文字特徴抽出手段１０２文字認識辞書１０３文字認識手段１０４単語辞書１０５単語探索手段１０６単語評価手段１０７単語決定手段１０８単語修正手段１０９単語辞書更新手段１１０評価関数修正手段４０１文字特徴抽出手段４０２文字認識辞書４０３単語辞書４０４単語認識手段４０５単語評価手段４０６単語決定手段４０７単語修正手段４０８単語辞書更新手段４０９評価関数修正手段
101 character feature extraction means
102 character recognition dictionary
103 character recognition means
104 word dictionary
105 word search means
106 word evaluation means
107 word determination means
108 word correction means
109 word dictionary update means
110 evaluation function correction means
401 character feature extraction means
402 character recognition dictionary
403 word dictionary
404 word recognition means
405 word evaluation means
406 word determination means
407 word correction means
408 word dictionary update means
409 evaluation function correction means

フロントページの続き (72)発明者高倉穂大阪府門真市大字門真1006番地松下電器産業株式会社内 (56)参考文献特開平６−4716（ＪＰ，Ａ) 特開昭59−106082（ＪＰ，Ａ) 特開平５−225401（ＪＰ，Ａ) 特開平５−120492（ＪＰ，Ａ) 特開昭63−257086（ＪＰ，Ａ) 特開平７−28929（ＪＰ，Ａ) 特開平４−218885（ＪＰ，Ａ) 高尾哲康西野文人，日本語文書リーダ後処理の実現と評価，情報処理学会論文誌，日本，社団法人情報処理学会, 1989年11月15日，Ｖｏｌ．30 Ｎｏ. 11，ｐ．1394−1401 (58)調査した分野(Int.Cl.⁷，ＤＢ名) G06K 9/72 ＪＩＣＳＴファイル（ＪＯＩＳ)
Continuation of the front page
(72) Inventor: 高倉穂, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma-shi, Osaka
(56) References: JP-A-6-4716 (JP, A); JP-A-59-106082 (JP, A); JP-A-5-225401 (JP, A); JP-A-5-120492 (JP, A); JP-A-63-257086 (JP, A); JP-A-7-28929 (JP, A); JP-A-4-218885 (JP, A); Tetsuyasu Takao and Fumito Nishino, "Implementation and Evaluation of Post-Processing for a Japanese Document Reader," Transactions of the Information Processing Society of Japan, Information Processing Society of Japan, November 15, 1989, Vol. 30, No. 11, pp. 1394-1401
(58) Fields searched (Int. Cl.⁷, DB name): G06K 9/72; JICST file (JOIS)

Claims

(57)【特許請求の範囲】(57) [Claims]

【請求項１】文字を認識するのに使用する特徴パター
ンを抽出する文字特徴抽出手段と、文字コードと該文字コードの文字であることを識別する
のに使用される識別パターンの組を登録した文字認識辞
書と、前記文字特徴抽出手段によって抽出された特徴パターン
と前記文字認識辞書に登録されている識別パターンとを
比較して所定の手順で認識評価値を計算することによ
り、抽出した１文字当たりあらかじめ定められた個数以
下の認識候補文字につき、その文字コード及びその確か
らしさを示す認識評価値からなる認識候補文字データを
出力する仮文字認識手段と、各文字毎に、その文字と組み合わせて単語を作る文字の
文字コードとその出現頻度の組を登録している単語辞書
と、相連続して認識対象となっている複数の文字毎に、前記
仮文字認識手段によって出力された各認識候補文字を文
字列中の位置を不変としたまま組み合わせることにより
単語を作成し、前記単語辞書中にこの単語が登録されて
いるか否かを調べ、若し登録されているならばその文字
と単語辞書に登録されている出現頻度を出力する単語探
索手段と、前記単語検索手段によって出力された単語について前記
仮文字認識手段によって出力された認識評価値と、前記
単語探索手段によって出力された出現頻度とから、所定
の手順で単語評価値を計算する単語評価手段と、前記単語評価手段によって計算された単語評価値をもと
に正しい単語を決定する単語決定手段と、前記単語決定手段によって決定された単語が誤っていた
場合には、使用者に正しい単語に修正可能とさせる単語
修正手段と、前記単語決定手段によって決定された単語が誤っていた
場合には、前記単語評価手段において認識評価値と出現
頻度とから単語評価値を算出する際に用いる出現頻度に
関する重みを変化させる評価手順修正手段とを備えるこ
とにより文字認識を行うことを特徴とする文字認識装
置。1. A character recognition device that performs character recognition by comprising: character feature extraction means for extracting a feature pattern used to recognize a character; a character recognition dictionary in which sets of a character code and an identification pattern used to identify a character as the character of that character code are registered; provisional character recognition means for comparing the feature pattern extracted by the character feature extraction means with the identification patterns registered in the character recognition dictionary and calculating recognition evaluation values by a predetermined procedure, thereby outputting, for no more than a predetermined number of recognition candidate characters per extracted character, recognition candidate character data consisting of the character code and a recognition evaluation value indicating its likelihood; a word dictionary in which, for each character, sets of the character code of a character that combines with that character to form a word and the corresponding appearance frequency are registered; word search means for creating words, for each group of consecutive characters to be recognized, by combining the recognition candidate characters output by the provisional character recognition means with their positions in the character string unchanged, checking whether each such word is registered in the word dictionary, and, if it is registered, outputting its characters and the appearance frequency registered in the word dictionary; word evaluation means for calculating a word evaluation value by a predetermined procedure, for each word output by the word search means, from the recognition evaluation values output by the provisional character recognition means and the appearance frequency output by the word search means; word determination means for determining the correct word on the basis of the word evaluation values calculated by the word evaluation means; word correction means for allowing the user to correct the word determined by the word determination means to the correct word when it is wrong; and evaluation procedure correction means for changing, when the word determined by the word determination means is wrong, the weight on the appearance frequency that the word evaluation means uses in calculating the word evaluation value from the recognition evaluation values and the appearance frequency.

【請求項２】文字を認識するのに使用する特徴パター
ンを抽出する文字特徴抽出手段と、文字コードとその文字コードの文字であることを識別す
るのに使用される識別パターンの組を登録した文字認識
辞書と、各文字毎に、その文字と組み合わせて単語を作る文字の
文字コードとその出現頻度の組を登録している単語辞書
と、前記単語辞書に登録されている単語について、その単語
を構成する文字コードに対応する前記文字認識辞書に格
納されている文字の識別パターンと前記文字特徴抽出手
段によって抽出された特徴パターンとを比較して、所定
の手順で両者の類似度を示す認識評価値を計算の上出力
する認識評価値計算手段と、前記認識評価値計算手段によって構成する文字の認識度
が高い単語を候補単語として所定数選出し、この上でこ
れらの前記単語辞書に格納されている出現頻度を出力す
る候補単語出現頻度出力手段と、前記認識評価値計算手段によって出力された認識評価値
と前記候補単語出現頻度出力手段の出力した出現頻度と
から、所定の手順で各候補単語について単語評価値を計
算する単語評価手段と、前記単語評価手段によって計算された単語評価値をもと
に、正しい単語を選択の上決定する単語決定手段と、前記単語決定手段によって決定された単語が誤っていた
場合には、使用者に正しい単語に修正可能とさせる単語
修正手段と、前記単語決定手段によって決定された単語が誤っていた
場合には、前記単語評価手段において認識評価値と出現
頻度とから単語評価値を算出する際に用いる出現頻度に
関する重みを変化させる評価手順修正手段とを備えるこ
とにより文字認識を行うことを特徴とする文字認識装
置。2. A character recognition device that performs character recognition by comprising: character feature extraction means for extracting a feature pattern used to recognize a character; a character recognition dictionary in which sets of a character code and an identification pattern used to identify a character as the character of that character code are registered; a word dictionary in which, for each character, sets of the character code of a character that combines with that character to form a word and the corresponding appearance frequency are registered; recognition evaluation value calculation means for comparing, for each word registered in the word dictionary, the identification patterns stored in the character recognition dictionary for the character codes constituting that word with the feature patterns extracted by the character feature extraction means, and calculating and outputting, by a predetermined procedure, recognition evaluation values indicating the similarity between the two; candidate word appearance frequency output means for selecting, as candidate words, a predetermined number of words whose constituent characters were given a high degree of recognition by the recognition evaluation value calculation means, and outputting the appearance frequencies stored for those words in the word dictionary; word evaluation means for calculating a word evaluation value for each candidate word by a predetermined procedure from the recognition evaluation values output by the recognition evaluation value calculation means and the appearance frequencies output by the candidate word appearance frequency output means; word determination means for selecting and determining the correct word on the basis of the word evaluation values calculated by the word evaluation means; word correction means for allowing the user to correct the word determined by the word determination means to the correct word when it is wrong; and evaluation procedure correction means for changing, when the word determined by the word determination means is wrong, the weight on the appearance frequency that the word evaluation means uses in calculating the word evaluation value from the recognition evaluation values and the appearance frequency.

【請求項３】文字を認識するのに使用する特徴パター
ンを抽出する文字特徴抽出手段と、文字コードと該文字コードの文字であることを識別する
のに使用される識別パターンの組を登録した文字認識辞
書と、前記文字特徴抽出手段によって抽出された特徴パターン
と前記文字認識辞書に登録されている識別パターンとを
比較して所定の手順で認識評価値を計算することによ
り、抽出した１文字当たりあらかじめ定められた個数以
下の認識候補文字につき、その文字コード及びその確か
らしさを示す認識評価値からなる認識候補文字データを
出力する仮文字認識手段と、各文字毎に、その文字と組み合わせて単語を作る文字の
文字コードとその出現頻度の組を登録している単語辞書
と、相連続して認識対象となっている複数の文字毎に、前記
仮文字認識手段によって出力された各認識候補文字を文
字列中の位置を不変としたまま組み合わせることにより
単語を作成し、前記単語辞書中にこの単語が登録されて
いるか否かを調べ、若し登録されているならばその文字
と単語辞書に登録されている出現頻度を出力する単語探
索手段と、前記単語辞書に登録されている単語について、前記仮文
字認識手段によって出力された認識評価値のうち当該単
語を構成する認識候補文字の認識評価値と、前記単語探
索手段によって出力された当該単語の出現頻度とから、
所定の手順で単語評価値を計算し、前記単語辞書に登録
されていない単語について、前記仮文字認識手段によっ
て出力された認識評価値のうち当該単語を構成する認識
候補文字の認識評価値のみから所定の手順で単語評価値
を計算する単語評価手段と、前記単語辞書に登録されている単語の前記単語評価値
と、前記単語辞書に登録されていない単語の前記単語評
価値とを基にし、これらの単語の中から正しい単語を決
定する単語決定手段とを備えることにより文字認識を行
うことを特徴とする文字認識装置。3. A character recognition device that performs character recognition by comprising: character feature extraction means for extracting a feature pattern used to recognize a character; a character recognition dictionary in which sets of a character code and an identification pattern used to identify a character as the character of that character code are registered; provisional character recognition means for comparing the feature pattern extracted by the character feature extraction means with the identification patterns registered in the character recognition dictionary and calculating recognition evaluation values by a predetermined procedure, thereby outputting, for no more than a predetermined number of recognition candidate characters per extracted character, recognition candidate character data consisting of the character code and a recognition evaluation value indicating its likelihood; a word dictionary in which, for each character, sets of the character code of a character that combines with that character to form a word and the corresponding appearance frequency are registered; word search means for creating words, for each group of consecutive characters to be recognized, by combining the recognition candidate characters output by the provisional character recognition means with their positions in the character string unchanged, checking whether each such word is registered in the word dictionary, and, if it is registered, outputting its characters and the appearance frequency registered in the word dictionary; word evaluation means for calculating a word evaluation value by a predetermined procedure, for a word registered in the word dictionary, from those of the recognition evaluation values output by the provisional character recognition means that belong to the recognition candidate characters constituting the word and the appearance frequency of the word output by the word search means, and for calculating a word evaluation value by a predetermined procedure, for a word not registered in the word dictionary, solely from those of the recognition evaluation values output by the provisional character recognition means that belong to the recognition candidate characters constituting the word; and word determination means for determining the correct word from among these words on the basis of the word evaluation values of the words registered in the word dictionary and the word evaluation values of the words not registered in the word dictionary.

【請求項４】前記単語決定手段によって決定された単
語が誤っていた場合には、使用者に正しい単語に修正可
能とさせる単語修正手段と、前記単語決定手段によって決定された単語が誤っていた
場合には、前記単語評価手段において認識評価値と出現
頻度とから単語評価値を算出する際に用いる出現頻度に
関する重みを変化させる評価手順修正手段とを備えるこ
とを特徴とする請求項３記載の文字認識装置。4. The character recognition device according to claim 3, further comprising: word correction means for allowing the user to correct the word determined by the word determination means to the correct word when it is wrong; and evaluation procedure correction means for changing, when the word determined by the word determination means is wrong, the weight on the appearance frequency that the word evaluation means uses in calculating the word evaluation value from the recognition evaluation values and the appearance frequency.

【請求項５】前記単語決定手段によって決定された単
語が誤っていた場合には、使用者に正しい単語に修正可
能とさせる単語修正手段と、誤って決定された単語及び修正後の正しい単語の少なく
も一については、前記単語辞書に登録されておれば出現
頻度を更新すること及び登録されていなければ単語その
ものと出現頻度を新規に登録することの少なくも一を行
う単語辞書更新手段とを備えたことを特徴とする請求項
１から請求項４のうちのいずれか一項に記載の文字認識
装置。5. The character recognition device according to any one of claims 1 to 4, further comprising: word correction means for allowing the user to correct the word determined by the word determination means to the correct word when it is wrong; and word dictionary update means for performing, for at least one of the erroneously determined word and the corrected correct word, at least one of updating its appearance frequency if it is registered in the word dictionary and newly registering the word itself and its appearance frequency if it is not registered.

【請求項６】特徴パターンを抽出する文字特徴抽出ス
テップと、前記文字特徴抽出ステップによって得られた特徴パター
ンとあらかじめ作成されている文字コードと該文字コー
ドの文字であることを識別するのに使用される識別パタ
ーンの組である文字データが登録されている文字認識辞
書内の識別パターンとを比較する比較ステップと、文字認識辞書の文字識別パターンとの比較により、認識
対象の１文字当たりあらかじめ定められた個数以下の識
別候補文字につきその文字コードおよびその確からしさ
を示す認識評価値からなる認識候補文字データを得る仮
文字認識ステップと、前記単語探索ステップによって探索された単語につい
て、前記仮文字認識ステップによって得られた認識評価
値のうち単語を構成する認識候補文字の認識評価値と、
前記単語探索ステップによって得られた単語の出現頻度
とから、所定の手順にて単語評価値を得る単語評価ステ
ップと、前記単語選択手段を用いて、前記単語評価ステップにお
いて得られた単語評価値をもとに正しい単語を選択する
単語決定ステップと、前記単語決定ステップによって決定された単語が誤って
いた場合には、使用者に正しい単語に修正可能とさせる
単語修正ステップと、前記単語決定ステップによって決定された単語が誤って
いた場合には、前記単語評価ステップにおいて認識評価
値と出現頻度とから単語評価値を算出する際に用いる出
現頻度に関する重みを変化させる評価手順修正ステップ
とを有することにより文字認識を行うことを特徴とする
文字認識方法。6. A character recognition method that performs character recognition by comprising: a character feature extraction step of extracting a feature pattern; a comparison step of comparing the feature pattern obtained in the character feature extraction step with the identification patterns in a character recognition dictionary prepared in advance, in which character data, each item being a set of a character code and an identification pattern used to identify a character as the character of that character code, is registered; a provisional character recognition step of obtaining, through the comparison with the character identification patterns of the character recognition dictionary, recognition candidate character data consisting of a character code and a recognition evaluation value indicating its likelihood for no more than a predetermined number of identification candidate characters per character to be recognized; a word evaluation step of obtaining a word evaluation value by a predetermined procedure, for each word found in the word search step, from those of the recognition evaluation values obtained in the provisional character recognition step that belong to the recognition candidate characters constituting the word and the appearance frequency of the word obtained in the word search step; a word determination step of selecting the correct word, using the word selection means, on the basis of the word evaluation values obtained in the word evaluation step; a word correction step of allowing the user to correct the word determined in the word determination step to the correct word when it is wrong; and an evaluation procedure correction step of changing, when the word determined in the word determination step is wrong, the weight on the appearance frequency used in the word evaluation step in calculating the word evaluation value from the recognition evaluation values and the appearance frequency.

【請求項７】文字を認識するのに使用する特徴パター
ンを得る文字特徴抽出ステップと、各文字毎にその文字と組み合わせて単語を作る文字の文
字コードとその出現頻度の組を登録してあらかじめ作成
されている単語辞書を用いて、その単語辞書に格納され
ている単語について、文字コードとその文字コードの文
字であることを識別するのに使用される識別パターンと
の組を登録してあらかじめ作成されている文字認識辞書
を使用して、単語辞書に登録されている単語についてこ
れを構成する文字コードに対応する前記文字認識辞書に
格納されている識別パターンと前記文字特徴抽出ステッ
プによって得られた特徴パターンとを比較して、単語を
構成する文字の確からしさを示す認識評価値を計算の上
出力する認識評価値計算ステップと、前記認識評価値計算ステップにて、構成する文字の認識
度が高いとされた単語を候補単語として所定数選出し、
この上でこの候補単語について、前記単語辞書に格納さ
れている出現頻度を得た上で出力する単語認識ステップ
と、前記単語認識ステップによって選出された各候補単語に
ついて、前記認識評価値計算ステップによって得られた
各単語を構成する文字毎の認識評価値と、各単語の出現
頻度とから、所定の手順で単語評価値を得る単語評価ス
テップと、前記単語評価ステップにおいて得られた単語評価値をも
とに正しいと判断される単語を選択の上決定し、この決
定した単語を出力する単語決定ステップと、前記単語決定ステップによって決定された単語が誤って
いた場合には、使用者に正しい単語に修正可能とさせる
単語修正ステップと、前記単語決定ステップによって決定された単語が誤って
いた場合には、前記単語評価ステップにおいて認識評価
値と出現頻度とから単語評価値を算出する際に用いる出
現頻度に関する重みを変化させる評価手順修正ステップ
とを有していることを特徴とする文字認識方法。7. A character recognition method comprising: a character feature extraction step of obtaining a feature pattern used to recognize a character; a recognition evaluation value calculation step of using a word dictionary prepared in advance, in which, for each character, sets of the character code of a character that combines with that character to form a word and the corresponding appearance frequency are registered, and a character recognition dictionary prepared in advance, in which sets of a character code and an identification pattern used to identify a character as the character of that character code are registered, to compare, for each word registered in the word dictionary, the identification patterns stored in the character recognition dictionary for the character codes constituting that word with the feature patterns obtained in the character feature extraction step, and of calculating and outputting recognition evaluation values indicating the likelihood of the characters constituting the word; a word recognition step of selecting, as candidate words, a predetermined number of words whose constituent characters were found in the recognition evaluation value calculation step to have a high degree of recognition, and of obtaining and outputting the appearance frequencies stored in the word dictionary for those candidate words; a word evaluation step of obtaining a word evaluation value by a predetermined procedure, for each candidate word selected in the word recognition step, from the recognition evaluation values obtained in the recognition evaluation value calculation step for the characters constituting the word and the appearance frequency of the word; a word determination step of selecting and determining the word judged to be correct on the basis of the word evaluation values obtained in the word evaluation step, and outputting the determined word; a word correction step of allowing the user to correct the word determined in the word determination step to the correct word when it is wrong; and an evaluation procedure correction step of changing, when the word determined in the word determination step is wrong, the weight on the appearance frequency used in the word evaluation step in calculating the word evaluation value from the recognition evaluation values and the appearance frequency.

【請求項８】特徴パターンを抽出する文字特徴抽出ス
テップと、前記文字特徴抽出ステップによって得られた特徴パター
ンとあらかじめ作成されている文字コードと該文字コー
ドの文字であることを識別するのに使用される識別パタ
ーンの組である文字データが登録されている文字認識辞
書内の識別パターンとを比較する比較ステップと、文字認識辞書の文字識別パターンとの比較により、認識
対象の１文字当たりあらかじめ定められた個数以下の識
別候補文字につきその文字コードおよびその確からしさ
を示す認識評価値からなる認識候補文字データを得る仮
文字認識ステップと、相連続して認識対象となっている複数の文字毎に、前記
仮文字認識ステップによって出力された各認識候補文字
を文字列中の位置を不変としたまま組み合わせることに
より単語を作成し、前記単語辞書中にこの単語が登録さ
れているか否かを調べ、若し登録されているならばその
文字と単語辞書に登録されている出現頻度を出力する単
語探索ステップと、前記単語辞書に登録されている単語について、前記仮文
字認識ステップによって得られた認識評価値のうち当該
単語を構成する認識候補文字の認識評価値と、前記単語
探索ステップによって出力された当該単語の出現頻度と
から、所定の手順にて単語評価値を計算し、前記単語辞
書に登録されていない単語について、前記仮文字認識ス
テップによって得られた認識評価値のうち当該単語を構
成する認識候補文字の認識評価値のみから所定の手順で
単語評価値を計算する単語評価ステップと、前記単語辞書に登録されている単語の前記単語評価値
と、前記単語辞書に登録されていない単語の前記単語評
価値とを基にし、これらの単語の中から正しい単語を決
定する単語決定ステップとを有することにより文字認識
を行うことを特徴とする文字認識方法。8. A character recognition method that performs character recognition by comprising: a character feature extraction step of extracting a feature pattern; a comparison step of comparing the feature pattern obtained in the character feature extraction step with the identification patterns in a character recognition dictionary prepared in advance, in which character data, each item being a set of a character code and an identification pattern used to identify a character as the character of that character code, is registered; a provisional character recognition step of obtaining, through the comparison with the character identification patterns of the character recognition dictionary, recognition candidate character data consisting of a character code and a recognition evaluation value indicating its likelihood for no more than a predetermined number of identification candidate characters per character to be recognized; a word search step of creating words, for each group of consecutive characters to be recognized, by combining the recognition candidate characters output in the provisional character recognition step with their positions in the character string unchanged, checking whether each such word is registered in the word dictionary, and, if it is registered, outputting its characters and the appearance frequency registered in the word dictionary; a word evaluation step of calculating a word evaluation value by a predetermined procedure, for a word registered in the word dictionary, from those of the recognition evaluation values obtained in the provisional character recognition step that belong to the recognition candidate characters constituting the word and the appearance frequency of the word output in the word search step, and of calculating a word evaluation value by a predetermined procedure, for a word not registered in the word dictionary, solely from those of the recognition evaluation values obtained in the provisional character recognition step that belong to the recognition candidate characters constituting the word; and a word determination step of determining the correct word from among these words on the basis of the word evaluation values of the words registered in the word dictionary and the word evaluation values of the words not registered in the word dictionary.

【請求項９】前記単語決定ステップによって決定され
た単語が誤っていた場合には、使用者に正しい単語に修
正可能とさせる単語修正ステップと、前記単語決定ステップによって決定された単語が誤って
いた場合には、前記単語評価ステップにおいて認識評価
値と出現頻度とから単語評価値を算出する際に用いる出
現頻度に関する重みを変化させる評価手順修正ステップ
とを有していることを特徴とする請求項８記載の文字認
識方法。9. The character recognition method according to claim 8, further comprising: a word correction step of allowing the user to correct the word determined in the word determination step to the correct word when it is wrong; and an evaluation procedure correction step of changing, when the word determined in the word determination step is wrong, the weight on the appearance frequency used in the word evaluation step in calculating the word evaluation value from the recognition evaluation values and the appearance frequency.

【請求項１０】前記単語決定ステップによって決定さ
れた単語が誤っていた場合には使用者が正しい単語に修
正する単語修正ステップと、前記単語決定ステップによって決定された単語及び前記
単語修正ステップによって修正された単語の少なくも一
について、前記単語辞書更新手段を用いて、前記単語辞
書に登録されている単語であれば出現頻度を更新するこ
と及び登録されていない単語であれば単語そのものと出
現頻度とを登録することの少なくも一を行う単語辞書更
新ステップとを備えたことを特徴とする請求項６から請
求項９のうちいずれか一項に記載の文字認識方法。10. The character recognition method according to any one of claims 6 to 9, further comprising: a word correction step in which the user corrects the word determined in the word determination step to the correct word when it is wrong; and a word dictionary update step of performing, using the word dictionary update means, for at least one of the word determined in the word determination step and the word corrected in the word correction step, at least one of updating its appearance frequency if the word is registered in the word dictionary and registering the word itself and its appearance frequency if it is not registered.