JPH02264388A

JPH02264388A - Character recognition post-processing device

Info

Publication number: JPH02264388A
Application number: JP1084126A
Authority: JP
Inventors: Mamoru Okada; 守岡田; Atsuko Kurihara; 栗原　敦子; Sueshige Harada; 季栄原田
Original assignee: N T T DATA TSUSHIN KK; NTT Data Communications Systems Corp
Current assignee: N T T DATA TSUSHIN KK; NTT Data Group Corp
Priority date: 1989-04-04
Filing date: 1989-04-04
Publication date: 1990-10-29

Abstract

PURPOSE: To estimate correct characters accurately and quickly by associating words from recognition candidate characters, comparing each character of those words with the recognition candidate characters, and estimating the correct characters from the recognition candidate characters. CONSTITUTION: A retrieval part 2 associates words that contain a given character at any position, retrieving words by using the candidate characters as retrieval keys in order from the highest recognition rank. A collating part 4 then compares each character of each word with the recognition candidate characters to calculate the likelihood of each word, and a processing part 5 estimates the correct characters from those likelihoods. Thus, even when the correct character is missing from the recognition candidate characters at some character positions, the corresponding word is still associated and retrieved. Because only the characters of the retrieved words are compared with the recognition candidate characters at the corresponding positions, rather than all combinations of recognition candidate characters, the correct characters can be estimated accurately and quickly.

Description

[Detailed Description of the Invention] [Field of Industrial Application] The present invention relates to a character recognition post-processing device that estimates the correct characters by consulting a word dictionary with the recognition candidate characters output by a character recognition device.

[Prior Art] Conventional devices of this type have used the following methods. In methods that match words against recognition candidate characters, every combination of recognition candidate characters was enumerated, and the combination with the highest degree of matching was selected from among them. Consequently, estimating the correct characters from a recognition result that is N characters long with M recognition candidate characters at each character position requires considering $M^N$ combinations. With N = M = 10, words had to be matched against an enormous $10^{10}$ combinations, so matching took a long time. In methods that search for words from the recognition candidate characters, only words beginning at the matching start character position were searched. Consequently, if the correct character was absent from the recognition candidate characters at the matching start character position, matching between the retrieved words and the recognition candidate characters always failed, and the ability to estimate the correct characters dropped sharply.
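
To make the size of that exhaustive search concrete, the following minimal Python sketch counts the candidate strings for a given string length and candidate count. The function name is illustrative and does not come from the patent.

```python
def exhaustive_combinations(n_chars: int, m_candidates: int) -> int:
    """Number of candidate strings when each of n_chars positions has m_candidates choices."""
    return m_candidates ** n_chars


# N = M = 10, the case discussed in the text.
print(exhaustive_combinations(10, 10))  # 10000000000
```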

[Problems to be Solved by the Invention] An object of the present invention is to provide a character recognition post-processing means that, in a method of searching for words from recognition candidate characters and matching those words against the recognition candidate characters, improves the ability to estimate the correct characters and shortens the processing time.

[Means for Solving the Problems] The principal feature of the present invention is that it comprises a search unit that associates words containing a given character at any character position and searches for words by using the candidate characters as search keys in order from the highest recognition rank, a matching unit that calculates the likelihood of each word by comparing each character in the word with the recognition candidate characters, and a processing unit that estimates the correct characters from the likelihood of each word.

[Operation] The present invention associates and retrieves words even when the correct character is not included among the recognition candidate characters at some character positions, and it estimates the correct characters without examining every combination of recognition candidate characters.

[Embodiment] FIG. 1 is a diagram explaining the present invention in detail. In the figure, 1 is a memory that stores the recognition candidate characters, 2 is a search unit that searches for matching words using the recognition candidate characters as search keys, 3 is a dictionary unit that stores words, 4 is a matching unit that matches the words obtained by the search unit 2 against the recognition candidate characters stored in memory 1, 5 is a processing unit that receives the results of the matching unit 4 and estimates the most likely word, and 6 is a memory that stores the matching results.

In operation, candidate characters are first input into memory 1 from an external character recognition device. If the character string is N characters long and each character has M candidate characters, memory 1 can be built with a capacity of (N × M × 2) bytes.
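
As a rough illustration of that capacity figure, the sketch below packs an N × M candidate table into a flat buffer at 2 bytes per character. The candidate table is hypothetical, and UTF-16 is used only as a convenient fixed 2-byte code; the patent does not name a character encoding.

```python
def pack_candidates(lattice: list[list[str]]) -> bytes:
    """Pack an N x M candidate table into a flat buffer, 2 bytes per character."""
    m = len(lattice[0])
    buf = bytearray()
    for candidates in lattice:
        if len(candidates) != m:
            raise ValueError("every position needs the same number of candidates")
        for ch in candidates:
            code = ch.encode("utf-16-le")  # assumption: stand-in for any 2-byte code
            if len(code) != 2:
                raise ValueError(f"{ch!r} does not fit in 2 bytes")
            buf += code
    return bytes(buf)


# Hypothetical table with N = 3 positions and M = 5 candidates per position.
LATTICE = [
    ["捕", "補", "浦", "袖", "押"],
    ["奈", "余", "条", "茶", "宗"],
    ["す", "川", "州", "小", "以"],
]
packed = pack_candidates(LATTICE)
print(len(packed))  # 30, i.e. 3 x 5 x 2 bytes
```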

Next, the search unit 2 operates. FIG. 2 is a diagram explaining the operation of the search unit 2, in which 20 is an example of the recognition candidate characters for the character string 「神奈川県横須賀市」, 21 and 22 are matching start positions, and 23 and 24 are inspection ranges. In FIG. 2, the search unit 2 sets S = 1 in the character string direction, that is, it sets the character position to matching start position 21, and it sets the inspection range n in the character string direction and the inspection range m in the recognition rank direction (n = 3 and m = 5 in the example of FIG. 2). Next, among the candidate characters contained in inspection range 23, it uses the characters with recognition rank R = 1, namely 「捕」, 「奈」, and 「す」, as search keys to retrieve words from the dictionary unit 3 and sends those words to the matching unit 4.
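
The key-selection step can be sketched in a few lines of Python. The candidate table below is hypothetical apart from the three rank-1 characters named in the text, and the function name and 1-based arguments (mirroring S, n, and R) are assumptions made for illustration.

```python
def search_keys(lattice, s, n, r):
    """Return the rank-r candidates for the n positions starting at position s.

    s and r are 1-based, mirroring S and R in the description.
    """
    window = lattice[s - 1 : s - 1 + n]
    return [cands[r - 1] for cands in window if len(cands) >= r]


# Hypothetical candidates; only the rank-1 column of the first three rows
# (捕, 奈, す) is taken from the description.
LATTICE = [
    ["捕", "補", "浦", "袖", "押"],
    ["奈", "余", "条", "茶", "宗"],
    ["す", "川", "州", "小", "以"],
    ["系", "県", "具", "呉", "員"],
]

print(search_keys(LATTICE, s=1, n=3, r=1))  # ['捕', '奈', 'す']
```

Raising `r` by one selects the next column of the same window, which is how lower-ranked candidates come into play when the rank-1 keys yield no usable word.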

FIG. 3 is a diagram showing how the dictionary is organized, in which 30 is the heading part and 31 is the word notation part. As FIG. 3 shows, the word notation part 31 holds the words that contain the character in the heading part 30, so words can be associated from the character in the heading part 30 regardless of where that character occurs in the word. This organization increases the dictionary size, but because the average word length is empirically around 2 to 3 characters, the size grows by a factor of 2 to 3 at most.
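
A dictionary organized this way behaves like an index from each character to the words that contain it. The following is a minimal sketch under that reading; the function name is an assumption, and the word list simply reuses the example words that appear in this description.

```python
from collections import defaultdict


def build_dictionary(words):
    """Map each heading character to every word that contains it at any position."""
    index = defaultdict(list)
    for word in words:
        for ch in dict.fromkeys(word):  # each distinct character of the word, once
            index[ch].append(word)
    return dict(index)


WORDS = ["捕手", "捕鯨", "捕鯨船", "逮捕", "奈良", "神奈川", "神奈川系"]
index = build_dictionary(WORDS)

print(index["捕"])  # ['捕手', '捕鯨', '捕鯨船', '逮捕']
print(index["奈"])  # ['奈良', '神奈川', '神奈川系']
print(sum(len(v) for v in index.values()), "entries for", len(WORDS), "words")
```

Each word is stored once per distinct character, so the number of entries is roughly the word count times the average word length, which matches the 2 to 3 times growth estimated in the text.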

When the matching unit 4 receives words from the search unit 2, it matches each word against the candidate characters stored in memory 1 and calculates the likelihood of each word. FIG. 4 is a diagram showing retrieved words and an example of the likelihood calculation, in which 40 is an example of the words retrieved with 「捕」 and 41 is an example of the words retrieved with 「奈」. In this embodiment, no words are retrieved with 「す」. The detailed operation shown in FIG. 4 is described below.

The matching unit 4 sets weights $w_1, w_2, w_3, w_4, w_5$ for recognition candidate ranks 1 through 5. In this embodiment, rank $i$ is assigned the weight $w_i = 2^{5-i}$, that is, $w_1 = 16$, $w_2 = 8$, $w_3 = 4$, $w_4 = 2$, and $w_5 = 1$.
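
As a small sketch of this weighting, the function below generates one weight per rank, assuming the weights halve with each rank so that rank 1 receives 16 and rank 5 receives 1. The function name and the generalization to an arbitrary number of ranks are illustrative assumptions.

```python
def rank_weights(m: int = 5) -> list[int]:
    """Weights for ranks 1..m, halving with each rank and ending at 1."""
    return [2 ** (m - i) for i in range(1, m + 1)]


weights = rank_weights()
print(weights)  # [16, 8, 4, 2, 1]
```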

First, consider the words 40 retrieved with 「捕」, aligning the head of each word with matching start position 21 in FIG. 2. For the words 「捕手」 and 「捕鯨」, the second characters 「手」 and 「鯨」 match none of the recognition candidate characters at the corresponding character position, so the likelihood of each is $K = w_1 - w_1 = 0$. Similarly, the likelihoods of the words 「捕鯨船」 and 「逮捕」 are $K = -16$ and $K = -32$, respectively.

Next, applying the same procedure to the words 41 retrieved with 「奈」 gives likelihoods of $K = -32$, $K = 8$, and $K = 24$ for 「奈良」, 「神奈川」, and 「神奈川系」, respectively. The weights for the ranks need only satisfy $w_1 \ge w_2 \ge \dots \ge w_m > 0$ (where m is the number of candidate ranks), and the values can be changed to suit the characteristics of the character recognition device.
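
The likelihood calculation can be sketched as follows. The candidate table is hypothetical, since FIG. 2 is not reproduced here; it was chosen so that the sketch yields the likelihoods quoted above. The penalty for an unmatched character is assumed to be the rank-1 weight (16), an assumption that is consistent with those quoted values. The function name is illustrative.

```python
WEIGHTS = [16, 8, 4, 2, 1]  # weights for ranks 1..5

# Hypothetical candidates for the first four character positions.
LATTICE = [
    ["捕", "補", "浦", "袖", "押"],
    ["奈", "余", "条", "茶", "宗"],
    ["す", "川", "州", "小", "以"],
    ["系", "県", "具", "呉", "員"],
]


def likelihood(word, lattice, start, weights):
    """Score a word whose head is aligned with 1-based position `start`."""
    score = 0
    for offset, ch in enumerate(word):
        pos = start - 1 + offset
        cands = lattice[pos][: len(weights)] if pos < len(lattice) else []
        if ch in cands:
            score += weights[cands.index(ch)]  # matched at rank i: add w_i
        else:
            score -= weights[0]  # assumption: a miss costs the rank-1 weight
    return score


for word in ["捕手", "捕鯨", "捕鯨船", "逮捕", "奈良", "神奈川", "神奈川系"]:
    print(word, likelihood(word, LATTICE, 1, WEIGHTS))
```

Note that 「神奈川」 still scores positively even though its first character is absent from the candidates at the first position, because the matches at the second and third positions outweigh the single miss.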

The matching unit 4 reports each word and its calculated likelihood to the processing unit 5, and the processing unit 5 then operates. Among the words received from the matching unit 4, the processing unit 5 looks for the word whose likelihood is positive and largest. If such a word exists, it is written to memory 6 as a confirmed word. In the example of FIG. 4, the word 「神奈川系」 is detected.

Because the word 「神奈川系」 is four characters long, the position is advanced by four characters from matching start position 21 in FIG. 2 (that is, S ← S + 4), the new matching start position 22 is reported to the search unit 2, and the search unit repeats the operation described above. If, on the other hand, the matching unit 4 cannot detect a word with a positive likelihood, it advances the recognition rank by 1 (that is, R ← R + 1) and requests the search unit 2 to search for new words. If this repetition reaches R = 5 without detecting a word that has a positive maximum likelihood, the first-rank candidate character at the matching start position is written to memory 6 as an unknown word, the matching start position is advanced by 1 (that is, S ← S + 1), the recognition rank is reset to its initial value (that is, R ← 1), and the search unit 2 is requested to search for new words. The operation ends when the processing unit 5 has written into memory 6 a character string of the same length as the character string input to memory 1.
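
The control flow above can be sketched end to end as follows. This is an illustration under stated assumptions, not the patented implementation: the candidate table and word list are hypothetical, an unmatched character is assumed to cost the rank-1 weight, words that would run past the end of the input are skipped, and positions are 0-based inside the code.

```python
from collections import defaultdict


def build_index(words):
    """Character -> words containing that character at any position."""
    index = defaultdict(list)
    for w in words:
        for ch in dict.fromkeys(w):
            index[ch].append(w)
    return index


def score(word, lattice, start, weights):
    """Likelihood of `word` with its head aligned at 0-based position `start`."""
    total = 0
    for offset, ch in enumerate(word):
        cands = lattice[start + offset][: len(weights)]
        # Assumption: a miss costs the rank-1 weight.
        total += weights[cands.index(ch)] if ch in cands else -weights[0]
    return total


def post_process(lattice, words, n=3, weights=(16, 8, 4, 2, 1)):
    index = build_index(words)
    output = []  # plays the role of memory 6
    s = 0        # matching start position
    while s < len(lattice):
        best_word, best_score = None, 0
        for r in range(len(weights)):  # recognition rank R = 1..m
            keys = [c[r] for c in lattice[s : s + n] if len(c) > r]
            found = {w for k in keys for w in index.get(k, [])}
            for w in sorted(found):
                if s + len(w) > len(lattice):
                    continue  # assumption: ignore words that overrun the input
                k = score(w, lattice, s, weights)
                if k > best_score:  # keep only positive, maximal likelihoods
                    best_word, best_score = w, k
            if best_word is not None:
                break
        if best_word is not None:
            output.append((best_word, "confirmed"))
            s += len(best_word)
        else:
            output.append((lattice[s][0], "unknown"))  # rank-1 candidate
            s += 1
    return output


# Hypothetical candidate table (8 positions, 5 ranks) and word list.
LATTICE = [
    ["捕", "補", "浦", "袖", "押"], ["奈", "余", "条", "茶", "宗"],
    ["す", "川", "州", "小", "以"], ["系", "県", "具", "呉", "員"],
    ["横", "積", "模", "棋", "構"], ["頁", "須", "順", "項", "預"],
    ["賀", "貿", "資", "質", "賃"], ["市", "布", "巾", "帝", "吊"],
]
WORDS = ["捕手", "捕鯨", "捕鯨船", "逮捕", "奈良", "神奈川", "横須賀"]

result = post_process(LATTICE, WORDS)
print(result)
```

With this hypothetical input, 「神奈川」 and 「横須賀」 are written as confirmed words, while the fourth and eighth positions fall through all five ranks and are written as unknown single characters, so the output covers all eight input positions.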

As the above operation makes clear, the invention improves on the prior art in two respects. The corresponding word can be associated and retrieved even when the correct character is missing from the recognition candidate characters at some character positions, and the correct characters can be estimated by comparing only the characters of the retrieved words with the recognition candidate characters at the corresponding positions, rather than by examining every combination of recognition candidate characters.

[Effect of the Invention] As described above, words are associated from the recognition candidate characters, and each character of those words is compared with the recognition candidate characters to estimate the correct characters. The invention therefore has the advantage that the correct characters can be estimated accurately and quickly even when the correct character is missing at some character positions, in particular at the head of a word.

[Brief Description of the Drawings]

FIG. 1 is a detailed explanatory diagram of the present invention, FIG. 2 is a diagram explaining the operation of the search unit 2, FIG. 3 is a diagram explaining the configuration of the dictionary unit 3, and FIG. 4 is a diagram explaining the operation of the matching unit 4. 20: example of recognition candidate characters; 21, 22: matching start positions; 23, 24: search ranges; 30: heading part; 31: notation part; 40, 41: examples of words and likelihood calculation.

Claims

[Claims]

A character recognition post-processing device that estimates correct characters from a character string that is N characters long and has M recognition candidate characters at each character position, the device comprising: a search unit that, for each candidate character from the first rank to the m-th rank (m ≤ M) among the candidate characters in the n (n ≤ N) character positions counted from the character matching start position, searches for words containing that candidate character; a matching unit that sets, for each rank from the first to the m-th, positive values satisfying $w_1 \ge w_2 \ge \dots \ge w_m > 0$, and that obtains the likelihood of each word retrieved by the search unit by, for each character contained in the word, adding the positive value $w_i$ if a candidate character matching that character exists at the i-th rank of the same character position and adding the negative value $-w_i$ if no matching candidate character exists; and a processing unit that estimates the correct characters from the likelihoods obtained by the matching unit.