JPH02264388A - Character recognition post-processing device - Google Patents

Character recognition post-processing device

Info

Publication number
JPH02264388A
Authority
JP
Japan
Prior art keywords
character
word
recognition
recognition candidate
correct
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP1084126A
Other languages
Japanese (ja)
Inventor
Mamoru Okada
守 岡田
Atsuko Kurihara
栗原 敦子
Sueshige Harada
季栄 原田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
N T T DATA TSUSHIN KK
NTT Data Group Corp
Original Assignee
N T T DATA TSUSHIN KK
NTT Data Communications Systems Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by N T T DATA TSUSHIN KK, NTT Data Communications Systems Corp
Priority to JP1084126A
Publication of JPH02264388A
Legal status: Pending

Landscapes

  • Character Discrimination (AREA)

Abstract

PURPOSE: To estimate the correct characters accurately and quickly by associating words from the recognition candidate characters, comparing each character of those words with the recognition candidate characters, and estimating the correct characters from the candidates. CONSTITUTION: A search unit 2 associates words that contain a given character at any character position, using the candidate characters as search keys in descending order of recognition rank. A collation unit 4 then compares each character of a retrieved word with the recognition candidate characters to calculate the likelihood of each word, and a processing unit 5 estimates the correct characters from those likelihoods. Thus, even when the correct character is missing from the recognition candidate characters at some character position, the corresponding word is still associated and retrieved. Because only the characters of the retrieved words are compared with the recognition candidate characters at the corresponding positions, rather than all combinations of recognition candidate characters, the correct characters can be estimated accurately and quickly.

Description

[Detailed Description of the Invention]
[Field of Industrial Application] The present invention relates to a character recognition post-processing device that estimates the correct characters by consulting a word dictionary using the recognition candidate characters output by a character recognition device.

[Prior Art] Conventional devices of this type have used the following methods. In the method of matching words against recognition candidate characters, every combination of recognition candidate characters was enumerated, and the combination with the highest degree of matching was selected. Consequently, to estimate the correct characters from a recognition result that is N characters long with M recognition candidate characters at each character position, M^N combinations must be considered. With N = M = 10, words had to be matched against an enormous 10^10 combinations, so matching took a long time. In the method of searching for words from recognition candidate characters, only words beginning at the matching start character position were searched. Consequently, if the correct character was absent from the recognition candidate characters at the matching start position, matching between the retrieved words and the recognition candidate characters always failed, and the ability to estimate the correct characters dropped markedly.
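The growth of the exhaustive approach can be checked with a few lines of Python. This is only an illustrative sketch: the function name `exhaustive_count` and the tiny two-candidate lattice are hypothetical and do not come from the patent.

```python
from itertools import product

def exhaustive_count(n_positions, m_candidates):
    """Number of candidate strings an exhaustive matcher must consider (M^N)."""
    return m_candidates ** n_positions

# Hypothetical lattice: 3 character positions, 2 candidates each.
small = [["捕", "補"], ["奈", "余"], ["す", "川"]]
all_strings = ["".join(combo) for combo in product(*small)]

print(len(all_strings), exhaustive_count(3, 2))  # 8 8
print(exhaustive_count(10, 10))                  # 10000000000
```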

[Problems to be Solved by the Invention] An object of the present invention is to provide a character recognition post-processing means that, in a method of searching for words from recognition candidate characters and matching those words against the recognition candidate characters, improves the ability to estimate the correct characters and shortens the processing time.

[Means for Solving the Problems] The principal feature of the present invention is that it comprises: a search unit that associates words containing a given character at any character position and searches for words using the candidate characters as search keys in descending order of recognition rank; a collation unit that calculates the likelihood of each word by comparing each character of the word with the recognition candidate characters; and a processing unit that estimates the correct characters from the likelihood of each word.

[Operation] The present invention associates and retrieves words even when the correct character is not among the recognition candidate characters at some character positions, and it estimates the correct characters without examining every combination of recognition candidate characters.

[Embodiment] FIG. 1 is a diagram illustrating the present invention, in which 1 is a memory that stores the recognition candidate characters, 2 is a search unit that searches for matching words using the recognition candidate characters as search keys, 3 is a dictionary unit that stores words, 4 is a collation unit that collates the words obtained by the search unit 2 against the recognition candidate characters stored in the memory 1, 5 is a processing unit that receives the results of the collation unit 4 and estimates the most probable word, and 6 is a memory that stores the collation results.

In operation, candidate characters are first input to the memory 1 from an external character recognition device. If the character string is N characters long and each character has M candidate characters, the memory 1 can be implemented with a capacity of (N × M × 2) bytes.
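As a rough illustration of that sizing, the sketch below lays out a candidate table as a flat buffer with two bytes per character. The names `memory1`, `store`, and `load`, the values of N and M, the zero-based indices, and the use of 16-bit Unicode code points as a stand-in for whatever two-byte character code the original device used are all assumptions made for this example.

```python
import struct

N, M = 8, 5            # hypothetical string length and candidates per position
BYTES_PER_CHAR = 2     # two bytes per candidate character, as in the text

# Flat buffer playing the role of memory 1: N * M * 2 bytes.
memory1 = bytearray(N * M * BYTES_PER_CHAR)

def store(pos, rank, ch):
    """Store candidate `ch` for 0-based position `pos` and 0-based rank `rank`."""
    offset = (pos * M + rank) * BYTES_PER_CHAR
    struct.pack_into("<H", memory1, offset, ord(ch))

def load(pos, rank):
    """Read back the candidate stored at (pos, rank)."""
    offset = (pos * M + rank) * BYTES_PER_CHAR
    (code,) = struct.unpack_from("<H", memory1, offset)
    return chr(code)

store(0, 0, "捕")
print(len(memory1), load(0, 0))  # 80 捕
```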

Next, the search unit 2 operates. FIG. 2 is a diagram explaining the operation of the search unit 2, in which 20 is an example of recognition candidate characters for the character string 「神奈川県横須賀市」 (Yokosuka City, Kanagawa Prefecture), 21 and 22 are matching start positions, and 23 and 24 are inspection ranges. In FIG. 2, the search unit 2 sets S = 1 in the character string direction, that is, it sets the character position to the matching start position 21, and it sets the inspection range n in the character string direction and the inspection range m in the recognition rank direction (n = 3 and m = 5 in the example of FIG. 2). Next, from the candidate characters within the inspection range 23, it takes the characters of recognition rank R = 1, namely 「捕」, 「奈」, and 「す」, as search keys, retrieves words from the dictionary unit 3, and sends them to the collation unit 4.
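The key selection just described can be sketched as follows. Only the first-rank characters 「捕」, 「奈」, and 「す」 come from the text; the lower-ranked candidates in `LATTICE` and the function name `search_keys` are hypothetical fill-ins, since FIG. 2 is not reproduced here.

```python
# Candidate lattice: one list per character position, best rank first.
# Lower-ranked entries are invented for illustration.
LATTICE = [
    ["捕", "補", "浦", "舗", "哺"],
    ["奈", "余", "祭", "茶", "宗"],
    ["す", "川", "州", "小", "リ"],
    ["県", "具", "見", "貝", "眞"],
]

def search_keys(lattice, start, rank, n):
    """Return the rank-`rank` candidate at each of the `n` positions from `start`.

    `start` (S) and `rank` (R) are 1-based, as in the text.
    """
    keys = []
    last = min(start - 1 + n, len(lattice))
    for pos in range(start - 1, last):
        candidates = lattice[pos]
        if rank <= len(candidates):
            keys.append(candidates[rank - 1])
    return keys

print(search_keys(LATTICE, start=1, rank=1, n=3))  # ['捕', '奈', 'す']
```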

FIG. 3 is a diagram showing how the dictionary is organized, in which 30 is a heading part and 31 is a word notation part. As FIG. 3 shows, the word notation part 31 holds the words that contain the character in the heading part 30, so words can readily be associated from a character in the heading part 30 regardless of where that character occurs in the word. This organization increases the dictionary size, but since, as a rule of thumb, the average word is about 2 to 3 characters long, the increase is at most a factor of 2 to 3.
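One way to picture this organization is as an index from each character to every word that contains it. The sketch below is an assumption about how such a structure might look in software, not the patent's actual storage format; `build_index` is a hypothetical name, and the word list reuses the example words that appear in the description of FIG. 4.

```python
from collections import defaultdict

def build_index(words):
    """Map each character (heading part) to the words containing it (notation part)."""
    index = defaultdict(list)
    for word in words:
        for ch in dict.fromkeys(word):  # each distinct character once, in order
            index[ch].append(word)
    return index

WORDS = ["捕手", "捕鯨", "捕鯨船", "逮捕", "奈良", "神奈川", "神奈川県"]
index = build_index(WORDS)

print(index["捕"])          # ['捕手', '捕鯨', '捕鯨船', '逮捕']
print(index["奈"])          # ['奈良', '神奈川', '神奈川県']
print(index.get("す", []))  # []

# Each word is stored once per distinct character, so the total number of
# entries grows with the average word length.
entries = sum(len(v) for v in index.values())
print(entries, len(WORDS))  # 18 7
```

Note that 「逮捕」 is reachable from 「捕」 even though 「捕」 is its second character, which is the position-independent association the text describes.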

On receiving words from the search unit 2, the collation unit 4 collates each word against the candidate characters stored in the memory 1 and calculates the likelihood of each word. FIG. 4 is a diagram showing retrieved words and an example of the likelihood calculation, in which 40 is an example of the words retrieved with 「捕」 and 41 is an example of the words retrieved with 「奈」. In this embodiment, no words are retrieved with 「す」. The detailed operation of FIG. 4 is described below.

The collation unit 4 sets weights w_1, w_2, w_3, w_4, w_5 for recognition candidate ranks 1 through 5. In this embodiment, the weight w_i = 2^(5-i) is assigned to rank i; that is, w_1 = 16, w_2 = 8, w_3 = 4, w_4 = 2, and w_5 = 1.
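The weight assignment of this embodiment is easy to reproduce. In the sketch below, the function name `rank_weights` and the generalization to an arbitrary number of ranks m are assumptions; the text only gives the case of five ranks.

```python
def rank_weights(m=5):
    """Weights w_1..w_m with w_i = 2^(m - i); for m = 5 this is 16, 8, 4, 2, 1."""
    return [2 ** (m - i) for i in range(1, m + 1)]

weights = rank_weights()
print(weights)  # [16, 8, 4, 2, 1]

# The weights are non-increasing and strictly positive.
ordered = all(a >= b for a, b in zip(weights, weights[1:]))
print(ordered and weights[-1] > 0)  # True
```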

First, consider the words 40 retrieved with 「捕」, and align the head of each word with the matching start position 21 in FIG. 2. For the words 「捕手」 (catcher) and 「捕鯨」 (whaling), the second characters 「手」 and 「鯨」 match none of the recognition candidate characters at the corresponding character position, so their likelihood is K = w_1 - w_1 = 0. Similarly, for the words 「捕鯨船」 (whaling ship) and 「逮捕」 (arrest), the likelihoods are K = -16 and K = -32, respectively.

Next, consider the words 41 retrieved with 「奈」. The same procedure gives likelihoods K of -32, 8, and 24 for 「奈良」 (Nara), 「神奈川」 (Kanagawa), and 「神奈川県」 (Kanagawa Prefecture), respectively. The weights for the ranks need only satisfy w_1 ≥ w_2 ≥ ... ≥ w_m > 0 (where m is the number of candidate ranks), and their values can be changed to suit the characteristics of the character recognition device.
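The likelihood values quoted for FIG. 4 can be reproduced with a short sketch. Several things here are assumptions rather than statements from the patent: the lower-ranked candidates in `LATTICE` are invented (only the first-rank characters are given in the text), 「神」 is deliberately absent from the first position, and a character with no matching candidate is penalized by -w_1, which is the reading implied by the worked example K = w_1 - w_1 = 0. The function name `likelihood` and the 0-based `start` index are also illustrative choices.

```python
WEIGHTS = [16, 8, 4, 2, 1]  # w_1..w_5 from the embodiment

# Hypothetical candidate lattice for the first four positions of FIG. 2.
LATTICE = [
    ["捕", "補", "浦", "舗", "哺"],
    ["奈", "余", "祭", "茶", "宗"],
    ["す", "川", "州", "小", "リ"],
    ["県", "具", "見", "貝", "眞"],
]

def likelihood(word, lattice, start, weights=WEIGHTS):
    """Likelihood K of `word` with its head aligned at 0-based position `start`."""
    k = 0
    for offset, ch in enumerate(word):
        pos = start + offset
        # Positions past the end of the lattice have no candidates at all.
        candidates = lattice[pos][:len(weights)] if pos < len(lattice) else []
        if ch in candidates:
            k += weights[candidates.index(ch)]  # matched at rank i: add w_i
        else:
            k -= weights[0]                     # no match: subtract w_1 (assumed)
    return k

for word in ["捕手", "捕鯨", "捕鯨船", "逮捕", "奈良", "神奈川", "神奈川県"]:
    print(word, likelihood(word, LATTICE, start=0))
# 捕手 0
# 捕鯨 0
# 捕鯨船 -16
# 逮捕 -32
# 奈良 -32
# 神奈川 8
# 神奈川県 24
```

Under these assumptions, 「神奈川県」 scores highest even though its first character never appears among the candidates, because the three matching characters outweigh the single penalty.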

The collation unit 4 passes each word and its calculated likelihood to the processing unit 5, which then operates. From the words received from the collation unit 4, the processing unit 5 finds the word whose likelihood is positive and largest. If such a word exists, it is written to the memory 6 as a confirmed word. In the example of FIG. 4, the word 「神奈川県」 is found.
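The selection rule of the processing unit 5 amounts to taking the maximum over the words with positive likelihood. The following sketch uses the likelihood values quoted for FIG. 4; the function name `select_confirmed` and the tie-breaking behavior (the first maximal word wins) are assumptions, since the text does not say how ties are resolved.

```python
def select_confirmed(scored):
    """Return the word with the largest positive likelihood, or None if there is none.

    `scored` is a list of (word, likelihood) pairs.
    """
    positive = [(word, k) for word, k in scored if k > 0]
    if not positive:
        return None
    return max(positive, key=lambda pair: pair[1])[0]

FIG4_SCORES = [
    ("捕手", 0), ("捕鯨", 0), ("捕鯨船", -16), ("逮捕", -32),
    ("奈良", -32), ("神奈川", 8), ("神奈川県", 24),
]

print(select_confirmed(FIG4_SCORES))                 # 神奈川県
print(select_confirmed([("捕手", 0), ("逮捕", -32)]))  # None
```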

Because the word 「神奈川県」 is four characters long, the position in FIG. 2 is advanced four characters from the matching start position 21 (that is, S ← S + 4), the new matching start position 22 is reported to the search unit 2, and the search unit repeats the operation described above. If, on the other hand, the collation unit 4 cannot find a word with a positive likelihood, it advances the recognition rank by 1 (that is, R ← R + 1) and requests the search unit 2 to search for new words. If, after repeating this, R = 5 is reached and still no word with a positive maximum likelihood is found, the first-rank candidate character at the matching start position is written to the memory 6 as an unknown word, the matching start position is advanced by 1 (that is, S ← S + 1), the recognition rank is reset to its initial value (that is, R ← 1), and the search unit 2 is requested to search for new words. This operation ends when the processing unit 5 has written to the memory 6 a character string of the same length as the character string input to the memory 1.
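Putting the pieces together, the overall control flow might be sketched as below. This is an interpretation under stated assumptions, not the patent's implementation: indices are 0-based, unmatched characters are penalized by -w_1, the rank R restarts at 1 after a word is confirmed, words that would run past the end of the input are skipped, ties are broken arbitrarily but deterministically, and all candidates other than the first-rank 「捕」, 「奈」, 「す」 are invented. The names `build_index`, `likelihood`, and `post_process` are hypothetical.

```python
from collections import defaultdict

WEIGHTS = [16, 8, 4, 2, 1]

def build_index(words):
    index = defaultdict(list)
    for word in words:
        for ch in dict.fromkeys(word):
            index[ch].append(word)
    return index

def likelihood(word, lattice, start):
    # The caller guarantees that the word fits inside the lattice.
    k = 0
    for offset, ch in enumerate(word):
        candidates = lattice[start + offset][:len(WEIGHTS)]
        k += WEIGHTS[candidates.index(ch)] if ch in candidates else -WEIGHTS[0]
    return k

def post_process(lattice, index, n=3, m=5):
    output = []  # plays the role of memory 6
    s = 0        # matching start position S (0-based here)
    while s < len(lattice):
        confirmed = None
        for r in range(m):  # recognition rank R = r + 1
            stop = min(s + n, len(lattice))
            keys = [lattice[p][r] for p in range(s, stop) if r < len(lattice[p])]
            words = {w for key in keys for w in index.get(key, [])
                     if s + len(w) <= len(lattice)}  # assumption: skip overruns
            positive = [(k, w) for k, w in
                        ((likelihood(w, lattice, s), w) for w in words) if k > 0]
            if positive:
                confirmed = max(positive)[1]
                break
        if confirmed is None:
            output.append((lattice[s][0], "unknown"))  # first-rank candidate
            s += 1
        else:
            output.append((confirmed, "confirmed"))
            s += len(confirmed)
    return output

LATTICE = [
    ["捕", "補", "浦", "舗", "哺"], ["奈", "余", "祭", "茶", "宗"],
    ["す", "川", "州", "小", "リ"], ["県", "具", "見", "貝", "眞"],
    ["横", "積", "構", "模", "樹"], ["須", "頃", "順", "項", "預"],
    ["賀", "貿", "買", "費", "資"], ["市", "布", "巾", "帝", "吊"],
]
INDEX = build_index(["捕手", "捕鯨", "捕鯨船", "逮捕", "奈良",
                     "神奈川", "神奈川県", "横須賀", "横須賀市"])

print(post_process(LATTICE, INDEX))
# [('神奈川県', 'confirmed'), ('横須賀市', 'confirmed')]
```

With this hypothetical input, the loop makes two passes: the first confirms 「神奈川県」 and advances S by four, and the second confirms 「横須賀市」, at which point the output is as long as the input and processing stops.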

As is clear from the above operation, the invention improves on the prior art in two respects: the relevant word can be associated and retrieved even if the correct character is not among the recognition candidate characters at some character positions, and the correct characters can be estimated merely by comparing the characters of the retrieved words with the recognition candidate characters at the corresponding positions, rather than by examining all combinations of recognition candidate characters.

[Effect of the Invention] As described above, words are associated from the recognition candidate characters, and each character of a word is compared with the recognition candidate characters to estimate the correct characters from the candidates. This has the advantage that the correct characters can be estimated accurately and quickly even when the correct character is missing at some character positions (in particular, at the head of a word).

[Brief Description of the Drawings]

FIG. 1 is an explanatory diagram of the present invention, FIG. 2 is a diagram explaining the operation of the search unit 2, FIG. 3 is a diagram explaining the configuration of the dictionary unit 3, and FIG. 4 is a diagram explaining the operation of the collation unit 4.
20: example of recognition candidate characters; 21, 22: matching start positions; 23, 24: search (inspection) ranges; 30: heading part; 31: notation part; 40, 41: examples of words and likelihood calculation.

Claims (1)

[Claims] A character recognition post-processing device that estimates the correct characters from a character string that is N characters long and has M recognition candidate characters at each character position, the device comprising: a search unit that, for each candidate character from the first rank to the m-th rank (m ≤ M) among the candidate characters within n (n ≤ N) character positions counted from the character matching start position, searches for words containing that candidate character; a collation unit that sets, for the first through m-th ranks, positive values satisfying w_1 ≥ w_2 ≥ ... ≥ w_m > 0 and obtains the likelihood of each word retrieved by the search unit by adding, for each character contained in the word, the positive value w_i if a candidate character matching that character exists at the i-th rank of the same character position, and the negative value -w_i if no matching candidate character exists; and a processing unit that estimates the correct characters from the likelihoods obtained by the collation unit.
JP1084126A 1989-04-04 1989-04-04 Character recognition post-processing device Pending JPH02264388A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP1084126A JPH02264388A (en) 1989-04-04 1989-04-04 Character recognition post-processing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP1084126A JPH02264388A (en) 1989-04-04 1989-04-04 Character recognition post-processing device

Publications (1)

Publication Number Publication Date
JPH02264388A (en) 1990-10-29

Family

ID=13821819

Family Applications (1)

Application Number Title Priority Date Filing Date
JP1084126A Pending JPH02264388A (en) 1989-04-04 1989-04-04 Character recognition post-processing device

Country Status (1)

Country Link
JP (1) JPH02264388A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0684006A (en) * 1992-04-09 1994-03-25 Internatl Business Mach Corp <Ibm> Method of online handwritten character recognition

