JPH08305722A

JPH08305722A - Character string retrieving device

Info

Publication number: JPH08305722A
Application number: JP7109794A
Authority: JP
Inventors: Hirochika Kishimoto; 泰親岸元
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1995-05-08
Filing date: 1995-05-08
Publication date: 1996-11-22
Anticipated expiration: 2016-11-19
Also published as: JP3230641B2

Abstract

PURPOSE: To enable retrieval without a confirmation operation for fixing the characters of a search character string or a correction operation for misrecognition, by calculating the degree of matching with a searched character string in a database from the certainty values of the matching candidate characters. CONSTITUTION: The character string retrieving device consists of a handwritten character reader 11, which uses a tablet 5 and an input pen 7 to input a search character string whose characters have not been fixed, and a retrieval processing part 12. For each partial character string contained in a searched character string of the database stored in an auxiliary storage device 3, the retrieval processing part 12 calculates the degree of matching with the search character string from the certainty values of the candidate characters that match, and it extracts the data to be retrieved by comparing the calculated degree of matching with that of other partial character strings.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a character string search device that is used in electronic equipment provided with a character reading device, and that retrieves from a database the data containing a character string matching a search character string.

[0002]

[Prior Art] Conventionally, a database search may be performed on character strings such as the title, keywords, or body of each data item in the database. Such a character string search (string searching, or string pattern matching) specifies a search character string (pattern) and determines whether the searched character string (text) in the field to be searched (search key) of each data item contains a character string that matches this search character string. Conventional character string search processing is described below. In this processing, as shown in FIG. 10, a search character string P is specified and the searched character string T of each data item in the database is searched. Here each character is drawn as □. The search character string P has m characters (m = 4 in the figure) and its i-th character is written P[i]; the searched character string T has n characters and its i-th character is written T[i]. The search processing checks whether the search character string P matches each partial character string of m characters contained in the searched character string T. Such partial character strings exist only when the searched character string T has at least as many characters as the search character string P (n ≥ m), and there are n - m + 1 of them in T. That is, the characters T[1] to T[m] form the first partial character string, T[2] to T[m+1] form the second, and so on, until T[n-m+1] to T[n] form the (n - m + 1)-th and last partial character string.

[0003] As shown in FIG. 11, the first step (hereinafter "S") 51 of the search processing initializes the head position top of the partial character string to "0". The value of top indicates how many characters the first character of the partial character string under inspection is offset from the first character T[1] of the searched character string T. It is then determined whether m characters exist from the head position top onward in the searched character string T of n characters (S52); if fewer than m characters remain and no partial character string exists, the search processing ends with the result "no match". The processing ends here with "no match" at the very start of the search when the number of characters n of the searched character string T is smaller than the number of characters m of the search character string P.

[0004] When S52 determines that m characters remain and a partial character string exists, the counter i is initialized to "1" (S53), and it is determined whether the i-th character P[i] of the search character string P matches T[top+i], the i-th character from the head position top of the partial character string in the searched character string T (S54). If these characters match, "1" is added to the counter i (S55), and the processing returns to S54 and repeats until the value of the counter i exceeds the number of characters m of the search character string P (S56). When S56 determines that the counter i has exceeded m, the search processing ends with the result "match". Thus, when every character of the m-character partial character string starting at head position top in T matches the character at the same position in P, the loop S54 to S56 is repeated m times and the processing then ends with the result "match".

[0005] However, if even one character mismatch is detected during the loop S54 to S56, the loop is exited at S54. The head position top in the searched character string T is then advanced by one character to set the next partial character string (S57), and the processing returns to S52 and repeats the above. The loop S52 to S57 therefore checks in order whether each partial character string matches the search character string P, from the first partial character string T[1] to T[m] for top = 0 shown in FIG. 10 to the (n - m + 1)-th partial character string T[n-m+1] to T[n] for top = n - m. When a matching partial character string is found, the loop is exited at S56 and the search processing ends with the result "match". When no partial character string matches, S52 determines that no further partial character string exists, the loop is exited, and the search processing ends with the result "no match".
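The conventional procedure of FIG. 11 can be sketched in Python as follows. This is only an illustrative reading of steps S51 to S57 as described above; the function name `naive_search` is an assumption, and the 1-based indices P[i] and T[top+i] of the description are mapped onto Python's 0-based strings.

```python
def naive_search(pattern: str, text: str) -> bool:
    """Return True if `pattern` occurs somewhere in `text` (FIG. 11, S51-S57)."""
    m, n = len(pattern), len(text)
    top = 0                          # S51: head position of the current partial string
    while top + m <= n:              # S52: do m characters remain from `top` onward?
        i = 1                        # S53: counter over the pattern characters
        # S54: compare P[i] with T[top+i] (1-based in the description)
        while i <= m and pattern[i - 1] == text[top + i - 1]:
            i += 1                   # S55
        if i > m:                    # S56: all m characters matched
            return True
        top += 1                     # S57: move on to the next partial string
    return False                     # no partial string left: "no match"


if __name__ == "__main__":
    print(naive_search("abd", "abcabd"))  # True
    print(naive_search("abc", "ab"))      # False, because n < m
```

The outer loop visits at most n - m + 1 head positions, which corresponds to the number of partial character strings stated in the description.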

[0006] In a character string search of a database, the above search processing is executed on the searched character string in each data item, and when it ends with the search character string having matched one of the partial character strings of that searched character string, the data item having that searched character string is extracted.

[0007] Because the algorithm shown in FIG. 11 wastefully compares the same characters repeatedly, actual character string searches often use a speed-up algorithm such as the BM method (the Boyer-Moore string pattern matching algorithm).

[0008]

[Problems to Be Solved by the Invention] Many recent electronic devices are equipped with a handwritten character reader (handwritten character recognition system) that uses a coordinate input device (pointing device) such as a tablet, or with an optical character reader (OCR) that uses an image input device such as an image scanner. These character reading devices recognize characters by selecting the standard pattern that matches the features extracted from the input pattern, and they output the character code of that standard pattern. When a character string is input with such a character reading device, however, it is unavoidable that some character codes cannot be fully determined, in which case several possible candidate characters are listed. A character may also be misrecognized and fixed to a character code different from the input character.

[0009] In a conventional character string search, however, the search processing cannot be executed unless the character codes of both the search character string and the searched character string are fixed. In an electronic device that inputs the search character string or the searched character string with a character reading device, when character recognition is incomplete, a confirmation operation that fixes which candidate character is the correct input character must therefore be carried out before the search processing is executed. This confirmation operation is extremely tedious, because the operator must interactively select one of the listed candidate characters. Moreover, when a large volume of searched character strings in the database has been input from, for example, printed type documents, the confirmation operation can be practically impossible.

[0010] Furthermore, a conventional character string search cannot judge that the search character string and the searched character string match unless their character codes match exactly, so the search processing does not run correctly when the character reading device misrecognizes a character.

[0011] Conventional electronic devices equipped with a character reading device therefore had the problem that character string search could not be used effectively.

[0012] The present invention solves the above conventional problems, and its object is to provide a character string search device that makes searching possible even for character strings whose characters are not fixed, by calculating the degree of matching.

[0013]

[Means for Solving the Problems] The character string search device of the present invention comprises: search character string input means for inputting, as a search character string, an uncertain character string in which each character consists of one or more candidate characters and a certainty value is associated with each candidate character; a database consisting of a set of data having searched character strings whose characters are fixed; matching-degree calculation means which, for each partial character string that is contained in each searched character string in each data item of the database and has the same number of characters as the search character string, applies an operation to the matching degree of that partial character string, for each character of the partial character string that matches one of the candidate characters of the search character string character at the same position, based on the certainty value of the matched candidate character; and data extraction means which compares the matching degree of each partial character string calculated by the matching-degree calculation means with the matching degree of other partial character strings or with a predetermined value, and extracts from the database the data having a searched character string that contains a partial character string selected on the basis of this comparison. The above object is thereby achieved.
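As an illustration of the matching-degree calculation for an uncertain search character string against a fixed searched character string, the following Python sketch may help. It assumes score-style certainty values combined by simple addition, which the description names only as one example; the type alias `UncertainString`, the function names, and the sample scores are all hypothetical.

```python
from typing import Dict, List

# Assumed representation: one {candidate character: certainty value} map per position.
UncertainString = List[Dict[str, float]]


def match_degree(query: UncertainString, substring: str) -> float:
    """Add the certainty of the matched candidate at each position.

    A position whose character matches no candidate contributes nothing,
    but the calculation still continues to the last character.
    """
    return sum(cands.get(ch, 0.0) for cands, ch in zip(query, substring))


def substring_degrees(query: UncertainString, text: str) -> List[float]:
    """Matching degree of every partial string of len(query) characters in `text`."""
    m = len(query)
    return [match_degree(query, text[top:top + m])
            for top in range(len(text) - m + 1)]


if __name__ == "__main__":
    # Hypothetical recognition result for a handwritten "cat".
    query = [{"c": 9.0, "e": 4.0}, {"a": 8.0, "o": 5.0}, {"t": 10.0}]
    print(substring_degrees(query, "a cot"))  # [0.0, 0.0, 24.0]
```

In this sketch a searched character string shorter than the search character string simply yields no partial strings, mirroring the n ≥ m condition of the conventional search.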

[0014] Preferably, the device comprises: search character string input means for inputting a search character string whose characters are fixed; a database consisting of a set of data having searched character strings that are uncertain character strings, in which each character consists of one or more candidate characters and a certainty value is associated with each candidate character; matching-degree calculation means which, for each partial character string that is contained in each searched character string in each data item of the database and has the same number of characters as the search character string, applies an operation to the matching degree of that partial character string, for each character of the partial character string one of whose candidate characters matches the search character string character at the same position, based on the certainty value of the matched candidate character; and data extraction means which compares the matching degree of each partial character string calculated by the matching-degree calculation means with the matching degree of other partial character strings or with a predetermined value, and extracts from the database the data having a searched character string that contains a partial character string selected on the basis of this comparison.

[0015] More preferably, the device comprises: search character string input means for inputting, as a search character string, an uncertain character string in which each character consists of one or more candidate characters and a certainty value is associated with each candidate character; a database consisting of a set of data having searched character strings that are uncertain character strings, in which each character consists of one or more candidate characters and a certainty value is associated with each candidate character; matching-degree calculation means which, for each partial character string that is contained in each searched character string in each data item of the database and has the same number of characters as the search character string, applies an operation to the matching degree of that partial character string, for each character of the partial character string one of whose candidate characters matches one of the candidate characters of the search character string character at the same position, based on the certainty values of both matched candidate characters; and data extraction means which compares the matching degree of each partial character string calculated by the matching-degree calculation means with the matching degree of other partial character strings or with a predetermined value, and extracts from the database the data having a searched character string that contains a partial character string selected on the basis of this comparison.
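The case in which both the search character string and the searched character string are uncertain can be sketched as below. The text only says that the operation is based on the certainty values of both matched candidate characters; how they are combined is not specified, so this sketch assumes, purely for illustration, that the two certainties are multiplied, that the best shared candidate is taken at each position, and that positions are summed. All names and sample values are hypothetical.

```python
from typing import Dict, List

# Assumed representation: one {candidate character: certainty value} map per position.
UncertainString = List[Dict[str, float]]


def position_score(q: Dict[str, float], t: Dict[str, float]) -> float:
    """Best product of certainties over candidates shared by both positions.

    Assumption: product and max are illustrative choices, not taken from the source.
    """
    common = q.keys() & t.keys()
    return max((q[c] * t[c] for c in common), default=0.0)


def match_degree(query: UncertainString, window: UncertainString) -> float:
    """Sum of per-position scores for one partial string of the searched string."""
    return sum(position_score(q, t) for q, t in zip(query, window))


def best_degree(query: UncertainString, text: UncertainString) -> float:
    """Maximum matching degree over all partial strings of len(query) positions."""
    m = len(query)
    return max((match_degree(query, text[top:top + m])
                for top in range(len(text) - m + 1)), default=0.0)


if __name__ == "__main__":
    query = [{"a": 0.5, "o": 0.5}, {"b": 1.0}]
    text = [{"x": 1.0}, {"o": 0.5, "c": 0.5}, {"b": 0.5, "h": 0.25}]
    print(best_degree(query, text))  # 0.75
```

Under this representation a fixed character is the special case of a single candidate with certainty 1, so the same functions would also cover the configurations in which only one of the two strings is uncertain.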

[0016] More preferably, the search character string input means identifies each character of an input character string with a character reading device, selects one or more candidate characters for each character, and attaches to each selected candidate character a certainty value indicating the accuracy of its recognition.

[0017] More preferably, each searched character string of each data item in the database has been input by identifying each character of an input character string with a character reading device, selecting one or more candidate characters for each character, and attaching to each selected candidate character a certainty value indicating the accuracy of its recognition.

[0018] More preferably, the matching-degree calculation means calculates the matching degree, treating the whole searched character string as the only partial character string, only when the number of characters of the searched character string equals the number of characters of the search character string.

[0019]

[Operation] With the above configuration, the matching-degree calculation means can calculate, as a matching degree, the extent to which each partial character string of the searched character string matches a search character string consisting of an uncertain character string. Moreover, even when a character is found that matches none of the candidate characters, the matching-degree calculation continues to the last character, so a matching degree can also be calculated for a partial character string that contains non-matching characters. By comparing the matching degree of each partial character string with that of other partial character strings or with a predetermined value, the data extraction means can find the maximum matching degree among the partial character strings, sort the partial character strings in descending order of matching degree, or select the partial character strings whose matching degree is at least the predetermined value, and thereby extract from the database the data the operator wants.

[0020] Accordingly, even when the search character string is an uncertain character string whose characters are not fixed, the searched character string of each data item in the database can be searched as it is; for example, the data item with the highest matching degree can be extracted, or the data items whose matching degree is at least a predetermined value can be sorted in descending order of matching degree and extracted. Moreover, even when some characters do not match, a data item can still be a target of extraction if the other characters match and a high matching degree is obtained.

[0021] A candidate character normally has a specific character code, but it is also possible to use one that represents a particular character type, or a metacharacter that matches every character of the searched character string. The term database is used here in a broad sense that covers not only a set of data defined by a database schema but also, for example, a file of text data. In that case each data item can be, for example, one line of the text data.

[0022] A certainty value may be associated explicitly with each candidate character, or it may be associated so that it is determined automatically from, for example, the total number of candidate characters for the character or the rank of the candidate character among them. The certainty value may be a score that is higher the more likely the corresponding candidate character is to be the actual character, or it may be a numerical value such as the probability that the candidate is the actual character. The matching degree of each partial character string is first given a suitable initial value, and each time a character of the partial character string matches a candidate character of the search character string, the matching-degree calculation means applies an operation based on that candidate's certainty value. When the certainty values are scores, setting the initial matching degree to 0 and adding each certainty value in turn, for example, yields a higher matching degree the more likely the candidate characters of the search character string that the partial character string matches. For a non-matching character, no certainty value is added, which keeps the matching degree from becoming high. When the certainty values are probabilities between 0 and 1 inclusive, setting the initial matching degree to 1 and multiplying by each certainty value in turn, for example, yields a matching degree closer to 1 the higher the probability that the partial character string matches the search character string. In this case, however, the matching degree must also be multiplied by a predetermined, sufficiently low probability whenever a non-matching character is found.
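The probability variant can be sketched as follows. The sketch starts the matching degree at 1 and multiplies in the certainty of each matched candidate; a non-matching character multiplies in a small penalty instead. The constant `MISMATCH_PROBABILITY = 0.01` and the function name are assumptions for illustration, since the description only requires a predetermined, sufficiently low probability.

```python
from typing import Dict, List

# Assumed representation: one {candidate character: probability} map per position.
UncertainString = List[Dict[str, float]]

# Assumed penalty value; the source does not give a concrete number.
MISMATCH_PROBABILITY = 0.01


def match_probability(query: UncertainString, substring: str,
                      mismatch: float = MISMATCH_PROBABILITY) -> float:
    """Multiply the probabilities of the matched candidates, starting from 1.

    A character that matches no candidate multiplies in `mismatch` instead.
    """
    degree = 1.0
    for cands, ch in zip(query, substring):
        degree *= cands.get(ch, mismatch)
    return degree


if __name__ == "__main__":
    query = [{"c": 0.5, "e": 0.25}, {"a": 1.0}]
    print(match_probability(query, "ca"))  # 0.5
    print(match_probability(query, "ea"))  # 0.25
```

The penalty is needed because skipping the multiplication for a non-matching character would be equivalent to multiplying by 1, which would rank a mismatch above any genuine match whose probability is below 1.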

[0023] When the data extraction means compares the matching degree of each partial character string with that of other partial character strings, the comparison may be with other partial character strings in the same searched character string, with partial character strings in other searched character strings of the same data item, or with partial character strings in searched character strings of other data items. Comparison with partial character strings of the same or other searched character strings within the same data item generally serves to find the maximum matching degree, whereas comparison with other data items serves sorting, such as arranging the data items in descending order of matching degree.
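The two kinds of comparison can be sketched together: a maximum over the partial strings within one data item, then a threshold and a descending sort across data items. The sketch assumes additive score-style certainty values and treats each data item as a single fixed text line; the function names, the sample records, and the threshold are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed representation: one {candidate character: certainty value} map per position.
UncertainString = List[Dict[str, float]]


def best_degree(query: UncertainString, text: str) -> float:
    """Maximum additive matching degree over all partial strings of `text`."""
    m = len(query)
    degrees = [sum(c.get(ch, 0.0) for c, ch in zip(query, text[top:top + m]))
               for top in range(len(text) - m + 1)]
    return max(degrees, default=0.0)


def extract(query: UncertainString, records: List[str],
            threshold: float) -> List[Tuple[float, str]]:
    """Keep records whose best degree reaches `threshold`, highest degree first."""
    scored = [(best_degree(query, r), r) for r in records]
    hits = [(d, r) for d, r in scored if d >= threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits


if __name__ == "__main__":
    query = [{"c": 9.0, "e": 4.0}, {"a": 8.0, "o": 5.0}, {"t": 10.0}]
    records = ["my cat", "a cot", "dog", "eat"]
    for degree, record in extract(query, records, threshold=20.0):
        print(degree, record)
```

With the sample values above, "dog" reaches only a degree of 5 and is dropped by the threshold, while the remaining records are returned in descending order of matching degree.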

[0024] With the above configuration, the matching-degree calculation means can also calculate, as a matching degree, the extent to which each partial character string of a searched character string consisting of an uncertain character string matches the search character string. Accordingly, even when the searched character string of each data item in the database is an uncertain character string whose characters are not fixed, a search with the search character string can be performed as it is.

[0025] With the above configuration, the matching-degree calculation means can further calculate, as a matching degree, the extent to which each partial character string of a searched character string consisting of an uncertain character string matches a search character string that also consists of an uncertain character string. Accordingly, the search can be performed as it is even when the search character string is an uncertain character string whose characters are not fixed and the searched character string of each data item in the database is likewise an uncertain character string.

[0026] With the above configuration, the search character string consisting of an uncertain character string can be one that was input with a character reading device. Accordingly, in an electronic device equipped with a handwritten character reader, for example, when the search character string is specified by handwriting, the search can be executed immediately, without confirming characters when several candidate characters have been selected and without correcting characters when a misrecognition has occurred.

[0027] With the above configuration, the searched character string consisting of an uncertain character string can be one that was input with a character reading device. Accordingly, in an electronic device equipped with an optical character reader, for example, the search can be executed immediately even when printed type documents have been read in mechanically and used as the searched character strings of the data in the database without any confirmation or correction of characters.

[0028] With the above configuration, a searched character string can be made a search target only when its number of characters equals that of the search character string.

[0029]

[Embodiments] Embodiments of the present invention are described below.

[0030] FIGS. 1 to 5 show a first embodiment of the present invention. FIG. 1 is a block diagram showing the configuration of the character string search device; FIG. 2 is a block diagram showing the configuration of an electronic device equipped with the character string search device; FIG. 3 is a flowchart showing the search processing in the character string search device; FIG. 4 is a flowchart showing the matching-degree calculation within the search processing; and FIG. 5 is a diagram showing a state in which candidate characters of the search character string match characters of a partial character string.

【００３１】本実施例は、不確定文字列の検索文字列に
基づいてデータベースの検索を行う文字列検索装置を備
えた電子機器について説明する。この電子機器は、図２
に示すように、装置全体の制御と各種演算を行う演算装
置１を備えている。この演算装置１には、半導体メモリ
などからなる主記憶装置２が接続されている。主記憶装
置２は、文字列検索装置などのプログラムを実行のため
にロードする他、このプログラムの実行中に検索文字列
を格納したり一致度の得点を保存するための作業領域を
確保するためなどに使用される。また、演算装置１に
は、ハードディスク装置やフロッピーディスク装置など
からなる補助記憶装置３と、ＣＲＴ[Cathode Ray Tube]
ディスプレイやＬＣＤ[Liquid Crystal Display]などか
らなるディスプレイ装置４とが接続されている。補助記
憶装置３は、文字列検索装置などのプログラムやデータ
ベースを格納するために使用される。ディスプレイ装置
４は、文字列検索装置のプログラムの実行中に、入力さ
れた検索文字列を表示したり検索結果を表示するために
使用される。さらに、演算装置１には、入力装置として
タブレット５とイメージスキャナ６とが接続されてい
る。タブレット５は、入力ペン７を用いて手書きした文
字のストロークなどのパターンを入力する座標入力装置
であり、補助記憶装置３に格納された手書き文字読取装
置のプログラムを実行することにより、この手書き文字
のパターンを文字コードとして認識し検索文字列として
主記憶装置２に格納することができるようになってい
る。イメージスキャナ６は、印刷物の活字文書などを光
学的に読み込んで画像パターンとして入力する画像入装
置であり、補助記憶装置３に格納された光学的文字読取
装置のプログラムを実行することにより、活字文書など
の画像パターンを文字コードとして認識しデータベース
の各データの被検索文字列として補助記憶装置３に格納
することができるようになっている。なお、タブレット
５は、ディスプレイ装置４と一体型のものであってもよ
い。また、このタブレット５の代わりにマウスなどの他
の座標入力装置を用いることもできる。This embodiment describes an electronic device equipped with a character string search device that searches a database based on a search character string that is an uncertain character string. As shown in FIG. 2, this electronic device includes an arithmetic unit 1 that controls the entire device and performs various arithmetic operations. A main storage device 2 consisting of semiconductor memory or the like is connected to the arithmetic unit 1. The main storage device 2 is used to load programs such as the character string search device for execution, and to reserve a work area for storing the search character string and the matching degree scores while the program runs. Also connected to the arithmetic unit 1 are an auxiliary storage device 3 consisting of a hard disk device, a floppy disk device, or the like, and a display device 4 consisting of a CRT [Cathode Ray Tube] display, an LCD [Liquid Crystal Display], or the like. The auxiliary storage device 3 is used to store programs such as the character string search device and the database. The display device 4 is used to display the input search character string and the search results while the character string search program runs. Further, a tablet 5 and an image scanner 6 are connected to the arithmetic unit 1 as input devices. The tablet 5 is a coordinate input device for inputting patterns such as the strokes of characters handwritten with an input pen 7; by executing the handwritten character reading program stored in the auxiliary storage device 3, the handwritten character pattern can be recognized as character codes and stored in the main storage device 2 as a search character string. The image scanner 6 is an image input device that optically reads a typed document such as printed matter and inputs it as an image pattern; by executing the optical character reading program stored in the auxiliary storage device 3, the image pattern of the typed document can be recognized as character codes and stored in the auxiliary storage device 3 as the searched character string of each data item in the database. The tablet 5 may be integrated with the display device 4. A mouse or other coordinate input device may also be used in place of the tablet 5.

【００３２】上記ハードウエア構成の電子機器において
データベースの検索処理を実行する文字列検索装置の構
成を図１に示す。この文字列検索装置は、タブレット５
上に入力ペン７を用いて手書きした各文字のパターンを
手書き文字読取装置１１によって認識し検索文字列とし
て検索処理部１２に入力する。ただし、本実施例では、
手書き文字読取装置１１が手書き入力文字を確定できな
かった場合にも、確定作業を行うことなく不確定文字列
のまま入力する。不確定文字列は、各文字が１または２
以上の候補文字からなり、かつ、各候補文字ごとに確度
値が対応付けられた文字列である。手書き入力文字が確
定できた場合には、その文字が１文字だけの候補文字か
らなり、確度値として満点の得点が対応付けられる。ま
た、手書き文字がいずれの候補文字に該当するかを確定
できなかった場合には、複数の候補文字が列挙され、各
候補文字ごとにその手書き文字に該当する可能性が高い
ほど満点に近い得点が対応付けられる。もっとも、候補
文字が複数ある場合のみならず１文字だけの場合にも、
実際の手書き入力文字がいずれの候補文字にも該当しな
い誤認識は生じる得る。FIG. 1 shows the configuration of the character string search device that executes the database search process in an electronic device having the above hardware configuration. In this character string search device, a handwritten character reading device 11 recognizes the pattern of each character handwritten on the tablet 5 with the input pen 7 and inputs the result to a search processing unit 12 as a search character string. In this embodiment, however, even when the handwritten character reading device 11 cannot confirm a handwritten input character, the string is input as an uncertain character string without any confirmation work. An uncertain character string is a character string in which each character consists of one or more candidate characters and a certainty value is associated with each candidate character. When a handwritten input character can be confirmed, that character consists of a single candidate character, with a full score associated as its certainty value. When it cannot be determined which candidate character the handwritten character corresponds to, a plurality of candidate characters are listed, and each candidate character is associated with a score that is closer to the full score the more likely it is to correspond to the handwritten character. Note that misrecognition, in which the actual handwritten input character corresponds to none of the candidate characters, can occur not only when there are multiple candidate characters but also when there is only one.

【００３３】上記不確定文字列は、任意のデータ構造に
よって実現することができる。即ち、例えば各文字につ
いての候補文字の最大数が定まっている場合には２次元
配列によって実現することができる。この場合、最初の
次元の添え字によって各文字の要素を指定し、次の次元
の添え字によって各候補文字の要素を指定する。そし
て、確度値についても、要素数が同じ数値型の２次元配
列に格納することにより対応付けることができる。ま
た、確度値が文字型と同じデータ型で表すことができる
場合には、不確定文字列を３次元配列として、最後の次
元の全ての添え字によって確度値の要素を指定すること
もできる。さらに、この確度値は、例えば候補文字が１
文字だけしか存在しない場合には満点の得点とし、２文
字の場合には先の候補文字が満点の６割の得点で後の候
補文字が満点の４割の得点とするというように、各文字
についての候補文字の総数や当該候補文字の序列などに
応じて自動的に定まるように対応付けることもできる。
したがって、この場合の確度値は、データとして不確定
文字列に付随するのではなく、アルゴリズムとして対応
付けられることになる。The uncertain character string can be implemented with any data structure. For example, when the maximum number of candidate characters per character is fixed, it can be implemented as a two-dimensional array. In this case, the subscript of the first dimension specifies the element for each character, and the subscript of the second dimension specifies the element for each candidate character. The certainty values can be associated by storing them in a numeric two-dimensional array with the same number of elements. When a certainty value can be represented by the same data type as a character, the uncertain character string can instead be a three-dimensional array in which the subscripts of the last dimension specify the certainty value elements. The certainty value can also be associated so that it is determined automatically from the total number of candidate characters for each character, the rank of the candidate character, and so on: for example, a full score when only one candidate character exists, and, when there are two, 60% of the full score for the first candidate character and 40% for the second. In this case the certainty value is not attached to the uncertain character string as data but is associated with it algorithmically.
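
The two ways of associating certainty values described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `FULL_SCORE`, `uncertain`, and `rank_based_certainty`, the full score of 100, and the sample characters are all assumptions made for the example.

```python
FULL_SCORE = 100  # assumed value; the text only speaks of a "full score"

# Outer index: character position; inner index: candidate rank.
# Each entry is a (candidate character, certainty value) pair.
uncertain = [
    [("c", FULL_SCORE)],     # confirmed character: one candidate, full score
    [("a", 60), ("o", 40)],  # unconfirmed character: two ranked candidates
    [("t", FULL_SCORE)],
]


def rank_based_certainty(total, rank):
    """Derive a certainty value from the candidate count and 1-based rank.

    Only the one- and two-candidate cases are stated in the text.
    """
    table = {1: [100], 2: [60, 40]}  # percentages of the full score
    if total not in table:
        raise NotImplementedError("weights for 3+ candidates are not specified")
    return FULL_SCORE * table[total][rank - 1] // 100
```

With the algorithmic form, only the candidate characters need to be stored; the certainty value is recomputed from each candidate's position whenever it is needed.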

【００３４】また、例えば各文字の１または２以上の候
補文字を順に配置すると共に、特別の区切り記号を定め
て、各文字の間にこの区切り記号を挿入することにより
文字の区切りを判別できるようにすれば、上記不確定文
字列を１次元配列などのシーケンシャルなデータとして
取り扱うことができ、各文字の候補文字の数も無制限と
することができる。そして、この場合にも、確度値を各
候補文字の直後に配置したり、別のシーケンシャルデー
タに格納したり、アルゴリズムとして対応付けることが
できる。さらに、この不確定文字列は、木構造[tree st
ructure]によって実現することも可能である。Alternatively, if the one or more candidate characters of each character are arranged in order and a special delimiter symbol is defined and inserted between characters so that character boundaries can be identified, the uncertain character string can be handled as sequential data such as a one-dimensional array, and the number of candidate characters per character becomes unlimited. In this case too, the certainty value can be placed immediately after each candidate character, stored in separate sequential data, or associated algorithmically. The uncertain character string can also be implemented as a tree structure.
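
The delimiter-based sequential form might look like the sketch below. The delimiter `|` and the function names `encode` and `decode` are hypothetical; certainty values are assumed to be associated algorithmically, so only candidate characters are stored.

```python
DELIMITER = "|"  # hypothetical delimiter; it must never occur as a candidate


def encode(characters):
    """Flatten [['c'], ['a', 'o'], ['t']] into 'c|ao|t'."""
    return DELIMITER.join("".join(cands) for cands in characters)


def decode(data):
    """Recover the per-character candidate lists from the flat string."""
    return [list(group) for group in data.split(DELIMITER)]
```

Because each group simply runs until the next delimiter, the number of candidates per character is not bounded by a fixed array width.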

【００３５】上記検索処理部１２では、この不確定文字
列の検索文字列に基づいて補助記憶装置３に格納された
データベースにおける各データ中の被検索文字列の検索
を行う。このデータベースの検索の際には、各データ中
の検索対象となる欄を探索キーとして指定する。探索キ
ーとなる欄は、データのタイトルやキーワードまたはデ
ータ本体などであり、タイトル欄の場合にはタイトル名
の文字列が被検索文字列となり、キーワード欄の場合に
は列挙された各キーワードの文字列がそれぞれ被検索文
字列となり、データ本体の場合にはこのデータ本体を構
成する各文字列がそれぞれ被検索文字列となる。検索処
理部１２の検索処理は、まず被検索文字列における検索
文字列と同じ文字数の各部分文字列ごとに、一致した候
補文字の確度値に基づいて後に説明する一致度を算出す
ると共に、当該被検索文字列における各部分文字列の一
致度の最大得点を最大一致度として求める。そして、こ
の最大一致度が高いデータを優先して、ディスプレイ装
置４の検索結果リストに表示する。The search processing unit 12 searches the searched character strings in each data item of the database stored in the auxiliary storage device 3, based on this search character string, which is an uncertain character string. For the database search, the field to be searched in each data item is designated as the search key. The field serving as the search key is the title, the keywords, or the body of the data item: for the title field, the character string of the title is the searched character string; for the keyword field, the character string of each listed keyword is a searched character string; and for the data body, each character string making up the body is a searched character string. In the search process, the search processing unit 12 first calculates, for each partial character string of the searched character string that has the same number of characters as the search character string, the matching degree described later, based on the certainty values of the matched candidate characters, and obtains the highest matching degree score among the partial character strings of that searched character string as the maximum matching degree. Data items with a higher maximum matching degree are then displayed preferentially in the search result list on the display device 4.

【００３６】上記構成の検索処理部１２の動作を図３お
よび図４のフローチャートに基づいて説明する。この検
索処理部１２は、補助記憶装置３に格納されたデータベ
ースから各データの被検索文字列を順次読み出し、それ
ぞれの被検索文字列について検索処理を実行する。ここ
で、検索文字列Ｐの文字数をｍ文字とし、被検索文字列
Ｔの文字数をｎ文字とする。検索処理では、図３に示す
ように、まず最大一致度の得点を“０”に初期化すると
共に（Ｓ１）、部分文字列の先頭位置ｔｏｐの値も
“０”に初期化する（Ｓ２）。この先頭位置ｔｏｐの値
は、検査対象となる部分文字列の先頭文字が被検索文字
列Ｔの先頭文字から何文字目になるかを示す。そして、
文字数ｎの被検索文字列Ｔの先頭位置ｔｏｐ以降に文字
数ｍ分の文字があるかどうかを判断して（Ｓ３）、ｍ文
字分に足りず部分文字列が存在しない場合には検索処理
を終了する。被検索文字列Ｔの文字数ｎが検索文字列Ｐ
の文字数ｍよりも少なかった場合には、検索処理の開始
時に直ちにここで処理を終了し、最大一致度の得点も初
期値のままとなる。The operation of the search processing unit 12 configured as above will be described with reference to the flowcharts of FIGS. 3 and 4. The search processing unit 12 sequentially reads the searched character string of each data item from the database stored in the auxiliary storage device 3 and executes the search process on each searched character string. Here, let the search character string P have m characters and the searched character string T have n characters. In the search process, as shown in FIG. 3, the maximum matching degree score is first initialized to "0" (S1), and the value of the head position top of the partial character string is also initialized to "0" (S2). The value of top indicates how many characters from the first character of the searched character string T the first character of the partial character string under examination lies. It is then determined whether m characters exist from the head position top onward in the n-character searched character string T (S3); if fewer than m characters remain and no partial character string exists, the search process ends. If the number of characters n of the searched character string T is smaller than the number of characters m of the search character string P, the process ends here immediately at the start of the search process, and the maximum matching degree score remains at its initial value.

【００３７】Ｓ３でｍ文字分の文字があり部分文字列が
存在すると判断された場合には、先頭位置ｔｏｐからｍ
文字分の部分文字列について一致度の計算処理が実行さ
れる（Ｓ４）。そして、このＳ４で算出した当該部分文
字列の一致度と最大一致度とを比較して（Ｓ５）、最大
一致度の得点の方が低かった場合にはこの最大一致度の
得点をＳ４で算出した一致度の得点に書き換え（Ｓ
６）、また、最大一致度の得点の方が高いか同得点であ
ればそのまま、先頭位置ｔｏｐを１文字分先に進めて次
の部分文字列を設定すると共に（Ｓ７）、Ｓ３に戻って
被検索文字列Ｔの全ての部分文字列の処理が終わるまで
この処理を繰り返す。したがって、この検索処理では、
被検索文字列Ｔの先頭からｍ文字分の部分文字列を順に
切り出して、それぞれ検索文字列Ｐとの一致度を計算す
る。また、この検索処理の終了時には、当該被検索文字
列Ｔにおける各部分文字列の一致度の最大の得点が最大
一致度に格納されることになる。If it is determined in S3 that m characters remain and a partial character string exists, the matching degree calculation process is executed for the m-character partial character string starting at the head position top (S4). The matching degree of that partial character string calculated in S4 is then compared with the maximum matching degree (S5). If the maximum matching degree score is lower, it is overwritten with the matching degree score calculated in S4 (S6); if the maximum matching degree score is higher or equal, it is left unchanged. In either case, the head position top is advanced by one character to set the next partial character string (S7), and the process returns to S3 and repeats until all partial character strings of the searched character string T have been processed. In this search process, therefore, m-character partial character strings are cut out in order from the beginning of the searched character string T, and the matching degree of each with the search character string P is calculated. At the end of the search process, the highest matching degree score among the partial character strings of the searched character string T is stored in the maximum matching degree.
The partial character strings of m characters are sequentially cut out from the beginning of the searched character string T, and the degree of coincidence with the search character string P is calculated. Further, at the end of this search processing, the maximum score of the matching score of each partial character string in the searched character string T is stored in the maximum matching score.
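
The loop of FIG. 3 (S1 to S7) can be sketched as below. This is an illustrative reading of the flowchart, not the patent's code: the function names, the list-of-pairs representation of the search character string, and the sample certainty values are assumptions.

```python
def matching_degree(pattern, window):
    """Sum the certainty of the first matching candidate at each position."""
    score = 0
    for candidates, ch in zip(pattern, window):
        for cand, certainty in candidates:
            if cand == ch:
                score += certainty
                break
    return score


def max_matching_degree(pattern, text):
    """Slide an m-character window over text and keep the best score."""
    m = len(pattern)
    best = 0                          # S1: initialize maximum matching degree
    top = 0                           # S2: initialize head position
    while top + m <= len(text):       # S3: do m characters remain?
        score = matching_degree(pattern, text[top:top + m])  # S4
        if best < score:              # S5: compare with current maximum
            best = score              # S6: overwrite when strictly higher
        top += 1                      # S7: advance one character
    return best


# Hypothetical search string: 'c', then 'a' (60) or 'o' (40), then 't'.
pattern = [[("c", 100)], [("a", 60), ("o", 40)], [("t", 100)]]
print(max_matching_degree(pattern, "a cot here"))  # 240
```

When the searched string is shorter than the search string, the loop body never runs and the result stays at the initial value 0, as the text describes.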

【００３８】上記Ｓ４における一致度の計算処理の詳細
を図４に基づいて説明する。ただし、ここでは、検索文
字列Ｐにおけるｉ文字目の文字のｊ番目の候補文字はＰ
[i,j]で表すものとし、被検索文字列Ｔにおけるｉ文字
目の文字はＴ[i]で表すものとする。一致度の計算処理
の開始時には、まず一致度の得点を“０”に初期化する
と共に（Ｓ１１）、カウンタｉの値を“１”に初期化し
（Ｓ１２）、カウンタｊの値も“１”に初期化する（Ｓ
１３）。そして、検索文字列Ｐにおけるｉ文字目の文字
のｊ番目の候補文字Ｐ[i,j]が被検索文字列Ｔにおける
部分文字列の先頭位置ｔｏｐからｉ文字目の文字Ｔ[top
+i]に一致するかどうかを判断する（Ｓ１４）。The matching degree calculation process in S4 will be described in detail with reference to FIG. 4. Here, the j-th candidate character of the i-th character in the search character string P is denoted P[i,j], and the i-th character in the searched character string T is denoted T[i]. At the start of the matching degree calculation process, the matching degree score is first initialized to "0" (S11), the value of counter i is initialized to "1" (S12), and the value of counter j is also initialized to "1" (S13). It is then determined whether the j-th candidate character P[i,j] of the i-th character in the search character string P matches the character T[top+i], the i-th character from the head position top of the partial character string in the searched character string T (S14).

【００３９】Ｓ１４において、候補文字Ｐ[i,j]が文字
Ｔ[top+i]に一致しないと判断された場合には、カウン
タｊに“１”を加えて（Ｓ１５）、次の候補文字Ｐ[i,
j]が存在するかどうかを判断し（Ｓ１６）、次の候補文
字Ｐ[i,j]が存在する場合にはＳ１４に戻ってこの処理
を繰り返す。したがって、ここでは部分文字列の各文字
Ｔ[top+i]が検索文字列Ｐにおける同じ文字位置で対応
する１または２以上の候補文字Ｐ[i,j]と順に比較され
る。なお、Ｓ１６における次の候補文字Ｐ[i,j]の存在
の有無は、不確定文字列のデータ構造に応じた方法で検
出することができる。例えば２次元配列を用いる場合、
候補文字が存在しない要素に特別の空記号を格納してお
けば、候補文字Ｐ[i,j]がこの空記号であるかどうかを
検査することにより次の候補文字の有無が検出できる。
また、区切り記号を用いたシーケンシャルデータの場合
には、直前の候補文字の次の文字がこの区切り記号であ
るかどうかを検査することにより次の候補文字の有無が
検出できる。If it is determined in S14 that the candidate character P[i,j] does not match the character T[top+i], "1" is added to counter j (S15) and it is determined whether a next candidate character P[i,j] exists (S16); if it does, the process returns to S14 and repeats. Each character T[top+i] of the partial character string is thus compared in turn with the one or more candidate characters P[i,j] at the corresponding character position in the search character string P. Whether a next candidate character P[i,j] exists in S16 can be detected by a method suited to the data structure of the uncertain character string. For example, with a two-dimensional array, if a special empty symbol is stored in elements that hold no candidate character, the presence of a next candidate character can be detected by checking whether P[i,j] is this empty symbol. With sequential data using a delimiter symbol, it can be detected by checking whether the character following the immediately preceding candidate character is the delimiter.

【００４０】Ｓ１４において、候補文字Ｐ[i,j]が文字
Ｔ[top+i]に一致すると判断された場合には、残りの候
補文字との比較を打ち切って、その候補文字Ｐ[i,j]に
対応付けられた確度値を一致度の得点に加算する（Ｓ１
７）。そして、このＳ１７で一致度が加算された場合、
または、Ｓ１６で当該文字における全ての候補文字との
比較が完了したと判断された場合には、カウンタｉに
“１”を加えて次の文字に進み（Ｓ１８）、このカウン
タｉの値が検索文字列Ｐの文字数ｍを超えるまで、Ｓ１
３に戻りカウンタｊを再初期化してからこの処理を繰り
返す（Ｓ１９）。また、Ｓ１９でカウンタｉの値が文字
数ｍを超えたと判断された場合には、一致度の計算処理
を終了する。したがって、部分文字列の各文字が検索文
字列Ｐにおける同じ文字位置で対応する候補文字のいず
れかに一致した場合には、その候補文字の確度値が一致
度の得点に順次加算される。また、いずれの候補文字に
も一致しなかった場合にも、得点の加算は行われない
が、以降の各文字について処理を続行する。このため、
検索文字列Ｐの各候補文字に１文字も一致しなかった部
分文字列の一致度の得点は初期値である“０”のままで
あり、一般に一致した文字数が多いほど一致度の得点も
高くなる。また、部分文字列の同じ文字位置の文字が一
致した場合であっても、いずれの候補文字に一致したか
によって一致度が変化し、より可能性の高い候補文字に
一致するほど高得点となる。If it is determined in S14 that the candidate character P[i,j] matches the character T[top+i], comparison with the remaining candidate characters is cut off and the certainty value associated with P[i,j] is added to the matching degree score (S17). When the score has been added in S17, or when it is determined in S16 that comparison with all candidate characters of the current character is complete, "1" is added to counter i to advance to the next character (S18), and the process returns to S13, reinitializes counter j, and repeats until the value of counter i exceeds the number of characters m of the search character string P (S19). When it is determined in S19 that the value of counter i has exceeded m, the matching degree calculation process ends. Thus, whenever a character of the partial character string matches one of the candidate characters at the corresponding character position in the search character string P, the certainty value of that candidate character is added to the matching degree score. When a character matches none of the candidate characters, no score is added, but processing continues with the subsequent characters. Consequently, the matching degree score of a partial character string that matches no candidate character of the search character string P remains at the initial value "0", and in general the more characters match, the higher the score. Moreover, even when the character at a given position matches, the matching degree varies depending on which candidate character it matched: the more likely the matched candidate character, the higher the score.
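
The calculation of FIG. 4 (S11 to S19) can be sketched as follows. The function name, the 0-based indexing (the text counts i and j from 1), and the sample data are assumptions made for illustration.

```python
def matching_degree(P, T, top):
    """Matching degree of the window of T starting at offset top.

    P[i] is a list of (candidate, certainty) pairs; T is a confirmed string.
    Indices are 0-based here, whereas the flowchart counts from 1.
    """
    score = 0                            # S11: initialize the score
    for i in range(len(P)):              # S12, S18, S19: loop over characters
        for cand, certainty in P[i]:     # S13, S15, S16: loop over candidates
            if cand == T[top + i]:       # S14: does the candidate match?
                score += certainty       # S17: add its certainty value
                break                    # skip the remaining candidates
    return score


# Hypothetical search string: 'c', then 'a' (60) or 'o' (40), then 't'.
P = [[("c", 100)], [("a", 60), ("o", 40)], [("t", 100)]]
print(matching_degree(P, "xcat", 1))  # 260: matched the likelier candidate
print(matching_degree(P, "xcot", 1))  # 240: matched the less likely one
print(matching_degree(P, "xcut", 1))  # 200: middle character unmatched
```

The three calls show the behaviour described above: an unmatched character adds nothing but does not stop the calculation, and matching a lower-ranked candidate yields a lower score.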

【００４１】上記一致度の計算処理において、検索文字
列Ｐの各文字の候補文字と被検索文字列Ｔに含まれる部
分文字列の各文字とが一致する様子を図５に例示する。
ここで、各文字や候補文字は□で示すものとし、一致し
た文字や候補文字には□の中に黒丸を表示している。ま
た、検索文字列Ｐは、文字数を４文字（ｍ＝４）とし、
１文字目は５番目までの候補文字があり、２文字目は３
番目までの候補文字があり、３文字目は４番目までの候
補文字があり、４文字目は５番目までの候補文字がある
ものとする。カウンタｉ＝１の最初のループでは、カウ
ンタｊ＝２のときに、検索文字列Ｐの１文字目の２番目
の候補文字Ｐ[1,2]が部分文字列の１文字目の文字Ｔ[to
p+1]と一致するので、Ｓ１７でこの候補文字Ｐ[1,2]の
確度値が一致度の得点に加算される。カウンタｉ＝２の
ループでは、カウンタｊ＝１のときに、検索文字列Ｐの
２文字目の１番目の候補文字Ｐ[2,1]が部分文字列の２
文字目の文字Ｔ[top+2]と一致するので、この候補文字
Ｐ[2,1]の確度値が一致度の得点に加算される。カウン
タｉ＝３のループでは、カウンタｊ＝２のときに、検索
文字列Ｐの３文字目の２番目の候補文字Ｐ[3,2]が部分
文字列の３文字目の文字Ｔ[top+3]と一致するので、こ
の候補文字Ｐ[3,2]の確度値が一致度の得点に加算され
る。そして、カウンタｉ＝４のループでは、カウンタｊ
＝３のときに、検索文字列Ｐの４文字目の３番目の候補
文字Ｐ[4,3]が部分文字列の４文字目の文字Ｔ[top+4]と
一致するので、この候補文字Ｐ[4,3]の確度値が一致度
の得点に加算される。したがって、ここで算出される一
致度は、候補文字Ｐ[1,2]と候補文字Ｐ[2,1]と候補文字
Ｐ[3,2]と候補文字Ｐ[4,3]の各確度値の総和となる。FIG. 5 illustrates how the candidate characters of each character of the search character string P match the characters of a partial character string contained in the searched character string T in the above matching degree calculation process. Each character or candidate character is shown as □, and a matched character or candidate character is shown with a black circle inside the □. The search character string P has four characters (m = 4): the first character has five candidate characters, the second has three, the third has four, and the fourth has five. In the first loop, with counter i = 1, when counter j = 2 the second candidate character P[1,2] of the first character of the search character string P matches the first character T[top+1] of the partial character string, so the certainty value of P[1,2] is added to the matching degree score in S17. In the loop with i = 2, when j = 1 the first candidate character P[2,1] of the second character matches the second character T[top+2] of the partial character string, so the certainty value of P[2,1] is added to the score. In the loop with i = 3, when j = 2 the second candidate character P[3,2] of the third character matches the third character T[top+3], so the certainty value of P[3,2] is added to the score. In the loop with i = 4, when j = 3 the third candidate character P[4,3] of the fourth character matches the fourth character T[top+4], so the certainty value of P[4,3] is added to the score. The matching degree calculated here is therefore the sum of the certainty values of P[1,2], P[2,1], P[3,2], and P[4,3].
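
The calculation can also be written compactly. In the sketch below, c[i,j] denotes the certainty value associated with candidate character P[i,j]; the symbols c, S, s_i, and j* are notation introduced here for illustration and do not appear in the original.

```latex
S(\mathit{top}) = \sum_{i=1}^{m} s_i,
\qquad
s_i =
\begin{cases}
c[i,j^{*}] & \text{if } j^{*} = \min\{\, j : P[i,j] = T[\mathit{top}+i] \,\} \text{ exists},\\
0 & \text{otherwise.}
\end{cases}

% FIG. 5 example (m = 4):
S = c[1,2] + c[2,1] + c[3,2] + c[4,3]
```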

【００４２】上記検索処理部１２では、被検索文字列の
検索処理が終了すると、最大一致度の得点を参照してそ
のデータの得点とする。また、同じデータ中に複数の被
検索文字列があった場合には、これらの最大一致度の得
点のうちのさらに最大のものをそのデータの得点とす
る。そして、例えばこの得点が所定以上となるデータの
みをデータベースから抽出し、これらのデータを得点が
高い順にディスプレイ装置４の検索結果リストに表示す
る。図１では、検索処理部１２の検索処理によって、補
助記憶装置３に格納されたデータベースから２つのデー
タが抽出され、ディスプレイ装置４の検索結果リストに
最高得点のデータ２と次に得点が高いデータ５とが表示
された状態を示す。When the search process for a searched character string ends, the search processing unit 12 takes the maximum matching degree score as the score of that data item. When one data item contains multiple searched character strings, the highest of their maximum matching degree scores is taken as the score of the data item. Then, for example, only data items whose score is at or above a predetermined value are extracted from the database and displayed in the search result list on the display device 4 in descending order of score. FIG. 1 shows a state in which the search process of the search processing unit 12 has extracted two data items from the database stored in the auxiliary storage device 3, and the search result list on the display device 4 shows data 2, which has the highest score, and data 5, which has the next highest score.
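
The scoring and ranking of data items described above can be sketched as follows. The function name `rank_data`, the dictionary layout, the threshold, and the sample scores are assumptions chosen so that the result mirrors the FIG. 1 situation (data 2 first, data 5 second).

```python
def rank_data(scores_per_data, threshold):
    """Return (data id, score) pairs at or above threshold, best first.

    scores_per_data maps a data id to the maximum matching degrees of
    its searched character strings (e.g. one per keyword).
    """
    # A data item's score is the highest of its strings' maximum degrees.
    totals = {d: max(s, default=0) for d, s in scores_per_data.items()}
    hits = [(d, sc) for d, sc in totals.items() if sc >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical scores for four data items.
scores = {1: [100, 50], 2: [380], 5: [300, 120], 7: []}
print(rank_data(scores, 250))  # [(2, 380), (5, 300)]
```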

【００４３】以上説明したように、本実施例の電子機器
の文字列検索装置によれば、タブレット５と入力ペン７
を用いた手書き文字読取装置１１により検索文字列を入
力する場合に、この検索文字列に確定していない文字や
誤認識の文字が含まれていたとしても、文字の確定作業
や訂正作業を行うことなく検索を実行することができ
る。即ち、検索文字列に複数の候補文字からなる未確定
の文字があったとしても、被検索文字列の対応文字がい
ずれかの候補文字に一致すれば一致度が高得点となるの
で、この被検索文字列を有するデータを確実に抽出する
ことができる。また、手書き文字読取装置１１が手書き
文字の一部を誤認識した場合にも、残りの他の文字が確
実に一致すれば比較的高得点の一致度を得ることができ
るので、本来の検索文字列に一致する被検索文字列を有
するデータを抽出できる可能性が高くなる。As described above, according to the character string search device of the electronic device of this embodiment, when a search character string is input through the handwritten character reading device 11 using the tablet 5 and the input pen 7, a search can be executed without any character confirmation or correction work, even if the search character string contains unconfirmed or misrecognized characters. That is, even if the search character string contains an unconfirmed character consisting of multiple candidate characters, the matching degree is high whenever the corresponding character of the searched character string matches one of the candidates, so data having that searched character string can be reliably extracted. And even when the handwritten character reading device 11 misrecognizes some of the handwritten characters, a relatively high matching degree is obtained as long as the remaining characters match, which raises the likelihood of extracting data whose searched character string matches the intended search character string.

【００４４】なお、図４に示した一致度の計算処理で
は、実際にはほとんどの部分文字列が検索文字列と全く
異なる文字列であるのが通常である。しかも、いずれの
候補文字にも一致しない文字がある程度の文字数以上検
出された場合には、最後の文字まで計算を続行したとし
ても、一致度が高得点となることはあり得ない。したが
って、処理の高速化のために、不一致の文字が所定数以
上検出された場合には、その部分文字列については、以
降の文字の一致度の計算処理を打ち切るようにしてもよ
い。In the matching degree calculation process shown in FIG. 4, most partial character strings are in practice entirely different from the search character string. Moreover, once more than a certain number of characters matching none of the candidate characters have been found, the matching degree cannot reach a high score even if the calculation continues to the last character. To speed up processing, therefore, when a predetermined number or more of mismatched characters have been detected, the matching degree calculation for the remaining characters of that partial character string may be cut off.
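
One possible form of this cutoff is sketched below. The function name and the `max_mismatches` parameter are hypothetical, and returning the partial score accumulated so far is an assumption; the text does not say what score an abandoned partial character string receives.

```python
def matching_degree_with_cutoff(pattern, window, max_mismatches):
    """Matching degree that stops once max_mismatches characters have failed."""
    score = 0
    mismatches = 0
    for candidates, ch in zip(pattern, window):
        for cand, certainty in candidates:
            if cand == ch:
                score += certainty
                break
        else:
            # No candidate matched this character.
            mismatches += 1
            if mismatches >= max_mismatches:
                break  # abandon the rest of this window
    return score


# Hypothetical four-character search string with confirmed characters.
pattern = [[("a", 100)], [("b", 100)], [("c", 100)], [("d", 100)]]
print(matching_degree_with_cutoff(pattern, "xyzd", 2))  # 0: stopped early
print(matching_degree_with_cutoff(pattern, "xyzd", 4))  # 100: ran to the end
```

The trade-off is that a window abandoned early may report a lower score than the full calculation would, which is acceptable only when such windows could not have reached a high score anyway.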

【００４５】また、最高一致度の得点が所定値以上とな
る被検索文字列のデータをその得点にかかわりなく全て
一律に抽出するような検索を行う場合には、図３に示し
た検索処理において、この所定値以上の一致度を有する
部分文字列が検出されると、そのデータは既に抽出の対
象となることが確定するので、以降の部分文字列につい
てのＳ４の一致度の計算処理を打ち切るようにすること
もできる。Also, when performing a search that uniformly extracts all data whose searched character string has a maximum matching degree score at or above a predetermined value, regardless of the actual score, the data item is already certain to be extracted as soon as the search process shown in FIG. 3 detects a partial character string whose matching degree is at or above that value; the matching degree calculation process of S4 for the subsequent partial character strings can therefore be cut off.
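
This early exit can be sketched as a yes/no test per searched character string. The names `is_hit` and `threshold` and the sample data are assumptions for illustration.

```python
def matching_degree(pattern, window):
    """Sum the certainty of the first matching candidate at each position."""
    score = 0
    for candidates, ch in zip(pattern, window):
        for cand, certainty in candidates:
            if cand == ch:
                score += certainty
                break
    return score


def is_hit(pattern, text, threshold):
    """True as soon as any window reaches threshold; later windows are skipped."""
    m = len(pattern)
    for top in range(len(text) - m + 1):  # empty range when text is too short
        if matching_degree(pattern, text[top:top + m]) >= threshold:
            return True
    return False


# Hypothetical search string: 'c', then 'a' (60) or 'o' (40), then 't'.
pattern = [[("c", 100)], [("a", 60), ("o", 40)], [("t", 100)]]
print(is_hit(pattern, "cot and more text", 240))  # True after the first window
```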

【００４６】図６および図７は本発明の第２実施例を示
すものであって、図６は検索処理における一致度の計算
処理の動作を示すフローチャート、図７は検索文字列の
文字と部分文字列の候補文字とが一致する状態を示す図
である。なお、図１乃至図５に示した第１実施例と同様
の機能を有する構成部材には同じ符号を付記する。FIGS. 6 and 7 show a second embodiment of the present invention. FIG. 6 is a flowchart showing the operation of the matching degree calculation process in the search process, and FIG. 7 is a diagram showing a state in which the characters of the search character string match candidate characters of a partial character string. Components having the same functions as in the first embodiment shown in FIGS. 1 to 5 are given the same reference numerals.

【００４７】本実施例は、各文字が確定した検索文字列
に基づいてデータベースの不確定文字列の被検索文字列
を検索する文字列検索装置を備えた電子機器について説
明する。この電子機器のハードウエア構成は図２に示し
た第１実施例と同じである。そして、文字列検索装置の
構成も、図１に示した第１実施例と同じでよい。ただ
し、本実施例では、補助記憶装置３に格納されたデータ
ベースの各データ中の被検索文字列が第１実施例で示し
たものと同じ不確定文字列によって構成されている。ま
た、本実施例では、タブレット５と入力ペン７による検
索文字列の入力の際に、手書き文字読取装置１１が手書
き入力文字を確定できなかった場合は、確定作業を行っ
て全ての文字を確定させてから検索処理部１２に入力す
る。したがって、この検索文字列は、図示しないキーボ
ードなどの文字の入力装置から入力したものであっても
よい。なお、データベース中の被検索文字列は、例えば
印刷物の活字文書などを図２に示したイメージスキャナ
６で画像パターンとして読み込み光学的文字読取装置に
よって認識したものを確定作業を行うことなく入力した
ものとする。また、タブレット５と入力ペン７を用いて
手書き文字読取装置１１により入力したものであっても
よい。This embodiment describes an electronic device equipped with a character string search device that searches searched character strings, which are uncertain character strings in a database, based on a search character string in which every character is confirmed. The hardware configuration of this electronic device is the same as that of the first embodiment shown in FIG. 2. The configuration of the character string search device may also be the same as that of the first embodiment shown in FIG. 1. In this embodiment, however, the searched character strings in each data item of the database stored in the auxiliary storage device 3 are uncertain character strings of the same kind as described in the first embodiment. Also, in this embodiment, if the handwritten character reading device 11 cannot confirm a handwritten input character when the search character string is input with the tablet 5 and the input pen 7, confirmation work is performed so that all characters are confirmed before the string is input to the search processing unit 12. The search character string may therefore also be input from a character input device such as a keyboard (not shown). The searched character strings in the database are assumed to be, for example, typed documents such as printed matter that were read as image patterns by the image scanner 6 shown in FIG. 2, recognized by the optical character reading device, and input without any confirmation work. They may also be input through the handwritten character reading device 11 using the tablet 5 and the input pen 7.

【００４８】本実施例の検索処理部１２は、各文字が確
定した検索文字列に基づいて補助記憶装置３に格納され
たデータベースにおける各データ中の不確定文字列から
なる被検索文字列の検索を行う。この検索処理部１２の
検索処理は、図３に示した第１実施例の場合と同じであ
るが、Ｓ４における一致度の計算処理の内容は図４に示
した第１実施例の場合とは異なる。The search processing unit 12 of this embodiment searches the searched character strings, each an uncertain character string, in each data item of the database stored in the auxiliary storage device 3, based on a search character string in which every character is confirmed. The search process of the search processing unit 12 is the same as in the first embodiment shown in FIG. 3, but the content of the matching degree calculation process in S4 differs from that of the first embodiment shown in FIG. 4.

【００４９】本実施例におけるこの一致度の計算処理の
詳細を図６に基づいて説明する。ただし、ここでは、検
索文字列Ｐにおけるｉ文字目の文字はＰ[i]で表すもの
とし、被検索文字列Ｔにおけるｉ文字目の文字のｊ番目
の候補文字はＴ[i,j]で表すものとする。一致度の計算
処理の開始時には、まず一致度の得点を“０”に初期化
すると共に（Ｓ２１）、カウンタｉの値とカウンタｊの
値を“１”に初期化する（Ｓ２２，Ｓ２３）。そして、
検索文字列Ｐにおけるｉ文字目の文字Ｐ[i]が被検索文
字列Ｔにおける部分文字列の先頭位置ｔｏｐからｉ文字
目の文字のｊ番目の候補文字Ｔ[top+i,j]に一致するか
どうかを判断する（Ｓ２４）。Details of the calculation processing of the degree of coincidence in this embodiment will be described with reference to FIG. However, here, the i-th character in the search character string P is represented by P [i], and the j-th candidate character of the i-th character in the searched character string T is T [i, j]. Shall be represented. At the start of the matching score calculation process, first, the score of the matching score is initialized to "0" (S21), and the values of the counter i and the counter j are initialized to "1" (S22, S23). And
The i-th character P [i] in the search character string P matches the j-th candidate character T [top + i, j] of the i-th character from the top position top of the partial character string in the searched character string T It is determined whether or not (S24).

【００５０】Ｓ２４において、文字Ｐ[i]が候補文字Ｔ
[top+i,j]に一致しないと判断された場合には、カウン
タｊに“１”を加えて（Ｓ２５）、次の候補文字Ｔ[top
+i,j]が存在するかどうかを判断し（Ｓ２６）、次の候
補文字Ｔ[top+i,j]が存在する場合にはＳ２４に戻って
この処理を繰り返す。したがって、ここでは検索文字列
Ｐの各文字Ｐ[i]が部分文字列における同じ文字位置で
対応する１または２以上の各候補文字Ｔ[top+i,j]と順
に比較される。If it is determined in S24 that the character P[i] does not match the candidate character T[top+i,j], "1" is added to counter j (S25) and it is determined whether a next candidate character T[top+i,j] exists (S26); if it does, the process returns to S24 and repeats. Each character P[i] of the search character string P is thus compared in turn with the one or more candidate characters T[top+i,j] at the corresponding character position in the partial character string.

[0051] If it is determined in S24 that the character P[i] matches the candidate character T[top+i,j], comparison with the remaining candidate characters is abandoned and the certainty value associated with that candidate character T[top+i,j] is added to the score of the degree of coincidence (S27). When the score has been increased in S27, or when it is determined in S26 that comparison with all candidate characters for the current character is complete, "1" is added to counter i to advance to the next character (S28). Until the value of counter i exceeds the number of characters m in the search character string P, the process returns to S23, reinitializes counter j, and repeats (S29). When it is determined in S29 that the value of counter i exceeds m, the calculation of the degree of coincidence ends. Thus, whenever a character of the search character string P matches one of the candidate characters at the same character position in the partial character string, the certainty value of that candidate character is added to the score. When a character matches none of the candidate characters, nothing is added to the score, but processing continues with the subsequent characters. As in the first embodiment, the more closely a partial character string matches the search character string P, the higher its score.
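
The loop of steps S21 to S29 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the name `match_score`, the representation of each text position as a list of (character, certainty) pairs, and the use of 0-based offsets (the description counts from 1) are all choices made for this sketch.

```python
from typing import List, Tuple

# Assumption for this sketch: one candidate is a (character, certainty) pair,
# and each position of the searched string holds a ranked list of candidates.
Candidate = Tuple[str, float]


def match_score(pattern: str, text: List[List[Candidate]], top: int) -> float:
    """Score the window of `text` that starts at 0-based offset `top`."""
    score = 0.0                                      # S21: initialize the score
    for i, p_char in enumerate(pattern):             # S22, S28, S29: loop over i
        for cand_char, certainty in text[top + i]:   # S23, S25, S26: loop over j
            if cand_char == p_char:                  # S24: compare P[i] with T[top+i,j]
                score += certainty                   # S27: add the certainty value
                break                                # skip the remaining candidates
        # If no candidate matched, nothing is added and the next character follows.
    return score
```

A window in which every query character finds a matching candidate accumulates one certainty value per position, while a position without a match simply contributes nothing.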

[0052] FIG. 7 illustrates how the characters of the search character string P match the candidate characters of a partial character string contained in the searched character string T during this calculation. As before, each character or candidate character is drawn as a square (□), and a matched character or candidate character is marked with a black dot inside the square. The search character string P has four characters (m = 4). In the partial character string of the searched character string T, the first character has seven candidate characters, the second has five, the third has two, and the fourth has three. In the first loop (counter i = 1), when counter j = 2, the first character P[1] of the search character string P matches the second candidate character T[top+1,2] of the first character of the partial character string, so the certainty value of T[top+1,2] is added to the score in S27. In the loop for i = 2, when j = 3, the second character P[2] matches the third candidate character T[top+2,3] of the second character, so the certainty value of T[top+2,3] is added to the score. In the loop for i = 3, when j = 1, the third character P[3] matches the first candidate character T[top+3,1] of the third character, so the certainty value of T[top+3,1] is added to the score. In the loop for i = 4, when j = 3, the fourth character P[4] matches the third candidate character T[top+4,3] of the fourth character, so the certainty value of T[top+4,3] is added to the score. The degree of coincidence calculated here is therefore the sum of the certainty values of the candidate characters T[top+1,2], T[top+2,3], T[top+3,1], and T[top+4,3].
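
The FIG. 7 example can be restated as a formula. The notation is introduced here for illustration and does not appear in the specification: c_T(p,k) stands for the certainty value of candidate T[p,k], and k_i for the index of the first candidate at position top+i that matches P[i], with the term taken as zero when no candidate matches.

```latex
S = \sum_{i=1}^{m} c_T(top+i,\,k_i),
\qquad
S_{\mathrm{FIG.\,7}} = c_T(top+1,2) + c_T(top+2,3) + c_T(top+3,1) + c_T(top+4,3)
```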

[0053] When the search processing section 12 has finished searching the searched character strings, as in the first embodiment, the maximum score of the degree of coincidence is taken as the score of the data item. For example, only data items whose score is at or above a predetermined value are extracted from the database, and they are displayed in the search result list on the display device 4 in descending order of score.
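
The extraction step might look like the sketch below. It assumes a hypothetical `rank_records` helper, a dictionary mapping record identifiers to searched strings, and a caller-supplied `score_fn(text, top)` that scores one window; none of these names come from the specification.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def rank_records(
    records: Dict[str, Sequence],
    m: int,
    score_fn: Callable[[Sequence, int], float],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Return (record id, best score) pairs at or above `threshold`, best first."""
    ranked = []
    for record_id, text in records.items():
        # Score every window of length m inside the searched string.
        window_scores = [score_fn(text, top) for top in range(len(text) - m + 1)]
        if not window_scores:
            continue                      # string shorter than the query
        best = max(window_scores)         # maximum degree of coincidence
        if best >= threshold:             # keep only scores at or above the limit
            ranked.append((record_id, best))
    ranked.sort(key=lambda item: item[1], reverse=True)  # descending order of score
    return ranked
```

Passing the scoring function as a parameter keeps this step independent of which embodiment computes the per-window score.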

[0054] As described above, with the character string search device of the electronic device of this embodiment, when searched character strings have been entered into the data items of the database by an optical character reading device using the image scanner 6, the search can be executed without any character-determination or correction work, even if the searched character strings contain undetermined or misrecognized characters. Searching by a search character string therefore becomes possible even for a database built by mechanically reading large volumes of printed, typeset documents.

[0055] As in the first embodiment, the calculation of the degree of coincidence shown in FIG. 6 and the search processing shown in FIG. 3 may be terminated early, where appropriate, to speed up processing, for example when a predetermined number or more of non-matching characters have been detected or when the score of the degree of coincidence has reached a predetermined value or more.
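
One way such early termination could be added to the per-window scoring is sketched below. The parameter names `max_mismatches` and `score_cap`, and the choice to return the partial score together with an `aborted` flag, are assumptions of this sketch; the specification does not say what value an aborted calculation should yield.

```python
from typing import List, Tuple

Candidate = Tuple[str, float]  # (character, certainty), as assumed in this sketch


def match_score_with_cutoff(
    pattern: str,
    text: List[List[Candidate]],
    top: int,
    max_mismatches: int,
    score_cap: float,
) -> Tuple[float, bool]:
    """Return (score, aborted); aborted is True when a cutoff stopped the loop."""
    score = 0.0
    mismatches = 0
    for i, p_char in enumerate(pattern):
        # Certainty of the first candidate equal to p_char, or None if there is none.
        certainty = next((c for ch, c in text[top + i] if ch == p_char), None)
        if certainty is None:
            mismatches += 1
            if mismatches >= max_mismatches:   # too many non-matching characters
                return score, True
        else:
            score += certainty
            if score >= score_cap:             # score already high enough
                return score, True
    return score, False
```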

[0056] FIGS. 8 and 9 show a third embodiment of the present invention. FIG. 8 is a flowchart showing the operation of the calculation of the degree of coincidence in the search processing, and FIG. 9 is a diagram showing a state in which candidate characters of the search character string match candidate characters of a partial character string. Components having the same functions as in the first embodiment shown in FIGS. 1 to 5 are given the same reference numerals.

[0057] This embodiment describes an electronic device equipped with a character string search device that searches indeterminate searched character strings in a database on the basis of an indeterminate search character string. The hardware configuration of the electronic device is the same as in the first embodiment shown in FIG. 2. The configuration of the character string search device is also the same as in the first embodiment shown in FIG. 1, and the search character string entered through the handwritten character reading device 11 using the tablet 5 and the input pen 7 is an indeterminate character string. In this embodiment, however, as in the second embodiment, the searched character strings in the data items of the database stored in the auxiliary storage device 3 are also indeterminate character strings.

[0058] The search processing section 12 of this embodiment searches the searched character strings, each of which is an indeterminate character string, in the data items of the database stored in the auxiliary storage device 3, on the basis of the indeterminate search character string. The search processing is the same as in the first embodiment shown in FIG. 3, but the calculation of the degree of coincidence in S4 differs from that of the first embodiment shown in FIG. 4 and from that of the second embodiment shown in FIG. 6.

[0059] The calculation of the degree of coincidence in this embodiment is described in detail with reference to FIG. 8. Here, the j-th candidate character of the i-th character of the search character string P is denoted P[i,j], and the k-th candidate character of the i-th character of the searched character string T is denoted T[i,k]. When the calculation starts, the score of the degree of coincidence is first initialized to "0" (S31), and counters i, j, and k are each initialized to "1" (S32 to S34). It is then determined whether the candidate character P[i,j] matches the k-th candidate character T[top+i,k] of the i-th character counted from the start position top of the partial character string within the searched character string T (S35).

[0060] If it is determined in S35 that the candidate character P[i,j] does not match the candidate character T[top+i,k], "1" is added to counter k (S36) and it is determined whether a next candidate character T[top+i,k] exists (S37). If one exists, the process returns to S35 and repeats. If no next candidate character T[top+i,k] exists, "1" is added to counter j (S38) and it is determined whether a next candidate character P[i,j] exists (S39). If one exists, the process returns to S34, reinitializes counter k, and repeats. In this way, the one or more candidate characters P[i,j] of each character of the search character string P are compared exhaustively, in order, with the one or more candidate characters T[top+i,k] at the corresponding character position in the partial character string.

[0061] If it is determined in S35 that the candidate character P[i,j] matches the candidate character T[top+i,k], comparison with the remaining candidate characters is abandoned and the certainty values associated with P[i,j] and with T[top+i,k] are both added to the score of the degree of coincidence (S40). When the score has been increased in S40, or when it is determined in S39 that all candidate characters for the current character have been compared, "1" is added to counter i to advance to the next character (S41). Until the value of counter i exceeds the number of characters m in the search character string P, the process returns to S33, reinitializes counters j and k, and repeats (S42). When it is determined in S42 that the value of counter i exceeds m, the calculation of the degree of coincidence ends. Thus, whenever any candidate character of a character of the search character string P matches any candidate character at the same character position in the partial character string, the certainty values of both matched candidate characters are added to the score. When none of the candidate characters match, nothing is added to the score, but processing continues with the subsequent characters. As in the first and second embodiments, the more closely a partial character string matches the search character string P, the higher its score.
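
Steps S31 to S42 can be sketched in Python as follows. As before, this is a hedged illustration rather than the patented implementation: the function names, the (character, certainty) pair representation, and the 0-based offsets are assumptions of this sketch.

```python
from typing import List, Optional, Tuple

Candidate = Tuple[str, float]       # (character, certainty), assumed layout
Position = List[Candidate]          # ranked candidates for one character position


def first_match(p_cands: Position, t_cands: Position) -> Optional[float]:
    """Return the summed certainty of the first matching pair, or None."""
    for p_char, p_cert in p_cands:          # counter j (S33, S38, S39)
        for t_char, t_cert in t_cands:      # counter k (S34, S36, S37)
            if p_char == t_char:            # S35: compare P[i,j] with T[top+i,k]
                return p_cert + t_cert      # S40: both certainty values count
    return None


def match_score_both_uncertain(
    pattern: List[Position], text: List[Position], top: int
) -> float:
    """Score the window of `text` at 0-based offset `top` against `pattern`."""
    score = 0.0                                     # S31
    for i, p_cands in enumerate(pattern):           # S32, S41, S42
        gained = first_match(p_cands, text[top + i])
        if gained is not None:
            score += gained
    return score
```

The query candidates form the outer loop and the text candidates the inner loop, so the first pair found in that order decides which certainty values are added.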

[0062] FIG. 9 illustrates how the candidate characters of the search character string P match the candidate characters of a partial character string contained in the searched character string T during this calculation. As before, each character or candidate character is drawn as a square (□), and a matched character or candidate character is marked with a black dot inside the square. The search character string P has four characters (m = 4), and each of its characters has the same number of candidate characters as in the first embodiment shown in FIG. 5. Each character of the partial character string has the same number of candidate characters as in the second embodiment shown in FIG. 7. In the first loop (counter i = 1), when counters j = 2 and k = 2, the second candidate character P[1,2] of the first character of the search character string P matches the second candidate character T[top+1,2] of the first character of the partial character string, so the certainty values of P[1,2] and T[top+1,2] are added to the score in S40. In the loop for i = 2, when j = 1 and k = 3, the first candidate character P[2,1] of the second character matches the third candidate character T[top+2,3] of the second character of the partial character string, so the certainty values of P[2,1] and T[top+2,3] are added to the score. In the loop for i = 3, when j = 2 and k = 1, the second candidate character P[3,2] of the third character matches the first candidate character T[top+3,1] of the third character of the partial character string, so the certainty values of P[3,2] and T[top+3,1] are added to the score. In the loop for i = 4, when j = 3 and k = 3, the third candidate character P[4,3] of the fourth character matches the third candidate character T[top+4,3] of the fourth character of the partial character string, so the certainty values of P[4,3] and T[top+4,3] are added to the score. The degree of coincidence calculated here is therefore the total of the sum of the certainty values of P[1,2], P[2,1], P[3,2], and P[4,3] and the sum of the certainty values of T[top+1,2], T[top+2,3], T[top+3,1], and T[top+4,3].
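
The FIG. 9 example can also be written as a formula. The symbols c_P(i,j) and c_T(p,k), standing for the certainty values of candidates P[i,j] and T[p,k], are notation introduced here for illustration; (j_i, k_i) denotes the first matching pair found at position i, and a position with no matching pair contributes zero.

```latex
S = \sum_{i=1}^{m} \bigl[ c_P(i,\,j_i) + c_T(top+i,\,k_i) \bigr],
\qquad
S_{\mathrm{FIG.\,9}} = \bigl[ c_P(1,2) + c_P(2,1) + c_P(3,2) + c_P(4,3) \bigr]
 + \bigl[ c_T(top+1,2) + c_T(top+2,3) + c_T(top+3,1) + c_T(top+4,3) \bigr]
```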

[0063] When the search processing section 12 has finished searching the searched character strings, as in the first and second embodiments, the maximum score of the degree of coincidence is taken as the score of the data item. For example, only data items whose score is at or above a predetermined value are extracted from the database, and they are displayed in the search result list on the display device 4 in descending order of score.

[0064] As described above, with the character string search device of the electronic device of this embodiment, when the search character string and the searched character strings have been entered through character reading devices, the search can be executed without any character-determination or correction work, even if those character strings contain undetermined or misrecognized characters.

[0065] As in the first and second embodiments, the calculation of the degree of coincidence shown in FIG. 8 and the search processing shown in FIG. 3 may be terminated early, where appropriate, to speed up processing, for example when a predetermined number or more of non-matching characters have been detected or when the score of the degree of coincidence has reached a predetermined value or more.

[0066] In the search processing of the first to third embodiments, comparison of the subsequent characters continues even when a character matching none of the candidate characters is detected, so a speed-up algorithm such as the BM (Boyer-Moore) method cannot be used. However, if comparison of the subsequent characters is terminated when, for example, a predetermined number or more of non-matching characters have been detected, a speed-up algorithm adapted from the BM method or the like can also be used.

[0067] Furthermore, in the first and third embodiments the search is performed with an indeterminate search character string. An indeterminate character string can be expressed as a regular expression in which the candidate characters for each character are combined by union and the characters are combined by concatenation. Instead of the search processing described above, it is therefore also possible to perform search processing using a finite automaton constructed from this regular expression. In that case, however, in addition to the states for the candidate characters of each character, a state that accepts every character other than those candidates must be added, a certainty value must be associated with the state of each candidate character, and the processing must successively add the certainty values of the states actually entered as characters are accepted from the partial character string.
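
A minimal sketch of this idea, for the case of a determined searched string as in the first embodiment, is given below. It is an assumption-laden illustration: the helper names are hypothetical, the automaton is modelled as one transition table per character position, and the catch-all state is represented simply by a lookup default worth zero.

```python
import re
from typing import Dict, List, Tuple

Position = List[Tuple[str, float]]   # (candidate character, certainty), assumed layout


def to_regex(pattern: List[Position]) -> str:
    """Union of candidates per position, concatenated across positions."""
    return "".join(
        "(?:" + "|".join(re.escape(ch) for ch, _ in cands) + ")" for cands in pattern
    )


def build_tables(pattern: List[Position]) -> List[Dict[str, float]]:
    """One table per position: accepted character -> certainty of the state entered."""
    return [dict(cands) for cands in pattern]


def run(tables: List[Dict[str, float]], window: str) -> float:
    """Feed one window through the automaton and sum the certainties of visited states."""
    score = 0.0
    for table, ch in zip(tables, window):
        # Any character that is not a candidate enters the catch-all state (worth 0).
        score += table.get(ch, 0.0)
    return score
```

Because the catch-all state accepts every other character, the automaton never rejects a window; it only scores it, which mirrors the behaviour of continuing after a non-matching character.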

[0068] In the search processing of the first to third embodiments, all partial character strings contained in a searched character string are examined. Alternatively, the search may be performed only when the number of characters in the searched character string equals the number of characters in the search character string, treating the entire searched character string as the partial character string.
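
The difference between the two modes comes down to which window offsets are examined, as the following sketch shows. The function name `candidate_windows` and the `whole_string_only` flag are hypothetical, and offsets are 0-based.

```python
def candidate_windows(text_len: int, m: int, whole_string_only: bool = False) -> range:
    """Return the 0-based start offsets of the windows to be scored."""
    if whole_string_only:
        # Only an exact-length string is searched, as a single window.
        return range(1) if text_len == m else range(0)
    # Default: every window of length m inside the searched string.
    return range(max(text_len - m + 1, 0))
```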

[0069]

[Effects of the Invention] As described above, according to the character string search device of the present invention, the degree of coincidence between the search character string and each partial character string contained in a searched character string is calculated from the certainty values of the candidate characters, and comparison of these degrees of coincidence preferentially extracts the matches whose candidate characters are more likely to be the correct characters. A search can therefore be performed using, as the search character string or the searched character string, an indeterminate character string whose characters, entered by a character reading device or the like, have not been determined. Either the search character string or the searched character string may be an indeterminate character string, or both may be. Moreover, even if some characters match none of the candidate characters of the corresponding character of the indeterminate character string, a comparatively high degree of coincidence is still obtained if the other characters match correctly, so the search remains possible even when character recognition has failed entirely or misrecognition has occurred. Consequently, the search can be executed without character-determination or correction work, for example when the search character string has been entered by a handwritten character reading device, or on searched character strings entered by an optical character reading device.

[Brief Description of the Drawings]

FIG. 1 shows the first embodiment of the present invention and is a block diagram showing the configuration of the character string search device.

FIG. 2 shows the first embodiment of the present invention and is a block diagram showing the configuration of an electronic device equipped with the character string search device.

FIG. 3 shows the first embodiment of the present invention and is a flowchart showing the operation of the search processing in the character string search device.

FIG. 4 shows the first embodiment of the present invention and is a flowchart showing the operation of the calculation of the degree of coincidence in the search processing.

FIG. 5 shows the first embodiment of the present invention and is a diagram showing a state in which candidate characters of the search character string match characters of a partial character string.

FIG. 6 shows the second embodiment of the present invention and is a flowchart showing the operation of the calculation of the degree of coincidence in the search processing.

FIG. 7 shows the second embodiment of the present invention and is a diagram showing a state in which characters of the search character string match candidate characters of a partial character string.

FIG. 8 shows the third embodiment of the present invention and is a flowchart showing the operation of the calculation of the degree of coincidence in the search processing.

FIG. 9 shows the third embodiment of the present invention and is a diagram showing a state in which candidate characters of the search character string match candidate characters of a partial character string.

FIG. 10 shows a conventional example and is a diagram showing the relationship between a search character string and the partial character strings of a searched character string.

FIG. 11 shows a conventional example and is a flowchart showing the operation of character string search processing.

[Explanation of Reference Numerals]

1 Computing device
3 Auxiliary storage device
5 Tablet
6 Image scanner
7 Input pen
11 Handwritten character reading device
12 Search processing section

Claims

[Claims]

[Claim 1] A character string search device comprising: search character string input means for inputting, as a search character string, an indeterminate character string in which each character consists of one or more candidate characters and a certainty value is associated with each candidate character; a database consisting of a set of data items each having a searched character string whose characters are determined; degree-of-coincidence calculating means which, for each partial character string that is contained in each searched character string in each data item of the database and has the same number of characters as the search character string, performs an operation on the degree of coincidence of that partial character string on the basis of the certainty value of the matched candidate character for each character, when a character of the partial character string matches any candidate character of the character of the search character string at the same character position; and data extracting means for comparing the degree of coincidence of each partial character string calculated by the degree-of-coincidence calculating means with the degree of coincidence of other partial character strings or with a predetermined value, and extracting from the database the data having a searched character string that contains a partial character string selected on the basis of the comparison result.

[Claim 2] A character string search device comprising: search character string input means for inputting a search character string whose characters are determined; a database consisting of a set of data items each having a searched character string that is an indeterminate character string in which each character consists of one or more candidate characters and a certainty value is associated with each candidate character; degree-of-coincidence calculating means which, for each partial character string that is contained in each searched character string in each data item of the database and has the same number of characters as the search character string, performs an operation on the degree of coincidence of that partial character string on the basis of the certainty value of the matched candidate character for each character, when any candidate character of a character of the partial character string matches the character of the search character string at the same character position; and data extracting means for comparing the degree of coincidence of each partial character string calculated by the degree-of-coincidence calculating means with the degree of coincidence of other partial character strings or with a predetermined value, and extracting from the database the data having a searched character string that contains a partial character string selected on the basis of the comparison result.

[Claim 3] A character string search device comprising: search character string input means for inputting, as a search character string, an indeterminate character string in which each character consists of one or more candidate characters and a certainty value is associated with each candidate character; a database consisting of a set of data items each having a searched character string that is an indeterminate character string in which each character consists of one or more candidate characters and a certainty value is associated with each candidate character; degree-of-coincidence calculating means which, for each partial character string that is contained in each searched character string in each data item of the database and has the same number of characters as the search character string, performs an operation on the degree of coincidence of that partial character string on the basis of the certainty values of both matched candidate characters for each character, when any candidate character of a character of the partial character string matches any candidate character of the character of the search character string at the same character position; and data extracting means for comparing the degree of coincidence of each partial character string calculated by the degree-of-coincidence calculating means with the degree of coincidence of other partial character strings or with a predetermined value, and extracting from the database the data having a searched character string that contains a partial character string selected on the basis of the comparison result.

[Claim 4] The character string search device according to claim 1 or 3, wherein the search character string input means identifies each character of an input character string by a character reading device, selects one or more candidate characters for each character, and attaches to each selected candidate character a certainty value indicating the accuracy of recognition.

[Claim 5] The character string search device according to any one of claims 2 to 4, wherein each searched character string of each data item in the database has been entered by a character reading device that identifies each character of an input character string, selects one or more candidate characters for each character, and attaches to each selected candidate character a certainty value indicating the accuracy of recognition.

[Claim 6] The character string search device according to any one of claims 1 to 5, wherein the degree-of-coincidence calculating means calculates the degree of coincidence only when the number of characters in the searched character string equals the number of characters in the search character string, treating the entire searched character string as the sole partial character string.