JP4660504B2

JP4660504B2 - Text processing apparatus and program

Info

Publication number: JP4660504B2
Application number: JP2007135598A
Authority: JP
Inventors: 外志正土橋; 博之水谷; 明弘宇田; 直朗小平; 智久鈴木; 彰夫古畑
Original assignee: Toshiba Corp; Toshiba Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2007-05-22
Filing date: 2007-05-22
Publication date: 2011-03-30
Anticipated expiration: 2027-05-22
Also published as: JP2008293109A

Description

The present invention relates to a text processing apparatus and a program that support an operator's work of correcting character strings.

In recent years, character recognition and speech recognition have come into wide use as computer processing power has increased. The recognition performance of these technologies improves year by year, but misrecognition remains unavoidable. An operator therefore has to correct the misrecognized character strings contained in the recognized text.

In such correction work, corrections of a single kanji character occur frequently. In that case, the input method used to enter the correct character (a kana-kanji conversion module, a speech-to-text conversion module, or the like) becomes inefficient. For example, in character recognition the character “天” in the string “晴天” (seiten, clear sky) may be misrecognized as “夫”. Normally, when the operator selects the misrecognized character “夫” in the recognized text displayed on the screen, a list of correction candidates for “夫” is displayed. If the list contains the correct character “天”, the operator simply selects it. However, if the list is, for example, “1. 人 2. 未 3. 大” and does not contain the correct character “天”, the operator makes the correction by entering the correct character (correction character) “天” in a correction input field.

Suppose the operator enters the correct character (correction character) “天” using an input method. To enter “天” by single-kanji kana-kanji conversion, the operator types the kana string “てん” (ten), which is then converted into a kanji. However, besides “天”, many kanji are read “てん”, such as “点”, “展”, “転”, and “典”, so it is difficult to enter the intended correction character efficiently.

The conversion-efficiency problem can be avoided if the operator enters an easily converted word that contains the correct character “天”, such as “天気” (tenki, weather), and then deletes the extra character (here, “気”). However, this approach costs the extra effort of deleting characters.

Patent Document 1, for example, describes a technique that deletes the extra characters automatically (hereinafter, the first technique). In the first technique, the operator selects the misrecognized character to be corrected in the recognized text and enters the reading of a character string that contains the correct character. Kana-kanji conversion means converts the entered reading into a character string containing the correct character. The characters around the misrecognized character in the recognized text are then compared with the converted character string to find matching characters, and the correction character is determined from the characters that remain after the matching characters are excluded.

With the first technique, if “晴天” is misrecognized as “晴夫”, for example, the operator corrects the misrecognized character “夫” to “天” by selecting “夫” and entering “せいてん” (seiten), the reading of “晴天”. The kana-kanji conversion means converts the reading “せいてん” into the string “晴天”. Comparing the characters around the misrecognized character “夫” with the converted string “晴天” finds the matching character “晴”. The character left after removing “晴” from “晴天”, namely “天”, is determined to be the correction character that should replace “夫”.
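The matching step of the first technique can be illustrated with a minimal Python sketch. The function name `first_technique` and its parameters are hypothetical, and `text` is assumed to hold only the characters near the misrecognized character; Patent Document 1 may define the neighborhood and the matching rule differently.

```python
def first_technique(text: str, target_index: int, converted: str) -> str:
    """Return the characters of `converted` that do not occur around the target."""
    # Characters surrounding the misrecognized character (the target itself is excluded).
    neighbors = set(text[:target_index] + text[target_index + 1:])
    # Drop every converted character that matches a neighbor; the rest is the correction.
    return "".join(ch for ch in converted if ch not in neighbors)


# "晴" matches the neighbor of "夫", so only "天" remains.
print(first_technique("晴夫", 1, "晴天"))
# Nothing in "天気" matches a neighbor, so both characters remain and no single
# correction character can be determined.
print(first_technique("晴夫", 1, "天気"))
```

The second call shows why this approach depends on the operator typing the word as it should appear in the text: an unrelated word leaves more than one character over.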

Patent Document 1 also describes a technique (hereinafter, the second technique) that determines the correction character while allowing for cases in which homophones prevent the reading from being converted quickly into the string the operator intended. Suppose, for example, that “会” in “会議” (kaigi, meeting) is misrecognized as “合”. If the operator enters the reading “かいぎ” (kaigi), it may be converted into a different word such as “懐疑” or “回議”. To avoid this, the operator enters “かいぎしつ” (kaigishitsu), the reading of “会議室” (meeting room), in which one more character, “室”, follows “会議”. The second technique is characterized by how it determines the correction character in this case. The reading “かいぎしつ” is converted into “会議室” on almost the first attempt. The characters of “会議室” that do not match the string “合議” around the misrecognized character “合” are “会” and “室”. Of these two, “室”, which follows the matching character “議”, is excluded from consideration on the basis of its position relative to “議”, and “会”, which precedes “議”, is determined to be the correction character.
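The positional rule of the second technique can be sketched as follows. This is an illustration under assumptions, not the method of Patent Document 1: the function name `second_technique` is hypothetical, only the characters immediately adjacent to the target are examined, and a single correction character is returned.

```python
def second_technique(text: str, target_index: int, converted: str) -> str:
    """Pick the converted character that sits on the same side of the matching
    character as the misrecognized character does in the text."""
    for pos, ch in enumerate(converted):
        follows = target_index + 1 < len(text) and text[target_index + 1] == ch
        precedes = target_index > 0 and text[target_index - 1] == ch
        if follows:
            # The matching character comes right after the target in the text,
            # so the correction is the converted character right before it.
            return converted[pos - 1] if pos > 0 else ""
        if precedes:
            # The matching character comes right before the target in the text,
            # so the correction is the converted character right after it.
            return converted[pos + 1] if pos + 1 < len(converted) else ""
    return ""  # no matching character, so nothing can be determined


print(second_technique("合議", 0, "会議室"))  # "議" matches; "会" precedes it
```

With an input such as “早朝会議” the same rule would still need a matching character adjacent to the target, which is the dependency the description goes on to criticize.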

Patent Document 1 further describes a technique (hereinafter, the third technique) that assumes the operator enters a character string that contains the correction character and consists of a plurality of phrases (bunsetsu), and that determines the character at a predetermined position in the entered string (for example, the first character of the second phrase) to be the correction character. Assume it has been predetermined that the first character of the second phrase is the correction character. With the third technique, if “修” in “修士論文” (master's thesis) is misrecognized so that the text reads “参士論文”, and the operator enters “がくもんをおさめる” (gakumon o osameru), then when this input is converted into “学問を修める”, the first character of the second phrase, “修”, is determined to be the correction character.
JP-A-7-302302 (paragraphs 0023-0027)
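The fixed-position rule of the third technique amounts to indexing into the converted phrases. The sketch below assumes, hypothetically, that the kana-kanji conversion returns the result already split into phrases; the function name and default positions are illustrative only.

```python
from typing import List


def third_technique(phrases: List[str], phrase_index: int = 1, char_index: int = 0) -> str:
    """Return the character at a predetermined position in the converted phrases."""
    if phrase_index >= len(phrases) or char_index >= len(phrases[phrase_index]):
        return ""  # the input does not reach the predetermined position
    return phrases[phrase_index][char_index]


# "がくもんをおさめる" converted and segmented as two phrases (assumed segmentation).
print(third_technique(["学問を", "修める"]))
```

The rule is simple to implement, but it only works when the operator composes an input whose phrase structure puts the correct character exactly at the agreed position.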

As described above, with the first to third techniques, the operator designates a misrecognized character in the recognized text and enters the reading of a word or phrase containing the correct character. The correction character that should replace the misrecognized character is then determined automatically from the string obtained by kana-kanji conversion of that reading.

However, for the correction character to be determined automatically, the first technique requires the operator to enter the reading of the word or phrase as it should appear in the recognized text, that is, with the misrecognized character replaced by the correct one. For example, if “晴天” is misrecognized as “晴夫”, the operator must enter “せいてん”, the reading of “晴天”, in order to correct “夫” to “天”. In other words, with the first technique, even if the operator enters “てんき”, the reading of “天気”, “天” cannot be automatically determined to be the correction character for “夫” in the string “晴夫”.

With the second technique, when “会” in “会議” is misrecognized as “合” and the operator enters “かいぎしつ”, the reading of “会議室” in which “室” follows “会議”, the character “会” preceding the matching character “議” is chosen as the correction character from the two non-matching characters “会” and “室” in the converted string “会議室”. However, if the operator can think of a string such as “早朝会議” (early-morning meeting) but cannot think of a string in which a character follows “会議”, the correction cannot be made. Of course, if the apparatus lets the operator designate the position of the correction character within the converted string, the operator can enter any word or phrase containing the correct character. That, however, requires the additional work of designating the correction character within the converted string.

With the third technique, the operator must enter the reading of a phrase that contains the correct character at the predetermined position, which makes the correction work difficult.

The present invention has been made in view of the above circumstances. Its object is to provide a text processing apparatus and a program that, when text contains a character (character string) to be corrected and the operator enters, as a correction input string, an arbitrary character string containing the correction character (correction string), can automatically extract from that correction input string the correction character (correction string) used to correct the character (character string) in question.

According to one aspect of the present invention, a text processing apparatus is provided. The apparatus comprises: text input means for inputting text; text storage means for storing the text input by the text input means as input text; a display used to display the input text stored in the text storage means; input operation means used for input operations by an operator; target character string designation means for designating, on the basis of an input operation performed by the operator with the input operation means, a character string to be corrected in the input text as a target character string; correction input means for inputting, on the basis of an input operation performed by the operator with the input operation means, a correction input string that contains a correction string used to correct a character string; correction string extraction means for comparing the target character string designated by the target character string designation means with the correction input string input by the correction input means, and thereby extracting from the correction input string the character string most likely to be the correction string for the designated target character string; and correction means for correcting the target character string by replacing it, in the input text, with the extracted character string.

According to the present invention, when correcting a character string in text that was entered incorrectly or misrecognized, the operator may enter as the correction input string any easy-to-type character string, even one unrelated to the original string, as long as it contains the character string (character) used for the correction. That character string is automatically extracted from the correction input string as the correction string (correction character). This improves the conversion efficiency of input methods such as kana-kanji conversion when the correction input string is entered, and removes the effort of deleting superfluous characters after entering it, so text can be corrected more efficiently.

Embodiments of the present invention will be described below with reference to the drawings.
[First Embodiment]
FIG. 1 is a block diagram showing the configuration of a text processing apparatus according to a first embodiment of the present invention. The text processing apparatus of FIG. 1 comprises a text input unit 11, a text storage unit 12, a display 13, a text display unit 14, an input operation unit 15, a target character string designation unit 16, a correction input unit 17, a kana-kanji conversion unit 18, a dictionary storage unit 19, a correction string extraction unit 20, and a correction unit 21.

The text input unit 11 is, for example, a communication interface that receives from a network text (text data) containing kanji and represented by character codes. The text is, for example, document data consisting of character strings recognized by a character recognition apparatus or a speech recognition apparatus connected to the text processing apparatus of FIG. 1 via the network, and is transmitted from that recognition apparatus over the network.

The text input by the text input unit 11 (here, a recognition result) generally contains misrecognized characters. Each recognized character in the text is associated with a group of recognition candidate characters. Each recognition candidate character carries a score, likelihood, or similarity indicating how probable it is as the recognized character. In general, the most probable candidate is selected as the recognized character. The text input unit 11 may instead be a storage device that reads the text from, for example, a removable storage medium storing the recognition result of a character recognition or speech recognition apparatus. It may also be a keyboard with which the operator types in the character string (text).
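One way to picture recognized text with per-character candidate groups is the following Python sketch. The class names and the numeric scores are hypothetical; the document only states that each candidate carries a score, likelihood, or similarity and that the most probable candidate is normally chosen.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    char: str
    score: float  # hypothetical confidence value; higher means more probable


@dataclass
class RecognizedChar:
    candidates: List[Candidate]

    @property
    def best(self) -> Candidate:
        # The most probable candidate is used as the recognized character.
        return max(self.candidates, key=lambda c: c.score)


def recognized_text(chars: List[RecognizedChar]) -> str:
    return "".join(c.best.char for c in chars)


# Illustrative scores only: the slot that should read "天" is won by "夫".
slots = [
    RecognizedChar([Candidate("晴", 0.95)]),
    RecognizedChar([Candidate("夫", 0.41), Candidate("人", 0.30),
                    Candidate("未", 0.17), Candidate("大", 0.12)]),
]
print(recognized_text(slots))
```

Keeping the losing candidates alongside the winner is what later allows the apparatus to offer them as correction candidates.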

The text storage unit 12 stores the text input by the text input unit 11, together with the group of recognition candidate characters that accompanies each recognized character in the text.

The display 13 is used to display text and the like. The text display unit 14 performs processing for displaying the text stored in the text storage unit 12 (the input text) on the display 13.

When the text displayed on the display 13 contains a character string to be corrected (target character string), the input operation unit 15 is used to select that target character string and to enter, for example, the reading of a character string (correction input string) that contains the character string that should replace it (correction string). The input operation unit 15 consists of, for example, a keyboard and a mouse.

The target character string designation unit 16 detects that the operator has selected an arbitrary character string as the target character string with the input operation unit 15. On detecting the selection, it designates (determines) the selected character string as the target character string. The correction input unit 17 inputs the correction input string on the basis of the operator's correction-input operation on the input operation unit 15.

The kana-kanji conversion unit 18 converts the reading input by the correction input unit 17 into a character string containing kanji (the correction input string). The reading may be entered by kana input or by romaji input, for example.

The dictionary storage unit 19 includes a linguistic knowledge database storage unit 191, a character pattern dictionary storage unit 192, and a speech pattern dictionary storage unit 193. The linguistic knowledge database storage unit 191 stores a linguistic knowledge database, which holds, for each character string formed from various combinations of characters, information indicating how plausible that string is as language (a word or phrase). The character pattern dictionary storage unit 192 stores standard character pattern information for each character. The speech pattern dictionary storage unit 193 stores standard speech pattern information for the syllables that make up each character.

The correction string extraction unit 20 extracts the correction string from the correction input string by comparing the correction input string with the target character string, that is, the part of the text displayed on the display 13 that was designated by the target character string designation unit 16.

The correction unit 21 corrects the text displayed on the display 13 by replacing the target character string in it with the correction string extracted by the correction string extraction unit 20.
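The roles of units 20 and 21 can be sketched together as an extract-then-replace pipeline. The passage above does not say how the comparison is scored, so this sketch makes two loud assumptions: the correction has the same length as the string being replaced, and a caller-supplied `similarity` function (here a toy lookup table with invented values) ranks the candidates. All names are hypothetical.

```python
from typing import Callable


def extract_correction(target: str, correction_input: str,
                       similarity: Callable[[str, str], float]) -> str:
    """Pick the substring of the input that best matches the target (assumed rule)."""
    n = len(target)
    # Every substring of the input that has the same length as the target.
    windows = [correction_input[i:i + n] for i in range(len(correction_input) - n + 1)]
    if not windows:
        return ""
    return max(windows, key=lambda w: similarity(target, w))


def apply_correction(text: str, start: int, end: int, correction: str) -> str:
    """Replace text[start:end] with the extracted correction."""
    return text[:start] + correction + text[end:]


# Toy similarity table with invented values, standing in for a real comparison.
TOY_SIMILARITY = {("夫", "天"): 0.9, ("夫", "気"): 0.1}


def toy_similarity(a: str, b: str) -> float:
    return TOY_SIMILARITY.get((a, b), 0.0)


text = "本日は晴夫なり。"
start = text.index("夫")
correction = extract_correction("夫", "天気", toy_similarity)
print(apply_correction(text, start, start + 1, correction))
```

Separating extraction from replacement mirrors the split between the two units, so the scoring rule can be swapped without touching the text edit itself.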

FIG. 2 is a block diagram showing the hardware configuration of an information processing apparatus that implements the text processing apparatus of FIG. 1. In FIG. 2, parts identical to those in FIG. 1 are given the same reference numerals. The information processing apparatus shown in FIG. 2 is a computer such as a personal computer. In addition to the display 13 and the input operation unit 15 shown in FIG. 1, it includes input/output devices, namely a communication interface 31, a removable storage device 32, and a hard disk drive (HDD) 33, as well as a CPU 34 and a memory 35. Part of the storage area of the memory 35 is used as the text storage unit 12 shown in FIG. 1, and part of the storage area of the HDD 33 is used as the dictionary storage unit 19 shown in FIG. 1.

The HDD 33 stores in advance a program 330 that causes the information processing apparatus of FIG. 2 to function as a text processing apparatus (text editor). The CPU 34 loads the program 330 from the HDD 33 into the memory 35 and executes it, whereby the information processing apparatus of FIG. 2 functions as the text processing apparatus shown in FIG. 1. That is, by executing the program 330, the CPU 34 implements the functional units (processing units) shown in FIG. 1: the text input unit 11, the text display unit 14, the target character string designation unit 16, the correction input unit 17, the kana-kanji conversion unit 18, the correction string extraction unit 20, and the correction unit 21. The program 330 can be stored in advance on a computer-readable storage medium such as a compact disc or a ROM and distributed in that form. It may also be downloaded to the HDD 33 of the information processing apparatus of FIG. 2 via a network.

Next, the character string correction processing in the text processing apparatus of FIG. 1 will be described with reference to the flowchart of FIG. 3 and the example screen of FIG. 4. First, assume that the text input unit 11 has input the text “本日は晴夫なり。明日は晴れるてしようか。” (step S1). This text is, for example, the result of a character recognition apparatus recognizing an image (input image) obtained by scanning the paper document “本日は晴天なり。明日は晴れるでしょうか。” (“It is fine today. Will it be fine tomorrow?”) with a scanner. Here, “天” in “晴天” has been misrecognized as “夫”. In addition, “で” and “ょ” in “晴れるでしょうか” have been misrecognized, giving “晴れるてしようか”.

The text input by the text input unit 11 is stored in the text storage unit 12. The input image accompanies the text, as does a group of recognition candidate characters for each recognized character in the text. The recognized character is itself one of the recognition candidate characters. Each recognition candidate character carries information indicating how probable it is as the recognized character (its recognition confidence). In this embodiment, four recognition candidate characters are listed for the “天” of “晴天”: in descending order of confidence, “夫”, “人”, “未”, and “大”. The character “夫” is used as the recognized character in the text, and the remaining three characters, “人”, “未”, and “大”, are used as correction candidate characters.

The text display unit 14 displays the text stored in the text storage unit 12 (the input text) and the input image accompanying it on the display 13, as indicated by reference numerals 41 and 42, respectively, in FIG. 4 (step S2). In doing so, on the basis of the recognition confidence of each character (recognized character) in the text, the text display unit 14 displays characters that are likely to have been misrecognized (misrecognition candidates), for example characters whose recognition confidence is below a certain level, so that the operator can distinguish them from the other characters. In the example of FIG. 4 they are displayed as shaded characters. The operator can thus select, with the input operation unit 15 (for example, its mouse or keyboard), the candidate to be corrected (target character string) from among the misrecognition candidates in the displayed text.
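The confidence-based highlighting in step S2 can be approximated in a few lines of Python. The threshold value, the confidence numbers, and the use of square brackets in place of shaded display are all assumptions made for illustration.

```python
from typing import List, Tuple


def mark_low_confidence(chars: List[Tuple[str, float]], threshold: float = 0.6) -> str:
    """Wrap characters whose confidence is below the threshold in brackets."""
    return "".join(f"[{ch}]" if conf < threshold else ch for ch, conf in chars)


# Invented confidence values: only "夫" falls below the assumed threshold.
sample = [("晴", 0.95), ("夫", 0.41), ("な", 0.90), ("り", 0.92)]
print(mark_low_confidence(sample))
```

A real display unit would set a rendering attribute rather than insert characters, but the selection logic is the same comparison against a fixed level.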

The text processing apparatus of FIG. 1 may also be given a grammar/word check function. This function checks the input text against the linguistic knowledge database stored in the linguistic knowledge database storage unit 191, extracts character strings that are grammatically incorrect or implausible as words as target character string candidates, and displays those candidates so that they are distinguished from other character strings.

Besides selecting a displayed misrecognition candidate (target character string candidate) as the target character string, the operator can also use the input operation unit 15 to designate the start and end positions of an arbitrary character string in the input text as the correction start and end positions, thereby selecting the character string between those positions as the target character string.

Suppose the text shown in FIG. 4 contains a misrecognition candidate (misrecognized character) that needs correction, and the operator uses the input operation unit 15 to select one, for example the character “夫” in the string “晴夫” indicated by reference numeral 45 in FIG. 4, as the target character string (target character). The text display unit 14 then displays the selected character “夫” on the display 13 with a special display attribute so that it can be recognized as the target character string, and displays the correction candidate characters “人”, “未”, and “大” in the correction candidate display area 43 shown in FIG. 4. In this embodiment, however, the correction candidate characters displayed in the correction candidate display area 43 do not include the correct character “天”.

After selecting the target character string with the input operation unit 15, the operator checks whether the correction candidate characters displayed in the correction candidate display area 43 include the correction character (correct character). If they do not, the operator uses the input operation unit 15 to enter a character string containing the correction character into the correction input field 44 shown in FIG. 4.

When the correction target specifying unit 16 detects that the operator has selected a correction target string (steps S3 and S4), it notifies the correction string input unit 17 of the selection and designates the selected string to the replacement string extraction unit 20 as the correction target string (step S5). In this embodiment, where the character "夫" of the string "晴夫" (reference numeral 45 in FIG. 4) has been selected, that character "夫" is designated as the correction target string (correction target character).

Once the correction target specifying unit 16 has designated a correction target string, the correction string input unit 17 determines which of two operations follows: entry of a character string containing the replacement character (the correction input string) into the correction input field 44, or selection of a candidate from the correction candidate display area 43 (step S6). Suppose here that the operator uses the input operation unit 15 to enter the reading "てんき" (tenki) of the correction input string "天気" (weather) into the correction input field 44. The kana-kanji conversion unit 18 converts the entered reading "てんき" into "天気", so that "天気" is displayed in the correction input field 44. In other words, the correction input string "天気" has been entered into the correction input field 44.

When a correction input string is entered into the correction input field 44, the correction string input unit 17 determines that a correction input string has been entered (step S6). It then takes in the string from the correction input field 44 and passes it to the replacement string extraction unit 20 (step S7).

The replacement string extraction unit 20 compares the number of characters in the correction input string taken in by the correction string input unit 17 with the number of characters in the correction target string designated by the correction target specifying unit 16, and thereby determines whether a replacement string must be extracted from the correction input string (step S8). Extraction is judged necessary when the correction input string has more characters than the designated correction target string. In this embodiment, the correction input string is "天気" (two characters) and the correction target string is "夫" (one character), so extraction is judged necessary.

When it judges that extraction is necessary, the replacement string extraction unit 20 compares the correction input string with the correction target string and extracts from the correction input string the replacement string that should replace the correction target string (step S9). For this extraction, the unit applies a technique that searches a first character string (the correction input string) for a third character string (the replacement string) that is similar to a second character string (the correction target string).

The search by the replacement string extraction unit 20 uses a technique such as the well-known DP (Dynamic Programming) matching, which is commonly used for matching character strings and speech. In this embodiment, where the text containing the second character string (the correction target string) is a character recognition result, the matching uses the standard patterns (standard character patterns) of the characters making up the first character string (the correction input string) and the standard character patterns of the characters making up the second character string. This character pattern information is stored in the character pattern dictionary storage unit 192.

Through this matching-based search, the replacement string extraction unit 20 extracts, from the first character string (the correction input string), the character string (third character string) with the highest degree of match (similarity) to the second character string (the correction target string) as the replacement string.

In this embodiment, where the first character string (correction input string) is "天気" and the second character string (correction target string) is "夫", the character-pattern (glyph shape) similarity between "天" of "天気" and "夫" is compared with the glyph-shape similarity between "気" of "天気" and "夫". Here the similarity between "天" and "夫" is sufficiently higher than that between "気" and "夫". The replacement string extraction unit 20 therefore extracts, from the first character string "天気", the third character string "天", which resembles the second character string "夫", as the replacement string (replacement character).
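The comparison in this example can be sketched roughly as follows. This is only an illustrative sketch: the table SHAPE_SIMILARITY and its scores are invented stand-ins for a comparison of the standard character patterns in the character pattern dictionary storage unit 192, the function names are hypothetical, and a simple sliding-window comparison is used in place of DP matching.

```python
# Hypothetical glyph-shape similarity scores in [0, 1]. The values are
# invented for illustration; a real system would derive them from the
# standard character patterns.
SHAPE_SIMILARITY = {
    ("天", "夫"): 0.9,
    ("気", "夫"): 0.1,
}


def char_similarity(a: str, b: str) -> float:
    """Return the assumed shape similarity of two single characters."""
    if a == b:
        return 1.0
    return SHAPE_SIMILARITY.get((a, b), SHAPE_SIMILARITY.get((b, a), 0.0))


def extract_by_shape(correction: str, target: str) -> str:
    """Pick the substring of `correction` whose shape is closest to `target`.

    Every substring with the same length as `target` is scored by the
    average per-character similarity, and the best one is returned.
    """
    n = len(target)
    if n == 0 or len(correction) < n:
        raise ValueError("correction must be at least as long as target")
    best, best_score = "", -1.0
    for start in range(len(correction) - n + 1):
        window = correction[start:start + n]
        score = sum(char_similarity(w, t) for w, t in zip(window, target)) / n
        if score > best_score:
            best, best_score = window, score
    return best
```

With the assumed scores, extract_by_shape("天気", "夫") selects "天", matching the outcome described above.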

The example above concerns the case where the text input by the text input unit 11 is a character recognition result. If the text is instead a speech recognition result, the matching-based search by the replacement string extraction unit 20 can use, in place of the standard character pattern of each character, the standard speech pattern of that character (for example, the standard speech patterns of the syllables making up the speech corresponding to the character). This speech pattern information is stored in the speech pattern dictionary storage unit 193.

Instead of a search based on pattern matching, the string best suited to correcting the second character string (correction target string) can also be found in the first character string (correction input string) by evaluating and comparing the linguistic plausibility of character strings, that is, how plausible each is as a word or phrase. In this approach, the replacement string extraction unit 20 first takes from the input text a fourth character string that contains the second character string. It then repeatedly replaces the correction target string within the fourth character string with each same-length character string that can be selected from the first character string. For each substituted fourth character string, the unit evaluates its plausibility as a word based on the language knowledge database stored in the language knowledge database storage unit 191. Finally, it extracts as the replacement string the character string (third character string) that was substituted into the fourth character string evaluated as most plausible.

This technique is particularly effective when the text containing the correction target string was created by key input or similar operations and an editor (operator) designates a string to be edited, for example one containing a mistyped character, as the correction target string and enters a correction input string. When the text is a character recognition result or a speech recognition result, combining this technique with the matching-based search described above allows the replacement string to be extracted with still higher accuracy.

A concrete example of extracting the replacement string using linguistic plausibility follows. As above, the first character string (correction input string) is "天気", the second character string (correction target string) is "夫", and the input text containing "夫" is "本日は晴夫なり。明日は晴れるてしようか。".

First, the replacement string extraction unit 20 extracts from the input text a fourth character string (word) that contains the second character string "夫". Suppose that the word "晴夫" is extracted. Next, the unit replaces the correction target character "夫" in "晴夫" with each of the characters "天" and "気" that make up the first character string "天気". The resulting words are "晴天" and "晴気". The unit evaluates the plausibility of "晴天" and "晴気" as words, based on the language knowledge database stored in the language knowledge database storage unit 191. This database holds, for character strings (words or phrases) formed from various combinations of characters, information indicating their plausibility as words or phrases (for example, a likelihood). Of the two words "晴天" and "晴気", "晴天" is assigned the higher likelihood. The unit therefore selects "晴天" as the most plausible word and extracts the character "天", which was substituted for the correction target character "夫" in it, as the replacement character.
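The substitution-and-scoring procedure of this example might be sketched as below. The dictionary WORD_LIKELIHOOD is a hypothetical stand-in for the language knowledge database, its values are invented, and the function name is an assumption made for illustration.

```python
# Hypothetical word likelihoods standing in for the language knowledge
# database. The numbers are invented for illustration only.
WORD_LIKELIHOOD = {
    "晴天": 0.8,
    "晴気": 0.01,
}


def extract_by_likelihood(word, target, correction, likelihood=WORD_LIKELIHOOD):
    """Return (replacement, corrected_word) with the highest word likelihood.

    `word` is the string taken from the text that contains `target`.
    Each substring of `correction` with the same length as `target` is
    substituted for `target`, and the resulting word is scored.
    """
    n = len(target)
    pos = word.find(target)
    if pos < 0:
        raise ValueError("target must occur in word")
    if n == 0 or len(correction) < n:
        raise ValueError("correction must be at least as long as target")
    best_sub, best_word, best_score = "", "", -1.0
    for start in range(len(correction) - n + 1):
        sub = correction[start:start + n]
        candidate = word[:pos] + sub + word[pos + n:]
        score = likelihood.get(candidate, 0.0)  # unknown words score zero
        if score > best_score:
            best_sub, best_word, best_score = sub, candidate, score
    return best_sub, best_word
```

Under the assumed likelihoods, extract_by_likelihood("晴夫", "夫", "天気") yields the replacement "天" and the corrected word "晴天".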

The correction unit 21 replaces the correction target string (correction target character) in the input text with the replacement string (replacement character) extracted by the replacement string extraction unit 20 (step S10). When instead a candidate character is selected from the candidate group displayed in the correction candidate display area 43 (step S6), the correction unit 21 treats the selected candidate as the replacement character and replaces the correction target character in the input text with it (step S10).

In this embodiment, when the correction input string has the same number of characters as the correction target string, the replacement string extraction unit 20 judges that no extraction from the correction input string is needed (step S8). In that case the correction unit 21 uses the correction input string itself as the replacement string and replaces the correction target string in the input text with it (step S10).
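The branch at step S8 and the replacement at step S10 can be summarized in a short sketch. The function name and parameters are hypothetical, and the extraction method is passed in as a callable so that either technique described above could be plugged in. The description does not cover a correction input string shorter than the correction target string; this sketch simply uses it unchanged, which is an assumption.

```python
from typing import Callable


def correct_text(
    text: str,
    start: int,
    length: int,
    correction: str,
    extract: Callable[[str, str], str],
) -> str:
    """Replace text[start:start + length] using the entered correction string.

    Step S8: extraction is needed only when `correction` is longer than
    the selected target. Step S9: `extract(correction, target)` returns
    the replacement. Step S10: the target is replaced in the text.
    """
    target = text[start:start + length]
    if len(correction) > len(target):
        replacement = extract(correction, target)  # step S9
    else:
        # Same length: use the correction string as is. A shorter string
        # is also passed through here, which the source does not specify.
        replacement = correction
    return text[:start] + replacement + text[start + length:]
```

For example, with a trivial extractor that takes the leading characters, correct_text("本日は晴夫なり。", 4, 1, "天気", lambda c, t: c[:len(t)]) produces "本日は晴天なり。".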

The character string correction process described above is repeated as long as the operator selects correction target strings. When no strings remain to be corrected in the input text and the operator indicates the end of text correction (step S3), the process ends.

[Second Embodiment]
Next, a second embodiment of the present invention will be described. The second embodiment is characterized in that it allows switching between two modes. In the first text correction mode, the operator need not designate a correction target string by a selection operation; the correction target string is instead determined (estimated) without such a designation, and the text is then corrected. In the second text correction mode, as in the first embodiment, the text is corrected on the premise that a correction target string has been designated.

FIG. 5 is a block diagram showing the configuration of a text processing apparatus according to the second embodiment of the present invention. In FIG. 5, elements that are the same as those in FIG. 1 are given the same reference numerals. The apparatus of FIG. 5 differs from that of FIG. 1 in that a mode setting unit 22 and a correction target search unit 23 are added, and in that a replacement string extraction unit 200 is used in place of the replacement string extraction unit 20. The replacement string extraction unit 200 differs from the replacement string extraction unit 20 in that it operates in the second text correction mode; in that mode it performs the same replacement string extraction as the replacement string extraction unit 20.

The mode setting unit 22 selectively sets either the first text correction mode or the second text correction mode based on the operator's input operations on the input operation unit 15. More specifically, the mode setting unit 22 sets the first text correction mode in the specific state where a correction input string is entered through the correction string input unit 17 while no correction target string has been designated by the correction target specifying unit 16, and sets the second text correction mode when a correction target string has been designated by the correction target specifying unit 16.

The correction target search unit 23 operates in the first text correction mode. It compares the correction input string entered (taken in) by the correction string input unit 17 with the input text and searches the input text for a character string similar to the correction input string. It then determines (estimates) the similar string found to be the correction target string.

The text input unit 11, text display unit 14, correction target specifying unit 16, correction string input unit 17, kana-kanji conversion unit 18, correction unit 21, mode setting unit 22, correction target search unit 23, and replacement string extraction unit 200 shown in FIG. 5 are assumed to be implemented by the CPU 34 in the information processing apparatus (computer) shown in FIG. 2 loading the program 330 into the memory 35 and executing it.

Next, the character string correction process in the text processing apparatus of FIG. 5 will be described with reference to the flowcharts of FIGS. 6A and 6B and the screen examples of FIGS. 7 and 8. As in the first embodiment, suppose that the text "本日は晴夫なり。明日は晴れるてしようか。" has been input by the text input unit 11 (step S11). This text is the recognition result obtained when a character recognition device processes the image (input image) produced by scanning the paper document "本日は晴天なり。明日は晴れるでしょうか。" ("It is fine weather today. Will it be fine tomorrow?") with a scanner. The text input by the text input unit 11 is stored in the text storage unit 12.

The text display unit 14 displays the text (input text) stored in the text storage unit 12 and the input image accompanying it on the display 13, as indicated by reference numerals 71 and 72, respectively, in FIG. 7(a) (step S12).

Suppose that, as in the first embodiment, the operator uses the input operation unit 15 to select the character "夫" in the text shown in FIG. 7(a), that is, the misrecognition candidate (misrecognized character) "夫", as the correction target string. The correction target specifying unit 16 detects that the operator has selected a correction target string (steps S13 and S14).

The correction target specifying unit 16 then designates the string selected by the operator to the replacement string extraction unit 20 as the correction target string (step S15). In step S15 it also notifies the mode setting unit 22 that the operator has selected a correction target string (that is, that the selected string has been designated as the correction target string). On receiving this notification, the mode setting unit 22 sets the text processing apparatus to the second text correction mode (step S16). In other words, the mode setting unit 22 sets the text correction mode to the second text correction mode in response to the operator's selection of a correction target string (its designation by the correction target specifying unit 16).

The subsequent processing of the text processing apparatus in the second text correction mode (steps S17 to S21) is the same as steps S6 to S10 of the first embodiment.

Now suppose instead that the operator begins entering a correction input string into the correction input field 44 shown in FIG. 7(a) without selecting a correction target string. The correction string input unit 17 detects that the operator has performed an operation to enter a correction input string (step S22).

The correction string input unit 17 notifies the mode setting unit 22 that the operator has entered a correction input string. At this point the mode setting unit 22 has received no notification from the correction target specifying unit 16 that a correction target string was selected (designated); that is, no correction target string has been designated. On receiving the notification from the correction string input unit 17, the mode setting unit 22 therefore detects the specific state in which a correction input string is entered while no correction target string is designated. In response to detecting this specific state, it sets the text processing apparatus to the first text correction mode (step S23).
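The mode selection described here amounts to a small piece of state tracking, sketched below under stated assumptions. The class and method names are hypothetical and do not come from the source; the two handler methods stand for the notifications from the correction target specifying unit 16 and the correction string input unit 17.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    FIRST = 1   # correction target is estimated from the correction input
    SECOND = 2  # correction target was explicitly selected by the operator


class ModeSetter:
    """Minimal sketch of the mode-setting behaviour (hypothetical names)."""

    def __init__(self) -> None:
        self.target_designated = False
        self.mode: Optional[Mode] = None

    def on_target_designated(self) -> Mode:
        # Notification that a correction target was designated
        # (steps S15 and S16).
        self.target_designated = True
        self.mode = Mode.SECOND
        return self.mode

    def on_correction_input(self) -> Optional[Mode]:
        # Notification that a correction input string was entered
        # (steps S22 and S23). Input with no designated target is the
        # "specific state" that selects the first mode.
        if not self.target_designated:
            self.mode = Mode.FIRST
        return self.mode
```

In this sketch, entering a correction input string first puts the apparatus in the first text correction mode, while designating a target beforehand leaves it in the second.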

In the first text correction mode, the correction string input unit 17 takes in the correction input string entered into the correction input field 44, for example one character at a time, and passes each character to the correction target search unit 23 as it is taken in (step S24).

In the first text correction mode, the correction target search unit 23 compares the correction input string passed from the correction string input unit 17 with the input text sequentially from the beginning of the text, and searches the text for character strings similar to the correction input string (step S25). In step S25 it determines (estimates) the similar string found to be the correction target string.

Because the search in step S25 serves character string correction, this embodiment applies a fuzzy search, a search technique that tolerates ambiguity so that similar strings that do not match exactly can also be found. A fuzzy search can be realized by matching the correction input string against the input text with a technique such as DP matching. In this embodiment, the search in step S25 by the correction target search unit 23 is an incremental search (incremental fuzzy search) that narrows the candidates each time the correction string input unit 17 takes in one character.
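As one possible illustration of a DP-based fuzzy search, the sketch below uses the classic edit-distance recurrence for approximate substring matching, in which a match may start at any position in the text. The function name is hypothetical, and the unit cost for every substitution, insertion, and deletion is an assumption; the source does not specify the cost function used by the apparatus.

```python
from typing import Tuple


def best_fuzzy_match(pattern: str, text: str) -> Tuple[int, int]:
    """Return (distance, end) for the substring of `text` closest to `pattern`.

    `distance` is the smallest edit distance between `pattern` and any
    substring of `text`; `end` is the exclusive end index of the first
    substring that achieves it.
    """
    if not pattern:
        return 0, 0
    # Row 0: an empty pattern matches at any position at zero cost,
    # which lets a match begin anywhere in the text.
    prev = [0] * (len(text) + 1)
    for i in range(1, len(pattern) + 1):
        cur = [i] + [0] * len(text)
        for j in range(1, len(text) + 1):
            mismatch = 0 if pattern[i - 1] == text[j - 1] else 1
            cur[j] = min(
                prev[j - 1] + mismatch,  # align the two characters
                prev[j] + 1,             # skip a pattern character
                cur[j - 1] + 1,          # skip a text character
            )
        prev = cur
    end = min(range(len(text) + 1), key=lambda j: prev[j])
    return prev[end], end
```

For the text "本日は晴夫なり。明日は晴れるてしようか。" and the pattern "晴天な", the closest substring is "晴夫な" (one substitution), which ends at index 6.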

Suppose that the operator has entered "晴", the first character of the correction input string, into the correction input field 44, as shown for example in FIG. 7(b). The correction target search unit 23 compares the character "晴" with the input text and searches the text for similar characters that are candidates for the correction target. Here it finds the "晴" of "晴夫" and the "晴" of "晴れる" in the input text and determines (estimates) both to be correction target candidates. To make clear that the characters found by the correction target search unit 23 have been automatically determined as correction target candidates, the text display unit 14 displays the two characters "晴" on the display 13 so that they are distinguished from the other characters, as shown in FIG. 7(b) (shaded in the figure).

Next, suppose that the operator enters "天" into the correction input field 44 after "晴", so that the field contains "晴天". The correction target search unit 23 compares the correction input string at that point, "晴天", with the strings "晴夫" and "晴れ", which contain the current correction target candidates "晴" in the input text and have the same number of characters as "晴天", and obtains the similarity (degree of match) of each to "晴天". To simplify the description, this embodiment takes the proportion of matching characters as the similarity and determines a string to be a correction target candidate when at least a fixed proportion (here, half) of its characters match. In this example, "晴夫" and "晴れ" each have a similarity of 0.5 to "晴天", so both are correction target candidates.

To narrow the candidates, suppose the operator uses the input operation unit 15 to enter "な" into the correction input field 44 after "晴天". The field then contains "晴天な", as shown in FIG. 7(c). The correction target search unit 23 compares the correction input string at that point, "晴天な", with the strings "晴夫な" and "晴れる", which contain the current candidates ("晴夫" or "晴れ") in the input text and have the same number of characters as "晴天な", and obtains their similarities to "晴天な". In this example the similarities of "晴夫な" and "晴れる" to "晴天な" are 2/3 (≈ 0.67) and 1/3 (≈ 0.33), respectively. The correction target search unit 23 therefore determines only "晴夫な", whose similarity is at least 0.5, to be a correction target candidate. The candidates are thus narrowed to one.
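The narrowing in this worked example can be reproduced with a short sketch that uses the simplified similarity from the description (the proportion of positions where the characters match) and the 0.5 threshold. The function names are hypothetical, and restricting each pass to the start positions that survived the previous pass is one assumed way to realize the incremental search.

```python
def match_ratio(query: str, window: str) -> float:
    """Proportion of positions at which the two equal-length strings agree."""
    return sum(q == w for q, w in zip(query, window)) / len(query)


def narrow_candidates(text, query, starts=None, threshold=0.5):
    """Return (start, window, ratio) for windows similar enough to `query`.

    `starts` limits the search to start positions that survived the
    previous pass; None means every position in the text is tried.
    """
    n = len(query)
    if n == 0:
        return []
    if starts is None:
        starts = range(len(text) - n + 1)
    hits = []
    for s in starts:
        window = text[s:s + n]
        if len(window) < n:
            continue  # the window would run past the end of the text
        ratio = match_ratio(query, window)
        if ratio >= threshold:
            hits.append((s, window, ratio))
    return hits


TEXT = "本日は晴夫なり。明日は晴れるてしようか。"

# Simulate the operator typing "晴", then "晴天", then "晴天な".
starts = None
for query in ("晴", "晴天", "晴天な"):
    hits = narrow_candidates(TEXT, query, starts)
    starts = [s for s, _, _ in hits]
    print(query, [(window, round(ratio, 2)) for _, window, ratio in hits])
```

Run on the sample text, the first two passes keep both positions (the windows "晴夫" and "晴れ" each score 0.5 against "晴天"), and the third pass keeps only "晴夫な", in line with the figures given above.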

In this way, the correction target search unit 23 compares the correction input string "晴天な" entered in the correction input field 44 with the input text, finds in the input text the string "晴夫な" whose similarity to "晴天な" is at or above a fixed level, and determines it to be a correction target candidate. To make clear that the string "晴夫な" found in the input text has been determined as a correction target candidate, the text display unit 14 displays it on the display 13 so that it is distinguished from other strings, as shown in FIG. 7(c) (shaded in the figure).

Once the candidates shown on the screen have been narrowed down to the target character string that should be corrected, the operator stops entering the correction character string in the correction input field 44. In other words, the operator keeps entering the correction character string until the candidates are narrowed down to the target character string. The operator then uses the input operation unit 15 to confirm that character string as the character string to be corrected.

When the correction unit 21 detects that the operator has instructed confirmation of the corrected character string (step S26), it takes the correction character string entered in the correction input field 44 as the corrected character string and replaces the character string to be corrected in the input text, as confirmed at that point, with that corrected character string (step S27). In the example of FIG. 7(c), 「晴夫な」 is replaced with 「晴天な」.
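Steps S26 and S27 amount to a span replacement in the input text. The sketch below assumes the confirmed character string to be corrected is tracked as a start index and a length; the function name `apply_correction` and the sample sentence are hypothetical.

```python
def apply_correction(input_text: str, start: int, length: int, corrected: str) -> str:
    """Replace input_text[start:start + length] with the corrected character string."""
    if start < 0 or length < 0 or start + length > len(input_text):
        raise ValueError("span lies outside the input text")
    return input_text[:start] + corrected + input_text[start + length:]


# Hypothetical input text in which 「晴夫な」 starts at index 3.
before = "今日は晴夫な一日"
after = apply_correction(before, 3, 3, "晴天な")
print(after)
```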

Similarly, when the operator enters the correction character string 「晴れるでしょうか」 in the correction input field 44 as shown in FIG. 8, the to-be-corrected character string search unit 23 extracts from the input text the similar character string 「晴れるてしようか」, whose similarity is 6/8 (= 0.75), as a candidate for the character string to be corrected.

Thus, with the text processing apparatus of this embodiment, the operator can enter a correction character string without designating the place to be corrected (the character string to be corrected) in the input text displayed on the screen. The apparatus searches the input text for a character string that resembles the correction character string at or above a certain level, and thereby automatically determines (estimates) the character string that the operator intended to correct. It can then replace (correct) the determined character string with the correction character string, used as the corrected character string.

The incremental fuzzy search by the to-be-corrected character string search unit 23 is only one example. The search by the search unit 23 may instead start after entry of the correction character string in the correction input field 44 is complete. In addition, so that the operator can more easily find the places to be corrected (the misrecognized places, when the input text is a character recognition result), the text processing apparatus of FIG. 5 may provide a function for setting a mode that excludes character strings exactly matching the correction character string from the results of the fuzzy search. Furthermore, when a certain number or more of similar character strings are candidates for the character string to be corrected, the candidates may be limited, for example by taking only the similar character string found first in the input text as the candidate.
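The two optional behaviours described above, excluding exact matches and limiting the candidates when there are too many, could be sketched as follows. The parameter names `exclude_exact` and `limit_when_at_least`, the positional similarity measure, and the sample text are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def similarity(window: str, correction: str) -> float:
    # Assumed measure: share of positions with identical characters.
    return sum(a == b for a, b in zip(window, correction)) / len(correction)


def search(input_text: str, correction: str, threshold: float = 0.5,
           exclude_exact: bool = False,
           limit_when_at_least: Optional[int] = None) -> List[Tuple[int, str]]:
    """Fuzzy search with the optional restrictions mentioned in the description."""
    n = len(correction)
    if n == 0:
        return []
    hits = []
    for start in range(len(input_text) - n + 1):
        window = input_text[start:start + n]
        if exclude_exact and window == correction:
            continue  # already correct, so not a place the operator needs to fix
        if similarity(window, correction) >= threshold:
            hits.append((start, window))
    # When a certain number of candidates or more exist, keep only the first one found.
    if limit_when_at_least is not None and len(hits) >= limit_when_at_least:
        hits = hits[:1]
    return hits


text = "晴天な日と晴夫な日"  # hypothetical input text
all_hits = search(text, "晴天な")
without_exact = search(text, "晴天な", exclude_exact=True)
first_only = search(text, "晴天な", limit_when_at_least=2)
print(all_hits, without_exact, first_only)
```

Running the same function once on the completed correction character string, instead of after every keystroke, corresponds to the non-incremental variant.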

The present invention is not limited to the first and second embodiments exactly as described. At the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the first and second embodiments. For example, some constituent elements may be deleted from all the constituent elements shown in the first or second embodiment.

A block diagram showing the configuration of a text processing apparatus according to the first embodiment of the present invention.
A block diagram showing the hardware configuration of an information processing apparatus used to implement the text processing apparatus shown in FIG. 1.
A flowchart showing the procedure of character string correction processing in the text processing apparatus of FIG. 1.
A diagram showing example screens for explaining the character string correction processing in the text processing apparatus of FIG. 1.
A block diagram showing the configuration of a text processing apparatus according to the second embodiment of the present invention.
A diagram showing part of a flowchart of the procedure of character string correction processing in the text processing apparatus of FIG. 5.
A diagram showing the remainder of the flowchart of the procedure of character string correction processing in the text processing apparatus of FIG. 5.
A diagram showing example screens for explaining the character string correction processing in the text processing apparatus of FIG. 5.
A diagram showing other example screens for explaining the character string correction processing in the text processing apparatus of FIG. 5.

Explanation of symbols

11 ... text input unit, 12 ... text storage unit, 13 ... display, 14 ... text display unit, 15 ... input operation unit, 16 ... to-be-corrected character string specifying unit, 17 ... correction character string input unit, 19 ... dictionary storage unit, 20, 200 ... corrected character string extraction unit, 21 ... correction unit, 22 ... mode setting unit, 23 ... to-be-corrected character string search unit, 191 ... language knowledge database storage unit, 192 ... character pattern dictionary storage unit, 193 ... voice pattern dictionary storage unit, 330 ... program.

Claims

A text processing apparatus comprising:
text input means for inputting text;
text storage means for storing the text input by the text input means as input text;
a display used to display the input text stored in the text storage means;
input operation means used for input operations by an operator;
to-be-corrected character string specifying means for specifying, based on an input operation by the operator using the input operation means, a character string to be corrected in the input text as a to-be-corrected character string;
correction character string input means for inputting, based on an input operation by the operator using the input operation means, a correction character string that includes a corrected character string used to correct a character string;
corrected character string extraction means for, when the number of characters in the input correction character string is greater than the number of characters in the specified to-be-corrected character string, sequentially selecting from the input correction character string character strings having the same number of characters as the specified to-be-corrected character string, and comparing each of the sequentially selected character strings with the specified to-be-corrected character string, thereby extracting from among the sequentially selected character strings the character string most likely to be the corrected character string used to correct the specified to-be-corrected character string; and
correction means for correcting the to-be-corrected character string by replacing the specified to-be-corrected character string in the input text with the extracted character string.
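The extraction step of this claim can be illustrated with a small sketch. It enumerates every substring of the correction character string that has the same length as the to-be-corrected character string and returns the one that scores highest against it. The claim leaves the comparison open, so the positional character match used as the default scorer is a stand-in assumption; a character pattern (shape) comparison or a language model could be plugged in through the hypothetical `score` parameter. The behaviour when the correction character string is not longer than the target is also an assumption.

```python
from typing import Callable


def char_match_score(selected: str, target: str) -> float:
    # Stand-in comparison: share of positions with identical characters.
    return sum(a == b for a, b in zip(selected, target)) / len(target)


def extract_corrected(correction_input: str, target: str,
                      score: Callable[[str, str], float] = char_match_score) -> str:
    """Pick the same-length substring of correction_input that best matches target."""
    n = len(target)
    if n == 0 or len(correction_input) <= n:
        # Assumption: nothing to extract, use the input as it is.
        return correction_input
    windows = [correction_input[i:i + n]
               for i in range(len(correction_input) - n + 1)]
    # max() keeps the earliest window when several share the best score.
    return max(windows, key=lambda w: score(w, target))


# The operator designated 「晴夫」 and typed a longer phrase containing the fix.
extracted = extract_corrected("明日は晴天", "晴夫")
print(extracted)
```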

The text processing apparatus according to claim 1, further comprising:
mode setting means for selectively setting, based on an input operation by the operator using the input operation means, one of a first text correction mode, which does not require specification of the to-be-corrected character string, and a second text correction mode, which presupposes specification of the to-be-corrected character string; and
to-be-corrected character string search means that operates in the first text correction mode and compares the input correction character string with the input text, thereby searching the input text for the character string most likely to be the to-be-corrected character string when the correction character string is used as the corrected character string,
wherein, in the first text correction mode, the correction means takes the input correction character string as the corrected character string and corrects the found character string in the input text by replacing it with the corrected character string, and
the corrected character string extraction means operates in the second text correction mode.

The text processing apparatus according to claim 2, wherein the mode setting means sets the first text correction mode in a specific state in which a correction character string is input by the correction character string input means while no to-be-corrected character string has been specified by the to-be-corrected character string specifying means, and sets the second text correction mode when a to-be-corrected character string has been specified by the to-be-corrected character string specifying means.
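The mode selection rule of this claim is simple enough to state as code. The sketch below is illustrative only: the enum `CorrectionMode`, the function `select_mode`, and the choice to return `None` when neither condition holds are assumptions, not part of the claim.

```python
from enum import Enum
from typing import Optional


class CorrectionMode(Enum):
    FIRST = 1   # no target specified: search the input text for it
    SECOND = 2  # target specified: extract the fix from the correction input


def select_mode(specified_target: Optional[str],
                correction_input: str) -> Optional[CorrectionMode]:
    """Choose the text correction mode from the current input state."""
    if specified_target:
        return CorrectionMode.SECOND
    if correction_input:
        return CorrectionMode.FIRST
    return None  # assumption: nothing entered yet, so no mode is set


print(select_mode(None, "晴天な"))
print(select_mode("夫", "晴天"))
```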

The text processing apparatus according to claim 2, wherein the to-be-corrected character string search means searches for a character string in which the proportion of characters matching the characters constituting the correction character string is at or above a certain level.

The text processing apparatus according to claim 1, wherein the corrected character string extraction means extracts, as the most likely character string, the one among the sequentially selected character strings that is composed of characters whose patterns are most similar to the patterns of the characters constituting the specified to-be-corrected character string.

The text processing apparatus according to claim 1, wherein the corrected character string extraction means takes from the input text a character string containing the specified to-be-corrected character string, replaces the to-be-corrected character string in that character string with each of the sequentially selected character strings in turn, determines from among all the character strings obtained by the replacement the one that is most plausible as language, and extracts the character string, contained in the determined character string, that was used to replace the to-be-corrected character string.
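As a rough illustration of this claim, the sketch below substitutes each same-length substring of the correction character string into a context taken from the input text and keeps the substring whose substituted context scores best. The claim does not say how linguistic plausibility is measured, so the tiny word list `KNOWN_WORDS` and the scorer `toy_plausibility` are hypothetical stand-ins for the language knowledge database; the function name is also an assumption.

```python
from typing import Callable

# Hypothetical stand-in for a language knowledge database.
KNOWN_WORDS = {"晴天", "晴れ"}


def toy_plausibility(text: str) -> float:
    # Toy scorer: 1.0 for a known word, 0.0 otherwise.
    return 1.0 if text in KNOWN_WORDS else 0.0


def extract_by_language(context: str, target: str, correction_input: str,
                        plausibility: Callable[[str], float]) -> str:
    """Return the substring of correction_input whose substitution reads best."""
    n = len(target)
    pos = context.find(target)  # first occurrence of the target in the context
    if n == 0 or pos < 0 or len(correction_input) < n:
        raise ValueError("target must occur in context and fit in the correction input")
    windows = [correction_input[i:i + n]
               for i in range(len(correction_input) - n + 1)]

    def substituted(window: str) -> str:
        return context[:pos] + window + context[pos + n:]

    return max(windows, key=lambda w: plausibility(substituted(w)))


# Context 「晴夫」 with 「夫」 specified; the operator typed 「晴天」.
extracted = extract_by_language("晴夫", "夫", "晴天", toy_plausibility)
print(extracted)
```

Here the candidates 「晴」 and 「天」 yield 「晴晴」 and 「晴天」, and only the latter is in the toy word list, so 「天」 is extracted.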

A program for causing a computer having text input means for inputting text, text storage means for storing the text input by the text input means as input text, a display, and input operation means to execute:
a step of displaying on the display the input text input by the text input means and stored in the text storage means;
a step of determining, when a character string to be corrected in the input text is selected by an input operation of an operator using the input operation means while the input text is displayed on the display, the selected character string as a to-be-corrected character string;
a step of capturing a correction character string that is input by an input operation of the operator using the input operation means and that includes a corrected character string used to correct a character string;
a step of, when the number of characters in the input correction character string is greater than the number of characters in the specified to-be-corrected character string, sequentially selecting from the input correction character string character strings having the same number of characters as the specified to-be-corrected character string, and comparing each of the sequentially selected character strings with the specified to-be-corrected character string, thereby extracting from among the sequentially selected character strings the character string most likely to be the corrected character string used to correct the specified to-be-corrected character string; and
a step of correcting the to-be-corrected character string by replacing the determined to-be-corrected character string in the input text with the extracted character string.