JPS61156466A

JPS61156466A - Word extracting system

Info

Publication number: JPS61156466A
Application number: JP59276439A
Authority: JP
Inventors: Katsuhiko Fujita; 克彦藤田
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1984-12-28
Filing date: 1984-12-28
Publication date: 1986-07-16

Abstract

PURPOSE: To reduce errors, lessen the burden on the user, and improve conversion speed. The device provides means for detecting a request for correction at the position where a conversion error occurred, means for storing the correct concatenation of converted words at that position, and means for omitting part of the subsequent processing from that position on the basis of the information stored in the storage means. CONSTITUTION: The word extraction device comprises a control part 1, an input part 2, a connected-word check part 3, a dictionary search part 4, a connectability test part 5, an evaluation part 6, an output part 7, an input device (keyboard) 8, a connected-word dictionary 9, a word dictionary 10, a connectability test table 11, and a display device (VDT) 12. The correction made at the position where a delimitation error occurred is stored, together with its combination with the immediately preceding word, which makes correct analysis results easier to obtain in later analyses. For this purpose, the connected-word dictionary is searched before the ordinary dictionary search.

Description

DETAILED DESCRIPTION OF THE INVENTION

Technical Field
The present invention relates to a word extraction method. More particularly, it relates to a word extraction method that automatically learns corrections of delimitation errors, thereby making correct analysis results easier to obtain.

Prior Art
Consider the kana string "じさつしたい" (jisatsushitai). Even a person reading it cannot tell whether it should be written "自殺したい" ("want to commit suicide") or "自殺死体" ("suicide corpse"). A case like this naturally risks erroneous conversion in kana-kanji conversion as well.

In this case, "したい" (shitai) actually consists of two words. It combines the continuative (ren'yōkei) form "シ" (shi) of the sa-row irregular verb する with the auxiliary verb "たい" (tai). For this reason, conventional homophone learning cannot fully handle it.

Conventionally, kana-kanji conversion has used a method in which the words found by dictionary search are evaluated with an evaluation formula to decide the converted words.

This method inevitably involves the possibility of erroneous conversion. For conversion errors between homophones whose readings have the same length, various learning processes have already been put into practical use.

Even so, there has been no countermeasure for incorrect delimitation, and no satisfactory method for automatically correcting delimitation errors has existed.

Objects
An object of the present invention is to solve these conventional problems. To that end, it provides a word extraction method that can correct delimitation errors and that enables correct analysis and high-speed conversion.

Configuration
To achieve the above object, the word extraction method of the present invention has three means:
- means for detecting a request for correction at a position where a conversion error has occurred;
- means for storing the correct concatenation of converted words at that position; and
- means for omitting part of the processing after that position on the basis of the information stored in the storage means.

Corrections of delimitation errors can thus be learned automatically. This reduces errors, lessens the burden on the user, and improves conversion speed.

The configuration of the present invention is described below by way of an embodiment.

FIG. 1 is a block diagram of a word extraction device according to an embodiment of the present invention.

In FIG. 1, the numbered components are:
1: control unit
2: input unit
3: connected-word check unit
4: dictionary search unit
5: connectability test unit
6: evaluation unit
7: output unit
8: input device (keyboard)
9: connected-word dictionary
10: word dictionary
11: connectability test table
12: display device (VDT)

In the present invention, the correction made at the position where a delimitation error occurred is stored, together with its link to the immediately preceding word. This makes correct analysis results easier to obtain in subsequent analyses.

For this reason, the present invention is characterized in that the connected-word dictionary is searched before the normal dictionary search.

FIG. 3 is a flowchart of the processing performed by the device of FIG. 1.

The processing is explained below using an example.

Assume that the example sentence "じさつしたいとかんがえて..." ("thinking that I want to commit suicide...") is input (step 21).

Before the dictionary search, connected-word check unit 3 searches the connected-word dictionary (step 22). At this point no word has been extracted yet, so nothing is done here. Control passes to the dictionary search in dictionary search unit 4 (step 24), which searches the dictionary for the character string starting at the beginning of the sentence.

The search finds words such as 自殺 ("suicide"), 時差 ("time difference"), and 字 ("character"). Connectability test unit 5 checks whether each of these words can stand at the beginning of a sentence (step 25). Only the words that can are evaluated by evaluation unit 6 (step 26). The evaluation uses several kinds of parameters, including the length of the word's reading.

The word with the highest evaluation rank is then extracted.
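The candidate selection in steps 25 and 26 can be sketched as a filter followed by picking the best-scoring word. This is a minimal illustration under stated assumptions. The `Candidate` fields, the rule in `can_start_sentence`, and the ranking in `score` are all hypothetical. The source says only that several parameters, including reading length, are used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    notation: str   # written form, e.g. "自殺"
    reading: str    # kana reading, e.g. "じさつ"
    pos: str        # part-of-speech tag (hypothetical tag set)
    frequency: int  # hypothetical usage-frequency parameter


def can_start_sentence(c: Candidate) -> bool:
    # Stand-in for connectability test unit 5 at the head of a sentence.
    # Assumption: particles and auxiliary verbs cannot open a sentence.
    return c.pos not in {"particle", "aux_verb"}


def score(c: Candidate) -> tuple:
    # Hypothetical evaluation (unit 6): a longer reading ranks first,
    # and frequency only breaks ties between readings of equal length.
    return (len(c.reading), c.frequency)


def extract_first_word(candidates):
    """Return the highest-ranked candidate that may start a sentence, or None."""
    allowed = [c for c in candidates if can_start_sentence(c)]
    return max(allowed, key=score, default=None)
```

With the candidates 自殺 (じさつ), 時差 (じさ), and 字 (じ), this ranking picks 自殺 because its reading is longest.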

Suppose that in this example 自殺 ("suicide") is extracted.

Processing then proceeds to the character string that follows this word's reading.

Before the dictionary search for "したいとかんがえて...", connected-word check unit 3 performs its processing.

Here, connected-word dictionary 9 is searched using the immediately preceding word 自殺 as the key (step 27). Connected-word dictionary 9 stores pairs consisting of a word and information on the word that directly follows it. Suppose the reading of a stored following word matches the input kana string being processed. That entry is then used to determine the following word, and the dictionary search is bypassed. The method of registering entries in this dictionary is described later.
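Connected-word dictionary 9 and its lookup can be sketched as an in-memory mapping. In this minimal sketch, the key is the preceding word's notation and the value is the list of entries that have followed it. The actual record layout is the one in FIG. 2, which is not reproduced here, so the `Entry` fields and the class name are hypothetical. Readings are stored in hiragana so they can be compared directly with the hiragana input. When several stored readings match, the sketch picks the longest one, which is an assumption the source does not address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    reading: str   # hiragana reading of the following word, e.g. "し"
    notation: str  # written form of the following word
    pos: str       # part of speech (hypothetical tag)


class ConnectedWordDictionary:
    def __init__(self):
        # preceding-word notation -> list of Entry objects that followed it
        self._table = {}

    def register(self, preceding: str, following: Entry) -> None:
        entries = self._table.setdefault(preceding, [])
        if following not in entries:  # avoid storing the same pair twice
            entries.append(following)

    def lookup(self, preceding: str, remaining_kana: str):
        """Return the stored following word whose reading begins the remaining input, else None."""
        matches = [e for e in self._table.get(preceding, [])
                   if remaining_kana.startswith(e.reading)]
        # Assumption: prefer the longest matching reading.
        return max(matches, key=lambda e: len(e.reading), default=None)
```

A hit from `lookup` corresponds to bypassing dictionary search unit 4. A `None` result means the normal dictionary search proceeds.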

Here again, assume that searching connected-word dictionary 9 with the key 自殺 finds nothing.

Next, dictionary search unit 4 searches the dictionary for "したいとかんがえて..." and obtains 死体 ("corpse"), 舌 ("tongue"), 下 ("below"), シ, 市 ("city"), and so on. Suppose that, after processing by connectability test unit 5 and evaluation unit 6, 死体 ("corpse") is extracted.

If the user intended the conversion 自殺したい ("want to commit suicide"), this result is wrong. The user therefore begins a correction. During the correction, data is written into connected-word dictionary 9 by the following procedure.

(a) The user finds the conversion error and switches to correction mode using the correction-mode key that is provided (step 31).

(b) The user moves the cursor to the incorrectly converted word (step 32). The system detects the word at the cursor position. It then displays a fixed number of the input kana characters, starting from that word, on the input line.

(c) The user presses the conversion key (step 33).

The system performs dictionary search, evaluation, and correction evaluation on that kana string. It displays the candidates, in order of evaluation value, in the conversion candidate display area of display device 12. There may be too many candidates to display at once, so a next-candidate key is provided for viewing all of them. These display techniques are well known.

(d) The user selects the desired word from the candidates (step 34). Positions in the conversion candidate display area are associated with keys of input device 8, and the user selects a candidate by pressing the corresponding key. After receiving the selection, the system performs the following processing.
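The candidate display with a next-candidate key, together with selection by a corresponding key, can be sketched as simple paging. The page size and the 1-based numbering of selection keys are assumptions for illustration. The source leaves these details to well-known techniques.

```python
def candidate_page(candidates, page_index, page_size):
    """Return the candidates shown on a given page (page 0 is the first display)."""
    start = page_index * page_size
    return candidates[start:start + page_size]


def select_candidate(candidates, page_index, page_size, key_number):
    """Map a pressed selection key (1-based, hypothetical) to the candidate at that position."""
    shown = candidate_page(candidates, page_index, page_size)
    if not 1 <= key_number <= len(shown):
        raise ValueError("no candidate is assigned to that key on this page")
    return shown[key_number - 1]
```

In this sketch, each press of the next-candidate key would simply increment `page_index`.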

(A) The notation data of the word immediately preceding the corrected word is read from memory.
(B) The dictionary data of the selected word and the notation of that preceding word are written into connected-word dictionary 9 in the format shown in FIG. 2.

(C) When the above processing is complete, the displayed conversion result is updated. The system then waits for the next operation.
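Steps (A) and (B) can be sketched as a small function. This minimal sketch assumes that the conversion result and the dictionary entries are (reading, notation, part-of-speech) tuples and that the connected-word dictionary is a plain Python dict. These representations are hypothetical; the source specifies only the FIG. 2 format. The function name and the sentence-initial rule are likewise assumptions for illustration.

```python
def register_correction(connected_dict, converted, corrected_index, selected):
    """
    Record a user correction in the connected-word dictionary.

    connected_dict  -- dict: preceding notation -> list of (reading, notation, pos)
    converted       -- current conversion result as (reading, notation, pos) tuples
    corrected_index -- position of the word the user corrected
    selected        -- the (reading, notation, pos) entry the user chose
    Returns True if a pair was stored.
    """
    if corrected_index == 0:
        # Assumption: a sentence-initial word has no preceding word to link to.
        return False
    # Step (A): read the notation of the word just before the corrected one.
    preceding_notation = converted[corrected_index - 1][1]
    # Step (B): store the pair (preceding notation, selected word's data).
    entries = connected_dict.setdefault(preceding_notation, [])
    if selected not in entries:
        entries.append(selected)
    # Step (C), updating the displayed result, is left to the caller.
    return True
```

Correcting 死体 to し after 自殺 leaves the dictionary as `{"自殺": [("し", "し", "verb")]}`.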

Next, the way the connected-word check is performed is explained.

Suppose that the correction from 死体 to シ has been made by the above procedure. Connected-word dictionary 9 then holds the entry shown in FIG. 2.

In this state, suppose an input such as "じさつしたいが..." is given. 自殺 is extracted as before, and processing of "したいが..." then begins. Connected-word check unit 3 searches connected-word dictionary 9 using the notation 自殺 as the key. It finds that the verb シ, whose reading is シ, is registered as following it. Matching this reading against the input string confirms that シ is indeed at its head. The processing in dictionary search unit 4 and evaluation unit 6 is therefore bypassed, and シ is extracted.
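The whole extraction loop, with the connected-word check tried before the dictionary search, can be sketched as follows. This is an illustrative sketch, not the patented implementation. It assumes the following hypothetical representations:
- `connected_dict` maps a preceding notation to (reading, notation) pairs;
- `word_dict` is a toy reading-to-notation table;
- a longest-match rule stands in for dictionary search unit 4, connectability test unit 5, and evaluation unit 6.

```python
def extract_words(kana, connected_dict, word_dict):
    """
    Segment a kana string, consulting the connected-word dictionary first.

    connected_dict -- preceding notation -> list of (reading, notation) pairs
    word_dict      -- reading -> notation; a toy stand-in for units 4 to 6
    """
    words = []  # extracted (reading, notation) pairs
    pos = 0
    while pos < len(kana):
        rest = kana[pos:]
        chosen = None
        if words:
            # Connected-word check (unit 3): only possible once a word exists.
            matches = [e for e in connected_dict.get(words[-1][1], [])
                       if rest.startswith(e[0])]
            if matches:
                chosen = max(matches, key=lambda e: len(e[0]))
        if chosen is None:
            # Stand-in for dictionary search and evaluation: longest match.
            readings = [r for r in word_dict if rest.startswith(r)]
            if readings:
                best = max(readings, key=len)
                chosen = (best, word_dict[best])
            else:
                chosen = (rest[0], rest[0])  # unknown kana passes through as-is
        words.append(chosen)
        pos += len(chosen[0])
    return [notation for _, notation in words]
```

Take a toy dictionary containing じさつ, したい, し, たい, and が. An empty connected-word dictionary yields 自殺 / 死体 / が. After the pair 自殺 → し has been learned, the same input yields 自殺 / し / たい / が.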

Processing of "たいが..." then begins.

In process (B) above, registration can also be controlled by part of speech. Learning particles can actually introduce errors. Reliability can therefore be further improved by checking whether the following word is a particle and registering the pair only if it is not.
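The part-of-speech control on registration can be sketched as a guard in front of the dictionary write. This minimal sketch assumes that entries are (reading, notation, part-of-speech) tuples and that the tag "particle" marks particles. Both are hypothetical. Claim 2 allows other word information besides part of speech to control registration.

```python
EXCLUDED_POS = frozenset({"particle"})  # hypothetical tag for particles


def register_if_allowed(connected_dict, preceding_notation, entry):
    """
    Store (preceding_notation -> entry) unless the following word is a particle.

    entry is a (reading, notation, pos) tuple; returns True if it was stored.
    """
    _reading, _notation, pos = entry
    if pos in EXCLUDED_POS:
        # Particles are not learned, since learning them may cause new errors.
        return False
    entries = connected_dict.setdefault(preceding_notation, [])
    if entry not in entries:
        entries.append(entry)
    return True
```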

Thus, the present invention stores the correction made at the position where a delimitation error occurred, together with its link to the immediately preceding word. As a result, correct analysis results can be obtained in subsequent analyses.

Effects
As explained above, the present invention learns corrections of delimitation errors automatically. Errors on the same input are therefore eliminated thereafter. In addition, processing is omitted for the learned portions, which improves conversion speed and reduces the burden on the user.

Brief Description of the Drawings

FIG. 1 is a block diagram of a document creation device according to an embodiment of the present invention.
FIG. 2 is a format diagram of the connected-word dictionary of FIG. 1.
FIG. 3 is a processing flowchart of the device of FIG. 1.

1: control unit, 2: input unit, 3: connected-word check unit, 4: dictionary search unit, 5: connectability test unit, 6: evaluation unit, 7: output unit, 8: input device, 9: connected-word dictionary, 10: word dictionary, 11: connectability test table, 12: display device.

Claims


(1) A word extraction method for a word extraction device. The device has dictionary search means for matching an input kana character string against words in a word dictionary to detect matching words, means for testing whether a word obtained by the search can connect to the immediately preceding word, and means for evaluating the likelihood of connectable words. The method is characterized by having means for detecting a request for correction at a position where a conversion error has occurred, means for storing the concatenation of the correct converted words at that position, and means for omitting part of the processing after that position on the basis of the information stored in the storage means.

(2) The word extraction method according to claim 1, wherein the means for storing the concatenation of converted words is controlled by information on the words involved, such as part-of-speech information.