JPH03179498A

JPH03179498A - Voice Japanese conversion system

Info

Publication number: JPH03179498A
Application number: JP1319867A
Authority: JP
Inventors: Toshiaki Tsuboi; 俊明坪井; Fumihiko Kobashi; 小橋　史彦
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1989-12-08
Filing date: 1989-12-08
Publication date: 1991-08-05

Abstract

PURPOSE: To enable highly accurate Japanese conversion even when a compound word is input as-is, by making the evaluation value of a compound word candidate, consisting of a series of consecutive word candidates, larger than the evaluation value determined from the word candidates it contains when that candidate is registered in a compound word pattern table.

CONSTITUTION: A voice recognition part 3 matches input speech from a voice input part 1 against standard patterns of syllables and words stored in advance in a standard pattern storage part 2 and sends the recognition result to a word dictionary collation part 4. The word dictionary collation part 4 collates the recognition result with a word dictionary 5 and extracts word candidates and compound word candidates, each consisting of a series of consecutive word candidates. A compound word pattern processing part 6 checks whether each compound word candidate is registered in the compound word pattern table 7 and, when it is registered, makes the evaluation value of the compound word candidate larger than the sum of the evaluation values of the word candidates it contains. Consequently, an input compound word can be converted into Japanese with high accuracy.

Description

[Detailed Description of the Invention] "Industrial Application Field" This invention relates to a Japanese conversion system that recognizes words, monosyllables, and the like in input speech using standard patterns for speech recognition of words, monosyllables, and the like, and converts the recognition result into Japanese using a word dictionary, a conjugation ending table, and an adjunct word dictionary.

"Prior Art" Most Japanese input systems that use speech recognition, that is, speech-to-Japanese conversion systems, do not accept compound words as phrase (bunsetsu) units. In such systems the user must either utter each word that makes up a compound word separately as its own phrase, or key in a phrase boundary for each word, and must therefore decide how to divide the compound word into words.

This places a considerable burden on the user, who must know what counts as a word. It is, however, almost impossible for ordinary users to divide compound words into words correctly, so word segmentation errors lower the Japanese conversion rate. Alternatively, to spare the user from dividing compound words before input, the user may utter the compound word without breaks and leave its processing to the device. In that case, existing systems do no more than restrict the parts of speech of the words that may form a compound word, so many compound word candidates are generated by combining word candidates, and Japanese conversion performance is low.

"Means for Solving the Problem" To solve these drawbacks, this invention exploits the fact that when the field of the input text is limited, the compound words the user uses are also limited. Documents in the input field are divided into words in advance, for example by morphological analysis, the patterns of words that make up compound words are extracted, and a compound word pattern table describing the series of words that make up each compound word is provided. Using this table, the system checks whether a compound word candidate, consisting of a series of consecutive word candidates obtained by converting the recognition result into word candidates, is registered in the compound word pattern table. If it is registered, the evaluation value of the compound word candidate is made higher than the evaluation value determined from the evaluation values of the word candidates it contains. Japanese conversion performance is improved in this way.

"Example" An embodiment of this invention is shown in FIG. 1. A voice recognition part 3 matches the input speech entered by the user through a voice input part 1 against the standard patterns of monosyllables and words stored in advance in a standard pattern storage part 2, recognizes the monosyllables and words in the input speech, and sends the recognition result to a word dictionary collation part 4. The word dictionary collation part 4 collates the received recognition result with a word dictionary 5 and extracts word candidates and compound word candidates, each consisting of a series of consecutive word candidates. For each compound word candidate, a compound word pattern processing part 6 checks whether it is registered in a compound word pattern table 7. If it is registered, the evaluation value of the compound word candidate is set α higher than the sum of the evaluation values of the word candidates it contains. Here, the higher the likelihood that the sequence forms a compound word, for example the higher its usage frequency, the larger α is made. If the compound word candidate is not in the compound word pattern table 7, its evaluation value is set β lower than the value determined from the evaluation values of the word candidates it contains. If recognition candidates still remain, a grammar processing part 8 uses a conjugation ending table 9 and an adjunct word dictionary 10 to keep only the grammatically correct phrase candidates. A phrase candidate selection part 11 keeps and stores the few remaining phrase candidates with the highest evaluation values, in descending order, and a display and confirmation/correction part 12 displays the phrase candidate with the highest evaluation value on the screen as the Japanese conversion candidate. The user checks it and, if it is incorrect, selects the correct one from the remaining phrase candidates.
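The scoring and selection steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `WordCandidate`, `compound_score`, `top_candidates`, `alpha_base`, and `beta` are hypothetical, the table is assumed to map a tuple of word dictionary addresses to a usage frequency, and the logarithmic growth of α with frequency is only one possible choice (the text states only that α is made larger as usage frequency rises). The grammar check performed by part 8 is omitted.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class WordCandidate:
    word_id: int   # hypothetical: address of the word in the word dictionary
    score: float   # evaluation value assigned by recognition


def compound_score(words, pattern_table, alpha_base=1.0, beta=1.0):
    """Evaluation value of a compound word candidate.

    pattern_table maps a tuple of word ids to a usage frequency (assumed layout).
    """
    base = sum(w.score for w in words)          # sum of the constituent values
    key = tuple(w.word_id for w in words)
    if key in pattern_table:
        # Registered pattern: raise the value by alpha, larger for frequent patterns.
        alpha = alpha_base * (1.0 + math.log1p(pattern_table[key]))
        return base + alpha
    # Unregistered pattern: lower the value by beta.
    return base - beta


def top_candidates(candidates, pattern_table, n=3):
    """Return the n best (score, candidate) pairs, highest score first."""
    scored = [(compound_score(c, pattern_table), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]
```

Adding α to the sum, rather than rescaling it, keeps the boost independent of how many words the compound contains, which matches the additive wording of the description.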

FIG. 2 shows how the compound word pattern table is created.

Document data from the relevant field, prepared in advance, is analyzed and compound words are extracted. Each compound word is decomposed into its constituent words. Because each word has been assigned an address in the word dictionary 5 or a semantic category number, each compound word can be described by its constituent word series, that is, its compound word pattern. For each extracted compound word, the series of constituent words, expressed as addresses in the word dictionary 5 or as semantic category numbers, is entered in a table together with its usage frequency information, as shown in FIG. 3. The example in FIG. 3 is for the case where a compound word has at most eight constituent words.

FIG. 4 shows an example of compound word pattern processing. Combinations of syllable candidates are formed from the speech recognition result in FIG. 4A and collated with the word dictionary to extract word candidates. Once a word candidate has been extracted, the starting position for dictionary collation is moved to the position immediately after the end of the preceding word's syllable candidate combination, and the recognition result is collated with the word dictionary again. The compound word candidates extracted in this way, shown in FIG. 4B, are collated with the compound word pattern table of FIG. 3. A frequently used compound word candidate is given an evaluation value considerably higher than the value determined from the evaluation values of its constituent word candidates (their sum), and a compound word candidate that is not in the compound word pattern table is given an evaluation value lower than the value determined from the evaluation values of its constituent word candidates. As a result, the input compound word is obtained with high accuracy, as shown in FIG. 4C. Note that α may be a constant.
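The repeated dictionary collation described for FIG. 4 can be sketched as a recursive enumeration. The sketch below is an assumption-laden simplification: it takes a single syllable sequence instead of a lattice of syllable candidates, the function name `segmentations` and the `dictionary` layout (reading mapped to a list of words) are hypothetical, and the limit of eight words is taken from the FIG. 3 example.

```python
def segmentations(syllables, dictionary, max_words=8):
    """Yield every series of consecutive dictionary words that covers `syllables`.

    syllables:  list of syllable strings from recognition (single best path assumed)
    dictionary: mapping from a reading to the words with that reading (assumed layout)
    """
    def walk(start, acc):
        if start == len(syllables):
            if acc:
                yield tuple(acc)        # the whole input has been covered
            return
        if len(acc) == max_words:
            return                      # too many constituent words
        # Try every word that begins right after the previous word ended.
        for end in range(start + 1, len(syllables) + 1):
            reading = "".join(syllables[start:end])
            for word in dictionary.get(reading, ()):
                yield from walk(end, acc + [word])

    yield from walk(0, [])
```

Each yielded tuple is one compound word candidate that could then be looked up in the compound word pattern table.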

When α is a constant, usage frequency information need not be stored in the compound word pattern table.
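Table construction as described for FIG. 2 and FIG. 3 can be sketched as follows. The sketch assumes that compound words have already been extracted and split into constituent words (for example by a morphological analyzer, which is not shown), and that `word_ids` maps each word to its word dictionary address or semantic category number. The function name `build_pattern_table`, the `with_frequency` flag, and the decision to skip compounds containing unknown words are assumptions made for illustration.

```python
from collections import Counter


def build_pattern_table(compounds, word_ids, with_frequency=True):
    """Build a compound word pattern table from pre-segmented compound words.

    compounds: iterable of tuples, each the constituent words of one compound
    word_ids:  mapping word -> dictionary address or semantic category number
    """
    counts = Counter()
    for parts in compounds:
        try:
            key = tuple(word_ids[w] for w in parts)
        except KeyError:
            continue  # assumption: ignore compounds with a word not in the dictionary
        counts[key] += 1
    if with_frequency:
        return dict(counts)   # pattern -> usage frequency
    return set(counts)        # patterns only, sufficient when alpha is a constant
```

Returning a plain set when frequencies are not needed reflects the remark that the frequency column can be dropped if α is fixed.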

"Effects of the Invention" As explained above, with this invention the user does not need to divide compound words into words, and a compound word can be converted into Japanese with high accuracy even when it is input as-is.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing an embodiment of this invention. FIG. 2 shows an example of how the compound word pattern table is created: compound words are extracted from text, each compound word is divided into words, the address in the word dictionary 5 or the semantic category number of each word is registered in the compound word table, and usage frequency information is added. FIG. 3 shows an example of the compound word pattern table. FIG. 4 shows an example of compound word pattern processing, in which A is the speech recognition result, B is the compound word candidates obtained by word dictionary collation, and C is an example in which the intended compound word is obtained as a result of compound word processing.

Claims

[Claims]

(1) A speech-to-Japanese conversion system in which words, monosyllables, and the like in input speech are recognized using standard patterns for speech recognition of words, monosyllables, and the like, and the recognition result is converted into Japanese using a word dictionary, a conjugation ending table, and an adjunct word dictionary, characterized in that: a compound word pattern table describing the series of words that make up each compound word is provided; it is checked whether a compound word candidate, consisting of a series of consecutive word candidates obtained by collating the recognition result with the word dictionary, is registered in the compound word pattern table; and, if the compound word candidate is registered in the compound word pattern table, the evaluation value of the compound word candidate is made higher than the evaluation value determined from the evaluation values of the word candidates it contains.