JP3415585B2

JP3415585B2 - Statistical language model generation device, speech recognition device, and information retrieval processing device

Info

Publication number: JP3415585B2
Application number: JP2000378702A
Authority: JP
Inventors: 宏一谷垣; 博史山本; 芳典匂坂
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 1999-12-17
Filing date: 2000-12-13
Publication date: 2003-06-09
Anticipated expiration: 2020-12-13
Also published as: JP2001236089A

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a statistical language model generation device that generates a statistical language model from training data and training text data, to a speech recognition device that uses the statistical language model to recognize the speech signal of an input spoken utterance, and to an information retrieval processing device that uses the speech recognition device.

[0002]

[Prior Art] In recent years, as speech recognition technology has advanced, speech recognition has been actively applied to large-vocabulary tasks. However, even within the paradigm of large-vocabulary speech recognition (a paradigm being the dominant way of framing scientific problems in a particular field or era), the problem of unregistered words is not completely solved. Proper nouns such as personal names pose a fundamental difficulty in particular, because it is hard to cover all of them. At the same time, many proper nouns carry information that is important for accomplishing the task, so techniques for handling unregistered proper nouns become an important issue when speech recognition is deployed on real tasks.

[0003] Conventionally, the following methods have been proposed for detecting unregistered words, including their phoneme sequences (readings), in continuous speech recognition devices:
(1) a method that uses a subword decoder such as a phoneme typewriter in combination (hereinafter, the first conventional method); and
(2) a method that incorporates subwords into the language model as pseudo-words (hereinafter, the second conventional method).

[0004]

[Problems to Be Solved by the Invention] However, the first conventional method must drive a separate decoder, which is undesirable in terms of processing load. In addition, because the score of the maximum-likelihood phoneme sequence is used as the acoustic score of an estimated unknown-word segment, integrating that segment with in-vocabulary word sequence hypotheses involves heuristics such as penalties and thresholds.

[0005] The second conventional method, by contrast, has the advantage that it can be implemented without modifying the decoder. However, to apply useful language processing to unregistered words obtained as subword sequences, post-processing such as morphological analysis with a vocabulary larger than the recognition vocabulary is required. Moreover, N-gram probabilities between words and subwords, or between subwords, are unlikely to capture linguistic characteristics adequately, so their effectiveness as a recognition constraint remains doubtful.

[0006] Furthermore, in small-scale information retrieval devices, such as telephones with speech recognition and automatic dialing functions or car navigation systems, the number of dictionary entries is limited, which limits the number of proper nouns that can be handled. In such cases, when the target proper nouns are managed by a system separate from the speech recognition device, they cannot be registered in the speech recognition device, and the speech recognition rate cannot be improved.

[0007] An object of the present invention is to solve the above problems and to provide a statistical language model generation device that achieves higher speech recognition accuracy than the conventional examples for unregistered words absent from the word dictionary and that can generate a statistical language model identifying the segments and classes of unregistered words, as well as a speech recognition device that uses the statistical language model generation device.

[0008] Another object of the present invention is to provide an information retrieval processing device, for small-scale applications such as telephones with speech recognition and automatic dialing functions or car navigation systems, that can perform information retrieval using a speech recognition device whose recognition accuracy for unregistered words absent from the word dictionary is higher than that of the conventional examples.

[0009]

[0010]

[0011]

[0012]

[0013]

[0014]

[Means for Solving the Problems] The statistical language model generation device according to claim 1 of the present invention comprises:
training data storage means for storing training data that includes a word list of proper nouns or of common nouns that are loanwords;
first generation means for, based on the training data stored in the training data storage means, estimating and computing for each class the parameters of a gamma distribution of mora length, under the assumption that the proportion of words at each mora length in the training data substantially follows a gamma distribution, and for generating a subword-unit N-gram model that models unregistered words by computing the occurrence probabilities of a first N-gram, in subword units that are moras or mora chains, having classes that are subclasses of the proper nouns or loanword common nouns; and
second generation means for generating a statistical language model that includes unregistered words based on subword units by computing the occurrence probabilities of a second N-gram that depends on the word classes and on the classes that are subclasses of the proper nouns or loanword common nouns, based on a word class N-gram model generated from a predetermined text database, the subword-unit N-gram model generated by the first generation means, and the parameters of the gamma distribution of mora length computed by the first generation means.

[0015] The statistical language model generation device according to claim 2 is the statistical language model generation device according to claim 1, further comprising:
third generation means for extracting the subword units based on the subword-unit N-gram model generated by the first generation means and assigning the extracted labels to the subword units, thereby generating data of a plurality of labeled subword units per subword unit; and
fourth generation means for generating a word dictionary by assigning phoneme sequences to the words extracted from the text database and to the data of the plurality of labeled subword units generated by the third generation means.

[0016] Furthermore, the speech recognition device according to claim 3 of the present invention is a speech recognition device comprising speech recognition means for performing speech recognition, using a predetermined statistical language model, based on the speech signal of an input spoken utterance, wherein the speech recognition means performs speech recognition using the statistical language model generated by the statistical language model generation device according to claim 1 or 2 and the word dictionary generated by the fourth generation means according to claim 2.

[0017] The information retrieval processing device according to claim 4 of the present invention comprises: database storage means for storing a database that contains word data of common nouns corresponding to the word list and information corresponding to that word data; and retrieval means for searching the database stored in the database storage means using, as a key, the character string of the speech recognition result output from the speech recognition device according to claim 3, and for reading out from the database storage means and outputting the information corresponding to the matching word data.

[0018] The information retrieval processing device according to claim 5 is the information retrieval processing device according to claim 4, further comprising processing execution means for executing predetermined processing based on the information output from the retrieval means.

[0019]

[0020]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings.

[0021] <First Embodiment> FIG. 1 is a block diagram of a continuous speech recognition system according to a first embodiment of the present invention. The continuous speech recognition system of the first embodiment is characterized by comprising a statistical language model generation device that includes an unregistered word model generation unit 20, a subword unit data generation unit 21, a word dictionary generation unit 22, a word class N-gram model generation unit 23, and a language model generation unit 24.

[0022] This embodiment discloses a statistical language model generation device that generates a new statistical language model enabling highly accurate recognition of speech that contains unregistered words. The statistical language model of this embodiment is generated by the language model generation unit 24 based on:
(1) the subword-unit N-gram model and the mora-length gamma distribution data, which are the unregistered word models for unregistered word recognition generated by the unregistered word model generation unit 20 from the training data in the training data memory 30; and
(2) the word class N-gram model generated by the word class N-gram model generation unit 23 from the text data in the text data memory 31.

[0023] These unregistered word models are built separately for each vocabulary class. Here, a subword is a unit smaller than a word, specifically a mora or a mora chain. In prosody, a mora is the relative length of sound that serves as the unit of stress, intonation, and the like; one mora corresponds to the length of one syllable containing a short vowel. In Japanese, it corresponds roughly to one kana character (two characters for contracted sounds, or yōon). The description below is limited to unregistered words in two subclasses of proper nouns: Japanese surnames and Japanese given names.
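The kana-to-mora correspondence described above can be illustrated with a short sketch. It assumes katakana readings as input and treats a small kana following another kana as part of the same mora; the function name `split_moras` and the exact set of small kana are illustrative choices, not something the patent specifies.

```python
from typing import List

# Small kana that attach to the preceding kana to form a single mora.
# Assumption: this set approximates the contracted sounds; the patent text
# only states that a contracted sound is written with two characters.
COMBINING_SMALL_KANA = set("ャュョヮァィゥェォ")


def split_moras(reading: str) -> List[str]:
    """Split a katakana reading into a list of moras."""
    moras: List[str] = []
    for ch in reading:
        if ch in COMBINING_SMALL_KANA and moras:
            moras[-1] += ch   # e.g. "シ" + "ュ" becomes the single mora "シュ"
        else:
            moras.append(ch)  # ordinary kana, "ッ", "ン" and "ー" each count as one mora
    return moras
```

With this reading, スズキ has three moras and アサギノ has four, which matches the comma-separated form used for the training data.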

[0024] The present inventors stored in the training data memory 30 training data comprising (1) a Japanese surname file 30a containing mora-sequence (reading) data for about 300,000 Japanese surnames, as shown in Table 1, and (2) a Japanese given-name file 30b containing mora-sequence (reading) data for about 300,000 Japanese given names, as shown in Table 2.

[0025]

[Table 1]
――――――――――――
ス，ズ，キ
タ，カ，ハ，シ
サ，イ，ト，オ
タ，ナ，カ
ヒ，ラ，ツ，ジ
ア，サ，ギ，ノ
……
――――――――――――

[0026]

[Table 2]
――――――――――――
ヨ，オ，コ
ト，モ，エ
ト，モ，コ
ケ，イ，コ
……
――――――――――――

[0027] The analysis of Japanese surname and given-name data carried out by the present inventors on the above training data, and its results, are described next. When Japanese surnames and given names are viewed as sequences of subwords, the following characteristics are readily expected.
(1) Length: names of three or four moras are typical, such as Suzuki, Satō, and Takahashi for surnames and Hiroshi, Akira, and Ichirō for given names.
(2) Phoneme sequence: Japanese surnames and given names are basically written in kanji, so high-frequency units exist, such as yama, mura, and naka in surnames and rō, ichi, and hiro in given names.

[0028] From this viewpoint, the present inventors analyzed the statistical characteristics of the readings of Japanese surnames and given names. The personal name data was a publicly known list of personal names covering about 300,000 well-known people. From this training data, surnames and given names written only in kanji and hiragana were extracted as Japanese names, and the resulting 303,552 surnames and 295,148 given names were analyzed; the results are shown in Table 3. For comparison, the characteristics of words other than Japanese surnames and given names were also analyzed. The comparison data consisted of a total of 1,155,183 words taken from the spontaneous-speech travel conversation database owned by the present applicant, excluding Japanese surnames and given names.

[0029]

[Table 3] Training data for the models
―――――――――――――――――――――――――――――――――――
                      Japanese names                Travel conversation
                      Surnames      Given names
―――――――――――――――――――――――――――――――――――
Total words           303,552       295,148         1,161,576
Distinct vocabulary   19,018        20,413          13,453
―――――――――――――――――――――――――――――――――――
(Note) The distinct vocabulary of Japanese names counts words that differ in phoneme sequence (reading); differences in kanji spelling are ignored.

[0030] FIG. 6 shows the word-length statistics analyzed by the present inventors. Length is measured in moras. The results confirm that the lengths of Japanese surnames and given names have a strongly skewed distribution centered on three and four moras. Taken together, three- and four-mora names account for almost 90% of both surnames and given names. Table 4 shows statistics on mora sequences. As an index of how skewed the mora sequences are, the coverage of mora sequences by the N most frequent types of mora bigram (two-mora chain) was examined. Here, coverage is the proportion of all moras that is accounted for by those bigrams.

[0031]

[Table 4] Skew of mora sequences
―――――――――――――――――――――――――――――――――――
Mora bigram types       Coverage of mora sequences (%)
(top N by frequency)    Japanese names                Travel conversation
                        Surnames      Given names
―――――――――――――――――――――――――――――――――――
1                       3.8           4.9             0.1
10                      23.3          28.3            5.1
100                     59.8          66.6            19.4
1000                    84.3          82.4            35.6
―――――――――――――――――――――――――――――――――――
(Note) Coverage (%) of mora sequences by the N most frequent types of mora bigram. Because some words have an odd number of moras, coverage never reaches 100%.

[0032] For example, for Japanese surnames and given names, the 1,000 most frequent mora bigram types of each class alone cover more than 80% of the mora sequences in surnames and given names.

[0033] Next, the method of generating a statistical language model based on unregistered word models of Japanese surnames and given names is described in detail. Based on the findings above, a statistical language model is built on unregistered word models for the Japanese surname and given-name classes. From the viewpoint of decoding, it is desirable that the statistical language model can be handled in the N-gram format that has come into wide use in recent years, so in this embodiment the unregistered word model is implemented in word N-gram format.

[0034] First, the modeling of word sequences that contain unregistered words is described. The statistical language model of this embodiment is based on a word class N-gram model. The word class N-gram model generation unit 23 generates this model from text data called a corpus, consisting of a large number of written Japanese sentences stored in the text database memory 31, by clustering words using, for example, the known maximum-likelihood estimation method, and stores the model in the word class N-gram model memory 43.

[0035] In the word class N-gram model, the linguistic likelihood ph(W) of a word sequence W = w_1, w_2, w_3, w_4, ..., w_T is generally given by the following equation, where w_t is the t-th word of the word sequence W and c^{w_t} denotes the vocabulary class of word w_t.

[0036]

[Equation 1]
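The body of Equation 1 did not survive in this text. As a hedged reconstruction only, the general form of a class N-gram likelihood consistent with the surrounding description (a class N-gram term multiplied by an in-class word 1-gram term p(w | c^w)) can be written as follows; the left-hand symbol and the exact indexing are assumptions, not the patent's own typesetting.

```latex
p(W) \approx \prod_{t=1}^{T}
  p\left(c^{w_t} \mid c^{w_{t-N+1}}, \ldots, c^{w_{t-1}}\right)\,
  p\left(w_t \mid c^{w_t}\right)
```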

[0037] The words w_t (hereinafter a single word is written w) include unregistered words that are not in the recognition vocabulary. When the occurrence probabilities of these unregistered words are estimated from the statistical characteristics of their phoneme sequences (readings), the in-class word 1-gram probability p(w | c^w) in Equation 1 is given by the following equation, where M^w denotes the mora sequence of word w.

[0038]

[Equation 2]
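Equation 2 is not present in this text. One plausible form, sketched here purely from the symbols the description uses (OOV, inVoc, p(OOV | c^w), and the mora sequence M^w), splits the in-class probability between registered and unregistered words. The exact conditioning in each branch is an assumption.

```latex
p(w \mid c^{w}) =
\begin{cases}
  \bigl(1 - p(\mathrm{OOV} \mid c^{w})\bigr)\, p(w \mid c^{w}, \mathrm{inVoc})
    & \text{if } w \text{ is in the word dictionary} \\[4pt]
  p(\mathrm{OOV} \mid c^{w})\, p(M^{w} \mid c^{w}, \mathrm{OOV})
    & \text{otherwise}
\end{cases}
```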

[0039] Here, the word dictionary is the vocabulary dictionary stored in the word dictionary memory 12, OOV denotes an unregistered word, and inVoc denotes the condition of being in the word dictionary. In Equation 2, the probability p(OOV | c^w) is the probability that an unregistered word arises from the class, and it can be estimated by a known method (see, for example, the prior-art document: Masataki et al., "Morphological analysis using composite N-grams of parts of speech and variable-length morpheme sequences," Journal of Natural Language Processing (The Association for Natural Language Processing), Vol. 6, No. 2, pp. 41-57, 1999). In this estimation method, when the known Turing estimation method is used, a morpheme that appears r times in the data is estimated to appear r^* times, as given by the following equation.

[0040]

[Equation 3] $r^{*} = \dfrac{(r+1)\, n_{r+1}}{n_{r}}$

[0041] Here, n_r denotes the number of morpheme types that appear r times in the data. Accordingly, the probability P(w | c_ξ) that a morpheme w appearing r times arises from its part of speech is expressed by the following equation.

[0042]

[Equation 4] $P(w \mid c_{\xi}) = \dfrac{r^{*}}{N(c_{\xi})}$

[0043] The occurrence probability P(w | c_ξ) is computed for every morpheme belonging to class c_ξ, and, as shown in the following equation, the remainder after subtracting their sum from 1 is the probability P(c_h _ξ) that an unknown word arises from part of speech c_ξ.

[0044]

[Equation 5]
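The estimation described in Equations 3 to 5 can be sketched in a few lines. This is a minimal illustration, not the cited method in full: it assumes N(c_ξ) is the total number of morpheme tokens observed in the class, it takes Equation 5 to be one minus the sum of the Equation 4 probabilities (as the text describes), and it applies no smoothing to n_r. The function name and the toy counts are hypothetical.

```python
from collections import Counter


def turing_oov_probability(word_counts):
    """Return (per-word probabilities, unknown-word probability) for one class.

    word_counts maps each observed morpheme of the class to its count r.
    """
    total = sum(word_counts.values())   # N(c): total tokens in the class (assumed meaning)
    n = Counter(word_counts.values())   # n[r]: number of morpheme types seen exactly r times
    probs = {}
    for word, r in word_counts.items():
        r_star = (r + 1) * n.get(r + 1, 0) / n[r]   # Equation 3
        probs[word] = r_star / total                # Equation 4
    p_oov = 1.0 - sum(probs.values())               # remainder, as described for Equation 5
    return probs, p_oov


# Toy counts: two morphemes seen once, one seen twice.
probs, p_oov = turing_oov_probability({"a": 1, "b": 1, "c": 2})
```

Note that the unsmoothed estimate assigns r^* = 0 to the most frequent count because n_{r+1} is zero there; practical implementations usually smooth n_r, which is outside what the text describes.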

[0045] This embodiment focuses on evaluating the unregistered word model effectively on a limited evaluation set, and therefore models the probabilities with the following equations. That is, unregistered words are allowed to arise only from a few predefined classes (the set of these classes is called C^OOV), every word arising from these classes is explained by the unregistered word model, and no registered words are created for them.

[0046]

[Equation 6]

[Equation 7]
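Equations 6 and 7 are missing from this text. Based only on the description (unregistered words arise solely from classes in C^OOV, and every word from those classes is explained by the unregistered word model p(M^w | c^w)), a hedged reconstruction could look like the following; the branch notation is an assumption.

```latex
p(\mathrm{OOV} \mid c^{w}) =
\begin{cases}
  1 & \text{if } c^{w} \in C^{\mathrm{OOV}} \\
  0 & \text{otherwise}
\end{cases}
\qquad
p(w \mid c^{w}) =
\begin{cases}
  p(M^{w} \mid c^{w}) & \text{if } c^{w} \in C^{\mathrm{OOV}} \\
  p(w \mid c^{w}, \mathrm{inVoc}) & \text{otherwise}
\end{cases}
```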

[0047] Next, the unregistered word model for Japanese surnames and given names is described. As noted above, the readings of Japanese surnames and given names show characteristic tendencies in both mora length and mora sequence. The unregistered word model p(M^w | c^w) of Equation 7 can therefore be modeled with high accuracy by expanding it as in the following equation, where len(M^w) denotes the mora length of word w.

[0048]

[Equation 8] $p(M^{w} \mid c^{w}) = p\bigl(\mathrm{len}(M^{w}) \mid c^{w}\bigr) \cdot p\bigl(M^{w} \mid c^{w}, \mathrm{len}(M^{w})\bigr)$

[0049] The probability p(len(M) | c) in Equation 8 is the probability that a word of length len(M) arises in the Japanese surname or given-name class. In this embodiment, this probability distribution is assumed to substantially follow the gamma distribution given by the following equation. In other words, the right-hand side of Equation 8 is the product of a first term, the gamma-distribution probability of the mora length, and a second term, the subword-unit bigram probability. Here α and λ are parameters that depend on the class c and are determined from the mean and variance of the mora length.

[0050]

[Equation 9]

where

[Equation 10]
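Equations 9 and 10 are not reproduced in this text. The sketch below assumes the standard gamma density with shape α and scale λ, x^(α-1) e^(-x/λ) / (Γ(α) λ^α), with Γ the gamma function; this parameterization is an assumption, chosen because it is consistent with Equations 12 and 13 (mean αλ = μ, variance αλ² = V). The function name is hypothetical, and whether the patent normalizes the density over integer lengths is not stated.

```python
import math


def gamma_length_density(length, alpha, lam):
    """Gamma density (shape alpha, scale lam) evaluated at a mora length.

    Computed in log space with math.lgamma to avoid overflow for large alpha.
    """
    log_p = ((alpha - 1.0) * math.log(length)
             - length / lam
             - math.lgamma(alpha)
             - alpha * math.log(lam))
    return math.exp(log_p)


# Illustrative parameters only (mean alpha * lam = 3.6 moras).
scores = {n: gamma_length_density(n, alpha=9.0, lam=0.4) for n in range(1, 9)}
```

With parameters like these, lengths of three or four moras score far higher than lengths of one or eight, which is the behavior the length constraint relies on.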

[0051] The probability p(M^w | c^w, len(M^w)) in Equation 8, on the other hand, is the probability that the mora sequence of length len(M^w) in class c^w is M^w = m_1^w, m_2^w, ..., and it is modeled by the subword-unit N-gram of the following equation. Here U = u_1, u_2, ... is the sequence of subword units (moras or mora chains) acquired automatically by the method described in detail later. The subword-unit N-gram in the equation does not include a transition to the end symbol.

[0052]

[Equation 11]
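Equation 11 is missing here, so the following is only a sketch of a subword-unit bigram score over a given unit sequence U, under stated assumptions: one probability table per class, a start symbol `<s>` for the first unit, no transition to an end symbol (as the text states), and no smoothing. The table values and names are invented for illustration.

```python
import math


def unit_sequence_log_prob(units, bigram_prob, start="<s>"):
    """Sum log p(u_i | u_{i-1}) over a unit sequence for one class.

    No end-of-word transition is added, following the description of the
    subword-unit N-gram. An unseen bigram raises KeyError (no smoothing).
    """
    log_p = 0.0
    prev = start
    for unit in units:
        log_p += math.log(bigram_prob[(prev, unit)])
        prev = unit
    return log_p


# Hypothetical surname-class table: units may be single moras or mora chains.
surname_bigrams = {("<s>", "アサ"): 0.5, ("アサ", "ギ"): 0.2, ("ギ", "ノ"): 0.1}
score = unit_sequence_log_prob(["アサ", "ギ", "ノ"], surname_bigrams)
```

Following Equation 8, this score would be combined with the length term for the same class to score a candidate unregistered word.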

[0053] FIG. 7 shows an example in which "... あさぎ野陽子と ..." ("... Asagino Yōko to ...") is output by the statistical language model of this embodiment described above. In this example, for the words あさぎ野 (Asagino) and 陽子 (Yōko), which belong to the Japanese surname and given-name classes, the class-labeled mora sequences アサギノ（日姓） and ヨオコ（日名） are output. This model imposes linguistic constraints on the occurrence of Japanese surnames and given names at the following three levels.

[0054] <Three Levels of Linguistic Constraint>
(1) Inter-word constraint: a word class N-gram is used to evaluate the likelihood that a Japanese surname or given name (class) occurs in a given word context. Because the subword-based modeling of surnames and given names is hidden in a lower layer, it does not adversely affect the modeling of registered word sequences.
(2) Duration constraint on surname and given-name segments: the gamma distribution of mora length for surnames and for given names is used to evaluate how surname-like or given-name-like a segment is. This constraint prevents spurious mora sequences that are unreasonably short or long.
(3) Subword sequence constraint: a subword-unit N-gram for surnames and given names is used, with moras and mora chains as units. Using mora chains as units is expected to make the N-gram more accurate. The mora chains used as modeling units are acquired automatically in the iterative training described later.

[0055] Next, the training and generation processing for the unregistered word models, executed by the unregistered word model generation unit 20, is described. Based on the Japanese surname file 30a and the Japanese given-name file 30b stored in the training data memory 30, the unregistered word model generation unit 20 builds unregistered word models for the Japanese surname class (abbreviated 日姓 in labels) and the Japanese given-name class (abbreviated 日名 in labels). Specifically, this processing generates the subword-unit N-gram model and the mora-length gamma distribution data. In the embodiments below, each individual's name is assumed to occur with equal probability, and the observation frequency of each surname or given name is taken to be the number of people in the name list who share that surname or given name. The subword-unit N-gram is given only single moras as its initial unit set, and new mora chains are added to the unit set one after another during the iterative training described later. The mora chains that serve as unit candidates are preselected by frequency to make training more efficient.

[0056] FIG. 3 is a flowchart of the unregistered word model generation processing executed by the unregistered word model generation unit 20 of FIG. 1, and FIG. 4 is a flowchart of the subword 2-gram unit determination processing (step S4), which is a subroutine of FIG. 3.

【００５７】[0057]

As learning data for the unregistered word model, lists of surnames and given names expressed as mora sequences are used; these lists are stored in the Japanese surname file 30a and the Japanese given-name file 30b in the learning data memory 30, respectively. The learning data are as shown in Tables 1 and 2 above, with the morae separated by commas (","). From this learning data, the unregistered word model, that is, the mora-length gamma distribution and the subword-unit N-gram model, is generated. The procedure for generating the unregistered word model is described below for the case where the N-gram order N is 2.

【００５８】[0058]

In step S1 of FIG. 3, the learning data are read from the learning data memory 30. In step S2, the mean μ and variance V of the number of morae per word (surname or given name) are computed from the learning data, and the parameters of the mora-length gamma distribution are then estimated using the following equations.

【００５９】[0059]

【数１２】[Equation 12] $\lambda = V / \mu$

【数１３】[Equation 13] $\alpha = \mu^{2} / V$
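
The moment-based estimate in Equations 12 and 13 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name, the toy data, and the use of the population variance (dividing by the total count) are assumptions. Each name is weighted by the number of people bearing it, following the observation-frequency convention stated earlier.

```python
def estimate_gamma_params(mora_lengths, counts):
    """Method-of-moments fit of a gamma distribution to mora lengths.

    mora_lengths[i] is the mora count of name i; counts[i] is how many
    people bear that name (its observation frequency).
    Returns (lam, alpha) as in Equations 12 and 13.
    """
    total = sum(counts)
    mu = sum(n * c for n, c in zip(mora_lengths, counts)) / total
    # Population variance is an assumption; the text only says "variance V".
    var = sum(c * (n - mu) ** 2 for n, c in zip(mora_lengths, counts)) / total
    lam = var / mu           # Equation 12
    alpha = mu ** 2 / var    # Equation 13
    return lam, alpha


# Hypothetical toy data: five names with their mora lengths and bearer counts.
lengths = [2, 3, 3, 4, 5]
counts = [10, 40, 30, 15, 5]
lam, alpha = estimate_gamma_params(lengths, counts)
```

With these estimates, the product alpha * lam reproduces the mean μ and alpha * lam² reproduces the variance V, which is what makes λ act as a scale parameter and α as a shape parameter.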

【００６０】[0060]

Further, in step S3, high-frequency mora chains that are candidates for units of the subword-unit 2-gram are extracted from the learning data; the extracted candidates are called the "unit candidate set". Here, the frequency of every mora chain of, for example, length 2 or more that appears in the learning data is examined, and mora chains whose frequency is at least a predetermined value (= 100) are extracted as unit candidates. Next, in step S4, the subword 2-gram unit determination process, a subroutine shown in FIG. 4, is executed. Finally, in step S5, the provisional unregistered word model obtained at the end of learning is stored in the subword-unit N-gram model memory 40 as the subword-unit N-gram model, and the mora-length gamma distribution data are stored in the mora-length gamma distribution data memory 41.
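
Step S3 can be sketched as a weighted substring count. The function name and the toy name list are hypothetical, and weighting each occurrence by the number of bearers of the name is an assumption carried over from the observation-frequency convention; only the threshold of 100 and the minimum chain length of 2 come from the text.

```python
from collections import Counter


def extract_unit_candidates(names, min_count=100, min_len=2):
    """Collect mora chains whose weighted frequency is at least min_count.

    names is a list of (morae, count) pairs, where morae is a list of
    mora strings and count is the number of people bearing that name.
    Returns a set of mora chains, each represented as a tuple of morae.
    """
    freq = Counter()
    for morae, count in names:
        n = len(morae)
        for i in range(n):
            for j in range(i + min_len, n + 1):
                freq[tuple(morae[i:j])] += count
    return {chain for chain, c in freq.items() if c >= min_count}


# Hypothetical toy list in the comma-separated mora notation of the tables.
names = [
    ("ta,na,ka".split(","), 80),
    ("na,ka,mu,ra".split(","), 60),
    ("ta,ka,ha,shi".split(","), 50),
]
candidates = extract_unit_candidates(names, min_count=100)
```

In this toy list only the chain (na, ka) reaches the threshold, because it occurs in two names whose bearer counts sum to 140.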

【００６１】[0061]

In the subword 2-gram unit determination process of FIG. 4, first, in step S11, all single morae are inserted into the confirmed unit set, and in step S12 one mora chain is selected from the unit candidate set. Next, in step S13, it is determined whether the selected mora chain is already included in the confirmed unit set; if YES, the process proceeds to step S15, and if NO, it proceeds to step S14. In step S14, the selected mora chain is added to the confirmed unit set, a provisional subword 2-gram model is generated using the well-known maximum likelihood estimation method, and the process proceeds to step S15. Here, the 2-gram model is generated by interpolation with the 1-gram between the learning data and the additional set and the 0-gram of the additional set alone, using the well-known deleted interpolation method (see, for example, the prior art document: Seiichi Nakagawa, "Speech Recognition by Stochastic Models", edited by the Institute of Electronics, Information and Communication Engineers, pp. 63-64, issued July 1, 1988). The provisional subword 2-gram together with the mora-length gamma distribution data is called a "provisional unregistered word model". In step S15, it is determined whether steps S13 and S14 have been carried out for all mora chains; if NO, the process returns to step S12 and repeats the above, and if YES, it proceeds to step S16. In step S16, the average likelihood of each provisional unregistered word model is calculated using Equation 8, and the unit set of the provisional unregistered word model that maximizes the average likelihood becomes the new confirmed unit set. Then, in step S17, it is determined whether the number of mora chains in the confirmed unit set is at least a predetermined threshold N_th (for example, 150); if NO, the process returns to step S12 and repeats the above, and if YES, it returns to the main routine.
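
The loop of FIG. 4 amounts to a greedy forward selection, which can be sketched as below. Training the provisional 2-gram and evaluating Equation 8 are abstracted into a caller-supplied function `avg_likelihood`; that callback, the function name `select_units`, and the toy scoring table are hypothetical stand-ins, not the patent's model.

```python
def select_units(single_morae, candidates, avg_likelihood, n_th):
    """Greedily add the mora chain that most improves the average likelihood.

    avg_likelihood(unit_set) is assumed to train a provisional model on
    the given unit set and return its average likelihood.
    """
    confirmed = set(single_morae)                 # step S11
    chains_added = 0
    while chains_added < n_th:                    # step S17
        best_chain, best_score = None, float("-inf")
        for chain in candidates:                  # steps S12 and S15
            if chain in confirmed:                # step S13
                continue
            score = avg_likelihood(confirmed | {chain})   # steps S14 and S16
            if score > best_score:
                best_chain, best_score = chain, score
        if best_chain is None:                    # no candidates left
            break
        confirmed.add(best_chain)                 # new confirmed unit set
        chains_added += 1
    return confirmed


# Toy stand-in for the likelihood: each chain contributes a fixed gain.
gains = {("na", "ka"): 3.0, ("ta", "na"): 1.0, ("mu", "ra"): 2.0}


def toy_likelihood(units):
    return sum(gains.get(u, 0.0) for u in units)


units = select_units([("ta",), ("na",), ("ka",)], set(gains), toy_likelihood, n_th=2)
```

Each pass evaluates every remaining candidate once, so the cost is roughly the number of candidates times N_th model trainings, which is why the frequency-based preselection of candidates matters.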

【００６２】[0062]

FIG. 9 is a graph showing the improvement in average likelihood obtained by treating mora chains as units in the unregistered word model generation process executed by the unregistered word model generation unit 20 of FIG. 1; it plots the average likelihood against the number of mora chain types. That is, FIG. 9 shows how the average likelihood (Equation 8) changes during iterative learning. Mora chains with a frequency of 100 or more were taken as unit candidates. From the learning data shown in Tables 1 to 3, 1,829 mora chains for the surname model and 1,660 for the given-name model become unit candidates. The subword-unit N-gram used N = 2 and was interpolated by the well-known deleted interpolation method using the 1-gram and 0-gram. As FIG. 9 shows, the average likelihood on the learning data rises monotonically as mora chains are added as subword units. The average likelihood of the model with 150 mora chains added was 3.9 times that of the model without mora chains for the surname model, and 3.2 times for the given-name model. If the unregistered word model is regarded as a word 1-gram, this corresponds to an improvement in the learning-set word perplexity of 74% for the surname model and 69% for the given-name model.
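
The step from the likelihood ratios to the perplexity figures can be checked with a short calculation. It assumes that the average likelihood of Equation 8 is a per-word quantity whose reciprocal is the perplexity, so that a likelihood gain of r divides the perplexity by r; the function name is illustrative.

```python
def perplexity_reduction(likelihood_ratio):
    """Relative perplexity reduction when the per-word average likelihood
    is multiplied by likelihood_ratio (perplexity taken as its reciprocal)."""
    return 1.0 - 1.0 / likelihood_ratio


surname_reduction = perplexity_reduction(3.9)   # about 0.744
given_reduction = perplexity_reduction(3.2)     # about 0.69
```

Under that assumption, gains of 3.9 and 3.2 correspond to reductions of roughly 74% and 69%, matching the stated figures.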

【００６３】[0063]

Next, a method of implementing the unregistered word model in a word dictionary in word N-gram form will be described. By the method described below, the above unregistered word model can be handled, without approximation, in the form of a class N-gram. Therefore, any decoder that can handle a class N-gram as its statistical language model can recognize unregistered words by this method without modification of the decoder. The condition, however, is that extremely long unregistered words (in this embodiment, surnames and given names of 10 morae or more) are not recognition targets. The morae and mora chains used as units in the subword-unit N-gram are treated as pseudo-words and incorporated into the recognition dictionary and the class N-gram. In doing so, each subword unit is expanded by the labeling described below, producing multiple copies of the same subword unit that differ only in their labels.

【００６４】[0064]

That is, the labeled subword unit data generation unit 21 performs the following processing on each of the Japanese surname and given-name subword-unit N-grams stored in the subword-unit N-gram model memory 40, and stores the resulting labeled subword unit data in the labeled subword unit data memory 42. First, all subword units (single morae and mora chains) used as units in the Japanese surname (or Japanese given-name) subword-unit N-gram are extracted. Next, multiple labels are generated for each extracted subword unit. Attaching each generated label to the subword unit yields multiple labeled subword units per subword unit. Here, a label is a triple consisting of (a) the class (日姓 or 日名 in this embodiment), (b) the start position within the word (1, 2, ..., LenMax + 1 − the number of morae in the subword unit, where LenMax is a preset value denoting the number of morae in the longest surname or given name to be recognized), and (c) whether the unit is at the end of the word (終, indicating the end, or "−").
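
The label expansion can be sketched as follows. The `LabeledUnit` type and the `expand_labels` function are hypothetical names, and the ASCII class label "surname" stands in for 日姓. The upper bound on the start position is taken as LenMax + 1 minus the unit's mora count, so that no unit extends past LenMax.

```python
from typing import List, NamedTuple, Tuple


class LabeledUnit(NamedTuple):
    unit: Tuple[str, ...]   # the morae making up the subword unit
    cls: str                # (a) vocabulary class
    start: int              # (b) start mora position within the word, 1-based
    is_end: bool            # (c) True if the unit ends the word


def expand_labels(unit: Tuple[str, ...], cls: str, len_max: int) -> List[LabeledUnit]:
    """Generate every labeled copy of one subword unit for one class."""
    last_start = len_max + 1 - len(unit)
    return [
        LabeledUnit(unit, cls, start, is_end)
        for start in range(1, last_start + 1)
        for is_end in (False, True)
    ]


# A two-mora chain with LenMax = 9 can start at positions 1 through 8.
labeled = expand_labels(("na", "ka"), "surname", len_max=9)
```

Each unit therefore contributes two entries (word-final and non-final) per admissible start position, which is what lets a plain class N-gram decoder track position inside the word.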

【００６５】[0065]

The expansion by start mora position in (b) above was set so that the end position extends to at most 9 morae for both surnames and given names, matching the longest surname and given name appearing in the learning data. A subword unit given the word-end label in (c) above is allowed to have a pause at the end of its phoneme sequence (reading). Labeled subwords are subject to the following transition constraints. (i) A transition from a registered-word class to a labeled subword is allowed only when the start mora position of the labeled subword is 1. (ii) Conversely, a transition from a labeled subword to a registered-word class is allowed only when the labeled subword carries the word-end label. (iii) A transition between labeled subwords is allowed only when their mora positions within the word are contiguous and they belong to the same class.
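
Constraints (i) to (iii) can be written as three predicates. All names here are illustrative. In particular, `end_of` is assumed to return the position of the last mora a unit occupies (start + length − 1), so that "contiguous" means the next unit starts one position later.

```python
from typing import NamedTuple, Tuple


class LabeledUnit(NamedTuple):
    unit: Tuple[str, ...]
    cls: str
    start: int      # 1-based start mora position
    is_end: bool    # carries the word-end label


def end_of(u: LabeledUnit) -> int:
    # Assumed convention: position of the last mora covered by the unit.
    return u.start + len(u.unit) - 1


def may_enter(u: LabeledUnit) -> bool:
    """(i) Registered-word class to labeled subword."""
    return u.start == 1


def may_leave(u: LabeledUnit) -> bool:
    """(ii) Labeled subword to registered-word class."""
    return u.is_end


def may_follow(prev: LabeledUnit, nxt: LabeledUnit) -> bool:
    """(iii) Labeled subword to labeled subword."""
    return prev.cls == nxt.cls and nxt.start == end_of(prev) + 1


a = LabeledUnit(("na", "ka"), "surname", 1, False)
b = LabeledUnit(("mu",), "surname", 3, True)
c = LabeledUnit(("mu",), "given", 3, True)
```

Here `a` followed by `b` is a legal path through one surname, while `a` followed by `c` is rejected because the classes differ.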

【００６６】[0066]

Further, the word dictionary generation unit 22 generates a word dictionary from the data in the labeled subword unit data memory 42 and the text data in the text database memory 31, and stores it in the word dictionary memory 12, as follows. First, the word dictionary generation unit 22 extracts all words appearing in the text data in the text database memory 31 and stores them in the word dictionary memory 12. Next, it stores all labeled subword units in the labeled subword unit data memory 42 in the word dictionary memory 12. A reading (phoneme sequence) is then assigned to every entry stored in the word dictionary memory 12, either manually or by a well-known phoneme assignment program that uses a correspondence table between readings and phoneme sequences. The word dictionary is thereby generated and stored in the word dictionary memory 12.

【００６７】[0067]

Next, the statistical language model generation process of the language model generation unit 24 will be described. When a labeled subword unit u_d is treated as a word, the probabilities in the class N-gram (N = 2) are given as follows (the same applies for N > 2). Here, u denotes the subword unit before labeling, # denotes the symbol for the mora start position within a word, c denotes a vocabulary class, len(u) is the mora length of subword unit u, p_sw(u_j | u_i, c) is the class-dependent subword-unit 2-gram, and p_LM(c' | c) is the inter-class transition probability of the class 2-gram. classof(u_d), startof(u_d), and isend(u_d) are functions that return, respectively, the vocabulary class, the mora start position within the word, and whether the unit is word-final, as given by the label of the labeled subword unit u_d; endof(u_d) is given by startof(u_d) + len(u). G_c(*) is a probability function based on the mora-length gamma distribution g_c(x) of the class (Equation 9), and is given by the following definite integrals.

【００６８】[0068]

【数１４】 [Equation 14]

【数１５】 [Equation 15]
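
One way to turn a gamma density into the two quantities used in the equations that follow, G_c(x = n) and G_c(x > n), is to integrate it over unit-length bins. The sketch below does this numerically with the standard library. The integration limits (n − 1 to n for "x = n", and everything above n for "x > n") are an assumption made for illustration and are not taken from Equations 14 and 15. The density uses λ as a scale and α as a shape parameter, consistent with Equations 12 and 13.

```python
import math


def gamma_pdf(x, alpha, lam):
    """Gamma density with shape alpha and scale lam."""
    return x ** (alpha - 1) * math.exp(-x / lam) / (math.gamma(alpha) * lam ** alpha)


def integrate(f, a, b, steps=2000):
    """Midpoint rule; never evaluates f at the endpoints."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))


def g_eq(n, alpha, lam):
    """Assumed reading of G_c(x = n): mass of the bin (n - 1, n]."""
    return integrate(lambda x: gamma_pdf(x, alpha, lam), n - 1, n)


def g_gt(n, alpha, lam):
    """Assumed reading of G_c(x > n): mass above n."""
    return 1.0 - sum(g_eq(k, alpha, lam) for k in range(1, n + 1))


# Hypothetical parameters giving a mean of alpha * lam = 3.15 morae.
alpha, lam = 9.0, 0.35
total = sum(g_eq(k, alpha, lam) for k in range(1, 31))
```

With this binning, the values g_eq(1), g_eq(2), ... form a discrete distribution over mora lengths, and g_gt(n) is the probability that the word continues beyond n morae.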

【００６９】[0069]

First, the within-class 1-gram probability p(u_d | c) of the surname and given-name classes, which is the occurrence probability of the leading subword, is permitted only when the start-mora-position label of the labeled subword unit u_d is 1. It can therefore be expressed by the following equation.

【００７０】[0070]

【数１６】[Equation 16]

$$
p(u_d \mid c) =
\begin{cases}
p_{sw}(u \mid \#, c) \cdot G_c\bigl(x > \mathrm{len}(u)\bigr)
  & \text{if } \mathrm{classof}(u_d) = c \;\wedge\; \mathrm{startof}(u_d) = 1 \;\wedge\; \mathrm{isend}(u_d) = \text{false}, \\[4pt]
p_{sw}(u \mid \#, c) \cdot G_c\bigl(x = \mathrm{len}(u)\bigr)
  & \text{if } \mathrm{classof}(u_d) = c \;\wedge\; \mathrm{startof}(u_d) = 1 \;\wedge\; \mathrm{isend}(u_d) = \text{true}, \\[4pt]
0 & \text{otherwise.}
\end{cases}
$$

【００７１】[0071]

Next, a transition between labeled subword units, which is the first inter-class 2-gram probability, is permitted only when both units have the same class and their mora positions within the word are contiguous. The first inter-class 2-gram probability p(u_dj | u_di) can therefore be expressed by the following equation.

【００７２】[0072]

【数１７】[Equation 17]

$$
p(u_{dj} \mid u_{di}) =
\begin{cases}
p_{sw}(u_j \mid u_i, c) \cdot
\dfrac{G_c\bigl(x > \mathrm{endof}(u_{di}) + \mathrm{len}(u_{dj})\bigr)}{G_c\bigl(x > \mathrm{endof}(u_{di})\bigr)}
  & \text{if } \mathrm{classof}(u_{dj}) = \mathrm{classof}(u_{di}) = c \;\wedge\; \mathrm{startof}(u_{dj}) = \mathrm{endof}(u_{di}) + 1 \;\wedge\; \mathrm{isend}(u_{dj}) = \text{false}, \\[8pt]
p_{sw}(u_j \mid u_i, c) \cdot
\dfrac{G_c\bigl(x = \mathrm{endof}(u_{di}) + \mathrm{len}(u_{dj})\bigr)}{G_c\bigl(x > \mathrm{endof}(u_{di})\bigr)}
  & \text{if } \mathrm{classof}(u_{dj}) = \mathrm{classof}(u_{di}) = c \;\wedge\; \mathrm{startof}(u_{dj}) = \mathrm{endof}(u_{di}) + 1 \;\wedge\; \mathrm{isend}(u_{dj}) = \text{true}, \\[8pt]
0 & \text{otherwise.}
\end{cases}
$$

【００７３】[0073]

Further, a transition from a labeled subword unit to the class of the next word, which is the second inter-class 2-gram probability, is permitted only when the labeled subword unit carries the word-end label. The second inter-class 2-gram probability p(c | u_d) is therefore expressed by the following equation.

【００７４】[0074]

【数１８】[Equation 18]

$$
p(c \mid u_d) =
\begin{cases}
p_{LM}\bigl(c \mid \mathrm{classof}(u_d)\bigr) & \text{if } \mathrm{isend}(u_d) = \text{true}, \\[4pt]
0 & \text{otherwise.}
\end{cases}
$$
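
Equations 16 to 18 can be sketched as three small functions. Everything below is illustrative: the subword 2-gram, the class 2-gram, and the length functions G_c are passed in as callables, the toy length distribution is a made-up table rather than a gamma fit, and `end_of` is assumed to return the last mora position a unit occupies (start + length − 1) so that the contiguity condition startof(u_dj) = endof(u_di) + 1 holds for adjacent units.

```python
from typing import NamedTuple, Tuple


class LabeledUnit(NamedTuple):
    unit: Tuple[str, ...]
    cls: str
    start: int
    is_end: bool


def end_of(u):
    return u.start + len(u.unit) - 1   # assumed convention


def p_first(u_d, c, p_sw_first, g_eq, g_gt):
    """Equation 16: within-class 1-gram of the leading subword."""
    if u_d.cls != c or u_d.start != 1:
        return 0.0
    n = len(u_d.unit)
    length_term = g_eq(n) if u_d.is_end else g_gt(n)
    return p_sw_first(u_d.unit, c) * length_term


def p_next(u_dj, u_di, p_sw, g_eq, g_gt):
    """Equation 17: transition between labeled subword units."""
    if u_dj.cls != u_di.cls or u_dj.start != end_of(u_di) + 1:
        return 0.0
    pos = end_of(u_di) + len(u_dj.unit)
    length_term = g_eq(pos) if u_dj.is_end else g_gt(pos)
    return p_sw(u_dj.unit, u_di.unit, u_di.cls) * length_term / g_gt(end_of(u_di))


def p_class_after(c, u_d, p_lm):
    """Equation 18: transition from a subword to the next word's class."""
    return p_lm(c, u_d.cls) if u_d.is_end else 0.0


# Toy stand-ins (hypothetical values).
length_pmf = {1: 0.1, 2: 0.3, 3: 0.4, 4: 0.2}
g_eq = lambda n: length_pmf.get(n, 0.0)
g_gt = lambda n: sum(p for k, p in length_pmf.items() if k > n)
p_sw_first = lambda unit, c: 0.5
p_sw = lambda unit, prev_unit, c: 0.5
p_lm = lambda c, prev_c: 0.25

first = LabeledUnit(("na", "ka"), "surname", 1, False)
last = LabeledUnit(("mu",), "surname", 3, True)
pa = p_first(first, "surname", p_sw_first, g_eq, g_gt)
pb = p_next(last, first, p_sw, g_eq, g_gt)
pc = p_class_after("particle", last, p_lm)
```

Multiplying the length terms along a path makes the G_c(x > ...) factors cancel, leaving exactly G_c(x = total length). This is how the length distribution can be folded into 2-gram probabilities without approximation.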

【００７５】[0075]

FIG. 5 is a flowchart showing the language model generation process executed by the language model generation unit 24 of FIG. 1. In FIG. 5, first, in step S21, the stored data are read from the memories 40, 41, and 42. In step S22, the values of the probability functions are calculated from the mora-length gamma distribution data using Equations 14 and 15. Next, in step S23, the within-class 1-gram probability, which is the occurrence probability of the leading subword, is calculated using Equation 16; in step S24, the first inter-class 2-gram probability, which is the transition probability between subwords, is calculated using Equation 17; and in step S25, the second inter-class 2-gram probability, which is the probability of a transition from a word-final subword to the class of the next word, is calculated using Equation 18. Then, in step S26, the probabilities calculated above are combined and stored in the statistical language model memory 44 as a statistical language model based on the unregistered word model.

【００７６】[0076]

FIG. 8 shows another example of the statistical language model generated by the above language model generation process. The transition probabilities in the example of FIG. 8 are as defined above.

【００７７】[0077]

Next, the configuration and operation of the continuous speech recognition system shown in FIG. 1 will be described. In FIG. 1, the phoneme HMM in the phoneme hidden Markov model (hereinafter, a hidden Markov model is called an HMM) memory 11 connected to the word matching unit 4 is represented as a set of states, each of which has the following information: (a) state number, (b) acceptable context class, (c) lists of preceding and succeeding states, (d) parameters of the output probability density distribution, and (e) self-transition probability and transition probabilities to succeeding states. Because it is necessary to identify which speaker each distribution derives from, the phoneme HMM used in this embodiment is generated by converting a predetermined speaker-mixture HMM. The output probability density function is a Gaussian mixture with 34-dimensional diagonal covariance matrices. The word dictionary in the word dictionary memory 12 connected to the word matching unit 4 stores, for each word, a symbol string representing its reading in terms of the symbols of the phoneme HMM in the phoneme HMM memory 11.

【００７８】[0078]

In FIG. 1, a speaker's uttered speech is input to the microphone 1, converted into a speech signal, and then input to the feature extraction unit 2. After A/D conversion of the input speech signal, the feature extraction unit 2 performs, for example, LPC analysis and extracts 34-dimensional feature parameters comprising log power, 16th-order cepstrum coefficients, Δ log power, and 16th-order Δ cepstrum coefficients. The time series of extracted feature parameters is input to the word matching unit 4 via the buffer memory 3.
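
The layout of the 34-dimensional vector (17 static values followed by their 17 Δ values) can be illustrated as below. The text does not say how the Δ terms are computed; the first difference against the previous frame used here is an assumption, as are the function name and the toy frames. The LPC analysis itself is not shown.

```python
def add_deltas(frames):
    """Append delta features to each static frame.

    frames is a list of 17-element lists: [log power, c1, ..., c16].
    Returns 34-element vectors ordered as log power, 16 cepstrum
    coefficients, delta log power, 16 delta cepstrum coefficients.
    The first frame's deltas are zero.
    """
    out = []
    for t, cur in enumerate(frames):
        prev = frames[t - 1] if t > 0 else cur
        delta = [c - p for c, p in zip(cur, prev)]   # assumed delta definition
        out.append(list(cur) + delta)
    return out


# Three hypothetical static frames.
static = [[float(t)] + [0.5 * t] * 16 for t in range(3)]
features = add_deltas(static)
```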

【００７９】[0079]

Using the one-pass Viterbi decoding method, the word matching unit 4 detects word hypotheses and calculates and outputs their likelihoods, based on the feature parameter data input via the buffer memory 3 and using the phoneme HMM in the phoneme HMM memory 11 and the word dictionary in the word dictionary memory 12. Here, for each HMM state at each time, the word matching unit 4 calculates the within-word likelihood and the likelihood from the start of the utterance. Likelihoods are held separately for each distinct combination of word identification number, word start time, and preceding word. To reduce the amount of computation, grid hypotheses with low likelihood among the total likelihoods calculated from the phoneme HMM and the word dictionary are pruned. The word matching unit 4 outputs the resulting word hypotheses and likelihood information, together with time information from the utterance start time (specifically, for example, a frame number), to the word hypothesis narrowing unit 6 via the buffer memory 5.

【００８０】[0080]

Based on the word hypotheses output from the word matching unit 4 via the buffer memory 5, and referring to the statistical language model in the statistical language model memory 44, the word hypothesis narrowing unit 6 narrows down the word hypotheses so that, among hypotheses for the same word that have the same end time but different start times, each head phoneme environment of the word is represented by the single hypothesis having the highest total likelihood calculated from the utterance start time to the end time of the word. It then outputs, as the recognition result, the word string of the hypothesis having the maximum total likelihood among the word strings of all remaining hypotheses. A task-adapted statistical language model comprises one statistical language model per task, and the word hypothesis narrowing unit 6 selectively refers to the statistical language model corresponding to the task to be recognized. In this embodiment, the head phoneme environment of the word to be processed preferably means the sequence of three phonemes comprising the final phoneme of the word hypothesis preceding the word and the first two phonemes of the word hypothesis for the word.

【００８１】[0081]

For example, as shown in FIG. 2, when the i-th word W_i, consisting of the phoneme sequence a_1, a_2, ..., a_n, follows the (i−1)-th word W_{i−1}, suppose there are six hypotheses Wa, Wb, Wc, Wd, We, and Wf for the word W_{i−1}. Suppose the final phoneme of the first three hypotheses Wa, Wb, and Wc is /x/, and that of the last three hypotheses Wd, We, and Wf is /y/. Among the hypotheses with the same end time t_e and the same head phoneme environment (in FIG. 2, the top three hypotheses, whose head phoneme environment is "x/a_1/a_2"), all but the hypothesis with the highest total likelihood (for example, the topmost hypothesis in FIG. 2) are deleted. The fourth hypothesis from the top has a different head phoneme environment, that is, the final phoneme of the preceding word hypothesis is y rather than x, so it is not deleted. In other words, only one hypothesis is kept for each final phoneme of the preceding word hypothesis. In the example of FIG. 2, one hypothesis is kept for the final phoneme /x/ and one for the final phoneme /y/.
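
The narrowing rule can be sketched as grouping hypotheses by (word, end time, head phoneme environment) and keeping the best of each group. The `WordHypothesis` type, its field names, and the log-likelihood values are hypothetical; only the grouping key follows the description of FIG. 2.

```python
from typing import List, NamedTuple, Tuple


class WordHypothesis(NamedTuple):
    name: str                        # label of the preceding hypothesis, as in FIG. 2
    word: str                        # identity of the current word W_i
    head_env: Tuple[str, str, str]   # (final phoneme of preceding word, a_1, a_2)
    end_time: int                    # end frame t_e
    total_log_likelihood: float      # from utterance start to end_time


def narrow(hyps: List[WordHypothesis]) -> List[WordHypothesis]:
    """Keep the highest-likelihood hypothesis per (word, end time, head environment)."""
    best = {}
    for h in hyps:
        key = (h.word, h.end_time, h.head_env)
        if key not in best or h.total_log_likelihood > best[key].total_log_likelihood:
            best[key] = h
    return list(best.values())


x_env, y_env = ("x", "a1", "a2"), ("y", "a1", "a2")
hyps = [
    WordHypothesis("Wa", "W_i", x_env, 100, -50.0),
    WordHypothesis("Wb", "W_i", x_env, 100, -55.0),
    WordHypothesis("Wc", "W_i", x_env, 100, -60.0),
    WordHypothesis("Wd", "W_i", y_env, 100, -52.0),
    WordHypothesis("We", "W_i", y_env, 100, -58.0),
    WordHypothesis("Wf", "W_i", y_env, 100, -70.0),
]
kept = narrow(hyps)
```

With these toy scores, the six hypotheses reduce to two, one for the preceding final phoneme /x/ and one for /y/.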

【００８２】[0082]

In the above embodiment, Japanese surnames and given names, which are subclasses of proper nouns, are used as the vocabulary classes of unregistered words. The present invention is not limited to this, however, and can be applied to the following proper nouns and to common nouns that are loanwords: (1) foreign surnames and given names, (2) place names, (3) company names, (4) names of facilities, (5) names of products, and so on. Accordingly, in the present invention, any subclass of proper nouns or of loanword common nouns can be used as a vocabulary class of unregistered words.

【００８３】[0083]

The class-dependent unregistered word model used in this embodiment can have a different parameter structure for each class. This makes it possible to build models that strongly reflect the statistical characteristics of readings in each class. In the embodiment, the unregistered word models for Japanese surnames and given names were built with a parameter structure consisting of (1) a gamma distribution of word length and (2) a subword-unit N-gram that does not include transitions to a terminal symbol. For other classes that contain many compound words, however, such as accommodation names ("Kyoto Daiichi Kanko Hotel", "Akasaka Prince Hotel", "Ito Ryokan", ...), the length constraint (1) may not be effective. In that case, constraint (1) is omitted and the subword-unit N-gram of constraint (2) is instead modeled to include transitions to the terminal symbol (for example, a high-probability transition from "hotel" or "ryokan" to the terminal symbol), so that a highly accurate unregistered word model can be built for such classes as well.

【００８４】[0084]

In the above embodiment, the head phoneme environment of a word is defined as the sequence of three phonemes comprising the final phoneme of the word hypothesis preceding the word and the first two phonemes of the word hypothesis for the word. The present invention is not limited to this; the phoneme sequence may instead comprise a phoneme string of the preceding word hypothesis, which includes its final phoneme and at least one phoneme contiguous with that final phoneme, and a phoneme string of the word hypothesis for the word, which includes its first phoneme.

【００８５】[0085]

In the above embodiment, the feature extraction unit 2, the word matching unit 4, the word hypothesis narrowing unit 6, the unregistered word model generation unit 20, the subword unit data generation unit 21, the word dictionary generation unit 22, the word class N-gram model generation unit 23, and the language model generation unit 24 are implemented by, for example, a computer such as a digital electronic computer. The buffer memories 3 and 5, the phoneme HMM memory 11, the word dictionary memory 12, the learning data memory 30, the text database memory 31, the subword-unit N-gram model memory 40, the mora-length gamma distribution data memory 41, the labeled subword unit data memory 42, the word class N-gram model memory 43, and the statistical language model memory 44 are implemented by, for example, a storage device such as a hard disk memory.

【００８６】[0086]

In the above embodiment, speech recognition is performed using the word matching unit 4 and the word hypothesis narrowing unit 6. The present invention is not limited to this, however, and may comprise, for example, a phoneme matching unit that refers to the phoneme HMM in the phoneme HMM memory 11, and a speech recognition unit that performs speech recognition of words by referring to the statistical language model using, for example, the One Pass DP algorithm.

【００８７】[0087]

【実施例】EXAMPLES

The present inventors conducted a speech recognition experiment to confirm the effectiveness of the statistical language model according to this embodiment. Two statistical language models are compared below. Both use, as a common base model, a class N-gram generated only from the travel conversation data of Table 3. Starting from this base model, each method replaces the within-class word 1-gram of the Japanese surname class and given-name class in its own way.

【００８８】[0088]

The statistical language models evaluated are as follows. (1) Statistical language model according to this embodiment: the unregistered word models for surnames and given names are used as the word 1-grams of the Japanese surname and given-name classes. Unless otherwise noted, the case of 150 mora chains used as units in the subword-unit N-gram is evaluated. The recognition vocabulary consists of the 12,755 words other than Japanese surnames and given names, plus the subwords; no Japanese surnames or given names are included as registered words. (2) Registered-word method (hereinafter, the comparative example): word 1-grams estimated from the personal name data of Table 3 are used as the word 1-grams of the Japanese surname and given-name classes. The recognition vocabulary consists of the 12,755 words other than Japanese surnames and given names, plus 39,431 Japanese surnames and given names. Because this method has a vocabulary covering almost all personal names in the evaluation set, and because it uses directly, as word 1-grams, the same personal name data that the method of this embodiment uses for maximum likelihood estimation of the unregistered word model, it is considered to give roughly the upper bound on the recognition accuracy attainable by the method of this embodiment.

[0089] The speech recognition rates of the two methods are evaluated by the following criteria.
(1) Word recognition rate: the recognition rate over all words appearing in the evaluation data. A Japanese surname or given name is counted as correct only when its class (「日姓」, Japanese surname, or 「日名」, Japanese given name), its reading (mora sequence), and its position (alignment by DP) are all correct. For readings, clearly equivalent long-vowel variants (for example ヨウコ and ヨオコ) were corrected by hand before scoring.
(2) Recall and precision of surname and given-name words: using the dynamic programming matching (DP matching) performed for the word recognition rate, recall and precision are computed for Japanese surnames and given names only.
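The scoring rule in criterion (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `NameToken` record, the `recall_precision` function, and the sample names are hypothetical and do not come from the patent, and the DP alignment is assumed to have already assigned each hypothesis word the index of the reference slot it was matched to. The optional `ignore_reading` flag corresponds to a section-only score in which reading errors are disregarded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameToken:
    """One surname or given-name word after DP alignment (hypothetical record)."""
    word_class: str  # class label, e.g. "日姓" or "日名"
    reading: str     # mora sequence, e.g. "ヤマダ"
    position: int    # index of the aligned reference slot


def recall_precision(reference, hypothesis, ignore_reading=False):
    """Return (recall, precision) over name tokens.

    A token is correct only if class, reading, and position all match;
    with ignore_reading=True only class and position are compared.
    """
    def key(token):
        if ignore_reading:
            return (token.word_class, token.position)
        return (token.word_class, token.reading, token.position)

    ref = {key(t) for t in reference}
    hyp = {key(t) for t in hypothesis}
    correct = len(ref & hyp)
    recall = correct / len(ref) if ref else 0.0
    precision = correct / len(hyp) if hyp else 0.0
    return recall, precision


# Made-up example: three names in the reference, two in the recognizer output.
reference = [
    NameToken("日姓", "ヤマダ", 3),
    NameToken("日名", "ヨウコ", 4),
    NameToken("日姓", "スズキ", 9),
]
hypothesis = [
    NameToken("日姓", "ヤマダ", 3),  # class, reading, and position all correct
    NameToken("日名", "ヨウジ", 4),  # class and position correct, reading wrong
]

strict = recall_precision(reference, hypothesis)
section_only = recall_precision(reference, hypothesis, ignore_reading=True)
```

In this made-up case the strict score counts one correct name out of three reference names and two hypothesized names, while the section-only score also credits the misread given name.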

[0090] The evaluation set consists of 42 one-sided conversations (4,990 words) from the travel conversation domain. Japanese names appearing in the evaluation set total 70 words, surnames and given names combined (52 distinct words). Of these, the surnames that do not appear even in the personal name list of Table 3 account for 3 words (Asagino, 1 word; Chinzai, 2 words).

[0091] Next, Table 5 shows the speech recognition rates of the method of this embodiment and of the comparative example.

【００９２】[0092]

[Table 5] Speech recognition rate
----------------------------------------------------------------------
Recognition rate (%)            This embodiment    Comparative example
----------------------------------------------------------------------
Word recognition rate                87.51               87.30
Surname and given-name words
  Recall                             70                  73
  Precision                          67                  75
----------------------------------------------------------------------
(Note) A surname or given name was counted as correct only when its reading, class, and section were all correct. The recognition rate of the comparative example is considered to correspond roughly to the upper bound for the method of this embodiment.

[0093] With the method of this embodiment, surnames and given names that are unregistered words were recognized with almost the same accuracy as when they are recognized as registered words. Contrary to expectation, the word recognition rate of this embodiment exceeded that of the comparative example. One likely reason is as follows. For some surnames and given names with low acoustic likelihood, the method of this embodiment misrecognizes the reading but still detects the word's section correctly, and so seldom induces recognition errors in the neighboring words. This is supported by Table 6 below, which reports the recall and precision of surname and given-name sections with reading errors ignored; there the method of this embodiment is the better of the two.

【００９４】[0094]

[Table 6] Section detection rate for surname and given-name words
----------------------------------------------------------------------
Recognition rate (%)            This embodiment    Comparative example
----------------------------------------------------------------------
Surname and given-name sections
  Recall                             87                  80
  Precision                          84                  82
----------------------------------------------------------------------
(Note) A surname or given name was counted as correct when its class and section were correct; reading errors, which are strongly affected by acoustic likelihood, were ignored.

[0095] FIG. 10 is a graph of the inventors' experimental results showing the effect of unitizing mora chains on the recall of Japanese surnames and given names; it plots word recall against the number of mora chain types. As FIG. 10 shows, increasing the number of mora chains treated as units appears to raise the likelihood that the model assigns to surnames and given names and thereby to improve recall. This agrees with the improvement in average likelihood on the training set described above.

[0096] Next, the speech recognition rate for rare surnames and given names is described. An advantage of the unregistered word model proposed in this embodiment is that even rare words that cannot be anticipated in advance may be recognized correctly. Here the method is evaluated by simulating such rare surnames and given names. The evaluation set contains 52 distinct Japanese surnames and given names. All surnames and given names having the same reading as any of these words were therefore deleted from the training data of Table 1, and the speech recognition rates of the method of this embodiment and of the comparative example were then compared as in the preceding section. Table 7 shows the results.
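The hold-out step described above (deleting every training entry whose reading matches a name in the evaluation set) might look like the following minimal sketch. The function name `remove_matching_readings`, the representation of the name list as `(surface, reading)` pairs, and the sample entries are assumptions made for illustration; the patent does not specify a data format.

```python
def remove_matching_readings(name_entries, held_out_readings):
    """Drop every (surface, reading) entry whose reading is held out.

    Matching is on the reading only, so differently written names that are
    pronounced the same as an evaluation-set name are removed as well.
    """
    blocked = set(held_out_readings)
    return [(surface, reading)
            for surface, reading in name_entries
            if reading not in blocked]


# Made-up training entries and evaluation-set readings.
training = [
    ("山田", "ヤマダ"),
    ("矢間田", "ヤマダ"),  # different spelling, same reading: also removed
    ("鈴木", "スズキ"),
    ("陽子", "ヨウコ"),
]
evaluation_readings = {"ヤマダ", "ヨウコ"}

filtered = remove_matching_readings(training, evaluation_readings)
```

Filtering by reading rather than by written form is what makes the held-out names genuinely unseen by the unregistered word model, since that model is estimated from readings.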

【００９７】[0097]

[Table 7] Speech recognition rate for rare surnames and given names
----------------------------------------------------------------------
Recognition rate (%)            This embodiment    Comparative example
----------------------------------------------------------------------
Word recognition rate                86.66               86.08
Surname and given-name words
  Recall                             31                   6
  Precision                          36                   8
----------------------------------------------------------------------
(Note) The experiment was run after deleting, from the surname and given-name data used for training, every entry having the same reading as a surname or given name appearing in the evaluation set. A surname or given name was counted as correct only when its reading, class, and section were all correct. The recall and precision of the registered word method are not 0% because, owing to flaws in the morphological tagging, some surnames had been labeled as "common nouns".

[0098] As Table 7 shows, even when given surnames and given names that are absent from the training data, the method of this embodiment correctly recognized their reading, class, and section with a recall of 31%. As a result, its word recognition rate also exceeded that of the registered word method by 0.58 points.

[0099] As described above, according to the embodiment of the present invention, making the unregistered word model class-dependent yields the following specific effects.
(1) Restricting the modeling target makes the statistical characteristics of readings more distinct and allows class-specific parameter constraints to be introduced, so the accuracy of the unregistered word model can be improved.
(2) The detected section can be processed linguistically. For an unregistered word, the class is identified together with the reading. In many cases the reading and the class are considered to be necessary and sufficient information for the language processing of proper nouns.
(3) Speech recognition using the statistical language model thus generated achieves a higher recognition rate than the prior art.

[0100] <Second Embodiment> FIG. 11 is a block diagram showing the configuration of a continuous speech recognition system according to a second embodiment of the present invention, and FIG. 12 is a block diagram showing the configuration of a telephone with an automatic dialing function that uses the continuous speech recognition system of FIG. 11.

[0101] The continuous speech recognition system of FIG. 11 differs from that of FIG. 1 in the following respects.
(1) The learning data memory 30 contains, in addition to the Japanese surname file 30a and the Japanese given-name file 30b, further files such as a place name file 30c. The place name file contains, for example, place names in Japan and abroad.
(2) A finite state automaton model generation unit 23a is provided in place of the word class N-gram model generation unit 23 of FIG. 1.
(3) A finite state automaton model memory 43a is provided in place of the word class N-gram model memory 43 of FIG. 1.

[0102] The finite state automaton model generation unit 23a generates a finite state automaton model from the text data stored in the text database memory 31, a corpus consisting of a large number of written Japanese sentences, and stores the model in the finite state automaton model memory 43a.

[0103] In the continuous speech recognition system of FIG. 11, the portion enclosed by the broken line is referred to as the speech recognition device 100. That is, the speech recognition device 100 comprises the circuits and processing units from the microphone 1 through the word hypothesis narrowing unit 6, the phoneme HMM memory 11 and the word dictionary memory 12 connected to the word collation unit 4, and the statistical language model memory 44 connected to the word hypothesis narrowing unit 6. In the second embodiment, the telephone with an automatic dialing function of FIG. 12 is built around this speech recognition device 100. The data in the word dictionary memory 12 and the statistical language model memory 44 are generated in advance by the system of FIG. 11 and stored. The speech recognition device 100 thus responds to a speaker's utterance of words such as personal names, input to the microphone 1, by executing speech recognition processing and outputting the character string of the recognition result.

[0104] FIG. 12 shows the configuration of the telephone with an automatic dialing function according to this embodiment. The main control unit 50 is a CPU and controls the overall operation of the telephone by executing a predetermined operation program stored in the ROM 51. The RAM 52 stores the data needed when the main control unit 50 executes the operation program and also serves as temporary working memory for the main control unit 50. The display unit 53 is a display device such as a liquid crystal display (LCD); it displays the operating state of the telephone and the name and telephone number of the call destination. The operation unit 54 has the character keys, numeric dialing keypad, speed dial keys, and various function keys needed to operate the telephone. The network control unit (NCU) 55 is a hardware circuit that closes and opens the DC loop and the like of the analog public telephone line L and that includes an MTDF dialer with an automatic dialing function; as required, it connects the line to the handset 59 or connects the output of the speech synthesis output unit 56 to the public telephone line L. The speech synthesis output unit 56 comprises, for example, a pulse generator, a noise generator, a variable gain amplifier, and a filter, and uses a known speech synthesis method. Under control of the main control unit 50, it converts the text data of the character string to be synthesized into a predetermined parameter time series. It then controls the pulse generator according to the pitch, selects between the pulse generator and the noise generator according to the voiced/unvoiced decision, controls the variable gain amplifier according to the amplitude value, and controls the filter according to the filter coefficient values. The speech for the character string is thereby synthesized and output through the speaker 57, or the synthesized speech signal is transmitted to the other party via the NCU 55 and the public telephone line L. The circuits 51 to 56 and the telephone number search unit 60 are connected to the main control unit 50 via the bus 58.
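The pulse/noise excitation, gain, and filter chain of the speech synthesis output unit can be illustrated with a small source-filter sketch. The patent only says that a known speech synthesis method is used, so everything specific here is an assumption: the function name `synthesize_frame`, the per-frame parameters, and in particular the choice of an all-pole recursive filter are illustrative and are not taken from the source.

```python
import random


def synthesize_frame(pitch_period, voiced, amplitude, coeffs, n_samples,
                     history=None, rng=None):
    """Synthesize one frame of samples from one set of frame parameters.

    pitch_period: samples between pulses (used only when voiced is True)
    voiced:       True selects the pulse generator, False the noise generator
    amplitude:    gain applied to the excitation (variable gain amplifier)
    coeffs:       a_1..a_p of an assumed all-pole filter,
                  y[n] = e[n] - sum_k a_k * y[n-k]
    history:      previous outputs y[n-1], y[n-2], ... carried between frames
    """
    rng = rng or random.Random(0)
    past = list(history) if history is not None else [0.0] * len(coeffs)
    out = []
    for n in range(n_samples):
        if voiced:
            excitation = 1.0 if n % pitch_period == 0 else 0.0  # pulse train
        else:
            excitation = rng.uniform(-1.0, 1.0)                 # white noise
        excitation *= amplitude
        y = excitation - sum(a * h for a, h in zip(coeffs, past))
        out.append(y)
        past = ([y] + past)[:len(coeffs)]  # keep only the last p outputs
    return out, past


# Voiced frame with no filtering: the scaled pulse train passes through as is.
pulses, _ = synthesize_frame(4, True, 2.0, [], 8)

# Voiced frame through a one-pole filter (a_1 = -0.5): each pulse decays.
decayed, state = synthesize_frame(4, True, 2.0, [-0.5], 5)

# Unvoiced frame: noise excitation bounded by the gain.
noise, _ = synthesize_frame(4, False, 0.5, [], 16)
```

Returning the filter history lets consecutive frames with different parameters be chained without a discontinuity, which is one simple way to realize a parameter time series.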

[0105] The telephone number table memory 61 stores in advance, in table form, personal names and their corresponding telephone numbers. The telephone number search unit 60 takes the personal name that follows the word 「発信」 ("call") in the character string of the recognition result from the speech recognition device 100, reads the telephone number corresponding to that name from the telephone number table memory 61, and outputs it to the main control unit 50 via the bus 58. In response, the main control unit 50 outputs the telephone number to the MTDF dialer in the NCU 55. The NCU 55 then goes off-hook to originate the call, after which the MTDF dialer generates the dial signal corresponding to the input telephone number and sends it out on the public telephone line L. The user can thus place a call to the telephone whose number corresponds to the personal name spoken into the microphone 1.
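The lookup performed by the telephone number search unit can be sketched as follows. This is a minimal illustration that assumes the recognition result is already available as a list of words; the function name `find_number_to_dial`, the table contents, and the telephone numbers are hypothetical.

```python
def find_number_to_dial(recognized_words, phone_table, trigger="発信"):
    """Return the number for the name that follows the trigger word, or None.

    recognized_words: recognition result as a list of words (assumed format)
    phone_table:      mapping from personal name to telephone number
    """
    for index, word in enumerate(recognized_words[:-1]):
        if word == trigger:
            return phone_table.get(recognized_words[index + 1])
    return None


# Made-up table contents standing in for the telephone number table memory.
phone_table = {
    "ヤマダ": "0774-00-0001",
    "スズキ": "0774-00-0002",
}

number = find_number_to_dial(["発信", "ヤマダ"], phone_table)
unknown = find_number_to_dial(["発信", "タナカ"], phone_table)
no_trigger = find_number_to_dial(["ヤマダ"], phone_table)
```

Returning `None` for a name that is recognized but absent from the table leaves the caller free to decide how to respond, for example by prompting the user again.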

[0106] According to the second embodiment, a telephone with speech recognition and automatic dialing functions can perform information retrieval using a speech recognition device whose recognition accuracy for unregistered words, that is, words not registered in the word dictionary, is higher than in the conventional example. A large number of proper nouns such as personal names thereby become recognizable with limited memory, so a telephone equipped with a database can retrieve information more accurately than with the prior art. Automatic dialing can also be performed with a high speech recognition rate.

[0107] The above embodiment includes the finite state automaton model generation unit 23a and the finite state automaton model memory 43a, but the present invention is not limited to this; the word class N-gram model generation unit 23 and the word class N-gram model memory 43 of FIG. 1 may be provided in their place, respectively.

[0108] <Third Embodiment> FIG. 13 is a block diagram showing the configuration of a private branch exchange (PBX) according to a third embodiment of the present invention. This embodiment is characterized in that the speech recognition device 100 of FIG. 11 and the telephone number search unit 60 and telephone number table memory 61 of FIG. 12 are applied to extension transfer or outside-line transfer in a private branch exchange.

[0109] In FIG. 13, the main control unit 150 is a CPU and controls the overall operation of the private branch exchange by executing a predetermined operation program stored in the ROM 151. The RAM 152 stores the data needed when the main control unit 150 executes the operation program and also serves as temporary working memory for the main control unit 150. The display unit 153 is a display device such as a liquid crystal display (LCD); it displays the operating state of the private branch exchange and the name and telephone number of the destination. The operation unit 154 has the character keys, numeric dialing keypad, and various function keys needed to operate the private branch exchange. The network control unit (NCU) 155 is a hardware circuit that includes a telephone exchange switch circuit interconnecting the plurality of public telephone lines L1 to LN, which are the outside lines, with the extension lines connected to the extension telephones T1 to TM; that closes and opens the DC loop and the like of each analog public telephone line L1 to LN; and that includes an MTDF dialer with an automatic dialing function. As required, it connects the output of the speech synthesis output unit 156 to the public telephone lines L1 to LN. The speech synthesis output unit 156 is configured in the same way as the speech synthesis output unit 56 of FIG. 12; under control of the main control unit 150, it synthesizes speech from the text data of the character string to be synthesized and transmits the resulting speech signal to the other party via the NCU 155 and the public telephone lines L1 to LN. The circuits 151 to 156 and the telephone number search unit 60 are connected to the main control unit 150 via the bus 158.

[0110] The telephone number table memory 61b stores in advance, in table form, personal names and their corresponding extension and outside-line telephone numbers. The main control unit 150 automatically answers an incoming call arriving at the NCU 155 on any one of the public telephone lines L1 to LN, has the speech synthesis output unit 156 generate the synthesized speech signal "This is ABC Company. Which extension shall I connect you to?", and outputs it to the caller. The speech signal of the personal name then spoken by the caller, that is, the person to whom the call should be transferred, is output from the NCU 155 to the feature extractor 2 of the speech recognition device 100. The speech recognition device 100 executes speech recognition processing and outputs the character string of the recognition result to the telephone number search unit 60. In response, the telephone number search unit 60 reads, from the telephone number table memory 61b, the extension number corresponding to the personal name in the recognition result and outputs it to the main control unit 150 via the bus 158. The main control unit 150 then controls the NCU 155 according to the extension number so as to connect the public telephone line on which the call arrived to the extension telephone having that number, which completes the extension transfer.

[0111] The above embodiment describes extension transfer, but outside-line calls from the extension telephones T1 to TM to a public telephone line can be handled in the same way: using the speech recognition device 100, the telephone number search unit 60, and the telephone number table memory 61b, the system can be configured so that the user places an outside call simply by speaking the name of the person to be called.

[0112] According to the third embodiment, a private branch exchange with speech recognition and automatic transfer functions can perform information retrieval using a speech recognition device whose recognition accuracy for unregistered words, that is, words not registered in the word dictionary, is higher than in the conventional example. A large number of proper nouns such as personal names thereby become recognizable with limited memory, so a private branch exchange equipped with a database can retrieve information more accurately than with the prior art. Automatic transfer can also be performed with a high speech recognition rate.

[0113] <Fourth Embodiment> FIG. 14 is a block diagram showing the configuration of a car navigation system according to a fourth embodiment of the present invention. This embodiment is characterized in that the speech recognition device 100 of FIG. 11 is applied to a car navigation system.

[0114] In FIG. 14, the main control unit 250 is a CPU and controls the overall operation of the car navigation system by executing a predetermined basic program stored in the ROM 251 and an application program loaded into the flash memory 258 from the CD-ROM in the CD-ROM drive device 259. The RAM 252 stores the data needed when the main control unit 250 executes the basic program or the application program and also serves as temporary working memory for the main control unit 250. The display unit 253 is a display device such as a liquid crystal display (LCD); it displays the operating state of the car navigation system as well as a map of the area around the specified place name and navigation information. The operation unit 254 has the character keys, numeric keypad, and various function keys needed to operate the navigation system. The speech synthesis output unit 256 is configured in the same way as the speech synthesis output unit 56 of FIG. 12; under control of the main control unit 250, it synthesizes speech from the text data of the character string to be synthesized and outputs the resulting speech signal to the speaker 257. A CD-ROM storing the car navigation application program and car navigation information such as map information is inserted into the CD-ROM drive device 259, and this information is loaded from the CD-ROM into the flash memory 258 via the CD-ROM drive device 259 and the bus 258 for use. The circuits 251 to 256 and 259 and the place name search unit 60a are connected to the main control unit 250 via the bus 258.

[0115] The place name table memory 61a stores in advance, in table form, place names and their corresponding position information (latitude and longitude). When the user speaks a place name into the microphone 1 of the speech recognition device 100, the speech recognition device 100 executes speech recognition processing and outputs the character string of the recognition result to the place name search unit 60a. In response, the place name search unit 60a reads, from the place name table memory 61a, the position information corresponding to the place name in the recognition result and outputs it to the main control unit 250 via the bus 258. The main control unit 250 then uses the position information to search the car navigation information, such as map information, in the flash memory 258, displays the retrieved information on the display unit 253, and outputs the retrieved speech information to the speech synthesis output unit 256 so that the synthesized speech is output from the speaker 257.
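The place name search step can be sketched in the same spirit. The sketch assumes the recognition result is a list of words and that the first word found in the table is the intended place name; `find_position`, the table layout, and the coordinates (rounded illustrative values) are all hypothetical.

```python
def find_position(recognized_words, place_table):
    """Return (place_name, (latitude, longitude)) for the first known place.

    recognized_words: recognition result as a list of words (assumed format)
    place_table:      mapping from place name to (latitude, longitude)
    Returns None when no word of the result is in the table.
    """
    for word in recognized_words:
        if word in place_table:
            return word, place_table[word]
    return None


# Illustrative entries standing in for the place name table memory.
place_table = {
    "キョウト": (35.01, 135.77),
    "ナラ": (34.69, 135.80),
}

hit = find_position(["キョウト", "マデ"], place_table)
miss = find_position(["オオサカ"], place_table)
```

The returned coordinates would then serve as the key for retrieving map data around that position.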

[0116] According to the fourth embodiment, a car navigation system with speech recognition and car navigation functions can perform information retrieval using a speech recognition device whose recognition accuracy for unregistered words is higher than in the conventional example. A large number of proper nouns such as place names thereby become recognizable with limited memory, so a car navigation system equipped with a database can retrieve information more accurately than with the prior art. Place names can also be recognized with a high speech recognition rate, so car navigation processing can be executed appropriately.

【０１１７】以上の第２、第３及び第４の実施形態にお
いては、電話機、構内交換機、カーナビゲーションシス
テムの例について説明しているが、本発明はこれに限ら
ず、単語リストに対応する普通名詞の単語データとそれ
に対応する情報とを含むデータベースメモリを記憶し、
音声認識装置１００から出力される音声認識結果の文字
列をキーとして用いて、上記データベースの記憶装置に
記憶されたデータベースから検索して、一致する単語デ
ータに対応する情報を上記データベースメモリから読み
出して出力し、さらには、当該検索された情報に基づい
て、所定の処理を実行することができる。The second, third, and fourth embodiments above describe a telephone, a private branch exchange, and a car navigation system as examples, but the present invention is not limited to these. More generally, a database memory may store word data of common nouns corresponding to the word list together with the information corresponding to that word data. The character string of the speech recognition result output from the speech recognition device 100 is then used as a key to search the database stored in the database storage device, the information corresponding to the matching word data is read from the database memory and output, and a predetermined process can further be executed based on the retrieved information.
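The generalized retrieval flow described above (recognition result used as a key, matching entry read out, predetermined process executed) can be sketched as follows. This is only an illustrative sketch, not the patent's implementation: the table contents, the function names `search` and `retrieve_and_process`, and the callback `action` are all assumptions introduced here.

```python
from typing import Callable, Optional

# Hypothetical database: word data -> information associated with it.
# The entries are invented for illustration only.
DATABASE = {
    "Kyoto": {"lat": 35.0116, "lon": 135.7681},
    "Osaka": {"lat": 34.6937, "lon": 135.5023},
}


def search(recognized: list[str], database: dict) -> Optional[tuple[str, dict]]:
    """Use each word of the recognition result as a key.

    Returns the first matching word and its stored information,
    or None when no word of the result is found in the database.
    """
    for word in recognized:
        if word in database:
            return word, database[word]
    return None


def retrieve_and_process(recognized: list[str], database: dict,
                         action: Callable[[str, dict], object]):
    """Search the database, then run a predetermined process on the hit."""
    hit = search(recognized, database)
    if hit is None:
        return None  # nothing matched, so no process is executed
    word, info = hit
    return action(word, info)


if __name__ == "__main__":
    result = retrieve_and_process(
        ["go", "to", "Osaka"], DATABASE,
        lambda word, info: f"route to {word} at ({info['lat']}, {info['lon']})",
    )
    print(result)
```

In the embodiments, the role of `action` would be played by dialing a telephone number, connecting an extension, or searching map data; here it is left as a caller-supplied function.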

【０１１８】＜第５の実施形態＞図１５は、本発明に係
る第５の実施形態であるかな漢字変換装置の構成を示す
ブロック図であり、図１と同様のものについては同一の
符号を付している。この実施形態に係るかな漢字変換装
置は、キーボード７１と、キーボードインターフェース
７２と、音素ＨＭＭメモリ１１及び単語辞書メモリ１２
が接続された単語照合部４ａと、バッファメモリ５と、
統計的言語モデルメモリ４４が接続された単語仮説絞込
部６とを備えて構成される。<Fifth Embodiment> FIG. 15 is a block diagram showing the configuration of a kana-kanji conversion device according to a fifth embodiment of the present invention; components identical to those in FIG. 1 are denoted by the same reference numerals. The kana-kanji conversion device according to this embodiment comprises a keyboard 71, a keyboard interface 72, a word matching unit 4a to which a phoneme HMM memory 11 and a word dictionary memory 12 are connected, a buffer memory 5, and a word hypothesis narrowing unit 6 to which a statistical language model memory 44 is connected.

【０１１９】ここで、単語辞書メモリ１２は、図１の単
語辞書生成部２２により生成された単語辞書を記憶し、
ここで、単語辞書は、学習用データメモリ３０に記憶さ
れたファイル（図１や図１１に図示の、日本人姓ファイ
ル３０ａ、日本人名ファイル３０ｂ、地名ファイル３０
ｃに限らず、上述のように、外国人の姓と名、会社名、
各種施設名、各種製品名などの単語を含んでもよい。）
及びテキストデータメモリ３１内のテキストデータの単
語に対応する漢字表記の複数の単語データを含む。ま
た、統計的言語モデルメモリ４４は、図１の言語モデル
生成部２４により生成された統計的言語モデルを記憶
し、この統計的言語モデルは上記学習用データメモリ３
０に記憶されたファイル及びテキストデータメモリ３１
内のテキストデータの単語に基づいて生成される。Here, the word dictionary memory 12 stores the word dictionary generated by the word dictionary generation unit 22 of FIG. 1. The word dictionary contains a plurality of word data in kanji notation corresponding to the words in the files stored in the learning data memory 30 and to the words of the text data in the text data memory 31. (The files are not limited to the Japanese surname file 30a, the Japanese given name file 30b, and the place name file 30c shown in FIGS. 1 and 11; as described above, they may also contain words such as foreign surnames and given names, company names, facility names, and product names.) The statistical language model memory 44 stores the statistical language model generated by the language model generation unit 24 of FIG. 1; this statistical language model is generated based on the files stored in the learning data memory 30 and the words of the text data in the text data memory 31.

【０１２０】図１５において、キーボード７１は、かな
文字列を入力するための入力手段であり、キーボードイ
ンターフェース７２はキーボード７１を用いて入力され
たかな文字列のデータを一旦格納した後、所定の信号変
換などの処理を実行した後、単語照合部４ａに出力す
る。単語照合部４ａは、ワン−パス・ビタビ復号化法を
用いて、キーボードインターフェース７２を介して入力
されるかな文字列のデータに基づいて、音素ＨＭＭメモ
リ１１内の音素ＨＭＭと、単語辞書メモリ１２内の単語
辞書とを用いて単語仮説を検出し尤度を計算して出力す
る。ここで、具体的には、単語照合部４ａは、単語辞書
を参照して、入力されたかな文字列と、上記単語辞書内
の単語との間の単語照合及び尤度計算を行い、一致した
ときに漢字表記の単語に変換して単語仮説の文字列とし
て尤度とともに出力する一方、一致しないときにかな文
字のまま単語仮説の文字列として尤度とともに出力す
る。単語照合部４ａからの出力データはバッファメモリ
５を介して単語仮説絞込部６に入力される。単語仮説絞
込部６は、単語照合部４ａからバッファメモリ５を介し
て出力される単語仮説に基づいて、統計的言語モデルメ
モリ４４内の統計的言語モデルを参照して、終了時刻が
等しく開始時刻が異なる同一の単語の単語仮説に対し
て、当該単語の先頭音素環境毎に、発声開始時刻から当
該単語の終了時刻に至る計算された総尤度のうちの最も
高い尤度を有する１つの単語仮説で代表させるように単
語仮説の絞り込みを行った後、絞り込み後のすべての単
語仮説の単語列のうち、最大の総尤度を有し漢字表記を
含む仮説の単語列を認識結果として出力する。In FIG. 15, the keyboard 71 is an input means for entering a kana character string. The keyboard interface 72 temporarily stores the data of the kana character string entered with the keyboard 71, performs processing such as a predetermined signal conversion, and outputs the result to the word matching unit 4a. Using the one-pass Viterbi decoding method, the word matching unit 4a detects word hypotheses from the kana character string data input via the keyboard interface 72, using the phoneme HMMs in the phoneme HMM memory 11 and the word dictionary in the word dictionary memory 12, and calculates and outputs their likelihoods. Specifically, the word matching unit 4a refers to the word dictionary and performs word matching and likelihood calculation between the input kana character string and the words in the word dictionary. When a match is found, it converts the string into the word in kanji notation and outputs it, together with the likelihood, as the character string of a word hypothesis; when no match is found, it outputs the kana characters unchanged, together with the likelihood, as the character string of a word hypothesis. The output data of the word matching unit 4a is input to the word hypothesis narrowing unit 6 via the buffer memory 5. Based on the word hypotheses output from the word matching unit 4a via the buffer memory 5, and referring to the statistical language model in the statistical language model memory 44, the word hypothesis narrowing unit 6 narrows down the word hypotheses as follows: among hypotheses of the same word that have the same end time but different start times, it keeps, for each leading phoneme environment of the word, only the single hypothesis with the highest total likelihood calculated from the utterance start time to the end time of the word. It then outputs, as the recognition result, the hypothesis word string that has the maximum total likelihood among the word strings of all remaining word hypotheses and that contains kanji notation.
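The narrowing step performed by the word hypothesis narrowing unit 6 can be illustrated with a small sketch. It assumes that each hypothesis carries its word, start time, end time, leading phoneme environment, and total likelihood; the class `WordHypothesis`, its field names, and the function `narrow` are hypothetical names chosen here, and the leading phoneme environment is treated as an opaque string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordHypothesis:
    word: str                # surface form of the hypothesized word
    start: int               # start time (frame index)
    end: int                 # end time (frame index)
    context: str             # leading phoneme environment (opaque label here)
    total_likelihood: float  # log-likelihood from utterance start to `end`


def narrow(hypotheses: list[WordHypothesis]) -> list[WordHypothesis]:
    """Keep one representative per (word, end time, leading phoneme environment).

    Hypotheses of the same word with the same end time but different
    start times compete; only the one with the highest total likelihood
    survives for each leading phoneme environment.
    """
    best: dict[tuple[str, int, str], WordHypothesis] = {}
    for hyp in hypotheses:
        key = (hyp.word, hyp.end, hyp.context)
        current = best.get(key)
        if current is None or hyp.total_likelihood > current.total_likelihood:
            best[key] = hyp
    return list(best.values())


if __name__ == "__main__":
    candidates = [
        WordHypothesis("谷垣", start=10, end=30, context="a/t/a", total_likelihood=-42.0),
        WordHypothesis("谷垣", start=12, end=30, context="a/t/a", total_likelihood=-40.5),
        WordHypothesis("谷垣", start=12, end=30, context="o/t/a", total_likelihood=-47.0),
    ]
    for hyp in narrow(candidates):
        print(hyp)
```

Keying the dictionary on the triple makes the pruning a single pass over the hypotheses; selecting the final word string with the maximum total likelihood is a separate step that is not shown.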

【０１２１】この第５の実施形態によれば、かな漢字変
換装置によれば、上記統計的言語モデルを利用して、か
な漢字変換率を従来技術に比較して向上できるかな漢字
変換装置を提供するができる。従って、例えば未登録の
固有名詞も変換可能とすることができる。According to the fifth embodiment, the statistical language model described above is used to provide a kana-kanji conversion device whose kana-kanji conversion rate is improved over the prior art. Consequently, unregistered proper nouns, for example, can also be converted.

【０１２２】[0122]

【発明の効果】以上詳述したように本発明によれば、ク
ラスに依存して構築された未登録語モデルを含む統計的
言語モデルを生成したので、次の特有の効果を得ること
ができる。（１）モデル化対象を限定することで、読みの統計的特
徴をより明確化することができ、クラス固有のパラメー
タ制約を導入できるため、未登録語モデルを高精度化す
ることができる。（２）検出区間の言語処理が可能である。未登録語は、
読みに加えクラスも同時に同定される。読みとクラス
は、固有名詞の言語処理において必要十分な情報となる
ケースが多いものと考えられる。（３）上記生成された統計的言語モデルを用いて音声認
識することにより、従来技術に比較して高い認識率で音
声認識することができる。As described in detail above, according to the present invention, a statistical language model including an unregistered word model constructed in a class-dependent manner is generated, so the following specific effects are obtained.
(1) Restricting the modeling target makes the statistical characteristics of readings clearer and allows class-specific parameter constraints to be introduced, so the accuracy of the unregistered word model can be improved.
(2) Language processing of the detected section is possible. For an unregistered word, the class is identified at the same time as the reading. The reading and the class are considered to be necessary and sufficient information for the language processing of proper nouns in many cases.
(3) Performing speech recognition with the generated statistical language model yields a higher recognition rate than the prior art.
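One class-specific parameter constraint named in this document is the gamma distribution of mora length, whose parameters are estimated separately for each class. The sketch below fits a gamma distribution per class by the method of moments, which is an assumption made here for simplicity (this section does not state which estimator is used). The mora-length samples and the names `fit_gamma`, `gamma_log_pdf`, `MORA_LENGTHS`, and `PARAMS` are invented for illustration.

```python
import math


def fit_gamma(lengths: list[int]) -> tuple[float, float]:
    """Method-of-moments estimate of gamma (shape, scale) from mora lengths.

    For a gamma distribution, mean = shape * scale and
    variance = shape * scale**2, so shape = mean**2 / variance and
    scale = variance / mean. Requires at least two distinct lengths.
    """
    n = len(lengths)
    mean = sum(lengths) / n
    var = sum((x - mean) ** 2 for x in lengths) / n
    return mean * mean / var, var / mean


def gamma_log_pdf(x: float, shape: float, scale: float) -> float:
    """Log density of the gamma distribution at x > 0."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


# Hypothetical mora-length samples per class (illustration only).
MORA_LENGTHS = {
    "surname": [2, 3, 3, 4, 4, 4, 5, 5, 6],
    "given_name": [2, 2, 3, 3, 3, 3, 4, 4],
}

# One (shape, scale) pair per class: the class-dependent length model.
PARAMS = {cls: fit_gamma(values) for cls, values in MORA_LENGTHS.items()}

if __name__ == "__main__":
    for cls, (shape, scale) in PARAMS.items():
        print(cls, round(shape, 3), round(scale, 3),
              round(gamma_log_pdf(4, shape, scale), 3))
```

A length score of this kind could be combined with the subword unit N-gram probability of a candidate reading, so that implausibly short or long readings are penalized differently in each class.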

【０１２３】また、本発明に係る情報検索処理装置によ
れば、電話機における音声認識及び自動ダイヤリング機
能や、カーナビゲーションなどの小規模の情報検索処理
装置において、単語辞書において未登録の未登録語に関
する音声認識の精度が従来例に比較して高い音声認識装
置を用いて情報検索を実行することができる。従って、
限られたメモリで多数の人名や地名などの固有名詞が音
声認識可能となるため、データベースを備えた小型携帯
装置などの情報検索処理装置において、従来技術に比較
して高い精度で情報検索が可能となる。Further, according to the information retrieval processing device of the present invention, a small-scale information retrieval processing device, such as a telephone with speech recognition and automatic dialing functions or a car navigation system, can execute information retrieval using a speech recognition device that recognizes words not registered in the word dictionary with higher accuracy than the conventional example. Because a large number of proper nouns such as personal names and place names can therefore be recognized by speech with limited memory, an information retrieval processing device such as a small portable device equipped with a database can retrieve information with higher accuracy than the prior art.

【０１２４】[0124]

【図面の簡単な説明】[Brief description of drawings]

【図１】本発明に係る第１の実施形態である連続音声
認識システムのブロック図である。FIG. 1 is a block diagram of a continuous speech recognition system according to a first embodiment of the present invention.

【図２】図１の連続音声認識システムにおける単語仮
説絞込部６の処理を示すタイミングチャートである。FIG. 2 is a timing chart showing the processing of the word hypothesis narrowing unit 6 in the continuous speech recognition system of FIG. 1.

【図３】図１の未登録語モデル生成部２０によって実
行される未登録語モデル生成処理を示すフローチャート
である。FIG. 3 is a flowchart showing the unregistered word model generation process executed by the unregistered word model generation unit 20 of FIG. 1.

【図４】図３のサブルーチンであるサブワード２−ｇ
ｒａｍの単位決定処理（ステップＳ４）を示すフローチ
ャートである。FIG. 4 is a flowchart showing the subword 2-gram unit determination process (step S4), which is a subroutine of FIG. 3.

【図５】図１の言語モデル生成部２４によって実行さ
れる言語モデル生成処理を示すフローチャートである。FIG. 5 is a flowchart showing the language model generation process executed by the language model generation unit 24 of FIG. 1.

【図６】本発明者の分析による、日本人の姓及び名並
びに旅行会話における単語の長さの分布を示すグラフで
あって、モーラ長に対する単語数の割合を示すグラフで
ある。FIG. 6 is a graph, based on the inventors' analysis, showing the distribution of word length for Japanese surnames and given names and for words in travel conversation, plotting the proportion of words against mora length.

【図７】第１の実施形態に係るクラス依存未登録語モ
デルに基づく統計的言語モデルの一例を示す状態遷移図
である。FIG. 7 is a state transition diagram showing an example of a statistical language model based on the class-dependent unregistered word model according to the first embodiment.

【図８】第１の実施形態に係る統計的言語モデルの一
例を示す状態遷移図である。FIG. 8 is a state transition diagram showing an example of the statistical language model according to the first embodiment.

【図９】図１の未登録語モデル生成部２０によって実
行される未登録語モデル生成処理における、モーラ連鎖
の単位化による平均尤度の向上を示すグラフであって、
モーラ連鎖の種類の数に対する平均尤度を示すグラフで
ある。FIG. 9 is a graph showing the improvement in average likelihood obtained by unitizing mora chains in the unregistered word model generation process executed by the unregistered word model generation unit 20 of FIG. 1, plotting average likelihood against the number of mora chain types.

【図１０】本発明者による第１の実施形態の連続音声
認識システムに係る実験の実験結果であって、日本人の
姓及び名の再現率におけるモーラ連鎖の単位化効果を示
すグラフであり、モーラ連鎖の種類の数に対する単語再
現率を示すグラフである。FIG. 10 is a graph of the results of the inventors' experiment on the continuous speech recognition system of the first embodiment, showing the effect of unitizing mora chains on the recall of Japanese surnames and given names, plotting word recall against the number of mora chain types.

【図１１】本発明に係る第２の実施形態である連続音
声認識システムの構成を示すブロック図である。FIG. 11 is a block diagram showing a configuration of a continuous speech recognition system according to a second embodiment of the present invention.

【図１２】図１１の連続音声認識システムを用いた、
自動ダイヤリング機能付き電話機の構成を示すブロック
図である。FIG. 12 is a block diagram showing the configuration of a telephone with an automatic dialing function that uses the continuous speech recognition system of FIG. 11.

【図１３】本発明に係る第３の実施形態である構内交
換機（ＰＢＸ）の構成を示すブロック図である。FIG. 13 is a block diagram showing a configuration of a private branch exchange (PBX) according to a third embodiment of the present invention.

【図１４】本発明に係る第４の実施形態であるカーナ
ビゲーションシステムの構成を示すブロック図である。FIG. 14 is a block diagram showing a configuration of a car navigation system that is a fourth embodiment according to the present invention.

【図１５】本発明に係る第５の実施形態であるかな漢
字変換装置の構成を示すブロック図である。FIG. 15 is a block diagram showing a configuration of a kana-kanji conversion device according to a fifth embodiment of the present invention.

【符号の説明】[Explanation of symbols]

１…マイクロホン、２…特徴抽出部、３，５…バッファメモリ、４，４ａ…単語照合部、６…単語仮説絞込部、１１…音素ＨＭＭメモリ、１２…単語辞書メモリ、２０…未登録モデル生成部、２１…サブワード単位データ生成部、２２…単語辞書生成部、２３…単語クラスＮ−ｇｒａｍモデル生成部、２３ａ…有限状態オートマトンモデル生成部、２４…言語モデル生成部、３０…学習データメモリ、３０ａ…日本人姓ファイル、３０ｂ…日本人名ファイル、３０ｃ…地名ファイル、３１…テキストデータベースメモリ、４０…サブワード単位Ｎ−ｇｒａｍモデルメモリ、４１…モーラ長ガンマ分布データメモリ、４２…ラベル付きサブワード単位データメモリ、４３…単語クラスＮ−ｇｒａｍモデルメモリ、４３ａ…有限状態オートマトンモデルメモリ、４４…統計的言語モデルメモリ、５０…主制御部、５１…ＲＯＭ、５２…ＲＡＭ、５３…表示部、５４…操作部、５５…ネットワークコントロールユニット（ＮＣＵ）、５６…音声合成出力部、５７…スピーカ、５８…バス、５９…送受話器、６０…電話番号検索部、６０ａ…地名検索部、６１，６１ｂ…電話番号テーブルメモリ、６１ａ…地名テーブルメモリ、７１…キーボード、７２…キーボードインターフェース、１００…音声認識装置、１５０…主制御部、１５１…ＲＯＭ、１５２…ＲＡＭ、１５３…表示部、１５４…操作部、１５５…ネットワークコントロールユニット（ＮＣ
Ｕ）、１５６…音声合成出力部、１５８…バス、２５０…主制御部、２５１…ＲＯＭ、２５２…ＲＡＭ、２５３…表示部、２５４…操作部、２５６…音声合成出力部、２５７…スピーカ、２５８…バス、２５９…ＣＤ−ＲＯＭドライブ装置、Ｌ，Ｌ１乃至ＬＮ…公衆電話回線、Ｔ１乃至ＴＭ…内線電話機。1 ... microphone, 2 ... feature extraction unit, 3, 5 ... buffer memory, 4, 4a ... word matching unit, 6 ... word hypothesis narrowing unit, 11 ... phoneme HMM memory, 12 ... word dictionary memory, 20 ... unregistered word model generation unit, 21 ... subword unit data generation unit, 22 ... word dictionary generation unit, 23 ... word class N-gram model generation unit, 23a ... finite state automaton model generation unit, 24 ... language model generation unit, 30 ... learning data memory, 30a ... Japanese surname file, 30b ... Japanese given name file, 30c ... place name file, 31 ... text database memory, 40 ... subword unit N-gram model memory, 41 ... mora length gamma distribution data memory, 42 ... labeled subword unit data memory, 43 ... word class N-gram model memory, 43a ... finite state automaton model memory, 44 ... statistical language model memory, 50 ... main control unit, 51 ... ROM, 52 ... RAM, 53 ... display unit, 54 ... operation unit, 55 ... network control unit (NCU), 56 ... speech synthesis output unit, 57 ... speaker, 58 ... bus, 59 ... handset, 60 ... telephone number search unit, 60a ... place name search unit, 61, 61b ... telephone number table memory, 61a ... place name table memory, 71 ... keyboard, 72 ... keyboard interface, 100 ... speech recognition device, 150 ... main control unit, 151 ... ROM, 152 ... RAM, 153 ... display unit, 154 ... operation unit, 155 ... network control unit (NCU), 156 ... speech synthesis output unit, 158 ... bus, 250 ... main control unit, 251 ... ROM, 252 ... RAM, 253 ... display unit, 254 ... operation unit, 256 ... speech synthesis output unit, 257 ... speaker, 258 ... bus, 259 ... CD-ROM drive device, L, L1 to LN ... public telephone lines, T1 to TM ... extension telephones.

───────────────────────────────────────────────────── フロントページの続き (72)発明者匂坂芳典京都府相楽郡精華町大字乾谷小字三平谷５番地株式会社エイ・ティ・アール音声翻訳通信研究所内 (56)参考文献特開2001−51996（ＪＰ，Ａ) 特開平６−308994（ＪＰ，Ａ) 特開2000−99082（ＪＰ，Ａ) ＭａｓａａｋｉＮＡＧＡＴＡ，ＡＰａｒｔｏｆＳｐｅｅｃｈＥｓｔｉｍａｔｉｏｎＭｅｔｈｏｄｆｏｒＪａｐａｎｅｓｅＵｎｋｎｏｗｎＷｏｒｄｓｕｓｉｎｇａＳｔａｔｉｓｔｉｃａｌＭｏｄｅｌｏｆＭｏｒｐｈｏｌｏｇ，37ｔｈＡｎｎｕａｌＭｅｅｔｉｎｇｏｆｔｈｅＡｓｓｏｃｉａｔｉｏｎｆｏｒＣｏｍｐｕｔａｔｉｏｎａｌＬｉｎｇｕｉｓｔｉｃｓＰｒｏｃｅｅｄｉｎｇｓｏｆｔｈｅＣｏｎｆｅｒｅｎｃｅ，米国，ＡｓｓｏｃｉａｔｉｏｎｆｏｒＣｏｍｐｕｔａｔｉｏｎａｌＬｉｎｇｕｉｓｔｉｃｓ，1999年６月20日, 277−284 (58)調査した分野(Int.Cl.⁷，ＤＢ名) G10L 15/18 ＪＩＣＳＴファイル（ＪＯＩＳ)───────────────────────────────────────────────────── Continuation of the front page (72) Inventor: Yoshinori Sagisaka, c/o ATR Interpreting Telecommunications Research Laboratories, 5 Koaza Sanpeidani, Oaza Inuidani, Seika-cho, Soraku-gun, Kyoto (56) References: JP 2001-51996 A; JP H06-308994 A; JP 2000-99082 A; Masaaki Nagata, "A Part of Speech Estimation Method for Japanese Unknown Words using a Statistical Model of Morpholog", 37th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, USA, Association for Computational Linguistics, June 20, 1999, pp. 277-284 (58) Fields searched (Int. Cl.⁷, DB name): G10L 15/18; JICST file (JOIS)

Claims

(57)【特許請求の範囲】(57) [Claims]

【請求項１】固有名詞又は外来語の普通名詞の単語リ
ストを含む学習データを格納する学習データ記憶手段
と、上記学習データ記憶手段に格納された学習データに基づ
いて、上記学習データにおけるモーラ長に対する単語数
の割合が実質的にガンマ分布に従うと仮定したときのモ
ーラ長のガンマ分布のパラメータをクラスに依存して推
定して計算するとともに、モーラ又はモーラ連鎖である
サブワード単位で、上記固有名詞又は外来語の普通名詞
の下位クラスであるクラスを有する第１のＮ−ｇｒａｍ
の出現確率を計算することにより未登録語をモデル化し
たサブワード単位Ｎ−ｇｒａｍモデルを生成する第１の
生成手段と、所定のテキストデータベースに基づいて生成された単語
クラスＮ−ｇｒａｍモデルと、上記第１の生成手段によ
って生成されたサブワード単位Ｎ−ｇｒａｍモデルと、
上記第１の生成手段によって計算されたモーラ長のガン
マ分布のパラメータとに基づいて、上記単語クラスと、
上記固有名詞又は外来語の普通名詞の下位クラスである
クラスとに依存した第２のＮ−ｇｒａｍの出現確率を計
算することによりサブワード単位に基づいた未登録語を
含む統計的言語モデルを生成する第２の生成手段とを備
えたことを特徴とする統計的言語モデル生成装置。1. A statistical language model generation device comprising: learning data storage means for storing learning data including a word list of proper nouns or common nouns of foreign origin; first generation means for, based on the learning data stored in the learning data storage means, estimating and calculating, in a class-dependent manner, the parameters of a gamma distribution of mora length on the assumption that the ratio of the number of words to mora length in the learning data substantially follows a gamma distribution, and for generating a subword unit N-gram model that models unregistered words by calculating, in subword units each being a mora or a mora chain, the occurrence probability of a first N-gram having classes that are subclasses of the proper nouns or the common nouns of foreign origin; and second generation means for generating a statistical language model including unregistered words based on the subword units by calculating the occurrence probability of a second N-gram that depends on the word classes and on the classes that are subclasses of the proper nouns or the common nouns of foreign origin, based on a word class N-gram model generated from a predetermined text database, the subword unit N-gram model generated by the first generation means, and the parameters of the gamma distribution of mora length calculated by the first generation means.

【請求項２】上記第１の生成手段によって生成された
サブワード単位Ｎ−ｇｒａｍモデルに基づいて、上記サ
ブワード単位を抽出し、抽出したラベルを上記サブワー
ド単位に付与することにより、サブワード単位当たり複
数のラベル付きサブワード単位のデータを生成する第３
の生成手段と、上記テキストデータベースから抽出された単語と、上記
第３の生成手段によって生成された複数のラベル付きサ
ブワード単位のデータとに対して音素並びを付与するこ
とにより単語辞書を生成する第４の生成手段とをさらに
備えたことを特徴とする請求項１記載の統計的言語モデ
ル生成装置。2. The statistical language model generation device according to claim 1, further comprising: third generation means for extracting the subword units based on the subword unit N-gram model generated by the first generation means and attaching the extracted labels to the subword units, thereby generating data of a plurality of labeled subword units per subword unit; and fourth generation means for generating a word dictionary by assigning phoneme sequences to the words extracted from the text database and to the data of the plurality of labeled subword units generated by the third generation means.

【請求項３】入力される発声音声文の音声信号に基づ
いて、所定の統計的言語モデルを用いて音声認識する音
声認識手段を備えた音声認識装置において、上記音声認識手段は、請求項１又は２記載の統計的言語
モデル生成装置によって生成された統計的言語モデル
と、請求項２記載の第４の生成手段によって生成された
単語辞書とを用いて音声認識することを特徴とする音声
認識装置。3. A speech recognition device comprising speech recognition means for performing speech recognition on the speech signal of an input uttered sentence using a predetermined statistical language model, wherein the speech recognition means performs speech recognition using the statistical language model generated by the statistical language model generation device according to claim 1 or 2 and the word dictionary generated by the fourth generation means according to claim 2.

【請求項４】上記単語リストに対応する普通名詞の単
語データとそれに対応する情報とを含むデータベースを
記憶するデータベース記憶手段と、請求項３記載の音声認識装置から出力される音声認識結
果の文字列をキーとして用いて、上記データベース記憶
手段に記憶されたデータベースから検索して、一致する
単語データに対応する情報を上記データベース記憶手段
から読み出して出力する検索手段とを備えたことを特徴
とする情報検索処理装置。4. An information retrieval processing device comprising: database storage means for storing a database containing word data of common nouns corresponding to the word list and information corresponding to the word data; and search means for searching the database stored in the database storage means using, as a key, the character string of the speech recognition result output from the speech recognition device according to claim 3, and for reading from the database storage means and outputting the information corresponding to the matching word data.

【請求項５】上記情報検索処理装置はさらに、上記検索手段から出力される情報に基づいて、所定の処
理を実行する処理実行手段を備えたことを特徴とする請
求項４記載の情報検索処理装置。5. The information retrieval processing device according to claim 4, further comprising process execution means for executing a predetermined process based on the information output from the search means.