JP2002278584A

JP2002278584A - Language model generator, voice recognition device using the same, method therefor and computer-readable recording medium having the program recorded thereon

Info

Publication number: JP2002278584A
Application number: JP2001074023A
Authority: JP
Inventors: Jun Ishii; 純石井
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2001-03-15
Filing date: 2001-03-15
Publication date: 2002-09-27
Anticipated expiration: 2021-03-15
Also published as: JP3894419B2

Abstract

PROBLEM TO BE SOLVED: To provide a language model generator that produces a language model with high accuracy in estimating the occurrence probability of a word string, and a speech recognition device and the like using it. SOLUTION: The language model generator comprises: means 105 for generating a language model including redundant words, which generates, from a training text that includes redundant words, a language model for obtaining the occurrence probability of word strings with redundant words included; redundant word removal means 102, which removes the redundant words from that training text and generates a training text excluding redundant words; and means 104 for generating a language model excluding redundant words, which generates, from the training text excluding redundant words, a language model for obtaining the occurrence probability of word strings with redundant words excluded. The speech recognition device further comprises speech feature extraction means, which takes the speech to be recognized as input and extracts speech features, and matching means, which matches the extracted speech features using an acoustic model (for obtaining the probability of a sequence of speech features) together with the two language models and outputs the speech recognition result.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

TECHNICAL FIELD OF THE INVENTION

The present invention relates to a language model generation device for performing speech recognition of a speaker's speech, a speech recognition device using it, methods therefor, and computer-readable recording media on which programs therefor are recorded.

[0002]

PRIOR ART

In recent years, continuous speech recognition, which lets a speaker input words one after another without pauses, has been actively studied for practical use. Continuous speech recognition decodes the word sequence W-hat from the acoustic observation sequence of the speech such that the decoded word sequence has the maximum posterior probability. This is expressed by Equation (1).

[0003]

(Equation 1)
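The equation image is not present in this text. Going by the surrounding definitions (maximum a posteriori decoding of W-hat from the observation sequence O, with P(O|W) from the acoustic model and P(W) from the language model), Equation (1) presumably takes the standard form below. This is a reconstruction under that assumption, not a copy of the original figure.

```latex
\hat{W} = \operatorname*{argmax}_{W} P(W \mid O)
        = \operatorname*{argmax}_{W} P(O \mid W)\, P(W) \tag{1}
```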

[0004] Here, O is the acoustic observation sequence of the speech, [o_1, o_2, o_3, ..., o_T], and W is the word sequence [w_1, w_2, w_3, ..., w_n]. P(O|W) is the probability of the observation sequence O given the word string W and is computed by the acoustic model; P(W) is the occurrence probability of the word string W and is computed by the language model. In practice, speech recognition generally uses Equation (2), obtained by taking the logarithm of Equation (1). In Equation (2), α is a weighting coefficient that balances the probability from the acoustic model against the probability from the language model.

[0005]

(Equation 2)
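The image for Equation (2) is also missing. Taking the logarithm of the maximum a posteriori criterion and inserting the weight α gives a form like the one below. Attaching α to the language-model term is the common convention and is an assumption here; the text only says that α balances the two probabilities.

```latex
\hat{W} = \operatorname*{argmax}_{W}
          \bigl[ \log P(O \mid W) + \alpha \log P(W) \bigr] \tag{2}
```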

[0006] Speech recognition is described in detail in: Sadaoki Furui, "Speech Information Processing", Morikita Publishing Co., Ltd., June 1998 (hereinafter Reference 1); Seiichi Nakagawa, "Speech Recognition by Stochastic Models", The Institute of Electronics, Information and Communication Engineers, April 1992 (hereinafter Reference 2); and Lawrence Rabiner and Biing-Hwang Juang, "Fundamentals of Speech Recognition (Vols. 1 and 2)", translation supervised by Sadaoki Furui, NTT Advanced Technology Corporation, November 1995 (hereinafter Reference 3).

[0007] For P(O|W), which is computed by the acoustic model, the use of the hidden Markov model (HMM), a statistical method, has been actively studied in recent years. Acoustic models based on hidden Markov models are described in detail in, for example, Chapter 6 of Reference 3.

[0008] P(W), which is computed by the language model, is also usually obtained by statistical methods, of which the N-gram model (N is 2 or more) is representative. These are described in detail in Chapter 3 of Kenji Kita, "Probabilistic Language Models", University of Tokyo Press, November 1999 (hereinafter Reference 4). An N-gram model statistically gives the transition probability from the immediately preceding (N-1) words to the next word. The occurrence probability of a word string w_1^L = w_1 ... w_L under an N-gram model is given by Equation (3).

[0009]

(Equation 3)
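The image for Equation (3) is missing. From the notation used in the surrounding text (the word string w_1^L and the conditional probability of w_t given the preceding N-1 words), it presumably has the usual N-gram product form shown below. This is a reconstruction, and the index range of the product is an assumption.

```latex
P(w_1^L) = \prod_{t=1}^{L} P\bigl(w_t \mid w_{t+1-N}^{\,t-1}\bigr) \tag{3}
```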

[0010] In Equation (3), the probability P(w_t | w_{t+1-N}^{t-1}) is the probability that the word w_t occurs after the word string w_{t+1-N}^{t-1} consisting of (N-1) words, and Π denotes a product. For example, the occurrence probability of the word string 「私・は・駅・へ・行く」 ("I go to the station"; ・ marks a word boundary) under a 2-gram (bigram) model is given by Equation (4). In Equation (4), # is the symbol that marks the beginning and the end of a sentence.

[0011]

(Equation 4)
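The image for Equation (4) is missing. Applying the bigram case (N = 2) to the example word string, with # at the start and end of the sentence, presumably gives the expansion below. It is reconstructed from the text rather than taken from the original figure.

```latex
P(\#\,\text{私}\,\text{は}\,\text{駅}\,\text{へ}\,\text{行く}\,\#)
  = P(\text{私} \mid \#)\, P(\text{は} \mid \text{私})\, P(\text{駅} \mid \text{は})\,
    P(\text{へ} \mid \text{駅})\, P(\text{行く} \mid \text{へ})\, P(\# \mid \text{行く}) \tag{4}
```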

[0012] The probability P(w_t | w_{t+1-N}^{t-1}) is obtained from the relative frequencies of word strings in the training text data. Letting C(W) be the number of occurrences of a word string W in the training text data, the 2-gram probability P(は|私) of 「私・は」, for example, is calculated by Equation (5). In Equation (5), C(私・は) is the number of occurrences of the word string 「私・は」 and C(私) is the number of occurrences of 「私」.

[0013]

(Equation 5)
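The image for Equation (5) is missing; from the definitions in the text it presumably reads P(は|私) = C(私・は) / C(私). A minimal Python sketch of this relative-frequency estimate follows. The function name `bigram_probability`, the toy corpus, and the "#" padding used while counting are illustrative assumptions, not part of the patent.

```python
from collections import Counter

def bigram_probability(sentences, prev_word, word):
    """Relative-frequency estimate C(prev_word, word) / C(prev_word)."""
    unigram_counts = Counter()
    bigram_counts = Counter()
    for words in sentences:
        padded = ["#"] + words + ["#"]  # "#" marks sentence start and end
        # Every token except the final "#" can act as a bigram history.
        unigram_counts.update(padded[:-1])
        bigram_counts.update(zip(padded[:-1], padded[1:]))
    if unigram_counts[prev_word] == 0:
        return 0.0  # history never observed in the training text
    return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]

# Toy training text (hypothetical), already split into words.
corpus = [
    ["私", "は", "駅", "へ", "行く"],
    ["私", "は", "学生"],
    ["私", "も", "行く"],
]
p = bigram_probability(corpus, "私", "は")  # C(私・は) = 2, C(私) = 3
```

In this toy corpus 「私」 occurs three times and is followed by 「は」 twice, so the estimate is 2/3.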

[0014] However, estimating N-gram probabilities simply from relative frequencies has the serious drawback that any word combination that does not appear in the training text data is assigned probability 0 (the zero-frequency problem). Moreover, even for word strings that do appear in the training text data, it is difficult to estimate a statistically reliable probability when the number of occurrences is small (the sparseness problem). To deal with these problems, a technique called smoothing is normally used. Several smoothing methods are described in Section 3.3 of Reference 4, so a detailed explanation is omitted here.
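To make the zero-frequency problem concrete, the sketch below compares the plain relative-frequency estimate with add-one (Laplace) smoothing on a one-sentence toy corpus. Add-one smoothing is chosen here only because it is short; it stands in for the methods of Reference 4 and is not claimed to be the one the patent relies on. All function names and the corpus are hypothetical.

```python
from collections import Counter

def train_bigram_counts(sentences):
    """Count bigram histories and bigrams, padding each sentence with '#'."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["#"] + words + ["#"]
        unigrams.update(padded[:-1])
        bigrams.update(zip(padded[:-1], padded[1:]))
    return unigrams, bigrams

def relative_frequency(unigrams, bigrams, prev_word, word):
    """Unsmoothed estimate: unseen bigrams get probability exactly 0."""
    if unigrams[prev_word] == 0:
        return 0.0
    return bigrams[(prev_word, word)] / unigrams[prev_word]

def add_one(unigrams, bigrams, vocab_size, prev_word, word):
    """Add-one smoothing: every bigram count is incremented by 1."""
    return (bigrams[(prev_word, word)] + 1) / (unigrams[prev_word] + vocab_size)

corpus = [["私", "は", "駅", "へ", "行く"]]
unigrams, bigrams = train_bigram_counts(corpus)
vocab = {"#"} | {w for s in corpus for w in s}  # 6 symbols including "#"

unseen_raw = relative_frequency(unigrams, bigrams, "駅", "は")        # 0 / 1
unseen_smoothed = add_one(unigrams, bigrams, len(vocab), "駅", "は")  # 1 / 7
```

The unseen pair 「駅・は」 gets probability 0 without smoothing and 1/7 with add-one smoothing, while the smoothed probabilities over all successors of 「駅」 still sum to 1.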

[0015] Using such a language model, it is also possible to build a speech recognition device whose target is natural utterances such as spoken language. A characteristic of natural utterances is that meaningless connecting words such as 「えーと」 ("um") and 「あのー」 ("uh") are inserted. FIG. 13 shows examples of frequently occurring redundant words. Because there are many kinds of redundant words and they can be inserted between any pair of words, training with the redundant words included gives rise to the sparseness and zero-frequency problems. With a language model trained in this way, accurate occurrence probabilities of word strings cannot be obtained, and a high recognition rate cannot be achieved. For this reason, methods that generate the language model for recognizing natural utterances without including redundant words have been studied. One example of such prior art is "Speech Recognition Apparatus" in Japanese Unexamined Patent Publication No. 7-104782 (hereinafter Reference 5).

[0016] FIG. 14 is a block diagram of the conventional speech recognition device described in Reference 5. The prior art is explained below with reference to FIG. 14. In the figure, 1001 is the speech to be recognized, 1002 is speech feature extraction means, 1003 is an acoustic model, 1004 is a language model, 1005 is matching means that uses a language score in which redundant words are skipped, and 1006 is the speech recognition result.

[0017] The operation is described next. The speech to be recognized 1001 is input to the speech feature extraction means 1002. The speech feature extraction means 1002 extracts the speech features contained in the speech to be recognized 1001. The acoustic model 1003 is a model for matching speech acoustically. The acoustic model 1003 is, for example, an HMM whose recognition units are phonemes that take the preceding and following phoneme context into account, trained on speech of sentences and words uttered by many speakers.

[0018] The language model 1004 is a model for obtaining the occurrence probability of a word string. It is trained on a training text that contains no redundant words and gives the occurrence probabilities of word strings made up of words other than redundant words. In addition, redundant words that are likely to be uttered are selected and registered in the language model 1004 as part of the recognition vocabulary. Occurrence probabilities for word chains that include redundant words are not estimated from the training text; instead, a redundant word is assumed to be insertable between any pair of words. Reference 5 uses an N-gram model (N is 3) as the language model.

[0019] The matching means 1005, which uses a language score in which redundant words are skipped, converts the phonetic transcriptions of the recognition target words [V(1), V(2), ..., V(vn)] defined by the language model 1004 (vn is the number of recognition target words) into recognition-unit label sequences, concatenates the phoneme-level HMMs stored in the acoustic model 1003 according to these labels, and creates the reference patterns [λ_v(1), λ_v(2), ..., λ_v(vn)] of the recognition target words. It then matches the speech features output by the speech feature extraction means 1002 using these reference patterns and the word string occurrence probabilities given by the language model 1004, and outputs the speech recognition result 1006.

[0020] In this matching, the occurrence probability of a word string is calculated with redundant words skipped. In the example of Reference 5, the 3-gram occurrence probability of the word string 「東京都港区新橋えーと１丁目」 is calculated over the word string obtained by skipping the redundant word 「えーと」, as in Equation (6). The probability of connecting to a redundant word is given the constant value 1.0.

[0021]

(Equation 6)
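The skipping behavior described for Reference 5 can be sketched as follows. This is a minimal illustration, not the implementation of Reference 5: the word segmentation of the example, the two "#" start symbols, the omission of an end-of-sentence term, and the names `skip_trigram_logprob` and `toy_trigram_prob` are all assumptions.

```python
import math

REDUNDANT_WORDS = {"えーと", "あのー"}  # examples named in the text

def skip_trigram_logprob(words, trigram_prob, redundant_words=REDUNDANT_WORDS):
    """Trigram log-probability with redundant words skipped.

    A redundant word contributes log(1.0) = 0 and is not added to the
    history, so the following word is conditioned on the two most recent
    non-redundant words.
    """
    history = ["#", "#"]  # assumed sentence-start padding
    logprob = 0.0
    for word in words:
        if word in redundant_words:
            continue  # connection probability fixed at 1.0
        logprob += math.log(trigram_prob(history[-2], history[-1], word))
        history.append(word)
    return logprob

calls = []

def toy_trigram_prob(w1, w2, w3):
    """Hypothetical stand-in for a trained 3-gram model."""
    calls.append((w1, w2, w3))
    return 0.5

# Segmentation into these five tokens is an assumption made for the example.
utterance = ["東京都", "港区", "新橋", "えーと", "１丁目"]
score = skip_trigram_logprob(utterance, toy_trigram_prob)
```

Because 「えーと」 is skipped, the last trigram looked up is (港区, 新橋, １丁目), and the score equals that of the same utterance without the redundant word.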

[0022] The matching means 1005, which uses a language score in which redundant words are skipped, outputs as the speech recognition result 1006 the word string RW = [V(r(1)), V(r(2)), ..., V(r(m))] of recognition target words that has the highest matching score for the speech to be recognized. Here, r(i) is the word number of the i-th word in the word sequence of the speech recognition result, and m is the number of words in the recognized word sequence.

[0023]

PROBLEMS TO BE SOLVED BY THE INVENTION

Because the conventional speech recognition device is configured as described above, the probability of connecting to a redundant word is constant, and the probability of connecting from a redundant word is not considered at all. A redundant word can connect to any word, but redundant words tend to be inserted at the beginning of an utterance or between phrases. In addition, although some kinds of redundant words are uttered more often than others, the conventional speech recognition device gives every redundant word the same occurrence probability. As a result, the language model has high complexity, the accuracy of estimating word string occurrence probabilities is poor, and speech recognition accuracy does not improve.

[0024] The present invention has been made to solve the problems described above, and its object is to provide a language model generation device and a language model generation method capable of creating a language model with high accuracy in estimating the occurrence probability of a word string, and a computer-readable recording medium on which a language model generation program is recorded.

[0025] A further object of the present invention is to provide a speech recognition device and a speech recognition method with high recognition accuracy that perform speech recognition using a language model with high word string estimation accuracy, and a computer-readable recording medium on which a speech recognition program is recorded.

[0026]

MEANS FOR SOLVING THE PROBLEMS

In view of the above objects, the present invention resides in a language model generation device that takes as input a training text including redundant words and generates language models for obtaining the occurrence probability of a word string, the device comprising: means for generating a language model including redundant words, which takes the training text including redundant words as input and generates a language model for obtaining the occurrence probability of word strings with redundant words included; redundant word removal means, which removes the redundant words from the training text including redundant words and generates a training text excluding redundant words; and means for generating a language model excluding redundant words, which takes the training text excluding redundant words as input and generates a language model for obtaining the occurrence probability of word strings with redundant words excluded.
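The three means of this language model generation device can be sketched as a small pipeline. The sketch assumes unsmoothed bigram models and a fixed set of redundant words purely for brevity; the passage itself does not fix the N-gram order, the smoothing, or how redundant words are identified, and every function name here is hypothetical.

```python
from collections import Counter

def remove_redundant_words(sentences, redundant_words):
    """Redundant word removal: drop every redundant word from each sentence."""
    return [[w for w in s if w not in redundant_words] for s in sentences]

def train_bigram_model(sentences):
    """Relative-frequency bigram model as {(prev_word, word): probability}."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["#"] + words + ["#"]
        unigrams.update(padded[:-1])
        bigrams.update(zip(padded[:-1], padded[1:]))
    return {pair: n / unigrams[pair[0]] for pair, n in bigrams.items()}

def generate_language_models(training_text, redundant_words):
    """Return (model including redundant words, model excluding them)."""
    model_with = train_bigram_model(training_text)
    cleaned_text = remove_redundant_words(training_text, redundant_words)
    model_without = train_bigram_model(cleaned_text)
    return model_with, model_without

# Toy training text (hypothetical) that contains the redundant word えーと.
training_text = [
    ["えーと", "私", "は", "駅", "へ", "行く"],
    ["私", "は", "えーと", "駅", "へ", "行く"],
]
model_with, model_without = generate_language_models(
    training_text, {"えーと", "あのー"}
)
```

In this toy data the model trained with redundant words splits the probability after 「は」 between 「えーと」 and 「駅」, whereas the model trained on the cleaned text assigns all of it to 「駅」, which illustrates why keeping both models can be useful.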

[0027] The invention also resides in a language model generation device that takes as input a training text including redundant words and generates language models for obtaining the occurrence probability of a word string, the device comprising: means for generating a language model including classed redundant words, which takes the training text including redundant words as input, groups the redundant words into classes, and generates a language model for obtaining the occurrence probability of word strings with the classed redundant words included; redundant word removal means, which removes the redundant words from the training text including redundant words and generates a training text excluding redundant words; and means for generating a language model excluding redundant words, which takes the training text excluding redundant words as input and generates a language model for obtaining the occurrence probability of word strings with redundant words excluded.
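One simple way to picture the classing of redundant words is to replace each of them with a class token before counting N-grams, and to keep the relative frequency of each member within the class. The sketch below assumes a single class, the token name `<FILLER>`, and the function name `classify_redundant_words`; the passage does not state how many classes are used or how members are weighted.

```python
from collections import Counter

def classify_redundant_words(sentences, redundant_words, class_token="<FILLER>"):
    """Replace redundant words with one class token.

    Returns the classed sentences and the within-class relative frequency
    of each redundant word observed in the training text.
    """
    classed = [
        [class_token if w in redundant_words else w for w in s]
        for s in sentences
    ]
    member_counts = Counter(
        w for s in sentences for w in s if w in redundant_words
    )
    total = sum(member_counts.values())
    member_probs = {w: n / total for w, n in member_counts.items()}
    return classed, member_probs

training_text = [
    ["えーと", "私", "は"],
    ["あのー", "駅", "へ"],
    ["えーと", "行く"],
]
classed_text, member_probs = classify_redundant_words(
    training_text, {"えーと", "あのー"}
)
```

After classing, all three sentences start with the same token, so the N-gram counts for contexts around redundant words are pooled instead of being split across individual redundant words.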

[0028] The invention also resides in a speech recognition device that takes speech to be recognized as input, performs speech recognition, and outputs a speech recognition result, the device comprising: speech feature extraction means, which takes the speech to be recognized as input and extracts speech features; an acoustic model for obtaining the probability of a sequence of the speech features; the language model including redundant words and the language model excluding redundant words according to claim 1; and matching means, which matches the speech features extracted by the speech feature extraction means using the acoustic model, the language model including redundant words, and the language model excluding redundant words, and outputs the speech recognition result.

[0029] The invention also resides in a speech recognition device that takes speech to be recognized as input, performs speech recognition, and outputs a speech recognition result, the device comprising: speech feature extraction means, which takes the speech to be recognized as input and extracts speech features; an acoustic model for obtaining the probability of a sequence of the speech features; the language model including redundant words and the language model excluding redundant words according to claim 1; first matching means, which matches the speech features extracted by the speech feature extraction means using the acoustic model and the language model including redundant words, and outputs a plurality of speech recognition result candidates; and second matching means, which matches the plurality of speech recognition result candidates output by the first matching means using the language model including redundant words and the language model excluding redundant words, and outputs the speech recognition result.
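The second matching means can be pictured as rescoring a list of first-pass candidates. This passage does not say how the two language models are combined, so the sketch below assumes, purely for illustration, a weighted sum of their log-probabilities added to a first-pass acoustic score. The function `rescore_candidates`, the parameters `lm_weight` and `mix`, and the toy scoring functions are all hypothetical.

```python
def rescore_candidates(candidates, lm_with, lm_without, redundant_words,
                       lm_weight=1.0, mix=0.5):
    """Pick the best candidate using both language models.

    candidates: list of (words, acoustic_log_score) from the first pass.
    lm_with / lm_without: functions mapping a word list to a log-probability.
    The combination rule (weighted sum) is an assumption of this sketch.
    """
    def total_score(candidate):
        words, acoustic_log_score = candidate
        cleaned = [w for w in words if w not in redundant_words]
        language_score = mix * lm_with(words) + (1.0 - mix) * lm_without(cleaned)
        return acoustic_log_score + lm_weight * language_score

    return max(candidates, key=total_score)

# Toy stand-ins for trained models: log-probability falls with length.
def toy_lm_with(words):
    return -1.0 * len(words)

def toy_lm_without(words):
    return -0.5 * len(words)

first_pass_candidates = [
    (["私", "は", "えーと", "駅"], -10.0),
    (["私", "は", "液"], -11.0),
]
best_words, best_acoustic = rescore_candidates(
    first_pass_candidates, toy_lm_with, toy_lm_without, {"えーと", "あのー"}
)
```

With these toy numbers the first candidate scores -12.75 and the second -13.25, so the candidate containing the redundant word is selected.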

[0030] The invention also resides in a speech recognition device that takes speech to be recognized as input, performs speech recognition, and outputs a speech recognition result, the device comprising: speech feature extraction means, which takes the speech to be recognized as input and extracts speech features; an acoustic model for obtaining the probability of a sequence of the speech features; the language model including classed redundant words and the language model excluding redundant words according to claim 2; and matching means, which matches the speech features extracted by the speech feature extraction means using the acoustic model, the language model including classed redundant words, and the language model excluding redundant words, and outputs the speech recognition result.

[0031] The invention also resides in a speech recognition device that takes speech to be recognized as input, performs speech recognition, and outputs a speech recognition result, the device comprising: speech feature extraction means, which takes the speech to be recognized as input and extracts speech features; an acoustic model for obtaining the probability of a sequence of the speech features; the language model including classed redundant words and the language model excluding redundant words according to claim 2; first matching means, which matches the speech features extracted by the speech feature extraction means using the acoustic model and the language model including classed redundant words, and outputs a plurality of speech recognition result candidates; and second matching means, which matches the plurality of speech recognition result candidates output by the first matching means using the language model including classed redundant words and the language model excluding redundant words, and outputs the speech recognition result.

[0032] The invention also resides in a language model generation method for generating, from a training text including redundant words, language models for obtaining the occurrence probability of a word string, the method comprising: a step of generating a language model including redundant words, in which a language model for obtaining the occurrence probability of word strings with redundant words included is generated from the training text including redundant words; a redundant word removal step, in which the redundant words are removed from the training text including redundant words to generate a training text excluding redundant words; and a step of generating a language model excluding redundant words, in which a language model for obtaining the occurrence probability of word strings with redundant words excluded is generated from the training text excluding redundant words.

[0033] The invention also resides in a language model generation method for generating, from a training text including redundant words, language models for obtaining the occurrence probability of a word string, the method comprising: a step of generating a language model including classed redundant words, in which the redundant words in the training text including redundant words are grouped into classes and a language model for obtaining the occurrence probability of word strings with the classed redundant words included is generated; a redundant word removal step, in which the redundant words are removed from the training text including redundant words to generate a training text excluding redundant words; and a step of generating a language model excluding redundant words, in which a language model for obtaining the occurrence probability of word strings with redundant words excluded is generated from the training text excluding redundant words.

[0034] The invention also resides in a speech recognition method for performing speech recognition of speech to be recognized, the method comprising: a speech feature extraction step of extracting speech features from the speech to be recognized; and a matching step of performing speech recognition by matching the speech features extracted in the speech feature extraction step using an acoustic model for obtaining the probability of a sequence of the speech features, together with the language model including redundant words and the language model excluding redundant words according to claim 7.

[0035] The invention also resides in a speech recognition method for performing speech recognition of speech to be recognized, the method comprising: a speech feature extraction step of extracting speech features from the speech to be recognized; a first matching step of obtaining a plurality of speech recognition result candidates by matching the speech features extracted in the speech feature extraction step using an acoustic model for obtaining the probability of a sequence of the speech features and the language model including redundant words according to claim 7; and a second matching step of performing speech recognition by matching the plurality of speech recognition result candidates obtained in the first matching step using the language model including redundant words and the language model excluding redundant words according to claim 7.

[0036] The invention also resides in a speech recognition method for performing speech recognition of speech to be recognized, the method comprising: a speech feature extraction step of taking the speech to be recognized as input and extracting speech features; and a matching step of performing speech recognition by matching the speech features extracted in the speech feature extraction step using an acoustic model for obtaining the probability of a sequence of the speech features, together with the language model including classed redundant words and the language model excluding redundant words according to claim 8.

[0037] The invention also resides in a speech recognition method for performing speech recognition of speech to be recognized, the method comprising: a speech feature extraction step of taking the speech to be recognized as input and extracting speech features; a first matching step of obtaining a plurality of speech recognition result candidates by matching the speech features extracted in the speech feature extraction step using an acoustic model for obtaining the probability of a sequence of the speech features and the language model including classed redundant words according to claim 8; and a second matching step of performing speech recognition by matching the plurality of speech recognition result candidates obtained in the first matching step using the language model including classed redundant words and the language model excluding redundant words according to claim 8.

[0038] The invention also resides in a computer-readable recording medium on which is recorded a language model generation program that takes as input a training text including redundant words and generates language models for obtaining the occurrence probability of a word string, the program causing a computer to execute: a procedure for generating a language model including redundant words, which takes the training text including redundant words as input and generates a language model for obtaining the occurrence probability of word strings with redundant words included; a redundant word removal procedure, which removes the redundant words from the training text including redundant words and generates a training text excluding redundant words; and a procedure for generating a language model excluding redundant words, which takes the training text excluding redundant words as input and generates a language model for obtaining the occurrence probability of word strings with redundant words excluded.

[0039] The invention also resides in a computer-readable recording medium on which is recorded a language model generation program that takes as input a training text including redundant words and generates language models for obtaining the occurrence probability of a word string, the program causing a computer to execute: a procedure for generating a language model including classed redundant words, which takes the training text including redundant words as input, groups the redundant words into classes, and generates a language model for obtaining the occurrence probability of word strings with the classed redundant words included; a redundant word removal procedure, which removes the redundant words from the training text including redundant words and generates a training text excluding redundant words; and a procedure for generating a language model excluding redundant words, which takes the training text excluding redundant words as input and generates a language model for obtaining the occurrence probability of word strings with redundant words excluded.

【００４０】また、認識対象音声を入力して音声認識を
行い音声認識結果を出力する音声認識プログラムを記録
した記録媒体であって、上記認識対象音声を入力し音声
特徴量を抽出する音声特徴量抽出手順と、上記音声特徴
量の系列の確率を求めるための音響モデル、冗長語を含
む学習用テキストを入力して生成された冗長語を含む言
語モデルおよび冗長語を除いた学習用テキストを入力し
て生成された冗長語を除いた言語モデルを用いて、上記
音声特徴量抽出手順で抽出した音声特徴量に対して照合
を行い音声認識結果を出力する照合手順と、を実現させ
る音声認識プログラムを記録したコンピュータ読み取り
可能な記録媒体にある。The invention also resides in a computer-readable recording medium on which is recorded a speech recognition program that takes recognition target speech as input, performs speech recognition, and outputs a speech recognition result, the program causing a computer to execute: a speech feature amount extraction procedure, which takes the recognition target speech as input and extracts a speech feature amount; and a matching procedure, which performs matching on the speech feature amount extracted in the speech feature amount extraction procedure and outputs a speech recognition result, using an acoustic model for obtaining the probability of a sequence of speech feature amounts, a language model including redundant words generated from a learning text including redundant words, and a language model excluding redundant words generated from a learning text without redundant words.

【００４１】また、認識対象音声を入力して音声認識を
行い音声認識結果を出力する音声認識プログラムを記録
した記録媒体であって、上記認識対象音声を入力し音声
特徴量を抽出する音声特徴量抽出手順と、上記音声特徴
量の系列の確率を求めるための音響モデルと冗長語を含
む学習用テキストを入力して生成された冗長語を含む言
語モデルとを用いて、上記音声特徴量抽出手順で抽出し
た音声特徴量に対して照合を行い複数の音声認識結果候
補を出力する第１の照合手順と、この第１の照合手順が
出力した複数の音声認識結果候補に対して、上記冗長語
を含む言語モデルと冗長語を除いた学習用テキストを入
力して生成された冗長語を除いた言語モデルとを用い
て、照合を行い音声認識結果を出力する第２の照合手順
と、を実現させる音声認識プログラムを記録したコンピ
ュータ読み取り可能な記録媒体にある。The invention also resides in a computer-readable recording medium on which is recorded a speech recognition program that takes recognition target speech as input, performs speech recognition, and outputs a speech recognition result, the program causing a computer to execute: a speech feature amount extraction procedure, which takes the recognition target speech as input and extracts a speech feature amount; a first matching procedure, which performs matching on the speech feature amount extracted in the speech feature amount extraction procedure and outputs a plurality of speech recognition result candidates, using an acoustic model for obtaining the probability of a sequence of speech feature amounts and a language model including redundant words generated from a learning text including redundant words; and a second matching procedure, which performs matching on the plurality of speech recognition result candidates output by the first matching procedure and outputs a speech recognition result, using the language model including redundant words and a language model excluding redundant words generated from a learning text without redundant words.

【００４２】また、認識対象音声を入力して音声認識を
行い音声認識結果を出力する音声認識プログラムを記録
した記録媒体であって、上記認識対象音声を入力し音声
特徴量を抽出する音声特徴量抽出手順と、上記音声特徴
量の系列の確率を求めるための音響モデル、冗長語を含
む学習用テキストを入力し冗長語をクラス化して生成さ
れたクラス化された冗長語を含む言語モデルおよび冗長
語を除いた学習用テキストを入力して生成された冗長語
を除いた言語モデルとを用いて、上記音声特徴量抽出手
順が抽出した音声特徴量に対して照合を行い音声認識結
果を出力する照合手順と、を実現させる音声認識プログ
ラムを記録したコンピュータ読み取り可能な記録媒体に
ある。The invention also resides in a computer-readable recording medium on which is recorded a speech recognition program that takes recognition target speech as input, performs speech recognition, and outputs a speech recognition result, the program causing a computer to execute: a speech feature amount extraction procedure, which takes the recognition target speech as input and extracts a speech feature amount; and a matching procedure, which performs matching on the speech feature amount extracted by the speech feature amount extraction procedure and outputs a speech recognition result, using an acoustic model for obtaining the probability of a sequence of speech feature amounts, a language model including classified redundant words generated by taking a learning text including redundant words as input and classifying the redundant words, and a language model excluding redundant words generated from a learning text without redundant words.

【００４３】また、認識対象音声を入力して音声認識を
行い音声認識結果を出力する音声認識プログラムを記録
した記録媒体であって、上記認識対象音声を入力し音声
特徴量を抽出する音声特徴量抽出手順と、上記音声特徴
量の系列の確率を求めるための音響モデルと冗長語を含
む学習用テキストを入力し冗長語をクラス化して生成さ
れたクラス化された冗長語を含む言語モデルとを用い
て、上記音声特徴量抽出手順で抽出した音声特徴量に対
して照合を行い複数の音声認識結果候補を出力する第１
の照合手順と、この第１の照合手順で出力した複数の音
声認識結果候補に対して、上記クラス化された冗長語を
含む言語モデルと冗長語を除いた学習用テキストを入力
して生成された冗長語を除いた言語モデルとを用いて、
照合を行い音声認識結果を出力する第２の照合手順と、
を実現させる音声認識プログラムを記録したコンピュー
タ読み取り可能な記録媒体にある。The invention also resides in a computer-readable recording medium on which is recorded a speech recognition program that takes recognition target speech as input, performs speech recognition, and outputs a speech recognition result, the program causing a computer to execute: a speech feature amount extraction procedure, which takes the recognition target speech as input and extracts a speech feature amount; a first matching procedure, which performs matching on the speech feature amount extracted in the speech feature amount extraction procedure and outputs a plurality of speech recognition result candidates, using an acoustic model for obtaining the probability of a sequence of speech feature amounts and a language model including classified redundant words generated by taking a learning text including redundant words as input and classifying the redundant words; and a second matching procedure, which performs matching on the plurality of speech recognition result candidates output in the first matching procedure and outputs a speech recognition result, using the language model including classified redundant words and a language model excluding redundant words generated from a learning text without redundant words.

【００４４】[0044]

【発明の実施の形態】以下、この発明を各実施の形態に
従って説明する。実施の形態１．図１はこの発明の実施の形態１による言
語モデル生成装置の構成を示すブロック図である。図に
おいて１０１は学習用テキスト、１０２は冗長語除去手
段、１０３は冗長語を除いた言語モデル生成手段、１０
４は冗長語を除いた言語モデル、１０５は冗長語を含む
言語モデル生成手段、１０６は冗長語を含む言語モデル
である。これらは一般に、プログラムに従って動作する
コンピュータおよびこれに接続されたデータベースによ
って構成される。DESCRIPTION OF THE PREFERRED EMBODIMENTS The present invention will be described below according to each embodiment. Embodiment 1. FIG. 1 is a block diagram showing the configuration of a language model generating device according to Embodiment 1 of the present invention. In the figure, 101 is a learning text, 102 is a redundant word removing means, 103 is a means for generating a language model excluding redundant words, 104 is a language model excluding redundant words, 105 is a means for generating a language model including redundant words, and 106 is a language model including redundant words. These are generally implemented by a computer operating according to a program and a database connected to it.

【００４５】なお学習用テキスト１０１は、音声認識の
認識対象とする分野の場面、状況において用いられる単
語や文を文字化したものである。例えば、チケットの予
約を行っている対話を認識対象とする場合は、チケット
の予約を行っている対話音声を書き起こしたテキストで
ある。The learning text 101 is a written form of the words and sentences used in the scenes and situations of the domain targeted by speech recognition. For example, when dialogues in which tickets are being reserved are the recognition target, it is a text transcribed from the speech of such ticket reservation dialogues.

【００４６】図２はこの発明の実施の形態１による言語
モデル生成装置における言語モデル生成方法を示すフロ
ーチャートであり、以下これに従って動作を説明する。FIG. 2 is a flowchart showing a language model generating method in the language model generating device according to Embodiment 1 of the present invention, and the operation will be described below in accordance with it.

【００４７】冗長語除去手段１０２は、ステップＳＴ１
０１において、学習用テキスト１０１を入力し、学習用
テキスト１０１の中から冗長語を取り除く。ここで冗長
語とは「えーと」、「あのー」等の意味をもたない繋ぎ
の語を指す。冗長語の除去は例えば以下のようになる。
「[あのー]明日から[えーと]三泊したいのですが」([]
内は冗長語)という学習用テキストがあった場合、冗長
語除去手段１０２によって「明日から三泊したいのです
が」という冗長語を除いた学習用テキストが生成され
る。In step ST101, the redundant word removing means 102 takes the learning text 101 as input and removes the redundant words from it. Here, redundant words are filler words that carry no meaning, such as "えーと" ("er") and "あのー" ("um"). Redundant words are removed, for example, as follows. Given the learning text "[あのー]明日から[えーと]三泊したいのですが" ("[Um] I'd like to stay [er] three nights from tomorrow"; redundant words in []), the redundant word removing means 102 generates the learning text "明日から三泊したいのですが" ("I'd like to stay three nights from tomorrow"), from which the redundant words have been removed.
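The removal in step ST101 can be sketched in a few lines of Python. This is a minimal illustration, assuming the learning text is already segmented into words; the filler list holds only the two examples named in the text, and the function name is hypothetical.

```python
# Hypothetical filler inventory: only the two examples named in the text.
REDUNDANT_WORDS = {"えーと", "あのー"}

def remove_redundant_words(tokens, redundant=REDUNDANT_WORDS):
    """Return the word sequence with every redundant (filler) word dropped."""
    return [t for t in tokens if t not in redundant]

# The example sentence from the text, already split into words.
tokens = ["あのー", "明日", "から", "えーと", "三泊", "したいの", "ですが"]
cleaned = remove_redundant_words(tokens)
print(cleaned)  # ['明日', 'から', '三泊', 'したいの', 'ですが']
```

A set lookup keeps the filter linear in the sentence length; a real system would presumably draw the filler inventory from a list such as the one the text attributes to FIG. 13.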

【００４８】ステップＳＴ１０２において、冗長語を除
いた言語モデル生成手段１０３では、ステップＳＴ１０
１において生成される冗長語を除いた学習用テキストを
入力して、冗長語を除いた言語モデル１０４の生成を行
う。ここで、言語モデルは上記文献４の３章から５章に
述べられている、N-gramモデル、隠れマルコフモデル、
確率文脈自由文法等を用いる。In step ST102, the means 103 for generating a language model excluding redundant words takes as input the learning text without redundant words generated in step ST101 and generates the language model 104 excluding redundant words. Here, the language model is an N-gram model, a hidden Markov model, a stochastic context-free grammar, or the like, as described in Chapters 3 to 5 of Document 4 cited above.

【００４９】このようにして生成された冗長語を除いた
言語モデル１０４は、冗長語の影響がないのでスパース
ネスの問題やゼロ頻度の問題が軽減する。従って冗長語
を含まない単語列に対する生起確率の推定精度が高い。The language model 104 excluding redundant words generated in this way is not affected by redundant words, so the sparseness and zero-frequency problems are reduced. Its estimates of the occurrence probability of word strings that contain no redundant words are therefore highly accurate.

【００５０】冗長語を除いた言語モデル１０４による単
語列の生起確率の計算は、例えば「[あのー]・明日・か
ら・[えーと]・三泊・したいの・ですが」([]内は冗長
語、・は単語区切りを表す)という単語列Ｗがあった場
合は、冗長語を除いた「明日・から・三泊・したいの・
ですが」という単語列Ｗ’に対して行う。言語モデルが
2-gramである場合は式(７)のように生起確率を計算す
る。ここでＰ(w_k|w_k-1)は冗長語を除いた言語モデル１
０４で与えられる、単語w_k-1から単語w_kへ接続する確率
である。The language model 104 excluding redundant words computes the occurrence probability of a word string as follows. Given, for example, the word string W "[あのー]・明日・から・[えーと]・三泊・したいの・ですが" (redundant words in [], "・" marks a word boundary), the calculation is performed on the word string W' "明日・から・三泊・したいの・ですが", from which the redundant words have been removed. If the language model is a 2-gram, the occurrence probability is calculated as in equation (7). Here P(w_k | w_{k-1}) is the probability, given by the language model 104 excluding redundant words, of connecting from word w_{k-1} to word w_k.

【００５１】[0051]

【数７】 (Equation 7)
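The image for equation (7) is not reproduced in this text. As a hedged reconstruction only, a 2-gram occurrence probability of the kind the preceding paragraph describes is conventionally written as the product below. The word count K and the sentence-start symbol w_0 are assumptions introduced here and do not appear in the text.

```latex
% Assumed form of equation (7): 2-gram probability of the filler-free string W'
P(W') = \prod_{k=1}^{K} P(w_k \mid w_{k-1}),
\qquad W' = w_1 w_2 \cdots w_K,\quad w_0 = \text{sentence start (assumed)}
```

Equation (8) would presumably take the same shape over the full word string W, with P_f in place of P.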

【００５２】ステップＳＴ１０３において、冗長語を含
む言語モデル生成手段１０５では学習用テキスト１０１
を入力して冗長語を含む言語モデル１０６を生成する。
言語モデルは上記文献４の３章から５章に述べられてい
る、N-gramモデル、隠れマルコフモデル、確率文脈自由
文法等を用いる。In step ST103, the means 105 for generating a language model including redundant words takes the learning text 101 as input and generates the language model 106 including redundant words. The language model is an N-gram model, a hidden Markov model, a stochastic context-free grammar, or the like, as described in Chapters 3 to 5 of Document 4 cited above.

【００５３】このようにして生成された冗長語含む言語
モデル１０６は、冗長語を含んだ単語列の生起確率を与
える言語モデルとなり、冗長語の入る傾向を表している
言語モデルとなる。The language model 106 including redundant words generated in this way gives the occurrence probability of word strings that contain redundant words, and thus captures the tendency of where redundant words are inserted.

【００５４】冗長語を含む言語モデル１０６による単語
列の生起確率は、例えば「[あのー]・明日・から・[え
ーと]・三泊・したいの・ですが」([]内は冗長語、・は
単語区切りを表す)という単語列Ｗに対する2-gramモデ
ルによる計算は式(８)によって得る。式(８)においてＰ
_f(w_k|w_k-1)は冗長語を含む言語モデル１０６で与えられ
る単語w_k-1から単語w_kへ連鎖する確率である。With the language model 106 including redundant words, the occurrence probability of, for example, the word string W "[あのー]・明日・から・[えーと]・三泊・したいの・ですが" (redundant words in [], "・" marks a word boundary) under a 2-gram model is obtained by equation (8). In equation (8), P_f(w_k | w_{k-1}) is the probability, given by the language model 106 including redundant words, of chaining from word w_{k-1} to word w_k.

【００５５】[0055]

【数８】 (Equation 8)

【００５６】音声認識を行う場合は、冗長語を除いた言
語モデル１０４と冗長語を含む言語モデル１０６の両方
を用いて単語の生起確率を計算する。冗長語を含む単語
列をＷ、単語列Ｗから冗長語を除いた単語列をＷ’とし
た場合、例えば式(９)によって求めた対数をとった生起
確率を言語モデルのスコアとする。When speech recognition is performed, word occurrence probabilities are calculated using both the language model 104 excluding redundant words and the language model 106 including redundant words. Let W be a word string including redundant words and W' the word string obtained by removing the redundant words from W; then, for example, the log occurrence probability obtained by equation (9) is used as the language model score.

【００５７】[0057]

【数９】 (Equation 9)

【００５８】式(９)においてＰ_f(Ｗ)は冗長語を含む単
語列の生起確率、Ｐ(Ｗ’)は冗長語を除いた単語列の生
起確率である。またα₁、α₂は重み係数である。In equation (9), P_f(W) is the occurrence probability of the word string including redundant words, and P(W') is the occurrence probability of the word string excluding redundant words. α₁ and α₂ are weight coefficients.
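The combined score can be illustrated with two toy 2-gram models. This is a sketch under stated assumptions, not the patent's implementation: the equation image is unavailable, so the weighted sum `alpha1 * log P_f(W) + alpha2 * log P(W')` is an assumed reading of equation (9); the add-k smoothing, the `BOS` symbol, the two-sentence learning text, and all function names are hypothetical.

```python
import math
from collections import Counter

BOS = "<s>"                      # hypothetical sentence-start symbol
FILLERS = {"えーと", "あのー"}    # illustrative redundant-word list

def strip_fillers(words):
    return [w for w in words if w not in FILLERS]

def train_bigram(sentences, k=1.0):
    """Add-k smoothed 2-gram model (the smoothing choice is an assumption)."""
    vocab = {w for s in sentences for w in s}
    context, pair = Counter(), Counter()
    for s in sentences:
        prev = BOS
        for w in s:
            context[prev] += 1
            pair[(prev, w)] += 1
            prev = w
    def prob(prev, w):
        return (pair[(prev, w)] + k) / (context[prev] + k * len(vocab))
    return prob

def log_prob(prob, words):
    """Sum of log 2-gram probabilities over the word string."""
    total, prev = 0.0, BOS
    for w in words:
        total += math.log(prob(prev, w))
        prev = w
    return total

def lm_score(words, p_f, p, alpha1=0.5, alpha2=0.5):
    """Assumed form of equation (9): alpha1*log P_f(W) + alpha2*log P(W')."""
    return (alpha1 * log_prob(p_f, words)
            + alpha2 * log_prob(p, strip_fillers(words)))

# Toy learning text (made up for illustration).
learning_text = [
    ["あのー", "明日", "から", "えーと", "三泊", "したいの", "ですが"],
    ["明日", "から", "二泊", "したいの", "ですが"],
]
p_f = train_bigram(learning_text)                            # plays model 106
p = train_bigram([strip_fillers(s) for s in learning_text])  # plays model 104
W = learning_text[0]
score = lm_score(W, p_f, p)
print(score)
```

Both models are trained from the same learning text; only the filler removal differs, which mirrors steps ST101 to ST103.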

【００５９】また、実施の形態１における言語モデル生
成方法を言語モデル生成プログラムとして記録媒体に記
録することもできる。この場合には、冗長語除去手段１
０２と同様の処理を行う冗長語除去手順と、冗長語を除
いた言語モデル生成手段１０３と同様の処理を行う冗長
語を除いた言語モデル生成手順と、冗長語を含む言語モ
デル生成手段１０５と同様の処理を行う冗長語を含む言
語モデル生成手順とから構成される言語モデル生成プロ
グラムを記録媒体に記録する。The language model generating method of Embodiment 1 can also be recorded on a recording medium as a language model generating program. In this case, the program recorded on the medium consists of a redundant word removing procedure that performs the same processing as the redundant word removing means 102, a procedure for generating a language model excluding redundant words that performs the same processing as the means 103 for generating a language model excluding redundant words, and a procedure for generating a language model including redundant words that performs the same processing as the means 105 for generating a language model including redundant words.

【００６０】以上のように、この実施の形態１における
言語モデル生成装置、言語モデル生成方法によれば、冗
長語を除いた学習用テキストを入力して冗長語を除いた
言語モデルを生成し、冗長語を含む学習用テキストを入
力して冗長語を含む言語モデルを生成するので、冗長語
を除いた言語モデルは冗長語の影響によるスパースネス
やゼロ頻度を軽減するので冗長語を含まない単語列に対
する生起確率の推定精度が高く、また冗長語を含む言語
モデルは冗長語を含む単語連鎖の確率を与える。従って
音声認識に冗長語を除いた言語モデルと、冗長語を含む
言語モデルの両方を用いることで高い認識率が得られる
効果がある。As described above, the language model generating device and language model generating method of Embodiment 1 generate a language model excluding redundant words from a learning text without redundant words, and a language model including redundant words from a learning text with redundant words. The language model excluding redundant words reduces the sparseness and zero-frequency problems caused by redundant words, so its estimates of the occurrence probability of word strings without redundant words are highly accurate, while the language model including redundant words gives the probabilities of word chains that contain redundant words. Using both language models for speech recognition therefore yields a high recognition rate.

【００６１】実施の形態２．図３はこの発明の実施の形
態２による言語モデル生成装置の構成を示すブロック図
である。図において、図１に示す実施の形態１と同一も
しくは相当部分は同一の符号で示し説明を省略する。２
０１はクラス化された冗長語を含む言語モデル生成手
段、２０２はクラス化された冗長語を含む言語モデルで
ある。Embodiment 2. FIG. 3 is a block diagram showing the configuration of a language model generating device according to Embodiment 2 of the present invention. In the figure, parts identical or corresponding to those of Embodiment 1 shown in FIG. 1 are denoted by the same reference numerals, and their description is omitted. Reference numeral 201 denotes a means for generating a language model including classified redundant words, and 202 denotes a language model including classified redundant words.

【００６２】図４はこの発明の実施の形態２による言語
モデル生成装置における言語モデル生成方法を示すフロ
ーチャートであり、以下これに従って動作を説明する。FIG. 4 is a flowchart showing a language model generating method in the language model generating device according to Embodiment 2 of the present invention, and the operation will be described below in accordance with it.

【００６３】ステップＳＴ２０１とステップＳＴ２０２
の処理は、実施の形態１の図２におけるステップＳＴ１
０１とステップＳＴ１０２の処理と同一である。The processing of steps ST201 and ST202 is identical to that of steps ST101 and ST102 in FIG. 2 of Embodiment 1.

【００６４】ステップＳＴ２０３において、クラス化さ
れた冗長語を含む言語モデル生成手段２０１は、学習用
テキスト１０１を入力してクラス化された冗長語を含む
言語モデル２０２を生成する。ここでクラスとは複数の
単語をグループとして扱うことである。冗長語のクラス
化は冗長語を１つのクラスｃ^fとする。このときの冗長
語とは例えば図１３に示した単語である。言語モデルが
N-gramモデルである場合は、単語列w_t+1-N ^t-1から冗長
語ｗ_tへ接続する確率は式(１０)で計算する。In step ST203, the means 201 for generating a language model including classified redundant words takes the learning text 101 as input and generates the language model 202 including classified redundant words. Here, a class means treating multiple words as a group. Redundant words are classified by placing all of them in a single class c^f. The redundant words in question are, for example, the words shown in FIG. 13. When the language model is an N-gram model, the probability of connecting from the word string w_{t+1-N}^{t-1} to the redundant word w_t is calculated by equation (10).

【００６５】[0065]

【数１０】 (Equation 10)
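The image for equation (10) is not reproduced in this text. From the two factors the description defines, the probability of connecting to the redundant word class and the probability of the redundant word within that class, the equation presumably has the standard class N-gram form below. This is a reconstruction from those definitions, not a copy of the original figure.

```latex
% Reconstructed form of equation (10): class N-gram factorization for a redundant word w_t
P\left(w_t \mid w_{t+1-N}^{\,t-1}\right)
  = P\left(c^f \mid w_{t+1-N}^{\,t-1}\right)\, P\left(w_t \mid c^f\right)
```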

【００６６】式(１０)において、Ｐ(ｃ^f|w_t+1-N ^t-1)は
単語列w_t+1-N ^t-1から冗長語のクラスｃ^fへ接続する確
率、Ｐ(w_t|c^f)は冗長語クラスｃ^fから冗長語ｗ_tが生起
する確率である。冗長語は、どの単語にも接続する可能
性があり、種類も多いのでスパースネスやゼロ頻度問題
を引き起こす原因となるが、冗長語をクラス化すること
で上記の問題を軽減でき、性能の高い言語モデルが生成
できる。In equation (10), P(c^f | w_{t+1-N}^{t-1}) is the probability of connecting from the word string w_{t+1-N}^{t-1} to the redundant word class c^f, and P(w_t | c^f) is the probability that the redundant word w_t occurs from the redundant word class c^f. Redundant words can connect to any word and come in many varieties, so they cause sparseness and zero-frequency problems; classifying the redundant words mitigates these problems and yields a higher-performance language model.
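The effect of classifying redundant words can be shown with a small 2-gram sketch. It assumes unsmoothed maximum-likelihood counts and maps fillers in the history to the class symbol as well; the text specifies neither, and the names `FILLER_CLASS`, `BOS`, and `train_class_bigram` are hypothetical. The toy sentences are made up.

```python
from collections import Counter

FILLER_CLASS = "<filler>"        # stands in for the class c^f
FILLERS = {"えーと", "あのー"}    # illustrative class membership
BOS = "<s>"                      # hypothetical sentence-start symbol

def to_class(w):
    return FILLER_CLASS if w in FILLERS else w

def train_class_bigram(sentences):
    """2-gram model in which all fillers share one class (ML counts, no smoothing)."""
    context, pair, member = Counter(), Counter(), Counter()
    for s in sentences:
        prev = BOS
        for w in s:
            c = to_class(w)
            context[prev] += 1
            pair[(prev, c)] += 1
            if c == FILLER_CLASS:
                member[w] += 1       # counts for P(w_t | c^f)
            prev = c                 # assumption: history also uses the class
    n_fillers = sum(member.values())
    def prob(prev, w):
        prev_c, c = to_class(prev), to_class(w)
        p_class = pair[(prev_c, c)] / context[prev_c]   # P(c | history)
        p_member = member[w] / n_fillers if c == FILLER_CLASS else 1.0
        return p_class * p_member
    return prob

sentences = [
    ["あのー", "明日", "から"],
    ["えーと", "明日", "から"],
    ["明日", "から", "えーと", "三泊"],
]
prob = train_class_bigram(sentences)
p_seen = prob(BOS, "あのー")      # (2/3) * (1/3)
p_unseen = prob("から", "あのー")  # pair never observed, yet nonzero
print(p_seen, p_unseen)
```

In this toy data "あのー" never follows "から", so a word-level 2-gram would assign that pair zero probability; the class model still gives it a nonzero value because "えーと" was seen in that position, which is the kind of sparseness reduction the paragraph above describes.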

【００６７】このクラス化された冗長語を含む言語モデ
ル２０２による単語列の生起確率計算の具体例について
述べる。例えば「[あのー]・明日・から・[えーと]・三
泊・したいの・ですが」という単語列Ｗがあった場合
に、クラス化された冗長語を含む言語モデルが2-gramで
あるならば、式(１１)のように生起確率を計算する。A concrete example of calculating the occurrence probability of a word string with the language model 202 including classified redundant words is described next. Given, for example, the word string W "[あのー]・明日・から・[えーと]・三泊・したいの・ですが", if the language model including classified redundant words is a 2-gram, the occurrence probability is calculated as in equation (11).

【００６８】[0068]

【数１１】 [Equation 11]

【００６９】音声認識を行う場合は、冗長語を除いた言
語モデル１０４とクラス化された冗長語を含む言語モデ
ル２０２の両方を用いて単語の生起確率を計算する。冗
長語を含む単語列をＷ、単語列Ｗから冗長語を除いた単
語列をＷ’とした場合、例えば式(１２)によって求め
た、対数をとった生起確率を言語モデルのスコアにす
る。When speech recognition is performed, word occurrence probabilities are calculated using both the language model 104 excluding redundant words and the language model 202 including classified redundant words. Let W be a word string including redundant words and W' the word string obtained by removing the redundant words from W; then, for example, the log occurrence probability obtained by equation (12) is used as the language model score.

【００７０】[0070]

【数１２】 (Equation 12)

【００７１】式(１２)においてＰ(Ｗ’)は冗長語を除い
た言語モデル１０４による単語列Ｗ’の生起確率、Ｐ^f _c
(Ｗ)はクラス化された冗長語を含む言語モデル２０２に
よる単語列Ｗの生起確率である。また、α₁、α₂は重み
係数である。In equation (12), P(W') is the occurrence probability of the word string W' given by the language model 104 excluding redundant words, and P^f_c(W) is the occurrence probability of the word string W given by the language model 202 including classified redundant words. α₁ and α₂ are weight coefficients.

【００７２】また、実施の形態２における言語モデル生
成方法を言語モデル生成プログラムとして記録媒体に記
録することもできる。この場合には、冗長語除去手段１
０２と同様の処理を行う冗長語除去手順と、冗長語を除
いた言語モデル生成手段１０３と同様の処理を行う冗長
語を除いた言語モデル生成手順と、クラス化された冗長
語を含む言語モデル生成手段２０１と同様の処理を行う
クラス化された冗長語を含む言語モデル生成手順とから
構成される言語モデル生成プログラムを記録媒体に記録
する。The language model generating method of Embodiment 2 can also be recorded on a recording medium as a language model generating program. In this case, the program recorded on the medium consists of a redundant word removing procedure that performs the same processing as the redundant word removing means 102, a procedure for generating a language model excluding redundant words that performs the same processing as the means 103 for generating a language model excluding redundant words, and a procedure for generating a language model including classified redundant words that performs the same processing as the means 201 for generating a language model including classified redundant words.

【００７３】以上のように、この実施の形態２における
言語モデル生成装置、言語モデル生成方法によれば、冗
長語を除いた学習用テキストを入力して冗長語を除いた
言語モデルを生成し、冗長語を含む学習用テキストを入
力してクラス化された冗長語を含む言語モデルを生成す
るので、冗長語を除いた言語モデルは冗長語の影響によ
るスパースネスやゼロ頻度を軽減するので冗長語含まな
い単語列に対する生起確率の推定精度が高く、またクラ
ス化された冗長語を含む言語モデルは冗長語を含む単語
連鎖の確率を与える。従って音声認識に冗長語を除いた
言語モデルと、クラス化された冗長語を含む言語モデル
の両方を用いることで高い認識率が得られる効果があ
る。As described above, the language model generating device and language model generating method of Embodiment 2 generate a language model excluding redundant words from a learning text without redundant words, and a language model including classified redundant words from a learning text with redundant words. The language model excluding redundant words reduces the sparseness and zero-frequency problems caused by redundant words, so its estimates of the occurrence probability of word strings without redundant words are highly accurate, while the language model including classified redundant words gives the probabilities of word chains that contain redundant words. Using both language models for speech recognition therefore yields a high recognition rate.

【００７４】実施の形態３．図５はこの発明の実施の形
態３による音声認識装置に構成を示すブロック図であ
る。図において、上記実施の形態および従来の装置と同
一もしくは相当部分は同一の符号で示し説明を省略す
る。３０１は照合手段である。Embodiment 3 FIG. 5 is a block diagram showing a configuration of a speech recognition apparatus according to Embodiment 3 of the present invention. In the figure, the same or corresponding parts as those of the above-described embodiment and the conventional apparatus are denoted by the same reference numerals, and description thereof is omitted. Reference numeral 301 denotes a matching unit.

【００７５】図６はこの発明の実施の形態３による音声
認識装置における音声認識方法を示すフローチャートで
あり、以下これに従って動作を説明する。FIG. 6 is a flowchart showing a speech recognition method in the speech recognition apparatus according to Embodiment 3 of the present invention, and the operation will be described below in accordance with the flowchart.

【００７６】音声特徴量抽出手段１００２はステップＳ
Ｔ３０１において認識対象音声１００１を入力し、ステ
ップＳＴ３０２において音声特徴量を抽出する。ここで
音声特徴量とは少ない情報量で音声の特徴を表すもので
あり、例えば文献１の５章で述べているようなケプスト
ラム、ケプストラムの動的特徴で構成する特徴ベクトル
である。The speech feature amount extraction means 1002 receives the recognition target speech 1001 in step ST301 and extracts the speech feature amount in step ST302. Here, the speech feature amount represents the characteristics of speech with a small amount of information; for example, it is a feature vector composed of the cepstrum and the dynamic features of the cepstrum, as described in Chapter 5 of Document 1.

【００７７】ステップＳＴ３０３において、照合手段３
０１は、冗長語を含む言語モデル１０６と、冗長語を除
いた言語モデル１０４と、音響モデル１００３を入力し
て認識対象音声１００１の音声特徴量に対して照合を行
い、最も照合スコアが高い単語列を音声認識結果１００
６として出力する。In step ST303, the matching means 301 takes as input the language model 106 including redundant words, the language model 104 excluding redundant words, and the acoustic model 1003, performs matching on the speech feature amount of the recognition target speech 1001, and outputs the word string with the highest matching score as the speech recognition result 1006.

【００７８】この場合の照合処理を具体的に説明する。
照合手段３０１は冗長語を含む言語モデル１０６、及び
冗長語を除いた言語モデル１０４が設定している認識対
象の単語 [V(1), V(2), ..., V(vn)] (vnは認識対象と
する単語数)の発音表記を認識ユニットラベル表記に変
換し、このラベルに従って音響モデル１００３に格納さ
れている音素ユニットのＨＭＭを連結し、認識対象単語
の標準パタン [λ_V(1),λ_V(2), ..., λ_V(vn)] を作成
する。そして音声特徴量抽出手段１００２の出力である
音声特徴量Ｏに対して認識対象単語の標準パタンを用い
て計算する単語列Ｗの音響スコアＰ(Ｏ|Ｗ)と、冗長語
を含む言語モデル１０６によって計算する単語列Ｗの生
起確率Ｐ_f(Ｗ)と、冗長語を除いた言語モデル１０４に
よって計算する単語列Wから冗長語を除いた単語列W’の
生起確率Ｐ(Ｗ’)によって照合スコアを求める。照合ス
コアは例えば式(１３)によって計算する。The matching processing in this case is described concretely. The matching means 301 converts the phonetic notation of the recognition target words [V(1), V(2), ..., V(vn)] (vn is the number of words to be recognized), which are defined by the language model 106 including redundant words and the language model 104 excluding redundant words, into recognition unit label notation; following these labels, it concatenates the HMMs of the phoneme units stored in the acoustic model 1003 and creates the standard patterns [λ_{V(1)}, λ_{V(2)}, ..., λ_{V(vn)}] of the recognition target words. The matching score is then obtained from three quantities: the acoustic score P(O | W) of a word string W, calculated using the standard patterns of the recognition target words against the speech feature amount O output by the speech feature amount extraction means 1002; the occurrence probability P_f(W) of the word string W, calculated by the language model 106 including redundant words; and the occurrence probability P(W') of the word string W' obtained by removing the redundant words from W, calculated by the language model 104 excluding redundant words. The matching score is calculated, for example, by equation (13).

【００７９】[0079]

【数１３】 (Equation 13)
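The image for equation (13) is not reproduced in this text. The description names three ingredients (the acoustic score, P_f(W), and P(W')) and two weights, so one plausible reading, by analogy with the log-domain language model score of equation (9), is the weighted log-linear sum below. The exact form, including whether the acoustic term carries its own weight, is an assumption; the symbol F is borrowed from the F_1(O, W) notation used in Embodiment 4.

```latex
% Assumed form of equation (13): matching score of word string W for features O
F(O, W) = \log P(O \mid W) + \alpha_1 \log P_f(W) + \alpha_2 \log P(W')
```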

【００８０】式(１３)においてα₁、α₂は重み係数であ
る。この照合スコアの値が最も大きい単語列 RW=[V(r
(1)), V(r(2)), ..., V(r(m))] が音声認識結果１００
６となる。ここでｒ(ｉ)は音声認識結果の単語系列のｉ
番目の単語の単語番号を示す。また、ｍは認識単語系列
の単語数を示す。In equation (13), α₁ and α₂ are weight coefficients. The word string RW = [V(r(1)), V(r(2)), ..., V(r(m))] with the largest matching score becomes the speech recognition result 1006. Here r(i) is the word number of the i-th word in the word sequence of the speech recognition result, and m is the number of words in the recognized word sequence.

【００８１】また、実施の形態３における音声認識方法
を音声認識プログラムとして記録媒体に記録することも
できる。この場合には実施の形態１の言語モデル生成プ
ログラムに加えて、音声特徴量抽出手段１００２と同様
の処理を実現する音声特徴量抽出手順と、照合手段３０
１と同様の処理を実現する照合手順とを含む音声認識プ
ログラムを記録媒体に記録する。The speech recognition method of Embodiment 3 can also be recorded on a recording medium as a speech recognition program. In this case, the speech recognition program recorded on the medium includes, in addition to the language model generating program of Embodiment 1, a speech feature amount extraction procedure that realizes the same processing as the speech feature amount extraction means 1002 and a matching procedure that realizes the same processing as the matching means 301.

【００８２】以上のように、この実施の形態３における
音声認識装置、音声認識方法によれば、冗長語を除いた
学習用テキストを入力して冗長語を除いた言語モデルを
生成し、冗長語を含む学習用テキストを入力して冗長語
を含む言語モデルを生成するので、冗長語を除いた言語
モデルは冗長語の影響によるスパースネスやゼロ頻度を
軽減するので冗長語を含まない単語列に対する生起確率
の推定精度が高く、また冗長語を含む言語モデルは冗長
語を含む単語連鎖の確率を与える。この冗長語を除いた
言語モデルと、冗長語を含む言語モデルの両方を音声認
識に用いるので高い認識率が得られる効果がある。As described above, the speech recognition device and speech recognition method of Embodiment 3 generate a language model excluding redundant words from a learning text without redundant words, and a language model including redundant words from a learning text with redundant words. The language model excluding redundant words reduces the sparseness and zero-frequency problems caused by redundant words, so its estimates of the occurrence probability of word strings without redundant words are highly accurate, while the language model including redundant words gives the probabilities of word chains that contain redundant words. Because both language models are used for speech recognition, a high recognition rate is obtained.

【００８３】実施の形態４．図７はこの発明の実施の形
態４による音声認識装置の構成を示すブロック図であ
る。図において、上記実施の形態および従来の装置と同
一もしくは相当部分は同一の符号で示し説明を省略す
る。４０１は第１の照合手段１、４０２は第２の照合手
段、４０３は音声認識結果候補である。Embodiment 4. FIG. 7 is a block diagram showing the configuration of a speech recognition device according to Embodiment 4 of the present invention. In the figure, parts identical or corresponding to those of the above embodiments and the conventional device are denoted by the same reference numerals, and their description is omitted. Reference numeral 401 denotes a first matching means, 402 a second matching means, and 403 the speech recognition result candidates.

【００８４】図８はこの発明の実施の形態４による音声
認識装置における音声認識方法を示すフローチャートで
あり、以下これに従って動作を説明する。FIG. 8 is a flowchart showing a speech recognition method in a speech recognition apparatus according to Embodiment 4 of the present invention, and the operation will be described below in accordance with the flowchart.

【００８５】ステップＳＴ４０１及びステップＳＴ４０
２の処理は実施の形態３における図６のステップＳＴ３
０１及びステップＳＴ３０２の処理と同一である。The processing of steps ST401 and ST402 is identical to that of steps ST301 and ST302 in FIG. 6 of Embodiment 3.

【００８６】ステップＳＴ４０３において、第１の照合
手段４０１は、冗長語を含む言語モデル１０６と、音響
モデル１００３とを入力して認識対象音声１００１の音
声特徴量に対して照合を行い、照合スコアが高い順に複
数の単語列を音声認識結果候補４０３として出力する。In step ST403, the first matching means 401 takes as input the language model 106 including redundant words and the acoustic model 1003, performs matching on the speech feature amount of the recognition target speech 1001, and outputs multiple word strings, in descending order of matching score, as the speech recognition result candidates 403.

【００８７】この場合の照合処理を具体的に説明する。
第１の照合手段４０１は冗長語を含む言語モデル１０６
が設定している認識対象の単語 [V(1), V(2), ..., V(v
n)](vnは認識対象とする単語数)の発音表記を認識ユニ
ットラベル表記に変換し、このラベルに従って音響モデ
ル１００３に格納されている音素ユニットのＨＭＭを連
結し、認識対象単語の標準パタン [λ_V(1), λ_V(2),
..., λ_V(vn)] を作成する。そして音声特徴量抽出手
段１００２の出力である音声特徴量Ｏに対して認識対象
単語の標準パタンを用いて計算する単語列Ｗの音響スコ
アＰ(Ｏ|Ｗ)と、冗長語を含む言語モデル１０６によっ
て計算する単語列Ｗの生起確率Ｐ_f(Ｗ)とによって照合
スコアを求める。照合スコアは例えば式(１４)によって
計算する。The matching processing in this case is described concretely. The first matching means 401 converts the phonetic notation of the recognition target words [V(1), V(2), ..., V(vn)] (vn is the number of words to be recognized), which are defined by the language model 106 including redundant words, into recognition unit label notation; following these labels, it concatenates the HMMs of the phoneme units stored in the acoustic model 1003 and creates the standard patterns [λ_{V(1)}, λ_{V(2)}, ..., λ_{V(vn)}] of the recognition target words. The matching score is then obtained from the acoustic score P(O | W) of a word string W, calculated using the standard patterns of the recognition target words against the speech feature amount O output by the speech feature amount extraction means 1002, and the occurrence probability P_f(W) of the word string W, calculated by the language model 106 including redundant words. The matching score is calculated, for example, by equation (14).

【００８８】[0088]

【数１４】 [Equation 14]

【００８９】ここでαは重み係数である。第１の照合手
段４０１では、この照合スコアＦ₁(Ｏ,Ｗ)の値が大きい
複数の単語列 RW₁, RW₂,... , RW_N (RW_k=[V_k(r_k(1)), V
_k(r_k(2)), ... , V_k(r_k(m_k))]) を音声認識結果候補４
０３として出力する。Here α is a weight coefficient. The first matching means 401 outputs, as the speech recognition result candidates 403, the multiple word strings RW_1, RW_2, ..., RW_N (RW_k = [V_k(r_k(1)), V_k(r_k(2)), ..., V_k(r_k(m_k))]) with the largest values of this matching score F_1(O, W).

【００９０】ステップＳＴ４０４において、第２の照合
手段４０２は、冗長語を含む言語モデル１０６と、冗長
語を除いた言語モデル１０４と、音響モデル１００３を
入力して、第１の照合手段４０１の出力である複数の音
声認識結果候補４０３の単語列に対し照合を行い、最も
照合スコアが高い単語列を音声認識結果１００６として
出力する。In step ST404, the second matching means 402 takes as input the language model 106 including redundant words, the language model 104 excluding redundant words, and the acoustic model 1003, performs matching on the word strings of the multiple speech recognition result candidates 403 output by the first matching means 401, and outputs the word string with the highest matching score as the speech recognition result 1006.

【００９１】この場合の照合処理を具体的に説明する。
第２の照合手段４０２は冗長語を含む言語モデル１０
６、及び冗長語を除いた言語モデル１０４が設定してい
る認識対象の単語 [V(1), V(2), ..., V(vn)] (vnは認
識対象とする単語数)の発音表記を認識ユニットラベル
表記に変換し、このラベルに従って音響モデル１００３
に格納されている音素ユニットのＨＭＭを連結し、認識
対象単語の標準パタン [λ _V(1), λ_V(2), ...,
λ_V(vn)] を作成する。そして音声特徴量抽出手段１０
０２の出力である音声特徴量Ｏに対して認識対象単語の
標準パタンを用いて計算する音声認識結果候補４０３の
単語列RW_k(k=1〜N、Nは候補数)の音響スコアＰ(Ｏ|RW _k)
と、冗長語を含む言語モデル１０６によって計算する音
声認識結果候補４０３の単語列RW_kの生起確率Ｐ_f(RW_k)
と、冗長語を除いた言語モデル１０６によって計算する
音声認識結果候補４０３の単語列RW_kから冗長語を除い
た単語列RW’_kの生起確率Ｐ(RW’_k)によって照合スコア
を求める。照合スコアは例えば式(１５)によって計算す
る。The matching processing in this case is described concretely. The second matching means 402 converts the phonetic notation of the recognition target words [V(1), V(2), ..., V(vn)] (vn is the number of words to be recognized), which are defined by the language model 106 including redundant words and the language model 104 excluding redundant words, into recognition unit label notation; following these labels, it concatenates the HMMs of the phoneme units stored in the acoustic model 1003 and creates the standard patterns [λ_{V(1)}, λ_{V(2)}, ..., λ_{V(vn)}] of the recognition target words. The matching score is then obtained from three quantities: the acoustic score P(O | RW_k) of each word string RW_k (k = 1 to N, where N is the number of candidates) among the speech recognition result candidates 403, calculated using the standard patterns of the recognition target words against the speech feature amount O output by the speech feature amount extraction means 1002; the occurrence probability P_f(RW_k) of the word string RW_k, calculated by the language model 106 including redundant words; and the occurrence probability P(RW'_k) of the word string RW'_k obtained by removing the redundant words from RW_k, calculated by the language model 104 excluding redundant words. The matching score is calculated, for example, by equation (15).

[0092]

[Equation 15]

[0093] In Equation (15), α_1 and α_2 are weight coefficients. The word string RW = [V(r(1)), V(r(2)), ..., V(r(m))] with the largest matching score becomes the speech recognition result 1006. Here, r(i) denotes the word number of the i-th word in the word sequence of the speech recognition result, and m denotes the number of words in the recognized word sequence.
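The second-pass selection described above can be sketched as follows. This is only an illustration under stated assumptions: it assumes the matching score is a log-linear combination of the three quantities named in the text (the acoustic score and the two language model probabilities, weighted by α_1 and α_2), which this passage does not spell out. The names `Candidate`, `FILLERS`, `rescore`, and the toy language model functions are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical inventory of redundant (filler) words, for illustration only.
FILLERS = {"uh", "um", "er"}


@dataclass
class Candidate:
    words: List[str]       # candidate word string RW_k
    acoustic_logp: float   # log P(O | RW_k) from the acoustic model


def remove_fillers(words: Sequence[str]) -> List[str]:
    """Return RW'_k: the word string with redundant words removed."""
    return [w for w in words if w not in FILLERS]


def rescore(
    candidates: Sequence[Candidate],
    logp_with_fillers: Callable[[Sequence[str]], float],     # log P_f(RW_k)
    logp_without_fillers: Callable[[Sequence[str]], float],  # log P(RW'_k)
    alpha1: float,
    alpha2: float,
) -> Candidate:
    """Pick the candidate with the highest combined score.

    The log-linear form of the score is an assumption of this sketch.
    """
    def score(c: Candidate) -> float:
        return (
            c.acoustic_logp
            + alpha1 * logp_with_fillers(c.words)
            + alpha2 * logp_without_fillers(remove_fillers(c.words))
        )

    return max(candidates, key=score)


# Toy stand-ins for the two language models (made-up values, not real data).
def toy_lm_with_fillers(words: Sequence[str]) -> float:
    return len(words) * math.log(0.1)


def toy_lm_without_fillers(words: Sequence[str]) -> float:
    return 0.0 if list(words) == ["turn", "left"] else math.log(1e-3)


nbest = [
    Candidate(["uh", "turn", "left"], acoustic_logp=-10.0),
    Candidate(["uh", "turn", "lift"], acoustic_logp=-9.5),
]
best = rescore(nbest, toy_lm_with_fillers, toy_lm_without_fillers, 1.0, 1.0)
```

In this toy setup the second candidate has the better acoustic score, but the filler-free model strongly prefers "turn left", so the first candidate is selected.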

[0094] The speech recognition method of Embodiment 4 can also be recorded on a recording medium as a speech recognition program. In this case, in addition to the language model generation program of Embodiment 1, the recording medium stores a speech recognition program that includes a speech feature extraction procedure implementing the same processing as the speech feature extraction means 1002, a first matching procedure implementing the same processing as the first matching means 401, and a second matching procedure implementing the same processing as the second matching means.

[0095] As described above, according to the speech recognition apparatus and speech recognition method of Embodiment 4, a language model excluding redundant words is generated from training text with the redundant words removed, and a language model including redundant words is generated from training text that includes them. Because the language model excluding redundant words reduces the sparseness and zero-frequency problems caused by redundant words, it estimates the occurrence probability of word strings without redundant words with high accuracy, while the language model including redundant words provides the probabilities of word chains that include redundant words. In speech recognition, speech recognition result candidates are output using the language model including redundant words, and the speech recognition result is then selected from those candidates using both the language model excluding redundant words and the language model including redundant words, so a high recognition rate is obtained.
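The sparseness argument can be illustrated with a small counting sketch. It is a toy illustration, not the patent's procedure: the filler list `FILLERS`, the helper names, and the three-sentence training text are all hypothetical, and plain bigram counts stand in for a full language model.

```python
from collections import Counter
from typing import Iterable, List

# Hypothetical list of redundant (filler) words, for illustration only.
FILLERS = {"uh", "um"}


def remove_fillers(sentence: List[str]) -> List[str]:
    """Produce the training sentence with redundant words removed."""
    return [w for w in sentence if w not in FILLERS]


def bigram_counts(corpus: Iterable[List[str]]) -> Counter:
    """Count adjacent word pairs over all sentences in the corpus."""
    counts: Counter = Counter()
    for sentence in corpus:
        counts.update(zip(sentence, sentence[1:]))
    return counts


# Made-up training text in which fillers interrupt the same word pair.
training_text = [
    ["go", "uh", "home"],
    ["go", "um", "home"],
    ["go", "home"],
]

# Counts for the model including redundant words.
with_fillers = bigram_counts(training_text)
# Counts for the model excluding redundant words.
without_fillers = bigram_counts(remove_fillers(s) for s in training_text)
```

With fillers, the evidence for the pair ("go", "home") is spread over five distinct bigrams, each seen once; after removal, the same pair is observed three times, which is the kind of count concentration the paragraph attributes to the model excluding redundant words.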

[0096] Embodiment 5. FIG. 9 is a block diagram showing the configuration of a speech recognition apparatus according to Embodiment 5 of the present invention. In the figure, parts identical or equivalent to those of the above embodiments and the conventional apparatus are denoted by the same reference numerals, and their description is omitted. FIG. 10 is a flowchart showing the speech recognition method in the speech recognition apparatus according to Embodiment 5 of the present invention, and the operation is described below with reference to it.

[0097] Steps ST501 and ST502 are identical to the processing of steps ST301 and ST302 of FIG. 6 in Embodiment 3.

[0098] In step ST503, the matching means 301 takes as input the language model 202 including classified redundant words, the language model 104 excluding redundant words, and the acoustic model 1003, performs matching on the speech feature of the recognition target speech 1001, and outputs the word string with the highest matching score as the speech recognition result 1006.

[0099] The matching process in this case is described in detail. The matching means 301 converts the phonetic notation of the recognition target words [V(1), V(2), ..., V(vn)] (vn is the number of recognition target words) defined by the language model 202 including classified redundant words and the language model 104 excluding redundant words into recognition unit label notation and, following these labels, concatenates the HMMs of the phoneme units stored in the acoustic model 1003 to create the standard patterns [λ_{V(1)}, λ_{V(2)}, ..., λ_{V(vn)}] of the recognition target words. It then obtains a matching score from three quantities: the acoustic score P(O|W) of a word string W, computed with the standard patterns of the recognition target words for the speech feature O output by the speech feature extraction means 1002; the occurrence probability P_f^c(W) of the word string W, computed with the language model 202 including classified redundant words; and the occurrence probability P(W') of the word string W' obtained by removing the redundant words from W, computed with the language model 104 excluding redundant words. The matching score is calculated by, for example, Equation (16).

[0100]

[Equation 16]

[0101] In Equation (16), α_1 and α_2 are weight coefficients. The word string RW = [V(r(1)), V(r(2)), ..., V(r(m))] with the largest matching score becomes the speech recognition result 1006. Here, r(i) denotes the word number of the i-th word in the word sequence of the speech recognition result, and m denotes the number of words in the recognized word sequence.
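How a language model with classified redundant words can assign a probability to a word chain containing a filler may be sketched as below. The construction of that model belongs to Embodiment 2 and is not reproduced in this passage, so the sketch assumes the standard class bigram factorization P(w_i | w_{i-1}) = P(c_i | c_{i-1}) · P(w_i | c_i), with every filler mapped to one class and every other word forming its own class. The names `FILLERS`, `FILLER_CLASS`, and `ClassBigram` are hypothetical, and the estimates are unsmoothed relative frequencies.

```python
from collections import Counter
from typing import List

FILLERS = {"uh", "um"}      # hypothetical redundant-word list
FILLER_CLASS = "<FILLER>"   # hypothetical class label shared by all fillers


def to_class(word: str) -> str:
    """Map fillers to the shared class; other words are their own class."""
    return FILLER_CLASS if word in FILLERS else word


class ClassBigram:
    """Unsmoothed class bigram model (an assumed form, for illustration)."""

    def __init__(self, corpus: List[List[str]]) -> None:
        self.class_bigrams: Counter = Counter()   # counts of (c_prev, c)
        self.history_counts: Counter = Counter()  # counts of c_prev as history
        self.word_counts: Counter = Counter()     # counts of each word
        self.class_counts: Counter = Counter()    # counts of each class
        for sentence in corpus:
            classes = [to_class(w) for w in sentence]
            self.class_bigrams.update(zip(classes, classes[1:]))
            self.history_counts.update(classes[:-1])
            self.word_counts.update(sentence)
            self.class_counts.update(classes)

    def prob(self, prev: str, word: str) -> float:
        """P(word | prev) = P(class(word) | class(prev)) * P(word | class(word))."""
        c_prev, c = to_class(prev), to_class(word)
        if self.history_counts[c_prev] == 0 or self.class_counts[c] == 0:
            return 0.0
        p_class = self.class_bigrams[(c_prev, c)] / self.history_counts[c_prev]
        p_word = self.word_counts[word] / self.class_counts[c]
        return p_class * p_word


# Made-up training text including redundant words.
corpus = [["go", "uh", "home"], ["go", "um", "home"], ["go", "uh", "home"]]
model = ClassBigram(corpus)
```

Because "uh" and "um" share one class, the transition from "go" into a filler is estimated from all three sentences, and only the choice of the particular filler is estimated from the per-word counts.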

[0102] The speech recognition method of Embodiment 5 can also be recorded on a recording medium as a speech recognition program. In this case, in addition to the language model generation program of Embodiment 2, the recording medium stores a speech recognition program that includes a speech feature extraction procedure implementing the same processing as the speech feature extraction means 1002 and a matching procedure implementing the same processing as the matching means 301.

[0103] As described above, according to the speech recognition apparatus and speech recognition method of Embodiment 5, a language model excluding redundant words is generated from training text with the redundant words removed, and a language model including classified redundant words is generated from training text that includes redundant words. Because the language model excluding redundant words reduces the sparseness and zero-frequency problems caused by redundant words, it estimates the occurrence probability of word strings without redundant words with high accuracy, while the language model including classified redundant words provides the probabilities of word chains that include redundant words. Because both the language model excluding redundant words and the language model including classified redundant words are used for speech recognition, a high recognition rate is obtained.

[0104] Embodiment 6. FIG. 11 is a block diagram showing the configuration of a speech recognition apparatus according to Embodiment 6 of the present invention. In the figure, parts identical or equivalent to those of the above embodiments and the conventional apparatus are denoted by the same reference numerals, and their description is omitted. FIG. 12 is a flowchart showing the speech recognition method in the speech recognition apparatus according to Embodiment 6 of the present invention, and the operation is described below with reference to it.

[0105] Steps ST601 and ST602 are identical to the processing of steps ST401 and ST402 of FIG. 8 in Embodiment 4.

[0106] In step ST603, the first matching means 401 takes as input the language model 202 including classified redundant words and the acoustic model 1003, performs matching on the speech feature of the recognition target speech 1001, and outputs a plurality of word strings, in descending order of matching score, as the speech recognition result candidates 403.

[0107] The matching process in this case is described in detail. The first matching means 401 converts the phonetic notation of the recognition target words [V(1), V(2), ..., V(vn)] (vn is the number of recognition target words) defined by the language model 202 including classified redundant words into recognition unit label notation and, following these labels, concatenates the HMMs of the phoneme units stored in the acoustic model 1003 to create the standard patterns [λ_{V(1)}, λ_{V(2)}, ..., λ_{V(vn)}] of the recognition target words. It then obtains a matching score from the acoustic score P(O|W) of a word string W, computed with the standard patterns of the recognition target words for the speech feature O output by the speech feature extraction means 1002, and the occurrence probability P_f^c(W) of the word string W, computed with the language model 202 including classified redundant words. The matching score is calculated by, for example, Equation (17).

[0108]

[Equation 17]

[0109] Here, α is a weight coefficient. The first matching means 401 outputs, as the speech recognition result candidates 403, a plurality of word strings RW_1, RW_2, ..., RW_N (RW_k = [V_k(r_k(1)), V_k(r_k(2)), ..., V_k(r_k(m_k))]) for which the value of the matching score F_1(O, W) is large.
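Selecting the N highest-scoring word strings in the first pass can be sketched as follows. The sketch assumes F_1(O, W) = log P(O|W) + α · log P_f^c(W), a form this passage does not state, and it scores an already enumerated list of hypotheses, whereas a real decoder would produce candidates during its search. `Hypothesis`, `first_pass`, and the score values are hypothetical.

```python
import heapq
from typing import List, NamedTuple


class Hypothesis(NamedTuple):
    words: List[str]      # word string W
    acoustic_logp: float  # log P(O | W)
    lm_logp: float        # log P_f^c(W) from the model with classified fillers


def first_pass(hypotheses: List[Hypothesis], alpha: float, n: int) -> List[Hypothesis]:
    """Return the n hypotheses with the largest F_1, best first.

    F_1 is taken to be acoustic_logp + alpha * lm_logp (an assumption).
    """
    def f1(h: Hypothesis) -> float:
        return h.acoustic_logp + alpha * h.lm_logp

    return heapq.nlargest(n, hypotheses, key=f1)


# Made-up hypotheses and scores, for illustration only.
hypotheses = [
    Hypothesis(["uh", "turn", "left"], acoustic_logp=-10.0, lm_logp=-7.0),
    Hypothesis(["turn", "lift"], acoustic_logp=-9.0, lm_logp=-12.0),
    Hypothesis(["earn", "left"], acoustic_logp=-14.0, lm_logp=-9.0),
]
candidates = first_pass(hypotheses, alpha=1.0, n=2)
```

The returned list plays the role of the speech recognition result candidates 403 that are handed to the second matching means.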

[0110] In step ST604, the second matching means 402 takes as input the language model 202 including classified redundant words, the language model 104 excluding redundant words, and the acoustic model 1003, performs matching on the word strings of the plurality of speech recognition result candidates 403 output by the first matching means 401, and outputs the word string with the highest matching score as the speech recognition result 1006.

[0111] The matching process in this case is described in detail. The second matching means 402 converts the phonetic notation of the recognition target words [V(1), V(2), ..., V(vn)] (vn is the number of recognition target words) defined by the language model 202 including classified redundant words and the language model 104 excluding redundant words into recognition unit label notation and, following these labels, concatenates the HMMs of the phoneme units stored in the acoustic model 1003 to create the standard patterns [λ_{V(1)}, λ_{V(2)}, ..., λ_{V(vn)}] of the recognition target words. It then obtains a matching score from three quantities: the acoustic score P(O|RW_k) of the word string RW_k (k = 1 to N, where N is the number of candidates) of the speech recognition result candidates 403, computed with the standard patterns of the recognition target words for the speech feature O output by the speech feature extraction means 1002; the occurrence probability P_f^c(RW_k) of the word string RW_k, computed with the language model 202 including classified redundant words; and the occurrence probability P(RW'_k) of the word string RW'_k obtained by removing the redundant words from RW_k, computed with the language model 104 excluding redundant words. The matching score is calculated by, for example, Equation (18).

[0112]

[Equation 18]

[0113] In Equation (18), α_1 and α_2 are weight coefficients. The word string RW = [V(r(1)), V(r(2)), ..., V(r(m))] with the largest matching score becomes the speech recognition result 1006. Here, r(i) denotes the word number of the i-th word in the word sequence of the speech recognition result, and m denotes the number of words in the recognized word sequence.
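The equations themselves are not reproduced in this text, so their exact form cannot be confirmed here. One form that is consistent with the quantities and weights named in the description of Embodiment 6 is the usual log-linear combination shown below. The logarithms, the additive structure, and the symbol F_2 are assumptions of this sketch, not taken from the source.

```latex
% Assumed first-pass score (acoustic score plus weighted class-based LM score)
F_1(O, W) = \log P(O \mid W) + \alpha \, \log P_f^{c}(W)

% Assumed second-pass score for candidate RW_k (F_2 is a hypothetical name)
F_2(O, RW_k) = \log P(O \mid RW_k)
             + \alpha_1 \, \log P_f^{c}(RW_k)
             + \alpha_2 \, \log P(RW'_k)

% Final result: the candidate with the largest second-pass score
RW = \operatorname*{arg\,max}_{k \in \{1, \dots, N\}} F_2(O, RW_k)
```

Under this reading, α_1 controls how much the model that covers redundant words contributes, and α_2 controls how much the model trained on text without redundant words contributes.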

[0114] The speech recognition method of Embodiment 6 can also be recorded on a recording medium as a speech recognition program. In this case, in addition to the language model generation program of Embodiment 2, the recording medium stores a speech recognition program that includes a speech feature extraction procedure implementing the same processing as the speech feature extraction means 1002, a first matching procedure implementing the same processing as the first matching means 401, and a second matching procedure implementing the same processing as the second matching means.

[0115] As described above, according to the speech recognition apparatus and speech recognition method of Embodiment 6, a language model excluding redundant words is generated from training text with the redundant words removed, and a language model including classified redundant words is generated from training text that includes redundant words. Because the language model excluding redundant words reduces the sparseness and zero-frequency problems caused by redundant words, it estimates the occurrence probability of word strings without redundant words with high accuracy, while the language model including classified redundant words provides the probabilities of word chains that include redundant words. In speech recognition, speech recognition result candidates are output using the language model including classified redundant words, and the speech recognition result is then selected from those candidates using both the language model excluding redundant words and the language model including classified redundant words, so a high recognition rate is obtained.

[0116]

[Effects of the Invention] As described above, according to the language model generation apparatus, method, and storage medium of Embodiment 1 of the present invention, a language model excluding redundant words is generated from training text with the redundant words removed, and a language model including redundant words is generated from training text that includes them. Because the language model excluding redundant words reduces the sparseness and zero-frequency problems caused by redundant words, it estimates the occurrence probability of word strings without redundant words with high accuracy, while the language model including redundant words provides the probabilities of word chains that include redundant words. Therefore, using both the language model excluding redundant words and the language model including redundant words for speech recognition yields a high recognition rate.

[0117] According to the language model generation apparatus, method, and storage medium of Embodiment 2 of the present invention, a language model excluding redundant words is generated from training text with the redundant words removed, and a language model including classified redundant words is generated from training text that includes redundant words. Because the language model excluding redundant words reduces the sparseness and zero-frequency problems caused by redundant words, it estimates the occurrence probability of word strings without redundant words with high accuracy, while the language model including classified redundant words provides the probabilities of word chains that include redundant words. Therefore, using both the language model excluding redundant words and the language model including classified redundant words for speech recognition yields a high recognition rate.

[0118] According to the speech recognition apparatus, method, and storage medium of Embodiment 3 of the present invention, a language model excluding redundant words is generated from training text with the redundant words removed, and a language model including redundant words is generated from training text that includes them. Because the language model excluding redundant words reduces the sparseness and zero-frequency problems caused by redundant words, it estimates the occurrence probability of word strings without redundant words with high accuracy, while the language model including redundant words provides the probabilities of word chains that include redundant words. Because both the language model excluding redundant words and the language model including redundant words are used for speech recognition, a high recognition rate is obtained.

[0119] According to the speech recognition apparatus, method, and storage medium of Embodiment 4 of the present invention, a language model excluding redundant words is generated from training text with the redundant words removed, and a language model including redundant words is generated from training text that includes them. Because the language model excluding redundant words reduces the sparseness and zero-frequency problems caused by redundant words, it estimates the occurrence probability of word strings without redundant words with high accuracy, while the language model including redundant words provides the probabilities of word chains that include redundant words. In speech recognition, speech recognition result candidates are output using the language model including redundant words, and the speech recognition result is then selected from those candidates using both the language model excluding redundant words and the language model including redundant words, so a high recognition rate is obtained.

[0120] According to the speech recognition apparatus, method, and storage medium of Embodiment 5 of the present invention, a language model excluding redundant words is generated from training text with the redundant words removed, and a language model including classified redundant words is generated from training text that includes redundant words. Because the language model excluding redundant words reduces the sparseness and zero-frequency problems caused by redundant words, it estimates the occurrence probability of word strings without redundant words with high accuracy, while the language model including classified redundant words provides the probabilities of word chains that include redundant words. Because both the language model excluding redundant words and the language model including classified redundant words are used for speech recognition, a high recognition rate is obtained.

[0121] According to the speech recognition apparatus, method, and storage medium of Embodiment 6 of the present invention, a language model excluding redundant words is generated from training text with the redundant words removed, and a language model including classified redundant words is generated from training text that includes redundant words. Because the language model excluding redundant words reduces the sparseness and zero-frequency problems caused by redundant words, it estimates the occurrence probability of word strings without redundant words with high accuracy, while the language model including classified redundant words provides the probabilities of word chains that include redundant words. In speech recognition, speech recognition result candidates are output using the language model including classified redundant words, and the speech recognition result is then selected from those candidates using both the language model excluding redundant words and the language model including classified redundant words, so a high recognition rate is obtained.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of a language model generation apparatus according to Embodiment 1 of the present invention.

FIG. 2 is a flowchart showing the language model generation method in the language model generation apparatus according to Embodiment 1 of the present invention.

FIG. 3 is a block diagram showing the configuration of a language model generation apparatus according to Embodiment 2 of the present invention.

FIG. 4 is a flowchart showing the language model generation method in the language model generation apparatus according to Embodiment 2 of the present invention.

FIG. 5 is a block diagram showing the configuration of a speech recognition apparatus according to Embodiment 3 of the present invention.

FIG. 6 is a flowchart showing the speech recognition method in the speech recognition apparatus according to Embodiment 3 of the present invention.

FIG. 7 is a block diagram showing the configuration of a speech recognition apparatus according to Embodiment 4 of the present invention.

FIG. 8 is a flowchart showing the speech recognition method in the speech recognition apparatus according to Embodiment 4 of the present invention.

FIG. 9 is a block diagram showing the configuration of a speech recognition apparatus according to Embodiment 5 of the present invention.

FIG. 10 is a flowchart showing the speech recognition method in the speech recognition apparatus according to Embodiment 5 of the present invention.

FIG. 11 is a block diagram showing the configuration of a speech recognition apparatus according to Embodiment 6 of the present invention.

FIG. 12 is a flowchart showing the speech recognition method in the speech recognition apparatus according to Embodiment 6 of the present invention.

FIG. 13 is a diagram showing examples of redundant words that appear frequently.

FIG. 14 is a block diagram showing the configuration of a conventional speech recognition apparatus.

[Explanation of Reference Numerals]

101: training text; 102: redundant word removal means; 103: means for generating the language model excluding redundant words; 104: language model excluding redundant words; 105: means for generating the language model including redundant words; 106: language model including redundant words; 201: means for generating the language model including classified redundant words; 202: language model including classified redundant words; 301: matching means; 401: first matching means; 402: second matching means; 403: speech recognition result candidates; 1001: recognition target speech; 1002: speech feature extraction means; 1003: acoustic model; 1006: speech recognition result.

Continued from the front page. (51) Int.Cl.7, identification symbol, FI, theme code (reference): G10L 15/14 G10L 3/00 535A 15/18 537D 15/28 5/06 D

Claims

[Claims]

[Claim 1] A language model generation apparatus that takes as input training text including redundant words and generates a language model for obtaining the occurrence probability of a word string, comprising: means for generating a language model including redundant words, which takes the training text including redundant words as input and generates a language model for obtaining the occurrence probability of a word string including the redundant words; redundant word removal means for removing the redundant words from the training text including redundant words to generate training text excluding redundant words; and means for generating a language model excluding redundant words, which takes the training text excluding redundant words as input and generates a language model for obtaining the occurrence probability of a word string excluding the redundant words.

[Claim 2] A language model generation apparatus that takes as input training text including redundant words and generates a language model for obtaining the occurrence probability of a word string, comprising: means for generating a language model including classified redundant words, which takes the training text including redundant words as input, classifies the redundant words, and generates a language model for obtaining the occurrence probability of a word string including the classified redundant words; redundant word removal means for removing the redundant words from the training text including redundant words to generate training text excluding redundant words; and means for generating a language model excluding redundant words, which takes the training text excluding redundant words as input and generates a language model for obtaining the occurrence probability of a word string excluding the redundant words.

[Claim 3] A speech recognition apparatus that takes recognition target speech as input, performs speech recognition, and outputs a speech recognition result, comprising: speech feature extraction means for taking the recognition target speech as input and extracting speech features; an acoustic model for obtaining the probability of a sequence of the speech features; the language model including redundant words and the language model excluding redundant words according to claim 1; and matching means for performing matching on the speech features extracted by the speech feature extraction means, using the acoustic model, the language model including redundant words, and the language model excluding redundant words, and outputting a speech recognition result.

【請求項４】認識対象音声を入力して音声認識を行い
音声認識結果を出力する音声認識装置であって、上記認識対象音声を入力し音声特徴量を抽出する音声特
徴量抽出手段と、上記音声特徴量の系列の確率を求めるための音響モデル
と、請求項１に記載の上記冗長語を含む言語モデルおよび冗
長語を除いた言語モデルと、上記音響モデルと上記冗長語を含む言語モデルとを用い
て、上記音声特徴量抽出手段が抽出した音声特徴量に対
して照合を行い複数の音声認識結果候補を出力する第１
の照合手段と、この第１の照合手段が出力した複数の音声認識結果候補
に対して、上記冗長語を含む言語モデルと上記冗長語を
除いた言語モデルとを用いて、照合を行い音声認識結果
を出力する第２の照合手段と、を備えたことを特徴とする音声認識装置。
4. A speech recognition apparatus that receives speech to be recognized, performs speech recognition, and outputs a speech recognition result, comprising: speech feature amount extraction means that receives the speech to be recognized and extracts speech feature amounts; an acoustic model for obtaining the probability of a sequence of the speech feature amounts; the language model including redundant words and the language model excluding redundant words according to claim 1; first matching means that matches the speech feature amounts extracted by the speech feature amount extraction means using the acoustic model and the language model including redundant words, and outputs a plurality of speech recognition result candidates; and second matching means that matches the plurality of speech recognition result candidates output by the first matching means using the language model including redundant words and the language model excluding redundant words, and outputs a speech recognition result.
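The two-stage matching of claim 4 resembles N-best rescoring: a first pass proposes several candidates, and a second pass re-ranks them with both language models. The sketch below illustrates that control flow only. The function names, the candidate count, the equal weighting in the second pass, and all score values are hypothetical and are not taken from the specification.

```python
import heapq


def first_pass(hypotheses, first_score, n_best=2):
    """First matching: keep the n best candidates by the first-pass score."""
    return heapq.nlargest(n_best, hypotheses, key=first_score)


def second_pass(candidates, rescore):
    """Second matching: pick the best candidate under the second-pass score."""
    return max(candidates, key=rescore)


# Toy log scores per hypothesis, invented for illustration only.
# `lm_without` stands for the score of the hypothesis after filler removal.
acoustic = {"uh tea please": -3.0, "a tea please": -2.5, "tea peas": -6.0}
lm_with = {"uh tea please": -2.0, "a tea please": -2.0, "tea peas": -4.0}
lm_without = {"uh tea please": -1.0, "a tea please": -4.0, "tea peas": -5.0}

# First pass: acoustic model + filler-inclusive language model.
candidates = first_pass(list(acoustic), lambda h: acoustic[h] + lm_with[h])

# Second pass: both language models (equal weights are an assumption).
best = second_pass(
    candidates,
    lambda h: acoustic[h] + 0.5 * (lm_with[h] + lm_without[h]),
)
```

With these toy numbers the first pass ranks "a tea please" highest, while the second pass, which also consults the filler-exclusive model, selects "uh tea please".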

【請求項５】認識対象音声を入力して音声認識を行い
音声認識結果を出力する音声認識装置であって、上記認識対象音声を入力し音声特徴量を抽出する音声特
徴量抽出手段と、上記音声特徴量の系列の確率を求めるための音響モデル
と、請求項２に記載の上記クラス化された冗長語を含む言語
モデルおよび冗長語を除いた言語モデルと、上記音響モデル、上記クラス化された冗長語を含む言語
モデルおよび上記冗長語を除いた言語モデルとを用い
て、上記音声特徴量抽出手段が抽出した音声特徴量に対
して照合を行い音声認識結果を出力する照合手段と、を備えたことを特徴とする音声認識装置。
5. A speech recognition apparatus that receives speech to be recognized, performs speech recognition, and outputs a speech recognition result, comprising: speech feature amount extraction means that receives the speech to be recognized and extracts speech feature amounts; an acoustic model for obtaining the probability of a sequence of the speech feature amounts; the language model including classified redundant words and the language model excluding redundant words according to claim 2; and matching means that matches the speech feature amounts extracted by the speech feature amount extraction means using the acoustic model, the language model including classified redundant words, and the language model excluding redundant words, and outputs a speech recognition result.

【請求項６】認識対象音声を入力して音声認識を行い
音声認識結果を出力する音声認識装置であって、上記認識対象音声を入力し音声特徴量を抽出する音声特
徴量抽出手段と、上記音声特徴量の系列の確率を求めるための音響モデル
と、請求項２に記載の上記クラス化された冗長語を含む言語
モデルおよび冗長語を除いた言語モデルと、上記音響モデルと上記クラス化された冗長語を含む言語
モデルとを用いて、上記音声特徴量抽出手段が抽出した
音声特徴量に対して照合を行い複数の音声認識結果候補
を出力する第１の照合手段と、この第１の照合手段が出力した複数の音声認識結果候補
に対して、上記クラス化された冗長語を含む言語モデル
と上記冗長語を除いた言語モデルとを用いて、照合を行
い音声認識結果を出力する第２の照合手段と、を備えたことを特徴とする音声認識装置。
6. A speech recognition apparatus that receives speech to be recognized, performs speech recognition, and outputs a speech recognition result, comprising: speech feature amount extraction means that receives the speech to be recognized and extracts speech feature amounts; an acoustic model for obtaining the probability of a sequence of the speech feature amounts; the language model including classified redundant words and the language model excluding redundant words according to claim 2; first matching means that matches the speech feature amounts extracted by the speech feature amount extraction means using the acoustic model and the language model including classified redundant words, and outputs a plurality of speech recognition result candidates; and second matching means that matches the plurality of speech recognition result candidates output by the first matching means using the language model including classified redundant words and the language model excluding redundant words, and outputs a speech recognition result.

【請求項７】冗長語を含む学習用テキストから、単語
列の生起確率を求める言語モデルを生成する言語モデル
生成方法であって、上記冗長語を含む学習用テキストから冗長語も含めて単
語列の生起確率を求める言語モデルを生成する冗長語を
含む言語モデル生成工程と、上記冗長語を含む学習用テキストから冗長語を取り除
き、冗長語を除いた学習用テキストを生成する冗長語除
去工程と、上記冗長語を除いた学習用テキストから、冗長語を除い
た単語列の生起確率を求める言語モデルを生成する冗長
語を除いた言語モデル生成工程と、を備えたことを特徴とする言語モデル生成方法。
7. A language model generation method for generating, from a learning text including redundant words, a language model for obtaining the occurrence probability of a word string, comprising: a redundant-word-inclusive language model generation step of generating, from the learning text including redundant words, a language model for obtaining the occurrence probability of a word string including the redundant words; a redundant word removal step of removing the redundant words from the learning text including redundant words to generate a learning text excluding redundant words; and a redundant-word-exclusive language model generation step of generating, from the learning text excluding redundant words, a language model for obtaining the occurrence probability of a word string excluding redundant words.
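The three steps of claim 7 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of the specification: the claim does not fix the model type, so an unsmoothed maximum-likelihood bigram model is assumed, and the filler set `REDUNDANT_WORDS` and the toy corpus are hypothetical.

```python
from collections import Counter

# Hypothetical filler inventory; the claims do not enumerate the redundant words.
REDUNDANT_WORDS = {"uh", "um"}


def remove_redundant_words(sentences, redundant=REDUNDANT_WORDS):
    """Redundant word removal step: drop fillers from every sentence."""
    return [[w for w in s if w not in redundant] for s in sentences]


def train_bigram(sentences):
    """Maximum-likelihood bigram table {(prev, word): p}, without smoothing.

    The bigram form is an assumption chosen for brevity.
    """
    context_counts, pair_counts = Counter(), Counter()
    for s in sentences:
        padded = ["<s>"] + s + ["</s>"]
        context_counts.update(padded[:-1])
        pair_counts.update(zip(padded[:-1], padded[1:]))
    return {pair: n / context_counts[pair[0]] for pair, n in pair_counts.items()}


# Toy learning text, invented for illustration only.
corpus = [["uh", "i", "want", "um", "coffee"], ["i", "want", "tea"]]

lm_with = train_bigram(corpus)                             # includes redundant words
lm_without = train_bigram(remove_redundant_words(corpus))  # excludes redundant words
```

In the filler-inclusive table the context "want" is split between "um" and "tea", whereas in the filler-exclusive table it is split between "coffee" and "tea", so the second model captures the word-to-word dependency that the filler interrupts.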

【請求項８】冗長語を含む学習用テキストから、単語
列の生起確率を求める言語モデルを生成する言語モデル
生成方法であって、上記冗長語を含む学習用テキストの冗長語をクラス化
し、クラス化された冗長語も含めて単語列の生起確率を
求める言語モデルを生成するクラス化された冗長語を含
む言語モデル生成工程と、上記冗長語を含む学習用テキストから冗長語を取り除
き、冗長語を除いた学習用テキストを生成する冗長語除
去工程と、上記冗長語を除いた学習用テキストから、冗長語を除い
た単語列の生起確率を求める言語モデルを生成する冗長
語を除いた言語モデル生成工程と、を備えたことを特徴とする言語モデル生成方法。
8. A language model generation method for generating, from a learning text including redundant words, a language model for obtaining the occurrence probability of a word string, comprising: a classified-redundant-word-inclusive language model generation step of classifying the redundant words in the learning text including redundant words and generating a language model for obtaining the occurrence probability of a word string including the classified redundant words; a redundant word removal step of removing the redundant words from the learning text including redundant words to generate a learning text excluding redundant words; and a redundant-word-exclusive language model generation step of generating, from the learning text excluding redundant words, a language model for obtaining the occurrence probability of a word string excluding redundant words.
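The classification in claim 8 can be pictured as replacing every redundant word with a shared class token before the model is trained, so that sparse counts for individual fillers are pooled. The sketch below shows only that preprocessing. The single class label `FILLER_CLASS`, the filler set, and the within-class probability table are assumptions; the claim states only that the redundant words are classified.

```python
from collections import Counter

# Hypothetical filler inventory and class label, introduced for illustration.
REDUNDANT_WORDS = {"uh", "um", "er"}
FILLER_CLASS = "<FILLER>"


def classify_redundant_words(sentences, redundant=REDUNDANT_WORDS):
    """Replace each filler with the class token.

    Also returns an estimate of P(word | class) from relative frequencies,
    which is an assumed way to recover individual fillers from the class.
    """
    member_counts = Counter(w for s in sentences for w in s if w in redundant)
    total = sum(member_counts.values())
    membership = {w: n / total for w, n in member_counts.items()} if total else {}
    classed = [[FILLER_CLASS if w in redundant else w for w in s]
               for s in sentences]
    return classed, membership


# Toy learning text, invented for illustration only.
corpus = [["uh", "i", "want", "um", "coffee"], ["uh", "tea"]]
classed, membership = classify_redundant_words(corpus)
```

The classed sentences can then be passed to whatever language model trainer is in use, with `<FILLER>` treated as an ordinary vocabulary item.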

【請求項９】認識対象音声の音声認識を行う音声認識
方法であって、上記認識対象音声から音声特徴量を抽出する音声特徴量
抽出工程と、上記音声特徴量の系列の確率を求めるための音響モデ
ル、請求項７に記載の上記冗長語を含む言語モデルおよ
び冗長語を除いた言語モデルを用いて、上記音声特徴量
抽出工程で抽出した音声特徴量に対して照合を行い音声
認識を行う照合工程と、を備えたことを特徴とする音声認識方法。
9. A speech recognition method for performing speech recognition on speech to be recognized, comprising: a speech feature amount extraction step of extracting speech feature amounts from the speech to be recognized; and a matching step of performing speech recognition by matching the speech feature amounts extracted in the speech feature amount extraction step using an acoustic model for obtaining the probability of a sequence of the speech feature amounts, and the language model including redundant words and the language model excluding redundant words according to claim 7.

【請求項１０】認識対象音声の音声認識を行う音声認
識方法であって、上記認識対象音声から音声特徴量を抽出する音声特徴量
抽出工程と、上記音声特徴量の系列の確率を求めるための音響モデル
と請求項７に記載の上記冗長語を含む言語モデルとを用
いて、上記音声特徴量抽出工程で抽出した音声特徴量に
対して照合を行い複数の音声認識結果候補を求める第１
の照合工程と、この第１の照合工程で求められた複数の音声認識結果候
補に対して、請求項７に記載の上記冗長語を含む言語モ
デルと冗長語を除いた言語モデルとを用いて、照合を行
い音声認識を行う第２の照合工程と、を備えたことを特徴とする音声認識方法。
10. A speech recognition method for performing speech recognition on speech to be recognized, comprising: a speech feature amount extraction step of extracting speech feature amounts from the speech to be recognized; a first matching step of obtaining a plurality of speech recognition result candidates by matching the speech feature amounts extracted in the speech feature amount extraction step using an acoustic model for obtaining the probability of a sequence of the speech feature amounts and the language model including redundant words according to claim 7; and a second matching step of performing speech recognition by matching the plurality of speech recognition result candidates obtained in the first matching step using the language model including redundant words and the language model excluding redundant words according to claim 7.

【請求項１１】認識対象音声の音声認識を行う音声認
識方法であって、上記認識対象音声を入力し音声特徴量を抽出する音声特
徴量抽出工程と、上記音声特徴量の系列の確率を求めるための音響モデ
ル、請求項８に記載の上記クラス化された冗長語を含む
言語モデルおよび冗長語を除いた言語モデルを用いて、
上記音声特徴量抽出工程で抽出された音声特徴量に対し
て照合を行い音声認識を行う照合工程と、を備えたことを特徴とする音声認識方法。
11. A speech recognition method for performing speech recognition on speech to be recognized, comprising: a speech feature amount extraction step of receiving the speech to be recognized and extracting speech feature amounts; and a matching step of performing speech recognition by matching the speech feature amounts extracted in the speech feature amount extraction step using an acoustic model for obtaining the probability of a sequence of the speech feature amounts, and the language model including classified redundant words and the language model excluding redundant words according to claim 8.

【請求項１２】認識対象音声の音声認識を行う音声認
識方法であって、上記認識対象音声を入力し音声特徴量を抽出する音声特
徴量抽出工程と、上記音声特徴量の系列の確率を求めるための音響モデル
と請求項８に記載の上記クラス化された冗長語を含む言
語モデルとを用いて、上記音声特徴量抽出工程で抽出さ
れた音声特徴量に対して照合を行い複数の音声認識結果
候補を求める第１の照合工程と、この第１の照合工程で求められた複数の音声認識結果候
補に対して、請求項８に記載の上記クラス化された冗長
語を含む言語モデルと冗長語を除いた言語モデルとを用
いて、照合を行い音声認識を行う第２の照合工程と、を備えたことを特徴とする音声認識方法。
12. A speech recognition method for performing speech recognition on speech to be recognized, comprising: a speech feature amount extraction step of receiving the speech to be recognized and extracting speech feature amounts; a first matching step of obtaining a plurality of speech recognition result candidates by matching the speech feature amounts extracted in the speech feature amount extraction step using an acoustic model for obtaining the probability of a sequence of the speech feature amounts and the language model including classified redundant words according to claim 8; and a second matching step of performing speech recognition by matching the plurality of speech recognition result candidates obtained in the first matching step using the language model including classified redundant words and the language model excluding redundant words according to claim 8.

【請求項１３】冗長語を含む学習用テキストを入力し
て、単語列の生起確率を求める言語モデルを生成する言
語モデル生成プログラムを記録した記録媒体であって、上記冗長語を含む学習用テキストを入力して、冗長語も
含めて単語列の生起確率を求める言語モデルを生成する
冗長語を含む言語モデル生成手順と、上記冗長語を含む学習用テキストから冗長語を取り除
き、冗長語を除いた学習用テキストを生成する冗長語除
去手順と、上記冗長語を除いた学習用テキストを入力し、冗長語を
除いた単語列の生起確率を求める言語モデルを生成する
冗長語を除いた言語モデル生成手順と、を実行させる言語モデル生成プログラムを記録したコン
ピュータ読み取り可能な記録媒体。
13. A computer-readable recording medium on which is recorded a language model generation program that receives a learning text including redundant words and generates a language model for obtaining the occurrence probability of a word string, the program causing a computer to execute: a redundant-word-inclusive language model generation procedure of receiving the learning text including redundant words and generating a language model for obtaining the occurrence probability of a word string including the redundant words; a redundant word removal procedure of removing the redundant words from the learning text including redundant words to generate a learning text excluding redundant words; and a redundant-word-exclusive language model generation procedure of receiving the learning text excluding redundant words and generating a language model for obtaining the occurrence probability of a word string excluding redundant words.

【請求項１４】冗長語を含む学習用テキストを入力し
て、単語列の生起確率を求める言語モデルを生成する言
語モデル生成プログラムを記録した記録媒体であって、上記冗長語を含む学習用テキストを入力して冗長語をク
ラス化し、クラス化された冗長語も含めて単語列の生起
確率を求める言語モデルを生成するクラス化された冗長
語を含む言語モデル生成手順と、上記冗長語を含む学習用テキストから冗長語を取り除
き、冗長語を除いた学習用テキストを生成する冗長語除
去手順と、上記冗長語を除いた学習用テキストを入力し、冗長語を
除いた単語列の生起確率を求める言語モデルを生成する
冗長語を除いた言語モデル生成手順と、を実行させる言語モデル生成プログラムを記録したコン
ピュータ読み取り可能な記録媒体。
14. A computer-readable recording medium on which is recorded a language model generation program that receives a learning text including redundant words and generates a language model for obtaining the occurrence probability of a word string, the program causing a computer to execute: a classified-redundant-word-inclusive language model generation procedure of receiving the learning text including redundant words, classifying the redundant words, and generating a language model for obtaining the occurrence probability of a word string including the classified redundant words; a redundant word removal procedure of removing the redundant words from the learning text including redundant words to generate a learning text excluding redundant words; and a redundant-word-exclusive language model generation procedure of receiving the learning text excluding redundant words and generating a language model for obtaining the occurrence probability of a word string excluding redundant words.

【請求項１５】認識対象音声を入力して音声認識を行
い音声認識結果を出力する音声認識プログラムを記録し
た記録媒体であって、上記認識対象音声を入力し音声特徴量を抽出する音声特
徴量抽出手順と、上記音声特徴量の系列の確率を求めるための音響モデ
ル、冗長語を含む学習用テキストを入力して生成された
冗長語を含む言語モデルおよび冗長語を除いた学習用テ
キストを入力して生成された冗長語を除いた言語モデル
を用いて、上記音声特徴量抽出手順で抽出した音声特徴
量に対して照合を行い音声認識結果を出力する照合手順
と、を実現させる音声認識プログラムを記録したコンピュー
タ読み取り可能な記録媒体。
15. A computer-readable recording medium on which is recorded a speech recognition program that receives speech to be recognized, performs speech recognition, and outputs a speech recognition result, the program causing a computer to implement: a speech feature amount extraction procedure of receiving the speech to be recognized and extracting speech feature amounts; and a matching procedure of matching the speech feature amounts extracted in the speech feature amount extraction procedure using an acoustic model for obtaining the probability of a sequence of the speech feature amounts, a language model including redundant words generated from a learning text including redundant words, and a language model excluding redundant words generated from a learning text excluding redundant words, and outputting a speech recognition result.

【請求項１６】認識対象音声を入力して音声認識を行
い音声認識結果を出力する音声認識プログラムを記録し
た記録媒体であって、上記認識対象音声を入力し音声特徴量を抽出する音声特
徴量抽出手順と、上記音声特徴量の系列の確率を求めるための音響モデル
と冗長語を含む学習用テキストを入力して生成された冗
長語を含む言語モデルとを用いて、上記音声特徴量抽出
手順で抽出した音声特徴量に対して照合を行い複数の音
声認識結果候補を出力する第１の照合手順と、この第１の照合手順が出力した複数の音声認識結果候補
に対して、上記冗長語を含む言語モデルと冗長語を除い
た学習用テキストを入力して生成された冗長語を除いた
言語モデルとを用いて、照合を行い音声認識結果を出力
する第２の照合手順と、を実現させる音声認識プログラムを記録したコンピュー
タ読み取り可能な記録媒体。
16. A computer-readable recording medium on which is recorded a speech recognition program that receives speech to be recognized, performs speech recognition, and outputs a speech recognition result, the program causing a computer to implement: a speech feature amount extraction procedure of receiving the speech to be recognized and extracting speech feature amounts; a first matching procedure of matching the speech feature amounts extracted in the speech feature amount extraction procedure using an acoustic model for obtaining the probability of a sequence of the speech feature amounts and a language model including redundant words generated from a learning text including redundant words, and outputting a plurality of speech recognition result candidates; and a second matching procedure of matching the plurality of speech recognition result candidates output by the first matching procedure using the language model including redundant words and a language model excluding redundant words generated from a learning text excluding redundant words, and outputting a speech recognition result.

【請求項１７】認識対象音声を入力して音声認識を行
い音声認識結果を出力する音声認識プログラムを記録し
た記録媒体であって、上記認識対象音声を入力し音声特徴量を抽出する音声特
徴量抽出手順と、上記音声特徴量の系列の確率を求めるための音響モデ
ル、冗長語を含む学習用テキストを入力し冗長語をクラ
ス化して生成されたクラス化された冗長語を含む言語モ
デルおよび冗長語を除いた学習用テキストを入力して生
成された冗長語を除いた言語モデルとを用いて、上記音
声特徴量抽出手順が抽出した音声特徴量に対して照合を
行い音声認識結果を出力する照合手順と、を実現させる音声認識プログラムを記録したコンピュー
タ読み取り可能な記録媒体。
17. A computer-readable recording medium on which is recorded a speech recognition program that receives speech to be recognized, performs speech recognition, and outputs a speech recognition result, the program causing a computer to implement: a speech feature amount extraction procedure of receiving the speech to be recognized and extracting speech feature amounts; and a matching procedure of matching the speech feature amounts extracted by the speech feature amount extraction procedure using an acoustic model for obtaining the probability of a sequence of the speech feature amounts, a language model including classified redundant words generated by receiving a learning text including redundant words and classifying the redundant words, and a language model excluding redundant words generated from a learning text excluding redundant words, and outputting a speech recognition result.

【請求項１８】認識対象音声を入力して音声認識を行
い音声認識結果を出力する音声認識プログラムを記録し
た記録媒体であって、上記認識対象音声を入力し音声特徴量を抽出する音声特
徴量抽出手順と、上記音声特徴量の系列の確率を求めるための音響モデル
と冗長語を含む学習用テキストを入力し冗長語をクラス
化して生成されたクラス化された冗長語を含む言語モデ
ルとを用いて、上記音声特徴量抽出手順で抽出した音声
特徴量に対して照合を行い複数の音声認識結果候補を出
力する第１の照合手順と、この第１の照合手順で出力した複数の音声認識結果候補
に対して、上記クラス化された冗長語を含む言語モデル
と冗長語を除いた学習用テキストを入力して生成された
冗長語を除いた言語モデルとを用いて、照合を行い音声
認識結果を出力する第２の照合手順と、を実現させる音声認識プログラムを記録したコンピュー
タ読み取り可能な記録媒体。
18. A computer-readable recording medium on which is recorded a speech recognition program that receives speech to be recognized, performs speech recognition, and outputs a speech recognition result, the program causing a computer to implement: a speech feature amount extraction procedure of receiving the speech to be recognized and extracting speech feature amounts; a first matching procedure of matching the speech feature amounts extracted in the speech feature amount extraction procedure using an acoustic model for obtaining the probability of a sequence of the speech feature amounts and a language model including classified redundant words generated by receiving a learning text including redundant words and classifying the redundant words, and outputting a plurality of speech recognition result candidates; and a second matching procedure of matching the plurality of speech recognition result candidates output in the first matching procedure using the language model including classified redundant words and a language model excluding redundant words generated from a learning text excluding redundant words, and outputting a speech recognition result.