JP2003330484A

JP2003330484A - Method and device for voice recognition

Info

Publication number: JP2003330484A
Application number: JP2002142998A
Authority: JP
Inventors: Soichi Toyama; 聡一外山
Original assignee: Pioneer Electronic Corp
Current assignee: Pioneer Corp
Priority date: 2002-05-17
Filing date: 2002-05-17
Publication date: 2003-11-19
Anticipated expiration: 2022-05-17
Also published as: JP4275353B2

Abstract

PROBLEM TO BE SOLVED: To realize speaker adaptation that is hardly affected by background noise.

SOLUTION: An initial speech model Mc is stored in a speaker adaptation model storage unit 2, and a noise adaptation unit 3 applies noise adaptation to the initial speech model Mc stored in advance in the speaker adaptation model storage unit 2 to generate a noise adaptation model Mc'. A speaker adaptation parameter calculation unit 4 generates a speaker adaptation parameter P from the noise adaptation model Mc' and a feature vector sequence V(n) of the speaker's speech, and a speech model update unit 5 uses the speaker adaptation parameter P to apply speaker adaptation to the initial speech model Mc, generating a speaker adaptation model Mc". The initial speech model Mc is replaced with the speaker adaptation model Mc", which is stored by update in the speaker adaptation model storage unit 2. At recognition time, the noise adaptation unit 3 applies noise adaptation to the updated speaker adaptation model Mc", instead of the initial speech model Mc, to generate a noise-adapted speaker adaptation model Mreg, and a recognition processing unit 9 performs speech recognition by collating sequences composed of the noise-adapted speaker adaptation model Mreg against the feature vector sequence V(n) of the speech to be recognized.

COPYRIGHT: (C)2004,JPO

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a speech recognition apparatus and a speech recognition method that perform speech recognition using, for example, a speech model to which speaker adaptation has been applied.

[0002]

[Prior Art] In the field of speech recognition, recognition is performed using a speaker-independent speech model trained on a large speech database.

[0003] However, because this speaker-independent speech model is trained on utterance data from a large number of unspecified speakers, it achieves relatively high recognition performance for speakers whose speech is standard, but it does not necessarily achieve high recognition performance for speakers whose speech is distinctive.

[0004] Speaker adaptation methods were therefore developed. In these methods, the speaker-independent speech model is adapted using each speaker's own utterances, and the adapted acoustic model is used so that recognition is appropriate for each individual speaker.

[0005] In a conventional speaker adaptation method, a speaker-independent speech model in subword units such as phonemes (hereinafter the "initial speech model") is generated in advance from a large speech database, and speaker adaptation is applied to the initial speech model in a preprocessing stage before actual speech recognition begins. That is, the speaker is asked to speak during the preprocessing stage, and the initial speech model is adapted based on the feature vector sequence of that speech, producing a speaker adaptation model that accounts for individual differences between speakers.

[0006] In actual speech recognition, the feature vector sequence of the speaker's utterance to be recognized is collated against sequences composed of the speaker adaptation models described above, and the speaker adaptation model sequence that yields the highest likelihood is taken as the recognition result.

[0007]

[Problems to be Solved by the Invention] However, when speaker adaptation is applied to the initial speech model, background noise from the speaking environment is superimposed on the speaker's utterance.

[0008] Consequently, in the conventional speaker adaptation method, adaptation is driven by the feature vector sequence not of the utterance alone but of the utterance with background noise superimposed (that is, noise-superimposed speech), which can make it difficult to generate an accurate speaker adaptation model.

[0009] In particular, when speaker adaptation uses speech uttered in a noisy environment, the adaptation is strongly affected by the background noise, and it can be difficult to generate a speaker adaptation model that properly reflects the speaker's characteristics.

[0010] Moreover, when speech recognition is actually performed with a speaker adaptation model produced by the conventional method, and the background noise in the recognition environment differs from the background noise present during speaker adaptation, the benefit of speaker adaptation, namely an improved recognition rate, is not always fully obtained.

[0011] The present invention has been made in view of these conventional problems, and its object is to provide a speech recognition apparatus and a speech recognition method that perform speaker adaptation robust to the influence of background noise.

[0012]

[Means for Solving the Problems] To achieve the above object, the speech recognition apparatus according to claim 1 comprises: storage means holding an initial speech model; noise adaptation means that generates a noise adaptation model by applying noise adaptation to the initial speech model in the storage means, using the background noise present during speaker adaptation; speaker adaptation parameter calculation means that performs a speaker adaptation calculation on the noise adaptation model generated by the noise adaptation means, using the speech uttered during speaker adaptation, and calculates a speaker adaptation parameter for converting the noise adaptation model into a noise-superimposed speaker adaptation model; and speech model updating means that generates a speaker adaptation model by applying speaker adaptation to the initial speech model in the storage means with the speaker adaptation parameter, and stores that speaker adaptation model in the storage means in place of the initial speech model.

[0013] The speech recognition apparatus according to claim 2 is the apparatus according to claim 1, further comprising recognition processing means that performs speech recognition processing at recognition time, wherein the noise adaptation means applies noise adaptation to the speaker adaptation model stored by update in the storage means, using the background noise in a non-speech period at recognition time, to generate a noise-adapted speaker adaptation model, and supplies the noise-adapted speaker adaptation model to the speech recognition means as the acoustic model for recognizing the uttered speech.

[0014] The speech recognition apparatus according to claim 3 comprises: storage means holding an initial speech model; noise adaptation means that generates a noise adaptation model by applying noise adaptation to the initial speech model in the storage means, using the background noise in a non-speech period at recognition time; recognition processing means that performs speech recognition by collating the speech to be recognized, uttered in the speech period at recognition time, against the noise adaptation model generated by the noise adaptation means; speaker adaptation parameter calculation means that performs a speaker adaptation calculation on the noise adaptation model generated by the noise adaptation means, using the speech to be recognized, and calculates a speaker adaptation parameter for converting the noise adaptation model into a noise-superimposed speaker adaptation model; and speech model updating means that generates a speaker adaptation model by applying speaker adaptation to the initial speech model in the storage means with the speaker adaptation parameter, and stores that speaker adaptation model in the storage means in place of the initial speech model.

[0015] The speech recognition apparatus according to claim 4 is the apparatus according to claim 3, wherein the speaker adaptation parameter calculation means and the speech model updating means generate the speaker adaptation model and store it in the storage means in place of the initial speech model when the reliability of the recognition result from the recognition processing means is high.

[0016] The speech recognition method according to claim 5 comprises: a noise adaptation step of generating a noise adaptation model by applying noise adaptation to an initial speech model stored in storage means, using the background noise present during speaker adaptation; a speaker adaptation parameter calculation step of performing a speaker adaptation calculation on the noise adaptation model generated in the noise adaptation step, using the speech uttered during speaker adaptation, and calculating a speaker adaptation parameter for converting the noise adaptation model into a noise-superimposed speaker adaptation model; and a speech model updating step of generating a speaker adaptation model by applying speaker adaptation to the initial speech model in the storage means with the speaker adaptation parameter, and storing that speaker adaptation model in the storage means in place of the initial speech model.

[0017] The speech recognition method according to claim 6 is the method according to claim 5, wherein the noise adaptation step further applies noise adaptation to the speaker adaptation model stored by update in the storage means, using the background noise in a non-speech period at recognition time, to generate a noise-adapted speaker adaptation model, the method further comprising a speech recognition step of performing speech recognition by collating the noise-adapted speaker adaptation model against the speech to be recognized, uttered in the speech period at recognition time.

[0018] The speech recognition method according to claim 7 comprises: a noise adaptation step of generating a noise adaptation model by applying noise adaptation to an initial speech model stored in storage means, using the background noise in a non-speech period at recognition time; a recognition processing step of performing speech recognition by collating the speech to be recognized, uttered in the speech period at recognition time, against the noise adaptation model generated in the noise adaptation step; a speaker adaptation parameter calculation step of performing a speaker adaptation calculation on the noise adaptation model generated in the noise adaptation step, using the speech to be recognized, and calculating a speaker adaptation parameter for converting the noise adaptation model into a noise-superimposed speaker adaptation model; and a speech model updating step of generating a speaker adaptation model by applying speaker adaptation to the initial speech model in the storage means with the speaker adaptation parameter, and storing that speaker adaptation model in the storage means in place of the initial speech model.

[0019] The speech recognition method according to claim 8 is the method according to claim 7, wherein the speaker adaptation parameter calculation step and the speech model updating step generate the speaker adaptation model and store it in the storage means in place of the initial speech model when the reliability of the recognition result from the recognition processing step is high.

[0020] According to the speech recognition apparatus of claim 1 and the speech recognition method of claim 5, during speaker adaptation, noise adaptation is applied to the initial speech model, and a speaker adaptation calculation is performed on the resulting noise adaptation model. A speaker adaptation parameter for converting the noise adaptation model into a noise-superimposed speaker adaptation model is then calculated, and speaker adaptation is applied to the initial speech model with that parameter. This generates the speaker adaptation model used for speech recognition, and the initial speech model is updated with it.

[0021] This reduces the adverse effect of background noise during speaker adaptation and makes it possible to generate a speaker adaptation model that adapts well to the speaker's individual characteristics, which is the original purpose of speaker adaptation.
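The data flow described in the two paragraphs above can be sketched as follows. This is a minimal illustration under loud assumptions: each model is reduced to an array of mean vectors, `noise_adapt` is a toy log-add stand-in for whichever noise adaptation is actually used, and the speaker adaptation parameter P is assumed to be a single global bias vector. The function names and these simplifications are hypothetical and do not come from the patent.

```python
import numpy as np

def noise_adapt(means, noise_mean):
    # Toy stand-in (assumption): treat vectors as log-spectra and add noise power.
    return np.logaddexp(means, noise_mean)

def estimate_speaker_parameter(noise_adapted_means, speech_frames):
    # Assumed form of P: one global bias, estimated against the noise-adapted
    # model Mc' so that the noise is accounted for on the model side.
    return speech_frames.mean(axis=0) - noise_adapted_means.mean(axis=0)

def update_model(stored_means, p):
    # P is applied to the stored (noise-free) model Mc, not to Mc'.
    return stored_means + p

def adapt_speaker(stored_means, noise_frames, speech_frames):
    noise_mean = noise_frames.mean(axis=0)            # background noise estimate
    mc_prime = noise_adapt(stored_means, noise_mean)  # noise adaptation model Mc'
    p = estimate_speaker_parameter(mc_prime, speech_frames)
    return update_model(stored_means, p)              # speaker adaptation model Mc"
```

The point the sketch tries to convey is the ordering: the parameter is estimated against the noise-adapted model but applied to the stored model, so the stored model stays free of the adaptation-time noise.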

[0022] According to the speech recognition apparatus of claim 2 and the speech recognition method of claim 6, during speech recognition performed after speaker adaptation, noise adaptation is applied to the updated speaker adaptation model using the background noise in a non-speech period at recognition time. This generates a noise-adapted speaker adaptation model. Speech recognition is then performed by collating the noise-adapted speaker adaptation model against the speech to be recognized, uttered in the speech period at recognition time.

[0023] According to the speech recognition apparatus of claim 3 and the speech recognition method of claim 7, speaker adaptation is also performed at recognition time.

[0024] That is, a noise adaptation model is generated by applying noise adaptation to the initial speech model using the background noise in a non-speech period at recognition time, and speech recognition is performed by collating the speech to be recognized, uttered in the speech period at recognition time, against the noise adaptation model. In addition, a speaker adaptation calculation is performed on the noise adaptation model using the speech to be recognized, and a speaker adaptation parameter for converting the noise adaptation model into a noise-superimposed speaker adaptation model is calculated. A speaker adaptation model is then generated by applying speaker adaptation to the initial speech model with that parameter, and the initial speech model is replaced with the generated speaker adaptation model.

[0025] As a result, as more utterances are recognized, the initial speech model is progressively updated into a speaker adaptation model that is more closely adapted to the speaker's individual characteristics, which improves recognition performance.

[0026] According to the speech recognition apparatus of claim 4 and the speech recognition method of claim 8, the speaker adaptation model is generated and the initial speech model is updated when the recognition result is highly reliable, so that appropriate speaker adaptation is achieved according to, for example, the state of the speaking environment.

[0027] Here, the initial speech model refers to the speech model stored in the storage means before speaker adaptation is applied. In the present invention, the initial speech model stored in the storage means is updated with the speaker adaptation model generated by speaker adaptation, and the updated speaker adaptation model is then treated as the initial speech model. In other words, the storage means initially holds the initial speech model, but once it has been updated with a speaker adaptation model, that updated model is regarded as the initial speech model, and this process is repeated.

[0028]

[Embodiments of the Invention] Preferred embodiments of the present invention are described below with reference to the drawings.

[0029] (First Embodiment) A first embodiment of the present invention is described with reference to FIGS. 1 and 2. FIG. 1 is a block diagram showing the configuration of the speech recognition apparatus of this embodiment.

[0030] As a preferred embodiment, a configuration is described in which speaker adaptation is performed as an initial setup and recognition is then performed using the speech model obtained by that setup.

[0031] In FIG. 1, the speech recognition apparatus performs speech recognition using HMMs (hidden Markov models). It comprises: an initial speech model storage unit 1 storing an initial speech model Mc, which is a speaker-independent speech model trained in advance on a speech database recorded in a noise-free environment; a speaker adaptation model storage unit 2 for storing, by update, the speaker adaptation model Mc" generated by the speaker adaptation calculation described later; a noise adaptation unit 3; and a speaker adaptation unit having a speaker adaptation parameter calculation unit 4 and a speech model update unit 5.

[0032] The apparatus further comprises an acoustic analysis unit 6, which converts the signal v(t) picked up by a microphone 7 into a cepstrum coefficient vector for each predetermined frame period and generates a cepstral-domain feature vector sequence V(n), together with a changeover switch 8 and a recognition processing unit 9.

[0033] In FIG. 1, signal paths are shown schematically by dashed arrows and solid arrows. Dashed arrows indicate signal flows that occur only during speaker adaptation. Solid arrows indicate signal flows that occur only during speech recognition, or during both speech recognition and speaker adaptation.

[0034] The acoustic analysis unit 6 converts the time-domain signal v(t) picked up by the microphone 7 into a cepstrum coefficient vector for each predetermined frame time, and thereby generates and outputs the cepstral-domain feature vector sequence V(n). The variable n in V(n) denotes the frame number.
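As an illustration of this framing step, the sketch below converts a sampled signal into one cepstral vector per frame. It is only a sketch under assumptions: the text does not say which cepstral analysis is used, so a plain real cepstrum (inverse FFT of the log magnitude spectrum) is shown, and the function name, frame length, hop size, and number of coefficients are all hypothetical.

```python
import numpy as np

def cepstral_features(v, frame_len=256, hop=128, n_ceps=13):
    """Return V(n): one real-cepstrum vector per frame of the signal v(t)."""
    v = np.asarray(v, dtype=float)
    if len(v) >= frame_len:
        n_frames = 1 + (len(v) - frame_len) // hop
    else:
        n_frames = 0
    window = np.hamming(frame_len)
    feats = np.empty((n_frames, n_ceps))
    for n in range(n_frames):  # n is the frame number
        frame = v[n * hop : n * hop + frame_len] * window
        # Small constant avoids log(0) on silent frames.
        log_mag = np.log(np.abs(np.fft.rfft(frame)) + 1e-10)
        feats[n] = np.fft.irfft(log_mag)[:n_ceps]  # low-order real cepstrum
    return feats
```

Each row of the returned array corresponds to one frame number n, so the array as a whole plays the role of the sequence V(n).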

[0035] During the speaker adaptation processing described later, the changeover switch 8 switches to the noise adaptation unit 3 side within the non-speech period, before the speaker has started speaking, and sends the feature vector sequence V(n) generated by the acoustic analysis unit 6 to the noise adaptation unit 3.

[0036] This can be realized by regarding the several tens of milliseconds to several seconds immediately after the speaker or the apparatus issues an instruction to start processing as a period in which no speech is uttered, switching the changeover switch 8 to the noise adaptation unit 3 side during that period, and thereafter switching it to the speaker adaptation parameter calculation unit 4 side or the recognition processing unit 9 side.

[0037] Alternatively, the signal v(t) from the microphone 7 may be monitored continuously by predetermined monitoring control means (not shown). When it is judged that v(t) contains no speech from the speaker, the changeover switch 8 is switched to the noise adaptation unit 3 side; when it is judged that speech is present, the switch is switched to the speaker adaptation parameter calculation unit 4 side or the recognition processing unit 9 side. In short, the changeover switch 8 is switched as described above according to whether or not the current section of v(t) contains the speaker's speech.
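The switching behavior can be sketched as a small routing function. This is an illustrative sketch only: the per-frame speech flags are assumed to come from some speech detector that the text leaves unspecified, and the function names and destination labels are hypothetical. The helper `leading_noise_flags` models the simpler variant in which a fixed interval after the start instruction is treated as non-speech.

```python
def route_frames(frames, speech_flags, mode):
    """Mimic changeover switch 8 for one utterance.

    frames       : feature vectors V(n), one per frame
    speech_flags : per-frame booleans, True when speech is judged present
    mode         : "adaptation" or "recognition"
    """
    if mode not in ("adaptation", "recognition"):
        raise ValueError("mode must be 'adaptation' or 'recognition'")
    if len(frames) != len(speech_flags):
        raise ValueError("frames and speech_flags must have the same length")
    speech_dest = ("speaker_adaptation_parameter_calculation_unit_4"
                   if mode == "adaptation" else "recognition_processing_unit_9")
    routed = {"noise_adaptation_unit_3": [], speech_dest: []}
    for frame, is_speech in zip(frames, speech_flags):
        key = speech_dest if is_speech else "noise_adaptation_unit_3"
        routed[key].append(frame)
    return routed

def leading_noise_flags(n_frames, n_noise_frames):
    """Fixed-interval variant: the first n_noise_frames frames count as non-speech."""
    k = min(max(n_noise_frames, 0), n_frames)
    return [False] * k + [True] * (n_frames - k)
```

With either way of producing the flags, non-speech frames go to the noise adaptation side and speech frames go to whichever unit matches the current mode.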

[0038] During speaker adaptation, once the speaker starts speaking, the changeover switch 8 switches to the speaker adaptation parameter calculation unit 4 side for the speech period and sends the feature vector sequence V(n) generated by the acoustic analysis unit 6 to the speaker adaptation parameter calculation unit 4.

[0039] When speech recognition processing is started after speaker adaptation is complete, the changeover switch 8 operates in the same way. As noted above, however, the signals then follow the paths shown by solid arrows rather than the paths shown by dashed arrows.

[0040] In the period before the speaker starts speaking, when only background noise is picked up, the changeover switch 8 switches to the noise adaptation unit 3 side and sends the background noise feature vectors N(n) to the noise adaptation unit 3.

[0041] In the speech period, when the microphone 7 picks up the utterance, the changeover switch 8 switches to the speaker adaptation parameter calculation unit 4 side during speaker adaptation processing and to the recognition processing unit 9 side during recognition processing, and sends the feature vector sequence V(n) for that speech period to the speaker adaptation parameter calculation unit 4 or the recognition processing unit 9.

[0042] The initial speech model storage unit 1 is a so-called database formed by a read-only semiconductor memory (ROM), a removable SmartMedia or CompactFlash (registered trademark) memory, or the like. It stores in advance the initial speech model Mc in subword units such as phonemes, generated by training on the speech of standard speakers.

[0043] The speaker adaptation model storage unit 2 is formed by a rewritable, non-destructive semiconductor memory or the like. In the speaker adaptation processing described later, it first copies and stores the initial speech model Mc held in the initial speech model storage unit 1.

[0044] Then, as described later, the HMMs are speaker-adapted by the speaker adaptation parameter calculation unit 4 and the speech model update unit 5 and are updated with the speaker adaptation model Mc". The speaker adaptation model storage unit 2 therefore replaces the initial speech model Mc with the speaker adaptation model Mc" and stores it (that is, the stored model is updated).

[0045] During speaker adaptation processing, the noise adaptation unit 3 applies noise adaptation to every initial speech model Mc, in subword units such as phonemes, stored in the speaker adaptation model storage unit 2. It thereby generates a noise adaptation model Mc' corresponding to every initial speech model Mc and sends it to the speaker adaptation parameter calculation unit 4 through the path shown by the dashed arrow in FIG. 1.

[0046] At recognition time, the noise adaptation unit 3 applies noise adaptation to the speech model that has been stored by update in the speaker adaptation model storage unit 2 through the speaker adaptation processing (that is, the speaker adaptation model Mc"), and sends the noise-adapted speaker adaptation model Mreg to the recognition processing unit 9 through the path shown by the solid arrow in FIG. 1.

[0047] That is, in the former case (speaker adaptation processing), the microphone 7 picks up the background noise of the speaking environment during the non-speech period before the speaker has started speaking. The acoustic analysis unit 6 generates the feature vector sequence V(n) for each predetermined frame period from the picked-up signal v(t), and the changeover switch 8 switches to the noise adaptation unit 3 side, so that this feature vector sequence V(n) is sent to the noise adaptation unit 3 as the background noise feature vector sequence N(n).

[0048] Using the background noise feature vector sequence N(n), the noise adaptation unit 3 then generates the noise adaptation model Mc' from every initial speech model Mc by noise adaptation processing such as the HMM composition method or the Jacobian adaptation method, and sends it to the speaker adaptation parameter calculation unit 4.

【００４９】また、後者の音声認識処理の際には、音声
認識時に話者が未だ発話していない非発話期間に、その
発話環境で生じる背景雑音をマイクロフォン７が収音
し、音響分析部６がその収音信号ｖ(t)より所定フレー
ム期間毎の特徴ベクトル系列Ｖ(n)を生成し、更に切替
スイッチ８が雑音適応部３側に切り替わることで、その
特徴ベクトル系列Ｖ(n)を背景雑音の特徴ベクトル系列
Ｎ(n)として雑音適応部３へ送出する。In the latter case of speech recognition processing, the microphone 7 picks up the background noise of the utterance environment during the non-utterance period, before the speaker has begun to speak for recognition. The acoustic analysis unit 6 generates a feature vector sequence V(n) for each predetermined frame period from the picked-up signal v(t), and the changeover switch 8 switches to the noise adaptation unit 3 side, so that this feature vector sequence V(n) is sent to the noise adaptation unit 3 as the background noise feature vector sequence N(n).

【００５０】そして、雑音適応部３は、その背景雑音特
徴ベクトル系列Ｎ(n)を用いて、更新記憶されることと
なった音声モデル（すなわち、話者適応モデルＭc”）
に対して雑音適応を施し、雑音適応した話者適応モデル
Ｍregを認識処理部９へ送出する。Then, using the background noise feature vector sequence N(n), the noise adaptation unit 3 applies noise adaptation to the updated and stored speech model (that is, the speaker adaptation model Mc") and sends the noise-adapted speaker adaptation model Mreg to the recognition processing unit 9.

【００５１】ここで、雑音適応部３が話者適応時にＨＭ
Ｍ合成法を用いて、音声認識率に大きく影響を与える話
者適応モデルＭc”の各分布の平均ベクトルμcを雑音適
応する場合を説明する。Here, a case is described in which, during speaker adaptation, the noise adaptation unit 3 uses the HMM synthesis method to noise-adapt the average vector μc of each distribution of the speaker adaptation model Mc", the average vectors being parameters that greatly affect the speech recognition rate.

【００５２】まず、雑音適応部３は、背景雑音の特徴ベ
クトル系列Ｎ(n)より、背景雑音モデルＮbを求める。First, the noise adaptation unit 3 obtains a background noise model Nb from the background noise feature vector sequence N(n).

【００５３】ここで説明の便宜上、背景雑音は定常と仮
定し、背景雑音モデルＮbは１状態１混合モデルとし、
更に背景雑音モデルＮbの平均ベクトルをμNとして説明
すると、平均ベクトルμNは背景雑音の特徴ベクトル系
列Ｎ(n)をフレーム数で平均することで求める。For convenience of explanation, the background noise is assumed to be stationary and the background noise model Nb is taken to be a one-state, one-mixture model. Denoting the average vector of the background noise model Nb by μN, the average vector μN is obtained by averaging the background noise feature vector sequence N(n) over the number of frames.
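As a minimal sketch of this averaging step, the following assumes each frame of N(n) is a plain list of floats; the function name `mean_vector` and the sample values are hypothetical and not taken from the patent.

```python
def mean_vector(frames):
    """Average a list of equal-length feature vectors over the frame count."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


# Hypothetical background-noise frames N(n) from a non-utterance period.
noise_frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mu_N = mean_vector(noise_frames)  # average vector of the noise model Nb
```

With a one-state, one-mixture noise model this single vector is all that the mean-combination step needs from the noise side.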

【００５４】更に、初期音声モデルＭcの分布ｍの平均
ベクトルμcmと背景雑音モデルＮbの平均ベクトルμNを
合成することで、次式（１）で表される、合成後の雑音
適応した分布ｍの平均ベクトルμcm’を求める。Further, the average vector μcm of distribution m of the initial speech model Mc is combined with the average vector μN of the background noise model Nb to obtain the noise-adapted average vector μcm' of distribution m after synthesis, as expressed by the following equation (1).

【００５５】[0055]

【数１】 [Equation 1]

【００５６】ここで、ＩＤＣＴ〔〕は逆離散コサイン
変換、ｌｏｇ〔〕は対数変換、ｅｘｐ〔〕は指数変
換、ＤＣＴ〔〕は離散コサイン変換、ｋはＳＮ比より
求まる混合比である。Here, IDCT [] is the inverse discrete cosine transform, log [] is the logarithmic transform, exp [] is the exponential transform, DCT [] is the discrete cosine transform, and k is the mixture ratio obtained from the SN ratio.
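Equation (1) itself is not reproduced in this text, so the following is only a sketch of the general idea behind combining cepstral means: move both means to the log-spectral domain, add them in the linear domain with the mixture ratio k, and move back. It assumes the common convention that a cepstrum is the DCT of a log spectrum, so the roles of DCT and IDCT here may be swapped relative to the patent's notation. All function names are hypothetical.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a list of floats."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        for k in range(1, n):
            s += c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out


def compose_mean(mu_speech_cep, mu_noise_cep, k):
    """Combine a speech mean and a noise mean, both given as cepstra."""
    log_s = idct(mu_speech_cep)  # cepstrum -> log spectrum (assumed convention)
    log_n = idct(mu_noise_cep)
    mixed = [math.log(math.exp(a) + k * math.exp(b))
             for a, b in zip(log_s, log_n)]
    return dct(mixed)            # log spectrum -> cepstrum
```

Setting k to 0 returns the speech mean unchanged, which is a quick sanity check that the two transforms really invert each other.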

【００５７】これを初期音声モデルＭcの全ての分布に
対して求める。これにより、初期音声モデルＭcに話者
適応時の発話環境下での背景雑音を重畳させた形の雑音
適応モデルＭc’を求め、話者適応パラメータ算出部４
へ送出する。This is computed for all distributions of the initial speech model Mc. The result is the noise adaptation model Mc', in which the background noise of the utterance environment at the time of speaker adaptation is superimposed on the initial speech model Mc; it is sent to the speaker adaptation parameter calculation unit 4.

【００５８】なお、ここでは、雑音モデルを１状態１混
合としたが、２状態以上あるいは２混合以上の場合は、
初期音声モデルＭcの１つの分布に対し、対応する雑音
適応モデルＭc’の分布が複数求まることになる。ま
た、共分散行列を考慮する場合も雑音適応モデルＭc’
を求めることが可能である。Although the noise model here has one state and one mixture, when it has two or more states or two or more mixtures, a plurality of distributions of the noise adaptation model Mc' are obtained for each single distribution of the initial speech model Mc. The noise adaptation model Mc' can also be obtained when the covariance matrix is taken into account.

【００５９】また、雑音適応手法としてＨＭＭ合成法を
用いる場合を説明したが、本発明では、ヤコビ適応手法
その他の、初期音声モデルＭcに発話時の背景雑音を重
畳した状態の雑音適応モデルＭc’を求める雑音適応手
法を用いることも可能である。Although the case where the HMM synthesis method is used as the noise adaptation method has been described, the present invention can also use the Jacobian adaptation method or any other noise adaptation method that obtains a noise adaptation model Mc' in which the background noise at the time of utterance is superimposed on the initial speech model Mc.

【００６０】話者適応パラメータ算出部４は、話者適応
処理に際して、雑音適応部３からの雑音適応モデルＭ
c’と、音響分析部６から切替スイッチ８を介して供給
される発話音声の特徴ベクトル系列Ｖ(n)とを入力し、
発話音声の特徴を有する話者適応パラメータＰを生成し
て出力する。During speaker adaptation processing, the speaker adaptation parameter calculation unit 4 receives the noise adaptation model Mc' from the noise adaptation unit 3 and the feature vector sequence V(n) of the uttered speech, supplied from the acoustic analysis unit 6 via the changeover switch 8, and generates and outputs a speaker adaptation parameter P that carries the characteristics of the uttered speech.

【００６１】より具体的に述べると、話者適応処理に際
して話者が発話を開始すると、その発話期間に切替スイ
ッチ８が話者適応パラメータ算出部４側に切り替わり、
背景雑音の重畳した発話音声の特徴ベクトル系列Ｖ(n)
が音声分析部６から切替スイッチ８を介して話者適応パ
ラメータ算出部４に供給される。More specifically, when the speaker starts speaking during speaker adaptation processing, the changeover switch 8 switches to the speaker adaptation parameter calculation unit 4 side for the utterance period, and the feature vector sequence V(n) of the uttered speech, on which background noise is superimposed, is supplied from the voice analysis unit 6 to the speaker adaptation parameter calculation unit 4 via the changeover switch 8.

【００６２】こうして背景雑音の重畳した音声（背景雑
音重畳音声）の特徴ベクトル系列Ｖ(n)とそれと同じ背
景雑音で雑音適応された雑音適応モデルＭc’が供給さ
れると、話者適応パラメータ算出部４は、それらの特徴
ベクトル系列Ｖ(n)と雑音適応モデルＭc’を用いて話者
適応演算処理を行い、雑音適応モデルＭc’を話者適応
するための話者適応パラメータＰを生成する。When supplied with the feature vector sequence V(n) of the speech on which background noise is superimposed (background-noise-superimposed speech) and the noise adaptation model Mc' that has been noise-adapted with the same background noise, the speaker adaptation parameter calculation unit 4 performs the speaker adaptation calculation using this feature vector sequence V(n) and the noise adaptation model Mc', and generates the speaker adaptation parameter P for speaker-adapting the noise adaptation model Mc'.

【００６３】ここでは話者適応演算処理としてＭＬＬＲ
（Maxmum Likelihood Linear Regression）を用いて、
認識率に大きく影響する話者適応モデルＭcの各分布の
平均ベクトルを更新する場合を説明する。Here, a case is described in which MLLR (Maximum Likelihood Linear Regression) is used as the speaker adaptation calculation to update the average vector of each distribution of the speaker adaptation model Mc, the average vectors being parameters that greatly affect the recognition rate.

【００６４】発話内容が既知の発話音声の特徴ベクトル
系列Ｖ(n)と雑音適応モデルＭc’とを用いてＭＬＬＲ処
理を行い、雑音適応モデルＭc’の分布ｍの平均ベクト
ルμcm’を話者適応するための話者適応パラメータＰと
して、変換行列Ｗm’とオフセットベクトルｂm’を求め
る。MLLR processing is performed using the feature vector sequence V(n) of uttered speech whose content is known and the noise adaptation model Mc', and a transformation matrix Wm' and an offset vector bm' are obtained as the speaker adaptation parameter P for speaker-adapting the average vector μcm' of distribution m of the noise adaptation model Mc'.

【００６５】ここで、変換行列Ｗm’とオフセットベク
トルｂm’は複数の分布で共有させるので、いくつかの
分布では同じ値の変換行列Ｗm’とオフセットベクトル
ｂm’を使用する。Because the transformation matrix Wm' and the offset vector bm' are shared among a plurality of distributions, some distributions use a transformation matrix Wm' and an offset vector bm' with the same values.

【００６６】また、変換行列Ｗm’とオフセットベクト
ルｂm’を共有させる分布の選択は、全平均ベクトルを
クラスタリングすることにより、雑音適応前の分布を元
に予め計算しておく。The distributions that share a transformation matrix Wm' and an offset vector bm' are selected in advance, based on the distributions before noise adaptation, by clustering all of the average vectors.

【００６７】また、全ての分布で変換行列Ｗm’とオフ
セットベクトルｂm’を共有する場合は、全分布に共通
の１種類の変換行列Ｗm’とオフセットベクトルｂm’を
求める。When the transformation matrix Wm' and the offset vector bm' are shared by all distributions, a single transformation matrix Wm' and offset vector bm' common to all distributions are obtained.

【００６８】また、上述した雑音適応３で用いた雑音モ
デルが１状態１混合でない場合は、雑音適応モデルＭ
c’の複数分布が初期音声モデルＭcの１つの分布に対応
することになるが、この場合は初期音声モデルＭcの１
つの分布に対応する全ての雑音適応モデルＭc’で、変
換行列Ｗm’とオフセットベクトルｂm’を共有する。When the noise model used by the noise adaptation unit 3 described above is not a one-state, one-mixture model, a plurality of distributions of the noise adaptation model Mc' correspond to one distribution of the initial speech model Mc. In this case, all distributions of the noise adaptation model Mc' that correspond to one distribution of the initial speech model Mc share the same transformation matrix Wm' and offset vector bm'.

【００６９】なお、ＭＬＬＲでは、一般に数発話分の発
話音声データを用いて計算を行うが、話者適応パラメー
タＰの分布間の共有情報を全発話で共通に用い、発話音
声データに対応する音響モデルは発話毎に雑音適応され
た雑音適応モデルＭc’を用いて計算する。MLLR is generally computed using speech data from several utterances. The information on which distributions share the speaker adaptation parameter P is used in common for all utterances, while the acoustic model corresponding to each utterance's speech data is the noise adaptation model Mc' noise-adapted for that utterance.

【００７０】このように、話者適応手法としてＭＬＬＲ
を用いる場合、話者適応パラメータ算出部４では、発話
内容が既知の発話音声の特徴ベクトル系列Ｖ(n)を用い
て、音響モデルＭc’の各分布の平均ベクトルを更新す
るための話者適応パラメータＰとして、変換行列Ｗm’
とオフセットベクトルｂm’を求める。Thus, when MLLR is used as the speaker adaptation method, the speaker adaptation parameter calculation unit 4 uses the feature vector sequence V(n) of uttered speech whose content is known to obtain the transformation matrix Wm' and the offset vector bm' as the speaker adaptation parameter P for updating the average vector of each distribution of the acoustic model Mc'.

【００７１】なお、上述したように、ＭＬＬＲで変換行
列Ｗm’とオフセットベクトルｂm’を算出する場合を述
べたが、ＭＡＰ（Maxmum A Posteriori）推定法を適用
することも可能である。Although the case where the transformation matrix Wm' and the offset vector bm' are calculated by MLLR has been described above, the MAP (Maximum A Posteriori) estimation method can also be applied.

【００７２】このＭＡＰ推定法を採用して、平均ベクト
ルμcm’を適応するためのパラメータＰを求めるには、
雑音適応モデルＭc’の平均ベクトルをＭＡＰ推定法に
より話者適応させ、そこから話者適応パラメータ算出部
４で、目的の話者適応パラメータＰに変換する。To obtain the parameter P for adapting the average vector μcm' with this MAP estimation method, the average vector of the noise adaptation model Mc' is speaker-adapted by MAP estimation, and the speaker adaptation parameter calculation unit 4 then converts the result into the desired speaker adaptation parameter P.

【００７３】このＭＡＰ推定法では、発話内容既知の発
話音声の特徴ベクトル系列Ｖ(n)の各フレームの特徴ベ
クトルと、雑音適応モデルＭc’の各分布との対応関係
をビタビ整合等により算出する。In this MAP estimation method, the correspondence between the feature vector of each frame of the feature vector sequence V(n) of uttered speech whose content is known and each distribution of the noise adaptation model Mc' is calculated by Viterbi alignment or the like.

【００７４】そして、雑音適応モデルＭc’の分布ｍに
対応するフレームの特徴ベクトルを集め、それをフレー
ム数で平均することで平均特徴ベクトルＶm~を求める。Then, the feature vectors of the frames corresponding to distribution m of the noise adaptation model Mc' are collected and averaged over the number of frames to obtain the average feature vector Vm~.
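The collection step can be sketched as follows, assuming the alignment has already been computed and is given as one distribution identifier per frame. The function name and data layout are hypothetical, and the alignment itself (for example by Viterbi) is outside this sketch.

```python
from collections import defaultdict


def per_distribution_means(frames, alignment):
    """Return {distribution id: (frame count n_m, average feature vector)}."""
    buckets = defaultdict(list)
    for frame, m in zip(frames, alignment):
        buckets[m].append(frame)
    stats = {}
    for m, fs in buckets.items():
        n_m = len(fs)
        dim = len(fs[0])
        stats[m] = (n_m, [sum(f[i] for f in fs) / n_m for i in range(dim)])
    return stats


# Hypothetical frames of V(n) and their aligned distributions.
frames = [[1.0, 1.0], [3.0, 3.0], [10.0, 0.0]]
alignment = ["a", "a", "b"]
stats = per_distribution_means(frames, alignment)
```

The frame count returned alongside each average is the quantity the MAP update needs next.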

【００７５】このときの、分布ｍに対応するフレームの
特徴ベクトルのフレーム数（個数）をｎm、分布ｍの重
み係数をτm’、分布ｍの平均ベクトルμcm’を話者適
応した更新平均ベクトルをμcm’＾とすると、その更新
平均ベクトルμcm’＾を次式(2)で表される関係に従っ
て算出する Let nm be the number of frames whose feature vectors correspond to distribution m, τm' the weighting coefficient of distribution m, and μcm'^ the updated average vector obtained by speaker-adapting the average vector μcm' of distribution m. The updated average vector μcm'^ is then calculated according to the relationship expressed by the following equation (2).

【００７６】[0076]

【数２】 [Equation 2]

【００７７】また、重み係数τm’も次式(3)で表される
関係に従って、発話毎に更新する。The weighting coefficient τm' is also updated for each utterance according to the relationship expressed by the following equation (3).

【００７８】[0078]

【数３】 [Equation 3]
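Equations (2) and (3) are not reproduced in this text. The sketch below uses the widely used MAP mean update, in which the prior mean is weighted by τ and the observed average by the frame count, and τ grows by the frame count; that these match the patent's equations exactly is an assumption. The function name `map_update` is hypothetical.

```python
def map_update(mu, tau, n, v_bar):
    """One MAP step for a single distribution (assumed standard form).

    mu    : current average vector (mu_cm')
    tau   : current weighting coefficient (tau_m')
    n     : number of frames aligned to this distribution (n_m)
    v_bar : average feature vector of those frames (Vm~)
    """
    denom = tau + n
    mu_hat = [(tau * m + n * v) / denom for m, v in zip(mu, v_bar)]
    tau_hat = tau + n  # assumed form of the weight update
    return mu_hat, tau_hat


mu_hat, tau_hat = map_update([0.0, 2.0], 1.0, 3, [4.0, 2.0])
```

A larger initial τ keeps the updated mean closer to the prior model; a small τ lets a few frames dominate.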

【００７９】そして、更新平均ベクトルμcm’＾で平均
ベクトルμcm’を置き換え、更に重み係数もτm’＾で
τm’を置き換えることで、発話がなされる度に平均ベ
クトルμcm’と重み係数τm’を夫々更新平均ベクトル
μcm’＾と重み係数τm’＾で順次に更新していく。Then, the average vector μcm' is replaced with the updated average vector μcm'^ and the weighting coefficient τm' is replaced with τm'^, so that each time an utterance is made, the average vector μcm' and the weighting coefficient τm' are successively updated with the updated average vector μcm'^ and the weighting coefficient τm'^, respectively.

【００８０】ここで、話者適応パラメータＰを話者適応
後のモデルと話者適応前のモデルとの差ベクトルと考え
ると、分布ｍの話者適応パラメータＰである差ベクトル
ｄm’は、次式(4)で表される。If the speaker adaptation parameter P is regarded as the difference vector between the model after speaker adaptation and the model before speaker adaptation, the difference vector dm', which is the speaker adaptation parameter P of distribution m, is expressed by the following equation (4).

【００８１】[0081]

【数４】 [Equation 4]

【００８２】この式(4)によると、更新平均ベクトルμc
m’＾を算出することなく差ベクトルｄm’を求めること
ができる。According to equation (4), the difference vector dm' can be obtained without calculating the updated average vector μcm'^.
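The reason the updated average vector is not needed can be seen with a short derivation. It assumes equation (2) has the standard MAP form, with the prior mean weighted by τm' and the observed average by nm; the equation images are not reproduced here, so this is a sketch and not the patent's own wording.

```latex
\begin{align}
d_m' &= \hat{\mu}_{cm}' - \mu_{cm}' \\
     &= \frac{\tau_m'\,\mu_{cm}' + n_m\,\tilde{V}_m}{\tau_m' + n_m} - \mu_{cm}' \\
     &= \frac{n_m\left(\tilde{V}_m - \mu_{cm}'\right)}{\tau_m' + n_m}
\end{align}
```

The last line depends only on the frame count, the weighting coefficient, the observed average and the current mean, so the difference vector can be computed directly.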

【００８３】そして、差ベクトルｄm’を後述の音声モ
デル更新部５に転送し、重み係数τm’は上記式(3)によ
り更新し、話者適応パラメータ算出部４に保持してお
く。なお、重み係数τm’の初期値は任意の値に選ぶこ
とができる。Then, the difference vector dm' is transferred to the speech model updating unit 5 described later, while the weighting coefficient τm' is updated by equation (3) above and held in the speaker adaptation parameter calculation unit 4. The initial value of the weighting coefficient τm' can be chosen arbitrarily.

【００８４】また、上述した雑音適応３で用いた雑音適
応モデルＭc’が１状態１混合でない場合は、雑音適応
モデルＭc’の複数分布が初期音声モデルＭcの１つの分
布に対応することになる。When the noise adaptation model Mc' used in the noise adaptation unit 3 described above is not a one-state, one-mixture model, a plurality of distributions of the noise adaptation model Mc' correspond to one distribution of the initial speech model Mc.

【００８５】例えば初期音声モデルＭcの分布ｍが雑音
適応モデルＭc’の分布ｍ1，ｍ2，……，ｍkに対応した
とする。そして雑音適応モデルの分布ｍ1に対応する、
上記式(4)より求まる話者適応パラメータをｄm1’、重
み係数をτm1’とすると、初期音声モデルＭcの分布ｍ
を更新するための話者適応パラメータｄm’を、次式(5)
で表される加算平均処理にて求める。For example, suppose that distribution m of the initial speech model Mc corresponds to distributions m1, m2, ..., mk of the noise adaptation model Mc'. Let dm1' be the speaker adaptation parameter obtained from equation (4) above for distribution m1 of the noise adaptation model, and τm1' its weighting coefficient. The speaker adaptation parameter dm' for updating distribution m of the initial speech model Mc is then obtained by the arithmetic averaging expressed by the following equation (5).

【００８６】[0086]

【数５】 [Equation 5]

【００８７】また、上記式(5)に重み係数τm1’で重み
付けした次式(6)で表される演算によって、ｋ個のパラ
メータを統合して話者適応パラメータｄm’を算出して
もよい。Alternatively, the speaker adaptation parameter dm' may be calculated by integrating the k parameters using the operation expressed by the following equation (6), in which equation (5) above is weighted by the weighting coefficients τm1'.

【００８８】[0088]

【数６】 [Equation 6]
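The two ways of merging the k difference vectors can be sketched as below. Equations (5) and (6) are not reproduced in this text; the plain mean follows the description of arithmetic averaging, while normalising the weighted version by the sum of the weighting coefficients is an assumption. Function names are hypothetical.

```python
def average_diff(diffs):
    """Plain arithmetic mean of k difference vectors (equation (5) style)."""
    k = len(diffs)
    return [sum(d[i] for d in diffs) / k for i in range(len(diffs[0]))]


def weighted_average_diff(diffs, taus):
    """Mean weighted by each distribution's tau (assumed equation (6) style)."""
    total = sum(taus)
    return [sum(t * d[i] for d, t in zip(diffs, taus)) / total
            for i in range(len(diffs[0]))]


# Two hypothetical difference vectors d_m1', d_m2' and their weights.
diffs = [[1.0, 0.0], [3.0, 4.0]]
plain = average_diff(diffs)
weighted = weighted_average_diff(diffs, [3.0, 1.0])
```

Weighting by τ gives more influence to distributions that have accumulated more adaptation evidence.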

【００８９】以上、話者適応手法としてＭＬＬＲとＭＡ
Ｐ推定法を用いる場合の話者適応パラメータ算出部４の
動作を説明した。The operation of the speaker adaptation parameter calculation unit 4 when MLLR or the MAP estimation method is used as the speaker adaptation method has been described above.

【００９０】なお、話者適応手法として、他の手法を講
じることもできる。Other methods can be adopted as the speaker adaptation method.

【００９１】ＭＬＬＲの変換行列Ｗm’とオフセットベ
クトルｂm’のように、話者適応処理により話者適応パ
ラメータＰが求まる話者適応手法を用いる場合は、それ
らの話者適応パラメータＰを用いることとし、また、Ｍ
ＡＰ推定法のように話者適応パラメータが直接使用でき
ないような場合には、雑音適応モデルＭc’に話者適応
を行った話者雑音適応モデルを考え、その雑音適応モデ
ルＭc’と雑音適応モデルＭc’の差を話者適応パラメー
タＰとして用いることで、様々な話者適応手法に対応す
ることが可能である。When a speaker adaptation method is used in which the speaker adaptation processing itself yields the speaker adaptation parameter P, such as the transformation matrix Wm' and the offset vector bm' of MLLR, those speaker adaptation parameters P are used. When the speaker adaptation parameters cannot be used directly, as with the MAP estimation method, a speaker-and-noise-adapted model obtained by applying speaker adaptation to the noise adaptation model Mc' is considered, and the difference between that model and the noise adaptation model Mc' is used as the speaker adaptation parameter P. In this way, various speaker adaptation methods can be accommodated.

【００９２】また、ここでは平均ベクトルを適応する場
合を例示したが、共分散行列を適応する場合にも応用可
能である。Although adaptation of the average vector has been illustrated here, the same approach is also applicable to adaptation of the covariance matrix.

【００９３】また、多くの話者適応手法では発話内容
（発話された単語や文が何であったのか）を知る必要が
ある。この場合は、音声認識処理を行う前に話者適応処
理のみを行い、その際、発話すべき内容は予め定めてお
き、定められた内容を話者に提示し、その提示に従って
発話してもらうようにすることで対処する。Many speaker adaptation methods need to know the utterance content (which words or sentences were spoken). In that case, only the speaker adaptation processing is performed before the speech recognition processing: the content to be uttered is determined in advance and presented to the speaker, who is asked to speak according to the presentation.

【００９４】話者適応では、話者の個人性への適応と共
に発話環境への適応も行われる。In speaker adaptation, adaptation to the speaking environment is performed as well as adaptation to the individuality of the speaker.

【００９５】背景雑音の無い環境でなされた発話を用い
て、背景雑音の無い環境で収録された音声データベース
を用いて学習された不特定話者モデルである初期音声モ
デルＭcを話者適応する場合は、背景雑音の影響を受け
ないので話者の個人性への適応のみを行う。When the initial speech model Mc, a speaker-independent model trained on a speech database recorded in an environment without background noise, is speaker-adapted using utterances made in an environment without background noise, there is no influence from background noise, so only adaptation to the individuality of the speaker takes place.

【００９６】しかし、話者適応に用いる発話が背景雑音
のある環境下でなされ、これを用いて上述の初期音声モ
デルＭcを話者適応すると、話者の個人性への適応と適
応発話時の背景雑音への適応が同時になされることにな
る。However, if the utterances used for speaker adaptation are made in an environment with background noise and are used to speaker-adapt the initial speech model Mc described above, adaptation to the individuality of the speaker and adaptation to the background noise at the time of the adaptation utterance take place simultaneously.

【００９７】このため、一般には話者適応後の話者適応
モデルを用いて音声認識を行うと、音声認識時の発話環
境が適応発話時と同じ雑音環境であれば高い認識率を得
ることができるが、認識を行う発話環境が適応発話時と
異なる場合必ずしも高い認識率を得られない可能性があ
る。Therefore, generally, when speech recognition is performed using the speaker adaptation model after speaker adaptation, a high recognition rate can be obtained if the speech environment at the time of speech recognition is the same noise environment as at the time of adaptive speech. However, if the utterance environment for recognition is different from that during adaptive utterance, a high recognition rate may not always be obtained.

【００９８】本発明では、かかる問題に対処すべく、話
者適応処理を行う前に上述のように雑音適応部３で雑音
適応を行うことにより、上述初期音声モデルＭcを適応
時の発話音声と同じ背景雑音環境に適応させた雑音適応
モデルＭc’を生成し、そして、話者適応パラメータ算
出部４において、その雑音適応モデルＭc’を用いて話
者適応処理を行い、話者適応パラメータＰを算出する。To deal with this problem, in the present invention the noise adaptation unit 3 performs noise adaptation as described above before speaker adaptation processing, thereby generating the noise adaptation model Mc', in which the initial speech model Mc is adapted to the same background noise environment as the utterances used for adaptation. The speaker adaptation parameter calculation unit 4 then performs speaker adaptation processing using this noise adaptation model Mc' and calculates the speaker adaptation parameter P.

【００９９】尚、雑音適応モデルＭc’は話者適応処理
を行う前に、既に適応用発話環境と同じ背景雑音に適応
しているため、話者適応処理により求まる話者適応パラ
メータＰからは背景雑音適応項の影響が軽減され、本来
の目的である話者の個人性への適応項を多く含むものと
なる。Because the noise adaptation model Mc' has already been adapted to the same background noise as the adaptation utterance environment before speaker adaptation processing is performed, the influence of the background noise adaptation term on the speaker adaptation parameter P obtained by the speaker adaptation processing is reduced, and the parameter consists largely of the term for adaptation to the individuality of the speaker, which is the original purpose.

【０１００】この話者適応パラメータＰを用いて、後述
する音声モデル更新部５が初期音声モデルを更新するこ
とで、適応発話時の背景雑音の影響の少ない話者適応モ
デルＭc”を生成する。The speech model updating unit 5, described later, updates the initial speech model using this speaker adaptation parameter P, thereby generating a speaker adaptation model Mc" that is little affected by the background noise at the time of the adaptation utterance.

【０１０１】音声モデル更新部５は、話者適応モデル記
憶部２に記憶されている初期音声モデルＭcを、話者適
応パラメータ算出部４の出力する話者適応パラメータＰ
を用いて話者適応モデルＭc”に変換する。The speech model updating unit 5 converts the initial speech model Mc stored in the speaker adaptation model storage unit 2 into the speaker adaptation model Mc" using the speaker adaptation parameter P output by the speaker adaptation parameter calculation unit 4.

【０１０２】なお、上述のＭＬＬＲとＭＡＰ推定法を採
用し、初期音声モデルＭcの分布ｍの平均ベクトルμcm
を更新する場合の音声モデル更新部５の機能を説明する
こととする。The function of the speech model updating unit 5 is described below for the case where the MLLR and MAP estimation methods described above are adopted and the average vector μcm of distribution m of the initial speech model Mc is updated.

【０１０３】上述のように、話者適応パラメータ算出部
４で話者適応処理としてＭＬＬＲを用い、話者適応パラ
メータＰとして変換行列Ｗm’とオフセットベクトルｂ
m’を用いる場合、話者適応更新後の話者適応モデルＭ
c”の分布ｍの平均ベクトルμcm”は、次式（7）の関係
から求められる。As described above, when the speaker adaptation parameter calculation unit 4 uses MLLR for speaker adaptation processing and the transformation matrix Wm' and the offset vector bm' are used as the speaker adaptation parameter P, the average vector μcm" of distribution m of the speaker adaptation model Mc" after the speaker adaptation update is obtained from the relationship of the following equation (7).

【０１０４】[0104]

【数７】 [Equation 7]
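Equation (7) is not reproduced in this text. The sketch below assumes the usual affine MLLR mean transform, applied to the average vector of the initial speech model Mc with a transform estimated on the noise adaptation model Mc', and assumes the sharing of transforms is represented by a simple lookup from distribution to cluster. All names are hypothetical.

```python
def apply_mllr(mu, W, b):
    """Return W @ mu + b for one average vector (plain lists of floats)."""
    return [sum(w_ij * x for w_ij, x in zip(row, mu)) + b_i
            for row, b_i in zip(W, b)]


def update_means(initial_means, class_of, transforms):
    """Apply each distribution's shared (W, b) to its initial average vector."""
    return {m: apply_mllr(mu, *transforms[class_of[m]])
            for m, mu in initial_means.items()}


# Two hypothetical distributions sharing one transform (cluster 0).
initial_means = {"m1": [1.0, 2.0], "m2": [0.0, 1.0]}
class_of = {"m1": 0, "m2": 0}
transforms = {0: ([[2.0, 0.0], [0.0, 1.0]], [0.5, -1.0])}
adapted = update_means(initial_means, class_of, transforms)
```

The point of the design is that the transform is estimated against the noise-adapted model but applied to the model stored without noise adaptation.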

【０１０５】また、話者適応パラメータ算出部４で話者
適応処理としてＭＡＰ推定法を用い、話者適応パラメー
タＰとして差分ベクトルｄm’を用いる場合、平均ベク
トルμcm”は、次式（8）の関係から求められる。When the speaker adaptation parameter calculation unit 4 uses the MAP estimation method for speaker adaptation processing and the difference vector dm' is used as the speaker adaptation parameter P, the average vector μcm" is obtained from the relationship of the following equation (8).

【０１０６】[0106]

【数８】 [Equation 8]
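Equation (8) is not reproduced in this text. Since the parameter is described as a difference vector, the sketch below assumes an additive update: the shift that MAP estimation would apply to the noise-adapted average is computed, then added to the average vector of the initial speech model. The form of the shift assumes the standard MAP mean update, and the function names are hypothetical.

```python
def map_difference(mu_noisy, tau, n, v_bar):
    """Shift MAP would apply to the noise-adapted mean: n (v_bar - mu') / (tau + n)."""
    return [n * (v - m) / (tau + n) for v, m in zip(v_bar, mu_noisy)]


def update_initial_mean(mu_initial, d):
    """Assumed equation (8) style update: add the shift to the initial mean."""
    return [m + dx for m, dx in zip(mu_initial, d)]


# Hypothetical values: the noise-adapted mean, its statistics, and the initial mean.
d = map_difference([1.0, 1.0], 1.0, 3, [5.0, 1.0])
mu_updated = update_initial_mean([0.0, 0.5], d)
```

Only the shift is carried over to the stored model, which is how the noise present during adaptation is kept out of it.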

【０１０７】いずれの場合も、平均ベクトルμcm”は上
述のように、適応発話時の背景雑音の影響が少なく話者
の個人性への適応がなされた平均ベクトルとなる。In either case, the average vector μcm" is, as described above, an average vector that has been adapted to the individuality of the speaker with little influence from the background noise at the time of the adaptation utterance.

【０１０８】そして、上記のように音声モデル更新部５
が話者適応モデル記憶部２に記憶された音声モデルＭc
を、話者適応パラメータ生成部４の出力する話者適応パ
ラメータＰを用いて更新し、その更新した話者適応モデ
ルＭc”を話者適応モデル記憶部２に更新記憶させる。
すなわち、音声認識に際して、話者適応モデルＭc”を
音声モデルＭcとして使用すべく、更新記憶させる。Then, as described above, the speech model updating unit 5 updates the speech model Mc stored in the speaker adaptation model storage unit 2 using the speaker adaptation parameter P output by the speaker adaptation parameter generation unit 4, and stores the updated speaker adaptation model Mc" in the speaker adaptation model storage unit 2 in place of the previous contents. That is, the speaker adaptation model Mc" is stored so that it is used as the speech model Mc during speech recognition.

【０１０９】認識処理部９は、音声認識処理を行うため
に設けられている。すなわち、音声認識の際、雑音適応
部３が、話者適応音声記憶部２に更新記憶されることと
なった話者適応モデルＭc（すなわち、話者適応モデル
Ｍc”）に対して、認識発話雑音環境下での背景雑音の
特徴ベクトル系列Ｎ(n)で雑音適応を施すことにより、
雑音適応を施した話者適応モデルＭreg生成し、その話
者適応モデルＭregを認識処理部９に供給する。The recognition processing unit 9 is provided to perform speech recognition processing. During speech recognition, the noise adaptation unit 3 applies noise adaptation, using the feature vector sequence N(n) of the background noise in the noise environment of the recognition utterance, to the speaker adaptation model Mc that has been updated and stored in the speaker adaptation model storage unit 2 (that is, the speaker adaptation model Mc"). It thereby generates the noise-adapted speaker adaptation model Mreg and supplies it to the recognition processing unit 9.

【０１１０】そして、認識処理部９は、その雑音適応さ
れた話者適応モデルＭregを用いて構成した系列と、音
響分析部６側から供給される認識すべき発話音声の特徴
ベクトル系列Ｖ(n)とその認識候補単語や文のモデルと
を照合して、最も大きな尤度となる話者適応モデルＭre
gを用いて構成した系列を認識結果として出力する。The recognition processing unit 9 then collates sequences constructed from the noise-adapted speaker adaptation model Mreg, serving as models of the candidate words or sentences, against the feature vector sequence V(n) of the uttered speech to be recognized, supplied from the acoustic analysis unit 6 side, and outputs as the recognition result the sequence constructed from the speaker adaptation model Mreg that yields the highest likelihood.
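The selection of the highest-likelihood candidate can be illustrated with a deliberately simplified sketch. It assumes diagonal-covariance Gaussians and a fixed alignment of one state per frame; an actual recogniser would search over alignments, for example with Viterbi decoding over HMMs. The function names and the candidate data layout are hypothetical.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - mi) ** 2 / v)
               for xi, mi, v in zip(x, mean, var))


def recognize(frames, candidates):
    """Return the label whose state sequence gives the highest log-likelihood.

    candidates maps a label to a list of (mean, var) pairs, one per frame;
    the fixed one-state-per-frame alignment is a simplification.
    """
    def score(states):
        return sum(log_gauss(x, m, v) for x, (m, v) in zip(frames, states))
    return max(candidates, key=lambda label: score(candidates[label]))


# Hypothetical one-dimensional frames and two candidate sequences.
frames = [[0.0], [1.0]]
candidates = {
    "yes": [([0.0], [1.0]), ([1.0], [1.0])],
    "no": [([5.0], [1.0]), ([5.0], [1.0])],
}
best = recognize(frames, candidates)
```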

【０１１１】ここで、音声認識時に使用される上記の雑
音適応された話者適応モデルＭregは、上述のように話
者の個人性への適応がなされ、且つ認識発話時の背景雑
音への適応もなされたものとなる。The noise-adapted speaker adaptation model Mreg used during speech recognition is thus adapted both to the individuality of the speaker, as described above, and to the background noise at the time of the recognition utterance.

【０１１２】このため、音声認識時の背景雑音環境と適
応発話時の背景雑音環境が異なっていても、音声認識時
には高い認識性能を得ることが可能である。Therefore, even if the background noise environment during voice recognition is different from the background noise environment during adaptive utterance, high recognition performance can be obtained during voice recognition.

【０１１３】次に、図２のフローチャートを参照して本
音声認識装置の動作を説明する。なお、図２は、話者適
応時の動作を示している。Next, the operation of this speech recognition apparatus is described with reference to the flowchart of FIG. 2. FIG. 2 shows the operation during speaker adaptation.

【０１１４】図２において話者適応の処理を開始する
と、まずステップＳ１００において、初期音声モデル記
憶部１に記憶されている初期音声モデルＭcを話者適応
モデル記憶部２に複写した後、雑音適応部３がその初期
音声モデルＭcに雑音適応を施すことにより、雑音適応
モデルＭc’を生成する。When speaker adaptation processing starts in FIG. 2, first, in step S100, the initial speech model Mc stored in the initial speech model storage unit 1 is copied to the speaker adaptation model storage unit 2, and the noise adaptation unit 3 then applies noise adaptation to the initial speech model Mc to generate the noise adaptation model Mc'.

【０１１５】すなわち、話者適応時の非発話期間に収音
される背景雑音の特徴ベクトル系列Ｎ(n)が音響分析部
６から雑音適応部３に供給され、雑音適応部３がその特
徴ベクトル系列Ｎ(n)によって初期音声モデルＭcに雑音
適応を施すことにより、雑音適応モデルＭc’を生成
し、話者適応パラメータ算出部４へ送出する。That is, the feature vector sequence N(n) of the background noise picked up in the non-utterance period during speaker adaptation is supplied from the acoustic analysis unit 6 to the noise adaptation unit 3, which applies noise adaptation to the initial speech model Mc using this feature vector sequence N(n), generates the noise adaptation model Mc', and sends it to the speaker adaptation parameter calculation unit 4.

【０１１６】次に、ステップＳ１０２において、話者が
発話を開始すると切替スイッチ８が話者適応パラメータ
算出部４側に切り替わり、その発話期間内に、背景雑音
の重畳した発話音声（背景雑音重畳音声）の特徴ベクト
ル系列Ｖ(n)が音声分析部６から話者適応パラメータ算
出部４に供給される。Next, in step S102, when the speaker starts speaking, the changeover switch 8 switches to the speaker adaptation parameter calculation unit 4 side, and during the utterance period the feature vector sequence V(n) of the uttered speech on which background noise is superimposed (background-noise-superimposed speech) is supplied from the voice analysis unit 6 to the speaker adaptation parameter calculation unit 4.

【０１１７】そして、話者適応パラメータ算出部４がこ
れらの特徴ベクトル系列Ｖ(n)と雑音適応モデルＭc’に
よって、話者適応パラメータＰを生成する。Then, the speaker adaptation parameter calculation unit 4 generates the speaker adaptation parameter P from this feature vector sequence V(n) and the noise adaptation model Mc'.

【０１１８】つまり、既述したＭＬＬＲやＭＡＰ推定法
を適応して話者適応パラメータＰを求める場合には、変
換行列Ｗm’とオフセットベクトルｂm’を話者適応パラ
メータＰとして生成する。That is, when the speaker adaptation parameter P is obtained by applying the MLLR or MAP estimation method described above, the transformation matrix Wm' and the offset vector bm' are generated as the speaker adaptation parameter P.

【０１１９】次に、ステップＳ１０４において、音声モ
デル更新部５が、話者適応モデル記憶部２に記憶されて
いる初期音声モデルＭcと話者適応パラメータＰとを用
いて、モデル更新演算を行うことで、話者適応モデルＭ
c”を求める。Next, in step S104, the speech model updating unit 5 performs the model update calculation using the initial speech model Mc stored in the speaker adaptation model storage unit 2 and the speaker adaptation parameter P, thereby obtaining the speaker adaptation model Mc".

【０１２０】次に、ステップＳ１０６において、音声モ
デル更新部５が、話者適応モデル記憶部２に記憶されて
いる初期音声モデルＭcに代えて、話者適応モデルＭc”
を更新記憶させた後、話者適応の処理を終了する。Next, in step S106, the speech model updating unit 5 stores the speaker adaptation model Mc" in the speaker adaptation model storage unit 2 in place of the initial speech model Mc stored there, and the speaker adaptation processing ends.
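The order of steps S100 to S106 can be summarised in a short control-flow sketch. The three processing stages are passed in as callables because their details depend on the chosen noise and speaker adaptation methods; the function name, the dictionary used as the model store, and the scalar toy stand-ins are all hypothetical.

```python
def run_speaker_adaptation(store, noise_frames, speech_frames,
                           noise_adapt, estimate_params, update_model):
    """Steps S100 to S106, with the processing stages supplied by the caller."""
    mc = store["model"]                                # initial model Mc
    mc_noisy = noise_adapt(mc, noise_frames)           # S100: Mc -> Mc'
    params = estimate_params(mc_noisy, speech_frames)  # S102: P from V(n) and Mc'
    mc_adapted = update_model(mc, params)              # S104: Mc and P -> Mc''
    store["model"] = mc_adapted                        # S106: overwrite stored Mc
    return mc_adapted


# Toy stand-ins on scalar "models", purely to exercise the control flow.
store = {"model": 1.0}
result = run_speaker_adaptation(
    store,
    noise_frames=[2.0, 2.0],
    speech_frames=[4.0, 6.0],
    noise_adapt=lambda m, nf: m + sum(nf) / len(nf),
    estimate_params=lambda m_noisy, sf: sum(sf) / len(sf) - m_noisy,
    update_model=lambda m, p: m + p,
)
```

Note that `update_model` receives the stored model and not the noise-adapted one, mirroring step S104.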

【０１２１】そして、この話者適応処理の後、認識処理
部９が音声認識の処理を行う際には、話者適応モデル記
憶部２に記憶された話者適応モデルＭc”を更新された
初期音声モデルＭcとして利用することとなり、その更
新された初期音声モデルＭc（別言うすれば、話者適応
モデルＭc”）を雑音適応部３が雑音適応することで、
雑音適応を施した話者適応モデルＭregを生成して音声
認識部９に供給し、更に、音声認識部９がその話者適応
モデルＭregより構成される系列と音響分析部６からの
話者音声の特徴ベクトル系列Ｖ(n)とを照合する。そし
て、最も高い尤度の得られる話者適応モデルＭregより
構成される話者適応系列を認識結果として出力する。After this speaker adaptation processing, when the recognition processing unit 9 performs speech recognition, the speaker adaptation model Mc" stored in the speaker adaptation model storage unit 2 is used as the updated initial speech model Mc. The noise adaptation unit 3 applies noise adaptation to this updated initial speech model Mc (in other words, the speaker adaptation model Mc") to generate the noise-adapted speaker adaptation model Mreg and supplies it to the speech recognition unit 9. The speech recognition unit 9 then collates sequences composed of the speaker adaptation model Mreg against the feature vector sequence V(n) of the speaker's speech from the acoustic analysis unit 6, and outputs as the recognition result the speaker adaptation sequence, composed of the speaker adaptation model Mreg, that yields the highest likelihood.

【０１２２】このように本実施形態の音声認識装置によ
れば、話者適応の処理を行う前に雑音適応の処理を行う
ので、その話者適応処理に際して求まる話者適応パラメ
ータに対して、話者適応時の背景雑音の悪影響を低減す
ることができる。As described above, the speech recognition apparatus of this embodiment performs noise adaptation processing before speaker adaptation processing, which reduces the adverse effect of the background noise present during speaker adaptation on the speaker adaptation parameter obtained in the speaker adaptation processing.

【０１２３】そして、この背景雑音の悪影響が低減され
た話者適応パラメータを用いて話者適応モデルＭc”を
生成するので、話者適応本来の目的すなわち話者適応効
果の高い話者適応モデルＭc”を生成することが可能で
ある。Because the speaker adaptation model Mc" is generated using a speaker adaptation parameter in which the adverse effect of background noise is reduced, it is possible to generate a speaker adaptation model Mc" that serves the original purpose of speaker adaptation, that is, one with a high speaker adaptation effect.

【０１２４】さらに音声認識時には、更新記憶されてい
る話者適応モデルＭc”を、その認識発話時の背景雑音
で雑音適応して用いる。Further, at the time of speech recognition, the updated and stored speaker adaptation model Mc" is noise-adapted with the background noise at the time of the recognition utterance before use.

【０１２５】このため、話者の個人性と発話時の背景雑
音の双方に適応したモデルを用いて認識を行うことが可
能であり、その結果高い認識性能が得られる。（第２の実施の形態）次に、本発明の第２の実施形態を
図３及び図４を参照して説明する。尚、図３は本実施形
態の音声認識装置の構成を示す図であり、図１と同一又
は相当する部分を同一符号で示している。また、本実施
形態は、音声認識の処理中に話者適応を行う。そこで、
図３中にし示す信号の通過経路を全て矢印付きの実線で
示している。Therefore, recognition can be performed using a model adapted to both the individuality of the speaker and the background noise at the time of utterance, and high recognition performance is obtained as a result.

(Second Embodiment) Next, a second embodiment of the present invention is described with reference to FIGS. 3 and 4. FIG. 3 shows the configuration of the speech recognition apparatus of this embodiment; parts that are the same as or correspond to those in FIG. 1 are denoted by the same reference numerals. This embodiment performs speaker adaptation during speech recognition processing. Accordingly, all signal paths shown in FIG. 3 are drawn as solid lines with arrows.

【０１２６】図３において、本音声認識装置と第１の実
施形態の音声認識装置との差異を述べると、第１の実施
形態の音声認識装置では、話者適応を行った後に音声認
識を行うのに対し、本実施形態の音声認識装置は、音声
認識中に話者適応の処理を同時に行うようになってい
る。Referring to FIG. 3, the difference between the present speech recognition apparatus and the speech recognition apparatus of the first embodiment will be described. In the speech recognition apparatus of the first embodiment, speech recognition is performed after speaker adaptation. On the other hand, the voice recognition apparatus of this embodiment is designed to simultaneously perform speaker adaptation processing during voice recognition.

[0127] Furthermore, the noise adaptation model Mc' output from the noise adaptation unit 3 is sent to the speaker adaptation parameter calculation unit 4 for speaker adaptation. In addition, once the contents of the speaker adaptation model storage unit 2 have been updated with the speaker adaptation model Mc", the noise adaptation model Mc' is sent to the recognition processing unit 9 as the noise-adapted speaker adaptation model Mreg shown in FIG. 1.

[0128] Accordingly, the noise adaptation model Mc' shown in FIG. 3 is output from the noise adaptation unit 3 to both the speaker adaptation parameter calculation unit 4 and the recognition processing unit 9. It is output to the speaker adaptation parameter calculation unit 4 as the noise adaptation model Mc' for speaker adaptation processing, and to the recognition processing unit 9 as the noise-adapted speaker adaptation model Mreg for speech recognition processing.

[0129] The recognition processing unit 9 collates sequences composed of the noise adaptation model Mc' (that is, the noise-adapted speaker adaptation model Mreg), which the noise adaptation unit 3 generates by treating the speaker adaptation model Mc" described above as the initial speech model Mc, against the feature vector sequence V(n) of the uttered speech to be recognized. It outputs, as the recognition result, the sequence composed of the speaker adaptation model Mreg that yields the greatest likelihood. It further generates, from that likelihood, score data SCR indicating the similarity between the recognition result and the uttered speech, and outputs the score data together with the recognition result.

[0130] That is, when the above collation yields a high likelihood, the recognition processing unit 9 outputs the recognition result together with score data SCR indicating that the reliability of the speech recognition result is high. When a high likelihood is not obtained, it outputs the recognition result together with score data SCR indicating that the reliability is low. In either case the output is supplied to the speaker adaptation parameter calculation unit 4.

[0131] When the speaker adaptation parameter calculation unit 4 is supplied with the recognition result and score data SCR indicating that the reliability of the speech recognition result is high, it judges that the uttered speech was recognized correctly. It then generates a speaker adaptation parameter P for speaker adaptation from the feature vector sequence V(n) of the uttered speech being recognized and the noise adaptation model Mc' from the noise adaptation unit 3.

[0132] Furthermore, the speech model update unit 5 generates the speaker adaptation model Mc" using the speaker adaptation parameter P and the initial speech model Mc stored in the speaker adaptation model storage unit 2, and supplies Mc" to the speaker adaptation model storage unit 2, where it is stored in place of the speech model Mc.
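The point of paragraphs [0131] and [0132] is that the parameter P is estimated against the noise-adapted model Mc' but applied to the clean model Mc. The following is a minimal sketch of that idea, not the patented implementation. It assumes that P is a single global bias vector on Gaussian mean vectors and that a frame-to-state alignment is available from the recognition result; this passage does not specify the form of P, and the function names `estimate_bias` and `apply_bias` are hypothetical.

```python
def estimate_bias(frames, alignment, noisy_means):
    """Estimate P as the average offset between observed frames V(n)
    and the noise-adapted means Mc' of the states they are aligned to.

    frames      -- list of feature vectors V(n)
    alignment   -- list of state indices, one per frame (assumed given)
    noisy_means -- list of mean vectors of the noise-adapted model Mc'
    """
    dim = len(frames[0])
    total = [0.0] * dim
    for frame, state in zip(frames, alignment):
        for d in range(dim):
            total[d] += frame[d] - noisy_means[state][d]
    return [t / len(frames) for t in total]


def apply_bias(clean_means, bias):
    """Apply P to the clean model Mc (not to Mc') to obtain Mc"."""
    return [[m + b for m, b in zip(mean, bias)] for mean in clean_means]
```

Because the offset is measured relative to Mc', the noise component common to the frames and to Mc' largely cancels out of P, which is why applying P to the clean means Mc is reasonable in this simplified setting.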

[0133] Therefore, the more speech recognition processing this apparatus performs, the more closely the initial speech model Mc stored in the speaker adaptation model storage unit 2 becomes adapted to the individuality of the speaker.

[0134] Next, the operation of this speech recognition apparatus will be described with reference to the flowchart shown in FIG. 4.

[0135] When speech recognition processing starts in FIG. 4, first, in step S200, the noise adaptation unit 3 generates the noise adaptation model Mc' by applying noise adaptation to the initial speech model Mc stored in the speaker adaptation model storage unit 2.

[0136] That is, the feature vector sequence N(n) of the background noise picked up during the non-utterance period, before the speaker has begun speaking, is supplied from the acoustic analysis unit 6 to the noise adaptation unit 3. The noise adaptation unit 3 generates the noise adaptation model Mc' by noise-adapting the initial speech model Mc with that feature vector sequence N(n).
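As an illustration of step S200, the sketch below derives noise-adapted means from clean means and the noise frames N(n) collected before the utterance. It is a simplification under stated assumptions, not the method of the patent: features are assumed to be log-spectral, only Gaussian means are adapted, and the log-add approximation log(exp(mean) + exp(noise)) stands in for whatever noise adaptation technique the noise adaptation unit 3 actually uses. The function names are hypothetical.

```python
import math


def mean_vector(frames):
    """Average a list of equal-length feature vectors."""
    count = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / count for d in range(dim)]


def noise_adapt(clean_means, noise_frames):
    """Return noise-adapted means (Mc') from clean means (Mc) and noise frames N(n).

    Assumes log-spectral features, so speech and noise add in the
    linear domain: mean' = log(exp(mean) + exp(noise_mean)).
    """
    noise_mean = mean_vector(noise_frames)
    return [
        [math.log(math.exp(m) + math.exp(n)) for m, n in zip(mean, noise_mean)]
        for mean in clean_means
    ]
```

With cepstral features, a real system would need to transform to and from the log-spectral domain around this step; that is omitted here.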

[0137] Next, in step S202, when the speaker begins speaking, the changeover switch 8 switches to the recognition processing unit 9 side, and the feature vector sequence V(n) of the speech uttered during the utterance period is supplied from the acoustic analysis unit 6 to the recognition processing unit 9.

[0138] The recognition processing unit 9 then generates recognition candidate word models and recognition candidate sentence models using the noise adaptation model Mc' generated by the noise adaptation unit 3.

[0139] Then, in the next step S204, the recognition processing unit 9 performs speech recognition by collating the recognition candidate word models and recognition candidate sentence models with the feature vector sequence V(n), and outputs the recognition result and the score data SCR.

[0140] Next, in step S206, the speaker adaptation parameter calculation unit 4 determines whether the score data SCR indicates a high score. If the score is not high ("NO"), it judges that the reliability of the recognition result is low and proceeds to step S214, described later. If the score is high ("YES"), it proceeds to step S208.

[0141] In step S208, the speaker adaptation parameter calculation unit 4 generates the speaker adaptation parameter P for speaker adaptation from the feature vector sequence V(n) of the uttered speech currently being recognized, the noise adaptation model Mc', and the recognition result.

[0142] Next, in step S210, the speech model update unit 5 obtains the speaker adaptation model Mc" by performing speaker adaptation processing using the initial speech model Mc stored in the speaker adaptation model storage unit 2 and the speaker adaptation parameter P.

[0143] Further, in step S212, the speech model update unit 5 supplies the generated speaker adaptation model Mc" to the speaker adaptation model storage unit 2, where it is stored in place of the speech model Mc, and the processing then ends.
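The flow of steps S200 to S212 can be summarized as a confidence-gated update loop. The sketch below is illustrative only: the four callables, the dictionary-based `store`, and the numeric `threshold` are assumptions introduced here, since the description says only that adaptation proceeds when the score data SCR indicates a high score.

```python
def recognize_and_adapt(store, noise_frames, speech_frames,
                        noise_adapt, recognize, estimate_params,
                        update_model, threshold):
    """Run one utterance through recognition with gated speaker adaptation.

    store           -- dict holding the current clean model under "model"
    noise_adapt     -- callable(model, noise_frames) -> noise-adapted model
    recognize       -- callable(model, speech_frames) -> (result, score)
    estimate_params -- callable(model, speech_frames, result) -> P
    update_model    -- callable(clean_model, P) -> speaker-adapted model
    threshold       -- hypothetical cutoff for a "high" score
    """
    # S200: adapt the stored model to the noise observed before the utterance.
    noisy_model = noise_adapt(store["model"], noise_frames)
    # S202 to S204: recognize the utterance with the noise-adapted model.
    result, score = recognize(noisy_model, speech_frames)
    # S206: adapt only when the recognition result is judged reliable.
    if score >= threshold:
        # S208: estimate P against the noise-adapted model.
        params = estimate_params(noisy_model, speech_frames, result)
        # S210 to S212: apply P to the stored clean model and overwrite it.
        store["model"] = update_model(store["model"], params)
    return result, score
```

Each call overwrites the stored model only on a high score, so repeated utterances progressively adapt the stored model while low-confidence results leave it unchanged.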

[0144] Thus, the speech recognition apparatus of this embodiment performs speech recognition and speaker adaptation concurrently, generates a speaker adaptation model Mc" that is highly adapted to the individuality of the speaker, and stores it in the speaker adaptation model storage unit 2 as an update.

[0145] Consequently, as many different words and sentences are uttered and the recognition processing unit 9 cumulatively recognizes them, the initial speech model Mc stored in the speaker adaptation model storage unit 2 is progressively updated to a speaker adaptation model Mc" that is highly adapted to the individuality of the speaker, which makes it possible to improve speech recognition performance.

[0146] In addition, because the speaker adaptation model Mc" is generated and the initial speech model Mc is updated when a high score is obtained, appropriate speaker adaptation can be performed according to conditions such as the utterance environment. Inappropriate speaker adaptation that would degrade speech recognition performance is prevented, and speech recognition performance is improved as a result.

[0147] Also in the speech recognition apparatus of this embodiment, which performs speech recognition and speaker adaptation concurrently, the noise adaptation unit 3 performs noise adaptation before the speaker adaptation unit performs speaker adaptation, as in the first embodiment described above. This provides the excellent effect of reducing the adverse influence of background noise present at speaker adaptation time on the speaker adaptation parameter P obtained in the speaker adaptation processing.

【０１４８】[0148]

[Effects of the Invention] As described above, according to the speech recognition apparatus and speech recognition method of the present invention, a noise adaptation model is generated by applying noise adaptation to the initial speech model, a speaker adaptation parameter is obtained by performing a speaker adaptation operation on this noise adaptation model, and a speaker adaptation model is generated by applying speaker adaptation with this parameter to the initial speech model as it was before noise adaptation. This reduces the adverse effect of background noise at speaker adaptation time and makes it possible to generate a speaker adaptation model that is highly adapted to the individuality of the speaker, which is the original purpose of speaker adaptation.

[0149] In addition, at recognition time, noise adaptation is applied to the speaker-adapted model described above to generate a noise-adapted speaker adaptation model, and speech recognition is performed using that model. Speech recognition can therefore use a model adapted to both the background noise at the time of the utterance to be recognized and the individuality of the speaker, and high recognition performance can be obtained in a variety of noisy utterance environments.

【図面の簡単な説明】[Brief description of drawings]

FIG. 1 is a diagram showing the configuration of the speech recognition apparatus of the first embodiment.

FIG. 2 is a flowchart showing the operation of the speech recognition apparatus of the first embodiment.

FIG. 3 is a diagram showing the configuration of the speech recognition apparatus of the second embodiment.

FIG. 4 is a flowchart showing the operation of the speech recognition apparatus of the second embodiment.

【符号の説明】[Explanation of symbols]

1 ... Initial speech model storage unit
2 ... Speaker adaptation model storage unit
3 ... Noise adaptation unit
4 ... Speaker adaptation parameter generation unit
5 ... Speech model update unit
6 ... Acoustic analysis unit
7 ... Microphone
8 ... Changeover switch
9 ... Recognition processing unit

Claims

【特許請求の範囲】[Claims]

[Claim 1] A speech recognition apparatus comprising: storage means holding an initial speech model; noise adaptation means for generating a noise adaptation model by applying noise adaptation to the initial speech model in the storage means using background noise present at speaker adaptation time; speaker adaptation parameter calculation means for performing a speaker adaptation operation on the noise adaptation model generated by the noise adaptation means, using speech uttered at the speaker adaptation time, and calculating a speaker adaptation parameter for converting the noise adaptation model into a noise-superimposed speaker adaptation model; and speech model update means for generating a speaker adaptation model by applying speaker adaptation with the speaker adaptation parameter to the initial speech model in the storage means, and storing the speaker adaptation model in the storage means in place of the initial speech model.

[Claim 2] The speech recognition apparatus according to claim 1, further comprising recognition processing means for performing speech recognition processing at speech recognition time, wherein the noise adaptation means generates a noise-adapted speaker adaptation model by applying noise adaptation to the speaker adaptation model stored as an update in the storage means, using background noise in a non-utterance period at the speech recognition time, and supplies the noise-adapted speaker adaptation model to the speech recognition means as an acoustic model for recognizing uttered speech.

[Claim 3] A speech recognition apparatus comprising: storage means holding an initial speech model; noise adaptation means for generating a noise adaptation model by applying noise adaptation to the initial speech model in the storage means using background noise in a non-utterance period at speech recognition time; recognition processing means for performing speech recognition by collating uttered speech to be recognized, uttered in an utterance period at the speech recognition time, with the noise adaptation model generated by the noise adaptation means; speaker adaptation parameter calculation means for performing a speaker adaptation operation on the noise adaptation model generated by the noise adaptation means, using the uttered speech to be recognized, and calculating a speaker adaptation parameter for converting the noise adaptation model into a noise-superimposed speaker adaptation model; and speech model update means for generating a speaker adaptation model by applying speaker adaptation with the speaker adaptation parameter to the initial speech model in the storage means, and storing the speaker adaptation model in the storage means in place of the initial speech model.

[Claim 4] The speech recognition apparatus according to claim 3, wherein the speaker adaptation parameter calculation means and the speech model update means generate the speaker adaptation model and store it in the storage means in place of the initial speech model when the reliability of the recognition result of the recognition processing means is high.

[Claim 5] A speech recognition method comprising: a noise adaptation processing step of generating a noise adaptation model by applying noise adaptation to an initial speech model stored in storage means using background noise present at speaker adaptation time; a speaker adaptation parameter calculation processing step of performing a speaker adaptation operation on the noise adaptation model generated in the noise adaptation processing step, using speech uttered at the speaker adaptation time, and calculating a speaker adaptation parameter for converting the noise adaptation model into a noise-superimposed speaker adaptation model; and a speech model update processing step of generating a speaker adaptation model by applying speaker adaptation with the speaker adaptation parameter to the initial speech model in the storage means, and storing the speaker adaptation model in the storage means in place of the initial speech model.

[Claim 6] The speech recognition method according to claim 5, wherein the noise adaptation processing step further generates a noise-adapted speaker adaptation model by applying noise adaptation to the speaker adaptation model stored as an update in the storage means, using background noise in a non-utterance period at speech recognition time, the method further comprising a speech recognition processing step of performing speech recognition by collating the noise-adapted speaker adaptation model with uttered speech to be recognized in an utterance period at the speech recognition time.

[Claim 7] A speech recognition apparatus comprising: a noise adaptation processing step of generating a noise adaptation model by applying noise adaptation to an initial speech model stored in storage means using background noise in a non-utterance period at speech recognition time; a recognition processing step of performing speech recognition by collating uttered speech to be recognized, uttered in an utterance period at the speech recognition time, with the noise adaptation model generated in the noise adaptation processing step; a speaker adaptation parameter calculation processing step of performing a speaker adaptation operation on the noise adaptation model generated in the noise adaptation processing step, using the uttered speech to be recognized, and calculating a speaker adaptation parameter for converting the noise adaptation model into a noise-superimposed speaker adaptation model; and a speech model update processing step of generating a speaker adaptation model by applying speaker adaptation with the speaker adaptation parameter to the initial speech model in the storage means, and storing the speaker adaptation model in the storage means in place of the initial speech model.

[Claim 8] The speech recognition method according to claim 7, wherein the speaker adaptation parameter calculation processing step and the speech model update processing step generate the speaker adaptation model and store it in the storage means in place of the initial speech model when the reliability of the recognition result of the recognition processing step is high.