JPH06214592A - Noise resisting phoneme model generating system - Google Patents

Noise resisting phoneme model generating system

Info

Publication number
JPH06214592A
JPH06214592A (application JP5005688A)
Authority
JP
Japan
Prior art keywords
hmm
noise
phoneme
phonological
model
Prior art date
Legal status
Granted
Application number
JP5005688A
Other languages
Japanese (ja)
Other versions
JP3247746B2 (en)
Inventor
Kiyohiro Shikano
Yasuhiro Minami
Tatsuo Matsuoka
Frank Martin
Current Assignee
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP00568893A priority Critical patent/JP3247746B2/en
Publication of JPH06214592A publication Critical patent/JPH06214592A/en
Application granted granted Critical
Publication of JP3247746B2 publication Critical patent/JP3247746B2/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Abstract

PURPOSE: To improve speech recognition performance by adding, in the spectral domain, a noise HMM to a phoneme HMM generated beforehand from speech collected under noise-free conditions, thereby producing a phoneme model suited to the environment in which the speech is uttered.

CONSTITUTION: A phoneme HMM (Hidden Markov Model) for the noise-free condition and a noise HMM for the noisy condition are first prepared in the cepstrum domain. Each is cosine-transformed to obtain its log spectrum and then exponentially transformed to obtain its linear spectrum, and the output probabilities of phoneme and noise are combined by convolution. The resulting noise-added phoneme HMM is logarithmically transformed and then inverse-cosine-transformed back to the cepstrum domain. Because a noise model of the actual utterance environment is used, a phoneme HMM well matched to that environment is obtained and a high phoneme recognition rate is achieved.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to techniques for creating phoneme models used in speech recognition, and more particularly to the creation of noise-resistant phoneme models that remain effective for speech recognition in noisy environments.

[0002]

2. Description of the Related Art

A conventional speech recognition method using phoneme HMMs is described with reference to the drawings. FIG. 6 shows an example of a functional block diagram of a conventional speech recognition system. In the figure, 1 is a speech input unit, 2 is a recognition unit, and 3 is a recognition result output unit. 4 is a phoneme model storage unit, in which the phoneme HMMs are stored.

[0003]

First, the operation of FIG. 6 is described. The recognition unit 2 divides the speech received by the speech input unit 1 into frames and, for each frame, computes probabilities using the phoneme HMMs stored in advance in the phoneme model storage unit 4; the candidate with the highest probability is sent to the recognition result output unit 3 as the recognition result. In more detail, when speech is input over some period, the input is segmented into frames of unit length and feature parameters are extracted from each frame. For the feature parameters of the first frame, the phoneme HMM prepared for each phoneme is used to compute, for every phoneme, the probability that the frame is, for example, "a", the probability that it is "i", the probability that it is "u", and so on.

[0004]

Next, the feature parameters of the following frame are input, and the probability that this frame follows the previous state is computed for each phoneme. This is repeated for every frame of the input speech; if, for example, the phoneme HMM for "a" yields the highest probability, the input speech is recognized as "a".
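The frame-by-frame scoring described above is the standard HMM forward computation. The following sketch in Python/NumPy illustrates it; the function and variable names are illustrative, not taken from the patent:

```python
import numpy as np

def log_forward(log_b, log_A, log_pi):
    """Forward algorithm in the log domain: returns log P(frames | HMM).

    log_b  : (T, S) log output probability of each state for each frame
    log_A  : (S, S) log transition matrix
    log_pi : (S,)   log initial state probabilities
    """
    T, S = log_b.shape
    alpha = log_pi + log_b[0]
    for t in range(1, T):
        # log-sum-exp over predecessor states, then add this frame's output prob
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_b[t]
    return np.logaddexp.reduce(alpha)

def recognize(frames_log_b, models):
    """Score every phoneme model and pick the best, as in the description.

    frames_log_b : dict phoneme -> (T, S) per-frame log output probabilities
    models       : dict phoneme -> (log_A, log_pi)
    """
    scores = {ph: log_forward(frames_log_b[ph], A, pi)
              for ph, (A, pi) in models.items()}
    return max(scores, key=scores.get)
```

A trellis of this kind is evaluated once per phoneme model, and the model with the highest total probability wins.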

[0005]

SUMMARY OF THE INVENTION

Conventionally, however, phoneme HMMs have been created from speech recorded in the absence of noise, and in real environments recognition performance degrades severely under noise. An alternative approach records speech in advance under various kinds of noise and builds phoneme HMMs from those recordings, but because the variety of possible noises is enormous, the system must grow very large to achieve high recognition performance.

[0006]

The object of the present invention is to solve these problems and to improve speech recognition performance by creating, with a simple system, a phoneme model matched to the utterance environment, so that high recognition performance is obtained even in noisy conditions.

[0007]

To achieve this object, the noise-resistant phoneme HMM creation method of the present invention creates a noise HMM from the noise at the utterance location and adds it, in the spectral domain, to a phoneme HMM created beforehand from speech collected in the absence of noise, thereby producing a noise-resistant phoneme HMM.

[0008]

This makes it possible to create a phoneme HMM that accounts for the noise at the utterance location, improving speech recognition performance under noise.

[0009]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 1 shows the procedure for creating a noise-resistant phoneme HMM according to the present invention; the procedure is described below with reference to that figure. Cepstral coefficients are widely used as the acoustic parameters of HMMs for speech recognition, and they are related to the log spectrum (log power spectrum) by a cosine transform. Ambient noise, on the other hand, is additive, so it can be added to speech in the (linear) spectral domain.

[0010]

First, a phoneme HMM for the noise-free environment and a noise HMM for the noisy environment are created in the cepstrum domain. Each is cosine-transformed to obtain its log spectrum, and an exponential transform then yields the corresponding linear spectrum. The output probabilities of phoneme and noise are combined by convolution, and the noise-added phoneme HMM thus constructed is logarithmically transformed and then inverse-cosine-transformed to produce the final phoneme HMM.

[0011]

These transforms are applied to the speech and the noise separately, and the two sets of parameters are added in the linear spectral domain. The sum is then logarithmically transformed back to the log-spectrum domain and cosine-transformed back to the cepstrum domain, yielding a phoneme HMM that accounts for the ambient noise.

[0012]

Here, the phoneme HMM and the noise HMM are assumed to use mixtures of normal distributions (so-called Gaussian mixtures) as output distributions, so each of the transforms above must be applied to the covariances as well as to the means. The concrete computation is as follows. Take as the HMM parameters the cepstral coefficients of order 0 through p, written as the vectors

C = (C_0, C_1, C_2, ..., C_{p-1}, C_p)
D = (D_0, D_1, D_2, ..., D_{p-1}, D_p)

where C holds the phoneme parameters and D the noise parameters. The map from cepstral coefficients to the log spectrum is the cosine transform; it is a linear transform, represented here by the (p+1) x m matrix (COS). Writing the log-spectrum vectors as LC and LD, the transformed means are

LC = C (COS)
LD = D (COS)

and, with Σ_C and Σ_D the cepstral covariances, the log-spectrum covariances are

Σ_LC = (COS) Σ_C (COS)^t
Σ_LD = (COS) Σ_D (COS)^t

Transforming the mean and covariance in this way gives the mean and covariance, in the log-spectrum domain, of the normal distributions of speech and noise.
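The push of a Gaussian through this linear transform can be sketched as follows. Here the transform is written as a matrix `T` acting on column vectors (mean maps as T c, covariance as T Σ T^t), and the particular cosine basis is an assumption for illustration — the patent relies only on the transform being linear:

```python
import numpy as np

def cosine_matrix(m, p):
    """An (m, p+1) cosine-basis matrix mapping p+1 cepstral coefficients
    to an m-point log spectrum (illustrative basis choice)."""
    i = np.arange(m)[:, None]
    k = np.arange(p + 1)[None, :]
    return np.cos(np.pi * i * k / m)

def gaussian_linear_map(mean, cov, T):
    """Push a Gaussian N(mean, cov) through the linear map x -> T x:
    the mean becomes T @ mean, the covariance T @ cov @ T.T."""
    return T @ mean, T @ cov @ T.T
```

The same helper serves for both the phoneme parameters C and the noise parameters D, and (with the inverse matrix) for the return trip to the cepstrum domain.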

[0013]

Next comes the exponential transform from the log spectrum to the linear spectrum. This transform does not preserve normality, so the result is approximated by a normal distribution. Computing the means SC, SD and covariances Σ_SC, Σ_SD of the exponentially transformed distributions gives

SC_i = exp(LC_i + Σ_LC,ii / 2)
SD_i = exp(LD_i + Σ_LD,ii / 2)
Σ_SC,ij = SC_i × SC_j × { exp(Σ_LC,ij) − 1 }
Σ_SD,ij = SD_i × SD_j × { exp(Σ_LD,ij) − 1 }

The sum of speech and additive noise is then taken in the linear-spectrum domain. Since summing two independent random variables corresponds to a convolution of their distributions, the mean M and covariance Σ_M of the convolved result are obtained simply as

M_i = SC_i + SD_i
Σ_M = Σ_SC + Σ_SD

The mean and covariance of the distribution obtained in this way are then transformed back to the cepstrum domain, reversing the steps taken so far.
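The log-normal moment matching and the linear-domain addition above can be sketched as follows (the helper names are hypothetical, not from the patent):

```python
import numpy as np

def exp_transform(mean_l, cov_l):
    """Log-normal moment matching: given a Gaussian in the log-spectrum
    domain, return the mean and covariance of its exponential:
    S_i = exp(L_i + Sigma_ii / 2)
    Sigma'_ij = S_i * S_j * (exp(Sigma_ij) - 1)
    """
    s = np.exp(mean_l + 0.5 * np.diag(cov_l))
    cov_s = np.outer(s, s) * (np.exp(cov_l) - 1.0)
    return s, cov_s

def add_in_linear_domain(sc, cov_sc, sd, cov_sd):
    """Speech plus additive noise in the linear-spectrum domain: the
    convolution of two (approximately) normal distributions has the
    summed mean and the summed covariance."""
    return sc + sd, cov_sc + cov_sd
```

Note that the Gaussian approximation of the exponentiated distribution is exactly the approximation the patent describes; the sum itself is exact for independent sources.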

[0014]

First, the logarithmic transform, which is the inverse of the exponential transform, is applied. Writing the log-transformed mean as LM and covariance as Σ_LM, inverting the transform above gives

LM_i = log(M_i) − (1/2) log(Σ_M,ii / M_i^2 + 1)
Σ_LM,ij = log(Σ_M,ij / (M_i M_j) + 1)

The log spectrum is then brought back to the cepstrum domain by the inverse cosine transform (of the same form as the cosine transform), the m x (p+1) matrix (COS'), giving the mean S and covariance Σ_S as

S = LM (COS')
Σ_S = (COS') Σ_LM (COS')^t

In short, the normal output distributions of the HMM obtained in the cepstrum domain are carried to the linear-spectrum domain, added to the noise model, and then inverse-transformed, producing a phoneme HMM with the noise added.
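The return trip can be sketched under the same column-vector conventions as before; `Tinv` stands for an assumed inverse (or pseudo-inverse) of the forward cosine basis:

```python
import numpy as np

def log_transform(m, cov_m):
    """Inverse of the log-normal moment matching:
    LM_i = log(M_i) - (1/2) * log(Sigma_ii / M_i**2 + 1)
    Sigma_LM_ij = log(Sigma_ij / (M_i * M_j) + 1)
    """
    lm = np.log(m) - 0.5 * np.log(np.diag(cov_m) / m**2 + 1.0)
    cov_lm = np.log(cov_m / np.outer(m, m) + 1.0)
    return lm, cov_lm

def to_cepstrum(lm, cov_lm, Tinv):
    """Inverse cosine transform back to the cepstrum domain."""
    return Tinv @ lm, Tinv @ cov_lm @ Tinv.T
```

Applied after the linear-domain addition, this pair of steps completes the round trip and yields the noise-added Gaussian in the cepstrum domain.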

[0015]

The discussion above dealt with two individual distributions and how to combine them. Next, the method of combining the two HMMs themselves is described. A phoneme HMM is usually represented by a right-to-left model of about three states, as shown in FIG. 2, while an ergodic HMM such as the one in FIG. 3 is well suited to modeling noise. Taking the product of these two models gives the product-space HMM shown in FIG. 4, in which each state is a combination of one phoneme-HMM state and one noise-HMM state. For each such pair of states, the corresponding output probabilities are transformed to the linear-spectrum domain as described above, convolved, and inverse-transformed.
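One way to realize the product-space composition is sketched below. It assumes (as is natural when the two sources evolve independently; the patent does not spell this out) that the composite transition probability factorizes, in which case the composite transition matrix is the Kronecker product of the two transition matrices, with one combined output distribution per state pair:

```python
import numpy as np
from itertools import product

def product_space_hmm(A_phoneme, A_noise, states_ph, states_nz):
    """Compose two HMMs on their product state space.

    Each composite state is a (phoneme state, noise state) pair; under
    independent transitions the composite transition matrix is the
    Kronecker product of the two transition matrices.
    """
    A = np.kron(A_phoneme, A_noise)
    states = list(product(states_ph, states_nz))
    return A, states
```

The output distribution of composite state (i, j) is then obtained by the spectral-domain convolution of the outputs of phoneme state i and noise state j, as described in the preceding paragraphs.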

[0016]

When the output probability is a single normal distribution, the transform above is applied to the two distributions directly. When the output distribution is a mixture of normal distributions, the transform is applied to every pairing of mixture components. The description so far has covered only cepstral coefficients, but the same transforms apply to the delta cepstrum and delta power commonly used in phoneme HMMs, since these are expressed as linear combinations of cepstral coefficients.
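For the Gaussian-mixture case, applying the per-Gaussian combination to every pair of components can be sketched as follows; `combine_pair` stands for the spectral-domain combination of one speech Gaussian with one noise Gaussian, and multiplying the mixture weights is an assumption consistent with independent sources:

```python
from itertools import product

def combine_mixtures(speech_mix, noise_mix, combine_pair):
    """speech_mix, noise_mix: lists of (weight, params) mixture components.
    Returns the combined mixture: one component per pair of inputs,
    with the weights multiplied."""
    return [(ws * wn, combine_pair(gs, gn))
            for (ws, gs), (wn, gn) in product(speech_mix, noise_mix)]
```

The combined mixture has |speech| x |noise| components, which is one reason the single-Gaussian case is cheaper.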

[0017]

FIG. 5 shows an embodiment in which the present invention is applied to a speech recognition apparatus. In the figure, 5 is a phoneme model storage unit, 6 is a noise input unit, 7 is a noise-resistant phoneme model creation unit that executes the noise-resistant phoneme model creation method of the present invention, and 8 is a noise-resistant phoneme model storage unit. A phoneme HMM created beforehand from speech obtained in a noise-free condition is placed in the phoneme model storage unit 5. The noise-resistant phoneme model creation unit 7 creates a noise HMM from the noise supplied by the noise input unit 6, builds a noise-resistant phoneme model according to the method of the present invention, and stores it in the noise-resistant phoneme model storage unit 8.

[0018]

The recognition unit 2 recognizes the speech received by the speech input unit 1 using the phoneme models stored in the noise-resistant phoneme model storage unit 8, and the result is output by the recognition result output unit 3.

[0019]

According to the present invention, a noise model of the utterance environment can be used, so a robust phoneme HMM matched to that environment can be created and a high phoneme recognition rate can be expected.

[Brief Description of the Drawings]

FIG. 1 shows the procedure for creating a noise-resistant phoneme model according to the present invention.

FIG. 2 shows an example of a three-state, three-loop phoneme HMM.

FIG. 3 shows an example of a noise HMM realized as a two-state ergodic HMM.

FIG. 4 shows the phoneme HMM obtained by combining the phoneme HMM of FIG. 2 and the noise HMM of FIG. 3 in the product space.

FIG. 5 illustrates an embodiment in which the present invention is applied to a speech recognition apparatus.

FIG. 6 illustrates the configuration of a conventional speech recognition apparatus.

Continuation of front page: (72) Inventor: Frank Martin, 4-6-29 Komaba, Meguro-ku, Tokyo

Claims (5)

[Claims]

1. A method of creating a phoneme hidden Markov model (hereinafter, HMM) for use in speech recognition, characterized in that a noise-superimposed phoneme HMM is created by adding, in the spectral domain, a phoneme HMM created in advance and a noise HMM.
2. The noise-resistant phoneme model creation method according to claim 1, characterized in that the phoneme HMM and the noise HMM are combined in a product space.
3. A noise-resistant phoneme model creation method characterized in that the output probability distributions of a phoneme HMM and a noise HMM expressed in the cepstrum domain are transformed to the spectral domain by a cosine transform and an exponential transform, their convolution is performed there to construct a phoneme HMM adapted to the noise, and the phoneme HMM is then transformed back to the cepstrum domain by the inverse transforms, namely a logarithmic transform and an inverse cosine transform.
4. A noise-resistant phoneme model creation method characterized in that a phoneme HMM and a noise HMM are each created by training in the spectral domain, their convolution is performed, and a logarithmic transform and an inverse cosine transform are applied to transform the phoneme HMM to the cepstrum domain.
5. The noise-resistant phoneme model creation method according to claim 2, characterized in that, in the combination of the phoneme HMMs, the corresponding output probability values of the two models are computed as in claim 3 or claim 4.
JP00568893A 1993-01-18 1993-01-18 Creating a noise-resistant phoneme model Expired - Lifetime JP3247746B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP00568893A JP3247746B2 (en) 1993-01-18 1993-01-18 Creating a noise-resistant phoneme model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP00568893A JP3247746B2 (en) 1993-01-18 1993-01-18 Creating a noise-resistant phoneme model

Publications (2)

Publication Number Publication Date
JPH06214592A (en) 1994-08-05
JP3247746B2 JP3247746B2 (en) 2002-01-21

Family

ID=11618046

Family Applications (1)

Application Number Title Priority Date Filing Date
JP00568893A Expired - Lifetime JP3247746B2 (en) 1993-01-18 1993-01-18 Creating a noise-resistant phoneme model

Country Status (1)

Country Link
JP (1) JP3247746B2 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002323900A (en) * 2001-04-24 2002-11-08 Sony Corp Robot device, program and recording medium
KR100442825B1 (en) * 1997-07-11 2005-02-03 삼성전자주식회사 Method for compensating environment for voice recognition, particularly regarding to improving performance of voice recognition system by compensating polluted voice spectrum closely to real voice spectrum
KR100434527B1 (en) * 1997-08-01 2005-09-28 삼성전자주식회사 Speech Model Compensation Method Using Vector Taylor Series
US7209881B2 (en) 2001-12-20 2007-04-24 Matsushita Electric Industrial Co., Ltd. Preparing acoustic models by sufficient statistics and noise-superimposed speech data
US7660717B2 (en) 2002-03-15 2010-02-09 Nuance Communications, Inc. Speech recognition system and program thereof

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101047104B1 (en) * 2009-03-26 2011-07-07 고려대학교 산학협력단 Acoustic model adaptation method and apparatus using maximum likelihood linear spectral transform, Speech recognition method using noise speech model and apparatus


Also Published As

Publication number Publication date
JP3247746B2 (en) 2002-01-21


Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20071102

Year of fee payment: 6

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20081102

Year of fee payment: 7

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20091102

Year of fee payment: 8

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20101102

Year of fee payment: 9


FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20111102

Year of fee payment: 10


FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20121102

Year of fee payment: 11


FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20131102

Year of fee payment: 12

EXPY Cancellation because of completion of term