JPH0713587A - Hidden markov connection learning method - Google Patents

Hidden markov connection learning method

Info

Publication number
JPH0713587A
Authority
JP
Japan
Prior art keywords
hidden markov
markov model
boundary
parameter
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP5155359A
Other languages
Japanese (ja)
Inventor
Shigeru Honma
茂 本間
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP5155359A priority Critical patent/JPH0713587A/en
Publication of JPH0713587A publication Critical patent/JPH0713587A/en
Pending legal-status Critical Current


Abstract

PURPOSE: To perform learning over the correct sections and to improve the estimation precision of the parameters by performing segmentation using acoustic analysis parameters and limiting the sections of the learning speech data used for parameter re-estimation.

CONSTITUTION: An input speech signal is segmented using its acoustic analysis parameters, i.e. a sequence of feature parameters, to obtain a first boundary (S1). Segmentation is also performed by the Viterbi decoding algorithm using the connected hidden Markov model to obtain a second boundary (S2). Phoneme environments in which the first and second boundaries coincide within a certain error range are investigated in advance and as many as possible are stored (S3). The learning speech is then segmented using its feature parameters to search for phoneme environments in which both boundaries coincide (S4), and the learning sections are limited by the boundaries obtained in that segmentation (S5). The parameters of each hidden Markov model at the corresponding part of the connected hidden Markov model, i.e. the output probabilities, transition probabilities and so on, are re-estimated (S6).

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a hidden Markov model connection learning method used, for example, in speech recognition, in which subword-unit hidden Markov models for units such as phonemes or syllables are connected according to the content of the learning speech and the parameters of the individual hidden Markov models are re-estimated from the learning speech signal, and in particular to a learning method that limits the sections of the learning speech signal used for re-estimation.

[0002]

[Prior Art] Conventionally, in hidden Markov model connection learning, the entire learning speech signal was used for re-estimating each parameter. In that case the amount of computation is large and the process is time-consuming; moreover, because a great deal of data unrelated to the learning target is also taken in, convergence is poor and the re-estimation accuracy of the parameters can deteriorate.

[0003] In view of this, it has also been practiced to segment the learning speech signal (data) and, for each section, re-estimate the parameters of the hidden Markov model at the corresponding part of the connected hidden Markov model. That is, as shown in FIG. 2, the input learning speech signal is analyzed by a speech analysis unit 1 to obtain feature parameters such as the spectrum and LPC cepstrum, and the sequence of feature parameters is stored in an analysis feature storage unit 2. Meanwhile, the utterance content of the input learning speech signal, that is, a text sentence giving the content of the learning speech, is stored in an utterance content storage unit 4. Before learning is started in a model connection unit 5, initial parameter values for learning, that is, initial values of the output probabilities, transition probabilities and so on, are read from a parameter storage unit 3 and each hidden Markov model is initialized. Next, the utterance content is read from the utterance content storage unit 4 and converted into a phoneme sequence, and the initialized phoneme-unit hidden Markov models corresponding to the respective phonemes are connected in order according to that phoneme sequence.

[0004] Next, in a learning section setting unit 6, segmentation is performed by the Viterbi decoding algorithm using the connected hidden Markov model, and the boundaries of the phoneme sections of the input learning speech signal are obtained. In a learning processing unit 7, for each section obtained by this segmentation, the part corresponding to the section, together with parts at both ends of the section whose width is several times that of the section, is extracted from the feature parameter sequence of the learning speech, and the parameters of each hidden Markov model of that section are re-estimated using the extracted feature parameters.
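
As a rough illustration of the widened learning sections described above, the sketch below (an assumption, not code from the patent) cuts each aligned phoneme segment out of the feature sequence together with margins proportional to the segment's own width; the margin_factor value and the frame-index convention are made up for the example.

```python
# Minimal sketch: given per-phoneme boundaries from a Viterbi alignment, extract each
# segment plus margins that are a multiple of the segment's own width, as in the
# conventional method. margin_factor is an illustrative parameter.
import numpy as np

def widened_segments(features, boundaries, margin_factor=1.0):
    """
    features   : (T, D) feature parameter sequence of the learning speech
    boundaries : list of (start_frame, end_frame) per phoneme, end exclusive
    Returns one feature slice per phoneme, widened at both ends.
    """
    T = len(features)
    slices = []
    for start, end in boundaries:
        width = end - start
        margin = int(round(margin_factor * width))  # "several times" the width in general
        lo = max(0, start - margin)
        hi = min(T, end + margin)
        slices.append(features[lo:hi])
    return slices

# Usage with dummy data: 100 frames of 13-dimensional features, three phoneme segments.
feats = np.random.randn(100, 13)
segs = widened_segments(feats, [(0, 20), (20, 55), (55, 100)], margin_factor=1.0)
print([s.shape[0] for s in segs])  # widened frame counts used for each phoneme's re-estimation
```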

[0005]

[Problems to Be Solved by the Invention] When the learning speech data, that is, the feature parameter sequence, is limited in this way for parameter re-estimation, the amount of computation decreases, there is less data unrelated to the learning target, and the parameter estimation accuracy improves. However, because the segmentation accuracy is poor, speech data that is actually needed for learning may be left out; to avoid this as far as possible, the learning speech data sections must be made as long as possible, which in turn increases the amount of unnecessary data and the amount of computation.

[0006]

[Means for Solving the Problems] According to the present invention, segmentation is performed using the connected hidden Markov model to obtain boundaries, segmentation is also performed on the feature parameter sequence of the same speech signal on the basis of those parameters to obtain boundaries, and the phoneme environments in which these two sets of boundaries coincide within a certain error range are investigated in advance. The learning speech signal is then segmented using its feature parameters, that is, its acoustic analysis parameters, to search for phoneme environments in which the two boundaries coincide. For those phoneme environment portions, the boundaries are interpreted as the boundaries of the learning speech data to be given to the individual hidden Markov models in the connected hidden Markov model, and the parameters are re-estimated with the speech data sections used for learning limited accordingly.

[0007]

[Embodiment] An embodiment of the present invention will be described with reference to the flow chart of FIG. 1. An input speech signal is segmented using its acoustic analysis parameters, that is, its feature parameter sequence, to obtain first boundaries (S1); the phoneme boundaries are found by examining, from the power, spectrum and the like in the feature parameter sequence, which kind of phoneme lies in which part of the signal. Segmentation is also performed by the Viterbi decoding algorithm, using the connected hidden Markov model obtained for the utterance content of the same speech, to obtain second boundaries (S2). As many phoneme environments as possible in which the first and second boundaries coincide within a certain error range are investigated and stored in advance (S3).
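
A minimal sketch of step S3 follows, under assumptions the patent does not spell out: the "phoneme environment" is keyed simply by the pair of phonemes around a boundary, and the error range is a fixed frame tolerance.

```python
# Minimal sketch (illustrative assumptions): count, per phoneme environment, how often
# the acoustic-analysis boundary (S1) and the Viterbi boundary (S2) coincide within a
# tolerance, so that reliable environments can be stored in advance (S3).
from collections import defaultdict

def matching_environments(phonemes, acoustic_bounds, viterbi_bounds, tol_frames=3):
    """
    phonemes        : phoneme labels of one utterance
    acoustic_bounds : boundary frame indices from acoustic segmentation (S1),
                      one boundary between each pair of consecutive phonemes
    viterbi_bounds  : boundary frame indices from Viterbi segmentation (S2), same length
    Returns a dict counting agreements for each (left phoneme, right phoneme) environment.
    """
    agreed = defaultdict(int)
    for i, (b1, b2) in enumerate(zip(acoustic_bounds, viterbi_bounds)):
        if abs(b1 - b2) <= tol_frames:            # boundaries coincide within the error range
            env = (phonemes[i], phonemes[i + 1])  # phoneme environment around this boundary
            agreed[env] += 1
    return agreed

# Usage over a learning corpus: accumulate the counts and keep the reliable environments.
counts = matching_environments(["k", "o", "e"], [18, 52], [19, 60], tol_frames=3)
print(dict(counts))  # {('k', 'o'): 1} -- the k-o boundary agreed, the o-e boundary did not
```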

[0008] Next, a learning speech signal is input and analyzed to obtain a feature parameter (acoustic analysis parameter) sequence, and segmentation is performed using those feature parameters to search for the phoneme environments in which the two kinds of boundaries coincide (S4). For the phoneme environment portions found, the learning speech data (feature parameter sequence) is limited to the sections delimited by the boundaries obtained in that segmentation (S5). Using only the limited sections, the parameters of each hidden Markov model at the corresponding part of the connected hidden Markov model, that is, the output probabilities, transition probabilities and so on, are re-estimated (S6).
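
The sketch below illustrates steps S5 and S6 under a deliberate simplification: the method itself would re-estimate output and transition probabilities (e.g. by Baum-Welch forward-backward re-estimation) over the limited section, whereas here the frames of the limited section are simply split evenly over the states to show that only the section-limited data enters the re-estimation. The stand-in model object and the boundary values are illustrative.

```python
# Minimal sketch (a simplification, not the patent's re-estimation as such): update a
# phoneme HMM's output distributions from the section-limited feature data only (S5-S6).
import numpy as np
from types import SimpleNamespace

def reestimate_outputs(model, section_features):
    """
    model            : object with .n_states, .means (S, D), .vars (S, D)
    section_features : (T, D) feature frames of the section limited in step S5
    """
    T = len(section_features)
    # Evenly partition the limited section over the states -- a stand-in for the state
    # occupancies that a forward-backward pass would provide in Baum-Welch re-estimation.
    state_of_frame = np.minimum((np.arange(T) * model.n_states) // T, model.n_states - 1)
    for s in range(model.n_states):
        frames = section_features[state_of_frame == s]
        if len(frames) > 0:
            model.means[s] = frames.mean(axis=0)
            model.vars[s] = frames.var(axis=0) + 1e-6  # floor keeps variances positive
    return model

# Usage: take only the frames between the agreed boundaries of a reliable environment.
feats = np.random.randn(200, 13)   # feature sequence of the learning utterance
start, end = 18, 52                # agreed boundaries around the target phoneme
model = SimpleNamespace(n_states=3, means=np.zeros((3, 13)), vars=np.ones((3, 13)))
reestimate_outputs(model, feats[start:end])
print(model.means[0][:3])          # re-estimated from the limited section only
```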

[0009] In this way, in phoneme environments such as repeated vowel-consonant-vowel-consonant portions the phoneme boundaries are clear, and by re-estimating the hidden Markov model parameters phoneme by phoneme for these portions, the amount of learning speech data is small and contains no unnecessary data, so the parameters can be estimated accurately with little computation. In portions where vowels follow one another, such as consonant-vowel-vowel-consonant, or where a semivowel and a vowel follow one another, such as consonant-semivowel-vowel-consonant, the phoneme boundaries are not clear; there the hidden Markov model parameters are not re-estimated phoneme by phoneme, but are instead re-estimated jointly over the several phonemes, such as vowel-vowel or semivowel-vowel, taken together.
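
As a small illustration of this grouping rule, the sketch below (with an assumed, highly simplified phoneme classification) merges adjacent phonemes whose shared boundary is of the unclear vowel-vowel or semivowel-vowel type into one joint re-estimation unit, while phonemes with clear boundaries on both sides stay as single units.

```python
# Minimal sketch (illustrative classes and rule): group a phoneme sequence into
# re-estimation units, merging runs whose internal boundaries are unclear.
VOWELS = set("aiueo")
SEMIVOWELS = {"y", "w"}

def boundary_is_clear(left, right):
    """Treat a boundary between two vowel-like phonemes (vowel or semivowel) as unclear."""
    vowel_like = VOWELS | SEMIVOWELS
    return not (left in vowel_like and right in vowel_like)

def training_units(phonemes):
    """Merge adjacent phonemes whose shared boundary is unclear into one joint unit."""
    units = [[phonemes[0]]]
    for prev, cur in zip(phonemes, phonemes[1:]):
        if boundary_is_clear(prev, cur):
            units.append([cur])        # re-estimate this phoneme on its own
        else:
            units[-1].append(cur)      # re-estimate jointly with the preceding phoneme(s)
    return units

# Usage: in /k a w a s e/ the a-w and w-a boundaries are unclear, so "a w a" is one unit.
print(training_units(list("kawase")))  # [['k'], ['a', 'w', 'a'], ['s'], ['e']]
```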

[0010]

[Effects of the Invention] As described above, according to the present invention, for phoneme environments in which segmentation using the hidden Markov models and segmentation using the acoustic analysis parameters agree relatively well, segmentation is performed using the acoustic analysis parameters and the learning speech data used for parameter re-estimation is limited to those sections. As a result, less data unrelated to the learning target of each hidden Markov model is taken in, fewer of the sections that should actually be learned are missed, and learning takes place over the correct sections, so the parameter estimation accuracy improves. Moreover, because only a small amount of section-limited learning data is used, the amount of computation is small, convergence is fast, and the overall learning time is shortened.

[Brief Description of the Drawings]

[FIG. 1] A flow chart showing an embodiment of the present invention.

[FIG. 2] A block diagram showing the general configuration of a hidden Markov model connection learning device.

Claims (1)

[Claims]

[Claim 1] A hidden Markov model connection learning method in which an input learning speech signal is analyzed to obtain its feature parameters, which are stored; the utterance content of the input learning speech signal is stored; hidden Markov models are connected on the basis of that utterance content; and the parameters of each hidden Markov model in the connected hidden Markov model are re-estimated with the sections of the learning speech signal used for the parameter re-estimation limited, the method being characterized in that phoneme environments in which first boundaries, obtained by segmentation using the feature parameters, and second boundaries, obtained by segmentation using the connected hidden Markov model, coincide within a certain error range are investigated in advance; the input learning speech is segmented using its feature parameters to search for the coinciding phoneme environments; and, for the phoneme environment portions found, the sections used for the parameter re-estimation are determined from the respective boundaries.
JP5155359A 1993-06-25 1993-06-25 Hidden markov connection learning method Pending JPH0713587A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP5155359A JPH0713587A (en) 1993-06-25 1993-06-25 Hidden markov connection learning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP5155359A JPH0713587A (en) 1993-06-25 1993-06-25 Hidden markov connection learning method

Publications (1)

Publication Number Publication Date
JPH0713587A true JPH0713587A (en) 1995-01-17

Family

ID=15604189

Family Applications (1)

Application Number Title Priority Date Filing Date
JP5155359A Pending JPH0713587A (en) 1993-06-25 1993-06-25 Hidden markov connection learning method

Country Status (1)

Country Link
JP (1) JPH0713587A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003529106A (en) * 2000-03-24 2003-09-30 スピーチワークス・インターナショナル・インコーポレーテッド A division approach for speech recognition systems.
JP2007536050A (en) * 2004-05-07 2007-12-13 アイシス イノヴェイション リミテッド Signal analysis method


Similar Documents

Publication Publication Date Title
US5581655A (en) Method for recognizing speech using linguistically-motivated hidden Markov models
JP4301102B2 (en) Audio processing apparatus, audio processing method, program, and recording medium
EP1575030B1 (en) New-word pronunciation learning using a pronunciation graph
US8019602B2 (en) Automatic speech recognition learning using user corrections
US8543399B2 (en) Apparatus and method for speech recognition using a plurality of confidence score estimation algorithms
EP2048655B1 (en) Context sensitive multi-stage speech recognition
US6801892B2 (en) Method and system for the reduction of processing time in a speech recognition system using the hidden markov model
JPH0422276B2 (en)
US6662159B2 (en) Recognizing speech data using a state transition model
KR101014086B1 (en) Voice processing device and method, and recording medium
Shaikh Naziya et al. Speech recognition system—a review
JPH0250198A (en) Voice recognizing system
JP6027754B2 (en) Adaptation device, speech recognition device, and program thereof
JPH0713587A (en) Hidden markov connection learning method
JPH09114482A (en) Speaker adaptation method for voice recognition
JPH08314490A (en) Word spotting type method and device for recognizing voice
EP0742546B1 (en) Speech recognizer
JPH0211919B2 (en)
JPH0736481A (en) Interpolation speech recognition device
CN114333789A (en) Updating method and device of voice recognition system, electronic equipment and storage medium
JPH05303391A (en) Speech recognition device
JPH08211893A (en) Speech recognition device
JPH0981177A (en) Voice recognition device, dictionary for work constitution elements and method for learning imbedded markov model
RU2101782C1 (en) Method for recognition of words in continuous speech and device which implements said method
JPH08248983A (en) Voice recognition device