JP3547349B2 - Acoustic model learning method

Info

Publication number: JP3547349B2
Application number: JP27422299A
Authority: JP
Inventors: 恒夫加藤; 眞吾黒岩; 宜男樋口
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 1999-09-28
Filing date: 1999-09-28
Publication date: 2004-07-28
Anticipated expiration: 2019-09-28
Also published as: JP2001100779A

Description

【０００１】
【発明の属する技術分野】
本発明は音響モデル学習方法に関し、特に、前後音素環境を考慮したトライフォンの音素決定木により音声認識単位を決定する方法に関する。
【０００２】
【従来の技術】
現在、音声認識に用いられる音響モデルの主流はトライフォンである。トライフォンは、先行音素と中心音素と後続音素との３音素連鎖により定義され、学習データから学習によって作成される。先行音素と後続音素が中心の音素に対する前後音素環境をなす。
【０００３】
しかし、音素は４０種類程度あるため、トライフォン（３音素連鎖）の総異音数は数万個のオーダーに達する。また、トライフォンが学習データに出現しなかったり、出現してもその数が極めて少ないことがある。
【０００４】
上述した膨大な総異音数と、学習データに未出現や数が少ない３音素連鎖の存在のため、従来、図３に示すように、音素決定木による音声認識単位の決定手法によって、ＨＭＭ（隠れマルコフモデル）の共有化を行い、パラメータを削減することが行われている。
【０００５】
図３において、従来は、学習データから共有されていないトライフォン各状態（ＨＭＭ）の単一連続分布を学習により作成する。ステップＳ１１参照。
【０００６】
次に、共有化を許容するトライフォン各状態（ＨＭＭ）の集合（以下、ノードと呼ぶ）、つまり、中心音素が共通のＨＭＭのノードを作成する。ステップＳ１２参照。
【０００７】
次に、各ノードについて、予め設定した前後音素環境に関する複数の決定木のうち、基準となる情報量が分割前に比べて分割後に最も向上する決定木によって、ノードの分割を行う（ステップＳ１３〜Ｓ１６参照）。以下に、ノードの分割と、基準となる情報量の計算方法を説明する。
【０００８】
ノードの分割について説明する。前述のように、トライフォンは先行音素と中心音素と後続音素との３音素連鎖により定義され、ノード（トライフォンの集合）に対して音素決定木は例えば下記（１）〜（３）に例示するように定義される。各音素決定木により１つのノードを２つのノードに分割する。
（１）先行音素が母音（ａ、ｉ、ｕ、ｅ、ｏ）で、後続音素は問わない。
（２）先行音素が鼻音（ｎ、ｍ、ＮＮ）で、後続音素は問わない。
（３）後続音素が破裂音（ｐ、ｔ、ｋ）で、先行音素は問わない。
【０００９】
基準となる情報量の計算方法について説明する。或るノードに含まれる複数のトライフォンをそれぞれ表現するパラメータから、当該ノードを代表するパラメータを求め、この代表パラメータにより、基準となる情報量を計算する。一般的には、ノードに含まれる各トライフォンを表現する連続分布から、ノード全体を表現する連続分布を求め、基準となる情報量として、学習データに対するノード全体を表現する連続分布の尤度を利用する。
【００１０】
分割後の全末端ノードに対しても、同様の分割手法で基準となる情報量が最も向上する音素決定木を選び、選んだ音素決定木によりノード分割を行う。この操作を、分割後の基準となる情報量が予め設定した閾値を超えるまで順次繰り返す（ステップＳ１７からステップＳ１３へのループ参照）。
【００１１】
全ての分割後の末端ノードにおいて基準となる情報量が閾値を超えたら、ノードの分割を停止する（ステップＳ１８参照）。
【００１２】
以上により、同じ末端ノードに属する複数のトライフォンは、１つのＨＭＭを共有することになる。このとき、共有するＨＭＭとして、一般的には、末端ノードに含まれる各トライフォンをそれぞれ構成する複数の単一連続分布のうち、１つの単一連続分布を選択して出力する。つまり、１つの単一連続分布で共有するＨＭＭを代表する。ステップＳ１９参照。
【００１３】
図４を参照すれば、従来は、共有するＨＭＭとしては、ノード１１に含まれる各トライフォン１２ａ〜１２ｎをそれぞれ構成する複数の単一連続分布１３ａ〜１３ｎのうち、いずれか１つの単一連続分布１３ｉを選択して出力する。
【００１４】
【発明が解決しようとする課題】
しかし、上述した手法は、従来、単一連続分布ＨＭＭに対して行われており、認識性能が高い混合連続分布ＨＭＭに対して音素決定木による音声認識単位の決定手法は適用されていないという第１の課題がある。
【００１５】
また、上述した従来手法では、共有するＨＭＭはノードを構成する分布の１つを選択しているため、共有する全音素環境の音響特性を表現できていないという第２の課題がある。
【００１６】
そこで、本発明の目的は、上記２つの課題を解決することにある。
【００１７】
【課題を解決するための手段】
請求項１に係る発明は、上記第１の課題を解決する音響モデル学習方法であり、先行音素と後続音素の音素決定木による音声認識単位の決定方法において、混合連続分布ＨＭＭを対象としてクラスタリングによりノード分割を行い、共有化を行う末端ノードに対して新しい混合連続分布ＨＭＭを出力することを特徴とする。請求項２に係る発明も、上記第１の課題を解決する音響モデル学習方法であり、先行音素と後続音素の音素決定木による音声認識単位の決定方法において、混合連続分布ＨＭＭを対象として、音素決定木を選ぶ過程で各ノードに含まれる混合連続分布をクラスタリングして、情報量が最も向上する音素決定木を選び、その音素決定木によりノード分割を行った後、共有化を行う末端ノードに対して新しい混合連続分布ＨＭＭを出力することを特徴とする。請求項３に係る発明も、上記第１の課題を解決する音響モデル学習方法であり、先行音素と後続音素の音素決定木による音声認識単位の決定方法において、混合連続分布ＨＭＭを対象として、音素決定木を選ぶ過程で各ノードに含まれる混合連続分布を離散値と見なし、Ｋ−ｍｅａｎｓ法（ケイ−ミーンズ法）によりクラスタリングして、情報量が最も向上する音素決定木を選び、その音素決定木によりノード分割を行った後、共有化を行う末端ノードに対して新しい混合連続分布ＨＭＭを出力することを特徴とする。請求項４に係る発明は、上記第１及び第２の課題を解決する音響モデル学習方法であり、先行音素と後続音素の音素決定木による音声認識単位の決定方法において、混合連続分布ＨＭＭを対象としてクラスタリングによりノード分割を行った後、共有化を行う末端ノードに対して、クラスタリングの結果、新しい混合連続分布ＨＭＭが求められるものであり、新しい混合連続分布ＨＭＭは、クラスタリング後の共有するＨＭＭの集合を代表する出力連続分布として、全分布の平均値と分散値を結合したものからなることを特徴とする。
【００１９】
【発明の実施の形態】
図１に本発明の一実施形態例に係る音響モデル学習方法の手順を示す。
【００２０】
図１において、まず、学習データから共有されていないトライフォン各状態（ＨＭＭ）の混合連続分布を学習により作成し、用意する。ステップＳ１参照。
【００２１】
次に、混合連続分布ＨＭＭにより構成される各トライフォンに対して、共有化を許容するＨＭＭのノードを作成する。ステップＳ２参照。
【００２２】
次に、各ノードを、先行音素と後続音素の決定木により、混合連続分布ＨＭＭを対象として、分割する。ステップＳ３〜Ｓ６参照。
【００２３】
ノード分割の際、情報量の計算に用いる混合連続分布ＨＭＭは、情報量が最大になるように構成する。その構成方法の例を以下に述べる。
【００２４】
（１）図２に示すように、分割後のノードに含まれるトライフォンを構成する全分布を予め定めたクラスタ数にクラスタする。図２において、１はノード、２ａ〜２ｎはノード１に含まれる状態、３ａ〜３ｎは連続分布であり、各状態は複数の連続分布ＨＭＭに対応している。つまり、混合連続分布ＨＭＭとなっている。図示の例では、クラスタ数はクラスタ４ａ〜４ｃの３個である。
【００２５】
（１ａ）クラスタリングには、トップダウン式のＫ−ｍｅａｎｓ法（ケイ・ミーンズ法：離散データのクラスタリング法）、ボトムアップ式のＦｕｒｔｈｅｓｔＮｅｉｇｈｂｏｒ法（ファーゼスト・ネイバー法：離散／連続データのクラスタリング法））等を用いる。但し、各クラスタに含まれる連続分布ＨＭＭの数は複数とし、予め下限を設けておく。
【００２６】
（１ｂ）クラスタリング時の入力データとしてはＨＭＭ各状態の学習データ中の出現回数、状態を構成する混合連続分布の分布重み、平均値及び分散が与えられ、また、近似的に各分布の出現回数が計算可能であるため、この分布出現回数を重み付けしてセントロイド計算を行う。
【００２７】
（２）クラスタリング後、各クラスタ毎に、クラスタに含まれる全分布から新しい混合連続分布を１つ合成して代表分布とする。図２では、連続混合分布５ａ〜５ｃが各クラスタ４ａ〜４ｃ毎に新しく合成した代表分布である。
【００２８】
（２ａ）この合成される分布の平均値は、下記数１に示すように、全分布の平均値を出現回数で重み付け平均して求める。
【００２９】
【数１】

【００３０】
（２ｂ）また、合成される分布の分散は、下記数２に示すように、全分布の分散（組内分散）と、分布間の分散（組間分散）と、出現回数から求める。
【００３１】
【数２】

【００３２】
（２ｃ）合成される分布の分布重みは、下記数３に示すように、学習データ中の出現回数の割合から求める。
【００３３】
【数３】

【００３４】
以上の手順によって求めた連続混合分布を利用して、従来と同様、基準となる情報量を計算して、分割後の基準となる情報量が分割前に比べて最も向上する決定木によって、ノードの分割を行う
【００３５】
分割後の全末端ノードに対しても、同様の分割で基準となる情報量が最も向上する音素決定木を選び、選んだ音素決定木によりノード分割を行う。この操作を、分割後の基準となる情報量が予め設定した閾値を超えるまで順次繰り返す（ステップＳ７からステップＳ３へのループ参照）。
【００３６】
全ての分割後の末端ノードにおいて基準となる情報量が閾値を超えたら、ノードの分割を停止する（ステップＳ８参照）。
【００３７】
以上により、同じ末端ノードに属するトライフォンは、１つのＨＭＭを共有することになる。このとき、共有するＨＭＭとして、共有化を行うノードに対して、前述した手順を利用してこの手順により新しい連続混合分布を合成して求め、この合成した連続混合分布を出力する。ステップＳ９参照。つまり、数１〜数３に基づき当該ノードに含まれる全分布から新しい混合連続分布を１つ合成して出力する。
【００３８】
このように、クラスタリング後の共有するＨＭＭの集合を代表する出力連続分布として、要素となる全分布の平均値と分散値を結合した新しい連続分布を求めることにより、従来は１つの分布を選択するだけのために共有する全音素環境の音響特性を表現できていないという課題を解決できる。つまり、共有する全音素環境の音響特性を表現できる。
【００３９】
また、共有するＨＭＭの集合を代表する出力連続分布として、要素となる全分布の平均値と分散値を結合した連続分布を求めるという手法を、単一連続分布ＨＭＭを対象とした従来方法に適用することにより、単一連続分布ＨＭＭを対象とした場合でも、共有する全音素環境の音響特性を表現できる。
【００４０】
【発明の効果】
以上より、本発明によれば、認識性能が高い混合連続分布ＨＭＭに対して音素決定木による音声認識単位の決定を行うことができる。
【００４１】
また、単一連続分布ＨＭＭを対象としたばあいでも、共有する全音素環境の音響特性を表現することができる。
【図面の簡単な説明】
【図１】本発明の実施形態例に係る音響モデル学習方法の手順を示す図。
【図２】本発明の実施形態例に係るノードを表現する連続分布構成法を示す図。
【図３】従来の音響モデル学習方法の手順を示す図。
【図４】従来のノードを表現する連続分布構成法を示す図。
【符号の説明】
１ノード
２ａ〜２ｎ状態
３ａ〜３ｎ連続分布
４ａ〜４ｃクラスタ
５ａ〜５ｃクラスタ毎に新しく合成した連続混合分布
[0001]
[Technical field of the invention]
The present invention relates to an acoustic model learning method and, more particularly, to a method of determining speech recognition units with a triphone phoneme decision tree that takes the preceding and succeeding phoneme environment into account.
[0002]
[Prior art]
At present, triphones are the mainstream acoustic models used for speech recognition. A triphone is defined by a three-phoneme chain consisting of a preceding phoneme, a central phoneme, and a succeeding phoneme, and is created by training on learning data. The preceding and succeeding phonemes form the phoneme environment of the central phoneme.
[0003]
However, since there are about 40 phonemes, the total number of allophones for triphones (three-phoneme chains) is on the order of tens of thousands (40 × 40 × 40 = 64,000 possible chains). Moreover, some triphones never appear in the learning data, and others appear only a very small number of times.
[0004]
Because of this huge total number of allophones and the existence of three-phoneme chains that are absent from, or rare in, the learning data, HMMs (hidden Markov models) have conventionally been shared, and the number of parameters thereby reduced, by determining speech recognition units with a phoneme decision tree, as shown in FIG. 3.
[0005]
In the conventional method of FIG. 3, a single continuous distribution is first created, by training on the learning data, for each state (HMM) of the unshared triphones. See step S11.
[0006]
Next, sets of triphone states (HMMs) among which sharing is permitted (hereinafter referred to as nodes), that is, nodes of HMMs having a common central phoneme, are created. See step S12.
[0007]
Next, each node is divided using, among a plurality of preset decision trees concerning the preceding and succeeding phoneme environments, the decision tree for which the reference information amount improves most after the division compared with before the division (see steps S13 to S16). The node division and the method of calculating the reference information amount are described below.
[0008]
The node division will be described. As described above, a triphone is defined by a three-phoneme chain of a preceding phoneme, a central phoneme, and a succeeding phoneme, and phoneme decision trees for a node (a set of triphones) are defined, for example, as in (1) to (3) below. Each phoneme decision tree divides one node into two nodes.
(1) The preceding phoneme is a vowel (a, i, u, e, o), and the following phoneme does not matter.
(2) The preceding phoneme is a nasal (n, m, NN), and the following phoneme does not matter.
(3) The succeeding phoneme is a plosive (p, t, k), and the preceding phoneme does not matter.
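As an illustration of how such a question divides a node in two, the following minimal Python sketch encodes questions (1) to (3) as predicates over triphones. The names `Triphone`, `QUESTIONS`, and `split_node`, and the example triphones, are hypothetical and not taken from the patent.

```python
from typing import NamedTuple

class Triphone(NamedTuple):
    left: str    # preceding phoneme
    center: str  # central phoneme
    right: str   # succeeding phoneme

# Phoneme classes copied from questions (1) to (3) in the text.
VOWELS = {"a", "i", "u", "e", "o"}
NASALS = {"n", "m", "NN"}
PLOSIVES = {"p", "t", "k"}

# Each question looks at one side of the context only; the other side is unconstrained.
QUESTIONS = {
    "preceding_is_vowel": lambda t: t.left in VOWELS,
    "preceding_is_nasal": lambda t: t.left in NASALS,
    "succeeding_is_plosive": lambda t: t.right in PLOSIVES,
}

def split_node(node, question):
    """Divide a node (a list of triphones) into the 'yes' and 'no' child nodes."""
    yes = [t for t in node if question(t)]
    no = [t for t in node if not question(t)]
    return yes, no

# Illustrative node: three triphones sharing the central phoneme "k".
node = [Triphone("a", "k", "i"), Triphone("n", "k", "a"), Triphone("s", "k", "t")]
yes, no = split_node(node, QUESTIONS["preceding_is_vowel"])
```

Here `yes` holds the triphones whose preceding phoneme is a vowel and `no` holds the rest, so every triphone of the parent node lands in exactly one child.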
[0009]
The method of calculating the reference information amount will be described. From the parameters representing the individual triphones included in a node, a parameter representing the node is obtained, and the reference information amount is calculated from this representative parameter. In general, a continuous distribution representing the entire node is obtained from the continuous distributions representing the triphones included in the node, and the likelihood of this node-level continuous distribution with respect to the learning data is used as the reference information amount.
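The text does not give a formula for this likelihood. One common way to approximate it, shown here purely as a sketch, is to assume diagonal-covariance Gaussian distributions and to compute the log-likelihood of the node's data from the pooled Gaussian alone, using only the per-distribution counts, means, and variances. The Gaussian assumption, the closed-form approximation, and the function names are assumptions of this example, not statements from the patent.

```python
import math

def pooled_gaussian(counts, means, variances):
    """Pool diagonal Gaussians into one Gaussian representing the whole node."""
    total = sum(counts)
    dim = len(means[0])
    mean = [sum(n * m[d] for n, m in zip(counts, means)) / total for d in range(dim)]
    # Pooled variance = pooled second moment minus squared pooled mean.
    var = [
        sum(n * (v[d] + m[d] ** 2) for n, m, v in zip(counts, means, variances)) / total
        - mean[d] ** 2
        for d in range(dim)
    ]
    return total, mean, var

def node_log_likelihood(counts, means, variances):
    """Approximate log-likelihood of the node's data under its pooled Gaussian."""
    total, _, var = pooled_gaussian(counts, means, variances)
    dim = len(var)
    return -0.5 * total * (dim * math.log(2 * math.pi) + sum(math.log(v) for v in var) + dim)

# Improvement obtained by splitting a two-distribution node into its two members.
parent = node_log_likelihood([10, 10], [[0.0], [2.0]], [[1.0], [1.0]])
children = (node_log_likelihood([10], [[0.0]], [[1.0]])
            + node_log_likelihood([10], [[2.0]], [[1.0]]))
gain = children - parent
```

Under these assumptions a candidate split can be scored by `gain`, the increase in log-likelihood from the parent node to the sum of its two children.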
[0010]
For every terminal node resulting from the division, the phoneme decision tree that most improves the reference information amount is selected by the same division method, and the node is divided using the selected phoneme decision tree. This operation is repeated until the reference information amount after division exceeds a preset threshold (see the loop from step S17 to step S13).
[0011]
When the reference information amount exceeds the threshold at all terminal nodes after division, node division is stopped (see step S18).
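The loop of steps S13 to S18 can be sketched as a greedy procedure. In the sketch below, the information measure `info` and the stopping test `should_stop` are supplied by the caller, because the text specifies the stopping rule only as a threshold on the reference information amount. The function names and the toy integer data are hypothetical.

```python
def grow_tree(root, questions, info, should_stop):
    """Greedily divide terminal nodes until every one of them is finished.

    root        : list of items forming the initial node
    questions   : list of predicates, each dividing a node into yes/no children
    info        : function(node) -> float, the reference information amount
    should_stop : function(node) -> bool, caller-supplied stopping test
    """
    pending, finished = [root], []
    while pending:
        node = pending.pop()
        best = None
        if not should_stop(node):
            for ask in questions:
                yes = [x for x in node if ask(x)]
                no = [x for x in node if not ask(x)]
                if not yes or not no:
                    continue  # this question does not actually divide the node
                gain = info(yes) + info(no) - info(node)
                if best is None or gain > best[0]:
                    best = (gain, yes, no)
        if best is None:
            finished.append(node)  # becomes a terminal node
        else:
            pending.extend([best[1], best[2]])
    return finished

def neg_scatter(node):
    """Toy information measure: negative sum of squared deviations from the mean."""
    mean = sum(node) / len(node)
    return -sum((x - mean) ** 2 for x in node)

leaves = grow_tree(
    root=[1, 2, 10, 11],
    questions=[lambda x: x % 2 == 0, lambda x: x < 3],
    info=neg_scatter,
    should_stop=lambda node: len(node) <= 2,
)
```

With this toy data the question `x < 3` gives the larger improvement at the root, so the terminal nodes are `[1, 2]` and `[10, 11]`.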
[0012]
As a result, the plural triphones belonging to the same terminal node share one HMM. As the shared HMM, one single continuous distribution is generally selected from among the single continuous distributions constituting the triphones included in the terminal node, and is output. That is, the shared HMM is represented by one single continuous distribution. See step S19.
[0013]
Referring to FIG. 4, in the conventional method, one single continuous distribution 13i is selected from among the single continuous distributions 13a to 13n constituting the triphones 12a to 12n included in the node 11, and is output as the shared HMM.
[0014]
[Problems to be solved by the invention]
However, the above-described method has conventionally been applied to single continuous distribution HMMs. The first problem is that the method of determining speech recognition units using a phoneme decision tree has not been applied to mixed continuous distribution HMMs, which have high recognition performance.
[0015]
Further, in the above-described conventional method, one of the distributions constituting the node is selected as the shared HMM. The second problem is therefore that the acoustic characteristics of all the shared phoneme environments cannot be expressed.
[0016]
Therefore, an object of the present invention is to solve the above two problems.
[0017]
[Means for Solving the Problems]
The invention according to claim 1 is an acoustic model learning method that solves the first problem: in a method of determining speech recognition units using a phoneme decision tree of preceding and succeeding phonemes, node division is performed by clustering on mixed continuous distribution HMMs, and a new mixed continuous distribution HMM is output for each terminal node to be shared. The invention according to claim 2 is also an acoustic model learning method that solves the first problem: in the same kind of method, on mixed continuous distribution HMMs, the mixed continuous distributions included in each node are clustered in the process of selecting a phoneme decision tree, the phoneme decision tree that most improves the information amount is selected, node division is performed with that phoneme decision tree, and a new mixed continuous distribution HMM is then output for each terminal node to be shared. The invention according to claim 3 is also an acoustic model learning method that solves the first problem: in the same kind of method, on mixed continuous distribution HMMs, the mixed continuous distributions included in each node are regarded as discrete values in the process of selecting a phoneme decision tree and are clustered by the K-means method, the phoneme decision tree that most improves the information amount is selected, node division is performed with that phoneme decision tree, and a new mixed continuous distribution HMM is then output for each terminal node to be shared. The invention according to claim 4 is an acoustic model learning method that solves the first and second problems: in the same kind of method, node division is performed by clustering on mixed continuous distribution HMMs, after which a new mixed continuous distribution HMM is obtained, as a result of the clustering, for each terminal node to be shared; the new mixed continuous distribution HMM consists of a combination of the mean values and variance values of all the distributions, as an output continuous distribution representing the set of shared HMMs after clustering.
[0019]
[Embodiments of the invention]
FIG. 1 shows a procedure of an acoustic model learning method according to an embodiment of the present invention.
[0020]
In FIG. 1, a mixed continuous distribution is first created, by training on the learning data, for each state (HMM) of the unshared triphones. See step S1.
[0021]
Next, for each triphone configured by the mixed continuous distribution HMM, an HMM node that allows sharing is created. See step S2.
[0022]
Next, each node is divided, with the mixed continuous distribution HMMs as the target, using decision trees on the preceding and succeeding phonemes. See steps S3 to S6.
[0023]
At the time of node division, the mixed continuous distribution HMM used for calculating the information amount is configured such that the information amount is maximized. An example of the configuration method will be described below.
[0024]
(1) As shown in FIG. 2, all the distributions constituting the triphones included in a node after division are clustered into a predetermined number of clusters. In FIG. 2, 1 denotes a node, 2a to 2n denote the states included in node 1, and 3a to 3n denote continuous distributions; each state corresponds to a plurality of continuous distribution HMMs, that is, each state is a mixed continuous distribution HMM. In the illustrated example there are three clusters, 4a to 4c.
[0025]
(1a) For the clustering, a top-down method such as the K-means method (a clustering method for discrete data) or a bottom-up method such as the Furthest Neighbor method (a clustering method for discrete or continuous data) is used. Each cluster, however, must contain a plurality of continuous distribution HMMs, and a lower limit on that number is set in advance.
[0026]
(1b) The input data for the clustering are the number of appearances of each HMM state in the learning data and, for the mixed continuous distribution constituting each state, the distribution weights, mean values, and variances. Since the number of appearances of each distribution can be calculated approximately from these, the centroid calculation is performed with weighting by the number of appearances of each distribution.
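Steps (1a) and (1b) can be sketched as a count-weighted K-means over the distribution means. The sketch assumes that the approximate number of appearances of a distribution is the state's number of appearances multiplied by the distribution weight, and that distance is squared Euclidean distance between means. Neither detail is stated in the text, the function names are hypothetical, and the lower limit on cluster size from (1a) is not enforced here.

```python
def distribution_counts(state_count, mixture_weights):
    """Assumed approximation: appearances of a distribution = state count x weight."""
    return [state_count * w for w in mixture_weights]

def weighted_kmeans(means, counts, init_centroids, n_iter=20):
    """Cluster distribution means, weighting each by its number of appearances."""
    centroids = [list(c) for c in init_centroids]
    assign = [0] * len(means)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [
            min(range(len(centroids)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(m, centroids[k])))
            for m in means
        ]
        # Update step: centroid = count-weighted average of the member means.
        for k in range(len(centroids)):
            members = [i for i, a in enumerate(assign) if a == k]
            total = sum(counts[i] for i in members)
            if total > 0:
                centroids[k] = [
                    sum(counts[i] * means[i][d] for i in members) / total
                    for d in range(len(centroids[k]))
                ]
    return assign, centroids

# Toy example: three one-dimensional distributions, two clusters.
assign, centroids = weighted_kmeans(
    means=[[0.0], [1.0], [10.0]],
    counts=[3.0, 1.0, 2.0],
    init_centroids=[[0.0], [10.0]],
)
```

Because the first distribution appears three times as often as the second, the centroid of their cluster sits at 0.25 rather than at the unweighted midpoint 0.5.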
[0027]
(2) After clustering, for each cluster, one new mixed continuous distribution is synthesized from all the distributions included in the cluster to be a representative distribution. In FIG. 2, the continuous mixture distributions 5a to 5c are representative distributions newly synthesized for each of the clusters 4a to 4c.
[0028]
(2a) The mean value of the synthesized distribution is obtained by averaging the mean values of all the distributions weighted by their numbers of appearances, as shown in Equation 1 below.
[0029]
(Equation 1)
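The equation itself is not reproduced in this text. A form consistent with the description in (2a) would be the following, where the symbols are assumptions of this sketch: i ranges over the distributions in the cluster, n_i is the number of appearances of distribution i, and μ_i is its mean value.

```latex
\mu = \frac{\sum_{i} n_i \, \mu_i}{\sum_{i} n_i}
```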

[0030]
(2b) The variance of the synthesized distribution is obtained from the variances of all the distributions (within-group variance), the variance between the distributions (between-group variance), and the numbers of appearances, as shown in Equation 2 below.
[0031]
(Equation 2)
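The equation itself is not reproduced in this text. A form consistent with the description in (2b) would be the sum of a within-group term and a between-group term, both weighted by the numbers of appearances. The symbols are assumptions of this sketch: n_i, μ_i, and σ_i² are the number of appearances, mean value, and variance of distribution i in the cluster, and μ is the count-weighted mean of the μ_i.

```latex
\sigma^2 = \underbrace{\frac{\sum_{i} n_i \, \sigma_i^2}{\sum_{i} n_i}}_{\text{within-group}}
         + \underbrace{\frac{\sum_{i} n_i \, (\mu_i - \mu)^2}{\sum_{i} n_i}}_{\text{between-group}}
```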

[0032]
(2c) The distribution weight of the synthesized distribution is obtained from the ratio of the numbers of appearances in the learning data, as shown in Equation 3 below.
[0033]
(Equation 3)
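The equation itself is not reproduced in this text. A form consistent with the description in (2c) would be the following, where the symbols are assumptions of this sketch: C_k is the set of distributions in cluster k, n_i is the number of appearances of distribution i, N_k is the total number of appearances in cluster k, and w_k is the distribution weight of the distribution synthesized for cluster k.

```latex
w_k = \frac{N_k}{\sum_{k'} N_{k'}}, \qquad N_k = \sum_{i \in C_k} n_i
```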

[0034]
Using the continuous mixture distribution obtained by the above procedure, the reference information amount is calculated in the same manner as in the conventional method, and the node is divided by the decision tree for which the reference information amount after division improves most compared with before the division.
[0035]
For every terminal node resulting from the division, the phoneme decision tree that most improves the reference information amount is selected by the same division, and the node is divided using the selected phoneme decision tree. This operation is repeated until the reference information amount after division exceeds a preset threshold (see the loop from step S7 to step S3).
[0036]
When the reference information amount exceeds the threshold at all terminal nodes after division, node division is stopped (see step S8).
[0037]
As a result, the triphones belonging to the same terminal node share one HMM. As the shared HMM, a new continuous mixture distribution is synthesized for each node to be shared by the procedure described above, and the synthesized continuous mixture distribution is output. See step S9. That is, one new mixed continuous distribution is synthesized from all the distributions included in the node on the basis of Equations 1 to 3, and is output.
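The synthesis of step S9 can be sketched as follows for diagonal variances. The sketch follows the verbal descriptions in (2a) to (2c): a count-weighted mean, a variance made of within-group and between-group parts, and a weight equal to the cluster's share of the total number of appearances. The function names and data layout are hypothetical, and the exact form of the patent's equations is an assumption.

```python
def merge_distributions(counts, means, variances):
    """Merge the distributions of one cluster into a single representative one."""
    total = sum(counts)
    dim = len(means[0])
    # (2a) mean: average of member means weighted by number of appearances.
    mean = [sum(n * m[d] for n, m in zip(counts, means)) / total for d in range(dim)]
    # (2b) variance: within-group part plus between-group part.
    within = [sum(n * v[d] for n, v in zip(counts, variances)) / total for d in range(dim)]
    between = [
        sum(n * (m[d] - mean[d]) ** 2 for n, m in zip(counts, means)) / total
        for d in range(dim)
    ]
    var = [w + b for w, b in zip(within, between)]
    return total, mean, var

def synthesize_mixture(clusters):
    """clusters: list of (counts, means, variances); returns [(weight, mean, var), ...]."""
    merged = [merge_distributions(*c) for c in clusters]
    grand_total = sum(total for total, _, _ in merged)
    # (2c) weight: share of the appearances that falls into each cluster.
    return [(total / grand_total, mean, var) for total, mean, var in merged]

# Toy node with two clusters of one-dimensional distributions.
mixture = synthesize_mixture([
    ([10.0, 10.0], [[0.0], [2.0]], [[1.0], [1.0]]),
    ([20.0], [[5.0]], [[0.5]]),
])
```

In the toy example the first cluster merges two unit-variance distributions centered at 0 and 2 into one distribution with mean 1 and variance 2, which is wider than either member because of the between-group term.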
[0038]
In this way, a new continuous distribution that combines the mean values and variance values of all the constituent distributions is obtained as the output continuous distribution representing the set of shared HMMs after clustering. This solves the problem that the conventional method, which merely selects one distribution, cannot express the acoustic characteristics of all the shared phoneme environments. That is, the acoustic characteristics of all the shared phoneme environments can be expressed.
[0039]
In addition, when the technique of obtaining, as the output continuous distribution representing a set of shared HMMs, a continuous distribution that combines the mean values and variance values of all the constituent distributions is applied to the conventional method for single continuous distribution HMMs, the acoustic characteristics of all the shared phoneme environments can be expressed even for single continuous distribution HMMs.
[0040]
[Effects of the invention]
As described above, according to the present invention, it is possible to determine a speech recognition unit using a phoneme decision tree for a mixed continuous distribution HMM having high recognition performance.
[0041]
Further, even when single continuous distribution HMMs are targeted, the acoustic characteristics of all the shared phoneme environments can be expressed.
[Brief description of the drawings]
FIG. 1 is a diagram showing a procedure of an acoustic model learning method according to an embodiment of the present invention.
FIG. 2 is a diagram showing a continuous distribution construction method for representing nodes according to an embodiment of the present invention.
FIG. 3 is a diagram showing a procedure of a conventional acoustic model learning method.
FIG. 4 is a diagram showing a conventional continuous distribution construction method for expressing nodes.
[Explanation of symbols]
1 Node
2a to 2n States
3a to 3n Continuous distributions
4a to 4c Clusters
5a to 5c Continuous mixture distributions newly synthesized for each cluster

Claims

先行音素と後続音素の音素決定木による音声認識単位の決定方法において、
混合連続分布ＨＭＭ（ＨＭＭは隠れマルコフモデル）を対象としてクラスタリングによりノード分割を行い、共有化を行う末端ノードに対して新しい混合連続分布ＨＭＭを出力することを特徴とする音響モデル学習方法。
An acoustic model learning method for determining speech recognition units using a phoneme decision tree of preceding and succeeding phonemes, characterized in that node division is performed by clustering on mixed continuous distribution HMMs (HMM: hidden Markov model), and a new mixed continuous distribution HMM is output for each terminal node to be shared.

請求項１記載の音響モデル学習方法において、
前記ノード分割は、音素決定木を選ぶ過程で各ノードに含まれる混合連続分布をクラスタリングして、情報量が最も向上する音素決定木を選び、その音素決定木により行われるものであることを特徴とする音響モデル学習方法。
The acoustic model learning method according to claim 1, characterized in that the node division is performed by clustering the mixed continuous distributions included in each node in the process of selecting a phoneme decision tree, selecting the phoneme decision tree that most improves the information amount, and dividing the node with that phoneme decision tree.

請求項１記載の音響モデル学習方法において、
前記ノード分割は、音素決定木を選ぶ過程で各ノードに含まれる混合連続分布を離散値と見なし、Ｋ−ｍｅａｎｓ法（ケイ−ミーンズ法）によりクラスタリングして、情報量が最も向上する音素決定木を選び、その音素決定木により行われるものであることを特徴とする音響モデル学習方法。
The acoustic model learning method according to claim 1, characterized in that, in the node division, the mixed continuous distributions included in each node are regarded as discrete values and clustered by the K-means method in the process of selecting a phoneme decision tree, the phoneme decision tree that most improves the information amount is selected, and the node is divided with that phoneme decision tree.

請求項１乃至請求項３のいずれかに記載の音響モデル学習方法において、
前記新しい混合連続分布ＨＭＭは、共有化を行う末端ノードに対して、クラスタリングの結果求められるものであり、
クラスタリング後の共有するＨＭＭの集合を代表する出力連続分布として、全分布の平均値と分散値を結合したものからなることを特徴とする音響モデル学習方法。
The acoustic model learning method according to any one of claims 1 to 3, characterized in that the new mixed continuous distribution HMM is obtained as a result of the clustering for each terminal node to be shared, and consists of a combination of the mean values and variance values of all the distributions, as an output continuous distribution representing the set of shared HMMs after clustering.