JP2864775B2

JP2864775B2 - Voice recognition device

Info

Publication number: JP2864775B2
Application number: JP3064193A
Authority: JP
Inventors: 真二古賀
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1991-03-28
Filing date: 1991-03-28
Publication date: 1999-03-08
Anticipated expiration: 2014-03-08
Also published as: JPH04298796A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a speech recognition device.

[0002]

[Prior Art] In general, a speech recognition device requires a very large amount of computation. Recognizing speech in real time therefore requires a fast computer, dedicated hardware, or a reduction in the amount of computation. One method of reducing the computation is beam search, as described by Sakoe, Fujii, et al. in the paper "Speeding up DP matching by beam search and vector quantization," IEICE Technical Report, Vol. 87, No. 90, June 26, 1987, pp. 33-40 (hereinafter, Reference 1).

[0003] A conventional speech recognition device using the method of Reference 1 performs recognition by matching an input pattern against standard patterns, and carries out this matching by solving an optimal path problem with a recurrence formula. The recurrence computes the cumulative distance $g(i, j)$ at time $j$ of the standard pattern for time $i$ of the input pattern, for $1 \le i \le I$ and $1 \le j \le J$ ($I$: duration of the input pattern; $J$: duration of the standard pattern). If $g(i, j)$ is larger than the minimum cumulative distance at time $i$ plus a fixed value, the point $(i, j)$ is judged unlikely to lie on the optimal path, and the search extending from it is omitted. This pruning reduces the overall amount of computation.
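The threshold pruning described above can be sketched as follows. This is a minimal illustration, not the implementation of Reference 1: the function name `beam_prune`, the dictionary representation of one grid column, and the parameter name `beam_width` (standing for the "fixed value") are assumptions made for this example.

```python
def beam_prune(column, beam_width):
    """Keep grid points whose cumulative distance is within beam_width of the best.

    column: dict mapping standard-pattern time j -> cumulative distance g(i, j)
            for one input time i (hypothetical representation, non-empty).
    beam_width: the fixed value added to the minimum cumulative distance.
    """
    best = min(column.values())
    limit = best + beam_width
    # Points with g(i, j) larger than the limit are pruned.
    return {j: g for j, g in column.items() if g <= limit}


column = {1: 4.0, 2: 2.5, 3: 9.0, 4: 3.0}
survivors = beam_prune(column, beam_width=1.0)
```

Because the number of survivors depends on how the distances happen to be distributed, it varies from one time step to the next.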

[0004]

[Problem to Be Solved by the Invention] In the conventional speech recognition device described above, the number of paths that survive pruning is not constant. When few paths remain, the path corresponding to the correct result may itself be pruned.

[0005] To keep the number of surviving paths constant, one could instead select, at each time and regardless of any threshold, a fixed number of the $g^n(i, j)$ in ascending order. This approach, however, requires sorting, which increases the processing load.
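For comparison, the fixed-count alternative can be sketched as below. The name `keep_smallest` and the data layout are assumptions for illustration; the point is only that selecting the smallest values requires ordering the candidates at every time step.

```python
def keep_smallest(distances, count):
    """Return the `count` entries with the smallest cumulative distance.

    distances: dict mapping (n, j) -> cumulative distance (hypothetical layout).
    """
    # Sorting all m candidates costs on the order of m log m comparisons per frame.
    ranked = sorted(distances.items(), key=lambda item: item[1])
    return dict(ranked[:count])


distances = {(1, 1): 4.0, (1, 2): 2.5, (2, 1): 9.0, (2, 2): 3.0}
kept = keep_smallest(distances, count=2)
```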

[0006] An object of the present invention is to provide a speech recognition device that recognizes unknown speech quickly and accurately without pruning the path that corresponds to the correct result.

[0007]

[Means for Solving the Problem] The speech recognition device of the present invention comprises: a feature analysis unit that analyzes a speech signal and outputs a feature vector time series; a standard pattern storage unit that stores standard patterns created in advance; a cumulative distance storage unit that stores cumulative distances; a threshold holding unit that holds a threshold for the cumulative distances; a recurrence formula calculation unit that obtains new cumulative distances from the cumulative distances stored in the cumulative distance storage unit, the feature vector time series, and the standard patterns; a cumulative distance counting unit that counts, among the cumulative distances obtained by the recurrence formula calculation unit, those smaller than the threshold held in the threshold holding unit; a cumulative distance output unit that outputs all of the cumulative distances obtained by the recurrence formula calculation unit when the number of cumulative distances smaller than the threshold is smaller than a predetermined value, and outputs only the cumulative distances smaller than the threshold when that number is larger than the predetermined value; a threshold calculation unit that obtains a new threshold; and a result determination unit that obtains a recognition result for the unknown speech from the cumulative distances output by the cumulative distance output unit. Alternatively, the device comprises: a mean and variance calculation unit that obtains the mean and variance of the cumulative distances output from the recurrence formula calculation unit; a threshold calculation unit that obtains a threshold from the mean, the variance, and the number of cumulative distances; and a cumulative distance output unit that outputs the cumulative distances smaller than the threshold held in the threshold holding unit.

[0008]

[Operation] The present invention concerns a speech recognition device that reduces the amount of computation by pruning during the optimal path search that matches an input pattern against standard patterns. It prevents excessive pruning and thereby realizes a fast, high-performance speech recognition device.

[0009] First, the case in which excessive pruning is suppressed by always retaining at least a fixed number of paths is described. The standard patterns used for recognition are assumed to have been created in advance from feature vector time series obtained from typical utterances. A feature vector time series is produced by, for example, the mel-cepstrum method described on pages 154-160 of Furui, "Digital Speech Processing," Tokai University Press, 1985 (hereinafter, Reference 2).

[0010] When unknown speech is recognized, recognition is performed, as described in Reference 1, by matching the input pattern $A = a_1 a_2 \ldots a_i \ldots a_I$ against the standard patterns $B^n = b^n_1 b^n_2 \ldots b^n_j \ldots b^n_{J^n}$ ($n = 1, 2, \ldots, N$). Here $I$ is the number of frames in the input pattern, $J^n$ is the number of frames in standard pattern $n$, and $N$ is the number of standard patterns. The matching is carried out by solving an optimal path problem using the recurrence formula of equation (1).

[0011] [Equation (1)]

[0012] Here $d^n(i, j)$ is the distance between the feature vectors $a_i$ and $b^n_j$, and $g^n(i, j)$ is the cumulative distance between frame $i$ of the input pattern and frame $j$ of standard pattern $n$. For frame $i$ of the input pattern, equation (1) is evaluated at every grid point $j = 1, 2, \ldots, J^n$, $n = 1, 2, \ldots, N$, after which $i$ is advanced to $i + 1$ and processing moves to the next frame. At this point, let $h^n(i)$ be the number of $g^n(i, j)$ that satisfy

$$g^n(i, j) < \theta(i) \qquad (2)$$

If $h^n(i)$ satisfies equation (3) (when pruning is performed per standard pattern) or equation (4) (when pruning is performed across all standard patterns), no pruning is performed at $i$ and the search continues. Otherwise, for every point $(n, i, j)$ that does not satisfy equation (2), the search extending from that point is omitted.
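Equation (1) itself is not reproduced in this text. As a rough sketch of one frame of such a recurrence, the code below assumes a local path constraint in which point j may be reached from points j, j-1, and j-2 of the preceding frame. That choice is suggested by the description of the embodiments, but the exact form of equation (1) is an assumption here, as are the names `dp_step`, `prev`, and `local`.

```python
def dp_step(prev, local):
    """Advance the cumulative distances of one standard pattern by one input frame.

    prev:  dict j -> cumulative distance at the preceding frame (pruned points absent).
    local: list of local distances d(i, j) for j = 1..J, stored at index j - 1.
    Assumed recurrence: g(i, j) = d(i, j) + min(g(i-1, j), g(i-1, j-1), g(i-1, j-2)).
    """
    current = {}
    for j in range(1, len(local) + 1):
        # Predecessors missing from `prev` were pruned earlier and are skipped.
        candidates = [prev[k] for k in (j, j - 1, j - 2) if k in prev]
        if candidates:
            current[j] = local[j - 1] + min(candidates)
    return current


prev = {1: 1.0, 2: 3.0}        # point 3 was pruned at the preceding frame
local = [0.5, 0.2, 0.1, 0.4]   # d(i, 1) .. d(i, 4)
current = dp_step(prev, local)
```

Points that have no surviving predecessor simply do not appear in the result, which is how pruning at one frame shrinks the work done at the next.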

[0013] [Equations (3) and (4)]

[0014] Here $\theta(i)$ is a threshold function and $H$ is a predetermined constant. Equation (5) can be used as the threshold function $\theta(i)$.

[0015] [Equation (5)]

[0016] $\alpha$ is a predetermined constant. After the computation has reached the point $(n, I, J^n)$ for every standard pattern, the word corresponding to the standard pattern with the smallest $g^n(I, J^n)$ is taken as the recognition result.

[0017] Next, the case in which excessive pruning is suppressed by determining the threshold function from the mean and variance of the cumulative distances in each frame of the input pattern is described. In this case as well, the standard patterns used for recognition are assumed to have been created in advance from feature vector time series obtained from typical utterances.

[0018] When unknown speech is recognized, the input pattern is matched against the standard patterns using the recurrence formula of equation (1), as in the preceding case. In each frame of the input pattern, if the cumulative distance $g^n(i, j)$ does not satisfy equation (2), the subsequent search extending from the point $(n, i, j)$ is omitted. The threshold function $\theta(i)$ in equation (2) is obtained by the following procedure.

[0019] (1) Obtain the ratio $r$ of the number $P$ of paths to be retained without pruning to the number $m$ of cumulative distances computed.

[0020] $$r = P / m \qquad (6)$$ If $r$ is 1.0 or greater, set $\theta(i) = \infty$ and skip the steps below.

[0021] (2) Obtain the mean $\mu$ and variance $\sigma^2$ of the cumulative distances.

[0022] [Equations (7) and (8)]

[0023] (3) For the standard normal distribution

[0024] [Equation (9)]

[0025] prepare a table giving the integral $y'$ from $x = 0$ to $x = t$, and obtain the value $t = t_R$ corresponding to

[0026] [Equation (10)]

[0027] from that table.

[0028] (4) Obtain the threshold function $\theta(i)$ from the following equation.

[0029] $$\theta(i) = \sigma^2 \cdot t_R + \mu \qquad (11)$$ The recognition result is obtained in the same way as described above.
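Steps (1) to (4) can be sketched as below. The sketch makes several assumptions that are not in the source: it replaces the lookup table for the standard normal integral with `statistics.NormalDist().inv_cdf` from the Python standard library, it reads $t_R$ as the point below which a fraction $r$ of a standard normal distribution lies, and it uses the population mean and variance. The final line follows equation (11) as written, multiplying $t_R$ by the variance. The names `statistical_threshold` and `keep` are hypothetical.

```python
import math
from statistics import NormalDist


def statistical_threshold(distances, keep):
    """Estimate a pruning threshold that should retain roughly `keep` paths.

    distances: non-empty list of cumulative distances for the current frame (m values).
    keep: desired number of surviving paths P (assumed >= 1).
    """
    m = len(distances)
    r = keep / m                      # equation (6)
    if r >= 1.0:
        return math.inf               # nothing is pruned
    mean = sum(distances) / m
    variance = sum((g - mean) ** 2 for g in distances) / m
    # Assumption: t_R is the standard normal quantile at probability r,
    # computed directly instead of being read from a table.
    t_r = NormalDist().inv_cdf(r)
    return variance * t_r + mean      # equation (11) as stated in the text


theta = statistical_threshold([1.0, 2.0, 3.0, 4.0], keep=2)
```

When `keep` is half the number of distances, the quantile is zero and the threshold reduces to the mean. No sorting is involved: the function makes a fixed number of passes over the data.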

[0030] When pruning is performed so as to retain a fixed number of paths, using the mean and variance as described above requires about $4m$ operations for pruning ($2m$ additions, $m$ multiplications, and $m$ comparisons). When $m$ is large, this is clearly fewer than the $m \cdot \log m$ operations required when sorting (quicksort) is used. When $m$ is small, sorting requires fewer operations. In practice, however, on a general-purpose microprocessor, multiply-accumulate operations generally execute faster than comparisons, so the former method runs faster than the latter, which involves many comparisons.
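As a rough illustration of this comparison, assume the logarithm is taken to base 2 (the text does not state the base). The two operation counts are then equal when

```latex
4m = m \log_2 m \;\Longleftrightarrow\; \log_2 m = 4 \;\Longleftrightarrow\; m = 16
```

so under this assumption the mean-and-variance method needs fewer operations once $m$ exceeds 16, and sorting needs fewer below that.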

[0031]

[Embodiments] The present invention is now described with reference to the drawings.

[0032] FIG. 1 is a block diagram showing a first embodiment of the present invention.

[0033] As shown in FIG. 1, this embodiment comprises: a feature analysis unit 11 that analyzes a speech signal and outputs a feature vector time series; a standard pattern storage unit 12 that stores standard patterns created in advance; a cumulative distance storage unit 14 that stores cumulative distances; a threshold holding unit 16 that holds a threshold for the cumulative distances; a recurrence formula calculation unit 13 that obtains new cumulative distances from the cumulative distances stored in the cumulative distance storage unit 14, the feature vector time series, and the standard patterns; a cumulative distance counting unit 17 that counts, among the cumulative distances obtained by the recurrence formula calculation unit 13, those smaller than the threshold held in the threshold holding unit 16; a cumulative distance output unit 18 that outputs all of the cumulative distances obtained by the recurrence formula calculation unit 13 when the number of cumulative distances smaller than the threshold is smaller than a predetermined value, and outputs only the cumulative distances smaller than the threshold when that number is larger than the predetermined value; a threshold calculation unit 15 that obtains a new threshold; and a result determination unit 19 that obtains a recognition result for the unknown speech from the cumulative distances output by the cumulative distance output unit 18.

[0034] The operation of the first embodiment is described next.

[0035] Prior to recognition, the standard patterns $P$ are stored in the standard pattern storage unit 12. First, an unknown speech signal $S$ is input to the feature analysis unit 11. Using a method such as the mel-cepstrum method described in Reference 2, the feature analysis unit 11 converts the speech signal $S$ into a feature vector time series $V = \{v_1, v_2, \ldots, v_i, \ldots, v_I\}$. The following are input to the recurrence formula calculation unit 13: the feature vector $v_i$ for one frame of $V$; the standard patterns $P = \{p^1_1, p^1_2, \ldots, p^1_{J_1}, p^2_1, \ldots, p^2_{J_2}, \ldots, p^n_1, \ldots, p^n_j, \ldots, p^n_{J_n}, \ldots, p^N_1, \ldots, p^N_{J_N}\}$ held in the standard pattern storage unit 12; and the cumulative distance group of the preceding frame, $G = \{g^1_1, g^1_2, \ldots, g^1_{J_1}, g^2_1, \ldots, g^2_{J_2}, \ldots, g^n_1, \ldots, g^n_j, \ldots, g^n_{J_n}, \ldots, g^N_1, \ldots, g^N_{J_N}\}$, stored in the cumulative distance storage unit 14. In accordance with equations (12) and (13), the recurrence formula calculation unit 13 obtains the cumulative distance group of the current frame, $G' = \{g'^{1}_{1}, \ldots, g'^{n}_{j}, \ldots, g'^{N}_{J_N}\}$, and the minimum cumulative distance $g''$.

[Equations (12) and (13)]

In equation (12), if any of the preceding-frame cumulative distances $g^n_j$, $g^n_{j-1}$, $g^n_{j-2}$ is not in the cumulative distance group $G$, the processing that uses that cumulative distance is skipped. The minimum cumulative distance $g''$ is input to the threshold calculation unit 15, which adds a predetermined constant $A$ to it and outputs the result as the threshold $T'$ for the next frame. The threshold $T'$ is input to and held in the threshold holding unit 16.

The cumulative distance counting unit 17 receives the cumulative distance group $G'$ and the threshold $T$ held in the threshold holding unit 16. It finds the indices $(n, j)$ of the elements $g'^{n}_{j}$ of $G'$ whose value is no greater than $T$, together with their total count, and outputs these as cumulative distance information $F$. The cumulative distance output unit 18 receives $F$ and the cumulative distance group $G'$ output by the recurrence formula calculation unit 13. If the number of elements no greater than $T$ exceeds a predetermined constant $H$, only the $g'^{n}_{j}$ corresponding to the $(n, j)$ in $F$ are output as the effective cumulative distance group $G''$; otherwise, all $g'^{n}_{j}$ are output as $G''$.

The effective cumulative distance group $G''$ is input to the cumulative distance storage unit 14 and the result determination unit 19. The cumulative distance storage unit 14 stores $G''$ as the cumulative distance group of the preceding frame. If $G''$ belongs to the final frame, the result determination unit 19 outputs, as the recognition result, the category $O$ of the standard pattern whose cumulative distance at $J_n$ ($1 \le n \le N$) is smallest.
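The per-frame behaviour of units 15 to 18 can be sketched as follows. This is an illustrative reading of the description above, not the patented implementation; the function name `prune_frame`, the dictionary keyed by `(n, j)`, and the parameter names `min_count` (for H) and `margin` (for A) are assumptions.

```python
def prune_frame(current, threshold, min_count, margin):
    """Apply count-guarded pruning to one frame of cumulative distances.

    current:   dict (n, j) -> cumulative distance g' for the current frame (non-empty).
    threshold: threshold T held from the preceding frame.
    min_count: constant H; pruning happens only if more than H points pass.
    margin:    constant A added to the minimum to form the next threshold T'.
    Returns (effective distances G'', next threshold T').
    """
    # Counting unit 17: points whose distance is no greater than T.
    passing = {key: g for key, g in current.items() if g <= threshold}
    # Output unit 18: prune only when enough points would survive.
    effective = passing if len(passing) > min_count else dict(current)
    # Threshold calculation unit 15: minimum distance plus a constant.
    next_threshold = min(current.values()) + margin
    return effective, next_threshold


frame = {(1, 1): 1.0, (1, 2): 5.0, (2, 1): 2.0, (2, 2): 9.0}
pruned, t_next = prune_frame(frame, threshold=3.0, min_count=1, margin=2.0)
unpruned, _ = prune_frame(frame, threshold=3.0, min_count=2, margin=2.0)
```

In the first call two points pass the threshold, which exceeds `min_count`, so only those two are kept. In the second call the same two points do not exceed `min_count`, so the whole frame is passed on unpruned.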

[0036] FIG. 2 is a block diagram showing a second embodiment of the present invention.

[0037] In FIG. 2, components identical to those of the first embodiment shown in FIG. 1 carry the same reference numerals. The second embodiment differs from the first in that it has: a mean and variance calculation unit 20 that obtains the mean and variance of the cumulative distances output from the recurrence formula calculation unit 13; a threshold calculation unit 15a that obtains a threshold from the mean, the variance, and the number of cumulative distances; and a cumulative distance output unit 18a that outputs the cumulative distances smaller than the threshold held in the threshold holding unit 16.

[0038] The operation of the second embodiment is described next.

[0039] Prior to recognition, the standard patterns $P$ are stored in the standard pattern storage unit 12. First, an unknown speech signal $S$ is input to the feature analysis unit 11. Using a method such as the mel-cepstrum method described in Reference 2, the feature analysis unit 11 converts the speech signal $S$ into a feature vector time series $V = \{v_1, v_2, \ldots, v_i, \ldots, v_I\}$. The following are input to the recurrence formula calculation unit 13: the feature vector $v_i$ for one frame of $V$; the standard patterns $P = \{p^1_1, p^1_2, \ldots, p^1_{J_1}, p^2_1, \ldots, p^2_{J_2}, \ldots, p^n_1, \ldots, p^n_j, \ldots, p^n_{J_n}, \ldots, p^N_1, \ldots, p^N_{J_N}\}$ held in the standard pattern storage unit 12; and the cumulative distance group of the preceding frame, $G = \{g^1_1, g^1_2, \ldots, g^1_{J_1}, g^2_1, \ldots, g^2_{J_2}, \ldots, g^n_1, \ldots, g^n_j, \ldots, g^n_{J_n}, \ldots, g^N_1, \ldots, g^N_{J_N}\}$, stored in the cumulative distance storage unit 14. In accordance with equations (12) and (13), the recurrence formula calculation unit 13 obtains the cumulative distance group of the current frame, $G' = \{g'^{1}_{1}, \ldots, g'^{n}_{j}, \ldots, g'^{N}_{J_N}\}$, and its number of elements $L$. In equation (12), if any of the preceding-frame cumulative distances $g^n_j$, $g^n_{j-1}$, $g^n_{j-2}$ is not in the cumulative distance group $G$, the processing that uses that cumulative distance is skipped. The mean and variance calculation unit 20 uses the cumulative distance group $G'$ and its number of elements $L$ to obtain the mean $U$ and variance $Z$ of the cumulative distances from the following equations, and outputs them as cumulative distance distribution information $M$.

[0040] [Equations (14) and (15)]

[0041] The number of elements $L$ of the cumulative distance group $G'$ and the cumulative distance distribution information $M$ are input to the threshold calculation unit 15a. The unit computes the ratio $R = P / L$, where $P$ is a predetermined constant giving the number of paths to be retained without pruning. If $R$ is 1.0 or greater, an infinite value is output as the threshold $T'$ for the next frame. If $R$ is less than 1.0, a table prepared in advance, giving the integral $y'$ from $x = 0$ to $x = t$ of the standard normal distribution of equation (9), is used to obtain the value $t = t_R$ corresponding to

[0042] [Equation (16)]

[0043] and the threshold given by equation (17) is then output as the threshold $T'$ for the next frame.

[0044] $$T' = Z \cdot t_R + U \qquad (17)$$ The threshold $T'$ is input to and held in the threshold holding unit 16. The cumulative distance output unit 18a receives the cumulative distance group $G'$ and the threshold $T$ held in the threshold holding unit 16, and outputs the elements $g'^{n}_{j}$ of $G'$ whose value is no greater than $T$ as the effective cumulative distance group $G''$. The effective cumulative distance group $G''$ is input to the cumulative distance storage unit 14 and the result determination unit 19. The cumulative distance storage unit 14 stores $G''$ as the cumulative distance group of the preceding frame. If $G''$ belongs to the final frame, the result determination unit 19 outputs, as the recognition result, the category $O$ of the standard pattern whose cumulative distance at $J_n$ ($1 \le n \le N$) is smallest.

[0045]

[Effects of the Invention] As described above, the present invention comprises: a feature analysis unit that analyzes a speech signal and outputs a feature vector time series; a standard pattern storage unit that stores standard patterns created in advance; a cumulative distance storage unit that stores cumulative distances; a threshold holding unit that holds a threshold for the cumulative distances; a recurrence formula calculation unit that obtains new cumulative distances from the cumulative distances stored in the cumulative distance storage unit, the feature vector time series, and the standard patterns; a cumulative distance counting unit that counts, among the cumulative distances obtained by the recurrence formula calculation unit, those smaller than the threshold held in the threshold holding unit; a cumulative distance output unit that outputs all of the cumulative distances obtained by the recurrence formula calculation unit when the number of cumulative distances smaller than the threshold is smaller than a predetermined value, and outputs only the cumulative distances smaller than the threshold when that number is larger than the predetermined value; a threshold calculation unit that obtains a new threshold; and a result determination unit that obtains a recognition result for unknown speech from the cumulative distances output by the cumulative distance output unit. Alternatively, it comprises: a mean and variance calculation unit that obtains the mean and variance of the cumulative distances output from the recurrence formula calculation unit; a threshold calculation unit that obtains a threshold from the mean, the variance, and the number of cumulative distances; and a cumulative distance output unit that outputs the cumulative distances smaller than the threshold held in the threshold holding unit. As a result, excessive pruning can be prevented when unknown speech is recognized by solving an optimal path problem, and a fast, high-performance speech recognition device can be provided.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing a first embodiment of the present invention.

FIG. 2 is a block diagram showing a second embodiment of the present invention.

[Explanation of Symbols]

11: feature analysis unit
12: standard pattern storage unit
13: recurrence formula calculation unit
14: cumulative distance storage unit
15, 15a: threshold calculation unit
16: threshold holding unit
17: cumulative distance counting unit
18, 18a: cumulative distance output unit
19: result determination unit
20: mean and variance calculation unit

Claims

(57) [Claims]

[Claim 1] A speech recognition device comprising: a feature analysis unit that analyzes a speech signal and outputs a feature vector time series; a standard pattern storage unit that stores standard patterns created in advance; a cumulative distance storage unit that stores cumulative distances; a threshold holding unit that holds a threshold for the cumulative distances; a recurrence formula calculation unit that obtains new cumulative distances from the cumulative distances stored in the cumulative distance storage unit, the feature vector time series, and the standard patterns; a cumulative distance counting unit that counts, among the cumulative distances obtained by the recurrence formula calculation unit, those smaller than the threshold held in the threshold holding unit; a cumulative distance output unit that outputs all of the cumulative distances obtained by the recurrence formula calculation unit when the number of cumulative distances smaller than the threshold is smaller than a predetermined value, and outputs the cumulative distances smaller than the threshold when that number is larger than the predetermined value; a threshold calculation unit that obtains a new threshold; and a result determination unit that obtains a recognition result for the unknown speech from the cumulative distances output by the cumulative distance output unit.

[Claim 2] The speech recognition device according to claim 1, comprising: a mean and variance calculation unit that obtains the mean and variance of the cumulative distances output from the recurrence formula calculation unit; a threshold calculation unit that obtains a threshold from the mean, the variance, and the number of cumulative distances; and a cumulative distance output unit that outputs the cumulative distances smaller than the threshold held in the threshold holding unit.