JPH01241667A

JPH01241667A - Dynamic neural network having a learning mechanism

Info

Publication number: JPH01241667A
Application number: JP63070617A
Authority: JP
Inventors: Kenichi Iso; 健一磯; Hiroaki Sekoe; 迫江　博昭
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1988-03-23
Filing date: 1988-03-23
Publication date: 1989-09-26
Anticipated expiration: 2011-07-24
Also published as: JP2518007B2

Abstract

PURPOSE: To provide a dynamic neural network in which, at learning time (when the inter-unit coupling coefficients are determined), each presented pattern is normalized to a fixed duration and presented to the network, and the coupling coefficients are determined so as to reduce the difference between the teacher signal and the actual output value in the output layer, thereby minimizing the output-layer error. CONSTITUTION: A correction amount calculation section 40 calculates correction amounts for the coupling coefficients using the learning data sent from a time axis alignment section 30 and the coupling coefficients stored in an inter-unit coupling coefficient storage section 50, and sends them to a coupling coefficient correction section 60. The coupling coefficient correction section 60 adds the correction amounts to the coupling coefficients stored in the inter-unit coupling coefficient storage section 50 and writes the results back. The correction amount calculation section 40 repeats this correction until the correction amounts for all coefficients fall below a predetermined threshold, or until the number of corrections exceeds a predetermined count. As a result, during the recognition operation, variations in the utterance duration of unknown speech data are normalized by dynamic programming before the data are input to the neural network.

Description

[Detailed Description of the Invention]

(Field of Industrial Application) The present invention relates to a dynamic neural network having a pattern learning mechanism, for use in recognizing time-series patterns such as speech.

(Prior Art) A neural network is an information processing model devised with reference to the fact that the biological nervous system is an information processing system composed of neurons with relatively simple operating characteristics and a large number of connections between them. The model has processing units corresponding to neurons (hereinafter "units") and inter-unit connections linking them. By changing the coefficients of these inter-unit connections, the system can perform a wide variety of information processing operations.

This neural network model is expected to be effective as an information processing system, particularly for pattern recognition of images, speech, and the like. It is explained in detail in "Using Neural Nets for Pattern Recognition, Signal Processing, and Knowledge Processing," Nikkei Electronics, No. 427, p. 115 (published August 10, 1987) (hereinafter "Document 1"). According to Document 1, a neural network has a hierarchical structure consisting of an input layer, an intermediate layer, and an output layer, as shown in Fig. 2, and each layer consists of a plurality of units. Inter-unit connections are allowed only between adjacent layers; connections between units within the same layer are prohibited. During recognition, the input data are given to the network as the activations of the input-layer units, and the information is propagated through the inter-unit connections to successive intermediate layers until it finally reaches the output layer. The network's response to the input data is thus obtained as the pattern of activations of the output-layer units.
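The layer-by-layer propagation described in Document 1 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the layer sizes, the weights, and the names `forward` and `sigmoid` are hypothetical, and each unit is assumed to compute the sigmoid response $f(x - \theta)$ that the text uses later in Eqs. (5) and (6).

```python
import math


def sigmoid(x):
    """Unit response f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(layers, inputs):
    """Propagate activations from the input layer to the output layer.

    layers: list of (weights, thresholds) pairs, one per adjacent-layer step.
    weights[u][v] couples unit v of the lower layer to unit u of the upper
    layer; there are no connections between units of the same layer.
    """
    activations = list(inputs)
    for weights, thresholds in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) - theta)
            for row, theta in zip(weights, thresholds)
        ]
    return activations


# Hypothetical 2-3-2 network with hand-picked weights.
hidden = ([[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]], [0.0, 0.0, 0.0])
output = ([[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]], [0.0, 0.5])
print(forward([hidden, output], [1.0, 0.0]))
```

Only the activations of the previous layer are consulted at each step, which is exactly the "adjacent layers only" restriction.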

A technique called supervised learning is used to determine the inter-unit connections so that the network performs a specified operation. That is, a pattern to be learned is presented to the input layer, the corresponding teacher signal that should be output is presented to the output layer, and the coupling coefficients are determined so as to reduce the difference between the teacher signal and the actual output values in the output layer. For a neural network configured as described above, this output-error-minimization learning is called back-propagation learning; its detailed algorithm is described in Document 1.

(Problems to be Solved by the Invention) If such a neural network could be used for speech recognition, the diversity of speech patterns might be absorbed through learning, making good recognition performance achievable. However, several problems must be solved before the neural network described above can actually be used for speech recognition.

First, even for patterns of the same category (for example, the same word), the duration of speech differs from utterance to utterance and from speaker to speaker, so some means is needed for presenting speech patterns of different lengths to the input layer of one and the same neural network.

Second, once speech patterns of different lengths can be presented to the input of the neural network, a learning method must be established for determining the inter-unit connections so that the network performs the expected recognition operation.

The present invention aims to provide a dynamic neural network with a supervised learning mechanism for presenting speech patterns of different lengths to a neural network whose input layer accepts a feature-parameter time series of fixed length. During recognition, the time axis of the input layer is aligned with the input speech time series so that the output of the output layer is maximized. During learning, in which the inter-unit coupling coefficients are determined, each presented pattern is normalized to a fixed duration and presented to the network, and the error in the output layer is minimized.

(Means for Solving the Problems) The present invention is a neural network for recognizing time-series patterns such as speech. It has a hierarchical structure consisting of an input layer, an output layer, and a plurality of intermediate layers, and the input layer and the intermediate layers further have a time-series structure corresponding to the time axis. During recognition, dynamic programming is used to associate the time axis of the input time-series pattern with the time axis of the input layer so that the output of the neural network is maximized, and the output of the output layer under that association is taken as the recognition result. The dynamic neural network is characterized by a mechanism that learns the inter-unit coupling coefficients between the layers by supervised learning: training time-series patterns normalized to a constant duration equal to the time length of the input layer are presented to the input layer, the corresponding teacher signals to be output are presented to the output layer, and the coupling coefficients are determined so as to reduce the difference between the teacher signals and the actual output values in the output layer.

(Function) For simplicity, the invention is explained in detail using a three-layer model with a single intermediate layer. The invention applies equally to models with two or more intermediate layers.

The input layer of the model consists of $J \times P$ units so that it can receive a time series of length $J$ of $P$-dimensional feature vectors. Let the output values of these input units be $y^{(1)}_j(p)$ ($j = 1, \ldots, J$; $p = 1, \ldots, P$). In general, the length $J$ of the input layer's time axis differs from the length $I$ of the input time-series pattern $a_i(p)$ ($i = 1, \ldots, I$; $p = 1, \ldots, P$) presented at recognition time, so some expansion or compression must be applied to the time axis of the input series to bring it to length $J$. The correspondence on the $(i, j)$ plane formed by the input-layer time axis $j$ and the input-pattern time axis $i$ is expressed as

$$c(k) = \bigl(i(k),\ j(k)\bigr), \qquad k = 1, \ldots, K \qquad (1)$$

Using this relationship, the output values $y^{(1)}_{j(k)}(p)$ ($k = 1, \ldots, K$; $p = 1, \ldots, P$) of the input units are expressed as

$$y^{(1)}_{j(k)}(p) = a_{i(k)}(p) \qquad (3)$$

That is, the input units align the time axis and pass the input data unchanged to the next layer.
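A minimal sketch of Eq. (3): given an alignment path $c(1), \ldots, c(K)$, the input unit at time $j(k)$ simply receives frame $a_{i(k)}$. The path constraints checked here (start at $(1, 1)$, end at $(I, J)$, never move backward) are assumptions, since the text above does not spell them out; the function name `align_to_input_layer` is hypothetical.

```python
def align_to_input_layer(a, path, J):
    """Map input frames onto the input-layer time axis along c(k) = (i(k), j(k)).

    a    : list of I feature vectors a_i (0-indexed here; 1-indexed in the text)
    path : [(i(1), j(1)), ..., (i(K), j(K))], 1-indexed as in Eq. (1)
    J    : length of the input layer's time axis
    Returns one (j(k), y1) pair per path point, with y1 = a_{i(k)} per Eq. (3).
    """
    I = len(a)
    # Assumed constraints: path runs from (1, 1) to (I, J) and never moves backward.
    if path[0] != (1, 1) or path[-1] != (I, J):
        raise ValueError("path must run from (1, 1) to (I, J)")
    for (i0, j0), (i1, j1) in zip(path, path[1:]):
        if i1 < i0 or j1 < j0:
            raise ValueError("path must be monotone in both i and j")
    # The input units pass the aligned frames on unchanged.
    return [(j, list(a[i - 1])) for i, j in path]


a = [[0.1], [0.2], [0.3], [0.4], [0.5]]          # I = 5 frames, P = 1
path = [(1, 1), (2, 1), (3, 2), (4, 2), (5, 3)]  # compresses I = 5 onto J = 3
print(align_to_input_layer(a, path, J=3))
```

Several path points may share the same $j$ (compression), which is why the result is kept per point $k$ rather than per input unit.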

The intermediate layer consists of $J \times M$ units (called hidden units). The input value $x^{(2)}_j(m)$ ($j = 1, \ldots, J$; $m = 1, \ldots, M$) to each hidden unit is given by Eq. (4) in terms of the input-unit outputs $y^{(1)}_j(p)$ and the coupling coefficients $\beta^0_j(m, p)$ and $\beta^1_j(m, p)$ between the input units and the hidden units.

A neural network whose inter-unit connections are restricted in this way, so that the $j(k)$-th hidden unit receives information only from the $j(k)$-th and $j(k-1)$-th input units, is said here to have a time-series structure. When the data themselves have a time-series structure, as speech patterns do, such a network can be built with far fewer inter-unit connections than full connection (linking every input unit to every hidden unit), which greatly reduces the amount of computation during recognition and learning. The response of a hidden unit to the input given by Eq. (4) is as follows.

$$y^{(2)}_j(m) = f\bigl(x^{(2)}_j(m) - \theta^{(2)}_j(m)\bigr) \qquad (5)$$

$$f(x) = \frac{1}{1 + e^{-x}} \qquad (6)$$

Here $\theta^{(2)}_j(m)$ is the threshold of hidden unit $(j, m)$. As Eq. (6) shows, the hidden unit acts as a kind of (smooth) threshold logic element.
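The hidden layer with its time-series structure can be sketched as below. Eq. (4) itself is not written out above, so the code assumes the natural reading of the surrounding definitions, $x^{(2)}_{j(k)}(m) = \sum_p \bigl[\beta^0_{j(k)}(m, p)\, y^{(1)}_{j(k)}(p) + \beta^1_{j(k)}(m, p)\, y^{(1)}_{j(k-1)}(p)\bigr]$, with a zero frame before $k = 1$. The hypothetical helper `connection_counts` compares the number of input-to-hidden couplings under this structure with full connection, which is the computational saving the paragraph describes.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))  # Eq. (6)


def hidden_layer(aligned, beta0, beta1, theta):
    """Hidden-unit outputs along the path, one row of M values per point k.

    aligned : [(j(k), frame), ...] from the input layer (Eq. (3)), j 1-indexed
    beta0   : beta0[j-1][m][p], coupling from input unit j(k)
    beta1   : beta1[j-1][m][p], coupling from input unit j(k-1)
    theta   : theta[j-1][m], hidden-unit thresholds
    """
    P = len(aligned[0][1])
    rows = []
    for k, (j, frame) in enumerate(aligned):
        prev = aligned[k - 1][1] if k > 0 else [0.0] * P  # assumed zero before k = 1
        row = []
        for m, th in enumerate(theta[j - 1]):
            x = (sum(b * v for b, v in zip(beta0[j - 1][m], frame))
                 + sum(b * v for b, v in zip(beta1[j - 1][m], prev)))  # assumed Eq. (4)
            row.append(sigmoid(x - th))  # Eq. (5)
        rows.append(row)
    return rows


def connection_counts(J, M, P):
    """Input-to-hidden couplings: time-series structure vs. full connection."""
    time_series = (J * M) * (2 * P)  # each hidden unit sees two input frames
    full = (J * M) * (J * P)         # each hidden unit sees every input unit
    return time_series, full


print(connection_counts(J=20, M=8, P=16))
```

For these example sizes the restricted structure needs one tenth of the couplings; the ratio is $2/J$ in general.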

The output layer consists of $N$ units corresponding to the $N$ categories to be recognized. The input value $x^{(3)}(n)$ ($n = 1, \ldots, N$) to the $n$-th output unit is given by Eq. (7) in terms of the hidden-unit outputs $y^{(2)}_j(m)$ and the coupling coefficients $\alpha^n(j, m)$ between the hidden units and the output units.

The input-output response of the output units has the same form as that of the hidden units:

$$y^{(3)}(n) = f\bigl(x^{(3)}(n) - \theta^{(3)}(n)\bigr) \qquad (8)$$

where $\theta^{(3)}(n)$ is the threshold of output unit $n$.
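The output layer can be sketched the same way. Eq. (7) is likewise not written out above; the code assumes $x^{(3)}(n) = \sum_k \sum_m \alpha^n(j(k), m)\, y^{(2)}_{j(k)}(m)$, i.e. each output unit sums over every hidden unit visited by the alignment path. The name `output_layer` and the argument layout are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def output_layer(js, hidden, alpha, theta3):
    """Output-unit responses y3(n) for one alignment path.

    js     : [j(1), ..., j(K)], input-layer times visited by the path (1-indexed)
    hidden : hidden[k][m], hidden-unit outputs along the path
    alpha  : alpha[n][j-1][m], hidden-to-output couplings alpha^n(j, m)
    theta3 : theta3[n], output-unit thresholds
    """
    outputs = []
    for alpha_n, th in zip(alpha, theta3):
        # Assumed form of Eq. (7): sum over path points k and hidden units m.
        x = sum(alpha_n[j - 1][m] * y
                for j, row in zip(js, hidden)
                for m, y in enumerate(row))
        outputs.append(sigmoid(x - th))  # Eq. (8)
    return outputs


print(output_layer([1, 2], [[0.5], [0.5]], [[[1.0], [1.0]], [[-1.0], [-1.0]]], [0.0, 0.0]))
```

Under this assumption a different path visits different hidden units, so the output changes with $\{c(k)\}$.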

The network output $y^{(3)}(n)$ obtained in this way depends on the correspondence $\{c(k)\}$, given by Eq. (1), between the time axis of the input time series and the time axis of the input layer. The network's final recognition result for category $n$ is obtained as the output value $o_n$ optimized (maximized) with respect to $\{c(k)\}$, as expressed by Eq. (9).

Since the function $f$ in Eq. (8) is monotonic, the same correspondence $\{c(k)\}$ maximizes both $f(\cdot)$ and its argument, so Eq. (9) may equivalently be replaced by Eq. (10). Here the sum over the feature-vector components $p$ inside $f(\cdot)$ has been omitted. Defining the expression in parentheses in Eq. (10) as $\gamma(c(k), c(k-1))$, Eq. (10) becomes Eq. (12), and this optimization can be solved with the well-known dynamic programming method. That is, letting $g(k)$ be the cumulative sum of $\gamma(c(k), c(k-1))$, $o_n = g(K)$ is obtained by computing the recurrence of Eq. (13).
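The claim that this maximization can be solved by dynamic programming can be checked numerically. The sketch below compares the recurrence $g(k) = \gamma(c(k), c(k-1)) + \max g(k-1)$ against brute-force enumeration of every admissible path. The score function `gamma` is an arbitrary hypothetical stand-in, and the admissible steps are assumed to be those of the rule shown in Fig. 3 (each step advances $i$ by one and $j$ by 0, 1, or 2).

```python
import math

STEPS = ((1, 0), (1, 1), (1, 2))  # assumed local rule: i advances by one per step


def gamma(c, prev):
    """Hypothetical stand-in for gamma(c(k), c(k-1)); prev is None for k = 1."""
    pj = 0 if prev is None else prev[1]
    return math.sin(3.1 * c[0] + 1.7 * c[1] + 0.9 * pj)


def dp_max(I, J):
    """g(c) = gamma(c, c') + max over predecessors c' of g(c'); returns g at (I, J)."""
    g = {(1, 1): gamma((1, 1), None)}
    for i in range(2, I + 1):
        for j in range(1, J + 1):
            cands = [g[p] + gamma((i, j), p)
                     for p in ((i - di, j - dj) for di, dj in STEPS) if p in g]
            if cands:
                g[(i, j)] = max(cands)
    return g.get((I, J), -math.inf)


def brute_max(I, J):
    """Enumerate every admissible path from (1, 1) to (I, J) explicitly."""
    best = -math.inf

    def walk(c, total):
        nonlocal best
        if c == (I, J):
            best = max(best, total)
            return
        for di, dj in STEPS:
            nxt = (c[0] + di, c[1] + dj)
            if nxt[0] <= I and nxt[1] <= J:
                walk(nxt, total + gamma(nxt, c))

    walk((1, 1), gamma((1, 1), None))
    return best


for I, J in [(3, 3), (5, 4), (6, 8)]:
    print(I, J, dp_max(I, J), brute_max(I, J))
```

The DP visits each lattice point once, whereas the number of paths grows exponentially with $I$; both return $-\infty$ when $(I, J)$ is unreachable.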

Next, the learning method for determining the model parameters, namely the inter-unit coupling coefficients $\beta^0_j(m, p)$, $\beta^1_j(m, p)$, $\alpha^n(j, m)$ and the thresholds $\theta^{(2)}_j(m)$, $\theta^{(3)}(n)$, is described.

Let the set of feature-vector time series used for learning category $n$ be $A^{(n)}_q = \bigl(a^{(n)}_{q,i}(p)\bigr)$. Here $q$ is an index distinguishing the time-series patterns within the same category, $i$ indexes the time axis of the series, and $p$ indexes the components of the feature vector at each time. The indices range over

$$n = 1, \ldots, N;\quad q = 1, \ldots, Q_n;\quad i = 1, \ldots, I_q;\quad p = 1, \ldots, P \qquad (14)$$

To present the data $A^{(n)}_q$ to the network, the length $I_q$ of each time series must be normalized to the length $J$ of the time axis of the network's input layer. During learning, however, the model parameters have not yet been optimized, so it is difficult to use dynamic programming as is done during recognition.

Therefore, for learning, a representative time-series pattern $A^{(n)}_{q_0}$ is selected from the data set $A^{(n)}_q$ ($q = 1, \ldots, Q_n$) of category $n$, and the time axes of the other data $A^{(n)}_q$ ($q \neq q_0$) are mapped onto the time axis of the representative pattern by DP matching, as follows. Let the time axis of the representative pattern $A^{(n)}_{q_0}$ be $j$ ($j = 1, \ldots, J$), and let the time axis of the data $A^{(n)}_q$ ($q \neq q_0$) to be aligned (normalized) be $i$ ($i = 1, \ldots, I_q$). DP matching the two patterns then yields a correspondence (warping function) $i = i(j)$ between their time axes.

DP matching and warping functions are explained in detail in Nikkei Electronics, No. 329, p. 171 (published November 7, 1983) (hereinafter "Document 2"). The warping function $i(j)$ shows that time $j$ of the representative pattern should be associated with the frame vector at time $i = i(j)$ of the learning data. Depending on how the local paths used in DP matching are constrained, the warping function may instead take the form $j = j(i)$, so that several frame vectors correspond to a single $j$; even in that case, the same kind of time-axis alignment can be achieved by averaging the corresponding frame vectors.
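The alignment of a training pattern to its representative can be sketched as follows. The local path constraint (steps $(1,0)$, $(0,1)$, $(1,1)$) and the Euclidean frame distance are assumptions chosen for illustration; Document 2 discusses other choices. Because this constraint lets several frames $i$ map to one $j$, the sketch uses the averaging just described. Indices are 0-based, and `dp_match` and `normalize_to_reference` are hypothetical names.

```python
import math


def dp_match(x, ref):
    """DP matching of x (I frames) against ref (J frames).

    Assumed local paths (1,0), (0,1), (1,1) and Euclidean frame distance.
    Returns the cumulative distance and the optimal path as 0-indexed (i, j) pairs.
    """
    I, J = len(x), len(ref)
    INF = math.inf
    D = [[INF] * J for _ in range(I)]
    for i in range(I):
        for j in range(J):
            d = math.dist(x[i], ref[j])
            if i == 0 and j == 0:
                D[i][j] = d
                continue
            D[i][j] = d + min(
                D[i - 1][j] if i > 0 else INF,
                D[i][j - 1] if j > 0 else INF,
                D[i - 1][j - 1] if i > 0 and j > 0 else INF,
            )
    # Trace back from (I-1, J-1) along the predecessor with the smallest D.
    i, j = I - 1, J - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        preds = [(a, b) for a, b in ((i - 1, j - 1), (i - 1, j), (i, j - 1))
                 if a >= 0 and b >= 0]
        i, j = min(preds, key=lambda ab: D[ab[0]][ab[1]])
        path.append((i, j))
    path.reverse()
    return D[I - 1][J - 1], path


def normalize_to_reference(x, ref):
    """Warp x onto the time axis of ref, averaging frames that share the same j."""
    _, path = dp_match(x, ref)
    J, P = len(ref), len(ref[0])
    sums = [[0.0] * P for _ in range(J)]
    counts = [0] * J
    for i, j in path:
        counts[j] += 1
        for p in range(P):
            sums[j][p] += x[i][p]
    return [[s / counts[j] for s in sums[j]] for j in range(J)]


ref = [[0.0], [1.0], [2.0]]                  # representative pattern, J = 3
x = [[0.0], [0.0], [1.0], [1.0], [2.0]]      # training pattern, I = 5
print(normalize_to_reference(x, ref))
```

With these local paths every $j$ is visited at least once, so no averaging denominator is ever zero.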

As a result, the time length $I_q$, which varies from one item of data to another, is normalized to the constant length $I_{q_0}$. The time length $J$ of the network's input layer is set equal to this $I_{q_0}$.

Various methods can be considered for choosing the representative pattern of category $n$. For example, taking the cumulative distance $d(A^{(n)}_q, A^{(n)}_{q'})$ obtained by DP matching between patterns in the pattern set of category $n$ as the inter-pattern distance, $q_0$ can be chosen as the $q$ that minimizes the quantity $\Delta_q$ given by Eq. (15). This $q_0$ is easily found by exhaustive search, computing $\Delta$ with each of $q = 1, \ldots, Q_n$ taken in turn as $q_0$. Alternatively, any single pattern may be used as the representative.
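Exhaustive selection of the representative can be sketched as below. Eq. (15) is not written out here, so the code assumes $\Delta_q = \sum_{q'} d(A^{(n)}_q, A^{(n)}_{q'})$, which makes $q_0$ the medoid of the category under the DP-matching distance. The local paths $(1,0)$, $(0,1)$, $(1,1)$, the Euclidean frame distance, and the names `dp_distance` and `choose_representative` are illustrative assumptions.

```python
import math


def dp_distance(x, y):
    """Cumulative DP-matching distance; local paths (1,0), (0,1), (1,1) assumed."""
    I, J = len(x), len(y)
    D = [[math.inf] * (J + 1) for _ in range(I + 1)]
    D[0][0] = 0.0
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            D[i][j] = math.dist(x[i - 1], y[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[I][J]


def choose_representative(patterns):
    """Exhaustive search for q0 = argmin_q Delta_q.

    Delta_q is assumed to be the sum of DP distances from pattern q to every
    pattern in the category (hypothetical form of Eq. (15)).
    """
    deltas = [sum(dp_distance(a, b) for b in patterns) for a in patterns]
    return min(range(len(patterns)), key=lambda q: deltas[q]), deltas


patterns = [
    [[0.0], [1.0]],
    [[0.0], [0.0], [1.0]],
    [[5.0], [5.0]],
]
print(choose_representative(patterns))
```

The search costs $Q_n^2$ DP matchings per category, which is affordable because it runs once, before training.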

Let the input learning data whose time axis has thus been normalized to length $J$ be $A^{(n)}_q = \bigl(a^{(n)}_{q,j}(p)\bigr)$ ($j = 1, \ldots, J$). Likewise, let the learning data of the other categories, normalized to the same length $J$, be $B^{(n)}_r = \bigl(b^{(n)}_{r,j}(p)\bigr)$ ($r = 1, \ldots, R$); these data $B$ are hereafter called anti-learning data. Let $y^{(3)}_q(n)$ be the network output for the $q$-th learning data and $z_q(n)$ ($= 1.0$) its desired value, and let $y^{(3)}_r(n)$ be the output of the $n$-th unit for the $r$-th anti-learning data and $z_r(n)$ ($= 0.0$) its desired value. The error $E$ of the output values in the output layer is then given by Eq. (16). Since $E$ can be regarded as a function of the inter-unit coupling coefficients ($\beta^0_j(m, p)$, $\beta^1_j(m, p)$, $\alpha^n(j, m)$) and the thresholds ($\theta^{(2)}_j(m)$, $\theta^{(3)}(n)$) that must be determined by learning, these parameters should be determined so as to minimize $E$ as the evaluation function. A unit's threshold can be learned in the same way as an inter-unit coupling by imagining a virtual unit that always outputs 1 and treating the threshold as the coupling coefficient to that unit. Accordingly, let $\omega^n_{ij}$ denote the coupling coefficient between unit $i$ of the $n$-th layer and unit $j$ of the $(n+1)$-th layer of two adjacent layers. If $\omega^n_{ij}$ is updated using the derivative of $E$ with respect to it as

$$\omega^n_{ij}(t+1) = \omega^n_{ij}(t) - \varepsilon \left(\frac{\partial E}{\partial \omega^n_{ij}}\right)_t \qquad (17)$$

then, provided $\varepsilon$ is sufficiently small,

$$E(t+1) \le E(t) \qquad (18)$$

Here $t$ is an integer denoting the step of iterative learning, and $\varepsilon$ is a constant that determines the size of each correction. Learning the parameters thus amounts to repeatedly correcting $\omega^n_{ij}$ so as to reduce $E$. The coefficients $\omega^n_{ij}$ can be associated with the model's coupling coefficients ($\beta^0_j(m, p)$, $\beta^1_j(m, p)$, $\alpha^n(j, m)$) and thresholds ($\theta^{(2)}_j(m)$, $\theta^{(3)}(n)$), for example, as in Eq. (19).

Analytical calculation shows that the derivative of $E$ is given by Eq. (20).

Here $\delta^{(n+1)}_{q,j}$ is the error, converted to the input value of unit $j$ in layer $n+1$, when the $q$-th learning (or anti-learning) data is presented to the input layer, and $y^{(n)}_{q,i}$ is the output value of unit $i$ in layer $n$ for the $q$-th learning data.

$\delta^{(n)}_{q,i}$ can be calculated with the recurrence of Eq. (21).

Here $f(x)$ is the unit input-output response function given by Eq. (6), $x^{(n)}_{q,i}$ is the input value to unit $i$ of layer $n$, and $z_{q,i}$ is the value that unit $i$ of layer $N$ (the output layer) should take: 1.0 for learning data and 0.0 for anti-learning data. Because Eq. (21) computes the error $\delta$ converted to each unit in the direction from the output layer toward the input layer, this learning method is called the back-propagation learning method (see Document 1 for details).
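The back-propagation recurrence can be sketched for a small fully connected three-layer network, with the analytic gradient checked against finite differences. Full connection is used only to keep the sketch short; the time-series structure of the invention would restrict which couplings exist, but the backward flow of $\delta$ is the same. The squared error is an assumed form of Eq. (16), thresholds are folded into the weights as constant-1 couplings, and the weights and samples are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(W, x):
    """Outputs of every layer. W[n][j][i] couples unit i of layer n to unit j of
    layer n+1; the last entry of each row is the threshold term, treated as the
    coupling from a constant-1 unit."""
    ys = [list(x)]
    for Wn in W:
        prev = ys[-1] + [1.0]
        ys.append([sigmoid(sum(w * v for w, v in zip(row, prev))) for row in Wn])
    return ys


def error(W, data):
    """E = 1/2 * sum over samples and output units of (y - z)^2."""
    return sum(0.5 * (y - z) ** 2
               for x, target in data
               for y, z in zip(forward(W, x)[-1], target))


def gradients(W, data):
    """dE/dW by back-propagation: deltas flow from the output layer downward."""
    G = [[[0.0] * len(row) for row in Wn] for Wn in W]
    for x, target in data:
        ys = forward(W, x)
        # Output layer: delta_j = (y_j - z_j) f'(x_j), with f' = y (1 - y).
        delta = [(y - z) * y * (1.0 - y) for y, z in zip(ys[-1], target)]
        for n in range(len(W) - 1, -1, -1):
            prev = ys[n] + [1.0]
            for j, d in enumerate(delta):
                for i, v in enumerate(prev):
                    G[n][j][i] += d * v
            if n > 0:
                # Layer n: delta_i = f'(x_i) * sum_j W[n][j][i] * delta_j.
                delta = [ys[n][i] * (1.0 - ys[n][i])
                         * sum(W[n][j][i] * delta[j] for j in range(len(delta)))
                         for i in range(len(ys[n]))]
    return G


# Hypothetical 2-3-2 network and two samples (targets 1.0 / 0.0 per output unit).
W = [
    [[0.1, -0.2, 0.05], [0.3, 0.1, -0.1], [-0.2, 0.4, 0.0]],
    [[0.2, -0.1, 0.3, 0.0], [-0.3, 0.2, 0.1, 0.05]],
]
data = [([1.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [0.0, 1.0])]
G = gradients(W, data)
print(G[1][0])
```

A central-difference check such as the one used to test this sketch is a cheap way to catch indexing mistakes in the $\delta$ recurrence.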

In the end, starting from a model whose inter-unit coupling coefficients are given arbitrary initial values, presenting a number of learning and anti-learning data, and performing the above iterative corrective learning on each inter-unit coupling yields a set of inter-unit couplings that locally minimizes the error in the output layer.

(Embodiment) The invention is described in detail below for the case in which the rule shown in Fig. 3 is used as the time-axis correspondence rule on the $(i, j)$ plane (the relative position of $c(k)$ and $c(k-1)$) for the recurrence calculation of Eq. (13). With the rule of Fig. 3, if $c(k) = (i, j)$, only the three points $(i-1, j)$, $(i-1, j-1)$, and $(i-1, j-2)$ are possible for $c(k-1)$. With this correspondence rule, Eqs. (12) and (13), which determine the output of the neural network, can be written as follows.

$$g^n(i, j) = \gamma^n(i, j) + \max\bigl[g^n(i-1, j),\ g^n(i-1, j-1),\ g^n(i-1, j-2)\bigr] \qquad (24)$$

Fig. 1 is a block diagram showing an embodiment of the present invention based on Eqs. (22) to (24). The analysis section 10 analyzes the input speech waveform data, converts it into a time series of feature vectors, and writes it to the pattern buffer section 20. During the learning operation the pattern buffer section 20 stores the time-series data for learning; during the recognition operation it stores the analysis data of an unknown utterance. A subsequent changeover switch selects between the learning operation and the recognition operation.

The time axis alignment section 30 determines the representative pattern of each category in the learning data on the basis of Eq. (15), aligns the time axes by DP matching the other learning data to the representative pattern, and normalizes the time length of all the learning data to $J$. The correction amount calculation section 40 uses the learning data sent from the time axis alignment section 30 and the coupling coefficients stored in the inter-unit coupling coefficient storage section 50 to calculate the correction amounts $\Delta\omega^n_{ij}$ for the coupling coefficients $\omega^n_{ij}$ on the basis of Eqs. (17), (20), and (21), and sends them to the coupling coefficient correction section 60. The coupling coefficient correction section 60 adds the correction amounts $\Delta\omega^n_{ij}$ to the coupling coefficients stored in the inter-unit coupling coefficient storage section 50 and writes the results back. The correction amount calculation section 40 repeats this correction until the correction amounts $\Delta\omega^n_{ij}$ for all coupling coefficients fall below a predetermined threshold, or until the number of corrections exceeds a predetermined count.
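The control flow of sections 40, 50, and 60, including both stopping conditions, can be sketched as a loop. The function and argument names are hypothetical, and a simple quadratic stands in for the network error so that convergence is easy to verify; in the described apparatus the gradient would come from back-propagation.

```python
def train_loop(grad_fn, omega, epsilon=0.1, threshold=1e-6, max_corrections=10_000):
    """Correction loop of sections 40-60, sketched with hypothetical names.

    omega   : initial coupling coefficients (held in storage section 50)
    grad_fn : grad_fn(omega) -> dE/d omega for the current coefficients
    Stops once every correction is below `threshold` or after `max_corrections`.
    Returns the final coefficients and the number of corrections made.
    """
    stored = list(omega)                                  # storage section 50
    for count in range(1, max_corrections + 1):
        delta = [-epsilon * g for g in grad_fn(stored)]   # section 40: Eq. (17) step
        stored = [w + d for w, d in zip(stored, delta)]   # section 60: add, write back
        if all(abs(d) < threshold for d in delta):
            return stored, count
    return stored, max_corrections


TARGET = [1.0, -2.0]


def error_gradient(w):
    """Gradient of the stand-in error E = 1/2 * sum (w - c)^2."""
    return [wi - ci for wi, ci in zip(w, TARGET)]


weights, corrections = train_loop(error_gradient, [0.0, 0.0])
print(weights, corrections)
```

The iteration cap guarantees termination even when the corrections shrink too slowly to pass the threshold test.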

The lattice point calculation section 70 uses the unknown-utterance data sent from the pattern buffer section 20 and the coupling coefficients stored in the inter-unit coupling coefficient storage section 50 to calculate the lattice point data $\gamma^n(i, j)$ ($i = 1, \ldots, I$; $j = 1, \ldots, J$; $n = 1, \ldots, N$) on the basis of Eq. (23). The calculated lattice point data are stored in the lattice point storage section 80. The recurrence calculation section 90 performs the recurrence calculation of Eq. (24) using the lattice point data stored in the lattice point storage section 80 and stores the cumulative values $g^n(I, J)$ in the working storage section 100. The working storage section 100 is also used to hold $g^n(i, j)$ during the recurrence calculation. The recognition determination section 110 outputs as the recognition result the value of $n$ that gives the largest of the cumulative values $g^n(I, J)$ stored in the working storage section 100.
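The recognition path through sections 70 to 110 can be sketched end to end. Eq. (23) is not written out above, so `lattice_points` assumes the form implied by the hidden and output layers described earlier, $\gamma^n(i, j) = \sum_m \alpha^n(j, m)\, f\bigl(\sum_p [\beta^0_j(m, p)\, a_i(p) + \beta^1_j(m, p)\, a_{i-1}(p)] - \theta^{(2)}_j(m)\bigr)$, with a zero frame before the first one. The path is assumed to start at the first lattice point and end at $(I, J)$; indices are 0-based, and the toy parameters and function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lattice_points(a, beta0, beta1, theta, alpha):
    """Section 70: gamma[n][i][j] for every category n (assumed form of Eq. (23)).

    a[i][p], beta0[j][m][p], beta1[j][m][p], theta[j][m], alpha[n][j][m].
    """
    I, P = len(a), len(a[0])
    J = len(beta0)
    zero = [0.0] * P
    gamma = []
    for alpha_n in alpha:
        table = [[0.0] * J for _ in range(I)]
        for i in range(I):
            prev = a[i - 1] if i > 0 else zero  # assumed zero frame before i = 0
            for j in range(J):
                table[i][j] = sum(
                    alpha_n[j][m] * sigmoid(
                        sum(b * v for b, v in zip(beta0[j][m], a[i]))
                        + sum(b * v for b, v in zip(beta1[j][m], prev))
                        - theta[j][m])
                    for m in range(len(theta[j])))
        gamma.append(table)
    return gamma


def accumulate(gamma):
    """Section 90: Eq. (24) with the Fig. 3 rule; returns g(I-1, J-1)."""
    I, J = len(gamma), len(gamma[0])
    g = [[-math.inf] * J for _ in range(I)]
    g[0][0] = gamma[0][0]
    for i in range(1, I):
        for j in range(J):
            g[i][j] = gamma[i][j] + max(
                g[i - 1][j - dj] for dj in (0, 1, 2) if j - dj >= 0)
    return g[I - 1][J - 1]


def recognize(a, beta0, beta1, theta, alpha):
    """Sections 70-110: the category n with the largest g^n(I, J) is the result."""
    scores = [accumulate(t) for t in lattice_points(a, beta0, beta1, theta, alpha)]
    return max(range(len(scores)), key=lambda n: scores[n]), scores


# Hypothetical toy model: J = 2, M = 1, P = 1, N = 2 categories.
beta0 = [[[1.0]], [[1.0]]]
beta1 = [[[0.0]], [[0.0]]]
theta = [[0.0], [0.0]]
alpha = [[[0.1], [0.1]], [[1.0], [1.0]]]
print(recognize([[0.5], [1.0], [0.2]], beta0, beta1, theta, alpha))
```

Because each $\gamma^n(i, j)$ depends only on frames $i$ and $i-1$, the lattice table can be filled once and the recurrence then runs in $O(NIJ)$ time.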

(Effects of the Invention) As described above, the present invention provides a neural network with time-axis normalization capability: during the recognition operation, variations in the utterance duration of unknown speech data are normalized by dynamic programming before the data are input to the neural network. Because the neural network of the invention has this time-axis normalization capability during recognition, the learning operation only needs to learn the utterance-to-utterance variation of the feature parameters of the speech data from a small amount of learning data (which need not contain diversity due to variation in utterance duration), and a good recognition device can thus be provided.

[Brief Description of the Drawings]

Fig. 1 is a block diagram showing an embodiment of the present invention, Fig. 2 is a diagram showing the hierarchical structure of a neural network, and Fig. 3 is a diagram showing an example of the time-axis correspondence rule on the $(i, j)$ plane used for the recurrence calculation. In the figures, 10 is an analysis section, 20 is a pattern buffer section, 30 is a time axis alignment section, 40 is a correction amount calculation section, 50 is an inter-unit coupling coefficient storage section, 60 is a coupling coefficient correction section, 70 is a lattice point calculation section, 80 is a lattice point storage section, 90 is a recurrence calculation section, 100 is a working storage section, and 110 is a recognition determination section.

Claims

[Claims]

(1) A dynamic neural network for recognizing time-series patterns such as speech, the network having a hierarchical structure consisting of an input layer, an output layer, and a plurality of intermediate layers, the input layer and the intermediate layers further having a time-series structure corresponding to the time axis, wherein during recognition the time axis of an input time-series pattern is associated with the time axis of the input layer by dynamic programming so that the output of the neural network is maximized, and the output of the output layer under that association is taken as the recognition result; the dynamic neural network comprising a mechanism that, when learning the inter-unit coupling coefficients between the layers, performs supervised learning by presenting to the input layer training time-series patterns normalized to a constant duration equal to the time length of the input layer, presenting to the output layer the corresponding teacher signals to be output, and determining the coupling coefficients so as to reduce the difference between the teacher signals and the actual output values in the output layer.

(2) The dynamic neural network according to claim (1), wherein the variation in the duration of the training time-series patterns is normalized by DP matching to a representative pattern.

(3) The dynamic neural network according to claim (1), wherein the supervised learning of the inter-unit coupling coefficients is realized by the back-propagation learning method.