JP3816762B2

JP3816762B2 - Neural network, neural network system, and neural network processing program

Info

Publication number: JP3816762B2
Application number: JP2001169464A
Authority: JP
Inventors: 敏夫辻; 修福田
Original assignee: Japan Science and Technology Agency; National Institute of Japan Science and Technology Agency
Current assignee: Japan Science and Technology Agency; National Institute of Japan Science and Technology Agency
Priority date: 2001-06-05
Filing date: 2001-06-05
Publication date: 2006-08-30
Anticipated expiration: 2021-06-05
Also published as: JP2002366927A

Description

【０００１】
【発明の属する技術分野】
本発明は、ニューラルネットワーク、ニューラルネットワークシステム及びニューラルネットワーク処理プログラムに係り、特に、脳波や筋電位など生体信号のパターン認識システムに利用することができるニューラルネットワーク、ニューラルネットワークシステム及びニューラルネットワーク処理プログラムに関する。
【０００２】
【従来の技術】
Rumelhartら（文献［1］参照）（［］内は後述の参考文献の番号を表す。）によって提案された誤差逆伝搬型ニューラルネットは任意の非線形写像を獲得できる強力な学習能力を備えており、さまざまなパターン識別問題に応用されている。しかしながら、対象とする問題によってはまだ多くの課題が残されている。例えば、学習対象の写像が複雑になるにつれて、膨大な教師信号数や学習時間、大きなネットワーク構造が必要になり、ローカルミニマムなどの課題も懸念される。
【０００３】
このような課題を解決するための試みの１つに、学習対象に関して既知である特性をネットワークの構造として取り込むという方法がある（文献［2][3］参照）。この方法では結合荷重とネットワーク則に制約を科すことで、ニューラルネットを与えられた問題に適したように調節することが可能である。パターン識別問題に関しても、多くの研究者がニューラルネットと確率モデルの融合（文献［4］参照）を試みており、特に混合ガウス分布モデルを利用したニューラルネットが数多く提案されている（文献［5]-[11］参照）。
【０００４】
混合ガウス分布モデルとは、複数のコンポーネントと呼ばれる確率分布の線形和によって母集団の分布を近似する方法で、各コンポーネントにはガウス分布を用いる（文献［12］参照）。この混合ガウス分布モデルに基づいたニューラルネットの開発はTraven（文献［5］参照），Perlovsky and Mc-Manus（文献［6］参照）,Tsuji et al. （文献［7][8］参照）,Jordan and Jacobs（文献［9］参照）,Lee and Shimoji（文献［10］参照）,Streit and Luginbuhl（文献［11］参照）によって行われている。特に本発明者（辻）らは混合ガウス分布モデルと対数線形モデルに基づいたLog-Linerized Gaus-sian Mixture Network（LLGMN）を提案し（文献［8］参照）、混合ガウス分布モデルに基づいて学習的に事後確率を推定する方法を示した。またこのLLGMNを用いて、複数の電極対によって計測した節電位（ＥＭＧ）信号から、前腕と手の６動作を識別することに成功している（文献［13］参照）。
【０００５】
1． LLGMN
図１８に、LLGMNの構成図を示す（文献［8］参照）。まず最初に、入力ベクトルｘ＝［ｘ_１，ｘ_２，・・・，ｘ_ｄ］^Ｔ ∈Ｒ^ｄに非線形演算による前処理を施し、Ｘ ∈Ｒ^Ｈに変換する。
【０００６】
【数７】

【０００７】
第１層はベクトルＸの次元数Ｈ＝１＋ｄ（ｄ＋３）／２にあわせてＨ個のユニットからなり、入出力関数には恒等関数を用いる。第１層の入出力関係は、入力を^（１）Ｉ_ｊ，出力を^（１）Ｏ_ｊとすると、
^（１）Ｉ_ｊ＝Ｘ_ｊ（２）
^（１）Ｏ_ｊ＝^（１）Ｉ_ｊ（３）
となる。
【０００８】
第２層は混合ガウス分布の総コンポーネント数と同数のユニットから構成され、第１層の出力を重み係数ｗ_ｈ ^{（ｃ、ｍ）}を介して受け取り、事後確率を出力する。第２層のユニット｛ｃ，ｍ｝への入力を^（２）Ｉ_ｃ，ｍ出力を^（２）Ｏ_ｃ，ｍとすると、
【０００９】
【数８】

【００１０】
となる。ただし、（ｃ＝１，・・・，Ｃ；ｍ＝１，・・・，Ｍ_ｃ）で、Ｃは対象とするクラス数，Ｍ_ｃはクラスｃを構成する混合ガウス分布のコンポーネント数を表す。また、ｗ_ｈ ^{（ｃ、Ｍｃ）}＝０
である。
第３層は、事象数に対応するＣ個のユニットからなり、事象ｃの事後確率を出力する。ユニットｃは、第２層のＭ_ｃ個のユニット｛ｃ，ｍ｝の出力を統合したものである。入出力間の関係は、
【００１１】
【数９】

【００１２】
となる。以上のように、このネットワークは第１層と第２層の間の重み係数ｗ_ｈ ^{（ｃ、ｍ）}を学習的に調節するだけで、各事象の事後確率を計算することができる。
しかしながら、これらのニューラルネットは対象とする問題の静的な特性しか学習することができず、対象が時間とともに変動するような動的なデータには適用することはできない。
【００１３】
２．隠れマルコフモデル
時系列信号の識別には隠れマルコフモデル（Hidden Markov Model：ＨＭＭ）（文献［14］参照）が一般的によく用いられており、特に音声認識の分野で成功を収めている（文献［17］参照）。ＨＭＭでは、状態数Ｎ，出力記号の数Ｍに対応して。状態遷移確率行列Ａ ∈Ｒ^Ｎ×Ｎ出力確率行列Ｂ ∈Ｒ^Ｍ×Ｎ，初期状態確率行列π ∈Ｒ^Ｎ×Ｎの３つの確率行列が必要となる。ＨＭＭをパターン識別に応用するには、それぞれの事象ｃに対する確率行列Ａｃ，Ｂｃ，πｃを、事象ｃに属する特定の出力記号列Ｏ_１Ｏ_２・・・Ｏ_ｔの出力確率の尤度が最大になるようなパラメータを推定する。ここで、観測された出力記号列Ｏ_１Ｏ_２・・・Ｏ_ｔにおいて、時刻ｔに事象ｃの状態Ｓｉにいる確率α_ｔ（ｉ）は、
α_ｔ（ｉ）＝Ｐ（Ｏ_１Ｏ_２・・・Ｏ_ｔ，Ｓ_ｉ｜ｃ）（３１）
である。このα_ｔ（ｉ）は以下のように帰納的に計算可能で、観測した記号列出力の確率Ｐ（Ｏ｜ｃ）を求めることができる
【００１４】
【数１０】

【００１５】
ここで、πｉは状態ｉの初期確率、ｂ_ｊ（Ｏ_ｔ）は状態ｉからＯ_ｔが出力される確率、ａ_ｉｊは状態ｉから状態ｊの遷移確率を表す。そして、Baum-Welthアルゴリズム（文献［14］参照）によって教師信号から遷移確率と出力確率の両方を推定することができる（文献［18］参照）。
さらにBaum and Sell（文献［19］参照）はBaum-Welthアルゴリズムに若干の制約条件を加えた連続密度ＨＭＭ(Continuous Density ＨＭＭ：以下CDHMM）を提案した：このCDHMMは連続信号を推定するために確率密度関数に混合ガウス分布モデルを利用しており、隠れマルコフモデルの能力を大きく向上させた。Juangら（文献［20］参照）は、このCDHMMの推定アルゴリズムを拡張して、音声認識に応用している。このようにして、ＨＭＭは高い試用能力を有しており、現実的に非常に効異的な手法である。
【００１６】
【発明が解決しようとする課題】
しかしながら、ＨＭＭの構造は未知なので経験や定義を頼りにパラメータを決定しなければならない。またＨＭＭの学習には大量の学習データが必要であり、学習データが少ないときには必要なパラメータをうまく推定できない場合もある。そこで近年、ＨＭＭとニューラルネットの融合に関する研究が盛んにおこなわれている（文献［21］［22］参照）。Bridle（文献［21］参照）らはＨＭＭをニューラルネット構造で表現する方法を示した。しかしながらこの方法はＨＭＭの構造をネットワークで展開したにすぎず、本質的な能力は同じである。またBourlard and Wellekens（文献［22］参照）は、適切なリカレント結合を有するネットワークならば、ＨＭＭとして機能できることを明らかにした。
【００１７】
リカレント型（相互結合型）ニューラルネットワークは、ユニットを相互に結合し、内部で信号をフィードバックすることによって、動的なモデルを表現できることが知られている。しかし、混合ガウス分布を用いたパターン認識ができるニューラルネットワークに、リカレント構造を組み込む方式は知られておらず、組み込んだとしてもそれが一体として学習できる構造になるかどうかは明らかでなかった。
【００１８】
また、通時的誤差逆伝播学習(back propagation through time)において、学習の終了時刻を予測することは難しく、問題の難しさに依存して学習に必要な時間が大きく変化してしまう。特に生体信号のパターン認識問題のように実行前には必ず学習を行なわなければならないような場合には、学習の終了を待つユーザの負担が大きい。これはニューラルネットのエネルギー関数の収束時間が事前に分からないためである。
【００１９】
本発明は、以上の点に鑑み、簡素な構造で、時系列特性の学習が可能なニューラルネットワークを提供することを目的とする。また、本発明は、対数線形化された混合正規分布モデルに基づく５層を備え、リカレント構造を有し、動的なデータにも対応可能なニューラルネットワークシステムを提供することを目的とする。また、本発明は、学習前に終了時刻を明示することによって、学習収束を待つユーザ負担が軽減できるニューラルネットを提供することを目的とする。
【００２０】
【課題を解決するための手段】
本発明は、LLGMNの能力改善を目的とし、時間とともに変化する動的なデータにも適応可能な新しいリカレント型ニューラルネットR-LLGMNを提案した。このニューラルネットは、動的な確率モデルの1つである隠れマルコフモデルを内包しており、LLGMNでは不可能だった時系列特性の学習を行うことが可能であることを目的とする。また本発明は、遷移確率や出力確率密度関数などの統計パラメータをネットワークの重みとして表現しており、誤差逆伝播方式の学習則によりその重みを調節することを目的とする。提案手法の有効性・妥当性を検証するために、時間とともに激しく変動するＥＥＧ信号の識別実験を行い、十分な識別精度を実現した。さらに、本発明は、ターミナルラーニング方式を導入しているので、生体信号のパターン認識時の終了時刻を予め表示し、ユーザの精神的な負担を軽減することを目的とする。
【００２１】
本発明の第１の解決手段によると、
階層型ニューラルネットワークであって、少なくとも入力を非線形変換する前処理手段と、
混合ガウス分布モデルを構成する層と、
層間にフィードバックを有するリカレント構造をなす層と
を備えたニューラルネットワークが提供される。
【００２２】
本発明の第２の解決手段によると、
入力された入力ベクトルが入力され、入力ベクトルの各要素を組合わせて非線形変換して出力する前処理手段と、
前記前処理手段により非線形変換された値が入力され、入力を複数に分岐して出力する第１の層と、
前記第１の層の各ユニットの出力に、学習によって取得される重み係数を乗じた値が統合されたものが入力され、予め定められた関数で変換して出力する第２の層と、
前記第２層の各ユニットの状態ｋに対応する分布モデルのコンポーネントについての出力が統合されたものが入力され、同じ事象で第４の層の対応する状態の１時刻前の出力とを乗じて出力する第３の層と、
第３の層の各ユニットの状態についての出力が入力され、該入力を状態と事象・クラスについての出力が統合されたもので除算して出力する第４の層と、
第４の層の各ユニットの状態についての出力が統合されたものが入力され、入力を前記事象・クラスの分析結果として出力する第５の層と
を備えたニューラルネットワークシステムが提供される。
【００２３】
本発明の第３の解決手段によると、
処理部が、状態数ｋ（ｋ＝１，２，…，Ｋ_ｃ）、コンポーネント数ｍ（ｍ＝１，２，…，Ｍ_ｃ，ｋ）、事象・クラス数ｃ（ｃ＝１，２，…，Ｃ）、入力信号数ｄ、時系列信号長Ｌ、収束時間ｔ_ｆ、学習データ数Ｎの設定を行う機能と、
処理部が、ターミナルラーニングに基づき学習率η_ｔを計算する機能と、
処理部が、学習回数Ｔ及び学習データ数Ｎに基づき、上述のようなニューラルネットワークにより前向き演算を実行する前向き演算機能と、
処理部は、重みｗ^ｃ _{ｋ’，ｋ，ｍ，ｈ}を含む学習結果を記憶部に記憶する機能と
を備えたニューラルネットワークシステムが提供される。
【００２４】
本発明の第４の解決手段によると、
入力された入力ベクトルが入力され、入力ベクトルの各要素を組合わせて非線形変換して出力する前処理機能と、
第１の層により、前記前処理手段により非線形変換された値が入力され、入力を複数に分岐して出力する機能と、
第２の層により、前記第１の層の各ユニットの出力に、学習によって取得される重み係数を乗じた値が統合されたものが入力され、予め定められた関数で変換して出力する機能と、
第３の層により、前記第２層の各ユニットの状態ｋに対応する分布モデルのコンポーネントについての出力が統合されたものが入力され、同じ事象で第４の層の対応する状態の１時刻前の出力とを乗じて出力する機能と、
第４の層により、第３の層の各ユニットの状態についての出力が入力され、該入力を状態と事象・クラスについての出力が統合されたもので除算して出力する機能と、
第５の層により、第４の層の各ユニットの状態についての出力が統合されたものが入力され、入力を前記事象・クラスの分析結果として出力する機能と
をコンピュータに実現させるためのニューラルネットワーク処理プログラムが提供される。
【００２５】
【発明の実施の形態】
１．混合ガウス分布を有するＨＭＭの対数線形化
本発明の実施の形態のネットワークは動的な確率モデルの１つである隠れマルコフモデル（文献［14］参照）をネットワーク構造として内包しており、通時的誤差逆伝搬アルゴリズム（文献［15］参照）により学習的に重みを調節することができる。
そこで以下に、上記の構成によるリカレント型混合ガウス分布モデルと対数線形モデルに基づいたネットワーク（R-LLGMN）がＨＭＭを構造として内包していることを明らかにする。
【００２６】
図１に、混合ガウス型の連続確率密度分布を持つＨＭＭを示す図である。
対象とする事象はＣ個で、それぞれの事象（ｃ∈｛１，…，Ｃ｝）はＫ_ｃ個の状態から構成されている。このとき、与えられる時系列ｘ（ｔ）＝［ｘ（１），ｘ（２），…，ｘ（ｔ）］^Ｔに対する事象ｃの事後確率Ｐ（ｃ｜ｘ（ｔ））は、次式のようになる。
【００２７】
【数１１】

【００２８】
ただし、γ^c _ｋ’，ｋは事象ｃにおいて状態ｋ’から状態ｋに遷移する確率、ｂ^ｃ _ｋ（ｘ（ｔ））はｘ（ｔ）に対応した事象ｃの状態ｋからの事後確率を表している。また、事前確率π^ｃ _ｋはＰ（ｃ，ｋ）｜_ｔ＝０に等しい。ここで、事後確率ｂ^ｃ _ｋ（ｘ（ｔ））をコンポーネント数Ｍ_ｃ，ｋから成る混合ガウス分布によって与えられるとすると（文献［８］［２３］参照）、（３７）式のγ^c _ｋ’，ｋｂ^c _ｋ（ｘ（ｔ））は、
【００２９】
【数１２】

【００３０】
となる。ただし、γ_{ｃ，ｋ，ｍ}，μ^{（ｃ，ｋ，ｍ）}∈Ｒ^ｄ，Σ^{（ｃ，ｋ，ｍ）} ∈ Ｒ^ｄ×ｄはそれぞれ混合度、平均ベクトル、共分散行列である。平均ベクトルμ^{（ｃ，ｋ，ｍ）}＝（μ^{（ｃ，ｋ，ｍ）} _１，…，μ^{（ｃ，ｋ，ｍ）} _ｄ）^Ｔと共分散行列の逆行列
Σ^{（ｃ，ｋ，ｍ）−１}＝［ｓ^{（ｃ，ｋ，ｍ）} _ｉｊ］
を用いると、（３９）式の右辺は、
【００３１】
【数１３】

【００３２】
と展開できる。ただし、δ_ｉｊはクロネッカーデルタで、ｉ＝ｊのとき１、ｉ≠ｊのとき０である。ここで、（４０）を対数線形化する（文献［８］参照）。
【００３３】
【数１４】

【００３４】
ここで、（４２）式はR-LLGMNの非線形の前処理を表しており（（１）式参照）、ξ^ｃ _ｋ’，_ｋ，ｍは係数ベクトルβ^ｃ _ｋ’，_ｋ，ｍと変換された入力Ｘ∈Ｒ^Ｈの積として表現することができる。以上より、β^ｃ _ｋ’，_ｋ，ｍを重み係数とみなすことで、ニューラルネットの構造として表現することが可能となった。
しかしながら、事後確率Ｐ（ｃ，ｋ｜ｘ（ｔ））の総和である次式は１であるため、事後確率のパラメータとしてξ^ｃ _ｋ’， _ｋ，ｍは冗長である。
【００３５】
【数１５】

【００３６】
そこで、新しく変数Ｙ^ｃ _ｋ ^’，，_ｋ，ｍ係数ベクトルＷ^ｃ _ｋ ^’，_ｋ，ｍを導入すると、
【００３７】
【数１６】

【００３８】
となる。ただし、定義よりＷ^ｃ _{Ｋｃ，ｋｃ，Ｍｃｋ}＝０である。本発明ではこのベクトルＷ^ｃ _ｋ ^’ _，ｋ，ｍを無制約の重み係数とみなすことにする。このとき、（３７）式は、
【００３９】
【数１７】

【００４０】
となる。
一方、（３８）式においてｔ＝１のとき、同様に混合ガウス分布モデルと対数線形モデルを導入すると、
【００４１】
【数１８】

【００４２】
を得る。ここで本発明では、Ｗ^ｃ _ｋ’，_ｋ，ｍ（１）＝Ｗ^ｃ _ｋ’，_ｋ，ｍと考える。なぜならＷ^ｃ _ｋ’，_ｋ，ｍ（１）とＷ^ｃ _ｋ’，_ｋ，ｍの両方とも統計的拘束から解放された無制約の変数で、多数の未知統計パラメータを含んでいるからである。以上より、混合度 γ_{ｃ，ｋ，ｍ}、平均ベクトル μ^{（ｃ，ｋ，ｍ）}、共分散行列 Σ^{（ｃ，ｋ，ｍ）}、遷移行列γ^ｃ _ｋ’，_ｋのような多くのパラメータをより少ない数のパラメータに置き換えることが可能になった。つまり、R-LLGMNは係数ベクトルＷ^ｃ _ｋ’，_ｋ，ｍをニューラルネットの重み係数として扱い、誤差逆伝播方式の学習則によりこの重み係数を学習的に獲得することができるのである。
【００４３】
LLGMNにおいて、ターミナルラーニングは、学習時の評価関数Ｊを次の（１０１）式によって設定すれば、学習率ηに合わせて重み係数を変更することによって、評価関数が平衡点に収束できることが分かっている（文献［１６］参照）。R-LLGMNネットワークにおいて、同様に次の（１０１）式によって収束が可能なことが示されるので、指定した有限時間内で、評価関数を収束させることができる。これによって収束時間が多少長くなっても、終了を待つユーザ負担を減少させることができる。
ｄＪ／ｄｔ＝―ηＪ^β （１０１）
【００４４】
２．リカレント構造の導入
図２に、本発明で提案するニューラルネットの構成図を示す。このネットワークは、第３層と第４層の間にフィードバックを有する５層構造のリカレント型ニューラルネットである。まず、入力ベクトルｘ（ｔ） ∈ Ｒ^ｄ（ｔ＝１，…，Ｔ）はLLGMNと同様の方法で変換し、Ｘ（ｔ） ∈ Ｒ^Ｈを第１層の入力とする。第１層の入出力関係は、
^（１）Ｉ_ｈ（ｔ）＝Ｘ_ｈ（ｔ）（８）
^（１）Ｏ_ｈ（ｔ）＝^（１）Ｉ_ｈ（ｔ）（９）
となる。ここで、^（１）Ｉ_ｈ（ｔ）と^（１）Ｏ_ｈ（ｔ）はｈ番目のユニットの入出力を表す。
【００４５】
第２層のユニット｛ｃ，ｋ，ｋ’，ｍ｝（ｃ＝１，…，Ｃ；ｋ，ｋ’＝１，…，Ｋ_ｃ；ｍ＝１，…，Ｍ_ｃ，ｋ）は、第１層のユニットからの出力を重み係数ｗ^ｃ _{ｋ’，ｋ，ｍ，ｈ}を介して受け取る。第２層のユニットへの入力を^（２）Ｉ^ｃ _{ｋ’，ｋ，ｍ}（ｔ）、出力を^（２）Ｏ^ｃ _{ｋ’，ｋ，ｍ}（ｔ）とすると、
【００４６】
【数１９】

【００４７】
となる。ただし、Ｋ_ｃは隠れマルコフモデルの状態数に対応するパラメータ、Ｍ_ｃ，ｋはクラスｃ、状態ｋに対応する混合ガウス分布モデルのコンポーネント数を表す。（３．２）
第３層のユニット｛ｃ，ｋ，ｋ’｝への入力は、第２層のユニット｛ｃ，ｋ，ｋ’，ｍ｝（ｍ＝１，…，Ｍｃ，ｋ）の出力を統合したものである。また、その入力に１時刻前の第４層の出力を乗じた値が第３層の出力となる。入出力関係は、
【００４８】
【数２０】

【００４９】
となる。ただし、初期状態は、例えば、^（４）Ｏ^ｃ _ｋ，（０）＝１．０とする。
さらに第４層の入力 ^（４）Ｉ^ｃ _ｋ（ｔ）、と出力^（４）Ｏ^ｃ _ｋ（ｔ）との関係は、
【００５０】
【数２１】

【００５１】
で与えられる。
最後に、第５層のユニットｃは第４層のＫ_ｃユニット｛ｃ，ｋ｝（ｋ＝１，…，Ｋ_ｃ）の出力を統合したものである。入出力関係は、
【００５２】
【数２２】

【００５３】
となる。
ここで、時系列信号の長さＴが１、状態数Ｋ_ｃが１の場合の第５層のユニットについて考えよう。常にｔ＝１ならば、（１３）式におけるフィードバックが意味をなさなくなる。このとき、第２層から第５層までの関係（（１０）−（１７）式）はまとめて、
【００５４】
【数２３】

【００５５】
と書くことができる。これはLLGMNにおける第２層と第３層の関係と同じである（（４）（５）式参照）。つまり、対象とする信号が時系列信号でない場合や、フィードバック統合が重要でない場合は、R-LLGMNはLLGMNに帰着するのである。
【００５６】
３．学習則
いま、時刻Ｔにおいて、ｎ番目の入力ベクトルｘ（ｔ）^（ｎ）に対応する教師ベクトルＴ^（ｎ）＝（Ｔ_１ ^（ｎ），…，Ｔ_ｃ ^（ｎ），…，Ｔ_Ｃ ^（ｎ））^、（ｎ＝１，…，Ｎ）が与えられた場合について考える。Ｔ_ｃ ^（ｎ）は観測された事象がｃであるときは１、それ以外は０をとり、複数のクラスが同時に１になることはない。R-LLGMNは、Ｃ個のクラスそれぞれに用意されたＬ個の時系列信号（Ｎ＝Ｌ×Ｃ）を学習データとして学習を行う。学習用データに対するネットワークの評価関数Ｊは、
【００５７】
【数２４】

【００５８】
と定義し、これを最小化、すなわち対数尤度を最大化するように学習を行う。ただし、^（５）Ｏ^ｃ（Ｔ）^（ｎ）は入力ベクトルｘ（ｔ）^（ｎ）に対する時刻Ｔでの出力を意味している。このとき、重みのｗ^ｃ _{ｋ’，ｋ，ｍ，ｈ}の修正量△ｗ^ｃ _{ｋ’，ｋ，ｍ，ｈ}を
【００５９】
【数２５】

【００６０】
と定義する。ただし、η＞０は学習率である。また、R-LLGMNに含まれるリカレント結合の影響を考慮するため、通時的誤差逆転播方式（ＢＰＴＴ）を用いる（文献［１５］参照）。これは時系列中の誤差を蓄積して、重み修正量を計算するという方式である。ここで（２１）式の右の項をｃｈａｉｎｒｕｌｅを使って展開すると、（２２）式を得る。
【００６１】
【数２６】

である。
【００６２】
また、Δ^ｃ’ _ｋ”（ｔ）はＪ_ｎの^（４）Ｏ^ｃ’ _ｋ”（Ｔ−ｔ）による偏微分である。（２４）式が成り立つので、上式は（２５）〜（２６）式となる。
【数２７】

【００６３】
さらに本発明では、ターミナルラーニング（文献［１６］参照）（ＴＬ）の概念を学習則に取り入れ、ＴＬを用いることにより指定した有限時間内でニューラルネットの学習を平衡点に収束させる。ここで、重みｗ^ｃ _{ｋ’，ｋ，ｍ，ｈ}を時間依存の連続変数として考えると、この時間微分は次式となる。
【００６４】
【数２８】

【００６５】
ここで、η_ｔは学習率でβ（０＜β＜１）は定数である。このとき、評価関数Ｊの時間微分は以下のように計算される。
【００６６】
【数２９】

【００６７】
（２９）式から、評価関数Ｊは単調非増加関数となり学習が安定に収束することがわかる。この収束時間を計算する時、
【００６８】
【数３０】

【００６９】
となる。ただし、Ｊ_０は評価関数の初期値で、Ｊ_ｆは平衡点におけるＪの収束値である。Ｊ_ｆ＝０の場合、（３０）式の等号が成立し、学習率η_ｔを用いて収束時間を指定できることがわかる。一方、Ｊ_ｆ≠０の場合でも収束時間は常に（３０）式で与えられる上限値よりも早い時刻で平衡状態に到着することになる。
【００７０】
以上より、学習の終了時刻ｔ_ｆから学習率η_tを計算し、このη_tを（３０）式に代入して学習を行なえば、学習が完全に終了した場合は時刻ｔ_ｆでＪ＝０となる。学習中には、ユーザには、残りの学習時間ｔ_ｆ−ｔとＪの値を随時表示する。これにより学習終了を待つユーザの精神的負担をやわらげることができる。
【００７１】
４．ニューラルネットワークの処理
図３に、ニューラルネットワークを実現するためのハードウェアの構成図を示す。
この構成は、記憶部１、入力部２、出力部３、処理部４、インターフェースＩ／Ｆ５を備える。
記憶部１は、各種データを記憶する。記憶部１は、例えば、初期値を記憶する第１テーブル１１と、ニューラルネットワークの各層の構成を表す式に関するデータを記憶した第２テーブル、中間結果等を記憶する第３テーブル、重みなどの学習結果・最終結果を記憶する第４テーブルを有する。入力部２は、キーボード、ポインティングデバイス等の入力装置であり、初期値、評価関数、ニューラルネット等に関する各種データを入力及び設定する。出力部３は、ディスプレイ、媒体駆動装置、インターフェイス等の各種出力装置であり、初期値、中間結果、最終結果等の適宜のデータを出力する。処理部４は、記憶部１から初期値、中間結果、学習結果・最終結果等の各種データを読み取り、また、記憶部１に書きこむ。処理部４は、また、中間結果、学習結果・最終結果等についての各種計算処理、制御処理を実行する。さらに、処理部４は、中間結果、学習結果・最終結果等の各種データを出力部３に出力する。また、インターフェース（Ｉ／Ｆ）５は、例えば、後述するような節電信号や脳波信号等の各種信号の計測装置と情報の送受をするためのものである。
【００７２】
図４は、ニューラルネットワークの学習処理を示すフローチャートである。
まず、処理部４は、記憶部１に初期設定を行う（Ｓ１０１）。初期設定では、例えば、状態数Ｋ_ｃ、コンポーネント数Ｍ_ｃ，ｋ、クラス数Ｃ、入力信号数ｄ、時系列信号長Ｌ、収束時間ｔ_ｆ、学習データ数Ｎの設定を行う。つぎに、処理部４は、ターミナルラーニング（（２７）、（２８）式参照）に基づき学習率ηｔを計算する（Ｓ１０３）。処理部４は、記憶部１に記憶された学習回数Ｔを読み出し初期値として設定する（Ｓ１０５）。ここで、学習回数Ｔは、設定した収束時間ｔ_ｆ（「３．学習則」参照）により決まる（Ｓ１０５）。処理部４は、さらに記憶部１に記憶された学習データ数Ｎを読み出し初期値として設定する（Ｓ１０７）。つぎに、処理部４は、R-LLGMNの前向き演算を実行する（Ｓ１０９）。
【００７３】
図５は、R-LLGMNの前向き演算処理を示すフローチャートである。
R-LLGMNの前向き演算処理では、まず、処理部４は、時系列信号長Ｌを記憶部１から読み出し、初期値として設定する（Ｓ２０１）。処理部４は、R-LLGMNの演算を、（８）〜（１９）式に基づいて実行する（Ｓ２０３）。処理部４は、時系列信号長Ｌを１減算して更新して記憶する（Ｓ２０５）。処理部４は、時系列信号長Ｌが０以下になるまでステップＳ２０３及びＳ２０５の処理を繰り返す。その後、処理部４は、（１９）式で求められた演算結果^（５）Ｏ（ｔ）を記憶部１に記憶して、必要に応じて出力部３から出力する（Ｓ２０９）。
【００７４】
以上のようにステップＳ１０９の処理が実行された後、処理部４は、学習データ数Ｎを１減算して更新して記憶する（Ｓ１１１）。処理部４は、学習データ数Ｎが０以下になるまでステップＳ１０９及びＳ１１１の処理を繰り返す（Ｓ１１３）。その後、処理部４は、（２１）式に基づき重みｗ^ｃ _{ｋ’，ｋ，ｍ，ｈ}を修正する（Ｓ１１５）。処理部４は、学習回数Ｔを１減算して更新して記憶する（Ｓ１１７）。処理部４は、学習回数Ｔが０以下になるまでステップＳ１０７からＳ１１７の処理を繰り返す。その後、処理部４は、重みｗ^ｃ _{ｋ’，ｋ，ｍ，ｈ}を含む学習結果を記憶部１に記憶する（Ｓ１２１）。
【００７５】
つぎに、図６は、本発明によるニューラルネットワークを用いた生体信号識別処理を実行するための構成図である。
この構成は、特徴抽出部１１、識別部１２、バッファメモリ１４、R-LLGMN演算部１３を備える。特徴抽出部１１には、生体信号（脳波信号（ＥＥＧ）、筋電位信号（ＥＭＧ）など）が入力される。特徴抽出部１１は、入力された生体信号に従い識別用データを抽出する。例えば、生体信号として脳波信号が入力された場合、識別用データとして、特定の周波数帯域のパワースペクトルの平均値を要素とするベクトルを抽出する。また、生体信号として筋電位信号が入力された場合、識別用データとして、整流及び平滑後の振幅値を要素とするベクトルを抽出する。つぎに、識別部１２は、特徴抽出部１１により抽出された識別用データを記憶部に記憶する又はR-LLGMN演算部１３は出力する。R-LLGMN演算部１３は、上述のニューラルネットワークによる演算を行なうものであり、図３に示した構成に相当し、信号の送受はインタフェースＩ／Ｆ５を介して行うことができる。R-LLGMN１３は、記憶部から読み出した又は識別部１２から与えられた識別用データに基づき、R-LLGMN前向き演算を実行する（後述）。識別部１２は、R-LLGMN演算部１３の出力結果^（５）Ｏ_ｃ（ｔ）の中で最も高い値（又は、予め決められた閾値より高い値）を示したクラス番号ｃを識別結果として選択する。バッファメモリ１４は、識別部１２の識別結果を保持する。識別結果は、処理部によりバッファメモリ１４出力される。
【００７６】
図７に、本発明によるニューラルネットワークを用いた識別処理を示すフローチャートである。
R-LLGMN演算部１３は、記憶部から学習終了時に出力された重みｗ^ｃ _{ｋ’，ｋ，ｍ，ｈ}を読み込む（Ｓ３０１）。R-LLGMN演算部１３は、記憶部又は識別部１２から識別用データを入力する（Ｓ３０３）。識別用データは、例えば、脳波信号（ＥＥＧ）又は筋電位信号（ＥＭＧ）から特徴を抽出したデータである。R-LLGMN演算部１３は、上述のようにR-LLGMNの前向き演算処理を実行する（Ｓ３０５）。R-LLGMN演算部１３は、前向き演算処理で算出された^（５）Ｏ_ｃ（ｔ）の中で最も高い値（又は、予め定められた閾値より高い値）を示すクラス番号ｃを識別結果として出力する（Ｓ３０７）。
【００７７】
５．ＥＥＧ信号識別実験
本発明で提案したニューラルネットの有効性を検証するために、時間とともに激しく変動する脳波（ＥＥＧ）信号を用いて識別実験をおこなった。
５．1 ＥＥＧ信号計測装置）
【００７８】
図８はＥＥＧ信号計測装置の全体図である。ＥＥＧ信号の計測には、簡易小型脳波計を使用した。計測したＥＥＧは無線でコンピュータに取り込まれる。測定されたＥＥＧ信号は、ハイパスおよびローパス（40［Hz］のアナログフィルタを通過後増幅され、A／D変換される。電極はヘツドバンドに固定されており、双極導出による脳波の差分をとることで大部分の測定ノイズを排除することができる。
ＥＥＧ信号の測定は以下の2種類の条件で行った。各々120秒間の学習データを計測し、Ｍ系列により作成したランダムな時間間隔にしたがって視覚刺激を変化させて420秒間の識別用脳波を計測した。
【００７９】
（1）開閉眼
被験者は通常の計算機室内の椅子に座り、安静にしておくこの状態で開閉眼時の時系列脳波パターンを計測する。
（２）開閉眼および閃光刺激
被験者は暗くした計算機室内の椅子に座り。安静にしておく。この被験者から約50［cm］離した位置に、光刺激（4［Hz］で点滅）を与えるフラッシュライト（光源：クセノン。エネルギー量：1.76［J］を設置する。この状態で開閉眼および閃光刺激時の時系列脳波パターンを計測する。
【００８０】
５−２．時系列ＥＥＧ信号の特徴抽出
本発明で用いた脳波計は1対の電極しか備えていないために、脳波が発生している部位に関する空間的な情報は利用できない。そこで、一対の電極からより多くの情報を抽出するために、ＥＥＧ信号の周波数成分を利用する。
前処理として計測したＥＥＧ信号を128サンプル毎に高速フーリエ変換（FFT）し、周波数スペクトルを計算する。次に、スペクトル中の識別有効帯域（0〜35［Ｈz］に関して、臨床脳波学におけるδ、θ、α、βなどを基準に分割する。そして、それぞれの帯域毎にパワースペクトルの平均値を計算し、平均値の時系列データを作成した。このとき、帯域毎に時系列の範囲が0〜1になるように正規化した。なお、本実験に使用する周波数帯域は0〜8、9〜35［Hz］の2つの帯域とした。
【００８１】
５−３開閉眼の識別
まず、図９は、披験者Aの開閉眼時の識別結果の一例を示す図である。この図は推定中の信号処理の様子を示しており、上から開眼・閉眼の状態を変化させたタイミング、2つの周波数帯域に対応したＥＥＧ信号、LLGMNの出力、LLGMNとニューラルフィルタを組み合わせた方式（LLGMN with NFと呼ぶ）（文献［8］、［24］参照）の出力、R-LLGMNの出力、識別結果を表している。R-LLGMNの設定は、状態数K₁=K₂=1、コンポーネント数Ｍ_1,1＝Ｍ_2,1＝１、学習データ長T=5,学習データ数Ｌ＝３である。
【００８２】
また、LLGMN with NFのLLGMN部はコンポーネント数が数Ｍ_1,1＝Ｍ_2,1＝３、学習データ数112（各56）。NF部はユニット数を８、それぞれのNＦの学習データ数168とし、考慮する遡り時間ステップは5とした。このNFは中間層にリカレント結合を有しており、非線形信号のフィルタリングを行うことができる。このときのR-LLGMNの識別率は9７.6％でかなり高い識別率を実現できている。またR-LLGMNはLLGMN with NFと比べて、出力波形の立ち上がりが多少遅れているものの、信号の安定性はかなり向上している。
【００８３】
図１０は、3人の被験者（A，B：男性，C：女性）に対して実験をおこなった結果を示す図である。ここでは比較のために、本発明で提案するR-LLGMN、LLGMN、LLGMN with NＦ、ＨＭＭ（注1）の4つの手法を用いた。各手法ともに、0〜lの様乱数を用いて10通りの異なる初期重みを作成して実験を行ったそれぞれの値は各手法の平均値と標準偏差である。ＨＭＭの設定は状態数K₁=K₂=1、学習データ長T=5、学習データ数４０とした。
【００８４】
5人のデータに共通して、LLGMNより他の3手法のほうが高い識別率を示している。これは、静的な統計モデルしか内包していないLLGMNが動的なＥＥＧ信号に対応しきれていないことを表している。それに対し、他の3手法は動的な統計モデルを内包しているためにＥＥＧ信号に適応し、高い識別率を実現している。しかしながら、LLGMN with NＦはネットワークの規模が大きいためLLGMNとNＦを別々に学習しなければならず、学習が非常に困難である。［b］またＨＭＭは、学習は比較的に簡単だが、大量の学習デー夕が必要である。一方、R-LLGMNは少ない学習デー夕で動的・静的な特性を一度に学習できる。そのためLLGMN with NFとＨＭＭの両方の欠点を補っており、非常に有効な手法であることがわかる。
【００８５】
５−４３状態の光刺激の識別
次に、図１１は、被験者Aの開閉眼および閃光刺激時の識別結果の一例を示す図である。開閉眼のみの場合（図９）に比べてR-LLGMNの出力が乱れており、被験者の状態を識別することが非常に困難であることがわかる。これは、閃光刺激時の輝度変化以外の視覚刺激が少なく、ＥＥＧ信号に明確な変化が観測されなかったためである。このときの識別率は87．4％であった。
【００８６】
図１２は、5人の被験者に対する識別結果を示す図である。開閉眼時のＥＥＧパターン識別結果と比べると識別率が低下しているものの、平均するとR-LLGMNの識別率が一番高く、識別が可能であることがわかる。
次に、学習データ数を変化させて実験を行った時の結果を示す。この実験では比較のために、R-LLGMNとHMMを用いた。各設定は時系列の長さＴ＝５、コンポーネント数Mc,k=1、上対数Kc＝１で、学習データ数をＴ＝5,10,…,40に変化させた。なお、HMMの量子化レベル数はＱＬ＝４とした。
図１３及び図１４は、そのときの教師信号数に対する識別率の変化（HMM及びR-LLGMN）を示す図である。これは、１０通りの異なる初期重みで学習をおこない、未学習の２１０個の時系列データに少なくても、かなり高い識別率実現している。
以上の結果から、HMMの学習には大量のサンプルデータが必要であることがわかる。また、LLGMN with NFはネットワークの規模が大きいため、LLGMNとNFを別々に学習しなければならず、学習が非常に困難である[16]。一方、R-LLGMNは少ない学習データで動的・静的な特性を一度に学習することができる。そのためLLGMN with NFとHMMの両方の欠点を補うことができ、EGG識別に非常に有効な手法である。
【００８７】
最後に、状態数、コンポーネントの違いによる識別率の変化について調べた。学習データ数Ｌ＝３、時系列の長さをＴ＝５として、コンポーネント数をＭｃ，ｋ＝1,2,…,10,状態数をＫｃ＝1,2,…,10に変化させた（c＝１，2，3）。
【００８８】
図１５は、そのときの状態数とコンポーネント数に対する識別率の変化を示す図である（被験者Ｅ）。これは、10通りの異なる初期重みで学習をおこない、未学習の210個の時系列データについて識別率の平均と標準偏差を計算した結果である。図より、コンポーネント数と状態数を増やすことでR-LLGMNの表現力が増し。識別精度が向上していることがわかる。
【００８９】
６．ＥＭＧ駆動型ロボットシステムへの応用
本実施の形態では、ニューラルネットを利用したＥＭＧ信号のパターン識別を目的としている。ＥＭＧ情報には、振幅パターン、周波数パターン、またそれぞれに時系列特性がある。しかし従来の識別法のほとんどは、時系列特性を利用していない。また、既存のニューラルネットを利用しているためＥＭＧ信号の従う統計パターンも利用されていない。そこで、本報告ではネットワーク内にリカレント構造を導入し、ＥＭＧ信号の時間的な変化にも対応可能な新しいニューラルネットを紹介する。実験では、このニューラルネットの高い識別能力を証明する。
【００９０】
６−１はじめに
近年、上肢切断者を支援するためのＥＭＧ制御型電動義手の開発が活発に行われている（文献1,2参照）。ＥＭＧ信号を義手のインタフェースとして利用する場合、筋の特性が切断者や計測位置によって異なるため、解析的に動作推定を行うのは難しい。そこで従来から、学習によって動作識別を行う方法が試みられてきた。例えば、FarryらはＥＭＧの周波数情報から、ロボットハンドを遠隔操作する手法を提案した（文献[25]参照）。また、Huang and Chen はＥＭＧの積分筋電等から、誤差逆伝播型ニューラルネット（以下、BPNNと略記）を用いて8動作の識別を行っている（文献[26]参照）。
著者らもこれまでに人間支援を目的としたＥＭＧ制御型マニピュレータシステムの構築を行ってきた。このシステムは、混合ガウス分布モデルと対数線形モデルに基づいたLog-Linearized Gaussian Mixture Network（以下、LLGMNと略記）（文献[27]参照）を用いてＥＭＧ信号の識別をおこない、その識別結果をロボットマニピュレータの制御手段として用いている（文献[28]参照）。
【００９１】
しかしながら、従来の方法はいずれも、入力信号の履歴を考慮しておらず、対象とする問題の静的な特性しか学習していない。そのため、対象が時間とともに変動するような動的な時系列信号を識別するには限界があった。
そこで本実施の形態では、パターン識別部にリカレント構造を有する新しいニューラルネットの導入を試みた。このニューラルネットは隠れマルコフモデルをネットワーク構造として内包しているため、動的な時系列信号を識別することが可能である。本実施の形態に使用した義手型ロボットシステムの概要を説明するとともに、切断者によるＥＭＧ識別実験の結果を示す。
【００９２】
６−２ＥＭＧ制御型人間支援マニピュレータ
６−２．１システム構成
図１６に、ＥＭＧ制御型人間支援マニピュレータシステムの構成図を示す（文献[28]参照）。本システムは、アーム制御部と義手制御部から構成される。アーム制御部には Move Master RM-501（（株）三菱電機製）、義手制御部には超音波モータ駆動型動力義手（文献[29]参照）を用いた。アーム制御部では磁気を利用した３次元位置センサを入力装置として用い、腕全体の大きな動きを実現する。一方、義手制御部では操作者のＥＭＧ信号から意図する前腕動作をニューラルネットにより識別し、手先の細かな動きを実現する。この際、人間の前腕部のインピーダンスモデルを義手制御に導入することで、人間のようなしなやかな動作を実現している（文献[30]参照）。
【００９３】
６−２．２ニューラルネットによる動作識別
６−２．２．１ＥＭＧ信号処理部
まず、操作者の腕に装着したL対の電極から測定したＥＭＧ信号を各チャンネルごとに全波整流した後、2次のディジタルバタワースフィルタ（カットオフ1[Hz]）に通す。そして平滑化した信号をサンプリング周波数100 [Hz]で採取する。そして、この信号の全チャンネルの和が1となるように正規化したものを、入力ベクトルｘ（ｔ）＝［ａ_１（ｔ），ａ_２（ｔ），・・，ａ_Ｌ（ｔ）］^Ｔ∈Ｒ^Ｌ（ｔ＝１，・・，Ｔ）とする。
【００９４】
６−３実験
開発したニューラルネットの有効性を検証するために、ＥＭＧ信号を用いた動作識別実験を行った。ここでは比較のために、R-LLGMN，LLGMN（文献[27]参照），BPNNの３つの手法を用いた。被験者は男性（４２才）で約１年前に事故のため右手首から１５ｃｍ程の部位で前腕部を切断している。R-LLGMNの設定は、電極数Ｌ＝８、対象動作数Ｃ＝８（掌屈、背屈、回内、回外、握り、開き、手首共収縮、手先共収縮）、状態数Ｋ_１＝Ｋ_２＝１、コンポーネント数Ｍ_１，１＝Ｍ_２，１＝１、学習データ長Ｔ＝４、学習データ数Ｎ＝５とした。なお、ＢＰＮＮは試行錯誤的に、隠れ層２、ユニット数（入力層−８、第２層−１０、第３層−１０、出力層−８）とした。ＥＭＧ信号は各動作２．５秒間採取した。ただし、各動作間に１５秒間の休憩を取った。
【００９５】
図１７に、識別結果を示す図である。表中の値は各動作に対する識別率の平均値（Mean）と標準偏差（SD）である。表より、LLGMNよりR-LLGMNのほうが識別率が高く、過去の履歴情報を利用することにより識別精度が向上していることがわかる。また、LLGMNとR-LLGMNの両方がBPNNより標準偏差が小さく、安定して識別できていることが分かる。
【００９６】
６−４まとめ
本実施の形態では、リカレント結合を有する新しいニューラルネット、R-LLGMNの能力を比較するとともに、ＥＭＧ制御型人間支援マニピュレータシステムへの導入を試みた。その結果、時系列信号の特性を有効に利用することが可能になり、識別精度が向上することを示した。今後は、連続動作間の曖昧になりやすい識別を改善するために、R-LLGMNに適した前処理法について考察する予定である。
【００９７】
７．参考文献
[1] D. E. Rumelhart and J. L. McClelland, "Learning Internal Representations by Error Propagation,"
Parallel Distributed Processing, Explorations in the Microstructure of Cognition, Vol. 1, D.E. Rumel-
hart, J.L. McClelland, and the PDP Research Group, Eds. MIT Press, 1986, pp. 318-362.
[2] T. Caelli, D.M. Squire, and T.P.J. Wild, " Model-based neural networks," Neural Networks, Vol. 6,No. 5, pp. 613-625, 1993.
[3]T. Tsuji and K. Ito "Preorganized Neural Networks: Error Back-Propagation Learning of Manipu-
lator Dynamics," Journal of Artificial Neural Networks, Vol. 2, No. 1-2, pp. 81-95, 1995.
[4] M.D. Richard and R.P. Lippmann, "Neural network classifiers estimate Bayesian a posteriori probabilities," Neural Computation, Vol. 3, pp. 461-483, 1991.
[5] H. G. C. Traven, "A neural network approach to statistical pattern classification by "semiparametric" estimation of probability density functions," IEEE
Trans. Neural Networks, Vol. 2, No. 3, pp. 366-377, 1991.
[6]L.I. Perlovsky and M.M. McManus, "Maximum likelihood neural networks for sensor fusion and adaptive classification," Neural Networks, Vol. 4, No. 1, pp. 89-102, 1991.
[7] 辻敏夫,森大一郎,伊藤宏司”統計構造を組み込んだニューラルネットによるＥＭＧ動作識別法,”電気学会論文誌Ｃ,Vol. 112-C, No. 8, pp. 465-473,1992.
[8] 辻敏夫,市延弘行,金子真,"混合正規分布モデルを用いたフィードフォワード型ニューラルネット”,電子情報通信学会論文誌D-II, Vol. 77, No. 10, pp. 2093-2100,1994.
[9] M.I. Jordan and R.A. Jacobs "Hierarchical mixtures of experts and the EM algorithm," Proc. IEEE Int. Joint Conf. Neural Networks 1993, Vol. II, pp. 1339-1344.
[10]S. Lee and S. Shimoji, "Self-organization of Gaussian mixture model for learning class pdfs in pattern classification," Proc. IEEE Int. Joint Conf. Neural Networks 1993, Vol. III, pp. 2492-2495.
[11]R.L. Streit and T.E. Luginbuhl, "Maximum likelihood training of probabilistic neural networks,"
IEEE Trans. Neural Networks, Vol. 5, No. 5, pp. 764-783, 1994.
[12] B.S. Everitt and D.J. Hand, "Finite Mixture Distributions" , Chapman and Hall, 1981.
[13]O. Fukuda, T. Tsuji, A. Ohtsuka, and M. Kaneko, "ＥＭＧ-based human-robot interface for rehabilitation aid," Proc IEEE Int Conf on Robotics and Automation, pp. 3492-3497, Leuven, 1998.
[14] L.E. Baum and T. Petrie, "Statistical inference for probabilistic function of finite state Markov chains," Ann. Math. Stat., Vol. 37, No. 6, pp. 1554-1563, 1966.
[15] P.J. Werbos, "Back propagation through time: what it does and how to do it," Proceedings of the IEEE, Vol. 78, No. 10, pp. 1550-1560, 1990.
[16] T. Tsuji, O. Fukuda, M. Kaneko and K. Ito "Pattern Classification of Time-series ＥＭＧ Signals Using Neural Networks," International Journal of Adaptive Control and Signal Processing. (in press)
[17] L. R. Rablner, "A tutorial on hidden markov model and selected applications in speech recognition," Proceedings of the IEEE, Vol. 77, No. 2, pp. 257-286, 1989.
[18]A.P. Dempster, N. M. Larld and D.B. Rubin, "Maximum likelihood from incomplete data via the
EM algorithm," J. Roy. Stat. Soc. Sereies B, Methodological, Vol. 39, No. 1, pp. 1-38, 1977.
[19]L. E. Baum and G. R. Sell, "Growth functions for transformations on manifold," Ann. Math. Stat.,
Vol. 27, No. 2, pp. 211-2271 1968.
[20] B.H. Juang, S.E. Levinson, and M.M. Sondhi, "Maximum Likelihood estimation for multivariate
mixture observations of Markov chains," IEEETrans. Informat. Theory, Vol. IT-32, No. 2, pp. 307-309, 1986.
[2l] J.S. Bridle, "Alpha-nets : A recurrent 'neural' network architecture with a hidden Markov model interpretation," Speech Communication, Vol. 9, No .1, pp. 83-92, 1990.
[22] H Bourlard and C. J. Wellekens, "Links between Markov models and multilayer perceptrons," in Advances in Neural Information Processing Systems I, D.S. Touretzky, Eds. Morgan Kaufmann, Los Altos, CA, 1989, pp. 502-510.
[23] D.M. Titterington, A.F.M. Simth and U.E. Markov, Statistical analysis of finite mixture distributions, John Wiley & Sons, New York, 1985.
[24] 福田修,辻敏夫,金子真,"ニューラルネットによる時系列脳波パターンの識別,”電子情報通信学会論文誌,Ｄ−ＩＩ，Ｖol．Ｊ88，Ｎｏ.7，pp. 1896-1903, 1997.
[25] K. A. Farry, I. D. Walker and R. G. Baraniuk, ''Myoelectric Teleoperation of a Complex Robotic Hand'', IEEE Transactions on Robotics and Automation, Vol. 12, No. 5, pp. 775-787, 1996.
[26] H.-P. Huang and C.-Y. Chen,
''Development of a Myoelectric Discrimination System for a Multi-Degree Prosthetic Hand'',
Proceedings of the 1999 IEEE International Conference on Robotics and Automation, Vol. 3, pp. 2392-2397, 1999. [27] 辻敏夫、市延弘行、金子真、
''混合正規分布モデルを用いたフィードフォワード型ニューラルネット''、電子情報通信学会論文誌、D-II，Vol. 77，No. 10，pp. 2093-2100，1994．
[28] 福田修、辻敏夫、金子真、
''ＥＭＧ信号を利用した手動制御人間支援マニピュレータ''、
日本ロボット学会誌、Vol. 18，No. 3，pp. 387-394，2000．
[29] 伊藤（宏）、永岡、辻、加藤、伊藤（正）、
''超音波モータを用いた3自由度前腕筋電義手''、
計測自動制御学会論文集、Vol. 27，No. 11，pp. 1281-1289，1991．
[30] 辻敏夫、重吉宏樹、福田修、金子真、
''ＥＭＧ信号に基づく前腕動力義手のバイオミメティック制御''、
日本機械学会論文集（C編）、Vol. 66，No. 648，pp. 2764-2771，2000．
【００９８】
なお、本発明のニューラルネットワーク又はそのシステムは、ニューラルネットワークによる演算方法、その各手順をコンピュータに実行させるためのニューラルネットワーク処理プログラム、ニューラルネットワーク処理プログラムを記録したコンピュータ読み取り可能な記録媒体、ニューラルネットワーク処理プログラムを含みコンピュータの内部メモリにロード可能なプログラム製品、そのプログラムを含むサーバ等のコンピュータ、等により提供されることができる。
【００９９】
【発明の効果】
本発明では、LLGMNの能力改善を実施することができ、時間とともに変化する動的なデータにも適応可能な新しいリカレント型ニューラルネットR-LLGMNを提案した。このニューラルネットは、動的な確率モデルの1つである隠れマルコフモデルを内包しており、LLGMNでは不可能だった時系列特性の学習を行うことが可能である。また遷移確率や出力確率密度関数などの統計パラメータをネットワークの重みとして表現しており、誤差逆伝播方式の学習則によりその重みを調節することができる。提案手法の有効性・妥当性を検証するために、時間とともに激しく変動するＥＥＧ信号の識別実験を行い、十分な識別精度を実現した。さらに、ターミナルラーニング方式を導入しているので、生体信号のパターン認識時の終了時刻を予め表示することができ、ユーザの精神的な負担を軽減できる。
【図面の簡単な説明】
【図１】混合ガウス型の連続確率密度分布を持つＨＭＭを示す図。
【図２】本発明で提案するニューラルネットの構成図。
【図３】ニューラルネットワークを実現するためのハードウェアの構成図。
【図４】ニューラルネットワークの学習処理を示すフローチャート。
【図５】 R-LLGMNの前向き演算処理を示すフローチャート。
【図６】本発明によるニューラルネットワークを用いた生体信号識別処理を実行するための構成図。
【図７】本発明によるニューラルネットワークを用いた識別処理を示すフローチャート。
【図８】ＥＥＧ信号計測装置の全体図。
【図９】披験者Aの開閉眼時の識別結果の一例を示す図。
【図１０】 3人の被験者（A、B：男性、C：女性）に対して実験をおこなった結果を示す図。
【図１１】被験者Aの開閉眼および閃光刺激時の識別結果の一例を示す図。
【図１２】 5人の被験者に対する識別結果を示す図。
【図１３】教師信号数に対する識別率の変化（HMM）を示す図。
【図１４】教師信号数に対する識別率の変化（R-LLGMN）を示す図。
【図１５】状態数とコンポーネント数に対する識別率の変化を示す図。
【図１６】ＥＭＧ制御型人間支援マニピュレータシステムの構成図。
【図１７】識別結果を示す図。
【図１８】 LLGMNの構成図。
【符号の説明】
１記憶部
２入力部
３出力部
４処理部
５Ｉ／Ｆ
１１特徴抽出部
１２識別部
１３バッファメモリ
１４ R-LLGMN演算部[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a neural network, a neural network system, and a neural network processing program, and more particularly to a neural network, a neural network system, and a neural network processing program that can be used in a pattern recognition system for biological signals such as brain waves and myoelectric potentials.
[0002]
[Prior art]
The backpropagation neural network proposed by Rumelhart et al. (Ref. [1]) (the numbers in [] represent reference numbers described later) have a powerful learning ability that can acquire an arbitrary nonlinear mapping. Applied to various pattern identification problems. However, many problems still remain depending on the target problem. For example, as the mapping of the learning target becomes complicated, a huge number of teacher signals, learning time, and a large network structure are required, and there are concerns about problems such as local minimum.
[0003]
One of the attempts to solve such a problem is a method of taking in a characteristic known about a learning target as a network structure (refer to documents [2] and [3]). In this method, the neural network can be adjusted to suit the given problem by imposing constraints on the connection weights and the network rules. With regard to the pattern identification problem, many researchers have tried to fuse neural networks and probabilistic models (see reference [4]), and many neural networks using mixed Gaussian distribution models have been proposed (see reference [5]). ]-[11]).
[0004]
The mixed Gaussian distribution model is a method of approximating the distribution of a population by a linear sum of probability distributions called a plurality of components, and uses a Gaussian distribution for each component (see reference [12]). Development of neural networks based on this mixed Gaussian distribution model is described in Traven (Ref. [5]), Perlovsky and Mc-Manus (Ref. [6]), Tsuji et al. (Ref. [7] [8]), This is done by Jordan and Jacobs (see reference [9]), Lee and Shimoji (see reference [10]), and Streit and Luginbuhl (see reference [11]). In particular, the present inventors (辻) proposed a Log-Linerized Gaus-sian Mixture Network (LLGMN) based on a mixed Gaussian distribution model and a log-linear model (see reference [8]), and learned based on the mixed Gaussian distribution model. A method for estimating the posterior probability was presented. Moreover, using this LLGMN, it has succeeded in identifying six forearm and hand movements from nodal potential (EMG) signals measured by a plurality of electrode pairs (see reference [13]).
[0005]
1． LLGMN
FIG. 18 shows a configuration diagram of LLGMN (see reference [8]). First of all, the input vector x = [x₁, X₂, ..., x_d]^T ∈R^dIs subjected to preprocessing by a non-linear operation, and X ∈ R^HConvert to
[0006]
[Expression 7]

[0007]
The first layer is composed of H units according to the dimension number H = 1 + d (d + 3) / 2 of the vector X, and an identity function is used as an input / output function. The input / output relationship of the first layer⁽¹⁾I_j, Output⁽¹⁾O_jThen,
⁽¹⁾I_j= X_j (2)
⁽¹⁾O_j=⁽¹⁾I_j (3)
It becomes.
[0008]
The second layer is composed of the same number of units as the total number of components of the mixed Gaussian distribution, and the output of the first layer is used as a weighting factor w._h ^{(C, m)}And output the posterior probability. Input to the unit {c, m} of the second layer⁽²⁾I_{c, m}Output⁽²⁾O_{c, m}Then,
[0009]
[Equation 8]

[0010]
It becomes. However, (c = 1,..., C; m = 1,..., M_c) Where C is the number of target classes and M_cRepresents the number of components of the mixed Gaussian distribution constituting class c. W_h ^{(C, Mc)}= 0
It is.
The third layer is composed of C units corresponding to the number of events, and outputs the posterior probability of event c. Unit c is the second layer M_cThe outputs of the units {c, m} are integrated. The relationship between input and output is
[0011]
[Equation 9]

[0012]
It becomes. As described above, this network has a weighting factor w between the first layer and the second layer._h ^{(C, m)}It is possible to calculate the posterior probabilities of each event simply by adjusting the learning.
However, these neural networks can only learn the static characteristics of the problem of interest, and cannot be applied to dynamic data where the object varies with time.
[0013]
2. Hidden Markov Model
Hidden Markov Model (HMM) (see reference [14]) is commonly used for time series signal identification, and is particularly successful in the field of speech recognition (reference [17]). reference). In the HMM, corresponding to the number N of states and the number M of output symbols. State transition probability matrix A ∈ R^{N x N}Output probability matrix B ∈ R^{M × N}, Initial state probability matrix π ∈ R^{N x N}These three probability matrices are required. In order to apply the HMM to pattern identification, the probability matrix Ac, Bc, πc for each event c is converted into a specific output symbol sequence O belonging to the event c.₁O₂... O_tEstimate a parameter that maximizes the likelihood of the output probability. Here, the observed output symbol string O₁O₂... O_tThe probability α of being in state Si of event c at time t_t(I)
α_t(I) = P (O₁O₂... O_t, S_i| C) (31)
It is. This α_t(I) can be calculated recursively as follows, and the probability P (O | c) of the observed symbol string output can be obtained.
[0014]
[Expression 10]

[0015]
Where πi is the initial probability of state i, b_j(O_t) Is O from state i_tThe probability that a_ijRepresents the transition probability from state i to state j. Then, both the transition probability and the output probability can be estimated from the teacher signal by the Baum-Welth algorithm (see reference [14]) (see reference [18]).
In addition, Baum and Sell (see Ref. [19]) proposed a continuous density HMM (CDHMM) with some constraints on the Baum-Welth algorithm. A mixed Gaussian distribution model is used for the density function, greatly improving the ability of the hidden Markov model. Juang et al. (See reference [20]) extended this CDHMM estimation algorithm and applied it to speech recognition. In this way, the HMM has a high trial capability and is a very effective method in practice.
[0016]
[Problems to be solved by the invention]
However, since the structure of the HMM is unknown, the parameters must be determined based on experience and definition. Further, learning of HMM requires a large amount of learning data, and when there is little learning data, the necessary parameters may not be estimated well. In recent years, therefore, research on fusion of HMMs and neural networks has been actively carried out (see documents [21] and [22]). Bridle (see reference [21]) et al. Showed a method of expressing an HMM with a neural network structure. However, this method only expands the structure of the HMM in the network, and the essential capabilities are the same. Bourlard and Wellekens (see reference [22]) revealed that any network with appropriate recurrent coupling can function as an HMM.
[0017]
It is known that a recurrent type (mutual coupling type) neural network can express a dynamic model by coupling units to each other and feeding back signals internally. However, there is no known method for incorporating a recurrent structure into a neural network capable of pattern recognition using a mixed Gaussian distribution, and it was not clear whether it would be a structure that could be learned as a whole even if it was incorporated.
[0018]
Also, in back propagation through time, it is difficult to predict the learning end time, and the time required for learning varies greatly depending on the difficulty of the problem. In particular, when learning must be performed before execution, such as a biometric signal pattern recognition problem, the burden on the user waiting for the end of learning is large. This is because the convergence time of the energy function of the neural network is not known in advance.
[0019]
In view of the above, an object of the present invention is to provide a neural network capable of learning time series characteristics with a simple structure. It is another object of the present invention to provide a neural network system having five layers based on a logarithmic linearized mixed normal distribution model, having a recurrent structure, and capable of dealing with dynamic data. Another object of the present invention is to provide a neural network that can reduce the burden on the user waiting for learning convergence by clearly indicating the end time before learning.
[0020]
[Means for Solving the Problems]
The present invention proposes a new recurrent type neural network R-LLGMN that can adapt to dynamic data that changes with time for the purpose of improving the performance of LLGMN. This neural network contains a hidden Markov model, which is one of dynamic probability models, and aims to be able to learn time series characteristics that were not possible with LLGMN. Another object of the present invention is to express statistical parameters such as transition probabilities and output probability density functions as network weights, and to adjust the weights according to a learning rule of an error back propagation method. In order to verify the effectiveness and validity of the proposed method, we performed identification experiments on EEG signals that fluctuate with time and achieved sufficient identification accuracy. Furthermore, since the terminal learning method is introduced, the present invention has an object to display the end time at the time of biometric signal pattern recognition in advance and reduce the mental burden on the user.
[0021]
According to the first solution of the present invention,
A hierarchical neural network, at least preprocessing means for nonlinearly converting the input;
Layers constituting a mixed Gaussian distribution model;
A layer having a recurrent structure with feedback between the layers;
A neural network is provided.
[0022]
According to the second solution of the present invention,
A pre-processing means for inputting an input vector that has been input, combining each element of the input vector, and performing non-linear transformation and outputting;
A first layer that receives a value nonlinearly transformed by the preprocessing means, divides the input into a plurality of outputs, and
A second layer that outputs an output of each unit of the first layer, integrated with a value obtained by multiplying a weighting coefficient obtained by learning, and converts and outputs a predetermined function;
An integrated output of the components of the distribution model corresponding to the state k of each unit in the second layer is input, and the same event is multiplied by the output one hour before the corresponding state in the fourth layer. A third layer to output;
A fourth layer that receives an output for the state of each unit in the third layer and divides the input by the combined output of the state and the event / class;
A fifth layer for inputting an integrated output for the state of each unit of the fourth layer, and outputting the input as an analysis result of the event / class;
A neural network system is provided.
[0023]
According to the third solution of the present invention,
The processing unit determines the number of states k (k = 1, 2,..., K_c), Component number m (m = 1, 2,..., M_{c, k}), Number of events / classes c (c = 1, 2,..., C), number of input signals d, time series signal length L, convergence time t_fA function for setting the number of learning data N;
The processing unit determines the learning rate η based on terminal learning._tWith the ability to calculate
A forward calculation function in which the processing unit performs forward calculation by the neural network as described above based on the number of learning times T and the number of learning data N;
The processing unit uses the weight w^c _{k ′, k, m, h}A function for storing learning results including
A neural network system is provided.
[0024]
According to the fourth solution of the present invention,
A pre-processing function that inputs an input vector that has been input, combines each element of the input vector, and outputs a non-linear transform;
The first layer inputs a value nonlinearly converted by the preprocessing means, and branches the input into a plurality of outputs, and
A function in which a value obtained by multiplying the output of each unit of the first layer by a weighting coefficient acquired by learning is input by the second layer, and converted and output by a predetermined function When,
The third layer inputs the integrated output of the components of the distribution model corresponding to the state k of each unit of the second layer, and one time before the corresponding state of the fourth layer in the same event A function of multiplying the output of
The fourth layer inputs an output about the state of each unit of the third layer, and divides the input by an integrated output of the state and the event / class,
A function in which the output of the state of each unit in the fourth layer is integrated by the fifth layer, and the input is output as an analysis result of the event / class;
A neural network processing program for causing a computer to realize the above is provided.
[0025]
DETAILED DESCRIPTION OF THE INVENTION
1. Log linearization of HMM with mixed Gaussian distribution
The network according to the embodiment of the present invention includes a hidden Markov model (see reference [14]), which is one of dynamic probability models, as a network structure, and a temporal error back-propagation algorithm (reference [15]). The weight can be adjusted in a learning manner.
Therefore, in the following, it will be clarified that the network (R-LLGMN) based on the recurrent mixed Gaussian distribution model and logarithmic linear model having the above configuration includes the HMM as a structure.
[0026]
FIG. 1 is a diagram showing an HMM having a mixed Gaussian continuous probability density distribution.
There are C events of interest, and each event (cε {1, ..., C}) is K_cIt consists of individual states. At this time, given time series x (t) = [x (1), x (2),..., X (t)]^TThe posterior probability P (c | x (t)) of the event c with respect to is as follows.
[0027]
## EQU11 ##

[0028]
However, γ^c _{k ', k}Is the probability of transition from state k 'to state k at event c, b^c _k(X (t)) represents the posterior probability from the state k of the event c corresponding to x (t). Also, prior probability π^c _kIs P (c, k) |_{t = 0}be equivalent to. Where posterior probability b^c _k(X (t)) is the number of components M_{c, k}(See Refs. [8] and [23]), γ in Eq. (37)^c _{k ', k}b^c _k(X (t)) is
[0029]
[Expression 12]

[0030]
It becomes. However, γ_{c, k, m}, Μ^{(C, k, m)}∈R^d, Σ^{(C, k, m)} ∈ R^{d × d}Are the degree of mixing, the mean vector, and the covariance matrix, respectively. Mean vector μ^{(C, k, m)}= (Μ^{(C, k, m)} ₁, ..., μ^{(C, k, m)} _d)^TAnd the inverse of the covariance matrix
Σ^{(C, k, m) -1}= [S^{(C, k, m)} _ij]
Is used, the right side of equation (39) is
[0031]
[Formula 13]

[0032]
And can be expanded. Where δ_ijIs the Kronecker delta, which is 1 when i = j and 0 when i ≠ j. Here, (40) is logarithmically linearized (see reference [8]).
[0033]
[Expression 14]

[0034]
Here, equation (42) represents the non-linear preprocessing of R-LLGMN (see equation (1)), and ξ^c _k’,_{k, m}Is the coefficient vector β^c _k’,_{k, m}And converted input X∈R^HIt can be expressed as a product of From the above, β^c _k’,_{k, m}Can be expressed as a neural network structure.
However, since the following equation that is the sum of the posterior probabilities P (c, k | x (t)) is 1, ξ^c _k’,_{k, m}Is redundant.
[0035]
[Expression 15]

[0036]
Therefore, a new variable Y^c _k ^',,_{k, m}Coefficient vector W^c _k ^',_{k, m}Introduced
[0037]
[Expression 16]

[0038]
It becomes. However, W by definition^c _{Kc, kc, Mck}= 0. In the present invention, this vector W^c _k ^' _{, K, m}Is regarded as an unconstrained weighting factor. At this time, the equation (37) is
[0039]
[Expression 17]

[0040]
It becomes.
On the other hand, when the mixed Gaussian distribution model and the log-linear model are similarly introduced when t = 1 in the equation (38),
[0041]
[Formula 18]

[0042]
Get. Here, in the present invention, W^c _k’,_{k, m}  (1) = W^c _k’,_{k, m}I think. Because W^c _k’,_{k, m}  (1) and W^c _k’,_{k, m}Both are unconstrained variables released from statistical constraints and contain a large number of unknown statistical parameters. From the above, the degree of mixing γ_{c, k, m}, Mean vector μ^{(C, k, m)}, Covariance matrix Σ^{(C, k, m)}, Transition matrix γ^c _k’,_kIt is now possible to replace many parameters such as with a smaller number of parameters. That is, R-LLGMN is a coefficient vector W^c _k’,_{k, m}Can be learned as a weighting factor of the neural network, and this weighting factor can be learned in a learning manner by the error back propagation learning rule.
[0043]
In LLGMN, it has been found that if the evaluation function J at the time of learning is set by the following equation (101), the evaluation function can converge to the equilibrium point by changing the weighting factor according to the learning rate η. (Ref. [16]). Similarly, in the R-LLGMN network, the following equation (101) shows that convergence is possible, so that the evaluation function can be converged within a specified finite time. As a result, even if the convergence time is somewhat longer, the burden on the user waiting for the end can be reduced.
dJ / dt = -ηJ^β                  (101)
[0044]
2. Introduction of recurrent structure
FIG. 2 shows a configuration diagram of the neural network proposed in the present invention. This network is a 5-layer recurrent neural network having feedback between the third and fourth layers. First, input vector x (t) ∈ R^d(T = 1,..., T) is converted in the same manner as LLGMN, and X (t) ∈ R^HIs the input of the first layer. The input / output relationship of the first layer is
⁽¹⁾I_h(T) = X_h(T) (8)
⁽¹⁾O_h(T) =⁽¹⁾I_h(T) (9)
It becomes. here,⁽¹⁾I_h(T) and⁽¹⁾O_h(T) represents the input / output of the h-th unit.
[0045]
Second layer units {c, k, k ', m} (c = 1,..., C; k, k' = 1,..., K_c; M = 1, ..., M_c, K) is a weighting factor w^c _{k ′, k, m, h}Receive through. Input to the second layer unit⁽²⁾  I^c _{k ′, k, m}(T), output⁽²⁾  O^c _{k ′, k, m}(T)
[0046]
[Equation 19]

[0047]
It becomes. However, K_cIs a parameter corresponding to the number of states of the hidden Markov model, M_{c, k}Represents the number of components of the mixed Gaussian distribution model corresponding to class c and state k. (3.2)
The input to the unit {c, k, k ′} of the third layer is an integrated output of the units {c, k, k ′, m} (m = 1,..., Mc, k) of the second layer. It is. Also, the value obtained by multiplying the input by the output of the fourth layer one time ago becomes the output of the third layer. The input / output relationship is
[0048]
[Expression 20]

[0049]
It becomes. However, the initial state is, for example,⁽⁴⁾O^c _k,(0) = 1.0.
4th layer input⁽⁴⁾I^c _k(T), and output⁽⁴⁾O^c _kThe relationship with (t) is
[0050]
[Expression 21]

[0051]
Given in.
Finally, unit c of the fifth layer is K of the fourth layer_cUnit {c, k} (k = 1,..., K_c) Output. The input / output relationship is
[0052]
[Expression 22]

[0053]
It becomes.
Here, the length T of the time series signal is 1, and the number of states K_cConsider the unit in the fifth layer when is 1. If t = 1 at all times, the feedback in equation (13) makes no sense. At this time, the relationship from the second layer to the fifth layer (equation (10)-(17)) is summarized,
[0054]
[Expression 23]

[0055]
Can be written. This is the same as the relationship between the second layer and the third layer in LLGMN (see equations (4) and (5)). That is, when the target signal is not a time-series signal or when feedback integration is not important, R-LLGMN results in LLGMN.
[0056]
3. Learning rules
Now, at time T, the nth input vector x (t)^(N)Teacher vector T corresponding to^(N)= (T₁ ^(N), ..., T_c ^(N),..., T_C ^(N))^,Consider the case where (n = 1,..., N) is given. T_c ^(N)Takes 1 when the observed event is c, 0 otherwise, and multiple classes do not become 1 at the same time. The R-LLGMN learns using L time series signals (N = L × C) prepared for each of the C classes as learning data. The network evaluation function J for the learning data is
[0057]
[Expression 24]

[0058]
Learning is performed to minimize this, that is, to maximize the log likelihood. However,⁽⁵⁾O^c(T)^(N)Is the input vector x (t)^(N)Means output at time T. At this time, weight w^c _{k ′, k, m, h}Correction amount △ w^c _{k ′, k, m, h}The
[0059]
[Expression 25]

[0060]
It is defined as However, η> 0 is a learning rate. Moreover, in order to consider the influence of the recurrent coupling included in R-LLGMN, a time-dependent error reversal seeding method (BPTT) is used (see reference [15]). This is a method of accumulating errors in a time series and calculating a weight correction amount. Here, when the right term of equation (21) is expanded using chain rule, equation (22) is obtained.
[0061]
[Equation 26]

It is.
[0062]
Δ^{c '} _{k ”}(T) is J_nof⁽⁴⁾O^{c '} _{k ”}It is a partial differentiation by (T−t). Since the equation (24) is established, the above equation becomes the equations (25) to (26).
[Expression 27]

[0063]
Furthermore, in the present invention, the concept of terminal learning (see reference [16]) (TL) is taken into the learning rule, and the learning of the neural network is converged to the equilibrium point within the finite time specified by using TL. Where weight w^c _{k ′, k, m, h}As a time-dependent continuous variable, this time derivative is
[0064]
[Expression 28]

[0065]
Where η_tIs the learning rate and β (0 <β <1) is a constant. At this time, the time derivative of the evaluation function J is calculated as follows.
[0066]
[Expression 29]

[0067]
From equation (29), it can be seen that the evaluation function J becomes a monotonous non-increasing function, and learning converges stably. When calculating this convergence time,
[0068]
[30]

[0069]
It becomes. However, J₀Is the initial value of the evaluation function._fIs the convergence value of J at the equilibrium point. J_fWhen = 0, the equal sign of equation (30) is established, and the learning rate η_tIt can be seen that the convergence time can be specified using. On the other hand, J_fEven when ≠ 0, the convergence time always arrives at the equilibrium state at a time earlier than the upper limit value given by the equation (30).
[0070]
From the above, the learning end time t_fLearning rate η_tAnd this η_tIf the learning is completed by substituting into the equation (30), the time t_fJ = 0. During learning, the user is prompted for the remaining learning time t_f-Display values of t and J as needed. Thereby, the mental burden of the user who waits for completion of learning can be eased.
[0071]
4). Neural network processing
FIG. 3 shows a configuration diagram of hardware for realizing the neural network.
This configuration includes a storage unit 1, an input unit 2, an output unit 3, a processing unit 4, and an interface I / F 5.
The storage unit 1 stores various data. The storage unit 1 learns, for example, a first table 11 that stores initial values, a second table that stores data relating to expressions representing the configuration of each layer of the neural network, a third table that stores intermediate results, and weights. A fourth table for storing results and final results is provided. The input unit 2 is an input device such as a keyboard or a pointing device, and inputs and sets various data related to initial values, evaluation functions, neural networks, and the like. The output unit 3 is various output devices such as a display, a medium driving device, and an interface, and outputs appropriate data such as initial values, intermediate results, and final results. The processing unit 4 reads various data such as initial values, intermediate results, learning results, and final results from the storage unit 1 and writes them into the storage unit 1. The processing unit 4 also executes various calculation processes and control processes for intermediate results, learning results, final results, and the like. Further, the processing unit 4 outputs various data such as intermediate results, learning results, and final results to the output unit 3. The interface (I / F) 5 is for transmitting / receiving information to / from various signal measuring devices such as a power saving signal and an electroencephalogram signal as will be described later.
[0072]
FIG. 4 is a flowchart showing the learning process of the neural network.
First, the processing unit 4 performs initial setting in the storage unit 1 (S101). In the initial setting, for example, the number of states K_c, Component number M_{c, k} , Class number C, input signal number d, time-series signal length L, convergence time t_fThe number N of learning data is set. Next, the processing unit 4 calculates a learning rate ηt based on terminal learning (see equations (27) and (28)) (S103). The processing unit 4 reads the learning count T stored in the storage unit 1 and sets it as an initial value (S105). Here, the learning frequency T is the set convergence time t._f(See “3. Learning Rule”) (S105). The processing unit 4 further reads the learning data number N stored in the storage unit 1 and sets it as an initial value (S107). Next, the processing unit 4 executes a forward calculation of R-LLGMN (S109).
[0073]
FIG. 5 is a flowchart showing forward calculation processing of R-LLGMN.
In the R-LLGMN forward calculation process, first, the processing unit 4 reads the time-series signal length L from the storage unit 1 and sets it as an initial value (S201). The processing unit 4 executes R-LLGMN calculation based on the equations (8) to (19) (S203). The processing unit 4 updates the time-series signal length L by subtracting 1 and stores it (S205). The processing unit 4 repeats the processes of steps S203 and S205 until the time series signal length L becomes 0 or less. Thereafter, the processing unit 4 calculates the calculation result obtained by the equation (19).⁽⁵⁾O (t) is stored in the storage unit 1 and output from the output unit 3 as necessary (S209).
[0074]
After the processing of step S109 is executed as described above, the processing unit 4 updates the learning data number N by 1 and stores it (S111). The processing unit 4 repeats the processes of steps S109 and S111 until the learning data number N becomes 0 or less (S113). Thereafter, the processing unit 4 calculates the weight w based on the equation (21).^c _{k ′, k, m, h}Is corrected (S115). The processing unit 4 updates the learning count T by 1 and stores it (S117). The processing unit 4 repeats the processing from step S107 to S117 until the learning count T becomes 0 or less. Thereafter, the processing unit 4 determines the weight w^c _{k ′, k, m, h}Is stored in the storage unit 1 (S121).
[0075]
Next, FIG. 6 is a block diagram for executing a biological signal identification process using the neural network according to the present invention.
This configuration includes a feature extraction unit 11, an identification unit 12, a buffer memory 14, and an R-LLGMN calculation unit 13. A biological signal (such as an electroencephalogram signal (EEG) or a myoelectric potential signal (EMG)) is input to the feature extraction unit 11. The feature extraction unit 11 extracts identification data in accordance with the input biological signal. For example, when an electroencephalogram signal is input as a biological signal, a vector whose element is an average value of a power spectrum in a specific frequency band is extracted as identification data. Further, when a myoelectric potential signal is input as a biological signal, a vector having the amplitude value after rectification and smoothing as an element is extracted as identification data. Next, the identification unit 12 stores the identification data extracted by the feature extraction unit 11 in the storage unit, or the R-LLGMN calculation unit 13 outputs. The R-LLGMN calculation unit 13 performs calculation by the above-described neural network, corresponds to the configuration shown in FIG. 3, and can send and receive signals via the interface I / F 5. The R-LLGMN 13 executes a forward R-LLGMN calculation based on the identification data read from the storage unit or given from the identification unit 12 (described later). The identification unit 12 outputs the output result of the R-LLGMN calculation unit 13⁽⁵⁾O_cA class number c indicating the highest value (or a value higher than a predetermined threshold value) in (t) is selected as an identification result. The buffer memory 14 holds the identification result of the identification unit 12. The identification result is output to the buffer memory 14 by the processing unit.
[0076]
FIG. 7 is a flowchart showing identification processing using a neural network according to the present invention.
The R-LLGMN calculation unit 13 outputs the weight w output from the storage unit at the end of learning.^c _{k ′, k, m, h}Is read (S301). The R-LLGMN calculation unit 13 inputs identification data from the storage unit or the identification unit 12 (S303). The identification data is, for example, data obtained by extracting features from an electroencephalogram signal (EEG) or an electromyogram signal (EMG). The R-LLGMN calculation unit 13 executes the forward calculation process of R-LLGMN as described above (S305). The R-LLGMN calculation unit 13 is calculated by the forward calculation process.⁽⁵⁾O_cThe class number c indicating the highest value in (t) (or a value higher than a predetermined threshold value) is output as an identification result (S307).
[0077]
5). EEG signal discrimination experiment
In order to verify the effectiveness of the neural network proposed in the present invention, an identification experiment was conducted using an electroencephalogram (EEG) signal that fluctuates with time.
5.1 EEG signal measuring device)
[0078]
FIG. 8 is an overall view of the EEG signal measuring apparatus. A simple small electroencephalograph was used to measure the EEG signal. The measured EEG is taken into the computer wirelessly. The measured EEG signal is amplified and A / D converted after passing through a high-pass and low-pass (40 [Hz] analog filter. The electrodes are fixed to the headband, and the difference between the brain waves derived by bipolar derivation is obtained. Most of the measurement noise can be eliminated.
The EEG signal was measured under the following two conditions. The learning data for 120 seconds was measured, and the brain waves for identification were measured for 420 seconds by changing the visual stimulus according to the random time interval created by the M series.
[0079]
(1) Opening and closing eyes
The subject sits on a chair in a normal computer room and measures the time-series electroencephalogram pattern when the eyes are opened and closed in this state of resting.
(2) Opening and closing eyes and flash stimulation
Subject sits in a dark chair in the computer room. Keep calm. Install a flashlight (light source: xenon, energy: 1.76 [J]) that gives light stimulation (flashing at 4 [Hz]) at a position approximately 50 [cm] away from this subject. Measure time-series brain wave patterns during stimulation.
[0080]
5-2. Feature extraction of time series EEG signals
Since the electroencephalograph used in the present invention has only one pair of electrodes, spatial information regarding the part where the electroencephalogram is generated cannot be used. Therefore, in order to extract more information from the pair of electrodes, the frequency component of the EEG signal is used.
The EEG signal measured as preprocessing is subjected to fast Fourier transform (FFT) every 128 samples, and the frequency spectrum is calculated. Next, the effective classification band (0 to 35 [Hz] in the spectrum is divided based on δ, θ, α, β, etc. in clinical electroencephalography. Then, the average value of the power spectrum is calculated for each band. In this case, the time series data of the average values was created so that the time series range was normalized for each band from 0 to 1. The frequency bands used in this experiment were 0 to 8, 9 to Two bands of 35 [Hz] were used.
[0081]
5-3 Open / closed eye identification
First, FIG. 9 is a diagram illustrating an example of an identification result when the examinee A opens and closes his eyes. This figure shows the state of signal processing during estimation. The timing of changing the state of eye opening / closing from the top, the EEG signal corresponding to the two frequency bands, the output of LLGMN, and the combination of LLGMN and neural filter (Referred to as “LLGMN with NF”) (refer to documents [8] and [24]), R-LLGMN output, and identification results. The R-LLGMN setting is the number of states K₁= K₂= 1, number of components M_1,1= M_2,1= 1, learning data length T = 5, learning data number L = 3.
[0082]
In addition, the LLGMN part of LLGMN with NF has several M components._1,1= M_2,1= 3, number of learning data 112 (each 56). The NF section has 8 units, 168 learning data for each NF, and 5 retrospective time steps to consider. This NF has recurrent coupling in the intermediate layer, and can filter nonlinear signals. The recognition rate of R-LLGMN at this time is 97.6%, and a fairly high recognition rate can be realized. Also, R-LLGMN has a slightly improved signal stability, although the rise of the output waveform is somewhat delayed compared to LLGMN with NF.
[0083]
FIG. 10 is a diagram showing the results of experiments performed on three subjects (A, B: male, C: female). Here, for comparison, four methods of R-LLGMN, LLGMN, LLGMN with NF, and HMM (Note 1) proposed in the present invention were used. For each method, 10 different initial weights using random numbers from 0 to l were used to conduct experiments, and the respective values were the average value and standard deviation of each method. HMM setting is number of states K₁= K₂= 1, learning data length T = 5, and learning data number 40.
[0084]
In common with the data of five people, the other three methods show higher identification rates than LLGMN. This indicates that the LLGMN that contains only a static statistical model cannot cope with a dynamic EEG signal. On the other hand, the other three methods include a dynamic statistical model and are therefore adapted to EEG signals to achieve a high identification rate. However, since LLGMN with NF has a large network, LLGMN and NF must be learned separately, and learning is very difficult. [B] HMM is relatively easy to learn, but requires a large amount of learning data. On the other hand, R-LLGMN can learn dynamic and static characteristics at once with less learning data. Therefore, it can be seen that it is a very effective technique that compensates for the disadvantages of both LLGMN with NF and HMM.
[0085]
5-4 Identification of tristate light stimulus
Next, FIG. 11 is a diagram illustrating an example of the identification result when the subject A opens / closes eyes and flashes. It can be seen that the output of R-LLGMN is disturbed compared to the case of only opening and closing eyes (FIG. 9), and it is very difficult to identify the state of the subject. This is because there were few visual stimuli other than the luminance change at the time of the flash stimulus, and no clear change was observed in the EEG signal. The recognition rate at this time was 87.4%.
[0086]
FIG. 12 is a diagram showing identification results for five subjects. Although the discrimination rate is lower than the EEG pattern discrimination result when the eyes are opened and closed, on average, the discrimination rate of R-LLGMN is the highest, and it can be seen that discrimination is possible.
Next, the results when the experiment was performed by changing the number of learning data are shown. In this experiment, R-LLGMN and HMM were used for comparison. Each setting is time series length T = 5, number of components Mc, k = 1, upper logarithm Kc = 1, and the number of learning data is changed to T = 5, 10,. Note that the number of quantization levels of the HMM is QL = 4.
FIGS. 13 and 14 are diagrams showing changes in the identification rate (HMM and R-LLGMN) with respect to the number of teacher signals at that time. In this case, learning is performed with 10 different initial weights, and a considerably high identification rate is realized even if there are at least 210 untrained time-series data.
From the above results, it can be seen that a large amount of sample data is required for HMM learning. Also, LLGMN with NF has a large network, so LLGMN and NF must be learned separately, which makes learning very difficult [16]. On the other hand, R-LLGMN can learn dynamic and static characteristics at a time with less learning data. Therefore, it is possible to compensate for the disadvantages of both LLGMN with NF and HMM, which is a very effective technique for EGG identification.
[0087]
Finally, we investigated the change in identification rate due to the number of states and the difference in components. The number of learning data is L = 3, the time series length is T = 5, the number of components is changed to Mc, k = 1, 2,..., 10, and the number of states is changed to Kc = 1, 2,. c = 1,2,3).
[0088]
FIG. 15 is a diagram illustrating a change in the identification rate with respect to the number of states and the number of components at that time (subject E). This is the result of learning with 10 different initial weights and calculating the average and standard deviation of the discrimination rate for 210 untrained time-series data. From the figure, the expressive power of R-LLGMN increases by increasing the number of components and the number of states. It can be seen that the identification accuracy is improved.
[0089]
6). Application to EMG drive robot system
The purpose of this embodiment is to identify EMG signal patterns using a neural network. The EMG information has an amplitude pattern, a frequency pattern, and time series characteristics. However, most of the conventional identification methods do not use time series characteristics. Further, since an existing neural network is used, a statistical pattern according to the EMG signal is not used. Therefore, in this report, we introduce a new neural network that introduces a recurrent structure in the network and can cope with temporal changes of EMG signals. Experiments prove the high discrimination ability of this neural network.
[0090]
6-1 Introduction
In recent years, EMG-controlled electric prosthetic hands have been actively developed to support upper limb amputees (see References 1 and 2). When the EMG signal is used as an interface for a prosthetic hand, it is difficult to estimate the motion analytically because the muscle characteristics differ depending on the amputee and the measurement position. Therefore, conventionally, a method of performing action identification by learning has been tried. For example, Farry et al. Proposed a method of remotely operating a robot hand from EMG frequency information (see reference [25]). Huang and Chen also identified Eight motions using EMG integral myoelectricity etc. using an error back-propagation neural network (hereinafter abbreviated as BPNN) (see reference [26]).
The authors have so far built an EMG control type manipulator system for human support. This system identifies EMG signals using a Log-Linearized Gaussian Mixture Network (hereinafter abbreviated as LLGMN) (refer to reference [27]) based on a mixed Gaussian distribution model and a log-linear model, and the identification results are transmitted to the robot. It is used as a manipulator control means (see reference [28]).
[0091]
However, none of the conventional methods considers the history of the input signal and only learns the static characteristics of the problem of interest. For this reason, there is a limit in identifying a dynamic time-series signal whose target varies with time.
Therefore, in this embodiment, an attempt was made to introduce a new neural network having a recurrent structure in the pattern identification unit. Since this neural network includes a hidden Markov model as a network structure, it is possible to identify a dynamic time series signal. The outline of the prosthetic hand robot system used in the present embodiment will be described, and the result of an EMG identification experiment by a cutting person will be shown.
[0092]
6-2 EMG-controlled human support manipulator
6.2.1 System configuration
FIG. 16 shows a configuration diagram of an EMG-controlled human support manipulator system (see document [28]). This system is composed of an arm control unit and an artificial hand control unit. Move Master RM-501 (manufactured by Mitsubishi Electric Corporation) was used as the arm control unit, and an ultrasonic motor-driven power prosthesis (see reference [29]) was used as the prosthetic hand control unit. The arm control unit uses a magnetic three-dimensional position sensor as an input device to realize a large movement of the entire arm. On the other hand, the prosthetic hand control unit identifies the intended forearm motion from the operator's EMG signal using a neural network, and realizes fine movement of the hand. At this time, by introducing an impedance model of the human forearm into the artificial hand control, a supple movement like a human being is realized (see reference [30]).
[0093]
6-2.2 Action identification by neural network
6-2.2.1 EMG signal processor
First, the EMG signal measured from the L pair of electrodes attached to the operator's arm is full-wave rectified for each channel, and then passed through a second-order digital Butterworth filter (cutoff 1 [Hz]). The smoothed signal is sampled at a sampling frequency of 100 [Hz]. Then, the signal normalized so that the sum of all channels of this signal becomes 1 is input vector x (t) = [a₁(T), a₂(T), ..., a_L(T)]^T∈R^L(T = 1,..., T).
[0094]
6-3 Experiment
In order to verify the effectiveness of the developed neural network, an operation identification experiment using EMG signals was performed. Here, for comparison, three methods of R-LLGMN, LLGMN (see reference [27]) and BPNN were used. The test subject was a male (42 years old) who had cut his forearm at a site about 15 cm from the right wrist due to an accident about a year ago. R-LLGMN settings are: number of electrodes L = 8, number of target movements C = 8 (palm flexion, dorsiflexion, pronation, pronation, grip, open, wrist co-contraction, hand co-contraction), number of states K₁= K₂= 1, number of components M_1,1= M_2,1= 1, learning data length T = 4, and learning data number N = 5. The BPNN was determined by trial and error as the hidden layer 2 and the number of units (input layer-8, second layer-10, third layer-10, output layer-8). The EMG signal was collected for 2.5 seconds for each operation. However, a 15-second break was taken between each action.
[0095]
FIG. 17 shows the identification result. The values in the table are the mean (Mean) and standard deviation (SD) of the discrimination rate for each action. From the table, it can be seen that R-LLGMN has a higher identification rate than LLGMN, and the identification accuracy is improved by using past history information. It can also be seen that both LLGMN and R-LLGMN have a smaller standard deviation than BPNN and can be identified stably.
[0096]
6-4 Summary
In the present embodiment, the ability of a new neural network having recurrent coupling, R-LLGMN, was compared and an introduction to an EMG controlled human support manipulator system was attempted. As a result, it was shown that the characteristics of the time series signal can be used effectively, and the identification accuracy is improved. In the future, we plan to consider pre-processing methods suitable for R-LLGMN to improve the ambiguity discrimination between continuous motions.
[0097]
7). References
[1] D. E. Rumelhart and J. L. McClelland, "Learning Internal Representations by Error Propagation,"
Parallel Distributed Processing, Explorations in the Microstructure of Cognition, Vol. 1, D.E.Rumel-
hart, J.L.McClelland, and the PDP Research Group, Eds.MIT Press, 1986, pp. 318-362.
[2] T. Caelli, D.M. Squire, and T.P.J.Wild, "Model-based neural networks," Neural Networks, Vol. 6, No. 5, pp. 613-625, 1993.
[3] T. Tsuji and K. Ito "Preorganized Neural Networks: Error Back-Propagation Learning of Manipu-
lator Dynamics, "Journal of Artificial Neural Networks, Vol. 2, No. 1-2, pp. 81-95, 1995.
[4] M.D. Richard and R.P. Lippmann, "Neural network classifiers estimate Bayesian a posteriori probabilities," Neural Computation, Vol. 3, pp. 461-483, 1991.
[5] H. G. C. Traven, "A neural network approach to statistical pattern classification by" semiparametric "estimation of probability density functions," IEEE
Trans. Neural Networks, Vol. 2, No. 3, pp. 366-377, 1991.
[6] L.I. Perlovsky and M.M. McManus, "Maximum likelihood neural networks for sensor fusion and adaptive classification," Neural Networks, Vol. 4, No. 1, pp. 89-102, 1991.
[7] Toshio Tsuji, Daiichiro Mori, Koji Ito “EMG Motion Discrimination Method Using Neural Networks Incorporating Statistical Structure,” IEEJ Transactions C, Vol. 112-C, No. 8, pp. 465-473, 1992 .
[8] Toshio Tsuji, Hiroyuki Ichinobu, Makoto Kaneko, “Feed-forward neural network using mixed normal distribution model”, IEICE Transactions D-II, Vol. 77, No. 10, pp. 2093- 2100,1994.
[9] M.I. Jordan and R.A. Jacobs "Hierarchical combinations of experts and the EM algorithm," Proc. IEEE Int. Joint Conf. Neural Networks 1993, Vol. II, pp. 1339-1344.
[10] S. Lee and S. Shimoji, "Self-organization of Gaussian mixture model for learning class pdfs in pattern classification," Proc. IEEE Int. Joint Conf. Neural Networks 1993, Vol. III, pp. 2492-2495.
[11] R.L. Streit and T.E. Luginbuhl, "Maximum likelihood training of probabilistic neural networks,"
IEEE Trans. Neural Networks, Vol. 5, No. 5, pp. 764-783, 1994.
[12] B.S. Everitt and D.J. Hand, "Finite Mixture Distributions", Chapman and Hall, 1981.
[13] O. Fukuda, T. Tsuji, A. Ohtsuka, and M. Kaneko, "EMG-based human-robot interface for rehabilitation aid," Proc IEEE Int Conf on Robotics and Automation, pp. 3492-3497, Leuven, 1998.
[14] L.E. Baum and T. Petrie, "Statistical inference for probabilistic function of finite state Markov chains," Ann. Math. Stat., Vol. 37, No. 6, pp. 1554-1563, 1966.
[15] P.J. Werbos, "Back propagation through time: what it does and how to do it," Proceedings of the IEEE, Vol. 78, No. 10, pp. 1550-1560, 1990.
[16] T. Tsuji, O. Fukuda, M. Kaneko and K. Ito "Pattern Classification of Time-series EMG Signals Using Neural Networks," International Journal of Adaptive Control and Signal Processing. (In press)
[17] L. R. Rablner, "A tutorial on hidden markov model and selected applications in speech recognition," Proceedings of the IEEE, Vol. 77, No. 2, pp. 257-286, 1989.
[18] A.P. Dempster, N. M. Larld and D.B. Rubin, "Maximum likelihood from incomplete data via the
EM algorithm, "J. Roy. Stat. Soc. Sereies B, Methodological, Vol. 39, No. 1, pp. 1-38, 1977.
[19] L. E. Baum and G. R. Sell, "Growth functions for transformations on manifold," Ann. Math. Stat.,
Vol. 27, No. 2, pp. 211-2271 1968.
[20] B.H. Juang, S.E. Levinson, and M.M. Sondhi, "Maximum Likelihood estimation for multivariate
mixture observations of Markov chains, "IEEETrans. Informat. Theory, Vol. IT-32, No. 2, pp. 307-309, 1986.
[2l] J.S. Bridle, "Alpha-nets: A recurrent 'neural' network architecture with a hidden Markov model interpretation," Speech Communication, Vol. 9, No. 1, pp. 83-92, 1990.
[22] H Bourlard and C. J. Wellekens, "Links between Markov models and multilayer perceptrons," in Advances in Neural Information Processing Systems I, D.S. Touretzky, Eds. Morgan Kaufmann, Los Altos, CA, 1989, pp. 502-510.
[23] D.M.Titterington, A.F.M.Simth and U.E.Markov, Statistical analysis of finite mixture distributions, John Wiley & Sons, New York, 1985.
[24] Osamu Fukuda, Toshio Tsuji, Makoto Kaneko, “Discrimination of time-series EEG patterns using neural networks,” IEICE Transactions, D-II, Vol. J88, No. 7, pp. 1896-1903, 1997.
[25] K. A. Farry, I. D. Walker and R. G. Baraniuk, `` Myoelectric Teleoperation of a Complex Robotic Hand '', IEEE Transactions on Robotics and Automation, Vol. 12, No. 5, pp. 775-787, 1996.
[26] H.-P. Huang and C.-Y. Chen,
`` Development of a Myoelectric Discrimination System for a Multi-Degree Prosthetic Hand '',
Proceedings of the 1999 IEEE International Conference on Robotics and Automation, Vol. 3, pp. 2392-2397, 1999. [27] Toshio Tsuji, Hiroyuki Ichinobu, Makoto Kaneko,
"Feed forward neural network using mixed normal distribution model", IEICE Transactions, D-II, Vol. 77, No. 10, pp. 2093-2100, 1994.
[28] Osamu Fukuda, Toshio Tsuji, Makoto Kaneko,
`` Manually controlled human support manipulator using EMG signal '',
Journal of the Robotics Society of Japan, Vol. 18, No. 3, pp. 387-394, 2000.
[29] Ito (Hiro), Nagaoka, Kaoru, Kato, Ito (Correct),
`` 3-DOF forearm myoelectric prosthesis using an ultrasonic motor '',
Transactions of the Society of Instrument and Control Engineers, Vol. 27, No. 11, pp. 1281-1289, 1991.
[30] Toshio Tsuji, Hiroki Shigeyoshi, Osamu Fukuda, Makoto Kaneko,
`` Biomimetic control of forearm power prosthesis based on EMG signal '',
Transactions of the Japan Society of Mechanical Engineers (C), Vol. 66, No. 648, pp. 2764-2771, 2000.
[0098]
The neural network or system of the present invention includes a neural network calculation method, a neural network processing program for causing a computer to execute each procedure, a computer-readable recording medium storing the neural network processing program, and neural network processing. The program product can be provided by a program product that can be loaded into the internal memory of the computer, a computer such as a server that includes the program, and the like.
[0099]
【The invention's effect】
In the present invention, a new recurrent type neural network R-LLGMN that can improve the performance of LLGMN and can adapt to dynamic data changing with time has been proposed. This neural network contains a hidden Markov model, which is one of the dynamic probability models, and can learn time series characteristics that were not possible with LLGMN. In addition, statistical parameters such as transition probabilities and output probability density functions are expressed as network weights, and the weights can be adjusted by a learning rule of the error back propagation method. In order to verify the effectiveness and validity of the proposed method, we performed identification experiments on EEG signals that fluctuate with time and achieved sufficient identification accuracy. Furthermore, since the terminal learning method is introduced, the end time at the time of pattern recognition of the biological signal can be displayed in advance, and the mental burden on the user can be reduced.
[Brief description of the drawings]
FIG. 1 is a diagram showing an HMM having a mixed Gaussian continuous probability density distribution.
FIG. 2 is a configuration diagram of a neural network proposed in the present invention.
FIG. 3 is a configuration diagram of hardware for realizing a neural network.
FIG. 4 is a flowchart showing learning processing of a neural network.
FIG. 5 is a flowchart showing forward calculation processing of R-LLGMN.
FIG. 6 is a configuration diagram for executing a biological signal identification process using a neural network according to the present invention.
FIG. 7 is a flowchart showing identification processing using a neural network according to the present invention.
FIG. 8 is an overall view of an EEG signal measuring apparatus.
FIG. 9 is a diagram showing an example of an identification result when the examiner A opens and closes his eyes.
FIG. 10 is a diagram showing results of experiments performed on three subjects (A, B: male, C: female).
FIG. 11 is a diagram showing an example of the identification result when subject A opens / closes eyes and flash stimulation.
FIG. 12 is a diagram showing identification results for five subjects.
FIG. 13 is a diagram showing a change in identification rate (HMM) with respect to the number of teacher signals.
FIG. 14 is a diagram showing a change in identification rate (R-LLGMN) with respect to the number of teacher signals.
FIG. 15 is a diagram showing a change in the identification rate with respect to the number of states and the number of components.
FIG. 16 is a configuration diagram of an EMG control type human support manipulator system.
FIG. 17 is a diagram showing an identification result.
FIG. 18 is a configuration diagram of LLGMN.
[Explanation of symbols]
1 Storage unit
2 Input section
3 Output section
4 processing section
5 I / F
11 Feature extraction unit
12 Identification part
13 Buffer memory
14 R-LLGMN operation part

Claims

階層型ニューラルネットワークであって、
処理部が、入力ベクトルを入力し、前記入力ベクトルをＨ次元のベクトルＸ _ｈ（ｔ）（ｈ＝１，…，Ｈ）に非線形変換して出力する前処理手段と、
処理部が、前記前処理手段からのＨ次元のベクトルをそれぞれ入力し、ニューラルネットワークの各層の構成を表す式に関するデータを記憶した第１テーブルを参照し、次式の入出力関係で出力する複数の第１層のユニット｛ｈ｝（ｈ＝１，…，Ｈ）と、
^（１）Ｉ_ｈ（ｔ）＝Ｘ_ｈ（ｔ）（８）
^（１）Ｏ_ｈ（ｔ）＝^（１）Ｉ_ｈ（ｔ）（９）
（ここで、^（１）Ｉ_ｈ（ｔ）と^（１）Ｏ_ｈ（ｔ）はｈ番目のユニットの入出力を表す。）
処理部が、重み係数を記憶した第２テーブルを参照し、前記第１層のユニットからの出力を重み係数ｗ^ｃ _{ｋ’，ｋ，ｍ，ｈ}を介して入力し（^（２）Ｉ^ｃ _{ｋ’，ｋ，ｍ}（ｔ））、前記第１テーブルを参照し、次式の入出力関係で出力する（^（２）Ｏ^ｃ _{ｋ’，ｋ，ｍ}（ｔ））複数の第２層のユニット｛ｃ，ｋ，ｋ’，ｍ｝（ｃ＝１，…，Ｃ；ｋ，ｋ’＝１，…，Ｋ _ｃ；ｍ＝１，…，Ｍ _ｃ，ｋ）と、

（ただし、分析の対象とする事象はＣ個で、それぞれの事象ｃ（ｃ∈｛１，…，Ｃ｝）はＫ _ｃ個の状態から構成されているとき、Ｋ_ｃは隠れマルコフモデルの状態数に対応するパラメータ、Ｍ_ｃ，ｋは事象ｃ及び状態ｋに対応する混合ガウス分布モデルのコンポーネント数を表す。）
処理部が、前記第２層のユニット｛ｃ，ｋ，ｋ’，ｍ｝（ｍ＝１，…，Ｍ_ｃ，ｋ）の出力を入力し、前記第１テーブルを参照し、前記入力に１時刻前の第４層のユニットの出力を乗じた値を次式の入出力関係で出力する複数の第３層のユニット｛ｃ，ｋ，ｋ’｝と、

処理部が、前記第１テーブルを参照し、入力^（４）Ｉ^ｃ _ｋ（ｔ）と出力^（４）Ｏ^ｃ _ｋ（ｔ）を次式の入出力関係で出力する複数の第４層のユニット｛ｃ，ｋ｝（ｋ＝１，…，Ｋ _ｃ）と、

処理部が、前記第４層のユニットのＫ_ｃユニットの出力を入力し、前記第１テーブルを参照し、次式の入出力関係で出力する複数の第５層のユニット｛ｃ｝と、

処理部が、前記第５層のユニットの出力を出力部に出力する又は記憶部に記憶する手段と
を備えたニューラルネットワーク。 A hierarchical neural network,
A preprocessing means for receiving a non-linear conversion of the input vector into an H-dimensional vector X _h (t) (h = 1,..., H);
Multi-processing unit, wherein the type each H-dimensional vector from the pre-processing means, by referring to the first table storing data on expression for the configuration of each layer of the neural network outputs in the input-output relationship of the following formula Unit {h} (h = 1,..., H) of the first layer of
⁽¹⁾ I _h (t) = X _h (t) (8)
⁽¹⁾ O _h (t) = ⁽¹⁾ I _h (t) (9)
(Here, ⁽¹⁾ I _h (t) and ⁽¹⁾ O _h (t) represent the input and output of the h-th unit.)
The processing unit refers to the second table in which the weighting factor is stored, and inputs the output from the unit of the first layer via the weighting factor w ^c _{k ′, k, m, h} ( ⁽²⁾ I ^c _{k ', k, m (t)} ), by referring to the first table, and outputs in the input-output relationship of the formula ^{^{_{((2) O c k'}}} , k, m (t)) a plurality of second layer units {C, k, k ′, m} (c = 1,..., C; k, k ′ = 1,..., K _c ; m = 1,..., M _{c, k} ) ;

(However, there are C events to be analyzed, and each event c (cε {1,..., C}) is composed of K _c states, where K _c is the state of the hidden Markov model. (The parameter corresponding to the number, M _{c, k} represents the number of components of the mixed Gaussian distribution model corresponding to the event c and the state k.)
The processing unit inputs the output of the unit {c, k, k ′, m} (m = 1,..., M _{c, k} ) of the second layer, refers to the first table, and inputs 1 to the input. A plurality of third-layer units {c, k, k ′} that output a value obtained by multiplying the output of the fourth-layer unit before the time by an input / output relationship of the following equation ;

The processing unit refers to the first table, and outputs a plurality of fourth-layer units that output the input ⁽⁴⁾ I ^c _k (t) and the output ⁽⁴⁾ O ^c _k (t) in an input / output relationship of the following equation: {C, k} (k = 1,..., K _c ) ,

The processing unit inputs the output of the _Kc unit of the fourth layer unit , refers to the first table, and outputs a plurality of fifth layer units {c} that are output in the input / output relationship of the following equation :

A processing unit that outputs the output of the unit of the fifth layer to the output unit or stores it in the storage unit;
Neural network with

前記前処理手段は、入力されたｄ次元ベクトルの要素をＨ次元（Ｈ＝１＋ｄ（ｄ＋３）／２）の値に非線形演算により変換し、
前記第１の層のユニットは、各入力ベクトル毎のＨ個のユニットを有し、
前記第２の層のユニットは、分析の対象とする事象Ｃ（ｃ＝１，２，…，Ｃ）、各事象を構成する隠れマルコフモデルの状態ｋ（ｋ＝１，２，…，Ｋ_ｃ）、１時刻前の該状態ｋ’（ｋ’＝１，２，…，Ｋ_ｃ）、及び、状態の事後確率を混合して表すガウス分布のコンポーネントｍ（ｍ＝１，２，…，Ｍ_ｃ，ｋ）毎のＣ×Ｋｃ×Ｋｃ×Ｍ_ｃ，ｋのユニットを有し、
前記第３の層のユニットは、事象数ｃ、状態数ｋ、１時刻前の状態数ｋ’毎のＣ×Ｋ_ｃ×Ｋ_ｃ個ユニットを有し、
前記第４の層のユニットは、事象数ｃ及び状態数ｋ毎のＣ×Ｋ_ｃ個ユニットを有し、
前記第５の層のユニットは、前記事象数ｃ毎のＣ個のユニットを有する請求項１に記載のニューラルネットワーク。The preprocessing means converts an element of the input d-dimensional vector into an H-dimensional (H = 1 + d (d + 3) / 2) value by a non-linear operation,
Units of the first layer has a H number of units for each input vector,
The unit of the second layer, elephants C as a target of analysis (c = 1, 2, ..., C), the state k (k = 1, 2 hidden Markov model constituting each thing elephants, ..., K _c ), a component m (m = 1, 2,...) Of the state k ′ (k ′ = 1, 2,..., K _c ) one hour before and a mixture of the posterior probabilities of the states. , M _{c, k} ), C × Kc × Kc × M _{c, k} units,
The unit of the third layer has things elephants number c, and C × K _{_c} × K _c pieces units of the number of states k, 1 times the previous state number k 'each,
The unit of the fourth layer has a C × K _c pieces units per thing elephant number c and the number of states k,
The unit of the fifth layer, the neural network of claim 1, before having a C-number of units of each article elephant number c.

前記前処理手段は、
入力ベクトルｘ＝［ｘ_１，ｘ_２，・・・，ｘ_ｄ］^Ｔ ∈Ｒ^ｄに非線形演算による前処理を施し、次のＸ ∈Ｒ^Ｈに変換することを特徴とする請求項１又は２に記載のニューラルネットワーク。

The preprocessing means includes
Input vector _{_{x = [x 1, x 2}} , ···, x d] subjected to a pretreatment with a non-linear operation on ^T ∈R ^d, claim 1 or 2, characterized in that conversion to the next X ∈R ^H Neural network described in 1.

ニューラルネットワークの出力についての関数である評価関数Ｊは、学習時の時間変化が次式の学習則が成り立ち、評価関数Ｊが減少する方向に重みの差分を定めることを特徴とする請求項１乃至３のいずれかに記載のニューラルネットワーク。
ｄＪ／ｄｔ＝―ηＪ^β
（ここで、ηは学習率、βは０＜β＜１を満たす定数）The evaluation function J, which is a function of the output of the neural network, determines a difference in weight in a direction in which the evaluation function J decreases with respect to a time change during learning in accordance with the following learning rule. 4. The neural network according to any one of 3.
dJ / dt = -ηJ ^β
(Where η is the learning rate, β is a constant that satisfies 0 <β <1)

ニューラルネットワークの出力についての評価関数Ｊに関して学習の終了時刻ｔ_ｆから学習率η_ｔを計算し、
これを次式に記入して学習を行なうことにより収束時間を指定することを特徴とする請求項１乃至４のいずれかに記載のニューラルネットワーク。

Calculate the learning rate η _t from the learning end time t _f for the evaluation function J for the output of the neural network,
5. The neural network according to claim 1, wherein a convergence time is designated by performing learning by entering this into the following equation.

さらに、学習中に残りの学習時間ｔ_ｆ−ｔと評価関数Ｊを表示する出力部をさらに備えた請求項５に記載のニューラルネットワーク。The neural network according to claim 5, further comprising an output unit that displays the remaining learning time t _f −t and the evaluation function J during learning.

処理部が、状態数ｋ（ｋ＝１，２，…，Ｋ_ｃ）、コンポーネント数ｍ（ｍ＝１，２，…，Ｍ_ｃ，ｋ）、事象数ｃ（ｃ＝１，２，…，Ｃ）、入力信号数ｄ、時系列信号長Ｌ、収束時間ｔ_ｆ、学習データ数Ｎの設定を行う機能と、
処理部が、ターミナルラーニングに基づき学習率η_ｔを計算する機能と、
処理部が、学習回数Ｔ及び学習データ数Ｎに基づき、請求項１乃至６のいずれかに記載のニューラルネットワークにより前向き演算を実行する前向き演算機能と、
処理部は、重みｗ^ｃ _{ｋ’，ｋ，ｍ，ｈ}を含む学習結果を記憶部に記憶する機能と
を備えたニューラルネットワークシステム。Processing unit, the number of states _{k (k = 1,2, ...,} K c), the component number _{m (m = 1,2, ...,} M c, k), things elephants number c (c = 1,2, ... C), the number of input signals d, the time series signal length L, the convergence time t _f , and the number of learning data N,
A function that the processing unit calculates a learning rate η _t based on terminal learning;
A forward calculation function for executing a forward calculation by the neural network according to any one of claims 1 to 6, wherein the processing unit is based on the learning number T and the learning data number N;
A processing unit is a neural network system having a function of storing a learning result including weights w ^c _{k ′, k, m, and h} in a storage unit.

生体信号が入力され、入力された生体信号に従い識別用データを抽出する特徴抽出部と、
前記特徴抽出部が抽出した識別用データに基づき、請求項１乃至６のいずれかに記載のニューラルネットワークにより前向き演算を実行するＲ−ＬＬＧＭＮ演算部と、
前記Ｒ−ＬＬＧＭＮ演算部の出力結果の中で最も高い若しくは低い値又は予め決められた閾値より高い若しくは低い値を示した事象番号ｃを識別結果として選択する識別部と
を備えたニューラルネットワークシステム。A feature extraction unit that receives a biological signal and extracts identification data in accordance with the input biological signal;
Based on the identification data extracted by the feature extraction unit, an R-LLGMN calculation unit that performs forward calculation by the neural network according to any one of claims 1 to 6,
Neural network system and a discrimination unit for selecting the highest or low value or pre-higher-determined threshold or lower it showed values elephant number c in the R-LLGMN arithmetic unit output as an identification result .

前記Ｒ−ＬＬＧＭＮ演算部は、
前記記憶部から学習終了時に出力された重みｗ^ｃ _{ｋ’，ｋ，ｍ，ｈ}を読み込む機能と、
前記記憶部又は前記識別部から識別用データを入力する機能と、
これら重み及び識別用データに基づき、前記前向き演算処理を実行する機能と
を備えた請求項８に記載のニューラルネットワークシステム。The R-LLGMN calculation unit is
A function of reading weights w ^c _{k ′, k, m, h} output from the storage unit at the end of learning;
A function of inputting identification data from the storage unit or the identification unit;
The neural network system according to claim 8, further comprising a function of executing the forward calculation process based on the weight and identification data.

前記前向き演算機能では、
処理部は、時系列信号長Ｌを記憶部から読み出し、初期値として設定する機能と、
処理部が、時系列信号長Ｌが０以下になるまで、前記前向き演算を実行する機能と、
処理部は、演算結果である第５層の出力を記憶部に記憶する機能と
を備えた請求項７乃至９のいずれかに記載のニューラルネットワークシステム。In the forward calculation function,
The processing unit reads the time series signal length L from the storage unit and sets it as an initial value,
A function that the processing unit executes the forward calculation until the time-series signal length L becomes 0 or less;
The neural network system according to claim 7, wherein the processing unit has a function of storing an output of the fifth layer, which is a calculation result, in the storage unit.

前記特徴抽出部は、
生体信号として、脳波信号（ＥＥＧ）を入力し、
識別用データとして、特定の周波数帯域のパワースペクトルの平均値を要素とするベクトルを抽出することを特徴とする請求項７乃至１０のいずれかに記載のニューラルネットワークシステム。The feature extraction unit includes:
An electroencephalogram signal (EEG) is input as a biological signal,
The neural network system according to any one of claims 7 to 10, wherein a vector whose element is an average value of a power spectrum in a specific frequency band is extracted as identification data.

前記特徴抽出部は、
生体信号は、筋電位信号（ＥＭＧ）を入力し、
識別用データとして、整流及び平滑後の振幅値を要素とするベクトルを抽出することを特徴とする請求項７乃至１０のいずれかに記載のニューラルネットワークシステム。The feature extraction unit includes:
The biosignal is an EMG signal (EMG) input,
The neural network system according to any one of claims 7 to 10, wherein a vector having an amplitude value after rectification and smoothing as an element is extracted as identification data.