JP3733711B2

JP3733711B2 - Simplified quasi-Newton projection calculation system, neural network learning system, recording medium, and signal processing apparatus

Info

Publication number: JP3733711B2
Application number: JP29444597A
Authority: JP
Inventors: 雅彦立石; 一敏小柳; 裕司伊藤
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 1997-10-27
Filing date: 1997-10-27
Publication date: 2006-01-11
Anticipated expiration: 2017-10-27
Also published as: JPH11134314A

Description

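[0001]
[Technical field to which the invention belongs]
The present invention relates to a simplified quasi-Newton projection method computation system, a neural network learning system that uses this system, a recording medium storing a program that implements these systems on a computer system, and a signal processing apparatus.
[0002]
[Prior art]
Neural networks are widely applied to pattern recognition, data processing, and similar tasks. A neural network acquires its processing capability through repeated learning, and several methods of changing the synaptic weights have been proposed to speed up learning and to improve the capability obtained after learning.
[0003]
A neural network consists of an input layer, intermediate layers, and an output layer, each made up of units, together with the synapses that connect the layers. Each synapse has a weight called a synaptic weight, and various input/output characteristics can be realized by changing these weights through learning. In the following, the total number of synapses is M and the synaptic weights are w₁, w₂, …, w_M. They are also written as w = [w₁, w₂, …, w_M]^t.
[0004]
Learning in a neural network is the process of computing the network output signal when a teacher input signal is fed to the network, comparing this output signal with the teacher output signal, and changing the synaptic weights w₁, w₂, …, w_M based on the comparison so that the error between the teacher output signal and the network output signal, for example the sum of squared errors E(w), is minimized.
[0005]
The minimum is generally computed by a descent method such as backpropagation (McClelland, J. L., Rumelhart, D. E., and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, MIT Press, Chapter 8, 1986).
[0006]
The computation steps of backpropagation are as follows. Here k is the update count and k_max is its upper limit. Fig. 8(a) shows a schematic of the descent method. (Strictly speaking, the steps below find a local minimum rather than the global minimum, but this makes no essential difference to the following description.)
Step 1: Set k = 0 and assign the initial value w^k to the synaptic weights of the neural network.
[0007]
Step 2: Compute the gradient ∇E(w^k) of E(w^k) at w^k. If ∇E(w^k) = 0, jump to Step 4.
Step 3: Find a new point w^{k+1} satisfying E(w^{k+1}) < E(w^k). Assign the value of w^{k+1} to w^k as the new w^k; if k < k_max, return to Step 2. If k = k_max, go to Step 4.
[0008]
Step 4: Take w^k as the solution.
The example of Fig. 8(a) shows how learning proceeds on the error surface 301 when the initial value w⁰ is given. Here the value w^k after k updates has converged to the minimum.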
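Steps 1 to 4 can be sketched as a short loop. The following is a minimal illustration, not the implementation described in this document: the names `descend`, `E`, and `grad`, the initial step size `step`, the step-halving rule used for Step 3, and the tolerance `tol` (which stands in for the exact test ∇E(w^k) = 0) are all assumptions introduced here.

```python
def descend(E, grad, w0, k_max=1000, step=0.1, tol=1e-12):
    """Generic descent loop following Steps 1-4 (illustrative sketch)."""
    w = list(w0)                      # Step 1: k = 0, initial weights w^0
    k = 0
    while k < k_max:
        g = grad(w)                   # Step 2: gradient at w^k
        if sum(gi * gi for gi in g) <= tol:
            break                     # gradient is numerically zero: go to Step 4
        alpha = step
        w_new = [wi - alpha * gi for wi, gi in zip(w, g)]
        while E(w_new) >= E(w) and alpha > 1e-16:
            alpha *= 0.5              # Step 3: shrink the step until E decreases
            w_new = [wi - alpha * gi for wi, gi in zip(w, g)]
        if E(w_new) >= E(w):
            break                     # no decrease found; keep the current w
        w, k = w_new, k + 1
    return w, k                       # Step 4: w^k is taken as the solution


# Toy error function with its minimum at (1, -2).
E = lambda w: (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2
grad = lambda w: [2.0 * (w[0] - 1.0), 2.0 * (w[1] + 2.0)]
w_star, updates = descend(E, grad, [0.0, 0.0])
```

Nothing in this loop bounds the size of the weights, which is the issue the following paragraphs take up.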
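[0009]
In some applications, however, the minimum lies at infinity in weight space. When such a case is learned, the absolute values of some synaptic weights keep growing, exceeding 1000 for example. Fig. 8(b) shows an example. Such a neural network has a large dynamic range of synaptic weights: floating-point arithmetic on a digital arithmetic unit gives the correct input/output characteristics, but fixed-point arithmetic incurs a large quantization error and the desired characteristics are not obtained. Consumer products use fixed-point CPUs for cost and other reasons, so a neural network whose synaptic weights become excessively large cannot be embedded in them.
[0010]
For example, consider computing a neural network in fixed-point arithmetic with a 16-bit word length. [sxxxxxxx.xxxxxxxx] denotes a data type that assigns 8 bits to the fractional part, where s is the sign bit and each x is a data bit. The resolution of the numbers representable in this type is 1/2⁸ = 0.00390625, and the range is [−2^(16−1)/2⁸, (2^(16−1)−1)/2⁸] = [−128, 127.99609375].
[0011]
When a neural network is implemented in fixed-point arithmetic, the synaptic weights w = [w₁ w₂ … w_M]^t are represented in a fixed-point data type. The data type is determined by the largest absolute value among the synaptic weights. If that value is 1000, for example, the integer part needs 10 bits to store it and only 5 bits remain for the fractional part, giving [sxxxxxxxxxx.xxxxx]. The resolution is then 1/2⁵ = 0.03125 and the range is [−2^(16−1)/2⁵, (2^(16−1)−1)/2⁵] = [−1024, 1023.96875]. This lowers the arithmetic precision and increases the quantization error.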
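The resolution and range figures above can be reproduced with a few lines of arithmetic. This is a sketch under assumptions: the helper names `fixed_point_format` and `quantize` are hypothetical, and a signed two's-complement representation with round-to-nearest and saturation is assumed, which the text does not specify.

```python
def fixed_point_format(word_bits, frac_bits):
    """Return (resolution, minimum, maximum) of a signed fixed-point type."""
    scale = 2 ** frac_bits
    lowest = -(2 ** (word_bits - 1)) / scale
    highest = (2 ** (word_bits - 1) - 1) / scale
    return 1 / scale, lowest, highest


def quantize(x, word_bits, frac_bits):
    """Round x to the nearest representable value, saturating at the range limits."""
    scale = 2 ** frac_bits
    n = round(x * scale)
    n = max(-(2 ** (word_bits - 1)), min(2 ** (word_bits - 1) - 1, n))
    return n / scale


q8 = fixed_point_format(16, 8)   # 8 fractional bits
q5 = fixed_point_format(16, 5)   # 5 fractional bits, needed once |w| reaches 1000
err8 = abs(quantize(0.3, 16, 8) - 0.3)
err5 = abs(quantize(0.3, 16, 5) - 0.3)
```

For the same value 0.3, the 5-fractional-bit type gives a rounding error more than ten times larger than the 8-fractional-bit type, which illustrates why a large weight range degrades every weight's precision.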
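[0012]
As a way of reducing this quantization error, methods are known that impose upper and lower limits on the synaptic weights during backpropagation learning (JP-A-7-152716, JP-A-7-44515, JP-A-2-143384). However, the nature of backpropagation makes it difficult to apply techniques that raise computation speed, and the resulting speed was insufficient.
[0013]
Another way of suppressing growth in the absolute values of the synaptic weights is the penalty function method (Michael A. Arbib, The Handbook of Brain Theory and Neural Networks, MIT Press, p. 643, p. 992). This method minimizes the function G(w) = E(w) + μ × F(w), the sum of the squared error sum E(w) and a penalty term F(w) that is a function of the squares of the synaptic weights. The coefficient μ is a parameter that sets the relative importance of E(w) and F(w).
[0014]
The penalty function method, however, requires the parameter μ to be set by trial and error, and obtaining a suitable solution took a long time.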
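A one-variable toy case shows why μ needs tuning. The functions E(w) = (w − 10)² and F(w) = w² and the helper name `penalized_minimizer` are assumptions chosen for illustration; they are not taken from the cited handbook or from this document.

```python
def penalized_minimizer(mu, target=10.0):
    """Minimizer of G(w) = (w - target)^2 + mu * w^2 (toy one-variable case).

    Setting dG/dw = 2 (w - target) + 2 mu w = 0 gives w = target / (1 + mu).
    """
    return target / (1.0 + mu)


# The weight that minimizes G shrinks as mu grows, but its final size is only
# known after solving, so reaching a given bound means retrying with new mu.
for mu in (0.0, 0.5, 1.5, 4.0):
    print(mu, penalized_minimizer(mu))
```

In this toy case, keeping the weight within 4 requires μ ≥ 1.5, a threshold that depends on the unknown unpenalized minimizer; a hard bound on the weights avoids that search.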
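[0015]
[Problems to be solved by the invention]
A method that avoids these problems is to set an upper limit on the absolute value of each synaptic weight and to learn within that limit. The Goldfarb method, a quasi-Newton projection method, can be used for this (see, for example, Hiroshi Konno and Hiroshi Yamashita, Nonlinear Programming (非線型計画法), Nikkagiren, pp. 264-267).
[0016]
The Goldfarb method, however, requires complicated computations such as generalized inverse matrices, which makes it difficult to program. Computing a generalized inverse also takes a long time and needs a fairly large working memory area. In addition, numerical problems such as loss of significance can prevent a digital arithmetic unit from computing the generalized inverse accurately, and when that happens the result becomes inaccurate.
[0017]
An object of the present invention is to provide a simplified quasi-Newton projection method computation system that, when neural network learning or the like is performed by a quasi-Newton projection method using fixed-point arithmetic on a digital arithmetic unit, shortens computation time, reduces memory consumption, and yields accurate results. Further objects are to provide a neural network learning system that uses this computation system, a recording medium storing a program that implements these systems on a computer system, and a signal processing apparatus incorporating a neural network obtained by the learning process of the neural network learning system.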
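[0018]
[Means for solving the problems and effects of the invention]
The simplified quasi-Newton projection method computation system of the present invention uses a digital arithmetic unit that performs fixed-point arithmetic to find the solution of the variables w that minimizes a function E(w) of M variables w, expressed by Equation 1 and subject to the constraints of Equation 2. It is basically built on the quasi-Newton projection method described above.
[0019]
Specifically, the system performs computation by the quasi-Newton projection method by comprising:
first processing means for performing a line search to find values of the variables w that reduce the value of the function E(w);
second processing means, executed after the first processing means, which, when a new constraint of Equation 2 becomes active for the new values of w, adds the coefficient vector a^r of the newly active constraint to the matrix A_q composed of the coefficient vectors of the active constraints, updates by Equation 3 the Hessian H used to obtain the vector d representing the direction of change of E(w) from the transposed gradient ∇^tE(w), and returns processing to the first processing means, and which, when no new constraint becomes active for the new values of w, updates the Hessian H by a formula for generating a new Hessian H and returns processing to the first processing means; and
third processing means which, when the vector d obtained in the first processing means from the product of the transposed gradient ∇^tE(w) and the Hessian H becomes zero, takes the current w as the solution and ends all processing if all elements of the Lagrange multiplier λ obtained by the matrix computation of Equation 4 are non-negative, and which, when the vector d is not zero, removes from A_q the coefficient vector a^s of the constraint corresponding to the negative element of λ with the largest absolute value, updates the Hessian H by Equation 5, and returns processing to the first processing means.
[0020]
In this quasi-Newton projection processing, let the set I_c, whose elements are q distinct integers from 1 to M, be expressed as in Equation 6; for each l_i (i = 1, 2, …, q), let the 1-row, M-column vector whose l_i-th element is c_li and whose other elements are all 0, with c_li being +1 or −1, be denoted by the symbol of Equation 7; and let the q-row, M-column matrix A_q be expressed as in Equation 8. Then, within the computation of Equation 4, the matrix computation of Equation 9 is replaced by the diagonal matrix diag[b₁ b₂ … b_M] whose diagonal elements are b_m = 1 if m ∈ I_c and b_m = 0 otherwise, which simplifies the quasi-Newton projection method.
[0021]
[Formula 6]

[0022]
This simplification makes it unnecessary to compute the generalized inverse shown in Equation 9. Computation time therefore does not grow, and the working memory area needed for the computation stays small. Moreover, numerical problems such as loss of significance do not arise, so the computation is accurate.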
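Similarly, where computation by the quasi-Newton projection method is performed by a system comprising:
first processing means for performing a line search to find values of the variables w that reduce the value of the function E(w);
second processing means, executed after the first processing means, which, when a new constraint of Equation 12 becomes active for the new values of w, adds the coefficient vector a^r of the newly active constraint to the matrix A_q composed of the coefficient vectors of the active constraints, updates by Equation 13 the Hessian H used to obtain the vector d representing the direction of change of E(w) from the transposed gradient ∇^tE(w), and returns processing to the first processing means, and which, when no new constraint becomes active for the new values of w, updates the Hessian H by a formula for generating a new Hessian H and returns processing to the first processing means; and
third processing means which, when the vector d obtained in the first processing means from the product of the transposed gradient ∇^tE(w) and the Hessian H becomes zero, takes the current w as the solution and ends all processing if all elements of the Lagrange multiplier λ obtained by the matrix computation of Equation 14 are non-negative, and which, when the vector d is not zero, removes from A_q the coefficient vector a^s of the constraint corresponding to the negative element of λ with the largest absolute value, updates the Hessian H by Equation 15, and returns processing to the first processing means, the following processing may also be used.
[0023]
That is, the element in row i, column j of the Hessian H is denoted h_ij, and the coefficient vector a^r of every constraint is taken to have +1 or −1 as its r-th element and 0 for all other elements. Then, within the computation of Equation 13, the matrix computation of Equation 16 is replaced by the computation of an M-row, M-column matrix whose (i, j) element is given by Equation 17.
[0024]
[Formula 7]

[0025]
With this simplification, Equation 16, which requires five matrix multiplications, is reduced to Equation 17. Computation time therefore does not grow, and the working memory area needed for the computation stays small. Moreover, numerical problems such as loss of significance do not arise, so the computation is accurate.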
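[0026]
Similarly, where computation by the quasi-Newton projection method is performed by a system comprising:
first processing means for performing a line search to find values of the variables w that reduce the value of the function E(w);
second processing means, executed after the first processing means, which, when a new constraint of Equation 22 becomes active for the new values of w, adds the coefficient vector a^r of the newly active constraint to the matrix A_q composed of the coefficient vectors of the active constraints, updates by Equation 23 the Hessian H used to obtain the vector d representing the direction of change of E(w) from the transposed gradient ∇^tE(w), and returns processing to the first processing means, and which, when no new constraint becomes active for the new values of w, updates the Hessian H by a formula for generating a new Hessian H and returns processing to the first processing means; and
third processing means which, when the vector d obtained in the first processing means from the product of the transposed gradient ∇^tE(w) and the Hessian H becomes zero, takes the current w as the solution and ends all processing if all elements of the Lagrange multiplier λ obtained by the matrix computation of Equation 24 are non-negative, and which, when the vector d is not zero, removes from A_q the coefficient vector a^s of the constraint corresponding to the negative element of λ with the largest absolute value, updates the Hessian H by Equation 25, and returns processing to the first processing means, the following processing may also be used.
[0027]
That is, let the set I_c, whose elements are q distinct integers from 1 to M, be expressed as in Equation 26; for each integer l_i (i = 1, 2, …, q) in I_c, let the 1-row, M-column vector whose l_i-th element is c_li and whose other elements are all 0, with c_li being +1 or −1, be denoted by the symbol of Equation 27; let the matrix A_q be expressed as a q-row, M-column matrix as in Equation 28; and let ∇E(w^k) be expressed as in Equation 29. Then, within the computation of Equation 24, the matrix computation of Equation 30 is replaced by the computation of Equation 31.
[0028]
[Formula 8]

[0029]
This simplification makes it unnecessary to compute the generalized inverse shown in Equation 30. Computation time therefore does not grow, and the working memory area needed for the computation stays small. Moreover, numerical problems such as loss of significance do not arise, so the computation is accurate.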
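Similarly, where computation by the quasi-Newton projection method is performed by a system comprising:
first processing means for performing a line search to find values of the variables w that reduce the value of the function E(w);
second processing means, executed after the first processing means, which, when a new constraint of Equation 42 becomes active for the new values of w, adds the coefficient vector a^r of the newly active constraint to the matrix A_q composed of the coefficient vectors of the active constraints, updates by Equation 43 the Hessian H used to obtain the vector d representing the direction of change of E(w) from the transposed gradient ∇^tE(w), and returns processing to the first processing means, and which, when no new constraint becomes active for the new values of w, updates the Hessian H by a formula for generating a new Hessian H and returns processing to the first processing means; and
third processing means which, when the vector d obtained in the first processing means from the product of the transposed gradient ∇^tE(w) and the Hessian H becomes zero, takes the current w as the solution and ends all processing if all elements of the Lagrange multiplier λ obtained by the matrix computation of Equation 44 are non-negative, and which, when the vector d is not zero, removes from A_q the coefficient vector a^s of the constraint corresponding to the negative element of λ with the largest absolute value, updates the Hessian H by Equation 45, and returns processing to the first processing means, the following processing may also be used.
[0030]
That is, let the set I_c, whose elements are q distinct integers from 1 to M, be expressed as in Equation 46; for each integer l_i (i = 1, 2, …, q) in I_c, let the 1-row, M-column vector whose l_i-th element is c_li and whose other elements are all 0, with c_li being +1 or −1, be denoted by the symbol of Equation 47; and let the matrix A_q be expressed as a q-row, M-column matrix as in Equation 48. Then, within the computation of Equation 45, the matrix computation of Equation 49 is replaced by an M-row, M-column matrix whose (s, s) element is 1 and whose other elements are all 0.
[0031]
[Formula 9]

[0032]
This simplification makes the matrix computation unnecessary. Computation time therefore does not grow, and the working memory area needed for the computation stays small. Moreover, numerical problems such as loss of significance do not arise, so the computation is accurate.
All of these simplifications may also be used together, which is even more effective.
[0033]
Formulas that can be used in the second processing means include the BFGS formula, the DFP formula, and the symmetric rank-one formula.
The simplified quasi-Newton projection method computation system described above may be applied to a neural network learning system, in which the M variables w represent the synaptic weights of the M synapses that connect the units from the input-layer units to the output-layer units of a neural network, the function E(w) represents the error between the teacher signal given to the neural network and the output of the neural network, and the processing by which the first, second, and third processing means find the solution of the variables w that minimizes E(w) is the learning process for the neural network.
[0034]
As described above, an accurate neural network can be produced by learning in a short time without running out of memory.
The functions that implement the means of such a simplified quasi-Newton projection method computation system or neural network learning system on a computer system can be provided, for example, as a program started on the computer system. Such a program can be recorded on a computer-readable recording medium such as a floppy disk, magneto-optical disk, CD-ROM, or hard disk, and used by loading it into the computer system and starting it as needed. Alternatively, the program may be recorded in ROM or backup RAM serving as the computer-readable recording medium, and that ROM or backup RAM may be built into the computer system.
[0035]
A neural network obtained by the learning process of the neural network learning system described above can be built into a signal processing apparatus, where it processes signals from the input-layer units to the output-layer units based on the synaptic weights of the M synapses. To incorporate the network, the signal processing apparatus comprises, for example, input means that feeds the input signal to be processed to the input-layer units of the neural network, and output means that reads the states of the output-layer units of the neural network and outputs them as a signal.
[0036]
Because the learning process of the neural network learning system runs quickly even on an inexpensive digital computer and yields accurate learning results, the signal processing apparatus can also produce highly accurate output.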
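[0037]
[Embodiments of the invention]
Fig. 1 is a block diagram showing the schematic configuration of a neural network learning system 2 to which the above invention is applied.
The neural network learning system 2 comprises a neural network 12, a learning control unit 14, and a standard pattern storage unit 18. Here the neural network 12 uses rewritable memory such as RAM or EEPROM. The learning control unit 14 is configured as a computer whose central CPU is a digital arithmetic unit. The learning control unit 14 loads into RAM a program stored, together with other data, in the standard pattern storage unit 18, which is a hard disk, and executes the neural network learning process described later.
[0038]
The neural network learning system 2 performs learning on the neural network 12. In this learning, as shown in Fig. 2, the learning control unit 14 forms a standard input signal 18b from the standard patterns in a teacher pattern database 18a held in the standard pattern storage unit 18 and outputs it to the neural network 12. The learning control unit 14 compares the output signal 18c that the neural network 12 produces in response to the standard input signal 18b with a teacher signal 18d formed from the standard patterns of the teacher pattern database 18a. Based on this comparison, the learning control unit 14 outputs a synaptic weight update command signal 18e to the neural network 12. On receiving this signal, the neural network 12 adjusts the synaptic weights of its units 12a, here M synaptic weights. Repeating this process trains the neural network 12.
[0039]
As an example of a neural network 12 trained in this way, Fig. 3 shows learning for a neural network 12 used to control an automatic car air conditioner. The standard input signal 18b consists of signals from sensors that detect the operating state of the air conditioner. In this neural network 12, the target outlet temperature, solar radiation, inside air temperature, and outside air temperature are input, and the airflow level is output as the output signal. The learning control unit 14 compares this airflow level output signal with the teacher signal and issues a command to update the synaptic weights. Learning proceeds by repeating this.
[0040]
Once suitable synaptic weights have been obtained, the network can be built into, for example, an electronic control unit (ECU) mounted in an automobile, where it receives the target outlet temperature, solar radiation, inside air temperature, and outside air temperature from the air conditioner's sensors and outputs the airflow level, thereby controlling the airflow of the air conditioner.
[0041]
In the learning process described above, the synaptic weight update command signal, that is, the amount of change of the M synaptic weights, is determined by the neural network learning system 2, which is configured as a simplified quasi-Newton projection method computation system.
The neural network learning process performed by the neural network learning system 2 is described next. Figs. 4 to 6 show its flowchart. This learning process uses a simplified quasi-Newton projection method based on the Goldfarb method (see, for example, Hiroshi Konno and Hiroshi Yamashita, Nonlinear Programming (非線型計画法), Nikkagiren, pp. 264-267).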
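[0042]
The Goldfarb method solves optimization problems with linear constraints. For the neural network 12 described above, the linearly constrained optimization problem is formulated as follows.
With the M synaptic weights w₁, w₂, …, w_M as variables, the squared-error-sum function E(w) is expressed as in Equation 71, and the method finds the minimum of E(w) subject to the K inequalities of Equation 72. Here w is defined by Equation 73.
[0043]
[Formula 10]

[0044]
Some terms and symbols are defined here.
(1) Active constraints and I(w)
A constraint for which a^i w − b^i = 0 at a given w is said to be active (hereinafter an "active constraint"). The set of indices i of such constraints is denoted I(w), as in Equation 74.
[0045]
[Formula 11]

[0046]
(2) The set of points satisfying all constraints is called the feasible region and is denoted by the symbol S, as in Equation 75.
[0047]
[Formula 12]

[0048]
(3) Let q be the number of active constraints at w, and let A_q be the q-row, M-column matrix whose rows are the coefficient vectors a^i of the active constraints. When I(w) is given by Equation 76, A_q is written as in Equation 77.
[0049]
[Formula 13]

[0050]
(4) The M × M identity matrix is written I_M.
(5) Equation 78 defines an M-row, M-column matrix. Equation 79, which appears in its second term, is the generalized inverse of A_q.
[0051]
[Formula 14]

[0052]
(6) The Goldfarb method uses an M-row, M-column matrix called the Hessian to find a point w^{k+1} satisfying the condition of Equation 80. The Hessian at the k-th update is denoted H_k. H_k is always a symmetric matrix, and the relation of Equation 81 holds.
[0053]
[Formula 15]

[0054]
(7) ∇E(w^k), expressed as in Equation 82, is the gradient of the function E at the point w^k. It is a row vector of 1 row and M columns. ∇E^t(w^k) is its transpose, a column vector of M rows and 1 column.
[0055]
[Formula 16]

[0056]
(8) The M-row, M-column diagonal matrix with diagonal elements a₁ a₂ … a_M is written as in Equation 83.
[0057]
[Formula 17]

[0058]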
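When the neural network learning process starts, k is initialized to 0 and synaptic weights w⁰ satisfying w⁰ ∈ S are selected as the initial value (S100). Next, the index set I(w⁰) of the active constraints and the matrix A_q are obtained (S102).
P_q is then computed by Equation 84 (S104).
[0059]
[Formula 18]

[0060]
Here diag[b₁ b₂ … b_M] is the diagonal matrix whose diagonal elements are b_m = 1 if m ∈ I(w⁰) and b_m = 0 otherwise.
Next, P_q is set as the initial content of the Hessian H (S106).
[0061]
Next, the learning control unit 14 outputs the synaptic weight update command signal 18e to the neural network 12 and sets the actual synaptic weights of the neural network 12 to the values of w⁰ (S110).
Next, the standard input signal 18b from the standard patterns in the standard pattern storage unit 18 is input to the input-layer units 12a of the neural network 12, and at the same time the output signal 18c from the output-layer units 12a of the neural network 12 is recorded in memory in the learning control unit 14 (S120).
[0062]
Next, E(w^k) is computed (S130), followed by ∇E(w^k) (S134).
As shown in Equation 85, E(w^k) is the sum of squared errors between the teacher signal 18d of the standard patterns, denoted t, and the output signal 18c of the neural network 12, denoted o.
[0063]
[Formula 19]

[0064]
Here N is the number of output-layer units and P is the number of standard patterns.
∇E(w^k) is defined as in Equation 86.
[0065]
[Formula 20]

[0066]
Next, the vector d^k representing the direction of change of E(w^k) is obtained from the Hessian H_k and ∇^tE(w^k), the transpose of ∇E(w^k), by the computation of Equation 87 (S140).
[0067]
[Formula 21]

[0068]
Next, it is determined whether d^k = 0 (S150). If d^k ≠ 0 ("NO" at S150), new synaptic weights w^{k+1} are set by line search (S190). The line search is performed by the computation of Equation 88.
[0069]
[Formula 22]

[0070]
Here the coefficient α_k is set within the range where α_k > 0 and w^{k+1} ∈ S.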
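The condition w^{k+1} ∈ S caps the step length. The sketch below assumes that Equation 88, which is not reproduced in this text, has the usual form w^{k+1} = w^k + α_k d^k, and that the constraints are the box bounds |w_i| ≤ B discussed later in this document. The function name `max_feasible_step` is hypothetical.

```python
def max_feasible_step(w, d, B):
    """Largest alpha with |w_i + alpha * d_i| <= B for every i (inf if unbounded)."""
    alpha_max = float("inf")
    for wi, di in zip(w, d):
        if di > 0:                                  # moving towards the upper bound +B
            alpha_max = min(alpha_max, (B - wi) / di)
        elif di < 0:                                # moving towards the lower bound -B
            alpha_max = min(alpha_max, (-B - wi) / di)
    return alpha_max


# w_1 could travel 4 units before hitting +B, but w_2 hits -B after 2.5 units.
alpha_cap = max_feasible_step([0.0, 1.0], [1.0, -2.0], B=4.0)
```

A line search restricted to 0 < α_k ≤ `alpha_cap` keeps the iterate feasible; when the search stops exactly at the cap, the corresponding bound becomes a newly active constraint.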
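Next, the newly set synaptic weights w^{k+1} are set as the synaptic weights of the neural network 12, the standard input signal 18b is input to the input-layer units 12a of the neural network 12, and E(w^{k+1}) is computed by Equation 85 from the output signal 18c of the output-layer units 12a and the teacher signal 18d (S200). It is then determined whether Equation 89 is satisfied (S210).
[0071]
[Formula 23]

[0072]
If Equation 89 is not satisfied ("NO" at S210), processing returns to step S190, the line search continues, and E(w^{k+1}) is evaluated again.
If the line search result satisfies Equation 89 ("YES" at S210), it is next determined whether any of the constraints of Equation 72 has newly become active (S220). If none has ("NO" at S220), H_k is updated by the BFGS formula (S230).
[0073]
The BFGS computation is performed as shown in Equation 90, where s^k = w^{k+1} − w^k and r^k = ∇^tE(w^{k+1}) − ∇^tE(w^k).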
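Equation 90 is not reproduced in this text. As a point of reference, the sketch below shows the textbook BFGS update written for a matrix H that maps gradient differences r^k to steps s^k, which is the role H plays when the direction is formed as a product of H and the gradient. Whether Equation 90 has exactly this form is an assumption, and the function name `bfgs_inverse_update` is hypothetical.

```python
def bfgs_inverse_update(H, s, r):
    """H+ = (I - rho s r^t) H (I - rho r s^t) + rho s s^t, with rho = 1 / (r^t s).

    H is assumed symmetric; s and r are plain lists of equal length.
    """
    n = len(s)
    rho = 1.0 / sum(ri * si for ri, si in zip(r, s))
    Hr = [sum(H[i][j] * r[j] for j in range(n)) for i in range(n)]   # H r
    rHr = sum(r[i] * Hr[i] for i in range(n))                        # r^t H r
    return [[H[i][j]
             - rho * (s[i] * Hr[j] + Hr[i] * s[j])
             + (rho * rho * rHr + rho) * s[i] * s[j]
             for j in range(n)] for i in range(n)]


H0 = [[1.0, 0.0], [0.0, 1.0]]
s = [1.0, 0.5]          # s^k = w^{k+1} - w^k
r = [2.0, 1.0]          # r^k = gradient difference
H1 = bfgs_inverse_update(H0, s, r)
H1r = [sum(H1[i][j] * r[j] for j in range(2)) for i in range(2)]   # should equal s
```

The updated matrix stays symmetric and satisfies the secant condition H1 r = s, which is the defining property of this family of updates.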
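[0074]
[Formula 24]

[0075]
If, on the other hand, a constraint has newly become active ("YES" at S220), the new Hessian H_{k+1} is computed by Equation 91 (S240).
[0076]
[Formula 25]

[0077]
Here R_k is the matrix whose (i, j) element is given by Equation 92, where h^k_ij denotes the (i, j) element of H_k.
[0078]
[Formula 26]

[0079]
Next, the newly active constraint is added to the matrix A_q to form A_{q+1} (S250), and the counter q, which holds the number of active constraints, is incremented (S260).
Then k is incremented (S270). Step S270 is also performed after step S230.
[0080]
After step S270, it is determined whether k has exceeded its upper limit k_max (S272). If it has not ("NO" at S272), a new ∇E(w^k) is computed (S134), the vector d^k is obtained from the new Hessian H_k and ∇^tE(w^k) as in Equation 87 (S140), and if d^k ≠ 0 ("NO" at S150) the processing described above is repeated.
[0081]
If d^k = 0 ("YES" at S150), the Lagrange multiplier λ is computed as in Equation 93 (S280). Here I(w^k) is defined as in Equation 94 and ∇E(w^k) as in Equation 95.
[0082]
[Formula 27]

[0083]
Next, it is determined whether all elements of the Lagrange multiplier λ are non-negative, that is, whether every element of λ is ≥ 0 (S290). If they are not ("NO" at S290), the current synaptic weights w^k are not the solution, so the constraint a^s corresponding to the smallest element of λ, that is, the negative element with the largest absolute value (index s), is removed from A_q to form A_{q−1} (S300).
[0084]
Next, the index s is removed from I(w^k) (S310). The new Hessian H_{k+1} is then computed by Equation 96 (S320).
[0085]
[Formula 28]

[0086]
Here D_s is the M-row, M-column matrix whose (s, s) element is 1 and whose other elements are all 0.
Next, q is decremented by one, corresponding to the removal of one element from A_q in step S300 (S330).
[0087]
Next, k is incremented (S340) and it is determined whether k has exceeded its upper limit k_max (S342). If it has not ("NO" at S342), processing returns to the computation of the vector d^k (S140). Thereafter, as long as the determination at step S150 or step S290 is "NO", the processing described above is repeated and learning continues.
[0088]
When all elements of the Lagrange multiplier λ are determined to be non-negative at step S290 ("YES" at S290), the w^k set at that time is recorded as the solution (S350), and the learning process ends. When k > k_max is determined at step S272 or step S342, the w^k set at that time is likewise recorded as the solution (S350) and the learning process ends.
[0089]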
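The learning process described above uses four computational simplifications. These four simplifications become possible only under the restriction that, in setting upper and lower limits on the synaptic weights w of the neural network 12, each coefficient vector a^i of the constraints in Equation 72 is a 1-row, M-column row vector whose l_i-th element is −1 or 1 and whose other elements are all 0. For convenience, writing a^i as c gives Equation 97.
[0090]
[Formula 29]

[0091]
Distinguishing Equation 97 for each Z gives Equation 98.
[0092]
[Formula 30]

[0093]
The simplifications that follow from this restriction on the coefficient vectors a^i are described below.
[First simplification]
P_q in step S104 is computed as shown in Equation 99.
[0094]
[Formula 31]

[0095]
The conventionally known computation of P_q involves a generalized inverse, as shown in Equation 100.
[0096]
[Formula 32]

[0097]
For brevity, the 1-row, M-column vector whose m-th element is x and whose other elements are all 0 is written e_M^m(x). Equation 101 holds for e_M^m(x).
[0098]
[Formula 33]

[0099]
The c_li^Z of Equation 97 can be written as in Equation 102.
[0100]
[Formula 34]

[0101]
Next, for i, j = 1, 2, …, q, the (i, j) element of A_q A_q^t is c_li (c_lj)^t. Since l_i = l_j if i = j and l_i ≠ l_j if i ≠ j, Equation 103 holds.
[0102]
[Formula 35]

[0103]
Therefore A_q A_q^t is the q-row, q-column identity matrix I_q, and (A_q A_q^t)^-1 is also the identity matrix I_q, as shown in Equation 104.
[0104]
[Formula 36]

[0105]
Equation 105 follows from Equation 104.
[0106]
[Formula 37]

[0107]
Let d^i denote the i-th column vector of A_q (i = 1, 2, …, M), as shown in Equation 106.
[0108]
[Formula 38]

[0109]
Then the (i, j) element of A_q^t A_q is (d^i)^t d^j.
If Equation 107 holds, then d^i = 0. Therefore (d^i)^t d^j = 0 if Equation 107 or Equation 108 holds.
[0110]
[Formula 39]

[0111]
If, on the other hand, Equation 109 holds, there exist numbers u and v such that i = l_u and j = l_v. d^i is then the vector whose u-th element is c_lu and whose other elements are 0. That is, Equation 110 holds, and likewise Equation 111.
[0112]
[Formula 40]

[0113]
Since u = v if i = j and u ≠ v if i ≠ j, Equation 112 holds.
[0114]
[Formula 41]

[0115]
From the above, (d^i)^t d^j = 1 only when i = j and i ∈ I_c(w^k). Therefore, with b_m as in Equation 113, A_q^t A_q is expressed as in Equation 114.
[0116]
[Formula 42]

[0117]
Equation 99 is thus proved. P_q in step S104 can therefore be computed without computing a generalized inverse, which avoids both the working memory that computation would need and the inaccuracy it could introduce. Programming is also easier because no generalized-inverse routine has to be written.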
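The argument of the first simplification can be checked numerically on a small case. This sketch assumes, consistent with the derivation above, that the conventional projection is I_M − A_q^t (A_q A_q^t)^-1 A_q and that the simplified form is I_M − diag[b₁ … b_M]. The helper names and the 0-based index convention are assumptions introduced here.

```python
def constraint_matrix(M, active):
    """Rows of A_q: one row per active index l (0-based) holding c = +1 or -1."""
    return [[float(c) if m == l else 0.0 for m in range(M)]
            for l, c in sorted(active.items())]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def projection_simplified(M, active):
    """I_M - diag[b], with b_m = 1 for active indices and 0 otherwise."""
    return [[1.0 if i == j and i not in active else 0.0 for j in range(M)]
            for i in range(M)]


M = 4
active = {1: +1, 3: -1}            # second weight at +B, fourth weight at -B
A = constraint_matrix(M, active)
AAt = matmul(A, transpose(A))      # q x q identity, so its inverse is trivial
AtA = matmul(transpose(A), A)      # equals diag[b_1 ... b_M]
P_general = [[(1.0 if i == j else 0.0) - AtA[i][j] for j in range(M)] for i in range(M)]
```

Because `AAt` is the identity, the inverse in the conventional formula drops out and `P_general` coincides with `projection_simplified(M, active)`, which only needs the list of active indices.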
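[0118]
[Second simplification]
In step S240, each element of H_{k+1} is computed as shown in Equation 115.
[0119]
[Formula 43]

[0120]
The conventionally known computation of H_{k+1} is the matrix computation shown in Equation 116.
[0121]
[Formula 44]

[0122]
Suppose that updating w^k to w^{k+1} gives w_r^{k+1} = −B or w_r^{k+1} = B. The constraint c_r^Z w^{k+1} − B then newly becomes active. Setting a^r = c_r^Z, Equation 116 is expressed as Equation 117.
[0123]
[Formula 45]

[0124]
Computing H_k (c_r^Z)^t, which is common to the denominator and the numerator of the second term of Equation 117, gives Equation 118.
[0125]
[Formula 46]

[0126]
The denominator of the second term of Equation 117 can therefore be computed as in Equation 119.
[0127]
[Formula 47]

[0128]
Next, because H_k^t = H_k, the factor c_r^Z H_k in the numerator of the second term of Equation 117 equals the transpose of the other factor H_k (c_r^Z)^t, as shown in Equation 120.
[0129]
[Formula 48]

[0130]
From Equations 118 and 120, the numerator of the second term of Equation 117 is given by Equation 121.
[0131]
[Formula 49]

[0132]
Equation 115 follows from Equations 119 and 121. Equation 116, which requires five matrix multiplications, is thus simplified to Equation 115. H_{k+1} in step S240 can therefore be computed while avoiding both the working memory the full computation would need and the inaccuracy it could introduce. Programming is also easier.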
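The second simplification can likewise be checked on a small symmetric matrix. The sketch assumes that Equation 116 has the form H_k − H_k (a^r)^t a^r H_k / (a^r H_k (a^r)^t), which matches the description of a second term whose numerator and denominator share H_k (c_r^Z)^t, and that Equation 115 is the element-wise form h_ij − h_ir h_rj / h_rr. Both forms and the function names are assumptions, since the equations are not reproduced in this text.

```python
def update_general(H, a):
    """H - (H a^t)(a H) / (a H a^t) for a row vector a (assumed Equation 116)."""
    n = len(a)
    Ha = [sum(H[i][j] * a[j] for j in range(n)) for i in range(n)]   # H a^t
    aH = [sum(a[i] * H[i][j] for i in range(n)) for j in range(n)]   # a H
    aHa = sum(a[i] * Ha[i] for i in range(n))                        # a H a^t
    return [[H[i][j] - Ha[i] * aH[j] / aHa for j in range(n)] for i in range(n)]


def update_simplified(H, r):
    """Element-wise form h_ij - h_ir * h_rj / h_rr (r is a 0-based index)."""
    n = len(H)
    return [[H[i][j] - H[i][r] * H[r][j] / H[r][r] for j in range(n)] for i in range(n)]


H = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]   # symmetric example
r = 1
a = [0.0, -1.0, 0.0]        # c_r = -1: the lower bound of weight r became active
G = update_general(H, a)
S = update_simplified(H, r)
```

The sign c_r cancels because it appears squared, so the simplified form needs only one row, one column, and one diagonal element of H_k. Row r of the result is zero, meaning the next search direction leaves the newly bounded weight unchanged.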
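[0133]
[Third simplification]
The Lagrange multiplier λ in step S280 is computed as shown in Equation 122.
[0134]
[Formula 50]

[0135]
The conventionally known computation of λ involves a generalized inverse, as shown in Equation 123.
[0136]
[Formula 51]

[0137]
From the relation of Equation 104, the relation of Equation 124 holds.
[0138]
[Formula 52]

[0139]
The i-th element of λ is therefore obtained as in Equation 125, which proves Equation 122.
[0140]
[Formula 53]

[0141]
λ in step S280 can therefore be computed without computing a generalized inverse, which avoids both the working memory that computation would need and the inaccuracy it could introduce. Programming is also easier because no generalized-inverse routine has to be written.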
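A small numeric case illustrates the third simplification. Since (A_q A_q^t)^-1 is the identity, the multiplier reduces to a product of A_q with the gradient, and each element needs only one sign and one gradient component. The overall sign depends on the convention of Equation 123, which is not reproduced here; the `sign=-1.0` default below assumes constraints written as a^i w − b^i ≤ 0 and is an assumption, as are the function names.

```python
def multipliers_general(A, grad, sign=-1.0):
    """sign * A grad^t; valid here because (A A^t)^-1 is the identity."""
    return [sign * sum(aij * gj for aij, gj in zip(row, grad)) for row in A]


def multipliers_simplified(active, grad, sign=-1.0):
    """lambda_i = sign * c_{l_i} * dE/dw_{l_i}: one product per active constraint."""
    return [sign * c * grad[l] for l, c in sorted(active.items())]


grad = [0.5, -2.0, 3.0]                  # gradient of E at the current point
active = {1: +1, 2: -1}                  # 0-based: w_2 at +B, w_3 at -B
A = [[0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]  # the corresponding rows of A_q
lam_general = multipliers_general(A, grad)
lam_simple = multipliers_simplified(active, grad)
```

Under the assumed sign convention both multipliers are non-negative in this example: E decreases only by pushing each bounded weight further past its bound, so neither constraint would be released.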
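[0142]
[Fourth simplification]
H_{k+1} in step S320 is computed as shown in Equation 126.
[0143]
[Formula 54]

[0144]
Here D_s is the M-row, M-column matrix whose (s, s) element is 1 and whose other elements are all 0.
The conventionally known computation of H_{k+1} involves a generalized inverse, as shown in Equation 127.
[0145]
[Formula 55]

[0146]
Suppose the constraint c_s^Z w^k − B = 0 is removed from the active constraints. The condition of Equation 128 then holds, so b_s = 0.
[0147]
[Formula 56]

[0148]
Setting a^s = c_s^Z, Equation 127 is expressed as Equation 129.
[0149]
[Formula 57]

[0150]
First, P_{q−1} (c_s^Z)^t, which is common to the denominator and the numerator of the second term of Equation 129, is computed. From Equation 99, the relation of Equation 130 holds.
[0151]
[Formula 58]

[0152]
Since b_s = 0, the (s, s) element of P_{q−1} is 1. Hence P_{q−1} (c_s^Z)^t, common to the denominator and the numerator of the second term of Equation 129, equals (c_s^Z)^t, as shown in Equation 131.
[0153]
[Formula 59]

[0154]
The denominator of the second term is therefore 1, as shown in Equation 132.
[0155]
[Formula 60]

[0156]
Meanwhile, because P_{q−1}^t = P_{q−1}, the factor c_s^Z P_{q−1} in the numerator of the second term is the transpose of the other factor P_{q−1} (c_s^Z)^t, namely c_s^Z, as shown in Equation 133.
[0157]
[Formula 61]

[0158]
The numerator of the second term is therefore as shown in Equation 134: the M-row, M-column matrix whose (s, s) element is 1 and whose other elements are all 0.
[0159]
[Formula 62]

[0160]
That is, denoting its (i, j) element by p_ij, it can be expressed as in Equation 135.
[0161]
[Formula 63]

[0162]
H_{k+1} in step S320 can therefore be computed without computing a generalized inverse, which avoids both the working memory that computation would need and the inaccuracy it could introduce. Programming is also easier because no generalized-inverse routine has to be written.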
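The fourth simplification can be checked the same way. The sketch assumes that the second term of Equation 129 is P_{q−1} (a^s)^t a^s P_{q−1} / (a^s P_{q−1} (a^s)^t), as the derivation above describes, and that P_{q−1} = I_M − diag[b] with b_s = 0 after the release. The function names and 0-based indices are assumptions.

```python
def release_term_general(M, active_after, s, c):
    """P a^t a P / (a P a^t), where P = I - diag[b] and index s has been released."""
    P = [[1.0 if i == j and i not in active_after else 0.0 for j in range(M)]
         for i in range(M)]
    a = [float(c) if m == s else 0.0 for m in range(M)]
    Pa = [sum(P[i][j] * a[j] for j in range(M)) for i in range(M)]   # P a^t
    aPa = sum(a[i] * Pa[i] for i in range(M))                        # a P a^t
    # P is symmetric, so a P is the transpose of P a^t.
    return [[Pa[i] * Pa[j] / aPa for j in range(M)] for i in range(M)]


def release_term_simplified(M, s):
    """D_s: 1 at position (s, s) and 0 elsewhere."""
    return [[1.0 if i == j == s else 0.0 for j in range(M)] for i in range(M)]


M, s = 4, 3
T = release_term_general(M, active_after={1}, s=s, c=-1)   # index 1 stays active
D = release_term_simplified(M, s)
```

`T` equals `D`, so adding the term amounts to setting one diagonal contribution, with no matrix product or inverse required.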
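[0163]
As described above, the Goldfarb method is applied with an upper limit set on the absolute values of the synaptic weights, and learning is performed within that range, in order to limit the dynamic range of the synaptic weights of the neural network 12 executed on a fixed-point digital arithmetic unit. Under the restriction that each coefficient vector a^i of the constraints in Equation 72 is a 1-row, M-column row vector whose l_i-th element is −1 or 1 and whose other elements are all 0, the complicated generalized-inverse computation becomes unnecessary, so the learning program for the neural network 12 is easier to write. Computation time and memory usage are also reduced. As a further effect, although the generalized-inverse computation can be inaccurate because of numerical problems such as loss of significance, the simplifications described above avoid that problem and give a more accurate numerical solution.
[0164]
In this embodiment, the simplified formulas were derived and proved for the case where a common upper limit B is set on the absolute value of each synaptic weight w_i (i = 1, 2, …, M), that is, |w_i| ≤ B. These formulas are equally valid when a separate upper limit B_i^U and lower limit B_i^L are set for each synaptic weight w_i, that is, B_i^L ≤ w_i ≤ B_i^U. Although this embodiment describes a layered neural network, the method is also applicable to other models (recurrent neural networks and the like) as long as the network can be trained by a descent method.
[0165]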
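[Experimental example]
Experimental results showing the effect of applying the present invention to airflow control of an automatic car air conditioner are described below. The experiment compared learning with the four simplifications described above (the "Example") against learning with the conventional Goldfarb method using the BFGS formula (the "conventional method"), and evaluated the neural network output error under fixed-point arithmetic.
[0166]
A. Procedure
The application, the neural network configuration, and the other conditions used for the comparison are as follows. The automatic car air conditioner is abbreviated A/C.
(a) Application
A/C outlet control (switching among FACE, B/L, FOOT, etc.)
(b) Input/output specifications
The network has four inputs and one output. Table 1 gives the specification of each input and Table 2 that of each output. Of the input sensor value ranges, the neural network handles the portion that remains after excluding the regions that can be handled in the program by simple if-then rules. Each sensor signal is normalized to [0, 1] over its sensor value range before being input to the neural network.
[0167]
[Table 1]

[0168]
[Table 2]

[0169]
The A/C switches modes according to the outlet mode output signal as follows.
[0170]
[Table 3]

[0171]
The allowable error of the outlet mode is ±0.1, but the error must be made as small as possible so that the mode switches reliably at the mode switching points.
(c) Number of teacher patterns
5915 (of which 3944 were used for learning)
(d) Neural network configuration
Four-layer neural network (input layer 4 units, first intermediate layer 8 units, second intermediate layer 8 units, output layer 1 unit)
Input units are linear units; intermediate and output units are sigmoid units
(e) Evaluation method
Step 1. For the conventional method and the Example (upper limit B = 64 and 128), learning was performed 20 times each with different initial values. The initial value of each neural network coefficient is a random number in the range (−1, 1). Each trial runs 1000 learning cycles.
[0172]
Step 2. Each neural network is computed in floating-point arithmetic and in fixed-point arithmetic, and the sum of squared errors E over all patterns is calculated as in Equation 136 and compared.
[0173]
[Formula 64]

[0174]
B. Experimental results
Table 4 shows the computed sum of squared errors E. Neural networks with the same trial number started learning from the same initial values. The conventional method gave the best results in floating-point arithmetic, with E nearly 0 in trials 3, 12, 13, and 17. In fixed-point arithmetic, however, precision dropped in some trials and E became abnormally large (trials 3, 10, and others). Comparing the maximum values of E, the conventional method reached 187.1, whereas the Example gave 1.372 and 1.079 for upper limits B = 64 and 128 respectively. This shows that, in fixed-point arithmetic, the Example depends less on the initial values of the neural network than the conventional method and shows less trial-to-trial variation in E.
[0175]
[Table 4]

[0176]
Next, for the conventional method and the Example, the three networks with the smallest E in fixed-point arithmetic were selected, and the maximum absolute value of the synaptic weights and the maximum absolute error between the teacher output and the neural network output were compared. Table 5 shows the results for the conventional method, Table 6 those for the Example with upper limit B = 64, and Table 7 those for the Example with upper limit B = 128.
[0177]
[Table 5]

[0178]
[Table 6]

[0179]
[Table 7]

[0180]
Comparing the minimum E, the Example was two orders of magnitude smaller and realized a more accurate input/output function. The maximum absolute error was also smaller for this embodiment than for the conventional method, showing superior performance. The errors of all learning methods were within the allowable ±0.1, but with the conventional method a large squared error E occurred near output values 0.3, 0.4, and 0.5. Since 0.3 and 0.5 in particular are mode switching points, this neural network cannot be used for control. With the Example, by contrast, no large squared error E at particular output values occurred for either upper limit B = 64 or 128.
[0181]
Table 8 compares the sum of squared errors E of the Example and the conventional method under the two types of arithmetic.
[0182]
[Table 8]

[0183]
As Table 8 shows, E hardly differs between floating-point and fixed-point arithmetic for the Example, whereas the conventional method shows an extremely large difference. The conventional method is therefore unsuitable for arithmetic units that perform fixed-point arithmetic, which are commonly used in ECUs and similar devices.
[0184]
Fig. 7 compares the control curves of the Example and the conventional method. In the figure, the teacher output is the control curve to be realized and the neural network output is the control curve produced by the neural network. (a) shows the result of the Example and (b) that of the conventional method.
With the conventional method a large squared error E occurred at output values 0.3, 0.4, and 0.5, whereas with the Example the teacher output curve and the neural network output curve nearly coincide. This confirms the effect of the Example.
[0185]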
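[Other]
In the embodiment described above, the learning control unit 14 loaded into RAM the program stored in the standard pattern storage unit 18, which is a hard disk, and executed the neural network learning process. Alternatively, the program may be recorded on a computer-readable recording medium such as a floppy disk, magneto-optical disk, or CD-ROM and used by loading it into the computer system and starting it as needed. The program may also be recorded in ROM or backup RAM serving as the computer-readable recording medium, and that ROM or backup RAM may be built into the computer system.
[Brief description of the drawings]
[Fig. 1] A block diagram showing the schematic configuration of a neural network learning system as one embodiment.
[Fig. 2] An explanatory diagram of the learning process performed on a neural network by the neural network learning system.
[Fig. 3] An explanatory diagram of the learning process performed by the neural network learning system on a neural network for automatic car air conditioner control.
[Fig. 4] A flowchart of the neural network learning process performed by the neural network learning system.
[Fig. 5] A flowchart of the neural network learning process performed by the neural network learning system.
[Fig. 6] A flowchart of the neural network learning process performed by the neural network learning system.
[Fig. 7] An explanatory diagram showing the learning effect of the Example and the conventional method.
[Fig. 8] An explanatory diagram of the transition of the sum of squared errors E in conventional learning.
[Explanation of symbols]
2 … neural network learning system, 12 … neural network, 12a … unit
14 … learning control unit, 18 … standard pattern storage unit
18a … teacher pattern database, 18b … standard input signal
18c … output signal, 18d … teacher signal
18e … synaptic weight update command signal
[0001]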
BACKGROUND OF THE INVENTION
The present invention relates to a simplified quasi-Newton projection method computing system, a neural network learning system using this system, a recording medium recording a program for realizing these systems on a computer system, and a signal processing apparatus.
[0002]
[Prior art]
Neural networks are widely applied to pattern recognition, data processing, and similar tasks. A neural network acquires its processing capability through repeated learning, and several methods of changing the synaptic weights have been proposed to speed up learning and to improve the capability obtained after learning.
[0003]
A neural network consists of an input layer, intermediate layers, and an output layer, each made up of units, together with the synapses that connect the layers. Each synapse has a weight called a synaptic weight, and various input/output characteristics can be realized by changing these weights through learning. In the following, the total number of synapses is M and the synaptic weights are w₁, w₂, …, w_M. They are also written as w = [w₁, w₂, …, w_M]^t.
[0004]
Learning in a neural network is the process of computing the network output signal when a teacher input signal is fed to the network, comparing this output signal with the teacher output signal, and changing the synaptic weights w₁, w₂, …, w_M based on the comparison so that the error between the teacher output signal and the network output signal, for example the sum of squared errors E(w), is minimized.
[0005]
The minimum is generally computed by a descent method such as backpropagation (McClelland, J. L., Rumelhart, D. E., and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, MIT Press, Chapter 8, 1986).
[0006]
The computation steps of backpropagation are as follows. Here k is the update count and k_max is its upper limit. FIG. 8(a) shows a schematic of the descent method. (Strictly speaking, the steps below find a local minimum rather than the global minimum, but this makes no essential difference to the following description.)
Step 1: Set k = 0 and assign the initial value w^k to the synaptic weights of the neural network.
[0007]
Step 2: Compute the gradient ∇E(w^k) of E(w^k) at w^k. If ∇E(w^k) = 0, jump to Step 4.
Step 3: Find a new point w^{k+1} satisfying E(w^{k+1}) < E(w^k). Assign the value of w^{k+1} to w^k as the new w^k; if k < k_max, return to Step 2. If k = k_max, go to Step 4.
[0008]
Step 4: Take w^k as the solution.
The example of FIG. 8(a) shows how learning proceeds on the error surface 301 when the initial value w⁰ is given. Here the value w^k after k updates has converged to the minimum.
[0009]
In some applications, however, the minimum lies at infinity in weight space. When such a case is learned, the absolute values of some synaptic weights keep growing, exceeding 1000 for example. FIG. 8(b) shows an example. Such a neural network has a large dynamic range of synaptic weights: floating-point arithmetic on a digital arithmetic unit gives the correct input/output characteristics, but fixed-point arithmetic incurs a large quantization error and the desired characteristics are not obtained. Consumer products use fixed-point CPUs for cost and other reasons, so a neural network whose synaptic weights become excessively large cannot be embedded in them.
[0010]
For example, consider computing a neural network in fixed-point arithmetic with a 16-bit word length. [sxxxxxxx.xxxxxxxx] denotes a data type that assigns 8 bits to the fractional part, where s is the sign bit and each x is a data bit. The resolution of the numbers representable in this type is 1/2⁸ = 0.00390625, and the range is [−2^(16−1)/2⁸, (2^(16−1)−1)/2⁸] = [−128, 127.99609375].
[0011]
When a neural network is implemented in fixed-point arithmetic, the synaptic weights w = [w₁ w₂ … w_M]^t are represented in a fixed-point data type. The data type is determined by the largest absolute value among the synaptic weights. If that value is 1000, for example, the integer part needs 10 bits to store it and only 5 bits remain for the fractional part, giving [sxxxxxxxxxx.xxxxx]. The resolution is then 1/2⁵ = 0.03125 and the range is [−2^(16−1)/2⁵, (2^(16−1)−1)/2⁵] = [−1024, 1023.96875]. This lowers the arithmetic precision and increases the quantization error.
[0012]
As a way of reducing this quantization error, methods are known that impose upper and lower limits on the synaptic weights during backpropagation learning (JP-A-7-152716, JP-A-7-44515, JP-A-2-143384). However, the nature of backpropagation makes it difficult to apply techniques that raise computation speed, and the resulting speed was insufficient.
[0013]
Another way of suppressing growth in the absolute values of the synaptic weights is the penalty function method (Michael A. Arbib, The Handbook of Brain Theory and Neural Networks, MIT Press, p. 643, p. 992). This method minimizes the function G(w) = E(w) + μ × F(w), the sum of the squared error sum E(w) and a penalty term F(w) that is a function of the squares of the synaptic weights. The coefficient μ is a parameter that sets the relative importance of E(w) and F(w).
[0014]
The penalty function method, however, requires the parameter μ to be set by trial and error, and obtaining a suitable solution took a long time.
[0015]
[Problems to be solved by the invention]
A method that avoids these problems is to set an upper limit on the absolute value of each synaptic weight and to learn within that limit. The Goldfarb method, a quasi-Newton projection method, can be used for this (see, for example, Hiroshi Konno and Hiroshi Yamashita, Nonlinear Programming, Nikkagiren, pp. 264-267).
[0016]
The Goldfarb method, however, requires complicated computations such as generalized inverse matrices, which makes it difficult to program. Computing a generalized inverse also takes a long time and needs a fairly large working memory area. In addition, numerical problems such as loss of significance can prevent a digital arithmetic unit from computing the generalized inverse accurately, and when that happens the result becomes inaccurate.
[0017]
An object of the present invention is to provide a simplified quasi-Newton projection method computation system that, when neural network learning or the like is performed by a quasi-Newton projection method using fixed-point arithmetic on a digital arithmetic unit, shortens computation time, reduces memory consumption, and yields accurate results. Further objects are to provide a neural network learning system that uses this computation system, a recording medium storing a program that implements these systems on a computer system, and a signal processing apparatus incorporating a neural network obtained by the learning process of the neural network learning system.
[0018]
[Means for Solving the Problems and Effects of the Invention]
The simplified quasi-Newton projection method arithmetic system of the present invention uses a digital arithmetic unit that performs fixed-point arithmetic, and uses a function E (w consisting of M variables w expressed by Equation 1 and satisfying the constraints of Equation 2. When obtaining the solution of the variable w for which) is the minimum value, the quasi-Newton projection method described above is basically used.
[0019]
That is,
A first processing means for performing a straight line search to obtain a value of a variable w for reducing a value of the function E (w);
When the new constraint condition of Formula 2 based on the value of the new variable w becomes effective after the processing of the first processing means, the coefficient vector a of the newly enabled constraint condition^rIs a matrix A composed of coefficient vectors whose constraints are valid_qAnd a transpose matrix 勾配 representing the gradient of the function E (w)^tThe Hessian H for obtaining the vector d representing the change direction of the function E (w) from E (w) is updated by Equation 3 and the processing is returned to the first processing means, and a new constraint based on the value of the new variable w A second processing means for updating the new Hessian H and returning the processing to the first processing means in a formula for creating a new Hessian H if the condition is not valid;
In the first processing means, the transposed matrix ∇^tWhen the vector d representing the change direction of the function E (w) obtained based on the product of E (w) and the Hessian H becomes zero, the Lagrange multiplier λ obtained in the matrix based on Equation 4 If all the elements of are non-negative, w is obtained as a solution and the whole process is terminated. If the vector d is not zero, the absolute value of the negative elements of the Lagrange multiplier λ is maximized. Corresponding constraint coefficient vector a^s, The matrix A_qAnd the third processing means for updating the Hessian H based on Expression 5 and returning the processing to the first processing means, thereby performing the calculation by the quasi-Newton projection method.
[0020]
In the processing by the quasi-Newton projection method, a set I having q integers different from the integers 1 to M as elements._cIs expressed as shown in Equation 6, and each l_iA vector of 1 row and M columns for (i = 1, 2,..., Q),_iElement of c_liAnd all other elements are 0, and c_liRepresents a vector defined by +1 or −1 by the symbol of Equation 7, and a q-by-M matrix A_qIs expressed as shown in Equation 8, mεI instead of the calculation of the matrix shown in Equation 9 in the calculation of Equation 4._cThen b_m= 1, m∈I_cIf not b_mB where = 0_mIs a diagonal matrix diag [b₁ b₂ ... b_M] Is used to simplify the quasi-Newton projection method.
[0021]
[Formula 6]

[0022]
This simplification eliminates the need to calculate the generalized inverse matrix shown in Equation 9. Therefore, the calculation time is not lengthened, and the working memory area used for the calculation can be small. Furthermore, since there is no problem in numerical analysis such as digit loss, accurate calculation can be performed.
Similarly, in a system that performs the calculation by the quasi-Newton projection method and comprises:
a first processing means for performing a line search to obtain a value of the variable w that reduces the value of the function E(w);
a second processing means which, after the processing of the first processing means, when a new constraint condition of Expression 12 becomes valid at the new value of the variable w, adds the coefficient vector a^r of the newly valid constraint condition to the matrix A_q composed of the coefficient vectors of the valid constraint conditions, updates by Equation 13 the Hessian H used to obtain the vector d representing the change direction of the function E(w) from the transposed gradient ∇^tE(w), and returns the processing to the first processing means, and which, when no new constraint condition becomes valid at the new value of the variable w, updates the Hessian H by a formula for creating a new Hessian H and returns the processing to the first processing means; and
a third processing means which, when the vector d obtained in the first processing means from the product of the transposed gradient ∇^tE(w) and the Hessian H becomes zero, obtains the Lagrange multiplier λ by the matrix calculation of Equation 14, takes w as the solution and terminates the whole process if all the elements of λ are non-negative, and otherwise removes from the matrix A_q the constraint coefficient vector a^s corresponding to the negative element of λ with the largest absolute value, updates the Hessian H based on Equation 15, and returns the processing to the first processing means,
the following simplification may be used.
[0023]
That is, let h_ij denote the element in the i-th row and j-th column of the Hessian H, and let each coefficient vector a^r of the constraint conditions have +1 or −1 as its r-th element and 0 as all other elements. Then, in the calculation of Equation 13, the matrix calculation expressed by Equation 16 is replaced by the calculation of a matrix of M rows and M columns whose element in the i-th row and j-th column is expressed by Expression 17.
[0024]
[Expression 7]

[0025]
By this simplification, Expression 16 that requires five matrix multiplications is simplified as Expression 17. Therefore, the calculation time is not lengthened, and the working memory area used for the calculation can be small. Furthermore, since there is no problem in numerical analysis such as digit loss, accurate calculation can be performed.
[0026]
Similarly, in a system that performs the calculation by the quasi-Newton projection method and comprises:
a first processing means for performing a line search to obtain a value of the variable w that reduces the value of the function E(w);
a second processing means which, after the processing of the first processing means, when a new constraint condition of Expression 22 becomes valid at the new value of the variable w, adds the coefficient vector a^r of the newly valid constraint condition to the matrix A_q composed of the coefficient vectors of the valid constraint conditions, updates by Equation 23 the Hessian H used to obtain the vector d representing the change direction of the function E(w) from the transposed gradient ∇^tE(w), and returns the processing to the first processing means, and which, when no new constraint condition becomes valid at the new value of the variable w, updates the Hessian H by a formula for creating a new Hessian H and returns the processing to the first processing means; and
a third processing means which, when the vector d obtained in the first processing means from the product of the transposed gradient ∇^tE(w) and the Hessian H becomes zero, obtains the Lagrange multiplier λ by the matrix calculation of Equation 24, takes w as the solution and terminates the whole process if all the elements of λ are non-negative, and otherwise removes from the matrix A_q the constraint coefficient vector a^s corresponding to the negative element of λ with the largest absolute value, updates the Hessian H based on Expression 25, and returns the processing to the first processing means,
the following simplification may be used.
[0027]
That is, a set I_c whose elements are q distinct integers from 1 to M is expressed as shown in Equation 26; for each integer l_i (i = 1, 2, ..., q) of the set I_c, c_li denotes a vector of 1 row and M columns whose l_i-th element is +1 or −1, according to the sign in Equation 27, and whose other elements are all 0; the matrix A_q is expressed as a matrix of q rows and M columns as shown in Equation 28; and ∇E(w^k) is expressed as shown in Expression 29. Then, in the calculation of Expression 24, the calculation expressed by Expression 31 is used in place of the matrix calculation expressed by Expression 30.
[0028]
[Equation 8]

[0029]
This simplification eliminates the need to calculate the generalized inverse matrix shown in Equation 30. Therefore, the calculation time is not lengthened, and the working memory area used for the calculation can be small. Furthermore, since there is no problem in numerical analysis such as digit loss, accurate calculation can be performed.
Similarly, in a system that performs the calculation by the quasi-Newton projection method and comprises:
a first processing means for performing a line search to obtain a value of the variable w that reduces the value of the function E(w);
a second processing means which, after the processing of the first processing means, when a new constraint condition of Equation 42 becomes valid at the new value of the variable w, adds the coefficient vector a^r of the newly valid constraint condition to the matrix A_q composed of the coefficient vectors of the valid constraint conditions, updates by Equation 43 the Hessian H used to obtain the vector d representing the change direction of the function E(w) from the transposed gradient ∇^tE(w), and returns the processing to the first processing means, and which, when no new constraint condition becomes valid at the new value of the variable w, updates the Hessian H by a formula for creating a new Hessian H and returns the processing to the first processing means; and
a third processing means which, when the vector d obtained in the first processing means from the product of the transposed gradient ∇^tE(w) and the Hessian H becomes zero, obtains the Lagrange multiplier λ by the matrix calculation of Equation 44, takes w as the solution and terminates the whole process if all the elements of λ are non-negative, and otherwise removes from the matrix A_q the constraint coefficient vector a^s corresponding to the negative element of λ with the largest absolute value, updates the Hessian H based on Expression 45, and returns the processing to the first processing means,
the following simplification may be used.
[0030]
That is, a set I_c whose elements are q distinct integers from 1 to M is expressed as shown in Equation 46; for each integer l_i (i = 1, 2, ..., q) of the set I_c, c_li denotes a vector of 1 row and M columns whose l_i-th element is +1 or −1, according to the sign in Equation 47, and whose other elements are all 0; and the matrix A_q is expressed as a matrix of q rows and M columns as shown in Expression 48. Then, in the calculation of Expression 45, instead of calculating the matrix expressed by Expression 49, a matrix of M rows and M columns whose element in the s-th row and s-th column is 1 and whose other elements are all 0 is used.
[0031]
[Equation 9]

[0032]
This simplification eliminates the need for matrix calculations. Therefore, the calculation time is not lengthened, and the working memory area used for the calculation can be small. Furthermore, since there is no problem in numerical analysis such as digit loss, accurate calculation can be performed.
Further, all of these simplifications may be used, which is more effective.
[0033]
Examples of the formula used in the second processing means include a BFGS formula, a DFP formula, and a symmetric rank 1 formula.
The M variables w may represent the synaptic loads of the M synapses that connect the units from the input layer to the output layer of a neural network, and the function E(w) may represent the error between the teacher signal given to the neural network and the output of the neural network. In that case, the process by which the first, second, and third processing means calculate the solution of the variable w that minimizes the function E(w) serves as the learning process of the neural network, and the simplified quasi-Newton projection method computing system described above can be applied as a neural network learning system.
[0034]
As described above, it is possible to create a highly accurate neural network by learning in a short time without causing a memory shortage.
In addition, the functions that implement the respective means of such a simplified quasi-Newton projection method computing system and neural network learning system on a computer system can be provided, for example, as a program started on the computer system side. Such a program can be recorded on a computer-readable recording medium such as a floppy disk, a magneto-optical disk, a CD-ROM, or a hard disk, and used by loading it into the computer system and starting it as necessary. Alternatively, the program may be recorded in a ROM or backup RAM as the computer-readable recording medium, and the ROM or backup RAM may be incorporated into the computer system for use.
[0035]
The neural network obtained by the learning processing of the neural network learning system described above can be incorporated into a signal processing device, which then processes signals from the input layer units to the output layer units based on the synaptic loads of the M synapses. To incorporate the neural network into such a signal processing device, the device is provided with, for example, signal input means for inputting the signal to be processed to the input layer units of the neural network, and signal output means for reading the state of the output layer units of the neural network and outputting it as a signal.
[0036]
Because the learning processing by the neural network learning system runs quickly even on an inexpensive digital computer and yields accurate learning results, the signal processing device can also produce highly accurate output.
[0037]
DETAILED DESCRIPTION OF THE INVENTION
FIG. 1 is a block diagram showing a schematic configuration of a neural network learning system 2 to which the above-described invention is applied.
The neural network learning system 2 includes a neural network 12, a learning control unit 14, and a standard pattern storage unit 18. The neural network 12 is implemented using a rewritable memory such as a RAM or an EEPROM. The learning control unit 14 is configured as a computer device whose central CPU is a digital arithmetic device. The learning control unit 14 loads a program stored in the standard pattern storage unit 18, which is configured as a hard disk, into the RAM together with other data, and executes the neural network learning process described later.
[0038]
The neural network learning system 2 performs learning processing on the neural network 12. In this learning process, as shown in FIG. 2, the learning control unit 14 forms a standard input signal 18b from a standard pattern in the teacher pattern database 18a provided in the standard pattern storage unit 18 and outputs it to the neural network 12. The learning control unit 14 compares the output signal 18c that the neural network 12 produces in response to the standard input signal 18b with the teacher signal 18d formed from the standard pattern in the teacher pattern database 18a. Based on the comparison result, the learning control unit 14 outputs a synaptic load update command signal 18e to the neural network 12. In response to this synaptic load update command signal 18e, the neural network 12 adjusts the synaptic loads of the units 12a, here M synaptic loads. By repeating this process, the neural network 12 is trained.
[0039]
As an example of the neural network 12 trained in this way, FIG. 3 shows a learning example of the neural network 12 for use in controlling an auto car air conditioner. Signals from sensors that detect the operating state of the auto car air conditioner serve as the standard input signal 18b. In this neural network 12, signals representing the target blowing temperature, the amount of solar radiation, the inside air temperature, and the outside air temperature are input, and an air flow level is output as the output signal. The learning control unit 14 compares the air flow level output signal with the teacher signal and issues a command to update the synaptic loads. Learning is performed by repeating this process.
[0040]
If appropriate synaptic loads are obtained as a result of this learning, the neural network is incorporated into, for example, an electronic control unit (ECU) mounted on an automobile. By inputting the signals of the target blowing temperature, solar radiation amount, inside air temperature, and outside air temperature detected by the sensors of the auto car air conditioner and outputting the air volume level, the air volume of the auto car air conditioner can be controlled.
[0041]
In the learning process described above, the synaptic load update command signal, that is, the amount of fluctuation of the M synaptic loads is determined by the neural network learning system 2 configured as a simplified quasi-Newton projection method computing system.
Next, the neural network learning process performed in the neural network learning system 2 will be described. Flowcharts of the neural network learning process are shown in FIGS. This neural network learning process is based on a simplified quasi-Newton projection method using the Goldfarb method (for example, Hiroshi Konno, Hiroshi Yamashita, Nonlinear Programming, Nikkagiren, p.264-267).
[0042]
The Goldfarb method is a method for solving an optimization problem with linear constraints. For the neural network 12 described above, an optimization problem with linear constraints is formulated as follows.
That is, with the M synaptic loads w₁, w₂, ..., w_M as variables, the function E(w) of the square error sum described above is expressed as shown in Expression 71, and the method obtains the minimum value of this function E(w) subject to the K inequalities expressed by Expression 72. Here, w is defined by Expression 73.
[0043]
[Expression 10]

[0044]
Here we define some terms and symbols.
(1) Valid constraints and I(w)
A constraint condition for which aⁱw − bⁱ = 0 holds at a given w is referred to as a valid constraint condition (hereinafter “valid constraint”). The set of their numbers i is represented by I(w) as shown in Equation 74.
[0045]
[Expression 11]

[0046]
(2) A set of points satisfying all the constraint conditions is called an allowable region, and is represented by a symbol S as shown in Expression 75.
[0047]
[Expression 12]

[0048]
(3) Let q be the number of valid constraints at w, and let A_q denote the matrix of q rows and M columns whose rows are the valid constraint coefficient vectors aⁱ (the coefficient vectors aⁱ whose constraints are valid). When I(w) is expressed by Equation 76, A_q is expressed as in Equation 77.
[0049]
[Formula 13]

[0050]
(4) The M × M unit matrix is denoted I_M.
(5) The matrix P_q of M rows and M columns is defined by Expression 78. Equation 79, which appears in its second term, is a generalized inverse matrix of A_q.
[0051]
[Expression 14]

[0052]
(6) In the Goldfarb method, a matrix of M rows and M columns called the Hessian is used to find a point w^{k + 1} that satisfies the condition expressed by Expression 80. The Hessian at the k-th update is denoted H_k. The Hessian H_k is always a symmetric matrix, and the relationship expressed by Equation 81 holds.
[0053]
[Expression 15]

[0054]
(7) ∇E(w^k), expressed as in Equation 82, is the gradient of the function E(w) at the point w^k. It is a row vector of 1 row and M columns. ∇^tE(w^k) is its transpose, a column vector of M rows and 1 column.
[0055]
[Expression 16]

[0056]
(8) A diagonal matrix of M rows and M columns whose diagonal elements are a₁, a₂, ..., a_M is expressed as shown in Equation 83.
[0057]
[Expression 17]

[0058]
When the neural network learning process is started, k is first set to k = 0, and a synaptic load w⁰ satisfying w⁰ ∈ S is selected as the initial value (S100). Next, the set I(w⁰) of valid constraint numbers and the matrix A_q are obtained (S102).
P_q is then obtained from Equation 84 (S104).
[0059]
[Formula 18]

[0060]
Here, diag[b₁ b₂ ... b_M] represents a diagonal matrix whose diagonal components are b_m, where b_m = 1 if m ∈ I(w⁰) and b_m = 0 otherwise.
Next, P_q is set as it is as the initial content of the Hessian H (S106).
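Steps S100 to S106 can be sketched as follows. This is only an illustrative sketch, assuming box constraints |w_i| ≦ B of the kind used in the embodiment described later; the function name `initial_projection`, the tolerance `tol`, the 0-based indices, and the list-of-lists matrix layout are assumptions and do not come from the patent.

```python
def initial_projection(w, bound, tol=1e-12):
    """Return (active, P) for box constraints |w_i| <= bound.

    active: 0-based indices m whose constraint holds with equality,
            playing the role of the set I(w).
    P:      M-by-M matrix I_M - diag[b_1 ... b_M], with b_m = 1 for
            every active index m and b_m = 0 otherwise.
    """
    M = len(w)
    active = [m for m in range(M) if abs(abs(w[m]) - bound) <= tol]
    b = [1.0 if m in active else 0.0 for m in range(M)]
    P = [[(1.0 - b[i]) if i == j else 0.0 for j in range(M)]
         for i in range(M)]
    return active, P


# Hypothetical initial point: components 1 and 3 sit on the bound B = 64.
w0 = [0.3, 64.0, -0.7, -64.0]
active, P0 = initial_projection(w0, bound=64.0)
H0 = [row[:] for row in P0]  # S106: the Hessian starts out equal to P_q
```

No generalized inverse appears anywhere in this sketch; the projection matrix is filled in directly from the list of active indices.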
[0061]
Next, the learning control unit 14 outputs the synaptic load update command signal 18e to the neural network 12, and the actual synaptic loads of the neural network 12 are set to w⁰ (S110).
Next, the standard input signal 18b of a standard pattern from the standard pattern storage unit 18 is input to the input layer units 12a of the neural network 12, and the output signal 18c from the output layer unit 12a of the neural network 12 is recorded in the memory of the learning control unit 14 (S120).
[0062]
Next, E(w^k) is calculated (S130), and ∇E(w^k) is calculated (S134).
E(w^k) is the square error sum between the teacher signal 18d of the standard patterns, denoted t, and the output signal 18c of the neural network 12, denoted o, as shown in Expression 85.
[0063]
[Equation 19]

[0064]
Here, N is the number of units in the output layer, and P is the number of standard patterns.
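The square error sum over P patterns and N output units can be sketched as below. Expression 85 itself is not reproduced in this text, so the plain sum without any constant factor (such as 1/2) is an assumption, as are the function name `squared_error_sum` and the nested-list layout.

```python
def squared_error_sum(teacher, output):
    """Sum of squared errors over P patterns and N output units.

    teacher[p][n] is the teacher signal t and output[p][n] the network
    output o for pattern p and output unit n.
    """
    return sum((t - o) ** 2
               for t_row, o_row in zip(teacher, output)
               for t, o in zip(t_row, o_row))


# Hypothetical data: P = 2 patterns, N = 1 output unit.
teacher = [[1.0], [0.0]]
output = [[0.5], [0.5]]
error = squared_error_sum(teacher, output)  # 0.25 + 0.25
```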
∇E(w^k) is defined as shown in Equation 86.
[0065]
[Expression 20]

[0066]
Next, the vector d^k representing the direction of change of E(w^k) is obtained from ∇^tE(w^k) and the Hessian H_k (S140).
[0067]
[Expression 21]

[0068]
Then it is determined whether d^k = 0 (S150). If d^k = 0 does not hold (“NO” in S150), a new synaptic load w^{k + 1} is set by line search (S190). The line search is performed by the calculation shown in Expression 88.
[0069]
[Expression 22]

[0070]
Here, the coefficient α_k is set so that α_k > 0 and w^{k + 1} ∈ S.
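The search direction and the largest step that keeps w inside the allowable region can be sketched as follows. Expressions 87 and 88 are not reproduced in this text, so the forms d = −H∇^tE and w + αd used here are the usual quasi-Newton conventions and are assumptions; the function names and the common bound `bound` are likewise illustrative.

```python
def search_direction(H, grad):
    """d = -H * grad (assumed sign convention for a descent direction)."""
    return [-sum(h * g for h, g in zip(row, grad)) for row in H]


def max_feasible_step(w, d, bound):
    """Largest alpha >= 0 with |w_i + alpha * d_i| <= bound for all i."""
    alpha = float("inf")
    for wi, di in zip(w, d):
        if di > 0:
            alpha = min(alpha, (bound - wi) / di)
        elif di < 0:
            alpha = min(alpha, (-bound - wi) / di)
    return alpha


# Hypothetical 2-variable example with bound B = 4.
H = [[1.0, 0.0], [0.0, 1.0]]
grad = [-2.0, 1.0]
w = [1.0, 0.0]
d = search_direction(H, grad)
alpha_max = max_feasible_step(w, d, bound=4.0)
```

A line search would then try step sizes in the interval (0, alpha_max]; a step equal to alpha_max places one component of w exactly on its bound, which is the situation in which a constraint newly becomes valid.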
Next, the newly set synaptic load w^{k + 1} is set as the synaptic load of the neural network 12, the standard input signal 18b is input to the input layer units 12a of the neural network 12, and E(w^{k + 1}) is calculated by Equation 85 from the output signal 18c of the output layer unit 12a and the teacher signal 18d (S200). Then, it is determined whether or not Expression 89 is satisfied (S210).
[0071]
[Expression 23]

[0072]
If Expression 89 is not satisfied (“NO” in S210), the process returns to step S190, the line search is continued, and E(w^{k + 1}) is calculated again.
If the line search results in Expression 89 being satisfied (“YES” in S210), it is next determined whether any of the constraint conditions shown in Expression 72 has newly become a valid constraint (S220). If there is no newly valid constraint (“NO” in S220), H_k is updated by the BFGS formula (S230).
[0073]
The calculation according to the BFGS formula is performed as shown in Equation 90, where s^k = w^{k + 1} − w^k and r^k = ∇^tE(w^{k + 1}) − ∇^tE(w^k).
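For orientation, the textbook BFGS update of an inverse-Hessian approximation is sketched below. Equation 90 is not reproduced in this text and may differ in detail, so this is the standard formula rather than the patent's own; the function name `bfgs_update` and the example values are assumptions.

```python
def bfgs_update(H, s, r):
    """Standard BFGS update of a symmetric inverse-Hessian approximation.

    s = w_next - w and r = grad_next - grad; requires s . r != 0.
    Uses the symmetry of H, so r^t H is taken as the transpose of H r.
    """
    n = len(s)
    sr = sum(si * ri for si, ri in zip(s, r))
    Hr = [sum(H[i][j] * r[j] for j in range(n)) for i in range(n)]
    rHr = sum(ri * hri for ri, hri in zip(r, Hr))
    scale = (1.0 + rHr / sr) / sr
    return [[H[i][j] + scale * s[i] * s[j]
             - (Hr[i] * s[j] + s[i] * Hr[j]) / sr
             for j in range(n)] for i in range(n)]


# Hypothetical step and gradient difference.
H = [[1.0, 0.0], [0.0, 1.0]]
s = [1.0, 2.0]
r = [0.5, 1.5]
H_next = bfgs_update(H, s, r)
# The updated matrix satisfies the secant condition H_next * r = s.
Hr_next = [sum(H_next[i][j] * r[j] for j in range(2)) for i in range(2)]
```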
[0074]
[Expression 24]

[0075]
On the other hand, if there is a newly valid constraint condition (“YES” in S220), a new Hessian H_{k + 1} is calculated by Expression 91 (S240).
[0076]
[Expression 25]

[0077]
Here, with h^k_ij denoting the element in the i-th row and j-th column of H_k, R_k is the matrix whose element in the i-th row and j-th column is expressed as shown in Expression 92.
[0078]
[Equation 26]

[0079]
Next, the newly valid constraint is added to the matrix A_q to form A_{q + 1} (S250), and the counter q indicating the number of valid constraints is incremented (S260).
Then, k is incremented (S270). The process of step S270 is also performed after the process of step S230 is completed.
[0080]
After step S270, it is determined whether k exceeds its upper limit k_max (S272). If it does not (“NO” in S272), a new ∇E(w^k) is calculated (S134), the vector d^k is obtained from the new Hessian H_k and ∇^tE(w^k) as shown in Equation 87 (S140), and, if d^k = 0 does not hold (“NO” in S150), the processing described above is repeated.
[0081]
If d^k = 0 (“YES” in S150), the Lagrange multiplier λ is calculated as shown in Equation 93 (S280). Here, I(w^k) is defined as shown in Formula 94 and ∇E(w^k) as shown in Equation 95.
[0082]
[Expression 27]

[0083]
Next, it is determined whether all the elements of the Lagrange multiplier λ are non-negative, that is, whether every element of λ satisfies λ ≧ 0 (S290). If not all elements of the Lagrange multiplier λ are non-negative (“NO” in S290), the current synaptic load w^k is not a solution, so the constraint a^s corresponding to the smallest element of λ, that is, the negative element with the largest absolute value (number s), is removed from A_q to form A_{q-1} (S300).
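The test in step S290 and the choice of the constraint to remove in step S300 can be sketched as follows. The function name `constraint_to_drop` and the convention of returning `None` when the current point is accepted are illustrative assumptions.

```python
def constraint_to_drop(lam):
    """Return the position of the most negative multiplier, or None.

    None means every element of lam is non-negative (or lam is empty),
    so the current point is accepted as the solution.
    """
    if not lam:
        return None
    pos = min(range(len(lam)), key=lambda i: lam[i])
    return pos if lam[pos] < 0 else None


# Hypothetical multipliers for three valid constraints.
drop = constraint_to_drop([0.2, -1.5, -0.3])
accept = constraint_to_drop([0.0, 2.0])
```

The returned position indexes the list of valid constraints; mapping it back to the constraint number s is left to the caller.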
[0084]
Then s is removed from I(w^k) (S310), and the new Hessian H_{k + 1} is calculated (S320).
[0085]
[Expression 28]

[0086]
Here, D_s is a matrix of M rows and M columns in which the element in the s-th row and the s-th column is 1 and all other elements are 0.
Then 1 is subtracted from q, corresponding to the removal of one element from A_q in step S300 (S330).
[0087]
Next, k is incremented (S340), and it is determined whether k exceeds its upper limit k_max (S342). If it does not (“NO” in S342), the process returns to the step of obtaining the vector d^k (S140). Thereafter, as long as “NO” is determined in step S150 or step S290, the above-described processing is repeated and learning continues.
[0088]
If it is determined in step S290 that all elements of the Lagrange multiplier λ are non-negative (“YES” in S290), the w^k set at this time is recorded as the solution (S350), and the learning process ends. Likewise, if k > k_max is determined in step S272 or step S342, the w^k set at this time is recorded as the solution (S350), and the learning process ends.
[0089]
In the learning process described above, four simplifications are applied to the calculation. These four simplifications become possible only because upper and lower limits are set on the synaptic loads w of the neural network 12, so that each constraint coefficient vector aⁱ is restricted to a row vector of 1 row and M columns whose l_i-th element is -1 or 1 and whose other elements are all 0. For convenience, aⁱ is denoted c_li as shown in Equation 97.
[0090]
[Expression 29]

[0091]
Further, when Expression 97 is distinguished for each Z, it can be expressed as Expression 98.
[0092]
[Expression 30]

[0093]
The simplifications that follow from this restriction on the coefficient vector aⁱ are described below.
[First simplification]
P_q in step S104 is calculated by Equation 99.
[0094]
[Expression 31]

[0095]
Conventionally, P_q is calculated using a generalized inverse matrix, as shown in Equation 100.
[0096]
[Expression 32]

[0097]
For simplicity, a vector of 1 row and M columns whose m-th element is x and whose other elements are all 0 is denoted e_M^m(x). For e_M^m(x), Equation 101 holds.
[0098]
[Expression 33]

[0099]
Note that c_li^Z in Formula 97 can be expressed as in Equation 102.
[0100]
[Expression 34]

[0101]
Next, for i, j = 1, 2, ..., q, the element in the i-th row and j-th column of A_q A_q^t is c_li (c_lj)^t. Since l_i = l_j if i = j and l_i ≠ l_j if i ≠ j, Formula 103 holds.
[0102]
[Expression 35]

[0103]
Therefore, A_q A_q^t is the q-by-q unit matrix I_q, and (A_q A_q^t)^-1 is also the q-by-q unit matrix I_q, as expressed by Equation 104.
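As a small worked instance of this step, take M = 4 and q = 2 with l_1 = 2 (sign +1) and l_2 = 4 (sign −1). These values are illustrative assumptions, not taken from the patent. Each row of A_q has a single nonzero entry of magnitude 1 and the rows share no column, so the product is the unit matrix:

```latex
A_q =
\begin{pmatrix}
0 & 1 & 0 & 0 \\
0 & 0 & 0 & -1
\end{pmatrix},
\qquad
A_q A_q^{t} =
\begin{pmatrix}
1 & 0 \\
0 & 1
\end{pmatrix}
= I_2,
\qquad
\left( A_q A_q^{t} \right)^{-1} = I_2 .
```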
[0104]
[Expression 36]

[0105]
From Expression 104, Expression 105 is established.
[0106]
[Expression 37]

[0107]
Let dⁱ denote the i-th column vector of A_q (i = 1, 2, ..., M). It is as shown in Equation 106.
[0108]
[Formula 38]

[0109]
Then the element in the i-th row and j-th column of A_q^t A_q is (dⁱ)^t d^j.
Now, if Equation 107 holds, dⁱ = 0. Therefore, if Equation 107 or Equation 108 holds, (dⁱ)^t d^j = 0.
[0110]
[Expression 39]

[0111]
On the other hand, if Equation 109 holds, there exist numbers u and v such that i = l_u and j = l_v. dⁱ is a vector whose u-th element is c_lu and whose other elements are 0. That is, Expression 110 holds, and Expression 111 holds similarly.
[0112]
[Formula 40]

[0113]
Since u = v if i = j and u ≠ v if i ≠ j, Expression 112 is established.
[0114]
[Expression 41]

[0115]
From the above, (dⁱ)^t d^j = 1 only when i = j and i ∈ I_c(w^k), and is 0 otherwise. Therefore, when b_m is defined as shown in Equation 113, A_q^t A_q is expressed as in Equation 114.
[0116]
[Expression 42]

[0117]
That is, Formula 99 has been proved. Therefore, P_q in step S104 can be obtained without calculating the generalized inverse matrix, which avoids both the working memory that calculation would require and the inaccuracy it could introduce. Further, since no generalized inverse matrix routine needs to be written, creating the program becomes easier.
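The two facts used in this proof can be checked numerically with a short sketch. The helper names, the 0-based indices, and the example active set are assumptions made for illustration; the check covers only this one example and is not a substitute for the proof.

```python
def matmul(X, Y):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def constraint_matrix(M, active):
    """active: list of (index, sign) pairs; each row has one +1/-1 entry."""
    A = []
    for index, sign in active:
        row = [0.0] * M
        row[index] = float(sign)
        A.append(row)
    return A


M = 5
active = [(1, +1), (3, -1)]          # hypothetical valid constraints
A = constraint_matrix(M, active)
At = transpose(A)
AAt = matmul(A, At)                  # q-by-q, expected to be I_q
AtA = matmul(At, A)                  # M-by-M, expected to be diag[b]
q = len(A)
identity_q = [[1.0 if i == j else 0.0 for j in range(q)] for i in range(q)]
b = [1.0 if any(m == idx for idx, _ in active) else 0.0 for m in range(M)]
diag_b = [[b[i] if i == j else 0.0 for j in range(M)] for i in range(M)]
# P_q = I_M - diag[b], written down without any matrix inversion.
P = [[(1.0 if i == j else 0.0) - diag_b[i][j] for j in range(M)]
     for i in range(M)]
```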
[0118]
[Second simplification]
When H_{k + 1} is calculated in step S240, the calculation shown in Formula 115 is performed for each element of H_{k + 1}.
[0119]
[Equation 43]

[0120]
Conventionally, H_{k + 1} is calculated by a matrix calculation as shown in Expression 116.
[0121]
[Expression 44]

[0122]
Suppose that the update from w^k to w^{k + 1} results in w_r^{k + 1} = -B or w_r^{k + 1} = B. Then the constraint condition c_r^Z w^{k + 1} - B = 0 becomes a new valid constraint. Since a^r = c_r^Z, Expression 116 can be written as Expression 117.
[0123]
[Equation 45]

[0124]
Of these, H_k (c_r^Z)^t, which is common to the denominator and the numerator of the second term of Formula 117, is calculated as shown in Equation 118.
[0125]
[Equation 46]

[0126]
Therefore, the denominator of the second term of Equation 117 can be calculated as Equation 119.
[0127]
[Equation 47]

[0128]
Next, from the relationship H_k^t = H_k, c_r^Z H_k, which is one part of the numerator of the second term of Formula 117, equals the transpose of the other part of the numerator, H_k (c_r^Z)^t, as shown in Equation 120.
[0129]
[Formula 48]

[0130]
From the formula 118 and the formula 120, the numerator of the second term of the formula 117 is represented by the formula 121.
[0131]
[Formula 49]

[0132]
From the relationship between Equation 119 and Equation 121, the relationship of Equation 115 is obtained. Therefore, Equation 116, which requires five matrix multiplications, can be simplified to Equation 115. This avoids the working memory that the calculation of H_{k + 1} in step S240 would otherwise require and keeps the calculation from becoming inaccurate. Creating the program also becomes easier.
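The second simplification can be illustrated with the sketch below, which compares a general rank-one projection update with an element-wise form. Expressions 115 and 116 are not reproduced in this text, so both formulas here are assumptions: the general form is the usual Goldfarb update H − (H a^t)(a H)/(a H a^t), and the element-wise form h_ij − h_ir h_rj / h_rr is what it reduces to when a has ±1 at position r and 0 elsewhere. Function names and example values are hypothetical.

```python
def add_constraint_general(H, a):
    """H - (H a^t)(a H) / (a H a^t) for a row vector a."""
    n = len(a)
    Ha = [sum(H[i][k] * a[k] for k in range(n)) for i in range(n)]
    aH = [sum(a[k] * H[k][j] for k in range(n)) for j in range(n)]
    denom = sum(a[i] * Ha[i] for i in range(n))
    return [[H[i][j] - Ha[i] * aH[j] / denom for j in range(n)]
            for i in range(n)]


def add_constraint_simplified(H, r):
    """Element-wise form when a has +1 or -1 at index r and 0 elsewhere."""
    n = len(H)
    return [[H[i][j] - H[i][r] * H[r][j] / H[r][r] for j in range(n)]
            for i in range(n)]


# Hypothetical symmetric H; the constraint on component r = 1 becomes valid.
H = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 4.0]]
r = 1
a = [0.0, -1.0, 0.0]
general = add_constraint_general(H, a)
simplified = add_constraint_simplified(H, r)
```

The simplified version needs no matrix products and no temporary vectors, and row r and column r of the result are zero, so later search directions leave the constrained component unchanged.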
[0133]
[Third simplification]
When calculating the Lagrange multiplier λ in step S280, the calculation shown in Expression 122 is performed.
[0134]
[Equation 50]

[0135]
The conventionally known calculation of λ is a calculation including a generalized inverse matrix as shown in Expression 123.
[0136]
[Formula 51]

[0137]
Here, the relationship of the formula 124 is established from the relationship of the formula 104.
[0138]
[Formula 52]

[0139]
Therefore, the i-th element of λ is obtained as shown in Equation 125, and Equation 122 is proved.
[0140]
[Expression 53]

[0141]
Therefore, λ in step S280 can be calculated without calculating the generalized inverse matrix, which avoids both the working memory this part would require and the inaccuracy it could introduce. In addition, since no generalized inverse matrix routine needs to be written, creating the program becomes easier.
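The effect of the third simplification can be sketched as follows: because (A_q A_q^t)^-1 is the unit matrix, the product with the generalized inverse reduces to A_q times the gradient, and each element is just one gradient component multiplied by +1 or −1. Expressions 122 and 123 are not reproduced in this text, so any overall sign convention for λ is left out; the function names and values are assumptions.

```python
def product_by_matrix(A, grad):
    """A_q * grad^t, usable because (A_q A_q^t)^-1 is the unit matrix."""
    return [sum(a * g for a, g in zip(row, grad)) for row in A]


def product_by_selection(active, grad):
    """Same result, picking one gradient component per valid constraint."""
    return [sign * grad[index] for index, sign in active]


# Hypothetical valid constraints as (0-based index, sign) pairs.
active = [(1, +1), (3, -1)]
grad = [0.5, -2.0, 1.0, 4.0, 0.0]
A = [[0.0, 1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, -1.0, 0.0]]
by_matrix = product_by_matrix(A, grad)
by_selection = product_by_selection(active, grad)
```

The selection form costs q multiplications instead of a q-by-M matrix-vector product, and it needs no storage for A_q at all.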
[0142]
[Fourth simplification]
H_{k + 1} in step S320 is calculated by Equation 126.
[0143]
[Formula 54]

[0144]
Here, D_s is a matrix of M rows and M columns in which the element in the s-th row and the s-th column is 1 and all other elements are 0.
Conventionally, H_{k + 1} is calculated using a generalized inverse matrix, as shown in Expression 127.
[0145]
[Expression 55]

[0146]
Suppose that the constraint c_s^Z w^k − B = 0 is removed from the valid constraints. Since the condition of Expression 128 is then satisfied, b_s = 0.
[0147]
[Expression 56]

[0148]
Since a^s = c_s^Z, Expression 127 can be written as Expression 129.
[0149]
[Equation 57]

[0150]
First, P_{q-1} (c_s^Z)^t, which is common to the denominator and the numerator of the second term of Formula 129, is calculated. From Equation 99 above, the relationship of Equation 130 holds.
[0151]
[Formula 58]

[0152]
Here, since b_s = 0, the element in the s-th row and the s-th column of P_{q-1} is 1. As a result, P_{q-1} (c_s^Z)^t, which is common to the denominator and the numerator of the second term of Formula 129, is found to be equal to (c_s^Z)^t, as shown in Equation 131.
[0153]
[Formula 59]

[0154]
Therefore, the denominator of the second term is 1 as shown in Equation 132.
[0155]
[Expression 60]

[0156]
On the other hand, since P_{q-1}^t = P_{q-1}, c_s^Z P_{q-1}, which is one part of the numerator of the second term, is the transpose of the other part, P_{q-1} (c_s^Z)^t, and therefore equals c_s^Z, as shown in Equation 133.
[0157]
[Equation 61]

[0158]
Accordingly, the numerator of the second term is a matrix of M rows and M columns in which the elements in the s-th row and the s-th column are 1 and the other elements are all 0 as shown in the equation 134.
[0159]
[Expression 62]

[0160]
That is, when the element in the i-th row and the j-th column is denoted p_ij, it can be expressed as Equation 135.
[0161]
[Equation 63]

[0162]
Therefore, H_{k + 1} in step S320 can be obtained without calculating the generalized inverse matrix, which avoids both the working memory that calculation would require and the inaccuracy it could introduce. In addition, since no generalized inverse matrix routine needs to be written, creating the program becomes easier.
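The fourth simplification can be illustrated by comparing a general update with the addition of D_s. Expressions 126 and 129 are not reproduced in this text, so the forms used here are inferred from the derivation above and are assumptions: the general form is H + (P a^t)(a P)/(a P a^t) with P = P_{q-1}, and the simplified form is H + D_s. Function names, 0-based indices, and example values are hypothetical.

```python
def drop_constraint_general(H, P, a):
    """H + (P a^t)(a P) / (a P a^t) for a row vector a."""
    n = len(a)
    Pa = [sum(P[i][k] * a[k] for k in range(n)) for i in range(n)]
    aP = [sum(a[k] * P[k][j] for k in range(n)) for j in range(n)]
    denom = sum(a[i] * Pa[i] for i in range(n))
    return [[H[i][j] + Pa[i] * aP[j] / denom for j in range(n)]
            for i in range(n)]


def drop_constraint_simplified(H, s):
    """H + D_s: add 1 to the element in row s, column s."""
    H_new = [row[:] for row in H]
    H_new[s][s] += 1.0
    return H_new


# Hypothetical case: components 0 and 2 were constrained, 2 is released.
H = [[0.0, 0.0, 0.0],
     [0.0, 2.0, 0.0],
     [0.0, 0.0, 0.0]]
P_after = [[0.0, 0.0, 0.0],     # I_M - diag[b] with b = (1, 0, 0)
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0]]
s = 2
a = [0.0, 0.0, -1.0]
general = drop_constraint_general(H, P_after, a)
simplified = drop_constraint_simplified(H, s)
```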
[0163]
As described above, in order to suppress the dynamic range of the synaptic loads of the neural network 12, which is to be executed on a fixed-point digital arithmetic device, an upper limit is set on the absolute value of each synaptic load and learning is performed within that range. When the Goldfarb method is applied, the coefficient vector aⁱ of each constraint condition shown in Equation 72 is then a row vector of 1 row and M columns whose l_i-th element is -1 or 1 and whose other elements are all 0, so the complicated calculation of the generalized inverse matrix becomes unnecessary and the learning program for the neural network 12 can be written easily. Calculation time and memory usage are also reduced. As a further effect, whereas the calculation of a generalized inverse matrix can become inaccurate because of numerical problems such as loss of significant digits, the simplification described above avoids that problem, so a more accurate numerical solution can be obtained.
[0164]
In this embodiment, simplified calculation formulas were derived and proved for the case where a common upper limit B is set on the absolute value of each synaptic load w_i (i = 1, 2, ..., M), that is, |w_i| ≦ B. However, these formulas are also valid when an upper limit B_i^U and a lower limit B_i^L are set separately for each synaptic load w_i, that is, B_i^L ≦ w_i ≦ B_i^U. Further, although the present embodiment has been described for a hierarchical neural network, the present invention can be applied to other models (such as a recurrent neural network) as long as the neural network can be trained by a descent method.
[0165]
[Experimental example]
The following experimental results show the effect of applying the present invention to air volume control of an auto car air conditioner. In this experiment, learning with the four simplified processes described above (denoted “example”) was compared with learning by the Goldfarb method using the BFGS formula, which is a conventional learning method (denoted “conventional method”), and the error of the neural network output in fixed-point arithmetic was evaluated.
[0166]
A. Outline
The conditions used for the comparison, such as the application example and the configuration of the neural network, are shown below. Auto car air conditioner is abbreviated as A/C.
(A) Application example
A/C outlet control (switching between FACE, B/L, FOOT, etc.)
(B) Input/output specifications
There are four inputs and one output. Table 1 shows the specifications of the inputs, and Table 2 shows the specifications of the output. The neural network processes the part of the input sensor value range that remains after excluding the region that can be programmed with simple if-then rules. Each sensor signal is normalized to [0, 1] over its sensor value range and then input to the neural network.
[0167]
[Table 1]

[0168]
[Table 2]

[0169]
The A/C switches the mode as follows in accordance with the outlet mode, which is the output signal.
[0170]
[Table 3]

[0171]
The permissible error of the outlet mode is ±0.1, but the error needs to be made as small as possible so that the mode is switched reliably at the mode switching points.
(C) Number of teacher patterns
5915 (3944 used for learning)
(D) Configuration of neural network
Four-layer neural network (input layer: 4 units; first intermediate layer: 8 units; second intermediate layer: 8 units; output layer: 1 unit)
Input units are linear units; intermediate units and output units are sigmoid units.
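The configuration in (D) can be sketched as a small forward pass. This is only an illustration under stated assumptions: bias (threshold) terms are omitted because the text does not mention them, the names `LAYER_SIZES`, `init_weights`, and `forward` are hypothetical, and the weights are drawn from (-1, 1) to match the initial-value range used in the evaluation.

```python
import math
import random

LAYER_SIZES = [4, 8, 8, 1]  # input, first intermediate, second intermediate, output

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_weights(sizes, rng):
    """One weight matrix per pair of adjacent layers, entries drawn from (-1, 1)."""
    return [[[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(weights, x):
    """Linear input units pass x through; every later layer applies a sigmoid."""
    a = list(x)
    for W in weights:
        a = [sigmoid(sum(wij * aj for wij, aj in zip(row, a))) for row in W]
    return a

rng = random.Random(0)
weights = init_weights(LAYER_SIZES, rng)
M = sum(len(W) * len(W[0]) for W in weights)   # number of synaptic loads
y = forward(weights, [0.2, 0.4, 0.6, 0.8])     # inputs already normalized to [0, 1]
print(M, y)
```

Without bias terms this layout has M = 4·8 + 8·8 + 8·1 = 104 synaptic loads; the count would be larger if thresholds were treated as additional loads.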
(E) Evaluation method
Step 1. For the conventional method and for the example (upper limit values B = 64 and 128), learning was performed 20 times each with different initial values. The initial value of each neural network coefficient is a random number in the range (-1, 1). Each trial runs 1000 learning cycles.
[0172]
Step 2. The output of each trained neural network is computed in both floating-point arithmetic and fixed-point arithmetic, and the square error sum E over all patterns is calculated and compared, as shown in Expression 136.
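Why a bounded synaptic load helps in Step 2 can be illustrated with a simple fixed-point model. The text does not state the word length or number format of the fixed-point device, so the 16-bit signed word, the function names, and the plain sum-of-squares form of E below are all assumptions for illustration only.

```python
def frac_bits_for(bound, word_bits=16):
    """Fractional bits left after reserving a sign bit and enough integer bits for |w| <= bound."""
    int_bits = 0
    while (1 << int_bits) <= bound:
        int_bits += 1
    return word_bits - 1 - int_bits

def quantize(value, frac_bits, word_bits=16):
    """Round to the nearest representable fixed-point value, saturating at the word limits."""
    scale = 1 << frac_bits
    q = round(value * scale)
    q_max = (1 << (word_bits - 1)) - 1
    q_min = -(1 << (word_bits - 1))
    return max(q_min, min(q_max, q)) / scale

def square_error_sum(teacher, output):
    """Sum of squared differences over all patterns (assumed form of E)."""
    return sum((t - y) ** 2 for t, y in zip(teacher, output))

for bound in (64, 128, 1024):
    fb = frac_bits_for(bound)
    # bound, fractional bits, resolution of one least significant bit
    print(bound, fb, 2.0 ** -fb)
```

Under these assumptions, a bound of 64 leaves 8 fractional bits while a bound of 1024 leaves only 4, so the same small load is represented far more coarsely when the dynamic range is allowed to grow.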
[0173]
[Expression 64]

[0174]
B. Experimental results
Table 4 shows the calculated square error sum E. Neural networks with the same trial number start learning from the same initial values. The conventional method showed the best results in floating-point arithmetic, with a square error sum E of almost zero in trials 3, 12, 13, and 17. In fixed-point arithmetic, however, the arithmetic accuracy dropped in some trials and the square error sum E became abnormally large (trials 3, 10, etc.). The maximum square error sum E was 187.1 for the conventional method, whereas for the example it was 1.372 and 1.079 for the upper limit values B = 64 and 128, respectively. This shows that, in fixed-point arithmetic, the example depends less on the initial values of the neural network than the conventional method does, and that the variation of the square error sum E between trials is small.
[0175]
[Table 4]

[0176]
Next, for the conventional method and for the example, the three trials with the smallest square error sum E in fixed-point arithmetic were selected, and the maximum absolute value of the synaptic loads and the maximum absolute error between the teacher output and the neural network output were compared. Table 5 shows the results of the conventional method, Table 6 shows the results of the example with the upper limit B = 64, and Table 7 shows the results of the example with the upper limit B = 128.
[0177]
[Table 5]

[0178]
[Table 6]

[0179]
[Table 7]

[0180]
Comparing the minimum square error sum E, the example was two orders of magnitude smaller, so a more accurate input/output function was realized. The maximum absolute error was also smaller in the example than in the conventional method, showing excellent performance. The error is within the allowable range of ±0.1 for every learning method, but in the conventional method a large error occurs near the output values 0.3, 0.4, and 0.5. In particular, 0.3 and 0.5 are mode switching points, so this neural network cannot be used for control. In the example, by contrast, no large error at a specific output value was observed for either upper limit value, B = 64 or 128.
[0181]
Table 8 compares, for the example and the conventional method, how the square error sum E differs with the arithmetic method.
[0182]
[Table 8]

[0183]
As can be seen from Table 8, in the embodiment, there is almost no difference in the square error sum E between the floating point calculation and the fixed point calculation, but the conventional method produces a very large difference. From this, it can be seen that the conventional method is unsuitable for use in an arithmetic device that performs fixed-point arithmetic generally used in an ECU or the like.
[0184]
FIG. 7 compares the control curves of the example and the conventional method. In the figure, the teacher output is the control curve to be realized, and the neural network output is the control curve produced by the neural network. (a) is the result of the example, and (b) is the result of the conventional method.
In the conventional method, a large error occurs at the output values 0.3, 0.4, and 0.5, whereas in the example the teacher output curve and the neural network output curve almost coincide. This confirms the effect of the example.
[0185]
[Others]
In the above-described embodiment, the learning control unit 14 loads the program stored in the standard pattern storage unit 18, which is configured as a hard disk, into the RAM and executes the neural network learning process. Alternatively, the program may be recorded on a computer-readable recording medium such as a floppy disk, a magneto-optical disk, or a CD-ROM, and loaded into a computer system and started as necessary. The program may also be recorded in a ROM or backup RAM serving as a computer-readable recording medium, and that ROM or backup RAM may be incorporated into a computer system for use.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a schematic configuration of a neural network learning system as one embodiment.
FIG. 2 is an explanatory diagram of learning processing for a neural network by the neural network learning system.
FIG. 3 is an explanatory diagram of learning processing for a neural network for controlling an auto car air conditioner by the neural network learning system.
FIG. 4 is a flowchart of a neural network learning process by the neural network learning system.
FIG. 5 is a flowchart of a neural network learning process performed by the neural network learning system.
FIG. 6 is a flowchart of a neural network learning process performed by the neural network learning system.
FIG. 7 is an explanatory diagram showing the effect of learning between an example and a conventional method.
FIG. 8 is an explanatory diagram of a transition state of a square error sum E in conventional learning.
[Explanation of symbols]
2 ... Neural network learning system 12 ... Neural network 12a ... Unit
14 ... Learning control unit 18 ... Standard pattern storage unit
18a ... Teacher pattern database 18b ... Standard input signal
18c ... Output signal 18d ... Teacher signal
18e ... Synaptic load update command signal

Claims

A simplified quasi-Newton projection method computing system which, in obtaining with a digital arithmetic unit the solution of the variable w that minimizes a function E(w) of M variables w, the function being expressed by Expression 1 and subject to the constraint conditions of Expression 2, comprises:
first processing means for performing a line search to obtain a value of the variable w that reduces the value of the function E(w);
second processing means, executed after the processing of the first processing means, for: when a new constraint condition of Expression 2 becomes active based on the new value of the variable w, adding the coefficient vector a^r of the newly active constraint condition to the matrix A_q composed of the coefficient vectors of the active constraint conditions, updating by Expression 3 the Hessian H used to obtain a vector d representing the direction of change of the function E(w) from the transposed matrix ∇^tE(w) representing the gradient of the function E(w), and returning the processing to the first processing means; and, when no new constraint condition becomes active based on the new value of the variable w, updating the Hessian H with a formula for creating a new Hessian H and returning the processing to the first processing means; and
third processing means for: when the vector d representing the direction of change of the function E(w), obtained in the first processing means from the product of the transposed matrix ∇^tE(w) and the Hessian H, becomes zero, obtaining w at that time as the solution and ending all processing if all elements of the Lagrange multiplier λ, obtained as a matrix based on Expression 4, are non-negative; and, when the vector d is not zero, removing from the matrix A_q the coefficient vector a^s of the constraint condition corresponding to the negative element of the Lagrange multiplier λ with the largest absolute value, updating the Hessian H based on Expression 5, and returning the processing to the first processing means,
the system performing computation by the quasi-Newton projection method with these means, wherein
a set I_c whose elements are q mutually distinct integers among the integers 1 to M is expressed as shown in Expression 6; for each integer l_i (i = 1, 2, ..., q) included in the set I_c, a 1-row, M-column vector whose l_i-th element is c_li and whose other elements are all 0, with c_li being +1 or −1, is denoted by the symbol of Expression 7; and the matrix A_q is expressed as a q-row, M-column matrix as shown in Expression 8, whereby, within the calculation of Expression 4, the calculation of the matrix expressed by Expression 9 is replaced by the calculation of a diagonal matrix diag[b₁ b₂ … b_M] whose diagonal elements are given by a function b_m, where b_m = 1 if the integer m ∈ I_c and b_m = 0 otherwise.

A simplified quasi-Newton projection method computing system which, in obtaining with a digital arithmetic unit the solution of the variable w that minimizes a function E(w) of M variables w, the function being expressed by Expression 11 and subject to the constraint conditions of Expression 12, comprises:
first processing means for performing a line search to obtain a value of the variable w that reduces the value of the function E(w);
second processing means, executed after the processing of the first processing means, for: when a new constraint condition of Expression 12 becomes active based on the new value of the variable w, adding the coefficient vector a^r of the newly active constraint condition to the matrix A_q composed of the coefficient vectors of the active constraint conditions, updating by Expression 13 the Hessian H used to obtain a vector d representing the direction of change of the function E(w) from the transposed matrix ∇^tE(w) representing the gradient of the function E(w), and returning the processing to the first processing means; and, when no new constraint condition becomes active based on the new value of the variable w, updating the Hessian H with a formula for creating a new Hessian H and returning the processing to the first processing means; and
third processing means for: when the vector d representing the direction of change of the function E(w), obtained in the first processing means from the product of the transposed matrix ∇^tE(w) and the Hessian H, becomes zero, obtaining w at that time as the solution and ending all processing if all elements of the Lagrange multiplier λ, obtained as a matrix based on Expression 14, are non-negative; and, when the vector d is not zero, removing from the matrix A_q the coefficient vector a^s of the constraint condition corresponding to the negative element of the Lagrange multiplier λ with the largest absolute value, updating the Hessian H based on Expression 15, and returning the processing to the first processing means,
the system performing computation by the quasi-Newton projection method with these means, wherein
the element in the i-th row and j-th column of the Hessian H is denoted by h_ij, and the r-th element of each coefficient vector a^r in all the constraint conditions is taken to be +1 or −1 with all other elements 0, whereby, within the calculation of Expression 13, the calculation of the matrix expressed by Expression 16 is replaced by the calculation of an M-row, M-column matrix whose element in the i-th row and j-th column is expressed by Expression 17.
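For illustration only, the kind of element-wise replacement recited above can be sketched as follows. The sketch assumes that the update applied when a constraint becomes active has the commonly cited Goldfarb form H − (H a^t)(a H) / (a H a^t); the expressions themselves are not reproduced in this text, so both that form and the function names are assumptions. When a is plus or minus the r-th unit vector, the correction term reduces to h_ir h_rj / h_rr.

```python
def add_constraint_general(H, a):
    """H - (H a^t)(a H) / (a H a^t) for an arbitrary row vector a."""
    M = len(H)
    Ha = [sum(H[i][k] * a[k] for k in range(M)) for i in range(M)]   # H a^t
    aH = [sum(a[k] * H[k][j] for k in range(M)) for j in range(M)]   # a H
    denom = sum(a[k] * Ha[k] for k in range(M))                      # a H a^t
    return [[H[i][j] - Ha[i] * aH[j] / denom for j in range(M)] for i in range(M)]

def add_constraint_simplified(H, r):
    """Same update when a is +/- the r-th unit vector: only h_ir, h_rj, h_rr are needed."""
    M = len(H)
    return [[H[i][j] - H[i][r] * H[r][j] / H[r][r] for j in range(M)] for i in range(M)]

H = [[4.0, 1.0, 0.5],
     [1.0, 3.0, 0.2],
     [0.5, 0.2, 2.0]]          # hypothetical symmetric H
r = 1                           # 0-based index of the newly active constraint
a = [0.0, -1.0, 0.0]            # its coefficient vector, here with sign -1

G = add_constraint_general(H, a)
S = add_constraint_simplified(H, r)
print(S)
```

The simplified form needs no matrix-vector products, and row r and column r of the result vanish, so the search direction no longer moves the constrained variable.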

A simplified quasi-Newton projection method computing system which, in obtaining with a digital arithmetic unit the solution of the variable w that minimizes a function E(w) of M variables w, the function being expressed by Expression 21 and subject to the constraint conditions of Expression 22, comprises:
first processing means for performing a line search to obtain a value of the variable w that reduces the value of the function E(w);
second processing means, executed after the processing of the first processing means, for: when a new constraint condition of Expression 22 becomes active based on the new value of the variable w, adding the coefficient vector a^r of the newly active constraint condition to the matrix A_q composed of the coefficient vectors of the active constraint conditions, updating by Expression 23 the Hessian H used to obtain a vector d representing the direction of change of the function E(w) from the transposed matrix ∇^tE(w) representing the gradient of the function E(w), and returning the processing to the first processing means; and, when no new constraint condition becomes active based on the new value of the variable w, updating the Hessian H with a formula for creating a new Hessian H and returning the processing to the first processing means; and
third processing means for: when the vector d representing the direction of change of the function E(w), obtained in the first processing means from the product of the transposed matrix ∇^tE(w) and the Hessian H, becomes zero, obtaining w at that time as the solution and ending all processing if all elements of the Lagrange multiplier λ, obtained as a matrix based on Expression 24, are non-negative; and, when the vector d is not zero, removing from the matrix A_q the coefficient vector a^s of the constraint condition corresponding to the negative element of the Lagrange multiplier λ with the largest absolute value, updating the Hessian H based on Expression 25, and returning the processing to the first processing means,
the system performing computation by the quasi-Newton projection method with these means, wherein
a set I_c whose elements are q mutually distinct integers among the integers 1 to M is expressed as shown in Expression 26; for each integer l_i (i = 1, 2, ..., q) included in the set I_c, a 1-row, M-column vector whose l_i-th element is c_li and whose other elements are all 0, with c_li being +1 or −1, is denoted by the symbol of Expression 27; the matrix A_q is expressed as a q-row, M-column matrix as shown in Expression 28; and ∇E(w^k) is expressed as shown in Expression 29, whereby, within the calculation of Expression 24, the calculation of the matrix expressed by Expression 30 is replaced by the calculation expressed by Expression 31.
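The effect of this replacement on the Lagrange multiplier calculation can be sketched as follows, assuming that the multipliers have the form λ = (A_q A_q^t)^{-1} A_q ∇E(w^k). That form, its sign convention, and the function names are assumptions, since the expressions are not reproduced in this text. With signed unit rows, A_q A_q^t is the identity and each multiplier is simply c_li times the l_i-th gradient component.

```python
def lagrange_multipliers(active, grad):
    """lambda_i = c_li * dE/dw_li for each active (index, sign) pair."""
    return [sign * grad[idx] for idx, sign in active]

def constraint_to_drop(active, lam):
    """Active constraint whose multiplier is negative with the largest magnitude, or None."""
    worst = min(range(len(lam)), key=lambda i: lam[i], default=None)
    if worst is None or lam[worst] >= 0:
        return None
    return active[worst]

grad = [0.3, -0.7, 0.1, 0.4]     # hypothetical gradient of E at w^k
active = [(1, +1), (3, -1)]      # hypothetical active set, 0-based indices
lam = lagrange_multipliers(active, grad)
drop = constraint_to_drop(active, lam)
print(lam, drop)
```

A return value of `None` corresponds to the case in which all multipliers are non-negative and the current w is accepted as the solution.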

A simplified quasi-Newton projection method computing system which, in obtaining with a digital arithmetic unit the solution of the variable w that minimizes a function E(w) of M variables w, the function being expressed by Expression 41 and subject to the constraint conditions of Expression 42, comprises:
first processing means for performing a line search to obtain a value of the variable w that reduces the value of the function E(w);
second processing means, executed after the processing of the first processing means, for: when a new constraint condition of Expression 42 becomes active based on the new value of the variable w, adding the coefficient vector a^r of the newly active constraint condition to the matrix A_q composed of the coefficient vectors of the active constraint conditions, updating by Expression 43 the Hessian H used to obtain a vector d representing the direction of change of the function E(w) from the transposed matrix ∇^tE(w) representing the gradient of the function E(w), and returning the processing to the first processing means; and, when no new constraint condition becomes active based on the new value of the variable w, updating the Hessian H with a formula for creating a new Hessian H and returning the processing to the first processing means; and
third processing means for: when the vector d representing the direction of change of the function E(w), obtained in the first processing means from the product of the transposed matrix ∇^tE(w) and the Hessian H, becomes zero, obtaining w at that time as the solution and ending all processing if all elements of the Lagrange multiplier λ, obtained as a matrix based on Expression 44, are non-negative; and, when the vector d is not zero, removing from the matrix A_q the coefficient vector a^s of the constraint condition corresponding to the negative element of the Lagrange multiplier λ with the largest absolute value, updating the Hessian H based on Expression 45, and returning the processing to the first processing means,
the system performing computation by the quasi-Newton projection method with these means, wherein
a set I_c whose elements are q mutually distinct integers among the integers 1 to M is expressed as shown in Expression 46; for each integer l_i (i = 1, 2, ..., q) included in the set I_c, a 1-row, M-column vector whose l_i-th element is c_li and whose other elements are all 0, with c_li being +1 or −1, is denoted by the symbol of Expression 47; and the matrix A_q is expressed as a q-row, M-column matrix as shown in Expression 48, whereby, within the calculation of Expression 45, the calculation of the matrix expressed by Expression 49 is replaced by the calculation of an M-row, M-column matrix whose element in the s-th row and s-th column is 1 and whose other elements are all 0.
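As a brief illustration of why a single-entry matrix can stand in for a matrix product here, the sketch below assumes that the replaced matrix is the outer product of the removed coefficient vector a^s with itself (an assumption, since Expression 49 is not reproduced in this text). For a vector with one element equal to +1 or −1, the outer product has a 1 at position (s, s) and zeros elsewhere, regardless of the sign.

```python
def outer(a):
    """Outer product a^t a of a row vector with itself."""
    return [[ai * aj for aj in a] for ai in a]

def unit_ss(M, s):
    """M x M matrix with a single 1 at row s, column s."""
    return [[1 if i == s and j == s else 0 for j in range(M)] for i in range(M)]

M, s = 4, 2                       # hypothetical size and 0-based index
results = []
for sign in (+1, -1):
    a_s = [0] * M
    a_s[s] = sign
    results.append(outer(a_s) == unit_ss(M, s))
print(results)
```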

A simplified quasi-Newton projection method computing system which, in obtaining with a digital arithmetic unit the solution of the variable w that minimizes a function E(w) of M variables w, the function being expressed by Expression 51 and subject to the constraint conditions of Expression 52, comprises:
first processing means for performing a line search to obtain a value of the variable w that reduces the value of the function E(w);
second processing means, executed after the processing of the first processing means, for: when a new constraint condition of Expression 52 becomes active based on the new value of the variable w, adding the coefficient vector a^r of the newly active constraint condition to the matrix A_q composed of the coefficient vectors of the active constraint conditions, updating by Expression 53 the Hessian H used to obtain a vector d representing the direction of change of the function E(w) from the transposed matrix ∇^tE(w) representing the gradient of the function E(w), and returning the processing to the first processing means; and, when no new constraint condition becomes active based on the new value of the variable w, updating the Hessian H with a formula for creating a new Hessian H and returning the processing to the first processing means; and
third processing means for: when the vector d representing the direction of change of the function E(w), obtained in the first processing means from the product of the transposed matrix ∇^tE(w) and the Hessian H, becomes zero, obtaining w at that time as the solution and ending all processing if all elements of the Lagrange multiplier λ, obtained as a matrix based on Expression 54, are non-negative; and, when the vector d is not zero, removing from the matrix A_q the coefficient vector a^s of the constraint condition corresponding to the negative element of the Lagrange multiplier λ with the largest absolute value, updating the Hessian H based on Expression 55, and returning the processing to the first processing means,
the system performing computation by the quasi-Newton projection method with these means, wherein
a set I_c whose elements are q mutually distinct integers among the integers 1 to M is expressed as shown in Expression 56; for each integer l_i (i = 1, 2, ..., q) included in the set I_c, a 1-row, M-column vector whose l_i-th element is c_li and whose other elements are all 0, with c_li being +1 or −1, is denoted by the symbol of Expression 57; and the matrix A_q is expressed as a q-row, M-column matrix as shown in Expression 58, whereby, within the calculation of Expression 54, the calculation of the matrix expressed by Expression 59 is replaced by the calculation of a diagonal matrix diag[b₁ b₂ … b_M] whose diagonal elements are given by a function b_m, where b_m = 1 if the integer m ∈ I_c and b_m = 0 otherwise, and, within the calculation of Expression 55, the calculation of the matrix expressed by Expression 60 is replaced by the calculation of an M-row, M-column matrix whose element in the s-th row and s-th column is 1 and whose other elements are all 0;
further, ∇E(w^k) is expressed as shown in Expression 61, whereby, within the calculation of Expression 54, the calculation of the matrix expressed by Expression 62 is replaced by the calculation expressed by Expression 63; and
further, the element in the i-th row and j-th column of the Hessian H is denoted by h_ij, and the r-th element of each coefficient vector a^r in all the constraint conditions is taken to be +1 or −1 with all other elements 0, whereby, within the calculation of Expression 53, the calculation of the matrix expressed by Expression 64 is replaced by the calculation of an M-row, M-column matrix whose element in the i-th row and j-th column is expressed by Expression 65.

The simplified quasi-Newton projection method computing system according to any one of claims 1 to 5, wherein the formula used in the second processing means is the BFGS formula, the DFP formula, or the symmetric rank-one formula.
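For reference, the textbook BFGS formula can be sketched as below. The sketch treats H as an approximation of the inverse Hessian, which is an assumption based on the search direction being obtained from the product of the gradient and H. It is the plain unconstrained form; how the update is restricted to the variables that are not on a bound is not shown, and the function and variable names are illustrative.

```python
def bfgs_update(H, s, y):
    """BFGS update of a symmetric inverse-Hessian approximation H.

    s is the change in w and y the change in the gradient between two
    iterations; s^t y must be positive for the update to be well defined.
    """
    n = len(H)
    sy = sum(si * yi for si, yi in zip(s, y))                        # s^t y
    Hy = [sum(H[i][k] * y[k] for k in range(n)) for i in range(n)]   # H y
    yHy = sum(yi * hyi for yi, hyi in zip(y, Hy))                    # y^t H y
    coeff = (1.0 + yHy / sy) / sy
    return [[H[i][j] + coeff * s[i] * s[j] - (Hy[i] * s[j] + s[i] * Hy[j]) / sy
             for j in range(n)] for i in range(n)]

H0 = [[1.0, 0.0], [0.0, 1.0]]    # hypothetical starting approximation
s = [1.0, 0.5]                   # hypothetical step in w
y = [0.8, 0.3]                   # hypothetical change in gradient
H1 = bfgs_update(H0, s, y)
H1y = [sum(H1[i][k] * y[k] for k in range(2)) for i in range(2)]
print(H1, H1y)
```

The updated matrix satisfies the secant condition H1 y = s, which is the defining property shared by the BFGS, DFP, and symmetric rank-one formulas.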

A neural network learning system according to any one of claims 1 to 6, wherein the M variables w represent the synaptic loads of M synapses that connect the units from the units of the input layer to the units of the output layer in a neural network, the function E(w) represents the error between a teacher signal given to the neural network and the output of the neural network, and the processing performed by the first processing means, the second processing means, and the third processing means to obtain the solution of the variable w that minimizes the function E(w) is learning processing for the neural network.

A computer-readable recording medium on which is recorded a program for causing a computer system to function as each means of the simplified quasi-Newton projection method computing system according to any one of claims 1 to 6.

A computer-readable recording medium on which is recorded a program for causing a computer system to function as each means of the neural network learning system according to claim 7.

A signal processing apparatus incorporating a neural network obtained by learning processing performed by the neural network learning system according to claim 7.

A signal processing apparatus comprising:
a neural network obtained by learning processing performed by the neural network learning system according to claim 7;
input means for inputting an input signal to be processed to the units of the input layer of the neural network; and
output means for reading the state of the unit of the output layer of the neural network and outputting it as a signal.