JPH07160659A

JPH07160659A - Learning system

Info

Publication number: JPH07160659A
Application number: JP5329735A
Authority: JP
Inventor: Sumio Watanabe (渡辺澄夫)
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1993-12-01
Filing date: 1993-12-01
Publication date: 1995-06-23

Abstract

PURPOSE: To perform learning (optimization) of a system such as a neural network efficiently, with less computation and computation time than conventional methods. CONSTITUTION: The system includes a learning unit 1 that learns (optimizes) a system SYS, such as a neural network, so as to optimize the structure of SYS. The learning unit 1 obtains the optimal parameters of SYS using a function that has the learning time as a variable, is continuously differentiable with respect to the predetermined parameters to be optimized, and converges to a predetermined information criterion as the learning time tends to infinity.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a learning method used in various fields such as character recognition, image recognition, speech recognition, robot control, and stock price prediction.

[0002]

[Prior Art] Conventionally, consider for example the learning of a neural network P(W; x, y) with a parameter W having F degrees of freedom, given S training samples {(x_i, y_i); i = 1, 2, ..., S}. The network is usually trained by finding, as the maximum likelihood estimator, the parameter W that maximizes the log-likelihood function l(W) given by the following equation.

[0003]

[Equation 1]

$$ l(W) = \sum_{i=1}^{S} \log P(W; x_i, y_i) $$

[0004] However, if the neural network has too many degrees of freedom, it may not be possible to obtain a maximum likelihood estimator whose outputs for unknown inputs can be relied on.

[0005] Conventional methods have therefore been proposed that use an information criterion, such as AIC, which minimizes the prediction error, or MDL, which minimizes the description length of the data and the model, as given by the following equations (see, e.g., Akaike et al., "What Is the Information Criterion AIC?", Suri Kagaku (Mathematical Sciences), No. 153, pp. 955-965, 1991).

[0006]

[Equation 2]

$$ \mathrm{AIC} = -2\,l(W^*) + 2F $$
$$ \mathrm{MDL} = -2\,l(W^*) + F \log S $$

[0007] In the above equations, W* is the parameter that maximizes the log-likelihood function (the maximum likelihood estimator).

[0008]

[Problem to Be Solved by the Invention] However, to obtain the parameter that optimizes such an information criterion, the conventional approach computes the maximum likelihood estimator separately for every possible number of degrees of freedom of the parameter and only then selects the parameter that minimizes the criterion. This requires a large amount of computation and computation time.

[0009] An object of the present invention is to provide a learning method that performs learning (optimization) of a system such as a neural network efficiently, with less computation and computation time than conventional methods.

[0010]

[Means and Operation for Solving the Problem] To achieve the above object, the present invention obtains the optimal parameters using a function that has the learning time as a variable, is continuously differentiable with respect to the predetermined parameters to be optimized, and converges to a predetermined information criterion as the learning time tends to infinity. This allows a system or the like to be trained efficiently with less computation and computation time than before.

[0011]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a schematic diagram of a learning machine to which the learning method according to the present invention is applied. Referring to FIG. 1, the learning machine has a learning unit 1 that learns (optimizes) a system SYS such as a neural network. To optimize the structure of SYS, the learning unit 1 first specifies (that is, models) the structure of SYS by a predetermined probability density function P on the input/output space, and then applies a predetermined information criterion IC to this model to find the optimal system parameters (the optimal system structure).

[0012] Let R^M ⊕ R^N be the direct sum of the input and output spaces, let {(x_i, y_i)}_{i=1}^{S} be S samples (training data) in the input/output space, and consider the following probability density function.

[0013]

[Equation 3]

[0014] When the system SYS is a function-approximation neural network, the above equation is the probability density function realized by that network. For the model in which this probability density function specifies the input-output relationship, that is, the structure of the system, consider the following information criterion IC.

[0015]

[Equation 4]

$$ IC = -2\,l(W) + AF $$

[0016] Here, W denotes the parameters (w, σ), F is the number of degrees of freedom of the continuous-valued parameters, and l(W) is the log-likelihood function given by Equation 1. When A = 2, the information criterion IC is AIC; when A = log S, IC is MDL.

[0017] When the system SYS is such a function-approximation neural network, optimizing the structure (parameters) of the system reduces to minimizing the information criterion IC, that is, to minimizing the function IC(w, σ) given by the following equation.

[0018]

[Equation 5]

[0019] In the above equation, terms unrelated to the estimation are omitted. F₀(w) is the number of degrees of freedom of the parameter w (the number of nonzero components of w). When the system SYS is a three-layer perceptron as shown in FIG. 2, F₀(w) is given by the following equation.

[0020]

[Equation 6]

$$ F_0(w) = \sum_{i,j} f_0(w_{ij}) + \sum_{j,k} f_0(w_{jk}) $$

[0021] Here, the three-layer perceptron consists of an input layer 11 of N units i, an intermediate layer 12 of H units j, and an output layer 13 of M units k. In the above equation, w_ij is the weight from unit i of the input layer 11 to unit j of the intermediate layer 12, and w_jk is the weight from unit j of the intermediate layer 12 to unit k of the output layer 13.

[0022] In Equation 6, f₀(w_ij) and f₀(w_jk) are functions that equal "0" when w_ij = 0 (respectively w_jk = 0) and "1" when w_ij ≠ 0 (respectively w_jk ≠ 0). Minimizing the information criterion IC requires differentiating it, but the functions f₀(w_ij) and f₀(w_jk) contained in IC are not differentiable. It is therefore generally very difficult to obtain the optimal parameters (degrees of freedom) directly from IC, and in practice the conventional approach has required a large amount of computation and computation time to do so. Specifically, to find the optimal parameters (degrees of freedom) from such a criterion, one generally has to separate the cases in which each w_ij and w_jk is zero from those in which it is nonzero and run steepest descent for every one of these cases, so the optimization could not be performed efficiently with little computation and computation time.

[0023] To overcome this problem, the present invention introduces, in place of the function f₀(x), the function f_α(t)(x) given by the following equation.

[0024]

[Equation 7]

[0025] Here, t is time and α(t) is a variable that changes with t. Specifically, α(t) has an initial value α₀ (≠ 0) at time t = 0, decreases continuously as time advances, and becomes "0" as t tends to infinity. That is, as t → ∞, f_α(t)(x) becomes f₀(x), as in the following equation.

[0026]

[Equation 8]

$$ \lim_{t \to \infty} f_{\alpha(t)}(x) = f_0(x) $$

[0027] Accordingly, in F₀(w) of Equation 6, replace f₀(w_ij) and f₀(w_jk) by f_α(t)(w_ij) and f_α(t)(w_jk), call the result F_α(t)(w), and extend the information criterion IC(w, σ) of Equation 5 as in the following equation (Equation 9). Since f_α(t)(w_ij) and f_α(t)(w_jk) are continuously differentiable with respect to the parameters w_ij and w_jk, and F_α(t)(w) converges to F₀(w) as the learning time tends to infinity (t → ∞), IC_α(t)(w, σ) converges to IC(w, σ) of Equation 5. The problem of minimizing IC(w, σ) therefore reduces to finding, as the learning time t elapses, the w and σ that minimize IC_α(t)(w, σ).

[0028]

[Equation 9]

[0029] That is, the solution can be obtained by the steepest descent method, as in the following equation.

[0030]

[Equation 10]

[0031] In the above equation, the parameters w and σ are computed jointly. If instead the update rule for the parameter w is applied pattern by pattern, the following equation may be used.

[0032]

[Equation 11]

[0033] In this way, using Equation 10 or Equation 11, α(t) is decreased by Δα each time the time t advances by Δt, and the optimal values of the parameters w and σ (that is, the optimal structure of the system) are finally obtained.

[0034] In other words, the present invention uses the functions f_α(t)(w_ij) and f_α(t)(w_jk) defined by Equation 7 to extend the information criterion to IC_α(t)(w, σ) as in Equation 9, and optimizes the system by minimizing this extended (continuously differentiable) criterion while letting t → ∞. There is therefore no need, as described above, to separate the cases in which w_ij and w_jk are zero from those in which they are nonzero and to process each case separately. Without any such case analysis, the optimization can be carried out efficiently in essentially a single pass, with little computation and computation time.

[0035] FIG. 3 shows a concrete example of the processing according to the present invention. Referring to FIG. 3, the learning unit 1 first initializes the parameter w of the system SYS (step S1). When the system SYS is, for example, a three-layer perceptron as shown in FIG. 2, the weights w_ij and w_jk are generated as random numbers and used as the initial values of w.

[0036] Next, the initial value of the other parameter σ of the system SYS is calculated from Equation 10, that is, by the following equation (Equation 12) (step S2).

[0037]

[Equation 12]

[0038] Further, the initial value α(0) of the variable α(t) at time t = 0 is set to an appropriate value (step S3).

[0039] After the parameters and so on have been initialized in this way, the parameter w is learned according to Equation 11 (step S4). That is, the change Δw of the parameter w is obtained from the following equation and used to update w.

[0040]

[Equation 13]

[0041] Thereafter, it is determined whether α has become smaller than a preset value ε, that is, whether α has become sufficiently small (step S5). If α is not smaller than ε, α is reduced by Δα (step S6) and the process returns to step S4.

[0042] In this way, α is decreased step by step in increments of Δα until it becomes sufficiently small, so that α(t) approaches "0" as the time t tends to infinity, and at each step the calculation of Equation 13 is performed to learn the parameter w. When α becomes smaller than ε, α is regarded as having effectively reached "0", and the learning process ends.

[0043] Thus, in this concrete example, decreasing α stepwise by Δα allows the parameter w to be learned, that is, the system SYS to be optimized, with little computation and computation time.

[0044] Although the above embodiment applies the learning method of the present invention to learning (optimizing) the structure of the system SYS, the method is applicable not only to optimizing the structure of an actual system SYS but also to statistical estimation.

[0045] For example, it is applicable to the problem of approximating, from given samples x_i, the probability density that the samples follow by a probability density P(w; x) having a parameter w. In this case, the optimal parameter w can be obtained by using the function F_α(t)(w) and applying the steepest descent method to the parameter at each time.

[0046] It is also applicable to the problem of approximating, from given input/output samples (x_i, y_i), the conditional probability density that generates the samples by a probability density P(w; y|x) having a parameter w. In this case, the optimal parameter w can be obtained by using the function F_α(t)(w) and applying the steepest descent method to the parameter at each time.

[0047] It is also applicable to the problem of approximating, from given input/output samples (x_i, y_i), the conditional probability density that generates the samples by an artificial neural network P(w; y|x) having a parameter w. In this case, the optimal parameter w can be obtained by using the function F_α(t)(w) and applying the steepest descent method to the parameter at each time.

[0048] It is also applicable to the problem of approximating, from given input/output samples (x_i, y_i), the conditional probability density that generates the samples by a multilayer perceptron having a parameter w. In this case, the optimal parameter w can be obtained by using the function F_α(t)(w) and applying the steepest descent method to the parameter at each time.

[0049]

[Effects of the Invention] As described above, the present invention obtains the optimal parameters using a function that has the learning time as a variable, is continuously differentiable with respect to the predetermined parameters to be optimized, and converges to a predetermined information criterion as the learning time tends to infinity. Learning (optimization) can therefore be performed efficiently, with less computation and shorter computation time than before.

[Brief Description of the Drawings]

[FIG. 1] A schematic diagram of a learning machine to which the learning method according to the present invention is applied.

[FIG. 2] A diagram showing a three-layer perceptron.

[FIG. 3] A flowchart showing an example of processing by the learning method according to the present invention.

[Explanation of Reference Symbols]

1 Learning unit
11 Input layer
12 Intermediate layer
13 Output layer
SYS System
w, σ Parameters

Claims

[Claims]

1. A learning method characterized in that optimal parameters are obtained using a function that has the learning time as a variable, is continuously differentiable with respect to predetermined parameters to be optimized, and converges to a predetermined information criterion as the learning time tends to infinity.

2. The learning method according to claim 1, characterized in that it is used to optimize the parameters of a predetermined system and thereby optimize the structure of the system.

3. The learning method according to claim 1, characterized in that it is applicable to the problem of approximating, from given samples x_i, the probability density that the samples follow by a probability density P(w; x) having a parameter w, wherein the optimal parameter w is obtained by using said function and applying the steepest descent method to the parameter at each time.

4. The learning method according to claim 1, characterized in that it is applicable to the problem of approximating, from given input/output samples (x_i, y_i), the conditional probability density that generates the samples by a probability density P(w; y|x) having a parameter w, wherein the optimal parameter w is obtained by using said function and applying the steepest descent method to the parameter at each time.

5. The learning method according to claim 1, characterized in that it is applicable to the problem of approximating, from given input/output samples (x_i, y_i), the conditional probability density that generates the samples by an artificial neural network P(w; y|x) having a parameter w, wherein the optimal parameter w is obtained by using said function and applying the steepest descent method to the parameter at each time.

6. The learning method according to claim 1, characterized in that it is applicable to the problem of approximating, from given input/output samples (x_i, y_i), the conditional probability density that generates the samples by a multilayer perceptron having a parameter w, wherein the optimal parameter w is obtained by using said function and applying the steepest descent method to the parameter at each time.