JP5246751B2 - Information processing apparatus, information processing method, and program

Info

Publication number: JP5246751B2
Application number: JP2008092018A
Authority: JP
Inventors: 淳谷; 祐一山下; 淳並川
Original assignee: RIKEN Institute of Physical and Chemical Research
Current assignee: RIKEN Institute of Physical and Chemical Research
Priority date: 2008-03-31
Filing date: 2008-03-31
Publication date: 2013-07-24
Anticipated expiration: 2028-03-31
Also published as: JP2009245236A

Description

本発明は情報処理装置、情報処理方法、およびプログラムに関し、特に、より多くの時系列パターンを学習、予測できるようにした情報処理装置、情報処理方法、およびプログラムに関する。 The present invention relates to an information processing device, an information processing method, and a program, and more particularly to an information processing device, an information processing method, and a program that can learn and predict more time-series patterns.

従来よりリカレントニューラルネットワークによりロボットその他の対象を制御することが研究されている（例えば、非特許文献１）。 Controlling robots and other objects using a recurrent neural network has been studied (for example, Non-Patent Document 1).

従来のリカレントニューラルネットワークにおいては、例えば１０次元の時系列データを学習し、予測値を演算する場合、１０個の出力ニューロンが用意され、それぞれのニューロンが対応する次元の値を表現するというような情報記述方法が採用されていた。 In a conventional recurrent neural network, when, for example, 10-dimensional time-series data is learned and predicted values are computed, an information description scheme has been adopted in which 10 output neurons are prepared and each neuron represents the value of its corresponding dimension.

しかしながら、このような記述方法では、学習する時系列データ間に重なり合った部分や似通った部分が生じ易く、結果的にリカレントニューラルネットワーク内での表現に混乱や矛盾が生じ、結局破綻してしまうことが多かった。 However, with such a description scheme, overlapping or similar portions tend to arise among the time-series data to be learned. As a result, the representations inside the recurrent neural network become confused or contradictory, and learning often ends up failing.

そこで、同時に複数の時系列データを学習できるように、複数のモジュールを用いたり、リカレントニューラルネットワークの外部にパターンを切り替える装置を設け、その外部装置を切り替えて学習させることにより、同時に複数の時系列データを学習できるようにする試みがなされていた。 Attempts have therefore been made to learn a plurality of time-series data simultaneously, either by using a plurality of modules or by providing a pattern-switching device outside the recurrent neural network and switching that external device during learning.

社団法人電子情報通信学会信学技報PRMU2002-218(2003-02)「リカレントニューラルネットワークを用いた車両検出」p.43-48 The Institute of Electronics, Information and Communication Engineers (IEICE), IEICE Technical Report PRMU2002-218 (2003-02), “Vehicle Detection Using Recurrent Neural Networks,” pp. 43-48

しかしながら、外部装置により切り替えるようにした場合においても、実際には３種類程度の時系列パターンしか学習することができなかった。 However, even when switching was performed by an external device, only about three types of time-series patterns could actually be learned.

本発明は、このような状況に鑑みてなされたものであり、より多くの時系列パターンを学習し、予測できるようにするものである。 The present invention has been made in view of such a situation, and makes it possible to learn and predict more time-series patterns.

本発明の一側面は、所定の対象をセンシングすることにより取得されたセンサ信号であるデータを、グループ毎に、１峰性の確率分布のより高次元のデータにトポロジープリザービングマップにより変換する高次元変換部と、高次元のデータから重み付け係数に基づいて、ソフトマックス関数を用いてグループ毎に合計が１になるように予測値を演算するリカレントニューラルネットワークと、前記センサ信号に対応して取得された教示用データを、グループ毎に、１峰性の確率分布のより高次元の教示用データに変換する他の高次元変換部とを備え、前記リカレントニューラルネットワークは、高次元の前記教示用データの値が大きいほど誤差の値が大きくなるように前記誤差を演算し、その演算結果に基づき、時系列のパターンの学習処理を行う情報処理装置である。 One aspect of the present invention is an information processing apparatus comprising: a high-dimensional conversion unit that converts data, which is a sensor signal acquired by sensing a predetermined object, into higher-dimensional data having a unimodal probability distribution for each group by means of a topology preserving map; a recurrent neural network that computes predicted values from the high-dimensional data based on weighting coefficients, using a softmax function so that the values sum to 1 for each group; and another high-dimensional conversion unit that converts teaching data, acquired in correspondence with the sensor signal, into higher-dimensional teaching data having a unimodal probability distribution for each group, wherein the recurrent neural network computes an error such that the error value becomes larger as the value of the high-dimensional teaching data becomes larger, and performs learning of time-series patterns based on the computation result.

本発明の一側面においては、高次元変換部が、所定の対象をセンシングすることにより取得されたセンサ信号であるデータを、グループ毎に、１峰性の確率分布のより高次元のデータにトポロジープリザービングマップにより変換し、リカレントニューラルネットワークが、高次元のデータから重み付け係数に基づいて、ソフトマックス関数を用いてグループ毎に合計が１になるように予測値を演算し、他の高次元変換部が、センサ信号に対応して取得された教示用データを、グループ毎に、１峰性の確率分布のより高次元の教示用データに変換し、リカレントニューラルネットワークが、高次元の教示用データの値が大きいほど誤差の値が大きくなるように誤差を演算し、その演算結果に基づき、時系列のパターンの学習処理を行う。 In one aspect of the present invention, the high-dimensional conversion unit converts data, which is a sensor signal acquired by sensing a predetermined object, into higher-dimensional data having a unimodal probability distribution for each group by means of a topology preserving map; the recurrent neural network computes predicted values from the high-dimensional data based on weighting coefficients, using a softmax function so that the values sum to 1 for each group; the other high-dimensional conversion unit converts the teaching data, acquired in correspondence with the sensor signal, into higher-dimensional teaching data having a unimodal probability distribution for each group; and the recurrent neural network computes an error such that the error value becomes larger as the value of the high-dimensional teaching data becomes larger, and performs learning of time-series patterns based on the computation result.

以上のように、本発明の一側面によれば、より多くの時系列パターンを学習、予測することができる。 As described above, according to one aspect of the present invention, more time series patterns can be learned and predicted.

以下、図を参照して本発明の実施の形態について説明する。 Hereinafter, embodiments of the present invention will be described with reference to the drawings.

図１は本発明を適用した情報処理装置の一実施の形態の構成を示す。この情報処理装置１は、予測処理のため、リカレントニューラルネットワーク（以下、ＲＮＮとも記述する）１１および変換部１２，１３を有する他、ＲＮＮ１１の学習のため、教示用データ取得部２１、変換部２２、および演算部２３を有している。 FIG. 1 shows the configuration of an embodiment of an information processing apparatus to which the present invention is applied. In addition to a recurrent neural network (hereinafter also referred to as RNN) 11 and conversion units 12 and 13 for prediction processing, the information processing apparatus 1 includes a teaching data acquisition unit 21, a conversion unit 22, and a calculation unit 23 for training the RNN 11.

この実施の形態においては、情報処理装置１がロボット２を制御する。このロボット２はモーター８１と視覚センサー８２を有している。モーター８１は、ロボット２の所定の部位を駆動することで、ロボット２に対して所定の動作を実行させる。モーター８１はセンサリーモーターであり、駆動の結果に対応する信号をセンサリフィードバックとして外部に出力する。視覚センサー８２は所定のオブジェクトを観察し、その観察結果に対応する信号をセンサリフィードバックとして出力する。 In this embodiment, the information processing apparatus 1 controls the robot 2. The robot 2 has a motor 81 and a visual sensor 82. The motor 81 drives a predetermined part of the robot 2 to cause the robot 2 to execute a predetermined operation. The motor 81 is a sensory motor, and outputs a signal corresponding to the driving result to the outside as sensory feedback. The visual sensor 82 observes a predetermined object and outputs a signal corresponding to the observation result as sensory feedback.

この実施の形態の場合、モーター８１の駆動の結果に対応する信号として８次元のデータｍ_tが出力され、視覚センサー８２のオブジェクトの観察結果に対応する信号として２次元のデータＳ_tが出力される。従って、ロボット２から合計１０次元のデータが、センサリフィードバックとして変換部１２に出力される。 In this embodiment, 8-dimensional data m_t is output as the signal corresponding to the result of driving the motor 81, and 2-dimensional data S_t is output as the signal corresponding to the result of the visual sensor 82 observing the object. Accordingly, a total of 10 dimensions of data are output from the robot 2 to the conversion unit 12 as sensory feedback.

取得されたデータを、グループ毎に、１峰性の確率分布のより高次元のデータに変換する高次元変換部としての変換部１２は、トポロジープリザービングマップ（topology preserving map）（以下、ＴＰＭと記述する）６１，６２を有する。ＴＰＭ６１は、ロボット２から入力されたモーター８１に関する８次元のデータｍ_tを、１峰性の確率分布の６４次元のデータＸ_i（ｔ）に変換する。ＴＰＭ６２は、ロボット２から入力された視覚センサー８２に関する２次元のデータＳ_tを、１峰性の確率分布の３６次元のデータＸ_i（ｔ）に変換する。結局変換部１２は、ロボット２から入力された１０次元のデータを、１００次元のデータＸ_i（ｔ）に変換する。 The conversion unit 12, which serves as a high-dimensional conversion unit that converts the acquired data into higher-dimensional data having a unimodal probability distribution for each group, includes topology preserving maps (hereinafter referred to as TPMs) 61 and 62. The TPM 61 converts the 8-dimensional data m_t relating to the motor 81, input from the robot 2, into 64-dimensional data X_i(t) having a unimodal probability distribution. The TPM 62 converts the 2-dimensional data S_t relating to the visual sensor 82, input from the robot 2, into 36-dimensional data X_i(t) having a unimodal probability distribution. In total, the conversion unit 12 thus converts the 10-dimensional data input from the robot 2 into 100-dimensional data X_i(t).

高次元のデータから重み付け係数に基づいて、グループ毎に合計が１になるように予測値を演算するＲＮＮ１１は、中間層を有しておらず、入力層３１と出力層３２の２層構造とされている。 The RNN 11, which computes predicted values from the high-dimensional data based on weighting coefficients so that the values sum to 1 for each group, has no intermediate layer and has a two-layer structure consisting of an input layer 31 and an output layer 32.

入力層３１は、ニューロン４１とニューロン４２により構成されている。ニューロン４１は、変換部１２のＴＰＭ６１からの６４次元のデータＸ_i（ｔ）を入力する６４個のニューロンと、変換部１２のＴＰＭ６２からの３６次元のデータＸ_i（ｔ）を入力する３６個のニューロンの合計１００個のニューロンで構成される。ニューロン４２は、出力層３２からコンテキストループ５６を介して供給される８０次元のコンテキストが入力される８０個のニューロンで構成される。 The input layer 31 is composed of neurons 41 and neurons 42. The neuron 41 has 64 neurons that input 64-dimensional data X _i (t) from the TPM 61 of the conversion unit 12 and 36 that input 36-dimensional data X _i (t) from the TPM 62 of the conversion unit 12. It is composed of a total of 100 neurons. The neuron 42 includes 80 neurons to which an 80-dimensional context supplied from the output layer 32 via the context loop 56 is input.

ニューロン４１は、変換部１２のＴＰＭ６１から供給される６４次元のデータと、ＴＰＭ６２から供給される３６次元のデータの合計１００次元のデータＸ_i（ｔ）を、出力層３２のニューロン５１，５２に供給する。またニューロン４２は、出力層３２からフィードバックされた８０次元のコンテキストＣ_i（ｔ−１）を、出力層３２のニューロン５１，５２に供給する。 The neurons 41 supply a total of 100 dimensions of data X_i(t), consisting of the 64-dimensional data supplied from the TPM 61 of the conversion unit 12 and the 36-dimensional data supplied from the TPM 62, to the neurons 51 and 52 of the output layer 32. The neurons 42 supply the 80-dimensional context C_i(t−1) fed back from the output layer 32 to the neurons 51 and 52 of the output layer 32.

出力層３２のニューロン５１は、１００個のニューロンから構成され、自身の内部状態をフィードバックするセルフフィードバックループ５１Ａを有している。すなわち、ニューロン５１の内部状態（internal state）は、入力層３１のニューロン４１，４２からのデータを重み付け係数に基づき重み付けした値と、自身の過去の内部状態の履歴に基づき決定される。１００次元のデータＸ_i（ｔ）に対する重み付け係数はω^bx _ijとされ、８０次元のコンテキストに対する重み付け係数はω^bc _ijとされる。これらの重み付け係数により重み付けされ、出力された１００次元のデータＸ^u _i（ｔ）は、演算部５３，５４に６４次元と３６次元に分配して供給される。 The neuron 51 of the output layer 32 is composed of 100 neurons and has a self-feedback loop 51A that feeds back its own internal state. That is, the internal state of the neuron 51 is determined based on a value obtained by weighting the data from the neurons 41 and 42 of the input layer 31 based on the weighting coefficient and the history of its own internal state. The weighting coefficient for 100-dimensional data X _i (t) is ω ^bx _ij, and the weighting coefficient for 80-dimensional context is ω ^bc _ij . The 100-dimensional data X ^u _i (t) weighted by these weighting coefficients and outputted is distributed and supplied to the arithmetic units 53 and 54 in 64 dimensions and 36 dimensions.

出力層３２のニューロン５２は、８０個のニューロンから構成され、自身の内部状態をフィードバックするセルフフィードバックループ５２Ａを有している。すなわち、ニューロン５２の内部状態（internal state）は、入力層３１のニューロン４１，４２からのデータを重み付け係数に基づき重み付けした値と、自身の過去の内部状態の履歴に基づき決定される。１００次元のデータＸ_i（ｔ）に対する重み付け係数はω^ux _ijとされ、８０次元のコンテキストに対する重み付け係数はω^uc _ijとされている。これらの重み付け係数により重み付けされた８０次元のデータＣ^u _i（ｔ）は、演算部５５に供給される。 The neuron 52 of the output layer 32 is composed of 80 neurons and has a self-feedback loop 52A that feeds back its own internal state. That is, the internal state of the neuron 52 is determined based on a value obtained by weighting data from the neurons 41 and 42 of the input layer 31 based on the weighting coefficient and the history of the past internal state of the neuron 52. The weighting coefficient for 100-dimensional data X _i (t) is ω ^ux _ij, and the weighting coefficient for 80-dimensional context is ω ^uc _ij . The 80-dimensional data C ^u _i (t) weighted by these weighting coefficients is supplied to the calculation unit 55.
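The connectivity described above can be summarized by the sizes of the four weight sets. The following is only a minimal sketch of that bookkeeping: the names `w_bx`, `w_bc`, `w_ux`, and `w_uc` mirror the coefficients ω^bx, ω^bc, ω^ux, and ω^uc, while the row/column convention (rows for the receiving neuron i, columns for the sending neuron j) and the zero initialization are assumptions made for illustration.

```python
# Layer sizes stated in the description.
N_MOTOR, N_VISION, N_CONTEXT = 64, 36, 80
N_IO = N_MOTOR + N_VISION  # 100 neurons in each of neurons 41 and neurons 51


def zeros(rows, cols):
    """Return a rows x cols matrix of zeros as nested lists."""
    return [[0.0] * cols for _ in range(rows)]


def shape(matrix):
    """Return (rows, cols) of a nested-list matrix."""
    return (len(matrix), len(matrix[0]))


# Rows index the receiving output-layer neuron i, columns the sending
# input-layer neuron j (this convention is an assumption).
weights = {
    "w_bx": zeros(N_IO, N_IO),            # neurons 41 -> neurons 51
    "w_bc": zeros(N_IO, N_CONTEXT),       # neurons 42 -> neurons 51
    "w_ux": zeros(N_CONTEXT, N_IO),       # neurons 41 -> neurons 52
    "w_uc": zeros(N_CONTEXT, N_CONTEXT),  # neurons 42 -> neurons 52
}

# Total number of weighting coefficients in this two-layer structure.
n_params = sum(rows * cols for rows, cols in map(shape, weights.values()))
```

Because there is no intermediate layer, every input-layer neuron connects directly to every output-layer neuron, which is why the parameter count is simply the sum of the four matrix sizes.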

このように、この実施の形態の場合、時間特性が異なる２つの時系列パターンを表現することができるようにするために、ＲＮＮ１１はセルフフィードバックループを有する時間連続型のＲＮＮ（continuous time RNN）（以下、ＣＴＲＮＮとも記述する）とされる。 Thus, in this embodiment, so that two time-series patterns with different time characteristics can be represented, the RNN 11 is a continuous-time RNN (hereinafter also referred to as CTRNN) having self-feedback loops.

出力層３２の演算部５３は、ニューロン５１により重み付けされた１００次元のデータＸ^u _i（ｔ）のうちの、６４次元のデータの予測値としてのアクティベーションをソフトマックス関数により演算する。演算された６４次元の予測値Ｙ_i（ｔ）は演算部２３と、変換部１３のＴＰＭ７１に供給される。 The computing unit 53 of the output layer 32 computes activation as a predicted value of 64-dimensional data of 100-dimensional data X ^u _i (t) weighted by the neuron 51 using a softmax function. The calculated 64-dimensional predicted value Y _i (t) is supplied to the calculation unit 23 and the TPM 71 of the conversion unit 13.

演算部５４は、ニューロン５１により重み付けされた１００次元のデータＸ^u _i（ｔ）のうちの、残りの３６次元のデータの予測値としてのアクティベーションをソフトマックス関数により演算する。演算された３６次元の予測値Ｙ_i（ｔ）は演算部２３と、変換部１３のＴＰＭ７２に供給される。 The calculation unit 54 calculates an activation as a predicted value of the remaining 36-dimensional data of the 100-dimensional data X ^u _i (t) weighted by the neuron 51 using a softmax function. The calculated 36-dimensional predicted value Y _i (t) is supplied to the calculation unit 23 and the TPM 72 of the conversion unit 13.

演算部５５は、ニューロン５２により重み付けされた８０次元のコンテキストの内部状態Ｃ^u _i（ｔ）からコンテキストアクティベーションＣ_i（ｔ）を演算する。演算されたＣ_i（ｔ）は、コンテキストとしてコンテキストループ５６を介して入力層３１のニューロン４２にフィードバックされる。 The computing unit 55 computes the context activation C _i (t) from the internal state C ^u _i (t) of the 80-dimensional context weighted by the neuron 52. The calculated C _i (t) is fed back to the neuron 42 of the input layer 31 through the context loop 56 as a context.

教示用データ取得部２１は図示せぬ装置あるいは記憶部から、モーター８１のセンサリフィードバックとしての出力データｍ_tに対応する８次元の教示用データｍ^*（ｔ＋１）と、視覚センサー８２のセンサリフィードバックとしての出力データＳ_tに対応する２次元の教示用データＳ^*（ｔ＋１）を取得する。取得された教示用データを、グループ毎に、１峰性の確率分布のより高次元のデータに変換する他の高次元変換部としての変換部２２は、変換部１２と同様の構成とされ、ＴＰＭ１０１とＴＰＭ１０２を有している。ＴＰＭ１０１は入力されたモーター８１に関する８次元の教示用データｍ^*（ｔ＋１）を６４次元の教示用データＹ^* _i（ｔ）に変換する。ＴＰＭ１０２は入力された視覚センサー８２に関する２次元の教示用データＳ^*（ｔ＋１）を３６次元の教示用データＹ^* _i（ｔ）に変換する。 The teaching data acquisition unit 21 acquires, from a device or storage unit (not shown), 8-dimensional teaching data m^*(t+1) corresponding to the output data m_t that is the sensory feedback of the motor 81, and 2-dimensional teaching data S^*(t+1) corresponding to the output data S_t that is the sensory feedback of the visual sensor 82. The conversion unit 22, which serves as another high-dimensional conversion unit that converts the acquired teaching data into higher-dimensional data having a unimodal probability distribution for each group, has the same configuration as the conversion unit 12 and includes a TPM 101 and a TPM 102. The TPM 101 converts the input 8-dimensional teaching data m^*(t+1) relating to the motor 81 into 64-dimensional teaching data Y^*_i(t). The TPM 102 converts the input 2-dimensional teaching data S^*(t+1) relating to the visual sensor 82 into 36-dimensional teaching data Y^*_i(t).

演算部２３は、出力層３２の演算部５３から供給された６４次元の予測値Ｙ_i（ｔ）および演算部５４から供給された３６次元の予測値Ｙ_i（ｔ）の合計１００次元の予測値Ｙ_i（ｔ）と、変換部２２のＴＰＭ１０１から供給された６４次元の教示用データＹ^* _i（ｔ）およびＴＰＭ１０２から供給された３６次元の教示用データＹ^* _i（ｔ）の合計１００次元の教示用データＹ^* _i（ｔ）との誤差を演算する。演算部２３は、演算した誤差に基づいて、出力層３２の重み付け係数を修正する。 The calculation unit 23 computes the error between the 100-dimensional predicted values Y_i(t), consisting of the 64-dimensional predicted values Y_i(t) supplied from the calculation unit 53 of the output layer 32 and the 36-dimensional predicted values Y_i(t) supplied from the calculation unit 54, and the 100-dimensional teaching data Y^*_i(t), consisting of the 64-dimensional teaching data Y^*_i(t) supplied from the TPM 101 of the conversion unit 22 and the 36-dimensional teaching data Y^*_i(t) supplied from the TPM 102. Based on the computed error, the calculation unit 23 corrects the weighting coefficients of the output layer 32.

予測値を、変換部１２により取得されたデータｍ_t，Ｓ_tと同じ次元に変換する低次元変換部としての変換部１３はＴＰＭ７１，７２を有し、変換部１２における場合と逆の次元の変換を行う。すなわち、ＴＰＭ７１は出力層３２の演算部５３から供給された６４次元の予測値Ｙ_i（ｔ）を８次元のデータに変換し、アクションとしてロボット２に出力する。ＴＰＭ７２は出力層３２の演算部５４から供給された３６次元の予測値Ｙ_i（ｔ）を２次元のデータに変換し、アクションとしてロボット２に出力する。 The conversion unit 13, which serves as a low-dimensional conversion unit that converts the predicted values into the same dimensions as the data m_t and S_t acquired by the conversion unit 12, includes TPMs 71 and 72 and performs the dimension conversion in the direction opposite to that of the conversion unit 12. That is, the TPM 71 converts the 64-dimensional predicted values Y_i(t) supplied from the calculation unit 53 of the output layer 32 into 8-dimensional data and outputs it to the robot 2 as an action. The TPM 72 converts the 36-dimensional predicted values Y_i(t) supplied from the calculation unit 54 of the output layer 32 into 2-dimensional data and outputs it to the robot 2 as an action.

ロボット２は、ＴＰＭ７１からの８次元のデータに基づき、モーター８１の動作を制御するとともに、ＴＰＭ７２からの２次元のデータに基づき、視覚センサー８２の動作を制御する。 The robot 2 controls the operation of the motor 81 based on the eight-dimensional data from the TPM 71 and also controls the operation of the visual sensor 82 based on the two-dimensional data from the TPM 72.

次に図２のフローチャートを参照して、ＲＮＮ１１の学習処理について説明する。 Next, the learning process of the RNN 11 will be described with reference to the flowchart of FIG.

ステップＳ１において演算部２３は、処理回数を表す変数ｔを０に設定する。ステップＳ２において変換部１２は、データを取得し、次元を変換する。学習処理の場合、ロボット２は実際には使用されず、メンタルシミュレーションが行われ、アクションがそのままセンサリフィードバックとして使用される。ｔ＝０ではない場合、後述するステップＳ８の処理で生成されたデータｍ（ｔ），Ｓ（ｔ）を変換部１２により次元変換してＸ（ｔ）が生成される。また前の時刻の演算部５５の演算結果Ｃ（ｔ−１）が取得される。 In step S1, the calculation unit 23 sets a variable t representing the number of processes to 0. In step S2, the conversion unit 12 acquires data and converts dimensions. In the learning process, the robot 2 is not actually used, a mental simulation is performed, and the action is used as it is as sensory feedback. When t is not 0, the data m (t) and S (t) generated in the process of step S8 described later is subjected to dimension conversion by the conversion unit 12 to generate X (t). In addition, the calculation result C (t−1) of the calculation unit 55 at the previous time is acquired.

これに対してｔ＝０である場合、初期値Ｘ^u（ｉｎｉｔ），Ｃ^u（ｉｎｉｔ）が取得され、それぞれが演算部５３，５４によりソフトマックス関数またはシグモイド関数で演算される。演算された６４次元と３６次元の結果が、変換部１３により、８次元のデータと２次元のデータに変換され、センサリフィードバックとされる。 On the other hand, when t = 0, initial values X ^u (init) and C ^u (init) are acquired and calculated by the calculation units 53 and 54 using a softmax function or a sigmoid function, respectively. The calculated 64-dimensional and 36-dimensional results are converted into 8-dimensional data and 2-dimensional data by the conversion unit 13 and used as sensory feedback.

変換部１２のＴＰＭ６１は、センサリフィードバック（初期値）としての８次元のデータｍ_tを取得し、これを次のソフトマックス関数の式（１）に従って、１峰性の確率分布のより高次元（この実施の形態の場合６４次元）のデータＸ_i（ｔ）に変換する。すなわち、６４次元の出力の合計が１になるように変換される。これにより、近似したものをより近くにマップし、違うものをより遠くにマップするという、ウィナーテークオール的な変換が行われる。その結果、出力の最大値が１．０に近づけられ、その他の出力が０．０に近づけられて、データ間の重なり合いが減少し、無理なく自己組織化が可能となる。 The TPM 61 of the conversion unit 12 acquires the 8-dimensional data m_t as sensory feedback (initial value) and converts it, according to the softmax-function equation (1) below, into higher-dimensional data X_i(t) (64 dimensions in this embodiment) having a unimodal probability distribution. That is, the conversion is performed so that the 64 outputs sum to 1. This yields a winner-take-all-like conversion in which similar inputs are mapped closer together and dissimilar inputs are mapped farther apart. As a result, the maximum output approaches 1.0 and the other outputs approach 0.0, overlap between data is reduced, and self-organization becomes possible without strain.

上記式（１）において、δは定数である。ｍ_tは、取得されたモーター８１に関する８次元のデータであり、次の式（２）で表される。 In the above formula (1), δ is a constant. m _t is 8-dimensional data regarding the acquired motor 81 and is expressed by the following equation (2).

ｋ_j（ｊ＝１，２，・・・，６４）は、次の式（３）で表されるように、参照ベクトルＫの要素である。 k _j (j = 1, 2,..., 64) is an element of the reference vector K as represented by the following expression (3).

参照ベクトルＫの要素ｋ_iは、図３に示されるように、８×８個のマトリックス状の各ノードの位置に対応して配置されており、次の式（４）で表されるように、８個の要素μ_ij（j＝１，２，・・・，８）により構成される。 As shown in FIG. 3, the elements k _i of the reference vector K are arranged corresponding to the positions of 8 × 8 matrix nodes, and are expressed by the following equation (4). , 8 elements μ _ij (j = 1, 2,..., 8).
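Equations (1) to (4) are not reproduced in this text, so the conversion can only be sketched under assumptions. The sketch below assumes that equation (1) is a softmax over negative squared distances between the input m_t and each reference vector k_i, scaled by the constant δ; that functional form, the function name `tpm_encode`, the value of `delta`, and the random reference vectors are all illustrative assumptions, not the patent's stated formula.

```python
import math
import random


def tpm_encode(m, reference_vectors, delta):
    """Map a low-dimensional vector m to one activation per reference vector.

    Assumed form (not confirmed by the text):
        x_i = exp(-||k_i - m||^2 / delta) / sum_j exp(-||k_j - m||^2 / delta)
    """
    sq_dists = [
        sum((k - v) ** 2 for k, v in zip(k_i, m)) for k_i in reference_vectors
    ]
    # Subtracting the smallest distance before exponentiating keeps the
    # exponentials in range and does not change the normalized result.
    d_min = min(sq_dists)
    exps = [math.exp(-(d - d_min) / delta) for d in sq_dists]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
# 8 x 8 = 64 nodes, each holding an 8-element reference vector. Random
# placeholders are used here; in the description they result from training.
K = [[rng.random() for _ in range(8)] for _ in range(64)]
m_t = [rng.random() for _ in range(8)]

x = tpm_encode(m_t, K, delta=0.01)
```

Under this assumed form, the 64 outputs sum to 1 and the largest output belongs to the node whose reference vector is closest to m_t; a smaller `delta` pushes the result closer to the winner-take-all behavior the description mentions.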

ＴＰＭ６２においては、参照ベクトルＫの要素ｋ_iは、図４に示されるように、６×６個のマトリックス状の各ノードの位置に対応して配置されている。そして同様に、ＴＰＭ６２は視覚センサー８２のセンサリフィードバック（初期値）としての２次元のデータをより高次元(この実施の形態の場合３６次元)のデータに変換する。すなわち、３６次元の出力の合計が１になるように変換される。その処理は、上述したＴＰＭ６１の場合と同様であるのでその説明は省略する。 In the TPM 62, the elements k _i of the reference vector K are arranged corresponding to the positions of the 6 × 6 matrix nodes as shown in FIG. Similarly, the TPM 62 converts two-dimensional data as sensory feedback (initial value) of the visual sensor 82 into higher-dimensional data (36 dimensions in this embodiment). That is, conversion is performed so that the total of 36-dimensional outputs becomes 1. Since the processing is the same as that of the TPM 61 described above, description thereof is omitted.

ＴＰＭ６２においてもＴＰＭ６１と同様に、式（１）乃至式（４）が用いられる。ただし、式（１）においては、モーター８１の８次元のデータに対応するデータｍ_tに代えて、視覚センサー８２の２次元のデータに対応するデータＳ_tが用いられ、式（２）においては、視覚センサー８２のデータに対応するデータＳ_tの要素は２次元とされ、式（３）においては、参照ベクトルＫの要素ｋ_iの数は３６個とされ、式（４）においては、参照ベクトルＫの要素ｋ_iの要素μ_ijの数は２個とされる。 As with the TPM 61, equations (1) to (4) are also used in the TPM 62. However, in equation (1), the data S_t corresponding to the 2-dimensional data of the visual sensor 82 is used in place of the data m_t corresponding to the 8-dimensional data of the motor 81; in equation (2), the data S_t corresponding to the data of the visual sensor 82 has two elements; in equation (3), the reference vector K has 36 elements k_i; and in equation (4), each element k_i of the reference vector K has two elements μ_ij.

すなわち、データはモーター８１に関するグループ、あるいは視覚センサー８２に関するグループといったグループ毎に、その出力の合計が１になるように調整される。 In other words, the data is adjusted so that the total output is 1 for each group such as the group related to the motor 81 or the group related to the visual sensor 82.

なお、ＴＰＭ６１，６２は学習処理を行うことで実現される。その学習処理については、図５のフローチャートを参照して後述する。 The TPMs 61 and 62 are realized by performing learning processing. The learning process will be described later with reference to the flowchart of FIG.

データＸ_i（ｔ），Ｃ_j（ｔ−１）は入力層３１により取得される。すなわち、入力層３１のニューロン４１がＴＰＭ６１からの６４次元のデータＸ_i（ｔ）と、ＴＰＭ６２からの３６次元のデータＸ_i（ｔ）の、合計１００次元のデータＸ_i（ｔ）を取得する。またニューロン４２が、コンテキストループ５６によりフィードバックされる８０次元のコンテキストＣ_j（ｔ−１）を取得する。 Data X _i (t) and C _j (t−1) are acquired by the input layer 31. That is, the neuron 41 of the input layer 31 acquires a total of 100-dimensional data X _i (t) of the 64-dimensional data X _i (t) from the TPM 61 and the 36-dimensional data X _i (t) from the TPM 62. . The neuron 42 acquires the 80-dimensional context C _j (t−1) fed back by the context loop 56.

ステップＳ３において出力層の内部状態を更新する処理が行われる。すなわち、出力層３２のニューロン５１は次の式（５）に基づいて内部状態Ｘ^u _i（ｔ）を更新する。式（５）の右辺の第１項は、セルフフィードバックループ５１Ａにより、現在の内部状態Ｘ^u _i（ｔ）が、過去の内部状態Ｘ^u _i（ｔ−１）により決定されることを表す。第２項は、現在の内部状態Ｘ^u _i（ｔ）が、入力層３１のニューロン４１からの１００次元のデータＸ_j（ｔ）に重み付け係数ω^bx _ijにより重み付けした値と、８０次元のコンテキストＣ_j（ｔ−１）に重み付け係数ω^bc _ijにより重み付けした値との積和により決定されることを表す。 In step S3, processing for updating the internal state of the output layer is performed. That is, the neuron 51 of the output layer 32 updates the internal state X ^u _i (t) based on the following equation (5). The first term on the right side of Equation (5) represents that the current internal state X ^u _i (t) is determined by the past internal state X ^u _i (t−1) by the self-feedback loop 51A. The second term shows that the current internal state X ^u _i (t) is a value obtained by weighting 100-dimensional data X _j (t) from the neuron 41 of the input layer 31 with a weighting coefficient ω ^bx _ij, and an 80-dimensional context. It represents that it is determined by the product sum of C _j (t−1) and the value weighted by the weighting coefficient ω ^bc _ij .

すなわちこの演算においては、ニューロン５１の1個のニューロンiの時刻tにおける内部状態Ｘ^u _i（ｔ）を得るために、時刻（ｔ−１）におけるニューロンiの内部状態に時間係数を乗じたもの（１−１／τ_i）Ｘ^u _i（ｔ−１）が演算される。さらに、入力層３１のニューロン４１のすべてのニューロンjの出力Ｘ_j（ｔ）に、ニューロン４１のニューロンjからニューロン５１のニューロンiへの重み付け係数ω^bx _ijに従って重み付けされた出力と、入力層３１のニューロン４２のすべてのニューロンjの出力Ｃ_j（ｔ−１）に、ニューロン４２のニューロンjからニューロン５１のニューロンiへの重み付け係数ω^bc _ijに従って重み付けされた出力の合計に、時間係数１／τ_iを乗じたものの合計が演算される。そしてそれらの演算値がさらに加算される。 That is, in this computation, to obtain the internal state X^u_i(t) at time t of one neuron i among the neurons 51, the internal state of neuron i at time (t−1) is first multiplied by a time coefficient, giving (1−1/τ_i)X^u_i(t−1). Next, the outputs X_j(t) of all neurons j among the neurons 41 of the input layer 31, each weighted by the weighting coefficient ω^bx_ij from neuron j of the neurons 41 to neuron i of the neurons 51, are summed together with the outputs C_j(t−1) of all neurons j among the neurons 42 of the input layer 31, each weighted by the weighting coefficient ω^bc_ij from neuron j of the neurons 42 to neuron i of the neurons 51, and this sum is multiplied by the time coefficient 1/τ_i. The two results are then added.
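The update described in words above can be written as a short function. This is a minimal sketch that follows the verbal description of equation (5), since the equation itself is not reproduced in this text; the function name `update_internal_state`, the nested-list weight layout, and the small example values are assumptions made for illustration.

```python
def update_internal_state(prev_state, x_in, c_prev, w_x, w_c, tau):
    """Leaky-integrator update for one group of output-layer neurons.

    For each neuron i:
        new_i = (1 - 1/tau_i) * prev_i
                + (1/tau_i) * (sum_j w_x[i][j] * x_in[j]
                               + sum_j w_c[i][j] * c_prev[j])
    """
    new_state = []
    for i, u_prev in enumerate(prev_state):
        # Weighted sum over the data inputs and the fed-back context.
        drive = sum(w * x for w, x in zip(w_x[i], x_in))
        drive += sum(w * c for w, c in zip(w_c[i], c_prev))
        new_state.append((1.0 - 1.0 / tau[i]) * u_prev + drive / tau[i])
    return new_state


# Tiny example: 2 output neurons, 3 data inputs, 2 context inputs.
u = update_internal_state(
    prev_state=[1.0, -1.0],
    x_in=[1.0, 0.0, 0.0],
    c_prev=[0.5, 0.5],
    w_x=[[0.2, 0.0, 0.0], [0.0, 0.3, 0.0]],
    w_c=[[0.4, 0.4], [0.0, 0.0]],
    tau=[2.0, 2.0],
)
```

The same function would cover the update of the neurons 52 by passing the context internal state together with the weights ω^ux and ω^uc.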

出力層３２のニューロン５２は次の式（６）に基づいて内部状態Ｃ^u _i（ｔ）を更新する。式（６）の右辺の第１項は、セルフフィードバックループ５２Ａにより、現在の内部状態Ｃ^u _i（ｔ）が、過去の内部状態Ｃ^u _i（ｔ−１）により決定されることを表す。第２項は、現在の内部状態Ｃ^u _i（ｔ）が、入力層３１のニューロン４１からの１００次元のデータＸ_j（ｔ）に重み付け係数ω^ux _ijにより重み付けした値と、８０次元のコンテキストＣ_j（ｔ−１）に重み付け係数ω^uc _ijにより重み付けした値との積和により決定されることを表す。 The neurons 52 of the output layer 32 update the internal state C^u_i(t) based on equation (6) below. The first term on the right-hand side of equation (6) expresses that, through the self-feedback loop 52A, the current internal state C^u_i(t) is determined by the past internal state C^u_i(t−1). The second term expresses that the current internal state C^u_i(t) is determined by the sum of products of the 100-dimensional data X_j(t) from the neurons 41 of the input layer 31 weighted by the weighting coefficients ω^ux_ij and the 80-dimensional context C_j(t−1) weighted by the weighting coefficients ω^uc_ij.

すなわちこの演算においては、ニューロン５２の1個のニューロンiの時刻tにおける内部状態Ｃ^u _i（ｔ）を得るために、時刻（ｔ−１）におけるニューロンiの内部状態に時間係数を乗じたもの（１−１／τ_i）Ｃ^u _i（ｔ−１）が演算される。さらに、入力層３１のニューロン４１のすべてのニューロンjの出力Ｘ_j（ｔ）に、ニューロン４１のニューロンjからニューロン５２のニューロンiへの重み付け係数ω^ux _ijに従って重み付けされた出力と、入力層３１のニューロン４２のすべてのニューロンjの出力Ｃ_j（ｔ−１）に、ニューロン４２のニューロンjからニューロン５２のニューロンiへの重み付け係数ω^uc _ijに従って重み付けされた出力の合計に、時間係数１／τ_iを乗じたものの合計が演算される。そしてそれらの演算値がさらに加算される。 That is, in this computation, to obtain the internal state C^u_i(t) at time t of one neuron i among the neurons 52, the internal state of neuron i at time (t−1) is first multiplied by a time coefficient, giving (1−1/τ_i)C^u_i(t−1). Next, the outputs X_j(t) of all neurons j among the neurons 41 of the input layer 31, each weighted by the weighting coefficient ω^ux_ij from neuron j of the neurons 41 to neuron i of the neurons 52, are summed together with the outputs C_j(t−1) of all neurons j among the neurons 42 of the input layer 31, each weighted by the weighting coefficient ω^uc_ij from neuron j of the neurons 42 to neuron i of the neurons 52, and this sum is multiplied by the time coefficient 1/τ_i. The two results are then added.

式（５），式（６）から明らかなように内部状態Ｘ^u _i（ｔ），Ｃ^u _i（ｔ）は、時定数τ_iの値が大きいほど、（１−１／τ_i）の値が大きくなるので、過去の内部状態から大きな影響を受ける。 As is clear from equations (5) and (6), the larger the time constant τ_i, the larger the value of (1−1/τ_i), so the internal states X^u_i(t) and C^u_i(t) are more strongly influenced by the past internal state.
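To make the effect of the time constant concrete, the factor (1−1/τ_i) can be applied repeatedly with the input held at zero. The helper `retained_fraction` below is a hypothetical name introduced for illustration, and the choice of ten steps is arbitrary; the τ values are the ones used as examples in this description.

```python
def retained_fraction(tau, steps):
    """Fraction of an initial internal state that remains after `steps`
    updates when the weighted input is zero, i.e. (1 - 1/tau) ** steps."""
    return (1.0 - 1.0 / tau) ** steps


# Compare the example time constants over the same number of steps.
fractions = {tau: retained_fraction(tau, 10) for tau in (2, 5, 70)}
for tau, frac in fractions.items():
    print(f"tau={tau}: {frac:.4f} of the initial state remains after 10 steps")
```

A neuron with a small τ forgets its past state within a few steps, while a neuron with a large τ still carries most of it, which is the sense in which a larger time constant means a stronger influence from the past.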

連続時間型のリカレントニューラルネットワークであるＲＮＮ１１は、時定数τ_iで表される時間特性を有するが、この実施の形態の場合、式（５）で表されるニューロン５１の内部状態Ｘ^u _i（ｔ）を決定する時定数τ_iとして、１つの値（例えばτ_i＝２）が設定される。これに対して、式（６）で表されるニューロン５２の内部状態Ｃ^u _i（ｔ）を決定する時定数τ_iとして、２つの異なる値が設定される。例えば、８０個（次元）のニューロンのうち、次元がｉ＝１乃至６０の６０個のニューロンについては、時定数がτ_i＝５（ファーストコンテキスト）とされ、次元がｉ＝６１乃至８０の２０個のニューロンについては、時定数がτ_i＝７０（スローコンテキスト）とされる。 The RNN 11, which is a continuous-time recurrent neural network, has time characteristics expressed by the time constants τ_i. In this embodiment, a single value (for example, τ_i = 2) is set as the time constant τ_i that determines the internal state X^u_i(t) of the neurons 51 expressed by equation (5). In contrast, two different values are set as the time constants τ_i that determine the internal state C^u_i(t) of the neurons 52 expressed by equation (6). For example, of the 80 neurons (dimensions), the 60 neurons with i = 1 to 60 have a time constant of τ_i = 5 (fast context), and the 20 neurons with i = 61 to 80 have a time constant of τ_i = 70 (slow context).

標的とされる時系列データは、短い時間スケールで変化する運動と、長い時間スケールで変化する運動の両方を含む複雑さを有していることが多い。時定数を１つに設定すると、２つの時間スケールの一方にしか対応できない。これに対して、２つの異なる時定数を設定して、時間特性の異なるニューロングループを用意することで、時間スケールの違いに応じた役割分担がＲＮＮ１１自体に自己組織的に構成される。 Targeted time series data often has a complexity that includes both motion that varies on a short time scale and motion that varies on a long time scale. If the time constant is set to one, only one of the two time scales can be handled. On the other hand, by setting two different time constants and preparing neuron groups with different time characteristics, the role sharing according to the difference in time scale is configured in a self-organized manner in the RNN 11 itself.

その結果、短い時間スケールで変化するニューロングループにより、複数の時系列パターンに出現するｃｈｕｎｋに相当する部分が表現され、長い時間スケールで変化するニューロングループにより、より抽象化されたレベルでのシーケンス、例えばｃｈｕｎｋの組み合わせの順序や切り替えが表現される。すなわち、時間特性が異なる２つの時系列パターンを表現することが可能になる。 As a result, a portion corresponding to a chunk that appears in a plurality of time series patterns is represented by a neuron group that changes on a short time scale, and a sequence at a more abstract level by a neuron group that changes on a long time scale, For example, the order and switching of chunk combinations are expressed. That is, it is possible to express two time series patterns having different time characteristics.

なお、実験の結果、時間特性が異なる２つの時系列パターンを表現可能にするには、大きい時定数としてのスローの時定数τが、小さい時定数としてのファーストの時定数τの５倍以上大きければよいことが判った。 Experiments showed that, for two time-series patterns with different time characteristics to be representable, the slow (large) time constant τ should be at least 5 times the fast (small) time constant τ.
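The time-constant assignment and the reported factor-of-five guideline can be captured in a few lines of configuration. This is only an illustrative sketch: the constant names and the helper `timescales_separated` are hypothetical, while the numeric values are the examples given in this description.

```python
TAU_IO = 2.0     # neurons 51 (example value from the description)
TAU_FAST = 5.0   # context neurons i = 1..60 (fast context)
TAU_SLOW = 70.0  # context neurons i = 61..80 (slow context)

# One time constant per context neuron: 60 fast followed by 20 slow.
context_tau = [TAU_FAST] * 60 + [TAU_SLOW] * 20


def timescales_separated(fast, slow, min_ratio=5.0):
    """Check the reported guideline: slow tau at least `min_ratio` times fast tau."""
    return slow >= min_ratio * fast


ok = timescales_separated(TAU_FAST, TAU_SLOW)
```

With the example values the ratio is 14, comfortably above the reported threshold of 5.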

ステップＳ４において出力層のアクティベーションを演算する処理が行われる。すなわち、演算部５３はニューロン５１により更新された１００次元の内部状態Ｘ^u _i（ｔ）のうちのｉ＝１乃至６４のモーター８１に関する６４次元の部分から、次のソフトマックス関数の式（７）に基づいてアクティベーションを演算する。これにより出力される６４次元の予測値Ｙ_i（ｔ）の合計は１になるように調整される。 In step S4, the activations of the output layer are computed. Specifically, the calculation unit 53 computes activations according to the softmax-function equation (7) below from the 64-dimensional portion, i = 1 to 64, relating to the motor 81, of the 100-dimensional internal state X^u_i(t) updated by the neurons 51. The 64-dimensional predicted values Y_i(t) output in this way are thereby adjusted so that they sum to 1.

同様に、演算部５４はニューロン５１により更新された１００次元の内部状態Ｘ^u _i（ｔ）のうちのｉ＝６５乃至１００の視覚センサー８２に関する３６次元の部分から、ソフトマックス関数の式（７）に基づいてアクティベーションを演算する。これにより出力される３６次元の予測値Y_i（ｔ）の合計は１になるように調整される。 Similarly, the calculation unit 54 computes activations according to the softmax-function equation (7) from the 36-dimensional portion, i = 65 to 100, relating to the visual sensor 82, of the 100-dimensional internal state X^u_i(t) updated by the neurons 51. The 36-dimensional predicted values Y_i(t) output in this way are thereby adjusted so that they sum to 1.

すなわち、出力される予測データは、モーター８１に関するデータのグループ、あるいは視覚センサー８２に関するデータのグループといった各グループ毎に、その予測値の合計が１になるように調整される。 That is, the output prediction data is adjusted so that the sum of the prediction values becomes 1 for each group such as the data group related to the motor 81 or the data group related to the visual sensor 82.

Note that, to make the outputs sum to 1, each output may be divided by the sum of the outputs instead of using the softmax function.
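As an illustration only, the group-wise normalization of step S4 can be sketched in Python as follows. Equation (7) is not reproduced in this text, so the standard softmax form is assumed; the function names and the example input are hypothetical and are not taken from the embodiment.

```python
import math


def softmax(values):
    """Numerically stable softmax of a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def groupwise_softmax(internal_state, group_sizes=(64, 36)):
    """Apply softmax separately to each consecutive group of units.

    With the default sizes, units 1 to 64 (motor) and units 65 to 100
    (visual sensor) are each normalized to sum to 1.
    """
    assert sum(group_sizes) == len(internal_state)
    output, start = [], 0
    for size in group_sizes:
        output.extend(softmax(internal_state[start:start + size]))
        start += size
    return output


def normalize_by_sum(values):
    """Alternative named in the text: divide each output by the sum.

    Assumes the values are non-negative and not all zero.
    """
    total = sum(values)
    return [v / total for v in values]


# Hypothetical 100-dimensional internal state.
state = [0.01 * i for i in range(100)]
y = groupwise_softmax(state)
```

Each group then behaves like a probability distribution over its own nodes, which is what allows the error to be defined as a measure between distributions in step S7.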

Meanwhile, the calculation unit 55 computes the activation from the internal state C^u_i(t) updated by the neuron 52, based on the sigmoid function of equation (8). That is, the predicted value C_i(t) of equation (8) is computed.

In step S5, the teaching data acquisition unit 21 acquires teaching data prepared in advance. The teaching data consists of 10 dimensions in total: 8-dimensional data m^*_i(t+1) relating to the motor 81 and 2-dimensional data S^*_i(t+1) relating to the visual sensor 82.

In step S6, the conversion unit 22 converts the dimensionality of the teaching data. Specifically, the TPM 101 of the conversion unit 22 converts the 8-dimensional teaching data m^*_i(t+1) relating to the motor 81 into 64-dimensional data Y^*_i(t). Similarly, the TPM 102 converts the 2-dimensional teaching data S^*_i(t+1) relating to the visual sensor 82 into 36-dimensional data Y^*_i(t). This conversion is the same as the conversion performed by the conversion unit 12 in step S2.

In step S7, the calculation unit 23 computes and stores the error between the 100-dimensional prediction data Y_i(t) output from the output layer 32 in step S4 and the 100-dimensional teaching data Y^*_i(t) obtained by the dimension conversion in step S6. In this embodiment, the error E is computed not as the difference between the prediction data Y_i(t) output from the output layer 32 and the teaching data Y^*_i(t), but by the KL-divergence of equation (9).

When the error E is defined by equation (9), a small value of the teaching data Y^*_i(t) has little effect on the overall error E even if the ratio Y^*_i(t)/Y_i(t) is large. In other words, the larger the teaching data Y^*_i(t), the more strongly the ratio Y^*_i(t)/Y_i(t) affects the error E.

By thus defining the error E with equation (9), a distance measure between probability distributions, a more plausible solution, that is, optimal weighting coefficients, can be obtained than when the error E is expressed as a difference as in the conventional approach. Moreover, defining the error E with an expression for probability distributions works together with the processing in step S2, which increases the dimensionality according to a softmax function so that the outputs sum to 1 within each group, and with the processing in step S4, which computes the predicted values so that they sum to 1 within each group, making it possible to learn and predict a larger number of time-series patterns.
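The weighting effect described above can be illustrated with a short Python sketch. Equation (9) is not reproduced in this text; the sketch assumes the usual KL-divergence form, E = Σ_t Σ_i Y^*_i(t) log(Y^*_i(t)/Y_i(t)), which matches the description that each log ratio is weighted by the teaching value. The function name and the `eps` guard are hypothetical additions.

```python
import math


def kl_divergence_error(teach, pred, eps=1e-12):
    """Sum over time steps t and units i of Y*_i(t) * log(Y*_i(t) / Y_i(t)).

    teach and pred are lists (one entry per time step) of equal-length
    lists of non-negative values. Terms with Y*_i(t) == 0 contribute
    nothing, so small teaching values have little effect on the total.
    """
    error = 0.0
    for teach_t, pred_t in zip(teach, pred):
        for y_star, y in zip(teach_t, pred_t):
            if y_star > 0.0:
                # eps guards against division by zero in the prediction.
                error += y_star * math.log(y_star / max(y, eps))
    return error


# Hypothetical two-unit example over a single time step.
perfect = kl_divergence_error([[0.5, 0.5]], [[0.5, 0.5]])
mismatch = kl_divergence_error([[0.5, 0.5]], [[0.9, 0.1]])
```

The error is zero when the prediction equals the teaching distribution and positive otherwise, provided both are normalized as in steps S4 and S6.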

In step S8, the dimensionality of the output-layer activation is converted. Specifically, the TPM 71 of the conversion unit 13 converts the 64-dimensional predicted values Y_i(t) computed by the calculation unit 53 into 8-dimensional data m_t based on equation (10).

This conversion is the inverse of the conversion in the conversion unit 12. In equation (10), i takes values from 1 to 64, and l corresponds to the dimension of the data relating to the motor 81 and takes values from 1 to 8.

Similarly, the TPM 72 converts the 36-dimensional data Y_i(t) relating to the visual sensor 82, supplied from the calculation unit 54, into 2-dimensional data. In this case, in equation (10), i takes values from 1 to 36, and l corresponds to the dimension of the data relating to the visual sensor 82 and takes the values 1 and 2.
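One plausible reading of this inverse conversion is sketched below in Python. Equation (10) is not reproduced in this text, so the form used here, m_l = Σ_i Y_i(t) k_{i,l} (an activation-weighted sum of the reference vectors k_i), is an assumption that merely fits the stated index ranges of i and l. The function name and the two-node example are hypothetical.

```python
def to_low_dimension(activations, reference_vectors):
    """Assumed inverse conversion: m_l = sum_i Y_i * k_i[l].

    activations: one value per map node (summing to 1 within the group).
    reference_vectors: one low-dimensional reference vector per node.
    """
    dims = len(reference_vectors[0])
    return [
        sum(y * k[l] for y, k in zip(activations, reference_vectors))
        for l in range(dims)
    ]


# Hypothetical map with two nodes and 2-dimensional reference vectors.
low = to_low_dimension([0.25, 0.75], [[0.0, 0.0], [1.0, 2.0]])
```

Under this assumption the low-dimensional output is the expected value of the reference vectors under the predicted distribution, so a sharply peaked prediction returns approximately the reference vector of the most active node.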

The data generated in this way is used in the processing of step S2 at the next time step.

In step S9, the calculation unit 23 determines whether the variable t is equal to a preset value T. If t is not equal to T, that is, if the number of iterations has not yet reached T, the calculation unit 23 increments t by 1 in step S10. The processing then returns to step S2, and the subsequent steps are repeated. In these repetitions, as described above, step S2 performs the processing for the case where t is not 0.

If it is determined in step S9 that t is equal to T, that is, if the number of iterations has reached T, the calculation unit 23 updates the weighting coefficients by the backpropagation through time method in step S11. The update is performed according to equation (11), in which α is a predetermined coefficient.

The purpose of learning is to find the weighting coefficients ω_ij that minimize the error E. To this end, each weighting coefficient ω_ij is changed in the direction of -∂E/∂ω_ij, according to the rate of increase ∂E/∂ω_ij of the error E with respect to a change in ω_ij. The quantities ∂E/∂ω_ij shown in equations (14) to (17) (∂E/∂ω^bx_ij, ∂E/∂ω^bc_ij, ∂E/∂ω^ux_ij, and ∂E/∂ω^uc_ij) can be obtained by iteratively evaluating equations (12) and (13) backward from time T.
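The idea of moving each weighting coefficient in the direction of -∂E/∂ω_ij can be sketched in Python as follows. Equation (11) is not reproduced in this text; the sketch assumes plain gradient descent, ω_ij ← ω_ij - α ∂E/∂ω_ij, and the actual equation may contain further terms. The function name and the toy error function are hypothetical.

```python
def gradient_descent_step(weights, gradients, alpha=0.1):
    """Return new weights moved by -alpha * dE/dw (plain gradient descent).

    weights and gradients are equally shaped lists of lists.
    """
    return [
        [w - alpha * g for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(weights, gradients)
    ]


# Toy check on E(w) = w^2, whose gradient is 2w: repeated steps shrink w.
w = [[1.0]]
for _ in range(50):
    grad = [[2.0 * w[0][0]]]
    w = gradient_descent_step(w, grad, alpha=0.1)
```

In the embodiment the gradients would come from the backward recursion of equations (12) to (17) rather than from a closed-form expression as in this toy case.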

Let the start time of the time-series sequence be t = 0 and the end time be t = T. As is clear from equation (9), the error E is the sum of E(t) from time t = 0 to time t = T. As shown in equations (14) to (17), ∂E/∂ω_ij is obtained from sums over time involving ∂E/∂X(t) and ∂E/∂C(t), and as shown in equations (12) and (13), ∂E/∂X(t) and ∂E/∂C(t) are given by a recurrence relation in terms of ∂E/∂X(t+1) and ∂E/∂C(t+1) at time t+1.

Accordingly, by supplying initial values for ∂E/∂X(T) and ∂E/∂C(T) at the final time T, ∂E/∂X(T-1) and ∂E/∂C(T-1) are computed according to equations (12) and (13). Repeating the computation of equations (12) and (13) in the same way yields ∂E/∂X(T-2), ∂E/∂C(T-2), ∂E/∂X(T-3), ∂E/∂C(T-3), and so on, down to time t = 0. Then ∂E/∂ω_ij is obtained by forming the sum of products of these values according to equations (14) to (17).

In equation (13), f(x) is the sigmoid function used to compute the activation of the context C_i(t), and δ_ik is the Kronecker delta, which is 1 when i = k and 0 otherwise.

Note that the update of the weighting coefficients by the backpropagation through time method is performed for all 100 dimensions together.

In step S12, the calculation unit 23 determines whether the number of learning iterations has reached a preset number. If it has not, the processing returns to step S1 and the subsequent steps are repeated. When the number of learning iterations reaches the preset number, the learning process ends.

Alternatively, the learning process may be terminated when the error has become sufficiently small.

Next, the learning process of the TPM 61 is described with reference to the flowchart of FIG. 5.

In step S41, the conversion unit 12 acquires the eight-element sample data m_t, expressed by equation (18), corresponding to the data of the motor 81 of the robot 2.

In step S42, the conversion unit 12 finds the reference vector k_c of the winner node, that is, the node for which, as expressed by equation (19), the difference between the sample data m_t and the reference vector k_i (equation (4)) of node i (the i-th of the 8 × 8 nodes in FIG. 3) is smallest.

In step S43, the conversion unit 12 updates the reference vectors k_i in the neighborhood centered on the winner node, the c-th node, according to the update rule given in equations (20) and (21). In the neighborhood function of equation (21), α denotes the learning rate and δ denotes a constant parameter. The term ||r_i - r_c|| in the numerator on the right-hand side of equation (21) denotes the distance from the winner node c to node i. With this update, nodes closer to the winner node are trained more strongly.

In step S44, the conversion unit 12 determines whether the number of learning iterations has reached a predetermined number. If it has not, the processing returns to step S41 and the subsequent steps are repeated. When the number of learning iterations reaches the predetermined number, the learning process ends.

The learning process of the TPM 61 is performed as described above.
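Steps S42 and S43 can be sketched in Python as follows. Equations (19) to (21) are not reproduced in this text, so the sketch assumes the standard self-organizing-map form: the winner is the node with the nearest reference vector, the neighborhood function is h = α exp(-||r_i - r_c||^2 / (2δ^2)), and the update is k_i ← k_i + h (m_t - k_i). All function names, the 2 × 2 grid, and the parameter values are hypothetical.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def find_winner(sample, reference_vectors):
    """Step S42: index c of the node whose reference vector is nearest."""
    return min(
        range(len(reference_vectors)),
        key=lambda i: squared_distance(sample, reference_vectors[i]),
    )


def train_step(sample, reference_vectors, grid_positions, alpha=0.1, delta=1.0):
    """Step S43: pull reference vectors toward the sample, in place.

    Assumed neighborhood function:
    h = alpha * exp(-||r_i - r_c||^2 / (2 * delta^2)),
    so nodes nearer the winner on the grid are updated more strongly.
    """
    c = find_winner(sample, reference_vectors)
    for i, k in enumerate(reference_vectors):
        grid_dist_sq = squared_distance(grid_positions[i], grid_positions[c])
        h = alpha * math.exp(-grid_dist_sq / (2.0 * delta ** 2))
        for l in range(len(k)):
            k[l] += h * (sample[l] - k[l])
    return c


# Hypothetical 2 x 2 map with 2-dimensional reference vectors.
grid = [(0, 0), (0, 1), (1, 0), (1, 1)]
refs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
winner = train_step([0.9, 0.8], refs, grid)
```

Repeating `train_step` over many samples, as in the loop of steps S41 to S44, is what arranges neighboring nodes to hold similar reference vectors.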

The TPM 62 undergoes the same learning process using equations (18) to (21). In equation (18), however, the 2-dimensional sample data S_t corresponding to the visual sensor 82 takes the place of the 8-dimensional sample data m_t corresponding to the motor 81, and in equations (19) and (20) the 2-dimensional sample data S_t corresponding to the visual sensor 82 is used in place of the 8-dimensional sample data m_t corresponding to the motor 81.

The same learning process is also performed for the TPMs 101, 102, 71, and 72.

Next, the process by which the information processing apparatus 1 drives the robot 2 is described with reference to the flowchart of FIG. 6.

In step S71, the conversion unit 12 acquires data and converts its dimensionality. Specifically, the TPM 61 acquires the 8-dimensional data m_t relating to the motor 81 of the robot 2 and converts it into 64-dimensional data X_i(t) according to equation (1). Similarly, the TPM 62 acquires the 2-dimensional data S_t relating to the visual sensor 82 and converts it into 36-dimensional data X_i(t) according to equation (1).
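A possible form of this conversion to a higher dimension is sketched below in Python. Equation (1) is not reproduced in this passage; the sketch assumes a softmax over the negative squared distances between the input and each node's reference vector, which yields a unimodal distribution that sums to 1 and peaks at the nearest node. The function names, the spread parameter `sigma`, and the four-node map are hypothetical.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def to_high_dimension(sample, reference_vectors, sigma=0.05):
    """Assumed forward conversion: one activation per map node.

    X_i = exp(-||k_i - sample||^2 / sigma) / sum_j exp(-||k_j - sample||^2 / sigma)
    """
    scores = [-squared_distance(sample, k) / sigma for k in reference_vectors]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical map with four nodes and 2-dimensional reference vectors.
refs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = to_high_dimension([0.9, 0.8], refs)
```

In the embodiment the same kind of conversion would map the 8-dimensional motor data onto 64 nodes and the 2-dimensional visual data onto 36 nodes.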

In step S72, the input layer acquires X_i(t) and C_j(t-1). Specifically, the neuron 41 acquires a total of 100 dimensions of data X_i(t): the 64-dimensional data X_i(t) relating to the motor 81 converted by the TPM 61 and the 36-dimensional data X_i(t) relating to the visual sensor 82 converted by the TPM 62. The neuron 42 acquires the context C_j(t-1) fed back through the context loop 56.

In step S73, the internal state of the output layer is updated. Specifically, the neuron 51 updates the internal state X^u_i(t) according to equation (5), and the neuron 52 updates the internal state C^u_i(t) according to equation (6).

In step S74, the activation of the output layer is computed. Specifically, the calculation unit 53 computes the activation from the 64-dimensional portion (i = 1 to 64, relating to the motor 81) of the 100-dimensional internal state X^u_i(t), based on the softmax function of equation (7). The calculation unit 54 likewise computes the activation from the 36-dimensional portion (i = 65 to 100, relating to the visual sensor 82) of the 100-dimensional internal state X^u_i(t), based on the softmax function of equation (7).

The calculation unit 55 computes the activation according to the sigmoid function of equation (8). The resulting predicted value C_i(t) is fed back to the neuron 41 of the input layer 31 through the context loop 56.

In step S75, the conversion unit 13 converts the dimensionality. This conversion is the inverse of the conversion in the conversion unit 12. Specifically, the TPM 71 converts the 64-dimensional data Y_i(t) relating to the motor 81, supplied from the calculation unit 53, into 8-dimensional data according to equation (10). In equation (10), i takes values from 1 to 64, and l corresponds to the dimension of the data relating to the motor 81 and takes values from 1 to 8.

The converted data is supplied to the robot 2 as an action.

In step S76, the robot is driven. Specifically, the motor 81 is driven based on the 8-dimensional data supplied from the TPM 71, and the visual sensor 82 is driven based on the 2-dimensional data supplied from the TPM 72.

In step S77, the conversion unit 12 determines whether termination has been instructed. If not, the processing returns to step S71 and the subsequent steps are repeated. If termination has been instructed, the processing ends.

The weighting coefficients of the RNN 11 have been properly learned. The behavior of the robot 2 is therefore properly controlled.

As described above, in this embodiment the acquired data is converted, group by group, into higher-dimensional data having a unimodal probability distribution, and predicted values are computed from the higher-dimensional data, based on the weighting coefficients, so that they sum to 1 within each group of data. This makes it possible to learn and predict a larger number of time-series patterns.

The ability to learn and predict more time-series patterns in this way is thought to arise because increasing the number of dimensions increases the orthogonality of the information. In general, most values in the teaching data obtained are not close to the upper or lower limits; instead, they tend to be concentrated in a narrow part of the available dynamic range. This means that the dynamic range is not being used efficiently.

If, therefore, the dimensionality is increased so that nearby data is mapped to nearby positions and distant data to distant positions, that is, if the dimensionality is increased by computation with a unimodal characteristic so that the topology is preserved, data (vectors) that lay close together in the low-dimensional space can be spread out in the higher-dimensional space, and the mutual orthogonality of the vectors increases (vectors that were previously too close to distinguish become distinguishable).

Furthermore, using a TPM to increase the dimensionality allows the unimodal characteristic to be realized naturally, in a self-organizing manner.

For example, as shown in FIG. 7, when vectors A and B are compared in two-dimensional xy coordinates, they lie close together, so the difference between them is small and they are difficult to distinguish. If, however, a z coordinate is added and vector A, now vector A' in three-dimensional xyz coordinates, is compared with vector B, the difference between A' and B appears large, and A' and B become easy to distinguish.
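The effect can be checked numerically with a small Python sketch. The coordinates below are hypothetical and are not read from FIG. 7; they only illustrate that giving one vector a component along an added axis increases the distance between the vectors and lowers their cosine similarity.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical vectors A and B, close together in the xy plane.
a_2d, b_2d = (1.0, 1.0), (1.0, 1.1)
# A' gains a z component; B stays in the xy plane (z = 0).
a_3d, b_3d = (1.0, 1.0, 1.0), (1.0, 1.1, 0.0)

dist_2d = euclidean_distance(a_2d, b_2d)
dist_3d = euclidean_distance(a_3d, b_3d)
cos_2d = cosine_similarity(a_2d, b_2d)
cos_3d = cosine_similarity(a_3d, b_3d)
```

A lower cosine similarity means the two vectors are closer to orthogonal, which is the sense in which the added dimension makes them easier to tell apart.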

Of course, increasing the dimensionality of the output from the robot 2 itself would also allow better learning and prediction. Doing so, however, requires more sensors, which complicates the configuration and raises the cost. The number of sensors that can be installed also depends on the hardware, and it is not always possible to install many sensors. Moreover, even if many sensors are placed near one another, for example, little benefit is gained from the added sensors if they all produce nearly the same output. And if there are many sensors that are only weakly related to one another, learning and prediction based on their outputs becomes difficult.

In contrast, when the dimensionality of a small number of sensor outputs is increased by computation, group by group, with a unimodal characteristic as in this embodiment, the dimensionality can be increased while preserving the topology, so learning and prediction become easier without complicating the configuration or raising the cost.

FIGS. 8 and 9 show the results of an experiment in which the present invention was applied. In this experiment, the robot 2 was made to move an object up and down three times starting from the home position and then return to the home position.

FIG. 8 shows the teaching data, and FIG. 9 shows the predicted values.

FIG. 8A shows the low-dimensional teaching data m^*(t) relating to the motor 81 (before conversion to high dimensions), and FIG. 9A shows the corresponding predicted values m(t) (converted to low dimensions by the TPM 71). Both represent the joint angles of the parts of the robot 2 as values from 0 to 1. The solid line shows pronation/supination of the left arm, the dashed line flexion/extension of the left elbow, the dash-dot line flexion/extension of the right shoulder, and the dotted line pronation/supination of the right arm. That is, four of the eight dimensions are shown.

FIG. 8B shows the low-dimensional teaching data S^*(t) relating to the visual sensor 82 (before conversion to high dimensions), and FIG. 9B shows the corresponding predicted values S(t) (converted to low dimensions by the TPM 72). Both represent the position of the object that the robot 2 moves up and down as values from 0 to 1. The solid and dashed lines show the X and Y coordinates of the object, respectively.

FIG. 8C shows the 64-dimensional teaching data Y^*_i(t) (i = 1 to 64) relating to the motor 81, converted to high dimensions by the TPM 101, and the 36-dimensional teaching data Y^*_i(t) (i = 65 to 100) relating to the visual sensor 82, converted to high dimensions by the TPM 102. FIG. 9C, corresponding to FIG. 8C, shows the 64-dimensional predicted values Y_i(t) (i = 1 to 64) relating to the motor 81, computed by the calculation unit 53, and the 36-dimensional predicted values Y_i(t) (i = 65 to 100) relating to the visual sensor 82, computed by the calculation unit 54. In each case, the 8 × 8 or 6 × 6 nodes are split in the vertical direction and laid out vertically. The activity value of each node, from 0 to 1, is shown in grayscale.

FIG. 9D shows the context activation C_i(t). Units i = 1 to 60 are fast context units with τ_i = 5 in equation (6), and units i = 61 to 80 are slow context units with τ_i = 70 in equation (6). Here too, the activity value of each node, from 0 to 1, is shown in grayscale. It can be seen that the fast context units (i = 1 to 60) change rapidly over short periods, whereas the slow context units (i = 61 to 80) change slowly.
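The difference between the two time constants can be illustrated with a small Python sketch. Equation (6) is not reproduced in this passage; the sketch assumes the common leaky-integrator form of a continuous-time recurrent neural network, u(t) = (1 - 1/τ) u(t-1) + (1/τ) × input, in which a larger τ gives the past internal state more influence. The function names and the constant input are hypothetical.

```python
def leaky_update(prev_state, drive, tau):
    """Assumed internal-state update: larger tau keeps more of the past."""
    return (1.0 - 1.0 / tau) * prev_state + drive / tau


def step_response(tau, steps=20):
    """Internal state after `steps` updates under a constant input of 1."""
    state = 0.0
    for _ in range(steps):
        state = leaky_update(state, 1.0, tau)
    return state


# Time constants quoted for the fast and slow context units.
fast = step_response(5.0)
slow = step_response(70.0)
```

After 20 steps the unit with τ = 5 has almost reached the input level, while the unit with τ = 70 has covered only about a quarter of the distance, which corresponds to the rapid and slow changes described for FIG. 9D.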

This experiment confirmed that the 10-dimensional data was learned and predicted properly.

The numbers of dimensions, neurons, and so on in the above embodiment are merely examples, and the present invention is not limited to them.

The series of processes described above can be executed by hardware or by software. When the series of processes is executed by software, the program constituting the software is installed from a program recording medium onto a computer built into dedicated hardware, or onto, for example, a general-purpose personal computer that can perform various functions when various programs are installed.

The program recording medium that stores the program to be installed on the computer and made executable by it consists of removable media, which are packaged media such as a magnetic disk (including a flexible disk), an optical disc (including a CD-ROM (Compact Disc-Read Only Memory) and a DVD (Digital Versatile Disc)), a magneto-optical disk, or a semiconductor memory, or of a ROM, hard disk, or the like in which the program is stored temporarily or permanently. The program is stored on the program recording medium, as necessary, through a communication unit that is an interface such as a router or modem, using a wired or wireless communication medium such as a local area network, the Internet, or digital satellite broadcasting.

In this specification, the steps describing the program include not only processes performed in time series in the order described, but also processes that are executed in parallel or individually without necessarily being processed in time series.

Embodiments of the present invention are not limited to the embodiment described above, and various modifications are possible without departing from the gist of the present invention.

FIG. 1 is a block diagram showing the configuration of an embodiment of the information processing apparatus of the present invention.
FIG. 2 is a flowchart explaining the learning process of the recurrent neural network.
FIG. 3 is a diagram explaining a topology preserving map.
FIG. 4 is a diagram explaining another topology preserving map.
FIG. 5 is a flowchart explaining the learning process of a topology preserving map.
FIG. 6 is a flowchart explaining the process of driving the robot.
FIG. 7 is a diagram explaining the extension of a vector to more dimensions.
FIG. 8 is a diagram showing the teaching data used in the experiment.
FIG. 9 is a diagram showing the predicted values obtained in the experiment.

Explanation of Reference Numerals

1 information processing apparatus, 2 robot, 11 recurrent neural network, 12, 13 conversion units, 21 teaching data acquisition unit, 22 conversion unit, 23 calculation unit, 31 input layer, 32 output layer, 41, 42, 51 neurons, 51A self-feedback loop, 52 neuron, 52A self-feedback loop, 53 to 55 calculation units, 56 context loop, 61, 62, 71, 72 topology preserving maps, 81 motor, 82 visual sensor, 101, 102 topology preserving maps

Claims

An information processing apparatus comprising:
a high-dimensional conversion unit that converts, group by group and by means of a topology preserving map, data that is a sensor signal acquired by sensing a predetermined target into higher-dimensional data having a unimodal probability distribution;
a recurrent neural network that computes predicted values from the high-dimensional data based on weighting coefficients, using a softmax function so that the predicted values sum to 1 within each group; and
another high-dimensional conversion unit that converts, group by group, teaching data acquired in correspondence with the sensor signal into higher-dimensional teaching data having a unimodal probability distribution,
wherein the recurrent neural network computes an error such that the value of the error increases as the value of the high-dimensional teaching data increases, and performs a learning process for time-series patterns based on the result of the computation.

The information processing apparatus according to claim 1, wherein the recurrent neural network computes the error based on the KL-divergence.

The information processing apparatus according to claim 2, wherein the recurrent neural network is a continuous-time recurrent neural network, and the neurons that compute the context have a plurality of values of a time constant, a larger value of which increases the influence of the past internal state.

The information processing apparatus according to claim 3, wherein the larger of the plurality of time constants is at least five times the smaller time constant.

The information processing apparatus according to claim 1, further comprising a low-dimensional conversion unit that converts the predicted values into the same number of dimensions as the data that is the sensor signal acquired by sensing the predetermined target.

An information processing method for an information processing apparatus comprising a high-dimensional conversion unit, a recurrent neural network, and another high-dimensional conversion unit, the method including the steps of:
the high-dimensional conversion unit converting, group by group and by means of a topology preserving map, data that is a sensor signal acquired by sensing a predetermined target into higher-dimensional data having a unimodal probability distribution;
the recurrent neural network computing predicted values from the high-dimensional data based on weighting coefficients, using a softmax function so that the predicted values sum to 1 within each group;
the other high-dimensional conversion unit converting, group by group, teaching data acquired in correspondence with the sensor signal into higher-dimensional teaching data having a unimodal probability distribution; and
the recurrent neural network computing an error such that the value of the error increases as the value of the high-dimensional teaching data increases, and performing a learning process for time-series patterns based on the result of the computation.

A program for causing a computer to execute processing comprising:
converting data, which is a sensor signal acquired by sensing a predetermined target, into higher-dimensional data having a unimodal probability distribution for each group by means of a topology preserving map;
calculating a predicted value from the high-dimensional data based on weighting coefficients, using a softmax function so that the values sum to 1 within each group;
converting teaching data acquired in correspondence with the sensor signal into higher-dimensional teaching data having a unimodal probability distribution for each group; and
calculating an error such that the error value increases as the value of the high-dimensional teaching data increases, and performing time-series pattern learning processing based on the calculation result.