JP2004054638A

JP2004054638A - Cross modal learning device and method for recognition processing

Info

Publication number: JP2004054638A
Application number: JP2002211759A
Authority: JP
Inventors: Takamasa Echizen; 越膳　孝方; Sou Yamada; 山田　想; Koji Tsujino; 辻野　広司
Original assignee: Honda Motor Co Ltd
Current assignee: Honda Motor Co Ltd
Priority date: 2002-07-19
Filing date: 2002-07-19
Publication date: 2004-02-19

Abstract

PROBLEM TO BE SOLVED: To achieve learning processing that responds flexibly to environmental changes by integrating information from a plurality of sensors, without excessively increasing the amount of computation or the storage capacity.

SOLUTION: A rewiring circuit 24 integrates information from a plurality of sensors 20, which measure external information, into a shape supra-modality and a motion supra-modality. An attentive reinforcement learning unit 26 learns the parameters of the shape supra-modality and the motion supra-modality using the expectation-maximization (EM) algorithm. A combined storage map 34 calculates the coupling relationship between the shape supra-modality and the motion supra-modality using a weight vector and determines an attention class related to a predictive action. An action control unit 36 outputs an action corresponding to the determined attention class. Because the cross-modal learning device 10 optimizes its parameters in a self-supervised manner without using teacher data, it adapts flexibly to environmental changes.

COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、種類の異なる複数のセンサ情報を統合処理し、適切な行動を選択するためのクロスモーダル学習装置及び認識処理方法に関する。
【０００２】
【従来の技術】
認識処理や計測などの分野では、認識や計測の精度を向上させ信頼度を上げるための手段として、複数のセンサを利用したセンサ統合技術が用いられる。しかし、周囲の環境に応じてセンサの検出能力や精度にはばらつきがあるため、周囲環境に適さないセンサを用いると十分な認識処理が行えなくなる。そこで、種類の異なる複数のセンサを用いて認識処理を行うシステムが提案されている。
【０００３】
そのようなシステムの例として、特開２００２−３２７５４号公報においては、種類の異なる複数のセンサで各々検出された検出データに関する情報に対する重みを周囲環境の変化に応じて適切に変化させることにより、環境変化に対して柔軟に適合可能とした認識処理装置が提供されている。この装置では、予め想定される周囲環境に対する適切な重みを周囲環境情報と共に記憶手段に記憶しておく。そして、実動作においては、検出領域の周囲環境に関する情報を入力し、周囲環境を参照データとして記憶手段から重みを引き出して用いる手法を取る。従って、周囲環境が記憶手段に記憶されていないような想定外の状況になった場合には、適切な重み設定を行うことが極めて困難であるという問題がある。
【０００４】
この発明のように、予め与えられるデータすなわち教師データに基づいて学習やパラメータ最適化を行う場合には、環境の変化に対する柔軟な対応が一般に困難であることが知られている。
【０００５】
教師データを用いず、外部情報のみに基づいて対象を認識するようにすれば環境の変化に柔軟に対応する装置を実現することができる。このようなシステムの例が、特開平８−３０５８５３号公報に開示されている。記号推論システムのように問題解決に必要な情報を全て記号表現として与えるようなシステムでは、外界情報から記号への変換と意味の付与を全て人間が行う必要があるため、解決可能な問題が極めて限定されてしまう。この問題を解決するため、上記発明による意思決定装置は、センサ情報を処理することによって外界情報を内部データ表現に変換するシステムを構築している。すなわち、意思決定装置は、各種のセンサ情報から形状、動き、色、テクスチャ等の属性に対応する情報を抽出しそれらをシステム内部で照合可能な内部データ表現に変換する機構を有する。そして、その内部データ表現と記憶蓄積部に記憶されたデータとの照合を条件付き確率に基づいて行うことにより、認識対象物体の認識を行う。これにより、従来の記号表現とは異なる柔軟な推論を爆発的な計算量の増大なしに行うことが可能となる。
【０００６】
しかしながら、この発明では、センサ情報として画像信号を用いる場合のみが示されており、他の複数種類のセンサ情報を同時に入力する場合の処理方法に関しては述べられていない。また、認識対象と行動計画の対応表は前もって与えられており、これを自ら獲得するための学習手段については述べられていない。
【０００７】
また、本願発明者らによるＴ．　Ｋｏｓｈｉｚｅｎ，　Ｋ．　Ａｋａｔｓｕｋａ　ａｎｄ　Ｈ．　Ｔｓｕｊｉｎｏ，　“Ａ　Ｃｏｍｐｕｔａｔｉｏｎａｌ　Ｍｏｄｅｌ　ｏｆ　Ａｔｔｅｎｔｉｖｅ　Ｖｉｓｕａｌ　Ｓｙｓｔｅｍ　Ｉｎｄｕｃｅｄ　ｂｙ　Ｃｏｒｔｉｃａｌ　ＮｅｕｒａｌＮｅｔｗｏｒｋｓ”，　Ｎｅｕｒｏｃｏｍｐｕｔｉｎｇ，　Ｖｏｌ．　４４−４６Ｃ，　ｐｐ．　８７９−８８５　（Ｊｕｎ．　２００２）は、センサの取得した画像を複数の局所領域に分割し、各局所領域毎に特徴を抽出し、抽出した特徴を画像全体で融合させてモダリティ情報とし、この情報に基づいて行動推定のための注意のクラスを決定する画像処理装置を開示している。この発明も、使用するセンサ情報は画像のみであり、それ以外のセンサ情報をも統合する処理方法は述べられていない。
【０００８】
【発明が解決しようとする課題】
本発明は上記の点に鑑みてなされたものであり、複数のセンサ情報を統合し、環境の変化に柔軟に対応可能な学習処理を、計算処理量と記憶容量を過大に増大させることなく実現する装置及び方法を提供することを目的とする。
【０００９】
【課題を解決するための手段】
本発明によるクロスモーダル学習装置は、複数のセンサ情報をモダリティ（感覚情報処理の様式）情報に集約する再配線回路と、教師データ無しで内部パラメータの学習を行う注意的強化学習機構とを備えることを特徴とする。
【００１０】
本発明の一実施形態によると、クロスモーダル学習装置は、外界の情報を計測する複数のセンサと、各センサで捉えた情報を位置座標におけるセンサモダリティ情報と速度座標におけるセンサモダリティ情報に分離するモダリティ分離手段と、前記位置座標におけるセンサモダリティ情報と前記速度座標におけるセンサモダリティ情報をそれぞれ形状スプラモダリティと動きスプラモダリティに統合する再配線回路と、形状スプラモダリティと動きスプラモダリティのパラメータを学習する注意的強化学習部と、形状スプラモダリティと動きスプラモダリティに基づいて注意クラスを決定する結合記憶マップと、前記注意クラスに応じた行動を出力する行動制御部を備える。
【００１１】
この形態では、クロスモーダル学習装置は、複数のセンサ情報を形状という（スプラ）モダリティ情報と動きという（スプラ）モダリティ情報とに統合する処理を行う。このように異種センサによる情報を統合することによって、より正確な行動の選択が行える。これらスプラモダリティ情報は、確率密度分布と密接に関連している。
【００１２】
各センサで捉えた情報は、必要に応じてさらに複数のサブセンサデータに分離され、該サブセンサデータは位置座標におけるセンサモダリティ情報と速度座標におけるセンサモダリティ情報に分離される。
【００１３】
注意的強化学習部は、位置座標におけるセンサモダリティ情報と速度座標におけるセンサモダリティ情報に基づいて前記行動の事後確率を計算する強化学習部をさらに含む。強化学習部は、事後確率を使用して、期待値最大化アルゴリズムによりスプラモダリティ情報に関するパラメータを更新する。
【００１４】
注意的強化学習部は、事後確率を評価して、評価結果に応じた報酬を出力する行動評価部をさらに含む。また、報酬値を使用してコスト関数を計算し、該コスト関数と所定の閾値との比較結果に応じて、重みベクトルの変更を指示する注意転調を行うか否かを判断する注意要求／転調部をも含む。そして、強化学習部は注意転調に応じて重みベクトルを再計算する。結合記憶マップは、重みベクトルを使用して前記形状スプラモダリティと前記動きスプラモダリティの結合関係を計算し、予測的行動に関わる注意クラスを決定する。
【００１５】
このような構成によって、クロスモーダル学習装置は教師データに基づかずに自己教師的にパラメータ最適化を行うので、環境の変化に対して柔軟に適合可能となる。さらに、本発明の一実施形態によれば、スプラモダリティ情報と注意クラスとの関係を記述するマップのみを準備すれば足り、このマップのサイズはセンサ数には依存しないので、自己教師型の学習処理を少ない計算処理量と少ない記憶容量で実現することができる。
【００１６】
【発明の実施の形態】
＜本発明の概念的説明＞
初めに、図１を参照して本発明による認識処理の概念を説明する。
【００１７】
本発明では、ＣＣＤカメラやＸ線センサのように画像情報を捉える視覚センサ、マイクロフォン等の聴覚センサ、触圧を感知する触覚センサ等、モダリティ情報を計測可能な任意のセンサを用いて外界情報を計測する。これらのセンサにより計測されたデータは、必要に応じてサブセンサ情報に分離される。得られたサブセンサデータから、センサの種類に応じた適切な処理により、まず位置情報が抽出される。
【００１８】
抽出された位置情報は、二次元位置座標にｓ_−１，ｋ（Ｘ，Ｙ）として展開される。図１中の四角形６が認識対象の平面であり、四角形６の中の小丸が位置座標成分を表す。二次元位置座標は、例えば一方が視覚センサでもう一方が聴覚センサである場合のように、各サブセンサ毎に異なることもあるが、一定の対応関係を持っており所定の計算で同じ座標に変換することができる。
【００１９】
展開された位置座標成分ｓ_−１，ｋ（Ｘ，Ｙ）は、それぞれ時間差分を計算され、二次元速度座標上にｓ_＋１，ｋ（Ｖ_Ｘ，Ｖ_Ｙ）として展開される。位置座標と同様に、図中の四角形８が認識対象の平面であり、四角形８の中の小丸が速度座標成分を表す。
【００２０】
本明細書では、ｓ_−１，ｋ（Ｘ，Ｙ）を「位置センサモダリティ」、ｓ_＋１，ｋ（Ｖ_Ｘ，Ｖ_Ｙ）を「速度センサモダリティ」と呼ぶ。２つを併せて「センサモダリティ」と呼ぶ場合もある。
【００２１】
続いて、期待値最大化アルゴリズム（ＥＭアルゴリズム）を用いた学習によって獲得されるパラメータの集合θを用いて、別々に得られる複数のセンサモダリティを、それぞれ二次元位置座標及び二次元速度座標上の各点において、スプラモダリティ情報ρ_−１（ｓ_−１（Ｘ，Ｙ））とρ_＋１（ｓ_＋１（Ｖ_ｘ，Ｖ_ｙ））に統合する。前者は、二次元位置座標に基づいて計算されていることから、認識対象の形状様式を表現していると捉えることができ、従ってこれを「形状スプラモダリティ」と呼ぶ。後者は、二次元速度座標に基づいて計算されていることから、認識対象の動き様式を表現していると捉えることができ、従ってこれを「動きスプラモダリティ」と呼ぶ。ρ_−１（ｓ_−１）とρ_＋１（ｓ_＋１）は、集合的にスプラモダリティ情報ρ_ｉ（ｓ_ｉ）（ｉ＝±１、ここで、ｉ＝−１は形状を表し、ｉ＝＋１は動きを表す）と記載する場合もある。
【００２２】
続いて、スプラモダリティ情報ρ_−１（ｓ_−１）及びρ_＋１（ｓ_＋１）と、スプラモダリティ情報の結合関係を表す重みベクトルｗ_ｉを用いて、予測的行動情報に関わる注意クラスΩが決定される。ここで、予測的行動情報とは、後の処理においてどのような行動を出力させるかという指針となる情報のことである。重みベクトルｗ_ｉはＡｌｌｅｎ−Ｃａｈｎアルゴリズムを用いて最適化される。そして、決定された注意クラスΩに基づいた行動が外界に出力されることになる。
【００２３】
本願発明者らによる上述の文献における処理は、図１において一点鎖線で囲んだ部分に相当すると考えることができる。この発明においては、センサ情報として視覚センサのみが用いられている。そして、センサから得られる情報に基づいて抽出された動きに関わる局所的な情報がＥＭアルゴリズムにより計算され、さらにこの局所的な情報から注意クラスが決定されている。
【００２４】
これに対し、本発明によるクロスモーダル学習装置では、複数のセンサを統合的に用いることが可能である。そして、これらセンサ情報に基づいて形状と動きの両者に関わるスプラモダリティ情報を計算し、スプラモダリティ情報に基づいて注意クラスが決定される。すなわち、本発明によるクロスモーダル学習装置の特徴の１つは、複数のセンサ情報を形状スプラモダリティと動きスプラモダリティに集約する処理を行う点にある。このような処理を行うことによって、センサ数の増加に伴う計算量の増大が抑制される。
【００２５】
＜構成要素の説明＞
次に、本発明の一実施形態であるクロスモーダル学習装置１０について、図２を参照して説明する。クロスモーダル学習装置１０は、Ｚ個のセンサ２０、Ｍ個のサブセンサ２２、再配線回路２４、注意的強化学習部２６、結合記憶マップ３４、行動制御部３６の各機能ブロックにより構成される。このうち、図１のセンサモダリティはサブセンサ２２に対応し、形状スプラモダリティと動きスプラモダリティは再配線回路２４で計算される。また、図１中のＥＭアルゴリズムによるパラメータθの学習と重みベクトルｗ_ｉの設定は、注意的強化学習部２６で行われる。クロスモーダル学習装置１０は、センサ２０を除きコンピュータにより実現することができ、各機能ブロックはソフトウェアでもハードウェアでも構成することができる。
【００２６】
ここで、予め符号の説明をしておく。時間ステップｊとは、センサ２０が計測した外界情報を出力するタイミングであり、これは注意的強化学習部２６における学習の時間ｔとは無関係に進行する。
【００２７】
また、ある変数Ａについて「Ａ_{ｉ，ｋ，ｊ}」のように表記した場合、Ａは、ｋ番目（１≦ｋ≦Ｍ）のサブセンサ２２が時間ステップｊにおいて出力したスプラモダリティｉに関連する情報であることを表している。ここで、上述したようにｉ＝±１はスプラモダリティ情報の種類である。また、「Ａ_ｉ，ｋ」のようにインデックスｊを付けずに表記した場合は、Ａは、ｋ番目のサブセンサ２２が直近の時間ステップにおいて出力したスプラモダリティｉに関連する情報であることを示す。
【００２８】
以下、クロスモーダル学習装置１０の各ブロックの機能を順に説明する。
【００２９】
Ｚ個のセンサ２０は、それぞれ外界の情報を計測する。各センサ２０は、ＣＣＤカメラやＸ線センサのように画像情報を捉える視覚センサ、マイクロフォン等の聴覚センサ、触圧を感知する触覚センサ等、モダリティ情報を計測可能な任意のセンサとすることができる。外界の特性に応じて、若しくは行動を出力する対象に応じて、ＣＣＤカメラと赤外線カメラ、Ｘ線センサといったように異なる作用で視覚情報を得るセンサを組み合わせることも、あるいは視覚センサと聴覚センサ、触覚センサといったように異なるモダリティ情報を得るセンサを組み合わせることも可能である。
【００３０】
計測されたデータは、必要に応じてサブセンサデータに分離される。例えば、センサ２０がＣＣＤカメラである場合は、赤（Ｒ）、緑（Ｇ）、青（Ｂ）それぞれの出力データを３つのサブセンサデータとする。また、センサ２０がマイクロフォンである場合は、計測した音声データを適当な数の周波数帯域に分離し、各周波数帯域における信号をそれぞれサブセンサデータとする。このようにセンサデータをサブセンサデータに分離するのは、１つの測定データから詳細な情報を得るためである。また、１つのセンサデータから分離するサブセンサデータの数は任意である。それぞれのサブセンサデータを得る機構がサブセンサ２２である。サブセンサ２２を用いずに、センサデータを直接以下で述べるセンサモダリティ情報としても良い。
【００３１】
各サブセンサデータからは、位置情報ｓ_ｉ，ｋ（ｉ＝±１、１≦ｋ≦Ｍ）が抽出される。例えば、センサ２０がＣＣＤカメラである場合のように、センサの計測する情報の中に既に位置に関する情報が含まれている場合は、そのまま位置情報とする（つまり、画像強度がそのまま位置情報ｓ_ｉ，ｋとなる）。マイクロフォンで計測された音響信号データのように、センサの計測する情報の中に位置に関する情報が明確には含まれていない場合は、マイクロフォンアレーのような音源方向推定手法を適用して位置情報を抽出する。二次元位置座標は各サブセンサ毎に異なるが、所定の対応関係付けがされている。
【００３２】
抽出された位置情報は、二次元位置座標上に位置センサモダリティｓ_−１，ｋ（Ｘ，Ｙ）として展開される。速度情報は位置センサモダリティｓ_−１，ｋ（Ｘ，Ｙ）の時間差分として求められ、二次元速度座標上に速度センサモダリティｓ_＋１，ｋ（Ｖ_Ｘ，Ｖ_Ｙ）として展開される。
【００３３】
サブセンサデータが画像情報である場合に速度情報を抽出する方法の一例は、上述の文献に詳細に記載されている。
【００３４】
得られた位置センサモダリティｓ_−１，ｋ（Ｘ，Ｙ）と速度センサモダリティｓ_＋１，ｋ（Ｖ_Ｘ，Ｖ_Ｙ）は、再配線回路２４及び強化学習部２８へ出力される。
【００３５】
再配線回路２４は、Ｍ個のサブセンサ２２によって分離された全ての位置センサモダリティと速度センサモダリティをそれぞれ受け取り、２つのスプラモダリティ情報ρ_−１（ｓ_−１）及びρ_＋１（ｓ_＋１）に統合する。スプラモダリティ情報は、次式のように表される。
【００３６】
【数５】

ここで、ｓ_ｉはｓ_ｉ，ｋ（ｉ＝±１、１≦ｋ≦Ｍ）の集合である。パラメータα_ｉ，ｋ、μ_ｉ，ｋ、σ_ｉ，ｋはそれぞれｋ番目のセンサ情報ｓ_ｉ，ｋの混合比率、平均、分散であり、まとめてθ_ｉ，ｋで表す。また、｜ｄｓ_ｉ，ｋ｜はｋ番目のサブセンサの分解能を表し、ｓ_ｉ，ｋと同じ物理次元（例えば、輝度、周波数、温度等）を有する。さらに、パラメータα_ｉ，ｋは無次元数であり、ｉ＝±１それぞれに対し０≦α_ｉ，ｋ≦１かつΣα_ｉ，ｋ＝１を満たす。後述するように、このα_ｉ，ｋの割合を変えることで、各センサモダリティのスプラモダリティ情報に対する寄与度を変化させ、外界の環境の変化に適合することができる。
【００３７】
パラメータθ_ｉ，ｋは、時刻ｔ＝０において適当な値に初期化され、ｔ＞０においては強化学習部２６により計算される。
【００３８】
これらのパラメータも、ｉ＝−１に対しては位置座標において、ｉ＝＋１に対しては速度座標において、それぞれ分布を持つ。同様に、スプラモダリティ情報ρ_ｉ（ｓ_ｉ）は二次元座標上に分布する。計算されたスプラモダリティ情報ρ_ｉ（ｓ_ｉ）は、強化学習部２８及び結合記憶マップ３４へ出力される。
【００３９】
このように、再配線回路２４によりスプラモダリティ情報を統合することによって、学習すべきパラメータはセンサの数に関わらずスプラモダリティ情報に対応するｉ＝±１の２組だけになり、扱うべき計算量の増大を抑制できる。尚、本実施形態ではセンサ情報を２つのスプラモダリティに統合しているが、スプラモダリティを３つ以上としても良い。
【００４０】
別法では、スプラモダリティ情報をより簡便な次式で計算しても良い。
【００４１】
【数６】

ここで、ｙ_ｉはパラメータ（関数）ｓ_ｉに応じて「−１」または「＋１」に決められる。この場合もα_ｉ，ｋは無次元数であり、０≦α_ｉ，ｋ≦１かつΣα_ｉ，ｋ＝１を満たす。この式を用いた場合、計算量はさらに削減される。
【００４２】
注意的強化学習部２６は、ＥＭアルゴリズムを用いた学習によるθの更新と再配線回路２４への出力を行う強化学習部２８、事後確率に対する報酬値を計算する行動評価部３０、及びコスト関数を計算し、重みベクトルｗ_ｉを最適化する注意要求／転調部３２により構成される。
【００４３】
強化学習部２８は、サブセンサ２２から最近のセンサモダリティ情報を受け取り、各ｉ、ｋに対し事後確率Ｐ_ｉｋ ^ｐｏｓｔを計算し、行動評価部３０へ出力する。さらに、事後確率を使用して、ＥＭアルゴリズムによりパラメータθ_ｉ，ｋを更新し、その結果を再配線回路２４へ出力する。これらは位置座標及び速度座標の各点に対して行われる。ここで、事後確率Ｐ_ｉｋ ^ｐｏｓｔは、サブセンサｋからのセンサモダリティの各スプラモダリティに対する寄与の割合である。
【００４４】
つまり、強化学習部２８は、前回の計算により決定された行動の影響によって外界から計算されるセンサモダリティ情報が変化していることから、その情報を利用してスプラモダリティ情報の構築に必要なパラメータを自己教師的に学習しようとする。
【００４５】
各時間ステップｊにおける事後確率Ｐ_ｉｋ ^ｐｏｓｔは、スプラモダリティを式（１）で計算した場合は、次式で求められる。
【００４６】
【数７】

【００４７】
スプラモダリティを式（２）で計算した場合は、次式で求められる。
【００４８】
【数８】

【００４９】
求めた事後確率Ｐ_ｉｋ ^ｐｏｓｔを用いて、強化学習部２８は、次式により新たなパラメータθ_ｉ，ｋ＝（α_ｉ，ｋ，μ_ｉ，ｋ，σ_ｉ，ｋ）を計算する。
【００５０】
【数９】

ここで、Ｑは学習に用いるｋ番目のセンサ情報の数、すなわちセンサ出力の時間ステップ数である。この数Ｑはセンサ間で異なっていても良い。
【００５１】
σ_ｉ，ｋについては、次式で計算することも可能である。
【００５２】
【数１０】

ここで、ηはパラメータであり、［０，１］の範囲の値に設定される。
【００５３】
新たなパラメータθ_ｉ，ｋは再配線回路２４へ出力され、次の時間ステップで再配線回路２４におけるモダリティ情報の計算に用いられる。
【００５４】
強化学習部２８は、注意要求／転調部３２から報酬値εが入力されるとき、Ａｌｌｅｎ−Ｃａｈｎアルゴリズムにより重みベクトルｗを最適化する役割も有するが、これについては後述する。
【００５５】
行動評価部３０は、強化学習部２８から受け取った事後確率Ｐ_ｉｋ ^ｐｏｓｔに基づいて、前回選択された行動によって生じた外界の変化が適当であったか否かを評価する。
【００５６】
具体的には、行動評価部３０はまず次式により報酬値１／εの逆数εを計算する。
【００５７】
【数１１】

【００５８】
上式によると、二次元位置座標で積分した事後確率と二次元速度座標で積分した事後確率とが近い場合に、εは０に近づく。つまり、報酬値１／εは高く与えられることになる。
【００５９】
一例として、図２のセンサとして視覚センサ、聴覚センサ、触覚センサの３種類のセンサを使用しており、各センサについての事後確率Ｐ_ｉｋ ^ｐｏｓｔの積分値が図３のように分布していると仮定する。この場合、各センサに対する事後確率の積分値がｉ＝−１（位置座標）とｉ＝＋１（速度座標）とで大きく異なるため、εが大きくなり、従って報酬値１／εは低くなる。これに対し、同様のセンサの組合せに対し図４のような分布が得られると、ｉ＝−１とｉ＝＋１とで事後確率の積分値の分布が相似しているため、εが小さくなり高い報酬値１／εが得られる。計算した報酬値１／εは、注意要求／転調部３２へ出力される。式（７）のようにεの計算式を与えることで、ｉ＝−１とｉ＝＋１とで積分値の分布を相似させる方向に重みベクトルｗ_ｉが最適化される。
【００６０】
注意要求／転調部３２は、行動評価部３０の計算した報酬値に基づいて、重みベクトルｗ_ｉを更新するよう強化学習部２８に注意要求をするべきか否かを決定する。ここで、用語「注意要求」とは、複数のスプラモダリティ情報を取り扱う際に付加する重みなどの内部パラメータの変更が必要であると判断することを意味し、「注意転調」とはこの要求を出力することによりパラメータの変更を実行させることを意味する。
【００６１】
具体的には、注意要求／転調部３２は、まず次式によりコスト関数Ψの計算を行う。
【００６２】
【数１２】

Φ（ｗ_ｉ）は二重井戸型ポテンシャルであり、例えば次式で表される形状を持つ。
【００６３】
【数１３】

ｃは適切に設定されるパラメータであり、ｃ＝１とした場合のΦ（ｗ_ｉ）の形状を図５に示す。
【００６４】
尚、上記の式（８）において用いた積分の標識
【数１４】

は、Ａなる量を認識対象の位置座標全体において積分した結果と速度座標全体において積分した結果とを加算することを意味している。すなわち、次式の関係を満たす。
【００６５】
【数１５】

【００６６】
式（８）のコスト関数Ψの計算式における右辺第一項は、二重井戸型ポテンシャルΦ（ｗ_ｉ）に基づくエネルギーを低くすることを目的とする項であり、第二項は学習を進めて行くときにｗの変化を滑らかにし収束性を向上させることを目的とする項である。
【００６７】
注意要求／転調部３２は、コスト関数Ψの計算結果を所定の閾値と比較する。そして、コスト関数Ψが閾値より大きければ、注意を要求すべきと判断し、強化学習部２８に対して注意要求を行い、εを出力する。コスト関数Ψが閾値より小さければ、注意要求を行わない。
【００６８】
注意要求されると、上述の強化学習部２８は次式に従って新しい重みベクトルｗ_ｉを計算する。ｗ_ｉの計算も位置座標及び速度座標の各点において行われる。
【００６９】
【数１６】

【００７０】
この関数の形状を図７に示す。式（１２）で計算される重みベクトルｗ_ｉを用いると、前回のｗ_ｉを用いる場合と比較して報酬値１／εが大きくなる（すなわちεが小さくなる）ことがＡｌｌｅｎ−Ｃａｈｎアルゴリズムにより保証されている。Ａｌｌｅｎ−Ｃａｈｎアルゴリズムの代わりに、サポートベクターマシンやニューラルネットワークを用いても良い。
【００７１】
計算された重みベクトルｗ_ｉは結合記憶マップ３４へ出力される。
【００７２】
結合記憶マップ３４は、スプラモダリティ情報と予測的行動情報との関係を記憶しており、強化学習部２８から新たな重みベクトルｗ_ｉを受け取ると、その関係を書き換える。そして、再配線回路２４から受け取るスプラモダリティ情報を用いて、予測的行動情報に関わる注意クラスΩを次式により決定する。
【００７３】
【数１７】

【００７４】
決定された注意クラスΩは、行動制御部３６に送られる。
【００７５】
行動制御部３６は、注意クラスΩを受け取り、対応する行動出力Ｏに変換して外界へ出力する。注意クラスΩと行動出力Ｏの対応関係は、事前に教師付き学習により獲得しておくか、または人間が予め適切な出力を想定して入力しておく。あるいは、より高次の学習機能により対応関係を自己獲得するようにしても良い。
【００７６】
＜認識処理のプロセス＞
以上説明した各機能ブロックを有するクロスモーダル学習装置は、外界の情報に対して異種のモダリティ情報の結合関係を自己学習的に更新していくことによって、外界の状態を認識し、外界に適応した行動を出力する。このときの各機能ブロック間の連係を図６のフローチャートを参照して説明する。
【００７７】
初期状態と開始時について説明すると、初めに位置座標及び速度座標の各点におけるパラメータθ_ｉ，ｋ＝（α_ｉ，ｋ、μ_ｉ，ｋ、σ_ｉ，ｋ）及び重みベクトルｗ_ｉの初期値を設定する。一例として、α_ｉ，ｋ＝１／Ｍ、μ_ｉ，ｋ＝０、σ_ｉ，ｋ＝１とし、またｗ_−１は区間［−１，０］において発生させた乱数、ｗ_＋１は区間［０，１］において発生させた乱数を初期値とする。乱数は、例えばＣ言語における疑似乱数発生関数ｒａｎｄ（）を用いて生成することが可能である。
【００７８】
計算を開始し、センサ２０は外界情報を計測し、サブセンサ２２は時刻ｔ＝０のセンサモダリティｓ_ｉ，ｋを取得する。再配線回路２４は、センサモダリティｓ_ｉ，ｋとθ_ｉ，ｋの初期値を用いて、スプラモダリティρ_ｉ（ｓ_ｉ）を計算する。時刻ｔ＝０においては、まだ強化学習部２８で学習すべき対象が存在しないので、スプラモダリティρ_ｉ（ｓ_ｉ）は結合記憶マップ３４にのみ出力される。結合記憶マップ３４は、スプラモダリティρ_ｉ（ｓ_ｉ）に対して、重みベクトルｗ_ｉの初期値を用いて注意クラスΩを決定する。行動制御部３６は、注意クラスΩに対応する行動Ｏを外界に出力する。以降、センサ２０で捉えられる外界の情報には、前の時間ステップで外界に出力された行動Ｏの影響が外界を経由して反映されることになる。
【００７９】
次の時間ステップからは、以下に説明する処理が繰り返される。
【００８０】
センサ２０は、時間ステップｊで外界の情報を計測する（Ｓ４８）。サブセンサ２２はその情報を位置モダリティと速度モダリティに分離する（Ｓ５０）。分離されたセンサモダリティは、再配線回路２４と強化学習部２８へ出力される。
【００８１】
再配線回路２４は、前時刻に強化学習部２８において決定されたパラメータθ_ｉ，ｋを使用して、位置モダリティと速度モダリティを２つのスプラモダリティ情報ρ_−１（ｓ_−１）及びρ_＋１（ｓ_＋１）に統合する（Ｓ５２）。スプラモダリティ情報は、強化学習部２８と結合記憶マップ３４へ出力される。
【００８２】
このとき、強化学習部２８は、サブセンサ２２から受け取ったセンサモダリティを用いて、上記式（５）または式（６）により新たなパラメータθ_ｉ，ｋを計算する（Ｓ５４）。計算された新たなパラメータθ_ｉ，ｋは再配線回路２４へ送られ、次の時間ステップでの再配線回路２４におけるスプラモダリティ情報ρ_−１（ｓ_−１）及びρ_＋１（ｓ_＋１）の構築（式（１）または式（２））に使用されることになる。
【００８３】
強化学習部２８は、さらに現時点のセンサモダリティ情報を使用して、位置座標及び速度座標の各点における事後確率Ｐ_ｉｋ ^ｐｏｓｔを計算する（Ｓ５６）。事後確率Ｐ_ｉｋ ^ｐｏｓｔは行動評価部３０へ出力される。
【００８４】
行動評価部３０は、強化学習部２８で計算された事後確率の分布を使用して、式（７）により報酬値１／εの逆数εを計算する（Ｓ５８）。報酬値は注意要求／転調部３２へ送られる。
【００８５】
注意要求／転調部３２は、行動評価部から入力される報酬値の逆数εを用いて、式（８）に従ってコスト関数Ψを計算する（Ｓ６０）。ここで用いる重みベクトルｗ_ｉは前時刻の計算で得られた値である。注意要求／転調部３２は、コスト関数Ψを所定の閾値（例えば０．０１）と比較し（Ｓ６２）、Ψが閾値より大きければ、重みベクトルｗ_ｉの更新が必要であると判断（注意要求）し、εを強化学習部へ出力（注意転調）する（Ｓ６４）。コスト関数Ψが閾値より小さければ、重みベクトルｗ_ｉは適切に設定されていると判断し、ｗ_ｉの更新をせずにステップＳ６８へ進む。
【００８６】
注意要求／転調部３２が注意転調をした場合、強化学習部２８は新しい重みベクトルｗ_ｉを計算し、結合記憶マップを書き換える（Ｓ６６）。
【００８７】
結合記憶マップ３４は、再配線回路２４から受け取ったスプラモダリティ情報に基づいて、式（１３）に従って注意クラスΩを決定する（Ｓ６８）。重みベクトルｗ_ｉが更新されていた場合は、同一値のスプラモダリティ情報から計算される注意クラスΩが変化することになる。重みベクトルが更新されていなければ前回の値を用いる。決定した注意クラスΩは行動制御部３６へ出力される。
【００８８】
行動制御部３６は、注意クラスΩを行動Ｏに変換し、外界に出力する（Ｓ７０）。以上で、１つの時間ステップの計算が終了し、次の時間ステップで再びステップＳ４８からの処理を繰り返し行う。
【００８９】
クロスモーダル学習装置が多数の時間ステップの間学習を継続すると、上記の処理を全て実行しなくても、学習をすることができるようになる。以下ではこの場合について説明する。
【００９０】
図６のフローチャートでは、ステップＳ５４でＥＭアルゴリズムを用いた学習によりパラメータθ_ｉ，ｋ＝（α_ｉ，ｋ，μ_ｉ，ｋ，σ_ｉ，ｋ）の全てを更新する計算を行った。しかしながら、ある程度学習が進んだ段階においては、パラメータθ_ｉ，ｋのうちμ_ｉ，ｋ及びσ_ｉ，ｋの変化をゼロと見なすことができるようになり、従って各センサモダリティ情報の混合係数α_ｉ，ｋ及びスプラモダリティ情報の重みベクトルｗ_ｉのみを更新するだけで環境への適合が可能となる。
【００９１】
学習の進み具合の判断は、例えば以下の条件式を用いることにより行う。
【００９２】
【数１８】

ここで、α_{ｔｈｒｅｓ}は定数であり、例えば０．７のような数値に設定する。上記の式が成立する場合には、次回の時間ステップからはμ_ｉ，ｋ及びσ_ｉ，ｋの更新を行わず、強化学習部２８はα_ｉ，ｋのみを計算し、再配線回路２４へはα_ｉ，ｋのみが出力される。
【００９３】
本発明のクロスモーダル学習装置は、異種センサによる情報を統合してより正確な行動の選択が行える。選択した行動が不適切な場合には、上記のように事後確率の積分値の分布が各センサモダリティ間で相似しないため、報酬値１／εが小さくなる。そして、これに応答して重みベクトルｗ_ｉが更新され、スプラモダリティ情報ρの結合関係を変化させる。従って、それまでとは異なる注意クラスΩが選択されることになり、これによって行動Ｏも変化する。こうして、外部環境の状態に応じてパラメータが最適化される。このように、本発明では教師データに基づかずに自己教師的にパラメータ最適化が行われるので、環境の変化に対し柔軟に適合可能となる。また、自己運動に伴って生じる外界の変化を計算量の増大なく柔軟に効率良く認識できるようになる。
【００９４】
さらに、本発明によるクロスモーダル学習装置は、センサ数の増加に伴い指数関数的に増加する計算量を抑制する。例えば、センサをＭ個備える認識システムにおいて、各センサモダリティが位置座標上及び速度座標上でそれぞれＮ×Ｎ＝Ｎ^２個の点において抽出されると仮定する。中間的なモダリティ情報を経由せず、各センサ情報の組合せに対して直接的に行動情報をマッピングする従来の処理方法では、センサ情報と行動との関係を記述するマップをセンサ情報の組合せの各々に対して与える必要があるため、マップのサイズは（Ｎ^２）^２Ｍとなり、センサ数Ｍに指数関数的に依存して増大する。それに対し、図１に示す本発明の一実施形態によれば、スプラモダリティ情報ρ_ｉと注意クラスΩとの関係を記述するマップのみを与えれば良く、そのサイズはセンサ数Ｍには依存せず、常に（Ｎ^２）^２となる。また、従来の処理方法では、各センサ情報に対して与えたパラメータを学習により決定する場合、学習に要する計算量は（Ｎ^２）^２Ｍのオーダーとなるが、本発明の一実施形態によれば２ＭＮ^２のオーダーに収まる。従って、装置内部に必要となるマップのサイズと学習に必要となる計算量の両方が低減される。以上の内容を表１にまとめて示す。
【００９５】
【表１】

【００９６】
クロスモーダル学習装置は、単独の行動決定装置として使用できるだけでなく、具体的な応用形態として、自動車やヘリコプター、人間型ロボット等の運動体に搭載し、外界の情報に基づいて運動体のとるべき行動を決定するように使用することができる。
【００９７】
以上本発明のいくつかの実施形態を説明してきたが、本発明はこれに限定されるものではない。
【００９８】
【発明の効果】
本発明によれば、複数のセンサ情報を統合し、環境の変化に対し柔軟に適合可能な学習処理を少ない計算処理量と少ない記憶容量で実現することができる。
【図面の簡単な説明】
【図１】本発明による認識処理の概念を説明する図である。
【図２】本発明の一実施形態によるクロスモーダル学習装置のブロック図である。
【図３】報酬値が低い場合の事後確率の積分値の分布の一例を示すグラフである。
【図４】報酬値が高い場合の事後確率の積分値の分布の一例を示すグラフである。
【図５】二重井戸型ポテンシャルの形状の一例を示すグラフである。
【図６】図２のクロスモーダル学習装置による処理を説明するフローチャートである。
【図７】重みベクトルとスプラモダリティ情報の関係を示すグラフである。
【符号の説明】
２０　　　　センサ
２２　　　　サブセンサ
２４　　　　再配線回路
２６　　　　注意的強化学習部
２８　　　　強化学習部
３０　　　　行動評価部
３２　　　　注意要求／転調部
３４　　　　結合記憶マップ
３６　　　　行動制御部
[0001]
[Technical field of the invention]
The present invention relates to a cross-modal learning device and a recognition processing method for integrating a plurality of different types of sensor information and selecting an appropriate action.
[0002]
[Prior art]
In fields such as recognition processing and measurement, a sensor integration technology using a plurality of sensors is used as a means for improving the accuracy of recognition and measurement and increasing reliability. However, since the detection capability and accuracy of the sensor vary depending on the surrounding environment, if a sensor that is not suitable for the surrounding environment is used, sufficient recognition processing cannot be performed. Therefore, a system that performs recognition processing using a plurality of different types of sensors has been proposed.
[0003]
As an example of such a system, JP-A-2002-32754 provides a recognition processing device that adapts flexibly to environmental changes by appropriately changing, in response to changes in the surrounding environment, the weights applied to information on the detection data from a plurality of different types of sensors. In this device, appropriate weights for surrounding environments assumed in advance are stored in a storage means together with the surrounding environment information. In actual operation, information about the surrounding environment of the detection area is input, and the weights are retrieved from the storage means using the surrounding environment as reference data. Consequently, in an unexpected situation where the surrounding environment is not stored in the storage means, it is extremely difficult to set appropriate weights.
[0004]
It is generally known that when learning or parameter optimization is performed based on data given in advance, that is, teacher data, as in that invention, it is difficult to respond flexibly to changes in the environment.
[0005]
If a target is recognized based only on external information, without using teacher data, a device that responds flexibly to changes in the environment can be realized. An example of such a system is disclosed in Japanese Patent Application Laid-Open No. H8-305853. In systems that supply all the information needed for problem solving as symbolic expressions, such as symbolic inference systems, humans must perform all of the conversion from external information to symbols and all of the assignment of meaning, so the problems that can be solved are extremely limited. To solve this problem, the decision-making device of that invention constructs a system that converts external information into an internal data representation by processing sensor information. That is, the decision-making device has a mechanism that extracts information corresponding to attributes such as shape, motion, color, and texture from various types of sensor information and converts it into an internal data representation that can be collated within the system. The object to be recognized is then recognized by collating the internal data representation with data stored in a memory storage unit on the basis of conditional probabilities. This makes it possible to perform flexible inference, unlike conventional symbolic representation, without an explosive increase in the amount of computation.
[0006]
However, that invention shows only the case where an image signal is used as the sensor information, and does not describe how to process several other types of sensor information input at the same time. In addition, the correspondence table between recognition targets and action plans is given in advance, and no learning means for acquiring this table autonomously is described.
[0007]
In addition, T. Koshizen, K. Akatsuka and H. Tsujino, "A Computational Model of Attentive Visual System Induced by Cortical Neural Networks", Neurocomputing, Vol. 44-46C, pp. 879-885 (Jun. 2002), by the present inventors, discloses an image processing apparatus that divides an image acquired by a sensor into a plurality of local regions, extracts features from each local region, fuses the extracted features over the entire image into modality information, and determines an attention class for action estimation based on this information. This work also uses only images as sensor information and does not describe a processing method that integrates other sensor information as well.
[0008]
[Problems to be solved by the invention]
The present invention has been made in view of the above points, and its object is to provide an apparatus and a method that integrate a plurality of pieces of sensor information and realize learning processing capable of responding flexibly to environmental changes without excessively increasing the amount of computation or the storage capacity.
[0009]
[Means for Solving the Problems]
A cross-modal learning device according to the present invention is characterized by comprising a rewiring circuit that aggregates a plurality of pieces of sensor information into modality (mode of sensory information processing) information, and an attentive reinforcement learning mechanism that learns internal parameters without teacher data.
[0010]
According to one embodiment of the present invention, the cross-modal learning device comprises: a plurality of sensors that measure information about the external world; modality separation means that separates the information captured by each sensor into sensor modality information in position coordinates and sensor modality information in velocity coordinates; a rewiring circuit that integrates the sensor modality information in position coordinates and the sensor modality information in velocity coordinates into a shape supra-modality and a motion supra-modality, respectively; an attentive reinforcement learning unit that learns the parameters of the shape supra-modality and the motion supra-modality; a combined storage map that determines an attention class based on the shape supra-modality and the motion supra-modality; and an action control unit that outputs an action corresponding to the attention class.
[0011]
In this embodiment, the cross-modal learning device integrates a plurality of pieces of sensor information into (supra-)modality information called shape and (supra-)modality information called motion. Integrating information from different types of sensors in this way allows a more accurate action to be selected. These pieces of supra-modality information are closely related to probability density distributions.
[0012]
The information captured by each sensor is further separated into a plurality of sub-sensor data as necessary, and the sub-sensor data is separated into sensor modality information in position coordinates and sensor modality information in velocity coordinates.
[0013]
The attentive reinforcement learning unit further includes a reinforcement learning unit that calculates the posterior probability of the action based on the sensor modality information in position coordinates and the sensor modality information in velocity coordinates. The reinforcement learning unit uses the posterior probability to update the parameters of the supra-modality information by the expectation-maximization (EM) algorithm.
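To illustrate the kind of update described here, the following is a minimal sketch of one textbook EM iteration for a one-dimensional Gaussian mixture: posteriors are computed first, then the mixing ratios, means and variances are re-estimated from them. It is not the patent's own update rule, which is not reproduced in this text and may differ; the function names, the variance floor and the sample values are all assumptions made for illustration.

```python
import math

def gaussian(x, mean, variance):
    """Density of a 1-D normal distribution at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def em_step(samples, alphas, means, variances, min_variance=1e-6):
    """One textbook EM iteration for a 1-D Gaussian mixture (illustrative only)."""
    components = range(len(alphas))
    # E-step: posterior probability that each sample belongs to each component.
    posteriors = []
    for x in samples:
        weighted = [alphas[k] * gaussian(x, means[k], variances[k]) for k in components]
        total = sum(weighted)
        posteriors.append([w / total for w in weighted])
    # M-step: re-estimate mixing ratio, mean and variance from the posteriors.
    new_alphas, new_means, new_variances = [], [], []
    for k in components:
        weight = sum(p[k] for p in posteriors)
        mean = sum(p[k] * x for p, x in zip(posteriors, samples)) / weight
        variance = sum(p[k] * (x - mean) ** 2 for p, x in zip(posteriors, samples)) / weight
        new_alphas.append(weight / len(samples))
        new_means.append(mean)
        new_variances.append(max(variance, min_variance))  # assumed floor against collapse
    return new_alphas, new_means, new_variances

# Hypothetical data: two well-separated clusters of readings.
samples = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9]
params = ([0.5, 0.5], [1.0, 4.0], [1.0, 1.0])
for _ in range(20):
    params = em_step(samples, *params)
```

After a few iterations the two means settle near the centres of the two clusters and the mixing ratios continue to sum to 1.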
[0014]
The attentive reinforcement learning unit further includes an action evaluation unit that evaluates the posterior probability and outputs a reward according to the evaluation result. It also includes an attention request/modulation unit that calculates a cost function using the reward value and, according to the result of comparing the cost function with a predetermined threshold, decides whether to perform attention modulation, which instructs a change of the weight vector. The reinforcement learning unit then recalculates the weight vector in response to the attention modulation. The combined storage map uses the weight vector to calculate the coupling relationship between the shape supra-modality and the motion supra-modality, and determines an attention class related to a predictive action.
[0015]
With this configuration, the cross-modal learning device optimizes its parameters in a self-supervised manner, without relying on teacher data, and can therefore adapt flexibly to environmental changes. Furthermore, according to one embodiment of the present invention, only a map describing the relationship between the supra-modality information and the attention classes needs to be prepared, and the size of this map does not depend on the number of sensors, so self-supervised learning can be realized with a small amount of computation and a small storage capacity.
[0016]
[Embodiments of the invention]
<Conceptual description of the present invention>
First, the concept of the recognition processing according to the present invention will be described with reference to FIG. 1.
[0017]
In the present invention, external information is measured using any sensor capable of measuring modality information, such as a visual sensor that captures image information (for example, a CCD camera or an X-ray sensor), an auditory sensor such as a microphone, or a tactile sensor that senses contact pressure. The data measured by these sensors is separated into sub-sensor information as needed. Position information is first extracted from the resulting sub-sensor data by processing appropriate to the type of sensor.
[0018]
The extracted position information is expanded on two-dimensional position coordinates as s_{-1,k}(X, Y). Rectangle 6 in FIG. 1 is the plane to be recognized, and the small circles in rectangle 6 represent position coordinate components. The two-dimensional position coordinates may differ between sub-sensors, for example when one is a visual sensor and the other is an auditory sensor, but they have a fixed correspondence and can be converted to the same coordinates by a predetermined calculation.
[0019]
The time difference of each expanded position coordinate component s_{-1,k}(X, Y) is calculated, and the result is expanded on two-dimensional velocity coordinates as s_{+1,k}(V_X, V_Y). As with the position coordinates, rectangle 8 in the figure is the plane to be recognized, and the small circles in rectangle 8 represent velocity coordinate components.
[0020]
In this specification, s_{-1,k}(X, Y) is called the "position sensor modality" and s_{+1,k}(V_X, V_Y) is called the "velocity sensor modality". The two are sometimes referred to collectively as the "sensor modality".
[0021]
Next, using the set θ of parameters acquired by learning with the expectation-maximization algorithm (EM algorithm), the separately obtained sensor modalities are integrated, at each point on the two-dimensional position coordinates and the two-dimensional velocity coordinates respectively, into the supra-modality information ρ_{-1}(s_{-1}(X, Y)) and ρ_{+1}(s_{+1}(V_X, V_Y)). Because the former is calculated on the two-dimensional position coordinates, it can be regarded as expressing the shape of the recognition target, and is therefore called the "shape supra-modality". Because the latter is calculated on the two-dimensional velocity coordinates, it can be regarded as expressing the motion of the recognition target, and is therefore called the "motion supra-modality". ρ_{-1}(s_{-1}) and ρ_{+1}(s_{+1}) are sometimes written collectively as the supra-modality information ρ_i(s_i) (i = ±1, where i = -1 denotes shape and i = +1 denotes motion).
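As a rough illustration of this integration step, the sketch below evaluates a weighted Gaussian mixture over the readings of M sub-sensors at a single coordinate point. The Gaussian form is an assumption based on the parameters the description names for each sub-sensor (a mixing ratio α, a mean μ and a variance σ); the patent's own equation is not reproduced in this text, and the function and variable names are hypothetical.

```python
import math

def gaussian(x, mean, variance):
    """Density of a 1-D normal distribution at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def supra_modality(readings, alphas, means, variances):
    """Combine the readings s_{i,k} of M sub-sensors at one coordinate point.

    Assumed form (not taken from the patent): sum_k alpha_k * N(s_k; mean_k, variance_k),
    with mixing ratios alpha_k that are non-negative and sum to 1.
    """
    if abs(sum(alphas) - 1.0) > 1e-9 or any(a < 0.0 for a in alphas):
        raise ValueError("mixing ratios must be non-negative and sum to 1")
    return sum(a * gaussian(s, m, v)
               for s, a, m, v in zip(readings, alphas, means, variances))

# Three sub-sensors (for example R, G, B) with alpha = 1/M, mean = 0, variance = 1,
# the example initial values given in the description.
rho = supra_modality([0.0, 0.0, 0.0], [1 / 3] * 3, [0.0] * 3, [1.0] * 3)
```

Shifting weight between the α values changes how much each sub-sensor contributes to the combined value, which is the lever the description uses to adapt to a changing environment.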
[0022]
Next, the attention class Ω related to the predictive action information is determined using the supra-modality information ρ_{-1}(s_{-1}) and ρ_{+1}(s_{+1}) and the weight vector w_i, which represents the coupling relationship between the pieces of supra-modality information. Here, predictive action information is information that serves as a guideline for which action to output in subsequent processing. The weight vector w_i is optimized using the Allen-Cahn algorithm. An action based on the determined attention class Ω is then output to the external world.
[0023]
The processing in the above-mentioned paper by the present inventors can be regarded as corresponding to the portion enclosed by the dash-dot line in FIG. 1. In that work, only a visual sensor is used as the source of sensor information. Local information about motion, extracted from the information obtained from the sensor, is calculated by the EM algorithm, and the attention class is then determined from this local information.
[0024]
In contrast, the cross-modal learning device according to the present invention can use a plurality of sensors in an integrated manner. Supra-modality information relating to both shape and motion is calculated from the sensor information, and the attention class is determined from the supra-modality information. That is, one feature of the cross-modal learning device according to the present invention is that it aggregates a plurality of pieces of sensor information into a shape supra-modality and a motion supra-modality. This processing suppresses the growth in the amount of computation that accompanies an increase in the number of sensors.
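The saving can be made concrete with the sizes the description gives in paragraph [0094]: with M sensors each sampled at N×N points, a direct sensor-to-action map has (N^2)^(2M) entries, whereas the map from supra-modality information to attention classes has (N^2)^2 entries regardless of M, and the learning cost is on the order of (N^2)^(2M) versus 2MN^2. The short sketch below only evaluates these expressions; the function names and the values N = 10, M = 3 are illustrative choices.

```python
def conventional_map_size(n, m):
    """Entries in a direct sensor-to-action map: (N^2)^(2M)."""
    return (n ** 2) ** (2 * m)

def supra_modality_map_size(n):
    """Entries in the supra-modality-to-attention-class map: (N^2)^2, independent of M."""
    return (n ** 2) ** 2

def supra_modality_learning_cost(n, m):
    """Order of the learning cost with supra-modalities: 2 * M * N^2."""
    return 2 * m * n ** 2

n, m = 10, 3
print(conventional_map_size(n, m))         # 1000000000000
print(supra_modality_map_size(n))          # 10000
print(supra_modality_learning_cost(n, m))  # 600
```

Adding a fourth sensor multiplies the conventional map size by a further (N^2)^2, while the supra-modality map stays the same size.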
[0025]
<Description of components>
Next, a cross-modal learning device 10 according to an embodiment of the present invention will be described with reference to FIG. 2. The cross-modal learning device 10 is composed of the following functional blocks: Z sensors 20, M sub-sensors 22, a rewiring circuit 24, an attentive reinforcement learning unit 26, a combined storage map 34, and an action control unit 36. Of these, the sub-sensors 22 correspond to the sensor modalities in FIG. 1, and the shape supra-modality and the motion supra-modality are calculated by the rewiring circuit 24. The learning of the parameters θ by the EM algorithm and the setting of the weight vector w_i shown in FIG. 1 are performed by the attentive reinforcement learning unit 26. Except for the sensors 20, the cross-modal learning device 10 can be realized by a computer, and each functional block can be implemented in either software or hardware.
[0026]
Here, the notation is explained in advance. The time step j is the timing at which a sensor 20 outputs the external information it has measured, and it advances independently of the learning time t in the attentive reinforcement learning unit 26.
[0027]
When a variable A is written as "A_{i,k,j}", A is information related to supra-modality i that the k-th (1 ≤ k ≤ M) sub-sensor 22 output at time step j. Here, as described above, i = ±1 is the type of supra-modality information. When it is written as "A_{i,k}", without the index j, A is information related to supra-modality i that the k-th sub-sensor 22 output at the most recent time step.
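The indexing convention can be mirrored in a small container, sketched below under the assumption that values are simply stored per (i, k) pair and per time step. The class name ModalityLog and its methods are hypothetical and exist only to show how A_{i,k,j} and the shorthand A_{i,k} relate.

```python
from collections import defaultdict

SHAPE, MOTION = -1, +1  # the two supra-modality types, i = -1 and i = +1

class ModalityLog:
    """Stores values A_{i,k,j}: supra-modality type i, sub-sensor k, time step j."""

    def __init__(self, num_subsensors):
        self.num_subsensors = num_subsensors
        self._values = defaultdict(dict)  # (i, k) -> {j: value}

    def put(self, i, k, j, value):
        if i not in (SHAPE, MOTION):
            raise ValueError("i must be -1 (shape) or +1 (motion)")
        if not 1 <= k <= self.num_subsensors:
            raise ValueError("k must satisfy 1 <= k <= M")
        self._values[(i, k)][j] = value

    def get(self, i, k, j=None):
        """Return A_{i,k,j}; with j omitted, return A_{i,k} from the latest time step."""
        steps = self._values.get((i, k), {})
        if not steps:
            raise KeyError("no value stored for this (i, k)")
        if j is None:
            j = max(steps)
        return steps[j]

log = ModalityLog(num_subsensors=3)
log.put(SHAPE, 2, 0, 0.4)
log.put(SHAPE, 2, 1, 0.7)
latest = log.get(SHAPE, 2)      # A_{-1,2}: value from the most recent time step
earlier = log.get(SHAPE, 2, 0)  # A_{-1,2,0}
```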
[0028]
Hereinafter, the function of each block of the cross-modal learning device 10 will be described in order.
[0029]
Each of the Z sensors 20 measures information about the external world. Each sensor 20 can be any sensor capable of measuring modality information, such as a visual sensor that captures image information (for example, a CCD camera or an X-ray sensor), an auditory sensor such as a microphone, or a tactile sensor that senses contact pressure. Depending on the characteristics of the external world, or on the target to which actions are output, sensors that obtain visual information by different mechanisms, such as a CCD camera, an infrared camera, and an X-ray sensor, can be combined, or sensors that obtain different modality information, such as a visual sensor, an auditory sensor, and a tactile sensor, can be combined.
[0030]
The measured data is separated into sub-sensor data as needed. For example, when the sensor 20 is a CCD camera, the output data for red (R), green (G), and blue (B) are treated as three sets of sub-sensor data. When the sensor 20 is a microphone, the measured voice data is separated into an appropriate number of frequency bands, and the signal in each frequency band is used as sub-sensor data. Sensor data is separated into sub-sensor data in order to obtain detailed information from a single measurement. The number of sets of sub-sensor data separated from one set of sensor data is arbitrary. The mechanism that obtains each set of sub-sensor data is the sub-sensor 22. The sensor data may also be used directly as the sensor modality information described below, without using the sub-sensor 22.
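The separation of CCD camera output into R, G, and B sub-sensor data can be illustrated with a minimal sketch. The function name `split_rgb`, the nested-list frame layout, and the sample pixel values are assumptions made for illustration and do not appear in the specification.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B) intensities of one pixel


def split_rgb(frame: List[List[Pixel]]) -> Dict[str, List[List[int]]]:
    """Separate one RGB frame into three single-channel sub-sensor frames."""
    channel_index = {"R": 0, "G": 1, "B": 2}
    return {
        name: [[pixel[idx] for pixel in row] for row in frame]
        for name, idx in channel_index.items()
    }


# Hypothetical 2 x 2 frame standing in for the output of a CCD camera.
frame = [[(255, 0, 0), (0, 128, 0)],
         [(0, 0, 64), (10, 20, 30)]]
sub_sensor_data = split_rgb(frame)  # three sets of sub-sensor data
```

Each of the three returned frames keeps the original pixel layout, so the position coordinates of the sub-sensors correspond one to one in this simplified case.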
[0031]
From each set of sub-sensor data, position information s_{i, k} (i = ±1, 1 ≦ k ≦ M) is extracted. When the information measured by the sensor already includes position information, as when the sensor 20 is a CCD camera, it is used as is (that is, the image intensity directly becomes the position information s_{i, k}). When the measured information does not explicitly include position information, as with acoustic signal data measured by a microphone, the position information is extracted by applying a sound source direction estimation method, for example one using a microphone array. Although the two-dimensional position coordinates differ for each sub-sensor, they have a predetermined correspondence.
[0032]
The extracted position information is expanded on the two-dimensional position coordinates as the position sensor modality s_{-1, k}(X, Y). Speed information is obtained as the time difference of the position sensor modality s_{-1, k}(X, Y) and is represented on the two-dimensional velocity coordinates as the velocity sensor modality s_{+1, k}(V_X, V_Y).
[0033]
An example of a method for extracting speed information when the sub-sensor data is image information is described in detail in the above-mentioned document.
[0034]
The obtained position sensor modality s_{-1, k}(X, Y) and speed sensor modality s_{+1, k}(V_X, V_Y) are output to the rewiring circuit 24 and the reinforcement learning unit 28.
[0035]
The rewiring circuit 24 receives all the position sensor modalities and speed sensor modalities separated by the M sub-sensors 22 and integrates them into two pieces of supra-modality information, ρ_{-1}(s_{-1}) and ρ_{+1}(s_{+1}). The supra-modality information is represented by the following equation.
[0036]
(Equation 5)

Here, s_i is the set of s_{i, k} (i = ±1, 1 ≦ k ≦ M). The parameters α_{i, k}, μ_{i, k}, and σ_{i, k} are the mixing ratio, mean, and variance of the k-th sensor information s_{i, k}, and are collectively written θ_{i, k}. |ds_{i, k}| represents the resolution of the k-th sub-sensor and has the same physical dimensions as s_{i, k} (for example, brightness, frequency, or temperature). Furthermore, the parameter α_{i, k} is a dimensionless number that satisfies 0 ≦ α_{i, k} ≦ 1 and Σα_{i, k} = 1 for each i = ±1. As described later, changing the ratios α_{i, k} changes the degree to which each sensor modality contributes to the supra-modality information, which allows adaptation to changes in the external environment.
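The equation itself is not reproduced in this text, so the following is only a sketch of one form consistent with the definitions above: a mixture in which each sub-sensor contributes a Gaussian term weighted by its mixing ratio and scaled by its resolution. The Gaussian form, the treatment of σ as a standard deviation, and all function names are assumptions, not the patent's equation (1).

```python
import math
from typing import Sequence


def gaussian(x: float, mu: float, sigma: float) -> float:
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def supra_modality(s: Sequence[float], alpha: Sequence[float],
                   mu: Sequence[float], sigma: Sequence[float],
                   ds: Sequence[float]) -> float:
    """Assumed form: rho_i = sum_k alpha_k * N(s_k; mu_k, sigma_k) * |ds_k|.

    All sequences are indexed by sub-sensor k for one fixed i and one
    coordinate point.
    """
    if abs(sum(alpha) - 1.0) > 1e-9 or any(a < 0.0 or a > 1.0 for a in alpha):
        raise ValueError("mixing ratios must lie in [0, 1] and sum to 1")
    return sum(a * gaussian(x, m, sg) * abs(d)
               for x, a, m, sg, d in zip(s, alpha, mu, sigma, ds))


# Two hypothetical sub-sensors with equal mixing ratios.
rho = supra_modality(s=[0.0, 0.0], alpha=[0.5, 0.5],
                     mu=[0.0, 0.0], sigma=[1.0, 1.0], ds=[1.0, 1.0])
```

In such a form, raising one α_{i, k} at the expense of the others shifts the result toward that sub-sensor, which is the adaptation mechanism the paragraph describes.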
[0037]
The parameter θ_{i, k} is initialized to an appropriate value at time t = 0 and is calculated by the reinforcement learning unit 28 for t > 0.
[0038]
These parameters also have distributions over the position coordinates for i = −1 and over the velocity coordinates for i = +1. Similarly, the supra-modality information ρ_i(s_i) is distributed over two-dimensional coordinates. The calculated supra-modality information ρ_i(s_i) is output to the reinforcement learning unit 28 and the combined storage map 34.
[0039]
As described above, because the rewiring circuit 24 integrates the sensor information into supra-modality information, the parameters to be learned are only the two sets i = ±1 corresponding to the supra-modality information, regardless of the number of sensors, so the increase in the amount of calculation can be suppressed. In this embodiment, the sensor information is integrated into two supra-modalities; however, three or more supra-modalities may be used.
[0040]
Alternatively, the supra-modality information may be calculated by the following simpler equation.
[0041]
(Equation 6)

Here, y_i is a parameter (function) that takes the value "-1" or "+1" according to s_i. Again, α_{i, k} is a dimensionless number satisfying 0 ≦ α_{i, k} ≦ 1 and Σα_{i, k} = 1. Using this equation further reduces the amount of calculation.
[0042]
The attentional reinforcement learning unit 26 includes a reinforcement learning unit 28 that updates θ by learning with the EM algorithm and outputs the updated θ to the rewiring circuit 24, a behavior evaluation unit 30 that calculates a reward value from the posterior probability, and an attention request / modulation unit 32 that calculates a cost function and optimizes the weight vector w_i.
[0043]
The reinforcement learning unit 28 receives the latest sensor modality information from the sub-sensor 22, calculates the posterior probability P_ik^post for each i and k, and outputs it to the behavior evaluation unit 30. Using the posterior probabilities, it also updates the parameter θ_{i, k} and outputs the result to the rewiring circuit 24. These calculations are performed at each point of the position coordinates and the speed coordinates. Here, the posterior probability P_ik^post is the ratio of the contribution of the sensor modality from sub-sensor k to each supra-modality.
[0044]
That is, because the sensor modality information obtained from the outside world changes under the influence of the action determined by the previous calculation, the reinforcement learning unit 28 uses that information to learn, in a self-taught manner, the parameters needed to construct the supra-modality information.
[0045]
The posterior probability P_ik^post at each time step j is calculated by the following equation when the supra-modality is calculated by equation (1).
[0046]
(Equation 7)

[0047]
When the supra-modality is calculated by equation (2), it is obtained by the following equation.
[0048]
(Equation 8)

[0049]
Using the calculated posterior probability P_ik^post, the reinforcement learning unit 28 calculates the new parameter θ_{i, k} = (α_{i, k}, μ_{i, k}, σ_{i, k}) by the following equation.
[0050]
(Equation 9)

Here, Q is the number of k-th sensor information used for learning, that is, the number of time steps of sensor output. This number Q may be different between the sensors.
[0051]
σ_{i, k} can be calculated by the following equation.
[0052]
(Equation 10)

Here, η is a parameter and is set to a value in the range of [0, 1].
[0053]
The new parameter θ_{i, k} is output to the rewiring circuit 24 and is used there to calculate the supra-modality information in the next time step.
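Because the update equations are not reproduced in this text, the following sketch only illustrates the general shape of an EM-style update over Q time steps: an E-step that normalizes per-sub-sensor likelihoods into posteriors, and an M-step that recomputes the mixing ratio, mean, and spread from posterior-weighted moments. The Gaussian likelihood, the per-sub-sensor data layout `samples[j][k]`, the floor value, and all names are assumptions; the parameter η of the alternative σ update is not modeled.

```python
import math
from typing import List, Tuple


def gaussian(x: float, mu: float, sigma: float) -> float:
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_step(samples: List[List[float]], alpha: List[float], mu: List[float],
            sigma: List[float], floor: float = 1e-6
            ) -> Tuple[List[List[float]], List[float], List[float], List[float]]:
    """One EM-style update. samples[j][k] is sub-sensor k at time step j."""
    n_steps, n_sub = len(samples), len(alpha)
    # E-step: posterior share of each sub-sensor at each time step.
    post = []
    for row in samples:
        weighted = [alpha[k] * gaussian(row[k], mu[k], sigma[k])
                    for k in range(n_sub)]
        total = sum(weighted)
        if total == 0.0:
            raise ValueError("all likelihoods underflowed to zero")
        post.append([w / total for w in weighted])
    # M-step: posterior-weighted moments over the Q = n_steps samples.
    mass = [sum(p[k] for p in post) for k in range(n_sub)]
    new_alpha = [m / n_steps for m in mass]
    new_mu, new_sigma = [], []
    for k in range(n_sub):
        denom = max(mass[k], floor)
        m_k = sum(p[k] * row[k] for p, row in zip(post, samples)) / denom
        var_k = sum(p[k] * (row[k] - m_k) ** 2
                    for p, row in zip(post, samples)) / denom
        new_mu.append(m_k)
        new_sigma.append(max(math.sqrt(var_k), floor))
    return post, new_alpha, new_mu, new_sigma


# Hypothetical readings of two sub-sensors over Q = 3 time steps.
samples = [[0.0, 1.0], [0.2, 0.8], [-0.2, 1.2]]
post, alpha, mu, sigma = em_step(samples, [0.5, 0.5], [0.0, 0.0], [1.0, 1.0])
```

The updated mixing ratios still sum to 1, and the sub-sensor whose readings fit its current parameters better receives the larger share.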
[0054]
The reinforcement learning unit 28 also optimizes the weight vector w_i by the Allen-Cahn algorithm when the reciprocal ε of the reward value is input from the attention request / modulation unit 32, as described later.
[0055]
The behavior evaluation unit 30 uses the posterior probability P_ik^post received from the reinforcement learning unit 28 to evaluate whether the change in the external world caused by the previously selected action was appropriate.
[0056]
Specifically, the behavior evaluation unit 30 first calculates the reciprocal ε of the reward value 1 / ε by the following equation.
[0057]
[Equation 11]

[0058]
According to the above equation, ε approaches 0 when the posterior probability integrated over the two-dimensional position coordinates and the posterior probability integrated over the two-dimensional velocity coordinates are close to each other. That is, a higher reward value 1 / ε is given.
[0059]
As an example, suppose that three types of sensors (a visual sensor, an auditory sensor, and a tactile sensor) are used as the sensors in FIG. 2 and that the posterior probability P_ik^post for each sensor is distributed as shown in FIG. 3. In this case, the integrated value of the posterior probabilities for each sensor differs significantly between i = −1 (position coordinates) and i = +1 (velocity coordinates), so ε increases and the reward value 1 / ε decreases. On the other hand, when a distribution as shown in FIG. 4 is obtained for the same combination of sensors, the distribution of the integrated value of the posterior probability is similar for i = −1 and i = +1, so ε becomes small and a high reward value 1 / ε is obtained. The calculated reward value 1 / ε is output to the attention request / modulation unit 32. Because ε is given by Expression (7), the weight vector w_i is optimized in the direction in which the distribution of the integrated value becomes similar between i = −1 and i = +1.
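Expression (7) is not reproduced in this text. The sketch below uses one simple quantity with the behavior described above, namely the summed absolute difference between each sensor's posterior integrated over the position grid and over the velocity grid. This specific form, the unit cell area, and the function names are assumptions chosen only to show why similar distributions give a small ε.

```python
from typing import List

Grid = List[List[float]]  # posterior values sampled on a 2-D coordinate grid


def integrate(grid: Grid) -> float:
    """Sum a grid of posterior values (unit cell area assumed)."""
    return sum(sum(row) for row in grid)


def reward_reciprocal(post_position: List[Grid],
                      post_velocity: List[Grid]) -> float:
    """Assumed form: epsilon = sum_k |I_{-1,k} - I_{+1,k}|."""
    return sum(abs(integrate(p) - integrate(v))
               for p, v in zip(post_position, post_velocity))


# Hypothetical posteriors for two sensors on a 1 x 2 grid.
position = [[[0.5, 0.5]], [[0.1, 0.1]]]
similar_velocity = [[[0.5, 0.5]], [[0.1, 0.1]]]
dissimilar_velocity = [[[0.1, 0.1]], [[0.5, 0.5]]]

eps_similar = reward_reciprocal(position, similar_velocity)        # small
eps_dissimilar = reward_reciprocal(position, dissimilar_velocity)  # large
```

The reward value is the reciprocal 1 / ε, so a practical implementation would need to guard against ε = 0; how the original handles that case is not stated.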
[0060]
Based on the reward value calculated by the behavior evaluation unit 30, the attention request / modulation unit 32 determines whether an attention request to update the weight vector w_i should be made to the reinforcement learning unit 28. Here, "attention request" means determining that internal parameters, such as the weights applied when handling plural pieces of supra-modality information, need to be changed, and "attention modulation" means outputting this request so that the parameter change is executed.
[0061]
Specifically, the attention request / modulation unit 32 first calculates the cost function Ψ by the following equation.
[0062]
(Equation 12)

Φ(w_i) is a double-well potential and has, for example, the shape represented by the following equation.
[0063]
(Equation 13)

c is an appropriately set parameter, and the shape of Φ(w_i) is shown in FIG. 5.
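The specific equation for Φ(w_i) is not reproduced in this text. As an illustration only, a standard quartic double-well potential with minima at w = −1 and w = +1 is sketched below; the form c(w² − 1)² / 4 and the function names are assumptions, chosen because they match the labels i = ±1 used throughout.

```python
def double_well(w: float, c: float = 1.0) -> float:
    """Assumed quartic double well: Phi(w) = c * (w^2 - 1)^2 / 4."""
    return 0.25 * c * (w * w - 1.0) ** 2


def double_well_grad(w: float, c: float = 1.0) -> float:
    """Derivative of the assumed potential: Phi'(w) = c * w * (w^2 - 1)."""
    return c * w * (w * w - 1.0)


# The two wells sit at w = -1 and w = +1, separated by a barrier at w = 0.
well_values = [double_well(-1.0), double_well(0.0), double_well(1.0)]
```

With this shape, lowering the potential energy pushes each weight toward one of the two wells, and c sets the height of the barrier between them.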
[0064]
In addition, the integration symbol used in equation (8) above,
[Equation 14]

means that the result of integrating the quantity A over the entire position coordinates of the recognition target and the result of integrating it over the entire velocity coordinates are added. That is, the following relationship is satisfied.
[0065]
(Equation 15)

[0066]
In equation (8) for the cost function Ψ, the first term on the right-hand side is intended to lower the energy of the double-well potential Φ(w_i), and the second term is intended to improve convergence by smoothing the change of w_i as the learning progresses.
[0067]
The attention request / modulation unit 32 compares the calculated cost function Ψ with a predetermined threshold. If Ψ is larger than the threshold, it determines that attention is required, makes an attention request to the reinforcement learning unit 28, and outputs ε. If Ψ is smaller than the threshold, no attention request is made.
[0068]
When attention is required, the above-described reinforcement learning unit 28 calculates a new weight vector w_i by the following equation. w_i is also calculated at each point of the position coordinates and the speed coordinates.
[0069]
(Equation 16)

[0070]
FIG. 7 shows the shape of this function. The Allen-Cahn algorithm guarantees that the weight vector w_i calculated by equation (12) yields a larger reward value 1 / ε (that is, a smaller ε) than the previous w_i. Instead of the Allen-Cahn algorithm, a support vector machine or a neural network may be used.
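Equation (12) is not reproduced in this text. For orientation, the sketch below shows one explicit Euler step of the generic Allen-Cahn equation ∂w/∂t = κ∇²w − Φ′(w) on a one-dimensional periodic grid, using the quartic double well Φ(w) = c(w² − 1)²/4 as an assumed potential. The grid, step size, coefficients, and the absence of any ε-dependent term are all assumptions, so this is not the patent's update rule.

```python
from typing import List


def allen_cahn_step(w: List[float], dt: float = 0.05,
                    kappa: float = 0.1, c: float = 1.0) -> List[float]:
    """One explicit Euler step of dw/dt = kappa * lap(w) - Phi'(w).

    Uses a 1-D grid with unit spacing and periodic boundaries, and the
    assumed potential Phi(w) = c * (w^2 - 1)^2 / 4.
    """
    n = len(w)
    out = []
    for x in range(n):
        laplacian = w[(x - 1) % n] - 2.0 * w[x] + w[(x + 1) % n]
        dphi = c * w[x] * (w[x] ** 2 - 1.0)
        out.append(w[x] + dt * (kappa * laplacian - dphi))
    return out


# A uniform positive start relaxes toward the well at w = +1.
weights = [0.5] * 8
first = allen_cahn_step(weights)
for _ in range(500):
    weights = allen_cahn_step(weights)
```

The diffusion term smooths w across neighboring coordinate points while the potential term drives each value toward a well, which mirrors the two roles attributed to the terms of the cost function.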
[0071]
The calculated weight vector w_i is output to the combined storage map 34.
[0072]
The combined storage map 34 stores the relationship between the supra-modality information and the predictive behavior information, and this relationship is rewritten when a new weight vector w_i is received. Then, using the supra-modality information received from the rewiring circuit 24, the attention class Ω related to the predictive behavior information is determined by the following equation.
[0073]
[Equation 17]

[0074]
The determined attention class Ω is sent to the behavior control unit 36.
[0075]
The behavior control unit 36 receives the attention class Ω, converts it into the corresponding behavior output O, and outputs it to the outside world. The correspondence between the attention class Ω and the action output O is acquired in advance by supervised learning, or input by a human assuming an appropriate output in advance. Alternatively, the correspondence may be self-acquired by a higher-order learning function.
[0076]
<Recognition process>
The cross-modal learning device having the above-described functional blocks recognizes the state of the external world while updating, in a self-learning manner, the connection relationship of the different types of modality information with respect to the external information, and outputs actions adapted to the external world. The cooperation between the functional blocks during this process will be described with reference to the flowchart in FIG. 6.
[0077]
First, the initial state and the start of processing are described. Initial values are set for the parameter θ_{i, k} = (α_{i, k}, μ_{i, k}, σ_{i, k}) and the weight vector w_i at each point of the position coordinates and the velocity coordinates. As an example, α_{i, k} = 1 / M, μ_{i, k} = 0, and σ_{i, k} = 1, while w_{-1} is initialized to a random number generated in the interval [-1, 0] and w_{+1} to a random number generated in the interval [0, 1]. The random numbers can be generated using, for example, the pseudo-random number function rand() of the C language.
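The example initial values above can be written out as a short sketch. The function name `init_state`, the dictionary layout, and the use of Python's `random.Random` in place of C's rand() are assumptions; for brevity θ is stored once per sub-sensor rather than at every coordinate point.

```python
import random
from typing import Dict, List, Optional, Tuple

Theta = Dict[int, List[Dict[str, float]]]
Weights = Dict[int, List[float]]


def init_state(n_sub: int, n_points: int,
               seed: Optional[int] = None) -> Tuple[Theta, Weights]:
    """Initial theta_{i,k} and w_i for i = -1 (position) and +1 (velocity)."""
    rng = random.Random(seed)
    theta = {
        i: [{"alpha": 1.0 / n_sub, "mu": 0.0, "sigma": 1.0}
            for _ in range(n_sub)]
        for i in (-1, +1)
    }
    w = {
        -1: [rng.uniform(-1.0, 0.0) for _ in range(n_points)],  # in [-1, 0]
        +1: [rng.uniform(0.0, 1.0) for _ in range(n_points)],   # in [0, 1]
    }
    return theta, w


theta, w = init_state(n_sub=3, n_points=16, seed=0)
```

Setting every α_{i, k} to 1 / M satisfies the constraint Σα_{i, k} = 1 from the outset, so no sub-sensor is favored before learning begins.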
[0078]
When the calculation is started, the sensor 20 measures external information, and the sub-sensor 22 obtains the sensor modality s_{i, k} at time t = 0. The rewiring circuit 24 calculates the supra-modality ρ_i(s_i) using the sensor modality s_{i, k} and the initial value of θ_{i, k}. At time t = 0 there is nothing yet for the reinforcement learning unit 28 to learn, so the supra-modality ρ_i(s_i) is output only to the combined storage map 34. The combined storage map 34 determines the attention class Ω from the supra-modality ρ_i(s_i) using the initial value of the weight vector w_i. The action control unit 36 outputs the action O corresponding to the attention class Ω to the outside world. Thereafter, the influence of the action O output in the previous time step is reflected, via the outside world, in the external information captured by the sensor 20.
[0079]
From the next time step, the processing described below is repeated.
[0080]
The sensor 20 measures external information at the time step j (S48). The sub sensor 22 separates the information into a position modality and a speed modality (S50). The separated sensor modality is output to the rewiring circuit 24 and the reinforcement learning unit 28.
[0081]
Using the parameter θ_{i, k} determined by the reinforcement learning unit 28 at the previous time step, the rewiring circuit 24 integrates the position and velocity modalities into two pieces of supra-modality information, ρ_{-1}(s_{-1}) and ρ_{+1}(s_{+1}) (S52). The supra-modality information is output to the reinforcement learning unit 28 and the combined storage map 34.
[0082]
At this time, the reinforcement learning unit 28 uses the sensor modality received from the sub-sensor 22 to calculate a new parameter θ_{i, k} by equation (5) or (6) above (S54). The newly calculated parameter θ_{i, k} is sent to the rewiring circuit 24 and is used there at the next time step to calculate the supra-modality information ρ_{-1}(s_{-1}) and ρ_{+1}(s_{+1}) (equation (1) or equation (2)).
[0083]
The reinforcement learning unit 28 further uses the current sensor modality information to calculate the posterior probability P_ik^post at each point of the position coordinates and the speed coordinates (S56). The posterior probability P_ik^post is output to the behavior evaluation unit 30.
[0084]
Using the distribution of the posterior probabilities calculated by the reinforcement learning unit 28, the behavior evaluation unit 30 calculates the reciprocal ε of the reward value 1 / ε using Expression (7) (S58). The reward value is sent to the attention request / modulation unit 32.
[0085]
The attention request / modulation unit 32 calculates the cost function Ψ according to equation (8) using the reciprocal ε of the reward value input from the behavior evaluation unit 30 (S60). The weight vector w_i used here is the value obtained in the previous calculation. The attention request / modulation unit 32 compares the cost function Ψ with a predetermined threshold (for example, 0.01) (S62). If Ψ is larger than the threshold, it determines that the weight vector w_i needs to be updated (attention request) and outputs ε to the reinforcement learning unit 28 (attention modulation) (S64). If Ψ is smaller than the threshold, it judges that the weight vector w_i is set appropriately, and the process proceeds to step S68 without updating w_i.
[0086]
When the attention request / modulation unit 32 performs attention modulation, the reinforcement learning unit 28 calculates a new weight vector w_i, and the combined storage map 34 is rewritten (S66).
[0087]
The combined storage map 34 determines the attention class Ω according to equation (13) based on the supra-modality information received from the rewiring circuit 24 (S68). If the weight vector w_i has been updated, the attention class Ω calculated from supra-modality information of the same value changes. If the weight vector has not been updated, the previous value is used. The determined attention class Ω is output to the behavior control unit 36.
[0088]
The action control unit 36 converts the attention class Ω into the action O, and outputs the action O to the outside world (S70). Thus, the calculation of one time step is completed, and the processing from step S48 is repeated again in the next time step.
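The order of steps S48 to S70 can be summarized as a control-flow skeleton. Every callable below is a hypothetical stand-in for the corresponding functional block, the `Blocks` container and `run_time_step` are names invented for this sketch, and passing the previous θ to the posterior calculation is an assumption, since the text does not say which θ is used in S56.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class Blocks:
    measure: Callable[[], Any]              # S48: sensor 20
    separate: Callable[[Any], Any]          # S50: sub-sensor 22
    integrate: Callable[[Any, Any], Any]    # S52: rewiring circuit 24
    learn_theta: Callable[[Any, Any], Any]  # S54: reinforcement learning unit 28
    posterior: Callable[[Any, Any], Any]    # S56: reinforcement learning unit 28
    evaluate: Callable[[Any], float]        # S58: behavior evaluation unit 30
    cost: Callable[[float, Any], float]     # S60: attention request / modulation unit 32
    update_w: Callable[[float, Any], Any]   # S66: reinforcement learning unit 28
    classify: Callable[[Any, Any], Any]     # S68: combined storage map 34
    act: Callable[[Any], Any]               # S70: behavior control unit 36


def run_time_step(b: Blocks, theta: Any, w: Any,
                  threshold: float = 0.01) -> Tuple[Any, Any, Any]:
    """Run one time step and return (new theta, weight vector, action)."""
    raw = b.measure()
    s = b.separate(raw)
    rho = b.integrate(s, theta)          # uses theta from the previous step
    new_theta = b.learn_theta(s, theta)  # takes effect from the next step
    post = b.posterior(s, theta)         # assumption: previous theta is used
    eps = b.evaluate(post)               # reciprocal of the reward value
    if b.cost(eps, w) > threshold:       # S62: attention request
        w = b.update_w(eps, w)           # S64, S66: attention modulation
    action = b.act(b.classify(rho, w))
    return new_theta, w, action
```

Calling `run_time_step` repeatedly, feeding back the returned θ and w, corresponds to repeating the processing from step S48 at each time step.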
[0089]
Once the cross-modal learning device has continued learning for a number of time steps, learning can proceed without performing all of the above processing. This case is described below.
[0090]
In the flowchart of FIG. 6, all of the parameters θ_{i, k} = (α_{i, k}, μ_{i, k}, σ_{i, k}) are updated by learning with the EM algorithm in step S54. However, once learning has progressed to some extent, the changes in μ_{i, k} and σ_{i, k} can be regarded as zero, so it is possible to adapt to the environment by updating only the mixing coefficient α_{i, k} of each piece of sensor modality information and the weight vector w_i of the supra-modality information.
[0091]
The progress of the learning is determined by using, for example, the following conditional expression.
[0092]
(Equation 18)

Here, α_thres is a constant set to a value such as 0.7. If the above expression holds, μ_{i, k} and σ_{i, k} are not updated; the reinforcement learning unit 28 calculates only α_{i, k} and outputs only α_{i, k} to the rewiring circuit 24.
[0093]
The cross-modal learning device of the present invention can select actions more accurately by integrating information from different types of sensors. If the selected action is inappropriate, the distribution of the integrated value of the posterior probability is not similar between the sensor modalities, as described above, so the reward value 1 / ε decreases. In response, the weight vector w_i is updated, changing the connection relationship of the supra-modality information ρ. A different attention class Ω is therefore selected, and the action O changes accordingly. In this way, the parameters are optimized according to the state of the external environment. Because the present invention optimizes the parameters in a self-taught manner without using teacher data, it can adapt flexibly to changes in the environment. Further, changes in the external world caused by the device's own motion can be recognized flexibly and efficiently without increasing the amount of calculation.
[0094]
Furthermore, the cross-modal learning device according to the present invention suppresses the amount of calculation, which would otherwise increase exponentially with the number of sensors. For example, suppose that in a recognition system having M sensors, each sensor modality is extracted at N × N = N² points on the position coordinates and on the speed coordinates, respectively. In the conventional processing method, which maps action information directly to each combination of sensor information without passing through intermediate modality information, a map describing the relationship between sensor information and action must be provided for each combination of sensor information, so the size of the map is (N²)^{2M} and increases exponentially with the number M of sensors. In contrast, according to the embodiment of the present invention shown in FIG. 2, only a map describing the relationship between the supra-modality information ρ_i and the attention class Ω needs to be provided, so its size is (N²)². Further, in the conventional processing method, when the parameters given to each piece of sensor information are determined by learning, the amount of calculation required for learning is (N²)^{2M}, whereas according to one embodiment of the invention it stays on the order of 2MN². Therefore, both the size of the map required inside the device and the amount of calculation required for learning are reduced. These points are summarized in Table 1.
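The scaling expressions quoted above can be evaluated directly to see how quickly the two approaches diverge. The function names and the sample values N = 10 and M = 3 are illustrative assumptions; the formulas are the ones stated in the paragraph.

```python
def conventional_size(n: int, m: int) -> int:
    """Map size and learning cost of the conventional method: (N^2)^(2M)."""
    return (n ** 2) ** (2 * m)


def cross_modal_map_size(n: int) -> int:
    """Map size with two supra-modalities: (N^2)^2, independent of M."""
    return (n ** 2) ** 2


def cross_modal_learning_cost(n: int, m: int) -> int:
    """Order of the learning cost with supra-modalities: 2 * M * N^2."""
    return 2 * m * n ** 2


# Hypothetical setting: a 10 x 10 grid per modality and three sensors.
N, M = 10, 3
summary = {
    "conventional": conventional_size(N, M),
    "cross_modal_map": cross_modal_map_size(N),
    "cross_modal_learning": cross_modal_learning_cost(N, M),
}
```

For these values the conventional map has 10^12 entries while the supra-modality map has 10^4, and adding a sensor multiplies the former by (N²)² but leaves the latter unchanged.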
[0095]
[Table 1]

[0096]
The cross-modal learning device can be used on its own as an action decision device. As a specific application, it can also be mounted on a moving object such as a car, a helicopter, or a humanoid robot and used to determine the action the moving object should take based on information from the outside world.
[0097]
Although several embodiments of the present invention have been described above, the present invention is not limited to these embodiments.
[0098]
[Effect of the invention]
According to the present invention, it is possible to integrate a plurality of sensor information and realize a learning process that can flexibly adapt to changes in the environment with a small amount of calculation processing and a small storage capacity.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating the concept of recognition processing according to the present invention.
FIG. 2 is a block diagram of a cross-modal learning device according to an embodiment of the present invention.
FIG. 3 is a graph showing an example of a distribution of integrated values of posterior probabilities when a reward value is low.
FIG. 4 is a graph showing an example of a distribution of integrated values of posterior probabilities when a reward value is high.
FIG. 5 is a graph showing an example of the shape of a double well potential.
FIG. 6 is a flowchart illustrating processing by the cross-modal learning device in FIG. 2;
FIG. 7 is a graph showing a relationship between a weight vector and supra-modality information.
[Explanation of symbols]
20 sensor
22 sub-sensor
24 rewiring circuit
26 attentional reinforcement learning unit
28 reinforcement learning unit
30 behavior evaluation unit
32 attention request / modulation unit
34 combined storage map
36 behavior control unit

Claims

A cross-modal learning device comprising:
a plurality of sensors that measure information from the outside world;
modality separating means for separating the information captured by each sensor into sensor modality information in position coordinates and sensor modality information in velocity coordinates;
a rewiring circuit that integrates the sensor modality information in the position coordinates and the sensor modality information in the velocity coordinates into a shape supra-modality and a motion supra-modality, respectively;
an attentional reinforcement learning unit that learns parameters of the shape supra-modality and the motion supra-modality;
a combined storage map that determines an attention class based on the shape supra-modality and the motion supra-modality; and
an action control unit that outputs an action according to the attention class.

The cross-modal learning device according to claim 1, wherein the shape supra-modality information and the motion supra-modality information are calculated by the following equation.

Here, s_i (i = ±1, where i = −1 represents shape and i = +1 represents motion) is the set of the sensor modality information s_{-1, k} in position coordinates and the sensor modality information s_{+1, k} in velocity coordinates (1 ≦ k ≦ M), M is the total number of pieces of sensor modality information, α_{i, k}, μ_{i, k}, and σ_{i, k} are respectively the mixing ratio, mean, and variance of the k-th sensor modality information s_{i, k}, and |ds_{i, k}| is the resolution of the k-th sensor modality information.

The cross-modal learning device according to claim 1, wherein the shape supra-modality information and the motion supra-modality information are calculated by the following equation.

Here, s_i (i = ±1, where i = −1 represents shape and i = +1 represents motion) is the set of the sensor modality information s_{-1, k} in position coordinates and the sensor modality information s_{+1, k} in velocity coordinates (1 ≦ k ≦ M), M is the total number of pieces of sensor modality information, and α_{i, k}, μ_{i, k}, and σ_{i, k} are respectively the mixing ratio, mean, and variance of the k-th sensor modality information s_{i, k}.

The cross-modal learning device according to claim 1, wherein the attentional reinforcement learning unit further includes a reinforcement learning unit that calculates a posterior probability of the action based on the sensor modality information in the position coordinates and the sensor modality information in the velocity coordinates.

The cross-modal learning device according to claim 1, wherein the reinforcement learning unit updates the parameters by an expectation-maximization algorithm using the posterior probability.

The cross-modal learning device according to claim 1, wherein the attentional reinforcement learning unit further includes a behavior evaluation unit that evaluates the posterior probability and outputs a reward according to the evaluation result.

The cross-modal learning device according to claim 1, wherein the combined storage map expresses the connection relationship between the shape supra-modality and the motion supra-modality using a weight vector.

The cross-modal learning device according to claim 7, wherein the attentional reinforcement learning unit further includes an attention request / modulation unit that calculates a cost function using the reward value and determines, according to a result of comparing the cost function with a predetermined threshold, whether to perform attention modulation instructing a change of the weight vector.

The cross-modal learning device according to claim 8, wherein the reinforcement learning unit recalculates the weight vector in response to the attention modulation.

The cross-modal learning device according to claim 5, wherein the reinforcement learning unit changes the method of updating the parameters according to the degree of progress of learning.

The cross-modal learning device according to any one of claims 1 to 10, wherein the information captured by each sensor is further separated into a plurality of sets of sub-sensor data, and the sub-sensor data is separated into sensor modality information in position coordinates and sensor modality information in velocity coordinates.

A recognition processing method comprising:
measuring information from the outside world;
separating the measured information into sensor modality information in position coordinates and sensor modality information in velocity coordinates;
integrating the sensor modality information in the position coordinates and the sensor modality information in the velocity coordinates into a shape supra-modality and a motion supra-modality, respectively;
learning parameters of the shape supra-modality and the motion supra-modality;
determining an attention class based on the shape supra-modality and the motion supra-modality; and
outputting an action according to the attention class.

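The recognition processing method according to claim 12, wherein the shape supra-modality information and the motion supra-modality information are calculated by the following equation.

Here, s_i (i = ±1, where i = −1 represents shape and i = +1 represents motion) is the set of the sensor modality information s_{-1, k} in position coordinates and the sensor modality information s_{+1, k} in velocity coordinates (1 ≦ k ≦ M), M is the total number of pieces of sensor modality information, α_{i, k}, μ_{i, k}, and σ_{i, k} are respectively the mixing ratio, mean, and variance of the k-th sensor modality information s_{i, k}, and |ds_{i, k}| is the resolution of the k-th sensor modality information.

The recognition processing method according to claim 12, wherein the shape supra-modality information and the motion supra-modality information are calculated by the following equation.

Here, s_i (i = ±1, where i = −1 represents shape and i = +1 represents motion) is the set of the sensor modality information s_{-1, k} in position coordinates and the sensor modality information s_{+1, k} in velocity coordinates (1 ≦ k ≦ M), M is the total number of pieces of sensor modality information, and α_{i, k}, μ_{i, k}, and σ_{i, k} are respectively the mixing ratio, mean, and variance of the k-th sensor modality information s_{i, k}.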

The recognition processing method according to claim 12, further comprising calculating a posterior probability of the action based on the sensor modality information in the position coordinates and the sensor modality information in the velocity coordinates.

The recognition processing method according to claim 12, wherein the learning of the parameters is performed by an expectation-maximization algorithm using the posterior probability.

The recognition processing method according to claim 12, further comprising evaluating the posterior probability and outputting a reward according to the evaluation result.

The recognition processing method according to claim 12, wherein the attention class is determined based on the connection relationship between the shape supra-modality and the motion supra-modality expressed by a weight vector.

The recognition processing method according to claim 18, further comprising calculating a cost function using the reward value and determining, according to a result of comparing the cost function with a predetermined threshold, whether to perform attention modulation instructing a change of the weight vector.

The recognition processing method according to claim 19, further comprising recalculating the weight vector in response to the attention modulation.

The recognition processing method according to claim 16, wherein the method of updating the parameters is changed according to the degree of progress of learning.

The recognition processing method according to any one of claims 12 to 21, wherein the information captured by each sensor is further separated into a plurality of sets of sub-sensor data, and the sub-sensor data is separated into sensor modality information in position coordinates and sensor modality information in velocity coordinates.

A program for causing a computer to execute the processing according to any one of claims 12 to 21.