JP2012041004A

JP2012041004A - Operation planning method of autonomous mobile robot, control method of autonomous mobile robot using operation planning method of autonomous mobile robot, operation planning device of autonomous mobile robot, operation control device of autonomous mobile robot, operation planning program of autonomous mobile robot and control program of autonomous mobile robot

Info

Publication number: JP2012041004A
Application number: JP2010185831A
Authority: JP
Inventors: Hiroshi Kawano; 洋川野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2010-08-23
Filing date: 2010-08-23
Publication date: 2012-03-01
Anticipated expiration: 2030-08-23
Also published as: JP5391164B2

Abstract

PROBLEM TO BE SOLVED: To plan the motion of an autonomous mobile robot so that it can reach a target position with sufficient accuracy in an environment in which the flow speed and flow direction are uncertain.
SOLUTION: Dynamic programming in an augmented MDP is applied. This rests on the assumption that, within a partially observable Markov decision process (POMDP), the probability distribution over the agent's state can be expressed by the entropy of the state. In the present invention, the state of the autonomous mobile robot is expressed by Cartesian coordinates, an azimuth angle, the entropy of the probability distribution of the Cartesian coordinates, and the entropy of the probability distribution of the azimuth angle. The position coordinates, consisting of the Cartesian coordinates and the azimuth angle, are calculated probabilistically.

Description

The present invention relates to a motion planning method for an autonomous mobile robot and to a control method for an autonomous mobile robot that uses it. More specifically, it relates to a technique for producing a motion plan for environments with fluid disturbances of uncertain flow velocity. The plan enables an autonomous mobile robot to avoid collisions with obstacles with high probability and to reach a target position with high accuracy. The invention also relates to a technique for controlling the autonomous mobile robot on the basis of that motion plan.

In recent years, autonomous mobile robots that can operate outdoors have been actively researched, and their range of applications is expanding. One example of such a robot is an autonomous airship. An autonomous airship can hold its position in the air using buoyancy. It controls its own motion with actuators mounted on the airframe, such as thrusters and rudders. Because of these characteristics, autonomous airships are expected to be used, for example, as mine detectors or as airborne mobile-phone relay stations after a major earthquake.

However, the specific gravity of an autonomous airship must equal that of the atmosphere, so the airframe must be light. This severely limits the number and weight of the actuators it can carry. For this reason, autonomous airships are generally classified as underactuated robots.

An underactuated robot is one in which its onboard actuators can directly control fewer degrees of freedom than the robot actually moves in. Controlling such robots is known to require advanced intelligent control algorithms.

Several further properties make the airship hard to control:
- Because its specific gravity equals that of the surrounding air, its motion has high inertia.
- Because the actuator thrust is small relative to the air resistance the airship experiences, its maximum speed is generally low.
- Its motion is easily and strongly affected by wind disturbances.

Consider controlling an autonomous airship with these characteristics in an environment containing wind disturbances and obstacles. For this task, it is considered effective to apply a motion planning method based on dynamic programming in a Markov decision process (MDP) (Non-Patent Document 1).

An autonomous airship is used here as the example of an autonomous mobile robot. Other examples include autonomous underwater robots, such as unmanned underwater vehicles, which have characteristics similar to those of an autonomous airship.

Kawano H., "Three-dimensional Obstacle Avoidance of Blimp-type Unmanned Aerial Vehicle Flying in Unknown and Non-uniform Wind Disturbance", JSME International Journal of Robotics and Mechatronics, Vol.19 No.2, April, 2007.

The technique disclosed in Non-Patent Document 1 does not account for uncertainty in wind speed and wind direction in the airship's operating environment. As a result, when the wind speed and direction are uncertain, it is not always easy to reach the target point with sufficient accuracy. The technique also ignores the probabilistic nature of the values measured by the position sensors.

Accordingly, an object of the present invention is to provide two techniques for environments where the flow velocity and flow direction are uncertain. The first produces a motion plan that enables an autonomous mobile robot to avoid collisions with obstacles with high probability and to reach a target position with sufficient accuracy. The second controls the autonomous mobile robot based on that motion plan.

The motion planning technique for an autonomous mobile robot according to the present invention is as follows. The state of the robot is assumed to be expressed by four components: its Cartesian coordinates, its azimuth angle, the entropy of the probability distribution of those Cartesian coordinates, and the entropy of the probability distribution of that azimuth angle. A set of predetermined states to which the robot can transition (the state set) and a set of actions the robot can take (the action set) are defined in advance. Planning then proceeds through the following processes.

- [State/action selection process] A state/action selection unit selects a not-yet-selected combination of an element of the state set and an element of the action set.
- [Current position coordinate determination process] A current position coordinate determination unit calculates the probability distribution of the robot's position coordinates, whose elements are its Cartesian coordinates and azimuth angle. It does so under the state included in the selected combination.
- [Destination position coordinate selection process] A destination position coordinate selection unit calculates the robot's position after movement based on the state transition probability.
- [Existence probability calculation process] An existence probability calculation unit obtains the destination state by applying a Bayesian filter to the probability distribution calculated in the current position coordinate determination process.
- [Transition probability calculation process] A transition probability calculation unit calculates the probability of transitioning from the current state to the state obtained in the existence probability calculation process. It also calculates the reward associated with that transition.
- [Control process] For the selected combination, a control unit repeats the current position coordinate determination, destination position coordinate selection, existence probability calculation, and transition probability calculation processes a predetermined number of times. Once they have been repeated that number of times, control returns to the state/action selection process.
- [Value function calculation process] A value function calculation unit calculates a value function from the transition probabilities and rewards.
- [Policy calculation process] A policy calculation unit calculates a policy function expressed in terms of the transition probabilities, rewards, and value function.
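The repeated loop above can be read as a Monte Carlo estimate of the transition probability for one state/action pair: sample the post-move position, map it to a destination state, and repeat a fixed number of times. The sketch below is a minimal illustration under several assumptions, none of which come from the patent:
- The robot is a 2-D point on a grid.
- Its per-step displacement is the commanded move plus a Gaussian wind drift along Y, matching the embodiment's choice of Y as the mean wind direction.
- `CELL`, `WIND_MEAN`, `WIND_STD`, `ACTIONS`, `sample_next_cell`, and `estimate_transitions` are illustrative names and values.
- The Bayesian-filter and entropy parts of the state are omitted.

```python
import random
from collections import Counter

# Hypothetical model parameters (illustrative, not from the patent).
CELL = 1.0        # grid cell size [m]
WIND_MEAN = 0.5   # mean wind drift along +Y per step [m]
WIND_STD = 0.3    # standard deviation of the wind drift per step [m]

# Hypothetical action set: commanded displacement (dx, dy) per step [m].
ACTIONS = {"forward": (1.0, 0.0), "left": (0.0, 1.0), "right": (0.0, -1.0)}


def sample_next_cell(cell, action, rng):
    """Sample one post-move grid cell: commanded move plus a random wind drift along Y."""
    dx, dy = ACTIONS[action]
    x = cell[0] * CELL + dx
    y = cell[1] * CELL + dy + rng.gauss(WIND_MEAN, WIND_STD)
    return (round(x / CELL), round(y / CELL))


def estimate_transitions(cell, action, n_samples=1000, seed=0):
    """Estimate P(cell' | cell, action) from n_samples repeated sampled moves."""
    rng = random.Random(seed)
    counts = Counter(sample_next_cell(cell, action, rng) for _ in range(n_samples))
    return {nxt: count / n_samples for nxt, count in counts.items()}


probs = estimate_transitions((0, 0), "forward")
```

The "predetermined number of times" corresponds to `n_samples` here. More samples give smoother probability estimates at a proportional cost in planning time.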

The motion planning technique may also include a measurement position coordinate selection process. In it, a measurement position coordinate selection unit represents the measurement error of the sensor that measures the robot's position coordinates as a probability distribution. It then calculates the robot's post-movement position from the error between the sensor-measured position coordinates and the post-movement position obtained in the destination position coordinate selection process [measurement position coordinate selection process].

In this case, the existence probability calculation process changes as follows. The existence probability calculation unit first applies a Bayesian filter to the distribution of position coordinates calculated in the current position coordinate determination process. It then corrects the resulting distribution using the post-movement position obtained in the measurement position coordinate selection process, and the corrected distribution gives the destination state.

The control process is extended accordingly. For the combination selected in the state/action selection process, the control unit repeats five processes a predetermined number of times: current position coordinate determination, destination position coordinate selection, measurement position coordinate selection, existence probability calculation, and transition probability calculation. Once they have been repeated that number of times, control returns to the state/action selection process.
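The correction step above is the measurement update of a Bayes filter. The sketch below is a minimal illustration under these assumptions:
- The candidate positions form a 1-D grid along Y.
- The sensor error is Gaussian. The patent only says the error is represented by a probability distribution, so the Gaussian form and the `sensor_std` value are assumptions.

The predicted distribution is multiplied by the measurement likelihood and renormalized.

```python
import math


def gaussian_likelihood(z, x, std):
    """Unnormalized likelihood of measuring z when the true position is x (Gaussian sensor error)."""
    return math.exp(-0.5 * ((z - x) / std) ** 2)


def measurement_update(prior, positions, z, sensor_std):
    """Bayes-filter correction: posterior(x) is proportional to likelihood(z | x) * prior(x)."""
    unnorm = [p * gaussian_likelihood(z, x, sensor_std) for p, x in zip(prior, positions)]
    total = sum(unnorm)
    if total == 0.0:
        raise ValueError("measurement is inconsistent with the predicted distribution")
    return [u / total for u in unnorm]


positions = [0.0, 1.0, 2.0, 3.0]   # candidate Y positions [m]
prior = [0.1, 0.4, 0.4, 0.1]       # predicted distribution after the move
posterior = measurement_update(prior, positions, z=2.2, sensor_std=0.5)
```

A measurement of 2.2 m concentrates the probability mass on the 2.0 m cell. As a result, the entropy of the corrected distribution is lower than that of the prediction.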

The calculation of transition probabilities may be limited to the range of states reachable by a single action of the robot. In addition, the value function calculation may be skipped for transitions whose probability is zero.
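Restricting the calculation to states reachable in one action keeps each step of the planning loop sparse. The sketch below is a minimal illustration, assuming a 2-D grid and a known bound `max_cells` on how many cells one action plus wind drift can move the robot; `max_cells` is a hypothetical parameter. Only cells inside that window are considered, and zero-probability successors are dropped before any value-function work.

```python
def reachable_cells(cell, max_cells, width, height):
    """Grid cells within max_cells (Chebyshev distance) of cell, clipped to the map."""
    cx, cy = cell
    return [
        (x, y)
        for x in range(max(0, cx - max_cells), min(width, cx + max_cells + 1))
        for y in range(max(0, cy - max_cells), min(height, cy + max_cells + 1))
    ]


def prune_zero(transitions):
    """Drop successors whose transition probability is exactly zero."""
    return {nxt: p for nxt, p in transitions.items() if p > 0.0}


cells = reachable_cells((0, 0), max_cells=1, width=10, height=10)
sparse = prune_zero({(0, 1): 0.7, (1, 1): 0.3, (5, 5): 0.0})
```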

The control technique for an autonomous mobile robot according to the present invention controls the robot on the basis of the motion plan produced by the planning technique described above. The policy function obtained by the planning technique and the set of actions the robot can take are stored in a storage unit. Control then proceeds through the following processes.

- [Position acquisition process] A position acquisition unit acquires the measurement results of the sensors that measure the robot's position coordinates, whose elements are its Cartesian coordinates and azimuth angle.
- [State transition probability calculation process] A state transition probability calculation unit calculates the robot's position after movement based on the state transition probability.
- [Existence probability prediction process] An existence probability prediction unit starts from the position coordinates acquired in the position acquisition process. From them, it obtains the predicted probability that the robot is located at each destination position coordinate (the predicted existence probability).
- [Destination state determination process] A destination state determination unit takes the position coordinate that maximizes the predicted existence probability as the destination.
- [Action determination process] An action determination unit determines the action using the policy function.
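One control cycle can be sketched as follows. This is an interpretation of the steps above, not the patent's implementation, and it rests on these assumptions:
- The stored policy is a dict from discrete states to actions, a deterministic special case of π(b, a).
- The measured position is already a grid cell.
- Predictions come from a hypothetical per-action transition table like the one estimated during planning.

The names `predict` and `control_step` and the table contents are illustrative only.

```python
def predict(measured_cell, last_action, transition_table):
    """Predicted existence probability of each destination cell after last_action."""
    # Unknown pairs fall back to "stays where measured" (an assumption for this sketch).
    return transition_table.get((measured_cell, last_action), {measured_cell: 1.0})


def control_step(measured_cell, last_action, transition_table, policy):
    """Take the most probable destination cell as the state, then look up the policy."""
    predicted = predict(measured_cell, last_action, transition_table)
    state = max(predicted, key=predicted.get)
    return state, policy[state]


# Hypothetical planning outputs.
transition_table = {((0, 0), "forward"): {(1, 0): 0.5, (1, 1): 0.45, (1, 2): 0.05}}
policy = {(0, 0): "forward", (1, 0): "left", (1, 1): "forward", (1, 2): "right"}

state, action = control_step((0, 0), "forward", transition_table, policy)
```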

This control technique may also include a measurement correction step, carried out by two additional units.
- [Measurement probability calculation process] A measurement probability calculation unit obtains the probability distribution of the sensor-measured position coordinates from the sensor's measurement error, which is represented as a probability distribution.
- [Existence probability correction process] An existence probability correction unit corrects the predicted existence probability based on the distribution obtained in the measurement probability calculation process.

In this case, in the destination state determination process, the destination state determination unit takes as the destination the position coordinate that maximizes the corrected predicted existence probability.

According to the present invention, the state of the autonomous mobile robot is expressed by its Cartesian coordinates and azimuth angle together with the entropy of the probability distribution of each. The position coordinates, consisting of the Cartesian coordinates and azimuth angle, are calculated probabilistically. As a result, in an environment where the flow velocity and flow direction are uncertain, the robot can avoid collisions with obstacles with high probability and reach the target position with sufficient accuracy.

FIG. 1 is a diagram showing an example functional configuration of the action planning device.
FIG. 2 is a diagram showing the processing procedure of the action planning method.
FIG. 3 is a diagram showing an example functional configuration of the action control device.
FIG. 4 is a diagram showing the processing procedure of the action control method.

An embodiment of the present invention is described with reference to FIGS. 1 to 4. To make the explanation concrete, an autonomous airship is used as the example of an autonomous mobile robot (the agent). Specifically, consider an underactuated airship equipped with a rudder and a forward/reverse thruster.

The airship's center of gravity is normally set low, so the restoring force against pitch and roll is large. Its attitude angles other than the azimuth (yaw) angle therefore stay close to zero.

The airship carries the following sensors:
- Position sensors, for example several wireless-LAN-based position sensors. For convenience of calculation, they are assumed to be located at the airship's center of gravity in the horizontal plane.
- A gyro sensor that serves as an azimuth sensor and measures the azimuth angle directly.

With these sensors, the airship's three-dimensional Cartesian coordinates (world coordinates) and azimuth angle can be measured in real time.

Let the measurements of the position sensors at time t be (M_x(t), M_y(t), M_z(t)), and let the measurement of the azimuth sensor be M_φ(t). Let the true position of the airship's center of gravity be (X_g(t), Y_g(t), Z_g(t)), and let the true azimuth angle be φ(t).

The Cartesian coordinate system describing the position of the center of gravity is defined so that, in the horizontal plane, either the X axis or the Y axis is parallel to the mean wind direction. Here, the Y axis is assumed to coincide with the mean wind direction. The case in which the mean wind direction coincides with the X axis follows by replacing Y with X in the description below, so it is not described separately. Because the Cartesian coordinate system can be chosen freely, it can always be set up so that the mean wind direction coincides with either the X axis or the Y axis.
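Aligning the Y axis with the mean wind amounts to a planar rotation of the measured horizontal coordinates. The sketch below is a minimal illustration under these assumptions:
- The mean wind direction is given as an angle `wind_dir`, measured counter-clockwise from the world X axis. This convention is chosen here and is not stated in the patent.
- The new frame is right-handed.
- The azimuth is shifted by the same rotation and wrapped to [-π, π].

```python
import math


def to_wind_frame(x, y, phi, wind_dir):
    """Express a horizontal position and azimuth in a frame whose +Y axis points along the mean wind.

    wind_dir: mean wind direction [rad], counter-clockwise from the world X axis.
    """
    # Unit vectors of the new axes in world coordinates (right-handed: new X = new Y rotated by -90 deg).
    x_axis = (math.sin(wind_dir), -math.cos(wind_dir))
    y_axis = (math.cos(wind_dir), math.sin(wind_dir))
    xw = x * x_axis[0] + y * x_axis[1]
    yw = x * y_axis[0] + y * y_axis[1]
    # The frame is rotated by (wind_dir - pi/2), so azimuths shift by the opposite amount.
    phi_w = phi - (wind_dir - math.pi / 2)
    phi_w = math.atan2(math.sin(phi_w), math.cos(phi_w))  # wrap to [-pi, pi]
    return xw, yw, phi_w


# Wind blowing along world +X: a point 2 m downwind lies on the new +Y axis.
print(to_wind_frame(2.0, 0.0, 0.0, wind_dir=0.0))
```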

[Conventional model]
As an aid to understanding the present invention, the conventional technique is outlined first.
The prior art plans the motion of an autonomous airship using a Markov decision process (MDP). In an MDP, the environment is modeled as follows.

Modeling the environment in the following way gives a Markov state transition model, which is the Markov decision process used in reinforcement learning. The set of discrete states the environment can take is written S = {s_1, s_2, …, s_n}. The set of actions the agent can take is written A = {a_1, a_2, …, a_m}. When the agent executes an action a ∈ A in a state s ∈ S, the environment transitions stochastically to a state s' ∈ S. The transition probability is written

$$P(s' \mid s, a) = \Pr\{s_{t+1} = s' \mid s_t = s,\ a_t = a\}.$$

At this time the environment gives the agent a stochastic reward r, whose expected value is

$$R(s' \mid s, a) = E\{r_t \mid s_t = s,\ a_t = a,\ s_{t+1} = s'\}.$$

The agent's decision at each time step is represented by the policy function

$$\pi(s, a) = \Pr\{a_t = a \mid s_t = s\}.$$

π(s, a) is defined for all states s and all actions a. The policy function π(s, a) is also simply called the policy π. The prime in s' only serves to distinguish it from s.
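The MDP ingredients above (S, A, P(s'|s, a), R(s'|s, a), π(s, a)) can be held in plain dictionaries. The sketch below uses an illustrative two-state example, not a model from the patent. It checks that each P(·|s, a) and π(s, ·) is a proper distribution, and it samples one transition.

```python
import random

S = ["s1", "s2"]
A = ["a1", "a2"]

# P[(s, a)][s'] = P(s' | s, a). Values are illustrative.
P = {
    ("s1", "a1"): {"s1": 0.2, "s2": 0.8},
    ("s1", "a2"): {"s1": 1.0},
    ("s2", "a1"): {"s2": 1.0},
    ("s2", "a2"): {"s1": 0.5, "s2": 0.5},
}
# R[(s, a, s')] = R(s' | s, a): here, reward 1 for arriving in s2, else 0.
R = {(s, a, sn): (1.0 if sn == "s2" else 0.0) for (s, a), succ in P.items() for sn in succ}

# pi[s][a] = pi(s, a) = Pr{a_t = a | s_t = s}
pi = {"s1": {"a1": 0.9, "a2": 0.1}, "s2": {"a1": 0.5, "a2": 0.5}}


def is_distribution(d, tol=1e-9):
    """True if the values are non-negative and sum to 1."""
    return all(v >= 0.0 for v in d.values()) and abs(sum(d.values()) - 1.0) < tol


def step(s, rng):
    """Sample a ~ pi(s, .), then s' ~ P(. | s, a); return (a, s', expected reward)."""
    a = rng.choices(list(pi[s]), weights=list(pi[s].values()))[0]
    succ = P[(s, a)]
    s_next = rng.choices(list(succ), weights=list(succ.values()))[0]
    return a, s_next, R[(s, a, s_next)]


assert all(is_distribution(d) for d in P.values())
assert all(is_distribution(d) for d in pi.values())
```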

A state s is given by s = (X_s, Y_s, Z_s, φ_s), where:
X_s: discretized X axis; the X coordinate at which the airship's existence probability is highest.
Y_s: discretized Y axis; the Y coordinate at which the airship's existence probability is highest.
Z_s: discretized Z axis; the Z coordinate at which the airship's existence probability is highest.
φ_s: discretized φ axis; the φ coordinate (φ being the azimuth angle) at which the airship's existence probability is highest.
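Mapping a continuous pose to the discrete state s = (X_s, Y_s, Z_s, φ_s) is a grid quantization. The sketch below is a minimal illustration, assuming uniform cell sizes `dxy` and `dz` and `n_phi` azimuth bins. These are hypothetical parameters, since this text does not give the patent's discretization. Positions are floored to the cell that contains them, and the azimuth is rounded to the nearest of `n_phi` equally spaced headings.

```python
import math


def discretize_pose(x, y, z, phi, dxy=1.0, dz=1.0, n_phi=8):
    """Map a continuous pose to grid indices (X_s, Y_s, Z_s, phi_s).

    phi is in radians; it is wrapped to [0, 2*pi) and assigned to the nearest of
    n_phi headings spaced 2*pi / n_phi apart.
    """
    width = 2.0 * math.pi / n_phi
    phi = phi % (2.0 * math.pi)
    phi_bin = round(phi / width) % n_phi  # the last half-bin wraps back to bin 0
    return (math.floor(x / dxy), math.floor(y / dxy), math.floor(z / dz), phi_bin)


print(discretize_pose(2.3, -0.4, 5.0, math.pi / 2))  # → (2, -1, 5, 2)
```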

To evaluate how much an action executed at some time t contributed to the rewards obtained afterwards, consider the time series of rewards from that point on. The evaluation of this reward time series is called the value. The goal of the agent is to maximize the value, that is, to find a policy that does so. The value is the sum of the rewards, discounted over time by a discount rate γ (0 ≤ γ < 1).

The value function V^π(s) is defined for a state s at time t, with actions chosen according to policy π. It is the expected discounted sum of the rewards obtained from time t onward, where E_π denotes the expected value under policy π.
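The standard discounted form used in reinforcement learning texts is V^π(s) = E_π{ Σ_{k=0}^∞ γ^k r_{t+k} | s_t = s }, with rewards indexed as in R(s'|s, a) above. The patent's exact expression is not reproduced in this text, so treat this as an assumption. The quantity inside the expectation is the discounted return of one reward sequence. A minimal sketch that computes it for a finite sequence:

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k] over a finite reward sequence."""
    g = 0.0
    for r in reversed(rewards):  # Horner-style accumulation: g = r + gamma * g
        g = r + gamma * g
    return g


print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))  # → 1.5  (1 + 0.5*0 + 0.25*2)
```

Averaging this return over many trajectories sampled under π from state s gives a Monte Carlo estimate of V^π(s).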

Here the state value function V^π(s), the value of state s under policy π, is used as the value function. The action value function Q^π(s, a), the value of taking action a in state s under policy π, can be used instead.
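The two value functions are linked by the standard relation Q^π(s, a) = Σ_{s'} P(s'|s, a)[R(s'|s, a) + γ V^π(s')]. The sketch below uses dictionary-based P, R, and V with illustrative numbers that do not come from the patent.

```python
def q_from_v(P, R, V, gamma):
    """Q(s, a) = sum over s' of P(s'|s,a) * (R(s'|s,a) + gamma * V(s'))."""
    return {
        (s, a): sum(p * (R[(s, a, sn)] + gamma * V[sn]) for sn, p in succ.items())
        for (s, a), succ in P.items()
    }


P = {("s1", "go"): {"s1": 0.25, "s2": 0.75}, ("s1", "stay"): {"s1": 1.0}}
R = {("s1", "go", "s1"): 0.0, ("s1", "go", "s2"): 1.0, ("s1", "stay", "s1"): 0.0}
V = {"s1": 2.0, "s2": 4.0}
Q = q_from_v(P, R, V, gamma=0.5)
print(Q)  # → {('s1', 'go'): 2.5, ('s1', 'stay'): 1.0}
```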

The goal of the agent is to find an optimal policy. This is a policy whose value function is, for every state s, no worse than that of any other policy; in the example above, the value function is the state value function V^π(s). The search for such a policy is expressed by the Bellman equation. If P(s' | s, a) and R(s' | s, a) are known for every combination of state s, action a, and destination state s', the optimal value function V^π(s) and policy π can be calculated by dynamic programming (see, for example, R. S. Sutton and A. G. Barto, "Reinforcement Learning", Japanese translation by Sadayoshi Mikami and Masaaki Minagawa, Morikita Publishing, 1998, pp. 94-118). Dynamic programming is a well-known technique, so its details are omitted.

[Model in the present invention]
The present invention applies dynamic programming in the augmented MDP (A-MDP). The A-MDP is, in effect, a subset of the partially observable Markov decision process (POMDP). It assumes that the probability distribution over the agent's state, generally called the belief, can be expressed by the entropy of the state.

This matters because the robot's azimuth affects how uncertain its drift is. When the flow velocity or flow direction changes, the uncertainty in the resulting shift of the robot's position and azimuth can depend on the robot's azimuth angle, so the optimal route can change accordingly. The present invention makes it possible to plan the robot's motion with this effect taken into account at a realistic computational load.

In the A-MDP used in the present invention, states are expressed in a discrete space, as in an MDP. Unlike a conventional MDP, however, the A-MDP adds to each state variables that describe the probability of which state the airship is in (the belief). For this reason, the present invention constructs the state space as a belief space B = {b_1, b_2, …, b_n}. A state b ∈ B in the belief space is b = (X_b, Y_b, Z_b, φ_b, H_Xb, H_Yb, H_Zb, H_φb). The variables have the following meanings:
X_b: discretized X axis; the X coordinate at which the airship's existence probability is highest in state b.
Y_b: discretized Y axis; the Y coordinate at which the airship's existence probability is highest in state b.
Z_b: discretized Z axis; the Z coordinate at which the airship's existence probability is highest in state b.
φ_b: discretized φ axis; the φ coordinate at which the airship's existence probability is highest in state b.
H_Xb: discretized entropy H_X of the probability distribution of the airship's X_b value.
H_Yb: discretized entropy H_Y of the probability distribution of the airship's Y_b value.
H_Zb: discretized entropy H_Z of the probability distribution of the airship's Z_b value.
H_φb: discretized entropy H_φ of the probability distribution of the airship's φ_b value.
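A belief state b = (X_b, Y_b, Z_b, φ_b, H_Xb, H_Yb, H_Zb, H_φb) can be built from per-axis marginal distributions. For each axis, it takes the most probable value plus a discretized entropy. The sketch below is a minimal illustration under these assumptions, none of which is specified in this text:
- The per-axis marginals are independent and given as dicts.
- Entropy is Shannon entropy in nats.
- Entropy is binned with a hypothetical uniform bin width `h_step`.

The frozen dataclass makes belief states hashable, so they can serve as keys in transition and value tables.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BeliefState:
    X_b: int
    Y_b: int
    Z_b: int
    phi_b: int
    H_Xb: int
    H_Yb: int
    H_Zb: int
    H_phib: int


def entropy(dist):
    """Shannon entropy (nats) of a discrete distribution {value: probability}."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0.0)


def axis_summary(dist, h_step):
    """Most probable value on one axis and its entropy discretized into h_step-wide bins."""
    mode = max(dist, key=dist.get)
    return mode, int(entropy(dist) // h_step)


def make_belief(px, py, pz, pphi, h_step=0.25):
    """Combine four per-axis marginals into one discrete belief state."""
    (x, hx), (y, hy), (z, hz), (phi, hphi) = (
        axis_summary(d, h_step) for d in (px, py, pz, pphi)
    )
    return BeliefState(x, y, z, phi, hx, hy, hz, hphi)
```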

A state b ∈ B in the belief space corresponds to a state s = (X_s, Y_s, Z_s, φ_s) in the state space S of the MDP. In the following description, the position coordinates with the highest existence probability in state b are written (X_b, Y_b, Z_b, φ_b). Position coordinates with a lower existence probability are written (X_{b,i}, Y_{b,i}, Z_{b,i}, φ_{b,i}).

Introducing probability into the discrete state space lets the planner account for the fact that the wind in the airship's mission environment does not blow at a constant speed. Instead, the wind speed fluctuates within some range, and in reality it often varies randomly around its mean. Consequently, the displacement of an airship flying under the influence of the wind also varies randomly with every action.

One feature of the present invention is to evaluate, every time an action is selected, the existence probability at each position in the mission environment and to reflect it in the action selection. This evaluation takes into account both the fluctuation in the airship's displacement and the probabilistic measurement accuracy of the sensors that measure its position and attitude.

Entropy is defined here as a quantity that describes how spread out a probability distribution is. Let P(X_{b,i}) be the probability that the airship's X position is X_{b,i}. The entropy H_X of the distribution of the X_b value is then calculated by Equation (1), and H_Xb is the discretized value of H_X.
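Equation (1) itself is not reproduced in this text. The form below assumes the standard Shannon entropy, with the sum running over all candidate X positions. Equations (2) to (4) have the same form for Y, Z, and φ, and the logarithm base is not stated here. For instance, suppose the airship is equally likely to be in either of two adjacent X cells. Then H_X = ln 2 ≈ 0.693 nats (1 bit). A single certain cell gives H_X = 0.

$$H_X = -\sum_i P(X_{b,i}) \log P(X_{b,i})$$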

The other three entropies are defined in the same way.
- If P(Y_{b,i}) is the probability that the airship's Y position is Y_{b,i}, the entropy H_Y of the distribution of the Y_b value is calculated by Equation (2).
- If P(Z_{b,i}) is the probability that its Z position is Z_{b,i}, the entropy H_Z of the distribution of the Z_b value is calculated by Equation (3).
- If P(φ_{b,i}) is the probability that its azimuth angle is φ_{b,i}, the entropy H_φ of the distribution of the φ_b value is calculated by Equation (4).

H_Yb, H_Zb, and H_φb are the discretized values of H_Y, H_Z, and H_φ, respectively.

In A-MDP, as in MDP, the policy π(b) is determined by obtaining the value function V(b) achieved when the optimal action a is executed in state b ∈ B at a given time t. The A-MDP is fully defined once the reward given as a result of an action and the probability of the resulting state transition are defined. The state transition probability and the reward are defined as follows:
P(b′ | b, a): the probability that the state transitions to b′ when action a is selected in state b.
R(b, a): the reward given when action a is selected in state b.
These correspond to P(s′ | s, a) and R(s′ | s, a) in the MDP case.

The value function V^π(b) is obtained by repeated calculation using an update formula. With T denoting the iteration count, the update formula is expressed by Equation (5), where Σ denotes the sum over all b′ and γ is a constant satisfying 0 < γ ≤ 1.

Alternatively, when the reward also depends on the successor state b′, that is, when the reward is R(b′ | b, a), Equation (5.1) is used instead of Equation (5). Here, P(b′ | b, a) is the probability that the state becomes b′ after action a is selected in state b, and R(b′ | b, a) is the reward given when the state transitions to b′ after action a is selected in state b.

V^π_T(b) is obtained by repeating the update of the value function V^π(b) by Equation (5) or (5.1) until |V^π_T(b) − V^π_{T−1}(b)| becomes sufficiently small (for example, until it is equal to or less than a predetermined small value ε). Once the update is complete, the policy π(b) is calculated by Equation (6) or (6.1).
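Equations (5.1) and (6.1) are not reproduced in this text. The following is a minimal sketch assuming they have the standard Bellman form V_T(b) = max_a Σ_{b′} P(b′ | b, a) [R(b′ | b, a) + γ V_{T−1}(b′)], iterated until the largest change over all b falls below ε, with π(b) taken as the maximizing action. The dictionary layout for P and R and the `max_iter` guard (useful when γ = 1) are hypothetical choices for illustration.

```python
def value_iteration(states, actions, P, R, gamma=0.95, eps=1e-6, max_iter=10_000):
    """Iterate V_T(b) = max_a sum_b' P(b'|b,a) * (R(b'|b,a) + gamma * V_{T-1}(b')).

    P[(b, a)] is a dict {b_next: probability}; R[(b, a)] is a dict {b_next: reward}.
    Missing entries are treated as probability 0 / reward 0.
    """
    V = {b: 0.0 for b in states}
    for _ in range(max_iter):
        V_new = {
            b: max(
                sum(p * (R.get((b, a), {}).get(b2, 0.0) + gamma * V[b2])
                    for b2, p in P.get((b, a), {}).items())
                for a in actions
            )
            for b in states
        }
        converged = max(abs(V_new[b] - V[b]) for b in states) < eps
        V = V_new
        if converged:
            break
    return V

def greedy_policy(states, actions, P, R, V, gamma=0.95):
    """pi(b) = argmax_a of the same one-step lookahead used in the update."""
    def q(b, a):
        return sum(p * (R.get((b, a), {}).get(b2, 0.0) + gamma * V[b2])
                   for b2, p in P.get((b, a), {}).items())
    return {b: max(actions, key=lambda a: q(b, a)) for b in states}

# Toy example (hypothetical): from "far", action "go" reaches "goal" with probability 0.8.
states = ["far", "goal"]
actions = ["stay", "go"]
P = {("far", "stay"): {"far": 1.0},
     ("far", "go"): {"goal": 0.8, "far": 0.2},
     ("goal", "stay"): {"goal": 1.0},
     ("goal", "go"): {"goal": 1.0}}
R = {("far", "go"): {"goal": 1.0}}
V = value_iteration(states, actions, P, R)
pi = greedy_policy(states, actions, P, R, V)
print(round(V["far"], 4), pi["far"])  # 0.9877 go
```

In the toy case the fixed point is V(far) = 0.8 / (1 − 0.2γ), which the iteration approaches from below.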

More specifically, Equations (5) and (6) are nothing other than the general value-function and policy-determination equations defined for the partially observable Markov decision process (POMDP), the broader framework that encompasses A-MDP. However, how the state b is defined (that is, which elements b contains) strongly affects the amount of computation required by Equations (5) and (6). The computation is especially heavy when determining the values of P(b′ | b, a) and summing over them. The A-MDP described above greatly reduces the computation of Equations (5) and (6) by defining the belief space B using the entropy concept defined in Equations (1) to (4).

[Embodiment 1]
FIG. 1 shows an example of the functional configuration of the motion planning device according to an embodiment of the present invention.
In this embodiment, for simplicity of explanation, the movement of the autonomous airship is limited to motion in a horizontal plane, so the Z coordinate is not considered. Furthermore, by setting the Y axis of the Cartesian position coordinates parallel to the mean wind direction, the influence of the wind on the airship's movement in the X-axis direction can be assumed to be negligible. Accordingly, each state in the belief space B is described as b = (X_b, Y_b, φ_b, H_Yb, H_φb). Aligning the wind direction with the Y axis in this way removes H_Xb from the variables that make up the state space. When movement in the Z direction is included, the embodiment extends easily: the Z coordinate is processed in the same way as the X and Y coordinates.

Before the autonomous airship is actually controlled, the motion planning device calculates, for every combination of a state b ∈ B in the belief space B = {b_1, b_2, ..., b_n} and an action a ∈ A = {a_1, a_2, ..., a_m}, the transition probability P(b′ | b, a) of moving to state b′ when action a is selected in state b, together with the corresponding reward R(b′ | b, a). The motion planning device then calculates the value function V^π(b) and the policy π(b, a) from these transition probabilities and rewards, and stores the results in the storage unit. Accordingly, the "current position", "destination position coordinates", "measured position coordinates", and so on of the autonomous airship mentioned in this embodiment denote position coordinates in the simulation, not the actual position coordinates of the airship.

<Storage unit>
The storage unit 50 stores in advance the set of actions A = {a_1, a_2, ..., a_m} that the autonomous airship can take.
The storage unit 50 also stores in advance the set of discrete states (the belief space) B = {b_1, b_2, ..., b_n} among which the autonomous airship can transition.

Here, b = (X_b, Y_b, φ_b, H_Yb, H_φb) ∈ B. X_b and Y_b denote the intervals obtained by discretizing the ranges of X and Y coordinates that the autonomous airship can reach. For example, if the operating range in the X direction is 0 to 1 km and it is discretized into four parts, the intervals are X_1 = [0 km, 0.25 km], X_2 = [0.25 km, 0.5 km], X_3 = [0.5 km, 0.75 km], and X_4 = [0.75 km, 1 km], and X_b is one of X_1 to X_4.
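The interval discretization in the example above can be sketched as follows. The helper names are hypothetical; the sketch splits the operating range into equal intervals, finds the interval containing a continuous coordinate, and returns the interval's center (the "center point" X_bc used later in Step S2). Indices are 0-based, so index 2 corresponds to X_3 in the text's 1-based naming.

```python
def make_intervals(lo, hi, n):
    """Split [lo, hi] into n equal intervals [(start, end), ...]."""
    width = (hi - lo) / n
    return [(lo + k * width, lo + (k + 1) * width) for k in range(n)]

def interval_index(x, intervals):
    """Index of the interval containing x (the upper end of the last interval is inclusive)."""
    for k, (start, end) in enumerate(intervals):
        if start <= x < end:
            return k
    if x == intervals[-1][1]:
        return len(intervals) - 1
    raise ValueError(f"{x} is outside the operating range")

def center(interval):
    """Center point of an interval, e.g. X_bc for X_b."""
    start, end = interval
    return 0.5 * (start + end)

X = make_intervals(0.0, 1.0, 4)  # km
print(X)                         # [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]
print(interval_index(0.6, X))    # 2, i.e. X_3 in the text's naming
print(center(X[2]))              # 0.625
```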

The discrete azimuth values φ_i are obtained by discretizing the range of possible azimuth angles (0 to 2π) into intervals and labeling them φ_1, φ_2, ..., φ_{Dφ} in order, starting from the one nearest 0.

H_Yi and H_φi are the possible discrete entropy values. They are set in advance, taking into account factors such as the accuracy of the measurement sensors, as the values that the spread of the existence-probability distributions of Y_i and φ_i can take. For example, the maximum entropy values H^max_Y and H^max_φ may be given in advance, and each H_Yi and H_φi determined by discretizing the range from 0 to H^max_Y (or H^max_φ).
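Putting the pieces together, the belief space B of this embodiment is the Cartesian product of the X, Y, φ, H_Y, and H_φ bins. The sketch below represents each b as a tuple of bin indices and, as one hypothetical way to "discretize the range from 0 to H^max", splits it into equal-width entropy bins. The bin counts in the example are made up for illustration.

```python
import itertools

def belief_space(n_x, n_y, n_phi, n_hy, n_hphi):
    """Enumerate every discrete belief state b = (X_b, Y_b, phi_b, H_Yb, H_phib)
    as a tuple of bin indices."""
    return list(itertools.product(range(n_x), range(n_y), range(n_phi),
                                  range(n_hy), range(n_hphi)))

def entropy_bin_centers(h_max, n_bins):
    """Centers of n_bins equal-width entropy bins covering [0, h_max] (assumed binning)."""
    width = h_max / n_bins
    return [(k + 0.5) * width for k in range(n_bins)]

# Hypothetical sizes: 4 X bins, 4 Y bins, 8 azimuth bins, 3 bins each for H_Y and H_phi.
B = belief_space(4, 4, 8, 3, 3)
print(len(B))  # 1152
```

The product grows quickly with each added dimension, which is why removing H_Xb by aligning the Y axis with the wind direction is worthwhile.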

○ Step S1
The state/action selection unit 1 selects from the storage unit a pair (b, a) of a state b ∈ B and an action a ∈ A that has not yet been selected. Because the subsequent calculations are performed for every combination of state and action, the pair may be chosen arbitrarily from the unselected combinations.

○ Step S2
The current position coordinate determination unit 2 determines the position (X, Y, φ) of the autonomous airship under the assumption that it is in the state b selected by the state/action selection unit 1.

First, Equation (7) defines the probability P(X_{b,i}, Y_{b,i}, φ_{b,i} | b) that, in state b, the autonomous airship is in the state (X_{b,i}, Y_{b,i}, φ_{b,i}).

In Equation (7), P(X_{b,i} | b), P(Y_{b,i} | b), and P(φ_{b,i} | b) are defined by Equations (8), (9), and (10). The range of integration is the whole of interval X_{b,i} in Equation (8), the whole of interval Y_{b,i} in Equation (9), and the whole of interval φ_{b,i} in Equation (10).

For example, if X_{b,i} = [0, 0.25], the integral in Equation (8) is taken over X from 0 to 0.25.

Further, Equation (11) defines the probability distribution p(X, Y, φ | b) that, in state b, the autonomous airship is at coordinates (X, Y, φ).

Here, the probability distributions p(X | b), p(Y | b), and p(φ | b) are defined by Equations (12), (13), and (14).

X, Y, and φ are variables; X_bc is the X coordinate of the center of X_b, Y_bc is the Y coordinate of the center of Y_b, and φ_bc is the φ coordinate of the center of φ_b. The values of σ_X, σ_Y, and σ_φ used in Equations (12), (13), and (14) strongly influence the result of the motion planning calculation.

The current position coordinate determination unit 2 determines the position (X, Y, φ) of the autonomous airship using either Equation (7) or Equation (11). When Equation (7) is used, a sample state (X_{b,i}, Y_{b,i}, φ_{b,i}) is drawn with the probability given by P(X_{b,i}, Y_{b,i}, φ_{b,i} | b), and the center of the drawn sample state is taken as the airship's position (X, Y, φ). When Equation (11) is used, a sample point (X, Y, φ) is drawn with the probability given by p(X, Y, φ | b) and taken as the airship's position.

Even when the position of the autonomous airship is determined using Equation (11), the value of Equation (7) is needed in later processing (by the existence probability calculation unit), so Equation (7) is also computed at this point.
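The Equation (7) branch of Step S2 can be sketched as below. Equations (8) to (14) are not reproduced in this text, so the sketch assumes p(X | b), p(Y | b), and p(φ | b) are Gaussians centered on X_bc, Y_bc, φ_bc with standard deviations σ_X, σ_Y, σ_φ, integrates each over a cell with the normal CDF (`math.erf`), renormalizes over the grid, multiplies the three axes as in Equation (7), and draws one cell with `random.choices`. Ignoring wrap-around of the azimuth at 2π is a further simplification. The joint table is returned because, as stated above, Equation (7) is needed again later.

```python
import itertools
import math
import random

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def cell_probs(intervals, mu, sigma):
    """P(cell_i | b): Gaussian mass of each interval, renormalized over the grid."""
    mass = [normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma) for lo, hi in intervals]
    total = sum(mass)
    return [m / total for m in mass]

def sample_pose(x_cells, y_cells, phi_cells, centers, sigmas, rng=random):
    """Draw a cell with probability P(X_i|b) P(Y_j|b) P(phi_k|b) and return its center.

    Returns ((X, Y, phi), joint) where joint maps (i, j, k) to the cell probability.
    """
    px = cell_probs(x_cells, centers[0], sigmas[0])
    py = cell_probs(y_cells, centers[1], sigmas[1])
    pphi = cell_probs(phi_cells, centers[2], sigmas[2])  # no wrap-around at 2*pi
    joint = {(i, j, k): px[i] * py[j] * pphi[k]
             for i, j, k in itertools.product(range(len(px)), range(len(py)),
                                              range(len(pphi)))}
    cells = list(joint)
    i, j, k = rng.choices(cells, weights=[joint[c] for c in cells], k=1)[0]

    def mid(iv):
        return 0.5 * (iv[0] + iv[1])

    return (mid(x_cells[i]), mid(y_cells[j]), mid(phi_cells[k])), joint

xs = [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]          # km, as in the X example
phis = [(k * math.pi / 4, (k + 1) * math.pi / 4) for k in range(8)]  # 8 azimuth bins
pose, joint = sample_pose(xs, xs, phis, centers=(0.625, 0.375, 3 * math.pi / 8),
                          sigmas=(0.1, 0.2, 0.3), rng=random.Random(0))
print(round(sum(joint.values()), 6))  # 1.0
```

The σ values here are placeholders; how they are tied to the entropy components of b is the subject of the calibration described next.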

In this embodiment, because the mean wind direction is parallel to the Y axis, H_Xb is not included as a component of b. σ_X is therefore fixed at a constant value that reflects the sensor measurements and the performance of the filter used, and it is mainly σ_Y and σ_φ whose values are chosen. When Equation (7) is used, they are chosen so as to satisfy Equations (15) and (16), where H_Ybc is the H_Y coordinate of the center of state H_Yb and H_φbc is the H_φ coordinate of the center of state H_φb.

The values of σ_Y and σ_φ that satisfy Equations (15) and (16) are determined in advance for every b, before this processing is executed. For σ_Y, for example, a small initial value is set, the right-hand side of Equation (15) is computed, and the result is compared with H_Ybc: if the right-hand side is smaller than H_Ybc, σ_Y is increased by a small amount; if it is larger, σ_Y is decreased by a small amount. Repeating this comparison yields the required σ_Y.
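A minimal sketch of this calibration, assuming Equation (15) equates H_Ybc with the entropy of the Gaussian p(Y | b) discretized onto the Y cells. Starting from a small σ and stepping upward, the first σ whose discretized entropy reaches the target lies within one step of the solution. This relies on the discretized entropy growing with σ, which is assumed here (it holds for the symmetric example below). The step size, starting value, and grid are hypothetical.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def grid_entropy(intervals, mu, sigma):
    """Entropy of N(mu, sigma^2) discretized onto the intervals and renormalized."""
    mass = [normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma) for lo, hi in intervals]
    total = sum(mass)
    return sum(-(m / total) * math.log(m / total) for m in mass if m > 0.0)

def calibrate_sigma(intervals, mu, target_h, sigma0=1e-3, step=1e-3, max_steps=100_000):
    """Increase sigma in small steps until the discretized entropy reaches target_h."""
    sigma = sigma0
    for _ in range(max_steps):
        if grid_entropy(intervals, mu, sigma) >= target_h:
            return sigma
        sigma += step
    raise ValueError("target entropy not reachable on this grid")

# 8 unit-width Y cells; the belief is centered on cell [4, 5]; target H_Ybc = 1.0 nat.
cells = [(k, k + 1.0) for k in range(8)]
sigma = calibrate_sigma(cells, 4.5, target_h=1.0)
print(round(grid_entropy(cells, 4.5, sigma), 2))
```

The target must stay below log(number of cells), the entropy of a uniform distribution over the grid, or no σ can reach it.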

When Equation (11) is used, σ_Y and σ_φ are determined by Equations (17) and (18).

Here, the integration range of Equation (17) is the whole Y range and that of Equation (18) is the whole φ range. σ_Y and σ_φ are determined in the same way as the values satisfying Equations (15) and (16).

○ Step S3
The destination position coordinate selection unit 3 determines the post-movement position (X′, Y′, φ′) of the autonomous airship with the probability given by the state transition probability distribution p(X′ − X, Y′ − Y, φ′ − φ | φ_{b,i}, a). This distribution is chosen to reflect the mean wind speed and the magnitude of its fluctuation in the mission environment. The value of p(X′ − X, Y′ − Y, φ′ − φ | φ_{b,i}, a) is determined by Equation (19).

Here, X′, Y′, and φ′ are variables, and Equations (20), (21), and (22) hold.

D_X(φ_{b,i}, a), D_Y(φ_{b,i}, a), and D_φ(φ_{b,i}, a) are the mean displacements in X, Y, and φ when action a is selected in azimuth state φ_{b,i} (see Japanese Patent No. 4406436). Because the mean wind speed must be allowed to differ at each position (X, Y), D_X(φ_{b,i}, a), D_Y(φ_{b,i}, a), and D_φ(φ_{b,i}, a) also take X and Y as variables.

The values of σ_Xt, σ_Yt, and σ_φt depend on how much the wind speed varies. If the wind speed is constant, the displacement of the autonomous airship for each action does not deviate much from the predicted displacement, because the mean wind speed is already taken into account when D_X, D_Y, and D_φ are calculated. However, if the wind speed changes abruptly during an action, the airship is pushed by the change in wind speed and its displacement acquires a corresponding error. Equations (20), (21), and (22) account for the spread of this error.

The values of σ_Xt, σ_Yt, and σ_φt vary with the spread of the wind speed at each position in the mission environment; in general they are functions of X and Y. In this embodiment the mean wind direction is parallel to the Y axis, so the influence of the wind can be considered to appear mainly in the Y direction, and σ_Xt may be a constant. The magnitude of σ_Xt can also be regarded as reflecting the spread of the wind direction: if the wind direction varies widely, σ_Xt should be increased accordingly, and if the spread of the wind direction differs from position to position, σ_Xt is also a function of X and Y. The values of σ_Yt and σ_φt depend on the azimuth angle φ. In general, an autonomous airship is most affected by wind striking it from the side and least affected by wind striking it head-on.

Now suppose that the Y axis coincides with the wind direction and that wind blows throughout the environment in which the autonomous airship flies. Let S(φ) be the cross-sectional area of the airship as seen from the Y direction. With F_Y(S(φ)) and F_φ(S(φ)) as non-negative, monotonically increasing functions of S(φ), and F_σ(X, Y) as a monotonically increasing function of the magnitude of the wind-speed variation at position (X, Y), the values of σ_Yt and σ_φt are determined by Equation (23). For simplicity, it is also useful to treat σ_φt as a constant.

For simplicity, if the degree of wind-speed variation is taken to be constant throughout the mission environment, F_σ is a constant; and if the mean wind speed is also taken to be constant throughout the mission environment, Equations (20), (21), and (22) simplify as follows.
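For the simplified case of constant mean wind and constant gust variability, Step S3 can be sketched as drawing each coordinate from a Gaussian around "current value plus mean displacement". That Equations (20) to (22) are Gaussian is an assumption here, as is the tuple-based interface. The caller supplies (D_X, D_Y, D_φ) for the chosen action and heading, and (σ_Xt, σ_Yt, σ_φt), which may already encode the heading dependence of Equation (23).

```python
import math
import random

def sample_motion(pose, mean_disp, sigmas, rng=random):
    """Sample the post-action pose (X', Y', phi') for one action.

    pose      -- (X, Y, phi) before the action
    mean_disp -- (D_X, D_Y, D_phi): mean displacement for this action and heading,
                 already including the drift caused by the mean wind
    sigmas    -- (sigma_Xt, sigma_Yt, sigma_phit): spread caused by wind fluctuation
    """
    x, y, phi = pose
    dx, dy, dphi = mean_disp
    sx, sy, sphi = sigmas
    x2 = rng.gauss(x + dx, sx)
    y2 = rng.gauss(y + dy, sy)
    phi2 = rng.gauss(phi + dphi, sphi) % (2.0 * math.pi)  # keep heading in [0, 2*pi)
    return x2, y2, phi2

rng = random.Random(42)
print(sample_motion((0.1, 0.2, 0.0), (0.05, 0.02, 0.1), (0.01, 0.05, 0.02), rng))
```

With all σ set to 0 the function returns the deterministic "mean wind only" displacement; larger σ_Yt widens the spread along the wind axis, matching the intuition in the text.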

○ Step S4
The measurement position coordinate selection unit 4 determines the measured post-movement position coordinates (M_X′, M_Y′, M_φ′) of the autonomous airship with the probability given by the sensor measurement probability distribution p(M_X′, M_Y′, M_φ′ | X′, Y′, φ′), thereby accounting for the error between the post-movement position estimated by the destination position coordinate selection unit 3 and the position actually measured by the sensors. The value of p(M_X′, M_Y′, M_φ′ | X′, Y′, φ′) is determined by Equation (24), and each probability on the right-hand side of Equation (24) is defined by Equations (25), (26), and (27).

The values of σ_MX, σ_MY, and σ_Mφ are determined according to the accuracy of the sensors mounted on the autonomous airship. If GPS (Global Positioning System) is used as the position sensor and a gyro sensor as the azimuth sensor, σ_MX and σ_MY are set to values that are constant throughout the environment. If a wireless-LAN-based position sensor is used, the accuracy differs somewhat from position to position in the environment, so σ_MX and σ_MY are functions of X and Y.
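Step S4 and the likelihood later used in the correction step can be sketched with independent Gaussian sensor errors per axis. Treating Equations (25) to (27) as Gaussians with standard deviations σ_MX, σ_MY, σ_Mφ is an assumption. Position-dependent accuracy (the wireless-LAN case) would simply mean passing different `sensor_sigmas` per call. For the heading, the shortest angular difference is used so that readings just above 0 and just below 2π are treated as close.

```python
import math
import random

def simulate_measurement(true_pose, sensor_sigmas, rng=random):
    """Draw a sensor reading (M_X', M_Y', M_phi') around the true post-action pose.

    sensor_sigmas = (sigma_MX, sigma_MY, sigma_Mphi).
    """
    x, y, phi = true_pose
    smx, smy, smphi = sensor_sigmas
    return (rng.gauss(x, smx), rng.gauss(y, smy),
            rng.gauss(phi, smphi) % (2.0 * math.pi))

def measurement_likelihood(meas, pose, sensor_sigmas):
    """p(M | X', Y', phi') as a product of three independent Gaussian densities."""
    dens = 1.0
    for idx, (m, v, s) in enumerate(zip(meas, pose, sensor_sigmas)):
        d = m - v
        if idx == 2:  # heading: use the shortest angular difference in [-pi, pi)
            d = (d + math.pi) % (2.0 * math.pi) - math.pi
        dens *= math.exp(-0.5 * (d / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
    return dens

reading = simulate_measurement((0.4, 0.6, 1.0), (0.01, 0.01, 0.02), random.Random(3))
print(measurement_likelihood(reading, (0.4, 0.6, 1.0), (0.01, 0.01, 0.02)) > 0.0)  # True
```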

○ Step S5
The existence probability calculation unit 5 takes the measured position of the autonomous airship to be (M_X′, M_Y′, M_φ′), and the probability that the airship was at position coordinates (X_{b,i}, Y_{b,i}, φ_{b,i}) in state b, before action a was selected, to be P(X_{b,i}, Y_{b,i}, φ_{b,i} | b), and computes the post-action state b′ using a discrete Bayesian filter. First, for every transition-destination state (X_{b,i′}, Y_{b,i′}, φ_{b,i′}), the existence probability calculation unit 5 uses Equation (28) to calculate the predicted probability (existence probability prediction) P′(X_{b,i′}, Y_{b,i′}, φ_{b,i′}) that the airship is in that state.

In Equation (28), P(X_{b,i′}, Y_{b,i′}, φ_{b,i′} | X_{b,i}, Y_{b,i}, φ_{b,i}, a) is given by Equation (29).

In Equation (28), Σ is the sum over all (X_{b,i}, Y_{b,i}, φ_{b,i}). ∫_i denotes integration over the whole of the pre-transition state i, and ∫_{i′} denotes integration over the whole of the post-transition state i′.

Next, the existence probability calculation unit 5 corrects the calculated P′(X_{b,i′}, Y_{b,i′}, φ_{b,i′}) using Equation (30), where η is a normalization coefficient.

Here, (X_{b,i′c}, Y_{b,i′c}, φ_{b,i′c}) is the center of the state (X_{b,i′}, Y_{b,i′}, φ_{b,i′}).

Then, among the calculated values of P(X_{b,i′}, Y_{b,i′}, φ_{b,i′} | b′), the state (X^max_{b′}, Y^max_{b′}, φ^max_{b′}) that maximizes P(X_{b,i′}, Y_{b,i′}, φ_{b,i′} | b′) is taken as the central state of b′. The entropy components H_Yb′ and H_φb′ of state b′ are determined from the entropies H_Y′ and H_φ′ calculated by Equations (31) and (32), and b′ is updated to b′ = (X^max_{b′}, Y^max_{b′}, φ^max_{b′}, H_Yb′, H_φb′). Here Σ denotes the sum over all Y_b or all φ_b.
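The predict/correct/summarize cycle of Step S5 can be sketched on a one-dimensional grid over Y only; the X and φ axes are handled the same way, and the full filter runs over the three-dimensional grid. Two simplifications are assumed: the cell-to-cell transition of Equation (29), which integrates over cells, is approximated by evaluating a Gaussian motion density at cell centers and renormalizing per source cell, and the correction of Equation (30) uses a Gaussian measurement likelihood evaluated at each cell center Y_{b,i′c}.

```python
import math

def gauss_pdf(d, s):
    """Density of N(0, s^2) at d."""
    return math.exp(-0.5 * (d / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def bayes_step(prior, centers, mean_disp, sigma_motion, meas, sigma_meas):
    """One predict/correct cycle of a discrete Bayes filter on a 1-D grid.

    prior        -- P(Y_i | b) for each cell i (sums to 1)
    centers      -- cell-center coordinates Y_ic
    mean_disp    -- mean displacement D_Y for the chosen action
    sigma_motion -- sigma_Yt; meas -- reading M_Y'; sigma_meas -- sigma_MY
    """
    n = len(centers)
    # Predict (cf. Eq. 28): P'(j) = sum_i P(j | i, a) P(i).
    predicted = [0.0] * n
    for i in range(n):
        w = [gauss_pdf(centers[j] - centers[i] - mean_disp, sigma_motion) for j in range(n)]
        total = sum(w)
        for j in range(n):
            predicted[j] += prior[i] * w[j] / total
    # Correct (cf. Eq. 30): weight by the measurement likelihood, normalize with eta.
    post = [predicted[j] * gauss_pdf(meas - centers[j], sigma_meas) for j in range(n)]
    eta = 1.0 / sum(post)
    post = [eta * p for p in post]
    # Central state = most probable cell; its entropy gives the H_Yb' component (cf. Eq. 31).
    center_idx = max(range(n), key=lambda j: post[j])
    entropy = sum(-p * math.log(p) for p in post if p > 0.0)
    return post, center_idx, entropy

centers = [k + 0.5 for k in range(10)]
prior = [0.0] * 10
prior[2] = 1.0                       # airship known to be in cell 2 before the action
post, idx, h = bayes_step(prior, centers, mean_disp=1.0, sigma_motion=0.5,
                          meas=3.6, sigma_meas=0.3)
print(idx)  # 3
```

The returned entropy would then be discretized into one of the H_Y bins to form the H_Yb′ component of b′.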

○ Step S6
The transition probability calculation unit 6 increases the transition probability P(b′ | b, a) and the reward R(b′ | b, a) for the state b′ computed by the existence probability calculation unit 5 by predetermined amounts (N is the number of repetitions). As a result, after N simulations for a given pair (b, a), the post-movement states b′ that were estimated most often receive high transition probabilities and rewards. The transition probability P(b′ | b, a) is incremented according to Equation (33).

The reward value is incremented according to one of Equations (34), (35), (36), and (37).

Equation (34) is used when the reward is determined from the result of the Bayesian filter; Equation (35) when it is determined from the measurements of the sensors mounted on the airship; and Equation (36) when the reward is judged by a third party other than the autonomous airship (a human operator) who knows the airship's true position. Equation (37) is used when the reward is judged from the action start position at the time of action selection.

r(X, Y, φ) is, for example, the following function.

○ Step S7
The repetition control unit 7 determines whether steps S2 to S6 have been repeated N times for the pair (b, a) selected by the state/action selection unit 1. If fewer than N repetitions have been performed, control returns to the current position coordinate determination unit 2, and steps S2 to S6 are executed again. When N repetitions have been performed, control returns to the state/action selection unit 1, and steps S2 to S6 are executed for a belief state and action pair that has not yet been selected. When steps S2 to S6 have been performed for all belief state and action pairs, the processing ends.
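Steps S1 to S7 amount to a Monte Carlo estimate of the transition model. The sketch below loops over every (b, a) pair, runs a caller-supplied simulation N times, and turns the visit counts into P(b′ | b, a) by adding 1/N per run (one reading of Equation (33)). Equations (34) to (37) are not reproduced here; the sketch assumes R(b′ | b, a) is stored as the average of a reward function r over the runs that landed in b′, so that it can be used directly as the per-transition reward in a Bellman update. `simulate_once`, `reward_fn`, and the toy model are hypothetical.

```python
import random
from collections import defaultdict

def estimate_model(states, actions, simulate_once, reward_fn, n_runs):
    """Monte Carlo estimate of P(b'|b,a) and R(b'|b,a) (Steps S1 to S7).

    simulate_once(b, a) -> (b_next, pose) stands in for Steps S2 to S5.
    Returns dicts P[(b, a)] = {b_next: prob} and R[(b, a)] = {b_next: mean reward}.
    """
    P, R = {}, {}
    for b in states:                      # Step S1: every (b, a) pair
        for a in actions:
            counts = defaultdict(int)
            reward_sum = defaultdict(float)
            for _ in range(n_runs):       # Step S7: repeat Steps S2 to S6 N times
                b_next, pose = simulate_once(b, a)
                counts[b_next] += 1       # Step S6: each run adds 1/N to P(b'|b,a)
                reward_sum[b_next] += reward_fn(pose)
            P[(b, a)] = {b2: c / n_runs for b2, c in counts.items()}
            R[(b, a)] = {b2: reward_sum[b2] / c for b2, c in counts.items()}
    return P, R

rng = random.Random(1)

def toy_simulate(b, a):
    """Hypothetical stand-in: from 'far', 'go' reaches the goal 70% of the time."""
    if b == "far" and a == "go" and rng.random() < 0.7:
        return "goal", "goal"
    return b, b

P, R = estimate_model(["far", "goal"], ["stay", "go"], toy_simulate,
                      reward_fn=lambda pose: 1.0 if pose == "goal" else 0.0,
                      n_runs=5000)
print(P[("far", "stay")])  # {'far': 1.0}
```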

○ Step S8
The value function calculation unit 8 calculates the value function V^π(b) by Equation (5.1), using the transition probabilities P(b′ | b, a) and rewards R(b′ | b, a) obtained by the above processing. The value of V^π_T(b) is updated with increasing T until |V^π_T(b) − V^π_{T−1}(b)| becomes smaller than a predetermined threshold ε.

○ Step S9
The policy calculation unit 9 obtains the policy π(b, a) by Equation (6.1), using the transition probabilities P(b′ | b, a), rewards R(b′ | b, a), and value function V^π(b) obtained by the above processing.

[Modification 1]
In steps S1 to S6 above, P(b′ | b, a) was evaluated for every belief b = (X_b, Y_b, φ_b, H_Yb, H_φb) in the belief space and for every belief b′ that could result from action a. Doing this over the entire space can require an enormous amount of computation. Fortunately, in the case of an autonomous airship, it is known that an airship in a given state (X_b, Y_b, φ_b) moves, in a single action, to one of the surrounding states that are not far from it in X, Y, and φ. Moreover, if the mean wind speed and the amount of wind-speed variation can be assumed to be constant at every position in the mission environment, the differences in the X, Y, and φ coordinates produced by that movement are determined only by the action and the azimuth angle. Exploiting this, P(b′ | b, a) need only be evaluated for beliefs b′ that are possible transition destinations, which greatly reduces the amount of computation.

When evaluating P(b′ | b, a), X_b and Y_b are held fixed by setting b = (0, 0, φ_b, H_Yb, H_φb), and the transition probability to b′ is considered for every possible value of the azimuth angle and the entropies only. If the maximum distances the autonomous airship can move in one action along X, Y, and φ are d^max_Xb, d^max_Yb, and d^max_φb (measured in grid cells), the values of X, Y, and φ that b′ can take fall within a lattice whose widths in the X, Y, and φ directions are 2 × d^max_Xb + 1, 2 × d^max_Yb + 1, and 2 × d^max_φb + 1, respectively. P(b′ | b, a) therefore needs to be evaluated only for beliefs b′ within this lattice.

As a result, P(b′ | b, a) is stored as array data of the variable P in the following form.

d_Xb′ and d_Yb′ are the changes in X_b and Y_b. The array indices can take the following ranges.

Here, φ^max_b is the number of states of φ_b, H^max_Yb is the number of states of H_Yb, H^max_φb is the number of states of H_φb, and a^max is the number of actions. Movement in the XY plane and the change in azimuth φ can also be simplified as independent events. Using, in addition, the fact that the azimuth displacement then depends only on the action and not on the original azimuth, the array data for P(b′ | b, a) can also be organized as follows.

Here, d_φb′ is the displacement of the azimuth angle φ_b and satisfies the following.

Using the data stored as described above, and writing b = (X_b, Y_b, φ_b, H_Yb, H_φb) and b′ = (X_b′, Y_b′, φ_b′, H_Yb′, H_φb′), one of the following holds.
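The translation-invariant storage of Modification 1 (the first layout, with absolute φ_b′) can be sketched as a table keyed by (φ_b, H_Yb, H_φb, a, d_X, d_Y, φ_b′, H_Yb′, H_φb′). Any pair of beliefs with the same offsets and the same azimuth and entropy components then shares one entry, and offsets beyond the one-step reach return probability 0. A dictionary is used here instead of a dense array; the class and method names are hypothetical.

```python
class RelativeTransitionTable:
    """Translation-invariant storage of P(b'|b,a) keyed by position offsets.

    Only offsets with |d_X| <= d_max_x and |d_Y| <= d_max_y are stored, i.e. the
    (2*d_max_x + 1) x (2*d_max_y + 1) lattice around the current cell.
    """

    def __init__(self, d_max_x, d_max_y):
        self.d_max_x = d_max_x
        self.d_max_y = d_max_y
        self.table = {}

    def _key(self, b, a, b_next):
        x, y, phi, hy, hphi = b
        x2, y2, phi2, hy2, hphi2 = b_next
        dx, dy = x2 - x, y2 - y
        if abs(dx) > self.d_max_x or abs(dy) > self.d_max_y:
            return None  # unreachable in one action: probability 0
        return (phi, hy, hphi, a, dx, dy, phi2, hy2, hphi2)

    def add(self, b, a, b_next, p):
        key = self._key(b, a, b_next)
        if key is None:
            raise ValueError("transition exceeds the one-step reach d_max")
        self.table[key] = self.table.get(key, 0.0) + p

    def prob(self, b, a, b_next):
        key = self._key(b, a, b_next)
        return 0.0 if key is None else self.table.get(key, 0.0)

# Evaluate once from the origin cell (X_b, Y_b) = (0, 0) ...
T = RelativeTransitionTable(d_max_x=2, d_max_y=3)
T.add((0, 0, 2, 1, 1), 0, (1, 3, 2, 1, 0), 0.4)
# ... and reuse the entry anywhere else in the grid.
print(T.prob((5, 7, 2, 1, 1), 0, (6, 10, 2, 1, 0)))  # 0.4
```

This reuse is only valid under the stated assumption that the mean wind speed and its variation are the same at every position.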

[Modification 2]
When V^π_T(b) is calculated by Equation (5.1) in step S8, if P(b′ | b, a) is 0 for every a for a given pair (b, b′), the corresponding term on the right-hand side of Equation (5.1) is 0 and computing it is pointless. The following flags are therefore used.

All these flags are initialized to 0 before the belief transition probabilities are calculated. Then, whenever P(b′ | b, a) or R(b′ | b, a) is incremented by Equations (33) to (36) in step S6, the flag corresponding to d_Xb′, d_Yb′, and d_φb′ is set to 1.

When the calculation of Equation (5) is actually performed, the flag corresponding to the differences in the X, Y, and φ coordinates between b and b′ and to the value of a is consulted; if the flag is 0, the calculation of Equation (5) is skipped for all entropy values of b and b′.
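A minimal sketch of the flag mechanism, assuming the flags are kept as a set of (d_X, d_Y, d_φ, a) keys rather than a 0/1 array, and that azimuth offsets wrap around the φ grid (modulo the number of azimuth bins). Recording a key during step S6 corresponds to setting a flag to 1. When evaluating the sum for (b, a), only successors reached through a flagged offset are generated, for every entropy value at once. Successors that fall outside the X/Y grid still need to be discarded by the caller. All names are hypothetical.

```python
def build_flags(increments, n_phi):
    """increments: iterable of (b, a, b_next) recorded in Step S6.

    Returns the set of flagged (d_X, d_Y, d_phi, a) keys (flag value 1).
    """
    flags = set()
    for b, a, b_next in increments:
        flags.add((b_next[0] - b[0], b_next[1] - b[1], (b_next[2] - b[2]) % n_phi, a))
    return flags

def candidate_successors(b, a, flags, n_phi, n_hy, n_hphi):
    """Successor beliefs b' worth summing over for (b, a).

    Unflagged offsets are skipped for all entropy values together.
    """
    x, y, phi = b[0], b[1], b[2]
    out = []
    for dx, dy, dphi, a2 in flags:
        if a2 != a:
            continue
        for hy in range(n_hy):
            for hphi in range(n_hphi):
                out.append((x + dx, y + dy, (phi + dphi) % n_phi, hy, hphi))
    return out

flags = build_flags([((0, 0, 0, 0, 0), "go", (1, 0, 1, 0, 0))], n_phi=8)
print(len(candidate_successors((5, 5, 7, 1, 1), "go", flags, 8, 2, 2)))  # 4
```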

Using Modifications 1 and 2 together reduces the cost of computing the value function by dynamic programming by a factor of almost ten.

[Embodiment 2]
Next, the behavior control device of Embodiment 2 is described. It actually controls the behavior of the acting agent using the value function V^π(b) and the policy π(b,a) created by the operation planning device.

Each time it selects an action, the behavior control device applies a Bayesian filter to the obtained measurements (M_X, M_Y, M_φ). From this it computes the probability distribution P(X_b, Y_b, φ_b), which gives the probability that the autonomous airship is at each position. A Kalman filter, a particle filter, or the like can serve as the Bayesian filter, but a discrete Bayesian filter is the simplest choice for the present invention, for two reasons. First, the belief space b consists of a discrete space (X_b, Y_b, φ_b, H_Yb, H_φb), so the definition of the discrete Bayesian filter can be used as is. Second, the simulations performed during operation planning also use a discrete Bayesian filter, which minimizes the discrepancy between the actual mission and the assumptions of the planning computation.

○ Step S11
The position acquisition unit 11 acquires the airship's own position and azimuth (M_X, M_Y, M_φ), as measured by the position and azimuth sensor 61 mounted on the autonomous airship.

○ Step S12
The state transition probability calculation unit 12 calculates, by Equation (19), the value of the state transition probability p(X′−X, Y′−Y, φ′−φ | φ_b, a) used in the filtering.
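Equation (19) is not reproduced in this excerpt. As a hedged illustration only, the sketch below assumes a discretized normal distribution (the supplementary note states that the embodiments use normal distributions). The distribution is centred on the displacement commanded by the action.

The action encoding `(v, dk)`, the grid resolution, `sigma`, and `radius` are all hypothetical. Following the simplification described for Modification 1, the azimuth change depends only on the action.

```python
import math


def transition_kernel(k, action, n_phi=8, sigma=0.7, radius=2):
    """Hypothetical stand-in for Equation (19): p(dX, dY, dphi | phi_b, a).

    k      -- current azimuth bin, 0 <= k < n_phi (phi = 2*pi*k/n_phi)
    action -- assumed encoding (v, dk): move v cells along the heading,
              then turn by dk azimuth bins
    Wind uncertainty is modelled as an isotropic Gaussian with standard
    deviation `sigma` (in cells) around the nominal displacement.
    Returns {(dx, dy, dk): probability}, normalised to sum to 1.
    """
    v, dk = action
    phi = 2.0 * math.pi * k / n_phi
    mean_x, mean_y = v * math.cos(phi), v * math.sin(phi)
    cx, cy = round(mean_x), round(mean_y)
    weights = {}
    # Only offsets within `radius` cells of the nominal displacement get
    # probability mass, i.e. states reachable by a single action.
    for dx in range(cx - radius, cx + radius + 1):
        for dy in range(cy - radius, cy + radius + 1):
            d2 = (dx - mean_x) ** 2 + (dy - mean_y) ** 2
            weights[(dx, dy, dk)] = math.exp(-d2 / (2.0 * sigma ** 2))
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}
```

For example, with heading bin 0 (φ = 0) and action `(2, 1)`, the most probable offset is two cells along +X with a one-bin turn.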

○ Step S13
The measurement probability calculation unit 13 calculates the measurement probability p(M_X′, M_Y′, M_φ′ | X′, Y′, φ′) by Equation (24). The values acquired by the position acquisition unit are used for M_X′, M_Y′, and M_φ′ in Expressions (25) to (27).
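Equations (24) to (27) are not shown here. A minimal sketch of a measurement likelihood, assuming the sensor error is an independent zero-mean normal distribution in X, Y, and azimuth, could look like the block below. The azimuth residual is wrapped so that readings just below 2π and just above 0 are treated as close.

The grid size `cell`, `n_phi`, and both standard deviations are hypothetical parameters.

```python
import math


def wrap_angle(a):
    """Map an angle difference into [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def gaussian_pdf(e, sigma):
    """Zero-mean normal density evaluated at error e."""
    return math.exp(-0.5 * (e / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def measurement_likelihood(meas, state, cell=1.0, n_phi=8,
                           sigma_xy=0.5, sigma_phi=0.3):
    """Hypothetical stand-in for p(M_X', M_Y', M_phi' | X', Y', phi').

    meas  -- (M_X, M_Y, M_phi) from the sensor, in metres and radians
    state -- (x, y, k): grid indices and azimuth bin of a candidate state
    Assumes the three error components are independent Gaussians, so the
    likelihood is the product of one factor per component.
    """
    mx, my, mphi = meas
    x, y, k = state
    phi = 2.0 * math.pi * k / n_phi
    return (gaussian_pdf(mx - x * cell, sigma_xy)
            * gaussian_pdf(my - y * cell, sigma_xy)
            * gaussian_pdf(wrap_angle(mphi - phi), sigma_phi))
```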

○ Step S14
The existence probability predicted value calculation unit 14 handles every transition destination state (X_b,i′, Y_b,i′, φ_b,i′). For each one, it calculates by Equation (28) the predicted probability that the autonomous airship is present in that state. This value, P′(X_b,i′, Y_b,i′, φ_b,i′), is called the existence probability predicted value.
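Equation (28) is not reproduced in this excerpt. The sketch below shows the standard prediction step of a discrete Bayes filter, assuming that Equation (28) has this form: each predicted probability is the sum, over current states, of the current probability weighted by the step S12 transition probability.

The dictionary representation and the `kernel` callback signature are illustrative assumptions.

```python
from collections import defaultdict


def predict(belief, action, kernel, n_phi=8):
    """Prediction step of a discrete Bayes filter (assumed form of Eq. (28)).

    belief -- {(x, y, k): probability} over the current discrete states
    kernel -- function (k, action) -> {(dx, dy, dk): probability}, i.e. the
              state transition probability of step S12
    Returns P'(x', y', k') = sum_{x,y,k} p(x'-x, y'-y, k'-k | k, a) P(x, y, k),
    with the azimuth bin wrapped modulo n_phi.
    """
    predicted = defaultdict(float)
    for (x, y, k), p in belief.items():
        if p == 0.0:
            continue  # an impossible state contributes nothing
        for (dx, dy, dk), q in kernel(k, action).items():
            predicted[(x + dx, y + dy, (k + dk) % n_phi)] += p * q
    return dict(predicted)
```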

○ Step S15
The existence probability predicted value correction unit 15 corrects the existence probability predicted value P′(X_b,i′, Y_b,i′, φ_b,i′) using Equation (30).
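Equation (30) is not shown. Assuming it is the usual measurement update of a discrete Bayes filter, the correction multiplies each predicted value by the step S13 measurement likelihood and renormalizes.

A minimal sketch follows. The function name and the `likelihood` callback are hypothetical.

```python
def correct(predicted, likelihood):
    """Measurement update of a discrete Bayes filter (assumed form of Eq. (30)).

    predicted  -- {state: P'(state)} from the prediction step (S14)
    likelihood -- function state -> p(M | state), e.g. from step S13
    Returns the normalised posterior eta * p(M | state) * P'(state).
    """
    unnormalised = {s: p * likelihood(s) for s, p in predicted.items()}
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError("measurement is impossible under every predicted state")
    return {s: w / total for s, w in unnormalised.items()}
```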

○ Step S16
The destination state determination unit 16 proceeds in three steps:
- Among the values P(X_b,i′, Y_b,i′, φ_b,i′ | b′) computed by the existence probability predicted value correction unit 15, it takes the state (X^max_b′, Y^max_b′, φ^max_b′) that maximizes P(X_b,i′, Y_b,i′, φ_b,i′ | b′) as the central state of b′.
- It determines the entropy components H_Yb′ and H_φb′ of b′ from the entropy values calculated by Equations (31) and (32).
- It sets b′ = (X^max_b′, Y^max_b′, φ^max_b′, H_Yb′, H_φb′) as the destination state.
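Equations (31) and (32) are not reproduced here. The sketch below assumes they are Shannon entropies of the marginal distribution over Cartesian cells and of the marginal over azimuth bins. The entropy values are then quantized into discrete levels by hypothetical bin edges (`pos_edges`, `phi_edges`) to give the entropy components of b′.

The most probable (X, Y, φ) cell becomes the central state.

```python
import math
from bisect import bisect_right
from collections import defaultdict


def shannon_entropy(dist):
    """Entropy (in nats) of a {outcome: probability} mapping."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0.0)


def destination_state(posterior, pos_edges, phi_edges):
    """Build b' = (X_max, Y_max, phi_max, H_pos level, H_phi level) (sketch).

    posterior -- {(x, y, k): probability} after the S15 correction
    pos_edges, phi_edges -- hypothetical ascending thresholds used to
        quantise the entropies into discrete levels 0..len(edges)
    """
    x, y, k = max(posterior, key=posterior.get)  # most probable cell
    pos_marg, phi_marg = defaultdict(float), defaultdict(float)
    for (sx, sy, sk), p in posterior.items():
        pos_marg[(sx, sy)] += p   # marginal over Cartesian cells
        phi_marg[sk] += p         # marginal over azimuth bins
    h_pos = bisect_right(pos_edges, shannon_entropy(pos_marg))
    h_phi = bisect_right(phi_edges, shannon_entropy(phi_marg))
    return (x, y, k, h_pos, h_phi)
```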

○ Step S17
The action determination unit 17 looks up the policy π(b′, a) stored in the storage unit 50 for the state b′ determined by the destination state determination unit 16, and determines the action a. The control unit 62 of the autonomous airship then controls the actuators and other devices based on the determined action a.
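How π(b′, a) is turned into a single action is not spelled out in this excerpt. A minimal sketch, assuming the stored policy holds a score per (state, action) pair and the highest-scoring action is chosen, is shown below. Ties go to the action listed first.

The dictionary layout and the action names in the usage example are hypothetical.

```python
def choose_action(policy, belief_state, actions):
    """Pick the action with the largest stored pi(b', a) (step S17 sketch).

    policy       -- {(belief_state, action): score}, e.g. action probabilities
    belief_state -- b' from step S16
    actions      -- the stored action set, in preference order for ties
    """
    known = [a for a in actions if (belief_state, a) in policy]
    if not known:
        raise KeyError(f"no policy entry for state {belief_state!r}")
    # max() returns the first maximal element, so ties follow `actions` order.
    return max(known, key=lambda a: policy[(belief_state, a)])
```

For example, with π(b′, "forward") = 0.7 and π(b′, "turn_left") = 0.2, the sketch returns "forward".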

The above operations are performed at every time step to control the behavior of the autonomous airship at that step.

As is clear from the embodiments, the operation of the autonomous airship can be planned in an environment where the wind speed varies. The plan takes into account that the wind's influence on the airship's motion differs with its azimuth angle. This makes it possible to compute a route along which the airship reaches the target position with high probability while also maximizing the probability of avoiding obstacles.

<Hardware configuration example of the operation planning device and the operation control device>
Each device according to the above-described embodiments includes the following:
- an input unit to which a keyboard or the like can be connected;
- an output unit to which a liquid crystal display or the like can be connected;
- a CPU (Central Processing Unit), which may include a cache memory or the like;
- RAM (Random Access Memory) and ROM (Read Only Memory) as memory;
- an external storage device such as a hard disk;
- a bus connecting the input unit, output unit, CPU, RAM, ROM, and external storage device so that they can exchange data.

If necessary, each device may also have a drive that can read from and write to a storage medium such as a CD-ROM. A general-purpose computer is one example of a physical entity with such hardware resources.

The external storage device of each device stores the programs for operation planning or behavior control, together with the data needed to run them. The programs need not be stored in the external storage device; they may, for example, be stored in a ROM, which is a read-only storage device. Data produced by running these programs is stored in the RAM, the external storage device, or the like as appropriate. Hereinafter, a storage device that stores data, the addresses of storage areas, and the like is simply called a "storage unit".

In each device, the programs stored in the storage unit and the data needed to run them are loaded into the RAM as required, then interpreted and executed by the CPU. In this way the CPU realizes the predetermined functions, and thereby operation planning or behavior control. For the operation planning device these functions are the state/action selection unit, current position coordinate determination unit, destination position coordinate selection unit, measurement position coordinate selection unit, existence probability calculation unit, transition probability calculation unit, repetition control unit, value function calculation unit, and policy calculation unit. For the behavior control device they are the position acquisition unit, state transition probability calculation unit, measurement probability calculation unit, existence probability predicted value calculation unit, existence probability predicted value correction unit, destination state determination unit, and action determination unit.

<Supplementary note>
The present invention is not limited to the above-described embodiments and can be modified as appropriate without departing from its spirit. The embodiments use a normal distribution as the probability distribution, but the invention is not limited to it. The specific distribution is chosen as appropriate according to, for example, the characteristics of the autonomous mobile robot and the mission environment. The processing described in the embodiments also need not run in time series in the order described. It may instead run in parallel or individually, according to the processing capability of the executing device or as required.

When the processing functions of the hardware entities described in the above embodiments are realized by a computer, the processing content of each function the hardware entity should have is described by a program. Executing this program on a computer realizes those processing functions on the computer.

The program describing this processing can be recorded on a computer-readable recording medium. Any such medium may be used, for example:
- as a magnetic recording device: a hard disk drive, a flexible disk, magnetic tape, or the like;
- as an optical disc: a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), CD-R (Recordable)/RW (ReWritable), or the like;
- as a magneto-optical recording medium: an MO (Magneto-Optical disc) or the like;
- as a semiconductor memory: an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) or the like.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which it is recorded. It may also be distributed by storing it in the storage device of a server computer and transferring it from there to other computers over a network.

A computer that executes such a program typically first stores it in its own storage device. The program may come from a portable recording medium or be transferred from the server computer. When executing the processing, the computer reads the program from its own recording medium and processes according to it. Other forms of execution are also possible:
- The computer may read the program directly from the portable recording medium and process according to it.
- The computer may process according to each program as it is received, every time one is transferred from the server computer.
- The processing may be executed by a so-called ASP (Application Service Provider) type service. Here the program is never transferred from the server computer to this computer, and the processing functions are realized solely by issuing execution instructions and acquiring the results.

The program in this embodiment also includes information that is provided for processing by an electronic computer and is equivalent to a program. An example is data that is not a direct command to the computer but has the property of defining the computer's processing.

In this embodiment, the hardware entities are configured by executing a predetermined program on a computer. At least part of the processing may instead be implemented in hardware.

Claims

An operation planning method for an autonomous mobile robot, wherein
the state of the autonomous mobile robot is expressed as including Cartesian coordinates, an azimuth angle, an entropy of the probability distribution of the Cartesian coordinates, and an entropy of the probability distribution of the azimuth angle, and
a predetermined set of the states to which the autonomous mobile robot can transition (hereinafter, a state set) and a set of actions the autonomous mobile robot can take (hereinafter, an action set) are predetermined, the method comprising:
a state/action selection step in which a state/action selection unit selects an unselected combination from among the combinations of an element of the state set and an element of the action set;
a current position coordinate determination step in which a current position coordinate determination unit calculates, under the state included in the combination selected in the state/action selection step, a probability distribution of position coordinates whose elements are the Cartesian coordinates and the azimuth angle of the autonomous mobile robot;
a destination position coordinate selection step in which a destination position coordinate selection unit calculates the position of the autonomous mobile robot after movement based on a state transition probability;
an existence probability calculation step in which an existence probability calculation unit obtains a transition destination state by applying a Bayesian filter to the probability distribution of the position coordinates of the autonomous mobile robot calculated in the current position coordinate determination step;
a transition probability calculation step in which a transition probability calculation unit calculates a transition probability of a transition from the current state to the state obtained in the existence probability calculation step, and a reward associated with the transition;
a control step in which a control unit performs control such that, for the combination selected in the state/action selection step, the processes of the current position coordinate determination step, the destination position coordinate selection step, the existence probability calculation step, and the transition probability calculation step are repeated a predetermined number of times, and, once they have been repeated the predetermined number of times, the process of the state/action selection step is performed;
a value function calculation step in which a value function calculation unit calculates a value function using the transition probability and the reward; and
a policy calculation step in which a policy calculation unit calculates a policy function expressed using the transition probability, the reward, and the value function.

The operation planning method for an autonomous mobile robot according to claim 1, further comprising
a measurement position coordinate selection step in which a measurement position coordinate selection unit represents, as a probability distribution, the measurement error of a sensor that measures the position coordinates of the autonomous mobile robot, and calculates the position of the autonomous mobile robot after movement based on the error between the position coordinates measured by the sensor and the post-movement position obtained in the destination position coordinate selection step, wherein
in the existence probability calculation step, the existence probability calculation unit first applies a Bayesian filter to the probability distribution of the position coordinates calculated in the current position coordinate determination step, then corrects the resulting probability distribution based on the post-movement position obtained in the measurement position coordinate selection step, and thereby obtains the transition destination state, and
in the control step, the control unit performs control such that, for the combination selected in the state/action selection step, the processes of the current position coordinate determination step, the destination position coordinate selection step, the measurement position coordinate selection step, the existence probability calculation step, and the transition probability calculation step are repeated a predetermined number of times, and, once they have been repeated the predetermined number of times, the process of the state/action selection step is performed.

The operation planning method for an autonomous mobile robot according to claim 1 or 2, wherein
the calculation for obtaining the transition probability is limited to the range of states reachable by a single action of the autonomous mobile robot.

The operation planning method for an autonomous mobile robot according to any one of claims 1 to 3, wherein
the value function is not calculated when the transition probability is 0.

A control method for an autonomous mobile robot that controls the autonomous mobile robot based on an operation plan determined by the operation planning method for an autonomous mobile robot according to any one of claims 1 to 4, wherein
a storage unit stores the policy function obtained by the operation planning method and the set of actions the autonomous mobile robot can take, the method comprising:
a position acquisition step in which a position acquisition unit acquires the measurement result of a sensor that measures position coordinates whose elements are the Cartesian coordinates and the azimuth angle of the autonomous mobile robot;
a state transition probability calculation step in which a state transition probability calculation unit calculates the position of the autonomous mobile robot after movement based on a state transition probability;
an existence probability predicted value calculation step in which an existence probability predicted value calculation unit obtains, from the position coordinates acquired in the position acquisition step, a predicted value of the probability that the autonomous mobile robot is present at the position coordinates of a transition destination (hereinafter, an existence probability predicted value);
a destination state determination step in which a destination state determination unit takes the position coordinates that maximize the existence probability predicted value as the transition destination; and
an action determination step in which an action determination unit determines an action using the policy function.

The control method for an autonomous mobile robot according to claim 5, further comprising:
a measurement probability calculation step in which a measurement probability calculation unit obtains the probability distribution of the position coordinates measured by the sensor, based on the measurement error of the sensor expressed as a probability distribution; and
an existence probability predicted value correction step in which an existence probability predicted value correction unit corrects the existence probability predicted value based on the probability distribution obtained in the measurement probability calculation step, wherein
in the destination state determination step, the destination state determination unit takes the position coordinates that maximize the existence probability predicted value corrected in the existence probability predicted value correction step as the transition destination.

An operation planning device for an autonomous mobile robot, wherein
the state of the autonomous mobile robot is expressed as including Cartesian coordinates, an azimuth angle, an entropy of the probability distribution of the Cartesian coordinates, and an entropy of the probability distribution of the azimuth angle, and
a predetermined set of the states to which the autonomous mobile robot can transition (hereinafter, a state set) and a set of actions the autonomous mobile robot can take (hereinafter, an action set) are predetermined, the device comprising:
a state/action selection unit that selects an unselected combination from among the combinations of an element of the state set and an element of the action set;
a current position coordinate determination unit that calculates, under the state included in the combination selected by the state/action selection unit, a probability distribution of position coordinates whose elements are the Cartesian coordinates and the azimuth angle of the autonomous mobile robot;
a destination position coordinate selection unit that calculates the position of the autonomous mobile robot after movement based on a state transition probability;
an existence probability calculation unit that obtains a transition destination state by applying a Bayesian filter to the probability distribution of the position coordinates of the autonomous mobile robot calculated by the current position coordinate determination unit;
a transition probability calculation unit that calculates a transition probability of a transition from the current state to the state obtained by the existence probability calculation unit, and a reward associated with the transition;
a control unit that performs control such that, for the combination selected by the state/action selection unit, the processes by the current position coordinate determination unit, the destination position coordinate selection unit, the existence probability calculation unit, and the transition probability calculation unit are repeated a predetermined number of times, and, once they have been repeated the predetermined number of times, the process by the state/action selection unit is performed;
a value function calculation unit that calculates a value function using the transition probability and the reward; and
a policy calculation unit that calculates a policy function expressed using the transition probability, the reward, and the value function.

The operation planning device for an autonomous mobile robot according to claim 7, further comprising
a measurement position coordinate selection unit that represents, as a probability distribution, the measurement error of a sensor that measures the position coordinates of the autonomous mobile robot, and calculates the position of the autonomous mobile robot after movement based on the error between the position coordinates measured by the sensor and the post-movement position obtained by the destination position coordinate selection unit, wherein
the existence probability calculation unit first applies a Bayesian filter to the probability distribution of the position coordinates calculated by the current position coordinate determination unit, then corrects the resulting probability distribution based on the post-movement position obtained by the measurement position coordinate selection unit, and thereby obtains the transition destination state, and
the control unit performs control such that, for the combination selected by the state/action selection unit, the processes by the current position coordinate determination unit, the destination position coordinate selection unit, the measurement position coordinate selection unit, the existence probability calculation unit, and the transition probability calculation unit are repeated a predetermined number of times, and, once they have been repeated the predetermined number of times, the process by the state/action selection unit is performed.

A control device for an autonomous mobile robot that controls the autonomous mobile robot based on an operation plan determined by the operation planning device for an autonomous mobile robot according to claim 7 or 8, comprising:
a storage unit that stores the policy function obtained by the operation planning device and the set of actions the autonomous mobile robot can take;
a position acquisition unit that acquires the measurement result of a sensor that measures position coordinates whose elements are the Cartesian coordinates and the azimuth angle of the autonomous mobile robot;
a state transition probability calculation unit that calculates the position of the autonomous mobile robot after movement based on a state transition probability;
an existence probability predicted value calculation unit that obtains, from the position coordinates acquired by the position acquisition unit, a predicted value of the probability that the autonomous mobile robot is present at the position coordinates of a transition destination (hereinafter, an existence probability predicted value);
a destination state determination unit that takes the position coordinates that maximize the existence probability predicted value as the transition destination; and
an action determination unit that determines an action using the policy function.

The control device for an autonomous mobile robot according to claim 9, further comprising:
a measurement probability calculation unit that obtains the probability distribution of the position coordinates measured by the sensor, based on the measurement error of the sensor expressed as a probability distribution; and
an existence probability predicted value correction unit that corrects the existence probability predicted value based on the probability distribution obtained by the measurement probability calculation unit, wherein
the destination state determination unit takes the position coordinates that maximize the existence probability predicted value corrected by the existence probability predicted value correction unit as the transition destination.

An operation planning program for an autonomous mobile robot, for causing a computer to execute each step of the operation planning method for an autonomous mobile robot according to any one of claims 1 to 4.

A control program for an autonomous mobile robot, for causing a computer to execute each step of the control method for an autonomous mobile robot according to claim 5 or 6.